On Jenga-Playing Dogs and Cats

This post is a continuation of my posts on Reinforcement Learning. In my previous posts, I talked up Reinforcement Learning as if it were a wondrous, generalized learning algorithm. In this post, I’ll make it clear that, actually, it isn’t all that general when compared to animal “intelligence.”

It was listening to Dennis Hackethal on the Do Explain Podcast that really drove home to me just how limited our current Reinforcement Learning algorithms are:

The other day, I saw a video online of a dog playing Jenga with its owner. And it was really cool. And it was offered as evidence of how clearly animals must be intelligent, and it’s rooted in the same mistake.

Now, we could say, well, look… there’s the argument I made earlier, about how if something is inborn, we have to explain it in terms of selection pressure from the past. No way there would have been selection pressure in the past for a dog to know how to play Jenga, right? So that would be an argument in favor of the opposing view that the dog must have learned how to play Jenga on its own.

However, I think that is not the case. If you watch that video, you will see (and it’s amazing, really) that the dog is super steady and manages to pull out some of the pieces with its mouth without the tower falling. And if you pay close attention, you will notice something: the dog does not watch the tower as it’s playing. It gets its canines on one of the pieces and pulls it out slowly. And what is the dog watching as it does this? The owner.

And that is an important distinction, because if I play Jenga, I’m not going to look at my opponent. I’m going to look at the tower to make sure I’m doing it right. So I say the dog has no idea what it’s doing there. What the dog is looking for is clues in the owner’s face as to whether or not it’s getting it right. And that absolutely evolved biologically. So this is a biological adaptation, because people bred dogs that were submissive, and people bred dogs that were willing to do what the owner praised them for.

And sure enough, once it successfully pulls out one of the sticks, we see the owner praising the dog. So this is how, over a long “learning” period (in scare quotes because it’s not really learning), this kind of reinforcement can lead to new behavior. (From here)

I believe the video Dennis is referring to is this one, found here.

This discussion got me thinking about Reinforcement Learning and really drove home just how poorly RL performs compared to many animals, particularly more “intelligent” ones like dogs and cats.

But before I explain the connection to Reinforcement Learning, I should first note a few quick criticisms of Dennis’s view that this is a “Clever Hans” style trick. As I’ll explain, I believe this can’t be the case.

Is the Dog Looking at Her Owner?

It’s interesting that when I showed this video to my wife (without mentioning Dennis’s opinion) her very first comment was “Wow! Look at how that dog is paying so much attention to the Jenga tower!”

Studying it closely (right at seconds 16-17), I thought I could see the dog shifting her attention back to the owner when the task was completed. So maybe it’s not so obvious that the dog is watching the owner.

Also, if we need a clear example of an animal not looking at its owner while playing Jenga, this Jenga-playing cat should do the trick. (I particularly love where the cat tries to cheat.)

But there is a stronger reason why I believe the dog is not playing Jenga via information in the owner’s face.

Jenga is Not Like Clever Hans

Dennis goes on to liken the dog to “Clever Hans,” the “counting horse.” It’s now well known that the horse was actually just waiting for the owner to smile and then it stopped ‘counting.’ So all the real counting was being done by the human.

Now suppose I asked you to write a program for a robotic automaton that duplicates what Clever Hans is doing. Just to make it easier, we’ll even allow the human ‘owner’ of the robot to push a button to stop the counting. Could you write such a program?

This would be a trivial program to write, of course. It would simply start to make a sound and continue to do so until you push the button to make it stop. So I agree with Dennis’s analysis that there is nothing remarkable about Clever Hans.
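To make the point concrete, here is a minimal sketch of essentially the entire program, in Python, with the hardware calls stubbed out as hypothetical placeholders so it runs as-is:

```python
import time

PRESS_AFTER = 5  # stand-in for when the owner presses the stop button
polls = 0

def beep():
    print("clop!")  # stand-in for the robot tapping a hoof

def button_pressed():
    # On real hardware this would poll the owner's physical stop button;
    # here it fires after a fixed number of polls so the sketch terminates.
    global polls
    polls += 1
    return polls >= PRESS_AFTER

# The entire "counting" act: tap until the owner signals stop.
while not button_pressed():
    beep()
    time.sleep(0.5)
```

All of the intelligence lives in the owner’s finger on the button, which is exactly Dennis’s point about Clever Hans.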

But now suppose you need to write a program to take the same robotic automaton and have it play Jenga. Again, I’ll allow you to have one or more buttons that represent facial emotions. (Maybe one button for “oh, be careful” and another for “You’re doing great!”?) I’m open to any number of buttons that map to facial emotions here but they have to realistically map to facial emotions. There are no facial emotions equivalent to a joystick.

How would you write such a program?

Of course, you can’t. A person watching Jenga — but not touching it — simply doesn’t know enough to advise the automaton via facial expressions alone (or the equivalent buttons). Playing Jenga requires far more information than a face can realistically convey. Besides, to play Jenga well you really need your own fingers on the block, because the game depends on being able to actually feel it. So I believe this refutes the idea that the video could be a Clever Hans style trick where the human is doing the actual work.

Animals Actually Do “Learn”

I have another reasonable conjecture as well. I doubt this dog was playing Jenga for the first time in her life in this video. I’d venture a guess that the dog had to learn to play and probably didn’t play this well initially. She probably had multiple Jenga attempts before she got this good at it.

Dogs don’t learn tricks without praise, of course. But the dog is learning how to carefully select and remove a block without knocking the stack over. The truly remarkable thing we are seeing here is that dogs and cats can — when incentivized via nothing more than praise — learn to play Jenga! This is a stunning feat!

Remarkable Animal “Intelligence”

So if the dog is learning to play Jenga via practice, could we replicate this feat using current Artificial Intelligence techniques? In other words, is it possible to build a Jenga-playing robot?

Yes, it is possible, though it wouldn’t be easy. If we were seriously trying to build a Jenga-playing robot, most likely we’d want to program it the way Boston Dynamics does. Boston Dynamics robots do not use Machine Learning. Instead, they use pre-programmed algorithms, painstakingly worked out by programmers in advance using probabilistic robotics techniques, to move the robot and accomplish the desired tasks. See Sebastian Thrun’s book for details on how this is done.
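For a flavor of what those probabilistic robotics techniques involve, here is a toy one-dimensional Kalman filter, the kind of state-estimation building block Thrun’s book develops at length. The scenario and all the numbers below are invented purely for illustration:

```python
# Toy 1D Kalman filter: the robot maintains a belief (mean, variance)
# about some quantity, e.g. a gripper position in cm.

def kalman_update(mean, var, measurement, measurement_var):
    """Fuse a noisy sensor reading into the current belief."""
    k = var / (var + measurement_var)           # Kalman gain
    new_mean = mean + k * (measurement - mean)  # pull belief toward reading
    new_var = (1 - k) * var                     # belief becomes more certain
    return new_mean, new_var

def kalman_predict(mean, var, motion, motion_var):
    """Shift the belief after a commanded motion; uncertainty grows."""
    return mean + motion, var + motion_var

mean, var = 0.0, 1000.0  # start almost completely uncertain
for reading in [5.1, 4.9, 5.0]:
    mean, var = kalman_update(mean, var, reading, measurement_var=4.0)
    mean, var = kalman_predict(mean, var, motion=1.0, motion_var=2.0)
print(mean, var)
```

A real robot combines many hand-engineered components like this, and every one of them is designed by a human rather than learned.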

But clearly the dog didn’t learn to play Jenga this way: there is no way to load such programming directly into a dog’s head the way we can with a Boston Dynamics robot. Nor is playing Jenga something that could have evolved directly as an algorithm hardwired into the dog’s DNA. This is further evidence that the dog had to learn, on her own, something that would take years of work by many talented programmers to get a robot to do.

One common mistake people make when talking about Artificial Intelligence is to silently assume that there is nothing remarkable about animal “intelligence” (or “smarts” if you prefer). In reality, animal learning is profoundly amazing compared to what we are capable of doing using Artificial Intelligence techniques.

Dog Learning vs Reinforcement Learning

This all brings us to our comparison with reinforcement learning. My previous articles on RL might be helpful here if you need a refresher: Part 1 gives an introduction, Part 2 explains the mathematical theory, and Part 3 explains how it works using a simple grid world example.

If the dog can’t have been directly programmed to play Jenga (like a Boston Dynamics robot), could the dog be using a Reinforcement Learning algorithm instead? What would it take to train a robot using RL? Is this even possible?

At least in theory, yes, you could use RL to teach a robot to play Jenga. But the problems you’d bump into trying to do so are instructive for understanding the current limits of Machine Learning.

In fact, while it is doable in principle, it’s probably impossible in practice today to teach a robot to play Jenga using even our best RL algorithms.

One problem is that the state and action spaces (I explain what these mean in my RL articles linked above) are continuous (read: infinite), which breaks our existing RL algorithms. There are techniques to approximate these spaces by bootstrapping artificial neural networks (as AlphaGo does), but they don’t work on most domains. I have no idea if such an approach could in principle work with Jenga. More likely, we’ll need to improve our current understanding of how to encode action and state spaces to make this a more realistic problem to solve.
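For the curious, here is a minimal sketch of one such approximation technique: linear semi-gradient Q-learning over hand-crafted features of a continuous state. (AlphaGo-style systems use deep neural networks rather than this toy linear version, and the one-dimensional “state” here is a made-up stand-in, not Jenga.)

```python
import numpy as np

# Instead of a Q-table (impossible when states are continuous), we learn
# weights for a linear approximation: Q(s, a) = w[a] . features(s).
n_actions = 2

def features(s):
    # Hand-crafted features of the continuous state. Note that a human
    # still has to choose this representation.
    return np.array([1.0, s, s * s])

w = np.zeros((n_actions, 3))  # one weight vector per action

def q(s, a):
    return w[a] @ features(s)

def update(s, a, reward, s_next, alpha=0.1, gamma=0.99):
    # One semi-gradient Q-learning step toward the bootstrapped target.
    target = reward + gamma * max(q(s_next, b) for b in range(n_actions))
    w[a] += alpha * (target - q(s, a)) * features(s)

update(s=0.5, a=1, reward=1.0, s_next=0.6)  # a single made-up transition
```

Notice that even in this sketch a human had to choose the feature representation, which previews the point of the next section.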

Is Reinforcement Learning Really a General Purpose Learner?

The reason Reinforcement Learning is exciting to AI researchers is that it’s the first time we’ve had anything like a “general purpose” learning algorithm. Compared to regular Supervised Learning, RL certainly feels far more flexible. In regular Supervised Learning, you have to find just the right technique for your problem space and the right feature set, and then you need a human to label everything with the ‘ground truth.’ (This is probably impossible for a game like Jenga. What is the ground truth of a good or bad Jenga move?) Then, via a long trial-and-error process (with the human constantly tweaking things), you will eventually find something that gives satisfactory results. There is an overwhelming amount of human intervention in supervised machine learning.

By comparison, Reinforcement Learning uses a single algorithm (actually a half-truth, as there are numerous variants) and needs no human intervention to label ground truth. Instead, the one “all-purpose” algorithm uses some ‘signal’ in the environment as a ‘reward’ and then mathematically does its magic, figuring out how to maximize that reward. The reduction in the amount of human intervention seems almost miraculous compared to regular supervised machine learning.

But take a careful look at Part 3 of my posts on Reinforcement Learning. One thing even our most sophisticated Reinforcement Learning algorithms have in common today is that they require a programmer to come up with a state space, an action space, and a way to designate rewards. The algorithm can’t do this for you.

Without these ‘spaces’, the algorithm just doesn’t work. For this reason, you can’t take AlphaGo and then have it learn chess too. You basically need to start from scratch with a newly designed state space, action space, and definition of rewards, as the sketch below illustrates. This is why RL is entirely narrow: it can only learn a single task at a time. It also means a huge injection of creative knowledge is required from a human for each new ‘trick’ you want your robot to learn.
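To make the amount of human-supplied knowledge concrete, here is a sketch of those three ingredients for a grid world like the one in Part 3. The specific grid, goal, and rewards are invented for illustration:

```python
# Everything below (states, actions, rewards) is knowledge the
# programmer injects. No RL algorithm discovers any of it on its own.
GRID_W, GRID_H = 4, 3

# State space: every cell of the grid, enumerated by hand.
states = [(x, y) for x in range(GRID_W) for y in range(GRID_H)]

# Action space: the four moves the programmer decided are possible.
actions = ["up", "down", "left", "right"]

# Reward function: the programmer decides what counts as success.
def reward(state):
    if state == (3, 2):
        return +1.0  # goal cell
    if state == (3, 1):
        return -1.0  # pit
    return 0.0       # everything else

# Swap in chess or Jenga and all three definitions must be redesigned
# from scratch.
```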

Dogs don’t need any of this and we have not the slightest idea how dogs are such flexible learners by comparison to our RL algorithms.

The “Perspiration Phase”

Suppose we somehow work out a clever way to represent the Jenga state and action spaces. Then you’d have to let your algorithm practice for millions of iterations until it got good at the game. This is the so-called “perspiration” phase.

Now, running millions of iterations is intractable for a game like Jenga that has to exist in physical space. Training would take a very long time because you’d need a way to let the robot play using initially random moves until it accidentally knocks the stack over. Then you’d need to reset the stack and try again. Repeat several million times.
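Here is roughly what that outer loop would look like in code. The Jenga environment below is a hypothetical stand-in (stubbed so the sketch runs); no such environment actually exists, which is rather the point:

```python
import random

class JengaEnvStub:
    """Hypothetical stand-in for a physical Jenga environment."""

    def reset(self):
        # In the real world, a human (or machine) restacks the tower here.
        return "fresh tower"

    def step(self, action):
        done = random.random() < 0.1  # the tower eventually falls over
        reward = -1.0 if done else 0.1
        return "tower state", reward, done

env = JengaEnvStub()
episodes = 1000  # a real run would need millions, each ending in a reset

for _ in range(episodes):
    state, done = env.reset(), False
    while not done:
        action = random.choice(["pull block A", "pull block B"])  # initially random
        state, reward, done = env.step(action)
    # ...update the policy from this episode's rewards...
```

Every single episode ends with someone walking over and restacking a physical tower, several million times over.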

This is the main reason current reinforcement learning can’t solve problems like Jenga yet. You could probably train it in a virtual simulation, of course, but it’s well known that a policy learned in simulation currently won’t transfer to the actual physical world (the so-called “sim-to-real” gap). Our physics simulations today are just not realistic enough.

That this dog probably only needed to practice a handful of times (like a human) to get proficient at Jenga is a wondrous mystery of “smarts” that we have no idea how to duplicate with robots.

Animals vs Machine Learning

Now consider how remarkable this Jenga-playing dog and cat actually are!

The animals definitely have not had anyone first inject knowledge of Jenga into their heads in the form of a state space, action space, and rewards. And they didn’t need to play several million games to learn via trial and error. If someone invented a Reinforcement Learning algorithm that was as flexible and efficient as a dog — needing only praise from the ‘owner’ to learn multiple domains — they’d win the Nobel Prize.

So we have a huge explanation gap to fill. Animal “Intelligence” (or “smarts”) is not something we currently understand well enough to program and we have no idea what it would actually take to do so.

Conclusions

Animal “Intelligence” is remarkably more flexible than anything we currently know how to do via machine learning/reinforcement learning. The specific differences are:

  1. The dog requires no sophisticated injection of human knowledge equivalent to a state space, action space, and rewards.
  2. The dog is able to learn many different things, whereas our most sophisticated algorithm learns exactly one thing.
  3. The dog is able to learn efficiently (like a human does) through minimal trial and error rather than millions of iterations.

A Note on Animal Consciousness

This article is not about animal consciousness. But I suspect the question everyone will still ask is, “Does this mean animals are ‘conscious’?”

This post doesn’t really tell us anything interesting about that question. It is just as easy to conjecture that animals are more flexible than machine learning because they are “conscious” as it is to conjecture that animals are more flexible than current machine learning because they are using a currently unknown machine learning algorithm. You have a significant explanation gap either way.

Personally I’d be plenty excited if animal “intelligence” turned out to be a sign that we can expect huge breakthroughs in machine learning in the future. But if you want my honest opinion, I think the question “Are dogs conscious?” is actually an ill-posed philosophical question because the word “conscious” actually lumps together many things that will turn out to be separate phenomena.
