Tuesday, July 17, 2012
Common Mistakes that can Delay Training Success
The idea behind positive reinforcement is to cause dogs to repeat actions humans desire, and in time have a repertoire of behaviors that make them good companions and canine citizens. It is a realistically achievable goal. In theory, positive reinforcement is as effective as positive punishment, which in operant conditioning is the delivery of something unpleasant to stop an unwanted action from recurring. More importantly, positive reinforcement strengthens the relationship between dog and owner, dog and his social group, and generally dog and the environment, while positive punishment comes with a high probability of ruining it. I argue that one cannot have true companionship with a dog if one chooses to train with force and pain. In addition, punitive training suffocates the dog’s welfare and potentially leads to side effects like aggression, anxiety and avoidance. It is well documented, and I talked about that in the past.
In laboratory settings and scientific studies, positive reinforcement leads to reliable and self-directed behavior quickly, but dog owners aren’t scientists or savvy trainers. Dog owners come with diverse levels of experience, skill and know-how, and some are completely new to dogs and reward-based training. Even though neither is very complicated, they make mistakes, like anybody new to anything would. The good news is that with positive reinforcement, mistakes won’t have long-lasting or irreparable negative consequences, but they can delay progress and cause enough frustration in the impatient person to give up on the concept altogether.
Here are the most common errors I see people, including some trainers, make.
Your intended reward is the dog’s punishment.
Not only can a pooch feel “Meh” about a reward, but he can also perceive it as an aversive, and when that happens you get the exact opposite of what you’re aiming for: when your reward is the dog’s punishment, the behavior you meant to happen again actually decreases. A perfect example is taps on the head or hearty pats along the ribcage. Most dogs don’t like them and shy away.
If your dog avoids or refuses a reward, it isn’t a reward in his mind and won’t reinforce the behavior you are after. Don’t offer it again if he doesn’t want it. And that can include food. Don’t shove a treat into your dog’s mouth if what he wants is distance from a worrisome trigger, or to play ball, or to read peemail.
Not long ago I had a cattle dog client who perfectly demonstrated that: he brought the ball right back into the hand of his owner, who promptly gave him a pat on the head, with the result that the dog first snapped at the hand, and then refused to bring the ball all the way in the next time. The dog was labeled aggressive and erratic, when in fact he just acted that way as a result of being “punished” for bringing the ball back. When we “rewarded” him by throwing the ball again without delay, he stopped snapping and eagerly retrieved the toy all the way in.
The reverse also happens in many households: your punishment is the dog’s reward. The best example for that is the inadvertent reinforcement when the jumping dog is pushed off. That is attention, and from the dog’s point of view perhaps even an invitation for a wrestling game, and exactly what he wanted. Jumping is reinforced and therefore will happen again.
So, the take-away message is that a reward is what the dog wants, or it won’t reinforce the behavior you are after. Be creative. You don’t always have to be elaborate, although sometimes your pizzazz can greatly impress your dog, but know what he wants and use it to your advantage.
Right now, on walks, Will wants me to get rid of the pesky deer flies that bury themselves in her coat. I comply and pluck them off, but each time before I do I say “halt”, which in our world means don’t move and wait till I get to you. Her halting on command is powerfully reinforced with me killing the insects, plus we have many naturally occurring opportunities to practice, and because of both I can use the command in situations when it matters to me that Will stops in her tracks.
Your timing is off.
It means that your dog doesn’t form an association between behavior and reward. Especially for fleeting moments, a reward marker, for example a clicker, helps because it bridges action and reinforcement and clarifies to the dog exactly which behavior made the reward happen.
Along that train of thought, holding a grudge, although understandably human, is counterproductive.
If you are still upset about the dug-up flowerbed and gruff when your dog comes straight away after you call him out of the planters, you punish a perfect recall and he might not be so keen to return to you in the future.
It is the last action that counts, and if it is one you like, reinforce it. There is a hitch though: when a bad and a good behavior are lumped together, so when two actions occur very close in time with the first being undesired, but the second reinforced, there is a risk that the dog connects both and will always perform them in sequence. The best example is a dog who lovingly celebrates your homecoming with jumping, but a flash-moment later is sitting, having either self-corrected or obeyed your command. Of course you want to reinforce the sit, but not the jump/sit combination. I deal with that by keeping the dog mentally engaged for a few seconds while she is in a sit, followed by asking her to do something else desirable, which I then reinforce. In other words, I give the sit some attention, but then invite the dog into a short, fun interaction I like and that is rewarding, and/or rewarded. After that I inform the pooch with the “all-done” word and hand signal that I’m about to disengage and that she’s on her own for entertainment for a while.
Your reinforcement schedule is off.
Without getting too technical, reinforcements have to happen in rapid succession, right away, when the dog learns something new.
When the dog gets it, connects the dots between cue and a certain action – a good rule of thumb is the dog complying instantly and correctly 9 out of 10 times when prompted, but also deliberately offering the behavior to elicit a reward - you have two options: If it is the end goal behavior, continue to reinforce randomly, without a fixed pattern. That cements the behavior. An end behavior would be reliably coming when called. It doesn’t get any better than a dog returning to you enthusiastically. You should always acknowledge him for being so accommodating, but you don’t have to toss a handful of treats his way each time.
If it is an approximation, so just a step toward your goal, stop reinforcing it altogether and raise the bar by a small increment. For example, if you shape a lie down on a mat, glancing at the mat or having one paw on it is not the final behavior, but you must reinforce each step constantly until the dog gets it. Once he does, so once you have the 9 out of 10 times reliability or he seeks out the mat when he is bored and proudly puts one paw on it, stop reinforcing that step and raise the criteria to bring you closer to your end goal. Then reinforce the new step constantly until he gets it, raise the bar again, and so on.
You are not orchestrating enough opportunities for your dog to earn a reward.
In other words, you are not practicing enough. If you can’t find reward-worthy behaviors often, lower your criteria and/or change the situation for the dog so that he can succeed.
The more you do it, the more the action you are training becomes a habit, and then your dog has one more good one up his sleeve. Habit means that the behavior learned with the help of operant conditioning becomes classically conditioned. Steve White, one of my favorite dog gurus, says: “Anytime you use operant conditioning, Pavlov is sitting on your shoulder. And that is one dude you really want on your team.”
Not managing the dog wisely before a behavior is solid, thus setting him up for failure.
Don’t put your dog’s favorite bed near the picture window when barking at passersby is a problem. Their moving along is reinforcing for your dog and maintains barking at the window. If he has the opportunity to do that all day long, the little bit of “quiet” practice you do when you are home won’t have much of an effect.
Your commands are unclear: for example, you use the same command for two behaviors, or you don’t enforce a command.
In that category also falls making unreasonable requests and raising the bar too quickly – in other words, being impatient and asking for more than the dog can do, but also chaining behaviors together before each one is solidly learned separately. If you work on a position stay, reinforce when the dog is still in position. If you call him out of position and reward him when he comes to you, you are practicing come, not the position stay. I will write more about command clarity sometime in the future, but for now remember that dogs are brilliant, but not mind readers. Say what you mean and reinforce when your dog does what you say. If you can’t enforce what you say, don’t say it.
Taking good behaviors for granted.
Dogs offer behaviors we like all the time: don’t ignore them, but capture and reinforce them. Don’t ignore the dog calmly chewing a bone on his blanket, only to give him attention when he steals your Italian leather pump.
If you made mistakes, don’t beat yourself up. The beauty of force-free, non-punitive training is that you can’t really mess things up too badly. Positive reinforcement can be adjusted without creating unwanted and unexpected fallout. But if you can avoid making those common errors in the future, you’ll accelerate your training success and reach your goal faster.
When I see clients, positive reinforcement is a big part of the consultation, and the humans receive all the information they need to do it effectively. Just about everyone I meet gets it. It makes sense to them and is aligned with how they feel: most people don’t want to hurt their dog. Yet, at times and typically after a prolonged pause, I hear the question: “Yes - but how do I correct my dog when he misbehaves?” Indeed, how do we punish? Or should we?
That, I will sort out for you in the next post. Look for it at the end of July.