A recent discussion arose on a forum as to whether it’s best to deliver a treat (or other form of primary reinforcement) every time we click when clicker training.
Part of the discussion stemmed from the fact that eventually, we want to fade out those treats. Random treats are more interesting, and besides, no one wants to carry around a bucket of grain or a bag of dog treats forever. One of the arguments for clicking without treats was that variable treat delivery would maintain behavior better than delivering a treat for every correct response.
What this is describing is called a variable reinforcement schedule. Once initial learning takes place, a variable reinforcement schedule is indeed much more powerful than a fixed schedule.
Here’s a nice analogy for others who might not be familiar with these terms:
A fixed reinforcement schedule is like a vending machine. Every time you put in a quarter, you get a pack of gum. Very reliable, but pretty darn boring after a while.
A variable reinforcement schedule is like a slot machine. It’s a little more stressful, but a lot more exciting! Often you get nothing when you put a quarter in, but sometimes you hit the jackpot. This kind of schedule, if built right, is very resistant to extinction. Meaning, it’s hard to quit.
This is why people have gambling addictions, but not vending machine addictions.
Now, when using a clicker, there are two (main) ways to transition to a variable reinforcement schedule.
Possibility One
Train the behavior with a high rate of reinforcement, that is, with a fixed schedule and a high rate of click/treat. This means every time the animal performs the behavior you want, you click. And every time you click, you deliver a treat. Once the animal understands what you want, add a cue so you can tell the animal when to perform the behavior.
Gradually stop clicking and treating every time the animal performs the behavior. So, sometimes you click, sometimes you don’t. Sometimes the animal has to do the behavior twice before you click. Basically, you’re putting the click AND the treat on a variable schedule. (However, you’re still giving a treat every time you click.)
When I teach the yearlings to lead, I’m practically clicking every two steps. Later, I don’t click or treat at all, unless they do something extra special, like walking through a nasty puddle.
This method is also a great way to start building duration into behaviors. Now that Sebastian knows how to back up, sometimes I click/treat him for taking 1 step back, and sometimes I don’t click until he takes 20 steps back. Backing up is on a variable reinforcement schedule, and he’s learned to keep backing until he hears a click or until I ask him for a different behavior.
This method is based on the assumption that once the animal understands the behavior and what is being asked, the click signal is no longer constantly needed. But during initial learning, it’s better to always treat when you click.
Why is this? A one-to-one relationship between click and treat is best for learning because you want to tell the animal every time it does something right. If you start out with variable reinforcement from the beginning, some correct responses will go unrewarded, and it will take the animal longer to figure out what you’re asking and what counts as correct.
Possibility Two
Train the behavior with a high rate of reinforcement, that is, with a fixed schedule and a high rate of click/treat.
Then, keep on clicking to tell the animal when he’s done something right, but don’t always deliver a treat, or gradually fade out the treats altogether.
This assumes that you are using the clicker as a secondary reinforcer and that the click in itself has some value to reinforce behavior. Big assumptions. If the click is actually strengthening the behavior, then go for using the clicker in this way. But I’d caution you to first check that the click is actually serving this function for the animal.
If using the clicker without food, experiment around. Has the clicker taken on a similar meaning as click+treat? Can you build new behavior using just the click (no treat and also no pressure)?
The difference between these two approaches to fading out the food really boils down to figuring out what the clicker actually means to your horse.
I’ll address that in a follow-up post, What’s the purpose of the click in clicker training?