Notes on Learning and Memory for Nov. 22, Dec. 6, and Dec. 13, 2002

After the test we began the unit on learning. We will have a fifth test, in order to give folks another chance to bring their test average up. (I drop the lowest of the test scores.) It will be a fun format, with a guarantee of passing just for taking the test, and it will be given on the last day of class. Also, I will be planning a review session before the final for people who want to participate.


1.  Learning: Learning is defined as a relatively enduring change in behavior as a result of experience. Learning allows us to respond flexibly to an ever-changing environment.

 Learned vs. innate behaviors: Humans have very few innate (inborn) behaviors; most of what they need to know in order to survive must be learned. (Contrast this with baby ducks, which follow the first moving thing they see after hatching (usually their mother), know how to swim and how to eat the same foods their mother eats, and crouch and hold still when the outline of a hawk passes overhead, even though they have never had the opportunity to learn any of these things.) An infant knows how to suck on a nipple and automatically turns toward something that touches his/her cheek, but is primed to learn very rapidly by making associations: connections between things that happen concurrently. For example, the infant rapidly learns that the smell, sound, and sight of the primary caretaker (most often the mother) are associated with food and comfort: a hungry infant cries until fed, but after a short time, he/she reacts to the smell or sound of the mother by quieting even before being fed.

Research on infant rats ('pups') shows the power of these earliest learning experiences. Rat pups were fed milk from a lemon-scented nipple. When later given an empty (surrogate) nipple to suck, they would suckle for 80% of a ten-minute period, while pups in a control group who had only been fed from a normal-smelling nipple suckled only 20% of the time. While most learning depends on repeated exposure to paired stimuli, the pups learned this new association (lemon scent = milk) with only one exposure to the two combined stimuli. And this period of almost instantaneous learning is unique to newborn pups: older rat pups didn't learn as quickly and had to have the two stimuli (milk and lemon scent) presented closer together in time than the newborn pups did. The scientists hypothesize that this powerful learning mechanism exists because milk is so critical to the pup's survival, and the mother's odor, under natural circumstances, is such a significant signal for milk, that rat pups are 'hard wired' to learn this association as soon as possible. (Human infants also very quickly learn their primary caretaker's scent as well as the smell of the milk. Findings from this type of study have practical applications in helping infants transition to other caretakers or formulas when substitutes must be found.)

2. Classical conditioning: the simplest type of learning, in which the subject comes to make associations between stimuli or antecedent conditions. This is a passive form of learning based on developing mental expectations from past events occurring at the same time or in the same sequence (as in the rat pups making the connection between 'milk' and 'lemon scent').

 History of classical conditioning:  

  • Pavlov: a Russian scientist interested in digestion, he noticed that the dogs he was working with salivated not only when presented with food, but when they saw signals that food was coming. He experimented with pairing the food with other stimuli that normally have nothing to do with food (ringing a bell when food was given) and discovered that after a number of pairings of food and bell, not only would the dogs salivate to the bell even when no food was presented, but they would also salivate to other stimuli that were then paired with the bell alone. He called this process conditioning; it is now referred to as classical conditioning.
  • Watson: applied the concept of conditioning to humans. Example: he showed 'Little Albert', about a year old, a white rat, which interested him initially. Then the rat was presented to Albert at the same time as a very loud noise was made, and he reacted in fright to the noise. After repeated pairings (rat and noise), he learned the association of rat with noise and began to cry at the sight of the rat even when there was no noise.
  • This type of conditioning is used to this day by advertisers, who count on our having positive reactions to certain stimuli (the sight of people having a good time, or a beautiful outdoor setting) and then try to get us to transfer that positive association to their product (a brand of soda, or those 'Golden Arches'...).

3. Terms/concepts in classical conditioning:

  • Unconditioned Stimulus (US): a stimulus that normally produces an involuntary response. (Food, a positive stimulus, activates salivation; pain, an aversive stimulus, activates pulling away or fear. In the case of 'Little Albert', loud noises are aversive to babies.)
  • Unconditioned Response (UR): the subject's natural reaction to the unconditioned stimulus (salivation, fear, etc).
  • Conditioned Stimulus (CS): other stimuli in the environment that initially do not elicit a response ('neutral stimuli') but which the subject learns to associate with the unconditioned stimulus (the bell when the food is given, the rat when the loud noise occurs).
  • Conditioned Response (CR): this is the same response (salivation) that the unconditioned stimulus (meat) originally elicited, but now it is a reaction to the conditioned stimulus (bell) as well.
  • Acquisition phase: the period during which the neutral stimulus is becoming an acquired, or conditioned, stimulus.
  • Acquisition depends on a number of factors:
  1. Frequency: the pairing of the unconditioned stimulus (US) and the neutral stimulus has to be repeated until the learner's behavioral response to both is the same. Then the former neutral stimulus has become a conditioned stimulus (CS). (If 'Little Albert's' white rat had not been consistently paired with the startling noise, the noise might have been associated instead with some other aspect of the environment that was consistently present.)
  2. Contingency: the two stimuli have to be introduced at nearly the same time, with the neutral stimulus preceding or accompanying the US, so that the neutral stimulus reliably predicts the US. (If the loud noise had also occurred frequently when no rat was present, Albert might not have become afraid of the rat.)
  3. Consistency: when a new association between stimuli is first being learned, the stimuli must be consistently paired. If the pairing only happens sometimes, the learner may not make the connection between the two.
  4. Timing ('latency'): the neutral stimulus and the US have to occur close enough together in time that the learner makes the connection. (If too much time passes between the presentation of the neutral stimulus and the US, the association may not get made.)
   There are a few exceptions to these 'rules' of acquisition. We discussed taste aversion: if you get sick from spoiled or poisonous food, the nausea may not occur until hours later, but you are still classically conditioned to avoid that food's taste or smell. This is obviously adaptive, as it's rare that something poisonous causes instant illness. This is also a case where a single exposure causes learning; you don't want to have to eat a poisonous mushroom twice to learn that it causes a very unpleasant response.
    Highly emotional situations are also not as dependent on frequency, contingency, consistency, and timing. Often a phobia (see below) is the result of a single exposure to a very frightening situation.
  • Generalization: the subject responds to other stimuli that are similar in some way to the conditioned stimulus (Albert's fear of other white furry animals, not just the white rat). This is actually very useful in terms of survival: if you have been bitten by one snake, you are then wary of all snakes that look similar; you don't have to personally test each one to see if it bites! The problem with generalization is that we can carry it too far, resulting in phobias.
  • Phobias: unreasonable or unrealistic fears that are severe enough to interfere with normal behavior. These are a result of classical conditioning and generalization, and are also called conditioned emotional responses. (Example: a small child accidentally locks himself in an abandoned refrigerator or small closet. He then becomes panic-stricken in any small enclosed space, such as an elevator, which he previously may have enjoyed.) Phobias can be treated by utilizing relaxation techniques (you cannot be relaxed and panic-stricken at the same time; the nervous system doesn't work that way. Remember the sympathetic and parasympathetic nervous systems?). Then, while the person is relaxed, he/she is introduced in a non-threatening way, a little at a time, to the situation that makes him/her phobic, until the panic response is extinguished.
  • Stimulus Discrimination: the way we cope with the problem of over-generalization is by learning stimulus discrimination, the ability to tell the difference between varied stimuli and to respond differently to them. (You learn to panic when it is a real rattlesnake at your feet, but not at the plastic toy one your little brother likes to toss at you now and then! It may look real, but you can discriminate between the real thing and the 'toy'.)
  • Higher order conditioning: when a well-learned conditioned stimulus can be used to reinforce further learning. Pavlov could have conditioned the dogs further by pairing the ringing of the bell with another stimulus, such as flickering the lights. My cat originally was conditioned to the sound of the cat food can actually being opened, but now she comes running when I take the can opener out of the drawer. This is higher order conditioning, which is also what advertisers rely on as you watch TV. (They don't actually give you the beer when they show you the mountains and the people having a good time. It is 'higher order' conditioning, using your associations with a positive experience to imply that their brand of beer is best.)
  • Extinction: if a conditioned stimulus is presented long enough without being paired with the unconditioned stimulus, the conditioned response weakens and eventually ceases; it becomes 'extinct'. (For instance, my dog loves carrots and has learned that the sound of the vegetable peeler means she will get a taste of carrot; she comes running. If I were to stop giving her a piece of carrot each time I peel them, she would eventually stop coming at the sound of the peeler. She would have 'unlearned' the association of that sound with a food she likes; it would have become extinct. And I could cause extinction even faster if I only used the peeler on onions, which she hates.) A small numerical sketch of acquisition and extinction follows this list.
  • Spontaneous recovery: conditioning which has undergone extinction may reappear at a later time, even a number of times, before it disappears completely. This reappearance of a behavior after apparent extinction is called spontaneous recovery. In a way, this gears us to deal with life's changing conditions: just because something didn't work today doesn't necessarily mean it won't work tomorrow. (Remember the soda machine? Maybe today it will work, because perhaps the repairman came since you last tried it and lost all your change!)
  • Vicarious conditioning: often referred to as 'secondhand' or 'social' learning. Because we are born with so few natural instincts, we have to learn what to fear, what to like, etc., and if we couldn't learn to avoid deadly situations from others' experiences, many of us would never survive to grow up! Many of these associations are learned by observing how others react; we don't have to be bitten by a poisonous snake to learn to fear it (that learning might kill us!). If others around us react to the sight of a snake with fear, we can learn to fear (and avoid) snakes and thus avoid being hurt in the learning. The same goes for a small child learning to like new foods: children take their cues from how we react. The outcome of this type of classical conditioning is a Conditioned Emotional Response (CER).
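
For those who like to see these ideas in numbers, here is a minimal sketch of acquisition and extinction using the Rescorla-Wagner learning rule, a standard formal model of classical conditioning. (The model goes beyond these notes, and the learning-rate value below is just an illustrative assumption.)

```python
# Sketch of acquisition and extinction with the Rescorla-Wagner rule:
# on each trial, associative strength V moves a fraction (alpha*beta)
# of the way toward the maximum supportable strength (lambda), which
# is 1.0 when the US is present and 0.0 when it is absent.
# The 0.3 learning rate is an illustrative assumption.

def rescorla_wagner(trials, v0=0.0, alpha_beta=0.3):
    """trials: list of booleans; True = CS paired with US, False = CS alone."""
    v, history = v0, []
    for paired in trials:
        lam = 1.0 if paired else 0.0
        v += alpha_beta * (lam - v)   # delta-V = alpha*beta*(lambda - V)
        history.append(round(v, 3))
    return history

acquisition = rescorla_wagner([True] * 10)                      # 10 bell+food pairings
extinction = rescorla_wagner([False] * 10, v0=acquisition[-1])  # 10 bell-alone trials
print("Acquisition:", acquisition)  # climbs quickly at first, then levels off
print("Extinction: ", extinction)   # decays back toward zero
```

Notice that the numbers reproduce what we discussed: fast early gains during acquisition, with each additional pairing adding less, and a gradual fade during extinction.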

4. Operant Conditioning: learning from the results of what we do. Behavior that occurs in order to make something happen is called operant or instrumental behavior. The early behaviorists (remember Skinner?) believed that you could teach a person to do or become anything you wanted if you had total control over the conditions of his/her life: by rewarding some behaviors and punishing others, you could completely control the individual's behavior. Thorndike called the underlying principle, that behavior followed by satisfying consequences tends to be repeated while behavior followed by unpleasant consequences tends not to be, the "Law of Effect."

The easy way to remember these behavioral principles of learning is to think of the ABCs:

  • A = Antecedent: the conditions (stimuli) presented to the learner that indicate how likely a consequence is to occur.
  • B = Behavior: how the learner responds under these conditions; what he/she does or doesn't do to get a good outcome.
  • C = Consequences: the results of the behavior, which either encourage the subject to repeat the behavior (reinforcement) or discourage a repeat of the behavior (punishment).

How operant conditioning works: the things you do have consequences. (Where have you heard this before?) "If you study hard, you will get good grades." Well, no, not if you are not a student... There may be other reasons to study, but it won't get you good grades.

But under specific circumstances (A, the antecedents), certain Behaviors result in particular Consequences.
If you are a student, then whether you study or not has consequences, whether the positive reinforcement of praise or good grades (and, when I was a kid, a monetary reward for a good report card), or punishment ("Since your report card is so bad, you can't go out on weeknights anymore."). Reinforcement increases the chance the behavior will occur; punishment decreases the likelihood that you will repeat the behavior.

There are actually four kinds of consequences: positive reinforcement, negative reinforcement, positive punishment, and negative punishment. ('Positive' and 'negative' do not refer to whether something is good or bad, but to whether something is given to or done to the person, or taken away from the person.)
When I come home from work (A), my dog barks (B) and gets let out (C); I am reinforcing her barking by giving her what she wants. (Positive reinforcement for her.)
When my dog barks as I first come home from work (A), I let her out (B) and her noise stops (C). (Negative reinforcement for me.)
When my daughter used to come home from school (A), the dog would bark (B), and my daughter would yell at her to shut up (C). (Positive punishment.)
When I came home from work and found the dog had peed on the floor (A), and my daughter had just yelled at the dog instead of putting her out (B), I would make my daughter clean it up (C: positive punishment) and also not let her borrow my car (C: negative punishment, the loss of a privilege).

For information on how operant conditioning is used in training dolphins at Seaworld, click here:
  Operant conditioning at work

During the acquisition phase of operant conditioning, the learner can rarely accomplish a desired behavior by learning all the steps at once: the behavior must be 'shaped'. Shaping the behavior, and chaining all the needed steps in the right order, are accomplished by reinforcing successively more complete attempts at achieving the goal. The less-than-perfect steps are called 'approximations'. (For instance, if you're teaching a child to tie his shoe, you can't just show him once and expect him to do it right. You first reinforce (praise) just his attempts to cross the laces. Once the child has that down, you prompt him to pass one shoestring under the other, then to make a loop, etc. And maybe at first the shoes aren't tied tightly enough to stay tied for long, but this is still another successive approximation in the whole process.) A small sketch of this process follows.
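
As a rough illustration only, here is the logic of shaping and chaining in a few lines of Python. The step names come from the shoe-tying example; the "mastery takes three attempts" rule is an invented assumption:

```python
# Toy sketch of shaping/chaining: each successive approximation is
# practiced until mastered (here, pretend mastery takes 3 attempts),
# is reinforced, and only then is the next step in the chain required.

STEPS = ["cross the laces", "pass one lace under", "make a loop", "wrap and pull through"]

def teach(steps, attempts_to_master=3):
    chain = []
    for step in steps:
        for attempt in range(1, attempts_to_master + 1):
            if attempt == attempts_to_master:        # close enough: reinforce!
                print(f"reinforce: '{step}' mastered on attempt {attempt}")
                chain.append(step)                   # the chain grows one step at a time
                break
    return chain

teach(STEPS)  # the full behavior, built one reinforced approximation at a time
```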

Again, there are four aspects of the acquisition phase that affect how quickly and thoroughly conditioning occurs:

  • Frequency: the sequence (the ABCs of learning) has to be repeated until the learner understands that the antecedent conditions (stimuli), behavior, and consequences are all connected.
  • Contingency: only the desired behavior is reinforced (or the undesired behavior punished), and reinforcement or punishment has to follow, not precede, the behavior. In other words, the learner must see that that particular behavior is the thing which causes the result.
  • Consistency: When first learning a new behavior, each and every successful attempt should be rewarded. If it only happens sometimes, the learner will not make the connection between behavior and result as quickly.
  • Timing ('latency'): the resulting reinforcement or punishment has to happen soon enough after the behavior that the learner makes the connection. (If you spank the puppy an hour after he peed on the floor of the kitchen, he won't know why he is being punished. Best: Catch him in the act; he starts peeing, he gets yelled at or a swat on the rump.)

And, as in classical conditioning, there are exceptions to these rules of acquisition: very powerful reinforcers or punishments can cause conditioning to happen rapidly, or even with one incident. (You will never again stick your finger in a live electrical outlet!)

Using Reinforcers and Punishments

  • Each training situation requires that the trainer figure out the particular reinforcers that work best for each subject. This is highly individual. (If you don't like chocolate, then chocolate candy bars are not reinforcing to you.)
  • Primary reinforcers are those that meet survival ('primary') needs and drives (food, drink, sex, sleep, etc.). The problem with primary reinforcers is that the drive or need can be satiated (fully met), so one more chocolate candy bar won't motivate the subject: he's full!
  • Secondary reinforcers are those which are learned. These include praise, money, social approval, etc. (Under some circumstances, primary reinforcers can serve as secondary ones as well, as when the primary need has been met and the subject has learned to save the chocolate candy he just received as a reinforcer for a time when he is hungry. In this sense, the candy has become a 'token', something which can satisfy a need in the future, just as money can be saved to satisfy needs we have in the future.)

A useful grid of the kinds of consequences is as follows:

Consequences that INCREASE the frequency or likelihood of a behavior recurring:
  • Positive reinforcers: something needed or wanted by the learner, a reward for the behavior.
  • Negative reinforcers: the end of an aversive stimulus or situation ("It feels so good when it stops!").

Consequences that DECREASE the frequency or likelihood of a behavior recurring:
  • Punishment: something painful, unpleasant, or otherwise aversive that occurs as the result of the behavior.
  • Response cost: a kind of punishment which involves the loss of something needed or wanted as a result of the behavior (stay out late and lose the use of your parents' car).
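
Because 'positive/negative' (something added vs. taken away) and 'reinforcement/punishment' (behavior increases vs. decreases) are independent dimensions, the whole grid reduces to a two-question rule. Here is a sketch of that rule only; the examples in the comments refer to the dog stories above:

```python
# The 2x2 grid of consequences as a rule: 'positive' = a stimulus is
# added, 'negative' = a stimulus is removed; reinforcement increases
# the behavior, punishment (or response cost) decreases it.

def classify(stimulus_added: bool, behavior_increases: bool) -> str:
    if behavior_increases:
        return ("positive" if stimulus_added else "negative") + " reinforcement"
    return "punishment" if stimulus_added else "response cost (negative punishment)"

print(classify(True, True))    # positive reinforcement: dog barks, gets let out
print(classify(False, True))   # negative reinforcement: the barking noise stops
print(classify(True, False))   # punishment: getting yelled at
print(classify(False, False))  # response cost: losing use of the car
```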

Once the acquisition phase is over and the behavior is thoroughly learned, it is usually not necessary to reinforce every correct behavior. How often and when reinforcement occurs, however, does affect the frequency of the behavior, as well as how long the behavior is retained once reinforcement stops occurring (extinction):

Each type of reinforcement schedule differs in its frequency (when and how often reinforcement comes), in how powerfully it motivates frequent behavior, and in how durable the behavior is (how long before extinction occurs once reinforcement ceases):

  • Continuous reinforcement: reinforcement occurs every time the behavior is performed. (Ex.: my neighbor started giving her child a penny every time she pulled a weed.) Works until satiation or exhaustion sets in; the learner can always take a break and pick up where he/she left off. Not durable: the learner expects every behavior to be rewarded, so once reinforcement stops, he/she stops soon after.
  • Fixed ratio: reinforcement takes place after a set number of times the behavior is performed. (Ex.: my neighbor was running out of pennies, so she paid her child a nickel for every five weeds pulled from the lawn.) Very powerful, like piece rates: if you work faster, you earn more. Not very durable, but a bit more so than continuous reinforcement, as it takes the learner longer to figure out that reinforcement has stopped; again, the expectation is that reinforcement will occur regularly every set number of times the behavior is performed.
  • Variable ratio: reinforcement is given on a set ratio (number of times the behavior is performed), but it is never clear exactly which of the behaviors will get the reinforcement. (Ex.: now my neighbor goes out every so often, counts up the weeds, divides by five, and gives her child a nickel for each set of five.) The learner will still earn more by working faster, since the basic return for effort stays the same, but the uncertainty usually results, especially in young children and animals, in a slower rate of effort. Because of that same uncertainty, this schedule is more durable: the learner who is not sure when reinforcement is coming is also not sure when it has stopped, and will keep at it longer. (Another example: slot machines are set to return a ratio of their earnings to the players, but you never know which pull of the lever will pay off.)
  • Fixed interval: reinforcement is given for the first correct response after a set time interval, regardless of how many behaviors occur in that time period. (Ex.: now that my neighbor's daughter is older and a pretty good worker, she gets paid $5 for each hour spent weeding the yard and garden. Counting all those weeds was a drag!) The amount of work done is less than under fixed or variable ratio schedules, and it varies over the interval; there is no payoff, in terms of reinforcement, for getting much done at the start of the time period. (Example: if an assignment is due every two weeks, many people don't do much work on it the first week; they wait until it's almost due!) This extinguishes more quickly than variable schedules, and predictability again plays a role in how quickly the behavior is extinguished. (You are more likely to quit and look for another job when your expected paycheck fails to arrive a couple of times!)
  • Variable interval: reinforcement is given for the first correct response after a variable period of time. (If my neighbor sees that her daughter is slacking off, she can check up on her periodically, and if the daughter is not weeding at the moment her mother looks, she will not get paid for that hour at all!) Because you never know when you are going to be reinforced, you tend to work at a steady rate: you want to be 'caught' doing the right thing, but you get no extra credit for work that wasn't noticed! This schedule of reinforcement is the most durable of all: if you can't predict when the payoff will come, neither can you predict when it won't. (As ads for the state lottery say, "You can't win if you don't play!", and the possibility of a payoff, remote as it is, keeps people shelling out their money for that ticket week after week.)
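
To make the durability comparison concrete, here is a toy simulation contrasting continuous and variable-ratio schedules once reinforcement stops. The "give up when the dry spell far exceeds anything seen during training" rule and all of the numbers are invented assumptions, not data from the text:

```python
# Toy model of extinction durability. During training the learner
# tracks the longest run of unrewarded responses it has experienced;
# in extinction, it keeps responding until the current dry spell far
# exceeds anything seen before. All parameters are illustrative.
import random

def responses_before_quitting(schedule, training_trials=100, patience=3):
    longest_dry = current_dry = 0
    for _ in range(training_trials):
        if schedule():
            current_dry = 0
        else:
            current_dry += 1
            longest_dry = max(longest_dry, current_dry)
    # Reinforcement now ceases; the learner tolerates a dry spell up to
    # 'patience' times the longest one it saw during training.
    return patience * max(1, longest_dry)

continuous = lambda: True                        # rewarded every single time
variable_ratio = lambda: random.random() < 0.2   # ~1 reward per 5, unpredictably

print("continuous:    ", responses_before_quitting(continuous))      # gives up fast
print("variable ratio:", responses_before_quitting(variable_ratio))  # persists far longer
```

The continuous learner has never experienced an unrewarded response, so even a short dry spell signals that the rules have changed; the variable-ratio learner is used to long dry spells and keeps responding.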

Many of the concepts explained under Classical Conditioning are also true for Operant Conditioning:

  • Stimulus generalization: the aspects of the environment that indicate whether the conditions are right for a behavior to produce a consequence are referred to as environmental stimuli, and the learner may respond to similar stimuli in the same way. Once a young child learns that putting a coin in a slot produces candy, he may try to put coins into parking meters, etc.
  • Stimulus discrimination: However, the child will soon learn to discriminate between a parking meter and a candy machine, even though they may look a lot alike.
  • Stimulus control: the learner has to learn to recognize when the correct conditions exist for his/her behavior to produce the desired result (so, in a sense, the stimuli in the environment control the behavioral responses). (Example: if all the lights go out during a violent storm, you will not bother to flip the light switch on; you know, based on the stimuli available in the environment, that the switch won't work.)
  • Extinction: In classical conditioning, this is when the conditioned stimulus is no longer paired with the unconditioned stimulus, so, after a while, the conditioned response no longer occurs. In operant conditioning, this is when a behavior is no longer reinforced (or punished) and the conditioning eventually becomes 'extinct'.
  • Spontaneous recovery: the behavior can be extinguished in one set of trials, but may recur spontaneously at a later time without further reinforcement. The subject is 'testing' to see whether conditions have changed and the behavior is again effective in getting the desired result.
  • Vicarious or observational learning: because we are, again, a social species with little innate survival knowledge, we must learn about many things that are dangerous and potentially deadly from our elders, benefiting from their wisdom and knowledge. While we learn emotional reactions to various stimuli through vicarious conditioning (observing, for instance, how our parents react to situations that are new to us and picking up their 'vibes'), we also learn behaviors, what to do and what not to do, as well as HOW to do a number of things, by observation. This learning depends on our paying attention to the ABCs, remembering them, and then imitating them. Until the learner demonstrates that he/she has learned a new behavior by putting it into practice, the learning is considered latent (hidden).

5. OTHER KINDS OF LEARNING

 Not all learning is 'conditioning'. Internal 'thinking' processes bring about some kinds of learning. Understanding, anticipating, and figuring things out are all cognitive processes, and the reinforcement for these cognitive activities can be just the knowledge itself. The knowledge may not be immediately useful, but curiosity is a powerful drive for many mammals, and especially for humans, in and of itself. It's as if many organisms, once their other needs and drives are satisfied, have the urge to explore the world just in case the information should be needed in the future.

For example, consider the concept of cognitive maps. These are detailed internal layouts of the world, built from experience. If your usual route to work is blocked, you know from driving around town doing other things basically how the town is laid out, and so, even though you may never have taken that particular route before, you can follow it to your workplace. Even rats and bees have internal, or cognitive, maps. (If you capture a bee, put it in a dark box, and carry it someplace new, it can still find its hive and the flowers it has been using as a food source. A rat placed in a maze and left to wander around can later learn to find its way to a food source faster than a rat which has never been in the maze before. This is due to the cognitive map of the maze it constructed in its wanderings.)

Other kinds of cognitive learning:

  • Vicarious and latent learning demonstrate one kind of cognitive learning. Because they are based on the learner acquiring conditioned responses, they are discussed in the sections on conditioning above; but until they are exhibited in actual behavior, they can't be accounted for by behaviorism, which deals only in measurable behaviors.
  • Rote learning (repetition until something is memorized) is an efficient way to acquire 'facts' (the times tables, for instance). The 'facts' don't have to have meaning: even a parrot can learn to say the names of numbers in the right sequence, but does it understand the concept of numbers? Does it really know how to count?
  • Discovery learning is when exploration of a problem or issue leads to understanding of the underlying principles. A child who has played with blocks whose lengths are multiples of each other intuitively learns that two half-lengths make one long block, but that it takes four quarter-length blocks to make a full-size block. This type of block play makes possible an understanding of the underlying principles of addition, multiplication, and fractions in a way that mere rote memorization cannot. This type of learning is also much more flexible, as it can be applied to a range of circumstances.

 

I will put the notes on Memory on the web next week, before Thanksgiving... Read/study the chapters on Learning and Memory over the break, and we will have a short quiz on Dec. 6th.