After the test we began the unit on learning. We will have a fifth test, in order
to give folks another chance to bring their test average up. (I drop the
lowest of the test scores.) It will be a fun format, with a guarantee
of passing just for taking the test, and it will be given on the last day of class. Also,
I will be planning a review session before the final for people who want to
participate.
1. Learning: Learning is defined as a relatively enduring
change in behavior as a result of experience.
Learning allows us to respond flexibly to an ever-changing environment.
Learned vs. innate behaviors: Humans have very few innate
(inborn) behaviors;
most of what they need to know in order to survive must be learned. (Contrast this with
baby ducks, which follow the first moving thing they see after hatching
(usually their mother), know how to swim and how to eat the same foods their mother eats, and
crouch and hold still when the outline of a hawk passes overhead, even though they have
never had the opportunity to learn any of these things.) An infant knows how to suck on a
nipple and automatically turns toward something that touches his/her cheek, but is primed
to learn very rapidly by making associations: connections between things that
happen concurrently. For example, the infant rapidly learns that the smell, sound, and
sight of the primary caretaker (most often the mother) are associated with food and
comfort: a hungry infant cries until fed, but after a short time, reacts to the smell or
sound of the mother by quieting even when he/she has not yet been fed.
Research on infant rats ('pups') shows the power of these earliest learning
experiences. Rat pups were fed milk from a lemon-scented nipple. When given an empty
(surrogate) nipple to suck, they would suckle for 80% of a ten-minute period, while pups in a
control group who had only been fed from a normal-smelling nipple suckled only 20%
of the time. While most learning depends on repeated exposure to paired stimuli, the
pups learned this new set of associations (lemon scent = milk) with only one exposure to
the two combined stimuli. And this period of almost instantaneous learning is unique
to newborn pups: older rat pups didn't learn as quickly and had to have the two stimuli
(milk and lemon scent) presented closer in time than the newborn pups did. The scientists
hypothesize that this powerful learning mechanism exists because milk is so critical
to the pup's survival, and because the mother's odor, under natural
circumstances, is such a significant signal for milk, that rat pups are 'hard wired'
to learn this association as soon as possible. (Human infants also very quickly learn
their primary caretaker's scent as well as the smell of the milk. The findings of this
type of study have practical applications in helping infants transition to other
caretakers or formulas when substitutes must be found.)
2. Classical conditioning: the simplest type of learning in
which the subject comes to make associations between stimuli or antecedent conditions.
This is a passive form of learning in which the subject develops mental expectations from
past events occurring at the same time or in the same sequence (as in the rat pups making
the connection between 'milk' and 'lemon scent').
History of classical conditioning:
- Pavlov: Russian scientist, interested in digestion, notices that the
dogs he is working with salivate not only when presented with food, but when they see
signals that food is coming. He experiments with pairing the food with other stimuli that
do not normally have anything to do with food (ringing a bell when food is given) and discovers
that after a number of pairings of food and bell, not only will the dogs salivate to the bell
even when no food is presented, but they will also salivate to other stimuli that are then
paired with the bell alone. He called this process conditioning; it is now
referred to as classical conditioning.
- Watson: applied the concept of conditioning to humans. Example: showed
'Little Albert', about a year old, a white rat, which interested him initially. Then
the rat was presented to 'Albert' at the same time as a very loud noise was made, and he
reacted in fright to the noise. After repeated pairings (rat and noise), he learned
the association of rat to noise and began to cry at the sight of the rat even when there
was no noise.
- Advertisers rely on this type of conditioning to this day: they count on our
having positive reactions to certain stimuli (the sight of people having a good time or a
beautiful outdoor setting) and then try to get us to make the positive association with
their product (a brand of soda or those 'Golden Arches'...).
3. Terms/concepts in classical conditioning:
- Unconditioned Stimulus (US): a stimulus that normally produces an
involuntary response (food, a positive stimulus, activates salivation; pain, an
aversive stimulus, activates pulling away or fear. In the case of 'little Albert',
loud noises are aversive to babies.)
- Unconditioned Response (UR): the subject's natural reaction to the
unconditioned stimulus (salivation, fear, etc).
- Conditioned Stimulus (CS): another stimulus in the environment that
initially does not elicit a response (a 'neutral stimulus') which the subject learns to
associate with the unconditioned stimulus (the bell when the food is given,
the rat when the loud noise occurs).
- Conditioned Response (CR): this is the same response
(salivation) that the unconditioned stimulus (meat) originally elicited, but now it is a
reaction to the conditioned stimulus (bell) as well.
- Acquisition phase: the period when the neutral
stimulus is becoming an acquired or conditioned stimulus.
- Acquisition depends on a number of factors:
  - Frequency: the pairing of the unconditioned stimulus (US) and the neutral
    stimulus has to be repeated until the learner's behavioral response to both is
    the same. Then the former neutral stimulus has become a conditioned stimulus
    (CS). (If 'Little Albert's' white rat had not been consistently paired with the
    startling noise, the noise might have been associated instead with some other
    aspect of the environment that was consistently present.)
  - Contingency: The two kinds of stimuli have to be introduced at nearly the same
    time, with the CS preceding (or accompanying) the US. If the noise had been made
    well before Albert was shown the rat, he might not have become afraid of the rat.
  - Consistency: When first learning a new association between stimuli, they must be
    consistently paired. If the pairing only happens sometimes, the learner may not
    make the connection between the two.
  - Timing ('latency'): the two stimuli have to occur close enough together in time
    that the learner makes the connection. (If too much time passes between the
    presentation of the neutral stimulus and the US, the association may not get
    made.)
There are a few exceptions to these
'rules' of acquisition. We discussed taste aversion: if you get sick
from spoiled or poisonous food, the nausea may not occur until hours
later, but you still are classically conditioned to avoid that
food's taste or smell. This is obviously adaptive, as it's rare that
something poisonous causes instant illness. This is also a case
where a single exposure causes learning; you don't want to have to
eat a poisonous mushroom twice to learn that it causes you to have a
very unpleasant response.
Situations that are highly emotional are also not
as dependent on frequency, contingency, consistency, and
timing. Often a phobia (see below) is the result of a
single exposure to a very frightening situation.
- Generalization: the subject responds to other stimuli that are similar
in some way to the conditioned stimulus (Albert's fear of other white furry animals, not
just the white rat). This is actually very useful in terms of survival: if you have been
bitten by one snake, you are then wary of all snakes that look the same; you don't have
to personally test each one to see if it bites! The problem with generalization is that
we can carry it too far, resulting in phobias.
- Phobias: unreasonable or unrealistic fears that are
severe enough to interfere with normal behavior. These are a result of classical
conditioning and generalization, and are also called conditioned emotional
responses. (Example: a small child accidentally locks himself in an abandoned
refrigerator or small closet. He then becomes panic-stricken when he is in any small
enclosed space, such as an elevator, which he previously may have enjoyed.)
Phobias can be treated by utilizing relaxation techniques (you cannot be relaxed and
panic-stricken at the same time; the nervous system doesn't work that way. Remember the
sympathetic and parasympathetic nervous systems?). Then, when the person is relaxed, he/she
is introduced in a non-threatening way, a little at a time, to the situation that makes
him/her phobic, until the panic response is extinguished.
- Stimulus Discrimination: The way we cope with the problem of
over-generalization is by learning stimulus discrimination, the ability to tell the
difference between varied stimuli - and to respond differently to them. (You learn to panic
when it is a real rattlesnake at your feet, but not at the plastic toy one your little brother
likes to toss at you now and then! It may look real, but you can discriminate between the
real thing and the 'toy'.)
- Higher order conditioning: when a well-learned conditioned
stimulus can be used to reinforce further learning. Pavlov could have
conditioned the dogs further by pairing the ringing
of the bell with another stimulus, such as flickering the lights. My
cat originally was conditioned to the sound of the actual opening of the
can of cat food, but now she comes running when I take the can opener
out of the drawer. This is
higher order conditioning, which is what advertisers rely on as you watch TV.
(They don't actually give you the beer when they show you the mountains, people
having a good time, etc. It is 'higher order' conditioning, using your associations with
a positive experience to imply that their brand of beer is best.)
- Extinction: If a conditioned stimulus is presented long enough without
being paired with the unconditioned stimulus, the conditioned response weakens and
eventually ceases, becomes 'extinct'. (For instance, my dog loves carrots and has
learned that the sound of the vegetable peeler means she will get a taste of carrot; she
comes running. If I were to stop giving her a piece of carrot each time I peel them, she
would eventually stop coming at the sound of the peeler. She would 'unlearn' the
association of that sound with a food she likes; it would have become extinct. And I
could cause extinction even faster if I only used the peeler on onions, which she hates.)
- Spontaneous recovery: Conditioning which has undergone extinction
may reappear at a later time, and even a number of times before it disappears completely. This reappearance of a
behavior after apparent extinction is called spontaneous recovery. In a way, this prepares us
to deal with life's changing conditions: just because something didn't work today doesn't
necessarily mean it won't work tomorrow. (Remember the soda machine? Well, maybe today it
will work, because perhaps the repairman came since you last tried it and lost all your
change!)
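A side note, not from the lecture itself: for anyone who likes to see the mechanics spelled out, here is a minimal toy sketch in Python of acquisition and extinction. The delta-rule update and the learning rate are illustrative assumptions, not a specific published model; associative strength simply climbs during CS-US pairings and decays during CS-alone trials.

```python
# Toy model of classical conditioning (illustrative assumptions only):
# associative strength moves toward 1 when the US accompanies the CS
# (acquisition) and back toward 0 when the CS appears alone (extinction).

def run_trials(strength, n_trials, us_present, rate=0.3):
    """Apply a simple delta-rule update once per trial."""
    history = []
    for _ in range(n_trials):
        target = 1.0 if us_present else 0.0
        strength += rate * (target - strength)
        history.append(round(strength, 3))
    return strength, history

strength = 0.0
strength, acquisition = run_trials(strength, 10, us_present=True)  # bell + food
strength, extinction = run_trials(strength, 10, us_present=False)  # bell alone
print("acquisition:", acquisition)  # rises toward 1.0 over paired trials
print("extinction: ", extinction)   # falls back toward 0.0 without the US
```

Notice that frequency and contingency both show up even in this toy version: strength grows only across repeated trials, and only when the US actually accompanies the CS.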
- Vicarious conditioning: often referred to as 'secondhand'
or 'social' learning. Because we are born with so few natural instincts, we have to
learn what to fear, what to like, etc., and if we couldn't learn to avoid deadly
situations from others' experiences, many of us would never survive to
grow up! Many of these associations are learned by observing how others
react; we don't have to be bitten by a poisonous snake to learn to fear it (the learning
might kill us!). If others around us react to the sight of a snake with fear, we can learn
to fear (and avoid) snakes and thus avoid being hurt in the learning.
The same goes for a small child learning to like new foods: children take their
cues from how we react. The outcome of this type of classical conditioning is a Conditioned
Emotional Response (CER).
4. Operant Conditioning: Learning from the results of what we do.
Behavior that occurs in order to make something happen is called operant or instrumental
behavior. The early behaviorists (remember Skinner?) believed that you could teach a
person to do or become anything you wanted if you had total control over the conditions of
his/her life. By rewarding some behaviors and punishing others, you could completely
control the individual's behaviors. Thorndike called this the "Law of Effect."
The easy way to remember these behavioral principles of learning is to think of the ABCs:
- A = Antecedent: the conditions (stimuli) presented to the
learner that indicate how likely a consequence is to occur
- B = Behavior: how the learner responds under these
conditions, what to do or not to do to have a good outcome.
- C = Consequences: the consequences or result of the behavior that
either encourage the subject to repeat the behavior (reinforcement)
or that discourage a repeat of the behavior (punishment).
How operant conditioning works: the results of your behavior have
consequences. (Where have you heard this before?) "If you study hard, you will get
good grades." Well, no, not if you are not a student... There may
be other reasons to study, but it won't get you good grades.
But under specific circumstances (A, the antecedents),
certain Behaviors result in particular
Consequences.
If you are a student, then whether you study or not has consequences, whether the positive
reinforcement of praise or good grades (and, when I was a kid, a monetary reward for a good
report card), or punishment ("Since your report card is so bad, you
can't go out on week nights anymore.") Reinforcement increases
the chance the behavior will occur; punishment decreases the
likelihood that you will repeat the behavior.
There are actually four kinds of consequences: positive
reinforcement, negative reinforcement, positive punishment, and
negative punishment. ('Positive' and 'negative' do not refer to whether
something is good or bad, but to whether something is given to or done
to the person or taken away from the person.)
When I come home from work (A), my dog barks (B) to be let out (C): I am
reinforcing her behavior of barking by giving her what she wants.
(Positive reinforcement for her.)
When my dog barks as I first come home from work (A), I let her out (B)
in order to stop her noise (C). (Negative reinforcement for me.)
When my daughter used to come home from school (A), the dog would bark
(B), and my daughter would yell at her to shut up (C). (Positive
punishment.)
When I came home from work and found that the dog had peed on the floor (A), and
my daughter had just yelled at the dog instead of putting her out (B), I would make
my daughter clean it up (C) (positive punishment) and also not let her
borrow my car (C). (Negative punishment, loss of a privilege.)
During the acquisition phase of operant conditioning,
the learner rarely can accomplish a desired
behavior by learning all the steps at once: the behavior must be 'shaped'. Shaping
the behavior and chaining all the needed steps in the right order are accomplished
by reinforcing successively closer attempts at achieving the goal. The
less-than-perfect steps are called 'approximations'. (For instance, if you're teaching a
child to tie his shoe, you can't just show him once and expect him to do it right. You
first reinforce (praise) just his attempts to cross the laces. Once the child has that down,
you prompt him to pass one shoestring under the other, then to make a loop, etc. And
maybe at first, the shoes aren't tied tightly enough to stay tied for long, but this is
still another successive approximation in the whole process.)
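Here is a rough sketch of the shaping loop itself, in Python (my own illustration; the 'learner', the numbers, and the update rule are all invented assumptions, not a model of a real organism): reinforce attempts that meet the current criterion, then raise the criterion toward the full target behavior.

```python
import random

# Rough sketch of shaping (all parameters invented for illustration):
# attempts that meet the current criterion are reinforced, which improves
# the learner's skill, and the trainer then raises the criterion.

random.seed(1)
skill = 0.1      # how good the learner's attempts tend to be (0..1)
criterion = 0.2  # how close an attempt must come to earn reinforcement

for trial in range(1, 31):
    attempt = max(0.0, min(1.0, random.gauss(skill, 0.1)))
    if attempt >= criterion:
        skill = min(1.0, skill + 0.05)          # reinforced attempts improve skill
        criterion = min(1.0, criterion + 0.05)  # the trainer raises the bar
        print(f"trial {trial:2d}: attempt {attempt:.2f} reinforced; "
              f"next criterion {criterion:.2f}")
```

Each printed line is one reinforced 'successive approximation'; by the time the criterion reaches 1.0, only the complete behavior earns the reward.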
Again, there are four aspects of the acquisition phase that affect how quickly and
thoroughly conditioning occurs:
- Frequency: the sequence (the ABC's of
learning) has to be repeated until the learner understands that the
antecedent conditions (stimuli),
behavior and consequences are all connected.
- Contingency: Only the desired behavior is reinforced (or the
undesired behavior punished), and the reinforcement or punishment has to
follow, not precede, the behavior. In other words, the only thing which causes that
result is that particular behavior.
- Consistency: When first learning a new behavior, each and every
successful attempt should be rewarded. If it only happens sometimes, the learner will not
make the connection between behavior and result as quickly.
- Timing ('latency'): the resulting reinforcement or punishment has to
happen soon enough after the behavior that the learner makes the connection. (If you spank
the puppy an hour after he peed on the floor of the kitchen, he won't know why he is being
punished. Best: Catch him in the act; he starts peeing, he gets yelled
at or a swat on the rump.)
And, as in classical conditioning, there are exceptions to
these rules of acquisition, in that very powerful reinforcers or punishments
can cause conditioning to happen rapidly or even with one incident.
(You will never again stick your finger in a live electrical outlet!)
Using Reinforcers and Punishments
- Each training situation requires that the trainer figure out the particular reinforcers
that work best for each subject. This is highly individual. (If you don't like chocolate,
then chocolate candy bars are not reinforcing to you.)
- Primary reinforcers are those that meet survival ('primary') needs and
drives (food, drink, sex, sleep, etc.). The problem with primary reinforcers is that the drive
or need can be satiated (fully met), so one more chocolate candy bar won't motivate the
subject: he's full!
- Secondary reinforcers are those which are learned. These include praise, money,
social approval, etc. (Under some circumstances, primary reinforcers can be
secondary ones as well, as when the primal need has been met and the subject has learned to
save the chocolate candy he just got as a reinforcer for a time when he is hungry.
In this sense, the candy has become a 'token', something which can satisfy a need in the
future, just as money can be saved to satisfy needs we have in the future.)
A useful grid of the kinds of consequences is as follows:
- Consequences that increase the frequency or likelihood of a behavior recurring:
  - Positive reinforcers: something needed or wanted by the learner, a reward for the
    behavior.
  - Negative reinforcers: the end of an aversive stimulus or situation ("It feels so
    good when it stops!").
- Consequences that decrease the frequency or likelihood of a behavior recurring:
  - Punishment: something painful, unpleasant, or otherwise aversive that occurs as
    the result of the behavior.
  - Response cost: a kind of punishment which involves the loss of something needed
    or wanted as a result of the behavior (stay out late and lose the use of your
    parents' car).
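Because 'positive/negative' and 'reinforcement/punishment' are two independent yes/no questions (was a stimulus added or taken away? did the behavior become more or less likely?), the whole grid fits in a few lines of code. Here is a mnemonic sketch in Python (my own addition, using the lecture's labels):

```python
# Mnemonic for the 2x2 grid of consequences:
# 'positive'/'negative' = stimulus added vs. taken away;
# reinforcement/punishment = behavior made more vs. less likely.

def classify(stimulus_added: bool, behavior_increases: bool) -> str:
    sign = "positive" if stimulus_added else "negative"
    kind = "reinforcement" if behavior_increases else "punishment"
    if sign == "negative" and kind == "punishment":
        return "negative punishment (response cost)"  # loss of a privilege
    return f"{sign} {kind}"

print(classify(True, True))    # dog barks, gets let out -> positive reinforcement
print(classify(False, True))   # barking stops when I let her out -> negative reinforcement
print(classify(True, False))   # yelling at the dog -> positive punishment
print(classify(False, False))  # losing the car -> negative punishment (response cost)
```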
Once the acquisition phase is over and the
behavior is thoroughly learned, it is usually not necessary to reinforce every
correct behavior. How often and when reinforcement occurs, however, does affect the
frequency of the behavior, as well as how long the behavior is retained after
reinforcement stops occurring (extinction):
Reinforcement schedules: for each type, consider when and how often reinforcement
occurs, how powerful it is in motivating the frequency of the behavior, and how durable
the behavior is (how long before extinction occurs when reinforcement ceases).
- Continuous reinforcement:
  - When: reinforcement occurs every time the behavior is performed. (Ex.: my
    neighbor started giving her child a penny every time she pulled a weed.)
  - Power: works until satiation or exhaustion sets in; the learner can always take
    a break and pick up where he/she left off.
  - Durability: not durable. The learner has the expectation that every behavior
    will be rewarded; once it stops, he/she stops soon after.
- Fixed ratio:
  - When: reinforcement takes place after a complete (set) number of times the
    behavior is performed. (Ex.: my neighbor was running out of pennies, so she paid
    her child a nickel for every five weeds pulled from the lawn.)
  - Power: very powerful. (Like piece rates: if you work faster, you earn more.)
  - Durability: not very durable, but a bit more so than continuous reinforcement,
    as it takes the learner longer to figure out that reinforcement has stopped.
    Again, the expectation is that the reinforcement will occur regularly, every set
    number of times the behavior is enacted.
- Variable ratio:
  - When: reinforcement is given on a set ratio (number of times the behavior is
    performed), but it's never clear exactly which of the behaviors will get the
    reinforcement. (Ex.: now my neighbor goes out every so often, counts up the
    weeds, divides by five, and gives her child a nickel for each set of five.)
  - Power: the person will still earn more by working faster (the basic return for
    effort stays the same), but the uncertainty usually results, especially in young
    children or animals, in a slower rate of effort.
  - Durability: because of the uncertainty factor, this schedule is more durable:
    the person who is not sure when he/she is going to be reinforced is also not
    sure when reinforcement has stopped, and will keep at it longer. (Another
    example: slot machines are set to return a ratio of their earnings to the
    players, but you never know which pull of the lever will pay off.)
- Fixed intervals:
  - When: reinforcement is given for the first correct response after a set time
    interval, regardless of how many behaviors the learner performs in that time
    period. (Ex.: now that my neighbor's daughter is older and a pretty good worker,
    she gets paid $5 for each hour spent weeding the yard and garden. Counting all
    those weeds was a drag!)
  - Power: the amount of work done is less than in fixed or variable ratio
    schedules, and it varies over time; there is no payoff, in terms of
    reinforcement, for getting much done at the start of the time period. (Ex.: if
    an assignment is due every two weeks, many people don't do as much work on it
    the first week; they wait until it's almost due!)
  - Durability: extinguishes more quickly than variable reinforcement.
    Predictability again plays a role in how quickly a behavior is extinguished with
    fixed intervals. (You are more likely to quit and look for another job when your
    expected paycheck doesn't come a couple of times!)
- Variable intervals:
  - When: reinforcement is given for the first correct response after a variable
    period of time. (Ex.: if my neighbor sees that her daughter is slacking off, she
    can check up on her periodically, and if the daughter is not weeding at the
    moment her mother looks, she will not get paid for that hour at all!)
  - Power: because you never know when you are going to be reinforced, you tend to
    work at a steady rate: you want to be 'caught' doing the right thing, but you
    don't get extra credit for the work you did that wasn't noticed!
  - Durability: this schedule of reinforcement is the most durable of all; if you
    can't predict when the payoff will come, neither can you predict when it won't.
    (As ads for the state lottery say, "You can't win if you don't play!", and the
    possibility of a payoff, remote as it is, keeps people shelling out their money
    for that ticket week after week.)
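To see the schedules side by side, here is a small Python simulation (my own illustration; the ratio of 5 and the 10-tick interval are arbitrary assumptions) showing which responses would earn reinforcement under each delivery rule.

```python
import random

# Toy comparison of reinforcement schedules. A 'response' happens every
# 3 time ticks; the ratio (5) and interval (10 ticks) are arbitrary.

random.seed(0)

def fixed_ratio(n, ratio=5):
    # every ratio-th response is reinforced
    return [(i + 1) % ratio == 0 for i in range(n)]

def variable_ratio(n, mean_ratio=5):
    # on average one in mean_ratio responses pays off, but which one is unpredictable
    return [random.random() < 1 / mean_ratio for _ in range(n)]

def fixed_interval(times, interval=10):
    # the first response after each interval boundary is reinforced
    rewarded, next_due = [], interval
    for t in times:
        hit = t >= next_due
        if hit:
            next_due = t + interval
        rewarded.append(hit)
    return rewarded

def variable_interval(times, mean_interval=10):
    # like fixed_interval, but the required wait varies unpredictably
    rewarded, next_due = [], random.expovariate(1 / mean_interval)
    for t in times:
        hit = t >= next_due
        if hit:
            next_due = t + random.expovariate(1 / mean_interval)
        rewarded.append(hit)
    return rewarded

times = list(range(0, 60, 3))  # 20 responses, one every 3 ticks
print("fixed ratio:      ", sum(fixed_ratio(len(times))), "reinforcements")
print("variable ratio:   ", sum(variable_ratio(len(times))), "reinforcements")
print("fixed interval:   ", sum(fixed_interval(times)), "reinforcements")
print("variable interval:", sum(variable_interval(times)), "reinforcements")
```

The simulation only captures the delivery rule; durability (what happens when the reinforcements stop) follows from how predictable each rule makes the payoff.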
Many of the concepts explained under Classical Conditioning
are also true for Operant Conditioning:
- Stimulus generalization: The aspects of the environment
that indicate whether the conditions are right for a behavior to be
effective in producing a consequence are referred to as the
environmental stimuli. Once a young child learns that putting a coin
in a slot produces candy, he may try to put coins into parking meters,
etc.
- Stimulus discrimination: However, the child will soon learn to
discriminate between a parking meter and a candy machine, even though
they may look a lot alike.
- Stimulus control: the learner has to learn to recognize when the correct conditions
exist for his/her behavior to produce the desired result (so, in a sense, the stimuli in
the environment control the behavioral responses). (Example: if all the lights go out
during a violent storm, you will not bother to flip the light switch
on; you know, based on the
stimuli available in the environment, that the switch won't work.)
- Extinction: In classical conditioning, this is when the conditioned stimulus is no
longer paired with the unconditioned stimulus, so, after a while, the conditioned response
no longer occurs. In operant conditioning, this is when a behavior is no longer reinforced
(or punished) and the conditioning eventually becomes 'extinct'.
- Spontaneous recovery: the behavior can be extinguished in one set of trials, but
may recur spontaneously at a later time without further reinforcement. The subject is
'testing' to see if conditions have changed and the behavior is again
effective in getting the desired result.
- Vicarious or observational learning: Because we are, again, a social species that
has little innate survival knowledge, we must learn about many things that are
dangerous and potentially deadly from our elders, benefiting from their wisdom and
knowledge. While we learn emotional reactions to various stimuli through vicarious
conditioning (observing our parents, for instance, in how they react to situations that
are new to us and picking up their 'vibes'), we also learn behaviors - what to do and
what not to do, as well as HOW to do a number of things - by observation. This learning
is dependent upon our paying attention to the ABCs, remembering them, and
then imitating them. Until the learner demonstrates that he/she has
learned a new behavior, the learning is considered latent (hidden), that is, not yet put
into practice.
5. OTHER KINDS OF LEARNING: Not all
learning is 'conditioning'. Internal 'thinking' processes bring about some kinds of
learning. Understanding, anticipating, and figuring things out are all cognitive
processes, and the reinforcement for these cognitive activities can be just the
knowledge itself. The knowledge may be immediately useful, but curiosity is a
powerful drive in and of itself for many mammals, and especially for humans. It's as if
many organisms, whose other needs and drives are
satisfied, have the urge to explore the world just in case the information should be needed
in the future.
For example, consider the concept of cognitive maps. These are
internal, detailed layouts of the world built from experience. If your usual route to work is
blocked, you know, from driving around town doing other things, basically how the town is
laid out, and so, even though you may not have taken that particular route before, you can
follow it to your workplace. Even rats and bees have internal or cognitive maps. (If you
capture a bee, put it in a dark box, and carry it someplace new, it can still find its
hive and the flowers it has been using as a food source. A rat, placed in a maze and left
to wander around, can later learn to find its way to a food source faster than a rat which has
never been in the maze before. This is due to the cognitive map of the maze it
has constructed in its wanderings.)
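As a loose computational analogy (my own sketch, not something from the lecture), a cognitive map works like a stored graph of places and connections: once the layout is known, a brand-new route can be planned without ever having traveled it end to end. The maze below is invented for illustration.

```python
from collections import deque

# A cognitive map as a stored graph of places and connections.
# The maze layout here is invented for illustration.

maze = {
    "start": ["hall"],
    "hall": ["start", "left turn", "right turn"],
    "left turn": ["hall", "dead end"],
    "right turn": ["hall", "food"],
    "dead end": ["left turn"],
    "food": ["right turn"],
}

def plan_route(graph, origin, goal):
    """Breadth-first search: plan a path using only the stored map."""
    frontier = deque([[origin]])
    visited = {origin}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

# Like the rat that has already wandered the maze, we can plan a route
# we may never have run in one go before.
print(plan_route(maze, "start", "food"))
# -> ['start', 'hall', 'right turn', 'food']
```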
Other kinds of cognitive learning:
- Vicarious and latent learning demonstrate one kind of cognitive learning. Because
they are based on the learner acquiring conditioned responses, they are discussed in
the sections on conditioning, but until they are exhibited in actual behavior, they fall
outside the scope of behaviorism, which is based on measurable behaviors.
- Rote learning (repetition until something is memorized) is an efficient way to acquire
'facts' (the 'times tables', for instance). The 'facts' don't have to have meaning; even a
parrot can learn to say the names of numbers in the right sequence, but does it understand
the concept of numbers? Does it really know how to count?
- Discovery learning is when exploration of a problem or issue leads to understanding of
the underlying principles. A child who has played with blocks that come in lengths that are
multiples of each other intuitively learns that 2 half-length blocks make one long block, but it
takes 4 quarter-length blocks to make a full-size block. This type of block play makes an
understanding of the underlying principles of addition, multiplication, and fractions possible
in a way that mere rote memorization cannot. This type of learning is much more flexible, as
it can be applied to a range of circumstances.