Melberg, Hans O. (1997), Bees, marbles and generalizations based on one example: A reflection on the concept of statistical significance

[Note for bibliographic reference: Melberg, Hans O. (1997) Bees, marbles and
generalizations based on one example: A reflection on the concept of statistical
significance, http://www.oocities.org/hmelberg/papers/970117.htm]

Bees, marbles and generalizations based on one example

A reflection on the concept of statistical significance
by Hans O. Melberg

Introduction

How many examples do we need before we can make a reliable generalization? The standard
reflex of a statistician is to tell you that you need, at the very least, more than one.
Exactly how many you need then depends on the size of the whole population and on
assumptions about the distribution of the variable in question. However, I recently
encountered an illustration which made me think that this reflex is wrong. In fact, the
number of examples needed is much more context-sensitive, and I believe a single example
is sometimes enough to support a reliable generalization.

Two examples: Bees and marbles

The example which convinced me of this was James L. Gould's famous experiment with bees.
The background to this experiment was Karl von Frisch's discovery of the complex bee-dance.
This dance, he argued, was used to communicate where the bees could find food. Against
this, Adrian Wenner argued that it was a mistake to believe that the dance was a form of
communication. He did not deny that the dance showed both the direction and the distance to
the food source, but he denied that the other bees were able to understand this. The last
participant in this debate was James L. Gould, who designed an experiment
which proved that the dance did in fact transmit information to the other bees.
To prove his thesis Gould put a light bulb inside the hive. This is important because
the bee-dance signals the position of the food relative to the sun. When there is a light
bulb in the hive, the bees act as if the light bulb were the sun, and they
"calculate" in which direction to fly based on the direction of the dance
relative to the position of the light bulb. However, Gould also painted the eyes of the
dancing bee (but not of the other bees watching it) with black shellac. This
meant that the dancing bee could not use the light bulb as a proxy for the sun. When bees
are unable to use light to indicate the direction of the food (as they normally are
inside the hive), they use gravity instead. This means that the position of the
food source relative to the sun is expressed by the angle between the bee's
straight runs and an imagined vertical line. This was Gould's experiment. What were his
conclusions?
Gould found that the blinded bee used the vertical line to show the direction, while
the watching bees flew off in the direction predicted by the angle of the dance relative
to the light bulb. This proved that the dance really conveys information. The only way to
explain the flight path of the bees after the dance was that they were fooled by false
information in the dance (false because the dancer, being blindfolded, was giving the
wrong direction). This implies that they did in fact use the dance to determine the
direction of their flight.
So, you might ask, where is the statistical lesson in all this biology? The sentence
which caught my attention when I read about this went as follows: "He [J. Gould] did
the experiment not with just one blindfolded bee, of course, but with a proper statistical
sample of bees and variously manipulated angles." (p. 101 in Richard Dawkins' book River
Out of Eden). My immediate thought was that I was quite willing to drop Wenner's theory
of no communication after only a single experiment with one bee. A few more experiments
might make me even more confident, but this would only be a marginal effect compared to the
revision of my belief and confidence after the first experiment. In short, I am convinced
of the hypothesis after a single example.
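The claim that later experiments add only marginally to the first one can be sketched with Bayes' rule. The numbers below are assumptions chosen purely for illustration and are not taken from Gould's data: an even prior between the two theories, a 90% chance that the watching bees follow the misleading dance if it communicates, and a 5% chance that they fly that way by coincidence if it does not. The function name `update` is likewise hypothetical.

```python
from fractions import Fraction


def update(prior, p_obs_if_comm, p_obs_if_not):
    """One step of Bayes' rule for the hypothesis 'the dance communicates'."""
    numerator = prior * p_obs_if_comm
    return numerator / (numerator + (1 - prior) * p_obs_if_not)


# Assumed values, for illustration only (not from the experiment).
prior = Fraction(1, 2)            # undecided between von Frisch and Wenner
p_obs_if_comm = Fraction(9, 10)   # watchers follow the misleading dance
p_obs_if_not = Fraction(1, 20)    # watchers fly that way by coincidence

# Belief after 0, 1, 2 and 3 experiments that all give the same result.
beliefs = [prior]
for _ in range(3):
    beliefs.append(update(beliefs[-1], p_obs_if_comm, p_obs_if_not))

for n, belief in enumerate(beliefs):
    print(f"after {n} experiments: {float(belief):.4f}")
```

Under these assumed numbers the first experiment moves the belief from 1/2 to 18/19, while each further experiment adds much less. How large the first jump is depends entirely on the assumed likelihoods.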
The conventional statistician would cry out. "You need many more instances in order to
prove that this is statistically significant", he would argue. As Dawkins writes in
the book mentioned above, "... one example of any phenomenon is not enough to base
generalizations on" (p. 136). It is easy to understand why this is the standard
reflex of statisticians and most people. Consider a black box with 100 marbles. Assume
that the marbles are either red or blue, but that we do not know exactly how many there
are of each color. If we pick a single marble at random from the box
and it is red, we cannot infer from this that the belief that "all the marbles are
red" is true. Why are we convinced by one example in the first case but not in the
second?
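The marble case can be worked through numerically. The sketch below assumes, as a modelling choice that the text does not state, that before the draw every possible number of red marbles from 0 to 100 is equally likely. It then computes how probable "all the marbles are red" is after a single red draw.

```python
from fractions import Fraction

N = 100  # marbles in the box

# Assumption: every count of red marbles from 0 to N is equally likely a priori.
prior = {r: Fraction(1, N + 1) for r in range(N + 1)}

# Chance of drawing a red marble at random when r of the N marbles are red.
likelihood = {r: Fraction(r, N) for r in range(N + 1)}

# Bayes' rule: posterior is proportional to prior times likelihood.
evidence = sum(prior[r] * likelihood[r] for r in prior)
posterior = {r: prior[r] * likelihood[r] / evidence for r in prior}

p_all_red = posterior[N]
print(f"P(all red | one red draw) = {p_all_red} (about {float(p_all_red):.3f})")
```

With this prior the probability that every marble is red is 2/101, roughly 2%, so a single red marble gives very little support to the generalization. A different prior would give a different number.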
The example with the marbles conforms to the standard assumptions of introductory
statistics. The first example is different because we believe that it is highly unlikely
that we should happen to pick a bee or a hive which somehow reacted differently from all
the other bees or hives. Whereas there is no (or little) reason to believe the next marble
to be red simply because the first was red, there is strong reason to expect that the
other bees will conform to the behavior observed in the single experiment. The reason is
strong because our background knowledge tells us that, since all bees are built in
reasonably similar ways (having evolved through the same process), the reflexes of one
bee can safely be taken as an example of how all bees will react.
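One way to make the role of this background knowledge explicit is a toy model with a single parameter: the prior probability that all units in the population behave alike. The function `p_next_matches` and the values passed to it are hypothetical and chosen only for illustration. The sketch also assumes that, when units are not alike, each one varies independently with an even chance of matching the first.

```python
def p_next_matches(homogeneity, base_rate=0.5):
    """Probability that the next unit behaves like the first one observed.

    homogeneity: prior probability that all units in the population are alike.
    base_rate: chance of a match when units vary independently.
    """
    return homogeneity * 1.0 + (1.0 - homogeneity) * base_rate


# Hypothetical values, for illustration only.
bees = p_next_matches(homogeneity=0.99)    # shared, evolved reflexes
marbles = p_next_matches(homogeneity=0.0)  # colors assumed independent

print(f"bees: {bees:.3f}, marbles: {marbles:.3f}")
```

The single observation is the same in both cases. Only the assumed homogeneity differs, and that alone decides whether one example carries over to the rest of the population.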

Conclusion

What are the consequences if we accept that the statistical significance of one example
depends on the context? I believe the above examples show that we should use what is
called a Bayesian approach when we judge the reliability of various beliefs. In short,
the updating of beliefs takes place in a context of other beliefs about how the world is
connected, and these background beliefs are important when we judge what weight to give
a single example.

Note: If you want to read more about the misunderstood concept of statistical
significance, there is a good article by Deirdre McCloskey and Stephen T. Ziliak
("The Standard Error of Regressions") in Journal of Economic Literature
34 (1996), pp. 97-114.