[Note for bibliographic reference: Melberg, Hans O. (1996), Critical Statistics -
A Review of "Statistical Concepts and Methods" by Bhattachryya and Johnson.
http://www.oocities.org/hmelberg/papers/970507.htm]
Critical Statistics
A Review of "Statistical Concepts and Methods" by
Bhattachryya and Johnson
Gouri K. Bhattachryya and Richard A. Johnson
Statistical Concepts and Methods
John Wiley & Sons
New York, 1977
ISBN: 0-471-07204-4
Introduction
In the preface to Statistical Concepts and Methods Bhattachryya and Johnson (BJ)
write that "Our purpose in this book is to build both an understanding of [the basic
principles and techniques of statistical analysis] and an awareness of the limitations of
the methods" [p. v]. Judged against these criteria the book is a partial success.
While it is reasonably good at explaining statistical concepts, it could be better in
conveying the limits of statistics.
Fostering understanding
My first criticism concerns the length of the book - 639 pages. Obviously the length
reflects the authors' attempt to give good explanations, including many examples and
explaining every concept in detail. In this way one may hope that even the weaker students
can follow the book. I wonder, however, whether this strategy is counter-productive. The
very length of the book deters students; the level of detail makes students lose the
general intuition; and the many examples produce a book which comes close to being
boring. In this way the length may not produce the result that everybody is able to
follow the gradual build-up of statistical understanding. The smart are bored; the
not-so-smart are deterred and lose sight of the general picture.
In all fairness it must immediately be added that most of the examples are superb, and
the same goes for the exercises that follow each chapter. Most of these are real-life
examples with obvious significance and interpretations. For example, in one exercise the
reader is asked to compute the number of different pizzas one might make if one is
restricted to using only three (out of twenty possible) ingredients at once on one pizza.
Since the order in which the ingredients go on the pizza does not matter, the ordered
count 20 * 19 * 18 = 6840 must be divided by 3! = 6, giving 1140 different pizzas. This
gives the student an excellent intuitive idea of combinations. Moreover, in order to
explain the idea of permutations, they present an experiment in which a psychologist is
supposed to apply three stimuli - a bell, a light, or an electric shock - to a subject
(p. 101, no. 21). Once again the example is good - not only because it invites jokes
about "mad" psychologists - but because it is easy to convey the feeling that the order
in which the stimuli are applied is important for the outcome. After receiving an
electric shock one might not react to a bell in the same way as one would if the bell
came first (contrast this with the pizzas, where the order of the ingredients is usually
not that important). Altogether, I consider these real-life examples to be a major
strength of the book.
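The two counts can be checked with Python's math module (a small sketch of my own, not
from the book): choosing toppings, where order does not matter, is a combination, while
sequencing the stimuli, where order matters, is a permutation.

```python
import math

# Pizzas: choose 3 of 20 ingredients; the order on the pizza does not
# matter, so this is a combination: C(20, 3) = 20! / (3! * 17!)
pizzas = math.comb(20, 3)      # 1140

# Stimuli: apply 3 distinct stimuli (bell, light, shock) in sequence;
# the order matters, so this is a permutation: 3! orderings
orderings = math.perm(3, 3)    # 6

print(pizzas, orderings)
```

Dividing the 6840 ordered triples of ingredients by the 6 orderings of each triple gives the same 1140 pizzas.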
Not only are the examples good; BJ often present the examples before they
present the general rules or formulas. This, in my experience, is the best way to teach
statistics: first present a concrete problem, then try to solve it, and finally see
whether the solution can be generalized to a rule (see, for example, p. 79). I also found
the way they isolate major formulas and definitions - in small boxes - to be a good way
to guide the reader to what is important, avoiding the problem some weaker students have
of trying to memorize almost every small mathematical expression and definition.
Despite all these qualities, there is room for improvement. One would be to include a
very short review at the end of each chapter - just listing the major concepts discussed,
and providing a diagram which showed the location of the concept in the overall frame of
statistics. For example, after the chapter on descriptive statistics one might produce the
following diagram:
Measure of central tendency
    mean
One might then go on to produce even more details, which I have left out since they are difficult to construct using HTML. For example, the mean and the variance may each be subdivided according to the type of data (simple, grouped, interval, and coded), and finally the formulas could be written. Hence, if we start at variance, we would have something like the following:
Simple:   s² = Σ(xᵢ - x̄)² / (n - 1)
Grouped:  s² = Σ(mᵢ - x̄)² · fᵢ / n
On screen this might not look too illuminating, but my experience is that these simple maps of concepts often help students organize their knowledge in a way which makes them understand, remember, and review the content more easily. It is simply a method of providing small boxes in which isolated bits of knowledge can be located - a way of integrating the material and giving a particular location to each formula. Although BJ do try this, I felt that short reviews at the end of each chapter - often in diagrammatic form - could improve the book in terms of helping the students understand the material.
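A minimal Python sketch (my own illustration, not from the book) of the two variance
computations just listed:

```python
# Sample variance for simple (raw) data: s^2 = sum((x_i - xbar)^2) / (n - 1)
def variance_simple(xs):
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Variance for grouped data: class midpoints m_i with frequencies f_i.
# Divides by n, following the formula as transcribed here; note that
# many texts divide by n - 1 in the grouped case as well.
def variance_grouped(midpoints, freqs):
    n = sum(freqs)
    xbar = sum(m * f for m, f in zip(midpoints, freqs)) / n
    return sum(f * (m - xbar) ** 2 for m, f in zip(midpoints, freqs)) / n

print(variance_simple([2, 4, 6, 8]))
```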
A critical attitude
As mentioned, BJ want to convey a feeling for the limitations of statistical methods. As
they write, "Students should be encouraged to develop a critical attitude in applying
the methods and to be cautious when interpreting the results" (p. vi). The question
is then how well they succeed in this.
First of all they should be given credit for at least trying to be critical. For
example, they frequently point out the importance of the assumption of normality (see p.
220, p. 250, and p. 316) - and they warn that the confidence interval for the standard
deviation is more sensitive to the assumption of normality than the confidence interval
for the mean of a population (p. 270).
A second point, which BJ correctly note, is that one should not have 95% confidence
in 95% confidence intervals (see p. 247 and p. 372). This sounds confusing, but the simple
argument is that when we compute a confidence interval we make certain assumptions
(normality), and we leave out the unquantifiable sources of uncertainty. So, we are not
really 95% certain that a predicted value will be within our band. Rather, if our
assumptions are correct and if there are no important unquantifiable sources of
error, then we are 95% certain. These two if's indicate that we are really less than 95%
certain, but since this uncertainty cannot be quantified it is left out.
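The mechanics of such an interval can be sketched in a few lines of Python (my own
illustration; the data are made up) - the point being that the "95%" is conditional on
the assumptions baked into the calculation:

```python
import math
import statistics as st

# Hypothetical sample of 25 measurements, assumed drawn from a
# normal population by representative sampling.
data = [9.8, 10.2, 10.1, 9.7, 10.4, 9.9, 10.0, 10.3, 9.6, 10.1,
        10.2, 9.8, 10.0, 10.5, 9.9, 10.1, 9.7, 10.2, 10.0, 9.9,
        10.3, 9.8, 10.1, 10.0, 9.9]

n = len(data)
xbar = st.mean(data)
s = st.stdev(data)

# 95% interval for the mean via the normal approximation (z = 1.96).
# The nominal 95% holds only IF normality holds and IF there are no
# unquantified sources of error (e.g. a biased sample).
half_width = 1.96 * s / math.sqrt(n)
print(f"95% CI: ({xbar - half_width:.3f}, {xbar + half_width:.3f})")
```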
As a concrete example, one might use Literary Digest's famously incorrect
prediction that A. Landon was going to win over F. D. Roosevelt in the 1936 presidential
election. One source of the mistake was that the sample was unrepresentative, since it
consisted of people with a telephone or a car - i.e., mostly rich people in the 1930s. Even
if one used their sample to find a 95% confidence interval for the percentage of votes
going to Landon or Roosevelt, we should obviously not be 95% certain that the result from
the election would be in this interval. Although this example is discussed in BJ (p.
552-554), it deserves a much closer examination. Unless statistics wants to be a
second-decimal science - focusing on the quantifiable but relatively small sources of
error - it should try harder to develop methods to deal with the (apparently)
unquantifiable. At the least one might discuss these sources of error in a more
systematic way. (For an imperfect attempt to do this, see my article Against Correlation.)
The same kind of criticism applies to BJ's discussion of correlation, causation and
spurious correlation. While they include the ritual warning not to confuse correlation
with causation, they do not really discuss the problem in a systematic way. One could, for
example, try to develop a list of how spurious correlation might occur (adverse selection,
moral hazard). One might also discuss the more philosophical issue of causation (starting
with David Hume) and whether a belief can be justified by means other than correlation
(once again, I try to do some of this in the article mentioned above. I also discuss the
issue of justification in The Cultural Approach to Russian History.
Jon Elster also has an excellent discussion on the difference between causation and
correlation in the first chapter of his book Nuts and Bolts for the Social Sciences).
One might object that these discussions are too advanced for a class of introductory
statistics. I strongly disagree. My experience is that discussing these philosophical
issues makes the students more motivated to work hard with the boring topics. It also
makes the students appreciate the role of statistics as "the logical basis for
comparing evidence" (p. 286; see also pp. 4-5). Consider, for example, the following
question: "What was the probability in 1980 that the USSR would collapse before
1992?" In my experience this serves as an excellent starting-point for a
philosophical discussion of statistics. First, it reveals how dependent traditional
statistics is on the frequency concept of probability (i.e. how probable something is
depends on how often the event has occurred in the past). Clearly, there is no simple
historical frequency which allows us to say how probable it was that the USSR would
collapse. One is then left to wonder whether there are other kinds of evidence which
allow us to say how probable the collapse was. We could, for example, make some
inferences based on the economic theory of the efficiency of central planning (von Mises,
Hayek). Second, the question is good when discussing causation because one may list the
various "causes" of the collapse - social, political, military, economic. One
might then argue that the correlation between social problems and the collapse was
spurious in the sense that it was economic problems which caused the social problems which
in turn contributed to the collapse. Altogether, a simple question like this introduces
the students to many important non-standard statistical questions.
The above criticism - the lack of systematic discussion of unquantifiable sources of
error, causation, and spurious correlation - should not be interpreted as an attack on BJ.
Their book is in this respect no worse than others. However, two chapters were even less
critical than one might expect from a book which explicitly aims to develop a critical
attitude in the reader. These were the chapter on Basic Concepts of Testing Hypotheses
(chapter 6, p. 165 ff) and the related chapter on Inferences About a Population (ch.
8, p. 233 ff).
One problem was the discussion of the power curve (p. 171 ff). Almost all my students
failed to understand this discussion - and I am not sure I understood it either after
reading BJ. Their second discussion of the power curve did not help (p. 259 ff) - it only
made the confusion more widespread. Maybe this is an unfair criticism, since I cannot say
exactly what was wrong. Maybe, also, the mistake was ours in not investing enough
time in the topic. On the surface BJ's discussion looks good - with many concrete examples
and calculations. Still, we failed to understand it properly, and instead of struggling
with BJ we read a printout from SurfStat (available on the Net) which gave a good
intuitive account of the power of a test.
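For concreteness, here is a small sketch of my own (not BJ's presentation) computing the
power of a simple one-sided z-test directly - power being the probability of rejecting
the null hypothesis when a given alternative is actually true:

```python
from statistics import NormalDist

def power(mu_true, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """Power of the one-sided z-test of H0: mu = mu0 vs H1: mu > mu0,
    i.e. the probability of rejecting H0 when the true mean is mu_true."""
    se = sigma / n ** 0.5
    # Reject H0 when the sample mean exceeds this critical value:
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Probability of exceeding it under the true distribution of the mean:
    return 1 - NormalDist(mu_true, se).cdf(critical)

# Power rises from alpha (at mu_true = mu0) toward 1 as mu_true moves away:
for mu in (0.0, 0.2, 0.4, 0.6):
    print(f"mu = {mu:.1f}: power = {power(mu):.3f}")
```

Plotting power against mu_true gives exactly the power curve BJ discuss.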
A more serious problem was the lack of emphasis on the gradual nature of hypothesis
testing. Once again there was the ritual warning that "there is an element of
uncertainty in the conclusions reached" (p. 166). But, they also state that "the
primary goal is to determine whether a conjecture is supported or contradicted" (p.
255, see the list on p. 179). While there is no direct contradiction here, I feel the
choice of words is unfortunate. Our aim is to find the degree to which the data
supports a statement - not to reject or to "not-reject" the hypothesis.
To this effect the p-value (or what they call the significance probability) is a much
better tool than naive hypothesis testing. Moreover, the context is important in
determining the strength of our evidence (see Bees, marbles and
generalizations based on one example: A reflection on the concept of statistical
significance). Once again, BJ - and other authors - both know and note this. However,
they still write chapters which in practice encourage students to do naive
hypothesis testing, only adding a sub-chapter which says that the p-value is a more
informative measure than the dichotomous rejection or non-rejection of a hypothesis. My
proposal would be to turn this around: To have a chapter entitled "A Measure of the
Strength of the Evidence - The p-value" - relegating traditional hypothesis testing
to a sub-chapter within this main chapter, and including a long discussion on the
importance of the context with many concrete examples.
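The contrast can be made concrete: the same test statistic yields either a bare
reject/not-reject verdict or a graded p-value. A sketch of my own, using made-up data
(for this small sample a t distribution would be more exact; the large-sample z is used
to keep the sketch short):

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample; H0: population mean = 50
sample = [52.1, 49.8, 53.4, 51.2, 50.9, 52.8, 49.5, 51.7,
          52.3, 50.4, 51.9, 52.6, 50.1, 51.3, 52.0, 51.5]
mu0 = 50.0

n = len(sample)
z = (mean(sample) - mu0) / (stdev(sample) / n ** 0.5)  # z statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))           # two-sided p-value

# The dichotomous verdict throws information away; the p-value keeps it.
print(f"z = {z:.2f}, p = {p_value:.4f}, reject at 5%: {p_value < 0.05}")
```

A p-value of 0.049 and one of 0.0001 both "reject at the 5% level", yet they represent very different strengths of evidence - which is exactly why the p-value deserves the main chapter.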
Another problem is the impression given that "the realization of a type I error is
deemed more serious than a type II error" (p. 173) and that "the analysts
should consider a statement false unless the contrary is strongly supported by the
data" (p. 168). This is corrected on p. 174 when they write that "The
specification of the tolerance level for the type I error probability is not a statistical
problem." Still, I missed a critical discussion of the 95% significance level and the
factors determining the aversion to type I errors. In some contexts - airline safety
regulation is one example - a type I error is more serious than in
other contexts. (For more on this, please see my short observation What
should we believe? A reflection on the flawed use of traditional hypothesis-testing.)
A last, small criticism is their claim that one may find the alternative
hypothesis by negation of the null hypothesis (p. 167). As I have argued in another
article, this does not always work, since many statements do not have simple negations.
Consider the statement "He is a Christian." What is the negation? Is it "He
is an atheist," "He is a Satanist," or "He is an agnostic"? (For
more on this see True and False at the same time? Russian
religiousness and statistical theory.) This is, maybe, a small detail - but it is
such details that make statistics interesting, as opposed to the mechanical
exercise it is often made to be.
Overall
Although this review has focused on criticising BJ, I would like to end by praising the
book. Its extensive use of real-life examples and its excellent exercises make this a
very good book for an introductory course in statistics. On the negative side I emphasised
its somewhat over-detailed explanations of basic concepts - which in turn make this a
rather long book. I also criticised the book for not being good enough at developing a
critical attitude toward statistics - though it should be noted that it is by no means
uncritical, and that few books are much better than BJ in fostering this critical
attitude. Overall this amounts to a recommendation of BJ's Statistical
Concepts and Methods.
Afternotes:
1. Without really looking, I discovered two errors in the book: a small typographical
error on p. 79 ("agrument" instead of "argument"), and a more serious
error in exercise 13 on p. 325. According to the table, 100 women are in the sample (tribe
A), but if one actually sums the data the answer is 103.
2. It may be objected that my criticism is inconsistent: I claim the book is too long
at the same time that I complain about issues that are left out. However, I believe it is
possible to reduce the length of almost all the chapters - without significant reduction
in understanding. This would make room for more philosophical discussion/criticism, and
still reduce the overall length of the book.