
[Note for bibliographic reference: Melberg, Hans O. (1996), Critical Statistics - A Review of "Statistical Concepts and Methods" by Bhattacharyya and Johnson, http://www.oocities.org/hmelberg/papers/970507.htm]




Critical Statistics
A Review of "Statistical Concepts and Methods" by Bhattacharyya and Johnson


Gouri K. Bhattacharyya and Richard A. Johnson
Statistical Concepts and Methods
John Wiley & Sons
New York, 1977
ISBN: 0-471-07204-4

Introduction
In the preface to Statistical Concepts and Methods, Bhattacharyya and Johnson (BJ) write that "Our purpose in this book is to build both an understanding of [the basic principles and techniques of statistical analysis] and an awareness of the limitations of the methods" [p. v]. Judged against these criteria, the book is a partial success. While it is reasonably good at explaining statistical concepts, it could do better at conveying the limits of statistics.

Fostering understanding
My first criticism concerns the length of the book - 639 pages. Obviously the length reflects the authors' attempt to give good explanations, including many examples and explaining every concept in detail. In this way one may hope that even the weaker students can follow the book. I wonder, however, whether this strategy is counter-productive. The very length of the book deters students; the level of detail makes students lose the general intuition; and the many examples produce a book which comes close to being boring. Thus the length may defeat its own purpose of letting everybody follow the gradual build-up of statistical understanding: the smart are bored, and the not-so-smart are deterred and lose sight of the general picture.

In all fairness it must immediately be added that most of the examples are superb, and the same goes for the exercises that follow each chapter. Most of these are real-life examples with obvious significance and interpretations. For example, in one exercise the reader is asked to compute the number of different pizzas one might make if one is restricted to using only three (out of twenty possible) ingredients at once on one pizza (Answer: (20 · 19 · 18) / 3! = 1140 different pizzas). This gives the student an excellent intuitive idea of combinations. Moreover, in order to explain the idea of permutations, they present an experiment in which a psychologist is supposed to apply three stimuli - a bell, a light, and an electric shock - to a subject (p. 101, no. 21). Once again the example is good - not only because it invites jokes about "mad" psychologists - but because it is easy to convey the feeling that the order in which the stimuli are applied is important for the outcome. After receiving an electric shock one might not react to a bell in the same way as one would if the bell came first (contrast this with the pizzas, where the order of the ingredients is usually not that important). Altogether, I consider these real-life examples to be a major strength of the book.
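To make the distinction concrete, here is a small check of the two counts (my own illustration, not code from the book; the numbers are taken from the two exercises above):

```python
import math

# Pizzas: which 3 of 20 ingredients go on the pizza; the order of the
# ingredients is irrelevant, so we count combinations.
pizzas = math.comb(20, 3)   # 1140

# Stimuli: in which order are bell, light and shock applied; the order
# matters, so we count permutations of the three stimuli.
orders = math.perm(3)       # 3! = 6

print(pizzas, orders)
```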

Not only are the examples good; BJ often present them before they present the general rules or formulas. This, in my experience, is the best way to teach statistics: first present a concrete problem, then try to solve it, and finally see whether the solution can be generalized into a rule (see, for example, p. 79). I also found their habit of isolating major formulas and definitions in small boxes to be a good way of guiding the reader to what is important - avoiding the problem of weaker students who sometimes try to memorize almost every small mathematical expression and definition.

Despite all these qualities, there is room for improvement. One improvement would be to include a very short review at the end of each chapter - just listing the major concepts discussed, and providing a diagram showing the location of each concept in the overall frame of statistics. For example, after the chapter on descriptive statistics one might produce the following diagram:


    Measure of central tendency  ->  mean

One might then go on to add even more detail, which I have left out since it is difficult to construct using HTML. For example, the mean and the variance may each be subdivided according to the type of data (simple, grouped, interval, and coded), and finally the formulas could be written. Hence, if we start at the variance, we would have something like the following:
                                                                    
    Variance  ->  Simple data:   s² = Σ(xᵢ - x̄)² / (n - 1)
              ->  Grouped data:  s² = Σ(mᵢ - x̄)² · fᵢ / n
 

On screen this might not look too illuminating, but my experience is that these simple maps of concepts often help students organize their knowledge in a way which makes them understand, remember, and reproduce the content more easily. It is simply a method of providing small boxes in which isolated bits of knowledge can be located - a way of integrating the material and giving each formula a particular location. Although BJ do try this, I felt that short reviews at the end of each chapter - often in diagrammatic form - could improve the book in terms of helping the students understand the material.
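As a concrete rendering of the two formulas in the diagram, here is a minimal sketch (my own, not from the book), assuming raw observations for the simple case and class midpoints with frequencies for the grouped case:

```python
def simple_variance(x):
    """s² = Σ(xᵢ - x̄)² / (n - 1) for raw data."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / (n - 1)

def grouped_variance(midpoints, freqs):
    """s² = Σ(mᵢ - x̄)² · fᵢ / n for grouped data with midpoints mᵢ, frequencies fᵢ."""
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
    return sum((m - mean) ** 2 * f for m, f in zip(midpoints, freqs)) / n

print(simple_variance([2, 4, 4, 4, 6]))           # 2.0
print(grouped_variance([1, 3, 5], [10, 20, 10]))  # 2.0
```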

A critical attitude
As mentioned, BJ want to convey a feeling for the limitations of statistical methods. As they write, "Students should be encouraged to develop a critical attitude in applying the methods and to be cautious when interpreting the results" (p. vi). The question is then how well they succeed in this.

First of all, they should be given credit for at least trying to be critical. For example, they frequently point out the importance of the assumption of normality (see p. 220, p. 250, and p. 316) - and they warn that the confidence interval for the standard deviation is more sensitive to the assumption of normality than the confidence interval for the mean of a population (p. 270).

A second point, which BJ correctly note, is that one should not have 95% confidence in 95% confidence intervals (see p. 247 and p. 372). This sounds confusing, but the simple argument is that when we compute a confidence interval we make certain assumptions (normality), and we leave out the unquantifiable sources of uncertainty. So we are not really 95% certain that a predicted value will be within our band. Rather, if our assumptions are correct and if there are no important unquantifiable sources of error, then we are 95% certain. These two ifs indicate that we are really less than 95% certain, but since this uncertainty cannot be quantified it is left out.
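A small simulation can make the first "if" vivid (my own sketch, not from the book): nominal 95% t-intervals for the mean, computed from small samples of a skewed (exponential) population, cover the true mean less often than advertised.

```python
import random, statistics

random.seed(1)
t_crit = 2.262     # t quantile (97.5%) for n - 1 = 9 degrees of freedom
true_mean = 1.0    # mean of the Exponential(1) population

covered, trials = 0, 10_000
for _ in range(trials):
    sample = [random.expovariate(1.0) for _ in range(10)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    if m - t_crit * se <= true_mean <= m + t_crit * se:
        covered += 1

print(f"empirical coverage: {covered / trials:.3f} (nominal 0.95)")
```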

As a concrete example of the second "if" - unquantifiable error - one might use Literary Digest's famously incorrect prediction that A. Landon was going to win over F. D. Roosevelt in the 1936 presidential election. One source of the mistake was that the sample was unrepresentative, since it consisted of people with a telephone or a car - mostly rich people in the 1930s. Even if one used their sample to find a 95% confidence interval for the percentage of votes going to Landon or Roosevelt, we should obviously not be 95% certain that the result of the election would be in this interval. Although this example is discussed in BJ (p. 552-554), it deserves a much closer examination. Unless statistics wants to be a second-decimal science - focusing on the quantifiable but relatively small sources of error - statisticians should try harder to develop methods to deal with the (apparently) unquantifiable. At least one might discuss these sources of error in a more systematic way. (For an imperfect attempt to do this, see my article Against Correlation.)

The same kind of criticism applies to BJ's discussion of correlation, causation and spurious correlation. While they include the ritual warning not to confuse correlation with causation, they do not really discuss the problem in a systematic way. One could, for example, try to develop a list of the ways spurious correlation might occur (adverse selection, moral hazard). One might also discuss the more philosophical issue of causation (starting with David Hume) and whether a belief can be justified by means other than correlation. (Once again, I try to do some of this in the article mentioned above. I also discuss the issue of justification in The Cultural Approach to Russian History. Jon Elster also has an excellent discussion of the difference between causation and correlation in the first chapter of his book Nuts and Bolts for the Social Sciences.)
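To illustrate how a common cause manufactures correlation without causation, consider this hypothetical sketch (mine, not BJ's); the comments are only suggestive labels:

```python
import random, statistics

random.seed(2)
n = 1_000
z = [random.gauss(0, 1) for _ in range(n)]     # common cause (confounder)
x = [zi + random.gauss(0, 0.5) for zi in z]    # driven by z, not by y
y = [zi + random.gauss(0, 0.5) for zi in z]    # driven by z, not by x

# x and y are strongly correlated although neither causes the other.
print(f"corr(x, y) = {statistics.correlation(x, y):.2f}")  # about 0.8
```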

One might object that these discussions are too advanced for an introductory statistics class. I strongly disagree. My experience is that discussing these philosophical issues makes the students more motivated to work hard on the boring topics. It also makes the students appreciate the role of statistics as "the logical basis for comparing evidence" (p. 286; see also p. 4-5). Consider, for example, the following question: "What was the probability in 1980 that the USSR would collapse before 1992?" In my experience this serves as an excellent starting-point for a philosophical discussion of statistics. First, it reveals how dependent traditional statistics is on the frequency concept of probability (i.e. how probable something is depends on how often the event has occurred in the past). Clearly, there is no simple historical frequency which allows us to say how probable it was that the USSR would collapse. One is then left to wonder whether there are other kinds of evidence which allow us to say how probable the collapse was. We could, for example, make some inferences based on the economic theory of the efficiency of central planning (von Mises, Hayek). Second, the question is useful when discussing causation, because one may list the various "causes" of the collapse - social, political, military, economic. One might then argue that the correlation between social problems and the collapse was spurious in the sense that it was economic problems which caused the social problems which in turn contributed to the collapse. Altogether, a simple question like this introduces the students to many important non-standard statistical questions.

The above criticism - the lack of systematic discussion of unquantifiable sources of error, causation and spurious correlation - should not be interpreted as an attack on BJ. Their book is in this respect no worse than others. However, two chapters were even less critical than one might expect from a book which explicitly aims to develop a critical attitude in the reader. These were the chapter on Basic Concepts of Testing Hypotheses (chapter 6, p. 165 ff.) and the related chapter on Inferences About a Population (chapter 8, p. 233 ff.).

One problem was the discussion of the power curve (p. 171 ff.). Almost all my students failed to understand this discussion - and I am not sure I understood it either after reading BJ. Their second discussion of the power curve did not help (p. 259 ff.) - it only made the confusion more widespread. Maybe this is an unfair criticism, since I cannot say exactly what was wrong. Maybe, also, the mistake was ours in not investing enough time in the topic. On the surface BJ's discussion looks good - with many concrete examples and calculations. Still, we failed to understand it properly, and instead of struggling with BJ we read a printout from SurfStat, which gave a good intuitive account of the power of a test (available on the Net).
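For readers who, like my students, want a more concrete handle on the power curve, here is a minimal sketch (my own, not BJ's or SurfStat's presentation) for a one-sided z-test of H0: mu = 0 with known sigma:

```python
from statistics import NormalDist

def power(true_mu, n, sigma=1.0, alpha=0.05):
    """P(reject H0: mu = 0 | the true mean is true_mu), one-sided z-test."""
    z_crit = NormalDist().inv_cdf(1 - alpha)   # rejection cutoff for z
    shift = true_mu * n ** 0.5 / sigma         # how far the z statistic shifts
    return 1 - NormalDist().cdf(z_crit - shift)

# Tracing the power curve: at mu = 0 the power equals alpha; it rises
# towards 1 as the true mean moves away from the null value.
for mu in (0.0, 0.2, 0.4, 0.6, 0.8):
    print(f"true mu = {mu:.1f}: power = {power(mu, n=25):.2f}")
```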

A more serious problem was the lack of emphasis on the gradual nature of hypothesis testing. Once again there was the ritual warning that "there is an element of uncertainty in the conclusions reached" (p. 166). But they also state that "the primary goal is to determine whether a conjecture is supported or contradicted" (p. 255; see the list on p. 179). While there is no direct contradiction here, I feel the choice of words is unfortunate. Our aim is to find the degree to which the data support a statement - not to reject or "not reject" the hypothesis. To this end the p-value (or what they call the significance probability) is a much better tool than naive hypothesis testing. Moreover, the context is important in determining the strength of our evidence (see Bees, marbles and generalizations based on one example: A reflection on the concept of statistical significance). Once again, BJ - and other authors - both know and note this. However, they still write chapters which in practice encourage students to do naive hypothesis testing, only adding a sub-chapter which says that the p-value is a more informative measure than the dichotomous rejection or non-rejection of a hypothesis. My proposal would be to turn this around: to have a chapter entitled "A Measure of the Strength of the Evidence - The p-value", relegating traditional hypothesis testing to a sub-chapter within this main chapter, and including a long discussion of the importance of context, with many concrete examples.
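A tiny, hypothetical example (mine, not from the book) of why the p-value is more informative than the bare verdict: two nearly identical samples can land on opposite sides of the 5% line, while their p-values honestly show that the evidence is almost the same.

```python
from statistics import NormalDist

def one_sided_p(xbar, mu0=0.0, sigma=1.0, n=25):
    """p-value for H0: mu = mu0 against mu > mu0, with known sigma."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    return 1 - NormalDist().cdf(z)

for xbar in (0.30, 0.34, 0.60):
    p = one_sided_p(xbar)
    verdict = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"xbar = {xbar:.2f}: p = {p:.3f} -> {verdict}")
```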

Another problem is the impression given that "the realization of a type I error is deemed more serious than a type II error" (p. 173) and that "the analysts should consider a statement false unless the contrary is strongly supported by the data" (p. 168). This is corrected on p. 174, where they write that "The specification of the tolerance level for the type I error probability is not a statistical problem." Still, I missed a critical discussion of the conventional 5% significance level and the factors determining the aversion to type I errors. In some contexts - airline safety regulation is one example - a type I error is more serious than in other contexts. (For more on this, please see my short observation What should we believe? A reflection on the flawed use of traditional hypothesis-testing.)
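To see why the tolerance for type I errors is a contextual rather than a statistical choice, consider this hypothetical sketch (using the same one-sided z-test as above): tightening alpha buys fewer false alarms at the price of more missed effects.

```python
from statistics import NormalDist

def beta(alpha, true_mu, n=25, sigma=1.0):
    """P(type II error) = P(fail to reject H0: mu = 0 | true_mu), z-test."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    shift = true_mu * n ** 0.5 / sigma
    return NormalDist().cdf(z_crit - shift)

# Whether this trade is worth making depends on the real-world costs of
# the two errors (e.g. airline safety), not on statistical convention.
for a in (0.10, 0.05, 0.01):
    print(f"alpha = {a:.2f}: beta = {beta(a, true_mu=0.4):.2f}")
```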

A last, smaller criticism concerns their claim that one may find the alternative hypothesis by negating the null hypothesis (p. 167). As I have argued in another article, this does not always work, since many statements do not have simple negations. Consider the statement "He is a Christian." What is the negation? Is it "He is an atheist", "He is a Satanist", or "He is an agnostic"? (For more on this, see True and False at the same time? Russian religiousness and statistical theory.) This is, maybe, a small detail - but it is details like these that make statistics interesting, as opposed to the mechanical exercise it is often made out to be.

Overall
Although this review has focused on criticising BJ, I would like to end by praising the book. Its extensive use of real-life examples and its excellent exercises make this a very good book for an introductory course in statistics. On the negative side I emphasised its somewhat over-detailed explanations of basic concepts - which in turn make this a rather long book. I also criticised the book for not being good enough at developing a critical attitude toward statistics - though it should be noted that it is by no means uncritical, and that few books are much better than BJ in fostering this attitude. Overall this amounts to a recommendation of BJ's Statistical Concepts and Methods.



Afternotes:
1. Without really looking, I discovered two errors in the book: a small typographical error on p. 79 ("agrument" instead of "argument"), and a more serious error in exercise 13 on p. 325. According to the table, 100 women are in the sample (tribe A), but if one actually sums the data the answer is 103.

2. It may be objected that my criticism is inconsistent: I claim the book is too long while at the same time complaining about issues that are left out. However, I believe it is possible to shorten almost all the chapters without significantly reducing understanding. This would make room for more philosophical discussion and criticism, and still reduce the overall length of the book.


