[Note for bibliographic reference: Melberg, Hans O. (1996), Against correlation, http://www.oocities.com/hmelberg/papers/960415.htm]
by Hans O. Melberg
In most introductory texts on statistics there is a warning about confusing correlation
and causation. The standard example is that the number of storks and the number of new-born babies
are strongly correlated. However, as we all know, it is not the number of storks that
causes the number of babies. Rather, there is a third variable which causes both to be
correlated: the weather. In this article I attempt to classify some of the causes of the
confusion. I then try to construct an alternative to correlation as the basis for
estimating the probability of statements.
What is correlation?
Informally we say that two phenomena are correlated if we to a large extent observe that
the two appear together. For a short technical introduction to this intuitive idea, see the short review at the end of this article.
Why are we interested in correlation?
Correlation is often used as an argument to justify explanations. Hence, the study of the
theory of correlation is important because it may reveal flaws which in turn reduce the
plausibility of explanations.
To introduce the reader to this kind of reasoning I want to give two examples.
In 1867 a doctor named J. Lister published a paper which showed that surgery was much
safer when the environment was sterilised. Previously the Hungarian doctor I. Semmelweis
had been ridiculed for suggesting that there was a connection between the dirty hands of
hospital staff and infections caught by women after childbirth. Although there was a
correlation there was no "scientific reason" to support it. Pasteur then
provided the causal mechanism by demonstrating how bacteria in the air or on hands could
cause the disease. After Lister's paper hygienic standards were raised and the occurrence
of the disease decreased drastically.
An example of the second use of correlation is G. A. Cohen's argument that historical
correlations can be used to justify the Marxist theory of historical change. Basically his
argument is that if we observe a historic correlation between two factors we are justified
in believing that there is a causal relationship between the two even if we cannot
elaborate on the exact nature of the link between the two variables. 1
These examples may seem perfectly acceptable. However, as I shall try to argue below, the
intuitive appeal is sometimes deceptive because there are many cases in which we are
falsely led to believe that a correlation also implies a causal relationship.
The conceptual problem
In the social sciences we want to explain facts and events. There is some conceptual
disagreement as to what an explanation is. Jon Elster argues that to explain an event
"is to give an account of why it happened as it happened."2 Hence, even when we prove historically that two variables are
correlated, we do not have an explanation since we do not know why the two are correlated.
Cohen, on the other hand, argues that correlations may constitute explanations, although
incomplete explanations. For example, if I prove that one type of revolution is always
correlated with a certain level of development of the economy (or more precisely: of the
productive forces), I have an incomplete explanation of this type of revolution. For the
explanation to be complete we must, as Cohen agrees and Elster insists, provide the causal
mechanisms between the two variables.
I do not want to enter into this terminological issue. My focus is on the reasons why
correlation may lead us to the wrong conclusions, not the definition of the term.
How correlations may lead to wrong conclusions
In the first part I shall assume that there are no practical problems with the data (such
as measurement errors and data scarcity).
a) Strong correlation but no causal link
Assume we have a strong correlation. The question is then how this could be if there is no
causal connection between the two variables.
Some correlations are purely accidental and hence constitute no proof of causation. I may
take a phenomenon, such as the unemployment numbers, and then search for a data series that
is highly correlated with these figures. I might find that the number of rattlesnakes in
the USA is highly correlated with unemployment in Norway, but the two are obviously in no way
causally connected.
In the example above it was easy to discover the accidental nature of the correlation.
However, in other cases we are less sure whether the correlation is accidental or not.
Consider a recent case in Norway in which a nurse in a nursing home was jailed because
there had been an abnormal number of deaths when she was on duty. 3
The police suspected that she was a murderer; she maintained her innocence. The problem,
of course, was that the correlation could be accidental. To determine this we could try to
estimate the strength of our belief that the correlation was accidental by statistical
techniques. However, in this case such tests are difficult. Statistically there are bound to
be some nurses who have more deaths when they are on duty than others, even when deaths
are distributed randomly. We cannot every year accuse the nurse with the highest number of
deaths of being a murderer, even if we are more than 95% sure that her "death
average" is significantly different from that of the average nurse. The problem illustrates
that accidental correlations may be important in the formation of false beliefs. (In this
specific case the police decided not to prosecute precisely because of the mentioned
problem.)
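The difficulty with testing the nurse with the highest death count can be made concrete with a little arithmetic. The sketch below is only an illustration under invented assumptions: the number of nurses, the number of patients per nurse and the death risk (`N_NURSES`, `PATIENTS_PER_NURSE`, `DEATH_RISK`) are hypothetical and are not taken from the actual case. Every nurse is assumed to be innocent, so all differences between nurses are accidental.

```python
from math import comb

def binomial_tail(n, p, k):
    """P(X > k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1, n + 1))

# Hypothetical numbers, chosen only for illustration.
N_NURSES = 50
PATIENTS_PER_NURSE = 100
DEATH_RISK = 0.10  # identical for every nurse: nobody is a murderer

# Smallest death count that one given innocent nurse exceeds
# with a probability of at most 5%.
cutoff = 0
while binomial_tail(PATIENTS_PER_NURSE, DEATH_RISK, cutoff) > 0.05:
    cutoff += 1

p_single = binomial_tail(PATIENTS_PER_NURSE, DEATH_RISK, cutoff)

# Probability that at least one of the innocent nurses exceeds the cutoff.
p_any = 1 - (1 - p_single) ** N_NURSES

print(f"cutoff: more than {cutoff} deaths")
print(f"one given nurse above the cutoff: {p_single:.3f}")
print(f"at least one of {N_NURSES} nurses above the cutoff: {p_any:.3f}")
```

A threshold that is demanding for one nurse picked in advance is thus very likely to be passed by somebody when we look for the worst case among many nurses after the fact.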
2. Ignored third variable effects I (Common Cause)
In the introduction I mentioned the standard textbook example of spurious correlation:
that the number of storks is correlated with the number of babies. The reason these two
variables are correlated could be a third variable, the weather, which might be a common
cause of both variables. When the weather is warm there are many storks and humans are
more likely to have sex.4 Hence we have:
Weather --> Number of storks
Weather --> Number of babies
One should note that this common cause could be found at a lower level, i.e. the
structure could be:
A --> B --> D
A --> C --> E
In other words D and E are correlated, but not causally linked, since there is a common
factor (A) which causes B and C, which in turn cause D and E. This structure could be
extended so that the common cause was even deeper down in the system. In this way the problem
of third variable effects becomes even more difficult to discover, since the connections
may not be as obvious as in the stork/babies case.
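A small simulation may make the common-cause structure easier to see. This is a sketch with invented numbers, not data on real storks: `weather`, `storks` and `babies` are artificial series in which the weather drives both of the others and there is no link at all between storks and babies. The helper `partial_corr` uses the standard first-order partial correlation formula to remove the linear effect of the weather.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical model: the weather causes both series, which are otherwise unrelated.
weather = [rng.gauss(0, 1) for _ in range(n)]
storks = [w + 0.5 * rng.gauss(0, 1) for w in weather]
babies = [w + 0.5 * rng.gauss(0, 1) for w in weather]

r_raw = pearson(storks, babies)
r_given_weather = partial_corr(storks, babies, weather)

print(f"storks vs babies: {r_raw:.2f}")
print(f"storks vs babies, weather held constant: {r_given_weather:.2f}")
```

The raw correlation is strong, while the correlation with the weather held constant is close to zero. Of course this only works when the common cause has been identified and measured, which is exactly what becomes harder when it lies deeper down in the system.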
3. Ignored third variable effects II (Intervening variable)
In a famous study of suicide one of the fathers of modern sociology, Durkheim, discovered
that there was a correlation between the number of suicides and climate. 5
The warmer it was, the more suicides were committed. However, we should not draw the
conclusion that it was the weather which caused the number of suicides to increase. It
could be that warm weather means that people interact more and the feeling of loneliness
and failure intensifies in potential suicide candidates. Hence, the causal chain is:
Temperature --> Social interaction --> Feeling of isolation --> Suicide
We may wonder whether this is really a case of spurious correlation. Even if we do not
have a direct link between suicides and temperature, there is at least an indirect link in
contrast to the case of babies and storks. It is true that in some cases and for some
purposes the indirect nature of these links is not important. For example, if you simply
want to predict a variable you do not care whether the basis for your prediction is
indirect as long as it works. However, sometimes it is very important to discover the
indirect nature of the correlation. If you were to design policies aimed at reducing the
number of suicides, it would be useful to know that it was caused by a feeling of social
isolation and not temperature directly.
Even if we have a strong correlation and we suspect it implies causation, it would still
be wrong to conclude that the correlation implies causation in a specific instance
(fallacy of composition). For example, cancer is highly correlated with an early death.
However, I might be wrong to conclude that the death of a person who had cancer was caused
by cancer. The reason is that there might be a pre-emptive cause, such as the man being
killed in a car accident, which was the real cause of death.6
The same applies at the social level. When historians try to explain the causes of the
Russian revolution of 1917, they cannot simply infer the causes from a general theory of
revolutions even if this theory is backed by strong correlation evidence. Every revolution
must be examined on its own (as well as comparatively) because in a specific instance there
might be different causes at work than what is generally the case.
b) Weak correlation but strong causal link
Strong correlations need not imply causation, as discussed above. But the converse is also
true. We may have weak correlations but strong causal connections.
5. Third variable effects III
In the same way that a third variable may create a spurious correlation, a third variable
may also disguise a strong relationship. Wonnacott and Wonnacott, in their classic
introduction to econometrics, give the example of rainfall and yield in agriculture.7
First they find a weak negative correlation between rainfall and
yield. This may seem strange, and the answer is that a third variable, temperature, is at
work. Rainfall tends to be associated with low temperatures, and low temperatures result
in lower yield. Hence there is a third variable which, if we were unaware of it, might lead
us into concluding either that there is no strong relationship between rainfall and yield
or that the relationship runs in the wrong direction.
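The masking effect can be reproduced with artificial data. The sketch below does not use the figures from Wonnacott and Wonnacott; the coefficients are invented so that rain is good for the yield, warmth is good for the yield, and rainy periods tend to be cold. All variable names and numbers are assumptions made for the illustration.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(0)
n = 5000

# Hypothetical model: rain and warmth both raise the yield,
# but rainy periods tend to be cold.
temperature = [rng.gauss(0, 1) for _ in range(n)]
rainfall = [-0.8 * t + 0.6 * rng.gauss(0, 1) for t in temperature]
crop_yield = [1.0 * r + 1.5 * t + rng.gauss(0, 1)
              for r, t in zip(rainfall, temperature)]

r_simple = pearson(rainfall, crop_yield)
r_given_temp = partial_corr(rainfall, crop_yield, temperature)

print(f"rainfall vs yield: {r_simple:.2f}")
print(f"rainfall vs yield, temperature held constant: {r_given_temp:.2f}")
```

In this constructed case the simple correlation is weakly negative although rain has a clearly positive effect on the yield once temperature is taken into account.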
The correlation coefficient measures only the contemporaneous correlation between
variables. However, some variables have causal links that come into effect only after some
time. The contemporaneous correlation between inflation and the government budget deficit
may be low, but the causal relationship may be strong because it may take some time before
the deficit creates inflation. This, of course, need not be a large problem. We might
simply compute the correlation between the deficit of the previous year (or another year) and
current inflation. However, the serious problem arises when the length of the lag varies
(maybe as the result of increased speed of belief revision). In this case we would not
find any correlation even if there was a strong causal link.
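The fixed-lag case can be sketched as follows. The series are artificial: `deficit` is random noise and `inflation` is by construction caused by the deficit of the previous year plus a small disturbance. The one-year lag and the noise level are assumptions chosen for the illustration.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(1)
n = 2000

# Hypothetical model: inflation in year t is caused by the deficit in year t-1.
deficit = [rng.gauss(0, 1) for _ in range(n)]
inflation = [deficit[t - 1] + 0.3 * rng.gauss(0, 1) for t in range(1, n)]
# inflation[i] belongs to year i + 1, so it lines up with deficit[1:] (same year)
# and with deficit[:-1] (the year before).

r_same_year = pearson(deficit[1:], inflation)
r_one_year_lag = pearson(deficit[:-1], inflation)

print(f"same-year correlation: {r_same_year:.2f}")
print(f"correlation with last year's deficit: {r_one_year_lag:.2f}")
```

The same-year correlation is close to zero while the lagged correlation is high. If the lag changed from year to year, no single shift of the series would recover the relationship in this way.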
7. Non-linear relationship
The correlation coefficient measures only the strength of a linear relationship. This well
known fact leads to the almost equally well known warning that a weak correlation
coefficient may mask a strong non-linear relationship. This is true, though one should
note that a strong non-linear correlation need not imply causation either. An amusing
example may be the relationship between my wife's anger and the amount of water I pour
over her. A small glass may provoke a rather large reaction. A somewhat bigger glass of
water does not increase the amount of anger, but a bucket of water greatly increases the
anger. Thus, the (linear) correlation might be weak, but the causal relationship is
strong.
Another example could be the relationship between the cost of sending a letter and the
weight of the letter. We know by definition that there is a strong and deterministic
relationship. However, the (linear) correlation is less than one because the actual
relationship is discrete.
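The letter example is easy to check numerically. The tariff in `postage` below is invented for the purpose (three weight brackets with made-up prices); what matters is only that the price is a deterministic step function of the weight.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def postage(weight_g):
    """Hypothetical tariff: the price depends only on the weight bracket."""
    if weight_g <= 20:
        return 1.0
    if weight_g <= 100:
        return 2.0
    return 4.0

weights = list(range(1, 501))          # one letter for every weight from 1 g to 500 g
prices = [postage(w) for w in weights]

r = pearson(weights, prices)
print(f"correlation between weight and price: {r:.2f}")
```

Although the price is completely determined by the weight, the coefficient comes out well below one, because a straight line cannot follow the steps.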
Certain non-linear relationships are well known - exponential, quadratic, hyperbolic - but
one might wonder whether there are some kinds of causal connections we have not
discovered. For example, so-called chaotic relationships were not really studied until a
few years ago. An example of such a relationship is the function $x_{t+1} = k x_t (1 - x_t)$.
This difference equation was originally used (in its differential form) by
Verhulst in 1845 to model population changes.8 For some values of k
(with x between 0 and 1) this relationship generates patterns that seem random on a
graph. If we did not know that the pattern was generated by a function we could have
falsely concluded that there was no deterministic causal process at work. Certainly
there is no (linear) correlation between successive values of x that would lead to the suspicion of a
relationship. By focusing on linear and standard non-linear relationships we may have
ignored other and potentially more important causal patterns.9
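The claim about the absence of correlation can be tried out directly. The sketch below iterates the function with k = 4 and a starting value of 0.2; both choices are assumptions made for the illustration (k = 4 is a commonly used value in the chaotic range). It then correlates each value with its successor.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def logistic_series(k, x0, n):
    """Iterate x[t+1] = k * x[t] * (1 - x[t]) and return the first n values."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(k * xs[-1] * (1 - xs[-1]))
    return xs

series = logistic_series(k=4.0, x0=0.2, n=5000)

# Correlate every value with the value that it fully determines.
r_successive = pearson(series[:-1], series[1:])
print(f"correlation between x(t) and x(t+1): {r_successive:.3f}")
```

Each value is an exact function of the previous one, yet the linear correlation between them is close to zero.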
c) Interpretative problems
Even if we assume we have a correlation which we suspect reflects some kind of causal relationship
between the variables, we still have serious problems in determining the precise nature of
this relationship.
8. The direction of the causation
A correlation between two variables does not say which variable causes the other. I
recently read a newspaper-article which illustrated this problem very clearly. The article
was about a large study which indicated a strong (positive) correlation between the
quality of a person's sex life and how young the person looked (above a threshold of
course). The newspaper interpreted this to the effect that good sex causes a person's looks
to age more slowly. However, the causal relation could also be in the opposite direction:
People with good looks may be more likely to have good sex. This demonstrates the problem
of interpreting the direction of causation even if you have two variables which are both
correlated and causally related.
9. Joint causation
Sometimes the flow of causation goes in both directions and it would be wrong to interpret
the correlation as a proof that one variable causes the other. For example, in economics
the price and the quantity sold of a good are often (negatively) correlated. We may
interpret this to the effect that the price of the good determines quantity traded, which
is probably partly true. However, it might also be true that quantity determines prices.
In effect we have a relationship in which the variables are simultaneously determined.
Prices affect quantity and quantity affects prices. The correlation coefficient is of no
help in determining the causal structure of this relationship.
10. Wrong kind of causation
In one of his books Jon Elster gives an amusing illustration of "wrong
causation." He writes that his son once tried to command him to laugh. And, Elster
admits, of course he laughed. We may thus observe a strong correlation between the command
"laugh" and laughter. However, we would be wrong to conclude that the command
"laugh" causes laughter. It was precisely because Elster knew that it is
impossible to produce laughter on command that he laughed at the command. One might argue
that this is simply another example of an ignored third variable (and maybe it is).
However, it is an almost unavoidable one, since such mental operations cannot be quantified
(and hence tested for), as opposed to the ignored variable (social interaction) in the
earlier example of suicide rates (see 3: Intervening variable).
11. Self-confirming correlations
Assume you are opening a firm in a rather poor society composed of one large homogeneous
ethnic group and a small minority. Assume further that this minority has a bad reputation
for stealing. It might even be true, when you arrive, that a disproportionate amount of
the crime is committed by this minority. You then quite rationally decide not to hire
people from the ethnic minority (this is obviously a country with few laws regarding
discrimination). This, in turn, means that since they constantly lose out, the minority
may engage in more criminal activity (either to survive or because they have nothing to
lose). This, then, only exacerbates the problem. What we have is thus a correlation
between criminal activity and being a member of an ethnic group. The problem is how we
interpret this correlation. Some may say that the group is "by nature" more
untrustworthy and prove this by the statistical correlation. However, a little more
reflection also shows that the correlation may sustain itself (once it is started). It is
the correlation which forms the basis for belief formation which in turn results in
actions that lead to the correlation. We thus have a correlation which indicates some
kind of causal chain, but not the one we might think at first.
12. The time aspect
Even if we find a correlation and even if this correlation implies a causal relationship,
we do not know whether this is a long run or a short run effect or whether it is a steady
state or a disequilibrium effect. An example may clarify these statements.
Assume you live in a society in which the parents determine who their children should
marry. 10 Some people may go against this tradition, but it is
observed that these marriages often fail or become unhappy. One might then conclude that
the observed correlation between marrying for love and unhappy marriages is causal.
However, we might inquire about the precise nature of this causal relationship. For
example, the observed correlation need not apply in a different steady state such as a
state in which people married out of love. There are at least two reasons for this. First,
the observed correlation may be caused by adverse selection i.e. people who do not conform
to the tradition may be people who have stubborn personalities and therefore their
marriages are not very happy. Second, there might be a causal after-effect in that people
who marry against tradition are treated differently by the rest of society which in turn
makes them unhappy. Both these effects would disappear in a state in which all people
married out of love.
The distinction between the short run and the long run is of similar structure. For
example, a democratic system may initially result in greater diversity of expressed
opinions than under a dictatorship or an autocracy. However, as time passes different
causal mechanisms may undermine this effect to produce a more conformist society again. Hence
we cannot take the initially observed short run correlation as an indication of long run
effects.
d) Practical problems
So far I have assumed that there have been no problems with respect to the data. In the
real world this is a most serious problem of correlation analysis.
13. Few observations
Often we do not have enough data to reach reliable conclusions. Certain events only occur
once or only a few times. There was only one Russian revolution (though there might be
other revolutions that are comparable). The industrial age did not arise many times
under slightly different conditions. Even if I have data on discrimination in 20
countries, I am no closer to a reliable explanation if I have 19 explanatory variables.
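The smallest version of this problem has two observations and one explanatory variable: any two points lie exactly on a straight line, so the correlation is always perfect, whatever the data. The sketch below checks this on random numbers. It is meant as an analogue of the case with 20 countries and 19 explanatory variables, where the data can likewise be fitted perfectly without telling us anything. The seed and the number of repetitions are arbitrary.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(2)

# Draw two completely unrelated "observations" of x and y, many times over.
results = []
for _ in range(1000):
    xs = [rng.random(), rng.random()]
    ys = [rng.random(), rng.random()]
    results.append(pearson(xs, ys))

# With only two observations the coefficient is always +1 or -1.
all_perfect = all(abs(abs(r) - 1) < 1e-6 for r in results)
print("every correlation is +1 or -1:", all_perfect)
```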
14. Not enough variation
To avoid problems of multicollinearity among the explanatory variables we would like our
data to be spread out. Unfortunately this is often not possible. This means that it is
difficult to determine the relative importance of the various variables (since there is
not enough variation to distinguish how important they are relative to each other).
15. Too much noise
We may believe that two variables are causally related, but be unable to prove this because
there are so many other variables interfering, making it impossible to determine
whether the variation is caused by the variable we have in mind or by noise from the other
variables.
16. Flawed assumptions about the underlying distributions
When we use the correlation coefficient we usually assume that the underlying populations
are normally distributed. However, if the distribution of one variable is
highly skewed, and the other is highly skewed in the other direction, the highest possible
correlation coefficient may be far below 1, for example 0.6. In this case 0.6, and not 1 as usual, indicates the
strongest possible relationship. If we are unaware of the underlying distribution we might
therefore discard a correlation of 0.6 as not very strong, while in fact the opposite is
true.
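A ceiling of this kind can be produced with one particular pair of distributions. The choice is mine and only illustrative: one variable is lognormal (long tail to the right) and the other is a mirrored lognormal (long tail to the left), with the skewness controlled by the assumed parameter `SIGMA`. Both variables are built as increasing functions of the same underlying value, which is the strongest positive dependence two variables with given distributions can have.

```python
from math import exp, sqrt
from statistics import NormalDist

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

SIGMA = 0.7   # assumed degree of skewness (hypothetical value)
N = 20000     # number of evenly spaced quantiles used instead of random draws

std_normal = NormalDist()
z = [std_normal.inv_cdf((i + 0.5) / N) for i in range(N)]

# Both variables rise whenever z rises, so they are perfectly (monotonically) related.
right_skewed = [exp(SIGMA * v) for v in z]    # long tail to the right
left_skewed = [-exp(-SIGMA * v) for v in z]   # long tail to the left

r_max = pearson(right_skewed, left_skewed)
print(f"highest attainable correlation for this pair: {r_max:.2f}")
```

For this family the ceiling works out to roughly exp(-SIGMA**2), which is about 0.6 for the value used here. Other skewed distributions give other ceilings.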
17. Measurement errors
Usually there is some unknown degree of measurement error involved in the data. If
nothing else, we might press the wrong keys when we enter the numbers into the computer.
The problem becomes serious when there are systematic measurement errors - such as when
old data is less reliable than new data. The correlation coefficient may then not reflect
the true correlation, but simply the measurement error.
18. Non-quantifiable factors
Some factors are notoriously difficult to quantify and some are inherently so. For
example, cultural factors such as "commercial talent" seem to be difficult to
measure in isolation. Another example could be "inferiority complex" as a cultural
trait. A third example could be discrimination: How do we put a number on how much
discrimination there is in a society?11 These problems are serious
because they mean that we often leave such factors out of our analysis despite their undeniable
significance, simply because they do not fit our frame of correlation analysis.
The sum of these problems
What are the implications of the problems mentioned so far? First of all I do not advocate
the abandonment of statistical analysis even if there are problems. Many of the problems
can be reduced. We can develop measures to examine non-linear correlation, tests for the
existence of lags, tests for the existence of third variable effects and the data may be
manipulated to remove some spurious correlation (by differencing the data). We may also
consider second best strategies that account for the weaknesses described so far. Hendry's
"test, test, test" strategy (also called the general-to-simple methodology) may
be viewed as such a strategy as opposed to the simple-to-general methodology commonly used.12
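The remark about differencing can be illustrated with two artificial series that have nothing to do with each other except that both grow over time. The trend slopes and noise levels below are invented for the example.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(3)
n = 2000

# Two unrelated hypothetical series that both happen to trend upwards.
a = [0.5 * t + rng.gauss(0, 5) for t in range(n)]
b = [0.2 * t + rng.gauss(0, 5) for t in range(n)]

r_levels = pearson(a, b)

# First differences: the change from one period to the next.
da = [a[t] - a[t - 1] for t in range(1, n)]
db = [b[t] - b[t - 1] for t in range(1, n)]
r_diffs = pearson(da, db)

print(f"correlation of the levels: {r_levels:.2f}")
print(f"correlation of the differences: {r_diffs:.2f}")
```

The levels are almost perfectly correlated because of the shared trend, while the period-to-period changes are not. Differencing removes this particular source of spurious correlation, but not the others discussed above.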
Yet, sometimes the limitations may be too great to be remedied. The question is then
whether we have any alternative strategies for determining the probability of statements.
An alternative strategy: Reflective estimation
Assume you want to examine to which degree the laws in a country are unjust, defining
injustice as differential treatment according to morally irrelevant criteria such as
gender, race and status. One strategy could be to simply collect laws to see how many of
them are discriminatory. Based on this one might arrive at an estimate of the amount of
injustice in the legal system.
An alternative approach might be to start with the assumption that people are in general
selfish. We may then deduce the consequences of this selfishness with respect to the laws
given the political system. For example, we do not need to examine the laws of Apartheid
to have confidence in the belief that it discriminated against the blacks. A political
system in which blacks did not have the right to vote, and the belief that people are
selfish, are enough to give some degree of reliability to the belief that the blacks were
discriminated against.
These two strategies may be combined in what I call reflective estimation. First I may
arrive at a probability estimate based on deduction from a few basic beliefs. These basic
beliefs may be based on correlations (i.e. induction) which are reliable in the sense that
they do not suffer from many of the weaknesses identified above. Having arrived at an
estimate deductively I may try to estimate the probability inductively i.e. by gathering
statistics and finding the correlation coefficients. I now have two estimates of the
amount of injustice in a legal system. One arrived at by deduction, one by induction. I
would then suggest a compromise between the two. The relative weight attached to each
would depend upon how many of the problems listed above we suspect the correlation might
be suffering from.
Another concrete example may help. Assume I want to examine the causes of successful
revolutions. One approach would be to start with a few basic facts about people's desires
and beliefs and combine this with the existing political system to estimate the likely
causes deductively. Another approach would be to gather aggregated statistical data
(strikes, food consumption, industrialisation, urbanisation) and examine which of these are
correlated with successful revolutions. If the two estimates diverge I must make a
compromise between them according to how reliable I believe they are.
A third example may be needed, at least to clarify the thought in my mind. G.A. Cohen
writes that the difference between him and Elster is exemplified by the following example:13
Imagine a man has died after a dinner party. You suspect the cause
was food poisoning. How do you examine the probability of this claim? Cohen suggests that
it is good enough to examine the other participants at the dinner. If those who ate the
same food that the dead man ate also died, we are quite sure the cause was food poisoning.
However, Cohen claims, Elster would use a different approach to assess the probability of
food poisoning. He would examine the man in a hospital to see whether it actually was the
food that killed the man. I suggest doing both and then if the results diverge go over the
evidence once more. (I have to admit that this is not a very original or very
Why should this method work? Both the process of induction and that of deduction involve
uncertainties. Induction, which I have associated with correlation, suffers potentially
from the problems identified above. Deduction suffers from the fact that the deductive
process may be unreliable (we might ignore some effects) and that the initial starting
point is sometimes unreliable. For example, economists often start by assuming selfish and
rational individuals. Sociologists then complain that this assumption is not always
correct and hence it leads economists to the wrong conclusions. The point in the above
method is that divergent estimates may suggest that there is a problem which in turn means
that we examine both processes to see whether we might have missed something. In this way
we arrive at what I think is the best estimate of the probability of a statement.
1 Cohen (1982a), p. 490 and Cohen (1982b), p. 53
2 Elster (1989), p. 6
3 "Landås saken"
4 I am somewhat sceptical of the stork/babies example because there is a time-lag between
sex and having babies which may destroy the contemporaneous correlation.
5 Giddens (199?), p. 680
6 Elster (1989), p. 6
7 Wonnacott and Wonnacott (1979), p. 96
8 Baker G. L. and Gollub J. P. (1990), p. 77
9 It is true that explanations by reference to chaotic processes have not been very
successful so far. Maybe they never will be, but the point that one should look for
processes that create patterns that are not simply linear or standard non-linear remains -
even if one such attempt was (initially) unsuccessful.
10 The example is from Elster (1993), p. 104, who in turn is inspired by Tocqueville's
discussion in Democracy in America
11 For more on this see Blalock (1984), p. 49-56
12 See Gilbert (1986) for a discussion of this
13 Cohen (1982a), p. 491
What is correlation?
- A short review
(If you are new to the subject you are better off reading a textbook, such as Wonnacott
and Wonnacott's "Econometrics" p. 150-)
The usual definition of correlation is "a measure of the strength and direction of a
linear relationship." Depending on the nature of the data (cardinal, ordinal) there
are many different measures of correlation. In the following I shall show how the most
common measure - Pearson's correlation coefficient - is derived.
How do we measure the strength and direction of the relationship between two variables?
The first idea that comes to mind is to use the sum of the product of the deviations. The
larger the sum is, the stronger is the positive relationship. However, the number may also
become larger when we simply add new observations. To adjust for this we divide the sum by
the number of observations. However, there is still one problem because the measure
changes when we change the scale of measurement. For example, we may give a class of 30
students two tests - one in mathematics and another in verbal skills. We may then mark the
scores of both tests on a scale from 0 to 50. Having done that we try to find the sum of
the product of the deviations divided by the number of observations. This is a measure of
the strength of the correlation. However, if we had marked the scores on a scale from 0 to
100, the measured strength of the relationship would change. To avoid this we standardise
the deviations by dividing them by their average deviation (i.e. the standard deviation of
the sample). After these adjustments the average product of the standardised deviations is a good
measure of correlation.
More formally we have the following:
We have two data series: X and Y (X may be the scores on a test in mathematics, Y the
scores on a verbal test). We also have n observations (for example 30 if
the tests were given to a class of thirty persons).
The deviation of one observation is then: (this particular X - average X) [often called
little x].
The product of the deviations is: (this X - average X) * (this Y - average Y) [i.e. xy].
If the people who score well on the math test also score well on the verbal test,
this is a high, positive number.
We add the product of the deviations for all the observations: Sigma (xy).
We divide the resulting number by the number of observations: Sigma (xy)/n.
However, to adjust for differences in scale, we use the standardised
deviations: (x) / (the standard deviation of X), and likewise (y) / (the standard deviation of Y).
We now have:
$$r = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i}{s_X} \right) \left( \frac{y_i}{s_Y} \right)$$
where $x_i$ and $y_i$ are the deviations of observation i from the averages of X and Y, and $s_X$ and $s_Y$ are the standard deviations of X and Y.
This is the sample correlation coefficient. It is a measure that varies between -1 and +1
depending on the strength and direction of the measured linear relationship. A strong
negative relationship should come out as close to -1 on our scale, no relationship should
give a measure close to zero, and a strong positive relationship should approach +1.
Note: To adjust for the loss of one degree of freedom one could use n-1 instead of n.
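The steps above can be sketched in a few lines of code. The test scores are made up for the illustration, and the function names are mine. The code divides by n throughout, as in the formula above; using n-1 in both the standard deviations and the final average would give the same coefficient.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def std_dev(values):
    """Standard deviation of the sample (dividing by n)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

def mean_product_of_deviations(xs, ys):
    """Sigma(xy) / n: changes when the scale of measurement changes."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """r = 1/n * Sigma((x / s.d. X) * (y / s.d. Y)), with x and y as deviations."""
    mx, my = mean(xs), mean(ys)
    sx, sy = std_dev(xs), std_dev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / len(xs)

# Made-up scores for six students, marked on a scale from 0 to 50.
maths = [12, 25, 31, 38, 44, 47]
verbal = [20, 18, 35, 30, 41, 45]

# The same results marked on a scale from 0 to 100.
maths_100 = [2 * s for s in maths]
verbal_100 = [2 * s for s in verbal]

print("unstandardised measure, 0-50 scale: ", mean_product_of_deviations(maths, verbal))
print("unstandardised measure, 0-100 scale:", mean_product_of_deviations(maths_100, verbal_100))
print("r, 0-50 scale: ", correlation(maths, verbal))
print("r, 0-100 scale:", correlation(maths_100, verbal_100))
```

Doubling both scales multiplies the unstandardised measure by four but leaves r unchanged, which is the reason for standardising the deviations.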
Baker G. L. and Gollub J. P. (1990), Chaotic dynamics: An introduction, Cambridge:
Cambridge University Press
Blalock, Hubert M. Jr. (1984), Basic Dilemmas in the Social Sciences, Beverly
Hills: Sage Publications
Cohen, G. A. (1980), Functional explanation: Reply to Elster, Political Studies 28:
Cohen, G. A. (1982a), Functional explanation, Consequence explanation, and Marxism, Inquiry
Cohen, G. A. (1982b), Reply to Elster on "Marxism, Functionalism and Game
Theory" , Theory and Society 11: 483-495
Darnell, A. C. and Evans J. L. (1990), The Limits of Econometrics, Aldershot:
Elster, Jon (1980), Cohen on Marx's theory of history, Political Studies 28:
Elster, Jon (1982), Marxism, functionalism and game theory, Theory and Society 11:
Elster, Jon (1983), Reply to comments (on "Marxism, functionalism and game
theory"), Theory and Society 12: 111-120
Elster, Jon (1983), Explaining Technical Change, Cambridge: Cambridge University
Elster, Jon (1986), Further thoughts on Marxism, functionalism and game theory, in J.
Roemer (1986), ed., Analytical Marxism, Cambridge: Cambridge University Press
Elster, Jon (1986), An Introduction to Karl Marx, Cambridge: Cambridge University
Elster, Jon (1987), The possibility of rational politics, European Journal of Sociology
(Archives Européennes de Sociologie) 28: 67-103
Elster, Jon (1989), Nuts and Bolts for the Social Sciences, Cambridge: Cambridge
Elster, Jon (1993), Political Psychology, Cambridge: Cambridge University Press
Giddens, Anthony (199?), Sociology
Gilbert, Christopher L. (1986), Professor Hendry's Econometric Methodology, Oxford
Bulletin of economics and statistics 48(3): 283-307
Kline, Paul (1981), No Smoking, London Review of Books, 19. Feb.- 4 March 1981,
Koutsoyiannis, A. (1977), Theory of Econometrics, London: Macmillan (second
edition, first edition: 1973)
Ovind, Jan (1996), Hold deg ung med god sex, Verdens Gang 8. januar 1996: 22 (An
article in a Norwegian newspaper)
Simon, Herbert A. (1971), Spurious Correlation: A Causal Interpretation, in H. M. Blalock
(1971), Causal Models in the Social Sciences, Macmillan (Reprint from Journal of
the American Statistical Association, 1954)
Wonnacott R. J. and T. H. (1979), Econometrics, New York: Wiley (2nd ed., first: