Goodness of Fit (R²) and Correlation Coefficient (r)
The closer the observations fall to the regression line (i.e. the smaller
the residuals), the greater is the variation in Y "explained" by
the estimated regression equation. The total variation in Y is equal to
the explained plus the residual variation:
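$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$

Total variation in Y (or total sum of squares, TSS) = Explained variation in Y (or regression sum of squares, RSS) + Residual variation in Y (or error sum of squares, ESS)

$$TSS = RSS + ESS$$

Dividing both sides by TSS gives

$$1 = \frac{RSS}{TSS} + \frac{ESS}{TSS}$$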
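The coefficient of determination, or R², is then defined as the
proportion of the total variation in Y "explained" by the
regression of Y on X:

$$R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}$$

R² can be calculated by

$$R^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = 1 - \frac{\sum e_i^2}{\sum (Y_i - \bar{Y})^2}$$

where $\hat{Y}_i$ is the fitted value for observation i, $\bar{Y}$ is the sample mean of Y, and $e_i = Y_i - \hat{Y}_i$ is the residual.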
R² ranges in value from 0 (when the estimated regression equation explains
none of the variation in Y) to 1 (when all points lie on the regression
line). Thus, R² is unit-free and $0 \le R^2 \le 1$ because $0 \le ESS \le TSS$.
R² = 0 when, for example, the estimated regression line is horizontal
or all sample points lie on a circle. R² = 1 when all sample points lie on the
estimated regression line, indicating a perfect fit. However, whereas a
correlation coefficient measures only (linear) association, R²
measures the linear dependence of Y on X.
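The decomposition of the total variation and the resulting R² can be checked numerically. The following is a minimal sketch in plain Python, assuming a simple two-variable least-squares regression; the data values and the helper names `ols_fit` and `variation_decomposition` are hypothetical and chosen only for illustration. RSS follows the text's usage (regression sum of squares) and ESS is the error sum of squares.

```python
# Illustrative data only (hypothetical values, not taken from the text).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def variation_decomposition(x, y):
    """Return (TSS, RSS, ESS), with RSS = regression (explained) sum of
    squares and ESS = error (residual) sum of squares, as in the text."""
    intercept, slope = ols_fit(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [intercept + slope * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum((yh - y_bar) ** 2 for yh in y_hat)
    ess = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return tss, rss, ess


tss, rss, ess = variation_decomposition(X, Y)
r_squared = rss / tss

print(f"TSS = {tss:.4f}, RSS = {rss:.4f}, ESS = {ess:.4f}")
print(f"R^2 = RSS/TSS     = {r_squared:.4f}")
print(f"R^2 = 1 - ESS/TSS = {1 - ess / tss:.4f}")
```

Both ways of computing R² should agree up to floating-point rounding, and TSS should equal RSS + ESS. Note that the sketch divides by TSS, so it is not defined for data in which every Y value is identical.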
Example: When, for instance, we have estimated the value of R² and
found that R² = 0.9710, or 97.10%, we say that the
regression equation explains about 97% of the total variation in Y (e.g.
corn output). The remaining 3% is attributed to factors included in the
error term.
The correlation coefficient, r, measures the degree of association between two or more variables. In the two-variable case, the simple linear correlation coefficient for a set of sample observations is given by

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$

Its value varies from -1 to +1, i.e. $-1 \le r \le +1$. Here r < 0 means that X and Y move in opposite directions, as do, for
example, the quantity demanded of a commodity and its price;
r > 0 indicates that X and Y change in the same direction, as do
the quantity supplied of a commodity and its price. r = -1 refers to a
perfect negative correlation (i.e. all the sample observations lie on a
straight line of negative slope), while r = +1 refers to a perfect
positive correlation (i.e. all the sample observations lie on a straight
line of positive slope). In practice, r = ±1 is seldom found. The closer r is to
±1, the greater is the degree of positive or negative linear relationship.
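As a rough illustration of how the sample correlation coefficient behaves, the sketch below computes r from deviations about the sample means. The function name `pearson_r` and all data values are hypothetical; the price and quantity series are invented to mimic the demand and supply examples in the text.

```python
import math


def pearson_r(x, y):
    """Simple linear correlation coefficient of two equal-length samples."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: quantity demanded falls and quantity supplied rises
# as price increases.
price = [1.0, 2.0, 3.0, 4.0, 5.0]
quantity_demanded = [9.0, 7.5, 6.5, 4.0, 3.0]
quantity_supplied = [2.0, 3.5, 5.5, 6.0, 8.0]

r_demand = pearson_r(price, quantity_demanded)   # expected to be negative
r_supply = pearson_r(price, quantity_supplied)   # expected to be positive

# Points lying exactly on a straight line give perfect correlation.
r_perfect_negative = pearson_r(price, [10.0 - 2.0 * p for p in price])
r_perfect_positive = pearson_r(price, [1.0 + 2.0 * p for p in price])

print(f"demand: r = {r_demand:.4f}")
print(f"supply: r = {r_supply:.4f}")
print(f"exact line, negative slope: r = {r_perfect_negative:.4f}")
print(f"exact line, positive slope: r = {r_perfect_positive:.4f}")
```

The function is undefined when either sample has zero variance, since the denominator is then zero.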
It should be noted that the sign of r is always the same as that of
the estimated slope coefficient of the regression line. A zero correlation coefficient means that there exists no linear
relationship whatsoever between X and Y (i.e. they tend to change with no
linear connection with each other). For example, if the sample observations fall
exactly on a circle, there is a perfect non-linear relationship but a zero
linear relationship, and r = 0.

Regression analysis implies (but does not
prove) causality between the independent variable, X, and the dependent
variable, Y. Correlation analysis, however, implies no causality or
dependence but refers simply to the type and degree of association between
two variables. For example, X and Y may be highly correlated because of
another variable that strongly affects both. Thus, correlation analysis is
a much less powerful tool than regression analysis and is seldom used by
itself in the real world. In fact, the main use of correlation analysis is
to determine the degree of association found in regression analysis. This
is given by the coefficient of determination, which is the square of the
correlation coefficient.
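The relationship between r and R² in the two-variable case can be sketched numerically. The example below is illustrative only: the helper `simple_regression_summary` and the data are hypothetical, and the circle is approximated by twelve evenly spaced points, which is an assumption made here so that the sample is symmetric.

```python
import math


def simple_regression_summary(x, y):
    """Return (slope, r, R2) for the least-squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Explained variation, computed from the fitted values.
    explained = sum((intercept + slope * xi - y_bar) ** 2 for xi in x)
    r = sxy / math.sqrt(sxx * syy)
    return slope, r, explained / syy


# Hypothetical price and quantity-demanded data.
price = [1.0, 2.0, 3.0, 4.0, 5.0]
quantity = [9.0, 7.5, 6.5, 4.0, 3.0]
slope, r, r2 = simple_regression_summary(price, quantity)
print(f"slope = {slope:.4f}, r = {r:.4f}, r^2 = {r * r:.4f}, R^2 = {r2:.4f}")

# Evenly spaced points on a unit circle: a perfect non-linear relationship
# with (up to rounding error) no linear relationship.
n_points = 12
circle_x = [math.cos(2 * math.pi * k / n_points) for k in range(n_points)]
circle_y = [math.sin(2 * math.pi * k / n_points) for k in range(n_points)]
_, r_circle, r2_circle = simple_regression_summary(circle_x, circle_y)
print(f"circle: r = {r_circle:.6f}, R^2 = {r2_circle:.6f}")
```

In the first case r and the slope share the same (negative) sign and r squared matches R². In the circle case both r and R² come out as zero apart from floating-point rounding.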
Copyright © 2002 Evgenia Vogiatzi