Goodness of Fit (R²) and Correlation Coefficient (r)
The closer the observations fall to the regression line (i.e. the smaller the residuals), the greater is the variation in Y "explained" by the estimated regression equation. The total variation in Y is equal to the explained plus the residual variation:

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$

Total variation in Y (total sum of squares, TSS) = Explained variation in Y (regression sum of squares, RSS) + Residual variation in Y (error sum of squares, ESS)

$$TSS = RSS + ESS$$

Note that here RSS denotes the regression (explained) sum of squares and ESS the error (residual) sum of squares; some texts use the two abbreviations the other way round.

Dividing both sides by TSS gives

$$1 = \frac{RSS}{TSS} + \frac{ESS}{TSS}$$

The coefficient of determination, or R², is then defined as the proportion of the total variation in Y "explained" by the regression of Y on X:

$$R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}$$

R² can be calculated by

$$R^2 = \frac{\sum \hat{y}_i^2}{\sum y_i^2} = 1 - \frac{\sum e_i^2}{\sum y_i^2}$$

where $\hat{y}_i = \hat{Y}_i - \bar{Y}$, $y_i = Y_i - \bar{Y}$ and $e_i = Y_i - \hat{Y}_i$.

R² ranges in value from 0 (when the estimated regression equation explains none of the variation in Y) to 1 (when all points lie on the regression line). Thus, R² is unit-free and $0 \le R^2 \le 1$ because $0 \le ESS \le TSS$. R² = 0 when, for example, all sample points lie on a horizontal line or on a circle. R² = 1 when all sample points lie on the estimated regression line, indicating a perfect fit. Note, however, that whereas a correlation coefficient measures only (linear) association, R² measures linear dependence.

Example

Suppose we have estimated R² and found that R² = 0.9710, or 97.10%. We then say that the regression equation explains about 97% of the total variation in Y (e.g. corn output). The remaining 3% is attributed to factors included in the error term.

Correlation Coefficient (r)

The correlation coefficient, r, measures the degree of association between two or more variables. In the two-variable case, the simple linear correlation coefficient for a set of sample observations is given by

$$r = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}}$$

where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$. Its value varies from -1 to +1, i.e. $-1 \le r \le +1$.
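The quantities defined above can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the function names (`fit_line`, `variation`, `r_squared`, `correlation`) and the sample data are hypothetical and chosen only for illustration. It fits a two-variable least-squares line, splits the total variation into its explained and residual parts using this text's naming (RSS = regression, ESS = error), and computes R² and r from the formulas given.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of Y = b0 + b1*X; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def variation(xs, ys):
    """Return (TSS, RSS, ESS): total, regression (explained), error (residual)."""
    b0, b1 = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    fitted = [b0 + b1 * x for x in xs]
    tss = sum((y - y_bar) ** 2 for y in ys)
    rss = sum((f - y_bar) ** 2 for f in fitted)
    ess = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return tss, rss, ess


def r_squared(xs, ys):
    """Coefficient of determination, R^2 = 1 - ESS/TSS (requires TSS > 0)."""
    tss, _, ess = variation(xs, ys)
    return 1 - ess / tss


def correlation(xs, ys):
    """Simple linear correlation coefficient r for paired observations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

tss, rss, ess = variation(xs, ys)
print("TSS, RSS + ESS:", tss, rss + ess)   # the two should agree
print("R^2:", r_squared(xs, ys))
print("r:", correlation(xs, ys))

# Four hypothetical points on the unit circle: no linear relationship.
circle_x = [1, 0, -1, 0]
circle_y = [0, 1, 0, -1]
print("R^2 on circle:", r_squared(circle_x, circle_y))
```

On the sample data the printed TSS should match RSS + ESS up to rounding, and the circle points should give an R² of zero. The sketch divides by TSS, so it assumes Y is not constant across the sample.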
A value of r < 0 means that X and Y move in opposite directions, as do, for example, the quantity demanded of a commodity and its price. A value of r > 0 indicates that X and Y change in the same direction, as do the quantity supplied of a commodity and its price.

r = -1 refers to a perfect negative correlation (i.e. all the sample observations lie on a straight line of negative slope), while r = +1 refers to a perfect positive correlation (i.e. all the sample observations lie on a straight line of positive slope). A perfect correlation, r = ±1, is seldom found. The closer r is to ±1, the greater is the degree of positive or negative linear relationship. It should be noted that the sign of r is always the same as that of the estimated slope coefficient of the regression line.

A zero correlation coefficient means that there exists no linear relationship whatsoever between X and Y (i.e. they tend to change with no connection with each other). For example, if the sample observations fall exactly on a circle, there is a perfect non-linear relationship but a zero linear relationship, and r = 0.

Regression analysis implies (but does not prove) causality between the independent variable, X, and the dependent variable, Y. Correlation analysis, however, implies no causality or dependence but refers simply to the type and degree of association between two variables. For example, X and Y may be highly correlated because of another variable that strongly affects both. Thus, correlation analysis is a much less powerful tool than regression analysis and is seldom used by itself in the real world. In fact, the main use of correlation analysis is to determine the degree of association found in regression analysis. This is given by the coefficient of determination, which in the two-variable case is the square of the correlation coefficient (R² = r²).

Copyright © 2002 Evgenia Vogiatzi