Describing Quantitative Data (2)

 

Problem

 

An automated system for marking large numbers of student computer programs, called AUTOMARK, has been used successfully at McMaster University in Ontario, Canada. AUTOMARK takes into account both program correctness and program style when marking student assignments. AUTOMARK was used to grade the FORTRAN77 assignments of a class of 33 students. To evaluate the effectiveness of the automated system, these grades were compared to the grades assigned by the instructor. The results are shown in the table.

 

AUTOMARK   INSTRUCTOR   AUTOMARK   INSTRUCTOR   AUTOMARK   INSTRUCTOR
GRADE x    GRADE y      GRADE x    GRADE y      GRADE x    GRADE y
12.2       10           18.2       15           19.3       17
10.6       11           15.1       16           19.5       17
15.1       12           17.2       16           19.7       17
16.2       12           17.5       16           18.6       18
16.6       12           18.6       16           19.0       18
16.6       13           18.8       16           19.2       18
17.2       14           17.8       17           19.4       18
17.6       14           18.0       17           19.6       18
18.2       14           18.2       17           20.1       18
16.5       15           18.4       17           19.2       19
17.2       15           18.6       17           19.3       17
12.2       10           19.0       17           19.5       17

 

Question

 

  1. Construct a scattergram for the data. After examining the scattergram, do you think that x and y are correlated?
  2. Find the correlation coefficient r and interpret its value.
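For reference, the Pearson correlation coefficient r that PROC CORR reports below can be written in the usual textbook notation. The labels SS_xx, SS_yy and SS_xy are introduced here for convenience and are not SAS output names. SS_xx and SS_yy are what PROC UNIVARIATE prints as "Corrected SS".

```latex
r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}, \qquad
SS_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}, \quad
SS_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \quad
SS_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}
```

The value of r always lies between -1 and +1. Values near +1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 little or no linear relationship.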

 

 

Answer Using SAS

 

 

*--- SAS program: DESCRIBING_QUANTITATIVE_DATA_2.SAS ;

 

options nodate pageno=1;

 

*---Create SAS data set;
*---The double trailing @@ holds each data line, so several
    (automark_grade, instructor_grade) pairs are read from one line;
data automark;
  input automark_grade instructor_grade @@;
  cards;
12.2  10    18.2  15    19.3  17
10.6  11    15.1  16    19.5  17
15.1  12    17.2  16    19.7  17
16.2  12    17.5  16    18.6  18
16.6  12    18.6  16    19.0  18
16.6  13    18.8  16    19.2  18
17.2  14    17.8  17    19.4  18
17.6  14    18.0  17    19.6  18
18.2  14    18.2  17    20.1  18
16.5  15    18.4  17    19.2  19
17.2  15    18.6  17    19.3  17
12.2  10    19.0  17    19.5  17
;
run;

 

*---Run PROC UNIVARIATE on automark_grade and instructor_grade (one report per variable);
proc univariate data=automark;
  title 'Univariate Descriptive Statistics on automark_grade and instructor_grade';
  var automark_grade instructor_grade;
run;
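If only the basic summaries are needed, PROC MEANS gives a much shorter table than PROC UNIVARIATE. The sketch below assumes the automark data set created above. The statistic keywords (N, MEAN, STD, MIN, Q1, MEDIAN, Q3, MAX) and MAXDEC= are standard PROC MEANS options. The data lines contain 36 (x, y) pairs, so this and every later procedure reports N = 36, although the problem statement refers to a class of 33 students.

```sas
*---Compact summary of both grades (alternative to PROC UNIVARIATE);
*---MAXDEC=2 rounds the printed statistics to two decimals;
proc means data=automark n mean std min q1 median q3 max maxdec=2;
  title 'Summary Statistics for automark_grade and instructor_grade';
  var automark_grade instructor_grade;
run;
```

PROC MEANS uses the same default quantile definition (5) as PROC UNIVARIATE, so its Q1, median and Q3 should agree with the Quantiles tables below.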

 

SAS Output (descriptive statistics for the variables of interest)

 

     Univariate Descriptive Statistics on automark_grade and instructor_grade

 

                             The UNIVARIATE Procedure

                            Variable:  automark_grade

 

                                     Moments

 

         N                          36    Sum Weights                 36

         Mean               17.6111111    Sum Observations           634

         Std Deviation      2.21537198    Variance            4.90787302

         Skewness           -1.7230407    Kurtosis            2.87688939

         Uncorrected SS       11337.22    Corrected SS        171.775556

         Coeff Variation    12.5793993    Std Error Mean      0.36922866

 

 

                            Basic Statistical Measures

 

                  Location                    Variability

 

              Mean     17.61111     Std Deviation            2.21537

              Median   18.20000     Variance                 4.90787

              Mode     17.20000     Range                    9.50000

                                    Interquartile Range      2.30000

 

      NOTE: The mode displayed is the smallest of 3 modes with a count of 3.

 

 

                            Tests for Location: Mu0=0

 

                 Test           -Statistic-    -----p Value------

 

                 Student's t    t  47.69703    Pr > |t|    <.0001

                 Sign           M        18    Pr >= |M|   <.0001

                 Signed Rank    S       333    Pr >= |S|   <.0001

 

 

                             Quantiles (Definition 5)

 

                              Quantile      Estimate

 

                              100% Max          20.1

                              99%               20.1

                              95%               19.7

                              90%               19.5

                              75% Q3            19.2

                              50% Median        18.2

 

 

 

 

   Univariate Descriptive Statistics on automark_grade and instructor_grade

 

                           The UNIVARIATE Procedure

                         Variable:  instructor_grade

 

                                   Moments

 

       N                          36    Sum Weights                 36

       Mean               15.5833333    Sum Observations           561

       Std Deviation      2.43046145    Variance            5.90714286

       Skewness           -0.9607591    Kurtosis            -0.0398179

       Uncorrected SS           8949    Corrected SS            206.75

       Coeff Variation    15.5965441    Std Error Mean      0.40507691

 

 

                          Basic Statistical Measures

 

                Location                    Variability

 

            Mean     15.58333     Std Deviation            2.43046

            Median   16.50000     Variance                 5.90714

            Mode     17.00000     Range                    9.00000

                                  Interquartile Range      3.00000

 

 

                          Tests for Location: Mu0=0

 

               Test           -Statistic-    -----p Value------

 

               Student's t    t  38.47006    Pr > |t|    <.0001

               Sign           M        18    Pr >= |M|   <.0001

               Signed Rank    S       333    Pr >= |S|   <.0001

 

 

                           Quantiles (Definition 5)

 

                            Quantile      Estimate

 

                            100% Max          19.0

                            99%               19.0

                            95%               18.0

                            90%               18.0

                            75% Q3            17.0

                            50% Median        16.5

                            25% Q1            14.0

                            10%               12.0

                            5%                10.0

                            1%                10.0

                            0% Min            10.0

 

 

                        Extreme Observations

 

                ----Lowest----        ----Highest---

 

                Value      Obs        Value      Obs

 

                   10       34           18       18

                   10        1           18       21

                   11        4           18       24

                   12       13           18       27

                   12       10           19       30
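The entries in the Moments tables are linked by simple formulas, which gives a quick hand check of the output. The worked check below uses only numbers printed above for automark_grade (n = 36, Sum Observations = 634, Corrected SS = 171.775556), plus the Corrected SS of instructor_grade (206.75). Here μ0 = 0 is the "Mu0=0" of the Tests for Location.

```latex
\begin{aligned}
\bar{x} &= \frac{634}{36} = 17.6111 \\
s_x^2 &= \frac{\text{Corrected SS}}{n-1} = \frac{171.775556}{35} = 4.90787,
  \qquad s_x = \sqrt{4.90787} = 2.21537 \\
\text{Std Error Mean} &= \frac{s_x}{\sqrt{n}} = \frac{2.21537}{6} = 0.36923 \\
\text{Coeff Variation} &= 100 \cdot \frac{s_x}{\bar{x}} = 100 \cdot \frac{2.21537}{17.6111} = 12.579 \\
t &= \frac{\bar{x} - \mu_0}{s_x/\sqrt{n}} = \frac{17.6111 - 0}{0.36923} = 47.70 \\
s_y^2 &= \frac{206.75}{35} = 5.90714, \qquad s_y = \sqrt{5.90714} = 2.43046
\end{aligned}
```

The Tests for Location compare each mean with 0. That is not a meaningful hypothesis for grades, so their tiny p-values can be ignored for this problem.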

 

 

 

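*---Construct a scattergram using PROC PLOT;
*---PLOT requests are written vertical*horizontal, so automark_grade is
    on the vertical axis and instructor_grade on the horizontal axis;
proc plot data=automark;
  title 'Scattergram';
  plot automark_grade*instructor_grade;
quit;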
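The PROC PLOT request puts automark_grade on the vertical axis. To follow the problem's convention instead, with x = AUTOMARK grade on the horizontal axis and y = instructor grade on the vertical axis, a high-resolution scattergram can be drawn with PROC SGPLOT in more recent SAS releases. This is a sketch that assumes the automark data set above and ODS Graphics output enabled in your session. The axis labels are illustrative.

```sas
*---High-resolution scattergram: x = AUTOMARK grade (horizontal),
    y = instructor grade (vertical);
proc sgplot data=automark;
  title 'Scattergram of Instructor Grade versus AUTOMARK Grade';
  scatter x=automark_grade y=instructor_grade;
  xaxis label='AUTOMARK grade (x)';
  yaxis label='Instructor grade (y)';
run;
```

PROC PLOT labels coincident points with letters such as B and D. SGPLOT does not: it draws them on top of one another, so a repeated pair such as (12.2, 10) shows up as a single marker.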

 

 

SAS Output (the points rise from lower left to upper right, so x and y appear to be positively correlated)

 

                                                                  Scattergram

 

                                  Plot of automark_grade*instructor_grade.  Legend: A = 1 obs, B = 2 obs, etc.

 

            20 +                                                                                                                A
               |                                                                                                  A             A
               |                                                                                                  D             A
               |                                                                                                  A             B             A
               |                                                                                    A
               |                                                                                    A             A             A
               |                                                        A             A                           B
            18 +                                                                                                  A
               |                                                        A                                         A
               |                                                                                    A
automark_grade |                                                        A             A             A
               |
               |                            A             A                           A
               |                            A
            16 +
               |
               |
               |                            A                                                       A
               |
               |
               |
            14 +
               |
               |
               |
               |
               |
               |B
            12 +
               |
               |
               |
               |
               |              A
               |
            10 +
               -+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+

               10            11            12            13            14            15            16            17            18            19

 

                                                                        instructor_grade

 

 

 

*---Run PROC CORR to examine the correlation between automark_grade and instructor_grade;
proc corr data=automark;
  title 'Correlation between automark_grade and instructor_grade';
  var automark_grade instructor_grade;
run;
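To see where the PROC CORR value comes from, the same coefficient can be computed directly from the sums in the formula for r. The DATA step below is an illustrative sketch. It assumes the automark data set created above. The output data set name corr_by_hand and the sum_ and ss_ variable names are hypothetical, chosen for this example only. It writes the corrected sums and r to the SAS log and to that one-row data set.

```sas
*---Illustrative check: compute Pearson r from running sums;
*---(corr_by_hand and the sum_ and ss_ variable names are hypothetical);
data corr_by_hand(keep=n ss_xx ss_yy ss_xy r);
  set automark end=last;
  *---Sum statements retain their totals across observations;
  n + 1;
  sum_x  + automark_grade;
  sum_y  + instructor_grade;
  sum_xx + automark_grade**2;
  sum_yy + instructor_grade**2;
  sum_xy + automark_grade*instructor_grade;
  *---After the last observation, form the corrected sums and r;
  if last then do;
    ss_xx = sum_xx - sum_x**2 / n;
    ss_yy = sum_yy - sum_y**2 / n;
    ss_xy = sum_xy - sum_x*sum_y / n;
    r = ss_xy / sqrt(ss_xx*ss_yy);
    put n= ss_xx= ss_yy= ss_xy= r=;
    output;
  end;
run;
```

The log line should show n=36, ss_xx and ss_yy equal to the Corrected SS values from PROC UNIVARIATE, and r agreeing with the 0.86051 reported by PROC CORR.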

 

 

SAS Output (correlation coefficient r = 0.86051, a strong positive linear relationship; p < .0001)

 

                            Correlation between automark_grade and instructor_grade

 

                                               The CORR Procedure

 

                                2  Variables:    automark_grade   instructor_grade

 

 

                                               Simple Statistics

 

       Variable                   N          Mean       Std Dev           Sum       Minimum       Maximum

 

       automark_grade            36      17.61111       2.21537     634.00000      10.60000      20.10000

       instructor_grade          36      15.58333       2.43046     561.00000      10.00000      19.00000

 

 

                                   Pearson Correlation Coefficients, N = 36

                                           Prob > |r| under H0: Rho=0

 

                                                      automark_      instructor_

                                                          grade            grade

 

                                automark_grade          1.00000          0.86051

                                                                          <.0001

 

                                instructor_grade        0.86051          1.00000

                                                         <.0001
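The PROC CORR value can be reproduced by hand. SS_xx = 171.775556 and SS_yy = 206.75 are the Corrected SS values from the UNIVARIATE output. SS_xy is the corrected sum of cross-products, computed from the data: the products x_i y_i of the 36 pairs sum to 10042. The coefficient of determination r² and the t statistic behind the printed p-value (testing H0: ρ = 0 with n - 2 = 34 degrees of freedom) follow directly.

```latex
\begin{aligned}
SS_{xy} &= \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}
        = 10042 - \frac{(634)(561)}{36} = 162.1667 \\
r &= \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}
   = \frac{162.1667}{\sqrt{(171.775556)(206.75)}}
   = \frac{162.1667}{188.4532} = 0.86051 \\
r^2 &= (0.86051)^2 = 0.7405 \\
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
   = \frac{0.86051\sqrt{34}}{\sqrt{1-0.7405}} \approx 9.85
\end{aligned}
```

Interpretation: r = 0.86 is close to +1, so there is a strong positive linear relationship between the two gradings. Students who receive higher AUTOMARK grades tend to receive higher instructor grades. About 74% of the sample variation in instructor grades is explained by the linear relationship with AUTOMARK grades. With p < .0001, the hypothesis ρ = 0 is rejected at any usual significance level.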