Comparing Multinomial Proportions: Two Way Table

 

Problem

 

According to research reported in the Journal of the National Cancer Institute (Apr. 1991), eating foods high in fiber may help protect against breast cancer. The researchers randomly divided 120 laboratory rats into four groups of 30 each. All rats were injected with a drug that causes breast cancer; then each rat was fed a diet of fat and fiber for 15 weeks. However, the levels of fat and fiber varied from group to group. At the end of the feeding period, the number of rats with cancer tumors was determined for each group. The data are summarized in the contingency table.

 

Contingency Table

 

Cancer Tumors   High Fat / No Fiber   High Fat / Fiber   Low Fat / No Fiber   Low Fat / Fiber   Total
Yes                              27                 20                   19                14      80
No                                3                 10                   11                16      40
Totals                           30                 30                   30                30     120

 

Question

 

  1. Calculate the expected cell counts for the contingency table.
  2. Calculate the χ² statistic.
  3. Is there evidence to indicate that diet and presence/absence of cancer are not independent? Test using α = .05.
  4. Compare the percentage of rats on the high fat/no fiber diet with cancer to the percentage of rats on the high fat/fiber diet with cancer using a 95% confidence interval. Interpret the result.

 

 

Solution

 

The following SAS program generates the statistics needed for questions 1 to 3. The confidence interval for question 4 is worked out by hand in the answer.

 

*---Create a SAS data set from the cell counts of the contingency table;
data cancer;
  input tumor $ diet $ count;
  cards;
YES HF_NF 27
YES HF_F  20
YES LF_NF 19
YES LF_F  14
NO  HF_NF  3
NO  HF_F  10
NO  LF_NF 11
NO  LF_F  16
;
run;

*---Cross-tabulate tumor by diet, using COUNT as the cell frequency;
proc freq data=cancer;
  tables tumor*diet / expected chisq;
  weight count;
run;

 

Note:

  • The FREQ procedure generates a frequency (or contingency) table for the data.
  • The TABLES statement defines the two classification variables for the contingency table. Variable names are separated by ‘*’. The options EXPECTED and CHISQ request that the expected cell frequencies and the χ² statistic (together with related tests and measures of association) be printed.
  • The WEIGHT statement defines the weighting variable of the contingency table. It is needed here because each data line holds a cell count rather than a single rat; without it, PROC FREQ would count each data line as one observation.
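To make concrete what the EXPECTED and CHISQ options compute from the weighted cell counts, here is a minimal Python sketch. Python is an assumption (the original solution uses only SAS), and `chi_square_test` and `chi2_sf_df3` are hypothetical helpers, not SAS features. The p-value uses the closed-form chi-square upper tail, which is valid only for exactly 3 degrees of freedom.

```python
import math

# Observed cell counts from the contingency table.
# Rows: tumor YES, NO. Columns: HF_NF, HF_F, LF_NF, LF_F.
# Listing one count per cell plays the same role as the SAS WEIGHT statement.
OBSERVED = [
    [27, 20, 19, 14],  # YES
    [3, 10, 11, 16],   # NO
]


def chi_square_test(table):
    """Hypothetical helper: expected counts, Pearson X^2, likelihood-ratio G^2, df."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    cells = [(o, e) for o_row, e_row in zip(table, expected)
             for o, e in zip(o_row, e_row)]
    x2 = sum((o - e) ** 2 / e for o, e in cells)
    # Cells with a zero observed count contribute nothing to G^2
    g2 = 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, x2, g2, df


def chi2_sf_df3(x):
    """Hypothetical helper: P(X > x) for a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


expected, x2, g2, df = chi_square_test(OBSERVED)
print(expected)                    # [[20.0, 20.0, 20.0, 20.0], [10.0, 10.0, 10.0, 10.0]]
print(round(x2, 4), df)            # 12.9 3
print(round(g2, 4))                # 14.1827
print(round(chi2_sf_df3(x2), 4))   # 0.0049
```

The printed values should agree with the Expected lines and with the Chi-Square and Likelihood Ratio Chi-Square rows of the SAS output.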

 

The output from the above SAS program is shown below:

 

 

                         The FREQ Procedure

                       Table of tumor by diet

       tumor     diet

       Frequency|
       Expected |
       Percent  |
       Row Pct  |
       Col Pct  |HF_F    |HF_NF   |LF_F    |LF_NF   |  Total
       ---------+--------+--------+--------+--------+
       NO       |     10 |      3 |     16 |     11 |     40
                |     10 |     10 |     10 |     10 |
                |   8.33 |   2.50 |  13.33 |   9.17 |  33.33
                |  25.00 |   7.50 |  40.00 |  27.50 |
                |  33.33 |  10.00 |  53.33 |  36.67 |
       ---------+--------+--------+--------+--------+
       YES      |     20 |     27 |     14 |     19 |     80
                |     20 |     20 |     20 |     20 |
                |  16.67 |  22.50 |  11.67 |  15.83 |  66.67
                |  25.00 |  33.75 |  17.50 |  23.75 |
                |  66.67 |  90.00 |  46.67 |  63.33 |
       ---------+--------+--------+--------+--------+
       Total          30       30       30       30      120
                   25.00    25.00    25.00    25.00   100.00


               Statistics for Table of tumor by diet

       Statistic                     DF       Value      Prob
       ------------------------------------------------------
       Chi-Square                     3     12.9000    0.0049
       Likelihood Ratio Chi-Square    3     14.1827    0.0027
       Mantel-Haenszel Chi-Square     1      1.9040    0.1676
       Phi Coefficient                       0.3279
       Contingency Coefficient               0.3116
       Cramer's V                            0.3279

                         Sample Size = 120
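As a check on the Chi-Square and Phi Coefficient rows, both can be reproduced by hand. Here n_ij denotes an observed count and Ê_ij its expected count (notation introduced only for this check). Every YES cell has expected count 20 and every NO cell 10, and the terms are listed in the column order HF_NF, HF_F, LF_NF, LF_F:

```latex
\begin{aligned}
\chi^2 &= \sum_{i}\sum_{j} \frac{(n_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}} \\
&= \frac{(27-20)^2}{20} + \frac{(20-20)^2}{20} + \frac{(19-20)^2}{20} + \frac{(14-20)^2}{20} \\
&\quad + \frac{(3-10)^2}{10} + \frac{(10-10)^2}{10} + \frac{(11-10)^2}{10} + \frac{(16-10)^2}{10} \\
&= 2.45 + 0 + 0.05 + 1.80 + 4.90 + 0 + 0.10 + 3.60 = 12.90 \\
\phi &= \sqrt{\chi^2 / n} = \sqrt{12.9 / 120} \approx 0.3279
\end{aligned}
```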

 

Answer:

 

  1. The expected cell counts appear on the second line (Expected) of each cell in the SAS output:

 

Cancer Tumors   High Fat / No Fiber   High Fat / Fiber   Low Fat / No Fiber   Low Fat / Fiber
Yes                              20                 20                   20                20
No                               10                 10                   10                10
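Each expected count is the row total times the column total, divided by the sample size. This is the count we would expect in a cell if diet and tumor status were independent. With R_i the row total, C_j the column total, and n = 120 (symbols introduced here for the derivation):

```latex
\hat{E}_{ij} = \frac{R_i \, C_j}{n}, \qquad
\hat{E}_{\text{Yes},\,j} = \frac{80 \times 30}{120} = 20, \qquad
\hat{E}_{\text{No},\,j} = \frac{40 \times 30}{120} = 10
```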

 

  2. The Chi-Square row of the Statistics table in the SAS output gives the χ² statistic: χ² = 12.9000.
  3. We test H0: diet and presence/absence of cancer are independent, against Ha: they are dependent. The rejection region for the test is χ² > χ²(0.05). For a contingency table analysis, the degrees of freedom are (r-1)(c-1), where r is the number of rows and c is the number of columns in the table. For these data, r = 2 and c = 4, so df = (r-1)(c-1) = (2-1)(4-1) = 3, which matches the DF column of the Chi-Square row in the SAS output. All expected counts (10 and 20) are at least 5, so the χ² approximation is reasonable. With 3 df, χ²(0.05) = 7.815, and χ² = 12.9 exceeds it; equivalently, the p-value in the SAS output, 0.0049, is less than α = 0.05. We therefore reject H0 and conclude that there is evidence that diet and presence/absence of cancer are dependent.
  4. We have

       p1 (the proportion of rats on the high fat/no fiber diet with cancer) = 27/30 = 0.9

       p2 (the proportion of rats on the high fat/fiber diet with cancer) = 20/30 = 0.667

       with sample sizes n1 = n2 = 30. Hence the 95% confidence interval for p1 - p2 is

       (p1 - p2) ± 1.96*sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)

           = (0.9 - 0.667) ± 1.96*sqrt((0.9*0.1)/30 + (0.667*0.333)/30)

           = 0.233 ± 0.200

           = (0.033, 0.433)
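The confidence interval arithmetic can be checked with a short Python sketch. Python is an assumption (the original computes this interval by hand), and `diff_proportions_ci` is a hypothetical helper implementing the same large-sample formula, without rounding p2 to 0.667 first.

```python
import math


def diff_proportions_ci(x1, n1, x2, n2, z=1.96):
    """Hypothetical helper: large-sample (Wald) confidence interval for p1 - p2.

    z = 1.96 gives a 95% interval.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Standard error of p1 - p2 for two independent samples
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Group 1: high fat / no fiber (27 of 30 rats with tumors)
# Group 2: high fat / fiber    (20 of 30 rats with tumors)
low, high = diff_proportions_ci(27, 30, 20, 30)
print(f"({low:.3f}, {high:.3f})")  # (0.033, 0.433)
```

The interval lies entirely above 0. Only 3 rats in the high fat/no fiber group were tumor-free, so the normal approximation behind this interval is rough and the endpoints are best read as approximate.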