Tests to Compare k Treatment and b Block Means for a Randomized Block Design
Problem
A supermarket advertisement in the
| Grocery Item | Albertson’s | Kash’n Karry | Publix | Food 4 Less |
| Cheerios cereal | 1.10 | 1.18 | 1.39 | 1.18 |
| Jell-O gelatin | 0.24 | 0.24 | 0.31 | 0.26 |
| Dial soap | 0.52 | 0.60 | 0.63 | 0.55 |
| Crisco oil | 1.26 | 1.70 | 2.27 | 1.29 |
| Kleenex | 0.67 | 0.70 | 0.79 | 0.70 |
| Star-Kist tuna | 0.63 | 0.66 | 0.79 | 0.63 |
| Del Monte peas | 0.43 | 0.47 | 0.65 | 0.47 |
Solution
We need to conduct an analysis of variance for a randomized block
design. The columns of the table above correspond to the k = 4 treatments (supermarkets) and the rows correspond to the b = 7 blocks (grocery items), each
consisting of 4 observations. The observations within a block are matched because
all prices within a block are for the same item on
the same day. (A randomized block design is necessary to ensure that the same
items are compared at the four supermarkets.)
Since the supermarkets represent the treatments, we want to test
H0: μ1 = μ2 = μ3 = μ4
Ha: At least two of the treatment means differ
where μ1 = mean price charged at Albertson’s, μ2 = mean price at Kash’n Karry,
μ3 = mean price at Publix, and μ4 = mean price at Food 4 Less.
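The hypotheses above can be read against the usual additive model for a randomized block design. The source does not write the model out, so the symbols τ_i (supermarket effect), β_j (item effect), and ε_ij (random error) below are assumed notation, offered as a sketch rather than the author's own formulation:

```latex
y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij},
\qquad i = 1,\dots,4 \ (\text{supermarkets}), \quad j = 1,\dots,7 \ (\text{grocery items})
```

In this notation μ_i = μ + τ_i, so H0: μ1 = μ2 = μ3 = μ4 says that the four supermarket effects τ_i are all equal.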
The SAS program and output are shown below.
The test statistic, F = MST/MSE, is found by substituting the values
MST = 0.1117 and MSE = 0.0246 obtained from the SAS output:
F = MST/MSE = 0.1117/0.0246 = 4.54
The F statistic has numerator degrees of freedom (k-1) = 3 (df for MST) and denominator
degrees of freedom (n-b-k+1) = 28-7-4+1 = 18 (df for MSE), where n = 28 is the total number of observations. The
tabulated value of F0.05 with 3 and 18 df is 3.16. Therefore, we reject H0 if the
calculated value of F exceeds 3.16. Since the computed value of the test
statistic, F = 4.54, exceeds 3.16, we have sufficient
evidence to reject H0 at α = 0.05.
There appear to be significant differences among the mean prices of grocery
items at the four supermarkets.
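The table lookup above can also be done in SAS itself. The following is a minimal sketch, not part of the original program: it assumes the mean squares are typed in by hand from the ANOVA table, and the variable names f_obs, f_crit, and p_val are illustrative. FINV returns an F quantile and PROBF returns the F cumulative probability.

```sas
* Sketch: critical value and p-value for the treatment F test;
data _null_;
  f_obs  = 0.11172738 / 0.02463016;   /* MST / MSE, copied from the ANOVA table */
  f_crit = finv(0.95, 3, 18);         /* upper 5 percent point of F with 3 and 18 df */
  p_val  = 1 - probf(f_obs, 3, 18);   /* right-tail probability of the observed F */
  put f_obs= 8.2 f_crit= 8.2 p_val= 8.4;   /* written to the SAS log */
run;
```

The values written to the log should agree with the F value of 4.54, the tabulated 3.16, and the Pr > F of 0.0155 reported in the SAS output.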
The F statistic for testing the block means is F = MSB/MSE, where H0 is that
the b = 7 block (item) means are equal. Substituting the
values of MSB and MSE found in the SAS output, we have
F = MSB/MSE = 0.8718/0.02463 = 35.40
The F statistic has numerator degrees of freedom (b-1) = 6, and
the denominator degrees of freedom are those associated with MSE, namely 18. Therefore, the
rejection region for the test is
Reject H0 if F > F0.05 = 2.66
Since the F value of 35.40 falls well within the rejection region,
there is sufficient evidence at α =
0.05 to conclude that the block (item) means differ. It appears that blocking
was effective in removing the item-to-item variation in prices.
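The mean squares used in both F tests come from splitting the total sum of squares into treatment, block, and error parts. As a worked check using the values in the SAS output (SST, SSB, and SSE are assumed shorthand here for the market, item, and error sums of squares, rounded to five decimals):

```latex
\begin{aligned}
\text{SS(Total)} &= \text{SST} + \text{SSB} + \text{SSE}
                  = 0.33518 + 5.23109 + 0.44334 = 6.00961,\\
\text{df:}\quad 27 &= 3 + 6 + 18,\\
\text{MST} &= \frac{0.33518}{3} = 0.11173,\qquad
\text{MSB} = \frac{5.23109}{6} = 0.87185,\qquad
\text{MSE} = \frac{0.44334}{18} = 0.02463.
\end{aligned}
```

Because the item sum of squares accounts for most of the total, leaving it in the error term would have inflated MSE considerably, which is the sense in which blocking was effective.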
SAS program: Randomized_Block.SAS
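options pageno=1;
*---Read the data into SAS;
data grocery;
  input @1 item $1-15 @;   /* item name in columns 1-15; hold the record */
  /* market gets length 11 from the first value in the list, so the */
  /* second market name is stored truncated, as the output shows    */
  do market="ALBERTSON'S","KASH'N KARRY","PUBLIX","FOOD 4 LESS";
    input price @;         /* next price on the same record */
    output;                /* one observation per item and market */
  end;
cards;
Cheerios_Cereal 1.1  1.18 1.39 1.18
Jell-O_geletin  .24  .24  .31  .26
Dial_soap       .52  .6   .63  .55
Crisco_oil      1.26 1.7  2.27 1.29
Kleenex         .67  .7   .79  .7
Star-Kist_tuna  .63  .66  .79  .63
Del_Monte_peas  .43  .47  .65  .47
;
run;
proc print data=grocery;
  title2 "Supermarket Survey Results";
run;
proc anova data=grocery;
  title2 "Analysis of Variance";
  class market item;
  model price=market item;   /* treatments (market) plus blocks (item) */
  means market/bon;          /* Bonferroni comparisons of market means */
run;
quit;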
Notes
SAS Output
Supermarket Survey Results
Obs   item              market         price
  1   Cheerios_Cereal   ALBERTSON'S    1.10
  2   Cheerios_Cereal   KASH'N KARR    1.18
  3   Cheerios_Cereal   PUBLIX         1.39
  4   Cheerios_Cereal   FOOD 4 LESS    1.18
  5   Jell-O_geletin    ALBERTSON'S    0.24
  6   Jell-O_geletin    KASH'N KARR    0.24
  7   Jell-O_geletin    PUBLIX         0.31
  8   Jell-O_geletin    FOOD 4 LESS    0.26
  9   Dial_soap         ALBERTSON'S    0.52
 10   Dial_soap         KASH'N KARR    0.60
 11   Dial_soap         PUBLIX         0.63
 12   Dial_soap         FOOD 4 LESS    0.55
 13   Crisco_oil        ALBERTSON'S    1.26
 14   Crisco_oil        KASH'N KARR    1.70
 15   Crisco_oil        PUBLIX         2.27
 16   Crisco_oil        FOOD 4 LESS    1.29
 17   Kleenex           ALBERTSON'S    0.67
 18   Kleenex           KASH'N KARR    0.70
 19   Kleenex           PUBLIX         0.79
 20   Kleenex           FOOD 4 LESS    0.70
 21   Star-Kist_tuna    ALBERTSON'S    0.63
 22   Star-Kist_tuna    KASH'N KARR    0.66
 23   Star-Kist_tuna    PUBLIX         0.79
 24   Star-Kist_tuna    FOOD 4 LESS    0.63
 25   Del_Monte_peas    ALBERTSON'S    0.43
 26   Del_Monte_peas    KASH'N KARR    0.47
 27   Del_Monte_peas    PUBLIX         0.65
 28   Del_Monte_peas    FOOD 4 LESS    0.47
Analysis of Variance
The ANOVA Procedure
Class Level Information
Class    Levels   Values
market        4   ALBERTSON'S FOOD 4 LESS KASH'N KARR PUBLIX
item          7   Cheerios_Cereal Crisco_oil Del_Monte_peas Dial_soap Jell-O_geletin Kleenex Star-Kist_tuna
Number of observations   28
Dependent Variable: price

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               9       5.56626786    0.61847421     25.11   <.0001
Error              18       0.44334286    0.02463016
Corrected Total    27       6.00961071

R-Square   Coeff Var   Root MSE   price Mean
0.926228    19.69664   0.156940     0.796786

Source    DF     Anova SS   Mean Square   F Value   Pr > F
market     3   0.33518214    0.11172738      4.54   0.0155
item       6   5.23108571    0.87184762     35.40   <.0001
Bonferroni (Dunn) t Tests for price
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher
Type II error rate than REGWQ.
Alpha                              0.05
Error Degrees of Freedom             18
Error Mean Square               0.02463
Critical Value of t             2.96273
Minimum Significant Difference   0.2485
Means with the same letter are not significantly different.
Bon Grouping      Mean   N   market
     A         0.97571   7   PUBLIX
     A
B    A         0.79286   7   KASH'N KARR
B
B              0.72571   7   FOOD 4 LESS
B
B              0.69286   7   ALBERTSON'S
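The critical t value and minimum significant difference in the Bonferroni output can be reproduced by hand. The sketch below is not part of the original program: it assumes the usual Bonferroni rule of splitting α across all k(k-1)/2 pairwise comparisons, and the variable names are illustrative. TINV returns a t quantile.

```sas
* Sketch: Bonferroni critical t and minimum significant difference;
data _null_;
  alpha  = 0.05;
  k      = 4;                          /* number of supermarkets */
  b      = 7;                          /* prices behind each market mean */
  c      = k*(k-1)/2;                  /* 6 pairwise comparisons */
  mse    = 0.02463016;                 /* error mean square from the ANOVA table */
  t_crit = tinv(1 - alpha/(2*c), 18);  /* two-sided, 18 error df */
  msd    = t_crit * sqrt(2*mse/b);     /* smallest difference declared significant */
  put t_crit= 8.5 msd= 8.4;            /* written to the SAS log */
run;
```

The results should match the 2.96273 and 0.2485 shown above. Two market means are declared different only if they differ by more than this amount, which is why Publix (0.97571) is separated from Food 4 Less (0.72571) and Albertson's (0.69286) but not from Kash'n Karry (0.79286).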
Overview of PROC ANOVA
The ANOVA procedure performs analysis of variance (ANOVA) for
balanced data from a wide variety of experimental designs. In analysis of
variance, a continuous response variable, known as a dependent variable,
is measured under experimental conditions identified by
classification variables, known as independent variables. The variation
in the response is assumed to be due to effects in the classification, with
random error accounting for the remaining variation.
The ANOVA procedure is designed to
handle balanced data (that is,
data with equal numbers of observations for every combination of the
classification factors), whereas the GLM procedure can analyze both balanced
and unbalanced data. Because PROC ANOVA takes into account the special
structure of a balanced design, it is faster and uses less storage than PROC
GLM for balanced data.
Use PROC ANOVA for the analysis of
balanced data only, with the following exceptions: one-way
analysis of variance, Latin square designs, certain partially balanced
incomplete block designs, completely nested (hierarchical) designs, and designs
with cell frequencies that are proportional to each other and are also
proportional to the background population. These exceptions have designs in
which the factors are all orthogonal to each other. PROC ANOVA works for
designs with block diagonal X'X matrices where the elements of each block all have the same
value. The procedure partially tests this requirement by checking for equal
cell means. However, this test is imperfect: some designs that cannot be
analyzed correctly may pass the test, and designs that can be analyzed
correctly may not pass. If your design does not pass the test, PROC ANOVA
produces a warning message to tell you that the design is unbalanced and that
the ANOVA analyses may not be valid; if your design is not one of the special
cases described here, then you should use PROC GLM instead. Complete validation
of designs is not performed in PROC ANOVA since this would require the whole X'X
matrix; if you're unsure about the validity of PROC ANOVA for your design, you
should use PROC GLM.
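For comparison, the same randomized block analysis can be requested from PROC GLM. This is a minimal sketch that assumes the grocery data set created by the program earlier in this document; for balanced data such as these it should give the same F tests as PROC ANOVA, although PROC GLM labels its sums of squares as Type I and Type III rather than Anova SS.

```sas
* Sketch: the same model fitted with PROC GLM;
proc glm data=grocery;
  class market item;
  model price = market item;   /* treatments (market) plus blocks (item) */
  means market / bon;          /* Bonferroni comparisons of market means */
run;
quit;
```

If some prices were missing, so that the design was no longer balanced, this is the form to use.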
Caution: If you use PROC ANOVA for analysis
of unbalanced data, you must assume responsibility for the validity of the
results.