Erin E. Blankenship and Linda J. Young
Department of Biometry
University of Nebraska–Lincoln
Lincoln, NE 68583-0712
Statistics Teaching and Resource Library, July 25, 2001
© 2001 by Erin E. Blankenship and Linda J. Young, all
rights reserved. This text may be freely shared among
individuals, but it may not be republished in any medium without
express written consent from the authors and advance notification
of the editor.
This group activity illustrates the
concepts of size and power of a test through simulation.
Students simulate binomial data by repeatedly rolling a ten-sided
die, and they use their simulated data to estimate the size of a
binomial test. They carry out further simulations to
estimate the power of the test. After pooling their data
with that of other groups, they construct a power curve. A
theoretical power curve is also constructed, and the students
discuss why there are differences between the expected and
estimated curves.
Key words: Power, size, hypothesis testing, binomial distribution
Materials
Each group (2-4 students) will need
one ten-sided die, two tabulation sheets, and a binomial
probability table.
Objective
Carry out a simulation study to
estimate the size and power of a binomial test using data
simulated by rolling a ten-sided die.
Description of Activity
This activity is used
during the laboratory section of a graduate-level course on
introductory statistical methods and was developed because the
students were having trouble with the concepts of size and
power. To solidify these ideas in the context of a
hypothesis test for a binomial parameter, the students carry out a
simulation study based on Example 3.1, page 60, in Dowdy and
Wearden (1991). The null hypothesis in the example is
Ho:p=0.5 versus
the alternative Ha:p≠0.5,
and the students begin by performing a simulation study to
estimate the size of the test (i.e., assuming p=0.5). Each simulated experiment has
n=20 trials, and the experiment is repeated 25 times. The
simulation study is repeated to estimate the power of the test of
Ho:p =0.5 under
various alternative values of p. Again, there are n=20 trials in each of 25
simulated experiments. This time, however, the true value of
p is something other than
0.5.
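The rejection region and the exact size of the test can also be checked numerically rather than read from a binomial table. The sketch below is only an illustration: it assumes a nominal level of 0.05 and an equal-tailed rejection region, neither of which is stated in the description above, and the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def equal_tailed_region(n, p0, alpha):
    """Return (lower, upper): reject Ho when X <= lower or X >= upper.

    Each tail is allowed at most alpha/2 probability under p0 (an assumption).
    """
    half = alpha / 2
    lower, cum = -1, 0.0
    for k in range(n + 1):               # accumulate the lower tail
        cum += binom_pmf(k, n, p0)
        if cum > half:
            break
        lower = k
    upper, cum = n + 1, 0.0
    for k in range(n, -1, -1):           # accumulate the upper tail
        cum += binom_pmf(k, n, p0)
        if cum > half:
            break
        upper = k
    return lower, upper

def exact_size(n, p0, lower, upper):
    """Probability of rejecting Ho when p0 is the true value."""
    return sum(binom_pmf(k, n, p0)
               for k in range(n + 1) if k <= lower or k >= upper)

# n = 20 trials and p = 0.5 as in the activity; alpha = 0.05 is assumed.
lower, upper = equal_tailed_region(20, 0.5, 0.05)
size = exact_size(20, 0.5, lower, upper)
print(lower, upper, round(size, 4))
```

Because the binomial distribution is discrete, the attained size is generally somewhat below the nominal level, which gives the groups a concrete number to compare with their estimate.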
To simulate binomial
data, the students repeatedly roll a 10-sided die. At the
beginning of the lab period, the students arrange themselves into
groups of 2–4 members, and each group receives a 10-sided die and
a particular alternative true value of p. Depending on class size, it may be necessary
to rearrange the groups so
that there are as many groups as there are alternative
p values. The true values that work well
with the 10-sided die are p={0.1, 0.2, 0.3, 0.4, 0.6,
0.7, 0.8, 0.9}, which imply a total of 8 groups. On the
prototype activity, there is a blank left for the group’s
p value. This prototype activity is
identical to the version handed out during the lab session, and
the true p value is filled in by the
lab instructor as the directions are distributed to the
groups. All groups work with the hypothesized value,
p=0.5, in the study to estimate the size of the
test of Ho:p=0.5.
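The die-rolling procedure can be mimicked in software, for instance to preview the activity before class. This is a minimal sketch under stated assumptions: the die faces are numbered 1 to 10, a roll counts as a success when the face is at most 10p (one of several valid choices; in the activity the students decide this themselves), and the rejection region X <= 5 or X >= 15 assumes an equal-tailed test at a nominal level of 0.05. All function names are hypothetical.

```python
import random

def roll_experiment(p, n=20, rng=random):
    """Count successes in n rolls of a ten-sided die with faces 1..10.

    A roll is a success when the face is at most 10*p, so P(success) = p
    for p in {0.1, 0.2, ..., 0.9}.
    """
    cutoff = round(10 * p)
    return sum(1 for _ in range(n) if rng.randint(1, 10) <= cutoff)

def estimate_rejection_rate(p, experiments=25, n=20, lower=5, upper=15, rng=random):
    """Proportion of simulated experiments in which Ho: p = 0.5 is rejected.

    The default rejection region (X <= 5 or X >= 15) is an assumption.
    """
    rejections = 0
    for _ in range(experiments):
        x = roll_experiment(p, n, rng)
        if x <= lower or x >= upper:
            rejections += 1
    return rejections / experiments

rng = random.Random(2001)                             # fixed seed for repeatability
est_size = estimate_rejection_rate(0.5, rng=rng)      # estimated size
est_power = estimate_rejection_rate(0.2, rng=rng)     # estimated power at p = 0.2
print(est_size, est_power)
```

With true p = 0.5 the rejection rate estimates the size of the test; with any other true p it estimates the power at that alternative.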
The prototype activity includes
several questions for the group members to discuss during the
activity. For example, the group members decide which rolls
should constitute a success under their specified value of
p. Other discussion questions included in
the activity are more abstract. For example, the groups are
asked whether the estimated size of the test is satisfactorily
close to the theoretical α. It is often
instructive for the lab instructor to bring the groups together
after they have all completed the activity and discuss these types
of questions again. The groups often have different
perspectives. The activity also asks the groups to pool
their results with those of the other groups to construct an estimated
power curve, and to construct a theoretical power curve for comparison. It works well to
have the lab instructor, after reassembling the groups, construct
the power curves on the blackboard using the information supplied
by the groups. For the point at p=0.5 (the hypothesized value, used by every group), the lab
instructor may pick one group's estimate at random or may average all of the
groups' estimates. After the curves are constructed, questions
about the differences between the curves can be discussed by the
class as a whole. This is also a good time to discuss any
problems the groups encountered during the
activity.
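The theoretical power curve can be tabulated directly from the binomial distribution. The sketch below assumes the rejection region X <= 5 or X >= 15 for n = 20 (an equal-tailed test at a nominal level of 0.05, which is an assumption rather than something fixed by the activity). The helper `estimated_power` is hypothetical and simply converts a group's rejection count into a proportion.

```python
from math import comb

def power(p, n=20, lower=5, upper=15):
    """Probability of rejecting Ho: p = 0.5 when the true success probability is p.

    The rejection region (X <= lower or X >= upper) is an assumed one.
    """
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k <= lower or k >= upper)

def estimated_power(rejections, experiments=25):
    """Estimated power from one group's count of rejections."""
    return rejections / experiments

p_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
theoretical = {p: power(p) for p in p_values}

# Print the theoretical curve as a small table; each group's estimate
# can be written next to the matching p value for comparison.
for p in p_values:
    print(f"p = {p:.1f}   theoretical power = {theoretical[p]:.3f}")
```

The value at p = 0.5 is the size of the test, and the curve is symmetric about 0.5 because the assumed rejection region is symmetric.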
Assessment
This activity was used in lab
because students were having trouble understanding the concepts of
power and size; they saw them as abstract ideas. This
activity was designed to make those concepts more concrete.
Therefore, the test and homework questions previously used to
assess student understanding really did not change. The
anticipated changes were in the quality of the answers, and the
responses did improve. Below is a sample exam question to
test an understanding of power (and binomial hypothesis testing in
general):
The CDC reported that 6.7% of men
aged 45-54 have coronary heart disease (CHD). We want to
know if this rate also holds for men who are heavy coffee drinkers
(>100 cups per month). To investigate this, 25 heavy
coffee drinkers aged 45-54 are randomly selected, and a physician
determines the number out of the 25 that suffer from
CHD.
1. State the most reasonable null and alternative hypotheses to test. Rather than the 6.7% reported by the CDC, use 10% as the rate of CHD so that the binomial tables from the textbook may be used.
2. In the context of this example, describe Type I and Type II errors. Which type of error do you consider more serious? Explain.
3. If we want to test the hypotheses from (1) at the α=0.05 level, what is the rejection region?
4. Suppose 4 out of the 25 men have CHD. What is the conclusion?
5. Assume that the true proportion of heavy coffee drinking males aged 45-54 that have CHD is 15%. What is the power of the test under this alternative? What does this value mean?
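An instructor who wants to check answers to the computational parts of this question without a table could use something like the following sketch. It assumes an upper-tailed alternative, Ha: p > 0.10; the question leaves the choice of hypotheses to the student, and a two-sided alternative is also defensible and would give a different rejection region. The function and variable names are hypothetical.

```python
from math import comb

def upper_tail(c, n, p):
    """P(X >= c) for a binomial count X with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

n, p0, alpha = 25, 0.10, 0.05

# Smallest critical value c with P(X >= c | p0) <= alpha (upper-tailed test, assumed).
critical = next(c for c in range(n + 2) if upper_tail(c, n, p0) <= alpha)

attained_size = upper_tail(critical, n, p0)    # actual size of the test
power_at_15 = upper_tail(critical, n, 0.15)    # power when the true p is 0.15
reject_with_4 = 4 >= critical                  # decision when 4 of 25 have CHD

print(critical, round(attained_size, 4), round(power_at_15, 4), reject_with_4)
```

Under these assumptions the rejection region is X >= `critical`, and `power_at_15` is the probability that the test detects a true rate of 15%.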
Teacher notes
The alternative values of
p that work well with
this activity are p={0.1, 0.2, 0.3,
0.4, 0.6, 0.7, 0.8, 0.9} so that the ten-sided die can be used,
although other true values could be used with a different
randomizing device. Sample tabulation sheets for estimating
size (p =0.5) and
estimating power (for the alternative p=0.2) are included. Also included is an example of
a theoretical and estimated power curve plot. (This plot is
based on real data, and it looks almost too good!) Two dice
can be provided to each group to speed up the data collection. The
number of rolls and number of simulated experiments can be
changed. The Dowdy and Wearden (1991) text only gives
binomial tables for n=20 and n=25, but a more extensive binomial
table could be used to accommodate any number of rolls in an
experiment. The number of simulated experiments can be
changed to fit the amount of time allotted for the activity.
The larger the number of simulated experiments, the more closely
the estimated power curve will follow the theoretical curve.
We allot a two-hour laboratory session for this activity, but it
does not take the entire time. The activity could easily be
completed during a 75-minute class period, or during one and a
half 50-minute class periods. Of course, if discussion
becomes more involved, it will take
longer.
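The effect of the number of simulated experiments on the estimated curve can be quantified with the binomial standard error of a proportion. The sketch below is illustrative only: the true rejection probability of 0.5 is chosen because it gives the largest standard error, and the experiment counts other than 25 are hypothetical.

```python
from math import sqrt

def standard_error(true_power, experiments):
    """Standard error of an estimated rejection proportion.

    Each simulated experiment either rejects Ho or does not, so the number
    of rejections is binomial with `experiments` trials.
    """
    return sqrt(true_power * (1 - true_power) / experiments)

# 25 experiments as in the activity, then two larger (hypothetical) counts.
for m in (25, 100, 400):
    print(m, round(standard_error(0.5, m), 3))
```

Since the standard error shrinks with the square root of the number of experiments, quadrupling the number of experiments halves the typical gap between the estimated and theoretical curves.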
The ten-sided dice are available at
game stores. They are also available on the web at http://www.chessex.com/,
but we do not have experience ordering online from this
company.
Acknowledgements
This manuscript has been assigned
Journal Series No. 01-6, College of Agricultural Sciences and
Natural Resources, University of Nebraska.
References
Dowdy, S. and Wearden, S. (1991). Statistics for Research, Second Edition. New York: John Wiley & Sons.
Editor's note: Before 11-6-01, the "student's version" of an
activity was called the "prototype".