Blankenship and Young (2001)

Student's version

Simulating Size and Power Using a 10-Sided Die

Erin E. Blankenship and Linda J. Young
Department of Biometry
University of Nebraska–Lincoln
Lincoln, NE 68583-0712

Statistics Teaching and Resource Library, July 25, 2001

© 2001 by Erin E. Blankenship and Linda J. Young, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

This group activity illustrates the concepts of size and power of a test through simulation. Students simulate binomial data by repeatedly rolling a ten-sided die, and they use their simulated data to estimate the size of a binomial test. They carry out further simulations to estimate the power of the test. After pooling their data with that of other groups, they construct a power curve. A theoretical power curve is also constructed, and the students discuss why there are differences between the expected and estimated curves.

Key words: Power, size, hypothesis testing, binomial distribution

Materials

Each group (2-4 students) will need one ten-sided die, two tabulation sheets, and a binomial probability table.

Objective

Carry out a simulation study to estimate the size and power of a binomial test using data simulated by rolling a ten-sided die.

Description of Activity

This activity is used during the laboratory section of a graduate-level course on introductory statistical methods, and was developed because the students were having trouble with the concepts of size and power. To solidify these ideas in the context of a hypothesis test for a binomial parameter, the students carry out a simulation study based on Example 3.1, page 60, in Dowdy and Wearden (1991). The null hypothesis in the example is H_o:p=0.5 versus the alternative H_a:p≠0.5, and the students begin by performing a simulation study to estimate the size of the test (i.e., assuming p=0.5). Each simulated experiment has n=20 trials, and the experiment is repeated 25 times. The simulation study is repeated to estimate the power of the test of H_o:p=0.5 under various alternative values of p. Again, there are n=20 trials in each of 25 simulated experiments. This time, however, the true value of p is something other than 0.5.
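For readers who want to check the classroom results on a computer, the size study described above can be sketched in Python. This is only an illustration, not part of the original activity: the function names are hypothetical, a pseudorandom generator stands in for the die, and the rejection region is assumed to be the equal-tailed one (each tail holding at most α/2 = 0.025 under p=0.5), which the text above does not state explicitly.

```python
import random
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, k + 1))

def two_sided_cutoffs(n, p0, alpha):
    """Assumed equal-tailed region: reject when X <= lower or X >= upper.
    Each tail is the largest one with probability at most alpha / 2 under p0."""
    lower = -1
    while binom_cdf(lower + 1, n, p0) <= alpha / 2:
        lower += 1
    upper = n + 1
    while 1 - binom_cdf(upper - 2, n, p0) <= alpha / 2:
        upper -= 1
    return lower, upper

def simulate_experiment(n, p, rng):
    """Number of successes in n independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n))

def estimate_rejection_rate(n, p_true, lower, upper, experiments, rng):
    """Fraction of simulated experiments whose count falls in the rejection region."""
    rejections = 0
    for _ in range(experiments):
        x = simulate_experiment(n, p_true, rng)
        if x <= lower or x >= upper:
            rejections += 1
    return rejections / experiments

rng = random.Random(2001)  # arbitrary seed, chosen only for reproducibility
lower, upper = two_sided_cutoffs(20, 0.5, 0.05)
estimated_size = estimate_rejection_rate(20, 0.5, lower, upper, 25, rng)
exact_size = binom_cdf(lower, 20, 0.5) + (1 - binom_cdf(upper - 1, 20, 0.5))
print("rejection region: X <=", lower, "or X >=", upper)
print("estimated size from 25 experiments:", estimated_size)
print("exact size:", round(exact_size, 4))
```

Calling estimate_rejection_rate with a true p other than 0.5 gives an estimate of power instead of size, mirroring the second half of the activity.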

To simulate binomial data, the students repeatedly roll a 10-sided die. At the beginning of the lab period, the students arrange themselves into groups of 2–4 members, and each group receives a 10-sided die and a particular alternative true value of p. Depending on class size, it may be necessary to rearrange the groups so that there are as many groups as there are alternative p values. The true values that work well with the 10-sided die are p={0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9}, which imply a total of 8 groups. On the prototype activity, there is a blank left for the group’s p value. This prototype activity is identical to the version handed out during the lab session, and the true p value is filled in by the lab instructor as the directions are distributed to the groups. All groups work with the hypothesized value, p=0.5, in the study to estimate the size of the test of H_o:p=0.5.
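The link between the die and the binomial parameter can be made concrete with a short Python sketch. It assumes a die with faces numbered 1 through 10 (many physical ten-sided dice are numbered 0 through 9) and one possible success rule, namely that faces 1 through 10p count as successes. In the activity the students choose their own rule, so the function names and the rule here are illustrative only.

```python
import random

def roll_d10(rng):
    """One roll of a fair ten-sided die, assumed to have faces 1..10."""
    return rng.randint(1, 10)

def is_success(face, p):
    """Assumed rule: faces 1..10p are successes, so p must be a multiple of 0.1."""
    return face <= round(10 * p)

def simulate_experiment(p, n, rng):
    """Count successes in n die rolls for a group assigned the true value p."""
    return sum(is_success(roll_d10(rng), p) for _ in range(n))

rng = random.Random(7)  # arbitrary seed
for p in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
    print("p =", p, "successes in 20 rolls:", simulate_experiment(p, 20, rng))
```

Any rule that marks exactly 10p of the ten faces as successes gives the same success probability, which is why the groups are free to pick their own.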

The prototype activity includes several questions for the group members to discuss during the activity. For example, the group members decide which rolls should constitute a success under their specified value of p. Other discussion questions included in the activity are more abstract. For example, the groups are asked whether the estimated size of the test is satisfactorily close to the theoretical α. It is often instructive for the lab instructor to bring the groups together after they have all completed the activity and discuss these types of questions again. The groups often have different perspectives. The activity also asks the groups to pool their results with the other groups to construct a theoretical power curve and an estimated power curve. It works well to have the lab instructor, after reassembling the groups, construct the power curves on the blackboard using the information supplied by the groups. For the point at p=0.5 (the hypothesized value, used by each group) the lab instructor may pick one group's estimate at random or may average the estimates from all of the groups. After the curves are constructed, questions about the differences between the curves can be discussed by the class as a whole. This is also a good time to discuss any problems the groups encountered during the activity.
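The theoretical power curve can also be computed directly rather than read from a binomial table. The Python sketch below is a hedged illustration: it assumes n=20 and the rejection region X ≤ 5 or X ≥ 15, which is the equal-tailed region for α=0.05 under p=0.5 but is not stated in the text above, and the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def power(p, n=20, lower=5, upper=15):
    """Probability of rejecting H_o when the true success probability is p.
    The region (X <= lower or X >= upper) is an assumption of this sketch."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if k <= lower or k >= upper)

p_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
curve = [(p, power(p)) for p in p_values]
for p, value in curve:
    print(f"p = {p:.1f}   theoretical power = {value:.4f}")
```

The value at p=0.5 is the exact size of the test. Plotting these points alongside the pooled classroom estimates gives the comparison the groups are asked to discuss.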

Assessment

This activity was used in lab because students were having trouble understanding the concepts of power and size; they saw them as abstract ideas. This activity was designed to make those concepts more concrete. Therefore, the test and homework questions previously used to assess student understanding really did not change. The anticipated changes were in the quality of the answers, and the responses did improve. Below is a sample exam question to test an understanding of power (and binomial hypothesis testing in general):

The CDC reported that 6.7% of men aged 45-54 have coronary heart disease (CHD). We want to know if this rate also holds for men who are heavy coffee drinkers (>100 cups per month). To investigate this, 25 heavy coffee drinkers aged 45-54 are randomly selected, and a physician determines the number out of the 25 that suffer from CHD.

1. State the most reasonable null and alternative hypotheses to test. Rather than the 6.7% reported by the CDC, use 10% as the rate of CHD so that the binomial tables from the textbook may be used.
2. In the context of this example, describe Type I and Type II errors. Which type of error do you consider more serious? Explain.
3. If we want to test the hypotheses from (1) at the α=0.05 level, what is the rejection region?
4. Suppose 4 out of the 25 men have CHD. What is the conclusion?
5. Assume that the true proportion of heavy coffee drinking males aged 45-54 that have CHD is 15%. What is the power of the test under this alternative? What does this value mean?
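An instructor who wants to check answers to the rejection-region and power parts of this question without a table could use a Python sketch like the following. It assumes the student chose the one-sided alternative H_a:p>0.10, which the question deliberately leaves open, and it takes the rejection region to be the largest upper tail with probability at most 0.05 under p=0.10. The function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(c, n + 1))

def upper_cutoff(n, p0, alpha):
    """Smallest c such that P(X >= c | p0) <= alpha (assumed one-sided test)."""
    c = 0
    while upper_tail(c, n, p0) > alpha:
        c += 1
    return c

n, p0, alpha = 25, 0.10, 0.05
cutoff = upper_cutoff(n, p0, alpha)
print("reject H_o when X >=", cutoff)
print("attained size:", round(upper_tail(cutoff, n, p0), 4))
print("observed X = 4 leads to rejection:", 4 >= cutoff)
print("power when p = 0.15:", round(upper_tail(cutoff, n, 0.15), 4))
```

A student who argues for a different alternative would need a different region, so this is one possible working rather than an answer key.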

Teacher notes

The alternative values of p that work well with this activity are p={0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9} so that the ten-sided die can be used, although other true values could be used with a different randomizing device. Sample tabulation sheets for estimating size (p=0.5) and estimating power (for the alternative p=0.2) are included. Also included is an example of a theoretical and estimated power curve plot. (This plot is based on real data, and it looks almost too good!) Two dice can be provided to each group to speed up the data collection. The number of rolls and number of simulated experiments can be changed. The Dowdy and Wearden (1991) text only gives binomial tables for n=20 and n=25, but a more extensive binomial table could be used to accommodate any number of rolls in an experiment. The number of simulated experiments can be changed to fit the amount of time allotted for the activity. The larger the number of simulated experiments, the more closely the estimated power curve will follow the theoretical curve. We allot a two-hour laboratory session for this activity, but it does not take the entire time. The activity could easily be completed during a 75-minute class period, or during one and a half 50-minute class periods. Of course, if discussion becomes more involved, it will take longer.
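The remark that more simulated experiments bring the estimated curve closer to the theoretical one can be quantified with the usual standard error of a proportion. The Python sketch below is an illustration under the assumption that the simulated experiments are independent; the power value 0.5 is a hypothetical worst case chosen for the example, not a figure from the activity.

```python
from math import sqrt

def standard_error(true_power, experiments):
    """Standard error of an estimated rejection rate based on independent experiments."""
    return sqrt(true_power * (1 - true_power) / experiments)

# 0.5 is the value at which the standard error is largest.
for experiments in [25, 100, 400]:
    se = standard_error(0.5, experiments)
    print(experiments, "experiments: standard error about", round(se, 3))
```

Quadrupling the number of experiments halves the standard error, which is one way to decide how many experiments are worth the class time.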

The ten-sided dice are available at game stores. They are also available on the web at http://www.chessex.com/, but we do not have experience ordering online from this company.

Acknowledgements

This manuscript has been assigned Journal Series No. 01-6, College of Agricultural Sciences and Natural Resources, University of Nebraska.

References

Dowdy, S. and Wearden, S. (1991). Statistics for Research, Second Edition. New York: John Wiley & Sons.

Editor's note: Before 11-6-01, the "student's version" of an activity was called the "prototype".