Significance Testing Principles

  • A cartoon that can be used to discuss the multiple testing issue and the concept of p-hacking. The cartoon was used in the June 2021 CAUSE cartoon caption contest and the winning caption was written by Jim Alloway from EMSQ Associates. The cartoon was drawn by British cartoonist John Landers (www.landers.co.uk) based on an idea by Dennis Pearl from Penn State University.
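
    A short calculation can make the multiple testing issue concrete. The Python sketch below is an illustration added here, not part of the cartoon resource. It assumes m independent tests, each at α = 0.05, with every null hypothesis true, so each p-value is uniform on (0, 1). The function names are hypothetical.

```python
import random

def familywise_error_rate(m, alpha=0.05):
    """Exact P(at least one rejection) among m independent tests whose nulls are all true."""
    return 1 - (1 - alpha) ** m

def simulate_familywise_error(m, alpha=0.05, trials=10_000, seed=1):
    """Monte Carlo check: under a true null hypothesis, a p-value is uniform on (0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw m null p-values; count the trial if any of them falls below alpha.
        if any(rng.random() < alpha for _ in range(m)):
            hits += 1
    return hits / trials

for m in (1, 5, 20):
    print(f"{m:2d} tests: exact {familywise_error_rate(m):.3f}, "
          f"simulated {simulate_familywise_error(m):.3f}")
```

    Under these assumptions, running 20 tests gives roughly a 64% chance of at least one "significant" result by luck alone. That arithmetic is what makes p-hacking work. The exact formula relies on independence, and correlated tests behave differently.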

  • A cartoon that can be used to discuss the importance of using a paired analysis to reduce the variability in the response for a heterogeneous population. The cartoon was used in the February 2021 CAUSE cartoon caption contest and the winning caption was written by Jeremy Case from Taylor University. The cartoon was drawn by British cartoonist John Landers (www.landers.co.uk) based on an idea by Dennis Pearl from Penn State University.
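
    The benefit of pairing can be shown with simulated data. This Python sketch is an illustration, not part of the cartoon resource. The population, effect size, noise levels, and seed are all hypothetical. Each subject has a personal baseline drawn from a wide distribution, and the same subjects are measured before and after a treatment. Both t statistics share the same numerator, the mean difference. Pairing removes the subject-to-subject spread from the denominator.

```python
import random
import statistics

def simulate_pairs(n=30, effect=2.0, between_sd=20.0, noise_sd=3.0, seed=7):
    """Hypothetical before/after data for n subjects with very different baselines."""
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(n):
        baseline = rng.gauss(100.0, between_sd)  # large subject-to-subject spread
        before.append(baseline + rng.gauss(0.0, noise_sd))
        after.append(baseline + effect + rng.gauss(0.0, noise_sd))
    return before, after

def paired_t(before, after):
    """t statistic from the within-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / len(diffs) ** 0.5)

def welch_t(before, after):
    """Welch two-sample t statistic, which ignores the pairing."""
    se = (statistics.variance(before) / len(before)
          + statistics.variance(after) / len(after)) ** 0.5
    return (statistics.mean(after) - statistics.mean(before)) / se

before, after = simulate_pairs()
print(f"paired t: {paired_t(before, after):.2f}")
print(f"unpaired (Welch) t: {welch_t(before, after):.2f}")
```

    Here the between-subject spread (SD 20) dwarfs the measurement noise (SD 3). As a result, the paired t statistic is typically several times larger than the unpaired one computed from the same numbers.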

  • A poem about Type II errors in diagnostic testing, set in the context of a diabetes test. The poem was written by Lawrence Lesser from The University of Texas at El Paso and received an honorable mention in the non-song category of the 2023 A-mu-sing Competition. The author also provided the following outline for a lesson plan:

    Some sample questions (one per stanza) that students can explore or discuss
    as a practical application of statistics to a prevalent disease
    that likely affects (or will affect) a friend or relative of almost everyone.

    First stanza: Look up the history of diabetes prevalence to explore questions such as: Is “1 in 10” roughly accurate for the United States, and how does that compare to other countries? Was the 2003 lowering of the threshold for a prediabetes diagnosis based on updated medical understanding of the disease, or was it more of a policy decision to give an “earlier warning”?

    Second stanza: How does a hypothesis testing framework apply to an oral glucose tolerance test (OGTT)? Patients are warned that a false positive is possible if they did not eat at least 150 g of carbohydrates on each of the 3 days before the test. (This is likely what happened to the poet, whose diagnosis was overturned just 2 months later by an endocrinologist.)

    Third stanza: Given that the null hypothesis usually means no effect, no difference, or nothing special, explain whether it seems consistent for a normality test such as Anderson-Darling to take normality as the null. When might it make sense for a doctor to treat having a particular disease as the null hypothesis, and what would the Type I and Type II errors be?

    Fourth stanza: Explain how having only a few individual values each day from a blood glucose meter (BGM) risks missing dangerously high variability of glucose (students can Google how high variability can be a risk factor for hypoglycemia and diabetes complications). Discuss how output from a Continuous Glucose Monitor (CGM) that records values every 5 minutes can be used to check, for example, that the coefficient of variation is sufficiently low (e.g., < 36%) and that “time in range” (e.g., 70-180 or 70-140 mg/dL) is sufficiently high. Example output is on page S86 of https://diabetesjournals.org/care/issue/45/Supplement_1.

    Fifth stanza: Have students look up current FDA guidelines on how accurate over-the-counter BGM readings need to be (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7753858/) and have them connect this to margin of error, confidence intervals, etc.

    Sixth stanza: Find online the diabetes “plate method,” which uses a circular plate 9” in diameter for a meal: half of the plate holds non-starchy vegetables, a quarter holds lean protein, and a quarter holds carbohydrate foods such as whole grains. How do this breakdown and the total quantity compare with a pie chart of a typical meal that you (or typical college undergraduates) eat?
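
    Both CGM summaries in the fourth-stanza question are simple to compute. The Python sketch below is an illustration under stated assumptions. The readings are a short hypothetical list; a real CGM would supply one value every 5 minutes. The coefficient of variation uses the sample standard deviation, which device software may compute differently. The function names are hypothetical.

```python
import statistics

def glucose_cv_percent(readings):
    """Coefficient of variation: sample standard deviation divided by the mean, in percent."""
    return 100 * statistics.stdev(readings) / statistics.mean(readings)

def time_in_range_percent(readings, low=70, high=180):
    """Percentage of readings between low and high mg/dL (inclusive)."""
    in_range = sum(1 for g in readings if low <= g <= high)
    return 100 * in_range / len(readings)

# Hypothetical glucose readings in mg/dL, for illustration only.
readings = [95, 110, 140, 180, 200, 160, 120, 85, 65, 100]

cv = glucose_cv_percent(readings)
print(f"CV: {cv:.1f}% ({'below' if cv < 36 else 'at or above'} the 36% example threshold)")
print(f"Time in 70-180 mg/dL: {time_in_range_percent(readings):.0f}%")
print(f"Time in 70-140 mg/dL: {time_in_range_percent(readings, 70, 140):.0f}%")
```

    For these ten hypothetical values, the CV is about 34.7%. Of the readings, 80% fall in 70-180 mg/dL and 60% fall in 70-140 mg/dL.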

  • This limerick was written in April 2021 by Larry Lesser of The University of Texas at El Paso as a vehicle for discussing the issues and pitfalls of using .05 as a bright-line threshold for declaring statistical significance, in light of ASA recommendations. The limerick was also published in the June 2021 AmStat News.

  • A song satirizing the use of fixed-significance-level hypothesis testing. The song was written by Dennis K. Pearl from Penn State University. The lyrics may be sung to the tune of the Beatles' 1967 hit "When I'm Sixty-Four" (Paul McCartney wrote the song in 1958). The audio recording was produced by Nicolas Acedo with vocals by Alejandra Nunez Vargas, both students in the Commercial Music Program at The University of Texas at El Paso.

  • Summary: This high school AP activity examines whether students can tell the difference between Coke™ and Pepsi™ by taste. During the “tasting part”, data are collected and the class keeps track of how many students can differentiate between Coke and Pepsi. During the “simulation part” of the activity, a simulation is conducted with dice. Finally, students compare their classroom taste-test results with simulated results showing what would happen if subjects were just guessing randomly among the three possible choices. The activity is described in F. Bullard, “AP Statistics: Coke Versus Pepsi: An Introductory Activity for Test of Significance: AP Central – The College Board,” 2017 on the AP Central website at https://apcentral.collegeboard.org/courses/ap-statistics/classroom-resources/coke-versus-pepsi-introductory-test-significance

    Specifics: The activity is performed in the following steps:

    1. The Tasting part:
      1. First, two students will label three cup positions “A,” “B,” and “C.” They then roll a die to decide how to fill the cups. Each face corresponds to one of the six possible arrangements of two cups of one drink and one cup of the other, so every arrangement is equally likely. They keep a record of which arrangement was used.
      2. Students will be called out into the hall one by one to taste the three drinks and decide which cup contains the different drink. They do not need to identify the drinks as Coke or Pepsi; they only have to identify the cup containing the different soda: A, B, or C.
    2. The Simulation part:
      1. The next stage of the discussion is to ask the students how many correct identifications they need before they can conclude that people were not just randomly guessing: “11 out of 30 is more than a third, but not enough more to be convincing, right?” Students will probably volunteer different dividing lines, but they will not be good at defending them. At the point when all the students understand the question but are unsure of how to answer it, the dice should be introduced into the activity.
      2. The students can suggest a simulation in which two die outcomes (say, 1 and 2) count as a correct cup identification and the other four die outcomes (say, 3, 4, 5, and 6) count as incorrect cup identifications. Demonstrate by rolling a set of dice, or one die many times. There should be as many die rolls as there are subjects in the study. Count the 1s and 2s. Suppose 8 out of 30 “guessed correctly.” On your number line at the blackboard, make an X over the number 8. Each student or group of students should do five or ten simulations (it’s good to have about 100 to 200 simulations in total). They then come to the blackboard and stack their Xs over the appropriate integers. This builds a histogram of the distribution of “number of correct cup identifications if everyone is randomly guessing.”
    3. Conclusion:
      1. When the tasting is finished, the number of correct identifications is counted. At this point, if the number is unusually high (say, 18 out of 30), most students are prepared to conclude (correctly) that there is evidence that at least some people can tell the difference between Coke and Pepsi.
      2. Some statement like this would be great: “If everyone were randomly guessing, we would almost never see 18 students get it right by luck, because we did that 100 times with dice, and the highest we ever got was 16, and that was only once.”
      3. In the author’s experience, about half of the students or a little more usually identify the correct drink. When the author did this activity with one class, 13 out of 21 students correctly identified the different drink.
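
    The dice simulation and its exact counterpart can also be sketched in Python. This is an illustration added here, not part of Bullard's activity. It assumes the guessing-only model, in which each taster picks the odd cup with probability 1/3. It uses 30 tasters and 100 simulated classes to match the numbers in the steps. The function names and seed are hypothetical.

```python
import random
from math import comb

def simulate_correct_counts(n_subjects=30, n_sims=100, seed=2017):
    """Mimic the dice simulation: a roll of 1 or 2 counts as a lucky correct guess."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sims):
        rolls = [rng.randint(1, 6) for _ in range(n_subjects)]
        counts.append(sum(1 for r in rolls if r in (1, 2)))
    return counts

def exact_upper_tail(k, n, p=1/3):
    """P(X >= k) for X ~ Binomial(n, p), the guessing-only model."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

counts = simulate_correct_counts()
print("simulated classes with 18 or more correct:", sum(c >= 18 for c in counts), "of", len(counts))
print(f"exact P(X >= 18), n = 30: {exact_upper_tail(18, 30):.4f}")
print(f"exact P(X >= 13), n = 21: {exact_upper_tail(13, 21):.4f}")
```

    The class histogram approximates the exact binomial tail more closely as the number of simulations grows, so the tail can serve as a check on the dice results. Under the guessing-only model, 18 or more correct out of 30 has a probability of roughly 0.0025. The author's result of 13 out of 21 has a probability of roughly 0.007.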

    (Resource photo illustration by Barbara Cohen, 2020; this summary compiled by Bibek Aryal)

  • A poem written in 2019 by Larry Lesser from The University of Texas at El Paso to discuss statistics examples involving social justice, inspired by his paper in the March 2007 issue of the Journal of Statistics Education. The poem is part of a collection of 8 poems published with commentary in the January 2020 issue of the Journal of Humanistic Mathematics.

  • A joke for discussing the overuse of hypothesis testing methods. The joke was written in April 2019 by Larry Lesser from The University of Texas at El Paso.

  • A cartoon suitable for use in teaching about publication bias and the small-sample caution in hypothesis testing. The cartoon is number 2020 (July 2018) from the webcomic series at xkcd.com created by Randall Munroe. It is free to use in the classroom and on course web sites under a Creative Commons Attribution-NonCommercial 2.5 license.

  • A cartoon suitable for use in teaching about the idea of a falsifiable hypothesis. The cartoon is number 2078 (November 2018) from the webcomic series at xkcd.com created by Randall Munroe. It is free to use in the classroom and on course web sites under a Creative Commons Attribution-NonCommercial 2.5 license.
