Chance News 109
July 1, 2016 to December 31, 2016
Quotations
From an 1840s letter from Charles Babbage to Alfred, Lord Tennyson, about two lines in a Tennyson poem: “Every minute dies a man, / Every minute one is born.”
“I need hardly point out to you that this calculation would tend to keep the sum total of the world’s population in a state of perpetual equipoise, whereas it is a well-known fact that the said sum total is constantly on the increase. I would therefore take the liberty of suggesting that in the next edition of your excellent poem the erroneous calculation to which I refer should be corrected as follows: ‘Every moment dies a man / And one and a sixteenth is born.’ I may add that the exact figures are 1.167, but something must, of course, be conceded to the laws of metre.”
Submitted by Margaret Cibes
"You can slice and dice it any way you like, but this isn’t like Consumer Reports, which tests something to see if it does or doesn’t work. The interaction between a student and an institution is not the same as the interaction between a student and a refrigerator."
"There is no better way to build confidence in a theory than to believe it is not testable."
Submitted by Margaret Cibes
Forsooth
"The LSAT predicted 14 percent of the variance between the first-year grades [in a study of 981 University of Pennsylvania Law School students]. And it did a little better the second year: 15 percent. Which means that 85 percent of the time it was wrong."
Submitted by Margaret Cibes
“These chemicals are largely unknown,” said David Bellinger, a professor at the Harvard University School of Public Health, whose research has attributed the loss of nearly 17 million I.Q. points among American children 5 years old and under to one class of insecticides.
Submitted by Margaret Cibes at the suggestion of Jim Greenwood
Guide to bad statistics
Our nine-point guide to spotting a dodgy statistic
by David Spiegelhalter, The Guardian, 17 July 2016
Published in the wake of the Brexit debate, but obviously applicable to the upcoming US presidential election, the article offers these nine strategies for twisting numbers to back a specious claim.
- Use a real number, but change its meaning
- Make the number look big (but not too big)
- Casually imply causation from correlation
- Choose your definitions carefully
- Use total numbers rather than proportions (or whichever way suits your argument)
- Don’t provide any relevant context
- Exaggerate the importance of a possibly illusory change
- Prematurely announce the success of a policy initiative using unofficial selected data
- If all else fails, just make the numbers up
Submitted by Bill Peterson
Cancer, lifestyle, and luck
Helpless to prevent cancer? Actually, quite a bit is in your control
By Aaron E. Carroll, TheUpshot blog, New York Times, 5 July 2016
A controversial news story last year suggested that whether or not you get cancer is mostly dependent on luck. For more discussion see Cancer and luck in Chance News 103.
The present article has a different message. Rather than aggregating the analysis across all types of cancer, which led to some of the earlier misinterpretations, it focuses on how healthy lifestyles can reduce the risk of particular cancers. Data are reported from a study in the journal JAMA Oncology. For example, lung cancer is the leading cancer cause of death in the US, and the study found that "about 82 percent of women and 78 percent of men who got lung cancer might have prevented it through healthy behaviors." The obligatory reminder applies: these are observational data. But the smoking and lung cancer story is, of course, a famous one in statistics!
Overall, the study estimated that a quarter of cancers in women and a third of those in men are preventable by lifestyle choices.
Submitted by Bill Peterson
Statistical reasoning in journalism education
Bob Griffin sent a link to the following:
Chair support, faculty entrepreneurship, and the teaching of statistical reasoning to journalism undergraduates in the United States
by Robert Griffin and Sharon Dunwoody, Journalism, July 2015
Did Melania plagiarize?
A physicist has calculated the probability Melania Trump didn't plagiarise her speech
by Fiona MacDonald, Science Alert, 20 July 2016
The reference is to a humorous Facebook post by McGill University physics professor Robert Rutledge. He notes that Trump representative Paul Manafort had argued in Melania's defense that "it's the English language, there are a limited number of words, so what if Melania Trump chose some of the same ones Michelle Obama did?"
From transcripts appearing in Vox, Rutledge identifies 14 key phrases ("values", "work hard", "for what you want in life", "word is your bond", "do what you say", "treat people with...respect", "pass [them] on to many generations", "Because we want our children", "in this nation", "to know", "the only limit", "your achievements", "your dreams", "willingness to work for them") that appear in both speeches, and observes that they also happen to appear in the same order. But 14! = 87,178,291,200. So even if Melania just happened to choose some of the same words as Michelle, he finds that there is less than one chance in 87 billion that they would appear in the same order.
Discussion
- This is effectively computing a p-value. What assumptions are being made?
- In any case, why is this not "the probability that Melania didn't plagiarize"?
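Rutledge's figure is easy to reproduce. Under his implicit null model (an assumption about the world, not an established fact about speechwriting), the 14 shared phrases were equally likely to appear in any of the 14! possible orders, so the chance of an exact order match is 1/14!. A minimal sketch:

```python
import math

# Null model (Rutledge's implicit assumption): given that the same
# 14 phrases appear in both speeches, all 14! orderings of those
# phrases are equally likely.
orderings = math.factorial(14)
p_same_order = 1 / orderings

print(orderings)     # 87178291200 -- about 87 billion
print(p_same_order)  # roughly 1.1e-11
```

Note the structure here: this is the probability of the observed coincidence *given* the null model, which is exactly the logic of a p-value.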
Billion dollar lotteries
The billion dollar lottery jackpot: Engineered to drain your wallet
by Jeff Sommer, New York Times, 12 August 2016
The article described how lotteries have successfully boosted sales by readjusting odds in games like Powerball to generate ever-larger jackpots. It cites analyses from Salil Mehta's Statistical Ideas blog; see his post A loser's lottery. We read there:
One should remember that the only objective for the Lottery, anywhere in the world, is not to make you rich. Contrary to their advertisements, the objective is not to show you a good time nor satisfy your dreams. Wasting your money is never a good time. The lottery’s only objective is to maximize the funds you pay for educational activities...
The whole scheme is an educational tax for those who instead could use a free education in probability theory (that’s where this blog comes in!)
Indeed, here is a discussion of "neglect of probability" that explains Why you’ll soon be playing Mega Trillions.
Noise in polling
Here is a series of articles written to help readers cope with the avalanche of polling results as the election approaches. The first was sent by Jeff Witmer to the Isolated Statisticians list:
- Confused by Contradictory Polls? Take a Step Back
- by Nate Cohn, 'TheUpshot' blog, New York Times, 20 September 2016
Included in Cohn's analysis is a simulation of 100 polls, generated under the assumption that Hillary Clinton has a 4 point lead over Donald Trump:
The point of this illustration is that a lot of the apparent disagreement we see in polls taken around the same time might reflect nothing more than random sampling error.
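Cohn's simulation is easy to replicate. The sketch below (assuming, for illustration, polls of n = 1,000 respondents and a true two-way split of 52–48) draws 100 polls and reports the spread of the observed leads. The standard deviation of a single poll's lead is about 2·√(p(1−p)/n), roughly 3 points here.

```python
import random

random.seed(2016)
n, true_clinton, num_polls = 1000, 0.52, 100  # assumed poll size and true split

leads = []
for _ in range(num_polls):
    # Each respondent independently supports Clinton with probability 0.52
    clinton_votes = sum(random.random() < true_clinton for _ in range(n))
    # Lead in percentage points: Clinton% - Trump% = 2*Clinton% - 100
    leads.append(100 * (2 * clinton_votes / n - 1))

print(round(min(leads), 1), round(max(leads), 1))
# With a true lead of +4, individual polls routinely land anywhere
# from near 0 to +8 or beyond -- pure sampling error, no house effects.
```

Even before any of the methodological differences discussed below, sampling error alone produces "contradictory" polls.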
The next article deals with some other sources of error.
- We gave four good pollsters the same raw data. They had four different results.
- by Nate Cohn, 'TheUpshot' blog, New York Times, 20 September 2016
The common margin of error statements attached to polling reports are based on the formula for error in simple random sampling. Thus a survey of around 1000 people is said to have a margin of plus or minus 3 percentage points (1/√1000 ≈ 0.03). But national political polls are not based on simple random samples. Polls presented in the New York Times are usually accompanied by a statement on "How the Poll Was Conducted." In a recent example, we read
The combined results have been weighted to adjust for variation in the sample relating to geographic region, sex, race, Hispanic origin, marital status, age, education and (for landline households) the number of adults and the number of phone lines. In addition, the sample was adjusted to reflect the percentage of the population residing in mostly Democratic counties, mostly Republican counties and counties more closely balanced politically....
Some results pertaining to the election are expressed in terms of a “probable electorate,” reflecting the probability of each individual’s voting on Election Day. This likelihood is estimated from responses to questions about registration, past voting, intention to vote, interest in the campaign and enthusiasm about voting in this year’s contest.
The effect of these adjustments is not covered in the stated margin of sampling error. To gauge the impact, the Upshot did its own analysis of a poll with n = 867 respondents, and gave the same raw data to four professional pollsters. The five results: Clinton +3, Clinton +1, Clinton +4, Trump +1, Clinton +1. For comments on how surprised we should be, see Andrew Gelman's blog post Trump +1 in Florida; or, a quick comment on that “5 groups analyze the same poll” exercise. For teachers, Shonda Kuiper of Grinnell College has developed extensive materials for classroom activities on weighted data.
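A toy post-stratification example shows how identical raw responses can yield different headline numbers under different weighting assumptions. All figures below are invented for illustration; they are not the Upshot's actual data.

```python
def weighted_lead(group_support, pop_shares):
    """Clinton-minus-Trump lead, in percentage points, after weighting
    each demographic group to an assumed share of the electorate."""
    lead = 0.0
    for group, share in pop_shares.items():
        clinton = group_support[group]            # Clinton's two-way share in group
        lead += share * (clinton - (1 - clinton)) # group lead, weighted
    return 100 * lead

# The same raw responses for both hypothetical pollsters
support = {"college": 0.55, "no_college": 0.45}

# Two different assumptions about the electorate's composition
lead_a = weighted_lead(support, {"college": 0.40, "no_college": 0.60})
lead_b = weighted_lead(support, {"college": 0.55, "no_college": 0.45})

print(round(lead_a, 1))  # -2.0, i.e. Trump +2
print(round(lead_b, 1))  # 1.0, i.e. Clinton +1
```

Both pollsters see identical responses; the 3-point swing comes entirely from the assumed electorate composition. This is one reason the five analyses above could range from Trump +1 to Clinton +4.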
For an extreme example of what can go wrong with weighting, see:
- How one 19-year-old Illinois man is distorting national polling averages
- by Nate Cohn, 'TheUpshot' blog, New York Times, 20 September 2016
Finally, for good general advice on how to evaluate the quality of a poll, see
- The savvy person’s guide to reading the latest polls
- by Nate Cohn, 'TheUpshot' blog, New York Times, 12 October 2016
Election post mortem
Even with all of the collected wisdom about polling, the election results still surprised most professionals. Here are some commentaries on what happened.
- Why FiveThirtyEight gave Trump a better chance than almost anyone else
- by Nate Silver, Fivethirtyeight.com, 11 November 2016
While not predicting a Trump victory, Silver was still out of step with most other analyses. His last analysis before the election gave Trump a 29% chance of winning the electoral college. Silver was criticized in some circles for suggesting that there was any substantial chance of a Trump win.
- Putting the polling miss of the 2016 election in perspective
- by Nate Cohn, Josh Katz and Kevin Quealy, 'TheUpshot' blog, New York Times, 13 November 2016
The results of the election were certainly stunning. But Hillary Clinton did win the popular vote, by about 1.5 percentage points rather than the 4 percentage points predicted by polls. The article notes that this difference does not exceed the size of normal polling errors. The real problem seems to be the state level errors, which were historically high. Reproduced below are data from a graphic in the article giving the "average absolute difference between polling average and final vote in the ten states closest to the national average with at least three polls."
Year | Difference |
---|---|
1988 | 3.4 pts |
1992 | 3.4 pts |
1996 | 2.3 pts |
2000 | 1.8 pts |
2004 | 1.7 pts |
2008 | 1.7 pts |
2012 | 2.3 pts |
2016 | 3.9 pts |
Bret Largent noted on the Isolated Statisticians list that the New York Times had an eerily prescient article the day before the election: Donald Trump’s big bet on less educated whites (7 November 2016).
Trump succeeds where health is failing
Daily chart: Trump succeeds where health is failing
Economist, 21 November 2016
Suggested by Peter Doyle
Statins and Alzheimer's
Why statins probably don’t reduce risk of Alzheimer’s disease, despite what headlines say
by Alan Cassels, HealthNewsReview blog, 14 December 2016
The journal JAMA Neurology recently published results of a large (400,000 subject) observational study that compared the risk of Alzheimer's disease in patients with "high exposure" to statins vs. those with "low exposure." Higher use of statins was found to be associated with lower Alzheimer's risk. The results were widely covered in news media, several of which featured this quote from the corresponding author of the study:
We may not need to wait for a cure to make a difference for patients currently at risk of the disease. Existing drugs, alone or in combination, may affect Alzheimer’s risk.
Readers who continued past the headlines were informed that this was not a randomized experiment, so it was premature to draw causal conclusions. Of course, leading with this caveat would not make for a sensational news story.
The HealthNewsReview post describes the problem of healthy user bias. For illustration it cites a 2009 study that found that patients who faithfully followed a statin regimen were less likely to be involved in car crashes. Does this