Chance News 96
Quotations
"The world is a messy place. The scientific method imposes some order, but in the case of climate change, that order is probabilistic. For the sake of science and the planet, we should not become distracted by a false sense of certitude. Imprecise truths are the most inconvenient ones."
Submitted by Bill Peterson
I am particularly fond of this example [the Linda problem] because I know that the [conjoint] statement is least probable, yet a little homunculus in my head continues to jump up and down, shouting at me—"but she can’t just be a bank teller; read the description.”
(See the discussion of the Linda Problem, Item 6 below.)
Submitted by Bill Peterson
“Once managers start asking themselves ‘what is this distribution?’ instead of ‘what is this number?’, they are in a position to use the various tools … for their own analysis. …. What is needed is a convenient way to pass distributions around ….”
Statistics and Public Policy (1997)
Submitted by Margaret Cibes
“The name MongoDB stems from ‘humongous database’, but chair and co-founder Dwight Merriman says there are more sizes and shapes to data than just ‘big’.”
Siliconrepublic, October 30, 2013
Submitted by Margaret Cibes
From The Big Short, by Michael Lewis, Norton, 2011:
"Above the roulette tables [at The Venetian in Las Vegas], screens listed the results of the most recent twenty spins of the wheel. Gamblers would see that it had come up black the past eight spins, marvel at the improbability, and feel in their bones that the tiny silver ball was now more likely to land on red. That was the reason the casino bothered to list the wheel’s most recent spins: to help gamblers to delude themselves. To give people the false confidence they needed to lay their chips on a roulette table. The entire food chain of intermediaries in the subprime mortgage market was duping itself with the same trick, using the foreshortened, statistically meaningless past to predict the future." [p. 147]
"Craps offered the player the illusion of control – after all, he rolled the dice – and a surface complexity that masked its deeper idiocy. “For some reason, when these people are playing it they actually believe they have the power to make the dice work,” said [an analyst]." [pp. 150-151]
Submitted by Margaret Cibes
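Lewis's point about the roulette screens can be checked with a quick simulation. The sketch below assumes an American-style wheel (18 red, 18 black and 2 green pockets; we do not know which kind of wheel The Venetian used) and asks how often red comes up on spins that immediately follow at least eight blacks in a row. Because spins are independent, the answer should hover around 18/38 ≈ 0.474, the same as on any other spin. The function name and parameters are our own.

```python
import random

def red_after_black_streak(n_spins=1_000_000, streak=8, seed=1):
    """Simulate an American-style wheel (an assumption) and return the observed
    frequency of red on spins that immediately follow `streak` or more
    consecutive blacks, together with how many such spins occurred."""
    rng = random.Random(seed)
    pockets = ["red"] * 18 + ["black"] * 18 + ["green"] * 2
    run = 0               # current number of consecutive blacks
    reds = trials = 0
    for _ in range(n_spins):
        outcome = rng.choice(pockets)
        if run >= streak:  # this spin follows at least `streak` blacks
            trials += 1
            reds += outcome == "red"
        # Extend the black run, or reset it on red or green
        run = run + 1 if outcome == "black" else 0
    return reds / trials, trials

freq, n = red_after_black_streak()
print(f"red after 8+ blacks: {freq:.3f} over {n} spins (18/38 = {18/38:.3f})")
```

A million spins yields only a couple of thousand qualifying streaks, which is itself a reminder of how rare the "improbable" run is, and yet the wheel shows no memory of it.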
Forsooth
“A somewhat comic, though not necessarily typical reaction to the use of sampling by a Federal agency occurred in 1936 when the National Resources Planning Board published its report Consumer Incomes in the United States, …. It showed a highly skewed distribution of income, with the top 10 per cent of the families and single individuals receiving 36 per cent of the income. …. [T]his was the first time that a Federal agency had published such data, and it was based on a sample. The U.S. Chamber of Commerce issued a blast against this report, which it considered it to be socialistic propaganda. It said that the estimates were based on ‘less than a 1 per cent sample, and a random sample at that!’”
cited by W. Allen Wallis, “Statistics in Washington, 1935-1945”
Statistics and Public Policy (1997)
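The Chamber of Commerce's complaint gets the logic backwards: it is precisely the randomness of a sample that lets a small fraction stand in for the whole population. A minimal Python sketch, using a made-up lognormal income distribution rather than the 1936 data, compares the top-10% income share in a hypothetical population of 200,000 with the same statistic computed from a 1 per cent simple random sample. The helper `top_decile_share` is hypothetical.

```python
import random

def top_decile_share(incomes):
    """Share of total income received by the top 10% of units."""
    ordered = sorted(incomes, reverse=True)
    k = max(1, len(ordered) // 10)
    return sum(ordered[:k]) / sum(ordered)

rng = random.Random(1936)
# Hypothetical right-skewed population: 200,000 lognormal "incomes" (not real data)
population = [rng.lognormvariate(0.0, 1.0) for _ in range(200_000)]
# A 1 per cent simple random sample, drawn without replacement
sample = rng.sample(population, len(population) // 100)

print(f"population top-10% share: {top_decile_share(population):.3f}")
print(f"1% sample top-10% share:  {top_decile_share(sample):.3f}")
```

Under these assumptions the two figures should typically land within a few percentage points of each other; a non-random 1 per cent sample (say, only the first 2,000 records from one region) would carry no such guarantee.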
Let's make an XKCD
Thanks to Brian Abend for sending a link to a cartoon from XKCD.
The Monty Hall problem just won't stay solved, so it was only a matter of time before it was enshrined in XKCD. Here are two other recent appearances in the news:
- Stick or switch? Probability and the Monty Hall problem, BBC News Magazine, 11 September 2013
- “Readers’ Challenge report: The Monty Hall problem”, Significance, October 2013
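For readers who would rather check the answer empirically than argue about it, here is a minimal Python simulation. It assumes the standard rules: the host knows where the car is, always opens a door the player did not pick, and always reveals a goat. Change those assumptions and the answer can change, which is part of why the problem won't stay solved. The function name and parameters are our own.

```python
import random

def monty_hall(trials=100_000, switch=True, seed=0):
    """Estimate the win probability of the stick or switch strategy under
    the standard rules (host knows the car's location and always shows a goat)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # Host opens a goat door that the player did not pick
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            # Move to the one remaining closed door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials

print("stick: ", monty_hall(switch=False))
print("switch:", monty_hall(switch=True))
```

The two estimates should come out near 1/3 and 2/3 respectively: switching loses only when the first pick was already the car.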
Corrupt Ivy League admissions?
Writing last year in the American Conservative (The myth of American meritocracy: How corrupt are Ivy League admissions?, 28 November 2012), Ron Unz claimed that today's Asian students are being discriminated against in the ways that Jewish students had been in the past. Namely, despite their growing numbers in the population and their impressive academic accomplishments, their share of admissions to top institutions was being restricted by quotas. Furthermore, Unz asserted that Jewish students are now actually over-represented relative to equally qualified Asians and non-Jewish whites. Unz's article includes various statistical graphics to support these claims.
At the time of the article, a blog post by Andrew Gelman took the analysis at face value and asked, Should Harvard start admitting kids at random? (28 November 2012). Subsequently, however, Gelman and others re-examined the Unz data and found that many of the earlier claims do not stand up. For example, instead of inferring Jewishness from a family name, Janet Mertz actually contacted some of the individuals who had been on the Math Olympiad team. She writes that
The actual count of Jews is at least 10¼ out of 78 (counting part-Jews fractionally), i.e., 5-fold higher [than Unz's claim of only two]. When an author refuses to admit to an error about which there is no possibility he is correct, academics have no choice but to then question the validity of everything that author has ever written because they can no longer trust the veracity of his statements.
Most recently, in a post entitled Ivy Jew update (22 October 2013), Gelman quotes Nurit Baytch:
Unz’s conclusion that Jews are over-admitted to Harvard was erroneous, as he relied on faulty assumptions and spurious data: Unz substantially overestimated the percentage of Jews at Harvard while grossly underestimating the percentage of Jews among high academic achievers.
This latest post by Gelman has a number of interesting quotations that are worth bearing in mind when looking at any statistical claim:
- "My take on all this is that it can be harder than it looks to do research using statistics."
- "It’s perfectly natural to get excited when one’s initial hypothesis is confirmed by an examination of some data, but the next step is to recognize that these exciting discoveries do not always hold up."
Regarding the particular analysis in question, he writes:
Unz, who spends so much of his time in the political arena, is used to politically-motivated criticisms and responds in kind, and so I think he sees the statistics provided by Mertz and Baytch as attacks to be dodged or parried rather than as useful information that can help him modify his understanding of the world. But for those of us who are not so invested in a particular position, Baytch’s article, and Mertz’s from a few months ago, should be helpful to anyone interested in further study of ethnicity and high-end college admissions.
In a related post, My beef with Brooks: the alternative to “good statistics” is not “no statistics,” it’s “bad statistics” (20 February 2013), Gelman takes columnist David Brooks to task for adopting an "anti-data" posture in a NYT column. He observes that Brooks had previously been happy to quote the Unz analysis in a column two months earlier. According to Gelman, "Janet Mertz contacted him and the Times to report that his published numbers were in error, and I also contacted Brooks (both directly and through an intermediary). But no correction has appeared."
Submitted by Paul Alper
Dance of the p-values
Paul Alper sent a link to a wonderful YouTube video, Dance of the p-values. This is an animated simulation, with sound effects keyed to emotional responses, designed to show how erratically the p-value can vary in replications of the same experiment.
Paul found it through Andrew Gelman's blog (7 November), which also features a cartoon.
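The phenomenon in the video is easy to reproduce. The sketch below, written in the spirit of the video rather than copied from it, repeats the same two-group experiment many times with a real but modest effect (assumed here: 32 subjects per group and a true difference of half a standard deviation) and records the p-value each time. For simplicity it uses a z-test with the standard deviation treated as known, rather than a t-test; the function names are hypothetical.

```python
import math
import random
from statistics import mean

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))

def replicate_p_values(n_reps=25, n=32, effect=0.5, sigma=1.0, seed=2013):
    """Run the same two-group experiment n_reps times and return each p-value.
    Assumes normal data with known sigma (a simplifying assumption)."""
    rng = random.Random(seed)
    se = sigma * math.sqrt(2 / n)  # standard error of the difference in means
    pvals = []
    for _ in range(n_reps):
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treated = [rng.gauss(effect, sigma) for _ in range(n)]
        z = (mean(treated) - mean(control)) / se
        pvals.append(two_sided_p(z))
    return pvals

for p in replicate_p_values():
    print(f"{p:.4f}")
```

Across 25 replications the printed p-values typically range from well below 0.001 to well above 0.5, even though every run samples the same two populations. With these assumed settings only about half of the replications reach p < 0.05.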
The Linda Problem
Kahneman and Tversky's "Linda Problem" is a famous illustration of the conjunction fallacy:
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which of the following is more probable?
- Linda is a bank teller.
- Linda is a bank teller and is active in the feminist movement.
From a formal logic perspective, the answer is obviously the first: every bank teller who is active in the feminist movement is also a bank teller, so the conjunction can be no more probable than either of its parts. Yet many people, because of the information about Linda's behavior and outlook on life, choose the second. In his book Gut Feelings: The Intelligence of the Unconscious, Gerd Gigerenzer points out that the confusion about the conjunction "and" arises because natural language does not work the way logic does. His first example to illustrate the discrepancy is
- Peggy and Paul married and Peggy became pregnant.
- Peggy became pregnant and Peggy and Paul married.
His second example is
- Mark got angry and Mary left.
- Mary left and Mark got angry.
Read as ordinary English, each of the above pairs appears to violate Prob(A and B) = Prob(B and A), because "and" is heard as conveying a temporal or causal order. The following pair does not violate Prob(A and B) = Prob(B and A):
- Verona is in Italy and Valencia is in Spain.
- Valencia is in Spain and Verona is in Italy.
Further, he observes:
Even more surprising, we also know without thinking when "and" should be interpreted as the logical [inclusive] OR, as in the sentence
- We invited friends and colleagues.
One of Gigerenzer's continuing themes is that instead of probabilities we should think in terms of frequencies, which are much easier to understand. He would rephrase the Linda problem as
There are a hundred persons who fit the description above (i.e., Linda's). How many of them are
- bank tellers?
- bank tellers and active in the feminist movement?
His empirical claim is that people easily figure things out with this rephrasing.
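Gigerenzer's frequency version turns the conjunction rule into a matter of counting. In the sketch below, the group of 100 people and the probabilities of being a bank teller or a feminist are made-up placeholders, not estimates of anything. Whatever numbers are plugged in, the people who are bank tellers and feminists are a subset of the bank tellers, so the second count can never exceed the first. The function `linda_counts` is hypothetical.

```python
import random

def linda_counts(n_people=100, p_teller=0.05, p_feminist=0.6, seed=31):
    """In a hypothetical group of n_people fitting Linda's description, count
    (a) bank tellers and (b) bank tellers who are also active feminists.
    The probabilities are illustrative placeholders, not data."""
    rng = random.Random(seed)
    tellers = tellers_and_feminists = 0
    for _ in range(n_people):
        teller = rng.random() < p_teller
        feminist = rng.random() < p_feminist
        tellers += teller
        # Anyone counted here has already been counted as a teller above
        tellers_and_feminists += teller and feminist
    return tellers, tellers_and_feminists

print(linda_counts())
```

Changing the probabilities, or even making feminism depend on being a teller, can shrink the gap between the two counts but can never reverse it.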
A Chance News item from a few years back discussed John Allen Paulos's take on the Linda Problem, from his NYT article "Stories and Statistics."
Submitted by Paul Alper
Flaws in cholesterol risk calculator?
Risk calculator for cholesterol appears flawed
by Gina Kolata, New York Times, 17 November 2013
The article includes an online graphic summarizing the findings.
For commentary on a related story, see the post The Economics & Politics of Drugs for Mild Hypertension from HealthNewsReview.org (4 November 2013), which begins
The Cochrane Collaboration’s Hypertension Group published a systematic review of drug treatment for mild hypertension in August 2012 showing no evidence that drugs benefit patients while about 11% have side effects severe enough to stop treatment. As coauthor of that review, I [Dr. David Cundiff] will comment on the economics, politics, regulatory intrigue, financial conflicts, and subsequent media coverage involved.
Submitted by Paul Alper
Update
Bumps in the road to new cholesterol guidelines
by Gina Kolata, New York Times, 25 November 2013
Update on International Year of Statistics
Allie Weinstein provided a newspaper clipping of the following:
Odds lot: Statisticians party like it's 2.013 x 10 cubed
by Daniel Michaels, Wall Street Journal, 15 November 2013
The article reports that, now 88% of the way through the International Year of Statistics, the event is viewed as a success. It repeats the famous claim, made in 2009 by Google's Hal Varian, that statistics would be "the sexy job in the next 10 years."
The online article links to these videos:
- A Day Without Statistics, from the Research Center for Statistics of the University of Geneva
- My Statistician Friend
- Stats Can Be Cool, You See, by Prof. Michael Posner of Villanova University
- Why Statistics Matters?
Plotting political ideology
The center cannot hold
by Thomas Edsall, New York Times, 4 December 2012
It's not every day you encounter a scatterplot in the New York Times. In this column, Edsall presents a plot of the political ideology of the 2012 electorate.
Submitted by Bill Peterson