Chance News 64
Quotations
We tolerate the pathologies of quantification — a dry, abstract, mechanical type of knowledge — because the results are so powerful. Numbering things allows tests, comparisons, experiments. Numbers make problems less resonant emotionally but more tractable intellectually. In science, in business and in the more reasonable sectors of government, numbers have won fair and square.
--Gary Wolf
Writing in The data-driven life, New York Times, 26 April 2010
Submitted by Bill Peterson
No method of measuring a societal phenomenon satisfying certain minimal conditions exists that can’t be second-guessed, deconstructed, cheated, rejected or replaced. This doesn’t mean we shouldn’t be counting – but it does mean we should do so with as much care and wisdom as we can muster.
--John Allen Paulos
Writing in “Metric Mania”, The New York Times, May 10, 2010
Submitted by Margaret Cibes
All I can say is, beware of geeks bearing formulas.
--Warren Buffett in October 2008 interview by Charlie Rose
Quoted in “Quants,” The New York Times, May 10, 2010
Submitted by Margaret Cibes
The skepticism that people like Gregg [US Senator from NH] apply to statistics, if applied to other sciences, would get them lumped with the anti-vaccinationists and the homeopaths.
--Jordan Ellenberg
Writing in The census will be wrong. We could fix it. Washington Post, 1 May 2010
(This article was recommended by Laura Chihara on the Isolated Statisticians list.)
In the world of cancer research, there is something called a Kaplan-Meier curve, which tracks the health of patients in the trial of an experimental drug. In its simplest version, it consists of two lines. The first follows the patients in the “control arm,” the second the patients in the “treatment arm.” …. Seven years ago … a team presented the results of a colorectal-cancer drug trial at the annual meeting of the American Society of Clinical Oncology …. The lead … researcher took the audience through one slide after another … laying out the design and scope of the study, until he came to the crucial moment: the Kaplan-Meier. At that point, what he said became irrelevant. The members of the audience saw daylight between the two lines, for a patient population in which that almost never happened, and they leaped to their feet and gave him an ovation. Every drug researcher in the world dreams of standing in front of thousands of people at ASCO and clicking on a Kaplan-Meier like that.
--Malcolm Gladwell
Writing in “The Treatment”, The New Yorker, May 17, 2010 (Full text may require subscription.)
See a Vanderbilt PowerPoint[1] explaining the purpose and construction of the Kaplan-Meier chart used in survival analysis.
Submitted by Margaret Cibes
The probability estimate is just one piece of a risk assessment. The other is quantifying just how much damage a rare event, like a massive oil spill, would cause. "What is important is, what is the damage?" says [an economist]. "If something happens with high probability but just a few fishes die, well, sorry for the fishes."
--Carl Bialik
Writing in “Near Misses Are a Hit in Disaster Science”, The Wall Street Journal, June 12, 2010
Submitted by Margaret Cibes
Forsooth
Pitching is 80% of the game. The other half is hitting and fielding.
--former Yankee Mickey Rivers
Quoted in “Is Greinke the Unluckiest Pitcher Ever?”
The Wall Street Journal, May 5, 2010
See more great quotes online at "Mickey Rivers' Words of Wisdom About Baseball".
Submitted by Margaret Cibes
Odds are, it’s wrong--Part II
An entry in Chance News 63 presented a Science News article by Tom Siegfried. The article, which focuses on statistics used in the medical field, may be found here and is worth some elaboration; be sure to read the comments reacting to what Siegfried writes. There you will find mention of circumcision, condoms, defense of statistics in medicine, praise for the author, condemnation of the author--and somehow, reference to Scott Reuben, who faked data for Pfizer and Merck (see Serious medical fraud in Chance News 45).
Siegfried’s main contention is that, despite its prevalence in the medical sphere (and dominance elsewhere as well), Fisher’s p-value approach is inadequate and misleading at best. To illustrate the consequences of this “p-value mania,” Siegfried quotes two researchers who claim, respectively, “that in modern [medical] research, false findings may be the majority or even the vast majority of published research claims,” and “There are more false claims made in the medical literature than anybody appreciates.”
Criticism of the p-value is hardly new. Put “criticism of p-value” into a search engine and you will get 4,520,000 hits, many of which are more informative than Siegfried’s article. Try The P-value, devalued from the International Journal of Epidemiology as an example.
Discussion
1. To see why critics of the p-value say it is the wrong way round, consider Prob(brown eyes | Costa Rican) and Prob(Costa Rican | brown eyes). Compare with Prob(data | Null Hypothesis is true) and Prob(Null Hypothesis is true | data). For an interesting illustration of the difference between these conditional probabilities in the O.J. Simpson murder case, see Steven Strogatz’s NYT article (25 April 2010).
2. Critics of the p-value say that #1 above is not a strong enough criticism because the p-value deals not with “data” that actually occurred but with “data at least this extreme.” Why is this a potent criticism?
3. Siegfried rightfully refers to “randomized, controlled clinical trials that test drugs for their ability to cure or their power to harm” as the “gold standard” for medical research. “Such trials assign patients at random to receive either the substance being tested or a placebo.” However, see Judson’s NYT article, Enhancing the Placebo (3 May 2010), which discusses how non-placebo a placebo can be. What does this do to clinical trials and the gold standard?
4. Siegfried suggests that Bayesian inference is preferable to the frequentist p-value approach of Fisher. If this is so, why is the p-value approach still so dominant, long after Fisher himself died?
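As a rough numerical illustration of the first question above, the following sketch applies Bayes' rule to show how far Prob(significant result | null true) can sit from Prob(null true | significant result). The three inputs (the share of tested hypotheses that are truly null, the significance level, and the power of the test) are made-up values chosen only for illustration; none of them comes from Siegfried's article.

```python
from fractions import Fraction

def prob_null_given_significant(prior_null, alpha, power):
    """Bayes' rule: P(null true | significant result)."""
    false_positives = prior_null * alpha        # null true, test rejects anyway
    true_positives = (1 - prior_null) * power   # null false, test rejects
    return false_positives / (false_positives + true_positives)

# Hypothetical inputs, for illustration only.
prior_null = Fraction(9, 10)  # assume 90% of tested hypotheses are truly null
alpha = Fraction(1, 20)       # significance level 0.05
power = Fraction(4, 5)        # assume 80% power against real effects

posterior = prob_null_given_significant(prior_null, alpha, power)
print("P(significant | null true) =", float(alpha))      # 0.05
print("P(null true | significant) =", float(posterior))  # 0.36
```

Under these assumed inputs, a "significant" finding still has about a one-in-three chance of coming from a true null, even though the test rejects a true null only 5% of the time. Different assumptions would give different answers; the point is only that the two conditional probabilities need not be close.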
Submitted by Paul Alper
Health insurers finagling fees?
“For WellPoint, Math Error Spurs More Scrutiny”
by Avery Johnson, The Wall Street Journal, May 5, 2010
The health insurer WellPoint, an affiliate of Anthem Blue Cross, has decided to withdraw its request for up to a 39% price increase on individual plans in California.
The decision followed an actuarial consultant’s report about math errors in the company’s calculations. Alleged mistakes included overestimating future medical costs and double-counting the effect of aging on its policyholders.
The overestimation is thought to be related to a new requirement that health insurers spend “80% of their premium revenues on healthcare for plans that cover individuals and small businesses, and 85% for policies with large employers.”[2]
Health and Human Services Secretary Kathleen Sebelius has asked for a nationwide review of the health-related data on which costs are based, and she noted that $250 million has been allocated to states for that purpose.
Submitted by Margaret Cibes
Girls becoming less careful drivers
“Do Girls Speed More Than Boys?”
by Joseph B. White and Anjali Athavaley, The Wall Street Journal, May 5, 2010
Allstate Foundation has sponsored a survey that found that young women are not necessarily more responsible than young men – at least in their driving habits. An Allstate spokesman stated, “It would be fair to say the gap is closing,” although “teenage girls continue to be a better risk than boys.”
The data came from online interviews with 1,063 teens across the country in May 2009. About half of the girls felt that they are more likely to drive 10 mph over the speed limit or to phone/text while driving, compared to fewer than 40% of boys for each activity.
State Farm, the nation’s largest insurance company, charges 40% more for teenage boys than girls, down from a 1985 gap of 61%. While the company is raising the rates for teenage female drivers, it says that it is raising them based on claims experience and other factors, not these survey results.
For the current report, see “Shifting Teen Attitudes: 2009 State of Teen Driving”. For the 2005 report, see “Chronic - A Report on the State of Teen Driving”. Unfortunately, neither the questionnaires nor any raw data appear to be available online.
Discussion
1. What additional information about the survey would you like to know in order to draw any conclusion about the driving habits of all teenage girls and boys across the country?
2. Do you believe that self-reporting by teenagers is, in general, reliable? Would you expect a difference in reliability between teen boys’ and teen girls’ responses?
3. In the article, Progressive Direct quoted 6-month premiums for a boy and a girl in identical situations, including one speeding ticket apiece in the last 3 years, at $2,938 and $2,627, respectively. By what percent does the boy’s Progressive premium exceed the girl’s Progressive premium? How does it compare with the 40% figure cited above? Can you think of any reason(s) for the reported disparity?
4. The article referred to girls as becoming more “aggressive” drivers. If this is true, can you think of any reason(s) for this?
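For the arithmetic in question 3, a minimal sketch of the percent comparison follows. The two premiums are the figures quoted above; the function name is just a label chosen here, and the choice of the girl's premium as the base follows the wording "exceed the girl's premium".

```python
def percent_excess(higher, lower):
    """Percent by which `higher` exceeds `lower`, using `lower` as the base."""
    return 100 * (higher - lower) / lower

boy_premium = 2938   # 6-month Progressive quote for the boy, in dollars
girl_premium = 2627  # 6-month Progressive quote for the girl, in dollars

excess = percent_excess(boy_premium, girl_premium)
print(f"The boy's premium exceeds the girl's by {excess:.1f}%")  # 11.8%
```

Using the boy's premium as the base instead would give a smaller percentage, which is worth keeping in mind when comparing figures from different insurers.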
Submitted by Margaret Cibes
Black swan author linked to black swan event
“Did a Big Bet Help Trigger ‘Black Swan’ Stock Swoon?”
by Scott Patterson and Tom Lauricella, The Wall Street Journal, May 10, 2010
On May 6 the Dow Jones Industrial Average fell nearly 1,000 points in less than half an hour.[3] The decline was initially attributed to a trading error in which a Citigroup trader incorrectly keyed in a “b” instead of an “m” for an intended $16 million trade of Procter & Gamble stock. (This was referred to as a "fat finger" trading error in another article.) It was later reported by Reuters [4] that the rumor was untrue.
Apparently one of the key factors in the May 6 stock-market “collapse” was a large trade by Universa Investments during a day when “all varieties of financial markets were deeply unsettled.” Ironically, Universa is a hedge fund advised by the author of The Black Swan: The Impact of the Highly Improbable.
The trade by Universa, a hedge fund advised by Nassim Taleb … led traders on the other side of the transaction … to do their own selling to offset some of the risk …. The working theory among traders and others involved in the exchange meltdown is that the "Black Swan"-linked fund may have contributed to a "Black Swan" moment, a rare, unforeseen event that can have devastating consequences.
Submitted by Margaret Cibes
If you take away my time for your research, you owe me ten bucks
$63,000 worth of abusive research . . . or just a really stupid waste of time? Andrew Gelman on his Statistical Modeling, Causal Inference, and Social Science blog.
Two researchers, Katherine L. Milkman of the University of Pennsylvania and Modupe N. Akinola of Columbia University, wanted to use email to find out patterns in the responsiveness of professors to a request for their time. They sent out emails to 6,300 professors asking for an appointment for help in one of two possible time frames. When the professors responded, they recorded the results and then canceled the request for the appointment. Later they sent an email explaining that the original request was part of a research study.
One of the people who received this request, Andrew Gelman, did not take kindly to being part of a research study without first getting his consent and asked on his blog for $10 compensation. Later, he softened his anger about the research.
The Chronicle of Higher Education has a story on this research and the fuss it created in the academic community, though you may need a subscription to read the full article.
Questions
1. This research involved deception. Do you feel that deception should ever be allowed in ethical research? If so, under what conditions?
2. The study did not provide informed consent prior to getting subject participation in the study. Did that make the study unethical?
3. What would be the appropriate response to a subject who protested that he/she did not wish to participate in the research?
4. Is there a different way this study could have been conducted that would have avoided this controversy?
Submitted by Steve Simon
Deception and waste of time
To expand a bit on the previous post, deception in psychology is quite common. In fact, there exists an entire book devoted to the subject, Illusions of Reality: A History of Deception in Social Psychology, by James H. Korn. “Stanley Milgram [famous for obedience studies] used the term technical illusions because he thought the word deception had a negative moral bias”--italics in the original. Most people outside of the psychology realm recognize a convenient euphemism when they see one.
However, the researchers who so annoyed Andrew Gelman are not psychologists but business school assistant professors, who, unfortunately like their psychologist colleagues, have to publish and seek research which can be done inexpensively. The one thing that sets this research apart is that the duped individuals were faculty members rather than that customary captive, victimized class known as convenient undergraduates.
Nevertheless, Gelman’s initial reaction was to say “$63,000 worth of abusive research…or just a waste of time?” As noted above, he later modified his views but perhaps he should not have. The lay public is all too familiar with the “Lies, damned lies, and statistics” apocryphally attributed to Twain and Disraeli. Almost as well known is “If you torture the data enough, it will confess to anything.” Perhaps an even more serious condemnation of the use of statistics is that many studies which utilize statistics to justify their existence are just not worth undertaking despite their titillation value and low p-values. Intercessory prayer, lucky charms and extrasensory perception come readily to mind. Unfortunately, just these kinds of investigations resonate with journalists.
Discussion
1. Consider the following University of Michigan research as an illustration of a deception study. It is entitled “Washing Away Postdecisional Dissonance.” The technical illusion in this paper had to do with preference for a product--first, music CDs (40 undergraduates) and then, jam jars (85 undergraduates)-- when in fact the real interest was in hand washing afterwards to determine its effect on regret. Why did the Wall Street Journal choose to comment on it? Estimate the cost of doing the study. If inference to a larger population is desired, what would be the relevant larger population?
2. The hand washing study begins with the statement: “Hand washing removes more than dirt--it also removes the guilt of past misdeeds, weakens the urge to engage in compensatory behavior, and attenuates the impact of disgust on moral judgment.” Based on the music CDs and the jam jars, it concludes with hand washing “can also cleanse us from traces of past decisions, reducing the need to justify them.” How would you set up a different deceptive experiment to show whether or not this is true?
3. Quite apart from dubious statistics, deception can be a dangerous endeavor, as James O'Keefe might attest. After his initial success with duping ACORN, he overreached when he tried “to tamper with Democratic Sen. Mary Landrieu’s office phones” by posing as a telephone worker. More germane to this discussion, see the case of Francis Flynn. He wrote to 240 New York restaurants “claiming to have contracted food poisoning while dining at their establishments” in order “to help collect data for a research study he had developed to determine how restaurateurs responded to complaints.” Eventually, when his deception was found out, 10 of the restaurants “filed a $100 million class-action lawsuit against Flynn and the school [Columbia University], claiming libel and emotional distress.” That was about ten years ago; in spite of this, he has since been promoted and has moved to another coast.
4. There is a spectrum: explanation, euphemism, deception, fraud. For each of the following oft-seen advertisements, justify what category is applicable.
a. “Absolutely free. Shipping and handling charges may apply.”
b. “Up to 30% off.”
c. “For your convenience, dinner that night is not included.”
d. “The illustration shown on the cereal box is enlarged to better display the contents.”
e. “Taxes and fees are extra.”
f. “Premium quality.”
g. “No entrance or sign-up fee.”
h. “Limited supply only.”
Submitted by Paul Alper
Updated research reporting guidelines
“CONSORT 2010 Statement: Updated Guidelines for Reporting Parallel Group Randomised Trials”
An international group of healthcare researchers concerned about poorly conducted or poorly reported aspects of clinical trials originally proposed guidelines known as CONSORT (Consolidated Standards of Reporting Trials) in 1996. See the CONSORT group’s web site[5].
The aim of the statement is to assist authors, peer reviewers, and readers of medical studies in evaluating research results. The statement itself refers only to the transparency of the report, not to the quality of the research.
The web site provides downloadable flowcharts of (a) the phases of a parallel randomized trial of two groups and (b) a checklist of information to be included in a report about a randomized trial.
Submitted by Margaret Cibes
Animal magnetism
“Big Cats Obsess Over Calvin Klein's 'Obsession for Men'”
by Ellen Byron, The Wall Street Journal, June 8, 2010
In 2003 a Bronx Zoo curator experimented with 24 fragrances to see how the scents caught the attention of 2 cheetahs:
Estée Lauder's Beautiful occupied the cheetahs on average for just two seconds. Revlon's Charlie managed 15.5 seconds. Nina Ricci's L'Air du Temps took it up to 10.4 minutes. But the musky Obsession for Men triumphed: 11.1 minutes.
Other wildlife scientists have found Obsession helpful in attracting jaguars for photographing and study.
One of Obsession’s creators stated:
It's a combination of this lickable vanilla heart married to this fresh green top note—it creates tension…. The cologne also has synthetic "animal" notes like civet, a musky substance secreted by the cat of the same name, giving it particular sex appeal.
Submitted by Margaret Cibes
Sex discrimination lawsuit
Report warned Wal-Mart of risks before bias suit.
by Steven Greenhouse, New York Times, 3 June 2010
The article asserts that “Six years before the biggest sex discrimination lawsuit in history was filed against Wal-Mart Stores, the company hired a prominent law firm to examine its vulnerability to just such a suit. The law firm, Akin Gump Strauss Hauer & Feld, found widespread gender disparities in pay and promotion at Wal-Mart…”
The article quoted statistics provided by “the plaintiffs’ main expert, Richard Drogin, an emeritus statistics professor at California State University, East Bay, who examined payroll data from 1966 to 2002 that Wal-Mart provided in the case.” These showed “that among hourly workers in 2001, … women earned about $1,100, or 6 percent, less a year than men, while among salaried employees, women earned $14,500, or 26 percent, less.”
The article went on to state that “A study by Joan Haworth, an expert hired by Wal-Mart, disputed that analysis, finding that more than 90 percent of stores had no statistically significant pay differences between men and women.”
Discussion
1. Critique Joan Haworth’s reported analysis. Can you propose an analysis that would be more appropriate?
2. If you wanted to try to refute the claim of discrimination suggested by Professor Drogin’s analysis, how would you proceed?
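One way to explore question 1 is a small simulation of how store-by-store tests can behave when each store has few employees. The sketch below is not either expert's analysis: the number of stores, employees per store, pay levels, and spread are all invented, the same true gap is assumed in every store, and a normal approximation stands in for a proper t-test. Only the $1,100 gap echoes a figure from the article.

```python
import random
from statistics import NormalDist, mean, variance

def two_sided_p(men, women):
    """Approximate two-sample z-test p-value for a difference in mean pay."""
    se = (variance(men) / len(men) + variance(women) / len(women)) ** 0.5
    z = (mean(men) - mean(women)) / se
    return 2 * NormalDist().cdf(-abs(z))

def simulate(n_stores=500, per_group=20, gap=1100.0, sd=4000.0, seed=1):
    """Return (share of stores with p >= 0.05, p-value for all stores pooled).

    Every parameter is hypothetical; every store has the same true gap.
    """
    rng = random.Random(seed)
    all_men, all_women = [], []
    not_significant = 0
    for _ in range(n_stores):
        men = [rng.gauss(18000.0, sd) for _ in range(per_group)]
        women = [rng.gauss(18000.0 - gap, sd) for _ in range(per_group)]
        if two_sided_p(men, women) >= 0.05:
            not_significant += 1
        all_men += men
        all_women += women
    return not_significant / n_stores, two_sided_p(all_men, all_women)

share_not_significant, pooled_p = simulate()
print(f"Stores with no significant gap: {share_not_significant:.0%}")
print(f"Pooled p-value: {pooled_p:.2g}")
```

With these assumed inputs, most individual stores show no statistically significant difference even though a real gap is built into every one of them, while the pooled comparison detects it easily. That is a statement about the power of small-sample tests in this toy setup, not a finding about the actual payroll data.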
Submitted by Gerry Hahn
Betting on long odds in the long run
“Time Magazine Profiles Online Poker Math Brats”
by Dan Cypra, Poker News Daily, June 21, 2010
Cypra summarizes a June 28 TIME article (which requires a subscription to access). (Note that this summary article is dated prior to the print date of the TIME article.)
After his recent run of bad luck in poker tournaments, a former world champion commented:
[He] blames the new breed of math nerd, those guys using a mountain of sortable data from the millions of hands played online to dominate the game. “The reason I won 11 bracelets is my ability to read opponents,” he explains. “These new guys are focused on the math. And they are changing everything.”
The original TIME article explained that, while the “20 key probabilities” have not changed, these online players have developed an aggressive style based on the fact that “betting big on once-in-a-blue-moon odds will work – but only if you play often enough.” Some of these new contestants play 30 online tournaments a night, compared to the traditional players, who might play 30 live tournaments a year. The online players’ advantage thus lies in their seemingly unpredictable behavior in live tournaments.
[Another former champion] called today’s poker tournaments a “crapshoot.”
Discussion
1. What would you consider the 20 key probabilities?
2. What strategy might you adopt in “betting big on once-in-a-blue-moon odds” if you had compiled statistics on your success from doing it in multiple online games?
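The "only if you play often enough" remark can be made concrete with a short calculation. In this sketch the 1% chance that a long-shot bet pays off in any one tournament is a made-up number, tournaments are treated as independent, and the online player is assumed to play 30 tournaments every night of the year; only the "30 a night" and "30 a year" volumes come from the article.

```python
def prob_at_least_one(p, n):
    """Chance of at least one success in n independent tries, each with chance p."""
    return 1 - (1 - p) ** n

p_hit = 0.01                 # hypothetical chance a long-shot play pays off
live_per_year = 30           # traditional player: about 30 live tournaments a year
online_per_year = 30 * 365   # online player: 30 a night, assumed every night

live = prob_at_least_one(p_hit, live_per_year)
online = prob_at_least_one(p_hit, online_per_year)
print(f"Live player, at least one payoff in a year:   {live:.2f}")
print(f"Online player, at least one payoff in a year: {online:.2f}")
```

The calculation ignores the cost of the many losing bets, so it says nothing about whether the strategy is profitable; it only shows how volume changes the chance of seeing a rare payoff at all.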
Submitted by Margaret Cibes
Miscellaneous studies
“Beer and Mosquitoes”
Research based on 43 West African men and 4,300 mosquitoes concluded that “mosquitoes preferred the odor of beer drinkers to outdoor air by a margin of nearly 2 to 1.”
For a brief summary of results, see this topic[6] as one section of Jeremy Singer-Vine’s May 4 summary of current research projects, a regular feature of The Wall Street Journal. For the full May 2010 report of results, see “Beer Consumption Increases Human Attractiveness to Malaria Mosquitoes”, which includes an ethics statement, as well as methodology and statistics.
“The Falling Time Cost of College”
Research[7] based on a December 2009 study of the “academic time investment” of full-time U.S. college students for the period 1961-2003 showed a decline from 40 to 27 hours per week. The report includes discussions of framing effects, representativeness, and composition effects. The authors have another December 2009 report on this topic, “Leisure College, USA”.
"Mathematicians Take On Luck"
Carl Bialik follows up[8] on an earlier related Wall Street Journal article described in Chance News 63, "Lucky charms and disappointing journalism".
“Piano stairs”
Research[9] showed that the redesign of a subway station’s stairs led to more people choosing the stairs than the escalator. Enjoy the YouTube video!
Submitted by Margaret Cibes
A probability puzzle
The death at age 95 of the legendary puzzler Martin Gardner on May 22, 2010 resulted in many interesting stories that were inspired by his work. Here is one of these.
Alex Bellos, writing in the May 24 New Scientist, gives us a probability puzzle that appeared in one of the famous Gatherings for Gardner. It was proposed by Gary Foshee.
Here is the puzzle.
I have two children.
One is a boy born on Tuesday.
What is the probability that I have two boys?
Discussion Question
What is this probability?
(Note: additional discussion and comments can be found on Andrew Gelman's blog).
Submitted by Laurie Snell
The following comment was added by Emil Friedman:
Would the following be a good way to explain the general concepts discussed by Prof Gelman?
Let's start with no knowledge except that he has two children and with the simplification that boys and girls are equally probable. (Or recast it as coin flips: I flipped two coins; one was a head, and that flip was done on a Tuesday.) In that case the probability of two boys (or two heads) is 1/4. Now we are given more information. We are told that one of the flips was a head. That makes TT impossible, so the probability of two heads rises to 1/3.
We can use the above to reinforce what they have presumably learned earlier. If we are going to calculate probability by counting, we need each of the things we are counting to have equal probability. HH, HT, and TH have equal probability. HH and "one of each" do not.
A major idea to get across is that we started with minimal information and calculated an unconditional probability of 1/4. Adding additional information changed the probability and we call that "conditional probability".
The Tuesday information seems irrelevant at first, but consider that after eliminating the girl, girl families there are three equally likely possibilities for families whose first child was born on Tuesday and whose second child was born on Wednesday:
Tuesday boy, Wednesday boy
Tuesday boy, Wednesday girl
Tuesday girl, Wednesday boy
Of those three, girl then boy is eliminated but two boys are not eliminated. That increases the conditional probability of two boys. The same holds true for other day-of-week pairings except for the families where both or neither were born on Tuesday.
The most straightforward way to use the Tuesday information is tedious. We write down all the possibilities, and we have to make sure the students see that head (or boy) on Wednesday followed by tail (or girl) on Tuesday is not consistent with the extra information. That calculation is tedious, but some calculations are. We now point out that the extra information about Tuesday changes the probability from 1/3 to 13/27. Since 13/27 is not an intuitive number, we also need to point out that it's quite a bit more than 1/3 and is almost 1/2.
An alternate method to bring in the Tuesday information would be to build on the Tuesday then Wednesday families. We could say that there are 11 other analogous families. Each of those 12 has a 50/50 chance of having two boys. There are also the Tuesday, Tuesday families who have a 1/3 chance of having two boys. But I suspect that the alternate approach is easier to mess up.
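The tedious enumeration can be handed to a computer. The sketch below lists all 196 equally likely ordered families (sex and weekday for each of two children) and counts. It assumes boys and girls are equally likely, all seven weekdays are equally likely, the two children are independent, and the statement means "at least one child is a boy born on Tuesday"; other readings of how the information was obtained can give other answers. The variable names are arbitrary.

```python
from fractions import Fraction
from itertools import product

SEXES = ("B", "G")
DAYS = range(7)   # seven weekdays, assumed equally likely
TUESDAY = 2       # which index stands for Tuesday does not matter

children = list(product(SEXES, DAYS))         # 14 equally likely (sex, day) pairs
families = list(product(children, repeat=2))  # 196 ordered two-child families

def prob_two_boys(condition):
    """P(two boys | condition), found by counting equally likely families."""
    kept = [f for f in families if condition(f)]
    both_boys = [f for f in kept if all(sex == "B" for sex, _ in f)]
    return Fraction(len(both_boys), len(kept))

no_info = prob_two_boys(lambda f: True)
one_boy = prob_two_boys(lambda f: any(sex == "B" for sex, _ in f))
tuesday_boy = prob_two_boys(lambda f: any(c == ("B", TUESDAY) for c in f))

print(no_info, one_boy, tuesday_boy)  # 1/4 1/3 13/27
```

The last count matches the alternate method: 12 day pairings with exactly one Tuesday contribute 2 consistent families each (1 of them two boys), and the Tuesday, Tuesday pairing contributes 3 (1 of them two boys), for 13 out of 27.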
We might want to add one more thing that Prof Gelman would probably agree with. This particular problem is not a practical one, but (1) conditional probability comes up in many different fields of study (chemistry, physics, engineering, pharmaceutical manufacturing, etc), (2) thinking through this is good training for using conditional probability, (3) problem solving is important to everyone and thinking through this helps one hone his or her problem solving skills.