Chance News 75
Quotations
"I suspect the amygdala did not evolve to store odds ratios and heterogeneity P scores, but when an adverse event has prompted me to review the literature, I come away with a clearer understanding. There’s nothing like a baby free-floating in the abdomen to drive home the lessons from a prospective study of risk factors for uterine rupture. And that clarity of understanding will serve the next at-risk patient I encounter."
Submitted by Steve Simon.
"On the nature of common errors. ...[M]ost of these mixups involve simple switches or offsets. These mistakes are easy to make, particularly if working with Excel or if working with 0/1 labels instead of names... [A] mixup in annotation affecting roughly a third of the data which was driven by a simple one-cell deletion from an Excel file coupled with an inappropriate shifting up of all values in the affected column only."
Forensic bioinformatics and reproducible research in high-throughput biology.
Submitted by Paul Alper
“As every student of statistics quickly learns, statistical significance is not the same as substantive significance. … The latter is often addressed by going beyond the question of whether an association exists to ask how strong the relationship is. But there is an even more basic problem in this case – statistical significance tests are inappropriate for this data analysis. They can determine whether two values derived from a random sample are different enough that they cannot be attributed to chance (sampling error). But the FARS [Fatality Analysis Reporting System] data are not a sample of some larger data set; they represent the entire data set of fatal auto accidents. The authors [of this study] note this problem but choose to use this approach anyway, following ‘official recommendations and our previous practice.’”
STATS, July 14, 2011
Submitted by Margaret Cibes
"Coefficients for physical traits are on the average +0.2--not so high as for personality traits (+0.4) or religion (+0.9), but still significantly higher than zero. For a few physical traits the correlation is even higher that 0.2--e.g., an astonishing 0.61 for length of middle finger. At least unconsciously, people care more about their spouse's middle-finger length than about his or her hair color and intelligence!"
Submitted by Paul Alper
(Note: See 2D:4D in Chance News 44 for some other musings about the mysteries of finger length distributions.)
Forsooth
Freshman composition, for example, “does not demand that faculty ask existential questions.” Ditto for courses in “Security and Protective Services,” and “Business Statistics.” These are, she says, “fields of study with fairly definitive answers” and it would be hard to argue that they are “essential to civilization.” Those who teach these and similarly vocational subjects “don’t really need the freedom to ask controversial questions in discussing them.”
Naomi Schaefer Riley, The Faculty Lounges: and Other Reasons Why You Won't Get the College Education You Paid For
as quoted in Vocationalism, academic freedom and tenure, by Stanley Fish, New York Times, 11 July 2011
Submitted by Paul Alper
“Each of [spacecraft] Juno’s three [solar-powered] wings is 29 feet long and 9 feet wide, necessary given that Jupiter receives 25 times less sunlight than Earth.”
Marcia Dunn (Associated Press), Solar-powered Jupiter probe brings NASA into ‘new era’
published in The Tennessean, August 2, 2011, page 3A.
Submitted by Emil Posavac
"The Bureau of Labor Statistics (BLS) projects that the U.S. will add 15.3 million new jobs between 2008 and 2018, and a whopping 15 out of 30 jobs with the most projected openings and vacancies will pay wages that are above the national median wage for all workers in the United States."
Christian Hudspeth, "Employees Wanted: 10 Middle-Class Jobs That Are Actually Growing", August 3, 2011 [1]
Submitted by Chris Andrews
Discussion of Ariely
A post in Chance News 74 described Dan Ariely's 2008 book Predictably Irrational: The Hidden Forces That Shape Our Decisions as a great summer read [2], while pointing out that it was not written as an academic work. Paul Alper wrote to say that he had occasion to review the book in the context of some related work, and had identified some statistical concerns. As Paul writes:
Ariely enjoys concocting experiments to demonstrate this irrationality. For example, he finds that satisfaction with a product depends on the price paid for it--Bayer aspirin vs. the identical generic, say. Or the enticing but utterly misleading “Free gift” will alter a decision. Reviewers loved his book. Nonetheless, there are some serious shortcomings.
- He invariably gives the average value of one group (e.g., satisfaction of Bayer aspirin users) compared to the other group (e.g., satisfaction of generic aspirin users), but he almost never indicates the variability. Averages alone are meaningless.
- Almost never does he state how many subjects are involved in each arm of a study.
- Almost all of his samples are convenience ones, rather than random samples.
- Almost all of his samples are MIT students, but his implicit inference is to the world at large.
- His examples of predictable irrationality appear unfailingly successful, leading me to suspect a “file-drawer” issue--experiments which showed nothing in particular, or the negative of what he theorizes, are put aside and not counted.
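To see why the first of these points matters, here is a minimal Python sketch with invented satisfaction ratings (none of the numbers below come from Ariely's experiments). A reported difference in group averages can only be weighed against chance variation if the spread and the sample sizes are also known:

```python
import random
import statistics

def fake_ratings(mu, sigma, n, seed):
    """Invented satisfaction ratings; nothing here is Ariely's data."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

brand   = fake_ratings(mu=7.0, sigma=2.5, n=12, seed=1)   # "Bayer" group
generic = fake_ratings(mu=6.2, sigma=2.5, n=12, seed=2)   # generic group

diff = statistics.mean(brand) - statistics.mean(generic)
se = (statistics.variance(brand) / len(brand) +
      statistics.variance(generic) / len(generic)) ** 0.5
print(f"difference in means: {diff:.2f}")
print(f"standard error of the difference: {se:.2f}")
# Whether the gap in averages is impressive depends entirely on the
# spread and the group sizes, numbers a bare comparison never reveals.
```

With a standard deviation of 2.5 and only a dozen subjects per arm, the standard error of the difference is about one full rating point, so even a seemingly healthy gap in averages can be indistinguishable from chance.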
The earlier post also noted that Ariely has a new book, The Upside of Irrationality: The Unexpected Benefits of Defying Logic at Work and at Home. This was reviewed by the New York Times; you can find a link to the review and read Ariely's reaction on his blog here. He notes that he had consciously adopted a more conversational style in the book, and that this had drawn some criticism from the Times. He invited readers to submit their own opinions. Readers come down on both sides, and it is interesting to read the comments. One statistically minded reader wrote:
Of course your [sic] irrationally asking for personal thoughts in comments instead of a (slightly) more accurate poll or a (very) accurate scientific survey.
Submitted by Bill Peterson, based on a message from Paul Alper
The perils of genetic testing
How Bright Promise in Cancer Testing Fell Apart by Gina Kolata, The New York Times, July 7, 2011.
We have seen a lot of advances in genetics recently, and there has been hope that these would translate into better clinical care. But building the bridge from the laboratory to clinical practice has been much more difficult than expected. A program at Duke, for example, was supposed to identify weak spots in a cancer genome so that drugs could be targeted to that weak spot rather than just trying a range of different drugs in sequence. The article follows one patient, Ms. Jacobs, who was treated on the basis of this research.
But the research at Duke turned out to be wrong. Its gene-based tests proved worthless, and the research behind them was discredited. Ms. Jacobs died a few months after treatment, and her husband and other patients’ relatives are suing Duke.
The problems at Duke were not an isolated case.
The Duke case came right after two other claims that gave medical researchers pause. Like the Duke case, they used complex analyses to detect patterns of genes or cell proteins. But these were tests that were supposed to find ovarian cancer in patients’ blood. One, OvaSure, was developed by a Yale scientist, Dr. Gil G. Mor, licensed by the university and sold to patients before it was found to be useless.
The other, OvaCheck, was developed by a company, Correlogic, with contributions from scientists from the National Cancer Institute and the Food and Drug Administration. Major commercial labs licensed it and were about to start using it before two statisticians from M. D. Anderson discovered and publicized its faults.
The two statisticians, Keith Baggerly and Kevin Coombes, have made a career of debunking medical claims. In 2004, they (along with another M.D. Anderson statistician, Jeffrey Morris) published a paper that demonstrated serious flaws in the use of proteomic mass spectra to distinguish early ovarian cancer from normal tissue. The complex method proposed by Petricoin et al. in 2002 was apparently an artifact of equipment drift that would have been prevented if the original researchers had taken simple steps like randomizing the order in which cancer and normal tissues were analyzed.
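To make the run-order problem concrete, here is a minimal simulation sketch (our own toy example, not Baggerly and Coombes's actual analysis): two groups with no real difference are measured on an instrument whose baseline drifts upward, first with all cancer samples run before all normals, then with the run order randomized.

```python
import random

def group_difference(randomize, n=100, drift=0.02, seed=0):
    """Measure 2*n specimens with NO real group difference on an
    instrument whose baseline drifts upward over the course of the run."""
    rng = random.Random(seed)
    labels = ["cancer"] * n + ["normal"] * n   # blocked run order
    if randomize:
        rng.shuffle(labels)                    # randomized run order
    # measurement = pure noise + drift proportional to run position
    values = [rng.gauss(0, 1) + drift * i for i, _ in enumerate(labels)]
    avg = lambda g: (sum(v for v, l in zip(values, labels) if l == g)
                     / labels.count(g))
    return avg("cancer") - avg("normal")

print("blocked order   :", round(group_difference(randomize=False), 2))
print("randomized order:", round(group_difference(randomize=True), 2))
# The blocked order manufactures a difference of about -2 units out of
# pure drift; randomizing the run order makes it essentially vanish.
```

Randomizing the run order breaks the link between group membership and position in the run, so any drift affects both groups equally on average.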
Baggerly and Coombes had also found problems with the data supporting the Duke test.
Dr. Baggerly and Dr. Coombes found errors almost immediately. Some seemed careless — moving a row or a column over by one in a giant spreadsheet — while others seemed inexplicable. The Duke team shrugged them off as "clerical errors."
Even though Baggerly and Coombes published a critique in a statistics journal, the Duke team continued to promote their genetic test. It was something else entirely that caused the problems with the Duke test to be treated seriously by the broader research community.
The situation finally grabbed the cancer world’s attention last July, not because of the efforts of Dr. Baggerly and Dr. Coombes, but because a trade publication, The Cancer Letter, reported that the lead researcher, Dr. Potti, had falsified parts of his résumé. He claimed, among other things, that he had been a Rhodes scholar.
Researchers in this area have a new-found sense of humility.
With such huge data sets and complicated analyses, researchers can no longer trust their hunches that a result does — or does not — make sense.
Submitted by Steve Simon
In measuring hunger, quality may be more important than quantity
A Taste Test for Hunger by Robert Jensen and Nolan Miller, The New York Times, July 9, 2011.
The traditional measure of global hunger is the number of calories consumed: if you consume fewer calories than you need, you are classified as hungry. But this has led to some paradoxical results. There is an alternative way of measuring hunger. You need to
start with a baseline, namely the share of calories people get from the cheapest foods available to them: typically staples like rice, wheat or cassava. We call this the “staple calorie share.” We measure how many calories people get from these low-cost foods and how much they get from more expensive foods like meat. The greater the share of calories they receive from the former, the hungrier they are.
The rationale for this is simple.
Imagine you are a poor consumer in a developing country. You have very little money in your pocket, not enough to afford all the calories you need. And suppose you have only two foods to choose from, rice and meat. Rice is cheap and has a lot of calories, but you prefer the taste of meat. If you spent all your money on meat, you would get too few calories. You might do this every so often, but usually you would spend almost all of your money on rice; when faced with true hunger, taste is a luxury you can’t afford.
But suppose you had a bit more money. You would probably add some meat to your diet, because now you can afford to do so while still getting the calories you need. You might even like meat so much that you start to switch away from rice even if you haven’t quite met your complete calorie needs, as long as you aren’t too far below.
The authors argue that this approach removes some of the variations associated with the traditional calorie count method: some people need more calories than others, for example. They illustrate how the new measure of hunger performs better than the traditional measure in explaining trends in hunger in China from 1991 to 2001.
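To make the bookkeeping concrete, here is a minimal Python sketch of the staple calorie share for one hypothetical household; all numbers are invented for illustration and do not come from Jensen and Miller's data.

```python
# Invented daily calorie intake for one hypothetical household.
diet = {"rice": 1600, "vegetables": 250, "meat": 150}   # kcal per day
staples = {"rice"}   # the cheapest calories available in this setting

total_kcal = sum(diet.values())
staple_share = sum(kcal for food, kcal in diet.items()
                   if food in staples) / total_kcal
print(f"staple calorie share: {staple_share:.0%}")
# 80% here; by Jensen and Miller's logic, a household this dependent on
# the cheapest calories is treating taste as a luxury it cannot afford.
```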
Submitted by Steve Simon
Questions
1. What are some other ways that you might assess hunger on a global scale?
2. What are some possible pitfalls in the use of this new measure of hunger?
More on US visa lottery
"Plaintiffs Lose Fight Over Visa Lottery"
by Miriam Jordan, The Wall Street Journal, July 15, 2011
Applicants who had been notified that they had won a US visa in the May 2011 lottery drawing have lost a federal court case in which they tried to stop a new drawing of the 2011 lottery. (See Chance News 74[3].)
It turns out that the May sampling process had violated a legal requirement of randomness: "a computer error caused 90% of the 50,000 winners to be selected from the first two days of applications instead of from the entire 30-day registration period."
The court hearing earlier this week focused on the meaning of "random." Lawyers for the plaintiffs argued that their clients had been randomly chosen and did not know that, by filing their applications in the first two days, they would gain any advantage. They also contended that the "outcome was indeed not uniform, but nevertheless still random as required by law."
The State Department argued that the results didn't represent a fair, random selection.
This example reminds me of the 1970 Vietnam War draft lottery, whose problems stemmed from sampling from an inadequately stirred-up jar of birthday capsules. For this first draft lottery since 1942, 366 capsules, one for each possible birthday, were dumped from a box into a glass jar and drawn out one at a time. The result was that dates later in the year (fall and winter birthdays) were somewhat more likely to be drawn early and, consequently, their holders somewhat more likely to be drafted into military service. See a 10-minute video of the actual Fall 1969 drawing for the 1970 draft, “CBS news draft lottery nov 1969”, and/or Norton Starr's 1997 article, "Nonrandom Risk: The 1970 Draft Lottery", for the raw data and a statistical discussion of it.
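For readers who want to quantify "inadequately stirred-up": under a fair drawing, the rank correlation between day of the year and order of the draw should be close to zero. The Python sketch below simulates the null distribution of that correlation for a well-mixed jar; it is our own illustration, not Starr's analysis.

```python
import random

def rank_corr(perm):
    """Spearman correlation between draw order 0..n-1 and a permutation of it."""
    n = len(perm)
    d2 = sum((i - p) ** 2 for i, p in enumerate(perm))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# A perfectly mixed jar corresponds to a uniform random permutation
# of the 366 birthdays.
n, reps = 366, 10_000
null = sorted(rank_corr(random.sample(range(n), n)) for _ in range(reps))
lo, hi = null[int(0.025 * reps)], null[int(0.975 * reps)]
print(f"95% of well-mixed drawings give a correlation between "
      f"{lo:.3f} and {hi:.3f}")   # roughly -0.10 to +0.10
```

Starr's article supplies the actual dates and draft numbers, so readers can compute the observed 1970 correlation and see how far outside this range it falls.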
Submitted by Margaret Cibes
Questions
1. The article does not specify the daily distribution of applications over the 30-day application period.
(a) If the daily distribution of applications had been uniform over the 30 days, what percent of the 50,000 winners would you have expected to come from the first two days under a truly random selection process? (See the simulation sketch following these questions.)
(b) If, instead, the distribution of applications had been so skewed that an overwhelming majority were received during the first two days, might the 22,000 announced "winners" have been consistent with a random selection?
2. Would you consider the May result an "error"? Would you attribute the "error" to a "computer"?
3. Might you have agreed with the plaintiffs that the May outcome was "still random"?
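For Question 1(a), here is a minimal simulation sketch under the assumption (ours, not the article's, which gives no daily figures) that applications arrived uniformly at 10,000 per day:

```python
import random

days, per_day = 30, 10_000          # hypothetical uniform application flow
pool = [day for day in range(1, days + 1) for _ in range(per_day)]

winners = random.sample(pool, 50_000)
share = sum(1 for d in winners if d <= 2) / len(winners)
print(f"winners from the first two days: {share:.1%}")   # about 2/30 = 6.7%
```

Under these assumptions a random draw puts only about 6.7% (that is, 2/30) of the winners in the first two days, a long way from the reported 90%.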
Note: See Diversity Visa Lottery 2011 Results for official figures and other information from the U.S. State Department.
More cancer risk for taller women?
“Cancer Risk Increases With Height”
by Sten Stovall, The Wall Street Journal, July 21, 2011
British researchers studied 1.3 million middle-aged women in Britain during the period 1996-2001 and reported their results in The Lancet Oncology, August 2011. (See an abstract of the article, “Height and cancer incidence in the Million Women Study”.) Their finding – that height was positively correlated with cancer risk – is apparently consistent with the results of earlier studies around the world.
Stovall notes in the article that height depends upon factors such as diet and infections in childhood, growth hormone levels, and genetics. Accordingly, in a commentary accompanying the article, another researcher noted:
In the future, researchers need to explore the predictive capacities of direct measures of nutrition, psychosocial stress, and illness during childhood, rather than final adult height.
A blogger[4] suggested that “a possible reason for any correlation between height and cancer risk could be simply that the taller you are, the bigger your organs are and the more cells they have; hence, the greater the probability that one of those cells will become cancerous.”
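The blogger's argument can be made slightly more precise with a back-of-the-envelope model (ours, not the study's or the blogger's): if the body contains \(N\) susceptible cells and each independently becomes cancerous over a lifetime with some small probability \(p\), then

\[
\Pr(\text{cancer}) \;=\; 1 - (1 - p)^{N} \;\approx\; N p \quad \text{for small } p,
\]

so the risk grows roughly in proportion to the number of cells, and hence with body size. This is only a caricature; it ignores, among other things, the body-mass issue raised in Question 3 below. Still, it shows why the blogger's suggestion is at least plausible.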
Submitted by Margaret Cibes
Questions
1. Stovall uses the term “risk,” while the journal article abstract contains the term “relative risk.” What’s the difference, if any? Does it change his interpretation of the research results?
2. What statistical term would you use to refer to the other “direct measures,” with respect to inferring causation in the relationship between height and cancer risk, or relative risk?
3. Responding to the blogger, one of the authors of the study replied that they did discuss the issue of body mass in their Lancet article. If you can access the full text of the article, look it up and see if you can tell whether the blogger’s point was valid.
Airline’s random boarding strategy
“Airlines Go Back to Boarding School to Move Fliers Onto Planes Faster”
by Scott McCartney, The Wall Street Journal, July 21, 2011
… American Airlines undertook a two-year study to try and speed up boarding. The result: The airline has recently rolled out a new strategy—randomized boarding. Travelers without elite status get assigned randomly to boarding groups instead of filing onto planes from back to front.
The article discusses problems with other boarding strategies and reports that AA expects to be able to “shave three to four minutes off the average boarding time of 20 to 25 minutes.”
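For readers who want to experiment, here is a deliberately crude boarding simulation in Python (entirely our own toy model, not American's study): one seat per row, passengers walk one row per tick, and each blocks the aisle for a few ticks while stowing luggage. It ignores seat interference within rows, so treat the output as an invitation to explore rather than a validation of the airline's numbers.

```python
import random

def boarding_time(order, stow_ticks=3):
    """Toy single-aisle plane: one seat per row; a passenger walks one row
    per tick and blocks the aisle at his own row while stowing luggage."""
    aisle = {}                 # aisle position -> (target_row, stow_left)
    queue = list(order)        # passengers waiting to enter the plane
    seated, t = 0, 0
    while seated < len(order):
        t += 1
        for pos in sorted(aisle, reverse=True):   # move front of plane first
            row, stow = aisle[pos]
            if pos == row:                        # at own row: stow, then sit
                if stow > 1:
                    aisle[pos] = (row, stow - 1)
                else:
                    del aisle[pos]
                    seated += 1
            elif pos + 1 not in aisle:            # step forward if clear
                del aisle[pos]
                aisle[pos + 1] = (row, stow)
        if queue and 1 not in aisle:              # next passenger boards
            aisle[1] = (queue.pop(0), stow_ticks)
    return t

def zoned_back_to_front(rows, zones=4):
    """Board in zones from the back, in random order within each zone."""
    size = len(rows) // zones
    order = []
    for z in range(zones, 0, -1):
        zone = rows[(z - 1) * size : z * size]
        random.shuffle(zone)
        order += zone
    return order

rows = list(range(1, 121))            # a 120-row caricature of a jet
random_order = rows[:]
random.shuffle(random_order)
print("zoned back-to-front:", boarding_time(zoned_back_to_front(rows)), "ticks")
print("fully random       :", boarding_time(random_order), "ticks")
```

Simulation studies of airplane boarding have generally found that spreading stowing activity along the whole aisle (random order) beats concentrating it zone by zone, which is consistent with what American reports here.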
Submitted by Margaret Cibes
A "Datelab" graphic
Date Lab: By the numbers
Washington Post, 28 July 2011
These data go with a story in the Lifestyle section. One of the opening paragraphs tells it all:
Spark, chemistry, x-factor, zing — curse you, whatever you are. For five years, the editors of Date Lab have been sending single Washingtonians out on blind dates and recounting the highs (and lows) for our readers. And after wining and dining 250 couples — 20-somethings, senior citizens, straight, gay, rich, not-so-rich, and nearly every race on the Census form — the one thing we know is this: No matter how perfect a pair seems on paper, no matter how sure we are that the spark will ignite, there’s no predicting when that indefinable romantic catalyst will show itself.
The graphic summarizes the responses of past participants to a survey about their experiences:
http://community.middlebury.edu/~wpeterso/Chance_News/images/CN75_datelab.png
Click here for full-size image from Washington Post.
Alas, the disclaimer: "And just like our matching process, it was unscientific. Results reflect responses from about 35 percent of past participants."
Submitted by Paul Alper