Chance News 83

From ChanceWiki

Quotations

“A poll is not laser surgery; it’s an estimate.”

ABC News polling director in “MOE and Mojo”
ABC Blogs, December 3, 2007

Submitted by Margaret Cibes


"The most famous result of Student’s experimental method is Student’s t-table. But the real end of Student’s inquiry was taste, quality control, and minimally efficient sample sizes for experimental Guinness – not to achieve statistical significance at the .05 level or, worse yet, boast about an artificially randomized experiment."

--Stephen T. Ziliak, in W.S. Gosset and some neglected concepts in experimental statistics: Guinnessometrics II

(Ziliak is the co-author of The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives)

Submitted by Bill Peterson


“[W. S. Gosset] wrote to R. A. Fisher of the t tables, ‘You are probably the only man who will ever use them’ (Box 1978).”

“[W]e see the data analyst's insistence on ‘letting the data speak to us’ by plots and displays as an instinctive understanding of the need to encourage and to stimulate the pattern recognition and model generating capability of the right brain. Also, it expresses his concern that we not allow our pushy deductive left brain to take over too quickly and perhaps forcibly produce unwarranted conclusions based on an inadequate model.”

George Box in “The Importance of Practice in the Development of Statistics”
Technometrics, February 1984

Thomas L. Moore recommended this article in an ISOSTAT posting. (It is available in JSTOR.)


We are familiar with George Box’s famous statement: “All models are wrong but some are useful.” Here is another variant, cited in Wikipedia:

“Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.”

Submitted by Margaret Cibes

(Note. For an interesting discussion revisiting this theme, see All models are right, most are useless on Andrew Gelman's blog, 4 March 2012).


“There are two kinds of statistics: the kind you look up and the kind you make up.”

--attributed to Rex Stout, American writer (1886 - 1975)

Submitted by Paul Alper



“Definition of Statistics: The science of producing unreliable facts from reliable figures.”

"The only science that enables different experts using the same figures to draw different conclusions."

--attributed to Evan Esar, American humorist (1899–1995)

Submitted by Paul Alper


“Science involves confronting our ‘absolute stupidity’. That kind of stupidity is an existential fact, inherent in our efforts to push our way into the unknown. …. Focusing on important questions puts us in the awkward position of being ignorant. One of the beautiful things about science is that it allows us to bumble along, getting it wrong time after time, and feel perfectly fine as long as we learn something each time. …. The more comfortable we become with being stupid, the deeper we will wade into the unknown and the more likely we are to make big discoveries.”

UVa scientist Martin Schwartz in “The importance of stupidity in scientific research”
Journal of Cell Science, 2008

Submitted by Margaret Cibes

Forsooth

“In the first four months [at the new Resorts World Casino New York City], roughly 25,000 gamblers showed up every day, shoving a collective $2.3 billion through the slots and losing $140 million in the process. …. Resorts World offers more than 4,000 slot machines, but thanks to state law, there are no traditional card tables.”

“The Gamblers’ New Game”
The Wall Street Journal, February 18, 2012

Submitted by Margaret Cibes
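
The casino figures quoted above can be turned into per-visit amounts with a little arithmetic. The following is only a rough sketch: the 120-day length of "four months" is an assumption not stated in the article, and the amount put through slot machines usually counts the same dollars more than once when winnings are re-bet.

```python
# Figures quoted in the Wall Street Journal excerpt.
gamblers_per_day = 25_000
total_wagered = 2.3e9   # dollars put through the slots
total_lost = 140e6      # dollars lost by gamblers

# ASSUMPTION (not in the source): "four months" is taken as about 120 days.
days = 120

hold = total_lost / total_wagered        # share of wagers kept by the casino
visits = gamblers_per_day * days         # gambler-days over the period
wagered_per_visit = total_wagered / visits
lost_per_visit = total_lost / visits

print(f"Share of wagers lost: {hold:.1%}")
print(f"Wagered per gambler-day: ${wagered_per_visit:,.0f}")
print(f"Lost per gambler-day: ${lost_per_visit:,.0f}")
```

Under the 120-day assumption, this works out to roughly $767 wagered and $47 lost per gambler per day, with about 6 percent of the amount wagered being lost.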


[Image: Elderly drivers.jpg]

“Drivers 85 and older still have a higher rate of deadly crashes than any other age group except teenagers.”

(The article also describes two women who have learned to "compensate" for their macular degeneration in various ways - not necessarily welcome news!)

“Safer Over 70: Drivers Keep the Keys”
The Wall Street Journal, February 29, 2012

Submitted by Margaret Cibes


[Image: PieChartMost.png]

See the "Observation" at the top of the chart.

“The Meaning of most”, downloaded from Junk Charts, March 1, 2012

originally cited in “Mobile vs. Desktop”, KISSmetrics

Submitted by Margaret Cibes


“Here is the rub: Apple is so big, it’s running up against the law of large numbers. Also known as the golden theorem, with a proof attributed to the 17th-century Swiss mathematician Jacob Bernoulli, the law states that a variable will revert to a mean over a large sample of results. In the case of the largest companies, it suggests that high earnings growth and a rapid rise in share price will slow as those companies grow ever larger.”

James Stewart in “Confronting a Law of Limits”
The New York Times, February 24, 2012

Bill Peterson found Andrew Gelman’s comments[1] about this article.

Submitted by Margaret Cibes
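
For comparison with the quoted description: as usually stated, the law of large numbers is about averages. The mean of many independent draws from one fixed distribution settles near that distribution's expected value. A minimal simulation with fair coin flips illustrates this. The function name, seed, and sample sizes are illustrative choices, not anything from the article.

```python
import random

def fraction_of_heads(n_flips, seed=0):
    """Return the fraction of heads in n_flips simulated fair coin flips."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The sample fraction tends to move closer to the expected value 0.5
# as the number of independent flips grows.
for n in (10, 1_000, 100_000):
    print(n, fraction_of_heads(n))
```

Nothing in this statement concerns a single quantity, such as one company's earnings, slowing down as it gets larger.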

Kaiser Fung on Minnesota’s ramp meters

A number of references to Kaiser Fung’s book, Numbers Rule Your World, appear in Chance News 82. From a Minnesotan’s point of view, however, the most important topic he discusses is not hurricanes, not drug testing, and not bias in standardized testing. Rather, the most critical issue is ramp metering as a means of improving traffic flow, relieving congestion and reducing travel time on Minnesota highways. “Industry experts regard Minnesota’s system of 430 ramp meters as a national model.”

Unfortunately, “perception trumped reality.” An influential state senator, Dick Day, now a lobbyist for gambling interests, “led a charge to abolish the nationally recognized program, portraying it as part of the problem, not the solution.”

Leave it to Senator Day to speak the minds of “average Joes”--the people he meets at coffee shops, county fairs, summer parades, and the stock car races he loves. He saw ramp metering as a symbol of Big Government strangling our liberty.

In the Twin Cities, drivers perceived their trip times to have lengthened [due to the ramp meters] even though in reality they have probably decreased. Thus, when in September 2000, the state legislature passed a mandate requiring MnDOT [Minnesota Department of Transportation] to conduct a “meters shutoff” experiment [of six weeks], the engineers [who devised the metering program] were stunned and disillusioned.

To make a long story short, when the ramp meters came back on, it turned out that:

[T]he engineering vision triumphed. Freeway conditions indeed worsened after the ramp meters went off. The key findings, based on actual measurements were as follows:

  • Peak freeway volume dropped by 9 percent.
  • Travel times rose by 22 percent, and the reliability deteriorated.
  • Travel speeds declined by 7 percent.
  • The number of crashes during merges jumped by 26 percent.

“The consultants further estimated that the benefits of ramp metering outweighed costs by five to one.” Nevertheless, the above objective measures had to continue to battle subjective ones:

Despite the reality that commuters shortened their journeys if they waited their turns at the ramps, the drivers did not perceive the trade-off to be beneficial; they insisted that they would rather be moving slowly on the freeway than coming to a standstill at the ramp.

Accordingly, the engineers decided to modify the optimum solution to take into account driver psychology. “When they turned the lights back on, they limited waiting time on the ramps to four minutes, retired some unnecessary meters, and also shortened the operating hours.” Said differently, the constrained optimization model the engineers first considered left out some pivotal constraints.

Discussion

1. Do a search for “behavioral economics” to see the prevalence of irrational perceptions and subjective calculations in the economic sphere.

2. Fung discusses an allied, albeit inverse, problem of waiting-time misconception. This instance concerns Disney World and its popular so-called FastPass as a means of avoiding queues. According to Fung:

Clearly, FastPass users love the product--but how much waiting time can they save? Amazingly, the answer is none; they spend the same amount of time waiting for popular rides with or without FastPass! … So Disney confirms yet again that perception trumps reality. The FastPass concept is an absolute stroke of genius; it utterly changes perceived waiting times and has made many, many park-goers very, very giddy.

3. An oft-repeated and perhaps apocryphal operations research/statistics/decision theory anecdote has to do with elevators in a very large office building. Employees complained about excessive waiting times because the elevators all too frequently seemed to be in lockstep. Any physical solution such as creating a new elevator shaft or installing a complicated timing algorithm would be very expensive. The famous and utterly inexpensive psychological solution whereby perception trumped reality was to put in mirrors so that the waiting time would seem less because the employees would enjoy admiring themselves in the mirrors. Note that older and more benighted operations research/statistics/decision theory textbooks would have used the word “women” instead of “employees” in the previous sentence.

4. A very modern and frustrating example of perception again trumping reality can often be observed in supermarkets which have installed self-checkout lanes without placing a limit on the number of items per shopper. In order to avoid a line at the regular checkout, some shoppers with an extremely large number of items will often choose the self-checkout and take much longer to finish than if they had queued at the regular checkout. Explain why said shoppers psychologically might prefer to persist in that behavior despite evidence to the contrary. Why don’t supermarkets simply limit the number of items per customer at self-checkout lanes?

Submitted by Paul Alper

Don’t forget Chebyshev

Super Crunchers, by Ian Ayres, Random House, 2007

When I taught at Stanford Law School, professors were required to award grades that had a 3.2 mean. …. The problem was that many of the students and many of the professors had no way to express the degree of variability in professors’ grading habits. …. As a nation, we lack a vocabulary of dispersion. We don’t know how to express what we intuitively know about the variability of a distribution of numbers. The 2SD [2 standard-deviation] rule could help give us this vocabulary. A professor who said that her standard deviation was .2 could have conveyed a lot of information with a single number. The problem is that very few people in the U.S. today understand what this means. But you should know and be able to explain to others that only about 2.5 percent of the professor’s grades are above 3.6. [pp. 221-222]

Discussion

1. Suppose that a professor's awarded grades had mean 3.2 and SD 0.2.
(a) Under what condition could we say that “only about 2.5 percent of the professor’s grades are above 3.6”?
(b) Without that condition, what could we say, if anything, about the percent of awarded grades outside of a 2SD range about the mean? About the percent of awarded grades above 3.6?
2. Suppose that a professor's raw grades had mean 3.2 and SD 0.2. Do you think that this would be a realistic scenario in most undergraduate college classes? In most graduate-school classes? Why or why not?
3. How could a professor construct a distribution of awarded grades with mean 3.2 and SD 0.2, based on raw grades, so that one could say that only about 2.5 percent of the awarded grades are above 3.6? What effect, if any, could that scaling have had on the worst – or on the best – raw grades?
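
As a hedged illustration of the contrast the section title points to, the sketch below compares the normal-curve tail behind the 2SD rule with bounds that hold for any distribution with a finite mean and SD: Chebyshev's inequality (at most 1/k² of values lie k or more SDs from the mean) and its one-sided version, Cantelli's inequality (at most 1/(1+k²) lie k or more SDs above the mean). The mean, SD, and cutoff come from the Ayres passage; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 3.2, 0.2       # values from the Ayres passage
cutoff = 3.6
k = (cutoff - mean) / sd  # number of SDs above the mean (2, up to rounding)

# If the awarded grades were roughly normal:
normal_upper_tail = 1 - NormalDist(mean, sd).cdf(cutoff)

# With no assumption about the shape of the distribution:
chebyshev_two_sided = 1 / k**2       # bound on share outside mean +/- k SD
cantelli_one_sided = 1 / (1 + k**2)  # bound on share above mean + k SD

print(f"Normal model, share above {cutoff}: {normal_upper_tail:.1%}")
print(f"Chebyshev bound, share outside 2 SD: {chebyshev_two_sided:.0%}")
print(f"Cantelli bound, share above {cutoff}: {cantelli_one_sided:.0%}")
```

The normal model gives a bit over 2 percent above 3.6, in line with the "about 2.5 percent" in the quotation. The distribution-free bounds only guarantee at most 25 percent outside the 2SD range and at most 20 percent above 3.6.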

Submitted by Margaret Cibes

Critique of women-in-science statistics

“Rumors of Our Rarity are Greatly Exaggerated: Bad Statistics About Women in Science”
by Cathy Kessel, Journal of Humanistic Mathematics, July 2011

Based on her apparently extensive and detailed study of reports about female-to-male ratios with respect to STEM abilities/careers, Kessel discusses three major problems with the statistics cited in them, as well as with the repetition of these questionable figures in subsequent academic and non-academic reports.

Whatever their origins, statistics which are mislabeled, misinterpreted, fictitious, or otherwise defective remain in circulation because they are accepted by editors, readers, and referees.

“The Solitary Statistic.” A 13-to-1 boy-girl ratio in SAT-Math scores has been widely cited since it appeared in a 1983 Science article. That ratio was based on the scores of 280 seventh- and eighth-graders who scored 700 or above on the test over the period 1980-83. These students were part of a total of 64,000 students applying for a Johns Hopkins science program for exceptionally talented STEM-potential students. Kessel faults the widespread references to this outdated data, among other issues, and she cites more recent statistics at Hopkins and other such programs, including a ratio as low as 3 to 1 in 2005.
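
To see how small the group behind the "solitary statistic" is, the figures above can be unpacked with simple arithmetic. This sketch assumes the 13-to-1 ratio applies exactly to the 280 top scorers described in the summary.

```python
top_scorers = 280     # students scoring 700 or above
applicants = 64_000   # all students applying to the program
boys_per_girl = 13    # the widely cited ratio

# ASSUMPTION: the 13-to-1 ratio applies exactly to these 280 students.
girls = top_scorers / (boys_per_girl + 1)
boys = top_scorers - girls
share_of_applicants = top_scorers / applicants

print(f"Boys: {boys:.0f}, girls: {girls:.0f}")
print(f"Top scorers as a share of applicants: {share_of_applicants:.2%}")
```

On that assumption the ratio rests on about 260 boys and 20 girls, fewer than half of one percent of the applicants.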

“The Fabricated Statistic.” A “finding” that “Women talk almost three times as much as men” was published in The Female Brain in 2006. This was supposed to explain why women prefer careers which allow them to “connect and communicate” as opposed to careers in science and engineering. Kessel outlines some issues that might make this explanation questionable.

“The Garbled Statistic.” An example from “The Science of Sex Differences in Science and Mathematics,” published in Psychological Science in the Public Interest in 2007, was a report that women were “8.3% of tenure-track faculty at ‘elite’ mathematics departments.” A 2002 survey produced similar math data; that survey was based on the “top 50 departments.” These and other reports generally reported only the aggregate figure and not any of the raw data by rank. Kessel gives other examples in which raw data summary tables (which she had requested and received) would have been helpful to interpreting results.

Although noticing mistakes may require numerical sophistication or knowledge of particular fields, accurate reporting of names, dates, and sources of statistics does not take much skill. At the very least, authors and research assistants can copy categories and sources as well as numbers. Editors can (and should) ask for sources.

Discussion

1. Is there anything random about the group of students applying to a university’s program for talented students - or about the top SAT-M scorers in that group? Why are these important questions?
2. Kessel quotes a statement that has been reported a number of times: “Women use 20,000 words per day, while men use 7,000.” How do you think the researchers got these counts?
3. Why might it be important to consider academic rank as a variable in analyzing the progress, or lack thereof, of women in obtaining university positions?
4. Why might it be important to know more about the sponsorship of these studies – researcher affiliations, funding, etc.?

Submitted by Margaret Cibes, based on a reference in March 2012 College Mathematics Journal

Ethics study of social classes

“Study: High Social Class Predicts Unethical Behavior”
The Wall Street Journal, February 27, 2012

Here is an abstract of the study[2] referred to in the article:

Seven studies using experimental and naturalistic methods reveal that upper-class individuals behave more unethically than lower-class individuals. In studies 1 and 2, upper-class individuals were more likely to break the law while driving, relative to lower-class individuals. In follow-up laboratory studies, upper-class individuals were more likely to exhibit unethical decision-making tendencies (study 3), take valued goods from others (study 4), lie in a negotiation (study 5), cheat to increase their chances of winning a prize (study 6), and endorse unethical behavior at work (study 7) than were lower-class individuals. Mediator and moderator data demonstrated that upper-class individuals’ unethical tendencies are accounted for, in part, by their more favorable attitudes toward greed.

See also "Supporting Information", published online in Proceedings of the National Academy of Sciences of the USA, February 27, 2012.

Discussion

1. If you were going to write an article about this study, and you had access to the entire report, what would be the first, most basic, information you would want to provide to your readers about the “class” categories referred to in the abstract?
2. The article indicates that the sample sizes for the first three experiments were “250,” “150 drivers,” and “105 students.” Besides the relatively small sample sizes, what other issues can you identify as potential problems in making any inference about ethics from these experimental results?
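
For a rough sense of what the sample sizes in question 2 imply, the sketch below computes the textbook 95 percent margin of error for a single proportion. It assumes a simple random sample and a worst-case proportion of 0.5; the studies may satisfy neither assumption, so the numbers are illustrative only. The function name and defaults are hypothetical.

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    ASSUMPTION: a simple random sample of size n; p=0.5 is the worst case.
    """
    return z * sqrt(p * (1 - p) / n)

# Sample sizes reported for the first three studies.
for n in (250, 150, 105):
    print(n, f"{margin_of_error(n):.1%}")
```

Under these assumptions the margins are roughly 6, 8, and 10 percentage points, respectively.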

Submitted by Margaret Cibes