Chance News 51
Quotations
Passion is inversely proportional to the amount of real information available.
Submitted by Margaret Cibes
Re the remark about the “attitudes and prejudices of the famous philosophers” in Chance News 49 [1]: a 1924 Virginia sterilization law (not repealed until 1976) was upheld by the Supreme Court in Buck v. Bell in 1927, with Justice Oliver Wendell Holmes Jr. writing the majority opinion.
“This woman [Carrie Buck] got railroaded. And one of the giants of the Supreme Court was driving the train.”
USA TODAY, June 24, 2009
Submitted by Margaret Cibes
Much of the fascination of statistics lies embedded in our gut feeling--and never trust a gut feeling--that abstract measures summarizing large tables of data must express something more real and fundamental than the data themselves. (Much professional training in statistics involves a conscious effort to counteract this gut feeling.) The technique of correlation has been particularly subject to such misuse because it seems to provide a path for inferences about causality (and indeed it does, sometimes--but only sometimes).
Stephen Jay Gould, The Mismeasure of Man, second edition
Submitted by Paul Alper
Forsooths
…. Let’s look at basketball …. The 1993 college basketball playoffs started with 64 teams. Of these, 15 were from schools with accredited library education programs.
That’s an amazing statistic by itself, when you consider that there are only slightly more than three times that many library education programs in the United States, and that some of these don’t compete athletically in Division I. However, those 15 schools also went on to win 28 of the 63 games played, while losing only 14. The reason that there were only 14 losses is that the championship school has a library education program. So does the runnerup. Indeed, what sportswriters call the Final Four included three schools with accredited library education programs.
…. Do I believe a single word of what I have just written? Of course not, although I have seen “research” studies … for which the hypotheses were no more credible.
"Is There a Correlation Between Library Education Programs and Athletic Success?"
Library Journal, August 1993
Submitted by Margaret Cibes
During The Daily Show on June 30, TV’s Jon Stewart gave out RIPPY (Rest-In-Peace) Awards [3] to television commentators for various aspects of their coverage of Michael Jackson’s death.
It’s the award for attempts at mind-blowing analysis, and the winner is Extra’s Carlos Diaz [who stated on June 25]:
People don’t realize the proximity of this whole thing. Farrah Fawcett passed away 5 hours, almost to the minute that Michael Jackson passed away 5 miles away. Ed McMahon passed away 48 hours previous [sic] at the same hospital that Michael Jackson passed away.
Submitted by Margaret Cibes
Credit utilization ratio
“Is Your Credit Too Good? Why lenders are punishing those who borrow too little and always pay on time”
by Cybele Weisser, TIME, June 22, 2009
[T]he formula for determining credit scores … looks at something called your “utilization ratio,” the total amount of credit you use vs. the amount you have available. If you have $25,000 worth of available credit and you put $5,000 on your cards every month, your utilization ratio is a healthy … 20%. But cut down that credit line to $10,000 and suddenly your ratio jumps to 50%, making you look pretty overextended.
Submitted by Margaret Cibes
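The utilization arithmetic in the TIME excerpt can be checked with a short sketch. The helper name `utilization_ratio` is made up for illustration, and actual credit-scoring formulas weigh more than this one ratio.

```python
def utilization_ratio(balance, credit_limit):
    """Share of available credit in use (hypothetical helper, not a scoring formula)."""
    if credit_limit <= 0:
        raise ValueError("credit_limit must be positive")
    return balance / credit_limit


# Figures from the article: $5,000 charged each month against two credit lines.
for limit in (25_000, 10_000):
    print(f"credit line ${limit:,}: utilization {utilization_ratio(5_000, limit):.0%}")
```

The balance is the same in both cases; only the denominator changes, which is why closing or shrinking an unused credit line can make a borrower look more extended.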
Student-loan repayment for congressional staffers
“Scrutiny Grows as U.S. Pays Staffers’ Student Loans”
by Elizabeth Williamson, The Wall Street Journal, June 25, 2009
The House and Senate will spend $18 million this year repaying staffers' student loans. Last year, ... House lawmakers nearly doubled what the government can pay for their staffers' college bills. The yearly maximum repayment is $10,000 in fiscal 2009, which ends Sept. 30, up from $6,000 in fiscal 2008, with a lifetime maximum of $60,000, the same as in the executive branch. The House appropriated $13 million in 2009 for the program; as of last month, more than 2,200 House employees were getting the money.
Submitted by Margaret Cibes
Measuring excess risk
“EPA study: 2.2M live in areas where air poses cancer risk”
by Brad Heath and Blake Morrison, USA TODAY, June 24, 2009
This article gives a brief report about the National-Scale Air Toxics Assessment for 2002 [4], an EPA study of excess cancer risks from breathing 181 air toxics over an assumed lifetime of 70 years. The EPA updates information about air toxics emissions every three years, after which it conducts an analysis which is reviewed by the states, evaluated for accuracy, and released - apparently a long process.
According to the EPA, the study found 2 million people with an increased cancer risk of greater than 100 in 1 million.
According to the article, the study found air pollution to be a health threat “around major cities … although some of the counties where the air was even worse were in rural areas ….” The worst neighborhood was outside Los Angeles, where the estimated excess cancer risk was “more than 1,200 in 1 million, 34 times the national average.” The article provided no information about rural areas; however, the EPA provides a map [5] of most affected counties.
Discussion
1. How might one measure cancer risk?
2. What does it mean to measure excess, or increased, cancer risk?
3. Why does the EPA measure excess risk over a lifetime? How do you think they identified people who had lived in a region over a lifetime? Would the fact that air pollution levels might change over a lifetime affect any aspect of the study?
4. Estimate the national average excess cancer risk. Is it higher or lower than the EPA’s ceiling of 100 in 1 million? Do you think it makes sense to refer to a national average of excess cancer risk?
5. Referring to the map, are you surprised about any of the locales with the highest excess cancer risk? If so, can you find any potential reason for high excess cancer risks in those locales?
Submitted by Margaret Cibes
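One possible starting point for discussion question 4 is to back the national average out of the two figures quoted from the article. This is only rough arithmetic: the article says "more than 1,200," so the result is approximate, and the variable names are illustrative.

```python
# Figures quoted from the USA TODAY article.
worst_risk_per_million = 1_200      # "more than 1,200 in 1 million"
times_national_average = 34         # "34 times the national average"
epa_level_per_million = 100         # the 100-in-1-million level cited above

# Implied national average excess risk, per million people.
national_average = worst_risk_per_million / times_national_average

print(f"implied national average: about {national_average:.0f} in 1 million")
print("below the 100-in-1-million level:", national_average < epa_level_per_million)
```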
Too many cable TV channels?
“Time to Screen Out Unloved Channels”
by Martin Peers, The Wall Street Journal, June 27, 2009
(Full text may only be available to subscribers.)
The author suggests that there are too many TV channels available and that this situation is driving subscriber costs up. He reports that “the average household tuned into only 16 channels of the 118 channels available.” He feels that charging fees in proportion to the sizes of viewing audiences would lower the cost of cable TV.
He says that there is currently an “absence of correlation between the size of the fees paid to individual cable channels and their audiences.” Among non-premium channels, Nickelodeon was the most-watched cable channel in 2008, but its fees ranked only 10th highest. Nickelodeon, with about 1.7 million daily household viewers, had an annual affiliate revenue of about $300 per household, while Discovery Kids, with only 20,000 daily household viewers, had an annual affiliate revenue of about $1,900 per household.
Submitted by Margaret Cibes
True or false?
One hundred sleuthing statisticians running 100 different tests are about 100 times more likely than a lone investigator to find something fishy.
The Wall Street Journal, July 1, 2009
Submitted by Margaret Cibes
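A quick calculation can help with the true-or-false question. The sketch below rests on assumptions that are not in the quotation: there is nothing fishy to find, each test is run at the 5% significance level, and the 100 tests are independent.

```python
def prob_at_least_one_false_alarm(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent tests rejects a true null.

    alpha (the per-test significance level) is an assumed value.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


lone = prob_at_least_one_false_alarm(1)     # one investigator, one test
team = prob_at_least_one_false_alarm(100)   # 100 statisticians, 100 tests
ratio = team / lone

print(f"lone investigator: {lone:.3f}")
print(f"100 tests:         {team:.3f}")
print(f"ratio:             {ratio:.1f}")
```

Because a probability cannot exceed 1, the team's chance of a false alarm cannot be 100 times a 5% chance; under these assumptions the ratio comes out near 20.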
New lottery study
“Want False Hope With That Lottery Ticket?”
by Rick Green, The Hartford Courant, July 3, 2009
A taxpayer-funded study by Spectrum Gaming Group [6] is said to have found “no correlation between lottery sales and poverty.” The study claims that “because most successful lottery retailers were not located in higher poverty neighborhoods, there is no connection between income and ticket sales.”
The Spectrum study contradicts many other studies, including one at Cornell University, where investigators “found ‘a strong and positive relationship’ between lottery ticket sales and poverty rates after examining data from 39 states over 10 years.”
The Spectrum study also contradicts a 2002 analysis done by the column’s author, Rick Green, and a colleague. They identified, by zip codes, the locales in which the highest concentrations of winners resided, not the locales in which the highest-selling retailers were located. Not surprisingly, these areas were in the poorest cities of Connecticut.
Submitted by Margaret Cibes
30% chance for rain?
“For many, meaning of rain forecast is cloudy at best”
by Doyle Rice, USA TODAY, June 24, 2000
This news article begins with:
When your local weather forecaster announces that there is a 30% chance of rain tomorrow, not everyone knows what that means. Some think it means 30% of an area will get rain. Others think it will rain for 30% of the day. In fact, of all the forecast terms used by meteorologists, this remains one of the most baffling to the public.
Some people don't understand that the forecaster simply means there's a 30% probability it will rain at some point during the day. Susan Joslyn, a senior lecturer in the psychology department at the University of Washington in Seattle, and colleagues have been studying such confusion.
The article explains the results of their study. Many similar studies have been done; the following one is often cited.
“Misinterpretations of Precipitation Probability Forecasts”
by Murphy, Lichtenstein, Fischhoff, and Winkler, Bulletin of the American Meteorological Society, Vol. 61, No. 7, July 1980, pp. 695-701
We reviewed this article in Chance News 3.08.
The authors wanted to see if there was a misunderstanding about the event being predicted, the meaning of probability, or both. To test the understanding of the event, subjects were asked if the event being predicted was "rain somewhere in the region," "rain at a particular point in the region," "rain 20% of the time," etc. Their answers led the authors to the conclusion that there is considerable misinterpretation of the meaning of the event. On the other hand, the subjects' answers to questions on the possible meaning of "20% chance" led them to conclude that the subjects did understand what the probability itself meant.
I also talked to a couple of meteorologists who stated that it is unlikely that the public could understand what a 20% chance of rain means. Harold Brooks provided the following statement:
According to the National Weather Service Operations Manual, the Probability of Precipitation (PoP) is the likelihood of occurrence (expressed as a percent) of a precipitation event at any given point in the forecast area. The time period to which the PoP applies must be clearly stated (or unambiguously inferred from the forecast wording) since, without this, a numerical PoP value is meaningless.
That is, it is the average point probability within the forecast area, and the same PoP is assigned to each point. It can be shown that the PoP is equal to the expected area coverage of the precipitation (Schaefer, J. T. and R. L. Livingston, 1990: Operational implications of the "Probability of Precipitation". Weather and Forecasting, 5, 354-356).
This brings back fond memories. One of the earliest Chance courses is described at http://www.dartmouth.edu/~chance/course/Syllabi/mpls/handouts/handouts.html. There we read:
This document consists of the collection of handouts for a two-week summer workshop entitled 'Geometry and the Imagination', being taught by Peter Doyle, Mark Foskey, Joan Garfield, Linda Green, and Laurie Snell at the Geometry Center in Minneapolis, 20 June-1 July, 1991. One of the documents had the following homework:
Read the materials on weather prediction.
Problems
(1) What do you make of all this?
(2) What does Marilyn mean when she says, "But rain doesn't obey the laws of chance; instead it obeys the laws of science"?
(3) If the POP is 30% and it rains, was the forecaster correct?
(4) Suppose that Minneapolis gets precipitation 3 days out of 10 over the long haul. Why not report a POP of 30% every single day?
(5) San Diego county is spread out over a large area, comprising the coastal strip and inland valleys, the mountains, and the deserts. Separate forecasts are given for each region. Suppose, however, that the weather bureau computes a single POP for the whole area. On days on which this composite POP is 20%, what is the probability that a randomly selected resident of San Diego county will get rained on?
(6) What do you think is the correct answer to the Reader reader's question?
(7) There are contests to reward the best predictor of the weather. If you were running such a contest, how would you decide the winner?
Readers might like to view two Chance video lectures about weather forecasting: (1) "How are Weather Predictions Determined by the National Weather Service?" [7], by Daniel Wilks, Cornell University; (2) "How are Local Weather Predictions Determined By Local Weather Forecasters?" [8], by Mark Breen, Fairbanks Museum.
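Harold Brooks's remark that the PoP equals the expected area coverage, which also bears on homework problem (5), can be illustrated with a toy example. Everything in it is invented for illustration: the forecast area is split into four equal cells, and the weather follows one of four made-up scenarios with the stated probabilities.

```python
from fractions import Fraction

# Hypothetical scenarios: (probability, set of cells that get rain).
scenarios = [
    (Fraction(1, 2), set()),             # dry everywhere
    (Fraction(3, 10), {0}),              # rain in one cell
    (Fraction(1, 10), {0, 1}),           # rain in two cells
    (Fraction(1, 10), {0, 1, 2, 3}),     # rain everywhere
]
num_cells = 4

# Point probability of rain for each cell, then the area average (the PoP).
point_probs = [sum(p for p, wet in scenarios if c in wet) for c in range(num_cells)]
average_point_prob = sum(point_probs) / num_cells

# Expected fraction of the area that gets rain.
expected_coverage = sum(p * Fraction(len(wet), num_cells) for p, wet in scenarios)

# Probability of rain somewhere in the area, a different event.
prob_rain_somewhere = sum(p for p, wet in scenarios if wet)

print("average point probability:", average_point_prob)
print("expected area coverage:   ", expected_coverage)
print("rain somewhere in area:   ", prob_rain_somewhere)
```

The first two quantities agree by linearity of expectation, while the chance of rain somewhere in the area is larger, which is one source of the public confusion about what event is being forecast.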
Need for evidence
“How to Cut Health-Care Costs: Less Care, More Data”
by Michael Grunwald, TIME, June 29, 2009
According to the author, President Obama has identified two major obstacles to more efficient health care delivery, the first of which is the current “fee-for-service” system in which hospitals and doctors are rewarded financially for ordering more tests and carrying out more procedures.
The other big barrier is information: evidence-based medicine is hard to practice without evidence. …. So the things we know are dwarfed by the things we don’t know. …. [The] Mayo [Clinic] … has an institutional obsession with evidence-based medicine, using electronic records for in-house effectiveness research, constantly monitoring its doctors on everything from infection rates to operating times to patient outcomes, minimizing the art of medicine and maximizing the science. “We try to drive out variation wherever we can,” says Charles (Mike) Harper, a neurologist who oversees Mayo’s clinical practice in Rochester. “Practicing medicine is not the same as building Toyotas, but you can still standardize. Uncertainty shouldn’t be an excuse to ignore data.”
Submitted by Margaret Cibes
Billions of almost-zeros
“Priced to Sell”
by Malcolm Gladwell, The New Yorker, July 6 & 13, 2009
In his new book, Free: The Future of a Radical Price, author Chris Anderson states:
Distribution [of online videos] is now close enough to free to round down. Today, it costs about $0.25 to stream one hour of video to one person. Next year, it will be $0.15. A year later it will be less than a dime. Which is why YouTube’s founders decided to give it away.
In this book review, Malcolm Gladwell notes, however:
Although the magic of Free technology means that the cost of serving up each video is “close enough to free to round down,” “close enough to free” multiplied by seventy-five billion is still a very large number.
Submitted by Margaret Cibes
Love (food) and marriage?
“First Comes Love, Then Comes Obesity?”
by Bonnie Rochman, TIME, July 6, 2009
This article discusses a University of North Carolina study of the relationship between romance and obesity. Published in the July issue of Obesity, the study found that “married individuals are twice as likely to become obese as are people who are merely dating.” The study “tracked changes over a handful of years in the weight and relationship status of 6,949 individuals.” Among people who lived together, whether married or not, the increased risk of obesity appears to have been greater for women than for men.
Submitted by Margaret Cibes
When in the course of human events ...
“Two Centuries On, a Cryptologist Cracks a Presidential Code”
by Rachel Silverman, The Wall Street Journal, July 2, 2009
The author reports that Lawren Smithline, a mathematician at the Center for Communications Research in Princeton, NJ, has deciphered a coded message in an 1801 letter to President Thomas Jefferson from Robert Patterson, a math professor at the University of Pennsylvania.
The code [9] was not a “simple substitution cipher,” in which one letter of the alphabet is replaced with another, and so could not be cracked using ordinary frequency analysis. Nor was the code a “nomenclator,” which is a “catalog of numbers, each standing for a word, syllable, phrase or letter,” or a “wheel cipher,” which involves letters inscribed on the edge of a wheel that can be turned to scramble words.
Mr. Patterson claimed “the utter impossibility of deciphering” his code, which involved a grid of the text, broken into sections. He estimated that a de-coder might have to try “upwards of ninety millions of millions” of potential combinations in order to solve his coded message to Jefferson.
Dr. Smithline analyzed Jefferson’s State of the Union addresses and counted the frequency of every possible pair of letters in the speeches. He used a “dynamic programming” algorithm to test some “educated guesses.” Fewer than 100,000 calculations were needed to solve the cipher.
The following message emerged:
In Congress, July Fourth, one thousand seven hundred and seventy six. A declaration by the Representatives of the United States of America in Congress assembled. When in the course of human events ....
"Patterson played this little joke on Thomas Jefferson," says Dr. Smithline. "And nobody knew until now."
Two bloggers [10] commented:
(a) Ms. Silverman should have mentioned the fact that she picked up the story from the March-April 2009 edition of American Scientist, "A Cipher to Thomas Jefferson" [11].
(b) If you'd like to read a fun story that involves a replacement code, frequency analysis, and buried treasure, see Poe's short story, "The Gold-Bug" [12].
Submitted by Margaret Cibes
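The letter-pair counting mentioned in the article can be sketched in a few lines. This shows only the counting step, not Dr. Smithline's dynamic programming search, and it makes its own simplifying assumptions: case and non-letters are ignored, so pairs that straddle a word boundary are counted too. The function name `digram_counts` is hypothetical.

```python
from collections import Counter


def digram_counts(text):
    """Count adjacent letter pairs, ignoring case, spaces, and punctuation."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return Counter(a + b for a, b in zip(letters, letters[1:]))


# A tiny sample; the article describes using Jefferson's State of the Union addresses.
sample = "When in the course of human events"
counts = digram_counts(sample)
print(counts.most_common(5))
```

With a large body of text by the same writer, such counts indicate which neighboring letters are plausible, and a candidate unscrambling can be scored by how common its adjacent pairs are.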
Joltin’ Joe
“The Triumph of the Random”
by Leonard Mlodinow, The Wall Street Journal, July 3-5, 2009
This article discusses “streaks,” especially the 56 consecutive baseball games in which Joe DiMaggio had at least one hit, and people’s intuitions about them. The author [13] is a Caltech professor who wrote The Drunkard’s Walk: How Randomness Rules Our Lives.
[R]andom processes do display periods of order. In a toss of 100 coins, for example, the chances are more than 75% that you will see a streak of six or more heads or tails, and almost 10% that you’ll produce a streak of 10 or more. As a result a streak can look quite impressive even if it is due to nothing more than chance. .... A few years ago Bill Miller of the Legg Mason Value Trust Fund was the most celebrated fund manager on Wall Street because his fund outperformed the broad market for 15 years straight. It was a feat compared regularly to DiMaggio’s, but if all the comparable fund managers over the past 40 years had been doing nothing but flipping coins, the chances are 75% that one of them would have matched or exceeded Mr. Miller’s streak.
The author argues that DiMaggio’s streak could have occurred by chance alone, based on DiMaggio’s lifetime batting average of 0.325, and the fact that hundreds of players had been trying for such a streak over a hundred years.
The author points out that there are many factors involved in analyzing baseball streaks, e.g., variations in batting averages over time. Samuel Arbesman and Stephen H. Strogatz, of Cornell, carried out a 10,000-case computer simulation based on baseball players’ actual statistics from each year 1871-2005. They found that the longest streak in a simulated history ranged from 39 games to 109 games, with 42% of the simulated histories containing a streak of DiMaggio’s length or longer.
In discussing people’s misconceptions about streaks, the author cites Thomas Gilovich, Robert Vallone, and Amos Tversky’s paper, “The Hot Hand in Basketball: On the Misperception of Random Sequences.” [14]
Other resources not cited in this article include Thomas Gilovich’s 1998 Chance video lecture "Streaks in Sports" [15], and Stephen Jay Gould’s 1988 book review "The Streak of Streaks" [16].
Two bloggers [17] commented:
(a) Strogatz's simulation had Cobb out-hitting DiMaggio 300 out of 10000 times, or 3%. Dunno how long he played, but much longer than 3% of baseball. 10000 "seasons" is a sample 100 times greater than reality.
(b) …. “Don’t give me brilliant generals; give me lucky generals.” –Caesar. …. As a former baseball player, I know how hard it is to get a hit on those days when you're just not feeling it. I don't think coins have those days.
Discussion
1. In a toss of 100 coins, what is the probability of seeing a streak of 6 or more heads? Here [18] is a website with an applet calculator and an explanation of the reasoning behind the calculations.
2. Show that, in a toss of 100 coins, the probability of seeing a streak of 6 or more heads or tails is more than 75%.
3. Comment on blogger (a)’s response to the article.
Submitted by Margaret Cibes
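For discussion questions 1 and 2, one way to get exact values is to track the length of the current run toss by toss. The sketch below assumes fair, independent tosses; the function name and its arguments are illustrative and are not taken from the applet cited in question 1.

```python
def prob_run_at_least(n, k, either_face=False):
    """Probability that n fair coin tosses contain a run of at least k.

    By default only runs of heads count; with either_face=True a run of
    heads or of tails counts.
    """
    if k <= 0:
        return 1.0
    if n < k:
        return 0.0
    if either_face:
        # A run of k identical faces is the same as k-1 consecutive
        # "same as the previous toss" outcomes among the n-1 later tosses,
        # and each such outcome has probability 1/2.
        return prob_run_at_least(n - 1, k - 1)
    # state[j] = probability that no run of k heads has occurred yet and
    # the sequence so far ends in exactly j consecutive heads.
    state = [0.0] * k
    state[0] = 1.0
    for _ in range(n):
        new = [0.0] * k
        for j, p in enumerate(state):
            new[0] += 0.5 * p              # tails resets the current run
            if j + 1 < k:
                new[j + 1] += 0.5 * p      # heads extends the current run
            # otherwise heads completes a run of k and the mass is dropped
        state = new
    return 1.0 - sum(state)


print(f"run of 6+ heads in 100 tosses:      {prob_run_at_least(100, 6):.3f}")
print(f"run of 6+ of either face:           {prob_run_at_least(100, 6, either_face=True):.3f}")
print(f"run of 10+ of either face:          {prob_run_at_least(100, 10, either_face=True):.3f}")
```

The last two values can be compared with Mlodinow's "more than 75%" and "almost 10%."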