Chance News 23

From ChanceWiki
===Further reading===
* [http://drbobsports.com/ drbobsports.com] offers daily, monthly or full season subscriptions to my Best Bets in all the major sports.
* You can search [http://chance.dartmouth.edu/chancewiki/index.php/Special:Search?search=betting previous Chance News articles] for more information on some of the features of sports betting.

Submitted by John Gavin.
Submitted by Laurie Snell


==Do Oscar Winners Live Longer?==
If you put "Oscar winners live longer" into Google you will get over 7,000 hits. Here is a hit from the January 23, 2007 issue of [http://www.healthandage.com/public/health-center/37/news/7655/Oscar-winners-live-longer.html Health and Aging]:
Oscar winners live longer: Reported by Susan Aldridge, PhD, medical journalist<br>
<blockquote>It is Oscar season again and, if you're a film fan, you'll be following proceedings with interest. But did you know there is a health benefit to winning an Oscar? Doctors at Harvard Medical School say that a study of actors and actresses shows that winners live, on average, for four years more than losers. And winning directors live longer than non-winners.</blockquote>
Source: Harvard Health Letter March 2006
The assertion that Oscar winners live longer was based on an article by Donald Redelmeier and Sheldon Singh, "Survival in Academy Award-winning actors and actresses," ''Annals of Internal Medicine'', 15 May 2001, Vol. 134, No. 10, pp. 955-962.
This is the kind of study the news loves to report, and medical journals enjoy the publicity. Another such claim, in the news as this is written, is that the outcome of the Super Bowl determines whether the stock market will go up or down this year. Unlike the Oscar winners story, this claim's author admits that it is all a joke; see [http://www.dartmouth.edu/~chance/chance_news/recent_news/chance_news_13.04.html#item11 Chance News 13.04].
A recent paper by James Hanley, Marie-Pierre Sylvestre and Ella Huszti, "Do Oscar winners live longer than less successful peers? A reanalysis of the evidence," ''Annals of Internal Medicine'', 5 September 2006, Vol. 145, No. 5, pp. 361-363, claims that the Redelmeier and Singh paper was flawed and that their reanalysis of the data did not support the claim that Oscar winners live longer.
For their study Redelmeier and Singh identified all actors and actresses ever nominated for an Academy Award in a leading or a supporting role up to the time of the study (n = 762). Among these there were 235 Oscar winners. For each nominee, another cast member of the same sex who was in the same film and was born in the same era was identified (n = 887) and used as a control.
They used the Kaplan-Meier method to produce a life table for the Oscar winners and for the control group. A life table estimates, for each age x, the probability of living longer than x years. You can see how this was done in [http://www.dartmouth.edu/~chance/chance_news/recent_news/chance_news_10.06.html#item10 Chance News 10.06]. Redelmeier and Singh obtained the following life tables for the two groups.
<center> http://www.dartmouth.edu/~chance/forwiki/oscar.jpg </center>
The area under each curve is an estimate of the life expectancy for that group. Using a test called the "log-rank test", they concluded that the overall difference in life expectancy was 3.9 years (79.7 vs. 75.8 years; P = .003).
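To make the life-table idea concrete, here is a minimal Python sketch of the Kaplan-Meier (product-limit) estimator and of the area under the resulting curve. The five ages are invented for illustration and are not the Oscar data; people still alive when the data are collected are treated as censored, and the log-rank test is not implemented. When the last recorded age is a death the curve falls to zero and the area is the estimated mean lifetime; otherwise it is a mean restricted to the last observed death.

```python
def kaplan_meier(ages, died):
    """Kaplan-Meier estimate of S(x) = P(living longer than age x).

    ages[i] is the age at death, or the current age if still alive;
    died[i] is True for a death and False for a censored (still living) case.
    Returns (age, S) pairs at each age where at least one death occurs.
    """
    data = sorted(zip(ages, died))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        age = data[i][0]
        deaths = removed = 0
        # Group everyone whose recorded age equals `age`.
        while i < len(data) and data[i][0] == age:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((age, surv))
        # Deaths and censored cases both leave the risk set after this age.
        at_risk -= removed
    return curve


def area_under_curve(curve, start=0.0):
    """Area under the step function, with S = 1 from `start` to the first death."""
    area, prev_age, prev_s = 0.0, start, 1.0
    for age, s in curve:
        area += prev_s * (age - prev_age)
        prev_age, prev_s = age, s
    return area


# Hypothetical ages for five people; one of the 70-year-olds is still alive.
ages = [60, 70, 70, 80, 90]
died = [True, True, False, True, True]
curve = kaplan_meier(ages, died)
print(curve)
print(area_under_curve(curve))  # about 77
```

Here the estimated mean lifetime is about 77 years: 60 years at full survival, plus 10 years each at survival probabilities 0.8, 0.6 and 0.3.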
While these look like standard life tables, there is one big difference. Note that 100 percent of the Oscar winners live to be at least 30 years old. This is not surprising: we know they are Oscar winners, so they had to live long enough to win. Thus we know ahead of time that the Oscar winners will live longer than a traditional life table would predict. This gives them an advantage in estimating their lifetime, and it is called a selection bias. The controls also have an advantage, because we know that they were in a movie at about the same age as a nominee. But there is no reason to believe that these advantages are the same.
Here is a more obvious example of selection bias discussed in Robert Abelson's book ''Statistics as Principled Argument''. As reported in Chance News 4.05:
<Blockquote>A study found that the average life expectancy of famous orchestral conductors was 73.4 years, significantly higher than the life expectancy for males, 68.5, at the time of the study. Jane Brody in her "New York Times" health column reported that this was thought to be due to arm exercise. J. D. Carroll gave an alternative suggestion, remarking that it was reasonable to assume that a famous orchestra conductor was at least 32 years old. The life expectancy for a 32-year-old male was 72 years, making the 73.4 average not at all surprising.</blockquote>
To avoid the possibility of selection bias, Redelmeier and Singh did an analysis using time-dependent covariates, in which winners were counted as controls until the time they first won an Oscar. This resulted in a difference of 20% (CI, 0% to 35%). Since 0 is in the confidence interval, the difference is not significant. In a letter to the editor responding to the study by Hanley et al., Redelmeier and Singh report that they repeated this analysis with one more year's data and obtained a result that was even more clearly not significant.
Sylvestre and colleagues analysed the data by comparing the life expectancy of the winners, from the moment they win, with that of others alive at that age. In the [http://www.mcgill.ca/newsroom/news/?ItemID=21645 McGill press release] Hanley remarks:
"The results are not as, shall we say, dramatic, but they're more accurate."
We recommend reading this press release for more information about this new study.
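Why counting winners as controls until they win matters can be seen in a small simulation. This Python sketch is built on invented assumptions and is not a reanalysis of the Oscar data: winning is given no effect on survival at all, the death rate is constant after age 30, and the age at which someone would win is drawn at random. A naive comparison of mean lifespans still favours winners by decades, because only people who survive long enough can win. Counting deaths per person-year, with each winner counted as a non-winner until the win, gives a rate ratio close to 1. The constant death rate is what lets crude rates stand in for a proper time-dependent (Cox) model here; with realistic, age-dependent mortality the comparison would have to be made age by age.

```python
import random


def simulate_careers(n=20000, seed=1):
    """Hypothetical population in which winning has NO effect on survival.

    Death age is 30 plus an exponential waiting time (a constant death rate,
    chosen only to keep the sketch simple). Each person also has an age at
    which they would win, drawn independently; they win only if still alive.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        death_age = 30 + rng.expovariate(1 / 45)
        win_age = rng.uniform(30, 70)
        people.append((death_age, win_age))
    return people


def naive_mean_lifespans(people):
    """Mean death age of ever-winners vs never-winners (biased comparison)."""
    winners = [d for d, w in people if d > w]
    others = [d for d, w in people if d <= w]
    return sum(winners) / len(winners), sum(others) / len(others)


def time_dependent_rates(people, entry_age=30):
    """Deaths per person-year, counting winners as non-winners until they win."""
    years = {"not yet won": 0.0, "won": 0.0}
    deaths = {"not yet won": 0, "won": 0}
    for death_age, win_age in people:
        years["not yet won"] += min(death_age, win_age) - entry_age
        if death_age > win_age:
            years["won"] += death_age - win_age
            deaths["won"] += 1
        else:
            deaths["not yet won"] += 1
    return {status: deaths[status] / years[status] for status in years}


people = simulate_careers()
winners_mean, others_mean = naive_mean_lifespans(people)
rates = time_dependent_rates(people)
print(f"naive: winners {winners_mean:.1f} vs others {others_mean:.1f} years")
print(f"rate ratio, won vs not yet won: {rates['won'] / rates['not yet won']:.2f}")
```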


When the Redelmeier and Singh paper came out, our colleague Peter Doyle recognized the absurdity of the paper and suggested a number of ways to show the problems with the study. Peter described one of the simplest ways as follows:
<blockquote> The problem can be seen in a very colorful way in this case, because you can do a simulation to rewrite history, having the computer select at random new OSCAR winners from among each year's nominees.  Each time you rewrite history you compute a new p-value, and you discover that you get a value less than .05 more than 5 percent of the time.  You can do the same thing with data that are much more easily simulated, but still, it's kind of cool to have the computer churning out new OSCAR winners.  Richard Burton would approve, because he generally comes out a winner! </blockquote>
Another way was in the form of a game described as follows:


==To live longer, choose fame over fortune==
Winning a Nobel prize not only brings fame and fortune to the holder but it also brings two extra years of life, according to Matthew Rablen and Andrew Oswald at the University of Warwick.


The paper provides evidence that an increase in status, rather than wealth alone, raises a person's lifespan, based on data about the lives of 524 Nobel Prize winners and nominees (135 winners and 389 nominees).


This idea was first proposed by Michael Marmot, of University College London, when he studied a large cohort of people, British civil servants, and found, against all expectations, that top civil servants were far healthier and less stressed than lower ranked civil servants.
Other studies have confirmed this result and support the assertion that better health is not a result of higher salary.
The Rablen-Oswald paper refines the approach by analyzing people who are at the top of their profession, by virtue of being nominated for a Nobel, to measure the value of winning the prize, relative to merely being nominated.


The authors correct for various biases, for example by grouping the data by country. Based on the empirical data, American winners live over two years longer than nominees, German winners just over a year longer and other European winners 0.7 years longer. The fitted model suggests a two-year difference overall.

Marmot and others have previously suggested that stress hormones may be a potential factor: those at the bottom of the pile are more stressed than those at the top, even though the latter have to make decisions with wider-ranging impacts.
Rablen and Oswald's paper goes further by suggesting a positive effect from having a high status, rather than the absence of a negative effect, as unsuccessful nominees never know that they were being considered.
In the case of Oscar winners (see [http://chance.dartmouth.edu/chancewiki/index.php/Do_Oscar_winners_live_longer%3F Do Oscar winners live longer?]), the winner may live longer but the other nominees know that they have failed to win.


===Questions===
* The data is based on men only, to avoid differences in life-span between the sexes. Do you think that the underlying idea can be extrapolated to women?
* If the idea that social status improves lifespan is truly correct, might the size of the effect be larger in a more normal population of people?
* Oddly, Oscar winning actresses and actors live 3.6 years longer than those who are merely nominated but Oscar winning scriptwriters live 3.6 years <em>less</em> than other nominees. Why might this be?


===Further reading===
* [http://www2.warwick.ac.uk/fac/soc/economics/staff/faculty/oswald/nobelsrablenos07.pdf Mortality and Immortality,] Matthew D. Rablen and Andrew J. Oswald, University of Warwick, Jan 2007.
* [http://www.stat.columbia.edu/~cook/movabletype/archives/2007/01/not_getting_the.html Not getting the Nobel Prize reduces your expected lifespan by two years] - This statistical blog page offers some further comments on this topic.
 
Submitted by John Gavin
 
==Comments on "To live longer, choose fame over fortune"==
 
When we discussed the Oscar studies in [http://www.dartmouth.edu/~chance/chance_news/recent_news/chance_news_10.05.html#item6 Chance News 10.05] and [http://www.dartmouth.edu/~chance/chance_news/recent_news/chance_news_10.06.html#item10 Chance News 10.06] we, like the media, accepted this as a serious study and did not even raise the possibility that it might be nonsense. When Peter Doyle read our account he was skeptical, and shortly thereafter he and Mark Mixer did their experiments, which showed that the study was indeed nonsense. This was in May 2001, so it took us five years, and the article by James Hanley and his colleagues, for Chance News to report [http://chance.dartmouth.edu/chancewiki/index.php/Oscar_winners_do_not_live_longer here] that Oscar winners do not live longer.
 
Our experience with the Oscar Winners study makes us skeptical that Nobel Prize winners live longer than nominees and we feel that our new Chance Wiki should do better than our old Chance News did in  critiquing this study.  We will see if we can get the data and if so make it available and perhaps we can do better this time.


Submitted by Laurie Snell


==Momentous modelling==


Such models consider changes in a model's mean or expected value, such as <em>what if the oil price doubles?</em>
In contrast, economists have focussed much less on variation around the forecast, such as working out what will happen if the oil price is likely to range between, say, $20 and $100 rather than between $50 and $60.
The Economist's tentative explanation is that the latter question requires more difficult maths.


In a recent paper, based on his PhD thesis, Stanford University's Nick Bloom claims such models are important if people's behaviour changes as a result of the world suddenly becoming a less (or more) certain place.
Sudden big second-moment shocks, measured by the volatility of American share prices, are also fairly frequent: examples include the terrorist attacks of September 11th 2001, the assassination of John Kennedy and the collapse of big companies such as Worldcom and Enron.


Bloom's model of the economy allows firms to choose how much to invest and how many workers to employ.
The world in which they operate is uncertain because their revenues can vary.
He shocks the model by suddenly increasing the variability of firms' revenues,
based on data from shocks over the past 45 years.
He does this by doubling the standard deviation of revenues, a common measure of variability, before it returns to its old level a few months later.
The model predicts that firms wait and see what happens because the value of waiting increases as volatility rises.
For example, expanding firms defer hiring new workers and failing firms tend to delay sacking employees in the hope of a turnaround in their circumstances.
As a result, workers are no longer being shuffled from less productive to more productive firms, which is bad for the economy as a whole and so a concern for policymakers.
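The "value of waiting" argument is the logic of real options. The Python sketch below is a textbook two-outcome illustration, not Bloom's model (which has many firms making hiring and investment decisions): a firm can pay a hypothetical cost of 95 now for a project worth 100, or wait one period, discounted by an assumed factor of 0.95, and invest only if the project's value has turned out favourably. The value moves up or down by the same amount with equal probability, so doubling that amount doubles the standard deviation while leaving the mean unchanged. In this example, that alone flips the decision from investing now to waiting.

```python
def value_now(v0, cost):
    """Invest immediately: collect the project value, pay the cost."""
    return v0 - cost


def value_of_waiting(v0, cost, sigma, discount=0.95):
    """Wait one period, then invest only if that is profitable.

    The project value moves to v0*(1+sigma) or v0*(1-sigma) with equal
    probability, so the mean stays v0 and the standard deviation is v0*sigma.
    """
    up = max(v0 * (1 + sigma) - cost, 0.0)
    down = max(v0 * (1 - sigma) - cost, 0.0)
    return discount * 0.5 * (up + down)


def decision(v0, cost, sigma):
    """Hypothetical rule: wait whenever waiting is worth more than acting now."""
    if value_of_waiting(v0, cost, sigma) > value_now(v0, cost):
        return "wait"
    return "invest now"


for sigma in (0.05, 0.10):  # doubling the standard deviation of the shock
    print(sigma, decision(100, 95, sigma))
```

With sigma = 0.05, waiting is worth 0.95 × 5 = 4.75, less than the 5 from investing now; with sigma = 0.10 it is worth 0.95 × 7.5 = 7.125, so the firm waits. Bad outcomes are avoided by not investing, so extra spread only adds to the value of keeping the option open.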


So Bloom claims that for policymakers it is important to distinguish second-moment shocks, which seem not to last long, from the first-moment variety, where the effects endure for longer.


===Questions===
* Is it plausible that people's perceptions might be more influenced by the uncertainty of a forecast than by the forecast itself? Can you think of common examples where this is the case? For example, when you hear a weather forecast, which attribute of the forecast do you tend to recall before venturing outside? If you are going on holiday for a week, does that change what you look for in the weather forecast for your destination?
* Is standard deviation an appropriate measure of the shocks that are mentioned in the article? Would higher moments be more helpful?
* The data used to calibrate the model covers a period of 45 years. Is data from so long ago still relevant to today's economy? What adjustments might be applied to standardise the data across time?


===Further reading===
* [http://cep.lse.ac.uk/pubs/download/dp0718.pdf The Impact of Uncertainty Shocks: Firm Level Estimation and a 9/11 Simulation,] Nick Bloom, Stanford University.


Submitted by John Gavin.
==Oscar winners do not live longer==
Because of the size of Chance News 23 we have had to make this item available separately [http://chance.dartmouth.edu/chancewiki/index.php/Oscar_winners_do_not_live_longer here].

Latest revision as of 22:08, 19 July 2007

Quotations

Light a Lucky and you’ll never miss sweets that make you fat.

Constance Talmadge, Charming Motion Picture Star, 1930.

Steve Simon provided the following quotation:

When I talk to people about statistics, I find that they usually are quite willing to criticize dubious statistics--as long as the numbers come from people with whom they disagree.

Joel Best, More Damned Lies and Statistics, page XI

Forsooth

This forsooth is from the Jan 2007 RSS News.

Carl Griffths' feet have grown to a massive size 18 - double the average for adult men in Britain.

The Times

6 October 2006
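Why is this a forsooth? Shoe sizes are not proportional to foot length. A minimal Python sketch, assuming the commonly quoted UK adult sizing rule (size = 3 × last length in inches - 25, where the last is the shoe form, slightly longer than the foot) and reading "double the average" as an average of size 9, shows that a size 18 shoe is only about a quarter longer than a size 9.

```python
def uk_last_length_inches(size):
    """Assumed UK adult rule: size = 3 * (last length in inches) - 25."""
    return (size + 25) / 3


average_size = 9  # assumption: "double the average" implies an average of size 9
big = uk_last_length_inches(18)
typical = uk_last_length_inches(average_size)
print(f"{big:.1f} in vs {typical:.1f} in, ratio {big / typical:.2f}")
# 14.3 in vs 11.3 in, ratio 1.26
```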


The "From the President" column of the March 2007 issue of Consumer Reports (page 5) discusses how CR uses statistics in its testing. The President states that in response to an article in the January issue about contamination in chickens, "The U. S. Department of Agriculture, whose job it is to keep our cacciatore clean, labeled our study "junk science," without even learning our methodology: 'There's virtually nothing or any conclusion that anyone could draw from 500 samples,' said a USDA spokesman."
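The spokesman's claim can be checked against a standard margin-of-error calculation. This Python sketch assumes that the 500 samples behave like a simple random sample and that the quantity of interest is a contamination rate (a proportion); CR's actual sampling design is not described here. Using p = 0.5 gives the widest possible interval.

```python
from math import sqrt


def worst_case_margin(n, z=1.96):
    """95% margin of error for an estimated proportion, using p = 0.5,
    which maximizes p * (1 - p). Assumes a simple random sample of size n."""
    return z * sqrt(0.5 * 0.5 / n)


print(f"+/- {100 * worst_case_margin(500):.1f} percentage points")  # +/- 4.4
```

Under these assumptions, 500 samples pin a contamination rate down to within about 4 or 5 percentage points, which is hardly "virtually nothing".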

Submitted by Jerry Grossman

A Challenge

The mathematics department at Dartmouth has just moved to a new building and the previous math building is being demolished. The students called this building "Shower Towers", a name suggested by this picture of one wall of the building.

http://www.dartmouth.edu/~chance/forwiki/bradley2.jpg

For at least 30 years we walked by this wall assuming that the tiles were randomly placed. One day, as we were walking by it, our colleague John Finn said "I see they are not randomly placed." What did he see?

Solutions and further comments are available here at Andrew Gelman's statistics Blog.

Submitted by Laurie Snell

Statz 4 life

Statz 4 life, homies!, Da Statz Krew, Google video.

This is a hilarious 5-minute hip-hop video about an introductory statistics course for psychology at the University of Oregon last summer. Graduate student Chuck Tate enlisted the help of other psychology graduate students to get students to enjoy statistics as much as they enjoy hip-hop. From anova to correlation, from Pearson to Fisher, the whole syllabus is mentioned. Let's hope Da Statz Krew enjoy their real stats courses as much as they seemed to enjoy making the video.

(Note: This video was previously mentioned briefly in Chance News 18.)

Submitted by John Gavin.

Hot streaks rarely last

The Man Who Shook Up Vegas, by Sam Walker, Wall Street Journal, January 5, 2007; Page W1.

Since last fall (Autumn), Las Vegas has had a problem each Thursday morning at precisely 10 a.m. Nevada time. Casino sports betting operations around the world were being simultaneously pounded by thousands of bettors wagering millions of dollars on the same few college football games. Odder still, most of these lock step bets were turning out to be winners, costing the casinos a fortune. The global business of sports betting was being jolted every week by an obscure 41-year-old statistician from San Francisco, using the alias Dr. Bob.

The article explains the background:

Gamblers wagering against a point spread must win more than half their bets (about 53%) to make a profit and must be closer to 55% to make a comfortable living. This is no small feat. Experts say there may be fewer than 100 people who can sustain these rates over time. Most of them belong to professional betting syndicates that hire teams of statisticians, wager millions every week and keep their operations secret.

Since 1999, Bob Stoll has recommended 658 bets on college football, or about 81 per season. Here are his results. (For comparison, when betting against a point spread in Las Vegas, bettors must win 52.4% of their wagers to make a profit.)

YEAR   WON-LOST-TIED   WIN % (ties excluded)
1999   49-31-1         61
2000   47-25-0         65
2001   35-28-0         56
2002   49-44-3         53
2003   46-55-2         46
2004   55-34-1         62
2005   51-21-2         71
2006   45-34-3         57
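How unlikely is this record under pure luck? The Python sketch below is a rough check that assumes every decided bet is an independent trial with the same win probability and treats ties as pushes (stakes returned). It totals the table (377 wins, 272 losses and 12 ties, 661 bets in all, a few more than the 658 quoted in the text; the percentages match wins divided by wins plus losses) and computes the exact binomial chance of doing at least this well if each bet were a coin flip, or if it were won at exactly the break-even rate. The 110/210 break-even figure assumes the usual 11-to-10 line, which matches the 52.4% quoted above.

```python
from math import comb

# (wins, losses, ties) per season, from the table; ties are treated as
# pushes, so only decided bets count.
record = {
    1999: (49, 31, 1), 2000: (47, 25, 0), 2001: (35, 28, 0), 2002: (49, 44, 3),
    2003: (46, 55, 2), 2004: (55, 34, 1), 2005: (51, 21, 2), 2006: (45, 34, 3),
}

wins = sum(w for w, _, _ in record.values())
losses = sum(l for _, l, _ in record.values())
decided = wins + losses

# Break-even win rate when risking 110 to win 100 (the usual 11-to-10 line).
breakeven = 110 / 210


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of at least k wins by luck."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


print(f"{wins}-{losses} ({wins / decided:.1%}), break-even {breakeven:.1%}")
print(f"P(at least {wins} wins | coin flip)  = {upper_tail(wins, decided, 0.5):.2g}")
print(f"P(at least {wins} wins | break-even) = {upper_tail(wins, decided, breakeven):.2g}")
```

Both probabilities come out well under 1%. That says luck alone is an unlikely explanation if the assumptions hold; it says nothing about whether the edge will last once bookmakers move the lines.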

The article claims that in the last three months, Mr. Stoll has emerged to become one of the world's most influential sports handicappers. And when it comes to predicting the outcomes of college football games, he is peerless.

What separates Mr. Stoll from other professionals, and makes him so frightening to bookmakers, is that he distributes his bets to the public, for a fee. All that pandemonium on Thursdays was no coincidence: that's the day Mr. Stoll sends an email to his subscribers telling them which college football teams to bet on the following weekend. This makes it very difficult for bookmakers to maintain a balanced book.

His website discusses the tools he uses to analyze football games: a mathematical model to project how many points each team was likely to score in a coming matchup. He makes unapologetic use of terms like variances, square roots, binomials and standard distributions. Much of his time is spent making tiny adjustments. If a team lost 12 yards on a running play, he checks the game summary to make sure it wasn't a botched punt. He compensates for the strength of every team's opponents. It takes him eight hours just to calculate a rating he invented to measure special teams. Trivial as this seems, Mr. Stoll says the extra work makes his predictions 4% better.

He does not follow the standard business model. He has no employees and he declines to advertise or swap links with other handicapping sites. In online essays, Dr. Bob says

I have a very realistic approach to handicapping and consider sports betting an investment rather than a gamble. In case you haven't figured it out by now, there is no such thing as a sure thing and I don't respect anyone who does. But, in the long run, if you follow my Best Bet advice and use a disciplined money management strategy you will win.

Bob Stoll's handicapping career began at Berkeley when he entered a $2 NFL pool and, after doing a few minutes of simple math, won $100. From then on, his statistics classes became excuses to feed football data through campus mainframes. After winning 63% of his bets in three years, he quit school to become a tout.

Hot streaks rarely last. One handicapper says

He (Bob Stoll) needs to enjoy this while it's going on right now.

In 2005, Mr. Stoll noticed that a few minutes after he sent his advice, the lines on those games would shift slightly. By the beginning of the 2006 college football season, within 30 seconds of the moment he pressed "send" on his Thursday picks, every major casino in the world would fall into line.

The bookmakers had clearly subscribed, and were trying to change the lines before his clients could make bets. When a stock analyst moves the market with a recommendation, investors who get in early can make money on it regardless of its merits. It's just the opposite in my business. When he makes picks, it's as if brokers and traders collude to drive down the price.

It's a story Mr. Stoll says he's heard thousands of times from clients who don't look at the long term.

Even good bets lose 40% of the time but some clients don't grasp that. They think I'm either hot or I'm cold.

As for what motivates him, Stoll says:

I'm not flashy by nature. I don't need three houses and a boat. I just like to handicap. For me, it's about problem solving.

Questions

  • How likely is it that his past performance table could have happened by chance?
  • Dr. Bob advises clients to bet in a disciplined pattern that leaves less than a 1% chance of exhausting their bankrolls. Is this an acceptable performance statistic? What other information would you like to know about how much you might lose?

Further reading

Submitted by John Gavin.

Amazon's Statistically Improbable Phrases

About a year ago, Amazon.com, a popular site for the online purchase of books and other items, listed a group of phrases for certain books with the label, Statistically Improbable Phrases (SIP). These were phrases identified from the full text of a book that were common in that book relative to other books.

Amazon describes how it selects SIPs only in very vague terms on one of its help pages. I presume that it is vague because Amazon considers its approach to be a trade secret. The August 23, 2006 entry on S Anand's blog outlines how you might compute SIPs and offers an example using the Calvin and Hobbes comic strip.
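Because Amazon does not publish its method, the following is only a guess at the general idea, not Amazon's algorithm or the one on Anand's blog. The sketch scores each two-word phrase by how much more often it occurs in the book than in a reference corpus, and uses add-one smoothing so that phrases never seen in the corpus do not cause a division by zero. The choice of two-word phrases, the `min_count` cutoff and the toy texts are all assumptions for illustration.

```python
import re
from collections import Counter

def bigrams(text: str) -> list[str]:
    """Lower-cased two-word phrases (this toy version ignores sentence breaks)."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]

def improbable_phrases(book: str, corpus: str, top: int = 5, min_count: int = 2) -> list[str]:
    book_counts = Counter(bigrams(book))
    corpus_counts = Counter(bigrams(corpus))
    n_book = sum(book_counts.values())
    n_corpus = sum(corpus_counts.values())
    vocab = len(set(book_counts) | set(corpus_counts))
    scores = {}
    for phrase, count in book_counts.items():
        if count < min_count:          # skip phrases that occur only once in the book
            continue
        rate_book = count / n_book
        # Add-one smoothed rate in the reference corpus.
        rate_corpus = (corpus_counts[phrase] + 1) / (n_corpus + vocab)
        scores[phrase] = rate_book / rate_corpus
    return sorted(scores, key=scores.get, reverse=True)[:top]

book = ("Calvin builds a transmogrifier gun. The transmogrifier gun turns Calvin into a tiger. "
        "Hobbes says the transmogrifier gun is a cardboard box.")
corpus = ("The cat sat on the mat. The dog is a good dog. The box is on the table. "
          "A gun is a weapon. Calvin is a name.")
print(improbable_phrases(book, corpus))
```

With a realistic corpus of thousands of books, common phrases such as "is a" would dominate raw counts, which is why the score compares rates rather than counts.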

One use of SIPs is clustering. You could measure the similarity between two books by the number of SIPs they share, and then cluster the books using that similarity matrix. Another approach to clustering that is used for RSS feeds is available here.
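Here is one way the SIP clustering idea could be sketched. Instead of the raw number of shared SIPs, this version uses the Jaccard similarity (shared SIPs divided by all distinct SIPs of the two books), which keeps books with long SIP lists from looking similar to everything; that normalization, the single-linkage rule and the 0.3 threshold are our assumptions. The SIP sets are invented for illustration.

```python
from itertools import combinations

def jaccard(a: set[str], b: set[str]) -> float:
    """Shared phrases divided by all distinct phrases of the two books."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(sips: dict[str, set[str]], threshold: float) -> list[list[str]]:
    """Single-linkage clustering: books join the same group whenever a chain of
    pairs with similarity >= threshold connects them (tracked with union-find)."""
    parent = {name: name for name in sips}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(sips, 2):
        if jaccard(sips[a], sips[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in sips:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical SIP sets for four books.
sips = {
    "stats_a": {"confidence interval", "null hypothesis", "sampling distribution"},
    "stats_b": {"null hypothesis", "sampling distribution", "bootstrap sample"},
    "cooking": {"cast iron skillet", "pinch of salt"},
    "baking": {"pinch of salt", "proofing basket", "cast iron skillet"},
}
print(cluster(sips, threshold=0.3))
```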

Questions

1. Find a well-known statistics book on the Amazon web site that lists SIPs. Do these SIPs give you a good idea of the content of the book?

2. Would SIPs be valuable for a work of fiction?

3. Speculate on what book would have the highest number of SIPs.

Submitted by Steve Simon

Can Google replace your doctor?

Googling for a diagnosis—use of Google as a diagnostic aid: internet based study. Hangwei Tang, Jennifer Hwee Kwoon Ng. BMJ 2006; 333: 1143-1145.

An article published in BMJ argues that Google searches can sometimes help in reaching an appropriate diagnosis. The researchers selected a convenience sample of diagnostic cases presented in the New England Journal of Medicine in 2005. They extracted three to five search terms from each case, using "statistically improbable phrases" (see above) whenever possible. They then reviewed roughly the top thirty links suggested by Google (never more than the top fifty) and extracted a diagnosis from those pages. The diagnoses were correct in 15 out of 26 cases (58%, 95% CI 38% to 77%).
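It helps to see how wide a confidence interval for 15 successes in 26 trials has to be. The summary does not say which interval method the authors used, so the sketch below computes two common ones for comparison: the simple normal-approximation (Wald) interval, which comes out at about 39% to 77%, and the Wilson score interval, at about 39% to 74%. Neither is claimed to be the paper's method.

```python
from math import sqrt

def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p = successes / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval, usually better behaved for small n."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

correct, cases = 15, 26
print(f"point estimate: {correct / cases:.1%}")
print("Wald:   %.1f%% to %.1f%%" % tuple(100 * x for x in wald_ci(correct, cases)))
print("Wilson: %.1f%% to %.1f%%" % tuple(100 * x for x in wilson_ci(correct, cases)))
```

Either way, the interval runs from well under half to over three quarters, which is worth keeping in mind for the first question below.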

The authors admit that the success of a Google diagnosis depends on what you are looking for:

We suspect that using Google to search for a diagnosis is likely to be more effective for conditions with unique symptoms and signs that can easily be used as search terms.

and also note that

Searches are less likely to be successful in complex diseases with non-specific symptoms or common diseases with rare presentations.

The BMJ offers "Rapid Responses," a system that allows interested readers to offer their own comments on any article published. The Rapid Responses to this article include a number of criticisms as well as some suggestions for improvement.

Questions

1. Is a 58% rate of correct diagnoses good?

2. The authors used blinding: they were unaware of the correct diagnosis during the search phase. Comment on whether this blinding is needed and whether it is effective.

3. The authors acknowledge the importance of skill in extracting information from the pages that Google identifies. There is also skill in selecting the "statistically improbable phrases" used as search terms. How would you redesign this experiment so that the skill of the authors did not influence the results?

Submitted by Steve Simon

What can you do with 100 words?

Parrot's oratory stuns scientists Alex Kirby, BBC News, January 26, 2004.

An article about N'Kisi, a parrot with a vocabulary of 950 words, makes a rather dubious statistical claim.

About 100 words are needed for half of all reading in English, so if N'kisi could read he would be able to cope with a wide range of material.

There is a story about Dr. Seuss writing his famous book "The Cat in the Hat" from a limited vocabulary list and coming in at 220 unique words. His publisher wagered $50 that he could not write a book using only 50 words. Dr. Seuss did indeed accomplish this with "Green Eggs and Ham," which uses exactly 50 words. See the Snopes.com entry on Green Eggs and Ham for details. So if 100 words are needed for half of all reading, then a book of median complexity would fall between "Green Eggs and Ham" (50 words) and "The Cat in the Hat" (220 words).

Another interpretation is that the 100 most common words account for 50% of the words used in a typical book. You can find a list of these words on the web, and if you replace every word except those 100 with a question mark, the text becomes rather difficult to read. Here is an example, using a paragraph taken from a previous Chance News:

When a ? ? for a ? ? ?, he or she ? an ? ? with the ?. The ? may ?, but ? if he does not, others will. That is ? the ? will ? ? ?. But if the ? is ? ?, then the ? ? and ? are for ?.
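Both readings of "100 words" can be checked on any text you have at hand. The sketch below computes the share of word tokens covered by a text's own k most frequent words, and masks every word outside a given vocabulary as in the example paragraph above. The tiny vocabulary and sample sentences are stand-ins; the actual list of the 100 most common English words is not reproduced here.

```python
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def top_k_coverage(text: str, k: int) -> float:
    """Fraction of word tokens accounted for by the text's k most frequent words."""
    counts = Counter(tokens(text))
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total

def mask(text: str, vocabulary: set[str]) -> str:
    """Replace every word not in `vocabulary` with '?', keeping punctuation."""
    return re.sub(r"[A-Za-z']+",
                  lambda m: m.group(0) if m.group(0).lower() in vocabulary else "?",
                  text)

# Hypothetical stand-in for a list of very common words.
common = {"the", "a", "is", "on", "and"}
print(mask("The cat is on a mat.", common))
print(top_k_coverage("the cat and the dog and the bird", 1))
```

Run on a full book with k = 100, `top_k_coverage` gives a direct test of the "half of all reading" claim for that book.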

A separate critique of the claims about N'Kisi, published on the Skeptic's Dictionary web site, discusses the problem of confirmation bias.

Questions

1. How would you interpret the phrase "100 words are needed for half of all reading"? How would you verify the accuracy of this statement?

Submitted by Steve Simon

Read before you cite

Significance, Dec. 2006, Vol. 3, Issue 4.
Mikhail Simkin, Vwani Roychowdhury

This is a popular account of work the authors carried out under the title "Copied citations create renowned papers".

This article was suggested by Norton Starr, who was enchanted by the authors' story, which might be called "What determines Great Generals?"

During the “Manhattan project” (the making of nuclear bomb), Fermi asked Gen. Groves, the head of the project, what is the definition of a “great” general. Groves replied that any general who had won five battles in a row might safely be called great. Fermi then asked how many generals are great. Groves said about three out of every hundred. Fermi conjectured that, considering that opposing forces for most battles are roughly equal in strength, the chance of winning one battle is 1/2 and the chance of winning five battles in a row is 1/32. “So you are right, General, about three out of every hundred. Mathematical probability, not genius.”
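Fermi's arithmetic is easy to reproduce and to check by simulation. The sketch assumes, as Fermi did, that every battle is an even contest decided independently; the number of simulated generals and the random seed are arbitrary choices.

```python
import random

# If every battle is an even contest, five straight wins has probability (1/2)^5.
p_great = 0.5 ** 5
print(f"P(five wins in a row) = {p_great} -> about {100 * p_great:.1f} 'great' generals per 100")

# Monte Carlo check with 100,000 hypothetical generals, each fighting five battles.
rng = random.Random(1945)
n_generals = 100_000
n_great = sum(
    all(rng.random() < 0.5 for _ in range(5))  # True only if all five battles are won
    for _ in range(n_generals)
)
print(f"simulated share: {n_great / n_generals:.4f}")
```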

The authors give as their reference Deming's 1986 book "Out of the Crisis." But Deming says that a student sent him the story, and he seems to suggest that it can be found in "The Face of Battle" by John Keegan. We could not find it there. It is in Carl Sagan's "The Demon-Haunted World," but without a reference. So we don't know whether this is a true story.

Just as generals might be great by chance, so might scientists. The authors comment that "a commonly accepted measure of 'greatness' for scientists is the number of citations to their papers."

Most of us would admit that we often do not read all the papers we cite in our articles. We would also admit that we probably make occasional mistakes in our citations: the date is wrong, the volume is wrong, we misspell an author's name, etc. Of course, these errors get propagated when others copy our citations.

To get some idea of how often this occurs, the authors chose a renowned paper that had 4300 citations and found that 196 of these citations contained misprints, of which only 45 were distinct. The most popular misprint, in a page number, appeared 78 times.
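A back-of-the-envelope reading of these counts follows, under an assumption that is ours and not necessarily the authors': each distinct misprint was introduced once by someone working from the original paper, and every repeat of it was copied from an earlier citing paper. Under that assumption, roughly three quarters of the misprinted citations look copied. The calculation says nothing directly about the citations without misprints.

```python
citations = 4300   # citations of the renowned paper
misprinted = 196   # citations containing a misprint
distinct = 45      # distinct misprints among them

misprint_rate = misprinted / citations
# Assumption: each distinct misprint was introduced once; all repeats were copied.
introduced = distinct / misprinted
copied = 1 - introduced

print(f"misprint rate among citations: {misprint_rate:.1%}")
print(f"share of misprinted citations that look copied: {copied:.0%}")
```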

The authors develop a model to measure the effect of citation copying on the distribution of the number of citations. The model uses a "random-citing scientist" who, when writing an article, picks m random articles, cites them, and also copies each of their references with probability p. So m and p are the model's parameters. The authors report good agreement between the model and actual citation data with m = 3 and p = 1/4, and illustrate this with the following figure:

http://www.dartmouth.edu/~chance/forwiki/citations.jpg
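The random-citing scientist model can be simulated in a few lines. This is our reading of the verbal description, not the authors' code: the details are assumptions, including starting from m uncited seed papers, choosing the m papers uniformly from all earlier papers, and counting a reference only once if it is both read and copied. With m = 3 and p = 1/4, copying lets a few early papers accumulate far more citations than the m per paper handed out at random.

```python
import random
from collections import Counter

def random_citing_scientist(n_papers: int, m: int = 3, p: float = 0.25, seed: int = 1) -> list[int]:
    """Hypothetical sketch of the copying model; returns citation counts per paper."""
    rng = random.Random(seed)
    references: list[list[int]] = [[] for _ in range(m)]  # m seed papers, no references
    for _ in range(n_papers - m):
        read = rng.sample(range(len(references)), m)  # m distinct earlier papers, cited directly
        cited = set(read)
        for paper in read:
            for ref in references[paper]:
                if rng.random() < p:                  # copy each of its references with prob. p
                    cited.add(ref)
        references.append(sorted(cited))
    counts = Counter(ref for refs in references for ref in refs)
    return [counts[i] for i in range(len(references))]

cites = random_citing_scientist(20_000)
print("mean citations per paper:", sum(cites) / len(cites))
print("most-cited paper:", max(cites))
print("share never cited:", sum(c == 0 for c in cites) / len(cites))
```

Plotting a histogram of `cites` on log-log axes is the natural way to compare the simulated distribution with real citation data.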

Submitted by Laurie Snell


To live longer, choose fame over fortune

Nobel's greatest prize, The Economist, 20 Jan 2007.

Winning a Nobel prize not only brings fame and fortune to the holder but it also brings two extra years of life, according to Matthew Rablen and Andrew Oswald at the University of Warwick.

The paper provides evidence that an increase in status, rather than wealth alone, lengthens a person's life, based on data about the lives of 524 Nobel Prize winners and nominees (135 winners and 389 nominees).

This idea was first proposed by Michael Marmot, of University College London, when he studied a large cohort of British civil servants and found, against all expectations, that top civil servants were far healthier and less stressed than lower-ranked ones. Other studies have confirmed this result and support the assertion that better health is not a result of higher salary. The Rablen-Oswald paper refines the approach by analyzing people who are at the top of their profession, by virtue of being nominated for a Nobel, to measure the value of winning the prize relative to merely being nominated.

The authors correct for various biases, for example by grouping the data by country. In the empirical data, American winners live over two years longer than nominees, German winners just over a year longer and other European winners 0.7 years longer. The fitted model suggests a two-year difference overall.

What causes the increase in longevity is not clear but it is not the cash that comes with a Nobel prize, as the inflation-adjusted purchasing power of the prize is not correlated with longevity. So status, rather than money, appears to be responsible for the effect, the authors claim.

Marmot and others have previously suggested that stress hormones may be a factor: those at the bottom of the pile are more stressed than those at the top, even though the latter have to make decisions with wider-ranging impacts. Rablen and Oswald's paper goes further by suggesting a positive effect from high status, rather than merely the absence of a negative effect, since unsuccessful nominees never knew that they were being considered. In the case of Oscar winners (see Do Oscar winners live longer?) the winners may live longer, but the unsuccessful nominees know that they have failed to win.

Questions

  • This result is based only on data from the first half of the 20th century, because of the secrecy surrounding the nomination of potential prize winners. Are results based on historical data still applicable today? Speculate on what adjustments might be needed.
  • The data covers men only, to avoid differences in lifespan between the sexes. Do you think that the underlying idea can be extrapolated to women?
  • If the idea that social status improves lifespan is correct, might the size of the effect be larger in a more typical population?
  • Oddly, Oscar-winning actresses and actors live 3.6 years longer than those who are merely nominated, but Oscar-winning scriptwriters live 3.6 years less than other nominees. Why might this be?

Further reading

Submitted by John Gavin

Comments on "To live longer, choose fame over fortune"

When we discussed the Oscar studies in Chance News 10.05 and Chance News 10.06 we, like the media, accepted them as serious and did not even raise the possibility that they might be nonsense. When Peter Doyle read our account he was skeptical, and shortly thereafter he and Mark Mixer carried out experiments which showed that the study was indeed nonsense. This was in May 2001, so it took us five years, and the article by James Hanley and his colleagues, for Chance News to report here that Oscar winners do not live longer.

Our experience with the Oscar winners study makes us skeptical that Nobel Prize winners live longer than nominees, and we feel that our new Chance Wiki should do better than our old Chance News did at critiquing this study. We will try to get the data and, if we succeed, make it available; perhaps we can do better this time.

Submitted by Laurie Snell

Momentous modelling

Momentous modelling, Economics focus, The Economist, Feb 1st 2007.

This article highlights a growing trend in economics to focus on the uncertainty surrounding an economic forecast, rather than on the forecast level itself.

Shocking is what economists do. They start with a model of the economy, administer a 'shock' to it - a sudden rise in the oil price - and work out what happens to output, prices, employment and so forth.

Such models consider changes in a model's mean, or expected value: what happens if the oil price doubles? In contrast, economists have focussed much less on the variation around the forecast: what happens if the oil price is likely to range between, say, $20 and $100 rather than between $50 and $60? The Economist's tentative explanation is that the latter question requires more difficult maths.

In a recent paper, based on his PhD thesis, Stanford University's Nick Bloom claims that such models are important if people's behaviour changes when the world suddenly becomes a less (or more) certain place. Sudden, big second-moment shocks, measured by the volatility of American share prices, are also fairly frequent; examples include the terrorist attacks of September 11th 2001, the assassination of John Kennedy and the collapse of big companies such as WorldCom and Enron.

Bloom's model of the economy allows firms to choose how much to invest and how many workers to employ. The world in which they operate is uncertain because their revenues can vary. He shocks the model by suddenly increasing the variability of firms' revenues, based on data from shocks over the past 45 years: he doubles the standard deviation of revenues, a common measure of variability, which then returns to its old level a few months later. The model predicts that firms wait and see what happens, because the value of waiting increases as volatility rises. For example, expanding firms defer hiring new workers, and failing firms tend to delay sacking employees in the hope that their circumstances will turn around. As a result, workers are no longer shuffled from less productive to more productive firms, which is bad for the economy as a whole and a concern for policymakers.
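Why does waiting become more valuable as volatility rises? A textbook two-outcome "real option" example shows the mechanism; it is a toy with invented numbers, not Bloom's model, which also includes costs of hiring and firing. A project (or a new hire) is worth 100 on average and costs 90. Acting now yields 10. Waiting one period reveals whether the value has moved up or down by a fraction sigma, and the firm then acts only if that pays, discounting at an assumed 5%. With sigma = 0.1 acting now is better; doubling sigma to 0.2 makes waiting better.

```python
def value_invest_now(v0: float, cost: float) -> float:
    """Net value of committing immediately (hire or invest today)."""
    return v0 - cost

def value_wait(v0: float, cost: float, sigma: float, r: float = 0.05) -> float:
    """Wait one period: value moves to v0*(1+sigma) or v0*(1-sigma) with equal
    probability, and the firm commits only if that beats the cost."""
    up = max(v0 * (1 + sigma) - cost, 0.0)
    down = max(v0 * (1 - sigma) - cost, 0.0)
    return 0.5 * (up + down) / (1 + r)   # discounted expected value of waiting

for sigma in (0.1, 0.2):  # doubling the volatility
    now, wait = value_invest_now(100, 90), value_wait(100, 90, sigma)
    choice = "wait" if wait > now else "act now"
    print(f"sigma={sigma}: act now {now:.2f}, wait {wait:.2f} -> {choice}")
```

The mean value of the project is the same in both cases; only the spread changes, which is exactly the kind of second-moment shock the article describes.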

So Bloom claims that policymakers need to distinguish second-moment shocks, which seem not to last long, from the first-moment variety, whose effects endure for longer.

Questions

  • Is it plausible that people's perceptions might be more influenced by the uncertainty of a forecast than by the forecast itself? Can you think of common examples where this is the case? For example, when you hear a weather forecast, which attribute of the forecast do you tend to recall before venturing outside? If you are going on holiday for a week, does that change what you look for in the weather forecast for your destination?
  • Is standard deviation an appropriate measure of the shocks that are mentioned in the article? Would higher moments be more helpful?
  • The data used to calibrate the model covers a period of 45 years. Is data from so long ago still relevant to today's economy? What adjustments might be applied to standardise the data across time?
  • Do you think that the duration of shocks might be an influential factor to consider? How might this be measured and subsequently simulated? What other information would you like to have at your disposal?
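For the question about standard deviation versus higher moments, a small comparison helps. The sketch builds two invented series with identical mean (0) and standard deviation (1): one moves steadily by plus or minus 1, the other is usually flat with rare jumps of plus or minus 5. Skewness is zero for both, but excess kurtosis, which measures how much of the variation comes from rare large moves, differs enormously (about -2 versus 22).

```python
from statistics import fmean, pstdev

def moments(xs: list[float]) -> tuple[float, float, float]:
    """Population standard deviation, skewness and excess kurtosis."""
    mu, sd = fmean(xs), pstdev(xs)
    skew = fmean(((x - mu) / sd) ** 3 for x in xs)
    excess_kurt = fmean(((x - mu) / sd) ** 4 for x in xs) - 3
    return sd, skew, excess_kurt

calm = [-1.0, 1.0] * 50                    # steady small moves
jumpy = [0.0] * 96 + [5.0, 5.0, -5.0, -5.0]  # mostly flat, rare large jumps

for name, series in (("calm", calm), ("jumpy", jumpy)):
    sd, skew, kurt = moments(series)
    print(f"{name:5s}: sd={sd:.2f}  skew={skew:.2f}  excess kurtosis={kurt:.2f}")
```

A model calibrated only to the standard deviation would treat these two series as equally risky.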

Further reading

Submitted by John Gavin.

Oscar winners do not live longer

Because of the size of Chance News 23 we have had to make this item available separately here.