Chance News 23
Quotation
When I talk to people about statistics, I find that they usually are quite willing to criticize dubious statistics--as long as the numbers come from people with whom they disagree.
Joel Best, More Damned Lies and Statistics, page XI
Forsooth
Carl Griffths' feet have grown to a massive size 18 - double the average for adult men in Britain
The Times
6 October 2006
A Challenge
The mathematics department at Dartmouth has just moved to a new building and the previous math building is being demolished. The students called this building "Shower Towers," a name suggested by this picture of one wall of the building.
For at least 30 years we walked by this wall assuming that the tiles were randomly placed. One day, as we were walking by it, our colleague John Finn said "I see they are not randomly placed." What did he see?
Submitted by Laurie Snell
Statz 4 life
Statz 4 life, homies!, Da Statz Krew, Google video.
This is a hilarious 5-minute hip-hop video made for an introductory statistics course for psychology at the University of Oregon last summer. Graduate student Chuck Tate enlisted the help of other psychology graduate students to get students to enjoy statistics as much as they enjoy hip-hop. From ANOVA to correlation, from Pearson to Fisher, the whole syllabus is mentioned. Let's hope Da Statz Krew enjoy their real stats courses as much as they seemed to enjoy making the video.
(Note: This video was previously mentioned briefly in Chance News 18.)
Submitted by John Gavin.
Hot streaks rarely last
The Man Who Shook Up Vegas, by Sam Walker, January 5, 2007; Page W1.
Since last fall (Autumn), Las Vegas has had a problem each Thursday morning at precisely 10 a.m. Nevada time. Casino sports betting operations around the world were being simultaneously pounded by thousands of bettors wagering millions of dollars on the same few college football games. Odder still, most of these lock step bets were turning out to be winners, costing the casinos a fortune. The global business of sports betting was being jolted every week by an obscure 41-year-old statistician from San Francisco, using the alias Dr. Bob.
The article explains the background:
Gamblers wagering against a point spread must win more than half their bets (about 53%) to make a profit and must be closer to 55% to make a comfortable living. This is no small feat. Experts say there may be fewer than 100 people who can sustain these rates over time. Most of them belong to professional betting syndicates that hire teams of statisticians, wager millions every week and keep their operations secret.
Since 1999, Bob Stoll has recommended 658 bets on college football, or about 81 per season. Here are his results. (For comparison, when betting against a point spread in Las Vegas, bettors must win 52.4% of their wagers to make a profit.)
YEAR   WIN/LOSS/TIE   %
1999   49-31-1        61
2000   47-25-0        65
2001   35-28-0        56
2002   49-44-3        53
2003   46-55-2        46
2004   55-34-1        62
2005   51-21-2        71
2006   45-34-3        57
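The percentage column appears to be wins divided by decided bets, with ties left out. That reading is an assumption, as is the conventional pricing of risking 110 to win 100 used below to reproduce the 52.4% break-even figure; the article states neither. A minimal Python sketch for checking both:

```python
# Season records copied from the table: year -> (wins, losses, ties).
records = {
    1999: (49, 31, 1),
    2000: (47, 25, 0),
    2001: (35, 28, 0),
    2002: (49, 44, 3),
    2003: (46, 55, 2),
    2004: (55, 34, 1),
    2005: (51, 21, 2),
    2006: (45, 34, 3),
}


def win_pct(wins, losses):
    """Win percentage among decided bets (assumes ties are simply refunded)."""
    return 100 * wins / (wins + losses)


def break_even(risk=110, win=100):
    """Win rate at which risking `risk` to win `win` has zero expected profit.

    The 110/100 default is an assumed standard price, not taken from the article.
    """
    return risk / (risk + win)


season_pcts = {year: round(win_pct(w, l)) for year, (w, l, t) in records.items()}
total_wins = sum(w for w, l, t in records.values())
total_losses = sum(l for w, l, t in records.values())

if __name__ == "__main__":
    for year, pct in season_pcts.items():
        print(year, pct)
    print("overall win %:", round(win_pct(total_wins, total_losses), 1))
    print("break-even %:", round(100 * break_even(), 1))
```

Under these assumptions the rounded season percentages agree with the table, which supports reading the column as wins over decided bets.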
The article claims that in the last three months, Mr. Stoll has emerged to become one of the world's most influential sports handicappers. And when it comes to predicting the outcomes of college football games, he is peerless.
What separates Mr. Stoll from other professionals, and makes him so frightening to bookmakers, is that he distributes his bets to the public, for a fee. All that pandemonium on Thursdays was no coincidence: that's the day Mr. Stoll sends an email to his subscribers telling them which college football teams to bet on the following weekend. This makes it very difficult for bookmakers to maintain a balanced book.
His website discusses the tools he uses to analyze football games: a mathematical model to project how many points each team was likely to score in a coming matchup. He makes unapologetic use of terms like variances, square roots, binomials and standard distributions. Much of his time is spent making tiny adjustments. If a team lost 12 yards on a running play, he checks the game summary to make sure it wasn't a botched punt. He compensates for the strength of every team's opponents. It takes him eight hours just to calculate a rating he invented to measure special teams. Trivial as this seems, Mr. Stoll says the extra work makes his predictions 4% better.
He does not follow the standard business model. He has no employees and he declines to advertise or swap links with other handicapping sites. In online essays, Dr. Bob says:
I have a very realistic approach to handicapping and consider sports betting an investment rather than a gamble. In case you haven't figured it out by now, there is no such thing as a sure thing and I don't respect anyone who does. But, in the long run, if you follow my Best Bet advice and use a disciplined money management strategy you will win.
Bob Stoll's handicapping career began at Berkeley when he entered a $2 NFL pool and, after doing a few minutes of simple math, won $100. From then on, his statistics classes became excuses to feed football data through campus mainframes. After winning 63% of his bets in three years, he quit school to become a tout.
Hot streaks rarely last. One handicapper says:
He (Bob Stoll) needs to enjoy this while it's going on right now.
In 2005, Mr. Stoll noticed that a few minutes after he sent his advice, the lines on those games would shift slightly. By the beginning of the 2006 college football season, within 30 seconds of the moment he pressed "send" on his Thursday picks, every major casino in the world would fall into line.
The bookmakers had clearly subscribed, and were trying to change the lines before his clients could make bets. When a stock analyst moves the market with a recommendation, investors who get in early can make money on it regardless of its merits. It's just the opposite in my business. When he makes picks, it's as if brokers and traders collude to drive down the price.
It's a story Mr. Stoll says he's heard thousands of times from clients who don't look at the long term.
Even good bets lose 40% of the time but some clients don't grasp that. They think I'm either hot or I'm cold.
As for what motivates him, Stoll says:
I'm not flashy by nature. I don't need three houses and a boat. I just like to handicap. For me, it's about problem solving.
Questions
- How likely is it that his past performance table could have happened by chance?
- Dr. Bob advises clients to bet in a disciplined pattern that leaves less than a 1% chance of exhausting their bankrolls. Is this an acceptable performance statistic? What other information would you like to know about how much you might lose?
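One possible starting point for the first question is a binomial tail probability. The sketch below pools the table into 377 wins and 272 losses, ignores ties, and treats every bet as an independent trial with the same win probability; all three are simplifying assumptions, and the calculation says nothing about selection effects (a handicapper tends to be written about because of a good record).

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Totals pooled from the 1999-2006 table (ties excluded by assumption).
WINS, LOSSES = 377, 272
N = WINS + LOSSES

# Chance of a record at least this good from coin-flip picks ...
p_coin = binom_upper_tail(WINS, N, 0.5)
# ... and from picks that only just reach the 52.4% break-even rate.
p_break_even = binom_upper_tail(WINS, N, 0.524)

if __name__ == "__main__":
    print(f"P(at least {WINS} wins in {N} | p = 0.500) = {p_coin:.2e}")
    print(f"P(at least {WINS} wins in {N} | p = 0.524) = {p_break_even:.2e}")
```

Comparing the two probabilities shows how much the answer depends on the null hypothesis chosen: "no better than a coin" is a weaker standard than "no better than break-even."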
Further reading
- drbobsports.com offers daily, monthly or full season subscriptions to Dr. Bob's Best Bets in all the major sports.
- You can search previous Chance News articles for more information on some of the features of sports betting.
Submitted by John Gavin.
Amazon's Statistically Improbable Phrases
About a year ago, Amazon.com, a popular site for the online purchase of books and other items, listed a group of phrases for certain books with the label, Statistically Improbable Phrases (SIP). These were phrases identified from the full text of a book that were common in that book relative to other books.
Amazon describes how it selects the SIPs in very vague terms on one of its help pages. I presume that it is vague because Amazon considers its approach to be a trade secret. The August 23, 2006 entry on S Anand's blog outlines how you might compute SIPs and offers an example using the Calvin and Hobbes comic strip.
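Since Amazon's actual method is not public, the following is only a rough sketch of one way to score phrases, not Amazon's algorithm or the one in the blog entry. It assumes a "phrase" is a two-word sequence and scores each phrase by its rate in the book divided by a smoothed rate in a reference corpus; the function names, the smoothing constant, and the toy texts are all made up for illustration.

```python
import re
from collections import Counter


def bigrams(text):
    """Lower-case the text and return its consecutive word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))


def sip_scores(book_text, corpus_texts, smoothing=1.0):
    """Score each phrase in the book by book rate / smoothed corpus rate."""
    book = Counter(bigrams(book_text))
    corpus = Counter()
    for text in corpus_texts:
        corpus.update(bigrams(text))
    book_total = sum(book.values())
    corpus_total = sum(corpus.values())
    vocab = len(set(book) | set(corpus))
    scores = {}
    for phrase, count in book.items():
        book_rate = count / book_total
        # Additive smoothing keeps phrases unseen in the corpus from dividing by zero.
        corpus_rate = (corpus[phrase] + smoothing) / (corpus_total + smoothing * vocab)
        scores[phrase] = book_rate / corpus_rate
    return scores


def top_sips(book_text, corpus_texts, n=5):
    """Return the n highest-scoring phrases, best first."""
    scores = sip_scores(book_text, corpus_texts)
    return sorted(scores, key=scores.get, reverse=True)[:n]


if __name__ == "__main__":
    book = "the tiger pounced the tiger pounced on the boy"
    others = ["the boy ran on the road", "the dog sat on the mat"]
    print(top_sips(book, others, n=2))
```

A ratio of rates is only one choice; a log-likelihood ratio or a TF-IDF weight would be reasonable alternatives.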
One use of SIPs is clustering. You could measure the similarity between books based on the number of common SIPs and then cluster the data using that similarity matrix. Another approach to clustering that is used for RSS feeds is available here.
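The similarity idea can be sketched as follows, assuming each book's SIPs are already available as a set of strings. The book titles and phrases are hypothetical, and the step from the matrix to actual clusters (for example, hierarchical clustering) is left out.

```python
def shared_sip_matrix(sips_by_book):
    """Return sorted titles and a symmetric matrix of shared-SIP counts."""
    titles = sorted(sips_by_book)
    matrix = [
        [len(set(sips_by_book[a]) & set(sips_by_book[b])) for b in titles]
        for a in titles
    ]
    return titles, matrix


def most_similar_pair(titles, matrix):
    """Return (shared_count, title_a, title_b) for the most similar distinct pair."""
    best = None
    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            if best is None or matrix[i][j] > best[0]:
                best = (matrix[i][j], titles[i], titles[j])
    return best


if __name__ == "__main__":
    # Hypothetical SIP sets, for illustration only.
    example = {
        "Book A": {"null hypothesis", "sample mean", "confidence interval"},
        "Book B": {"confidence interval", "sample mean", "prior distribution"},
        "Book C": {"stuffed tiger", "snow goons"},
    }
    titles, matrix = shared_sip_matrix(example)
    print(most_similar_pair(titles, matrix))
```

A raw count favors books with long SIP lists, so a normalized measure such as the Jaccard index may be a better choice in practice.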
Questions
1. Find a well known statistics book on the Amazon web site that lists SIPs. Do these SIPs give you a good idea of the content of the book?
2. Would SIPs be valuable for a work of fiction?
3. Speculate on what book would have the highest number of SIPs.
Submitted by Steve Simon
Can Google replace your doctor?
Googling for a diagnosis—use of Google as a diagnostic aid: internet based study Hangwi Tang, Jennifer Hwee Kwoon Ng. BMJ 2006: 333; 1143-1145.
An article published in BMJ argues that Google searches can sometimes aid with developing an appropriate diagnosis of disease. The researchers selected a convenience sample of diagnostic cases presented in the New England Journal of Medicine in 2005. They extracted three to five search terms from these case studies, using "statistically improbable phrases" (see above) whenever possible. They then reviewed roughly the top thirty links suggested by Google (never more than the top fifty links) and extracted a diagnosis from the pages. The diagnoses were correct in 15 out of 26 cases (58%, 95% CI 38% to 77%).
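The summary above does not say which method produced the confidence interval. As a rough check, the sketch below computes two common 95% intervals for 15 successes in 26 trials, the Wald (normal approximation) interval and the Wilson score interval; neither is claimed to be the authors' method.

```python
from math import sqrt

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_interval(successes, n, z=Z95):
    """Normal-approximation interval: p +/- z * sqrt(p(1-p)/n), clipped to [0, 1]."""
    p = successes / n
    half = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson_interval(successes, n, z=Z95):
    """Wilson score interval, which behaves better for small samples."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    for name, fn in [("Wald", wald_interval), ("Wilson", wilson_interval)]:
        lo, hi = fn(15, 26)
        print(f"{name}: {100 * lo:.0f}% to {100 * hi:.0f}%")
```

Both intervals land within a few percentage points of the reported 38% to 77%. With only 26 cases, the interval is wide whichever method is used.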
The authors admit that the success of a Google diagnosis depends on what you are looking for.
We suspect that using Google to search for a diagnosis is likely to be more effective for conditions with unique symptoms and signs that can easily be used as search terms.
and also note that
Searches are less likely to be successful in complex diseases with non-specific symptoms or common diseases with rare presentations.
The BMJ offers "Rapid Responses," a system that allows interested readers to offer their own comments on any article published. The Rapid Responses to this article include a number of criticisms as well as some suggestions for improvement.
Questions
1. Is a 58% rate of correct diagnoses good?
2. The authors used blinding--the authors were unaware of the correct diagnosis during the search phase. Comment on whether this blinding is needed and whether it is effective.
3. The authors acknowledge the importance of skill in extracting information from the pages that Google identifies. There is also skill in selecting the "statistically improbable phrases" used as search terms. How would you redesign this experiment so that the skill of the authors did not influence the results?
Submitted by Steve Simon
What can you do with 100 words?
Parrot's oratory stuns scientists Alex Kirby, BBC News, January 26, 2004.
An article about N'Kisi, a parrot with a vocabulary of 950 words, makes a rather dubious statistical claim.
About 100 words are needed for half of all reading in English, so if N'kisi could read he would be able to cope with a wide range of material.
There is a story about Dr. Seuss writing his famous book "The Cat in the Hat" using a limited vocabulary list and coming in at 220 unique words. His publisher wagered $50 that he could not write a book using only 50 words. Dr. Seuss did indeed accomplish this with "Green Eggs and Ham" which uses exactly 50 words. See the Snopes.com entry on Green Eggs and Ham for details. So if 100 words are needed for half of all reading then the book with a median level of complexity is bracketed below and above by "Green Eggs and Ham" and "The Cat in the Hat".
Another interpretation is that the 100 most common words represent 50% of the words used in a typical book. You can find a list of these words on the web, but if you remove everything except those 100 words, the text becomes rather difficult to read. Here is an example of a paragraph taken from a previous Chance News.
When a ? ? for a ? ? ?, he or she ? an ? ? with the ?. The ? may ?, but ? if he does not, others will. That is ? the ? will ? ? ?. But if the ? is ? ?, then the ? ? and ? are for ?.
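The masking shown above, and the share of a text that a word list covers, can be sketched in a few lines of Python. The helper names are hypothetical, and the tiny word set in the example stands in for a real list of the 100 most common English words, which would have to be supplied separately.

```python
import re
from collections import Counter

WORD = re.compile(r"[A-Za-z']+")


def mask_uncommon(text, common_words):
    """Replace every word not in common_words with '?', keeping punctuation."""
    common = {w.lower() for w in common_words}
    return WORD.sub(
        lambda m: m.group(0) if m.group(0).lower() in common else "?", text
    )


def coverage(text, common_words):
    """Fraction of the words in text that belong to common_words."""
    common = {w.lower() for w in common_words}
    words = [w.lower() for w in WORD.findall(text)]
    if not words:
        return 0.0
    return sum(w in common for w in words) / len(words)


def top_n_coverage(text, n):
    """Fraction of the words in text accounted for by its own n most frequent words."""
    words = [w.lower() for w in WORD.findall(text)]
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(c for _, c in counts.most_common(n)) / len(words)


if __name__ == "__main__":
    sample = "The cat sat on the mat."
    tiny_list = {"the", "on"}  # stand-in for a real top-100 list
    print(mask_uncommon(sample, tiny_list))
    print(coverage(sample, tiny_list))
```

Running `top_n_coverage(text, 100)` on a large body of text would be one way to test the "100 words cover half of all reading" interpretation empirically.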
A separate critique of the claims about N'Kisi published at the Skeptic's Dictionary web page comments on the problems with confirmation bias.
Questions
1. How would you interpret the phrase "100 words are needed for half of all reading"? How would you verify the accuracy of this statement?
Submitted by Steve Simon