Chance News 50

From ChanceWiki
Latest revision as of 12:59, 11 July 2011

Quotation

Probability is a mathematical discipline whose aims are akin to those, for example, of geometry or analytical mechanics. In each field we must carefully distinguish three aspects of the theory: (a) the formal logical content, (b) the intuitive background, and (c) the applications. The character, and the charm, of the whole structure cannot be appreciated without considering all three aspects in their proper relation.

William Feller

An Introduction to Probability Theory and its Applications


This quotation was found at the Probability Web, which our readers will enjoy.

Forsooths

Naomi Schaefer Riley writes in the Wall Street Journal of 6/23/09 regarding a Colorado appeals court ruling which declared that "the public interest is advanced more by tenure systems that favor academic freedom over tenure systems that favor flexibility in hiring or firing." She then offers this statement, "Some of the courses taught this year by the professors who sued include American Baseball History and Business Statistics."


Paul Alper


This graphic has counter-intuitive coloring. http://graphics8.nytimes.com/images/2009/06/27/opinion/27blowlarge.jpg

The accompanying article is here.

Paul Alper

Swine flu pandemonium III

"Fourth Connecticut Resident With Swine Flu Dies" by Arielle Levin Becker, The Hartford Courant, June 19, 2009
See Chance News 49 [1] for two earlier stories about the second and third cases of swine flu in Connecticut.

A fourth Connecticut death has been "linked" to swine flu.

The person was between 40 and 49 years old and had underlying medical conditions that increased the risk for serious illness from flu, the state Department of Public Health said.

To date there have been 767 confirmed cases of swine flu in Connecticut; 28 of the cases had been hospitalized, and 19 of the hospitalized patients were from the largest cities. All four deaths occurred in people with other medical problems who were hospitalized at the time of death.

Here is the data to date about Connecticut deaths from swine flu:
1 death – 395 confirmed cases – June 4
2 deaths – 637 confirmed cases – June 11
3 deaths – 693 confirmed cases – June 15
4 deaths – 767 confirmed cases – June 17
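For a rough sense of scale, the crude ratio of deaths to confirmed cases can be computed from the four rows above. This is a minimal Python sketch; the names are ours, and the ratio is not a fatality rate, since confirmed cases undercount infections and deaths lag behind diagnoses.

```python
# Connecticut swine flu reports from the list above: (date, deaths, confirmed cases).
REPORTS = [
    ("June 4", 1, 395),
    ("June 11", 2, 637),
    ("June 15", 3, 693),
    ("June 17", 4, 767),
]


def deaths_per_1000(deaths: int, cases: int) -> float:
    """Crude deaths per 1,000 confirmed cases (not an infection fatality rate)."""
    return 1000 * deaths / cases


for date, deaths, cases in REPORTS:
    print(f"{date}: {deaths_per_1000(deaths, cases):.1f} deaths per 1,000 confirmed cases")
```

With only four deaths, each new death moves the ratio noticeably, so counts this small call for caution.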

Discussion

1. Would you advise Connecticut residents to move out of large cities to avoid swine flu? Could the victims have been identified as being from the largest cities because they died in large city hospitals, as opposed to having resided in large cities?
2. Would you advise Connecticut residents with swine flu to avoid hospitals?

Swine flu pandemonium IV

“Fever, Chills…and Losses: More Companies Should Be Preparing for an Influenza Pandemic”
by Amin Mawani, The Wall Street Journal, June 22, 2009
Employers are urged to prepare for a possibly sizeable increase in employee absenteeism from swine flu, the first pandemic declared by the World Health Organization in 41 years. The World Economic Forum predicts a $500 billion economic impact from the pandemic.

The good news is that employee absenteeism—and its financial toll on employers—may be controlled to a large extent with adequate planning and stockpiling of antiviral medication, masks and gowns.

The bad news is that few companies have taken steps to protect themselves. A 2007 survey reported at a Harvard Business School conference on pandemic planning found that while 88% of companies seemed prepared to deal with a power disruption and 70% with a technological failure, only 13% were prepared for the kind of labor-force disruption that would come with a pandemic.

Companies are advised to use cost-benefit analysis to justify preparedness. A company’s benefits associated with “pandemic preparedness” include “the earnings before interest, taxes, depreciation and amortization … that are preserved because employees aren’t absent. To figure this, managers must establish the contribution employees make to profits. Some of these calculations can get complex ….” A company that is prepared may also have a competitive edge due to reliability in bad times, as well as a reduced likelihood of being liable for negligence in governance.

Role of luck in golf

"Winning a Major May Just Be a Matter of Luck", by Jason Turbow, The Wall Street Journal, June 19, 2009

Using data from every PGA leaderboard from 1998 to 2001, two business professors, from the University of North Carolina and Dartmouth, have used "cubic spline functions" to try to explain the role of luck in professional golf.

"If being on the leaderboard at the end of a tournament was due entirely to skill, we would see the same names every week," said [the Dartmouth researcher].

Their model aims to predict an individual's score in a tournament based on an estimate of the person's "intrinsic skill level independent of variables like course difficulty and variations over time." A golfer with a higher tournament score than predicted was considered to have had good luck; one with a lower score was considered to have had bad luck.

[I]n all the events the researchers studied, Mr. Woods was the only golfer to win a tournament despite suffering from negative luck.

The brief article includes a table [2] of expected score, actual score, and "luck factor" for players Tiger Woods, Ernie Els, Vijay Singh, Phil Mickelson, Sergio Garcia, and Jim Furyk, in the 2000 U.S. Open.

Baseball: More education, more victories?

"Who Has the Brainiest Team in Baseball?", by Jason Turbow, The Wall Street Journal, June 16, 2009

The author studied "30 team media guides" to try to determine whether there is "a correlation between education and victories" in professional baseball. He compared team standings with players' and managers' undergraduate degrees. He found that only about two dozen major league players or managers had undergraduate degrees.

[T]hree "All-Brains" division leaders -- Oakland, Arizona and Washington -- are in last place in real life, while Texas and the Dodgers were last in their divisions in smarts but first in the standings.

Two bloggers [3] wrote:

(1) "[A]re these results really surprising? The best teams are the best teams because they have more good players than the other teams. Good players are likely to have been A) so talented at baseball as to have little incentive to work hard at school and B) so dedicated to the sport that academics would have suffered. If you're a marginal major league talent like Breslow, it makes sense to get a degree with better earnings potential. Not so for the Alex Rodriguezes [sic] and Barry Bondses [sic] of the world."

(2) "How about instead of looking at university experience, check out something that almost every player (from the U.S., at least) would have: SAT scores. Surely there is the occasional ballplayer with a stratospheric score who still opts for baseball over college."

Keeping up with the Joneses by lowering utility bills

“Greening With Envy”, by Bonnie Tsui, The Atlantic, August 2009
Robert Cialdini, a social psychologist at Arizona State University, tested four different towel-reuse signs in hotels to see how well guests responded:

The first sign had the traditional message, asking guests to “do it for the environment.” The second asked guests to “cooperate with the hotel” and “be our partner in this cause” (12 percent less effective than the first). The third stated that the majority of guests in the hotel reused towels at least once during their stay (18 percent more effective). The last message was even more specific: it said that the majority of guests “in this room” had reused their towels. It produced a 33 percent increase in response behavior over the traditional message.

As the chief scientist for Positive Energy, Cialdini is now applying what he learned to encouraging utility consumers to conserve energy by letting them know how much energy they use relative to their neighbors. Based on his software’s analysis of a neighborhood’s energy usage, a utility company can send monthly bills to consumers with information about how a particular consumer’s usage compared to that of his/her neighbors. For example, a consumer who used “58 percent less electricity” might receive a row of smiley faces, while one who used “39 percent more” might receive no smiley faces, a notice that it cost him/her $741 extra, and tips for improvement.
In Sacramento,

people who received personalized “compared with your neighbors” data on their statements reduced their energy use by more than 2 percent over the course of a year. … [W]ith the pilot sample of 35,000 homes, it’s the equivalent of taking 700 homes off the grid. And the cost to the utility is minor: for every dollar a utility spends on a solar power plant, it produces 3 to 4 kilowatt-hours; for every dollar a utility spends on the energy reports, it saves 10 times that.
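The arithmetic behind the Sacramento figures can be checked in a few lines. This sketch treats "more than 2 percent" as exactly 2 percent, assumes the pilot homes are typical so that a 2% cut across all of them matches the full consumption of 2% of them, and reads "10 times that" as ten times the solar kWh-per-dollar figure; those readings are ours, not the article's.

```python
# Figures quoted from the Atlantic article; the interpretations are assumptions.
PILOT_HOMES = 35_000
REDUCTION_PERCENT = 2  # "more than 2 percent", used here as a lower bound

# Assumes the pilot homes are typical, so a 2% cut across all of them equals
# the full consumption of 2% of the homes.
equivalent_homes = PILOT_HOMES * REDUCTION_PERCENT / 100

# Reads "10 times that" as ten times the solar plant's kWh per dollar.
SOLAR_KWH_PER_DOLLAR = (3, 4)
REPORT_MULTIPLIER = 10
report_kwh_per_dollar = tuple(REPORT_MULTIPLIER * kwh for kwh in SOLAR_KWH_PER_DOLLAR)

print(equivalent_homes)       # 700.0
print(report_kwh_per_dollar)  # (30, 40)
```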

A Probability puzzle

MSRI (the Mathematical Sciences Research Institute) has a newsletter, "The Emissary", which among other things has a monthly puzzle. Here is their puzzle for Spring 2009:

Comment: This puzzle is due to Thomas Colhurst.

Find three random variables X, Y, Z, each uniformly distributed on [0, 1], such that their sum is constant. Since each random variable has expectation 1/2, the sum must in fact be 3/2.

To better understand this puzzle, consider the case of two random variables X and Y, with X chosen uniformly at random from [0, 1] and Y = 1 - X. Then the sum of X and Y is the constant 1.

Comment from the Emissary: This problem circulated at the ITA (Information Theory and Applications) conference in San Diego this year. In subsequent discussions we’ve been surprised at how many different interesting solutions are possible.
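Readers who want to test a candidate answer numerically could use a Monte Carlo check like the following. It is a hypothetical helper of our own, and passing it is evidence rather than proof. It is shown on the two-variable warm-up; a candidate for the actual puzzle would return a triple (X, Y, Z) instead of a pair.

```python
import random


def check_candidate(sampler, trials=50_000, bins=10, seed=0):
    """Sample a candidate construction and return two diagnostics:
    - spread: max(sum) - min(sum) over all samples (about 0 if the sum is constant)
    - max_dev: largest relative deviation of any histogram bin from trials/bins,
      taken over all coordinates (small if every coordinate looks uniform on [0, 1])
    """
    rng = random.Random(seed)
    samples = [sampler(rng) for _ in range(trials)]
    sums = [sum(s) for s in samples]
    spread = max(sums) - min(sums)

    expected = trials / bins
    max_dev = 0.0
    for j in range(len(samples[0])):
        counts = [0] * bins
        for s in samples:
            # clamp so a value of exactly 1.0 falls in the last bin
            counts[min(int(s[j] * bins), bins - 1)] += 1
        max_dev = max(max_dev, max(abs(c - expected) / expected for c in counts))
    return spread, max_dev


def two_variable_warmup(rng):
    """X uniform on [0, 1) and Y = 1 - X, so X + Y = 1."""
    x = rng.random()
    return (x, 1 - x)


spread, max_dev = check_candidate(two_variable_warmup)
print(f"spread of sums: {spread:.2e}, worst bin deviation: {max_dev:.3f}")
```

A construction with uniform coordinates but a non-constant sum (for example two independent uniforms) shows up immediately as a large spread.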


Submitted by Laurie Snell

The TED Talks

At the TED Talks website we read:

Each year, the world's leading thinkers and doers gather in for an event many describe as the highlight of their year. Attendees have called it "The ultimate brain spa," "Davos for optimists" and "A four-day journey into the future, in the company of those creating it." This event is called TED, and it's truly a conference like no other.

"It was incredible." Malcolm Gladwell
"A mind-opening experience." Amy Tan
"One of the highlights of my entire life." Billy Graham
"I've never experienced anything remotely like it." Jeffrey Katzenberg

"The combined IQ of the attendees is incredible." Bill Gates

Of course we are interested in statistics or probability talks. We found two statistics talks, one by Hans Rosling and another by Peter Donnelly. You can listen to Rosling's talk here.

From the TED Talks website we read:

Even the most worldly and well-traveled among us will have their perspectives shifted by Hans Rosling. A professor of global health at Sweden’s Karolinska Institute, his current work focuses on dispelling common myths about the so-called developing world, which (he points out) is no longer worlds away from the west. In fact, most of the third world is on the same trajectory toward health and prosperity, and many countries are moving twice as fast as the west did.

What sets Rosling apart isn’t just his apt observations of broad social and economic trends, but the stunning way he presents them. Guaranteed: You’ve never seen data presented like this. In Rosling’s hands, data sings. Trends come to life. And the big picture -- usually hazy at best -- snaps into sharp focus.

We did indeed find his talk amazing.

You can listen to Peter Donnelly's talk here.

From the TED Talks website we read:

Oxford mathematician Peter Donnelly reveals the common mistakes humans make in interpreting statistics -- and the devastating impact these errors can have on the outcome of criminal trials.

Peter begins with a couple of jokes which we did not find all that funny:

Statisticians are people who like figures but do not have the personality to become accountants

How do you tell the introverted statistician from the extroverted statistician? The extroverted statistician is the one who looks at the other person's shoes.

Peter's approach is to present a simple probability problem and then show that the method used to solve it also applies to real-life problems. In his problem a coin is tossed repeatedly, and a pattern is a sequence of heads and tails. He imagines assigning the pattern HTT to one half of the audience and the pattern HTH to the other half. He then imagines tossing the coin many times, keeping track of the average number of tosses until each pattern occurs, and asks which side of the room would experience the larger average number of tosses before its pattern occurs. He says that most people would expect these averages to be about the same, but the mathematics shows that the expected number of tosses until HTH occurs is 10, whereas the expected number of tosses until HTT occurs is 8.
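The audience experiment is easy to mimic by simulation. This Python sketch (function names are ours) assumes a fair coin and gives each half of the room its own independent sequence of tosses; the averages should land near 8 for HTT and 10 for HTH, up to sampling noise.

```python
import random


def tosses_until(pattern: str, rng: random.Random) -> int:
    """Toss a fair coin until `pattern` appears; return the number of tosses."""
    window = ""
    tosses = 0
    while True:
        # keep only the most recent len(pattern) tosses
        window = (window + rng.choice("HT"))[-len(pattern):]
        tosses += 1
        if window == pattern:
            return tosses


def average_wait(pattern: str, trials: int = 50_000, seed: int = 0) -> float:
    """Average number of tosses until `pattern` appears, over many independent runs."""
    rng = random.Random(seed)
    return sum(tosses_until(pattern, rng) for _ in range(trials)) / trials


for pattern in ("HTT", "HTH"):
    print(pattern, round(average_wait(pattern), 2))
```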

Peter's problem is based on a well-known coin-tossing problem proposed by Walter Penney in the Journal of Recreational Mathematics (October 1969, p. 241). Penney proposed a game in which each player chooses a pattern of the same length; a coin is then tossed repeatedly, and the player whose pattern occurs first wins. In Peter's problem one player chooses HTT and the other HTH. This is a fair game, so each player has the same probability of winning: both patterns begin with HT, so the first time HT appears, the next toss decides the winner (H for HTH, T for HTT). On the other hand, as Peter says, the expected number of tosses until HTH occurs is 10, while the expected number of tosses until HTT occurs is 8!

You can also see a nice discussion of this problem by Martin Gardner (1974), Mathematical games, Sci. Amer. 10, 120-125. There you will find an elegant combinatorial solution to this coin-tossing problem, due to John Conway. The article is also included in Gardner's book "Time Travel and Other Mathematical Bewilderments" and in some of his other books.
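Conway's method is usually stated in terms of "leading numbers" (also called correlations) between patterns. The sketch below is our own rendering of that method as it is commonly presented, assuming a fair coin and two distinct patterns of equal length; the function names are hypothetical.

```python
from fractions import Fraction


def leading_number(a: str, b: str) -> int:
    """Conway's leading number a:b for equal-length H/T patterns.

    Reading left to right, bit k is 1 when the suffix of `a` that starts at
    position k equals the prefix of `b` of the same length.
    """
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    value = 0
    for k in range(len(a)):
        value = 2 * value + (1 if b.startswith(a[k:]) else 0)
    return value


def expected_wait(pattern: str) -> int:
    """Expected tosses of a fair coin until `pattern` appears: 2 * (A:A)."""
    return 2 * leading_number(pattern, pattern)


def prob_b_beats_a(a: str, b: str) -> Fraction:
    """Probability that pattern b appears before a different pattern a.

    Conway's odds in favour of b are (A:A - A:B) : (B:B - B:A).
    """
    odds_b = leading_number(a, a) - leading_number(a, b)
    odds_a = leading_number(b, b) - leading_number(b, a)
    return Fraction(odds_b, odds_b + odds_a)


print(expected_wait("HTH"), expected_wait("HTT"))  # 10 8
print(prob_b_beats_a("HTT", "HTH"))                # 1/2
```

The same functions give 14 expected tosses for HHH and a 7/8 chance that THH appears before HHH.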

Donnelly finishes his talk by discussing how this problem has been used in his own field of research, DNA, and the role of DNA evidence in the courts. He illustrates the problems of using statistical evidence in the courts with the Sally Clark case, which we discussed in Chance News 11.01.

Contributed by Laurie Snell

Fraud in Iranian election?

The devil is in the digits
Washington Post, 20 June 2009
Bernd Beber and Alexandra Scacco

Beber and Scacco are doctoral students in political science at Columbia University. In this article they argue that certain patterns in the reported electoral totals from this month's Iranian presidential elections give strong indications of tampering. Iran's Ministry of the Interior released data for 29 provinces, and the authors examined the reported vote totals for the four main candidates, Ahmadinejad, Mousavi, Karroubi and Mohsen Rezai. Among these 116 numbers, the authors focus on the last two digits, which they assert should be uniformly distributed. However, they report two statistical irregularities. First, regarding the final digits, they write

We find too many 7s and not enough 5s in the last digit. We expect each digit (0, 1, 2, and so on) to appear at the end of 10 percent of the vote counts. But in Iran's provincial results, the digit 7 appears 17 percent of the time, and only 4 percent of the results end in the number 5. Two such departures from the average -- a spike of 17 percent or more in one digit and a drop to 4 percent or less in another -- are extremely unlikely. Fewer than four in a hundred non-fraudulent elections would produce such numbers.
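The "fewer than four in a hundred" figure can be roughly checked by simulating clean elections in which each of the 116 last digits is uniform. This is a sketch under our own assumptions: we read "17 percent or more" as at least 20 of 116 digits (20/116 ≈ 17.2%) and "4 percent or less" as at most 5 of 116 (5/116 ≈ 4.3%, which rounds to 4%). The authors' exact thresholds and method may differ, and the estimate shifts if the thresholds change.

```python
import random
from collections import Counter

N_NUMBERS = 116  # 29 provinces x 4 candidates


def spike_and_dip(digits, spike_count=20, dip_count=5):
    """True if some digit appears at least `spike_count` times and some
    (necessarily different, since spike_count > dip_count) digit appears
    at most `dip_count` times."""
    counts = Counter(digits)
    per_digit = [counts[d] for d in range(10)]
    return max(per_digit) >= spike_count and min(per_digit) <= dip_count


def simulate(trials=20_000, seed=0):
    """Share of simulated clean elections (uniform last digits) showing the pattern."""
    rng = random.Random(seed)
    hits = sum(
        spike_and_dip(rng.choices(range(10), k=N_NUMBERS)) for _ in range(trials)
    )
    return hits / trials


print(f"estimated probability: {simulate():.3f}")
```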

Next, they considered the last two digits together, and asked how many of the pairs contain non-adjacent digits (e.g., 32 has adjacent digits while 35 has non-adjacent digits). They report that only 62% of the pairs had non-adjacent digits, compared with the 70% that would be expected for random digits.
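What counts as "adjacent" affects the expected share. The sketch below counts the non-adjacent pairs among the 100 equally likely two-digit endings under a few definitions; these definitions are our assumptions, not necessarily the authors'. Treating repeated digits as adjacent and 9 and 0 as neighbors reproduces the 70% figure exactly. It also computes a binomial lower-tail probability, assuming the reported 62% corresponds to 72 of the 116 pairs (72/116 ≈ 62.1%) and that the pairs are independent.

```python
from math import comb


def non_adjacent_share(count_equal: bool, wrap: bool) -> float:
    """Share of the 100 equally likely last-two-digit pairs that are NOT adjacent.

    count_equal: treat repeated digits (e.g. 33) as adjacent.
    wrap: treat 9 and 0 as neighbors.
    """
    non_adjacent = 0
    for a in range(10):
        for b in range(10):
            diff = abs(a - b)
            if wrap:
                diff = min(diff, 10 - diff)
            adjacent = diff == 1 or (count_equal and diff == 0)
            if not adjacent:
                non_adjacent += 1
    return non_adjacent / 100


def lower_tail(n: int, p: float, k: int) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


print(non_adjacent_share(count_equal=True, wrap=True))    # 0.7
print(non_adjacent_share(count_equal=False, wrap=False))  # 0.82
# Chance that random digits give 72 or fewer non-adjacent pairs out of 116.
print(f"{lower_tail(116, 0.7, 72):.3f}")
```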

Further investigations by Beber and Scacco, this time involving county-level data (an average province in Iran contains about 12 counties), are discussed on Andrew Gelman's blog. At the county level, the last digits do not look suspicious, and the county results do add up to the reported province totals. Beber and Scacco speculate that if the province totals were indeed fabricated, as their earlier analysis suggests, then the county totals could have been made to match as follows: the leading digits of each county total could be padded to get the sum into the right ballpark, and then a fine adjustment to the last digits of just one county would be required to match the province figure. Indeed, they cite discussion of work by Walter Mebane suggesting that the leading digits do look suspicious. Mebane has been regularly updating his analysis online.

A good collection of links related to this discussion from Pollster.com is available here.

Submitted by Bill Peterson, based on posts by Nancy Boynton and others to the Isolated Statisticians mailing list.

Update

In August, Andrew Gelman's blog included a follow-up on this discussion. In particular, he provides a link to some interesting data analysis on the Iran election by Thomas Lotze, a PhD student at the University of Maryland. Specifically, writing on the Beber and Scacco last digits analysis, he writes:

This is certainly a little bit unusual; however, we have to recognize that in their analysis, Beber and Scacco chose to remove the invalidated counts by province...if you consider all the province-level counts, the p-value goes up to just over 0.1, which is not very significant.

What’s good for the goose?

“NYC Issues Geese Evictions”, by Martha T. Moore, USA Today, June 22, 2009

The U.S. Department of Agriculture Wildlife Services has begun removing, for euthanization, about 2,000 geese from areas near New York City’s two airports, at a cost of about $100,000. In addition to the euthanization program, a Port Authority spokesperson stated that it was also “training airport employees to use a shotgun” as a “last resort.”

This program is apparently a response to the January incident in which Canada geese “hit” US Airways Flight 1549, “shutting down the jet's engines and forcing the pilot to ditch in the Hudson River.” Not only were there no fatalities in this incident, but also, according to the NTSB, the geese were not local.

According to FAA data, while the last airline fatality from a bird occurred in Boston in 1960, “the average annual number of large bird strikes has increased 62% since the 1990s.”

Bloggers suggest that, at the very least, the geese could provide food for needy people.[4]

Discussion
1. The last airline fatality from a bird occurred almost 50 years ago. If geese were only a problem with respect to airline safety, might this program be “overkill”? What information and/or data might you need in order to decide whether to support this program?
2. The article tells us that the average annual number of large bird strikes on airplanes has increased by 62% since the 1990s. What additional information and/or data might you need in order to decide whether an increase of 62% is significant, statistically or otherwise?

Framing choices

“About Time: Regulation Based On Human Nature”
by Jason Zweig, The Wall Street Journal, June 20, 2009
In this column, Zweig writes about the need to provide consumers with clear, understandable options in choosing among complicated products such as mortgages.
He refers to the 2009 revised edition of “Nudge: Improving Decisions About Health, Wealth, and Happiness”, by Richard Thaler (University of Chicago) and Cass Sunstein (Harvard Law School), who believe that financial institutions should be required to offer at least some “generic” mortgage plans that would make comparison-shopping easier.

The central idea in “Nudge” is what Profs. Thaler and Sunstein call “choice architecture" – the context, format and framing of how decisions are presented to consumers. You will eat more nuts from a big bowl than from a small bowl. You will choose surgery if you are told it offers a 90% chance of survival; you will reject it if you are told there is a 10% chance it will kill you. The same people who would skip investing in a 401(k) if they had to "opt in" to the plan will participate if they have to "opt out" in order to skip it.

More on Infuse and Kuklo II

Here is more on the Infuse and Kuklo II story that appeared in Chance News 49.

From Orthopedics This Week [5] we learn that "Medtronic has finally told Senator Charles Grassley how much it paid former Army surgeon Timothy Kuklo, M.D. Over $850,000 between 2001 and 2009." The article goes on to say, "Medtronic continues to dribble out details that raise more questions than answers." The article concludes with "Machiavelli advised his prince to get all the bad news out at once and dribble out the good news. It would be good advice for Medtronic to heed."

Submitted by Paul Alper