Chance News 17
Quotation
There are two times in a man's life when he should not speculate: when he can't afford it, and when he can.
Forsooths
Part of the fun of looking at Forsooths is trying to figure out why they are Forsooths. You should certainly try, but if you get stumped, you can read one person's idea of why they are Forsooths at the end of this Chance News.
The first three Forsooths are from the May 2006 RSS News.
Of the US Fortune 500 companies, 84 percent now have women on their boards; in the UK among the directors of companies in the FTSE 100, only 9 percent are women.
The Observer
19 March 2006
Thursday is the least productive day for finance workers, research has found. The start of the week is the best time with 18 per cent claiming they were most productive on a Monday.
Metro
26 January 2006
Question:
Kim has three vases in her living room, each containing the same number of flowers. Kim adds three fresh flowers to one vase which now has two more than the new average. How many flowers were in the vases originally?
2006 Mensa puzzle calendar
[note: answer given as "six", which is quite correct of course.]
Peter Winkler pointed out that the following question is not a Forsooth:
Kim has *some* vases in her living room, each containing the same number of flowers. Kim adds three fresh flowers to one vase which now has two more than the new average. How many *vases* are there?
Walking on Water
For the most part, scientists, mathematicians and statisticians labor in obscurity. Almost all of what they do is of no interest to the general public. The exception used to be research that could somehow be connected to sex; then the scientist, mathematician or statistician would suddenly be in the Rolodexes of the various talk-show programs. Not so long ago, for example, a statistical study of the ratio of the length of the forefinger to that of the ring finger was everywhere and anywhere. Why? Because the authors [Nature, 30 March 2000] claimed a statistically significant difference in this ratio between homosexuals and heterosexuals, and thus an easy, noninvasive, visual way of spotting sexual preference. The flaws in the study were numerous. The participants were chosen at gay pride celebrations in the vicinity of San Francisco, an area not known to be typical of the United States. Multiple comparisons were made, and with enough data dredging it is not statistically surprising that the odd comparison would have a p-value below 5%. The clinical (substantive, practical) significance was more or less zero, in keeping with the negligible effect size coupled with measurement error. Nevertheless, titillation was high enough for several weeks of joking, hand comparisons and bad puns by the public and the media.
But sex, while always interesting, has given way to religion in American life. The phenomenal success of Dan Brown's The Da Vinci Code and the rise of the religious right guarantee that any scientific, mathematical or statistical research which can be tied to the Bible will bring instant celebrity, even when the investigation appears in the unlikely Journal of Paleolimnology [2006, 35:417-439] and involves "a small freshwater lake (148 km² and a mean depth of 20 m)." The lake's current name is Lake Kinneret, but in Biblical days it was known as the Sea of Galilee, on which Jesus is said to have performed one of his miracles: walking on water. "To walk on water" has since entered the English language as a phrase for extra-human, even divine, talent.
The paper by Nof, McKeague and Paldor is not an easy read, combining as it does analyses of sea surface temperature, (warm and salty) springs, plume dynamics, ice dynamics and time series. The paper would never have made the talk-show circuit if it were only the typically dry (no pun intended) presentation found in such a technical journal. What sets it apart is its scientific explanation of how Jesus could have managed to walk on water. In essence, after much physics, mathematics and a bit of statistics, the authors have "proposed that the unusual local freezing process might have provided an origin to the story that Christ walked on water. Since the springs ice is relatively small, a person standing or walking on it may appear to an observer situated some distance away to be 'walking on water'." To avoid being inundated by hate mail (which they received in any event), they carefully state, "Whether this [walking on ice] happened or not is an issue for religion scholars, archeologists, anthropologists and believers to decide on."
Stripped of its mathematics, the paper's argument is that conditions were occasionally colder back then and that ice could have formed every once in a while, about once every 160 years. Strangely enough, much of their data for this claim consists of temperatures from two core samples taken 2000 km away. The justification offered for this strange choice is "because this distance is not any greater than the typical weather system scale in this part of the world." They do have some data from much closer to the lake, but only from 1986 to 2003, and even then "only the first 9 years of data were deemed suitable for use in the subsequent model." Because "the residual plots displayed some wild transitory behavior (as often seen for example, in financial time series data)," they added "a GARCH(1,1) component" (which lets the variance of the errors change over time) to an AR(3) (third-order autoregressive) model, resulting in the prediction of ice formation about once every 160 years.
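To make this class of model concrete, here is a minimal Python sketch of an AR(3) series whose errors follow a GARCH(1,1) variance recursion, plus a crude way of turning a simulated series into a "once every N years" figure by counting how often it falls below a threshold. Every number here (the AR coefficients `phi`, the GARCH parameters `omega`, `alpha`, `beta`, and any threshold) is a hypothetical placeholder rather than an estimate from Nof, McKeague and Paldor, the function names are ours, and the paper's actual link from temperature to ice formation is more involved than a single cutoff.

```python
import random

def simulate_ar3_garch11(n, phi=(0.5, 0.2, 0.1), c=0.0,
                         omega=0.1, alpha=0.1, beta=0.8, seed=0):
    """Simulate n steps of x_t = c + phi1*x_{t-1} + phi2*x_{t-2} + phi3*x_{t-3} + e_t,
    where e_t has conditional variance s2_t = omega + alpha*e_{t-1}**2 + beta*s2_{t-1}.

    All default parameter values are hypothetical placeholders.
    """
    if alpha + beta >= 1:
        raise ValueError("GARCH(1,1) needs alpha + beta < 1 for a finite variance")
    rng = random.Random(seed)
    var = omega / (1 - alpha - beta)  # start at the unconditional variance
    e_prev = 0.0
    x = [0.0, 0.0, 0.0]               # three starting lags for the AR(3) recursion
    for _ in range(n):
        var = omega + alpha * e_prev ** 2 + beta * var  # GARCH(1,1) variance update
        e = rng.gauss(0.0, var ** 0.5)                  # shock drawn with that variance
        x.append(c + phi[0] * x[-1] + phi[1] * x[-2] + phi[2] * x[-3] + e)
        e_prev = e
    return x[3:]

def empirical_return_period(series, threshold):
    """Average number of time steps per value below threshold, or None if none fall below."""
    hits = sum(1 for v in series if v < threshold)
    return len(series) / hits if hits else None
```

If each step is read as one year, `empirical_return_period(simulate_ar3_garch11(100_000), -2.5)` gives the average spacing of "cold" years in one simulated run; rerunning with other seeds or parameters shows how much such a return period can move with the modeling choices.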
In their summary, the authors carefully state, "We hesitate to draw any conclusions regarding the implications of this study to the actual events that took place...Our springs ice calculations may or may not be related to the origin of the account of Christ walking on water." Nonetheless, Nof and Paldor are no strangers to conjuring up scientific explanations for Biblical phenomena. In 1992 they wrote an article, "Are There Oceanographic Explanations for the Israelites' Crossing of the Red Sea?" [Bulletin of the American Meteorological Society, 73: 305-314]. This time, instead of temperature, it is wind that parted the Red Sea just long enough: "It is suggested that the crossing occurred while the water receded and that the drowning of the Egyptians was a result of the rapidly returning wave." Nof likened the event to a cup of coffee: "It's like blowing across the top of a cup of coffee. The coffee blows from one end of the cup to the other." Statistics are completely absent from this paper. However, in 1994 they published a follow-up, "Statistics of Wind over the Red Sea with Application to the Exodus Question" [Journal of Applied Meteorology, 33, No. 8: 1017-1025]. Here they "used the Weibull distribution ...applied to winds in the part of the Indian Ocean adjacent to the Red Sea" to argue that a proper storm would occur "roughly once every 2000 years."
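The Weibull argument follows a standard return-period calculation. If a wind speed W follows a Weibull distribution with shape k and scale λ, then P(W > w) = exp(-(w/λ)^k); if independent wind events occur m times a year, an exceedance of w recurs on average every 1/(m·P(W > w)) years. The Python sketch below implements that textbook calculation and its inverse. It is not a reconstruction of the Red Sea wind paper: the function names are ours, and the shape, scale and event rate are placeholders.

```python
import math

def weibull_exceedance(w, shape, scale):
    """P(W > w) for a Weibull wind speed with shape k and scale lambda."""
    return math.exp(-((w / scale) ** shape))

def return_period_years(w, shape, scale, events_per_year):
    """Mean number of years between events with W > w, assuming independent events."""
    return 1.0 / (events_per_year * weibull_exceedance(w, shape, scale))

def return_level(years, shape, scale, events_per_year):
    """Wind speed exceeded on average once every `years` years (inverse of the above).

    Solves 1 / (m * exp(-(w/scale)**shape)) = years for w; needs m * years > 1.
    """
    return scale * math.log(events_per_year * years) ** (1.0 / shape)
```

With a placeholder shape of 2, scale of 10 m/s and 4 qualifying events a year, `return_level(2000, 2, 10, 4)` is the speed a storm would need to reach for a "once every 2000 years" claim under those assumptions; the answer is sensitive to the shape parameter, which governs how quickly the tail thins out.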
Discussion
1. Someone commented that "The reaction among Biblical scholars to Nof's theory ranged from bemused detachment to real irritation." Why the detachment and why the irritation?
2. Were the Israelites lucky to have picked exactly the right moment? What calculations do you believe they did?
3. What physical phenomenon could explain the destruction of the walls of Jericho? Noah's flood? The Biblical burning bush?
4. The conflict between Darwinism and Biblical fundamentalism has been much in the news the past few years. Why hasn't there been any clash between fundamentalism and aspects of chemistry such as Avogadro's number?
Submitted by Paul Alper
Measuring poverty in London over 100 years
There goes the neighbourhood,
From The Economist print edition, May 4th 2006.
Booth redux,
From Economist.com, May 4th 2006.
This on-line article uses recent census data to graphically update a 100-year-old map of poverty in London by district and street. The original project, led by the shipping magnate Charles Booth, colour-coded every street in the capital according to its social make-up. It shows the extent to which poverty depends on location and how little has changed over the past century.
The article illustrates one area, north Chelsea, in 1898 and 2001, colour-coding each street as either wealthy, well-off, middling or poor. In 1898, Chelsea was socially mixed, neither especially rich nor especially poor. Today Chelsea is considered a very desirable place to live, with many wealthy streets, and some of its poverty has disappeared. But on closer inspection, the Economist claims that
poverty has not been altogether banished from this part of Chelsea, nor has it moved much. Most of the poorest areas in 2001 were also poor in 1898, and in almost exactly the same places. The reason is that the worst Victorian slums have been knocked down and replaced with tracts of social housing.
Neither the original survey nor its updated version uses complicated statistical models. In 1898, researchers peered through windows and into back gardens, or asked police officers for their opinions, in order to classify each street into one of seven categories, from wealthy at the top to 'vicious, semi-criminal' at the bottom of the poverty scale. The 2001 census classifies people's socio-economic status into one of eight categories. To combine the two datasets, the Economist therefore used a common set of four categories. For each output area, the smallest unit available from the 2001 census, it counted the number of people falling into each of the four new categories and took the single largest group to represent the character of the area.
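The plurality rule is simple enough to state in a few lines of Python. The sketch below uses the Economist's own example from the questions that follow (80 residents in the top category and 60, 40 and 20 in the other three); the function names are ours, and which of the lower labels receives 60, 40 and 20 is our assumption. It also reports what share of residents the winning label actually describes.

```python
def classify_area(counts):
    """Label an output area by its most common category (plurality rule).

    Ties go to whichever tied category appears first in `counts`.
    """
    return max(counts, key=counts.get)

def plurality_share(counts):
    """Fraction of the area's residents who belong to the winning category."""
    return counts[classify_area(counts)] / sum(counts.values())

# The Economist's example; the order of the three lower labels is assumed here.
example = {"wealthy": 80, "well-off": 60, "middling": 40, "poor": 20}
```

Here `classify_area(example)` returns `"wealthy"` even though `plurality_share(example)` is 0.4: six in ten residents of this "wealthy" area are in some other category.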
Questions
- The Economist gives an example of its classification methodology: if an output area contains 80 members of the upper managerial and professional class ('the wealthy') and 60, 40 and 20 members, respectively, of the other three new categories, it is taken to be wealthy. Is it reasonable to base the classification of an area on the most common category of resident? For example, should the number of people in each street be taken into account?
- How might missing data, such as old streets that have disappeared or new streets that didn't exist in 1898, be handled?
Further reading
- The Charles Booth Online Archive is a searchable resource giving access to archive material from the Booth collections of the British Library of Political and Economic Science (the Library of the London School of Economics and Political Science) and the University of London Library.
- Poverty maps of London - this interactive webpage allows viewers to zoom in on an area of London to see the original 1898 map juxtaposed with a modern view of the same area.
- 2001 UK census
Submitted by John Gavin
Facial Attraction
In a recent Chance News article, it is alleged that "sex, while always interesting, has given way to religion in American life" when it comes to getting research and researchers into the Rolodexes of the media. That this is clearly not the case is evidenced by "Reading men's faces: women's mate attractiveness judgments track men's testosterone and interest in infants," which appeared in the Proceedings of the Royal Society in 2006. In summary, the authors postulate that females, when eyeing a potential mate, are able to discern from facial cues which males are likely to provide good genetic quality for offspring and which males would help raise offspring.
To measure masculinity, which the authors treat as an indicator of genetic quality, they had the males' saliva tested for testosterone. Each male also "completed an interest in infants test" in which "subjects were asked to indicate whether they preferred pictures of adult or infant faces when both were presented simultaneously in pairs." The males then "posed for digital photographs" with hairstyles excluded, and "Young women subsequently rated these photos for the degree to which the men depicted like children, as well as for physical attractiveness, masculinity, kindness, attractiveness as a short-term mate and attractiveness as a long-term mate."
According to the article, "The results of this study suggest that women's perceptions of men's faces track actual characteristics of men that are theoretically important for mate choice ... the present study provides the first direct evidence that women's attractiveness judgments specifically track both men's affinity for children and men's hormone concentrations."
Discussion
1. The study started with "51 University of Chicago students who were recruited from a University website and paid $10 for their participation." The 29 "Women raters were University of California, Santa Barbara (UCSB) undergraduates who participated in exchange for course credit." Starting with this non-random sample, what inferences, if any, can be made to a larger population? Undergraduates, students in general, Americans, the rest of the planet? Speculate on how seriously the women did their rating.
2. "Five [male] subjects who reported a gay sexual orientation and seven others who refused to have their photos taken were dropped from the data analysis." Justify and criticize this exclusion.
3. The women rated the men on a scale of 1 to 7 and "a rating of 4 indicates that he is about average, a rating of 1 means he is far below average and a rating of 7 means he is far above average." Comment on whether the "distance" between a 5 and a 4 is the same as the distance between a 2 and a 1. Comment on whether a 6 is twice as good as a 3. What is the similarity between this type of rating and student evaluations of instructors?
4. The men were instructed "to look straight into the camera and assume a neutral facial expression." Define a neutral facial expression.
5. If you were given paired photos of adults and infants, how much time would be necessary to choose a preference within a given pair? If you were paid more money for participating, would you spend more time choosing? Could someone who greatly prefers infants to adults be accused of pedophilic tendencies?
6. The mean testosterone for this group was 88.38 pg/ml with a standard deviation of 27.97 and was "normally distributed once an outlier three standard deviations above the mean was dropped from the sample." Have you ever had your testosterone measured? Do you have any idea what your pg/ml score is?
7. The article reports an abundance of t-values and related p-values, the latter usually in the form p < some number. Speculate on why effect sizes coupled with some sort of interval estimate don't seem to be present.
8. One attribute that was not discussed was spirituality, a popular term in this age of religiosity. How could that be measured, either facially or otherwise?
9. Why is this variant of an old Yiddish joke relevant? A young woman goes to a shadchen [matchmaker or marriage broker] to seek a husband. The shadchen is an up-to-date techie and uses a spreadsheet to find the right male. She lists all the characteristics she wants in a husband: age, height, weight, athletic ability, eye color, etc. He uses his spreadsheet to find a fellow who fits the constraints and arranges a meeting between the two of them. The next week the woman comes back and, instead of paying him, asks him to find another candidate. The shadchen is surprised and says, "Wasn't he the right age, right height, weight, athletic ability, eye color, etc.?" She replies, "Yes, but I didn't like him."
Submitted by Paul Alper
A New Statistical Misrepresentation
Every elementary statistics textbook warns readers about statistical misrepresentations. For example: the bars in a bar graph should never have different widths, because that would exaggerate differences that should depend only on heights; a graph with the origin missing inflates differences; histograms should have equal widths; when comparing contributions, per capita contribution is better than total contribution; regression graphs should avoid extrapolation. Paul Krugman's op-ed piece in the New York Times of May 29, 2006 referred to a flagrant misrepresentation I had never heard of. He entitled his article "Swift Boating The Planet" because he regards the critique he describes as a fraudulent misrepresentation of the science of global warming. According to Krugman, Dr. James Hansen, a climatologist at NASA, had numerically predicted rising temperatures as far back as 1988. "The original paper showed a range of possibilities, and the actual rise in temperature has fallen squarely in the middle of the range." However, his critic, Dr. Patrick Michaels, "claimed that the actual pace of global warming was falling far short of Dr. Hansen's predictions." Dr. Michaels reached this conclusion by erasing "all the lower curves, leaving only the curve that the original paper described as being 'on the high side of reality'."
Discussion
1. Krugman claims that Dr. Michaels "has received substantial financial support from the energy industry." How does this affect your view of Dr. Michaels' assertions?
2. Of Dr. Michaels' removal of the lower curves, Dr. Hansen is quoted as saying "Is this treading close to scientific fraud?" Krugman's response is "no: it isn't 'treading close,' it's fraud pure and simple." What do you believe Dr. Michaels would say to justify his removal of the lower curves?
Submitted by Paul Alper
The Kindness of Strangers?
This is a review of a recent article:
Long-awaited study questions the power of prayer
The New York Times, March 31, 2006, Page A1
Benedict Carey
that is based on the following paper.
Study of the Therapeutic Effects of Intercessory Prayer (STEP) in cardiac bypass patients: A multicenter randomized trial of uncertainty and certainty of receiving intercessory prayer, American Heart Journal, Volume 151, Issue 4, April 2006, Pages 934-942. Herbert Benson, MD, et al.
Suppose you are about to undergo coronary artery bypass surgery. Would you want to have strangers praying for your successful recovery? And if so, would you prefer to know, or not to know, that such prayers were being offered?
The results of this study, which represents nearly 10 years of research, are described in the New York Times article as “the most scientifically rigorous investigation” to date of the effects of prayer on illness and medical recovery. In addition, the researchers studied whether patients who knew they were receiving prayers fared better than those who were told only that they might be prayed for. Leaving aside the perhaps surprising fact that “rigorous investigation” of the connection between prayer and medical recovery is deemed a worthy expenditure of research time and money, the study did produce some unexpected conclusions. While there was no difference between the recovery outcomes of the patients who were prayed for and those who were not, the patients who knew they were receiving prayers actually fared worse than those who didn’t know they were receiving prayers.
In the study, roughly two-thirds of the 1802 subjects were told that they might or might not receive prayers; of these, 604 were prayed for and 597 were not. The remaining 601 patients received prayers after being told that they would receive them. Prayers began the night before surgery and continued for two weeks, and were provided by members of three Christian congregations in Massachusetts, Minnesota and Missouri. The prayer givers, known as intercessors, were asked to include the phrase “for a successful surgery with a quick, healthy recovery and no complications” in their usual prayers. The primary outcome of interest was the development of any complication within 30 days of a subject’s bypass graft surgery.
At least one complication arose in 971 patients, or roughly 54% of the total. Of these, 315 were in the first group (52% of that group), 304 were in the second group (51%) and 352 were in the last group (59%). A chi-squared test comparing the first and third groups (both of which received prayers, but only the third knew it) indeed implies that the difference between their outcomes is significant (p = .025).
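The p-value for the first versus third group can be reproduced from the counts given above. The Python sketch below (function name ours) computes Pearson's chi-squared statistic for a 2×2 table of complication versus no complication, without a continuity correction; that the authors used exactly this version of the test is our assumption, based on the reported p = .025 matching it. The statistic is converted to a p-value using the fact that a chi-squared variable with one degree of freedom is the square of a standard normal, so its upper tail is `erfc(sqrt(x / 2))`.

```python
import math

def chi2_2x2(a, n1, b, n2):
    """Pearson chi-squared test (no continuity correction) comparing a/n1 with b/n2.

    Returns (statistic, two-sided p-value) for the 2x2 table
    [[a, n1 - a], [b, n2 - b]], using 1 degree of freedom.
    """
    table = [[a, n1 - a], [b, n2 - b]]
    total = n1 + n2
    row = [n1, n2]
    col = [a + b, total - a - b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total  # count expected if groups did not differ
            chi2 += (table[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p
```

For the first and third groups, `chi2_2x2(315, 604, 352, 601)` gives a statistic of about 5.0 and p ≈ 0.025, matching the value quoted.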
While the researchers state in their paper that “We have no clear explanation for the observed excess of complications in the patients who were certain that intercessors would pray for them,” the Times article suggests that a kind of “performance anxiety” may have been responsible: “It may have made them uncertain,” a co-author of the study remarks, “wondering am I so sick they had to call in their prayer team?” In addition, the authors note that a single outcome category was responsible for most of the excess complications in the third group, but they fail to mention that a chi-squared test applied to the values for this category alone yields a p-value of .011. Instead, they merely remark that “the excess may be a chance finding,” a comment echoed without clarification in the Times article. One wonders if such hedging may be a reflection of the background of the lead investigator of the study, Dr. Herbert Benson. According to the Times, in his work Dr. Benson has “emphasized the soothing power of personal prayer and meditation.” Moreover, most of the $2.4 million cost of the study was provided by the John Templeton Foundation, which supports research on spirituality and promotes a closer relationship between religion and science.
Perhaps even more curious is the discussion in the paper about prayer and its use in the study. For example, after noting that the subjects may have had friends and family praying for them, or may have prayed for themselves, the authors observe that “our study subjects may have been exposed to a large amount of non-study prayer, and this could have made it more difficult to detect the effects of prayer provided by the intercessors.” However, they do not suggest that there is any reason to believe that the amount of non-study prayer varied significantly between the three groups. Once again, one senses a reluctance to accept the results of the study, which is also conveyed in the Times article by a comment provided by Dean Marek, a chaplain at the Mayo Clinic in Rochester, Minnesota, and a co-author of the study: “You hear tons of stories about the power of prayer, and I don’t doubt them.” Although Marek is referring to the effects of personal prayer and the prayers of friends and family, not the prayers of strangers, the remark clearly misses a crucial point: one assumes that he doesn’t hear many stories about the prayers of friends and family that did not lead to an improved outcome, so we have no way of evaluating the efficacy of such prayers. Indeed, wasn’t the purpose of the study to investigate the validity of what is otherwise merely anecdotal reporting? Apparently the researchers don’t think so, given their comment near the end of the report: “Private or family prayer is widely believed to influence recovery from illness, and the results of this study do not challenge this belief.”
Discussion
1. As noted above, this study cost $2.4 million. In addition, the Times reports that since 2000, the U.S. government has spent nearly the same amount on prayer research. Do you think this is money well spent? Why or why not?
2. The reporter for the Times article notes that the study’s authors “left open the possibility” that their results were due to chance. Do you agree with the authors? Do you think that the reporter should have worked harder to understand and describe the significance level of the report’s findings?
3. In the last sentence of the report’s discussion section the authors write, “Our study focused only on intercessory prayer as provided in this trial and was never intended to and cannot address a large number of religious questions, such as whether God exists [and] whether God answers intercessory prayers…” Why do you think they included this statement?
4. How do you respond to the questions posed at the beginning of this article?
Submitted by Jeanne Albert
The Birth-Month Soccer Anomaly
A Star is Made
New York Times, May 7, 2006, Sect. 6, p. 24
Stephen J. Dubner and Steven D. Levitt
Readers may recognize Dubner and Levitt as the authors of Freakonomics. The present article opens with the curious observation that top soccer players tend to have birth-months early in the calendar year. Recent data from England, for example, show that half of the top teenage players have birthdays in January, February or March.
The authors offer the following possible explanations:
(a) certain astrological signs confer superior soccer skills;
(b) winter-born babies tend to have higher oxygen capacity, which increases soccer stamina;
(c) soccer-mad parents are more likely to conceive children in springtime, at the annual peak of soccer mania;
(d) none of the above.
As one might suspect, the authors' answer is (d). Their explanation flows from the larger theme of the article, which is that native ability matters a lot less than "deliberate practice" in determining what makes people successful. They cite a forthcoming book, the Cambridge Handbook of Expertise and Expert Performance, which is based on research by Florida State University psychologist Anders Ericsson and his colleagues. The research spans performance in such diverse areas as sports, music, computer programming and investing. As quoted in the article, Ericsson summarizes the findings by saying, "I think the most general claim here, is that a lot of people believe there are some inherent limits they were born with. But there is surprisingly little hard evidence that anyone could attain any kind of exceptional performance without spending a lot of time perfecting it." (This, by the way, reminded us of Fred Mosteller's acronym T.O.T., for "Time on Task").
As a concrete example, the article offers the following recommendation for medical training. In many specialties, performance tends to degrade over time, but not so for surgeons. The key, according to this account, is continual practice, with immediate feedback on the success of the procedure. By contrast, mammographers do not get immediate feedback on their recommendations; it may take weeks for biopsy results, and years to see whether cancer does or does not appear. The authors suggest that these professionals could enhance their skills through regular practice reading old scans, with the actual follow-up histories available for immediate review.
With this in mind, here is the explanation proposed by Dubner and Levitt for the soccer puzzle. Youth leagues organize players by age, with brackets often defined by age at the end of the calendar year. But a child who turns ten, say, in December is nearly a year younger than one who turned ten the previous January. The greater physical development of the older child can easily be confused with native talent for the sport. And those selected (by whatever means) for increased attention gain access to the practice and feedback that are essential for reaching the top levels of performance.
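The size of the age gap, and the benchmark against which "half" should be judged, take only a few lines to compute. The Python sketch below assumes, as the article describes, brackets based on the calendar year, and (unrealistically) that births are spread evenly over the days of the year; the function names are ours.

```python
import calendar

def months_older_than(birth_month, other_birth_month):
    """Approximate age gap in months between two children born in the same calendar year."""
    return other_birth_month - birth_month

def uniform_share(months, year=2006):
    """Share of births falling in `months` if births were spread evenly over the days of `year`."""
    days_in_year = 366 if calendar.isleap(year) else 365
    return sum(calendar.monthrange(year, m)[1] for m in months) / days_in_year
```

`months_older_than(1, 12)` is 11, and `uniform_share([1, 2, 3])` is 90/365, about 0.25, so half of the top players being born in January to March is about twice what an even spread of birthdays would give.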
Dubner and Levitt maintain links to more research on this topic, as well as previous Freakonomics pieces from the New York Times.
Submitted by Bill Peterson
Why the Forsooths are Forsooths.
(1) Letter to the editor: The Observer, March 26, 2006.
In the story 'Where women get real respect' (News, last week), you said: 'Of the US Fortune 500 companies, 84 per cent now have women on their boards; in the UK among directors of companies in the FTSE 100, only 9 per cent are women.' So what?
If every FTSE 100 company had 11 board members, and one of those was a woman, then 100 per cent of FTSE 100 companies would have a female board member and still only 9 per cent would be women.
If 84 per cent of F500 companies have a woman on the board, and every board has 20 members, then (about) 4 per cent of F500 board members are women.
Meaningless comparisons do not make an argument.
Jeremy Miles
University of York
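Miles's two scenarios reduce to one line of arithmetic: women's share of board seats equals the number of women divided by the number of seats. The Python sketch below follows his simplifying assumptions (equal board sizes, and exactly one woman on each board that has any); the function name is ours.

```python
def share_of_women(companies_with_woman, total_companies, board_size, women_per_board=1):
    """Fraction of all board seats held by women, assuming equal board sizes
    and the same number of women on every board that has any."""
    women = companies_with_woman * women_per_board
    seats = total_companies * board_size
    return women / seats

ftse = share_of_women(100, 100, 11)     # every FTSE 100 board: 11 seats, 1 woman
fortune = share_of_women(420, 500, 20)  # 84% of 500 = 420 Fortune 500 boards: 20 seats, 1 woman
```

`ftse` is 1/11, about 9 per cent, while `fortune` is 0.042, about 4 per cent: the "84 per cent" figure is consistent with a smaller share of women directors than the "only 9 per cent" figure.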
(2) From the blog Digital Home of Zack Stewart, post "Puzzled," March 10th, 2006. Zack says:
n = the original number of flowers in each vase.
So after Kim adds 3 flowers to one vase it contains n+3 flowers.
The new average is thus (n+n+n+3)/3 = (3n+3)/3 = n+1 flowers.
So the special vase has (n+3) - (n+1) = 2 flowers more than the new average.
All of the above is true for any n.
I have to wonder what made them pick 6 as their answer - I would have gone for something interesting, like 5930912377. That way, when you turn the page over you at least get some fun schlock value before you realize they're full of it.
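Zack's algebra, and Peter Winkler's variant of the puzzle, can both be checked mechanically. The Python sketch below (function name ours) uses exact fractions to compute how far the topped-up vase sits above the new average when three flowers are added to one of `vases` vases that each started with `n` flowers.

```python
from fractions import Fraction

def excess_over_new_average(n, vases, added=3):
    """Flowers in the topped-up vase minus the new average, computed exactly."""
    new_average = Fraction(n * vases + added, vases)
    return (n + added) - new_average

# Original puzzle: three vases. The excess is 2 whatever n is, so n is not determined.
print(all(excess_over_new_average(n, 3) == 2 for n in range(1000)))  # prints True

# Winkler's variant: fix any n and ask which number of vases gives an excess of 2.
print([k for k in range(1, 100) if excess_over_new_average(6, k) == 2])  # prints [3]
```

Because the excess works out to 3 - 3/vases whatever n is, Winkler's variant has the unique answer of three vases, while the original puzzle has no unique answer at all.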