Chance News 41
Revision as of 21:30, 3 November 2008
Quotation
We never like to say zero in statistics
Andrew Gelman
Forsooth
http://www.pmean.com/images/ForgottenMissingValue.jpg
This graphical forsooth was submitted by Steve Simon.
The following Forsooth was suggested by Paul Alper
According to an article by Nicholas Kristof of The New York Times, Christopher Ruhm, an economist at the University of North Carolina, Greensboro, claims that "each one-percentage-point drop in unemployment in the United States is associated with an extra 3,900 deaths from heart attacks." More generally, "Ruhm argues that death rates go down during economic slowdowns. Professor Ruhm's research indicates that suicides rise but total mortality rates drop, as do deaths from heart attacks, car accidents, pneumonia and most other causes."
John Lamperti suggested this forsooth:
By 2023, more than half of all American children will be minority, the Census Bureau projects.
New York Times, October 26, 2008
Timothy Agen
The following forsooths are from the November 2008 RSS News. It is nice to see that even their News can contribute a forsooth.
The RSS has global standing. Of its 7000 members, about one in four is drawn from over 50 countries.
Statistics & Society
Royal Statistical Society
2008
Blondes are said to have more fun but it seems brunettes steal the hearts of billionaires.
Brunettes such as Microsoft boss Bill Gates's wife, Melinda French, are more likely to marry a successful man than their blonde sisters, a study today said. Experts checked the hair colour of the wives and girlfriends of the world's top 100 billionaires. Most (62 per cent) were brunettes.
Fair-haired women came in a poor second with 22 per cent of the world's top billionaires marrying blonds.
Raven-haired women entice just 16 per cent of the world's wealthiest men, while not one of the top billionaires is married to a redhead.
Sunday Metro
6 April 2008
Is the Bradley effect real?
Do polls lie about race? Kate Zernike, The New York Times, October 12, 2008.
There has been a lot written about the "Bradley effect." This is a phenomenon first noted in the race for governor of California in 1982, where the Los Angeles mayor, Tom Bradley, polled far ahead of his competition, but lost by a small margin. This phenomenon was also noted in elections involving Harold Washington, David Dinkins, and Douglas Wilder. All of these candidates were black men and in these elections the results of the polls were more favorable to the black candidates than the election results. The belief is that people who are polled don't want to appear bigoted to the pollster by opposing the black candidate, but feel no such social pressure when casting their ballots.
In recent days, nervous Obama supporters have traded worry about a survey — widely disputed by pollsters yet voraciously consumed by the politically obsessed — that concluded racial bias would cost Mr. Obama six percentage points in the final outcome.
Is that true? Perhaps there is a Bradley effect, but perhaps not.
But pollsters and political scientists say concern about a Bradley effect — some call it a Wilder effect or a Dinkins effect, and plenty call it a theory in search of data — is misplaced. It obscures what they argue is the more important point: there are plenty of ways that race complicates polling. Considered alone or in combination, these factors could produce an unforeseen Obama landslide with surprise victories in the South, a stunningly large Obama loss, or a recount-thin margin. In a year that has already turned expectations upside down, it is hard to completely reassure the fretters.
The article notes situations where there may be a reverse Bradley effect. This occurs when polls understate support for a black candidate, particularly in regions where it is socially acceptable to express distrust of blacks.
More critical than social expectations, perhaps, is an even more fundamental issue about polling.
Research shows that those who refuse to participate in surveys tend to be less likely to vote for a black candidate.
One survey researcher, Andrew Kohut, got at this indirectly by comparing people who responded immediately to those who required some extra effort.
Mr. Kohut conducted a study in 1997 looking at differences between people who readily agreed to be polled and those who agreed only after one or more callbacks. Reluctant participants were significantly more likely to have negative attitudes toward blacks — 15 percent said they had a “very favorable” attitude toward them, as opposed to 24 percent of the ready respondents. “The kinds of people suspicious of surveys are also more intolerant,” Mr. Kohut said.
The article discusses some of the issues involving the race of the person conducting the survey interview.
A further complication is the race of the person who asks the questions. Talking to a white interviewer, blacks or whites are more likely to say that they are supporting the white candidate; talking to a black interviewer, people are more likely to support the black candidate. This holds true whether the surveys are in person, or on the phone.
It is unclear, however, which type of interviewer is more likely to produce an accurate response.
Submitted by Steve Simon
Questions
1. If there is indeed a Bradley effect, is there any statistical adjustment that could be made to produce more accurate election polling results?
2. Does the study by Andrew Kohut produce a valid conclusion about the racial attitudes of non-respondents?
Ghost Writing
Conspiracy theorists sometimes turn to statistics to prove their case. Jack Cashill, writing here, compares Barack Obama's book, Dreams From My Father, with Bill Ayers' book, Fugitive Days, in order to show that Ayers is the true author of Dreams From My Father.
Cashill writes, "To add a little science to the analysis, I identified two similar 'nature' passages in Obama's and Ayers' respective memoirs, the first from Fugitive Days:
'I picture the street coming alive, awakening from the fury of winter, stirred from the chilly spring night by cold glimmers of sunlight angling through the city.'
The second from Dreams:
'Night now fell in midafternoon, especially when the snowstorms rolled in, boundless prairie storms that set the sky close to the ground, the city lights reflected against the clouds.'
These two sentences are alike in more than their poetic sense, their length and their gracefully layered structure. They tabulate nearly identically on the Flesch Reading Ease Score (FRES), something of a standard in the field. The 'Fugitive Days' excerpt scores a 54 on reading ease and a 12th grade reading level. The 'Dreams' excerpt scores a 54.8 on reading ease and a 12th grade reading level. Scores can range from 0 to 121, so hitting a nearly exact score matters."
Cashill continues, "A more reliable data-driven way to prove authorship goes under the rubric 'cusum analysis' or QSUM. This analysis begins with the measurement of sentence length, a significant and telling variable. To compare the two books, I selected thirty-sentence sequences from Dreams and Fugitive Days, each of which relates the author's entry into the world of 'community organizing.' 'Fugitive Days' averaged 23.13 words a sentence. 'Dreams' averaged 23.36 words a sentence. By contrast, the memoir section of [Cashill's] 'Sucker Punch' averaged 15 words a sentence."
Further, "Interestingly, the 30-sentence sequence that I pulled from Obama's conventional political tract, Audacity of Hope, averages more than 29 words a sentence and clocks in with a 9th grade reading level, three levels below the earlier cited passages from 'Dreams' and 'Fugitive Days.' The differential in the Audacity numbers should not surprise. By the time it was published in 2006, Obama was a public figure of some wealth, one who could afford editors and ghost writers."
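Cashill's "cusum analysis" starts from sentence length. As a rough illustration (a minimal sketch under our own assumptions, not Cashill's procedure and not the full QSUM technique, which compares this chart with cumulative counts of other "habit" features of the text), the Python below computes words per sentence and the cumulative sum of each sentence's deviation from the mean length. The function names are hypothetical, and splitting sentences on ".", "!" and "?" is a naive assumption that will mis-split abbreviations such as "Mr.".

```python
import re
from itertools import accumulate

def sentence_lengths(text):
    """Words per sentence. Splitting on '.', '!' and '?' is a naive assumption:
    abbreviations such as 'Mr.' will be treated as sentence ends."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)*", s)) for s in sentences]

def cusum(values):
    """Cumulative sum of each value's deviation from the mean of all values."""
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))

text = "I ran. The dog barked loudly. It was late."
lengths = sentence_lengths(text)
print(lengths)                       # [2, 4, 3]
print(sum(lengths) / len(lengths))   # 3.0
print(cusum(lengths))                # [-1.0, 0.0, 0.0]
```

Because deviations from the mean always sum to zero, every cusum ends at (approximately) zero; what the method looks at is the shape of the path, so two passages with nearly identical averages, such as 23.13 and 23.36 words per sentence, can still produce quite different cusum charts.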
Discussion
1. Go to the indispensable Wikipedia site here to find the following:
Flesch Reading Ease
In the Flesch Reading Ease test, higher scores indicate material that is easier to read; lower numbers mark more-difficult-to-read passages. The formula for the Flesch Reading Ease Score (FRES) test is
206.835 -1.015 *(total words/total sentences) -84.6*(total syllables/total words)
Here's the breakdown (score: notes):
90.0-100.0: easily understandable by an average 11-year-old student
60.0-70.0: easily understandable by 13- to 15-year-old students
0.0-30.0: best understood by college graduates
Determine the FRES score for this Chance News wiki. Do likewise for Cashill's article.
Wikipedia goes on to state:
"Reader's Digest magazine has a readability index of about 65, Time Magazine scores about 52, and the Harvard Law Review has a general readability score in the low 30s. The highest (easiest) readability score possible is 121 (every sentence consisting of only one-syllable words); theoretically there is no lower bound on the score -- this sentence, for example, taken as a reading passage unto itself, has a readability score of ~21.9. This paragraph has a readability score of ~53.93."
Verify the value of 53.93, which is very close to the scores of 54 and 54.8 that Cashill reports for the two excerpts. Does this lend credence to Bill Ayers having written this paragraph?
2. The same Wikipedia site has this to say about how FRES relates to grade level:
An obvious use for readability tests is in the field of education. The "Flesch-Kincaid Grade Level Formula" translates the 0-100 score to a U.S. grade level, making it easier for teachers, parents, librarians, and others to judge the readability level of various books and texts. It can also mean the number of years of education generally required to understand this text, relevant when the formula results in a number greater than 12. The grade level is calculated with the following formula:
0.39*(total words/total sentences) + 11.8*(total syllables/total words) -15.59
The result is a number that corresponds with a grade level. For example, a score of 8.2 would indicate that the text is expected to be understandable by an average student in 8th grade (usually aged 13-15 in the U.S.).
Determine the grade level for this Chance News wiki. Do likewise for Cashill's article. Likewise for the Wikipedia paragraph.
3. Go here for information about QSUM. Determine the QSUM for this Chance News wiki. Do likewise for Cashill's article.
4. Speculate on how similar any two literary works would be as viewed by FRES and QSUM if the investigator had complete free rein over which segments to use.
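The FRES formula and the Flesch-Kincaid grade formula quoted in items 1 and 2 use the same two ratios: words per sentence and syllables per word. Below is a minimal Python sketch of both. The syllable counter is a rough heuristic of our own (vowel groups, minus a silent final "e"), and the helper names are hypothetical, so scores computed from raw text will not exactly match those of any particular website. The usage lines plug in the word, sentence and syllable counts that editcentral reports for Chance News #40 further down this page.

```python
import re

def count_syllables(word):
    """Heuristic syllable count (an assumption): vowel groups, less a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def text_counts(text):
    """Return (words, sentences, syllables) using naive splitting on . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables

def flesch_reading_ease(words, sentences, syllables):
    # 206.835 - 1.015*(total words/total sentences) - 84.6*(total syllables/total words)
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    # 0.39*(total words/total sentences) + 11.8*(total syllables/total words) - 15.59
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Counts reported by editcentral for Chance News #40
print(round(flesch_reading_ease(2464, 136, 3681), 1))   # 62.1
print(round(flesch_kincaid_grade(2464, 136, 3681), 1))  # 9.1

# Heuristic counts for a short sample sentence
print(text_counts("The cat sat on the mat."))  # (6, 1, 6)
```

Both results agree with the 62.1 and 9.1 editcentral reports. Since these are regression formulas rather than bounded scales, very simple text can push the grade level below zero: the sample sentence above gets a Flesch-Kincaid grade of about -1.45.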
Submitted by Paul Alper
SMOG (Simple Measure of Gobbledygook)
Strange as it may seem to the general public, even statisticians want to write properly in order to communicate with the reader. A previous wiki (Ghost Writing) mentioned several websites which calculate readability and grade level using regression analysis; as you will see, there are others. According to Wikipedia, SMOG (Simple Measure of Gobbledygook) "is a readability formula that estimates the years of education needed to completely understand a piece of writing. SMOG is widely used, particularly for checking health messages. The precise SMOG formula yields an outstandingly high 0.985 correlation with the grades of readers who had 100% comprehension of test materials. SMOG was published by G. Harry McLaughlin in 1969 as a more accurate and more easily calculated substitute for the Gunning-Fog Index."
In order to calculate SMOG
1. Count a number of sentences (at least 10 from the start of a text, 10 from the middle, and 10 from the end).
2. In those sentences, count the polysyllables (words of 3 or more syllables).
3. Calculate using
grade = 1.0430*sqrt(number of polysyllables*(30/number of sentences)) + 3.1291
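Here is a minimal Python sketch of the SMOG calculation. It assumes the commonly cited form of McLaughlin's formula, grade = 1.0430*sqrt(polysyllables*30/sentences) + 3.1291, together with his quick hand version, 3 + sqrt(polysyllables per 30 sentences). The function names are hypothetical, and counting the polysyllables themselves is left to the reader (or to a heuristic syllable counter).

```python
import math

def smog_grade(polysyllables, sentences):
    """SMOG grade, normalising the polysyllable count to 30 sentences."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

def smog_grade_quick(polysyllables, sentences):
    """Hand-calculation approximation: 3 + square root of polysyllables per 30 sentences."""
    return 3 + math.sqrt(polysyllables * 30 / sentences)

# 425 complex words in 136 sentences (the Chance News #40 counts from editcentral)
print(round(smog_grade(425, 136), 1))        # 13.2
print(round(smog_grade_quick(425, 136), 1))  # 12.7
```

The quick version reproduces the 12.7 that editcentral reports for Chance News #40 further down this page. That suggests, but does not confirm, that the site uses the approximation (or counts polysyllables differently from its "complex words").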
For the Gunning fog index, go here, where you will find:
The Gunning fog index can be calculated with the following algorithm
1. Take a full passage that is around 100 words (do not omit any sentences).
2. Find the average sentence length (divide the number of words by the number of sentences).
3. Count words with three or more syllables (complex words), excluding proper nouns (for example, Djibouti), familiar jargon, and compound words, and without counting common suffixes such as -es, -ed, or -ing as a syllable.
4. Add the average sentence length and the percentage of complex words (e.g., +13.37, not simply +0.1337).
5. Multiply the result by 0.4.
The complete formula is as follows:
0.4*[(words/sentences) + 100*(complex words/words)]
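Steps 2, 4 and 5 of the Gunning fog algorithm translate directly into a few lines of Python. This is a minimal sketch with a hypothetical function name; it assumes the complex words have already been counted with the exclusions of step 3, which are not automated here.

```python
def gunning_fog(words, sentences, complex_words):
    """0.4 * (average sentence length + percentage of complex words)."""
    average_sentence_length = words / sentences      # step 2
    percent_complex = 100 * complex_words / words    # step 4: e.g. 13.37, not 0.1337
    return 0.4 * (average_sentence_length + percent_complex)  # step 5

# Chance News #40 counts from editcentral: 2464 words, 136 sentences, 425 complex words
print(round(gunning_fog(2464, 136, 425), 1))  # 14.1
```

The result matches the 14.1 that editcentral reports for Chance News #40.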
The wonderful website, http://www.editcentral.com/gwt/com.editcentral.EC/EC.html, "is an interactive web page for checking a sample of writing. It is modeled after the ancient Unix utilities style and diction." One can "enter or copy text into the first box below. The scores to the right give the readability of the text according to various formulas" including all the ones mentioned thus far. "Words of three or more syllables are underlined. You should check the words or phrases in red to see if they should be re-written according to the suggestion in the brackets."
"Click the 'Demo' button one, two, or three times to see different samples of text. Check the scores for each sample; do you think the scores match the abilities of students in those grades? The different formulas give different estimates of grade level [required to understand the text]. Which formula is the most accurate? Click the 'Submit' button to look for problems and to see the more complex words underlined."
For example, enter the entire contents of Chance News Wiki #40 to obtain the following:
Flesch reading ease score: 62.1
Automated readability index: 11.1
Flesch-Kincaid grade level: 9.1
Coleman-Liau index: 11.9
Gunning fog index: 14.1
SMOG index: 12.7
15658 characters, 12913 non-space characters, 12291 letters/numbers, 2464 words, 425 complex words, 3681 syllables, 136 sentences, 4.99 chars per word, 1.49 syllables per word, 18.12 words per sentence
Oh, I forgot to mention this, which explains the Coleman-Liau Index.
To calculate the Coleman-Liau Index:
1. Divide the number of characters by the number of words, and multiply by 5.89. Call this A.
2. Take the number of sentences in a fragment of 100 words, and multiply by 0.3. Call this B.
3. Subtract B from A, and subtract 15.8.
CLI = 5.89*(characters/words) - 0.3*(100*sentences/words) - 15.8
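Here is a minimal Python sketch of the three Coleman-Liau steps, with a hypothetical function name. Following step 2, B uses the number of sentences per 100 words. We assume "characters" means letters and digits only. With editcentral's 12291 letters/numbers, 2464 words and 136 sentences for Chance News #40, that assumption reproduces the reported 11.9.

```python
def coleman_liau(characters, words, sentences):
    """Coleman-Liau index following the three steps above."""
    a = 5.89 * (characters / words)        # step 1
    b = 0.3 * (100 * sentences / words)    # step 2: sentences per 100 words, times 0.3
    return a - b - 15.8                    # step 3

print(round(coleman_liau(12291, 2464, 136), 1))  # 11.9
```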
And from http://en.wikipedia.org/wiki/Automated_Readability_Index:
To calculate the Automated Readability Index:
1. Divide the number of characters by the number of words, and multiply by 4.71.
2. Divide the number of words by the number of sentences, and multiply by 0.5.
3. Add #1 and #2 together, and subtract 21.43.
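The Automated Readability Index steps can be sketched in Python the same way (the function name is hypothetical). Which character count to use matters here. Using editcentral's 12291 letters/numbers for Chance News #40 gives the reported 11.1. Using its 12913 non-space characters instead would give about 12.3, so we assume the site counts letters and digits only.

```python
def automated_readability_index(characters, words, sentences):
    """ARI following the three steps above."""
    step1 = 4.71 * (characters / words)   # characters per word, times 4.71
    step2 = 0.5 * (words / sentences)     # words per sentence, times 0.5
    return step1 + step2 - 21.43          # step 3

print(round(automated_readability_index(12291, 2464, 136), 1))  # 11.1
print(round(automated_readability_index(12913, 2464, 136), 1))  # 12.3
```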
Discussion
1. If possible, randomly sample material you have written and use http://www.editcentral.com/gwt/com.editcentral.EC/EC.html to see how your writing has changed over the years, thus obtaining a longitudinal view. Which index shows the most change in absolute value and/or relative value?
2. Do the same for Chance News to see how it has changed over the years.
3. Ask some teachers of English what they think of these figures of merit.
Submitted by Paul Alper