Chance News 110: Difference between revisions

Revision as of 15:32, 22 June 2017

January 1, 2017 to June 30, 2017

Quotations

“When a coincidence seems amazing, that’s because the human mind isn’t wired to naturally comprehend probability and statistics.”

-- Neil deGrasse Tyson, quoted in: Coincidences are underrated, OUPBlog, 27 May 2017

"What Thursday revealed is that polls struggle to capture the crucial nuances of politics today: there’s no longer a single story in Britain – and averages are dead."

-- Mona Chalabi, in: After this general election is it time to downgrade opinion polls?, Guardian, 10 June 2017

"They [new planet candidates] are fascinating, but Kepler’s mission is not to pinpoint the next tourist destination — it is to find out on average how far away such places are. Or, as Dr. Batalha said, 'We’re not stamp collecting, we’re doing statistics.'"

-- in: Earth-size planets among final tally of NASA’s Kepler telescope, New York Times, 29 June 2017

Forsooth

“[Richard] Florida finds that this population [service workers] currently splits its vote evenly between the two parties — no statistical significance for either Trump or Clinton. ”

in: Where Democrats can find new voters, New York Times, 15 June 2017.

Statistical artifacts

Artifacts (from XKCD)

Artifacts-XKCD.png

Suggested by Michelle Peterson

Crowd size

From Lincoln to Obama, how crowds at the capitol have been counted
by Tim Wallace, New York Times, 18 January 2017

This article anticipates the controversy that ensued from Trump's claims about the size of the crowd for his inauguration.

There is a nice historical retrospective here, starting with Lincoln's inauguration. Period photographs have now been studied using tools like Google Earth to give an estimate of 7350 attendees.

Controversy over crowd estimates is also nothing new. It's now been more than 20 years since Louis Farrakhan's Million Man March in 1995. His supporters threatened to sue the National Park Service for giving an estimate of only 400,000. In the aftermath, the Park Service stopped providing official estimates.

In Crowd estimates from Chance News 68, we described Glenn Beck's 2010 rally and the event held in response by Jon Stewart and Stephen Colbert. A Washington Post story at the time gave an annotated graphic of the satellite photo analysis of Barack Obama's 2009 inaugural. The present NYT article notes that satellite analyses have become more common since that time.

The NYT also references a Scientific American discussion, The simple math behind crunching the sizes of crowds. As their "Math Dude," Jason Marshall, says, "I feel that it’s important to note that estimating crowd sizes is a solved problem that’s actually pretty straightforward."

Of course, when the estimate becomes a proxy for political support, things are not so straightforward. The 2017 inaugural has given us the phrase alternative facts!
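
The "simple math" in question is essentially area times density: divide the venue into zones of roughly uniform density, estimate each zone's area, and divide by the space each person occupies. Here is a minimal Python sketch of that arithmetic. The zone areas and per-person spacings are hypothetical placeholders chosen for illustration, not measurements of any event discussed here.

```python
# Area-times-density crowd estimate.
# All numbers below are hypothetical placeholders, not real measurements.

def estimate_crowd(zones):
    """Sum (area / area per person) over zones of roughly uniform density."""
    return sum(area_sq_ft / sq_ft_per_person
               for area_sq_ft, sq_ft_per_person in zones)

# Each zone: (area in square feet, assumed square feet occupied per person)
zones = [
    (50_000, 2.5),    # tightly packed section
    (120_000, 4.5),   # dense but not packed
    (200_000, 10.0),  # loose crowd toward the edges
]

total = estimate_crowd(zones)
print(f"Estimated attendance: {total:,.0f}")
```

The estimate is only as good as the assumed densities, which is one reason different analysts can arrive at quite different totals from the same photograph.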

Still thinking about the election

Margaret Cibes sent a link to the following:

The 2016 national polls are looking less wrong after final election tallies
by Scott Clement, Washington Post, 6 February 2017

Gender stereotypes

Nick Horton sent the following to the Isolated Statisticians list-serv:

Gender stereotypes about intellectual ability emerge early and influence children’s interests
by Lin Bian, Sarah-Jane Leslie, Andrei Cimpian, Science, 27 January 2017

The full article requires a subscription. From the summary on the web page we read:

The distribution of women and men across academic disciplines seems to be affected by perceptions of intellectual brilliance. Bian et al. studied young children to assess when those differential perceptions emerge. At age 5, children seemed not to differentiate between boys and girls in expectations of “really, really smart”—childhood's version of adult brilliance. But by age 6, girls were prepared to lump more boys into the “really, really smart” category and to steer themselves away from games intended for the “really, really smart.”

Nick recommended this study for use in class for a number of reasons, including the fact that the data are available for download from the Open Science Framework, and the analyses are quite accessible with tools such as the t-test and the chi-squared test.

Here is a newspaper story about the study:

Why young girls don’t think they are smart enough
by Andrei Cimpian and Sarah-Jane Leslie, New York Times, 26 January 2017

Hans Rosling

Margaret Cibes sent a link to the following:

Hans Rosling, Swedish doctor and pop-star statistician, dies at 68
by Sam Roberts, New York Times, 9 February 2017

With his famous Gapminder presentations, Rosling invited us to "Pour the sparkling fresh numbers into your eyes and upgrade your worldview."

Gaussian correlation inequality

Pete Schumer sent a link to the following:

A long-sought proof, found and almost lost
by Natalie Wolchover, Quanta, 28 March 2017

Thomas Royen, a retired German statistics professor, has published a proof of the Gaussian correlation inequality, a result originally conjectured in the 1950s. The Quanta article gives an engaging description of Royen's discovery and the path to getting the result published. There is also a nice illustration of a simple case with a bivariate normal distribution:

GCI.png

The general result applies to any centered multivariate normal distribution and any two symmetric convex sets: the probability of their intersection is at least the product of their individual probabilities. In the example, the sets are the two infinite strips (light shading) that intersect in a rectangle (dark shading).

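Royen's original paper can be found on arXiv (https://arxiv.org/pdf/1408.1028.pdf). It is short but still quite technical. A nice blog post (https://almostsure.wordpress.com/2017/05/22/the-gaussian-correlation-inequality/) provides useful additional context for the result.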
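
For readers who want to see the inequality numerically, here is a rough Monte Carlo sketch of the two-strip case in Python. With A = {|x| ≤ a} and B = {|y| ≤ b}, the claim is P(A ∩ B) ≥ P(A) P(B). The correlation, strip half-widths, sample size, and seed are arbitrary choices for illustration, and a simulation of this kind only checks the inequality approximately in one case; it proves nothing.

```python
import math
import random

def strip_probabilities(rho, a, b, n=100_000, seed=0):
    """Estimate P(A), P(B), P(A and B) for a standard bivariate normal
    with correlation rho, where A = {|x| <= a} and B = {|y| <= b}."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    in_a = in_b = in_both = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = z1
        y = rho * z1 + scale * z2  # gives corr(x, y) = rho
        hit_a = abs(x) <= a
        hit_b = abs(y) <= b
        in_a += hit_a
        in_b += hit_b
        in_both += (hit_a and hit_b)
    return in_a / n, in_b / n, in_both / n

# Illustrative parameter choices (assumptions, not taken from the article)
p_a, p_b, p_both = strip_probabilities(rho=0.6, a=1.0, b=1.0)
print(f"P(A) = {p_a:.3f}, P(B) = {p_b:.3f}")
print(f"P(A)P(B) = {p_a * p_b:.3f}, P(A and B) = {p_both:.3f}")
```

Setting rho to 0 makes the two coordinates independent, so the two sides should then agree up to simulation noise.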

Spotting bad statistics

Priscilla Bremser recommended the following TED talk:

3 ways to spot a bad statistic, by Mona Chalabi

Chalabi is data editor of the Guardian US. In this short (under 12 minutes) and very entertaining talk, she describes the problem society faces when policymakers can't get agreement on baseline statistical facts.

In between blindly accepting or reflexively denying any data-based claim, she describes three points to remember when evaluating statistics.

  1. Can you see uncertainty?
  2. Can I see myself in the data?
  3. How was the data collected?

Regarding uncertainty, she discusses the reasons that opinion polling has become more difficult, and wonders why the probability of a Hillary Clinton win was reported "with decimal places." On seeing yourself in the data, she notes that reporting only averages frustrates people who don't see their own experience represented. There is a very memorable quote in the section on data collection, where she observes that for one cosmetics commercial L'Oreal was happy to talk to just 48 women to "prove" that their product worked. She says:

Private companies don't have a huge interest in getting the numbers right, they just need the right numbers.

Flint water crisis

Q&A: Using Google search data to study public interest in the Flint water crisis
by John Gramlick, Pew Research Center, 27 April 2017.

The murky tale of Flint's deceptive water data
by Robert Langkjær-Bain, Significance, 5 April 2017

What went wrong in Flint
by Anna Maria Barry-Jester, FiveThirtyEight, 26 January 2016

Same stats (think Anscombe)

Jeff Witmer sent the following link to the Isolated Statisticians list.

Same stats, different graphs: Generating datasets with varied appearance and identical statistics through simulated annealing
by Justin Matejka and George Fitzmaurice, ACM SIGCHI Conference on Human Factors in Computing Systems

Observing that it is not known how Frank Anscombe went about creating his famous quartet of scatterplots, the authors present the results of their simulated annealing technique to produce some striking visualizations. You'll want to see the Datasaurus Dozen, which even has an R data package.
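
As a reminder of the phenomenon being imitated, the following Python sketch computes a few summary statistics for Anscombe's original quartet. The data values are the standard published ones; the helper names are ours and are not part of any package mentioned above.

```python
import math
from statistics import mean, variance

def pearson(xs, ys):
    """Pearson correlation coefficient computed from first principles."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Map each dataset name to (mean of x, mean of y, variance of y, correlation)
summaries = {
    name: (mean(xs), mean(ys), variance(ys), pearson(xs, ys))
    for name, (xs, ys) in quartet.items()
}

for name, (mx, my, vy, r) in summaries.items():
    print(f"{name:>3}: mean x = {mx:.2f}, mean y = {my:.2f}, "
          f"var y = {vy:.2f}, r = {r:.3f}")
```

The four rows of output should agree to about two decimal places, even though the scatterplots look nothing alike.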

The fivethirtyeight package for R

fivethirtyeight Package
by Albert Y. Kim, Chester Ismay, and Jennifer Chunn, announced 13 March 2017

The authors have developed a package for pedagogical use that provides data and R code corresponding to analyses presented at FiveThirtyEight.com. Their goal is to allow students to get into the data with minimal overhead.

This should be a very valuable resource for teaching about statistics in the news! Here is a quick illustration of how to use the package. More details are available in the package vignette linked above.

Debate over white mortality

Stop saying white mortality is rising
by Jonathan Auerbach and Andrew Gelman, Slate, 28 March 2017

The forces driving middle-aged white people's 'deaths of despair'
by Jessica Boddy, NPR Morning Edition, 23 March 2017


Interracial marriage

Peter Doyle sent a link to this chart from the Economist:

Daily chart: Interracial marriages are rising in America
Economist, 12 June 2017
Economist 50yrsLoving.png

Quoting from the article, one reader commented:

"Of the roughly 400,000 interracial weddings in 2015, 82% involved a white spouse, even though whites account for just 65% of America’s adult population. " If you lump the population into just two groups A and B, 100% of intergroup marriages will involve a spouse from group A, no matter what fraction of the population belongs to group A.

Exercise: 2015 census data is available by googling "us census quickfacts". While the categories don't precisely match those in this piece, you can use this data to get a rough estimate of the fraction of interracial weddings that would involve a white spouse under random pairing. What do you get? Is your answer more or less than 82%?

Peter notes that he got just over 82%. Here is his solution (using Mathematica):

Doyle marriage.png
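
For readers without Mathematica, the random-pairing calculation can also be sketched in Python. This is our own minimal version under a simplifying assumption, namely that each spouse is drawn independently from the adult population; it is not a transcription of Peter's solution. Only the 65% white share comes from the quoted article. The other group shares are hypothetical placeholders that should be replaced with census figures.

```python
def share_involving(shares, group):
    """Under random pairing, the fraction of intergroup marriages that
    involve exactly one spouse from `group`.

    shares: dict mapping group name -> population fraction (summing to 1).
    """
    p = shares[group]
    # Probability that the two spouses come from different groups
    intergroup = 1.0 - sum(s * s for s in shares.values())
    # Probability that exactly one spouse is from `group`
    involving = 2.0 * p * (1.0 - p)
    return involving / intergroup

# Two groups: every intergroup marriage involves group A, whatever its share.
two_groups = share_involving({"A": 0.65, "B": 0.35}, "A")

# Several groups. The 0.65 is from the article; the rest are HYPOTHETICAL.
many_groups = share_involving(
    {"white": 0.65, "g2": 0.15, "g3": 0.12, "g4": 0.05, "g5": 0.03},
    "white",
)

print(f"Two groups:  {two_groups:.1%}")
print(f"Many groups: {many_groups:.1%}")
```

The two-group case reproduces the reader's point that the figure is 100% no matter what the split is. With several groups the answer depends on how the remaining population divides up, which is why the census categories matter for the exercise.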