Chance News 104
Quotations
Forsooth
Transitivity, Correlation and Causation
Theorem 1 of the article cited by Paul Alper in the previous issue, "Is the Property of Being Positively Correlated Transitive?" (The American Statistician, Vol. 55, No. 4, November 2001), relies on unobserved independent random variables U, V, and W that make the correlations among X = U + V, Y = W + V, and Z = W - U non-transitive. An interesting question is whether this relates back to the difference between causation and correlation: does non-transitivity arise only because X, Y, and Z are linked through hidden common causes rather than through causal relationships among themselves?
The answer turns out to be no: we can get the same sort of result even when there are causal relationships among X, Y, and Z. Here's an example:
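A short covariance calculation shows why the construction works. This is our own sketch, assuming only that U, V, and W are independent with positive, finite variances; it is not necessarily how the article proves Theorem 1. Because a correlation coefficient always has the same sign as the corresponding covariance, the signs below carry over directly to the correlations:

```latex
\begin{aligned}
\operatorname{Cov}(X,Y) &= \operatorname{Cov}(U+V,\; W+V) = \operatorname{Var}(V) > 0,\\
\operatorname{Cov}(Y,Z) &= \operatorname{Cov}(W+V,\; W-U) = \operatorname{Var}(W) > 0,\\
\operatorname{Cov}(X,Z) &= \operatorname{Cov}(U+V,\; W-U) = -\operatorname{Var}(U) < 0.
\end{aligned}
```

In words: X and Y share the term V, Y and Z share W, while X and Z share U with opposite signs.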
- X is N(0,1);
- Y = X + U, where U is N(0,1) and independent of X;
- Z = Y - 1.5*X.
The correlation coefficients between X and Y and between Y and Z are both positive, but the correlation coefficient between X and Z is negative.
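The signs can be checked directly. The sketch below (Python, standard library only) writes each variable as a linear combination of the two independent standard normals X and U, so that every covariance is a dot product of coefficient vectors, and then compares the exact values with a seeded simulation. The helper names `cov`, `corr`, and `sample_corr`, the seed, and the sample size are our own illustrative choices.

```python
import math
import random

# Exact calculation. Write every variable as coefficients on the two
# independent N(0,1) sources (X, U). For independent unit-variance sources,
# the covariance of two such combinations is the dot product of their
# coefficient vectors.
COEF_X = (1.0, 0.0)        # X
COEF_Y = (1.0, 1.0)        # Y = X + U
COEF_Z = (1.0 - 1.5, 1.0)  # Z = Y - 1.5*X = -0.5*X + U

def cov(a, b):
    """Covariance of two linear combinations of independent N(0,1) sources."""
    return sum(p * q for p, q in zip(a, b))

def corr(a, b):
    """Population correlation coefficient from the exact covariances."""
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

r_xy = corr(COEF_X, COEF_Y)
r_yz = corr(COEF_Y, COEF_Z)
r_xz = corr(COEF_X, COEF_Z)
print(f"exact: corr(X,Y)={r_xy:.3f} corr(Y,Z)={r_yz:.3f} corr(X,Z)={r_xz:.3f}")
# exact: corr(X,Y)=0.707 corr(Y,Z)=0.316 corr(X,Z)=-0.447

# Simulation check with a fixed seed (the sample size is an arbitrary choice).
def sample_corr(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
N = 100_000
xs = [rng.gauss(0.0, 1.0) for _ in range(N)]
us = [rng.gauss(0.0, 1.0) for _ in range(N)]
ys = [x + u for x, u in zip(xs, us)]          # Y = X + U
zs = [y - 1.5 * x for x, y in zip(xs, ys)]    # Z = Y - 1.5*X

s_xy = sample_corr(xs, ys)
s_yz = sample_corr(ys, zs)
s_xz = sample_corr(xs, zs)
print(f"simulated: corr(X,Y)={s_xy:.3f} corr(Y,Z)={s_yz:.3f} corr(X,Z)={s_xz:.3f}")
```

The exact values are 1/sqrt(2), 0.5/sqrt(2.5), and -0.5/sqrt(1.25), and the simulated correlations should land close to them. Here Y is directly caused by X, and Z by Y and X, yet the pattern of signs matches the hidden-cause construction.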
Stan Lipovetsky’s follow-up letter (The American Statistician, 56:4, 341-342, 2002) hints at this but does not include an actual example.
Submitted by Emil M Friedman
Followup
Thanks to John Allen Paulos, who sent the following link:
Who's Counting: Non-transitivity in baseball, medicine, gambling and politics
by John Allen Paulos, ABCNews.com, 5 December 2010
This installment of John's "Who's Counting" column describes several real-world illustrations of non-transitivity, in correlation and elsewhere. Among these:
In baseball, [an analysis] from the American Statistician of a year of batting data from the New York Yankees showed that the number of triples hit by a player correlated positively with the number of base hits he had, which in turn correlated positively with the number of home runs he hit; however, the number of triples a player hit correlated negatively with the number of home runs he hit. As John explains, good hitters get base hits of all kinds, so it is not surprising that both home runs and triples are positively correlated with total hits. But triples tend to be the result of speed, while home runs require power, and stronger players tend to be slower.

In politics, the 2010 Alaska election for US Senate was tightly contested. Lisa Murkowski won re-election as a write-in candidate after losing the Republican primary to Tea Party candidate Joe Miller. Democrat Scott McAdams finished third. Paulos imagines what might have happened had the voters been asked to rank all three candidates. He writes:
Let's further imagine that faction A, roughly one third of the electorate, preferred Murkowski to Miller to McAdams; faction B, also about one third of the electorate, preferred Miller to McAdams to Murkowski; and faction C, the remaining one third of the electorate, favored McAdams to Murkowski to Miller.
If this had been the case, a clear majority of the electorate - factions A and C - would have preferred Murkowski to Miller, and a clear majority - factions A and B -- would have preferred Miller to McAdams. Yet a clear majority -- factions B and C -- would have preferred McAdams to Murkowski.
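The cycle in Paulos's hypothetical can be tallied mechanically with head-to-head (pairwise majority) comparisons. The sketch below assumes, as he does, three equal factions with the stated rankings, each counted with weight one; the names `factions`, `prefers`, `pairwise_winner`, and `beats` are illustrative only.

```python
from itertools import combinations

# Paulos's hypothetical rankings, best first; each faction is one third
# of the electorate, so each counts as one equal vote here.
factions = {
    "A": ["Murkowski", "Miller", "McAdams"],
    "B": ["Miller", "McAdams", "Murkowski"],
    "C": ["McAdams", "Murkowski", "Miller"],
}

def prefers(ranking, a, b):
    """True if this ranking places candidate a above candidate b."""
    return ranking.index(a) < ranking.index(b)

def pairwise_winner(a, b):
    """Return the majority-preferred candidate and the factions backing them."""
    for_a = [name for name, r in factions.items() if prefers(r, a, b)]
    for_b = [name for name, r in factions.items() if prefers(r, b, a)]
    return (a, for_a) if len(for_a) > len(for_b) else (b, for_b)

candidates = ["Murkowski", "Miller", "McAdams"]
beats = set()  # (winner, loser) pairs from every head-to-head contest
for a, b in combinations(candidates, 2):
    winner, supporters = pairwise_winner(a, b)
    loser = b if winner == a else a
    beats.add((winner, loser))
    print(f"{winner} beats {loser} (factions {', '.join(supporters)})")
# Murkowski beats Miller (factions A, C)
# McAdams beats Murkowski (factions B, C)
# Miller beats McAdams (factions A, B)
```

Each candidate wins one head-to-head contest and loses the other, so majority preference is not transitive and no candidate beats both rivals.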
See the article for further discussion.