Chance News 38: Difference between revisions
Revision as of 18:37, 2 August 2008
Quotations
If the devil exists, he no doubt has a high IQ and an Ivy League degree. It's clear that having an educational pedigree is no prophylactic against greed and bad behavior.
Tom Donaldson, professor of ethics and law at Wharton
Star Tribune July 3, 2008
Paul Alper found in Wikipedia the following quotations from well-known statisticians relating to types of error.
In 1948, Frederick Mosteller argued that a "third kind of error" was required to describe circumstances he had observed, namely:
Type I error: rejecting the null hypothesis when it is true.
Type II error: accepting the null hypothesis when it is false.
Type III error: correctly rejecting the null hypothesis for the wrong reason.
In 1957, Allyn W. Kimball, a statistician with the Oak Ridge National Laboratory, proposed a different kind of error to stand beside "the first and second types of error in the theory of testing hypotheses". Kimball defined this new "error of the third kind" as being "the error committed by giving the right answer to the wrong problem".
Mathematician Richard Hamming expressed his view that
It is better to solve the right problem the wrong way than to solve the wrong problem the right way.
In 1974, Ian Mitroff and Tom Featheringham extended Kimball's category, arguing that:
One of the most important determinants of a problem's solution is how that problem has been represented or formulated in the first place.
They defined type III errors as either "the error ... of having solved the wrong problem ... when one should have solved the right problem" or "the error ... [of] choosing the wrong problem representation ... when one should have ... chosen the right problem representation".
In 1969, the Harvard economist Howard Raiffa jokingly suggested "a candidate for the error of the fourth kind: solving the right problem too late".
In 1970, Marascuilo and Levin proposed a "fourth kind of error" -- a "Type IV error" -- which they defined in a Mosteller-like manner as being the mistake of "the incorrect interpretation of a correctly rejected hypothesis"; which, they suggested, was the equivalent of "a physician's correct diagnosis of an ailment followed by the prescription of a wrong medicine".
See the Wikipedia article for references.
Forsooth
Deborah Alper suggested the following Forsooth:
Roughly one-third of all eligible Americans, 64 million people, are not registered to vote. This percentage is even higher for African-Americans (30 percent) and Hispanics (40 percent).
The Nation
July 21/28, 2008, Page 32
Paul Alper suggested the following two Forsooths:
We want to persuade you of one claim: that William Sealy Gosset (1876-1937)--aka "Student" of Students t-test--was right and that his difficult friend, Ronald A. Fisher, though a genius, was wrong.
From the preface of Cult of Statistical Significance:
Deirdre Nansen McCloskey and Steve Ziliak
Feb 19, 2008
"There's this cluster of interrelated findings", said Richard A. Lippa, a professor of psychology at California State University at Fullerton, who has found evidence that in gay men, the hair on the back of the head is more likely to curl counterclockwise than in straight men. "These are all biological markers that something must have gone on early in development".
From an article by Rob Stein in the Washington Post,
February 5, 2008
Irreligion
Irreligion: A Mathematician Explains Why the Arguments for God Just Don't Add Up. By John Allen Paulos. 158 pp. Hill & Wang. $20.
John suggested that Chance News readers might enjoy some of the arguments in his book that rely on probability concepts. We give a sample below, and you can see more of his probability arguments in a talk he gave at the recent conference "Beyond Belief: Enlightenment 2.0", sponsored by The Science Network.
A common creationist argument goes roughly like the following. A very long sequence of individually improbable mutations must occur in order for a species or a biological process to evolve. If we assume these are independent events, then the probability of all of them occurring and occurring in the right order is the product of their respective probabilities, which is always a tiny number. Thus, for example, the probability of getting a 3, 2, 6, 2, and 5 when rolling a single die five times is 1/6 x 1/6 x 1/6 x 1/6 x 1/6 or 1/7,776 - one chance in 7,776. The much longer sequences of fortuitous events necessary for a new species or a new process to evolve leads to the minuscule probabilities that creationists argue prove that evolution is so wildly improbable as to be essentially impossible.
This line of argument, however, is deeply flawed. Leaving aside the issue of independent events, I note that there are always a fantastically huge number of evolutionary paths that might be taken by an organism (or a process), but there is only one that actually will be taken. So if, after the fact, we observe the particular evolutionary path actually taken and then calculate the a priori probability of its being taken, we will get the minuscule probability that creationists mistakenly attach to the process as a whole.
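The arithmetic behind this point can be checked directly. The following Python sketch (the variable names are illustrative and not taken from the book) enumerates every possible result of five rolls of a fair die, assuming independent rolls. It shows that the particular sequence 3, 2, 6, 2, 5 has probability 1/7,776, that every other sequence is exactly as improbable, and that the probability of getting some sequence is nevertheless 1.

```python
from fractions import Fraction
from itertools import product

# Probability of one specific sequence of five independent fair-die rolls.
p_specific = Fraction(1, 6) ** 5

# Every possible five-roll sequence (the observed one is just one of these).
all_sequences = list(product(range(1, 7), repeat=5))
observed = (3, 2, 6, 2, 5)

# Each sequence is equally unlikely, yet one of them must occur.
p_some_sequence = sum(p_specific for _ in all_sequences)

print("P(observed sequence) =", p_specific)
print("number of possible sequences =", len(all_sequences))
print("P(some sequence occurs) =", p_some_sequence)
```

The tiny number describes one path singled out after the fact, not the chance that the process produces an outcome at all.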
A related creationist argument is supplied by Michael Behe, a key supporter of intelligent design. Behe likens what he terms the "irreducible complexity" of phenomena such as the clotting of blood to the irreducible complexity of a mousetrap. If just one of the trap's pieces is missing -- whether it be the spring, the metal platform, or the board -- the trap is useless. The implicit suggestion is that all the parts of a mousetrap would have had to come into being at once, an impossibility unless there were an intelligent designer. Design proponents argue that what's true for the mousetrap is all the more true for vastly more complex biological phenomena. If any of the 20 or so proteins involved in blood clotting is absent, for example, clotting doesn't occur, and so, the creationist argument goes, these proteins must have all been brought into being at once by a designer.
But the theory of evolution does explain the evolution of complex biological organisms and phenomena, and the Paley argument from design has been decisively refuted. Natural selection acting on the genetic variation created by random mutation and genetic drift results in those organisms with more adaptive traits differentially surviving and reproducing. (Interestingly, that we and all life have evolved from simpler forms by natural selection disturbs fundamentalists who are completely unfazed by the Biblical claim that we come from dirt.) Further rehashing of defenses of Darwin or refutations of Paley is not my goal, however. Those who reject evolution are usually immune to such arguments anyway. Rather, my intention here is to develop some loose analogies between these biological issues and related economic ones and, secondarily, to show that these analogies point to a surprising crossing of political lines.
Paulos discusses the response of his readers in the current issue of his monthly column for ABCNews.
Paul Alper suggested that readers might enjoy the following:
Paulos often writes about unlikely events and how quickly the public tends to assume something supernatural is taking place. On page 52 of Irreligion he muses on numerological coincidences involving 9/11. He starts with 9/11 being "the telephone code for emergencies." The digits 9 + 1 + 1 sum to 11, and September 11 is the 254th day of the year, so that 2 + 5 + 4 also sum to 11. Further, there are another 111 days to the end of the year. The first plane to crash into the towers was flight number 11. The Pentagon, Afghanistan and New York City each have 11 letters. Moreover, any three-digit number when multiplied by 91 and 11 results in a six-digit number where digits four, five and six repeat digits one, two and three, respectively; in particular, starting with 911 results in 911,911. A few pages later he notes that on September 11, 2002 "the New York State lottery numbers were 911." The day before that, "the closing value of the September S&P 500 futures contracts" was 911. And to cinch it all, Johnny Unitas, the number one quarterback ever, died on September 11 and wore 19 on his jersey.
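The purely arithmetical coincidences in this list are easy to verify, which is part of the point: with enough numbers to play with, patterns are cheap. The sketch below uses only the Python standard library; the helper names `digit_sum` and `letter_count` are illustrative. It checks the day-of-year claims for 2001, the letter counts, and the multiplication trick, which works for every three-digit number because 91 x 11 = 1001.

```python
from datetime import date

attack_day = date(2001, 9, 11)

# Day number within the year, and days left until December 31.
day_of_year = attack_day.timetuple().tm_yday
days_remaining = (date(2001, 12, 31) - attack_day).days

def digit_sum(n):
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(d) for d in str(n))

def letter_count(name):
    """Number of letters in a name, ignoring spaces and punctuation."""
    return sum(ch.isalpha() for ch in name)

# Multiplying by 91 * 11 = 1001 writes any three-digit number twice.
repeats_for_all = all(n * 91 * 11 == int(str(n) * 2) for n in range(100, 1000))

print(day_of_year, digit_sum(day_of_year), days_remaining)
print([letter_count(s) for s in ("The Pentagon", "Afghanistan", "New York City")])
print(911 * 91 * 11, repeats_for_all)
```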
An improbable event and a coincidence
I have an example of an improbable event and a coincidence; it shows the difference between them. At Forrest's graduation last night, all of the seniors marched, in alphabetical order, to the stage to receive their diplomas. The women were wearing gray gowns and the men were wearing black gowns. I was careful to note any siblings (as far as I could tell, there were none). GREAT! So now we have a random sequence of coin tosses of length about 310, and the coin is pretty close to fair. The longest sequence of consecutive men I observed was 9; this is somewhat longer than the expected length of the longest run of heads, which is about 7, and somewhat longer than the expected length of the longest run of either heads or tails, which is about 8. So I observed a fairly unusual event. The coincidence is that Forrest was in the longest run of men.
An email from Charles Grinstead to Laurie Snell about his son's graduation.
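The expected run lengths quoted in the email can be computed exactly rather than simulated. The following Python sketch assumes a perfectly fair coin and independent tosses (the graduating class was only "pretty close to fair"), and the function names are illustrative. It counts the sequences whose longest run of heads is at most k with a standard recurrence, and uses the fact that a run of either face of length L corresponds to L - 1 consecutive "same as the previous toss" indicators.

```python
from fractions import Fraction

def count_short_run_sequences(n, k):
    """Count length-n head/tail sequences whose longest run of heads is at most k."""
    counts = []
    for m in range(n + 1):
        if m <= k:
            counts.append(2 ** m)        # too short to contain a run longer than k
        elif m == k + 1:
            counts.append(2 ** m - 1)    # only the all-heads sequence is excluded
        else:
            # Sliding form of counts[m] = counts[m-1] + ... + counts[m-k-1],
            # obtained by conditioning on where the first tail falls.
            counts.append(2 * counts[m - 1] - counts[m - k - 2])
    return counts[n]

def prob_longest_head_run_at_least(n, r):
    """P(longest run of heads in n fair tosses is at least r)."""
    return 1 - Fraction(count_short_run_sequences(n, r - 1), 2 ** n)

def expected_longest_head_run(n):
    """E[longest run of heads], as the sum over r of P(longest run >= r)."""
    return sum(prob_longest_head_run_at_least(n, r) for r in range(1, n + 1))

def expected_longest_run_either(n):
    """E[longest run of heads or tails] for n >= 1 tosses."""
    return 1 + expected_longest_head_run(n - 1)

n = 310  # approximate class size from the email
print("E[longest run of men]    ~", float(expected_longest_head_run(n)))
print("E[longest run of either] ~", float(expected_longest_run_either(n)))
print("P(run of 9 or more men)  ~", float(prob_longest_head_run_at_least(n, 9)))
```

The last line gives a rough sense of how unusual a run of 9 men is under these assumptions.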
The Drunkard's Walk: How Randomness Rules Our Lives
Leonard Mlodinow
Pantheon Books, New York, 2008
There are not many writers who can successfully write about mathematics for the general public but Leonard Mlodinow is one of them. He is a physicist who has written a number of successful books on physics and mathematics for the general public. He has also been an editor for Star Trek.
The Drunkard's Walk is his most recent book. In this book he shows that we all have a hard time understanding probability and yet it plays an important role in our daily lives. To show that we are not wired to understand probability, he has only to show us the birthday problem, the Monty Hall problem, the two-sisters problem, the Linda problem, the two-envelope problem, etc.
Of course, to understand how probability affects our lives we have to understand some basic probability. Mlodinow makes this more interesting by explaining probability along with the history of its development. He starts with Cardano introducing the sample space and solving dice problems. At the same time he discusses Cardano's colorful life. He then discusses Pascal and Fermat's solution to the problem of points. He continues with Bernoulli, deMere, and Bayes and explains their contributions, including the law of large numbers, the central limit theorem and conditional probability. Of course none of this is new, but what makes this book so interesting is that while Mlodinow discusses probability concepts and applications he also explains how they can affect our lives. For example, the argument that there is no such thing as a hot hand in basketball might also be made about your stock adviser.
To hear this in action listen to Mlodinow himself here
You can also read a review of this book and other similar books here by the well-known probabilist David J. Aldous of the Berkeley Statistics Department. David teaches an undergraduate seminar, From Undergraduate Probability Theory to the Real World. You will also find on his website a talk, The top ten things that math probability says about the real world.
submitted by Laurie Snell
Researchers Fail to Reveal Full Drug Pay
Researchers Fail to Reveal Full Drug Pay
New York Times, June 8, 2008
Gardiner Harris and Benedict Carey
The authors say:
A world-renowned Harvard child psychiatrist, whose work has helped fuel an explosion in the use of powerful antipsychotic medicines in children, earned at least $1.6 million in consulting fees from drug makers from 2000 to 2007 but for years did not report much of this income to university officials, according to information given Congressional investigators.
By failing to report income, the psychiatrist, Dr. Joseph Biederman, and a colleague in the psychiatry department at Harvard Medical School, Dr. Timothy E. Wilens, may have violated federal and university research rules designed to police potential conflicts of interest, according to Senator Charles E. Grassley, Republican of Iowa. Some of their research is financed by government grants.
Like Dr. Biederman, Dr. Wilens belatedly reported earning at least $1.6 million from 2000 to 2007, and another Harvard colleague, Dr. Thomas Spencer, reported earning at least $1 million after being pressed by Mr. Grassley’s investigators. But even these amended disclosures may understate the researchers’ outside income because some entries contradict payment information from drug makers, Mr. Grassley found.
In one example, Dr. Biederman reported no income from Johnson & Johnson for 2001 in a disclosure report filed with the university. When asked to check again, he said he received $3,500. But Johnson & Johnson told Mr. Grassley that it paid him $58,169 in 2001, Mr. Grassley found.
The Harvard group’s consulting arrangements with drug makers were already controversial because of the researchers’ advocacy of unapproved uses of psychiatric medicines in children.
In addition to the money that they get from drug companies, researchers often get additional support from the National Institutes of Health, which has some responsibility to monitor conflicts of interest. Since neither the universities nor the NIH seem to be doing their duty, Senator Grassley is asking Congress and the NIH to do something about this. You can read his proposal here.
Does the internet help or confuse medical decisions?
I (JLS) have been diagnosed as having Mild Cognitive Impairment (MCI). This is a transition stage between the cognitive changes of normal aging and the more serious problems caused by Alzheimer's disease (AD). It has been recommended that I take two medications, Aricept (donepezil) and Exelon (rivastigmine). They are not expected to improve memory, but they are thought to delay the occurrence of Alzheimer's disease. Since the possible side effects are unpleasant, I decided to look on the web for studies on the effectiveness of these drugs.
I found that the recommendation for donepezil is often based on the article:
Vitamin E and Donepezil for the Treatment of Mild Cognitive Impairment.
New England Journal of Medicine, June 9, 2005.
Ronald C. Petersen and others.
This is a well-designed experiment, and they describe their study as follows:
A total of 769 subjects were enrolled, and possible or probable Alzheimer's disease developed in 212. The overall rate of progression from mild cognitive impairment to Alzheimer's disease was 16 percent per year. As compared with the placebo group, there were no significant differences in the probability of progression to Alzheimer's disease in the vitamin E group (hazard ratio, 1.02; 95 percent confidence interval, 0.74 to 1.41; P=0.91) or the donepezil group (hazard ratio, 0.80; 95 percent confidence interval, 0.57 to 1.13; P=0.42) during the three years of treatment. Prespecified analyses of the treatment effects at 6-month intervals showed that, as compared with the placebo group, the donepezil group had a reduced likelihood of progression to Alzheimer's disease during the first 12 months of the study (P=0.04), a finding supported by the secondary outcome measures. Among carriers of one or more apolipoprotein E ε4 alleles, the benefit of donepezil was evident throughout the three-year follow-up. There were no significant differences in the rate of progression to Alzheimer's disease between the vitamin E and placebo groups at any point, either among all patients or among apolipoprotein E ε4 carriers.
Their conclusion was:
Vitamin E had no benefit in patients with mild cognitive impairment. Although donepezil therapy was associated with a lower rate of progression to Alzheimer's disease during the first 12 months of treatment, the rate of progression to Alzheimer's disease after three years was not lower among patients treated with donepezil than among those given placebo.
The 3 year period was the primary outcome, but the one year period was an exploratory outcome so we have to worry about this.
Bob Norman suggested another thing we might worry about. He drew the following picture:
http://www.dartmouth.edu/~chance/forwiki/memoryx.jpg
The top line is the rate of progression of Alzheimer's for the placebo group and the bottom line for the treated group. This shows the rate of progression of Alzheimer's for the treated group decreasing in the first year and then increasing until the third year. But after that the treated group increases faster than the placebo group!
The most recent study for rivastigmine seems to be the following:
Effect of rivastigmine on delay to diagnosis of Alzheimer's disease from mild cognitive impairment:
Lancet Neurology, June 2007
Howard Feldman and others.
The authors describe their study as follows:
Of 1018 study patients enrolled, 508 were randomly assigned to rivastigmine and 510 to placebo; 17·3% of patients on rivastigmine and 21·4% on placebo progressed to AD (hazard ratio 0·85 [95% CI 0·64–1·12]; p=0·225). There was no significant difference between the rivastigmine and placebo groups on the standardized Z score for the cognitive test battery measured as mean change from baseline to endpoint (−0·10 [95% CI −0·63 to 0·44], p=0·726). Serious adverse events were reported by 141 (27·9%) rivastigmine-treated patients and 155 (30·5%) patients on placebo; adverse events of all types were reported by 483 (95·6%) rivastigmine-treated patients and 472 (92·7%) placebo-treated patients. The predominant adverse events were cholinergic: the frequencies of nausea, vomiting, diarrhoea, and dizziness were two to four times higher in the rivastigmine group than in the placebo group.
And their interpretation was:
There was no significant benefit of rivastigmine on the progression rate to AD (Alzheimer's) or on cognitive function over 4 years. The overall rate of progression from MCI to AD in this randomized clinical trial was much lower than predicted. Rivastigmine treatment was not associated with any significant safety concerns.
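To see why a drop from 21.4% to 17.3% can still be "not significant", it helps to work through a simpler calculation than the one the authors used. The Python sketch below computes a crude risk ratio with an approximate 95% confidence interval on the log scale. The event counts are assumptions reconstructed by rounding the published percentages, the function name `risk_ratio_ci` is illustrative, and a crude risk ratio is not the same thing as the hazard ratio of 0.85 reported in the paper, which takes time to diagnosis into account.

```python
import math

def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Crude risk ratio (group A vs. group B) with a Wald interval on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Assumed counts, reconstructed from the percentages in the abstract.
rivastigmine_n, placebo_n = 508, 510
rivastigmine_events = round(0.173 * rivastigmine_n)
placebo_events = round(0.214 * placebo_n)

rr, lower, upper = risk_ratio_ci(rivastigmine_events, rivastigmine_n,
                                 placebo_events, placebo_n)
print(f"crude risk ratio {rr:.2f}, approximate 95% CI {lower:.2f} to {upper:.2f}")
```

Under these assumptions the interval includes 1, which is consistent with the authors' conclusion that the observed difference could plausibly be due to chance.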
Ronald C. Petersen (lead author of the New England Journal of Medicine study) wrote a critique of this study:
MCI treatment trials: failure or not? The Lancet Neurology - Volume 6, Issue 6 (June 2 Ronald C Petersen
In this critique he writes:
The 3-year duration of anticipated therapeutic effect applied to this study was, in retrospect, overly ambitious: the treatment did not work for this duration in the donepezil and vitamin E trial, and no cholinesterase inhibitor has been shown to work for 3 years, even in AD. Therefore, this duration of effect would not be expected at the MCI stage. This, coupled with the subtherapeutic doses used, contributed to the treatment failure.
In spite of all these challenges, there was a glimmer of efficacy—a trend towards a positive rivastigmine effect. Some of the MRI measures suggested a therapeutic response during the 1 year to 2 year window, which is similar to the mild efficacy effect in the donepezil and vitamin E trial.
Note: Ronald C. Petersen is the Director of the Mayo Alzheimer's Disease Research Center.
Well, where does that leave me? Answer: Confused.
---Discussion---
(1) What are exploratory outcomes and why do we have to worry about them?
(2) What do doctors know that the Internet doesn't?
(3) How do you think we should use the internet in making a medical decision?
Submitted by Laurie Snell
A Controversy About Doping
Detecting drug cheats is a key issue in the lead-up to the Beijing Olympics this summer. This is not a surprise given the battering of top-flight sports by successive doping scandals. In June, cyclist Floyd Landis officially lost his 2006 Tour de France title, almost two years after his urine tested positive for synthetic testosterone. WADA (World Anti-Doping Agency) has accredited 33 labs around the world to perform anti-doping tests, including the urine test for the performance-enhancing drug known as EPO. Recently, some Danish researchers rained on the parade, claiming that their experiment proved that the EPO test had poor “detection power”. (For a good summary, see “The Validity of EPO Testing for Athletes”, ScienceDaily.com, June 28, 2008.)
In their paper, researchers at the Copenhagen Muscle Research Center made two specific claims: (a) that the detection power of the EPO test is poor; and (b) that agreement between results from two WADA-accredited labs is very poor. They concluded that due to a high false negative rate (low sensitivity), this test would fail to catch many drug cheats. They called into question WADA’s ability to detect EPO abuse at the Beijing Olympics.
In the experiment, they recruited eight healthy college students, all non-athletes, to follow a program of EPO injection and exercise over a seven-week period, divided into three phases (boosting/higher dose, maintenance/lower dose, post-treatment/off cycle). Eight, 16 and 24 urine samples were collected in these respective phases, in addition to eight samples taken before the EPO program to serve as a baseline. Each sample was split in half, and each half-sample was submitted to one of two WADA-accredited labs (known as “Lab A” and “Lab B”) for EPO testing.
An excerpt from the table of results from the original paper is given below:
Phase | Pre | Boosting | Maintenance1 | Maintenance2 | Post1 | Post2 | Post3
Samples tested | 8 | 8 | 8 | 8 | 7* | 6* | 7*
Lab A + | 0 | 8 | 4 | 2 | 2 | 0 | 0
Lab B + | 0 | 0 | 0 | 0 | 0 | 0 | 0
* Note: The reduction in samples in the post-treatment period was not explained.
Source: Lundby et al., “Testing for recombinant human erythropoietin in urine: problems associated with current anti doping testing”, PresS. J Appl. Physiol, June 26, 2008, online.
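The counts in the table can be turned into proportions of samples declared positive in each period, which is a useful starting point for the discussion questions. The Python sketch below simply pools the columns of the excerpt by period; the grouping and the function name `positive_rate_by_period` are illustrative choices, not part of the original paper, and pooling ignores the fact that the same eight subjects contributed repeated samples.

```python
# Columns of the excerpted table, in order.
phases = ["Pre", "Boosting", "Maintenance1", "Maintenance2", "Post1", "Post2", "Post3"]
samples_tested = [8, 8, 8, 8, 7, 6, 7]
lab_positives = {
    "Lab A": [0, 8, 4, 2, 2, 0, 0],
    "Lab B": [0, 0, 0, 0, 0, 0, 0],
}

# Which table columns belong to which period of the experiment.
period_columns = {
    "Pre": [0],
    "Boosting": [1],
    "Maintenance": [2, 3],
    "Post": [4, 5, 6],
}

def positive_rate_by_period(positives):
    """Return {period: (positives, samples tested, proportion positive)}."""
    rates = {}
    for period, cols in period_columns.items():
        hits = sum(positives[c] for c in cols)
        tested = sum(samples_tested[c] for c in cols)
        rates[period] = (hits, tested, hits / tested)
    return rates

for lab, positives in lab_positives.items():
    for period, (hits, tested, rate) in positive_rate_by_period(positives).items():
        print(f"{lab} {period}: {hits}/{tested} positive ({rate:.1%})")
```

Note that only the eight pre-treatment samples come from subjects known not to have taken EPO, so the table says very little about how often the test flags a clean sample.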
--- Discussion ---
(1) Describe a pair of metrics that statisticians use to measure the accuracy (“detection power”) of diagnostic tests. This experiment addressed only one of these metrics. Which one? Why couldn’t this experimental design capture the other metric? Why is it important to know the other metric before passing judgment on test accuracy?
(2) Inside the paper, the authors revealed that each lab classified each sample into one of three categories: positive, suspicious and negative. Suspicious cases are subject to further confirmatory testing (although we do not know if this was done). We were told that Lab A indicated 2 maintenance samples and 3 post-treatment samples “suspicious”; Lab B indicated 7 boosting samples and 5 maintenance samples “suspicious”; Lab B called 1 boosting sample negative. Do the additional data affect your opinion of the study’s conclusions?
(3) Comment on the sample size and the selection mechanism. How comfortable are you making statistical inferences from these data?
(4) In the Discussion, the authors used language like “In the ‘maintenance’ period, laboratory A found six positive results … in a total of 16 samples.” Explain why special testing methodology must be used if you were to treat this data as an n=16 sample. What key assumption is violated when ordinary testing is invoked?
--- Further Discussion ---
It has been widely reported that the EPO test is highly “unreliable” because this study proved that two labs, both WADA certified, could produce divergent analytical results. In response, Olivier Rabin, WADA’s scientific director, doubted that this study “reflected the true state of EPO testing”. (“Study Shows Problems in Olympic-Style Tests”, New York Times, June 26, 2008) Dr. Rabin raised a core question in statistical inference: given that the true state is unknown, one would like to see if the experimental data provides sufficient statistical evidence to prove/disprove one’s hypothesis.
(5) State null and alternative hypotheses for this problem. Identify the population and the sample in the Danish experiment. What was their sample size?
(6) Under what conditions are you willing to generalize the result of this test of Lab A vs. Lab B? (Or design a different experiment to test your stated hypotheses.)
Submitted by Kaiser Fung
Correlation is enough
The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, Chris Anderson, Wired Magazine, 23 June 2008
This article, written by the editor of Wired Magazine, argues that the scientific model - propose a theory then test it by experiment - is outdated, due to the arrival of huge data sets and powerful computing clouds that can crunch this data to look for patterns.
At the petabyte scale, information is not a matter of simple three and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later.
Anderson argues that, up to now, scientists have hypothesized, modeled and tested to uncover the links between events, to show how those events come about (causation) and then to make predictions about the future; but the way forward is to highlight any correlations that exist, so that patterns emerge right out of the data without pre-selecting the data with a hypothesis and a model.
Some examples based on Google's success include:
Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day.
You can find an impressive list of twelve other examples on this Edge link, such as searching for quarks, winning lawsuits, tracking air fares or monitoring epidemics.
The purpose of a scientific theory is to make predictions, which requires an explanation or justification for why event B follows on from event A, say. The author claims that with a large enough data set it is possible to empirically tabulate what comes after event A, to a high degree of accuracy, and that this yields useful results without having to understand exactly why. So decisions can be based on this empirical result without any need for a theory to justify the occurrence, or not, of event B following on from event A.
An example is translation engines that translate between languages by comparing vast corpuses of texts that have already been translated, such as official European Union or United Nations documents.
If the statistics of incoming links say it is, that's good enough. No semantic or causal analysis is required. That's why Google can translate languages without actually "knowing" them. ... And why it can match ads to content without any knowledge or assumptions about the ads or the content.
What emerges after the computer has been 'trained' is a complex algorithm that has no theory in it but it can translate languages very effectively, the author claims.
Questions
- The article and the responses that it generated focus on what it means to engage in science:
- The author argues that using a large collection of data to view data mathematically first and establish a context for it later is somehow different from the way science has worked for hundreds of years. Do you agree?
- The author claims that sometimes enough correlations are sufficient. For example, a medical doctor may not find the cause of an illness but may be able to predict its course and treat its symptoms. Is this science?
- If you don't have a model or theory, is your work something others can build on? If it is not reproducible, is it still science?
- If science is a way to test ideas and answer questions, does Google's 'cataloguing observations' approach qualify as science?
- Does Google's PageRank search algorithm incorporate a model of the web as a structured social network, in which each link from a node to another one is interpreted as a vote from that node to the other?
- For data to have any meaning, you must have theory. For theories to reflect reality you must have data. They are mutually important to each other. Do you agree?
- The emphasis in these techniques is the data-intensive nature of computation, rather than on the computing cluster itself. Is this new 'petabyte' world good or bad for statistics? Speculate about what kind of statistics/statisticians would benefit and what or who might lose out?
- These new methods are the beginning for new scientific methods which will change the way we understand the world. Do you agree?
- Anderson doesn’t define correlation.
- How many definitions of correlations can you think of? What implicit assumptions are required? Would these differences matter at the petabyte scale?
- Are correlations model free? Is there an implicit model embedded in any system that generates answers?
- Anderson's approach suggests gathering lots of data and assuming it is representative of other situations, e.g. that what worked well in the past will work well (enough) in the future. Is this just a non-parametric model? Is it possible to extrapolate from what has been observed to, as yet, unobserved conclusions? For example, Google's translation engine can translate to and from Chinese even though none of the employees who built it can speak any Chinese dialect.
- What do you think the author meant by dimensionally agnostic statistics?
- Will number-crunching computers ever entirely replace human experts?
- Do you agree with this article's prediction about the death of theory?
- Here is a link to five previous predictions that Wired Magazine now regrets.
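For the PageRank question above, it may help to see how much modelling the "link as a vote" idea already contains. The following Python sketch is a textbook-style power iteration on a made-up four-page web; it is not Google's production algorithm, and the function name `pagerank`, the damping value 0.85 and the toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict mapping each page to the pages it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page keeps a small baseline share (the "random jump" part of the model).
        new_rank = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical toy web: page C receives the most incoming links.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Even in this small sketch, the damping factor and the treatment of pages without outgoing links are modelling decisions rather than raw data.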
Further reading
- Many blogs have commented on this article, such as:
- The Google Way of Science on Kevin Kelly's Blog - The Technium. "In the coming world of cloud computing perfectly good answers will become a commodity. The real value of the rest of science then becomes asking good questions."
- Great minds think (too much) alike, The Economist, 17 Jul 2008. An unrelated article but on a similar topic. As more journals become available online, fewer articles are being cited in the reference lists of the research papers published within them, based on the correlations between two averages derived from a database of 34 million research papers.
- Google Translate FAQ "we feed the computer billions of words of text, both monolingual text in the target language, and aligned text consisting of examples of human translations between the languages. We then apply statistical learning techniques to build a translation model."
- Fancy math takes on je ne sais quoi, Gregory M. Lamb, The Christian Science Monitor. An on-line article on translation engines.
- The author also wrote The Long Tail.
- There is a summary and an analysis of some 'long tail' data - Should you invest in the long tail? (print version) in the Harvard Business Review.
Submitted by John Gavin.
The value of a joint replacement registry
A Call for a Warning System on Artificial Joints Barry Meier, The New York Times, July 29, 2008.
A registry is a database that tracks health outcomes as they are reported by doctors. In the United States, a registry known as SEER (Surveillance, Epidemiology, and End Results) tracks information about cancer patients.
The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute (NCI) is an authoritative source of information on cancer incidence and survival in the United States. SEER currently collects and publishes cancer incidence and survival data from population-based cancer registries covering approximately 26 percent of the US population. SEER coverage includes 23 percent of African Americans, 40 percent of Hispanics, 42 percent of American Indians and Alaska Natives, 53 percent of Asians, and 70 percent of Hawaiian/Pacific Islanders. (Details are provided in the table: Number of Persons by Race and Hispanic Ethnicity for SEER Participants.) The SEER Program registries routinely collect data on patient demographics, primary tumor site, tumor morphology and stage at diagnosis, first course of treatment, and follow-up for vital status. The SEER Program is the only comprehensive source of population-based information in the United States that includes stage of cancer at the time of diagnosis and patient survival data. http://seer.cancer.gov/about
A registry can also be limited to a specific region or even a single medical site. The Aghia Sofia Children's Hospital in Athens created a meningitis registry in 1970.
BACKGROUND: Bacterial meningitis remains a source of substantial morbidity and mortality in childhood. During the last decades gradual changes have been observed in the epidemiology of bacterial meningitis, related to the introduction of new polysaccharide and conjugate vaccines. The study presents an overview of the epidemiological patterns of acute bacterial meningitis in a tertiary children's hospital during a 32-year period, using information from a disease registry. Moreover, it discusses the contribution of communicable disease registries in the study of acute infectious diseases.
METHODS: In the early 1970s a Meningitis Registry (MR) was created for patients admitted with meningitis in Aghia Sofia Children's Hospital in Athens. The MR includes demographic, clinical and laboratory data as well as treatment, complications and outcome of the patients. In 2000 a database was created and the collected data were entered, analyzed and presented in three chronological periods: A (1974-1984), B (1985-1994) and C (1995-2005).
RESULTS: Of the 2,477 cases of bacterial meningitis registered in total, 1,146 cases (46.3%) were classified as "probable" and 1,331 (53.7%) as "confirmed" bacterial meningitis. The estimated mean annual Incidence Rate (IR) was 16.9/100,000 for bacterial meningitis, 8.9/100,000 for Neisseria meningitidis, 1.3/100,000 for Streptococcus pneumoniae, 2.5/100,000 for Haemophilus influenzae type b (Hib) before vaccination and 0.4/100,000 for Hib after vaccination. Neisseria meningitidis constituted the leading cause of childhood bacterial meningitis for all periods and in all age groups. Hib was the second most common cause of bacterial meningitis before the introduction of Hib conjugate vaccine, in periods A and B. The incidence of bacterial meningitis due to Streptococcus pneumoniae was stable. The long-term epidemiological pattern of Neisseria meningitidis appears in cycles of approximately 10 years, confirmed by a significant rise of IR in period C. The Case Fatality Rate (CFR) from all causes was 3.8%, while higher CFR were estimated for Streptococcus pneumoniae (7.5%, RR=2.1, 95% CI 1.2-3.7) and Neisseria meningitidis (4.8%, RR=1.7, 95% CI 1.1-2.5) compared to other pathogens. Moreover, overall CFR varied significantly among the three time periods (p = 0.0015), and was estimated to be higher in period C.
CONCLUSION: By using the MR we were able to delineate long-term changes in the epidemiology of bacterial meningitis. Thus the MR proved to be a useful tool in the study and the prevention of communicable diseases in correlation with prevention strategies, such as vaccinations.
Meningitis registry of hospitalized cases in children: epidemiological patterns of acute bacterial meningitis throughout a 32-year period
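The relative risks and confidence intervals quoted in the abstract can be illustrated with a small calculation. The sketch below uses the common log-scale approximation for the ratio of two proportions. The counts are hypothetical, chosen only for illustration, and are not taken from the meningitis registry; the abstract does not say which interval method the authors used.

```python
import math

def relative_risk_ci(deaths_a, cases_a, deaths_b, cases_b, z=1.96):
    """Relative risk of death in group A versus group B, with an
    approximate 95% confidence interval computed on the log scale."""
    risk_a = deaths_a / cases_a
    risk_b = deaths_b / cases_b
    rr = risk_a / risk_b
    # Standard error of log(RR) for two independent proportions.
    se_log_rr = math.sqrt(1 / deaths_a - 1 / cases_a + 1 / deaths_b - 1 / cases_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts (NOT registry data): 15 deaths among 200 cases caused by
# one pathogen, compared with 40 deaths among 1000 cases caused by all others.
rr, lower, upper = relative_risk_ci(15, 200, 40, 1000)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1, as both intervals in the abstract do, suggests that the difference in case fatality is larger than chance alone would usually produce.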
The New York Times article started with the story of a doctor who had bad experiences with several patients who had recently had a particular joint replacement part implanted.
Dr. Lawrence Dorr, a nationally known orthopedic surgeon in Los Angeles, realized last year that something was very wrong with some of his patients.
Months after routine hip replacements, patients who had expected to live without pain were in agony. "The pain was grabbing me around the back," said Stephen Csengeri, who is 54, and a lawyer from Torrance, Calif.
Dr. Dorr found he had implanted the same metal hip socket in each patient. Several needed surgery again — a replacement for their replacement.
The doctor first told the device’s manufacturer, Zimmer Holdings, last year about his concerns but nothing happened. Then in April, Dr. Dorr, who was a highly paid consultant for Zimmer, sounded an alarm to colleagues in a professional association and soon heard back from doctors with similar experiences.
Although this particular joint replacement part has been temporarily recalled (see more about this below), the article goes on to suggest that if the United States had developed a registry for patients undergoing joint replacement therapy, the problem would have been identified and resolved sooner. Many other countries have joint replacement registries.
But the United States lacks such a national database, called a joint registry, that tracks how patients with artificial hips and knees fare. The risk in the United States that a patient will need a replacement procedure because of a flawed product or technique can be double the risk of countries with databases, according to Dr. Henrik Malchau of Massachusetts General Hospital.
An attempt to start a joint replacement registry in the United States failed to get crucial support.
Medicare, which pays for about half the hip and knee implants in this country, rebuffed a proposal two years ago from a medical group to support a joint database. It said it was not the agency’s job to gather such data — despite the considerable savings in taxpayer dollars that might come from reducing the number of do-over surgeries.
The article draws a sharp contrast between outcomes in the United States, which does not have a registry, and Sweden, which does.
Eight years ago, he [Dr. Dorr] alerted another implant producer, Sulzer Orthopedics, that patients with one of its hip implants were having such pain they needed replacement surgery almost immediately. Sulzer withdrew the device six months later, but about 3,000 patients got replacements for the implant, which had become contaminated by oil during manufacturing. Sulzer, deluged by lawsuits, threatened to file for bankruptcy protection.
But because of their registry, Swedish doctors were alerted after just 30 patients got the Sulzer hip that it had an alarmingly high replacement rate, Dr. Malchau said.
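An alert after only 30 patients is plausible when the observed replacement rate is far above what is normally expected. The sketch below computes an exact binomial tail probability. Both the baseline early-replacement rate (2%) and the number of replacements (5 of 30) are hypothetical assumptions for illustration; the article does not report the actual Swedish figures or the method the registry used.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) when X follows a Binomial(n, p) distribution."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# Hypothetical assumptions, not figures from the article:
baseline_rate = 0.02   # assumed usual early-replacement rate
n_patients = 30        # patients who received the new implant
n_replaced = 5         # assumed number needing early replacement

tail = prob_at_least(n_replaced, n_patients, baseline_rate)
print(f"P(at least {n_replaced} of {n_patients} replaced | rate {baseline_rate}) = {tail:.6f}")
```

Under these assumed numbers the tail probability is very small, so even a modest number of patients can give a strong signal, provided the outcomes are recorded in one place.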
Also, doctors in Sweden today are much less likely than American doctors to embrace new devices until registry data show they work well.
"It has made surgeons stick to well-documented implants," said Dr. Johan Karrholm, who helps direct the Swedish program.
The recall of the device that led off this story, though, is still shrouded in controversy.
Last week, Zimmer announced it was suspending sales of the device, known as the Durom cup, until it trained doctors how best to implant it.
The company does not believe that the device itself is faulty.
Earlier this year, after Dr. Dorr urged Zimmer executives to stop selling the cup, they told him the fault was in his implantation technique, not their product — the same response he received years before from Sulzer executives. That is when he decided to alert colleagues at the American Association of Hip and Knee Surgeons.
In late May, Zimmer informed surgeons that it was investigating Dr. Dorr’s complaint but that it did not see a need for an action like suspending sales. Last week, in releasing a summary of its investigation, the company said that cup failure rates had varied widely among clinics, a disparity it attributed to varying surgical techniques. Some doctors did not have problems.
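Variation in failure rates among clinics does not by itself identify a cause. As a rough illustration, the sketch below assumes that every clinic has the same true failure rate and treats the same number of patients, then asks how different the observed rates could look through chance alone. The rate (5%) and clinic size (20 patients) are hypothetical and do not come from Zimmer's investigation.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) when X follows a Binomial(n, p) distribution."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical assumptions, not figures from the article:
patients_per_clinic = 20
true_failure_rate = 0.05   # identical at every clinic

# Chance that a clinic sees no failures at all.
p_zero = binom_pmf(0, patients_per_clinic, true_failure_rate)
# Chance that a clinic sees 3 or more failures (an observed rate of 15% or more).
p_three_or_more = 1 - sum(
    binom_pmf(k, patients_per_clinic, true_failure_rate) for k in range(3)
)

print(f"P(observed rate is 0%)    = {p_zero:.3f}")
print(f"P(observed rate is >=15%) = {p_three_or_more:.3f}")
```

With small clinics, observed rates of 0% and of 15% or more can both occur even though the true rate is identical everywhere. Chance is only one candidate explanation: differences in patient mix, follow-up time, or how failures are reported could also produce heterogeneity.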
Questions
1. The article mentions some of the practical difficulties in running a registry. Do you feel that the value of such a registry would warrant the costs involved?
2. Any registry is going to have the potential for serious biases. What are some of the biases that might affect a registry?
3. Does heterogeneity in the failure rate among different surgeons imply that surgical technique is at fault? What are some other possible explanations for this heterogeneity?
Submitted by Steve Simon.