Teaching

  • Three basic theorems concerning expected values and variances of sums and products of random variables play an important role in mathematical statistics and its applications in education, business, the social sciences, and the natural sciences. A solid understanding of these theorems requires familiarity with their proofs. Students who major in mathematics and other technical fields should have no difficulty with these proofs, but students who major in education, business, and the social sciences often find them hard to follow. Textbooks and courses in statistics geared to the latter group sometimes omit the proofs because students find the mathematics too confusing. In this paper, we present a simpler approach to these proofs. The paper will be useful for those who teach students whose mathematical background does not include a solid grasp of differential calculus.

  • Textbooks and websites today abound with real data. One neglected issue is that statistical investigations often require a good deal of cleaning to ready data for analysis. The purpose of this dataset and exercise is to teach students to use exploratory tools to identify erroneous observations. This article discusses the merits of such an exercise and provides a team project, problem data, cleaned data for instructors, and reflections on past experiences. The main goal is to give instructors a prepared project for their students to perform realistic data preparation and subsequent analysis. The data for this project involve categorical and continuous variables for subjects age 65 and over testing calcium, inorganic phosphorus, and alkaline phosphatase levels in the blood. The project described in this article involves summary analysis, but the cleaned data could also be used for

  • Statistics textbooks for undergraduates have not caught up with the enormous amount of analysis of Internet data that is taking place these days. Case studies that use Web server log data or Internet network traffic data are rare in undergraduate Statistics education. And yet these data provide numerous examples of skewed and bimodal distributions, of distributions with thick tails that do not follow the usual models studied in class, and many other interesting statistical curiosities. This paper summarizes the results of research in two areas of Internet data analysis: users' web browsing behavior and network performance. We present some of the main questions analyzed in the literature, some unsolved problems, and some typical data analysis methods used. We illustrate the questions and the methods with large data sets. The data sets were obtained from the publicly available pool of data and had to be processed and transformed to make them available for classroom exercises. Students in Introductory Statistics classes as well as Probability and Mathematical Statistics courses have responded to the stories behind these data sets and their analysis very well. The message in the stories can be conveyed at a descriptive or a more advanced level.

  • We present and discuss three examples of misapplication of the notion of conditional probability. In each example, we present the problem along with a published and/or well-known incorrect - but seemingly plausible - solution. We then give a careful treatment of the correct solution, in large part to show how careful application of basic probability rules can help students to spot and avoid these mistakes. With each example, we also hope to illustrate the importance of having students draw a tree diagram and/or a sample space for probability problems not involving data (i.e., where a contingency table might not be obviously applicable).

  • In this paper, we consider some combinatorial and statistical aspects of the popular Powerball lottery game. It is not difficult for students in an introductory statistics course to compute the probabilities of winning various prizes, including the jackpot in the Powerball game. Assuming a unique jackpot winner, it is not difficult to find the expected value and the variance of the probability distribution for the dollar prize amount. In certain circumstances, the expected value is positive, which might suggest that it would be desirable to buy Powerball tickets. However, due to the extremely high coefficient of variation in this problem, we use the law of large numbers to show that we would need to buy an untenable number of tickets to be reasonably confident of making a profit. We also consider the impact of sharing the jackpot with other winners.

  • Many textbooks teach a rule of thumb stating that the mean is right of the median under right skew, and left of the median under left skew. This rule fails with surprising frequency. It can fail in multimodal distributions, or in distributions where one tail is long but the other is heavy. Most commonly, though, the rule fails in discrete distributions where the areas to the left and right of the median are not equal. Such distributions not only contradict the textbook relationship between mean, median, and skew, they also contradict the textbook interpretation of the median. We discuss ways to correct ideas about mean, median, and skew, while enhancing the desired intuition.

  • A data set contained in the Journal of Statistics Education's data archive provides a way of exploring regression analysis at a variety of teaching levels. An appropriate functional form for the relationship between percentage body fat and the BMI is shown to be the semi-logarithmic, with variation in the BMI accounting for a little over half of the variation in body fat. The fairly modest strength of the relationship implies that confidence intervals for body fat, and tolerance intervals for BMI, can be quite wide, so that strict reliance on the BMI as a measure of body fat, and hence obesity, is unwarranted. Nevertheless, when fitting percentage body fat as a function of the class of "power weight for height indices", i.e., indices of the form weight/height^p, the BMI, with a height exponent of p = 2, is an appropriate choice to make.

  • Many leaders of our profession have called for improvements in the way we educate statisticians. Sound recommendations have been made by many, based on real-world experience in the practice of statistical science. These calls for reform have gone largely unheeded, at least in part because of our current paradigm of statistical education. Statistics is seen, by many, as strictly a graduate discipline, yet constraints on the time to complete a graduate degree make adopting many of the suggested reforms very difficult. It is argued in this paper that a new paradigm of statistical education is needed that provides for strong undergraduate programs in statistics. Such programs would give the profession wider recognition and provide additional entries into the discipline.

  • The sport of Ultimate has grown from parking lot fun to international competition in its 35-year existence. As in many sports, the team that scores is subsequently on defense. Thus the probability that a team will score next is dependent on which team has scored most recently. Unlike in many other sports, teams switch ends after each score. Thus field conditions can affect the scoring patterns. The data and analyses described here can be integrated into a variety of courses ranging from introductory statistics to stochastic models.

  • Methods for calculating confidence intervals for the mean are reviewed for the case where the data come from a log-normal distribution. In a simulation study it is found that a variation of the method suggested by Cox works well in practice. An approach based on generalized confidence intervals also works well. A comparison of our results with those of Zhou and Gao (1997) reveals that it may be preferable to base the interval on t values, rather than on z values.
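
The last abstract above contrasts z-based and t-based intervals for a log-normal mean without stating a formula. The sketch below is an illustration under an assumption: it uses the commonly cited Cox-type interval, exp(ybar + s^2/2 ± crit * sqrt(s^2/n + s^4/(2(n - 1)))), where ybar and s^2 are the sample mean and variance of the logged data. The function name `cox_interval`, the sample values, and the choice of a 95% level are hypothetical and not taken from the paper.

```python
import math
import statistics


def cox_interval(data, crit):
    """Approximate CI for the mean of log-normal data (Cox-type method).

    `crit` is the critical value: a z quantile for the original method,
    or a t quantile with n - 1 degrees of freedom for the variation.
    The formula is an assumption; the abstract does not spell it out.
    """
    logs = [math.log(x) for x in data]
    n = len(logs)
    ybar = statistics.fmean(logs)
    s2 = statistics.variance(logs)  # sample variance of the logs (n - 1 divisor)
    center = ybar + s2 / 2          # estimates log E[X] = mu + sigma^2 / 2
    se = math.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1)))
    return math.exp(center - crit * se), math.exp(center + crit * se)


# Hypothetical sample of 10 positive observations (illustrative only).
sample = [1.2, 0.8, 3.5, 2.1, 0.6, 5.9, 1.7, 2.8, 0.9, 4.2]

z = statistics.NormalDist().inv_cdf(0.975)  # standard normal 97.5th percentile
t9 = 2.262  # 97.5th percentile of t with 9 df, taken from standard tables

z_low, z_high = cox_interval(sample, z)
t_low, t_high = cox_interval(sample, t9)
print(f"z-based: ({z_low:.2f}, {z_high:.2f})")
print(f"t-based: ({t_low:.2f}, {t_high:.2f})")
```

Because the t quantile exceeds the z quantile for any finite sample, the t-based interval is always the wider of the two, which is one plausible reason it might hold its nominal coverage better in small samples. The critical value is passed in as a parameter so the sketch needs only the standard library.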
