Jacqueline B. Miller
Department of Mathematics and Computer Science
Drury University
900 North Benton Avenue
Springfield, MO 65802

Statistics Teaching and Resource Library, July 26, 2001

© 2001 by Jacqueline B. Miller, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.
As teachers of statistics, we know that residual plots and other diagnostics are important in deciding whether linear regression is appropriate for a set of data. Even though we discuss this with our students, many of them may believe that if the correlation coefficient is strong enough, these diagnostic checks do not matter. The data set included in this activity was created to lure students into a situation that appears on the surface to call for linear regression but is instead based (loosely) on a quadratic function.
Key words: regression; residuals
Objective
To present students with a situation that appears to be suitable for linear regression and challenge them to weigh a strong correlation against diagnostics (here, a residual plot) that indicate otherwise.
The Activity
Before this activity is assigned, students should have had an introduction to scatter plots, linear regression, and residual plots. The activity uses a set of data on the width and cost of a square deck you want to build for your house. Students are asked to construct a scatter plot of the data, to find the linear regression equation for estimating deck cost from deck width, to find the correlation coefficient, and to comment on whether linear regression is appropriate for the data set. They are also asked to predict the cost of building a deck whose width is similar to the widths in the data set. Students then examine a residual plot to decide whether linear regression is appropriate for the data set. For this data set, the scatter plot and correlation coefficient suggest that linear regression is appropriate, while the residual plot indicates that it is not. After students complete the activity, the instructor should lead a discussion about the appropriateness of linear regression in situations like the one posed here. Suggested discussion questions are included in the Assessment section that follows.
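To make this tension concrete without reproducing the activity's own data set, the following Python sketch uses hypothetical deck data generated from an assumed quadratic rule, cost = 500 + 15 × width², in the spirit of a square deck whose cost grows with its area. It fits the least-squares line, computes the correlation coefficient, and lists the residuals. The widths, the cost rule, and the helper names (`mean`, `fit_line`, `correlation`) are illustrative assumptions, not part of the activity.

```python
# Hypothetical deck data (NOT the activity's data): width in feet, cost in dollars.
# Cost follows an assumed quadratic rule, so it grows with the deck's area.
widths = [8, 10, 12, 14, 16, 18, 20]
costs = [500 + 15 * w**2 for w in widths]


def mean(xs):
    return sum(xs) / len(xs)


def fit_line(x, y):
    """Ordinary least-squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def correlation(x, y):
    """Pearson correlation coefficient r."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


a, b = fit_line(widths, costs)
r = correlation(widths, costs)
# Residual = observed cost minus the cost predicted by the fitted line.
residuals = [yi - (a + b * xi) for xi, yi in zip(widths, costs)]

print(f"predicted cost = {a:.1f} + {b:.1f} * width, r = {r:.3f}")
for w, e in zip(widths, residuals):
    print(f"width {w:2d}: residual {e:8.1f}")
```

For these made-up numbers the fitted line is cost ≈ -2200 + 420 × width and r is about 0.99, yet the residuals are 300, 0, -180, -240, -180, 0, 300: positive at both ends and negative in the middle, the same kind of curved pattern the residual plot in the activity is meant to reveal.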
Assessment
To me, the assessment centers on the in-class discussion question. In many of our classes, we discuss checking diagnostics for the appropriateness of linear regression. What should we do in a case where linear regression looks appropriate, strong correlation and all, until we examine the residual plot? Getting students involved in a rich discussion about the appropriateness of linear regression in such a situation is important. Such a discussion gives students the opportunity to confront issues that standard questions about linear regression do not address. Questions for discussion might include:
- Based only on the scatter plot and correlation coefficient, does linear regression appear to be appropriate for this data set?
- Based only on the residual plot, does linear regression appear to be appropriate for this data set?
- Although the residual plot magnifies a quadratic pattern that exists in the data, the correlation coefficient is 0.985. Can we ignore the findings of the residual plot because there is such a strong relationship between cost and width of the deck? Why or why not?
- What other diagnostic methods might we use to determine the appropriateness of linear regression in this situation?
- Based on this activity, can we establish some general rules for determining the appropriateness of linear regression in a variety of situations?
Assessment of the students would be done informally during the discussion, paying particular attention to student participation, including nonverbal involvement.
Formal assessment might involve an exam question or two that challenges students to return to the issue of the appropriateness of linear regression. Consider, for example, the following questions:
- True or False: Regression is always appropriate when the points in the scatter plot appear to be linear and the correlation coefficient is strong.
- True or False: Whenever the residual plot suggests that there is a pattern in the data, we cannot perform linear regression on the data set.
Instead of the questions above (or similar objective questions), the instructor could write an investigative problem for a formal exam, similar to the "Regression – Residuals – Why?" activity, using a new data set that has a different diagnostic problem (e.g., nonconstant spread or an influential observation). By examining student responses to the questions in the investigative problem, the instructor can assess how well students integrate their knowledge of regression in a situation that challenges them to think about the appropriateness of linear regression.
Teaching Notes

- This activity can be done in class or assigned as out-of-class work. Either way, I would suggest that students be allowed to work together on the assignment so that they can discuss the issues with one another.
- This activity does not depend on any particular technology. Students could do the activity on a graphing calculator or with a software package. It is up to you as the teacher to decide whether students should use a particular piece of technology.
- This activity can naturally be extended to transformations of the data, so that students can look for a relationship between the variables that is more appropriate than the linear one.
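One way to pursue the transformation idea is sketched below in Python, under stated assumptions: because a square deck's size is its area, regress cost on width² instead of width and compare the two fits by their sums of squared residuals (SSE). The data are again hypothetical (an assumed quadratic cost rule plus a few made-up deviations so the fit is not perfect), and width² is only one possible transformation; regressing the square root of cost on width, for example, could be explored the same way. The helper names `fit_line` and `residuals` are illustrative.

```python
# Hypothetical deck data (NOT the activity's data): cost driven by deck area,
# plus made-up deviations so that no model fits exactly.
widths = [8, 10, 12, 14, 16, 18, 20]
noise = [40, -30, 20, -50, 10, 30, -20]
costs = [500 + 15 * w**2 + e for w, e in zip(widths, noise)]


def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def residuals(x, y):
    """Observed y minus the y predicted by the least-squares line on x."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Model 1: cost on width. Model 2: cost on the transformed predictor width**2 (the area).
areas = [w**2 for w in widths]
res_width = residuals(widths, costs)
res_area = residuals(areas, costs)

sse_width = sum(e**2 for e in res_width)
sse_area = sum(e**2 for e in res_area)

print(f"SSE, cost on width:   {sse_width:,.0f}")
print(f"SSE, cost on width^2: {sse_area:,.0f}")
print("residuals, cost on width:  ", [round(e) for e in res_width])
print("residuals, cost on width^2:", [round(e) for e in res_area])
```

With these made-up numbers the SSE drops sharply once width² is the predictor, and the width² residuals should show no systematic curve, since the only departures left are the invented deviations. With real data, students would still need to inspect the new residual plot rather than rely on the SSE alone.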
Editor's note: Before 11-6-01, the "student's version" of an activity was called the "prototype".