John Gabrosek
Department of Statistics
Grand Valley State University
1 Campus Drive
Allendale, MI 49401-9403
Michael E. Schuckers
Department of Statistics
410 Hodges Hall
West Virginia University
Morgantown, WV 26506
Statistics Teaching and Resource Library, October 25, 2001
© 2001 by John Gabrosek and Michael E. Schuckers,
all rights reserved. This text may be freely shared among
individuals, but it may not be republished in any medium without
express written consent from the author and advance notification
of the editor.
The statistical educator often finds
it difficult to convey the beauty and power of descriptive
graphical data summaries to her students. Breaking the Code
actively engages students in constructing and interpreting bar
charts. The activity requires students to describe data
graphically, compare the frequency distribution depicted in two
bar charts, construct and test a hypothesis, and communicate
results.
The activity begins with an explanation of the
Caesar Shift for message encryption (Singh, 1999). The Caesar
Shift is a translation of the alphabet; for example, a five-letter
shift would code the letter a as f, b as g, … z as e. We describe
a five-step process for decoding an encrypted message. First,
students working in groups of four construct a frequency table of
the letters in two lines of a coded message. Second, students
construct a bar chart of the letter frequencies in a reference
message, which stands in for the frequency of letters in the
English language. Third, students create a bar chart of the coded
message.
Fourth, students visually compare the bar chart of the reference
message (step 2) to the bar chart of the coded message (step 3).
Based on this comparison, students hypothesize a shift. Fifth,
students apply the shift to the coded message.
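The Caesar Shift described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original activity; the function names `caesar_shift` and `caesar_unshift` are hypothetical, and the sketch assumes that only the 26 English letters are shifted while spaces and punctuation are left alone.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar_shift(text, shift):
    """Shift each letter forward by `shift` places, wrapping z back to a."""
    result = []
    for ch in text.lower():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)  # spaces and punctuation are left unchanged
    return "".join(result)


def caesar_unshift(text, shift):
    """Undo a shift by moving each letter back the same number of places."""
    return caesar_shift(text, -shift)


# The five-letter shift from the text: a -> f, b -> g, z -> e.
print(caesar_shift("abz", 5))
print(caesar_unshift("fge", 5))
```

Decoding is the same operation run backward, which is why hypothesizing the correct shift is the only real obstacle for students.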
After decoding the message, students are asked a series of
questions that assess their ability to see patterns. The questions
are geared toward higher levels of cognitive reasoning.
Key words: bar charts, Caesar
Shift, encryption, testing hypotheses
Objectives
The objectives of the Breaking the
Code activity include:
- Constructing frequency tables
- Constructing bar charts
- Comparing distributions by looking for patterns
- Forming and testing a hypothesis
- Understanding sampling variability
- Explaining results of statistical procedures
- Working cooperatively in a group
- Using a statistical computer package (such as SPSS for Windows or
Minitab)
Materials and Equipment
The following materials and equipment are needed for the Breaking
the Code activity:
- A classroom set of handouts, one for each student
- A computer along with statistical software is optional, but
recommended
Time Involved
This activity has been used for four semesters in a general
education introductory statistics classroom. The activity has been
assigned early in the semester when students are unfamiliar with a
statistical computer package. The time involved has been as
follows:
- Step I of the activity: allow the last 10 minutes of a class
period
- Steps II to V of the activity: allow an entire 50-minute class
period held in a computer lab
Depending on course structure,
consider the following alternative approaches:
- Eliminate the computer portion of
the assignment and have students produce bar charts by hand. The
activity can be completed in a 50-minute class.
- Give a brief lecture of the
decoding approach with a short example. Assign the activity as
individual or group homework.
Regarding the Data and Graphs
Non-coded writing is used to produce
a reference distribution for the frequency of letters in the
English language. Any writing of at least 250 letters could be
used. Below we give a writing sample. The sample will not have
exactly the same frequency distribution of letters as the English
language as a whole, nor as another writing sample. Students will compare
bar charts of a coded message and the reference distribution to
hypothesize the shift used to encode the message. The message that
we used to generate the reference distribution is:
And, most importantly, I would
like to thank my family for their unconditional love and
generous support. Without the encouragement of my parents,
Joseph and Ann, my brother, Joe, my sister-in-law, Jenna, my
nieces, Gabrielle and Madison, and my sister, Anita, I could not
have completed this work.
The frequency table of the reference
message is:
Letter | Count | Letter | Count | Letter | Count | Letter | Count
A | 19 | B | 2 | C | 5 | D | 10
E | 23 | F | 3 | G | 3 | H | 8
I | 17 | J | 3 | K | 3 | L | 11
M | 12 | N | 23 | O | 21 | P | 6
Q | 0 | R | 13 | S | 12 | T | 20
U | 7 | V | 2 | W | 4 | X | 0
Y | 8 | Z | 0 | | | |
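A frequency table like the one above can be tallied with a short Python sketch. This is an illustration only, not the authors' procedure; the helper name `letter_counts` is hypothetical, and the sketch assumes that case is ignored and that anything other than the 26 letters (spaces, commas, hyphens) is skipped.

```python
from collections import Counter
import string

REFERENCE_MESSAGE = (
    "And, most importantly, I would like to thank my family for their "
    "unconditional love and generous support. Without the encouragement of "
    "my parents, Joseph and Ann, my brother, Joe, my sister-in-law, Jenna, "
    "my nieces, Gabrielle and Madison, and my sister, Anita, I could not "
    "have completed this work."
)


def letter_counts(text):
    """Count each letter A-Z in `text`, keeping letters that never occur as 0."""
    tally = Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)
    return {letter: tally.get(letter, 0) for letter in string.ascii_uppercase}


reference_counts = letter_counts(REFERENCE_MESSAGE)
for letter, count in reference_counts.items():
    print(f"{letter} | {count}")
print("Total letters:", sum(reference_counts.values()))
```

Keeping the zero-count letters in the result matters later, because the bar charts are compared position by position across the whole alphabet.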
The bar chart of the reference
message is:
The two lines of the coded message
are:
Line 1: svukvujhsspunavaolmhyhdhfavduz
Line 2: uvddhypzkljshylkhukihaasljvtlkvdu
The frequency table of
the coded message is:
Letter | Count | Letter | Count | Letter | Count | Letter | Count
A | 5 | B | 0 | C | 0 | D | 5
E | 0 | F | 1 | G | 0 | H | 8
I | 1 | J | 3 | K | 5 | L | 5
M | 1 | N | 1 | O | 1 | P | 2
Q | 0 | R | 0 | S | 5 | T | 1
U | 7 | V | 7 | W | 0 | X | 0
Y | 3 | Z | 2 | | | |
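Students make the comparison between the two distributions by eye, but the same idea can be sketched numerically. The Python below is a rough illustration under stated assumptions: the counts are copied from the two frequency tables, and the scoring rule (a sum of products of reference and coded counts) is an illustrative choice, not part of the activity as described. The names `match_score`, `hypothesize_shift`, and `decode` are hypothetical.

```python
import string

LETTERS = string.ascii_lowercase

# Letter counts for A through Z, copied from the two frequency tables.
REFERENCE = [19, 2, 5, 10, 23, 3, 3, 8, 17, 3, 3, 11, 12,
             23, 21, 6, 0, 13, 12, 20, 7, 2, 4, 0, 8, 0]
CODED = [5, 0, 0, 5, 0, 1, 0, 8, 1, 3, 5, 5, 1,
         1, 1, 2, 0, 0, 5, 1, 7, 7, 0, 0, 3, 2]

CODED_LINES = [
    "svukvujhsspunavaolmhyhdhfavduz",
    "uvddhypzkljshylkhukihaasljvtlkvdu",
]


def match_score(shift):
    """Large when the coded peaks line up with the reference peaks.

    Reference letter i is paired with the coded letter `shift` places later.
    """
    return sum(REFERENCE[i] * CODED[(i + shift) % 26] for i in range(26))


def hypothesize_shift():
    """Return the shift (0-25) with the highest match score."""
    return max(range(26), key=match_score)


def decode(message, shift):
    """Move each letter back `shift` places; assumes lowercase letters only."""
    return "".join(LETTERS[(LETTERS.index(ch) - shift) % 26] for ch in message)


shift = hypothesize_shift()
print("Hypothesized shift:", shift)
for line in CODED_LINES:
    print(decode(line, shift))
```

With the counts in the tables, the score is highest at a shift of 7. Matching only the single tallest bars (H in the coded message against E in the reference message) would instead suggest a shift of 3, which scores lower; this is why the overall pattern of peaks and valleys matters more than any one bar.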
The bar chart of the coded message
is:
Assessment
After completing the activity,
students should be able to interpret bar charts, state and test
hypotheses (informally), and explain the concept of sampling
variability (informally). Question 1 of the activity requires
students to look for patterns in interpreting bar charts.
Questions 1 and 2 informally assess understanding of the process
used to formulate and test hypotheses. Question 3 addresses
knowledge of sampling variability. On homework and exams, students
should be required to interpret bar charts, looking for peaks,
valleys, and unusual observations. Students should be required to
write about sampling variability and hypothesis testing. For
example, students should be able to answer the following
question:
You are given a six-sided die with each of the
numbers 1, 2, 3, 4, 5, 6 imprinted on one face.
- (a) Discuss how you can determine whether the die is "fair." (By
fair we mean that all six faces of the die are equally likely.)
- (b) Suppose that you and I independently follow the procedure you
outlined in part (a). Would you expect our results to be
identical? Explain.
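For instructors who want to demonstrate the sampling variability this question targets, a small simulation is one option. The sketch below is an assumption-laden illustration rather than a model answer: the helper `roll_counts` is hypothetical, the die is simulated as fair, and the choice of 60 rolls and the two seeds are arbitrary.

```python
import random
from collections import Counter


def roll_counts(n_rolls, seed):
    """Simulate `n_rolls` rolls of a fair die; return counts for faces 1-6."""
    rng = random.Random(seed)  # a seeded generator makes each run repeatable
    tally = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return [tally.get(face, 0) for face in range(1, 7)]


# Two people independently carrying out the same 60-roll procedure.
yours = roll_counts(60, seed=1)
mine = roll_counts(60, seed=2)
print("Face: ", list(range(1, 7)))
print("Yours:", yours)
print("Mine: ", mine)
```

Both sets of counts come from the same fair die, so any difference between them is due to sampling variability alone.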
Teaching Notes
We have observed the following when using the activity in an
introductory statistics classroom:
- Students appreciate the hands-on nature. On a scale of 1 to 5
(1 = strongly disagree, 5 = strongly agree), the mean student
response to the statement “The activity was more interesting than
solely a lecture on bar charts” was 4.50.
- Giving an example of a shift applied to a short message helps to
avoid student confusion.
- Keeping the required computer skills to a minimum allows students
to focus on interpreting the bar charts.
- Students will try to compare single peaks between the reference
and the coded messages. We purposely chose a message that, when
decoded, has a most frequent letter different from the most
frequent letters in the reference message (a tie between e and n).
A hint to “look for general patterns of peaks and valleys” will
usually get students on the right track.
- Closely monitoring each group’s progress is essential. Students
will proceed far down an incorrect path, especially if they have
hypothesized an incorrect shift.
- We have used the same coded message for each group; however,
there is no reason that the message, the shift, or both could not
be varied from group to group.
- Some computer packages (for example, SPSS for Windows) will not
print a label on the horizontal axis of a bar chart for a category
that has frequency 0. We recommend that students substitute 0.01
for 0.
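The last note concerns categories with frequency 0: the comparison of peaks and valleys only works if all 26 letters keep their place on the horizontal axis. As a plain-text stand-in for a statistical package's chart (not the SPSS or Minitab procedure itself), the following Python sketch prints one bar per letter and keeps the zero-count letters. The helper name `text_bar_chart` is hypothetical; the counts are copied from the coded-message frequency table.

```python
import string

# Coded-message counts for A through Z, copied from the frequency table.
CODED = [5, 0, 0, 5, 0, 1, 0, 8, 1, 3, 5, 5, 1,
         1, 1, 2, 0, 0, 5, 1, 7, 7, 0, 0, 3, 2]


def text_bar_chart(counts):
    """Return one line per letter: the letter, its count, and a bar of # marks.

    Letters with a count of 0 still get a line, so every letter keeps its
    position and the chart can be lined up against another chart.
    """
    lines = []
    for letter, count in zip(string.ascii_uppercase, counts):
        lines.append(f"{letter} {count:2d} " + "#" * count)
    return lines


chart = text_bar_chart(CODED)
print("\n".join(chart))
```

Because every letter always gets a line here, no substitution of 0.01 for 0 is needed in this sketch; that workaround applies only to packages that drop empty categories.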
References
Singh, S. (1999). The Code Book. New York:
Doubleday.
Editor's note: Before 11-6-01, the "student's version" of an
activity was called the "prototype".