Hello Everyone,
It has been a joy to read about all the great data sets and activities! I did not have
time to respond yesterday, but here goes now.
I've got an introductory class with 26 students, and we meet all the time in a
computer classroom. Inspired by the Mosaic Project folks, we teach with R in the RStudio
Server environment. Our approach to simulation-based inference is about 60% of the way
from traditional inference towards Lock5/Intro to Statistical Investigations.
We do some simulation-based inference very early in the course -- binomial stuff, on the
first day of class, in fact -- and we try to work it in again early with the chi-square
test for the relationship between two factor variables. We're in that unit now, so
students have played a couple of times with a "slow-simulation" app to get an
idea of the null distribution of the chi-square statistic; subsequently they have been
exposed to the standard chi-square test where the null distribution is approximated by a
chi-square density curve. They are told that it was quite a godsend for Mr. Pearson to
have stumbled on such a family of approximating curves, because Mr. Pearson had no access
whatsoever to computing machinery.
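(As an aside, the comparison the app dramatizes can be sketched in a few lines of R.
This is not the class's app, just a minimal illustration with a made-up 2x2 null:
simulate tables under independence, compute the chi-square statistic for each, and check
the upper tail against Pearson's approximating curve.)

```r
set.seed(101)
# Hypothetical independence null: P(row 1) = 0.4, P(column 1) = 0.5
p_cell <- outer(c(0.4, 0.6), c(0.5, 0.5))  # 2x2 cell probabilities
n <- 50                                    # observations per simulated table

sim_stat <- replicate(10000, {
  tab <- matrix(rmultinom(1, n, p_cell), nrow = 2)
  expected <- outer(rowSums(tab), colSums(tab)) / n
  # guard against an empty margin (vanishingly rare at n = 50)
  sum(((tab - expected)^2 / expected)[expected > 0])
})

crit <- qchisq(0.95, df = 1)   # 3.84, the usual 5% cutoff
mean(sim_stat > crit)          # should land near 0.05 if Pearson's curve fits well
```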
Groundhog Day began with a little come-to-Jesus chat about the first data analysis report,
which the students tackled over the weekend. The project involved looking at some data
from the Current Population Survey to investigate the relationship between hourly wages
and such factors as sex, union membership status, race, etc. Apparently there is a rule
at my College that during Greek Rush Week critical thinking is forbidden, including the
act of determining the type of variables involved in your research question prior to
choosing your analytical tools. Accordingly about a third of the students had attempted
to make bar charts and cross-tables to investigate, for example, whether men or women earn
more, even though wage is a numerical variable. So we cleared that up, I hope.
Harrumph. On with the intended show.
Today's plan is to revisit simulation one more time, in a situation where you really
need it (rather small number of observations). I bring up the "ledge-jump"
data (#59 in the classic Handbook of Small Data Sets): a social psychologist studied 21
incidents in Britain involving a person threatening to leap from the ledge of a building
or other high structure. The idea was to see what factors might affect the behavior of
the crowd that gathers in the street below.
To forestall morbid thoughts among students I get up on a nearby table and assure them
through role-play that nobody really gets hurt: the fire truck comes right away and the
firemen teeter back and forth with their big yellow trampoline, your rabbi, psychiatrist
and spouse are phoned and within minutes they are leaning out of nearby windows,
soothing words are spoken, sage advice is given, hope is restored. Eventually they talk
you back inside.
But in the meantime a crowd has gathered. Sometimes they wait more or less in silence,
appropriately mindful of the seriousness of the situation playing out above. Sometimes,
though, they begin to bait the would-be jumper (muster up Cockney accent): "Go on,
jump will ya?"
In the 21 incidents under study, the following was found:
weather/crowd   Baiting   Polite
cool                  2        7
warm                  8        4
Discussion with students (me still on the table):
Me: Somebody could say that, for one reason or another, a crowd is more likely to bait in
warm weather. Others might say that the outcome of these 21 incidents had nothing to do
with weather, but is just the result of random variation in other factors -- above and
beyond the weather at each particular incident. Can you think of any other things besides
the weather that might affect crowd behavior?
Student: How many people show up to watch. The more that show up, the more likely
there's a jerk who will yell "Jump!" and get the rest of 'em started.
Student: How long they have to wait. Maybe if they stand around a long time they'll
get impatient.
Student (looking right at me, still on table): What about the dorkiness of the would-be
jumper?
Me: Uh, maybe. Gee, thanks, Zach.
I get down and we work it out on the blackboard: if weather has nothing to do with crowd
behavior, then our best guess based on the data is that in each incident, regardless of
weather, there is a 10/21 chance for the crowd to bait. The chi-square test function that
the class uses has built-in provisions for simulation, of three possible types:
* "fixed": the row sums in the simulated table are constrained to be the
row sums of the observed table, and the probability of each outcome in the columns is
determined by pooling the data, as we just did to get the 10/21 figure.
* "double-fixed": both the row and column sums of the simulated table are
constrained to be equal to the row and column sums of the observed table.
* "random": neither row nor column sums are constrained, and the
probability of a simulated observation landing in a particular cell is the observed cell
count divided by the grand total of the table.
"random" makes sense when the observed data are a random sample from a larger
population, and chance comes into play just in the matter of who gets into the sample.
For example, if you randomly sample people and ask their sex and where they prefer to sit
in a classroom (front, middle or back), then chance is not in how a fixed person will
respond but in whether or not that person gets into the sample.
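A one-line sketch of what "random" amounts to, with made-up counts for the seating
example (the numbers are hypothetical, just to show the mechanics):

```r
# Hypothetical observed counts: sex (rows) by preferred seat (columns)
obs <- matrix(c(10, 12, 8,
                14,  9, 7), nrow = 2, byrow = TRUE,
              dimnames = list(sex = c("female", "male"),
                              seat = c("front", "middle", "back")))
n <- sum(obs)
# "random": each simulated respondent lands in a cell with probability
# equal to the observed cell proportion; neither margin is held fixed
sim <- matrix(rmultinom(1, n, obs / n), nrow = 2)
sim   # grand total is still n, but row and column sums float freely
```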
"double-fixed" (corresponding to the way simulation is done in R's
chisq.test(), and probably in many other software systems as well) appears to be ideally
suited for a randomized experiment in which the Null hypothesis imagines that a
subject's response is the same regardless of which treatment group one is placed into.
In that case a different random assignment of subjects to treatment groups might result
in a different table, but the row and column sums would be the same as for the table we
observe in the actual experiment.
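For concreteness, here is what double-fixed simulation looks like in R, using the
ledge-jump counts just to show the mechanics (whether conditioning on both margins is
the right model for those data is the question at hand). Base R's r2dtable() draws
tables with both margins held fixed, and chisq.test() uses it when
simulate.p.value = TRUE:

```r
obs <- matrix(c(2, 7,
                8, 4), nrow = 2, byrow = TRUE)  # ledge-jump counts

# Draw 5000 tables with row sums (9, 12) and column sums (10, 11) held fixed
sims <- r2dtable(5000, rowSums(obs), colSums(obs))
all(vapply(sims, function(t) all(rowSums(t) == c(9, 12)), logical(1)))  # TRUE

# chisq.test() does the same conditioning under the hood
chisq.test(obs, simulate.p.value = TRUE, B = 5000)
```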
"fixed" seems to be the right thing for the ledge-jump situation, if we assume
that the 21 incidents weren't sampled randomly out of some larger population, that
they were the only 21 incidents that occurred during the period of study in the region
under study. In that case the weather at the time of each incident simply was what it
was, and
chance comes into the production of the observed table through random variation in all
other factors (conditional upon the weather).
So we do simulation with the "fixed" option. Everybody's P-value comes in
around 5%, so we decide that we don't have overwhelming evidence that weather and crowd
behavior are related.
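The "fixed" simulation we ran can be sketched directly (a minimal sketch, not the
class's actual function): hold the row sums 9 and 12 fixed, let each incident's crowd
bait with the pooled probability 10/21, and see how often the chi-square statistic
meets or beats the observed one.

```r
obs <- matrix(c(2, 7,
                8, 4), nrow = 2, byrow = TRUE)  # cool row, warm row

chisq_stat <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  # guard against an all-bait or all-polite simulated table
  sum(((tab - expected)^2 / expected)[expected > 0])
}
observed_stat <- chisq_stat(obs)   # about 4.07

set.seed(2)
p_bait <- 10 / 21                  # pooled baiting probability under the Null
sim_stat <- replicate(10000, {
  cool_bait <- rbinom(1, 9, p_bait)    # 9 cool-weather incidents
  warm_bait <- rbinom(1, 12, p_bait)   # 12 warm-weather incidents
  chisq_stat(matrix(c(cool_bait, 9 - cool_bait,
                      warm_bait, 12 - warm_bait), nrow = 2, byrow = TRUE))
})
mean(sim_stat >= observed_stat)    # the P-value; in class these came in around 5%
```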
Now I come round to my questions.
When we teach inference through simulation, we don't want it to become another
"black box" for students. We want them to see that the simulation method
generates simulated data that reasonably could occur if the study were to be conducted in
a hypothetical world where the Null hypothesis is definitely true. Hence the
simulation method has to model, quite transparently, the role that we think chance played
-- if the Null is true -- in giving us the data we actually see.
But there appears to be controversy among statisticians as to which simulation method is
best to use for contingency tables (see, e.g., Agresti, Categorical Data Analysis,
3rd edition, Section 3.5.6). I suppose that sometimes it's possible for a particular
simulation method not to model the role of chance very well, but to possess superior
statistical properties nonetheless, maybe even be the state-of-the-art method. (This
situation seems to occur also in bootstrap hypothesis testing, where the more preferred
re-sampling method is rather more difficult to justify intuitively than is the
"naive" re-sampling method.)
Where does that leave us in teaching? Do we stick with simulation methods that model the
role of chance intuitively? In that case we may get rather far into the weeds (after all,
three options for simulation in the two-factor setting is a lot for students to handle so early in
the course), and we also end up, from time to time, using methods that won't be
recommended in data analysis applications down the road. On the other hand if we employ
simulation methods without regard to how intuitively they model the presumed role of
chance variation in the production of the data, then we are back to using statistical
procedures as black boxes that don't convey insight to students at the introductory
level.
My dilemma may be due in part to a lack of formal statistical training. Has anyone else
found themselves puzzled by similar questions?
Homer S. White
Professor of Mathematics
Georgetown College, KY 40324
502-863-8307