Homer-
Sorry I didn't reply sooner, but I wanted to comment on your final
paragraph which I've copied here for convenience:
"Where does that leave us in teaching? Do we stick with simulation methods
that model the role of chance intuitively? In that case we may get rather
far into the weeds (after all, three options for simulation in the
two-factor case is a lot for students to handle so early in the course), and we
also end up, from time to time, using methods that won't be recommended in
data analysis applications down the road. On the other hand if we employ
simulation methods without regard to how intuitively they model the
presumed role of chance variation in the production of the data, then we
are back to using statistical procedures as black boxes that don't convey
insight to students at the introductory level.
My dilemma may be due in part to a lack of formal statistical training.
Has anyone else found themselves puzzled by similar questions?"
I'll give a two-pronged response to your questions.
First, my personal opinion is to focus in a first course more on the
intuitive and conceptual ideas, and less on technical details. Thus, I'm a
lot less worried about matching the exact 'best practices' statistical
methods with what I teach in Stat 101. For example, using
Welch-Satterthwaite to approximate degrees of freedom or a +4 confidence
interval or the exactly correct bootstrap, etc. The reality is that best
practices (a) change frequently, (b) are extremely technical and (c) are
rarely actually agreed upon by everyone (as you point out earlier in your
email). So, I'd prefer to stay out of the weeds and focus on
simple/intuitive approaches that model the logic of inference, with enough
cautions to students that more precise/technical methods are available in
practice.
Secondly, with specific regard to simulation-based inference, what does
this mean for me? It means I'm actually not as concerned re: precisely
matching the data production (e.g., random sampling vs. randomized
experiment, etc.) with the simulation 'crank' used (e.g., bootstrap vs.
permutation) and am comfortable with keeping students using only a single
'crank' (e.g., permutation test) or maybe two, even though this may present
a bit of a disconnect between data production and analysis. In the ISI
text, we navigate this by highlighting to students how study design impacts
scope of conclusions (Can I generalize? Can I conclude cause-effect?) but
don't worry about changing the analysis strategy as well. We find this
keeps things more straightforward for students, avoiding the 'too many
technical details' problem that becomes the focus of a course for many
students once you get into it. Finally, I'd point out that for a more
advanced group of students, or in a second (third, beyond) course this
might be exactly the kind of stuff you *do* want to talk about. But, for my
Stat 101 students, this just seems like too much detail to be worth it.
All of this being said, it was not an easy decision for our author team,
nor one that we all necessarily would have made if we were doing so
individually. I'm sure there are lots of other opinions out there and would
love to hear from others on the listserv!
Perhaps we should have some blog posts on this soon!
Nathan
On Tue, Feb 3, 2015 at 7:49 PM, Homer White <Homer_White@georgetowncollege.edu> wrote:
Hello Everyone,
It has been a joy to read about all of the great data sets and activities!
I did not have time to respond yesterday, but here goes now.
I've got an introductory class with 26 students, and we meet all the
time in a computer classroom. Inspired by the Mosaic Project folks, we
teach with R in the R Studio Server environment. Our approach to
simulation-based inference is about 60% of the way from traditional
inference towards Lock5/Intro to Statistical Investigations.
We do some simulation-based inference very early in the course --
binomial stuff, on the first day of class, in fact -- and we work it in
again early with the chi-square test for the relationship between two
factor variables. We're in that unit now, so students have played a couple
of times with a "slow-simulation" app to get an idea of the null
distribution of the chi-square statistic; subsequently they have been
exposed to the standard chi-square test where the null distribution is
approximated by a chi-square density curve. They are told that it was
quite a godsend for Mr. Pearson to have stumbled on such a family of
approximating curves, because Mr. Pearson had no access whatsoever to
computing machinery.
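(If you want a taste of the app without the app, here is a quick base-R
sketch of the same idea -- my own toy version with a made-up pair of
unrelated variables, not our classroom tool:)

  # Toy "slow simulation": generate two unrelated factor variables,
  # compute Pearson's statistic, repeat many times, and overlay the
  # approximating chi-square density curve.
  set.seed(2015)
  n <- 100
  stats <- replicate(5000, {
    x <- sample(c("A", "B"), n, replace = TRUE)
    y <- sample(c("X", "Y"), n, replace = TRUE)
    suppressWarnings(chisq.test(table(x, y), correct = FALSE)$statistic)
  })
  hist(stats, freq = FALSE, breaks = 40)
  curve(dchisq(x, df = 1), add = TRUE, lwd = 2)  # df = (2-1)*(2-1) = 1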
Groundhog Day began with a little come-to-Jesus chat about the first
data analysis report, which the students tackled over the weekend. The
project involved looking at some data from a Current Population Survey and
investigating the relationship between hourly wages and such factors as
sex, union membership status, race, etc. Apparently there is a rule at my
College that during Greek Rush Week critical thinking is forbidden,
including the act of determining the type of variables involved in your
research question prior to choosing your analytical tools. Accordingly
about a third of the students had attempted to make bar charts and
cross-tables to investigate, for example, whether men or women earn more,
even though wage is a numerical variable. So we cleared that up, I hope.
Harrumph. On with the intended show.
Today's plan is to revisit simulation one more time, in a situation
where you really need it (rather small number of observations). I bring
up the "ledge-jump" data (#59 in the classic Handbook of Small Data Sets).
a social psychologist studied 21 incidents in Britain involving a person
threatening to leap from the ledge of a building or other high structure.
The idea was to see what factors might affect the behavior of the crowd
that gathers in the street below.
To forestall morbid thoughts among students I get up on a nearby table
and assure them through role-play that nobody really gets hurt: the fire
truck comes right away and the firemen teeter back and forth with their big
yellow trampoline, your rabbi, psychiatrist and spouse are phoned, and
within minutes they are leaning out of nearby windows, soothing words
are spoken, sage advice is given, hope is restored. Eventually they talk
you back inside.
But in the meantime a crowd has gathered. Sometimes they wait more or
less in silence, appropriately mindful of the seriousness of the situation
playing out above. Sometimes, though, they begin to bait the would-be
jumper (muster up Cockney accent): "Go on, jump, will ya?"
In the 21 incidents under study, the following was found:
weather/crowd   Baiting   Polite
cool                2        7
warm                8        4
Discussion with students (me still on the table):
Me: Somebody could say that, for one reason or another, a crowd is more
likely to bait in warm weather. Others might say that the outcome of these
21 incidents had nothing to do with weather, but is just the result of
random variation in other factors -- above and beyond the weather at each
particular incident. Can you think of any other things besides the weather
that might affect crowd behavior?
Student: How many people show up to watch. The more that show up, the
more likely there's a jerk who will yell 'Jump!' and get the rest of 'em
started.
Student: How long they have to wait. Maybe if they stand around a long
time they'll get impatient.
Student (looking right at me, still on table): What about the dorkiness
of the would-be jumper?
Me: Uh, maybe. Gee, thanks, Zach.
I get down and we work it out on the blackboard: if weather has nothing
to do with crowd behavior, then our best guess based on the data is that in
each incident, regardless of weather, there is a 10/21 chance for the crowd
to bait (10 of the 21 crowds baited: 2 in cool weather plus 8 in warm).
The chi-square test function that the class uses has built-in
provisions for simulation, of three possible types:
- "fixed": the rows sums in the simulated table are constrained to be
the rows sums of the observed table, and you determined the probability of
each outcome in the columns by pooling the data, as we just did to get the
10/21 figure.
- "double-fixed": both the row and column sums of the simulated table
are constrained to be equal to the row and column sums of the observed
table.
- "random": neither row nor column sums are constrained, and the
probability of a simulated observation landing in a particular cell is the
observed cell count divided by the grand total of the table.
"random" make sense when the observed data are a random ample a larger
population, and chance comes into play just in the matter of who gets into
the sample. For example, if you randomly sample people and ask their sex
and where they prefer to sit a in classroom (front, middle or back), then
chance is not in how a fixed person will respond but in whether or not that
person gets into the sample.
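(Here is a hand-rolled base-R sketch of the "random" crank -- my own
illustration, not the class's function -- applied to the ledge-jump
table:)

  # "random" crank: no margins fixed; each of 21 simulated incidents
  # lands in one of the four cells with probability = observed count/21.
  obs <- matrix(c(2, 7, 8, 4), nrow = 2, byrow = TRUE,
                dimnames = list(weather = c("cool", "warm"),
                                crowd = c("Baiting", "Polite")))
  set.seed(2015)
  sims <- replicate(10000, {
    cells <- rmultinom(1, size = 21, prob = as.vector(obs) / 21)
    sim <- matrix(cells, nrow = 2)  # reshape back into a 2x2 table
    suppressWarnings(chisq.test(sim, correct = FALSE)$statistic)
  })
  # sims now holds one simulated chi-square statistic per table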
"double-fixed" (corresponding to the way simulation is done in R's
chisq.test(), and probably in many other software systems as well), appears
to be ideally suited for a randomized experiment in which the Null hypothesis
imagines that a subject's response is the same regardless of which
treatment group one is placed into. In that case a different random
assignment of subjects to treatment groups might result in a different
table, but the row and column sums would be the same as for the table we
observe in the actual experiment.
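(For the curious, this crank is a one-liner in base R, since chisq.test()
can estimate the P-value by generating tables with both margins held at
their observed values:)

  # "double-fixed" in base R: B simulated tables, both margins fixed.
  obs <- matrix(c(2, 7, 8, 4), nrow = 2, byrow = TRUE)
  chisq.test(obs, simulate.p.value = TRUE, B = 10000)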
"fixed" seems to be the right thing for the ledge-jump situation, if we
assume that the 21 incidents weren't sampled randomly out of some larger
population, but rather were the only 21 incidents that occurred during the
period of study in the region under study. In that case the weather at the
time of each incident simply was what it was, and chance comes into the
production of the observed table through random variation in all other
factors (conditional upon the weather).
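(And here is the "fixed" crank, hand-rolled the same way -- again a
sketch of mine, not the class's function: hold the row sums at their
observed values, 9 cool incidents and 12 warm, and flip the pooled 10/21
"baiting coin" within each row:)

  # "fixed" crank: row sums (9 cool, 12 warm) held at observed values;
  # within each row the crowd baits with the pooled probability 10/21.
  obs <- matrix(c(2, 7, 8, 4), nrow = 2, byrow = TRUE)
  obs_stat <- suppressWarnings(chisq.test(obs, correct = FALSE)$statistic)
  row_sums <- c(9, 12)
  set.seed(2015)
  sims <- replicate(10000, {
    bait <- rbinom(2, size = row_sums, prob = 10 / 21)  # baiting per row
    sim <- cbind(Baiting = bait, Polite = row_sums - bait)
    suppressWarnings(chisq.test(sim, correct = FALSE)$statistic)
  })
  mean(sims >= obs_stat, na.rm = TRUE)  # simulated P-value
  # (na.rm guards against the rare simulated table with an empty column)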
So we do simulation with the "fixed" option. Everybody's P-value comes
in around 5%, so we decide that we don't have overwhelming evidence that
weather and crowd behavior are related.
Now I come round to my questions.
When we teach inference through simulation, we don't want it to become
another "black box" for students. We want them to see that the simulation
method generates simulated data that reasonably could occur if the
study were to be conducted in a hypothetical world where the Null
hypothesis is definitely true. Hence the simulation method has to
model, quite transparently, the role that we think chance played -- if the
Null is true -- in giving us the data we actually see.
But there appears to be controversy among statisticians as to which
simulation method is best to use for contingency tables. (See e.g.,
Agresti, Categorical Data Analysis Third Edition section 3.5.6). I suppose
that sometimes it's possible for a particular simulation method not to
model the role of chance very well, but to possess superior statistical
properties nonetheless, maybe even be the state-of-the-art method. (This
situation seems to occur also in bootstrap hypothesis testing, where the
preferred re-sampling method is rather more difficult to justify
intuitively than is the "naive" re-sampling method.)
Where does that leave us in teaching? Do we stick with simulation
methods that model the role of chance intuitively? In that case we may get
rather far into the weeds (after all, three options for simulation in the
two-factor case is a lot for students to handle so early in the course), and we
also end up, from time to time, using methods that won't be recommended in
data analysis applications down the road. On the other hand if we employ
simulation methods without regard to how intuitively they model the
presumed role of chance variation in the production of the data, then we
are back to using statistical procedures as black boxes that don't convey
insight to students at the introductory level.
My dilemma may be due in part to a lack of formal statistical training.
Has anyone else found themselves puzzled by similar questions?
Homer S. White
Professor of Mathematics
Georgetown College, KY 40324
502-863-8307
--
Nathan Tintle, Ph.D.
Associate Professor of Statistics and Dept. Chair
Director for Research and Scholarship
Dordt College
Sioux Center, IA 51250
nathan.tintle@dordt.edu
Phone: (712) 722-6264
Office: SB1612