Estimation Principles

  • A song designed to assist in teaching the basics of Multi-Armed Bandits (MABs), a type of machine learning algorithm that is the foundation for many recommender systems. These algorithms spend part of their time exploiting choices (arms) that they know are good while exploring new choices. The song (music and lyrics) was written in 2021 by Cynthia Rudin of Duke University and was part of a set of three data-science-oriented songs that won the grand prize in the 2023 A-mu-sing competition. The lyrics are full of double entendres, so the whole song has another meaning in which the bandit could be someone who just takes advantage of other people! The composer mentions these examples of lines with important meanings:
    "explore/exploit" - the fundamental trade-off in MABs!
    "No regrets" - the job of the bandit is to minimize the regret accumulated over the game from choosing suboptimal arms
    "I keep score" - I keep track of the regret for all the turns in the game
    "without thinking too hard," - MAB algorithms typically don't require much computation
    "no context, there to use," - this particular bandit isn't a contextual bandit; it doesn't have feature vectors
    "uncertainty drove this ride." - rewards are probabilistic
    "I always win my game" - asymptotically the bandit always finds the best arm
    "help you, decide without the AB testing you might do" - bandits are an alternative to massive A/B testing of all pairs of arms
    "Never, keeping anyone, always looking around and around" - there's always some probability of exploration throughout the play of the bandit algorithm
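    The explore/exploit behavior the song describes can be sketched with an epsilon-greedy strategy (one common MAB algorithm; the song does not commit to a particular one). The arm means, epsilon, and round count below are illustrative assumptions, not from the song:

```python
import random

def epsilon_greedy(true_means, n_rounds=10000, epsilon=0.1, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy policy:
    explore a random arm with probability epsilon, otherwise exploit the
    arm with the best estimated mean reward."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # update estimate
        total_reward += reward
    regret = max(true_means) * n_rounds - total_reward  # "keeping score"
    return counts, regret

counts, regret = epsilon_greedy([0.2, 0.5, 0.8])  # hypothetical arm means
```

    With these made-up arm means, the best arm ends up with the vast majority of pulls, while the 10% exploration rate keeps the algorithm "always looking around."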

  • A music video designed to assist in teaching the basics of Multi-Armed Bandits (MABs), a type of machine learning algorithm that is the foundation for many recommender systems. These algorithms spend part of their time exploiting choices (arms) that they know are good while exploring new choices (think of an ad company choosing an advertisement it knows is good versus exploring how good a new advertisement is). The music and lyrics were written by Cynthia Rudin of Duke University, and the video was one of three data science songs that won the grand prize and first place in the song category in the 2023 A-mu-sing competition.

    The lyrics are full of double entendres so that the whole song has another meaning where the bandit could be someone who just takes advantage of other people! The author provides these examples of some lines with important meanings:
    "explore/exploit" - the fundamental trade-off in MABs!
    "No regrets" - the job of the bandit is to minimize the regret accumulated over the game from choosing suboptimal arms
    "I keep score" - I keep track of the regret for all the turns in the game
    "without thinking too hard," - MAB algorithms typically don't require much computation
    "no context, there to use," - this particular bandit isn't a contextual bandit; it doesn't have feature vectors
    "uncertainty drove this ride." - rewards are probabilistic
    "I always win my game" - asymptotically the bandit always finds the best arm
    "help you, decide without the AB testing you might do" - bandits are an alternative to massive A/B testing of all pairs of arms
    "Never, keeping anyone, always looking around and around" - there's always some probability of exploration throughout the play of the bandit algorithm

  • This song is about overfitting, a central concept in machine learning. It is in the style of mountain music; when listening, one should think of someone staying up all night trying to get their algorithm to work, but it just won't stop overfitting! The music and lyrics are by Cynthia Rudin of Duke University, and the song was one of three data science songs by Dr. Rudin that won the grand prize and first place in the song category in the 2023 A-mu-sing competition.
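    The phenomenon the song laments can be reproduced in a few lines: a high-degree polynomial drives training error down while error against the underlying curve gets worse. The data, noise level, and polynomial degrees below are illustrative assumptions, not from the song:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)  # noisy training data
x_new = np.linspace(0, 1, 200)
y_new = np.sin(2 * np.pi * x_new)  # the noise-free truth, for evaluation

def errors(degree):
    """Fit a polynomial of the given degree; return (training MSE, truth MSE)."""
    coeffs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    true_mse = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    return train_mse, true_mse

train3, true3 = errors(3)     # a modest fit
train15, true15 = errors(15)  # an overfit model that chases the noise
```

    The degree-15 fit wins on the training points but loses against the underlying curve, which is exactly the overfitting the singer can't escape.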

  • This song is about the k-nearest neighbors algorithm in machine learning. This popular algorithm uses case-based reasoning to make a prediction for a current observation based on nearby observations. The music and lyrics were written by Cynthia Rudin of Duke University, who was accompanied by Dargan Frierson of the University of Washington in the audio recording. The song is one of three data science songs written by Cynthia Rudin that took the grand prize and first prize in the song category in the 2023 A-mu-sing competition.
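    The case-based reasoning the song describes can be shown in a tiny sketch: classify a point by majority vote among its k closest labeled neighbors. The points and labels here are made up for illustration:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points.
    `train` is a list of ((x, y), label) pairs."""
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Two toy clusters of labeled observations.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]
print(knn_predict(train, (1.5, 1.5)))  # all three nearest neighbors are "A"
```

    A query near the second cluster would be labeled "B" the same way; the prediction is driven entirely by the nearby cases.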

  • A cartoon to spark a discussion about the normal equations in the matrix approach to linear models. The cartoon was created by Kylie Lynch, a student at the University of Virginia, and won first place in the non-song categories of the 2023 A-mu-sing competition.
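    For discussion alongside the cartoon, the normal equations (X'X)β = X'y can be solved directly for the least-squares coefficients. The toy data below (an exact line y = 1 + 2x) is an illustrative assumption:

```python
import numpy as np

# Design matrix with an intercept column and one predictor.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # exactly y = 1 + 2x

# Normal equations: solve (X'X) beta = X'y for beta.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # beta ≈ [1, 2]: intercept 1, slope 2
```

    In practice one would use a QR-based solver such as `np.linalg.lstsq` rather than forming X'X, but the explicit normal equations match the matrix derivation the cartoon is pointing at.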

  • A song to discuss how a confidence interval made for a population parameter will be biased if the sample is biased (e.g., starting with a random sample of n = 100 but then having individuals drop out one at a time for a non-ignorable reason). The song was written in March 2019 by Lawrence Lesser, The University of Texas at El Paso, and Dennis Pearl, Penn State University, using the mid-20th-century recursive folk song "99 Bottles of Beer." The idea for the song came from an article by Donald Byrd of Indiana University in the September 2010 issue of Math Horizons, where he suggested using the song for various learning objectives in mathematics education.

  • A cartoon that can be used in a discussion of prediction – and the difference between the accuracy of a single prediction and quantifying the level of accuracy for a prediction method. The cartoon was used in the May 2019 CAUSE cartoon caption contest, and the winning caption was written by Mickey Dunlap from the University of Georgia. The cartoon was drawn by British cartoonist John Landers (www.landers.co.uk) based on an idea by Dennis Pearl from Penn State University. A co-winning caption in the May 2019 contest was “I see you come from a long line of statisticians," written by Douglas VanDerwerken from the U.S. Naval Academy. Doug's clever pun can be related to the multiple testing problem by talking about how a fortune teller will get some predictions right if they make a long line of them.

  • A cartoon suitable for use in teaching about confidence intervals and the quality of estimates made by a model. The cartoon is number 2311 (May 2020) from the webcomic series at xkcd.com created by Randall Munroe. It is free to use in the classroom and on course web sites under a Creative Commons Attribution-NonCommercial 2.5 license.

  • A cartoon suitable for use in teaching about the variability in estimates (including estimates of the variability of estimates). The cartoon is number 2110 (February 2019) from the webcomic series at xkcd.com created by Randall Munroe. It is free to use in the classroom and on course web sites under a Creative Commons Attribution-NonCommercial 2.5 license.

  • Summary: This article describes the capture-recapture method for estimating the size of a population of fish in a pond and illustrates it with both a “hands-on” classroom activity using Pepperidge Farm Goldfish™ crackers and a computer simulation that investigates two different estimators of the population size. The activity is described in R. W. Johnson, “How many fish are in the pond?,” Teaching Statistics, 18 (1) (1996), 2-5.

    https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9639.1996.tb00882.x

    Specifics: To illustrate the capture-recapture method in the classroom, two different varieties of Pepperidge Farm Goldfish™ crackers are used. The instructor places all of the Goldfish from a full bag of the original variety in a bowl to represent the initial state of the pond (the instructor should have previously counted the true number in the bag, which was 323 in the paper’s example). Students then capture c = 50 of these fish and replace them with 50 Goldfish of a flavored variety with a different color. After the contents of the bowl are mixed, t = 6 ‘tagged’ fish (fish of the flavored variety) were found in a recaptured sample of size r = 41, giving the estimate cr/t = 341. This is the maximum likelihood (ML) estimate. To examine the behavior of the MLE, the capture-recapture ML method is repeated 1000 times using a computer simulation. The distribution of the results will be heavily skewed since the MLE is quite biased (in fact, since there is positive probability that t = 0, the MLE has an infinite expectation). The simulation is then redone using Seber’s bias-corrected estimate [(c+1)(r+1)/(t+1)] – 1. After the true value of the population size is revealed by the instructor, students see that the average of the 1000 new simulations shows that the bias-corrected version is indeed closer to the truth (and also that the new estimate has less variability).

    (Resource photo illustration by Barbara Cohen, 2020; this summary compiled by Bibek Aryal)
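    The classroom simulation can be sketched as follows, using the article's numbers (N = 323, c = 50, r = 41); the code itself is a reconstruction, not the author's:

```python
import random

def simulate(N=323, c=50, r=41, n_sims=1000, seed=1):
    """Capture-recapture: tag c of N fish, recapture r, count tagged t.
    Compare the ML estimate c*r/t with Seber's bias-corrected estimate."""
    rng = random.Random(seed)
    pond = [True] * c + [False] * (N - c)  # True marks a tagged fish
    ml, seber = [], []
    for _ in range(n_sims):
        t = sum(rng.sample(pond, r))       # tagged fish in the recapture
        if t > 0:                          # the ML estimate is infinite when t == 0
            ml.append(c * r / t)
        seber.append((c + 1) * (r + 1) / (t + 1) - 1)
    return sum(ml) / len(ml), sum(seber) / len(seber)

ml_avg, seber_avg = simulate()
```

    Averaged over the repetitions, the raw ML estimate cr/t overshoots the true population size, while Seber's correction lands much closer — the comparison the classroom simulation is built around.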

