The Bulgarian Toto 6 of 42 lottery
The Bulgarian Toto 6 of 42 lottery was the subject of an investigation [1] after the same set of six numbers {4, 15, 23, 24, 35, 42} was drawn in two successive lotteries on September 6 and September 10, 2009. The article cites a mathematician as stating that the probability of picking the same six numbers twice in a row is 4,200,000:1. We wondered how he arrived at this number. What is the probability that a specified set of six numbers will repeat consecutively?
There are <math>{42 \choose 6} = 5245786</math> different sets of six numbers and the probability that a SPECIFIED set will occur in the next two consecutive draws is <math>1/5245786^2</math>. Because the sets involve disjoint events, the probability that SOME set will occur in the next two consecutive draws is <math>5245786 \times 1/5245786^2 = 1/5245786</math>.
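The counts above can be checked with exact rational arithmetic. The following is a minimal sketch; the variable names are illustrative and not part of the original article.

```python
from fractions import Fraction
from math import comb

# Number of distinct six-number sets in a 6-of-42 lottery.
n_sets = comb(42, 6)

# Probability that one SPECIFIED set is drawn in both of the next two draws.
p_specified = Fraction(1, n_sets) ** 2

# Probability that SOME set is drawn in both draws: the events for
# different sets are disjoint, so their probabilities simply add.
p_some = n_sets * p_specified

print(n_sets)
print(float(p_specified))
print(p_some == Fraction(1, n_sets))
```

Using `Fraction` keeps the cancellation exact, so the final comparison does not depend on floating-point rounding.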
But now, suppose the lottery has been running continuously for <math>m</math> draws and we ask what the chance is that during this period there were consecutive draws of the same set. As before, first consider a fixed set of six numbers.
There are <math>m-1</math> opportunities for this set to be drawn twice in succession (beginning with the second drawing). The probability that this will happen is then the probability of the union <math>P(A) = P(\cup_i A_i A_{i+1}) </math> where <math>A_i</math> is the event that this set of numbers is drawn on the ith draw.
Bonferroni's first degree upper bound is <math>P(A) \le \sum_{i=1}^{m-1} P(A_i A_{i+1})</math> while the second degree lower bound is <math>P(A) \ge \sum_{i=1}^{m-1} P(A_i A_{i+1}) - \sum_{1 \le i < j \le m-1}P(A_i A_{i+1} A_j A_{j+1}).</math>
We assume (!) that the events <math>A_i</math> are independent and identically distributed with probability <math>p = 1/5245786</math>. As long as <math>mp</math> is small the second sum in the lower bound can be ignored, giving <math>P(A) \approx (m-1)/5245786^2.</math>
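To see how tight the two Bonferroni bounds are for a fixed set, they can be evaluated exactly under the independence assumption. The sketch below is one way to do this, not the author's own calculation. It relies on the observation that two overlapping pairs (<math>j = i+1</math>) involve three distinct draws and so contribute <math>p^3</math>, while non-overlapping pairs involve four draws and contribute <math>p^4</math>. The function name `bonferroni_bounds` is hypothetical.

```python
from fractions import Fraction
from math import comb

def bonferroni_bounds(m, p):
    """Return (lower, upper) Bonferroni bounds on P(A) for a fixed set.

    Assumes m >= 2 draws that are independent, each hitting the fixed
    set with probability p.
    """
    pairs = m - 1                          # events A_i A_{i+1}, i = 1..m-1
    upper = pairs * p ** 2                 # first-degree upper bound
    adjacent = m - 2                       # j = i+1: three distinct draws, p^3
    separated = comb(pairs, 2) - adjacent  # j > i+1: four distinct draws, p^4
    lower = upper - adjacent * p ** 3 - separated * p ** 4
    return lower, upper

p = Fraction(1, comb(42, 6))
lower, upper = bonferroni_bounds(104, p)
print(float(lower), float(upper))
print(float((upper - lower) / upper))      # relative width of the bracket
```

The relative width printed on the last line shows how little is lost by keeping only the first sum.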
It appears that the draws are held twice per week, so for one year <math>m = 104</math>, giving the probability <math>3.74 \times 10^{-12}</math> that a specified set of numbers will be drawn twice in succession. According to a spokeswoman the lottery has been taking place for 52 years [2]. Using <math>m = 104 \times 52 = 5408</math>, the probability that a specified set of numbers will be drawn twice in succession over this period is <math>1.96 \times 10^{-10}</math>, still very small.
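The arithmetic for a specified set can be reproduced with a few lines of code. This is a minimal sketch of the approximation <math>(m-1)/5245786^2</math>; the helper name `p_fixed_set` is hypothetical, and the two draws per week and 52 years are the figures assumed in the text.

```python
from math import comb

N = comb(42, 6)  # number of possible six-number sets

def p_fixed_set(m):
    """First-order estimate that one specified set repeats
    consecutively somewhere in m draws."""
    return (m - 1) / N ** 2

draws_per_year = 2 * 52  # assumes two draws per week
for years in (1, 52):
    m = draws_per_year * years
    print(years, m, f"{p_fixed_set(m):.3g}")
```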
But now let's ask the question, not for a fixed set of numbers but for some set of numbers. After all, in discussing this coincidence the repeated set arises by chance alone and is not specified in advance.
In <math>m</math> drawings what is the probability that SOME set of six numbers will be repeated in consecutive draws?
There are 5245786 possible sets of numbers that could be repeated. Enumerate the sets by integers <math>1 \le k \le 5245786</math> with <math>E_k</math> the event that set <math>k</math> repeats consecutively sometime during these <math>m</math> drawings. The probability of the union <math>P(\cup E_k)</math> is needed. Each of the 5245786 events <math>E_k</math> has probability <math>(m-1)/ 5245786^2</math> and if they were independent we could evaluate the probability using complements as <math>P(\cup E_k) = 1 - (1- (m-1)/5245786^2)^{5245786} \approx 1 - e^{-(m-1)/5245786}</math>. However, they are dependent. Still, as long as <math>mp</math> is small, Bonferroni's bounds can once again be used to estimate <math>P(\cup E_k) \approx (m-1)/5245786.</math> For <math>m = 5408</math> this is 0.0010307. (Note that assuming independence gives 0.0010302.)
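The two estimates for SOME set can be compared numerically. The sketch below also adds a cross-check that is not in the original: for independent, uniformly distributed draws, each draw matches its predecessor with probability <math>1/5245786</math> regardless of what came before, so the chance of no consecutive repeat in <math>m</math> draws is exactly <math>(1 - 1/5245786)^{m-1}</math>. All function names are hypothetical.

```python
from math import comb, expm1, log1p

N = comb(42, 6)  # number of possible six-number sets

def p_bonferroni(m):
    """First-order Bonferroni estimate (m-1)/N."""
    return (m - 1) / N

def p_independent(m):
    """1 - (1 - (m-1)/N^2)^N, treating the events E_k as independent.
    log1p/expm1 avoid rounding loss for such tiny probabilities."""
    q = (m - 1) / N ** 2
    return -expm1(N * log1p(-q))

def p_exact(m):
    """Exact value 1 - (1 - 1/N)^(m-1) for iid uniform draws
    (a cross-check added here, not part of the original text)."""
    return -expm1((m - 1) * log1p(-1 / N))

m = 104 * 52
print(f"{p_bonferroni(m):.7f}")
print(f"{p_independent(m):.7f}")
print(f"{p_exact(m):.7f}")
```

Printing all three side by side shows how closely the estimates agree at this value of <math>m</math>.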
This probability relates to one lottery. Suppose we consider all lotteries worldwide and ask for the probability that in some lottery, somewhere, some set of numbers will be repeated consecutively. All lotteries are variants of Toto with different numbers involved. Each lottery will have had its own cumulative number of drawings. In order to gauge the magnitude of the probability wanted, assume that there are <math>x</math> lotteries, each one sharing the same numerical characteristics as the Bulgarian one.
This time we can use independence. The probability that some set will be repeated is 1 minus the probability that in no lottery is a set of numbers selected on two consecutive drawings <math>= 1 - (1 - (m -1)/5245786)^x</math>. For <math>x = 50</math> this is 0.0503 while for <math>x = 100</math> the probability is 0.0980. (An approximation to one significant digit for this range of values of interest is <math>x(m-1)/5245786.</math>)
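The multi-lottery figures can be tabulated in the same way. This is a minimal sketch assuming, as in the text, <math>x</math> independent lotteries that each match the Bulgarian one, with <math>m = 5408</math> draws apiece. The function names are hypothetical.

```python
from math import comb

N = comb(42, 6)  # number of possible six-number sets

def p_some_lottery(x, m=5408):
    """Probability that at least one of x independent, identical
    lotteries shows a consecutive repeat within m draws."""
    q = (m - 1) / N            # single-lottery estimate
    return 1 - (1 - q) ** x

def p_linear(x, m=5408):
    """Crude first-order approximation x(m-1)/N."""
    return x * (m - 1) / N

for x in (50, 100):
    print(x, round(p_some_lottery(x), 4), round(p_linear(x), 4))
```

The second printed column is the linear approximation, which is adequate only while <math>x(m-1)/5245786</math> stays small.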
For a different problem that discusses "very big numbers" see the article about double lottery winners [3].
Questions.
1. Verify both assertions concerning the Bonferroni lower bound.
2. How many years would the Bulgarian lottery need to be running in order to have the same probability that some set of numbers will appear three times in succession?
3. Instead of demanding that the same set of numbers appear twice in succession, what is the probability that some set of numbers will repeat during <math>m</math> drawings? (This is simpler and is the famous birthday problem.)
Submitted by Fred Hoppe