A friend told me about a non-deterministic bug he had found that only caused a crash occasionally. He wrote a test case, ran it once, and the bug appeared after 3.5 hours of testing. He sent the bug report and test case to the code owners, who ran it for 4 hours but couldn't reproduce the bug. Intuitively, we can see that they might just have been unlucky and so not seen the crash.
So it got me wondering: all other factors being equal, what is the minimum amount of time they would need to run the test for to be 95% likely to reproduce the bug again (i.e. roughly 2 sigma)?
Assuming that the bug appears randomly, with an equal probability $x$ per hour, then after 3.5 hours of running, the probability $p$ of the bug having appeared is:

$$p = 1 - (1 - x)^{3.5}$$

Rearranging:

$$x = 1 - (1 - p)^{1/3.5}$$
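Here is a minimal sketch of this model in Python. It assumes, as above, that every hour carries the same independent chance $x$ of a crash, so the two equations become a pair of inverse functions. The names `p_from_x` and `x_from_p` are hypothetical helpers introduced only for illustration.

```python
def p_from_x(x: float, hours: float) -> float:
    # Chance of at least one crash within `hours`, assuming each hour
    # independently carries the same crash probability x.
    return 1.0 - (1.0 - x) ** hours


def x_from_p(p: float, hours: float) -> float:
    # Inverse of p_from_x: the per-hour probability x that makes
    # a crash within `hours` happen with probability p.
    return 1.0 - (1.0 - p) ** (1.0 / hours)


x = 0.05
p = p_from_x(x, 3.5)
print(f"x = {x:.3f} gives p = {p:.4f} over 3.5 hours")
print(f"round trip recovers x = {x_from_p(p, 3.5):.3f}")
```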
For the reproducer to have a 95% chance of reproducing the bug in the end, we split the requirement into two parts: 90% confidence in our estimate of how likely the bug is to appear for us in the first place, and a 90% chance of the reproducer then hitting the bug. So we take the chance $p$ of our observed outcome, seeing the crash within 3.5 hours, to be between 10% and 90%. Setting $p$ to 0.1 and 0.9 in the equation above gives $x = 1 - 0.9^{1/3.5} \approx 3.0\%$ and $x = 1 - 0.1^{1/3.5} \approx 48\%$, so the probability of the bug appearing per hour is between about 3.0% and 48%.
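The two bounds can be checked numerically with a short sketch. It assumes the same constant per-hour model, and `x_from_p` is a hypothetical helper, redefined here so the snippet stands alone.

```python
def x_from_p(p: float, hours: float) -> float:
    # Per-hour crash probability x such that P(crash within `hours`) == p,
    # under the model p = 1 - (1 - x) ** hours.
    return 1.0 - (1.0 - p) ** (1.0 / hours)


observed_hours = 3.5  # the first crash was seen after 3.5 hours
low = x_from_p(0.1, observed_hours)   # our observation was a 10%-likely event
high = x_from_p(0.9, observed_hours)  # our observation was a 90%-likely event
print(f"per-hour probability between {low:.1%} and {high:.1%}")
```

Only the lower bound matters for the next step. A smaller $x$ is the pessimistic case for the reproducer.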
Taking the lowest of these, $x = 1 - 0.9^{1/3.5}$ (from $p = 0.1$), to get a total 95% chance of reproducing the bug (and thus a 90% chance of reproducing it, given that we are 90% confident in our estimate of $x$), the reproducer needs to run for $y$ hours, where $1 - (1 - x)^y = 0.9$:

$$y = \frac{\ln(1 - 0.9)}{\ln(1 - x)} = \frac{3.5 \ln 0.1}{\ln 0.9} \approx 76 \text{ hours}$$

So before concluding with 95% confidence that the bug does not appear on their system, the reproducer would need to run the test case for about 76 hours, assuming similar hardware, etc.
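A minimal sketch of this step follows, with a Monte Carlo sanity check. The check assumes the time to the first crash is exponentially distributed with rate $-\ln(1 - x)$ per hour, which gives exactly $P(\text{crash within } t) = 1 - (1 - x)^t$, i.e. the same model as above. The helpers `x_from_p`, `hours_needed` and `simulate` are hypothetical names for illustration.

```python
import math
import random


def x_from_p(p: float, hours: float) -> float:
    # Per-hour crash probability x such that P(crash within `hours`) == p.
    return 1.0 - (1.0 - p) ** (1.0 / hours)


def hours_needed(x: float, target: float) -> float:
    # Hours y such that 1 - (1 - x) ** y == target.
    return math.log(1.0 - target) / math.log(1.0 - x)


def simulate(x: float, hours: float, trials: int = 100_000, seed: int = 1) -> float:
    # Fraction of simulated runs whose first crash lands within `hours`.
    # Time to first crash is exponential with rate -ln(1 - x) per hour.
    rng = random.Random(seed)
    rate = -math.log(1.0 - x)
    hits = sum(rng.expovariate(rate) <= hours for _ in range(trials))
    return hits / trials


x_min = x_from_p(0.1, 3.5)      # pessimistic per-hour probability
y = hours_needed(x_min, 0.9)    # 90% chance of reproducing, given x_min
print(f"x_min = {x_min:.4f}, run for about {y:.1f} hours")
print(f"simulated chance of reproducing in {y:.1f} h: {simulate(x_min, y):.3f}")
```

The simulated fraction should sit close to 0.9. That agreement suggests the closed-form run time is consistent with the model, but it says nothing about whether the model fits the real bug.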
This can be brought down drastically if the original tester reruns the test, which raises the lower bound on $x$.
Rerunning the test
The original tester reran the test himself, reproduced the bug 4 times, and found that it always appeared in less than 6 hours.
To have 95% confidence in the overall result, and thus 90% confidence in our estimate of $x$ alone, we want the probability of the bug occurring within 6 hours in all 4 of 4 runs to be 10%. Thus the probability $p$ of it occurring within 6 hours in a single run is simply:

$$p = 0.1^{1/4} \approx 56\%$$

So setting $p \approx 56\%$ and using the equation above, but with the number of hours set to 6, we get:

$$x = 1 - (1 - 0.1^{1/4})^{1/6}$$

which gives $x \approx 12.9\%$. This means that, with 90% confidence, the bug appears with a probability of at least 12.9% per hour. So the amount of time, $y$, that the reproducer needs to run to be 95% likely to reproduce the bug is:

$$y = \frac{\ln(1 - 0.9)}{\ln(1 - x)} = \frac{\ln 0.1}{\ln(1 - 0.129)} \approx 17 \text{ hours}$$
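The rerun calculation can be sketched the same way. This assumes, as in the text, that each of the 4 runs is independent and that "crashed within 6 hours" is the only information kept from each run. `x_from_p` and `hours_needed` are hypothetical helpers repeated so the snippet runs on its own.

```python
import math


def x_from_p(p: float, hours: float) -> float:
    # Per-hour crash probability x such that P(crash within `hours`) == p.
    return 1.0 - (1.0 - p) ** (1.0 / hours)


def hours_needed(x: float, target: float) -> float:
    # Hours y such that 1 - (1 - x) ** y == target.
    return math.log(1.0 - target) / math.log(1.0 - x)


runs = 4          # reruns that all crashed
max_hours = 6.0   # every crash appeared within this many hours

# Per-run chance p such that all `runs` runs crash within max_hours
# with probability 0.1, i.e. p ** runs == 0.1.
p = 0.1 ** (1.0 / runs)
x_min = x_from_p(p, max_hours)
y = hours_needed(x_min, 0.9)
print(f"p = {p:.3f}, x_min = {x_min:.3f}, run for about {y:.1f} hours")
```

Raising `runs` or lowering `max_hours` raises `x_min` and shrinks `y`. That is the effect the rerun exploits.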