A number of blogs I read are arguing about a paradox, posed by tumblr blogger perversesheaf. Here is my attempt to explain what the paradox says.
Suppose that a drug company wishes to create evidence that a drug is beneficial, when in fact its effect is completely random. To be concrete, we’ll say that the drug has either a positive or a negative effect for each patient, each with probability 1/2. The drug company commits in advance that they will state exactly what their procedure will be, including their rule for when to stop the study, and that they will release all of their data. Nonetheless, they can guarantee that a Bayesian analyst with a somewhat reasonable prior will come to hold a strong belief that the drug does some good. Below the fold, I’ll explain how they do this, and think about whether I care.
To be concrete, let’s suppose that the drug company knows that the analyst begins with a uniform prior on the drug’s efficacy p (the probability that it helps any given patient): she thinks p is equally likely to be any real number between 0 and 1. And the drug company’s goal is to get her to hold a greater than 95 percent belief that the drug’s efficacy p is greater than 1/2.
The drug company chooses (and announces!) the following procedure: They will continue to run patients, one at a time, until a point where they have run N patients and at least N/2 + √N of them have benefited. This will eventually happen with probability 1. At this point, they stop the study and release all the data. If the analyst updates on this, her belief about the drug’s efficacy p will be roughly a bell curve centered at 1/2 + 1/√N with standard deviation 1/(2√N). (I didn’t check the constants here, but this is definitely the right form for the answer, and if the constants are wrong then just change √N to 2√N.) In particular, the analyst would be willing to bet at 19 to 1 odds that the drug does some good.
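To make the update concrete, here is a minimal Python sketch of the analyst’s posterior at a hypothetical stopping point; the value of N and the use of scipy are illustrative choices of mine, not part of the original argument. With a uniform prior, the stopping rule cancels out of Bayes’ rule, and after k successes out of N patients the posterior on p is Beta(k + 1, N − k + 1):

```python
from math import sqrt
from scipy.stats import beta, norm

# Hypothetical stopping point: the company stops after N patients,
# k = N/2 + sqrt(N) of whom benefited.  N here is an illustrative choice.
N = 10_000
k = N // 2 + int(sqrt(N))            # 5100 successes out of 10000

# With a uniform prior the posterior on the efficacy p is Beta(k+1, N-k+1);
# the stopping rule does not change the likelihood, so it drops out.
posterior = beta(k + 1, N - k + 1)
print(posterior.mean())              # ~0.51   =  1/2 + 1/sqrt(N)
print(posterior.std())               # ~0.005  =  1/(2*sqrt(N))
print(1 - posterior.cdf(0.5))        # ~0.977: her belief that p > 1/2, above 95%

# The bell-curve approximation from the post: mean 1/2 + 1/sqrt(N),
# standard deviation 1/(2*sqrt(N)).
approx = norm(0.5 + 1 / sqrt(N), 1 / (2 * sqrt(N)))
print(1 - approx.cdf(0.5))           # ~0.977 again
```

So the analyst assigns roughly 97–98 percent probability to p > 1/2, which is why she’d take the 19 to 1 bet.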
We might think the key to this trick is that the experiment is allowed to run arbitrarily long, but perversesheaf gives some practical numbers based on simulation, which I have also checked in my own simulations. If the experiment is cut off after a large but fixed number of patients, or when N/2 + √N have been helped, whichever comes first, it ends in the latter situation about 30% of the time.
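Here is a rough sketch of that truncated simulation, assuming the stopping rule described above; the cap of 10,000 patients and the 1,000 trials are placeholders of mine rather than perversesheaf’s actual parameters, so the exact fraction it prints will depend on them:

```python
import random
from math import sqrt

def hits_target(p, cap, rng):
    """Run patients one at a time; return True if, at some N <= cap,
    at least N/2 + sqrt(N) of them have benefited."""
    successes = 0
    for n in range(1, cap + 1):
        successes += rng.random() < p
        if successes >= n / 2 + sqrt(n):
            return True
    return False

rng = random.Random(0)
trials, cap = 1000, 10_000   # placeholder values
fooled = sum(hits_target(0.5, cap, rng) for _ in range(trials))
print(fooled / trials)       # fraction of truncated experiments ending at the target
```

The fraction climbs (slowly) as the cap grows, since for a fair coin the target of N/2 + √N successes is eventually hit with probability 1.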
I mostly want to open this up for discussion, but here are some quick points I noticed:
The uniform prior isn’t important here. As long as the analyst starts out with some positive probability density assigned to the whole interval (1/2, 1/2 + ε) for some ε > 0, you get similar results.
As Reginald Reagan points out, the analyst rarely thinks the drug is very good: her posterior is centered near 1/2 + 1/√N, which is only slightly above 1/2 unless the stopping point N happens to come very early.
To state the last point in a different manner, if the drug were even mildly harmful (say it helped 45% of patients and harmed the other 55%), this problem doesn’t occur. With those numbers, I ran a simulation and found that only 6 out of 100 analysts were fooled. Moreover, even in the limit where the experiment is allowed to run forever, the fraction of analysts who are fooled stays small: if a random walk is biased downward, the odds that it is above 0 at time N, let alone above √N, drop off exponentially in N.
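For comparison, here is the same truncated simulation with a mildly harmful drug; again the cap and the number of trials are placeholder values of mine:

```python
import random
from math import sqrt

# Same truncated experiment, but the drug now helps only 45% of patients.
rng = random.Random(1)
trials, cap, fooled = 1000, 10_000, 0   # placeholder values again
for _ in range(trials):
    successes = 0
    for n in range(1, cap + 1):
        successes += rng.random() < 0.45
        if successes >= n / 2 + sqrt(n):
            fooled += 1
            break
print(fooled / trials)  # small, and raising the cap barely changes it: with a
                        # downward drift, being at or above n/2 successes
                        # (let alone n/2 + sqrt(n)) at time n is exponentially unlikely
```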
Normally, I’d like to think a bit more about the question before saying something, but I am getting tired and I want to put up this post for one key reason: Tumblr is an absurdly awful interface for conversations. So, I am hoping that if I get a conversation started here, maybe we will be able to actually talk about it usefully.