A problem with the likelihood principle →
As someone interested in both philosophy and mathematics, I am disturbed by the rise of what I call “radical Bayesianism,” or the belief that the standard probability axioms and Bayes’ rule together give us a complete description of how we ought to reason about the world (at least in principle). I…
Radical Bayesian here, to talk about why I think Bayes’ theorem is giving the right answer. Well… the wrong answer in this mathematical idealization, but the right answer in reality and in better mathematical idealizations of reality.
I think it’s important to separate the theoretical and practical aspects here.
The position I’m taking aim at is the one that claims pure Bayesian reasoning (with no additional gizmos or considerations) is the normatively correct ideal way to reason. This is what Yudkowsky advocates. This is his favored epistemic theory.
The most important way we evaluate epistemic theories is to ask how good they are at generating true beliefs. In the scenario I sketched, Bayesian reasoning is comparatively bad at generating true beliefs. Quoting from my reply to slatestarscratchpad: “The example I present shows there are common situations where, given the same information, radical Bayesians will regularly make worse inferences than someone using frequentist methods. Here “worse” means “they’re always wrong.” Further, and this is the Key Point: because of their philosophical commitments, a radical Bayesian will always purposely ignore information that would help them reason better and not always be wrong. That seems like a good reason to be suspicious of those philosophical commitments.”
In this case, Bayesian reasoning augmented with some way to take into account the stopping rule beats pure Bayesian reasoning every time; a pure Bayesian framework has no way to take into account the information the stopping rule provides. Hence, pure Bayesian reasoning can’t be the normatively correct way to reason. In other words, pure Bayesian reasoning is strictly dominated as an idealized epistemic theory by a hybrid Bayesian/frequentist approach.
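To make it concrete why the stopping rule is invisible to the pure Bayesian: for i.i.d. Gaussian data, the likelihood (and hence the posterior) depends only on the samples actually observed, not on the rule that ended data collection. Here’s a minimal sketch, assuming a toy setup of my own (two candidate effects {0, 0.5}, uniform prior, unit-variance noise), not anything from the original simulations:

```python
# Toy illustration: the Bayesian posterior depends only on the observed
# samples, so a fixed-n design and an optional-stopping design that
# happen to produce the same data yield the same posterior.
import math

def prob_effect_zero(samples, effects=(0.0, 0.5)):
    """P(effect = 0 | samples) under a uniform prior over `effects`,
    with unit-variance Gaussian noise."""
    loglik = [sum(-(x - m) ** 2 / 2 for x in samples) for m in effects]
    top = max(loglik)
    weights = [math.exp(l - top) for l in loglik]  # normalized for stability
    return weights[0] / sum(weights)

data = [0.3, -0.1, 0.8, 0.2, -0.4]
# Whether this exact sequence came from a fixed-n design or from a
# try-until-significant design, the Bayesian computes the same number:
print(prob_effect_zero(data))
```

The point of the sketch is just that no term in the computation references the stopping rule, which is exactly the likelihood principle in action.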
Turning to practical considerations, you responded that this particular example is unrealistic. I have a few replies to this.
First, so what? My target has always been Yudkowsky-style radical Bayesianism, which holds that pure Bayesian reasoning is the ideal way to reason in any scenario.
Second, even if we use a more realistic stopping rule and assume the drug company doesn’t have an unlimited supply of test subjects, the company can still force a favorable result using an unscrupulous stopping rule about 30% of the time with a supply of 10,000 subjects, according to my R simulations.
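For readers who want to see the shape of the simulation (this is a Python sketch of the idea, not the exact R code I ran, and the numbers below use my own illustrative parameters): the “drug” has true effect zero, and after each new subject the company runs a z-test and stops the moment p < 0.05, giving up only when its subject supply runs out.

```python
# Optional-stopping sketch: true effect is 0, but checking for
# significance after every subject inflates the false-positive rate
# far above the nominal 5%.
import math
import random

def z_to_p(z):
    # Two-sided p-value for a standard-normal test statistic.
    return math.erfc(abs(z) / math.sqrt(2))

def stops_significant(n_max, alpha=0.05, rng=random):
    total = 0.0
    for n in range(1, n_max + 1):
        total += rng.gauss(0, 1)  # unit-variance noise, zero true effect
        if n >= 10 and z_to_p(total / math.sqrt(n)) < alpha:
            return True           # company stops and reports "significant"
    return False                  # ran out of subjects

random.seed(1)
trials = 200
hits = sum(stops_significant(2000) for _ in range(trials))
print(f"forced a 'significant' result in {hits}/{trials} runs")
```

With a larger subject cap the hit rate keeps creeping upward, which is the point: the stopping rule, not the drug, is doing the work.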
Third, you note that while the stopping rule may force the Bayesian into a false belief (that there’s a 95% chance the effect of the placebo is different from zero), their credible intervals will be centered on very small effects, so the Bayesian will correctly believe the drug has no practically useful effect. Three sub-replies here:
A) The Bayesian is still stuck with a false belief! This is epistemically objectionable, especially when they could eliminate their false belief most of the time by augmenting their inference procedures with frequentist methods, but refuse to do so to maintain ideological purity.
B) The impression that the small estimated effects don’t matter practically is an artifact of the story I used to present the example. You could imagine trying to estimate tiny physical constants using noisy instruments, for example. Here, very small effects do matter.
C) As su3su2u1 notes, there are many more practical ways to p-hack/manipulate results using stopping rules, and these do actually occur! Imagining the same scenario in these cases, where only the data and stopping rule are presented to the statistician, reveals the same weakness in pure Bayesian reasoning. The stopping rule contains valuable information that the Bayesian ignores.
Regarding your discretization example, I’m sorry, but I don’t understand it. In particular, I don’t understand exactly how you’re modeling the pill. What distribution are we using on the pill’s effect, if not a normal (continuous) one? Could you perhaps provide some more details and an example computation or two? I’m always wary of arguments that begin, “I haven’t done the math, but…”
Clarification: I had no complaints about the example, which I think is a great example, just about the continuity of the formalization used to analyze it. Although, yes, now that you point it out, even “Bayes doesn’t work in this mathematical abstraction” is an argument against the radical Bayesian position. With that clear…
Yeah, you’re right. I think the simulations we did show that you’re right. As you pointed out to me in an ask last night, if a continuum of possible effects were the problem, then there would be no problem in our discrete simulations.
It was hypocritical of me to say you’re making a classic Jaynes-defined mistake while myself making the classic Jaynes-defined mistake of arguing about the results of statistical procedures without looking at any numerical examples. If I think that classic Jaynes-defined mistakes lead to problems, I should avoid them myself; and if I don’t, then I shouldn’t bother pointing out when other people do them.
Though I no longer think it’s relevant, I’ll explain my discretization example further, to show I was talking about something real.
Suppose M=3: there are three possible effect sizes, 0, m1, and m2, each with prior probability 1/3. After n unit-variance samples with sample mean mean(y), the posterior is P(effect=0|y) = e^(−n·mean(y)²/2) / (e^(−n·mean(y)²/2) + e^(−n·(mean(y)−m1)²/2) + e^(−n·(mean(y)−m2)²/2)). If the true effect is 0, this probability converges to 1 as n grows, I think? I think it follows from Doob’s theorem (Theorem 1 in this paper). Maybe there’s a simpler argument though.
Same thing for any M. For example, in our simulations, M is the number of possible floating point numbers representable with the number of bits we were using. So, with enough samples, the posterior probability assigned to exactly 0 approaches 1.
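The M=3 computation is easy to check numerically. Here is a small Python sketch of it (my own illustrative choices: m1=0.5, m2=1.0, true effect 0, unit-variance noise); the posterior on exactly 0 climbs toward 1 as the sample size grows:

```python
# Discrete-effect posterior from the M=3 example: uniform prior over
# three candidate effects, unit-variance Gaussian noise, true effect 0.
import math
import random

def posterior_zero(ybar, n, effects=(0.0, 0.5, 1.0)):
    """P(effect = 0 | data) given the mean of n unit-variance samples."""
    # Likelihood of candidate effect m is proportional to exp(-n*(ybar-m)^2/2).
    weights = [math.exp(-n * (ybar - m) ** 2 / 2) for m in effects]
    return weights[0] / sum(weights)

random.seed(0)
for n in (10, 100, 1000):
    ybar = sum(random.gauss(0, 1) for _ in range(n)) / n
    print(n, round(posterior_zero(ybar, n), 4))
```

As n increases, the non-zero candidates’ likelihoods are crushed exponentially relative to the zero candidate’s, which is the convergence claimed above.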
But, we saw in our simulations that the Bayesian often does not get enough samples before the stopping rule kicks in. So it’s not really relevant.
Thanks for taking the time to help me understand this, despite having lots of replies on your post.
People changing their minds in the face of new evidence or reasoning is great and raginrayguns is great.
(via shlevy)
