
To me, it’s like there’s a horse loose in a hospital. Like I think everything’s going to be okay, but I have no idea what’s going to happen next. And none of you know either. We’ve all never not known together. And on the news they try to get people like, “We’ve got a man here who once saw a bird in an airport” and we’re like, “Get the hell out of here. This is a horse loose in a hospital.”
It’s not good. It’s confusing because every day we just have to follow the horse, and some days it’s like, “The horse used the elevator?” You know those days when you’re like, “Is the horse smart?” And then we’re all just like, “Why hasn’t the horse-catcher caught the horse?” and then the horse is like, “I have fired the horse-catcher.”
— John Mulaney describing Trump (via transgenderer)
Having thought about this for a few more minutes:
It seems like things are much easier to handle if, instead of putting any actual numbers (probabilities) in, we just track the partial order generated by the logical relations. Like, when you consider a new hypothesis you’ve never thought about, you just note down “has to have lower probability than these ones I’ve already thought about, and higher probability than these other ones I’ve already thought about.”
At some point, you’re going to want to assign some actual numbers, but we can think of this step as more provisional and revisable than the partial order. You can say “if I set P(thing) = whatever, what consequences does that have for everything else?” without committing to “P(thing) = whatever” once and for all, and if you retract it, the partial order is still there.
In fact, we can (I think) do conditionalization without numbers, since it just rules out subsets of hypothesis space. I’m not sure how the details would work but it feels do-able.
The big problem with this is trying to do decision theory, because there you’re supposed to integrate over your probabilities for all hypotheses, whereas this setup lends itself better to getting bounds on individual hypotheses (“P(A) must be less than P(B), and I’m willing to say P(B) is less than 0.8, so P(A) is less than 0.8”). I wonder if a sensible (non-standard) decision theory could be formulated on the basis of these bounds?
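The partial-order bookkeeping described above can be sketched in a few lines. This is a toy sketch, not anything from the post: the hypothesis names, the 0.8 bound, and the `PartialOrderBeliefs` class are all made up for illustration.

```python
# Toy sketch (not from the post): hypotheses as nodes in a DAG, where an edge
# A -> B records "A implies B", hence P(A) <= P(B).
from collections import defaultdict

class PartialOrderBeliefs:
    def __init__(self):
        self.implies = defaultdict(set)  # A -> {B : A is sufficient for B}
        self.upper = {}                  # provisional numeric upper bounds

    def add_implication(self, a, b):
        # Note down "a has to have lower probability than b."
        self.implies[a].add(b)

    def set_upper(self, h, p):
        # Provisionally assert P(h) <= p; this is revisable.
        self.upper[h] = p

    def retract(self, h):
        # Drop the number; the partial order is still there.
        self.upper.pop(h, None)

    def bound(self, h):
        # Best upper bound on P(h): the minimum over h and everything it
        # implies, since P(h) <= P(x) for every implied x.
        best, stack, seen = 1.0, [h], set()
        while stack:
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            best = min(best, self.upper.get(x, 1.0))
            stack.extend(self.implies[x])
        return best

beliefs = PartialOrderBeliefs()
beliefs.add_implication("California secedes", "some US state secedes")
beliefs.set_upper("some US state secedes", 0.8)
print(beliefs.bound("California secedes"))  # 0.8, inherited through the order
```

Retracting the 0.8 with `beliefs.retract(...)` leaves the implication edge in place, which is the "provisional and revisable" property above.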
I know this ruins the fun, but if anyone’s curious, I managed to find out what was going on with that ebook-only passage in A Glastonbury Romance: the author was sued for libel by a person with many similarities to a character in the book (“not only was G. W. Hodgkinson the owner of Wookey Hole caves (q.v.), but he also owned an aeroplane”). As a result, that passage (which featured said character) was cut.
This explains why recent editions don’t have it (where they would be likely to have, say, passages omitted earlier for reasons of obscenity), and why they don’t mention the omission (it’s a tiny fraction of a long book, and it’s embarrassing to the author, so it’d make a bad first impression on the reader if noted on the copyright page or in an introductory note).
I don’t think the fact that humans are bad at thinking up logical implications is a very strong argument against Bayes, any more than “But Harold, you said you loved chocolate earlier!” is an argument against having preferences.
So, I will agree that there’s this non-monotonic thing. This is indeed a very good point against using Bayes as a mental tool! I am not disagreeing with that!
What I do disagree with is the idea that it’s ipso facto problematic. I think the correct way to do this is to throw out your first estimate as a preliminary one, and then use the other logical implication questions as a way to generate a battery of knowledge in a kinda organic fashion. To use the original “California secession” thing, let’s say I think it’s unlikely, so I throw out 98% as my likelihood, then someone else asks me the “USA still together” question, so I also generically throw out 98%, but A HA!!!!!! THIS SEEMS WRONG, because the set of situations involving the US staying together but California leaving seems, I dunno, small or whatever, so I end up adjusting the probabilities, repeating until I’ve thought of all the “relevant” probabilities.
But logically speaking, isn’t this troublesome? Isn’t it terrible that in theory an adversary can choose a sequence of questions which allows them to set my probabilities? Well, not really. My claim is that thinking through these logical implications provides information, because humans are really bad at accessing all the information they have, and, yeah, sure, if the adversary controls how a person accesses their information, of course the person is screwed? So you hope that people have good internal “implication generating” machinery, such that by the time they have worked through a bunch of subset questions, they have dumped out all the relevant information, and the ordering effects are washed out.
Which is a much more elaborate way of saying “guys stop throwing out random probabilities and sticking to them if you don’t have good intuition/facts doing cognitive work aaaaaaaahh”
I guess I can agree that nothing I said above is specifically motivated by Bayes, except for this vague feeling of “well, shit it turns out I’m actually really bad at incorporating all relevant information” and I think it’s really just unavoidable.
I don’t think this is a problem with humans; I think it’s much more fundamental. The real issue is that these kinds of “obviously nested” statements have an “easy to check, hard to find” property, as with NP-complete problems.
Let’s define “A is obviously nested in B” as “if you describe both A and B to me, it’ll be immediately obvious to me that A is sufficient but not necessary for B.” And let’s define an “obviously nested pair” as A, B where one is obviously nested in the other.
The “US in 2100” statements mentioned earlier are all obviously nested pairs with one another. But the ones mentioned are just a few examples; there are infinitely many statements of the same form, asking about slightly bigger or smaller regions of the US, that also form obviously-nested pairs with all other such statements.
And that whole infinite chain is just one “direction” in hypothesis space. You can think about any other subject – existence of various markets and sub-markets (will candy be sold? will lollipops?), demographics and sub-demographics, scientific ideas and special cases thereof, you name it – and produce an infinite obviously-nested chain like this.
In finite time (much less polynomial time), you can only explicitly think about some vanishingly small subset of these statements. Yet you implicitly know infinitely many facts about them (about each chain, in fact, of which there are infinitely many). There’s no way to sit down and think enough beforehand that all of the obvious-nesting information has been dumped out into an explicit representation (and that representation would take infinite space anyway).
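The “easy to check, hard to find” asymmetry can be made concrete. A toy sketch, not from the discussion: the radius-indexed statements below are my own illustration of one of those infinite chains.

```python
# Toy sketch: statements "everything within radius r of some fixed point is
# still one country in 2100", indexed by r (my illustration, not the post's
# exact wording). The nesting knowledge is implicit: one comparison rule
# covers infinitely many pairs.

def obviously_nested(r_small, r_big):
    # The big-region claim is sufficient (but not necessary) for the small one.
    return r_small < r_big

# Easy to check: any single pair is an O(1) comparison.
assert obviously_nested(50.0, 500.0)

# Hopeless to enumerate: between any two statements in the chain there is
# always another, so no finite explicit table dumps out all the facts.
def between(r1, r2):
    return (r1 + r2) / 2

r = between(50.0, 500.0)  # a third hypothesis nested between the two
assert obviously_nested(50.0, r) and obviously_nested(r, 500.0)
```

The one-line `obviously_nested` rule is the implicit knowledge; unrolling it into an explicit list of pairwise facts is exactly the infinite-space representation the paragraph above rules out.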
Now, maybe there is a way to handle this in practice so that it doesn’t hurt you too much, or something. Such a theory would be very interesting, but as far as I know it doesn’t exist, and it would have to exist for us to begin talking about how a finite being could faithfully represent its implicit knowledge in a prior.
(This is a human problem in the sense that you could make a machine which would lack all this implicit knowledge. That machine would not have this problem, but it would know less than we do, so we’d be throwing away information if we tried to imitate it.)
Yet you implicitly know infinitely many facts about them (about each chain, in fact, of which there are infinitely many). There’s no way to sit down and think enough beforehand that all of the obvious-nesting information has been dumped out into an explicit representation (and that representation would take infinite space anyway).
Now, maybe there is a way to handle this in practice so that it doesn’t hurt you too much, or something.
This sounds like a natural continuity/limits problem. It does seem like there could be infinite nesting like this, and that you do know information about each step of the chain. However, I’m not sure this necessarily needs infinitely many facts to describe, perhaps an overarching fact could sum them up, or the facts get ‘smaller’ as the chain does, so that together they form a finite total fact. Thinking about the obvious-nesting information sounds very much like taking a limit.
The geographic example has very literal continuity, with larger and smaller regions of the US. I’m actually quite surprised there isn’t such a theory already! Hypothesis space, even when infinite, is continuous, and that makes a big difference.
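The “overarching fact” idea can be made concrete: one monotonicity property stands in for infinitely many pairwise nesting facts. A minimal sketch with a made-up prior; the exponential form and the constants are arbitrary assumptions for illustration.

```python
import math

# Toy sketch: instead of storing infinitely many pairwise facts
# P(big region intact) <= P(small region intact), store one "overarching
# fact" -- the prior is monotone decreasing in the region's radius -- and
# every nesting constraint follows from it automatically.
def p_intact(radius_km, scale=5000.0, base=0.98):
    return base * math.exp(-radius_km / scale)

# Infinitely many nesting facts, all recovered from one monotonicity property:
assert p_intact(4000.0) < p_intact(400.0) < p_intact(40.0)
```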
On a separate note, I’m not convinced that we couldn’t make do with a model where we only consider a finite universe, with discrete rather than continuous space. That would mean you could not take infinitely many different regions of the US. And it would mean that only finitely many events could possibly occur in a given time period, which intuitively seems like saying there will only be finitely many such different chains of hypotheses to worry about.
While it seems a bit artificial at times, I don’t think it’s too unreasonable to allow a theory like this to only cover finite cases, not when the finite case can approximate the infinite case arbitrarily closely. Then it seems we could reasonably represent our priors.
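The finite-universe suggestion can also be made concrete. On a discrete grid there are only finitely many regions, so the nesting facts really can all be listed. A toy sketch; restricting to rectangular regions is my simplification.

```python
from itertools import combinations

# Toy sketch of the finite-universe idea: on a discrete n x n grid there are
# only finitely many (here, rectangular) regions, hence only finitely many
# nesting facts to enumerate.
def rectangles(n):
    lines = range(n + 1)
    return [(x1, y1, x2, y2)
            for x1, x2 in combinations(lines, 2)
            for y1, y2 in combinations(lines, 2)]

def nested(a, b):
    # a sits inside (or equals) b
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]

regions = rectangles(4)  # 100 regions on a 4x4 grid -- finite
nesting_facts = [(a, b) for a in regions for b in regions if nested(a, b)]
print(len(regions), len(nesting_facts))  # every nesting fact enumerated
```

A finer grid approximates the continuous case arbitrarily closely while keeping the fact-list finite, which is the sense in which the finite case could stand in for the infinite one.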
I am a bad pun blog and I endorse this message as elaborating on my “eh it probably converges” intuition earlier.
I think we can afford to agree to disagree unless @nostalgebraist can help me intuition pump this a bit further on why doing the subset enumeration problem doesn’t (eventually) converge.
I will say that this substantially downgraded my belief that Bayes is complete; there is much more work to be done, and I think it’s totally reasonable to separate out the “unfounded intuition” parts of *the bayes memeplex* from the more proper Edwin and Eliezer’s Excellent Adventure canon.
The continuity thing is interesting.
Re this
I’m actually quite surprised there isn’t such a theory already! Hypothesis space, even when infinite, is continuous, and that makes a big difference.
What immediately came to my mind is that the Bayes setup doesn’t demand that your prior be continuous in any underlying variable, so this doesn’t come up in proving “for all” and “there exist” statements about Bayesian agents, and it is easy to dismiss as “just a special case” if you think like that. On the more practical side, concrete applications of Bayes tend to have continuous priors (because they use familiar probability distributions that have PDFs); it’s easy to forget that you don’t necessarily have to do this, and so you don’t really think about how it might give you extra properties.
(And indeed, you don’t always want continuity even in spatial examples, since the real world has state lines and other borders, for instance.)
Anyway, even if you assume your prior is always continuous in one or more underlying variables (space, time?, etc.?), that still leaves the functional form open. One worry about these kinds of cases is that your contortions to squeeze things in will give you a prior with lots of unmotivated variations in slope (flat for a while, then steep for a while). So in addition to continuity, you’d need some general assumption like “I think things tend to vary linearly (or whatever) w/r/t space,” which would get you most of the way to being able to pull consistent probabilities out of the air in any order. Although you still have to deal with things that are neither nested nor independent, and make sure all those relations work out … IDK, if someone’s worked this all out in detail I’d love to see it, but it sounds really hard.
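The “vary linearly w/r/t space” assumption can be illustrated directly: once every answer is read off one underlying function, an adversary’s question ordering can’t matter. A toy sketch; the linear form and constants are made up.

```python
import random

# Toy sketch: a made-up prior that varies linearly with the region's radius.
# All probabilities come from this one function rather than being guessed
# one question at a time.
def p_intact(radius_km):
    return max(0.0, 0.98 - radius_km / 20000.0)

queries = [40.0, 4000.0, 400.0, 1200.0]
answers_sorted = {r: p_intact(r) for r in sorted(queries)}

random.shuffle(queries)  # an adversary reorders the questions
answers_shuffled = {r: p_intact(r) for r in queries}

# Trivially equal -- but that triviality is the point: consistency in any
# question order comes for free once the prior has a functional form.
assert answers_sorted == answers_shuffled
```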
http://www.superdoomedplanet.com/blog/?tag=novelization-style this is all very Yes. this is often how i feel when reading
Player-character Groove Champion is mixed up in all this highway warfare by sheer chance.

All of the permutations that make sense have already been done, but Buzzfeed is brave, and soldiers on nonetheless
It’s hip to cook bees with swords for hands so i’m SOL.