
bayes: a kinda-sorta masterpost

@lostpuntinentofalantis

I don’t think the fact that humans are bad at thinking up logical implications is a very strong argument against Bayes, any more than “But Harold, you said you loved chocolate earlier!” is an argument against having preferences.

So, I will agree that there’s this non-monotonic thing. This is indeed a very good point against using Bayes as a mental tool! I am not disagreeing with that!

What I do disagree with is the idea that it’s ipso facto problematic. I think the correct way to do this is to throw out your first estimate as a preliminary one, and then use the other logical implication questions as a way to generate a battery of knowledge in a kinda organic fashion. To use the original “California secession” example: let’s say I think it’s unlikely, so I throw out 98% as my likelihood, then someone else asks me the “USA still together” question, so I also generically throw out 98%, but A HA!!!!!! THIS SEEMS WRONG, because the set of situations involving the US together but California leaving seems, I dunno, small or whatever, so I end up adjusting the probabilities accordingly, repeating until I’ve thought of all “relevant” probabilities.

But logically speaking, isn’t this troublesome? Isn’t it terrible that in theory an adversary can choose a sequence of questions which allows them to set my probabilities? Well, not really. My claim is that thinking through these logical implications provides information because humans are really bad at accessing all the information they have, and, yeah, sure, if the adversary controls how a person accesses their information, of course the person is screwed? So you hope that people have good internal “implication generating” machinery, such that by the time they have worked through a bunch of subset questions, they have dumped out all relevant information, and the ordering effects are washed out.
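For what it’s worth, the “A HA, this seems wrong” check can be made mechanical: given stated probabilities and a list of known implications, you can flag every pair that violates P(A) ≤ P(B) when A implies B. A minimal Python sketch (the event names and numbers are invented for illustration, not taken from the discussion above):

```python
def check_implications(probs, implications):
    """Return the (a, b) pairs where a is claimed to imply b
    but the stated probabilities have P(a) > P(b)."""
    return [(a, b) for a, b in implications if probs[a] > probs[b]]

# Hypothetical elicited numbers: "the whole US intact in 2100" implies
# "the US minus California intact in 2100", so the first probability
# must not exceed the second -- but as stated here, it does.
stated = {"US_intact": 0.98, "US_minus_CA_intact": 0.97}
violations = check_implications(stated, [("US_intact", "US_minus_CA_intact")])
# violations is non-empty, i.e. the elicited numbers need adjusting
```

The repair step (which probability to move, and by how much) is the part the checker can’t do for you; that’s exactly the “organic” reconciliation described above.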

Which is a much more elaborate way of saying “guys stop throwing out random probabilities and sticking to them if you don’t have good intuition/facts doing cognitive work aaaaaaaahh”

I guess I can agree that nothing I said above is specifically motivated by Bayes, except for this vague feeling of “well, shit it turns out I’m actually really bad at incorporating all relevant information” and I think it’s really just unavoidable.

I don’t think this is a problem with humans, I think it’s much more fundamental.  The real issue is that these kinds of “obviously nested” statements have an “easy to check, hard to find” property, like with NP-complete problems.

Let’s define “A is obviously nested in B” as “if you describe both A and B to me, it’ll be immediately obvious to me that A is sufficient but not necessary for B.”  And let’s define an “obviously nested pair” as A, B where one is obviously nested in the other.

The “US in 2100” statements mentioned earlier are all obviously nested pairs with one another.  But the ones mentioned are just a few examples; there are infinitely many statements of the same form, asking about slightly bigger or smaller regions of the US, that also form obviously-nested pairs with all other such statements.

And that whole infinite chain is just one “direction” in hypothesis space.  You can think about any other subject – existence of various markets and sub-markets (will candy be sold?  will lollipops?), demographics and sub-demographics, scientific ideas and special cases thereof, you name it – and produce an infinite obviously-nested chain like this.

In finite time (much less polynomial time), you can only explicitly think about some vanishingly small subset of these statements.  Yet you implicitly know infinitely many facts about them (about each chain, in fact, of which there are infinitely many).  There’s no way to sit down and think enough beforehand that all of the obvious-nesting information has been dumped out into an explicit representation (and that representation would take infinite space anyway).

Now, maybe there is a way to handle this in practice so that it doesn’t hurt you too much, or something.  Such a theory would be very interesting, but as far as I know it doesn’t exist, and it would have to exist for us to begin talking about how a finite being could faithfully represent its implicit knowledge in a prior.

(This is a human problem in the sense that you could make a machine which would lack all this implicit knowledge.  That machine would not have this problem, but it would know less than we do, so we’d be throwing away information if we tried to imitate it.)

(via lostpuntinentofalantis)

bayes: a kinda-sorta masterpost

lostpuntinentofalantis:

nostalgebraist:

I have written many many words about “Bayesianism” in this space over the years, but the closest thing to a comprehensive “my position on Bayes” post to date is this one from three years ago, which I wrote when I was much newer to this stuff.  People sometimes link that post or ask me about it, which almost never happens with my other Bayes posts.  So I figure I should write a more up-to-date “position post.”

I will try to make this at least kind of comprehensive, but I will omit many details and sometimes state conclusions without the corresponding arguments.  Feel free to ask me if you want to hear more about something.

I ended up including a whole lot of preparatory exposition here – the main critiques start in section 6, although there are various critical remarks earlier.


This isn’t convincing to me (and I guess everything of this genre isn’t convincing to me) because, like, it seems to me that the infinite hypothesis thing is just a problem for every kind of thinking?
You can claim that frequentist tools only work in limited domains or whatever, but in my mind all you’ve done is sweep the “oh no what if I didn’t think of a relevant hypothesis??!??” problem into the “well, yeah, you’re going to get burnt by this if you use it out of bounds” bucket.

To (ab)use the tool analogy, it turns out that all human made tools cannot survive in the middle of a supernova, and yes you’re technically correct that all the omnitool fanboys have been overselling the utility of omnitool usage in Exotic Space Environments, but the fact that all the non-omnitools have warnings about “cannot be used in supernovae” is not going to convince me that omnitools don’t exist, or are necessarily worse in all cases.

If you’re talking about Section 7, I’m not just saying that “there might be relevant hypotheses you hadn’t thought of,” I’m saying that it’s really hard to encode what you do know in a prior without throwing away some information.

In jadagul’s examples with the different regions in 2100, you already know (before you think about any of it) that those statements have a certain logical implication structure.  But you only start thinking about each relation as the relevant statement is brought to your attention.  Like, if you ask someone those questions in a non-monotonic order, they’ll have to take care to squeeze some probabilities inside others they’ve already stated, and this will make things clearly depend on the order of asking.  (In my example, the person said “94.5%” because they knew they needed something between 94 and 95, even though they were giving whole-number answers at first, and would have given a whole-number answer to the intermediate case if asked about it first.)
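The order-of-asking effect is easy to simulate. Suppose a respondent has rough whole-percent gut feelings about a nested chain of events (event 0 outermost, so a coherent assignment must be non-increasing along the chain), and each stated answer gets clamped to stay consistent with answers already on the record. A small Python sketch (the numbers are invented for illustration):

```python
def elicit(raw_guesses, ask_order):
    """Simulate eliciting probabilities for a nested chain of events.

    Event i is "outer" relative to event j whenever i < j, so a coherent
    assignment must have p[i] >= p[j] for i < j.  Each answer is clamped
    to respect the answers already stated, so the final numbers can
    depend on the order in which the questions are asked.
    """
    stated = {}
    for k in ask_order:
        lo = max((stated[j] for j in stated if j > k), default=0.0)
        hi = min((stated[j] for j in stated if j < k), default=1.0)
        stated[k] = max(lo, min(hi, raw_guesses[k]))
    return [stated[i] for i in range(len(raw_guesses))]

# Slightly incoherent whole-percent gut feelings for a 3-event chain:
raw = [0.94, 0.95, 0.95]
outer_first = elicit(raw, [0, 1, 2])   # asking outermost first
inner_first = elicit(raw, [2, 1, 0])   # asking innermost first
```

Same gut feelings, different final probabilities, purely from question order: the first ordering pins everything at 94%, the second at 95%. The clamping only washes out if the respondent eventually reconciles the whole chain rather than anchoring on whatever was stated first.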

(BTW I once actually asked these questions sequentially to a rationalist meetup group as a way of making this point)

So the problem isn’t “your knowledge is finite” but “you can’t encode exactly what you know (and nothing else) in a prior, or at least I know of no way to do it.”

You could say this is just another warning that should go on the label, but it suggests that we’re actually using the wrong representation for our prior knowledge, and so we have a “garbage in, garbage out” type problem: Bayes is somehow failing to capture what we know, and we don’t (AFAIK) have any bounds or guarantees on what problems this will or won’t cause.  Whereas in the frequentist procedures, we can at least describe what it would look like for a human to use them correctly, and guarantee certain things for that human.

nostalgebraist:

mildly creepy thing from today: i have a used paperback copy and also a pirated (imperfectly OCR’d) kindle copy of the same book (a glastonbury romance).  i’ll read the kindle copy if i’m in bed and it’s too late to have the light on.  i was trying to sync up where i am in the two copies, and it turns out there’s a whole multi-page section in the ebook that doesn’t appear in the paperback.  like, the chapter just ends in the paperback but it keeps going for a while in the ebook.  neither has any mention of being abridged, and the paperback is a more recent printing, so you’d think it’d be more complete.

the kicker: the ebook-only section begins with a bunch of really garbled text even by the usual bad OCR standards, and is about two characters descending into a cave.

Start of the ebook-only part:

Persephone became aware now of the sound of water, down somewhere in the darkness to their right, and it was not long before he made her stop and look between the tree trunks, upon whose rough surface he threw the light of his flashlight. There she saw a single bright lamp burning, throwing morbid shadows upon an expanse of grass. By this radiance a link la^n v.itl. rbairs and tables set out became apparent, al! tii^e tiling rji’^ij^s and deserted, looking ghostly, and even gka-tlv ih« •:••.*. **I k *op it lit,“ whispered Philip in her ear. Thert* was nu n:>;-i- ^ -ri^d for him to whisper. It was hard to raise his \rov j,;rt ihtn. ”It’s my electric plant. That’s where we serve tea to vi-it.jrs. I expect you’ve been there yourself, only \u;i came in a diiierent way.“

They went on again, the path they followed growiii^ *tea<iilv narrower and steeper. They walked closely s-ide by side mid presently Philip without a word possessed himself of her hami. At last she saw before her the upward rise of a precipitous r,;< k. covered with moss and last year’s ferns, and right befo:e them, at the base of the rock, a little square doorway. Philip took from his waistcoat pocket a large key, like the key to a drive gate, and using his flashlight turned it in the lock and pushed the door open. 


bayes: a kinda-sorta masterpost

4point2kelvin:


Finding the realio truilo bestio hypothesis by simple application of Bayes’ theorem requires infinite computing power: this is a true and important point. But you can also find the best hypothesis within the set of hypotheses you’ve actually thought of. The probability isn’t “right” - it neither matches the hypercomputing limit nor even tries to account for your own fallibility - but you can find the best hypothesis of those available (up to a magical prior).

I think this task, of finding the best hypothesis among some you’ve thought of, is a useful one for grounding the discussion and allowing comparison between different problem-solving methods. I think that solving this problem provides space for a Bayesianism that’s more substantive than just a collection of machinery, but is still part of a larger system for understanding human reasoning.
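The “best hypothesis among those you’ve actually thought of” task is concrete enough to write down: with a finite hypothesis list and a prior over it, Bayes’ theorem reduces to normalizing prior-times-likelihood. A minimal Python sketch for a toy case (the coin biases, prior, and flip counts are all invented for illustration):

```python
def posterior_over(hypotheses, prior, likelihood):
    """Posterior over a finite hypothesis list: normalize prior * likelihood."""
    weights = [p * likelihood(h) for h, p in zip(hypotheses, prior)]
    total = sum(weights)
    return [w / total for w in weights]

# Toy setup: three candidate coin biases, uniform prior, data = 7 heads in 10 flips.
hypotheses = [0.3, 0.5, 0.7]
prior = [1 / 3] * 3
likelihood = lambda p: p**7 * (1 - p) ** 3
post = posterior_over(hypotheses, prior, likelihood)
best = hypotheses[post.index(max(post))]   # the "best available" hypothesis
```

Note what this does and doesn’t buy you: the true bias might not be in the list at all; the machinery only ranks the hypotheses you brought.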

(Of course, choice of this goal [identify the best hypothesis] is itself not Bayesian - a more natural thing to do would be to frame this in terms of making empirical predictions based on the set of imagined hypotheses, in which case the Bayesian approach still gets some nice guarantees for the same reason that minimum message length prediction is expected to work [even if you don’t do anything uncomputable, you can still piggyback off of the nice properties of Solomonoff induction].)

One can still criticize the case of choosing between a list of hypotheses, given some data, as too abstract and not engaging enough with human limitations. But now I think this criticism is about equally deflationary for all the tools in all the toolboxes, and so it’s more emotionally appealing to reject it.

On the topic of regularization: Whenever you see the adjective “just” or “mere” in anything remotely philosophical, you can guess that that poor word is about to do some heavy lifting. So you can imagine what I anticipated upon reading that “Bayesianism is just regularization, dude.”

Funnily enough, I think the problem with the simple Bayesian interpretation of regularization (as you point out: who the heck has a prior that your model parameters are Gaussian-distributed with known variance?) is that it is insufficiently Bayesian. By this I mean that it tunnel-visions on a particular model, instead of trying to assign weights to a whole bunch of possible models and choosing between them based on what the data says, which involves applying Bayes’ rule way more, so it must be more Bayesian (:P). And of course, this isn’t an original idea: plenty of people are trying to do Bayesian hyperparameter optimization.
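For concreteness, the “regularization is a Gaussian prior” identity mentioned here is easy to check numerically: minimizing squared error plus λ‖w‖² is the same as maximizing the posterior under a zero-mean Gaussian prior on w, and both give the closed form (XᵀX + λI)⁻¹Xᵀy. A numpy sketch on synthetic data (not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
lam = 1.0

# MAP estimate under w ~ N(0, (sigma^2 / lam) I), equivalently ridge regression:
w_map = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Gradient descent on the penalized loss ||Xw - y||^2 + lam * ||w||^2
# converges to the same point.
w = np.zeros(3)
for _ in range(10000):
    w -= 0.004 * (2 * X.T @ (X @ w - y) + 2 * lam * w)
```

The only Bayesian content beyond plain least squares is the prior: crank λ up and the prior dominates; send it to zero and you recover maximum likelihood.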

Interesting stuff.

When you talk about finding the best hypothesis (i.e. getting the order of the probabilities right, if not the numerical value), why do you think Bayes gives the right answer?  You say “up to a magical prior,” but if we ignore the prior, we just have the likelihood, and we’re talking about “best hypothesis = maximum likelihood hypothesis.”  This isn’t exactly a bad idea but it’s neither uniquely Bayesian nor a good encapsulation of what we mean by “best” here.

One reason it isn’t a good encapsulation is that maximum likelihood may work better with some regularization, which a good prior would provide.  But then, people seem to have a lot of trouble coming up with and using coherent priors, plus this gives us enough freedom that we can often change the result (which hypothesis is best) by changing the prior … I’m just not seeing why Bayes does the job we want here in some assured, or uniquely good, way.
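The prior-sensitivity point is easy to make concrete: with the likelihoods fixed, a different prior can flip which hypothesis comes out on top. A toy Python sketch (the hypothesis names and numbers are invented):

```python
def map_hypothesis(prior, likelihood):
    """Return the hypothesis with the largest (unnormalized) posterior."""
    return max(likelihood, key=lambda h: prior[h] * likelihood[h])

likelihood = {"H1": 0.4, "H2": 0.3}          # H1 fits the data better
flat_prior = {"H1": 0.5, "H2": 0.5}
skewed_prior = {"H1": 0.2, "H2": 0.8}

map_hypothesis(flat_prior, likelihood)    # with a flat prior, MAP = max likelihood
map_hypothesis(skewed_prior, likelihood)  # the skewed prior flips the ranking
```

With a flat prior the Bayesian answer collapses to maximum likelihood, which is the “neither uniquely Bayesian nor obviously what we mean by best” worry; with any other prior, the answer depends on a choice we have wide freedom over.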

a more natural thing to do would be to frame this in terms of making empirical predictions based on the set of imagined hypotheses, in which case the Bayesian approach still gets some nice guarantees for the same reason that minimum message length prediction is expected to work [even if you don’t do anything uncomputable, you can still piggyback off of the nice properties of Solomonoff induction]

I agree about the first part (mean vs. mode, right?), but I don’t think I’m familiar with the guarantees you refer to here – link?

About “just”: that was meant as semi-joking payback for all of the gotchas about how other methods are “just” Bayes in disguise.  Regularization is just Bayes, huh?  Well, guess what: Bayes is just regularization!!!

(via 4point2kelvin)

bayes: a kinda-sorta masterpost

derplefurf:


Worth noting that your Section 8 (considering more hypotheses as you go along, not enumerating an infinite hypothesis space at the start or using infinite computational power) highlights a problem that Eliezer and company have acknowledged for years, worked hard on, and last year actually found a novel answer to. (The best way to understand the paper, currently, is probably this 90-minute lecture.)

https://www.youtube.com/watch?v=UOddW4cXS5Y

Computable approximate Bayesian reasoners, e.g. logical inductors (which provably converge to perfect Bayesian reasoning in the limit, and have a bunch of nice properties along the way), are indeed weirder to ponder than Solomonoff induction. The objection about priors has an interesting answer here (with some edge cases), but I really can’t explain it out of context. And of course, this is a computable algorithm but not an efficiently computable one.

But I’d like to note that while non-Bayesians were pointing out the issue as a “see, this is why Bayesian reasoning can’t do anything without infinite computation, might as well scrap that endeavor”, Eliezer and company were actually working on that issue.

I’m aware of that paper.  Here are my thoughts on it.

Re: your last paragraph – people tend to work on approaches they find relatively promising, so it shouldn’t be surprising that Bayesians worked on fixing problems with Bayes while non-Bayesians worked on improving other approaches.

(via profound-yet-trivial)

tchaikovskaya:
thank u google thats exactly what i was looking for, not how many days she has been the prime minister of the united kingdom, her height in picometers. you can read my mind, google, its uncanny

(via rincewitch)

As in any good story, the characters developed over time. Eventually, Lamar and Koenig got divorced – Koenig got custody of the majority of internal organs, and later went on to become a pan-dimensional being. The Infomage, on the other hand, set up a successful data exchange business with his dataside alter ego, Image. A major event was the marriage of Goblin the tea-boy and Unidentified Girl in Pigtails, or UPiG.

There was one glorious moment when I got one of the robots to say she “Shot a man in Reno, just to see him die,” but aside from that, it was a miserable experience.

galleytrot:

the good folk you meet on desert highways

(via whatevernatureis)