
raginrayguns:

Yudkowsky is better than Jaynes at keeping his eye on the prize. He’s less likely to defend an approach by redefining what the approach should accomplish. Kinda like what he says, ‘the utility function is not up for grabs.’

I was thinking that after reading things they said, related to Solomonoff induction. I was wondering why Yudkowsky thinks that 2^-K(H), the Solomonoff prior, is a good prior compared to all the other priors you could put on the space of all computable hypotheses. Even among all the priors you could choose that are based on Kolmogorov complexity. For a long time I imagined that he had encountered some argument for it, analogous to Cox’s theorem for the probability laws, or Jaynes’s derivation of Laplace’s principle of indifference. This doesn’t seem to be the case though, here’s an old LessWrong comment of Yudkowsky’s:

Well, the ideal simplicity prior you should use for Solomonoff computation, is the simplicity prior our own universe was drawn from.

Since we have no idea, at this present time, why the universe is simple to begin with, we have no idea what Solomonoff prior we should be using. We are left with reflective renormalization - learning about things like, “The human prior says that mental properties seem as simple as physical ones, and that math is complicated; but actually it seems better to use a prior that’s simpler than the human-brain-as-interpreter, so that Maxwell’s Equations come out simpler than Thor.” We look for simple explanations of what kinds of “simplicity” our universe prefers; that’s renormalization.

Does the underspecification of the Solomonoff prior bother me? Yes, but it simply manifests the problem of induction in another form - there is no evasion of this issue, anyone who thinks they’re avoiding induction is simply hiding it somewhere else. And the good answer probably depends on answering the wrong question, “Why does anything exist in the first place?” or “Why is our universe simple rather than complicated?” Until then, as said, we’re left with renormalization.

And it contrasts, to me, with the way I don’t think I’ve ever seen Jaynes say he was dissatisfied with his solution to some problem or another, the way Yudkowsky here says he’s bothered by how he can’t rule out other priors.

So, Jaynes actually says something sort of related to universal priors, which is that he doesn’t seem to think they exist:

As we showed in connection with multiple hypothesis testing in Chapter 4, Newton’s theory in Chapter 5, and the above discussion of significance tests, an hypothesis can attain a very high or very low probability within a class of well-defined alternatives. Its probability within the class of all conceivable theories is neither large nor small; it is simply undefined because the class of all conceivable theories is undefined. In other words, Bayesian inference deals with determinate problems – not the undefined ones of Popper – and we would not have it otherwise.

So, you know, first of all this looks like sour grapes to me, “I can’t calculate this probability and actually, I wouldn’t want to.” Would you have thought that if you could calculate it? And second, I think there ARE universal priors over at least all computable hypotheses, and maybe all conceivable ones, so it’s kind of like, he missed an opportunity.

I don’t really expect Yudkowsky to do that. I think you see that in his approach to philosophical questions too, where if he says “this doesn’t have an answer,” he doesn’t think he’s done until he explains why we THINK it has an answer, why we’re looking for an answer in the first place. What kind of mind would have this philosophical dilemma.

I hadn’t seen that Yudkowsky comment and it kind of startles me.  I had always figured that people trying to construct a universal prior were trying to be completely strict about the, well, universality of it – it has to not smuggle in any information about our own universe.  It has to be able to handle any universe, any conceivable laws of physics (etc).

If you do allow information to be smuggled in like that, then the case for using a universal prior seems much weaker to me.  Why bother including in the prior a bunch of hypotheses that are clearly inapplicable to our own universe?  That will just slow learning down (not in the computation time sense, but in the sense that one will need more observations to learn any given thing).

A while ago I made a post about approximated AIXI playing Pac-Man, where I claimed that it was slower to learn the game than a human would be.  I then speculated that this was because it had to locate the game laws in a very large space of possibilities, where a human would come in with more background information about what a game should be like.  Humans can smuggle in information.

But here Yudkowsky is saying that he thinks the universe is simple for a particular definition of simplicity, which we can’t figure out a priori and can only learn if we look at the universe first.  But he still wants a universal prior weighted by simplicity, just using this definition of simplicity.  But now he’s smuggling in information learned from observations, so it seems like you could go further: you could say “the universe seems X” for any given X and then choose a prior that makes X likely.  By allowing our concept of simplicity to vary based on observations, we’ve already lost universality, so why stop at simplicity?

(It’s not clear what is even ruled out by restricting ourselves to “simplicity” – this is hand-wavey, but for any property of the universe X, the universe can be described more briefly if we let ourselves assume X first, and wouldn’t that count as simplicity?  In the extreme, we could just make a “simplicity prior” that puts probability 1 on the universe exactly as we understand it now, and say “according to this notion of simplicity, the universe is the simplest thing!”)
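To make the language-relativity concrete, here’s a toy sketch with made-up “description lengths” (real Kolmogorov complexity is uncomputable anyway): a 2^-K style prior over the same two hypotheses flips depending on which description language you pick.

```python
# Toy illustration: a "simplicity prior" weights each hypothesis by
# 2**(-description length), then normalizes.  Which hypothesis comes
# out "simplest" depends entirely on the description language.
# (Lengths are made up for illustration.)

def simplicity_prior(code_lengths):
    """Normalize 2**(-length) weights into a probability distribution."""
    weights = {h: 2.0 ** -n for h, n in code_lengths.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

lang_a = {"maxwell": 3, "thor": 10}   # language A: physics primitives are cheap
lang_b = {"maxwell": 10, "thor": 3}   # language B: "Thor" is a built-in primitive

prior_a = simplicity_prior(lang_a)
prior_b = simplicity_prior(lang_b)

assert prior_a["maxwell"] > prior_a["thor"]   # A calls Maxwell simpler...
assert prior_b["thor"] > prior_b["maxwell"]   # ...B calls Thor simpler
```

(The invariance theorem only says different languages agree up to an additive constant, which is no constraint at all when you’re comparing a handful of specific hypotheses.)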

another-normal-anomaly:

plain-dealing-villain:

shlevy:

theungrumpablegrinch:

shlevy:

We will cry “Justice!” and proclaim that rationality need not come as dear as [Bayesians] insist. More than this we shall argue that Bayesian principles cannot even be construed as an idealization of human rationality; in many cases applicable to the human condition, these principles disallow what is rational.

[popcorn-eating gif, snipped by nostalgebraist]

Link? (If it’s any good.)

http://www.cs.toronto.edu/~fbacchus/Papers/BKTSYN90.pdf (via @nostalgebraist)

How does someone this clueless commute without being hit by a bus daily?

This seems to be mixing different domains (descriptive, normative, mathematical, practical) wildly, and also to be completely stupid. IIRC, if you can construct a dutch book that someone with axiom-violating beliefs would consider to have a net payout of 0, you can construct one that they would consider to have a positive net payout, unless their deviation from the axioms is super tiny. Then they take it, and you take all their money. And then the paper goes on to say that it’s okay to be dutch-bookable because you might never meet anybody interested in taking advantage! If I knew more social engineering, I would try to sucker the paper authors out of a pile of money, but I suspect their common sense will protect them where their explicit beliefs let them down.

I don’t understand your objection.  The point the authors are making is that one can recognize a Dutch book simply by looking at the odds, and note that betting at those odds gives you a loss, no matter what beliefs one holds about the actual events.  Then one just rejects it because it gives a loss.

For instance, in the Dutch book example given here, you can simply look at the odds given and note that the bookie has created a money pump, and refuse to take the bets.  You can always do this, even if you believe that the bookie’s probabilities are “true” (in some sense of that phrase).

To avoid getting Dutch booked, you must only agree to betting odds that obey the probability axioms.  The question is then, do the betting odds you post have to reflect your degrees of belief?  What goes wrong if they don’t?

(In practice, I’m well aware that my subjective degrees of belief don’t follow the probability axioms and can’t given the nature of my brain, so I wouldn’t base my betting behavior on them as if they were probabilities – then I would be susceptible to Dutch books!)
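To make that concrete, here’s a sketch with made-up numbers: an agent whose posted fair prices for a claim and its negation sum to more than 1 can see, just by adding the prices, that buying both bets loses money in every possible outcome.

```python
# Dutch book sketch: fair prices for "A" and "not A" that sum to
# more than 1 violate the probability axioms.  Buying both bets
# (each paying $1 if it wins) then loses money no matter how A
# turns out -- and the agent can detect this from the prices alone,
# before placing any bet.

price_A, price_not_A = 0.70, 0.50    # incoherent: 0.70 + 0.50 > 1

for a_is_true in (True, False):
    payout = 1    # exactly one of the two bets pays $1, whichever way A goes
    agent_net = payout - (price_A + price_not_A)
    print(f"A={a_is_true}: agent nets {agent_net:+.2f}")   # -0.20 both times
    assert agent_net < 0    # guaranteed loss in both outcomes
```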

(via another-normal-anomaly)

theungrumpablegrinch:

shlevy:

theungrumpablegrinch:

shlevy:

We will cry “Justice!” and proclaim that rationality need not come as dear as [Bayesians] insist. More than this we shall argue that Bayesian principles cannot even be construed as an idealization of human rationality; in many cases applicable to the human condition, these principles disallow what is rational.

[popcorn-eating gif, snipped by nostalgebraist]

Link? (If it’s any good.)

http://www.cs.toronto.edu/~fbacchus/Papers/BKTSYN90.pdf (via @nostalgebraist)

A decent case against Dutch Books, but the paper falls down when it comes to conditionalization. Yes, you can add more information to a thought experiment such that the rational outcome is different. Misses the point.

The exposition is definitely muddled, but I think there is a bit more going on than that.

The running example seems like it is about adding information to a thought experiment, but I don’t think it’s trivial to deal with the modified experiment in a Bayesian framework.  I think the point they’re making is that in practice, one doesn’t get a prior out of nowhere, one constructs it somehow from available information, and later observations may lead one to conclude that one’s construction procedure should have been different.

The Bayesian approach would be to try to pack this information into a more general prior – there is this outcome, E, and one would write down a prior so that all of the probabilities conditioned on E are adjusted in light of the information about prior construction one would learn from E.  In other words, you choose your prior so that the beliefs that will result from it by conditionalization will seem reasonable.

If you just stipulate that you can do this, then conditionalization is fine, but you’ve assumed the conclusion – you’ve assumed you can construct a prior such that the results of conditionalization will always be reasonable.

But if we don’t take this as given, it’s not clear that it is possible even in principle.  If I have any technique for prior construction that involves some tunable parameters, some feature that is not just “inarguably correct,” then I have to have a meta-prior over those parameters, so that the support of my overall prior contains situations where I learn information like E that would affect my choice of prior construction.  But then I have to construct my meta-prior over the parameters, which I will construct using some technique, and I’ll need a meta-meta-prior over the parameters of that technique, and so we get an infinite regress.
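A toy version of the first step of that regress, with made-up numbers: a hyperparameter picks between two candidate prior constructions, and observing E shifts weight between the constructions themselves – which just relocates the problem to choosing the meta-prior.

```python
# Toy hierarchical prior: uncertainty about "how to construct the
# prior" is pushed up one level.  A hyperparameter theta selects one
# of two candidate priors over hypotheses; the overall prior mixes
# them, and evidence E then updates our beliefs about the
# constructions themselves.  (All numbers invented for illustration.)

priors = {
    "theta_a": {"h1": 0.9, "h2": 0.1},   # construction A favors h1
    "theta_b": {"h1": 0.2, "h2": 0.8},   # construction B favors h2
}
meta_prior = {"theta_a": 0.5, "theta_b": 0.5}   # ...itself needing a meta-meta-prior, etc.

lik = {"h1": 0.1, "h2": 0.7}   # likelihood of evidence E under each hypothesis

# Posterior over theta: P(theta|E) proportional to P(theta) * sum_h P(h|theta) P(E|h)
joint = {t: meta_prior[t] * sum(p[h] * lik[h] for h in p)
         for t, p in priors.items()}
total = sum(joint.values())
post_theta = {t: j / total for t, j in joint.items()}

# E favored h2, so the construction that made h2 likely gains weight:
assert post_theta["theta_b"] > post_theta["theta_a"]
```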

(This could be avoided if at some meta-level there is only one possible prior, but that doesn’t seem especially likely.  Objective Bayesianism seems like it would avoid things like this, but it is a hard thing to do even in theory, much less in practice.)


OTOH, I think the Scandinavians example just doesn’t work.  The authors don’t state it outright, but you can derive that in the distribution they give, P(Swede|Stat) = P(Nor|Stat).  This seems strange, since Stat states that “80% of all Scandinavians are Swedes,” but is justified by stipulating that we know P(Stat|Swede) and P(Stat|Nor) have the values necessary to make P(Swede|Stat) = P(Nor|Stat).

Ordinarily, you’d think that learning Stat would teach us a lot about Swede and Nor, but not the other way around: Peterson is just one guy among millions, so base rates tell us things about him, but facts about him don’t tell us much about base rates.  If facts about him implied nothing about the base rates, we’d have P(Stat|Swede) = P(Stat|Nor).  But this is far from true.
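To see how strong that stipulation is, here’s the arithmetic with illustrative numbers (not the paper’s): equal posteriors force the likelihood ratio to exactly cancel the base rates.

```python
# By Bayes' rule, P(Swede|Stat) = P(Nor|Stat) requires
#   P(Stat|Swede) * P(Swede) = P(Stat|Nor) * P(Nor),
# i.e. the likelihoods must exactly cancel the base rates.  With
# illustrative base rates of 0.8 / 0.2, P(Stat|Nor) must be 4x
# P(Stat|Swede) -- far from the "one guy among millions tells you
# nothing about base rates" default ratio of 1.

p_swede, p_nor = 0.8, 0.2      # prior base rates (illustrative)
p_stat_given_swede = 0.1       # stipulated likelihood (illustrative)
p_stat_given_nor = p_stat_given_swede * p_swede / p_nor   # forced to 0.4

p_stat = p_stat_given_swede * p_swede + p_stat_given_nor * p_nor
p_swede_given_stat = p_stat_given_swede * p_swede / p_stat
p_nor_given_stat = p_stat_given_nor * p_nor / p_stat

assert abs(p_swede_given_stat - p_nor_given_stat) < 1e-12   # equal posteriors
```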

Maybe we are playing some game with Peterson, who is a statistician, and the game allows one to tell truths and lies in certain patterns, so that if Peterson tells us Stat or ~Stat, we think he’d know the truth of the matter, but also he might be lying for game purposes, and likewise if Peterson tells us Nor or Swede.  (Or something.)

But if we’re playing this game, then it’s no longer clear that upon learning Stat, we should update P(Swede) and P(Nor) to the base rates given by Stat.  After all, the rules of the game tie together Stat, Swede and Nor in an unusual way.  Learning that Peterson was telling the truth when he claimed Stat may affect our opinion of whether he was telling the truth when he said (for instance) Nor.


Not sure what I think of the drunkenness example.  Knowing that one will predictably have incorrect beliefs at a later time seems like a special case, but then it’s a special case that occurs in real life and Bayesians should have some way of handling it.

(via theungrumpablegrinch)

shlevy:

Am I the only one who finds dutch-book style arguments in favor of Bayesian/utilitarian reasoning utterly underwhelming? You can’t just assume I can assign a fair price to all possible beliefs, that’s the entirety of my objection to the scheme!

I do too.  In the Bayes case, they’re a justification for the update rule rather than for the representation of beliefs by probabilities, and my main objections are all about the latter.

Even if you grant that part, Dutch Book arguments are still kind of strange – they seem to be saying that you can only bet rationally if you insure against the worst case.  In other words, you have to use minimax, which makes some sense when playing a game against an opponent who wants you to lose, but is not intuitive as a general principle of decision-making.  (The paper Against Conditionalization (PDF) talks about all this in a fun if rambling way.)

lisp-case-is-why-it-failed:

nostalgebraist:

lisp-case-is-why-it-failed:

nostalgebraist:

A year ago, I remember being baffled by Eliezer Yudkowsky’s statement that he was working on “an attempted successor to Wikipedia and/or Tumblr and/or peer review.”  Today I saw a link to a site called https://arbital.com/ which appears to be the thing he was talking about (I don’t know how long it has been up, but I only became aware of it today).

The basic idea is that Arbital will provide explanations of concepts for readers at a variety of background knowledge levels, all in one place.  This seems like a good idea to me – I’m sure we have all had the experience of Googling around and finding a bunch of explanations that one doesn’t know how to compare, which all use different terminology, none of which is quite adequate for one’s purposes.

The way the site implements this is kind of awkward.  When you look up a concept (example), you are presented with a quiz about your background knowledge.  After you fill out the quiz, you get a list of explainer articles, like a lesson plan, ostensibly tailored to your background knowledge.  If the plan works, everything is fine, but if it doesn’t, it’s not easy to find other sub-explainers that might work, or in general to get a sense of what content the site actually contains (in total).

That aside, as I said, the basic concept sounds good.  However, I don’t think the site will ever live up to its ambitions.  Those ambitions include:

We want to do for difficult explanations - and someday, complicated arguments in general - what Wikipedia did for centralizing humanity’s recounting of agreed-on facts.

There’s a certain sleight of hand going on in this sentence.  “Agreed-on” only occurs in the description of Wikipedia, implying that explanations don’t need to be “agreed-on” in the same way, or to the same extent.  But then what exactly are we reading when we read an Arbital explanation?  Something that is somehow authoritative, or just some particular person’s opinion about how to conceptualize a topic?  If (say) the choice of explanatory metaphor for a topic is controversial, which metaphor will Arbital choose?

This problem is already evident in Arbital’s example of its approach, its section on Bayes’ Rule.  Arbital provides several different explanations here, but they are all basically rewrites of Yudkowsky’s earlier Bayes explainers, and inherit their problems, like slipping from synchronic to diachronic without noting the difference (e.g. merely using conditional probabilities in a known population is referred to as “updating” from a “prior”).  This is neither an unbiased explanation nor (IMO) a good one.

By contrast, the very messiness of Wikipedia is a positive here.  The page on a topic often mixes together a bunch of different views and levels of technicality, which is good insofar as it gives the reader a sense of the range of things that have been thought and said on the topic.  The same “messiness” is apparent in a Google search or a trawl through academic literature, but there too it’s a good thing, and for the same reason.  When trying to understand a nontrivial subject, one should never be satisfied with just a single explainer article – but that’s what Arbital wants to provide.

(I should also say that I think Wikipedia kind of already solves the “multiple levels of background knowledge” problem.  Although it rarely has technical and non-technical versions of the same article, it does allow the user to look up many specific terms they don’t understand, essentially allowing the reader to specify their background knowledge in a fine-grained way.  If you know every term on a Wikipedia page except, say, “Nestorianism,” well, you can just click that word.

By contrast, Arbital can only respond to backgrounds that fit into its coarse-grained framework.  If the Arbital explainer that’s ostensibly “on your level” uses a term you don’t know, the best you can do on Arbital is go back, request a lower-level explainer of the same subject, and hope that that term is explained somewhere in the new lesson plan.  In practice one would probably look the term up somewhere else rather than continuing to use Arbital, but this itself demonstrates some of the limitations of the premise – compare to Wikipedia, where you can happily browse within the closed system for hours.)

Could you explain what’s wrong about calling using conditional probabilities in a known population updating from a prior? Is it because there’s no prior to be had? Is it just a complete misuse of the terms? I’m not trying to challenge you here, I just don’t know statistics.

Sure.  This topic can be confusing because on the one hand there is “Bayesianism,” which is a whole philosophy of inference that not everyone subscribes to, and on the other hand there’s Bayes’ Rule (or Bayes’ Theorem), which is just a rearrangement of the definition of conditional probability – and thus something you have to “subscribe to,” since it’s just mathematically true.

Despite the name, Bayesianism isn’t exactly about Bayes’ Rule per se.  It’s (roughly) the idea that you should represent your beliefs at any given time by a probability distribution over possibilities, and change your beliefs when you observe something new by replacing your probabilities with their values conditional on the thing you saw.  Like, if your level of belief in a was P(a) before, and now you’ve seen b, your level of belief becomes P(a|b).  And this is called a “Bayesian update.”

You usually end up using Bayes’ Rule to compute these conditional probabilities, which is how Bayesianism got the name.

So now, say you have a known population – like, say you have a bunch of blocks of different shapes, and on the whole 40% of them are red, but 80% of the triangular ones are red.  And if I ask about the probability of a block being red, conditional on it being triangular, you’d say 80%.  But this is just basic probability stuff that no one disagrees with, not something that depends on any of the ideas in the second paragraph in this post.  Calling it an “update” makes it sound like a specifically Bayesian thing.
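Here’s that example as plain counting, just to emphasize that nothing beyond arithmetic on the known population is involved:

```python
# The blocks example as counting: with the whole population known,
# P(red | triangle) is just a ratio of counts -- no prior, no update
# rule.  (Counts chosen to match the percentages above: 40% red
# overall, 80% of triangles red.)

blocks = (
    [("triangle", "red")] * 40 +    # 40 of the 50 triangles are red
    [("triangle", "blue")] * 10 +
    [("square", "blue")] * 50       # no red squares
)

red_total = sum(1 for shape, color in blocks if color == "red")
triangles = [color for shape, color in blocks if shape == "triangle"]

p_red = red_total / len(blocks)                            # 40/100 = 0.40
p_red_given_tri = triangles.count("red") / len(triangles)  # 40/50  = 0.80

assert abs(p_red - 0.40) < 1e-12
assert abs(p_red_given_tri - 0.80) < 1e-12
```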

(In particular, if you are not a Bayesian, you can talk about the probabilities associated with a single block and change your opinions about them over time, but you will never say “my degree of belief that this block is red is X%” – you have no degrees of belief to update.)

Okay, that makes some sense? I agree that it’s a little silly to call the second thing an “update”, but I’m unclear on the actual distinction between Bayesians and everyone else. Is it a philosophical argument over how to define probability (Bayesians think they’re degrees of belief, other people think different stuff)? I thought there were also technical arguments between the various camps?

IMO, the point of most contention is not exactly that Bayesians think probabilities “are” degrees of belief, it’s that Bayesians think degrees of belief should ideally be represented as probabilities.  They tend to think that this is the way a “perfect inductive reasoner” would do things, and that statistics only makes sense if you start from this foundation.

(This philosophical school of thought is most closely associated with E. T. Jaynes; there are also more “practical” Bayesians who just like statistical methods derived from the Bayesian framework.)

Critics tend to point out the many technical problems with representing degrees of belief as probabilities, such as the question of how you assign probability mass to possibilities you haven’t thought of yet, or how you work with spaces that are difficult to put measures on (like “conceivable physical laws”), or the question of how you can represent “I don’t know about this thing” in a probability distribution without thereby implying belief about some other thing you don’t know either.  (This is kind of hilariously difficult, and certain people have devoted a lot of work to figuring out how to state “seriously, I have no idea” in properly Bayesian terms – look up “objective Bayesianism” if you’re curious.)

(via just-evo-now)


tentativelyassembled:

nostalgebraist:

theaudientvoid:

dagny-hashtaggart:

marcusseldon:

Yudkowsky so grinds my gears. He just reeks of con artistry and/or crankery.

I’ve never gotten a con artist vibe from him; he’s into too much stuff that limits his appeal. I’m with you on crank, though. As someone on the periphery, I think a lot of the core LW group don’t get how his writing looks to people on the outside. I know Scott has expressed frustration at the amount of flak rationalism in general has caught over EY’s position on quantum mechanics, and inasmuch as I agree with a lot of rationalism and would like to see its ideas spread I can see where he’s coming from, but it’s not like Yudkowsky has to include asides to the effect that you can’t really be rational if you don’t believe in Many Worlds in unrelated articles. (See also: Bayesianism vs. Frequentism as an element of in-group solidarity.) That puts people off, and I can see why.

As a statistics student who is agnostic with regard to the whole Bayesian/Frequentist thing, Yudkowsky’s burning devotion to Bayes is one of the more off-putting things about him (conversely, I have no issues with Nate Silver’s advocacy for Bayesian statistics as superior, mainly because he’s an actual statistician, who does actual statistics stuff, and is therefore entitled to have opinions about how useful various statistical tools are).

The Bayes thing is especially strange to me because it doesn’t seem to underlie anything important in what he says or does.  There are a few points where he seems to get something substantive out of it, like when he says that hypotheses shouldn’t all be treated as equally likely before someone tests them scientifically (because we have other sources of information), but these points seem kind of obvious and could be found by other means.

It seems like E. T. Jaynes convinced him that Bayesianism was the only complete, self-consistent way to do inference and he never looked back – he wouldn’t be the only one – but it’s not clear that this change has had any real consequences.

The Bayes thing doesn’t throw me, but I suspect that’s at least in part because I never looked too closely at why EY was very pro-Bayesianism - I just skipped over it because I’m a statsy person who is a pretty big fan of Bayes myself (not particularly capable of justifying it well though, don’t ask). 

The Many Worlds stuff and the quantum mechanics though - it’s not even that I disagree with it - it’s just, why ???

His answer to that question (or the one I know of, there may be others) is in this post:

I wanted a very clear example – Bayes says “zig”, this is a zag – when it came time to break your allegiance to Science.

By “Bayes” here he means Bayes with a prior that says more complex hypotheses are less likely, and by “Science” (capital S) he means the kind of casual hypothetico-deductivism that scientists tend to use in practice (“a hypothesis has to make predictions,” etc).

His point is actually really simple: he thinks that if there are multiple explanations that explain the data equally well (i.e. make the same predictions), you should say the simplest one is correct, rather than saying they’re all equally valid or just using the one that’s traditional.

This is a sensible enough idea, although not a uniquely “Bayesian” one*, but it’s odd that he made such a simple point by digressing at length into QM in the service of a controversial claim.  Here is why he says he had to do it that way:

In physics, you can get absolutely clear-cut issues.  Not in the sense that the issues are trivial to explain.  But if you try to apply Bayes to healthcare, or economics, you may not be able to formally lay out what is the simplest hypothesis, or what the evidence supports.  But when I say “macroscopic decoherence is simpler than collapse” it is actually strict simplicity; you could write the two hypotheses out as computer programs and count the lines of code. Nor is the evidence itself in dispute.

This makes a certain amount of sense if you are as confident about Many-Worlds as he is: he thinks people are making important mistakes as a consequence of not using his particular stance on philosophy of science, and he wants to exhibit one such mistake, so he chooses an area with minimum ambiguity.


*(My justification for this statement is more technical than the rest of this post, so I’m putting it in a footnote.  In machine learning terms the prior is like a “regularizer” that penalizes model complexity, and there are good reasons for penalizing model complexity whether you’re a Bayesian or not.  EY seems to believe that Bayes with a Solomonoff prior is the perfect way to penalize model complexity, or would be if we could do it computably, but it’s not clear why he thinks so, and he’s writing as though only Bayesians care about model complexity, which is just plain false.)
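(To make the footnote concrete, here’s a toy sketch of the standard example: ridge regression, i.e. least squares with an L2 complexity penalty.  Nothing in the code below is Bayesian, yet the result happens to coincide with the MAP estimate under a Gaussian prior on the weights, which is exactly the sense in which “prior” and “regularizer” are two names for the same penalty.  All the numbers and names here are made up for illustration.)

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Least squares with an L2 penalty lam * ||w||^2 (ridge regression).

    Nothing Bayesian here, yet the solution coincides with the MAP
    estimate under a Gaussian prior on w: the "prior" and the
    "regularizer" are the same complexity penalty under two names.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
w_true = np.zeros(10)
w_true[0] = 1.0                        # the underlying model is "simple"
y = X @ w_true + 0.5 * rng.normal(size=30)

w_ols = ridge_fit(X, y, lam=0.0)       # no complexity penalty
w_ridge = ridge_fit(X, y, lam=10.0)    # penalized fit

# The penalized fit has a smaller weight norm overall -- it prefers
# "simpler" (lower-norm) hypotheses, Bayesian interpretation optional.
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```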


Unscrambling the second law of thermodynamics →

exsecant:

nostalgebraist:

Just to complicate things, and because this is like the Platonic form of a post I would make:

Cosma Shalizi is skeptical of Jaynes’ interpretation of the second law, saying it would actually imply that entropy decreases over time.

And Eliezer Yudkowsky is skeptical of Shalizi’s argument.  (See discussion here).

(I remember being confused by Shalizi’s argument when I first read it.  I should read it again.)

Argh, on the first page of Shalizi’s paper and already I’m lost. (Which is why I’m in the “grab bag of random physics topics that someone with nothing but multivariable calculus and a good imagination could handle” class and not advanced statistical mechanics. :p)

I think I get what the 3 main assumptions he makes at the top of page 2 are. The evolution operator, the function on phase space that says what happens when you go forward in time, has an inverse. Bayes’ theorem applies as usual and is used to update probabilities of a given amount of matter being in a given state (How widely are microstates defined? I don’t think he gave a specific definition. Is it a single object’s position in state space? What are we defining as a single object/unit of matter, here?) Thermodynamic entropy at time T is equal to the information content of the uncertainty distribution at time T. (Hopefully my not-so-strictly defined ideas of “information” and “uncertainty” are enough to get the general gist of what he’s saying.)

I’m predicting that the very, very general form of his argument is going to look something like, “under Bayesian inference uncertainty decreases the longer you observe something and the more you update your probabilities, and entropy as defined in assumption 3 decreases with uncertainty, which is clearly not actually happening.” (I don’t know where the existence of the inverse of the evolution operator comes in.) It could take me anywhere between a week and 5+ years to wrap my head around what Shalizi is actually saying, though. And much, much longer than that to figure out if he’s right.

Having now re-read the paper, yes – Shalizi just gives a formal version of the argument you describe in your last paragraph, and the extra formalism doesn’t really add anything.

The article you linked talked about a smaller red region and a larger green region, and about how regions don’t expand or contract over time, which means that the red region will probably end up inside the green region, but not vice versa.

This is all true.  What Shalizi is pointing out is that, since regions don’t expand or contract over time, your actual prediction for where the red region will be at a later time can’t be as big as the whole green region.  It’s just some other small region somewhere.

OK, so what?  Well, the article talks about how entropy is about “what you know.”  “It’s in the green region” is a less certain statement than “it’s in the red region,” so saying a thing will move from red to green is a good bet in the way that the opposite wouldn’t be.  So one way to look at “entropy always increases,” says the article, is that vaguer claims about the future are more likely to be borne out than specific claims.

But wait.  There was nothing in nature that forced you to make a vague claim about the future.  If you just try to predict where the red region will be in five minutes, you’ll get back an answer exactly as big as the red region (”regions don’t expand or contract”).  “It’s in the green region” is a safe bet, yes, but what is requiring us to make bets safer than we need to?

If you ask where I’ll be if I walk north ten feet from my apartment complex, I’ll answer a specific location: ten feet north from my apartment complex.  It’s true that I could also answer “in the United States” or “in the Milky Way galaxy.”  Those are safe bets.  But the fact that I could give these answers isn’t some principle of nature implying that my position somehow gets vaguer (!) over time.

This is what Shalizi is referring to with his “reversible dynamics.”  We’re trying to get some principle of increasing vagueness out of physics which keeps vagueness (”region size”) constant over time.  Saying “you’re more uncertain about the future” won’t cut it, because according to deterministic physics, you aren’t.
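(If you want to see “regions don’t expand or contract” in miniature, here’s a toy example using the linear part of Arnold’s cat map, a standard area-preserving dynamical system.  Its Jacobian determinant is 1, so it carries any region to a region of exactly the same area; the specific map and numbers are just for illustration, not anything from Shalizi’s paper.)

```python
import numpy as np

# The linear part of Arnold's cat map: determinant 1, so it maps any
# region to a region of the same area -- Liouville's theorem in
# miniature. Your forecast of "where the red region goes" is just
# another region of the same size, never the whole green region.
M = np.array([[2.0, 1.0],
              [1.0, 1.0]])

# Corners of a small square "red region"
square = np.array([[0.0, 0.0], [0.1, 0.0], [0.1, 0.1], [0.0, 0.1]])

def polygon_area(pts):
    """Shoelace formula for the area of a polygon given its vertices."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

evolved = square @ M.T        # push the region forward one time step

print(polygon_area(square))   # 0.01
print(polygon_area(evolved))  # 0.01 -- same area, different shape
```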

Shalizi’s point about the Bayesian update is almost tangential here, but still worth mentioning.  He’s saying, yes, that when you make an observation, you tend to get more certain, not less.
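(This part is easy to check directly: expected posterior entropy is never greater than prior entropy, a standard information-theory fact.  Here’s a minimal sketch with a made-up two-hypothesis coin example; the particular probabilities are arbitrary.)

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def posterior(prior, likelihood):
    """Bayes' rule: posterior proportional to prior times likelihood."""
    unnorm = prior * likelihood
    return unnorm / unnorm.sum()

# Two hypotheses about a coin: fair vs. biased toward heads.
prior = np.array([0.5, 0.5])
p_heads = np.array([0.5, 0.9])   # P(heads | hypothesis)

# Average the posterior's entropy over the two possible observations,
# weighted by how likely each observation is under the prior.
p_obs_heads = np.sum(prior * p_heads)
post_h = posterior(prior, p_heads)       # what we'd believe after heads
post_t = posterior(prior, 1 - p_heads)   # what we'd believe after tails
expected_post_entropy = (p_obs_heads * entropy(post_h)
                         + (1 - p_obs_heads) * entropy(post_t))

# On average, observing the flip leaves us *more* certain, not less.
print(entropy(prior), expected_post_entropy)
```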