Yudkowsky is better than Jaynes at keeping his eye on the prize. He’s less likely to defend an approach by redefining what the approach should accomplish. Kinda like what he says: ‘the utility function is not up for grabs.’
I was thinking that after reading things each of them said about Solomonoff induction. I was wondering why Yudkowsky thinks that 2^-K(H), the Solomonoff prior, is a good prior compared to all the other priors you could put on the space of all computable hypotheses, or even compared to the other priors you could build out of Kolmogorov complexity. For a long time I imagined that he had encountered some argument for it, analogous to Cox’s theorem for the probability laws, or Jaynes’s derivation of Laplace’s principle of indifference. This doesn’t seem to be the case, though; here’s an old LessWrong comment of Yudkowsky’s:
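(To make the object under discussion concrete, here’s a rough sketch of my own, not anything from the comment: what a 2^-K(H)-style prior looks like if you swap the uncomputable K out for a computable stand-in, here compressed length. Every name and example hypothesis below is made up for illustration.)

```python
import zlib

def complexity_bits(hypothesis: bytes) -> int:
    """Proxy for Kolmogorov complexity: length in bits of the
    zlib-compressed description. K itself is uncomputable, so any
    runnable version has to substitute a computable stand-in."""
    return 8 * len(zlib.compress(hypothesis, 9))

def solomonoff_style_prior(hypotheses):
    """Weight each hypothesis by 2^-K(H) (with the proxy above),
    then normalize over the finite list we actually have."""
    weights = {h: 2.0 ** -complexity_bits(h) for h in hypotheses}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

hyps = [
    b"x = 1",
    b"x alternates 1, 0, 1, 0 forever",
    b"x is the digits of pi xored with noise from seed 3141592",
]
prior = solomonoff_style_prior(hyps)
# shorter descriptions get exponentially more prior mass
```

Note that the choice of compressor here plays exactly the role being argued about above: a different compressor is a different notion of “simplicity,” hence a different prior.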
Well, the ideal simplicity prior you should use for Solomonoff computation, is the simplicity prior our own universe was drawn from.
Since we have no idea, at this present time, why the universe is simple to begin with, we have no idea what Solomonoff prior we should be using. We are left with reflective renormalization - learning about things like, “The human prior says that mental properties seem as simple as physical ones, and that math is complicated; but actually it seems better to use a prior that’s simpler than the human-brain-as-interpreter, so that Maxwell’s Equations come out simpler than Thor.” We look for simple explanations of what kinds of “simplicity” our universe prefers; that’s renormalization.
Does the underspecification of the Solomonoff prior bother me? Yes, but it simply manifests the problem of induction in another form - there is no evasion of this issue, anyone who thinks they’re avoiding induction is simply hiding it somewhere else. And the good answer probably depends on answering the wrong question, “Why does anything exist in the first place?” or “Why is our universe simple rather than complicated?” Until then, as said, we’re left with renormalization.
And that contrasts, for me, with Jaynes: I don’t think I’ve ever seen Jaynes say he was dissatisfied with his solution to some problem or other, the way Yudkowsky here says he’s bothered that he can’t rule out other priors.
So, Jaynes actually says something sort of related to universal priors, which is that he doesn’t seem to think they exist:
As we showed in connection with multiple hypothesis testing in Chapter 4, Newton’s theory in Chapter 5, and the above discussion of significance tests, an hypothesis can attain a very high or very low probability within a class of well-defined alternatives. Its probability within the class of all conceivable theories is neither large nor small; it is simply undefined because the class of all conceivable theories is undefined. In other words, Bayesian inference deals with determinate problems – not the undefined ones of Popper – and we would not have it otherwise.
So, you know, first of all this looks like sour grapes to me, “I can’t calculate this probability and actually, I wouldn’t want to.” Would you have thought that if you could calculate it? And second, I think there ARE universal priors over at least all computable hypotheses, and maybe all conceivable ones, so it’s kind of like, he missed an opportunity.
I don’t really expect Yudkowsky to do that. I think you see that in his approach to philosophical questions too, where if he says “this doesn’t have an answer,” he doesn’t think he’s done until he explains why we THINK it has an answer, why we’re looking for an answer in the first place. What kind of mind would have this philosophical dilemma.
I hadn’t seen that Yudkowsky comment and it kind of startles me. I had always figured that people trying to construct a universal prior were trying to be completely strict about the, well, universality of it – it has to not smuggle in any information about our own universe. It has to be able to handle any universe, any conceivable laws of physics (etc).
If you do allow information to be smuggled in like that, then the case for using a universal prior seems much weaker to me. Why bother including in the prior a bunch of hypotheses that are clearly inapplicable to our own universe? That will just slow learning down (not in the computation time sense, but in the sense that one will need more observations to learn any given thing).
A while ago I made a post about an AIXI approximation playing Pac-Man, where I claimed that it was slower to learn the game than a human would be. I then speculated that this was because it had to locate the game’s laws in a very large space of possibilities, where a human would come in with more background information about what a game should be like. Humans can smuggle in information.
But here Yudkowsky is saying that he thinks the universe is simple for a particular definition of simplicity, which we can’t figure out a priori and can only learn if we look at the universe first. But he still wants a universal prior weighted by simplicity, just using this definition of simplicity. But now he’s smuggling in information learned from observations, so it seems like you could go further: you could say “the universe seems X” for any given X and then choose a prior that makes X likely. By allowing our concept of simplicity to vary based on observations, we’ve already lost universality, so why stop at simplicity?
(It’s not clear what is even ruled out by restricting ourselves to “simplicity” – this is hand-wavey, but for any property of the universe X, the universe can be described more briefly if we let ourselves assume X first, and wouldn’t that count as simplicity? In the extreme, we could just make a “simplicity prior” that puts probability 1 on the universe exactly as we understand it now, and say “according to this notion of simplicity, the universe is the simplest thing!”)
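(That hand-wavey “assume X first” point has a crude compression analogue, sketched below with zlib’s preset-dictionary feature standing in for conditional description length. This is my own toy, not a real complexity measure, and the “laws” string is invented for illustration.)

```python
import zlib

def description_bits(data: bytes, assumed: bytes = b"") -> int:
    """Bits needed to describe `data` to a receiver who already holds
    `assumed` as shared background (supplied as a zlib preset
    dictionary). A crude stand-in for conditional description
    length K(data | X)."""
    if assumed:
        comp = zlib.compressobj(level=9, zdict=assumed)
    else:
        comp = zlib.compressobj(level=9)
    return 8 * len(comp.compress(data) + comp.flush())

# A toy "universe" that is cheap to describe once you assume its laws:
laws = b"F = ma; E = mc^2; Maxwell's equations hold everywhere. "
universe = laws * 4

plain = description_bits(universe)
given_laws = description_bits(universe, assumed=laws)
# given_laws comes out smaller: letting yourself assume X first
# shortens the remaining description, for nearly any X
```

Which is the worry in a nutshell: “the universe is simple given X” is nearly free for any X you’ve already peeked at, so it doesn’t constrain the choice of prior the way universality was supposed to.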
