raginrayguns replied to your quote “@nostalgebraist I interpreted your LI post as arguing that it wasn’t…”
I think I've said this before, but I think if I were in this field, I'd be approaching from the opposite direction. I think Bayesianism came from, "hmm, why does probability theory seem to reproduce a lot of qualitative good advice about reasoning, when it's just counting possibilities? Is there another way to look at it?" I'd be trying to get something that works in SOME situations, and something else that works in OTHER situations, etc., and start from there toward a general solution.
This sounds right, although some cases are harder because you don’t have anything sitting around that does mysteriously well at the problem. Like, clearly there are some things (say, humans) that can “do practical logical induction” pretty well, but we don’t have any prototype case that’s simple enough to analyze.
Maybe related: after I made that post, I thought “hmm, but do I have any positive examples of how to think about ideal behavior under constraints?” And I thought, well, VC theory and PAC learning seem like examples of what I want. These ideas start out with the observation that real “learners” (humans, ML algorithms) can do pretty well at inferring functions from finitely many examples, even though you can’t really be good at this in the fully general case where a function is just an arbitrary mapping between sets and f(x) tells you nothing about f(x’). So they ask, how would the functions have to be restricted to make learning feasible? And what sort of success metric captures the success of the real learners, without being too forgiving or too stringent?
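To make the "arbitrary mapping" point concrete, here's a toy sketch (my own example, not anything from the VC/PAC literature specifically): on a tiny domain, if every 0/1 labeling counts as a possible function, the functions consistent with your training data split evenly on an unseen input, so the data tells you literally nothing about it. Restrict to a small class (monotone thresholds, here) and the same data suddenly pins the unseen label down.

```python
from itertools import product

# Toy illustration: a 6-point domain, with four observed (x, f(x)) pairs.
domain = list(range(6))
labels = {0: 0, 1: 0, 2: 0, 4: 1}   # training examples
held_out = 5                         # an unseen input

# Case 1: "arbitrary functions" -- every 0/1 assignment to the domain.
all_fns = [dict(zip(domain, bits))
           for bits in product([0, 1], repeat=len(domain))]
consistent = [f for f in all_fns
              if all(f[x] == y for x, y in labels.items())]
votes = [f[held_out] for f in consistent]
# The consistent functions split evenly on f(5): the data is useless.
print(votes.count(0), votes.count(1))   # -> 2 2

# Case 2: restricted class of thresholds, f_t(x) = 1 iff x >= t.
thresholds = [{x: int(x >= t) for x in domain}
              for t in range(len(domain) + 1)]
t_consistent = [f for f in thresholds
                if all(f[x] == y for x, y in labels.items())]
t_votes = [f[held_out] for f in t_consistent]
# Now every threshold consistent with the data agrees that f(5) = 1.
print(t_votes.count(0), t_votes.count(1))  # -> 0 2
```

Same data both times; the only thing that changed is how the function class was restricted, which is exactly the knob VC theory turns.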
So, these ideas start out by noting that there are versions of your problem that are too hard or too easy, and trying to find the “right scale” where some practical methods look better than others, rather than looking equally bad or good, or too coarsely distinguished.
Actually… now that I think about it, these are theories of online supervised learning, and logical induction is trying to do online supervised learning for logic. So, this isn’t just an analogy — these things are directly comparable. Just like with PAC learning, you could say: “if logic were just an arbitrary assignment of binary labels to sentences, then there would be no patterns to learn and the problem would be impossible. So, what properties does logic have that could make it more learnable? And how weak does our notion of learnability have to be?” Maybe you’ll find that PA is fundamentally too hard but some more restricted system is okay, maybe you’ll find that PAC is too hard but a weaker learning concept is appropriate, that sort of thing.
Viewed from the online supervised learning perspective, MIRI’s criterion essentially says, “you should use a nonparametric method, because a parametric method will have some blind spots that persist no matter how much data has come in, and someone without these blind spots could make winning trades against your parametric model forever.”
And this is … a fair if limited point about machine learning, although it has nothing really to do with logic! Like, in a nonlinear online learning problem, someone using a random forest that grows with N could pump money out of someone using logistic regression, and they could do this forever. This is true but it’s not the only fact that’s ever relevant about these two methods. You can’t make a good general theory of online supervised learning out of just this one distinction, and it’s not clear why this distinction would be any more (or less) important for predicting logic than for anything else.
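The random-forest-vs-logistic-regression point can be sketched numerically. This is a made-up minimal version (a growing histogram standing in for any nonparametric method, online SGD logistic regression as the parametric one, and a target I chose to be nonlinear): the parametric model's per-step loss bottoms out at a floor it can never cross, so the cumulative gap grows forever, which is the "pump money out of them" phenomenon.

```python
import math
import random

random.seed(0)

# Nonlinear target: y = 1 iff x falls in a middle interval. A 1-D
# logistic model is monotone in x, so it has a permanent blind spot here.
def label(x):
    return 1 if 0.3 < x < 0.7 else 0

# Parametric learner: online logistic regression via SGD.
w, b, lr = 0.0, 0.0, 0.5

# Nonparametric learner: histogram whose bin count grows with n
# (roughly n^(1/3)), keyed by (bin_count, bin_index).
hist = {}

loss_param = loss_nonparam = 0.0
for n in range(1, 20001):
    x = random.random()
    y = label(x)

    # Parametric: predict, score (log loss), update.
    p = 1 / (1 + math.exp(-(w * x + b)))
    loss_param += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    w += lr * (y - p) * x
    b += lr * (y - p)

    # Nonparametric: Laplace-smoothed estimate in a shrinking bin.
    bins = max(2, round(n ** (1 / 3)))
    k = min(int(x * bins), bins - 1)
    ones, tot = hist.get((bins, k), (0, 0))
    q = (ones + 1) / (tot + 2)
    loss_nonparam += -(y * math.log(q) + (1 - y) * math.log(1 - q))
    hist[(bins, k)] = (ones + y, tot + 1)

# The logistic model's cumulative loss pulls ahead and never stops
# growing relative to the histogram's.
print(loss_param > loss_nonparam)  # -> True
```

Of course, as the post says, this one distinction isn't the only relevant fact: at small n the parametric model is often *ahead*, and nothing in the asymptotic comparison sees that.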
So this is picking the "wrong scale" for the problem, sorta: the evaluation is too strict in one way (you have to do well forever) and too forgiving in another (you only have to do well in the long run). The result is a distinction between parametric and nonparametric that never gets any finer-grained than that, with the weird implication that the worst (consistent) nonparametric methods are better than the best parametric methods. This is so simple, I'm really not sure why I didn't think of it before.
One of the OpenPhil reviewers actually said something about the relation between PAC learning and MIRI’s framework (this was about the paper “Inductive Coherence”):
At a very high level, there is some overlap between the type of work considered here and work on making decisions optimally in the face of computational constraints, since making such decision might involve approximating probability. […] In spirit, the paper is also close to work going back to the 1960s on language identification in the limit. Work on this topic has been largely superseded by a weaker model, Valiant’s notion of PAC (probably approximately correct) learning. (As an aside, if I were a referee of this paper for a journal, I would ask the authors to compare their work to Gold’s work on language identification.)
I remember reading this and wanting to understand it; I looked up Gold's work but found it technically challenging. Still, I suspect that understanding this context, where people tried "language identification in the limit" and then moved to PAC learning, would be very illuminating.