
“embedded self-justification,” or something like that

preamble

Sometimes I wonder what the MIRI-type crowd thinks about some issue related to their interests.  So I go to alignmentforum.org, and quickly get in over my head, lost in a labyrinth of issues I only half understand.

I can never tell whether they’ve never thought about the things I’m thinking about, or whether they sped past them years ago.  They do seem very smart, that’s for sure.

But if they have terms for what I’m thinking of, I lack the ability to find those terms among the twists of their mirrored hallways.  So I go to tumblr.com, and just start typing.

parable (1/3)

You’re an “agent” trying to take good actions over time in a physical environment under resource constraints.  You know, the usual.

You currently spend a lot of resources doing a particular computation involved in your decision procedure.  Your best known algorithm for it is O(N^n) for some n.

You’ve worked on the design of decision algorithms before, and you think this could perhaps be improved.  But to find the improvement, you’d have to shift some resources away from running the algorithm for a time, putting them into decision algorithm design instead.

You do this.  Almost immediately, you discover an O(N^(n-1)) algorithm.  Given the large N you face, this will dramatically improve all your future decisions.

Clearly (…“clearly”?), the choice to invest more in algorithm design was a good one.

Could you have anticipated this beforehand?  Could you have acted on that knowledge?
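The tradeoff in the parable can be put in toy numerical terms.  Everything below is made up for illustration (the budget, the R&D cost, N, and n are all hypothetical):

```python
# Toy model of the parable's tradeoff (every number here is made up).
# Running the current decision algorithm costs N**n compute per decision.

N = 1000           # problem size (hypothetical)
n = 3              # current exponent: the best known algorithm is O(N^n)
budget = 10**12    # total compute available (hypothetical units)
rd_cost = 10**10   # compute the R&D detour burns before finding O(N^(n-1))

# Option A: never invest in R&D; spend the whole budget on decisions.
decisions_without_rd = budget // N**n

# Option B: pay the R&D cost up front, then decide at the cheaper rate.
decisions_with_rd = (budget - rd_cost) // N**(n - 1)

print(decisions_without_rd, decisions_with_rd)  # 1000 vs 990000
```

The catch, of course, is that rd_cost (and whether the R&D succeeds at all) is only knowable in hindsight, which is the whole question the parable is asking.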

parable (2/3)

Oh, you’re so very clever!  By now you’ve realized you need, above and beyond your regular decision procedure to guide your actions in the outside world, a “meta-decision-procedure” to guide your own decision-procedure-improvement efforts.

Your meta-decision-procedure does require its own resource overhead, but in exchange it tells you when and where to spend resources on R&D.  All your algorithms are faster now.  Your decisions are better, their guiding approximations less lossy.

All this, from a meta-decision-procedure that’s only a first draft.  You frown over the resource overhead it charges, and wonder whether it could be improved.

You try shifting some resources away from “regular decision procedure design” into “meta-decision-procedure-design.”  Almost immediately, you come up with a faster and better procedure.

Could you have anticipated this beforehand?  Could you have acted on that knowledge?

parable (3/3)

Oh, you’re so very clever!  By now you’ve realized you need, above and beyond your meta-meta-meta-decision-procedure, a “meta-meta-meta-meta-decision-procedure” to guide your meta-meta-meta-decision-procedure-improvement efforts.

Way down on the object level, you have not moved for a very long time, except to occasionally update your meta-meta-meta-meta-rationality blog.

Way down on the object level, a dumb and fast predator eats you.

Could you have anticipated this beforehand?  Could you have acted on that knowledge?


garlend asked: Hi, I remember something about how we do have a theoretical general artificial intelligence, maybe based on brute force bayesianism, the only problem being that it would take unreasonable (as in computronium earth unreasonable) resources to run it. I'm having the worst time trying to find links or references to that, and I'm pretty sure you'd be aware of that. Could you point me in the right direction?

I don’t know the specific article or post (or whatever) that you’re referring to.  I do talk a bit in my Bayes “masterpost” about the exorbitant resource demands needed to explicitly track all of the stuff that brute force Bayes needs you to track.

You may also be thinking of MIRI’s Logical Induction work, which I initially critiqued here and which I tried (not very productively) to discuss further in some more recent posts under this tag.

I need to remember to be careful about getting into these logical inductor arguments, and really any MIRI arguments.  My mind has a tendency to get obsessive about them on a technical level, and I don’t feel like this has been very fruitful.

Ultimately I just think MIRI does a poor job of motivating its work, and this creates a rabbit hole where you can imagine variant motivations forever and try to argue for or against the work conditional on each.  Probably a better question is where they are actually going with it.

evolution-is-just-a-theorem replied to your quote “@nostalgebraist​ I interpreted your LI post as arguing that it wasn’t…”

@nostalgebraist​ I don’t remember seeing your arguments that it can be deformed or relaxed though. Link?

This idea was implicit in the various posts/comments I wrote back when I was talking about these things in 2016. Ideally I’d write up something more explicit and clear here, but I’m kind of burned out of this topic. Here’s a very compressed version, though:

The LI criterion itself only cares about the limit, so asking whether we can derive bounds like “run 100 steps to get within epsilon=0.1” from the LI criterion is like asking whether we can get such bounds from the mere fact that a series converges (i.e. you can’t, and no one thinks you can).
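To illustrate the “convergence gives no rates” point with a standalone example (mine, not from the LI paper): the sequence 1/log(log(n)) converges to 0, but that fact alone licenses nothing at any particular finite n.

```python
import math

# a_n = 1/log(log(n)) converges to 0, but the convergence guarantee alone
# says nothing about any particular finite n: getting a_n < 0.1 requires
# n > e^(e^10), i.e. somewhere around 10^9566.

def a(n):
    return 1 / math.log(math.log(n))

print(a(100))      # still ~0.65 after 100 steps
print(a(10**100))  # still ~0.18 after a googol of steps
```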

There are basically two ways to take the things the criterion demands for the limit, and demand them before the limit. You could try to apply all of the demands at once, but each only approximately; the demands are too strong for this to work (see Sam Eisenstat’s argument here).

Alternately, you could require only a finite subset of the properties to hold at N (or perhaps even an infinite but proper subset).  For this to be interesting, you need to select some subset that you consider especially important.  I suspect any practical algorithm would be based on something closer to this – for example, you would probably want to insist on things like coherent probabilities and Cromwell’s rule (which are important for decision theory) at every N, while not caring as much about assurance against very clever or adversarial money pumps.
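For what it’s worth, the finite-N properties I’d single out are cheap to state as explicit checks.  A minimal sketch (my own toy encoding, with sentences as strings and the negation pairing assumed given):

```python
def coherent_at_N(prices, negation_pairs, tol=1e-6):
    """Check two finite-time properties one might insist on at every step N:
    (1) no sentence gets probability exactly 0 or 1 (Cromwell's rule),
    (2) a sentence and its negation have probabilities summing to 1."""
    for p in prices.values():
        if not (0 < p < 1):          # Cromwell's rule violated
            return False
    for phi, not_phi in negation_pairs:
        if abs(prices[phi] + prices[not_phi] - 1) > tol:  # incoherent
            return False
    return True

prices = {"phi": 0.7, "not_phi": 0.3, "psi": 0.5, "not_psi": 0.5}
pairs = [("phi", "not_phi"), ("psi", "not_psi")]
print(coherent_at_N(prices, pairs))  # True
```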

However, if we go down this route, we are now doing something completely different from the spirit of the LI paper. We are not exploiting completeness properties of some very large set of programs to get every heuristic that could ever be useful, we’re selecting specific programs we think encode useful heuristics. This just reduces to ordinary algorithm design: if you are just “defending against traders” for the sake of ensuring you use specific heuristics, the “traders” framework isn’t doing any work. You’re just saying you want your algorithm to use some heuristics, and you can say this in the normal way without having to rewrite them as traders.

raginrayguns replied to your quote “@nostalgebraist​ I interpreted your LI post as arguing that it wasn’t…”

I think ive said this before, but I think if I was in this field, I’d be approaching from the opposite direction. I think Bayesianism came from, “hmm why does probability theory seem to reproduce a lot of qualitative good advice about reasoning, when it’s just counting possibilities? Is there another way to look at it?” I’d be trying to get something that works in SOME situations, and something else that works in OTHER situations, etc, and start from there for a general solution

This sounds right, although some cases are harder because you don’t have anything sitting around that does mysteriously well at the problem.  Like, clearly there are some things (say, humans) that can “do practical logical induction” pretty well, but we don’t have any prototype case that’s simple enough to analyze.

Maybe related: after I made that post, I thought “hmm, but do I have any positive examples of how to think about ideal behavior under constraints?”  And I thought, well, VC theory and PAC learning seem like examples of what I want.  These ideas start out with the observation that real “learners” (humans, ML algorithms) can do pretty well at inferring functions from finitely many examples, even though you can’t really be good at this in the fully general case where a function is just an arbitrary mapping between sets and f(x) tells you nothing about f(x’).  So they ask, how would the functions have to be restricted to make learning feasible?  And what sort of success metric captures the success of the real learners, without being too forgiving or too stringent?

So, these ideas start out by noting that there are versions of your problem that are too hard or too easy, and trying to find the “right scale” where some practical methods look better than others, rather than looking equally bad or good, or too coarsely distinguished.
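For the finite-hypothesis-class case, the standard realizable PAC bound makes the “restriction makes learning feasible” point quantitative: m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice for any consistent learner.  A quick sketch:

```python
import math

def pac_sample_bound(H_size, eps, delta):
    """Standard PAC sample-complexity bound for a finite hypothesis class
    in the realizable case: with m >= (1/eps) * (ln|H| + ln(1/delta))
    examples, any consistent learner is eps-accurate with probability
    at least 1 - delta.  The bound exists only because the function
    class is restricted."""
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / eps)

# Restricting to 2^20 hypotheses makes learning feasible...
print(pac_sample_bound(2**20, eps=0.05, delta=0.01))  # 370
# ...whereas for arbitrary mappings on 50 binary inputs, |H| = 2^(2^50),
# and the same bound is astronomically large.
```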

Actually… now that I think about it, these are theories of online supervised learning, and logical induction is trying to do online supervised learning for logic. So, this isn’t just an analogy — these things are directly comparable. Just like with PAC learning, you could say: “if logic were just an arbitrary assignment of binary labels to sentences, then there would be no patterns to learn and the problem would be impossible. So, what properties does logic have that could make it more learnable? And how weak does our notion of learnability have to be?” Maybe you’ll find that PA is fundamentally too hard but some more restricted system is okay, maybe you’ll find that PAC is too hard but a weaker learning concept is appropriate, that sort of thing.

Viewed from the online supervised learning perspective, MIRI’s criterion essentially says, “you should use a nonparametric method, because a parametric method will have some blind spots that persist no matter how much data has come in, and someone without these blind spots could make winning trades against your parametric model forever.”

And this is … a fair if limited point about machine learning, although it has nothing really to do with logic!  Like, in a nonlinear online learning problem, someone using a random forest that grows with N could pump money out of someone using logistic regression, and they could do this forever.  This is true but it’s not the only fact that’s ever relevant about these two methods.  You can’t make a good general theory of online supervised learning out of just this one distinction, and it’s not clear why this distinction would be any more (or less) important for predicting logic than for anything else.
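A toy version of the pump-money-forever point (my own made-up target function, nothing to do with logic): a best-fit line hits a bias floor that no amount of data removes, while a piecewise-constant fit with a growing number of bins keeps improving.

```python
import math

def target(x):                        # a nonlinear "world" (made up)
    return math.sin(6 * x)

GRID = [i / 999 for i in range(1000)]  # evaluation points in [0, 1]

def linear_mse(N):
    """Best-fit line on N evenly spaced samples: a parametric model.
    Its error plateaus at a bias floor no matter how large N gets."""
    xs = [i / (N - 1) for i in range(N)]
    ys = [target(x) for x in xs]
    mx, my = sum(xs) / N, sum(ys) / N
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / N
    slope = cov / (sum((x - mx) ** 2 for x in xs) / N)
    intercept = my - slope * mx
    return sum((intercept + slope * g - target(g)) ** 2 for g in GRID) / len(GRID)

def histogram_mse(N):
    """Piecewise-constant fit with ~sqrt(N) bins: a nonparametric model.
    Its capacity grows with N, so its error keeps shrinking."""
    k = math.isqrt(N)
    sums, counts = [0.0] * k, [0] * k
    for i in range(N):
        x = i / (N - 1)
        b = min(int(x * k), k - 1)
        sums[b] += target(x)
        counts[b] += 1
    means = [s / c for s, c in zip(sums, counts)]
    return sum((means[min(int(g * k), k - 1)] - target(g)) ** 2 for g in GRID) / len(GRID)

for N in (100, 10000):
    print(N, linear_mse(N), histogram_mse(N))
```

The linear model’s residual pattern persists forever, so anyone tracking it can keep winning “trades” – but, as above, that one distinction doesn’t rank the methods on any finer scale.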

So this is picking the “wrong scale” for the problem, sorta – your evaluation is too strict in one way (you have to do well forever) and too forgiving in another (you only have to do well in the long run), and this ends up making a distinction between parametric and nonparametric and not getting any finer-grained than that, with the weird implication that the worst (consistent) nonparametric methods are better than the best parametric methods.  This is so simple, I’m really not sure why I didn’t think about it before.


One of the OpenPhil reviewers actually said something about the relation between PAC learning and MIRI’s framework (this was about the paper “Inductive Coherence”):

At a very high level, there is some overlap between the type of work considered here and work on making decisions optimally in the face of computational constraints, since making such decision might involve approximating probability. […] In spirit, the paper is also close to work going back to the 1960s on language identification in the limit. Work on this topic has been largely superseded by a weaker model, Valiant’s notion of PAC (probably approximately correct) learning. (As an aside, if I were a referee of this paper for a journal, I would ask the authors to compare their work to Gold’s work on language identification.)

I remember reading this and wanting to understand it, but I looked up Gold’s work and found it technically challenging.  Still, I suspect that understanding this context, where people tried “language identification in the limit” and then moved to PAC learning, would be very illuminating.

evolution-is-just-a-theorem replied to your post “raginrayguns replied to your post “The new MIRI blog post…”

@nostalgebraist​ I interpreted your LI post as arguing that it wasn’t a stepping stone / stepping stones aren’t useful, not that it wasn’t a plausible stepping stone. Personally I would like to see more detailed arguments for why you think a particular result is unpromising. I genuinely don’t know what your current reasoning is.

(Following up on this reply but also the whole post / comment thread here.  The point of this post is to explain why I don’t find some of MIRI’s constructions to be “plausible stepping stones” toward solutions to their problems of interest, as apart from any critique like “this is just a stepping stone” which would apply to anything that’s not a complete solution.)

Much of MIRI’s work, as I understand it, attempts to extend existing ideas about rationality (such as Bayesianism and decision theory) to cover cases where the theory doesn’t give a well-defined answer.

Sometimes, though not always, the reason the theory doesn’t give a well-defined answer is that it assumes a kind of power that a real-world agent cannot have.  For instance, real-world agents are logically uncertain, while standard Bayesianism assumes logical omniscience.  And real-world agents have to be “smaller than the world” (i.e. all of their world-models are implemented in a proper subset of existing physical things), while some models of rational inference or decision-making (Solomonoff/AIXI) require an agent “larger than the world.”

All of this is, ultimately, motivated by an interest in real-world agents – that is, agents which are not just subject to one specific constraint or another, but simultaneously subject to all of the constraints imposed by existing in the real world.  What MIRI really wants to do, of course, is to draw conclusions that can hold for real-world agents, because they’re not only doing pure math – they also want to understand, and help guide, some things in the real world.

Now, I don’t necessarily advocate trying to develop theories for real-world agents all at once – that is, impose every real-world constraint you can think of, and then try to develop some theory of rationality (or whatever) in this extremely restrictive context.  This could actually block insights that one could otherwise have by just imposing one constraint, and seeing that constraint clearly without having to keep track of every problem at once.  I get that.  You can play Level 1 before you try the final boss.  And a lot of research, in a lot of areas, can work like this – people chip away at small subsets of the problem, subsets that look pathetically insufficient for solving the whole thing, and then eventually it all combines to yield something nontrivial for the full problem.

However.  The danger with this decompose-the-problem approach, in this context, is the potential to produce solutions that address one limitation by asking for even more resources somewhere else.  If you start out assuming there are no limitations, and then relax just one limitation and work in that context, your incentives are to “lean in” to your other unlimited capabilities and use them to simulate the absence of the limitation, like using other senses to compensate for one impaired sense.

I would argue that constructions which “lean in” like this do not usually provide “plausible stepping stones” toward theories that can handle the full range of constraints.  So, if you are working in this kind of way, it is very important to make some argument that your theories do more than just “leaning in” – compensating for the absence of one infinity by embedding something similar inside another infinity.

Note that, for any such theory to be relevant to real-world agents, it has to retain some value when you “truncate” it – that is, when you put bounds on all the resources it leaves unbounded.  After all, in the place we’re trying to get to, all resources are bounded.  So, we are doing something like this, which I’ll call a one-bound approach:

  1. Start out by putting a bound on resource A, but not resources B, C, D, …, Z.
  2. Develop a theory in this context.
  3. Place bounds on B, C, D, …, Z and impose these on the theory in some natural-seeming way.  For instance, if there’s a limit as B –> infinity, replace it with B large but finite.

This inherently involves treating some constraints differently from others.  We’ve treated the constraint on A as fixed through the whole process, while only imposing the other constraints at the end.  We could just as well have developed a theory for constrained B, then imposed limits on A (etc.) at the end.  Or likewise for constrained C.

Instead of doing a one-bound approach, we could have always done a zero-bound approach:

  1. Don’t put any bounds on resources A, B, C, D, …, Z.
  2. Develop a theory in this context.
  3. Place bounds on A, B, C, D, …, Z and impose these on the theory in some natural-seeming way.  For instance, if there’s a limit as A –> infinity, replace it with A large but finite.

I think MIRI and I both have the same sort of skepticism about zero-bound approaches.  A zero-bound approach allows you to sit around being arbitrarily bad in the real world while you wait for the optimality to turn on.  For example, a glib zero-bound answer to the problem of logical induction would be, “well, it all works out if you’re logically omniscient, so just assign P=0.5 to every logical sentence until deduction weighs in on it; we have a good theory in the limit where deduction has weighed in on every sentence, and your approach converges to it as the number of deduction steps tends to infinity, so you’re fine.”

But the problem statement itself is “what should I do about sentences before deduction weighs in on them,” so this is just ignoring the problem.  To put it another way, if your question is “what do I do with this finite resource?” your standards for an answer are more stringent than “does it perform well as the resource grows to infinity?”  That’s a good first step, in that any solution is likely to have that property, but it doesn’t fully answer the question.
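The glib answer fits in a few lines, which is part of what makes it glib.  A sketch (with `decided` standing in for whatever deduction has settled so far):

```python
def glib_probability(sentence, decided):
    """The zero-bound non-answer: report what deduction has settled,
    and profess maximal ignorance about everything else.  This converges
    to the truth in the limit where deduction decides every sentence,
    while saying nothing useful about any sentence before that point --
    which was the actual question."""
    if sentence in decided:
        return 1.0 if decided[sentence] else 0.0
    return 0.5

decided = {"1+1=2": True, "1+1=3": False}
print(glib_probability("1+1=2", decided))    # 1.0
print(glib_probability("P != NP", decided))  # 0.5 -- forever, until "deduction weighs in"
```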

But here’s the thing: one-bound approaches are in danger of failing in exactly the same way, unless you pay close attention to how you are transferring work between limited resources and unlimited ones.

If you start with an idea like “we don’t have time to do all of these deductions exactly,” then you are saying that it is costly to spend time on anything.  So to give a satisfying answer to your actual question, you don’t just have to exhibit a construction which can handle the absence of deduction given an arbitrary amount of time – you have to exhibit something which, cycle for cycle, is a better use of time than the equivalent quantity of deduction.

Both zero-bound approaches and one-bound approaches are trying to get to the same place: non-trivial insights about what to do when all bounds are imposed.  If a one-bound approach reaches its results by assuming you can do arbitrarily many things, just not of a certain kind, then it may not get you any closer to understanding the real case of interest, where inability to do arbitrarily many things of a certain kind is a special, derived case of the inability to do arbitrarily many things, period.

I am, by default, suspicious of one-bound approaches for this reason, and I think people working on one-bound approaches ought to do some work to show that their particular setup, with its choices of what to bound and what not to, is likely to transfer to the fully bounded case.  (That is not just me imposing some arbitrary high bar.  It seems to me that the whole point of these approaches, if there is one, is that they may allow you to make this argument.)

This has been very high-level and removed from specifics.  Now I’ll fill in some specifics.  I’ll focus on the case of logical induction, since it’s the one I’ve looked into the most, and since it is the case where I am most confident MIRI is pursuing a one-bound approach that doesn’t really advance beyond the zero-bound approach they take as a starting point.

Logical induction, as described in the paper, occurs in a context endowed with a “deductive process” D, which (so to speak) proves more and more theorems as an integer time variable tends to infinity.  If we could wait a countably-infinite number of timesteps, we could read all of the deductive results off of D, and there would be no reason to do logical induction.  (Technically, even at this point the logical inductor provides a measure over sentences undecidable by D, but providing such a measure is not the goal of this work.)

So, we are trying to do something useful before a countable infinity of timesteps has elapsed.  Logical induction must do this to get anywhere, because if we allow ourselves to run a countable infinity of compute steps, we can run the whole of D and be done.  The authors are explicit about the fact that they are trying to construct something which will weigh in on sentences (in some way) before D has time to get to them, i.e. before countably-infinite time:

The reasoner is given access to a slow deductive process that emits theorems over time, and tasked with assigning probabilities in a manner that outpaces deduction […] 

We are interested in the question of what it means for reasoners to assign “reasonable probabilities” to statements of logic. Roughly speaking, we will imagine reasoners that have access to some formal deductive process, such as a community of mathematicians who submit machine-checked proofs to an official curated database. We will study reasoners that “outpace” this deductive process, e.g., by assigning high probabilities to conjectures that will eventually be proven, and low probabilities to conjectures that will eventually be disproven, well before the relevant proofs are actually found.

The authors do not assume access to the final results of D at the end of time, because that would assume away the whole problem.  So, in that sense, this is a one-bound approach: we are bounded by our lack of access to the full set of all theorems that will ever be proved.  But this is our only bound.  We are allowed access to a remarkable plenitude of other resources:

  • Given any finite number of timesteps N, we are allowed to say “no, we’re not done” at N, and continue for arbitrarily many more timesteps.  That is, even if we do not run for an actual infinity of time, we can run for a potential infinity, in that no user can specify an N (no matter how large) and demand results at that time.  For example, you can satisfy the logical induction criterion even while setting P=0 for all sentences until any finite time N.
  • We are allowed to run for an arbitrarily large number of steps N in the strong sense that, although our algorithm is computable, determining the first N for which you get some desired property is not computable.
  • Whenever we say we are allowed to run for N steps, this means we are allowed to run pure deduction for N steps, and also our inductive algorithm for N steps.  The results of this parallel process can (approximately) achieve some things which deduction alone could only achieve in M > N steps.  But running deduction alone for M steps is cheaper than running the parallel process for M steps.  We do not impose any constraint that would let us quantify whether it is better to run pure deduction for M steps, or to run the parallel process for M steps.
  • We are allowed to run for more time than the agents who, for the sake of an “approximate Dutch book criterion,” we consider trying to Dutch book us.  That is, our algorithm derives its properties from its assurances against Dutch book strategies in some complexity class (say polytime).  But our algorithm can run slower than this.  So, in the time it takes for our algorithm to run to some point, someone could also run an algorithm of the same complexity to Dutch book us, and we would have no assurance against this.  If we defend ourselves by saying that bookies this slow are impractical, then by the same token, our algorithm is impractical.
  • Although our goal is to produce probabilistic confidences for as-yet-unproven theorems, our probabilities do not have to be usable for decision theory at any finite time N.  (That is, they do not have to be usable until D is finished, at which point they are not needed.) Our probabilities do not have to sum to 1, and can be exploited to pump money from us.  The only restriction on our ability to be money pumped is that we must avoid some types of money pumps that take an infinite amount of time.  (Even those may be able to exploit us if they are allowed to run as slowly as we do, rather than more quickly.)
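To make the first bullet concrete: since the criterion only constrains limiting behavior, it is preserved under arbitrary finite delays.  A toy illustration (pure sketch, not the actual LI construction):

```python
def delay(inductor, N0):
    """Wrap a sequence of belief states so that it stonewalls (P=0 for
    everything) until step N0, then delegates.  Because the logical
    induction criterion only constrains limiting behavior, the wrapped
    sequence satisfies it whenever the original does, for ANY finite N0 --
    so the criterion alone cannot promise a user anything at a
    pre-specified finite time."""
    def delayed(step, sentence):
        if step < N0:
            return 0.0
        return inductor(step, sentence)
    return delayed

base = lambda step, sentence: 0.5   # stand-in for a real inductor
lazy = delay(base, 10**9)
print(lazy(100, "phi"))     # 0.0 -- still stonewalling at step 100
print(lazy(10**9, "phi"))   # 0.5 -- finally delegates
```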

I hope it is intuitively clear what I mean when I say this has room to “lean in” to the remaining unrestricted resources.  The original question here was about what to do when you do not have enough resources to just run your deductive process D until you know deductively whether a sentence is true or false.  We can recognize, in the LI setup, a constraint originating in this question: we aren’t allowed to just run D forever and inspect the results.  But we are allowed vast resources of various kinds which could be pumped into running D further.

Given infinite cash and the rule “spend this on anything but deduction,” we can buy things that might be just as good as deduction.  But what we care about is where to spend the marginal dollar.

Recall my description of a one-bound approach: you bound A but nothing else, do your work, and then bound everything else at the end.  The one question I want to see answered, whenever I see a one-bound approach, is: is it better to spend N resource units on this, rather than on A?

We know we can’t spend infinity on A, but we can spend some, and it might be better than allocating those resources to a proxy for A which works nicely when you are A-poor but B-rich, C-rich, …, Z-rich.

To quote one of the external paper reviewers from OpenPhil’s review of MIRI:

As usual, online learning of course offers a powerful set of tools for addressing prediction problems, but online learning can only do so much. In particular, one can push online learning towards philosophy (lacking any practical relevance), e.g., by using infinite expert sets and unbounded computational power, as done here. But it is rather intriguing if not self-defeating if this is done in the context of trying to answer a question that is derived from the lack of sufficient computational resources!

What worries me is not just that MIRI is pursuing one-bound approaches.  Maybe those are all you can do.

What worries me is that MIRI is so punctilious about developing the theory for the case where you are A-poor but B-rich, C-rich, …, Z-rich, and then so glib and brief about the idea that things will work out when you put all the other constraints back.  For instance, from the LI paper:

Logical inductors are far from efficient, but they do raise an interesting empirical question. While the theoretically ideal ensemble method (the universal semimeasure [Li and Vitányi 1993]) is uncomputable, practical ensemble methods often make very good predictions about their environments. It is therefore plausible that practical logical induction-inspired approaches could manage logical uncertainty well in practice. Imagine we pick some limited domain of reasoning, and a collection of constant- and linear-time traders. Imagine we use standard approximation methods (such as gradient descent) to find approximately-stable market prices that aggregate knowledge from those traders. Given sufficient insight and tweaking, would the resulting algorithm be good at learning to respect logical patterns in practice? This is an empirical question, and it remains to be tested.

Or from the reflective oracles “grain of truth” paper:

Our solution to the grain of truth problem is purely theoretical. However, Theorem 6 shows that our class M^O_refl allows for computable approximations. This suggests that practical approaches can be derived from this result, and reflective oracles have already seen applications in one-shot games [FTC15b].

Constraint A is all-important, and we have made an advance in the theory of A-bounded rationality!  But constraints B, C, …, Z, eh, they’ll work themselves out.  (Now, in the next paper: an advance in B-constrained rationality!  And so on.)

raginrayguns replied to your post “The new MIRI blog post on “Embedded World-Models” says some of the…”

I was writing logical uncertainty posts on lesswrong in 2014
2014 is also when Christiano put “Non-Omniscience, Probabilistic Inference, and Metamathematics” online

Yeah, I know they’ve been interested in logical uncertainty for quite a while.  I think I meant that the … framing (?) is different in the new post.

Maybe it’s just that they’re talking about problems rather than solutions, and describing the problems they’re working on in a way that makes them sound really far from being solved or understood.  I interpreted that as pessimism about how much their existing work does to resolve the problems, but that may be a misreading – maybe the intended message is “here are the problems we’ve been working on for the last few years (by the way, we solved them!).”  Although that would really bury the lede, so I dunno.


Incidentally, reading that post led me to have some interesting interactions on LW.  I made a comment on that post, then decided to turn it into my own post, because I was kinda proud of how I was able to crystallize a certain objection I had to AIXI, LIA, etc. – essentially, that they tell you to search over a set that already contains everything you can possibly do as an element, so you can’t do the search.  (And if you can do anything approximating the search, that’s just an element in the set, so “you should do the search!” doesn’t tell you why it’s good.)

And someone pointed me to the MIRI papers on “reflective oracles,” which I hadn’t looked into.  (Even though a reflective oracle paper got highly positive external reviewer comments in the OpenPhil review of MIRI, unlike all the other papers reviewed.)  As it turns out, these papers were motivated, basically, by a search for some mathematical context in which my argument fails.  The papers frame it as being about game theory – what kind of player could actually contain a perfect model of another player as complex as itself? – but in the terms of my argument, this is asking, “what kind of search could contain itself as an element of the search space?”

And they succeed at coming up with such a thing, by doing something that’s almost “giving Turing machines oracles that tell them what Turing machines will do,” but with some probabilistic stuff that blocks self-reference paradoxes and doesn’t let them solve the halting problem.

It’s all very cool and trippy, but it ends in the same weird way that the Logical Induction paper did, where after constructing this apparently very delicate thing and working hard to prove that it has some very specific properties, they basically say “you can approximate it on a computer, maybe that’ll work!”, without talking at any level of rigor about which properties we might expect to transfer over to an approximation.  And this is probably the thing about MIRI’s work that I find most confusing: it seems like their preferred approach to solving practical problems is to 

  1. clearly articulate the practical problems
  2. ascend to a mystical Platonic world where ten eternities pass in the blink of a meta-eye, your breakfast eats you eating it in an infinite yet somehow terminating loop, etc.
  3. carefully prove that in this world, you wouldn’t have the practical problems, although you might have others
  4. return to reality and say, “eh, take one of the infinity signs in there somewhere and replace it with like ‘1000′ or something.  it could work, who knows??”

The new MIRI blog post on “Embedded World-Models” says some of the same things I (among various others) have been saying for a long time about the problems with standard Bayesian rationality.

Not sure what to make of that – did they change their minds about this stuff?  Were they always closer to my position than to the people who would argue against me when I said stuff like (direct quotes from the post follow):

Imagine a computer science theory person who is having a disagreement with a programmer. The theory person is making use of an abstract model. The programmer is complaining that the abstract model isn’t something you would ever run, because it is computationally intractable. The theory person responds that the point isn’t to ever run it. Rather, the point is to understand some phenomenon which will also be relevant to more tractable things which you would want to run.

I bring this up in order to emphasize that my perspective is a lot more like the theory person’s. I’m not talking about AIXI to say “AIXI is an idealization you can’t run”. The answers to the puzzles I’m pointing at don’t need to run. I just want to understand some phenomena.

However, sometimes a thing that makes some theoretical models less tractable also makes that model too different from the phenomenon we’re interested in.

The way AIXI wins games is by assuming we can do true Bayesian updating over a hypothesis space, assuming the world is in our hypothesis space, etc. So it can tell us something about the aspect of realistic agency that’s approximately doing Bayesian updating over an approximately-good-enough hypothesis space. But embedded agents don’t just need approximate solutions to that problem; they need to solve several problems that are different in kind from that problem.

[…]

Uncertainty about the consequences of your beliefs is logical uncertainty. In this case, the agent might be empirically certain of a unique mathematical description pinpointing which universe she’s in, while being logically uncertain of most consequences of that description.

Logic and probability theory are two great triumphs in the codification of rational thought. However, the two don’t work together as well as one might think.

Probability is like a scale, with worlds as weights. An observation eliminates some of the possible worlds, removing weights and shifting the balance of beliefs.

Logic is like a tree, growing from the seed of axioms. For real-world agents, the process of growth is never complete; you never know all the consequences of each belief.

Not knowing the consequences of a belief is like not knowing where to place the weights on the scales of probability. If we put weights in both places until a proof rules one out, the beliefs just oscillate forever rather than doing anything useful.

This forces us to grapple directly with the problem of a world that’s larger than the agent. We want some notion of boundedly rational beliefs about uncertain consequences; but any computable beliefs about logic must have left out something, since the tree will grow larger than any container.

[…]

In a traditional Bayesian framework, “learning” means Bayesian updating. But as we noted, Bayesian updating requires that the agent start out large enough to consider a bunch of ways the world can be, and learn by ruling some of these out.

Embedded agents need resource-limited, logically uncertain updates, which don’t work like this.
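The quoted picture of probability as “a scale, with worlds as weights” can be made concrete in a few lines (a toy sketch of my own; the bit-string worlds and observations are invented for illustration).  Note how the agent must start out already containing every possible world – exactly the “large enough” assumption the post is objecting to:

```python
from fractions import Fraction

# Worlds are all 3-bit strings; the agent begins with every possible world
# enumerated up front and a uniform prior over them.
worlds = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
prior = {w: Fraction(1, len(worlds)) for w in worlds}

def update(belief, observation):
    """Bayesian updating as elimination: zero out worlds inconsistent
    with the observation, then renormalize the surviving weights."""
    index, value = observation
    survivors = {w: p for w, p in belief.items() if w[index] == value}
    total = sum(survivors.values())
    return {w: p / total for w, p in survivors.items()}

belief = update(prior, (0, 1))   # observe: first bit is 1
belief = update(belief, (2, 0))  # observe: third bit is 0
print(sorted(belief))            # two worlds remain, each with weight 1/2
```

An embedded agent can’t run this: the world it lives in is bigger than it is, so the `worlds` list can never actually contain the true hypothesis.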

the perverse paradise of HRAD

I’ve been reading some MIRI / Agent Foundations stuff over the last few days, and I’ve gotten the following impression – probably nothing new here, but perhaps this will be in slightly sharper relief than earlier posts I’ve made on the same topic.

MIRI’s “Highly Reliable Agent Design” (HRAD) research program is founded on the idea that we need to state impractical (or non-practical) ideals about reasoning first, so we know what real AI programs should be aiming for, before we go on to judge those real programs.  This sounds reasonable on the face of it, but has led this research into a cul-de-sac: it now consists mostly of technical work on the problems with its own chosen ideals, problems which largely arise from the very idealizations that separate the ideals from practical programs.

This research is fundamentally about the weirdness that can arise when you try to talk about reasoning or computation while being deliberately casual about requiring these processes to do useful work in any specifiable situation, as opposed to in an abstract paradise where they are allowed resources that are, in one or more ways, infinite or impossible.  This casualness gives rise to some unique, trippy problems that can only be problematic in the abstract paradise, not “in practice,” but nonetheless provide one with plenty of hard work if one wants it.  Moreover, while creating these “theoretical-only” problems, the vast resources of the abstract paradise solve all of the usual problems of real-world inference by themselves.  One is left with a perverse focus on only the problems that do not arise in practice, like a textbook on “the mathematics of classical mechanics” which describes the Weierstrass function but never defines the derivative.

That post about HRAD that I approvingly linked provides some good outside view arguments, but I think this more inside-view stuff is equally damning if not more so.

Specifically, the ideals in HRAD tend to suspend requirements like:

  1. Your process must be computable.
  2. Your process must finish in some finite (even if impractically large) amount of time.
  3. Your process must produce usable results after seeing only some finite (even if impractically large) quantity of data.

Suspending these requirements tends to enable brute force search over very large spaces of programs or strategies.  Solomonoff suspends 1+2 (but not 3) and searches over all programs to explain a finite data sample.  Logical inductors suspend 3 and sort of 2 (but not 1) and search over all poly-time programs, obtaining useless results (with no e.g. coherence guarantees) at finite times and useful results only as the data set becomes infinite.
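To see what “brute force search over very large spaces of programs” looks like once the infinities are crudely truncated, here’s a deliberately tiny caricature of Solomonoff induction (mine, for illustration – the “programs” are just bit-patterns repeated forever, and the length bound plays the role of the infinity being replaced with “like ‘1000’ or something”):

```python
from itertools import product

# Caricature of Solomonoff induction: a "program" is a bit-pattern repeated
# forever, and its prior weight is 2^-length, as in the real construction.
MAX_LEN = 8  # the "replace an infinity sign with 1000" step, in miniature

def generates(pattern, data):
    """Does repeating `pattern` reproduce the observed prefix `data`?"""
    return all(data[i] == pattern[i % len(pattern)] for i in range(len(data)))

def predict_next(data):
    """Prior-weighted vote among all short programs consistent with data."""
    weight = {0: 0.0, 1: 0.0}
    for n in range(1, MAX_LEN + 1):
        for pattern in product((0, 1), repeat=n):
            if generates(pattern, data):
                next_bit = pattern[len(data) % n]
                weight[next_bit] += 2.0 ** -n
    return max(weight, key=weight.get)

print(predict_next([0, 1, 0, 1, 0, 1]))  # shortest consistent pattern wins: 0
```

All the actual “reasoning” here lives in the enumeration of programs; the predictor itself just plays the managerial role described below, tallying weighted votes from the candidates that happen to fit.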

Even at its best, there is a fundamental vacuity to this sort of thing.  When you can consult all programs, or all practical programs, all of the reasoning has been outsourced and you are left playing a sort of managerial role over reasoners.  These methods do encode a few ideas about Occam’s Razor or logical uncertainty, but otherwise feel like being told “the best strategy is to find the best strategy, and then do what it says.”

In particular, I personally suspect that any sensible theory of “intelligence” or “good reasoning” will be a theory of grappling with resource constraints.  It’s only due to resource constraints that one might do “smart” things like constructing concepts, i.e. picking out certain clusters in your observations that make for especially good nodes in simplified causal models of the world (or something like that).  The HRAD ideals either outsource work like this to their program set (as in logical inductors), or do away with it entirely.  The latter is true of Solomonoff, which (if the universe is computable) can access an exact copy of the universe and simply crib its predictions from the actual future.

That last point leads us into the other side of the coin, that the HRAD ideals introduce “only-in-theory” problems.  The paradise can provide so many resources that the plenitude begins to defeat itself in bizarre ways; if you have access to entire universes, their inhabitants may be able to infer that you have this access, and use it to mess with you.  This is, of course, only a problem if the ideal is achievable in reality, which it isn’t, which would then seem to suggest it isn’t a problem after all, even for the ideal – but regardless of how these things cash out, we are still worrying over a problem that arises not just from “impractical assumptions,” but from a deliberate suspension of reality itself.

That is, these problems are unique to situations which literally cannot arise – to actual, rather than potential, infinities.  The goal of HRAD is to derive an ideal theory assuming the actual infinities, then relax to approximate results in the potential infinity case (i.e. given finite, but extremely or arbitrarily large, resources).  But the actual infinities create problems that the potential infinities do not, and so the research time allocated to HRAD is currently dedicated to solving hard problems in an impossible world, for the sake of transferring the end results to easier problems in possible worlds.  This strikes me as, well … perverse.

My current thoughts on MIRI's "highly reliable agent design" work - Effective Altruism Forum →

nostalgebraist:

This is from 2017, but I only just read it, and it did a lot to clarify for me why MIRI thinks work like Logical Induction is relevant to AI.  It also does a good job crystallizing the reasons I (and the author) find this unpersuasive.

This in particular is really good:

I understand HRAD work as aiming to describe basic aspects of reasoning and decision-making in a complete, principled, and theoretically satisfying way, and ideally to have arguments that no other description is more satisfying. I’ll refer to this as a “complete axiomatic approach,” meaning that an end result of HRAD-style research on some aspect of reasoning would be a set of axioms that completely describe that aspect and that are chosen for their intrinsic desirability or for the desirability of the properties they entail. This property of HRAD work is the source of several of my reservations:

  • I haven’t found any instances of complete axiomatic descriptions of AI systems being used to mitigate problems in those systems (e.g. to predict, postdict, explain, or fix them) or to design those systems in a way that avoids problems they’d otherwise face. AIXI and Solomonoff induction are particularly strong examples of work that is very close to HRAD, but don’t seem to have been applicable to real AI systems. While I think the most likely explanation for this lack of precedent is that complete axiomatic description is not a very promising approach, it could be that not enough effort has been spent in this direction for contingent reasons; I think that attempts at this would be very informative about HRAD’s expected usefulness, and seem like the most likely way that I’ll increase my credence in HRAD’s future applicability. (Two very accomplished machine learning researchers have told me that AIXI is a useful source of inspiration for their work; I think it’s plausible that e.g. logical uncertainty could serve a similar role, but this is a much weaker case for HRAD than the one I understand MIRI as making.) If HRAD work were likely to be applicable to advanced AI systems, it seems likely to me that some complete axiomatic descriptions (or early HRAD results) should be applicable to current AI systems, especially if advanced AI systems are similar to today’s.
  • From conversations with researchers and from my own familiarity with the literature, my understanding is that it would be extremely difficult to relate today’s cutting-edge AI systems to complete axiomatic descriptions. It seems to me that very few researchers think this approach is promising relative to other kinds of theory work, and that when researchers have tried to describe modern machine learning methods in this way, their work has generally not been very successful (compared to other theoretical and experimental work) in increasing researchers’ understanding of the AI systems they are developing.
  • It seems plausible that the kinds of axiomatic descriptions that HRAD work could produce would be too taxing to be usefully applied to any practical AI system. HRAD results would have to be applied to actual AI systems via theoretically satisfying approximation methods, and it seems plausible that this will not be possible (or that the approximation methods will not preserve most of the desirable properties entailed by the axiomatic descriptions). I haven’t gathered evidence about this question.
  • It seems plausible that the conceptual framework and axioms chosen during HRAD work will be very different from the conceptual framework that would best describe how early advanced AI systems work. In theory, it may be possible to describe a recurrent neural network learning to predict future inputs as a particular approximation of Solomonoff induction, but in practice the differences in conceptual framework may be significant enough that this description would not actually be useful for understanding how neural networks work or how they might fail.