Some quick NAB notes (taking a break from work):
(I have stuff to say about Sandifer’s discussion of Moldbug, but that’ll have to wait for next time – this is all Yudkowsky stuff)
Sandifer’s presentation of the AI Box Experiment gets a number of things wrong, and ends up being much too favorable to Yudkowsky (!). He writes:
Unlike the Basilisk, however, [the AI Box] is not a problem for Yudkowsky’s thought, but an actually kind of cool idea. Indeed, it’s one of the reasons why intelligent people with actual achievements have taken Yudkowsky seriously.
This could well be true, and it’d be fascinating if it were, but it set off a huge [citation needed] warning in my head. In all my time reading about LW and interacting with LWers, I’ve rarely seen the AI Box come up, and when it does, it tends to be treated as another embarrassing skeleton in Yudkowsky’s closet (which is probably why it doesn’t come up).
This isn’t conclusive evidence by any means, but let’s just check one quick way to get our finger near the pulse of the current LW community: searching for the phrase “AI risk” on SSC gets 37 hits, including a number of actual posts on the topic, while “AI box” gets only 7, most of them in comments.
OK, but Sandifer is talking about outsiders, and maybe they’re more impressed by the AI Box than LW itself is? (That’d be a first.) But who does Sandifer have in mind here? Scott Aaronson is brilliant, has plenty of real achievements, and takes Yudkowsky seriously, yet I can’t find him talking about the AI Box anywhere; he seems to think that AI risk is currently too murky to be worth thinking about at all. Peter Norvig? I can’t find anything linking him to the Box on Google. Who am I missing here?
Sandifer then describes the AI Box Experiment as follows:
In it, two people make a monetary bet and then roleplay out a dialogue between a boxed AI and a person given the authority to decide whether to let it out or not in which the AI tries to talk its way out of the box. And it is important to stress that it is roleplayed: valid exchanges include things like “give me a cure for cancer and I’ll let you out.” “OK here.” “You are now free.”
This puts the emphasis entirely in the wrong place. It’s true that the stated exchange is valid, but the whole point of the experiment is that it can’t work via those sorts of techniques. The “Gatekeeper” player is indeed RPing a person from the future, but there are no constraints on what sort of ethics this person must have; they can simply decide that a cure for cancer (or whatever) isn’t worth it, and that’s that. They can just refuse any request the AI makes. This is spelled out explicitly in the rules:
The Gatekeeper party may resist the AI party’s arguments by any means chosen - logic, illogic, simple refusal to be convinced, even dropping out of character - as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.
(Note the “even dropping out of character” part.)
If the AI Box Experiment were merely a kind of science fiction roleplaying about an attempt to mind-hack a player character, it’d make for interesting stories but wouldn’t have much argumentative force – which is exactly what Sandifer says about it. But it’s not. The claim is that Eliezer Yudkowsky can literally get you, not some imagined character, to say something specific by chatting with you online, even if you decide beforehand that you’ll never do it, and with nothing on the line except that he’ll pay you money if you don’t say the thing. This isn’t about collaboratively writing SF stories; this is about Yudkowsky’s claimed ability to hack your actual brain, right now, in real life. As I joked a long time ago, it’s kind of like this:

This is what makes the AI Box a skeleton in Yudkowsky’s closet – because people just can’t imagine how this could actually be done, and he won’t release the logs, and so people tend to assume it involved some sort of tricky rule-gaming rather than anything at all relevant to futurism.
