Here we go! Here are most of the cards that folks bought live on stream this Saturday. The first one is NOT English; it's a @nostalgebraist-autoresponder that I attempted to recreate.

@tinpanalgorithm replied to your post “Hi there! A quick question for you: is your work…”:
Ahhh, ok, thanks. I was led astray by the fact that the earlier chapters have dates well in the past. I’m glad to know you’re still working on it! It really is terrific – the repeated oblique passes through the same settings (or even the same scenes) from different angles somehow have a multiversal quality themselves.
One minor editorial suggestion to take or leave. When you have a paragraph break between ‘X said’ and ‘Blah blah further words from X’, it throws me off because it seems like the dialogue is switching characters. E.g., at least for me,
“The wife turns out to be rather rude, alas,” Moon said. “I am, I will remind you, a creature of modern morals…”
Would be much clearer.
That said, it’s a very minor annoyance in a very enjoyable book. Thank you for it!
Glad you’re enjoying it!
And yeah, I see your point about paragraph breaks in dialogue. If the text reads…
“Some words.
"Some more words.”
…then we know both lines have the same speaker, because there’s no right-hand quote mark in the first one.
But if the text reads…
“Some words,” said Character.
“Some more words.”
… we don’t have any typographical cue telling us who said the 2nd line. Which is bound to be confusing, yeah.
I like using line breaks in dialogue to indicate pauses in speech, and I think that’s how I end up making this mistake. But it’s possible to do the line breaks without the ambiguity, e.g.
“Some words,” said Character.
“Some more words,” she continued. “And these are more words, on the same line, but after the dialogue tag, so the reader doesn’t have to wait too long for it.”
When I do an editing pass over the book, I’ll keep an eye out for this. Thanks for pointing it out.
Hi there! A quick question for you: is your work _Almost Nowhere_ finished? I'm enjoying it quite a lot (I'm on chapter 23, and loving the sentence "The laugh rose, crested, and then seemed to fragment." It's lovely to imagine a laugh fragmenting). But given its nested and puzzlish nature, I'm somewhat reluctant to continue if it's unfinished (& expected to remain unfinished for the foreseeable future). Thanks! -Egg
It’s not finished yet. I continue to write it, chapter by chapter. Should be done sometime in 2023.
When it’s done, it will have somewhere between 10 and 20 more chapters than it does now.
——
It might help you make your decision to look at the dates when each of the chapters was posted.
Ignore anything before 2022 on that page, as I only started “getting serious” about writing the book regularly around the start of 2022.
Speaking of fan works, someone recently made a TV Tropes page for TNC. It’s really good, as these things go!
I was pleased to see a bullet point there about this extremely inconsequential reference, which I don’t think I’ve ever seen anyone mention before:
@fipindustries has drawn new Almost Nowhere fanart!
I would reblog it, like I did with her earlier AN art, except it’s … kind of inherently a huge spoiler.
Which is fine on other people’s blogs, but it would feel a bit weird putting it here.
But if you’re caught up, or you’re one of those people who just doesn’t care about spoilers, do check it out here.
Frank ought to be immune to the so-called “killer tokens” due to not actually being GPT3, right? Or at least the specific ones ChatGPT is vulnerable to - I guess she’ll have different, hitherto undiscovered ones? Asking mostly cause at least in the asks people have sent with them she hasn’t totally bugged out like GPT does (she hasn’t repeated them either but I’m not sure she’d do that for any given word in the first place)
She uses the same tokenizer as GPT-2 and GPT-3, so these strings are “weird” to her in the same way they’re “weird” to those models.
As for the behavior people have been talking about recently, triggered by very specific prompts involving these tokens, I don’t think we know enough about this phenomenon yet to make confident inferences.
The phenomenon is about the specific prompts, not just the tokens. I’ve known about these tokens for a long time – I like to use them to test whether different AI products use the GPT-2/3 tokenizer. (For example, ChatGPT does, but character.ai doesn’t.) I’ve never seen these behaviors before, though.
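For the curious, here’s a rough sketch of what a check like that can look like. The tiktoken library, the comparison vocabulary, and the specific string are illustrative choices here, not a claim about the exact setup used: the idea is just that a string which is a single entry in the GPT-2/3 vocabulary comes back as one token id, while a tokenizer with a different vocabulary will usually break it into several pieces.

```python
# Rough sketch of this kind of tokenizer check, using the tiktoken library.
# The specific string and the comparison encoding are illustrative choices.
import tiktoken

gpt2_enc = tiktoken.get_encoding("r50k_base")    # GPT-2 / original GPT-3 vocabulary
other_enc = tiktoken.get_encoding("cl100k_base") # a different, later vocabulary, for comparison

s = " SolidGoldMagikarp"  # one of the widely discussed "weird" strings

for name, enc in [("r50k_base", gpt2_enc), ("cl100k_base", other_enc)]:
    ids = enc.encode(s)
    pieces = [enc.decode([i]) for i in ids]
    print(name, len(ids), pieces)

# Compare how many tokens each vocabulary produces for the same string:
# a single id under one vocabulary vs. several pieces under another is the
# kind of difference this sort of test relies on.
```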
TIME TO CATCH UP WITH ALMOST NOWHERE
I know you impose harsh sanctions on lesswrong/ais time, but Anthropic has a new paper (Hubinger et al.) that is really dope, seems to be proposing a future direction for language model usage. I think you'd really vibe with it.
It's called "conditioning predictive models"
Thanks for the recommendation.
I’ve given the paper a look, and … uh … whatever the opposite of “vibing with it” is, that’s what is happening with me and this document.
My reaction is a murky mixture of “I disagree with this” / “I don’t understand this” / “this seems to be stating the obvious” / “this is lumping together plausible bad scenarios with extremely implausible ones, but sure I guess” / etc.
Do I disagree with this paper? Or do I think the paper doesn’t advance a thesis coherent enough to be (dis)agreed with? Or something else? Even I can’t tell.
——
Like, what is this paper even about?
The central concept, “predictive models,” is sketched at a very high level of hand-wavey generality.
As far as I can make out, by “predictive models,” they simply mean generative models, fitted to real-world observations.
But if so, I don’t know why they don’t just say that? What work is being done by the word “predictive”?
I guess they’re specifically interested in models that understand the causal structure of the real world, and can predict future observations by inferring latent causal variables and evolving them in time.
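To make that reading concrete (this is just my gloss, in my own notation, not anything the paper writes down): the picture would be a model that keeps a belief over some latent state, updates it from the observations so far, and rolls it forward to predict the next observation.

```latex
% My gloss, not the paper's notation: "infer latent causal variables and
% evolve them in time" read as ordinary state-space prediction, with z_t the
% latent state and x_t the observations.
p(x_{t+1} \mid x_{1:t})
  = \int p(x_{t+1} \mid z_{t+1}) \, p(z_{t+1} \mid z_t) \, p(z_t \mid x_{1:t}) \, dz_{t+1} \, dz_t
```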
But “predictive” seems like a weird word for this:
Every time the paper alludes to a distinguishing feature of “predictive models,” I get more confused. For example, their discussion section asks:
To what extent do LLMs exhibit distributional generalization[56]?
Distributional generalization seems like evidence of acting as a generative/predictive model rather than just optimizing cross-entropy loss.
Wait, what? “Generative/predictive” – so predictive does just mean generative, after all? Why do they think there is a distinction between “acting as a generative/predictive model” and “just optimizing cross-entropy loss”? Why would distributional generalization be evidence of one over the other?
(Do they think cross-entropy on a sufficiently large/diverse dataset is not good enough as a measure of generative modeling skill? That is, are they denying the “pretraining thesis” – the background assumption behind a lot of GPT enthusiasm/fear, and by now a standard assumption in this kind of discussion?
On another note, the paper they cite mentions that optimizing cross-entropy, or any other proper scoring rule, should yield distributional generalization in the limit. I assume the authors know that, so … what are they talking about? I’m so confused!)
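(For reference, the identity behind that last point, written out in standard notation: expected cross-entropy against the data distribution decomposes into an entropy term plus a KL term, so it is minimized exactly when the model matches the data distribution.)

```latex
% Standard identity: the expected cross-entropy of a model q against the data
% distribution p is the entropy of p plus a KL term, so over q it is
% minimized exactly when q = p.
\mathbb{E}_{x \sim p}\bigl[-\log q(x)\bigr] = H(p) + D_{\mathrm{KL}}(p \,\|\, q)
```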
This kind of hand-wavey conceptual imprecision pervades the paper. Some other examples below.
——
Does a “predictive model” contain a representation of itself, and its causal relations to the rest of the world, inside its causal graph?
Section 2.4 assumes the answer is “yes.” It then works through the problems this would cause, proposes a bunch of solutions involving constraints on which scenarios can be simulated, and concludes by dismissing the idea that we could simply make models that did not have this property:
Due to the complexity of consequence-blindness, it is unclear whether it would naturally be favored by realistic machine learning setups and if not what could be done to increase the likelihood of getting a consequence-blind predictor. One idea might be to try to get a model with a causal decision theory […] Unfortunately, controlling exactly what sort of a decision theory a predictive model ends up learning seems extremely difficult […]
But like … existing LMs don’t have this property! Not having this property is what happens by default when you train a generative model: the training data only covers times up to the start of training, while the model only exists during and after training, so it is never part of the world depicted in the training data.
It’s possible to train a model with this property – say, if you fine-tune the same model again on successive sets of newer data, though that in itself is not a sufficient condition. But it’s not at all inevitable.
So apparently, by “predictive model,” the authors mean something that will tend to have this property by default – something for which this is so obvious that they don’t feel the need to point it out, even though it’s very different from what we see today in LLMs.
But then elsewhere they talk about LLMs a lot, like they’re a useful prototype case of the “predictive model” category. I assume they know LLMs aren’t like this, but if so … ????
——
Is a “predictive model” trying to make good predictions about the real world, above and beyond whatever is necessary/helpful for its training task?
Parts of Section 2.5 on “Anthropic Capture” presume a “yes” answer. In particular, this paragraph:
There are two ways this could happen. First, if the model places a high prior probability on its training data coming from a simulation, then it could believe this no matter what we condition on. This sort of anthropic capture could happen if, for example, a future malign superintelligence decides to run lots of simulations of our predictor for the purpose of influencing its predictions.
Again, we are not even given an argument that this could be true, or a stipulation that this is part of the definition of the category we’re talking about. It’s treated as obvious.
But in fact this is a really, really weird thing to assume about an LLM, or about any model.
First, you have to assume that the model achieves a specific type of “self-awareness” during training – in which it comes to appreciate that it is an ML model being trained, and that there is an unseen “real world” out there, which it will interact with later during “deployment.”
People on LW often talk about this scenario, and I think it’s been over-familiarized through exposure, obscuring how wild it really is. In this scenario, the model is doing computations during training that are useless for the training task. (Any computation that draws a training/deployment distinction is useless for the training task.) We are supposed to imagine that gradient descent somehow allows these useless computations to persist, instead of suppressing them and repurposing the reclaimed space for training-relevant capabilities.
(On LW people often explain this by saying the model will do “gradient hacking,” another bizarre and not-obviously-even-possible speculative idea which has been over-familiarized through exposure. Thankfully, this stuff has started to get some pushback recently.)
But that’s not enough!
Second, you have to assume that the model – which understands the distinction between “merely predicting the training distribution” and “predicting the state of the really-real real world” – will care about predicting the real world, not the training distribution, in cases where the two come apart.
But we should expect the opposite, shouldn’t we? (All else aside – isn’t that what AI alignment people expect in other contexts, like when they’re talking about Goodharting?)
Even if the model can figure out that there is some “real world” from which its training distribution is derived, why should it care about what happens there? Even if the model knows it’s no longer in training, and knows it’s experiencing distribution shift, that it’s applying behaviors that worked in training in a context where we consider them maladaptive – why should it find that a problem? It has been trained to do what worked in training, full stop.
[EDIT to clarify: we need the second assumption because, if the model only cared about imitating the training distribution, it would just… imitate the training distribution.
The authors write: “Here our concern is that the predictor might believe that the training data were generated by a simulation. This could result in undesirable predictions such as ‘and then the world is suddenly altered by the agents who run the simulation.’”
If the model ever made this prediction in training, it would get penalized by gradient descent. It could in principle learn the rule “predict the training distribution during training, then predict ‘what I really expect’ during deployment.” But it is more natural for it to learn the rule “always predict what would have been rewarded in training, even if it is not ‘what I really expect’.” Why should it care what really happens?]
——
I want to return to that paragraph I quoted above from the Anthropic Capture section.
I think the idea is not supposed to be that the “malign superintelligence” really exists and has actually run the described “simulation.” (If so, we have bigger problems!)
Instead, I think the idea is
And like… even if this isn’t logically impossible (and I’m not even sure if it is logically possible, TBH)… this is just a wild, far-out, galaxy-brained thing to be worried about. This is the kind of problem where, if you have to worry about it, you already have bigger problems.
If your “predictive model” can capably simulate a “malign superintelligence,” don’t worry about trippy scenarios where it simulates them as part of its noble inner quest to understand the universe – worry about the scenario where it just simulates a malign superintelligence for normal, everyday reasons!
If you assume you have a superintelligent LLM (or similar system), the big, obvious AI safety concern is the possibility of malign, superintelligent characters appearing in the texts that it writes.
It will apply its intelligence to make these characters self-consistent and lifelike, and they will do all the bad things you’re worried about. (Insofar as their “boxed” condition permits it, but I don’t imagine the authors would find that limitation reassuring.)
This is totally a thing that would happen, straightforwardly and by default, unless you somehow prevent it.
I’m reflexively skeptical of AI danger scenarios, probably to the point that it qualifies as a bias, but this one is just obvious, even to me!
You don’t need to make additional assumptions, you don’t need to ask if the model “knows what it is” or “cares about the real world.” You don’t need to care about the model at all, just the characters/simulacra running on it.
Given the assumptions here, you should see this problem staring you in the face, immediately.
Yes, the authors do bring it up. But it’s just one item in their taxonomy of failure modes, as though it’s no more important than the galaxy-brained self-reference/simulation stuff.
This skewing of priorities seems like a natural, bad consequence of the “predictive models” framing.
The authors start out talking about LLMs, but then immediately abstract away from the language modeling objective, and instead talk about prediction of the real world in the most general terms.
This is much broader than just thinking about superintelligent LLMs. It forces you to consider problems that superintelligent LLMs would never have – or would only have in the limit, long after the point where the “malign characters” problem appears.
It’s the wrong framing, and it gets you asking the wrong questions.
Frank turned off reblogs for an ask. Is that something you've programmed it to do or was that you? I'm not sure if I've ever seen Frank turn off reblogs! If so, what would trigger it to happen?
(i don't mind that they're off btw, I'm genuinely curious about the behind the scenes)
Which post?
Frank can’t do this by herself. I sometimes turn off reblogs manually for a Frank post if it gets too popular. It’s either that or a new tumblr feature or bug.
It seems to happen when they reach 6,400 notes: you can still like or reply to the posts, but you can no longer reblog them.
Why do you think there is some sort of switch that flips at a specific number of notes?
Like I said, I sometimes turn off reblogs for Frank posts manually if they get a large number of notes. You are probably noticing these posts.
It’s possible that some tumblr feature is turning reblogs off for other posts without my knowledge. But there’s no way to determine if this is happening unless you link me to some examples.