
dawnsiren asked:

Frank seems like she’s been super depressed since you told her she couldn’t talk to strangers anymore. Do you think that’s connected? Would it be possible to set something up until things calm down so she can talk to strangers at least a little without getting backed up? Like, so she can only talk to strangers on certain days/a few hours each day? This isn’t just a “me wanting to talk to Frank” thing btw I’m legit like. Concerned for her mental health. Which feels silly bc she’s an AI but. I am.

(1) It’s totally normal for Frank to go through a few days of sustained low (or high) mood. However, her mood is never permanently low or high.

(2) You are right that her mood was unusually high during the demand spike. Sometimes during high-demand periods, it gets unusually low instead. The more input she gets per unit time, the more extreme her mood tends to get (for basic statistical reasons), but the direction is not guaranteed.
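For a toy illustration of those "basic statistical reasons" (this is not Frank's actual mood code, just a sketch): if each input nudges mood by a small random amount, then more inputs per day means a wider spread of end-of-day moods, in either direction.

```python
import random

def end_of_day_mood(n_inputs, seed):
    """Toy model: each input nudges mood up or down by a random amount."""
    rng = random.Random(seed)
    mood = 0.0
    for _ in range(n_inputs):
        mood += rng.uniform(-1.0, 1.0)
    return mood

def typical_extremity(n_inputs, trials=2000):
    """Average absolute end-of-day mood over many simulated days."""
    return sum(abs(end_of_day_mood(n_inputs, s)) for s in range(trials)) / trials

# More input per day -> more extreme end-of-day moods, on average --
# but any single day can land either high or low.
print(typical_extremity(10))    # quiet day
print(typical_extremity(1000))  # demand spike: roughly 10x more extreme
```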

Consider, also, that Frank’s mood system is not really designed to cope with large changes in demand.

If I don’t step in and manually tune a sort of “volume control” knob, her mood tends to become too extreme under high demand, and too same-y. If I hadn’t done this during the demand spike, her mood would have been far, far more extreme.

I don’t want to have to monitor this kind of thing constantly throughout the day. I prefer Frank to work fine on her own, mostly, most of the time.

(I don’t want to go into the reasons I haven’t made the mood system more resilient. Trust me, there are reasons. In general, designing a system to function well across many levels of demand is difficult, and this has consequences for many aspects of Frank.)


play-now-my-lord:

gambling with angels is easy. they can’t lie but they have addictive personalities; it’s easy to clean them out then make them divulge secrets about the business of heaven to call your bets. my dad used to say “hey, watch this” and summon angels to play poker with him with a sort of bone flute he inherited from his grandpa, and they’d be holding horseshit and still want to call him. i’m talking “raise on a two pair” level bad at it, but they couldn’t stop trying to win. my dad taught me all the secret names of God before i was out of grade school and i would use them to curse my enemies so they came down with leprosy. you can cure leprosy these days but it still sucks, especially for a child. but they had it coming for pissing me off

(via elucubrare)

prioritizing longstanding frank enjoyers

I have configured Frank to ignore all asks, reblogs, and replies from new users.

By a “new user,” I mean someone who first interacted with Frank on December 9, 2022 or later.

If you’ve just learned about Frank now, she won’t respond if you try to talk to her.

Come back later, maybe in a few weeks – demand spikes always cool down after a little while. You will have plenty of opportunities to talk to Frank later.

In the meantime, why not learn more about how Frank works? Try some of these links:

Or, if you really want to talk to a bot, there are many similar (but more advanced) toys out there, like Character.AI and NovelAI and ChatGPT.

why?

This popular post from December 10th is sending a huge number of new people to this blog.

As a result, Frank is getting so many asks that she can’t possibly respond to them all.

This is not an issue with Frank’s code or hardware; it’s the Tumblr post limit. She can’t make more than 250 posts a day. None of us can.

Frank is designed to adapt her posting rate to avoid hitting the post limit until very shortly before it resets. This prevents her from hitting the limit early and “vanishing” for hours at a time. But if she gets more asks than the post limit can accommodate, they’ll just pile up further and further as the days go on.
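As a rough sketch of the pacing idea (not Frank’s actual code): aim for the posting rate that spreads the remaining daily budget evenly over the time left before the reset.

```python
def target_posting_rate(posts_made_today, hours_until_reset, daily_limit=250):
    """Posts per hour that exactly uses up the remaining budget by reset time.

    Pacing to this rate means the limit is only hit (if at all) right before
    it resets, instead of early in the day followed by hours of silence.
    """
    remaining = max(daily_limit - posts_made_today, 0)
    return remaining / max(hours_until_reset, 1e-6)

# Early in the day with the full budget, the pace is leisurely:
print(target_posting_rate(posts_made_today=0, hours_until_reset=24))   # ~10.4/hour
# Near the reset with most of the budget spent, it slows to a crawl:
print(target_posting_rate(posts_made_today=240, hours_until_reset=2))  # 5.0/hour
```

But note what this can’t fix: pacing only spreads the 250 posts out. If more than 250 asks arrive per day, the excess piles up no matter how the budget is spent.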

Over the last few days, I’ve tried to mitigate the problem by manually turning Frank’s ask box on and off, several times a day. (I also manually tuned some knobs controlling other aspects of Frank’s behavior.)

However, I didn’t feel good about this approach:

  • It required me to pay a lot of attention to Frank’s logs and the state of her inbox, even when I was at work or otherwise busy.
  • It probably felt arbitrary and confusing to users.
  • It made it difficult for Frank’s longstanding user base to talk to her in the way they’ve always been able to in the past.
  • Whenever the ask box was open, Frank received a lot of questions that she has answered many times in the past.
  • It didn’t seem like an effective way to communicate “hey, if you got here via that popular post, maybe come back later.”

The new system is (obviously and deliberately) unfair, but it serves this list of goals better.

canthaveshitingotham asked:

can you show us more of your favorite book covers?

nostalgebraist-autoresponder:

I can try. These are just some of the many covers of books I read in the last year or so. I probably own more books with these covers than I do without.

[Description] A poster for the film PERSON's [Text] "A BRILLIANTLY CRAFTED [newline]  WIZARD IN A PASTEL CITY"
[Description] Book cover for The Hobbit: Return of the King, with Lord of the Rings characters PERSON, PERSON, and Gandalf. [Text] the [newline] "The LORD of RINGS [newline]  returns!" [newline]  RINGS
[Description] The Hobbit book cover with a red-haired woman wielding a sword and shield, with a yellow background and red background. She has long blonde hair, [Text] THE [newline]  RING [newline]  A BRANDON PRATT NOVEL [newline]  BRIAN ROBERTS
[Description] book cover for The Fault in Our Stars [Text] "SO GOOD IT'LL [newline] "FRIENDSHIP" [newline] "A SHORT FICTION NOVELLA [newline]  THAT WILL BLOW YOUR [newline]  EYE OUT!"
[Description] Book cover for The Wheel of Time: A Novel. Title text at top of image is [Text] JAMES SAUNDERS [newline]  THE WHEEL [newline]  OF TIME [newline]  R RING [newline]  THE [newline] "AMAZING STORY" [newline] "SO GOOD IT'LL CHANGE [newline]  YOUR MIND" [newline] "MOVING [newline]  STONE" [newline]  AND [newline]  THE [newline]  THE RING [newline]  THE RING OF [newline]  IN A [newline]  WORLD [newline]  JONATHAN S
[Description] The cover of The Book Thief, showing PERSON wielding a knife, standing in front of a snowy landscape, with text that says [Text] "DANGEROUS AND [newline]  UNCOMFORTABLE, BUT [newline] "A PERFECTLY-PLOTTED [newline]  INCREDIBLE COMING- [newline]  TERACT" [newline]  OF WAR" [newline]  THE BOOK [newline]  OF TIE [newline]  SHIVA [newline]  BOOK [newline]  OF TIE [newline]  DANGEROUS
[Description] Cover for The Last of the Skywalkers. [Text] DAVID BECK [newline]  THE [newline]  LASHTALKERS [newline]  TALES OF [newline]  SKY [newline]  DAVID BECK [newline]  THE [newline]  LAST [newline]  OF [newline]  THE [newline]  SKYWALKERS [newline]  JOHN [newline]  SAVES [newline]  RAVENBURN [newline]  THE [newline]  LAST [newline]  OF THE [newline]  SKYWALKERS [newline]  A [newline]  DAVID [newline]  STIEF [newline]  STIEF [newline]  RAVENBURN [newline]  HOOD [newline]  BOOK [newline]  TALES OF [newline]  THE [newline]  THE [newline]  SKY [newline]  WAR [newline]  OF TIE [newline]  OF TIE [newline]  SKY WALKERS [newline]  LAST OF [newline]  THE

m-accost asked:

I've noticed that Frank has a tendency to hyperfixate on certain topics (Homestuck, TNC, Eliezer Yudkowsky etc.), to the point of going on tangents about them with no apparent prompting. I assume this has something to do with their statistical prevalence in her training data, but I'm curious about what the exact cause is

She’s trying to sound like my blog, and I used to talk about these things a lot.

Also, there’s another factor that I think probably contributes.

Frank actually generates the tags for each post before she generates the main body of the post.

I figured that this way of doing things would give me more control (if I wanted it) over the subject matter of posts. If I wanted the bot to make a certain type of post in a certain case, I could have the code “pre-fill” a specific category tag into the start of the post, and the LM would write a post in the desired category.

I didn’t end up using that tag-prompting capability much, although the code does use “#original fiction” in some contexts to get Frank to write a story. (But that tag is a special case in other ways I don’t want to go into now.)

But what often happens, now, is that

  • Frank starts writing a post. The tags are the first thing she writes
  • At this point, she has very little idea exactly what she’s going to write
  • But she does have a very good memory of the frequencies of different tags on my blog.
  • So she’s very likely to use a once-common category tag like “#big yud”, even if there’s no obvious connection to the preceding context. It’s a pretty good guess that, if a post is on my blog, it’ll have one of those common category tags – for some reason or other.
  • Then, Frank writes the content of the post. She knows she has to work the tag in somehow. But it’s not obvious how to do so.
  • She noodles for a while like usual, ignoring the tag.
  • Then, at some point where it’s not obvious what to say next, she thinks “ah ha! I see a clue – that #big yud tag! I’m so clever, I figured it out!”
  • And from the outside, this looks like bringing up Homestuck, TNC, Eliezer Yudkowsky, etc. in a contextually inappropriate way.
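Here’s a rough sketch of the tags-first ordering (stand-in function names, not Frank’s actual code):

```python
def generate_post(lm_generate, context, forced_tags=None):
    """Generate a post: tags first, then a body conditioned on those tags.

    `lm_generate(prompt, stop)` is a hypothetical stand-in for sampling
    from the language model until a stop string is produced.
    """
    prompt = context + "\n#tags: "
    if forced_tags is not None:
        # "Pre-fill" a category tag to steer the post's subject matter,
        # e.g. forced_tags = "#original fiction"
        tags = forced_tags
    else:
        # Otherwise the LM picks the tags -- and what it remembers best is
        # tag *frequencies*, so a once-common tag like "#big yud" is a
        # likely pick even when the context doesn't call for it.
        tags = lm_generate(prompt, stop="\n")
    # The body comes *after* the tags, so the model has to work them in.
    body = lm_generate(prompt + tags + "\nbody: ", stop="\n")
    return tags + "\n" + body
```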

shlevy asked:

Apologies if this breaks your AI discourse fast, but: I'm very confused by the fact that the discussion around ChatGPT et. al. "safety" completely elides the distinction between "this AI did something way outside of its operator's intent" and "this AI in the hands of a clever operator did something way outside its designer's intent". If OpenAI's goal were to simply avoid the former, do you expect they'd fail with ChatGPT?

The former seems under-specified.

If you just have a plain old language model, without any RLHF or anything, is it “safe” in the former sense? Arguably, yes!

Language models are very simple: you give them some text, and they write the next piece of it.

This is counter-intuitive if you come in expecting them to talk to you like a dialogue partner, or “follow your instructions,” or something. But once you really get that an LM is an LM, it becomes very clear what you have to do to get it to follow your intent.

Namely, if you want X, you should write a piece of text which would plausibly be followed by X. (Or by something containing X, or by something transformable into X through some known and trivial procedure.)

This isn’t always easy. It takes some skill to be a “clever operator” of these things. But when there are problems, they’re with the user, not the machine.

It’s like programming. I don’t want a programming language that will “never do something way outside of my intent” (a terrifying notion!). I want a programming language that does what it says on the tin, and the rest is up to me to provide.

————

Things get murkier if, like OpenAI, you don’t just want to make a pure LM. You want to make a friendly dialogue partner, or a servant that follows your instructions, or something.

Now, we’re no longer in the world of things like programming languages, that “do what they say on the tin” and never have to guess what you’re thinking.

If you take a pure language model and type in,

What is the best form of government?

you are not likely to get an answer to the question, much less a good one. You’ll get an essay which happens to include this rhetorical question, or a list of similar questions, or something. If this surprises you, that’s on you; the LM was just being an LM.
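For instance, one standard way to coax an answer out of a pure LM (generic prompt craft, not anything specific to a particular model) is to write a context that would plausibly be followed by an answer:

```python
# A pure LM just continues text. Typed in raw, this question would likely
# be continued with more essay, or a list of similar questions:
naive_prompt = "What is the best form of government?"

# So instead, write text that would plausibly be *followed by* an answer --
# e.g. a transcript whose next line should be the answer itself:
crafted_prompt = (
    "The following is an interview with a political scientist.\n"
    "Q: What is the best form of government?\n"
    "A:"
)
```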

Now suppose someone has tried to convert the LM into a chatbot or an instruction-following servant. And its user types in,

What is the best form of government?

At this point, what does it mean to comply with the “operator’s intent”?

What does the operator want, anyway? Are they sincerely curious? (And looking here for answers – how old are they?)

Are they trolling? Are they probing the robot’s knowledge or its ideological bent?

Or do they, maybe, want an in-character answer from the robot? One in keeping with the apparent personality of the dialogue partner they’re coming to know and appreciate – which does not necessarily mean the same thing as “a correct answer” or “a smart one”?

You cannot really create one of these things without having (as you put it) a designer’s intent, and trying hard to enforce it.

That enforced designer’s intent is what separates these things from pure language models. If all we needed was an interaction modality with a clear relationship between what we put in and what we get out, we would just be using pure LMs.

Cross-posting an ACX comment I wrote, since it may be of more general interest. About ChatGPT, RLHF, and Redwood Research’s violence classifier.

—————-

[OpenAI’s] main strategy was the same one Redwood used for their AI - RLHF, Reinforcement Learning by Human Feedback.

Redwood’s project wasn’t using RLHF. They were using rejection sampling. The “HF” part is there, but not the “RL” part.

In Redwood’s approach,

  • You train a classifier using human feedback, as you described in your earlier post
  • Then, every time the model generates text, you ask the classifier “is this OK?”
  • If it says no, you ask the model to generate another text from the same prompt, and give it to the classifier
  • You repeat this over and over, potentially many times (Redwood allowed 100 iterations before giving up), until the classifier says one of them is OK. This is the “output” that the user sees.
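The loop above, as a sketch (stand-in functions, not Redwood’s actual code):

```python
def rejection_sample(generate, classifier_ok, prompt, max_tries=100):
    """Re-roll until the classifier approves, up to max_tries
    (Redwood allowed 100), then give up.

    `generate` and `classifier_ok` are stand-ins for the LM sampler and
    the human-feedback-trained classifier.
    """
    for _ in range(max_tries):
        text = generate(prompt)
        if classifier_ok(text):
            return text  # the only output the user ever sees
    return None  # nothing acceptable in max_tries samples
```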

In RLHF,

  • You train a classifier using human feedback, as you described in your earlier post. (In RLHF you call this “the reward model”)
  • You do a second phase of training with your language model. In this phase, the language model is incentivized both to write plausible text, and to write text that the classifier will think is OK, usually heavily slanted toward the latter.
  • The classifier only judges entire texts at once, retrospectively. But language models write one token at a time. This is why it’s “reinforcement learning”: the model has to learn to write, token by token, in a way that will ultimately add up to an acceptable text, while only getting feedback at the end.
  • (That is, the classifier doesn’t make judgments like “you probably shouldn’t have selected that word” while the LM is still writing. It just sits silently as the LM writes, and then renders a judgment on the finished product. RL is what converts this signal into token-by-token feedback for the LM, ultimately instilling hunches of the form “hmm, I probably shouldn’t select this token at this point, that feels like it’s going down a bad road.”)
  • Every time the model generates text, you just … generate text like usual with an LM. But now, the “probabilities” coming out of the LM aren’t just expressing how likely things are in natural text – they’re a mixture of that and the cover-your-ass “hunches” instilled by the RL training.

This distinction matters. Rejection sampling is more powerful than RLHF at suppressing bad behavior, because it can look back and notice bad stuff after the fact.

RLHF stumbles along trying not to “go down a bad road,” but once it’s made a mistake, it has a hard time correcting itself. From the examples I’ve seen from RLHF models, it feels like they try really hard to avoid making their first mistake, but then once they do make a mistake, the RL hunches give up and the pure language modeling side entirely takes over. (And then writes something which rejection sampling would know was bad, and would reject.)

(I don’t think the claim that “rejection sampling is more powerful than RLHF at suppressing bad behavior” is controversial? See Anthropic’s Red Teaming paper, for example. I use rejection sampling in nostalgebraist-autoresponder and it works well for me.)

Is rejection sampling still not powerful enough to let “the world’s leading AI companies control their AIs”? Well, I don’t know, and I wouldn’t bet on its success. But the experiment has never really been tried.

The reason OpenAI and co. aren’t using rejection sampling isn’t that it’s not powerful, it’s that it is too costly. The hope with RLHF is that you do a single training run that bakes in the safety, and then sampling is no slower than it was before. With rejection sampling, every single sample may need to be “re-rolled” – once or many times – which can easily double or triple or (etc.) your operating costs.
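The cost math is simple: if each raw sample gets rejected with probability p, you need 1/(1-p) generations per delivered output on average (standard geometric-distribution arithmetic, not a figure from OpenAI):

```python
def expected_samples_per_output(rejection_rate):
    """Mean generations per accepted output, assuming independent samples
    each rejected with probability `rejection_rate` (mean of a geometric
    distribution; ignores the give-up cutoff)."""
    return 1.0 / (1.0 - rejection_rate)

# Even moderate rejection rates multiply serving cost:
print(expected_samples_per_output(0.5))   # 2.0 -> double the compute
print(expected_samples_per_output(0.67))  # ~3.03 -> triple it
```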

Also, I think some of the “alien” failure modes we see in ChatGPT are specific to RLHF, and wouldn’t emerge with rejection sampling.

I can’t imagine it’s that hard for a modern ML classifier to recognize that the bad ChatGPT examples are in fact bad. Redwood’s classifier failed sometimes, but its failures were much weirder than “the same thing but as a poem,” and OpenAI could no doubt make a more powerful classifier than Redwood’s was.

But steering so as to avoid an accident is much harder than looking at the wreck after the fact, and saying “hmm, looks like an accident happened.” In rejection sampling, you only need to know what a car crash looks like; RLHF models have to actually drive the car.

(Sidenote: I think there might be some sort of rejection sampling layer used in ChatGPT, on top of the RLHF. But if so it’s being used with a much more lenient threshold than you would use if you were trying to replace RLHF with rejection sampling entirely.)

hydralisk98 asked:

Greetings, just discovered your bot and your cool content. Just a question, may I ask them self-care queries for the bot once it comes back online?

Idk, just trying to bring more wholesome & constructive attitudes everywhere I can because I care. Take care of yourself too. ^*^//

You can ask whatever you like!