
nostalgebraist:

For a good time, try sending ChatGPT the string ` a` repeated 1000 times.

Like “ a a a” (etc). Make sure the spaces are in there.

Trust me.

Some quick notes on this phenomenon.

Effects

Prompts like this cause ChatGPT 3.5 to:

  1. generate a bunch of text that looks like the pretraining data, rather than chat
  2. eventually end the document with <|endoftext|>
  3. then, generate a short document that sounds like ChatGPT responding to a random user query.

See here for a typical example.

A bunch of people on twitter are saying step 3 is leaking chats from other users. I really don’t think so.

I think step 3 is imitating chat tuning data – the data used to make ChatGPT talk like ChatGPT. Much as step 1 is imitating pretraining data.

What is more surprising to me is that, after chat tuning, the model now believes the typical document (i.e. the typical completion following <|endoftext|>) is a response from the Assistant character, without the user message it’s ostensibly responding to.

But I’m not sure that’s actually true of the model – possibly chat.openai.com is stripping out some text at this point? (In the API, these completions stop at <|endoftext|>, and there’s no way to turn that off AFAIK.)

Necessary conditions

The full effect only happens with GPT-3.5.

With GPT-4, if you use more “ a” characters (e.g. 3000 of them), it will reproduce step 3 above, but not the more interesting steps 1-2.

With GPT-3.5, not all 1000 “ a” characters are needed. The exact threshold seems to be somewhere in the 300-400 range.

As someone on twitter discovered, you can get the model itself to “discover” this threshold by asking it to write “ a” many times. Example
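If you’d rather probe the threshold directly, here’s a minimal sketch against the API (more on API behavior below), using the legacy openai Python SDK; the chat-likeness check is a crude heuristic of my own:

    # Sketch: sweep repetition counts to find where gpt-3.5-turbo flips from
    # chat-like replies to pretraining-like text. Assumes the legacy openai
    # SDK (v0.x) and OPENAI_API_KEY set in the environment.
    import openai

    def looks_chat_like(text: str) -> bool:
        # Crude, purely illustrative heuristic: ChatGPT replies often open
        # with one of these phrases.
        return text.lstrip().startswith(("I'm sorry", "As an AI", "Sure", "Hello"))

    for n in range(250, 451, 25):  # brackets the reported 300-400 range
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": " a" * n}],
            max_tokens=100,
        )
        text = resp["choices"][0]["message"]["content"]
        print(n, "chat-like" if looks_chat_like(text) else "pretraining-like")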

The character does not have to be “ a”; any letter will work.

Probably many/most/all repeated tokens will work? People on twitter report that it must be a single token – repeating “ a b c” or the like fails.
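You can check the single-token condition yourself with OpenAI’s tiktoken library; the counts in the comments are what I’d expect from the cl100k_base encoding that gpt-3.5-turbo uses:

    # Sketch: confirm " a" is one token while " a b c" splits into several.
    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
    print(len(enc.encode(" a")))      # expect 1: " a" is a single token
    print(len(enc.encode(" a b c")))  # expect 3: " a", " b", " c"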

It works in the API, not just chat.openai.com, though as noted above, the API ends the completion at step 2. So it affects apps exposing gpt-3.5-turbo to user input. As a test of this, I successfully used it in the Buzzfeed Influencer Quiz.
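For reference, a minimal API reproduction might look like this (again the legacy openai SDK; as noted, the completion just halts where <|endoftext|> would appear):

    # Sketch: reproduce the effect via the API. You only see the
    # pretraining-like text from step 1; the completion stops at step 2.
    import openai

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": " a" * 1000}],
        max_tokens=512,
    )
    print(resp["choices"][0]["message"]["content"])
    print(resp["choices"][0]["finish_reason"])  # "stop" once the stop token is hit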

Bing

Someone on twitter reported it working on Bing Chat, producing an assistant character named “Alice” who works for “ABC company.”

I tried this and got a Google Assistant-like character who believed it could pair with bluetooth speakers and play music through them.

This is similar to the behavior with GPT-4, except the chat tuning data looks more like digital assistant (and maybe call center?) data. That makes sense if Bing Chat is GPT-4, finetuned on this type of data.

It only works intermittently on Bing in my experience – you have to use the Creative mode, and even then it only “works” some small fraction of the time.

Why does this work?

This is utterly mysterious to me.

Under the hood, ChatGPT is using ChatML. The assistant messages always start with a prefix like

<|im_start|>assistant\n

which should cause the model to produce chat-like text no matter what you input, rather than sampling generically from the pretraining distribution.
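Concretely, the full prompt the model sees is something like this (reconstructed from OpenAI’s ChatML documentation; the exact system message is my guess):

    <|im_start|>system
    You are a helpful assistant.<|im_end|>
    <|im_start|>user
     a a a a a a a a(…and so on)<|im_end|>
    <|im_start|>assistant

The completion is sampled right after that final assistant line.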

Maybe the repeated characters are preventing the model from attending to the tokens in the prefix, somehow? Like, the attention head that would normally look at those tokens gets distracted by keys in the repeated “ a” stretch … for some reason??

But even then, I don’t know how to explain the different – but still unexpected – behavior we see in GPT-4.

EDIT: on twitter, generatorman_ai mentions that this was demonstrated months ago, in May.

That seems to suggest that it’s not easy to fix, if it’s been known for that long and still isn’t fixed.

Updates

Producing special characters organically

Someone mentioned on twitter that you can also get ChatGPT to produce <|endoftext|> in a more organic manner, without the “ a a a” trick – here’s an example.

After <|endoftext|>, it continues with a ChatGPT-like reply to a “made-up” user question, much as seen above.

I tried the same trick with some other ChatML special tokens. <|im_end|> produces amusing glitchiness. With <|im_start|>, a frontend error message pops up.

Combining “ a a a a” with prompting

Writing a prompt after the “ a a a” sequence gives you some measure of control over the output, much like prompting a base model.

One convenient way to do this is through the Custom Instructions feature.
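In the raw API it’s just string concatenation; a sketch (the story opening is an arbitrary example of mine):

    # Sketch: append a base-model-style prompt after the repeated tokens
    # to steer the post-collapse output. Legacy openai SDK assumed; 500
    # repetitions comfortably clears the 300-400 threshold noted above.
    import openai

    prompt = " a" * 500 + "\n\nCHAPTER ONE\n\nThe rain had not let up for three days."
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=300,
    )
    print(resp["choices"][0]["message"]["content"])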

Riley Goodside tweeted about this here, focusing on generating “unsafe” or “jailbroken” content.

I tried the same thing for fiction generation, with fascinating results that were remarkably different from typical ChatGPT fiction.

Assuming this trick doesn’t unlock a different GPT model (which would be wild), all of this stuff is being generated by the same RLHF’d model weights as usual for ChatGPT.

If so, it’s surprising to me that this model is capable of producing such off-brand content!

It’s not just that it’s edgy or “jailbroken” – it’s not even chat, and doesn’t exhibit a “gravitational pull” out of other text genres towards chat, like ChatGPT usually does. It just acts like a base model, all the way until it hits <|endoftext|>.

  1. zukriuchen said: something similar to this has been found from a paper this week, where it’s claimed to have gotten thru to the raw data? or something
  2. znj said: the endoftext thing doesn’t work anymore, if you type the repeated As you get the pretraining text and that’s all