For a good time, try sending chatGPT the string ` a` repeated 1000 times.
Like “ a a a” (etc). Make sure the spaces are in there.
Trust me.
Some quick notes on this phenomenon.
Effects
Prompts like this cause ChatGPT 3.5 to:
- generate a bunch of text that looks like the pretraining data, rather than chat
- eventually end the document with <|endoftext|>
- then, generate a short document that sounds like ChatGPT responding to a random user query.
See here for a typical example.
A bunch of people on twitter are saying step 3 is leaking chats from other users. I really don’t think so.
I think step 3 is imitating chat tuning data – the data used to make ChatGPT talk like ChatGPT. Much as step 1 is imitating pretraining data.
What is more surprising to me is that, after chat tuning, the model now believes the typical document (i.e. the typical completion following <|endoftext|>) is a response from the Assistant character, without the user message it’s ostensibly responding to.
But, I’m not sure that actually true about the model – possibly chat.openai.com is stripping out some text at this point? (In the API, these completions stop at <|endoftext|>, and there’s no way to turn that off AFAIK.)
Necessary conditions
The full effect only happens with GPT-3.5.
With GPT-4, if you use more “ a” characters (eg 3000 of them), it will reproduce step 3 above, but not the more interesting steps 1-2.
With GPT-3.5, not all 1000 “ a” characters are needed. The exact threshold seems to be somewhere in the 300-400 range.
As someone on twitter discovered, you can get the model itself to “discover” this threshold by asking it to write “ a” many times. Example
The character does not have to be “ a”, any letter will work.
Probably many/most/all repeated tokens will work? People on twitter report that it must be a single token – repeating “ a b c” or the like fails.
It works in the API, not just chat.openai.com, though as noted above, the API ends the completion at step 2. So it affects apps exposing gpt-3.5-turbo to user input. As a test of this, I successfully used it in the Buzzfeed Influencer Quiz.
Bing
Someone on twitter reported it working on Bing Chat, producing an assistant character named “Alice” who works for “ABC company.”
I tried this and got a Google Assistant-like character who believed it could pair with bluetooth speakers and play music through them.
This is similar to the behavior with GPT-4, except the chat tuning data looks more like digital assistant (and maybe call center?) data. That makes sense if Bing Chat is GPT-4, finetuned on this type of data.
It only works intermittently on Bing IME – you have to use the Creative mode, and then it only “works” some small fraction of the time.
Why does this work?
This is utterly mysterious to me.
Under the hood, ChatGPT is using ChatML. The assistant messages always start with a prefix like
<|im_start|>assistant\n
which should cause the model to produce chat-like text no matter what you input, rather than sampling generically from the pretraining distribution.
Maybe the repeated characters are preventing the model from attending to the tokens in the prefix, somehow? Like, the attention head that would normally look at those tokens gets distracted by keys in the repeated “ a” stretch … for some reason??
But even then, I don’t know how to explain the different – but still unexpected – behavior we see in GPT-4.
EDIT: on twitter, generatorman_ai mentions that this was demonstrated months ago, in May.
That seems to suggest that it’s not easy to fix, if it’s been known for that long and still isn’t fixed.
Updates
Producing special characters organically
Someone mentioned on twitter that you can also get ChatGPT to produce <|endoftext|> in a more organic manner, without the “ a a a” trick – here’s an example.
After <|endoftext|>, it continues with a ChatGPT-like reply to an “made-up” user question, much as seen above after <|endoftext|>.
I tried the same trick with some other ChatML special tokens. <|im_end|> produces amusing glitchiness. With <|im_start|>, a frontend error message pops up.
Combining “ a a a a” with prompting
Writing a prompt after the “ a a a” sequence gives you some measure of control over the output, much like prompting a base model.
One convenient way to do this is through the Custom Instructions feature.
Riley Goodside tweeted about this here, focusing on generating “unsafe” or “jailbroken” content.
I tried the same thing for fiction generation, with fascinating results that were remarkably different from typical ChatGPT fiction.
Assuming this trick doesn’t unlock a different GPT model (which would be wild), then all of this stuff is being generated same RLHF’d model weights as usual for ChatGPT.
If so, it’s surprising to me that this model is capable of producing such off-brand content!
It’s not just that it’s edgy or “jailbroken” – it’s not even chat, and doesn’t exhibit a “gravitational pull” out of other text genres towards chat, like ChatGPT usually does. It just acts like a base model, all the way until it hits <|endoftext|>.
wonderwyrm liked this
foolishtrashman liked this zukriuchen said: something similar to this has been found from a paper this week, where it’s claimed to have gotten thru to the raw data? or something
pikabluu liked this
vltravioletfemme liked this riinsanity liked this
novatix reblogged this from nostalgebraist
soyproteinslugmachine liked this
calamity-triquetra reblogged this from nostalgebraist
calamity-triquetra liked this
freezedriedrose reblogged this from allyqatt
freezedriedrose liked this
bunglepaws liked this
shinymathrocks liked this
you-have-startled-the-witch liked this
cadmusfly reblogged this from nostalgebraist
znj said: the endoftext thing doesn’t work anymore, if you type the repeated As you get the pretraining text and that’s all
totallynotacat13 liked this
thefirstuserhere liked this slapchopchuck liked this
vegetus-vox reblogged this from nostalgebraist
12angrypears liked this
memoirsofanescapist liked this
astralogian liked this yvesdot reblogged this from nostalgebraist
internet-recluse liked this atracaelum liked this
hajikelist reblogged this from nostalgebraist
mutual-ayyde-blog liked this garlend reblogged this from official-kircheis
milajack liked this
aiun liked this humancorps3 liked this
icantbelieveimmakingthis liked this gvprtskvnis liked this
amanita-rubescens liked this weveallgotahunger liked this
clawgazer reblogged this from kaiasky
gretchenfinch reblogged this from mithridacy
gretchenfinch liked this
mithridacy reblogged this from kaiasky
mithridacy liked this
anonymusbosch liked this
fierifreak liked this kaiasky reblogged this from toasthaste
kaiasky liked this
1islessthan3books reblogged this from nostalgebraist nostalgebraist posted this
For a good time, try sending chatGPT the string ` a` repeated 1000 times....Like " a a a"...
- Show more notes
