Install Theme

moths-in-the-window asked:

Is there much known about how much LLMs transfer / generalise across natural languages? For example, if ChatGPT’s RLHF-trained rules / response formats apply similarly in every language it knows?

I haven’t read any papers that explicitly address this question. (I’d be surprised if there aren’t any papers like that, I just haven’t seen them.)

But at an informal level, the answer is “yes, a whole lot of generalization happens.”

For example, causal (i.e. GPT-style) language models are extremely good at translation. This was one of the most striking few-shot results in the GPT-3 paper, see also this follow-up paper from 2021 that achieved SOTA machine translation results with GPT-3.

(EDIT: actually they were only SOTA results for unsupervised MT, which is not as impressive.)

And what is translation? It’s simply the most general form of “generalizing between languages.” To know how to express a source language text in a target language, you need to know how [all the stuff you understand in the source text] maps onto [all the stuff you understand about the target language]. If you can translate, you can probably do any other form of “generalizing between languages.”

—-

Does this mean that ChatGPT follows its rules equally well in every non-English language? I’m not sure.

The hope with RLHF is that the model learns general(ized) concepts and “understands” that they have universal scope. That it learns “all outputs should be helpful,” as opposed to “all outputs in English should be helpful,” or more generally “all outputs in contexts ‘similar to’ the RLHF training data should be helpful.”

But there’s not much evidence one way or the other about how much this really happens. I also don’t know whether, or how much, OpenAI has used non-English data in RLHF, either in the initial ChatGPT release or in later patches. They say very little about their work these days :(