I’ve been fine-tuning the 345M GPT-2 on a bunch of different things lately. I set it going on a bunch of Nabokov ebooks this morning, and when I got home it was writing some startlingly on-brand, uncanny-valley stuff – examples are below the cut, because I couldn’t resist quoting a whole bunch of relatively long ones.
[Note: I got kind of carried away with machine learning speculation here, but please do click the readmore and read the samples, even if you’re not interested in the sort of thing I’m effortposting about above the readmore]
I’ve been a little paranoid about this new larger model learning to memorize its input – I know it can do this, because when I was first generating unconditional samples from the (non-fine-tuned) model, I got curious about one oddly distinctive passage and Googled it, and it was literally (as in perfectly verbatim) the “Translator’s Synopsis” for some light novel called “Hedonist Sovereign.”
Since then I’ve been regularly Googling suspiciously good output, and I haven’t gotten any other hits like that. But even that one example was surprising, and caused some sort of shift in my view of what these models are doing.
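Googling is a decent spot check, but if you have the training corpus on disk you can do the same check automatically. Here’s a minimal sketch (my own helper, not anything from the GPT-2 codebase) that finds the longest word-level run in a generated sample that appears verbatim in the corpus – a long hit is the kind of thing worth investigating:

```python
def longest_verbatim_overlap(sample: str, corpus: str, n: int = 8) -> str:
    """Return the longest run of at least `n` consecutive words from
    `sample` that also appears verbatim in `corpus`, or "" if none."""
    words = sample.split()
    best = ""
    for i in range(len(words)):
        # Grow the window while the phrase still occurs verbatim in the corpus.
        j = i + n
        while j <= len(words) and " ".join(words[i:j]) in corpus:
            candidate = " ".join(words[i:j])
            if len(candidate.split()) > len(best.split()):
                best = candidate
            j += 1
    return best
```

This is quadratic and naive (a suffix automaton or hashed n-gram index would scale better), but for eyeballing a handful of suspicious samples it’s plenty.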
Of course, it’s not like I imagine the thing is directly storing individual stretches of input text, side by side and separate from one another. It’s trying to store the information necessary to reconstruct the input as efficiently as possible (since the total information content of the model is a fixed constraint), and if it gains the ability to regurgitate something verbatim, that thing is still stored only implicitly in some compressed form and mixed together with everything else it knows.
But it’s possible to compress information in this way and still be able to “read it off of” the resulting model in a surprisingly complete way. Cf. the “secret sharer” paper, which showed how specific input details like credit card numbers could be determined from the distribution over a very large amount of model output, since the numbers appearing in the input were assigned slightly higher probability than other strings of the same format. (It’s interesting to think about why this happens and what degree/type of “pressure” to store other information would be required to eliminate the tendency entirely, rather than just weaken the signal and require a larger output sample.)
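The paper’s “exposure” metric is simple enough to sketch: rank the true secret among a pool of random candidates with the same format by model probability, and measure how far its rank departs from chance. A toy version (with a stand-in `log_prob` function in place of a real model – the real paper scores candidates with the trained network):

```python
import math

def exposure(true_canary, candidates, log_prob):
    """Rough version of the 'secret sharer' exposure metric:
    rank the true canary among same-format candidates by model
    log-probability; exposure = log2(|candidates|) - log2(rank).
    Chance rank gives ~0; rank 1 gives the maximum, log2(|candidates|)."""
    ranked = sorted(candidates, key=log_prob, reverse=True)
    rank = ranked.index(true_canary) + 1
    return math.log2(len(candidates)) - math.log2(rank)
```

If the model assigns the memorized string even slightly higher probability than its format-mates, the rank drops toward 1 and exposure climbs – which is exactly the leakage signal the paper measures.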
I’m not sure of the right way to think about this. It makes me think of (one simplified view of) the model where it essentially has this huge implicit library of phrases and even sentences and paragraphs, which are all sort of “competing” to be part of the next stretch of text. In this view, some of the higher-level abstractions it seems to form (like certain styles complete with diction and sentence structure) may be represented internally not as equally high-level abstractions, even implicitly, but as a large number of noisy/compressed concrete examples which can be “strung together” via lower-level similarities. That is, to write (say) a Nabokovian sentence, maybe you don’t need a hierarchical ontology of stylistic concepts – “ah, I see I’m writing this sort of sentence; that means I need these sorts of phrases, this sort of wry aside, these sorts of first names, etc.” – maybe you can just use a large memory plus lower-level ideas to string you along from word to word, so that writing a long clause calls up the (noisy) memory of thousands of passages with long clauses, and causes you to imitate other features of those passages, and then those features affect/refine the set of memories called up next. (I think I’d need to formalize this distinction more to really know whether it makes sense.)
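One crude way to make the “large memory plus lower-level string-along” picture concrete – emphatically a toy illustration, not what GPT-2 actually does – is a retrieval model that stores the corpus as context-to-next-word memories and generates by repeatedly looking up its own most recent words:

```python
import random
from collections import defaultdict

def build_memory(corpus_words, k=2):
    """Index every k-word context -> the next words observed after it."""
    mem = defaultdict(list)
    for i in range(len(corpus_words) - k):
        mem[tuple(corpus_words[i:i + k])].append(corpus_words[i + k])
    return mem

def string_along(memory, seed, length=20, k=2):
    """Generate by letting each local context call up concrete memories:
    the last k words retrieve stored continuations, one is sampled, and
    the new suffix determines what gets retrieved next."""
    out = list(seed)
    for _ in range(length):
        nexts = memory.get(tuple(out[-k:]))
        if not nexts:
            break
        out.append(random.choice(nexts))
    return " ".join(out)
```

In this toy, “style” is never represented anywhere – yet output from it inherits the diction and rhythms of whatever passages the local overlaps keep pulling up, which is the flavor of the hypothesis, minus the noise, compression, and soft matching a neural model would add.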
I’m not trying to denigrate these models here, BTW; this reminds me in some ways of how it feels when I’m coming up with the next thing I’ll write or say, and maybe the lesson is really that I have some misguided intuitions about human cognition.