the-moti asked:
Could you use some kind of optimization to generate a post for nostalgebraist-autoresponder which is rated very highly by the selector model (like, higher than any post Frank actually made) while still being rated as reasonably likely by the generator model (like, comparable to some posts Frank makes of the same length)? Would it be an actually good post or just repeat a bunch of "good" words and phrases?
(Sorry for the very late reply!)
1.
Doing literally what you propose would be quite difficult.
I would have to do something like the RL approach in the Learning to Summarize paper, with a separate policy model. I’ve considered that before, but the paper only used that approach with short action sequences of ~40 tokens max, and I expect it would require vastly more compute/data for much longer sequences.
2.
I can do a brute-force version of this for free.
Frank generates many candidate responses to every prompt, and for a long time, I’ve logged all of these together with their selector scores.
This is roughly the type of data I would get from generating a huge number of posts and running them all through the selector to answer your question with brute force, except I already have the data sitting there on disk.
When I first got this ask, I went through these logs and looked at the very highest-scoring posts. Unsurprisingly, there were plenty that got extremely high scores, very near 100% probability.
However, the posts themselves were hard to interpret. They were generally short, but otherwise they didn’t even look like “bad posts with lots of ‘good’ phrases” — they just looked normal.
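For what it’s worth, the “looking at the very highest-scoring posts” step is just a sort over stored (score, post) pairs. A minimal sketch, assuming a hypothetical JSON-lines log format (the file layout and field names here are invented for illustration, not the actual logging code):

```python
import json

def top_scoring(log_path, n=10):
    """Return the n highest-selector-score candidates from a log file.

    Assumes one JSON object per line, e.g.
    {"text": "...", "selector_score": 0.93}
    (a made-up format for this sketch).
    """
    entries = []
    with open(log_path) as f:
        for line in f:
            rec = json.loads(line)
            entries.append((rec["selector_score"], rec["text"]))
    entries.sort(reverse=True)  # highest score first
    return entries[:n]
```

The caveats in the list below are exactly the reasons a simple argmax like this can mislead.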
Some things to consider:
- I might have done the logging wrong somehow. Or done it wrong in the past, fixed the bug, and then forgotten about it.
- There have been many versions of the selector model, as I continually re-train it on new data.
This leaves us with a much smaller sample size for any given version. But if we instead take a large sample over all versions and just do an argmax, that preferentially selects for model versions with weird biases, rather than for typical or “better” versions.
- The model architecture I use for the selector uses a single pass of attention to “summarize” the activations from an interior layer of the GPT model to a fixed-length vector, then feeds this vector through a residual MLP. (In other words, the attention operation contracts over the sequence dimension of the input activations.)
I think this explains why the “very best” posts tend to be short. If there are things the selector likes and dislikes, a longer sequence will typically have some of both, whereas a short sequence can be short enough to only contain “good” elements. Because of the way softmax attention works, a short sequence with one “good” element looks as good as a longer sequence with many instances of that “good” element.
So what we’re seeing is not “the most intensely ‘good’ posts” — it’s “the most purely ‘good’ posts, made up of only ‘good’ constituent parts.”
(Also, long posts actually get fewer notes on average.)
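The length-invariance point about softmax attention is easy to demonstrate numerically. Here’s a rough sketch of the pooling step described above — a single attention pass that contracts the sequence dimension down to a fixed-length vector (all parameter names, shapes, and random weights below are made up for illustration, not the actual model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_attn, d_out = 8, 4, 6

# Hypothetical "learned" parameters (random here, just for the demo)
W_k = rng.normal(size=(d_model, d_attn))
W_v = rng.normal(size=(d_model, d_out))
query = rng.normal(size=(d_attn,))  # single learned query vector

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(acts):
    """Contract the sequence dim: (seq_len, d_model) -> (d_out,)."""
    scores = (acts @ W_k) @ query   # (seq_len,)
    weights = softmax(scores)       # sums to 1 regardless of seq_len
    return weights @ (acts @ W_v)   # weighted average of per-token values

good = rng.normal(size=(1, d_model))  # one "good" token
short = good                          # sequence of length 1
long = np.repeat(good, 5, axis=0)     # the same token repeated 5x

# The pooled summaries are identical: because softmax weights always sum
# to 1, repeating a "good" element doesn't intensify the pooled output.
assert np.allclose(attention_pool(short), attention_pool(long))
```

Since the output is a weighted *average* over tokens, adding more copies of a “good” token can’t push the summary vector any further in the “good” direction — but mixing in even one “bad” token drags it back. Hence short, purely-“good” sequences win the argmax.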