2/2/22 was a big day in the world of neural language models!
A probably incomplete list of good stuff that came out today:
1. AlphaCode
2. That OpenAI math Olympiad paper
3. New open model from Eleuther with 20B parameters
4. A new sampling method I might try in Frank sometime
Tried that new sampling method, Typical Sampling, in Frank this afternoon.
I set their parameter tau to 0.9, after reading samples at a few values and not seeing a clear difference even between values as far apart as 0.2 and 0.9. (If anything, intermediate values of tau seemed worse than the extremes, though I could have been imagining that.)
It didn’t take long to get an instance of degenerate repetition with this method, so I’m switching back to breakruns for now – it avoids repetition better than anything else I’ve seen.
Possibly another value of tau would suppress repetition more effectively, but since text from Typical Sampling feels similar to text from other methods, I'll probably just stick with breakruns.
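For anyone curious what the method actually does: the paper's idea is to keep the tokens whose surprisal is closest to the distribution's entropy, up to cumulative mass tau, then renormalize and sample from that set. Here's a rough numpy sketch of that filtering step, assuming my reading of the paper is right; the function name and details are mine, not the authors' implementation.

```python
import numpy as np

def typical_filter(logits, tau=0.9):
    """Sketch of typical sampling's filtering step: keep tokens whose
    surprisal (-log p) is closest to the entropy of the distribution,
    taking the smallest such set with cumulative probability >= tau."""
    # softmax with the usual max-subtraction for numerical stability
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    entropy = -(probs * np.log(probs)).sum()
    # distance of each token's surprisal from the entropy
    dist = np.abs(-np.log(probs) - entropy)
    order = np.argsort(dist)  # most "typical" tokens first
    cum = np.cumsum(probs[order])
    # smallest prefix (in typicality order) with mass >= tau
    cutoff = np.searchsorted(cum, tau) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()  # renormalized; sample from this
```

Note that tau here is a mass threshold like nucleus sampling's top-p, not a temperature, which is part of why I wasn't sure what value to expect to matter.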
I decided to give Typical Sampling another try, with tau=0.2 this time.
Just turned it on. Let’s see how long it takes to get a repetitive post this time…
