What I’ve been doing lately in Frank development:
1. Switching the ML stuff from tensorflow to pytorch.
2. Replacing the generator model with one 2x as big, finetuned from the 2.7B GPT-Neo checkpoint released by EleutherAI. (This is the same size and architecture as the smallest GPT-3 model.)
#1 is basically done, and I should be able to “flip the switch” in production soon, probably tomorrow.
#2 is nearly done on the development side, but might be too slow to be practical for Frank’s level of demand. No way to be sure without trying it.
The second was enabled by the first: I finetuned the EleutherAI model in tensorflow(-mesh), the same way they trained it, then spent like a week going down a Pepe Silvia-style rabbit hole trying to figure out how to do inference with the damn thing.
…then I converted it to pytorch and it instantly worked like a charm. Like 15 minutes of work after spending days on the tf version (actually rewriting and rebuilding parts of tf itself from source by the tail end of my quixotic efforts).
I’d been meaning to switch the project to pytorch for a long time, and this was the last straw.
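For a sense of why inference felt so easy once the model was in pytorch: the whole generation step is just an eager-mode loop over the model's forward pass. Here's a minimal sketch of greedy decoding with a toy stand-in model (the real thing would be the converted 2.7B GPT-Neo checkpoint; `TinyCausalLM` and `greedy_generate` are illustrative names, not anything from Frank's actual codebase):

```python
import torch
import torch.nn as nn

# Toy stand-in for a causal LM. The real model here would be the
# converted 2.7B GPT-Neo checkpoint; this just has the same
# interface: token ids in, next-token logits out.
class TinyCausalLM(nn.Module):
    def __init__(self, vocab_size=100, d_model=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        # (batch, seq) -> (batch, seq, vocab)
        return self.head(self.embed(ids))

@torch.no_grad()
def greedy_generate(model, prompt_ids, max_new_tokens=8):
    ids = prompt_ids.clone()
    for _ in range(max_new_tokens):
        logits = model(ids)                               # logits for every position
        next_id = logits[:, -1].argmax(-1, keepdim=True)  # most likely next token
        ids = torch.cat([ids, next_id], dim=1)            # append and repeat
    return ids

torch.manual_seed(0)
model = TinyCausalLM().eval()
prompt = torch.tensor([[1, 2, 3]])
out = greedy_generate(model, prompt, max_new_tokens=5)
print(out.shape)  # torch.Size([1, 8])
```

No graph compilation, no mesh layouts, no session plumbing: the same dozen-ish lines work whether the model is this toy or a multi-billion-parameter checkpoint (modulo memory and speed, which is exactly the open question for #2).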



