Still have to make a longer post on Frank’s new image generator, but a few quick comments:
- Many thanks to everyone I talked to about this project in the EleutherAI discord! Special thanks to RiversHaveWings for suggesting I try diffusion for this problem + various helpful tips along the way.
- As far as I know, the image generator I made for Frank is the first neural image generator anyone has made that can write arbitrary text into the image!! Let me know if you’ve seen another one somewhere.
- The model is a text-conditioned denoising diffusion model. Or rather two of them, a 128x128 base model and a 256x256 upsampler.
- Coincidentally, just 3 days ago, OpenAI announced/released their own text-conditioned denoising diffusion model. I guess it’s an idea whose time has come! Their model is structured a little differently, and makes way better-looking images, although without the writing-text aspect.
- My code for this model is a heavily modified fork of OpenAI’s improved-diffusion repo. It’s on this branch. The Files Changed view here gives a clearer sense of what I changed. (Caveat: it’s extremely hacky research code)
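
For anyone wondering what "a base model plus an upsampler" means in practice, here is a rough sketch of how a two-stage diffusion setup like this could sample an image. This is not the code from my repo: `toy_denoiser`, `sample`, `upsample_nearest`, and `generate` are all hypothetical names made up for illustration, the "network" is a placeholder that always predicts zero noise, and the 10-step linear noise schedule is a toy assumption. It only shows the shape of the pipeline: sample at 128x128 from noise, enlarge the result, then sample at 256x256 while conditioning on the enlarged image.

```python
import numpy as np


def toy_denoiser(x, t, text, low_res=None):
    """Hypothetical stand-in for a trained network that predicts the noise in x.

    A real model would be a neural net conditioned on `text` (and, for the
    upsampler, on `low_res`). This placeholder ignores both and predicts zeros.
    """
    return np.zeros_like(x)


def sample(denoiser, shape, text, steps=10, low_res=None, seed=0):
    """DDPM-style ancestral sampling with a toy linear beta schedule (assumed)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps = denoiser(x, t, text, low_res)
        # Posterior mean of x_{t-1} given the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Add fresh noise on every step except the last one.
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x


def upsample_nearest(img, factor=2):
    """Nearest-neighbor upsampling of an (H, W, C) array."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)


def generate(text):
    """Two-stage cascade: 128x128 base sample, then 256x256 upsampler sample."""
    base = sample(toy_denoiser, (128, 128, 3), text)
    cond = upsample_nearest(base, factor=2)  # 256x256 conditioning image
    return sample(toy_denoiser, (256, 256, 3), text, low_res=cond)


if __name__ == "__main__":
    image = generate("hello world")
    print(image.shape)
```

With real trained networks in place of `toy_denoiser`, the second stage only has to add detail to an image whose overall layout is already fixed by the first stage.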

