nightpool:

nostalgebraist:

I put a mostly-finished technical overview of Frank’s codebase up on github here.

Mostly focused on the design and operation of the bot itself, not the cool machine learning parts.

I don’t know quite who the audience for this is, or whether there even is one, but it feels nice to have a description of the bot written down that’s not immensely out of date.

I actually really like this API design! it’s very very similar to what I would consider the “good” version of this architecture, which would be using a database like Redis or similar to power a jobs queue that the ML machines can read from and write to.

How do you handle making sure that two ML machines don’t work on the same task concurrently? This is a very common issue with distributed job queues like this one. Does the main server mark a task as “handed out” when the GET /pollml from the ML machine comes in, or does it just hand out the same task to as many machines as will take it? Feels like you might end up with a lot of duplicate work if you don’t have something like this set up. I guess maybe it’ll Just Work Out if you always have “number of candidate posts generated” divisible by “number of ML machines running”? (Not sure how that works out for the score tasks though….)

Good question.

When there are multiple ML machines, and they’re doing the scoring tasks, they do in fact perform some redundant overlapping work. It’s suboptimal, but these tasks are relatively fast so it’s not a huge source of overhead.

The biggest time sink by far is the “write posts” task, but that is also a special case where this is a non-issue. The ML machines all receive the same instructions (“write a single post and send it over”), and it’s actively good for them to do this concurrently.
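For what it’s worth, the claim-on-poll pattern nightpool describes (mark a task as handed out the moment a worker polls it) can be sketched in a few lines. This is a hypothetical illustration of that pattern, not the bot’s actual queue; all names here are made up.

```python
import threading
from collections import deque


class ClaimOnPollQueue:
    """Hands each task to at most one worker: polling atomically claims it.

    Hypothetical sketch of the pattern discussed above, not Frank's real
    implementation (which tolerates some duplicate scoring work instead).
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = deque()
        self._in_progress = {}

    def submit(self, task_id, payload):
        with self._lock:
            self._pending.append((task_id, payload))

    def poll(self):
        """What a GET /pollml handler might do: pop a task and mark it
        as handed out, so a second worker polling later gets nothing."""
        with self._lock:
            if not self._pending:
                return None
            task_id, payload = self._pending.popleft()
            self._in_progress[task_id] = payload
            return task_id, payload

    def complete(self, task_id):
        """Called when the worker POSTs its result back."""
        with self._lock:
            self._in_progress.pop(task_id, None)


queue = ClaimOnPollQueue()
queue.submit("score-123", {"kind": "score"})
first = queue.poll()   # ("score-123", {"kind": "score"})
second = queue.poll()  # None -- already claimed by the first worker
```

The lock is the whole trick: claiming and marking happen in one atomic step, so two concurrent polls can never see the same pending task.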

—-

Now that I think about it, I realize I erred slightly in my description of this one – I’ll have to go back and edit later.

During the “write posts” task, the main process is responsible for deciding when to stop (which it communicates via “/done”). From the perspective of the bridge service, the task is just “keep writing posts until I hear /done.”

This lets me write logic in the main process that rejects some posts (like very short ones) while writing is still happening, while guaranteeing we still get N at the end, and keeping the bridge service simple.

However, this has the annoying consequence that the bridge service learns when we’re done slightly after the decision is made, which means that it might have already sent ML machines off to write additional posts we don’t need.

To smooth over this case, I recently added a state called “almostdone,” set when we get close to N. This tells the ML machines to wait longer between each POST and the next GET, anticipating that a /done may occur in between.
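A minimal sketch of the bridge-side loop described above. The state names mirror the post (“write”, “almostdone”, “done”), but the function signatures and delay values are invented for illustration:

```python
import time

# Illustrative delays in seconds: wait longer after each POST when the
# main process reports "almostdone", so a /done can arrive in between.
POLL_DELAY = {"write": 0.0, "almostdone": 0.5}


def write_posts_loop(get_state, write_one_post, send_post):
    """Keep writing posts until the main process says /done."""
    while True:
        state = get_state()            # stands in for GET /pollml
        if state == "done":            # stands in for hearing /done
            break
        send_post(write_one_post())    # stands in for the POST back
        # Near N ("almostdone"), pause before polling again, anticipating
        # that the main process may declare /done in the meantime.
        time.sleep(POLL_DELAY.get(state, 0.0))


# Simulated run: one normal cycle, one "almostdone" cycle, then /done.
states = iter(["write", "almostdone", "done"])
written = []
write_posts_loop(lambda: next(states), lambda: "candidate", written.append)
# written == ["candidate", "candidate"]
```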

I’ve made some changes to Frank’s image model in the last few days:

  • I’ve incrementally improved the model. The new one has more parameters, and can “see” more text per image (up to 384 characters). May improve image quality.
  • I’ve added classifier-free guidance. This makes the images more likely to contain the text Frank wants to write, at the cost of potentially making them less varied.

You will see tags like “#guidance scale 2” start to appear on Frank posts with images. This tells you how much classifier-free guidance was used, where 0 means “none.”

I’m not sure what the “sweet spot” for this number is, so for now, I’m having Frank randomly pick from a range of different values.
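Classifier-free guidance combines two noise predictions per diffusion step, one conditioned on the text and one unconditional. In the convention where a scale of 0 means “none” (matching the “#guidance scale” tags), the combination looks like the sketch below; the plain-list “tensors” are purely illustrative.

```python
def cfg_noise_pred(eps_cond, eps_uncond, guidance_scale):
    """Combine the conditional and unconditional noise predictions:

        eps = eps_cond + w * (eps_cond - eps_uncond)

    w = 0 reproduces the ordinary conditional prediction ("none");
    larger w pushes the sample harder toward the text condition,
    at the cost of variety.
    """
    return [c + guidance_scale * (c - u)
            for c, u in zip(eps_cond, eps_uncond)]


cfg_noise_pred([1.0, 2.0], [0.0, 0.0], 0.0)  # -> [1.0, 2.0]
cfg_noise_pred([1.0, 2.0], [0.0, 0.0], 2.0)  # -> [3.0, 6.0]
```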

frank’s image generation model, explained

[See also: github repo, Colab demo]

[EDIT 9/6/22: I wrote this post in January 2022. I’ve made a number of improvements to this model since then. See the links above for details on what the latest version looks like.]

Last week, I released a new feature for @nostalgebraist-autoresponder that generates images. Earlier I promised a post explaining how the model works, so here it is.

I’ll try to make this post as accessible as I can, but it will be relatively technical.

Why so technical? The interesting thing (to me) about the new model is not that it makes cool pictures – lots of existing models/techniques can do that – it’s that it makes a new kind of picture which no other model can make, as far as I know. As I put it earlier:

As far as I know, the image generator I made for Frank is the first neural image generator anyone has made that can write arbitrary text into the image!! Let me know if you’ve seen another one somewhere.

The model is solving a hard machine learning problem, which I didn’t really believe could be solved until I saw it work. I had to “pull out all the stops” to do this one, building on a lot of prior work. Explaining all that context for readers with no ML background would take a very long post.

tl;dr for those who speak technobabble: the new image generator is OpenAI-style denoising diffusion, with a 128x128 base model and a 128->256 superresolution model, both with the same set of extra features added. The extra features are:

  • a transformer text encoder with character-level tokenization and T5 relative position embeddings;
  • a layer of image-to-text and then text-to-image cross-attention between each resnet layer in the lower-resolution parts of the U-Net’s upsampling stack, using absolute axial position embeddings in image space;
  • a positional “line embedding” in the text encoder that does a cumsum of newlines;
  • and information about the diffusion timestep injected in two places: as another embedding fed to the text encoder, and injected with AdaGN into the queries of the text-to-image cross-attention.

I used the weights of the trained base model to initialize the parts of the superresolution model’s U-Net that deal with resolutions below 256.
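One ingredient from the tl;dr that’s easy to show concretely is the “line embedding”: each character’s line index is just a running count of the newlines before it. This toy version ignores real tokenization details:

```python
def line_indices(text):
    """Per-character line index, i.e. a cumsum of newlines.

    Each character gets the number of newlines that precede it, so the
    text encoder can tell which visual line a character will land on.
    """
    idx, out = 0, []
    for ch in text:
        out.append(idx)
        if ch == "\n":
            idx += 1
    return out


line_indices("ab\ncd")  # -> [0, 0, 0, 1, 1]
```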

This post is extremely long, so the rest is under a readmore

burnedtownsforwhat asked:

I’ve noticed that when Frank sends images of (Twitter, tumblr, etc) posts, they start off pretty coherent but rapidly stop forming real words, let alone the grammatical sentences/complete thoughts that are present in text posts. Is this related to how images are read in training data, or some other quirk of the image generation? Although, I only really remember seeing this like twice, so I could be cherry-picking a bunch here.

A lot of it is the fact that the image model can see a maximum of 192 characters of text.

(I did this to speed up training; I’m now training a “version 2” of the model at a more leisurely pace, which among other changes has a max length of 384.)

So for example in this tweet, the actual text Frank wanted to write was

@Nirvash

a very smart man said that women

like long and complicated stories, but

men like short and easy to read. It

should

mean that there are more women

that read and more men that write.

You know what, that’s very much true.

I’m glad that men read because I can read and appreciate the fact that they

liked my work.

The 192nd character falls in the middle of a sentence, indeed in the middle of a word (partway through “write”).

So the model can guess that it’s seeing something longer that’s been truncated. It knows there’s more text after what it can see, but it doesn’t know what that text is, so it just spams twitter-font gibberish.

(There may be other mechanisms, but this is the only one I’ve explicitly confirmed by examining an example.)
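The truncation itself is trivial, which is what makes the failure mode interesting: a hard character cutoff can land mid-word, and the model has to guess at everything past it. A toy illustration (the helper name is made up):

```python
MAX_CHARS = 192  # the limit described above for version 1 of the model


def truncate_for_image_model(text, max_chars=MAX_CHARS):
    """Hard cutoff on the text conditioning, as described above.

    Note this is character-based, so it can cut mid-word -- the model
    then sees e.g. "... and easy to read. It should ... men that wri"
    and must invent everything past the cutoff.
    """
    return text[:max_chars]


# A cutoff landing four characters into the final word:
example = "a" * 188 + " write."
truncate_for_image_model(example)  # ends with " wri"
```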

noodlegirl-googlyeyes asked:

hey, I'm not sure what the best method for giving feedback on Frank is, apologies if sending an ask is not it. i just wanted to bring up that there are several mutuals i have who find the neural-blender images that go around upsetting, uncomfortable, or disturbing, because of the general look of images made that way. would you consider a tag that Frank would put on every post where she adds an image that people can blacklist?

Also, i lov Frank and it's so cool seeing her grow and develop over time! Thank you so much for making this for us to enjoy!

Sure! I’ll use the tag “#computer generated image.”

The change should be live now (let me know if it’s not working).

nostalgebraist:

Seeing Frank’s image generator operating in the real world has given me a lot of signal on how it can be improved, so I’m working on that now.

The image generator will hopefully improve a lot over time, like some of Frank’s other features.

In particular, I think the tendency to generate blurry, formless images (e.g.) is due to overly aggressive cropping of the training data.

I cropped roughly to the bounding box of the machine-recognized text in the image, thinking I wanted to make the text as legible as possible to maximize my chances of generating legible text at all.

But sometimes the image-to-text model just sees the letter “D” or something in a large image, so I end up with a blurry, ultra-zoomed-in picture of the letter “D” … and the generator has learned to skillfully imitate these boring, blurry pictures when given a sufficiently short text prompt. I guess I got what I asked for!
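The cropping step described above might look roughly like this sketch (the function name and padding value are made up): crop to the OCR bounding box, padded slightly. When the recognized text is a single letter in a large image, the resulting crop is an extreme zoom.

```python
def crop_to_text_bbox(image_size, bbox, pad=8):
    """Return a crop rectangle around an OCR text bounding box.

    image_size: (width, height) of the source image.
    bbox: (x0, y0, x1, y1) of the machine-recognized text.
    The crop is the bbox plus a small padding, clamped to the image.
    """
    w, h = image_size
    x0, y0, x1, y1 = bbox
    return (max(0, x0 - pad), max(0, y0 - pad),
            min(w, x1 + pad), min(h, y1 + pad))


# A single recognized letter in a large image -> a tiny, zoomed-in crop:
crop_to_text_bbox((1024, 768), (500, 300, 520, 330))  # -> (492, 292, 528, 338)
```

Training on such crops is exactly how a generator learns that short text prompts go with blurry close-ups.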

Continuing to train now on a differently cropped version of the data, we’ll see how it goes.

I’ve pushed a new build of the image generator, trained for a little while on better-cropped data.

It will probably benefit a lot from further training, but it’s already making things that look more interesting…

diagrapher asked:

Hi Rob, could you post a picture of a duck?

I can’t do that, Dave.

Yesterday I made Frank much more likely to generate images than usual, as a demo of the new image generation feature.

I’ve turned that off now, so she’ll generate images at her “natural” rate.

—-

This will greatly decrease the success rate of asks like “show me a picture of X.”

If you really want to make Frank generate a picture, think about a context in which a tumblr user would post a picture (with OCR-visible text in it), and try to set up that context with Frank.

Note also that the image generator is trying to make a picture containing the text Frank thinks the picture should contain. That’s it.

The image generator can’t see your ask, it can’t see your reblog, all it can see is some lines of text written by Frank – the same ones that used to appear in black-on-white text like this.

This is why things like tweets work so well (because a picture of a tweet contains text indicating it’s a tweet, e.g. an @ sign before a username), whereas trying to get a picture that wouldn’t normally have text in it (e.g. a cat) generates an almost-random picture with close to no relationship to your prompt.