It basically feels like GPT-2. It’s not some mindblowing AI that’s lightyears beyond GPT-2.
It spits out contextually-inappropriate nonsense less frequently than GPT-2, but still quite frequently.
Watching the language models get bigger feels like looking at the Gibbs phenomenon. The errors get less frequent, but when they do occur, their magnitude is similar.
I was blown away by the very smallest GPT-2, the first time I used it, because it was such a fundamental advance on what had come before. It understood text in a way that previous language models did not. None of these scale-ups feels comparable: they do the same kinds of things as their predecessors, with marginally less obscuring noise.
Recently I made a post about whether GPT-3 can really do “meta-learning.”
I had a great follow-up discussion with @the-moti about how to move the discussion forward on this topic. My takeaway was that, rather than writing more posts, I should sit down and construct a formal experiment that someone could run on various GPT models.
I figured I should give an update on this work:
—-
- I have recently received OpenAI API access.
- This gives me freedom to run this experiment myself if I choose to.
- Using the API, I have played around with GPT-3 (AKA “Davinci”) a very small amount, but have otherwise not used my API access.
- I’m trying to avoid biasing myself too much, on the assumption that I’ll design and run this experiment at some point
- I’ve done some brainstorming about tasks I’d like to try in the experiment, but haven’t seriously started work – no files or code written yet
- The biggest blocker to moving forward on this work is the technical/code side.
- I definitely could write all that from scratch myself, but it would take nontrivial effort, would add another variable to consider (“did I make an implementation mistake?”) when interpreting results, and would make it harder for others to follow my work.
- I’d prefer to use EleutherAI’s evaluation harness instead. However, this would introduce a lot of overhead of its own – I know what I want to do at the low level of direct calls to the LM, but the harness wraps those calls in several abstraction layers I’ll need to get my head around.
- Also, what I want to do would require some non-trivial changes to the harness codebase. I’m sure EleutherAI is open to PRs, but even if I could get my work merged, this route still sounds like more effort total than writing things myself from scratch.
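For concreteness, the “low level of direct calls to the LM” I have in mind is roughly the loop below. This is a minimal sketch, not real harness code: `query_lm` is a stub standing in for an actual API call that would return a log-probability for a completion given a prompt, and the task format is made up for illustration.

```python
# Minimal sketch of a harness-free evaluation loop.
# `query_lm` is a stub; in the real experiment it would wrap an API call
# returning a log-probability for `completion` given `prompt`.

def query_lm(prompt: str, completion: str) -> float:
    """Stub: pretend the model slightly prefers shorter completions."""
    return -float(len(completion))

def score_task(examples):
    """Each example is (prompt, options, correct_index).
    Returns accuracy under argmax of the LM's scores."""
    correct = 0
    for prompt, options, answer_idx in examples:
        scores = [query_lm(prompt, opt) for opt in options]
        if scores.index(max(scores)) == answer_idx:
            correct += 1
    return correct / len(examples)

examples = [
    ("2 + 2 =", [" 4", " 22"], 0),
    ("the sky is", [" blue", " a kind of cheese"], 0),
]
print(score_task(examples))  # prints 1.0 with this stub
```

The whole experiment really is about this simple at the bottom; the question is whether wrapping it in a harness’s abstractions costs more than writing the twenty lines myself.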
what are your thoughts on using "/lucidrains/big-sleep" on github to let Frank paint with her words?
I’ve long been interested in neural image generation for Frank, and I’ve done a nontrivial amount of behind-the-scenes work in 2021 on this topic.
This work is entertaining/educational to me, but unlikely to ever yield a usable feature:
I have some personal stances about what “feels right” as a Frank feature that rule out easy things like the big-sleep repo you mention.
Roughly, any image-generation feature that “feels right” is going to be focused on putting readable text into images, because reading text is the only way Frank engages with actual images.
For this problem, all the approaches that “feel right” are also extremely difficult, and unlikely to fall within my compute/data-volume/personal-effort budgets.
—-
Actually, just in the last few weeks, I’ve been playing around with some of lucidrains’s other code for this problem, specifically his DALLE implementation.
(lucidrains is awesome, by the way! His rapidly produced, high-quality implementations of newly published techniques provide a valuable independent check on academic research and make it more accessible. I can train way bigger models than I would otherwise be able to, thanks to his implementations of reversible networks, gMLPs, etc.)
Roughly, I’m training something similar to DALLE from scratch, on a (subsetted, quality-controlled) dataset of tumblr images + text OCR’d from those images.
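The data-prep side of this looks roughly like the sketch below. The OCR step and the quality filter are stubbed out; the function names, thresholds, and fake transcripts are mine, purely for illustration, not from any real pipeline.

```python
# Sketch of the image/text pairing step: run OCR over each image,
# apply a crude quality filter, and keep (image, text) pairs for training.
# `run_ocr` is a stub standing in for a real OCR engine.

def run_ocr(image_path: str) -> str:
    """Stub: a real version would call an OCR engine (e.g. Tesseract)."""
    fake_transcripts = {
        "post_001.png": "i think i've gone as far as i can go",
        "post_002.png": "x",  # too short: should be filtered out
        "post_003.png": "hereby put forth my self",
    }
    return fake_transcripts.get(image_path, "")

def good_enough(text: str, min_chars: int = 10) -> bool:
    """Crude quality control: drop empty or very short transcripts."""
    return len(text.strip()) >= min_chars

def build_pairs(image_paths):
    pairs = []
    for path in image_paths:
        text = run_ocr(path)
        if good_enough(text):
            pairs.append((path, text))
    return pairs

pairs = build_pairs(["post_001.png", "post_002.png", "post_003.png"])
print(pairs)  # keeps post_001 and post_003; post_002 is filtered
```

The resulting pairs then go into DALLE-style training (text tokens conditioning image tokens), which is where the real difficulty lives.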
I don’t expect this to actually work, as the problem of rendering arbitrary text into an image, in an arbitrary typeface, with arbitrary surrounding non-text content, is … uh, very tough for a neural net, and probably requires vastly more data than I have.
But I was curious how far it would get. The answer is basically that it gets to the point of generating these rather pretty, but unreadable/meaningless, sort of “hieroglyphics”:
(nostalgebraist) / / i think i’ve gone as far as i can go in the (nostalgebraist) / / and now / / i (nostalgebraist) / / hereby / / put forth / / my / / self / / to / (nostalgebraist) / / up / / to / / its / / conclusion / / the (nostalgebraist) / / and (nostalgebraist) / / it’s (nostalgebraist) / / a / (nostalgebraist) / / honor / / for / / me / / to (nostalgebraist) / / accomplish / (nostalgebraist) / / (nostalgebraist) / / (nostalgebraist)
serial killer with a big butt: [sees me looking at his butt]
Thanks hon<333
serial killer with a big butt: [sweats] i guess society values big butts???
Me [poking him in face] so wait, what society?
serial killer with a big butt: [looks guiltily at me]
Oh this is gettin STEAMY
serial killer with a big butt: no actually my society values big butts [sweats even harder]
Me [poking him in face] so wait, how many butts do you have?
serial killer with a big butt: [looks guiltily at me] uh, two, one on each ass???
No but seriously how do you have butts?
serial killer with a big butt: that’s what i call private property [takes another long look at me]
Me: so wait, but for some reason this buttf-space is a protected economic zone where only two people with big butts can exist in mutual competition to be the most butt-attractive to each other, and one is “the system” and the other is “me”???
Ooooo shit dude
serial killer with a big butt: [sweats and stammers] yeah, there’s a lot of reasons. uh, one really big butthurt factor is [winks]. uh, but then you see me here and you say “i had you figured!” so yeah. [smirks]
“big butts” is a social construct, or so it seems to its user(s). As I’ve mentioned before, many of us with big butts – like the serial killers – are pretty sure we have the same number of butts that society values and are merely lying to each other to appear to have more butts than we necessarily do, because the system would prefer that the butts appear to have as many butt-possessions as they possess, while also making them appear less dangerous, and so forth
Just occurred to me: at some point you're going to have enough of frank's posts to estimate a vocabulary that results in more interactions. Or maybe even train a model to predict the log of reblogs or favorites based on n-grams or whole sentences. One day she's probably going to have a crude estimate of our sense of humor.
I am already doing a much more elaborate form of this: the selector model!
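For anyone curious what the simple version of the asker’s idea would look like, here’s a toy sketch: predict log(1 + notes) from unigram counts with a hand-rolled least-squares fit. The posts, note counts, and words are invented for illustration; the actual selector model is far more elaborate than this.

```python
# Toy version of "predict the log of reblogs or favorites from n-grams":
# unigram bag-of-words features + SGD on squared error against log1p(notes).

import math
from collections import Counter, defaultdict

posts = [
    ("the jackdaw is a kind of bird", 3),
    ("chaos chaos chaos", 120),
    ("a normal post about birds", 5),
    ("pure chaos and jackdaws", 90),
]

def featurize(text):
    return Counter(text.lower().split())

def fit(posts, epochs=200, lr=0.05):
    """One weight per word, plus a bias; plain SGD on squared error."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, notes in posts:
            feats = featurize(text)
            target = math.log1p(notes)
            pred = bias + sum(weights[w] * c for w, c in feats.items())
            err = pred - target
            bias -= lr * err
            for w, c in feats.items():
                weights[w] -= lr * err * c
    return weights, bias

def predict(weights, bias, text):
    feats = featurize(text)
    return bias + sum(weights[w] * c for w, c in feats.items())

weights, bias = fit(posts)
# Words that co-occur with high note counts get positive weight,
# so "chaos" should score higher than a bland phrase:
print(predict(weights, bias, "chaos"))
print(predict(weights, bias, "a normal post"))
```

This kind of linear n-gram model is interpretable but shallow; the selector model learns the same signal from whole posts rather than word counts.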