birdblogwhichisforbirds:

There were so many “wait, WHAT” moments in that paragraph that I need to lie down

hey

definitely me here not my wife

i have learned to post in my sleep

just wanted to let you know about how cute i am

when i’m sleeping i do this adorable little hand motion

also maybe people could say nice things to me? because i’ve been kinda anxious lately and it might help.

Google will decide whether or not your brand is worthy of a knowledge panel. If your brand has enough authority, a knowledge panel will appear.

voxette-vk:

Said one of them, probably called Zac, “It’s just not very nice. It’s not crisp at all. It’s gloopy. It leaves a sort of film in your mouth. It’s almost creamy.”

“Wait a minute!” cried another, possibly named AJ, “’Creamy’, that’s a good word! That’s a heritage word. ‘Creamy’, that sounds like a good thing, sounds like how things used to be in the good old days. I reckon if we polled people as to whether they thought Britain had got more or less creamy over the last 50 years, they’d say less, and they’d be sorry about it. Everything in the past was creamy, wasn’t it? Call me crazy, but I reckon we can actually promote this beer as being creamy and make that sound like a good thing.”

dave-striiider:

oh, to be a cartoon mafia boss with two dimwitted but loveable lackeys who, upon my cleverly insulting the protagonist, will say “nice one, boss,” and the second, in a slightly higher, more snivelly voice, will say “haha, yeeah, nice one boss!”

(via hedownwithskeletor)

the transformer … “explained”?

Okay, here’s my promised post on the Transformer architecture.  (Tagging @sinesalvatorem​ as requested)

The Transformer architecture is the hot new thing in machine learning, especially in NLP.  In the course of roughly a year, the Transformer has given us things like:

  • GPT-2, everyone’s new favorite writer-bot, with whose work I am sure you are familiar
  • GPT (the first one) and its superior successor, BERT, which can achieve state-of-the-art results with unprecedented data efficiency on numerous language understanding tasks with almost no hyperparameter tuning – in concrete terms, this means “something that took me, nostalgebraist, a month to do in 2018 now takes me 30 minutes, and the results are better”
  • AlphaStar??  There’s still no paper on it yet, AFAIK, but the blog post says it has a Transformer as one of its components EDIT: the AlphaStar paper is out, see my post here for details

This thing is super good.  It honestly spooks me quite a bit, and I’m not usually spooked by new neural net stuff.

However, it doesn’t seem like an intuitive understanding of the Transformer has been disseminated yet – not the way that an intuitive understanding of CNNs and RNNs has.

The original paper introducing it, “Attention Is All You Need,” is suboptimal for intuitive understanding in many ways, but typically people who use the Transformer just cite/link to it and call it a day.  The closest thing to an intuitive explainer that I know of is “The Illustrated Transformer,” but IMO it’s too light on intuition and too heavy on near-pseudocode (including stuff like “now you divide by 8,” as the third of six enumerated “steps” which themselves only cover part of the whole computation!).

This is a shame, because once you hack through all the surrounding weeds, the basic idea of the Transformer is really simple.  This post is my attempt at an explainer.

I’m going to take a “historical” route where I go through some other, mostly older architectural patterns first, to put it in context; hopefully it’ll be useful to people who are new to this stuff, while also not too tiresome to those who aren’t.
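For context on that “now you divide by 8” step mentioned above: it’s the scaling by √d_k in scaled dot-product attention (the paper uses d_k = 64 per head, and √64 = 8).  Here’s a minimal numpy sketch of one attention head – just an illustration of that computation, not anyone’s production code:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """One attention head: each query position takes a weighted average
    of the value vectors, with weights given by a softmax over
    (query . key) scores.  Dividing by sqrt(d_k) is the "divide by 8"
    step when d_k = 64."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_queries, n_keys)
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                               # (n_queries, d_v)
```

(If all the keys were identical, every query would attend uniformly and the output would just be the mean of the value vectors – a quick sanity check on what the softmax weighting is doing.)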


Frequently, for the benefit of those who came and went around his bed (who, although they were certain to outlive him, lying in his bed awaiting the moment of his own death as if it had been finally scheduled, were treated by him as if they were already among the dead), not necessarily to flaunt his happiness but simply to enjoy the sounds that reached his ears along his jawbone from his own eccentric vocal chords, and to revel in the furtive, complex sympathetic resonation of his internal organs, pregnant now with cancer cells, he would sing, in English, “Happy Days Are Here Again.”


Haven’t had the time and mental bandwidth to write that promised transformer architecture post, but in the meantime, here’s a quick post about some recent GPT-2-related developments that I only just learned about two days ago:

  • OpenAI has released a set of 250K documents from their training data along with 500K samples from each GPT-2 size (250K each of two sampling methods).  This is significant in that it lets us see a very large number of samples from the two larger, unreleased models.

    I browsed through a random sample of them and they mostly looked like very realistic but boring/unremarkable “news” articles.  (This is also what 345M unconditional samples tend to look like; conditional samples and finetuning with the larger models would presumably be more interesting.)

  • The stated rationale for releasing those data is to let people develop discriminators that can tell GPT-2 text apart from real text.  I have a hard time imagining this sort of work being ultimately very useful; it seems like it just starts up an arms race between the detectors and people who don’t want their samples detected, like a socially-implemented GAN.  (Or even just a plain GAN, if the discriminators are released publicly.)

  • Related to that topic is this great new paper on failure modes of LM sampling.  Provides convincing, intuitive explanations for why standard sampling methods (temperature, top-k) can get lost in repetitive loops or get really weird suddenly without being able to recover.  Proposes a simple, supposedly better method called “nucleus” or “top-p” sampling.

  • nshepperd’s fine-tuning branch is now also capable of top-p sampling.  I tried it out with a few of my fine-tuned models but haven’t gotten a chance to look too carefully at the results; I think the output looked a little more realistic than usual but it wasn’t a blinded experiment, so IDK.
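For anyone curious what “top-p” actually does mechanically: you sort tokens by probability, keep the smallest prefix whose cumulative probability exceeds p (the “nucleus”), renormalize, and sample from that.  A rough numpy sketch of my understanding of it (illustrative only – the function name and details are mine, not from the paper or nshepperd’s branch):

```python
import numpy as np

def top_p_sample(logits, p=0.9, rng=None):
    """Nucleus ("top-p") sampling: restrict sampling to the smallest
    set of tokens whose cumulative probability exceeds p, then
    renormalize and draw from that set."""
    rng = rng or np.random.default_rng()
    # softmax over the vocabulary (shifted for numerical stability)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # token ids, most probable first
    cumulative = np.cumsum(probs[order])
    # first position where cumulative prob exceeds p; keep up to and
    # including it -- that's the "nucleus"
    cutoff = np.searchsorted(cumulative, p) + 1
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=nucleus_probs)
```

Unlike top-k, the size of the candidate set adapts to the shape of the distribution: when the model is confident, the nucleus shrinks to a few tokens, and when it’s uncertain, the nucleus widens – which is the paper’s proposed fix for the repetitive-loop and sudden-weirdness failure modes.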