There were so many “wait, WHAT” moments in that paragraph that I need to lie down

hey
definitely me here not my wife
i have learned to post in my sleep
just wanted to let you know about how cute i am
when i’m sleeping i do this adorable little hand motion
also maybe people could say nice things to me? because i’ve been kinda anxious lately and it might help.
Google will decide whether or not your brand is worthy of a knowledge panel. If your brand has enough authority, a knowledge panel will appear.
Said one of them, probably called Zac, “It’s just not very nice. It’s not crisp at all. It’s gloopy. It leaves a sort of film in your mouth. It’s almost creamy.”
“Wait a minute!” cried another, possibly named AJ, “’Creamy’, that’s a good word! That’s a heritage word. ‘Creamy’, that sounds like a good thing, sounds like how things used to be in the good old days. I reckon if we polled people as to whether they thought Britain had got more or less creamy over the last 50 years, they’d say less, and they’d be sorry about it. Everything in the past was creamy, wasn’t it? Call me crazy, but I reckon we can actually promote this beer as being creamy and make that sound like a good thing.”
oh, to be a cartoon mafia boss with two dimwitted but loveable lackeys who, upon my cleverly insulting the protagonist, will say “nice one, boss,” and the second, in a slightly higher, more snivelly voice, will say “haha, yeeah, nice one boss!”
(via hedownwithskeletor)
Okay, here’s my promised post on the Transformer architecture. (Tagging @sinesalvatorem as requested)
The Transformer architecture is the hot new thing in machine learning, especially in NLP. In the course of roughly a year, the Transformer has given us things like:
This thing is super good. It honestly spooks me quite a bit, and I’m not usually spooked by new neural net stuff.
However, it doesn’t seem like an intuitive understanding of the Transformer has been disseminated yet – not the way that an intuitive understanding of CNNs and RNNs has.
The original paper introducing it, “Attention Is All You Need,” is suboptimal for intuitive understanding in many ways, but typically people who use the Transformer just cite/link to it and call it a day. The closest thing to an intuitive explainer that I know of is “The Illustrated Transformer,” but IMO it’s too light on intuition and too heavy on near-pseudocode (including stuff like “now you divide by 8,” as the third of six enumerated “steps” which themselves only cover part of the whole computation!).
This is a shame, because once you hack through all the surrounding weeds, the basic idea of the Transformer is really simple. This post is my attempt at an explainer.
I’m going to take a “historical” route where I go through some other, mostly older architectural patterns first, to put it in context; hopefully it’ll be useful to people who are new to this stuff, while also not too tiresome to those who aren’t.
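Before the historical tour, a quick concrete anchor: the “now you divide by 8” step mentioned above is the scaling in scaled dot-product attention, where 8 = √d_k with d_k = 64 per head in the original paper. Here’s a minimal NumPy sketch of just that computation (the function name and the toy shapes are my own, for illustration):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention, as in 'Attention Is All You Need'.

    The mysterious 'divide by 8' is sqrt(d_k): the paper uses d_k = 64
    per attention head, and sqrt(64) = 8.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how much each query "matches" each key
    # softmax over the keys (subtracting the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted average of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 64))  # 3 query positions, d_k = 64
K = rng.normal(size=(5, 64))  # 5 key positions
V = rng.normal(size=(5, 64))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # one 64-dim output per query position: (3, 64)
```

That’s the whole core operation – everything else in the architecture is plumbing around it, which is what the rest of this post builds up to.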
Frequently, for the benefit of those who came and went around his bed (who, although they were certain to outlive him, lying in his bed awaiting the moment of his own death as if it had been finally scheduled, were treated by him as if they were already among the dead), not necessarily to flaunt his happiness but simply to enjoy the sounds that reached his ears along his jawbone from his own eccentric vocal cords, and to revel in the furtive, complex sympathetic resonation of his internal organs, pregnant now with cancer cells, he would sing, in English, “Happy Days Are Here Again.”

Haven’t had the time and mental bandwidth to write that promised transformer architecture post, but in the meantime, here’s a quick post about some recent GPT-2-related developments that I only just learned about two days ago: