
The Brand team had asked me to be a little less cerebral and a little more sophomoric. (Thus, the ass theme.)
Before I moved to Texas I had never heard of Western Swing, a genre which popular culture seems to have forgotten. It’s the Western in “Country and Western”: it was responsible for bringing steel guitar into country music, and it was the first genre of music built around electric instruments, both electric “Hawaiian” (steel) and “Spanish” (regular) guitars.
A standard Western Swing band also included upright bass, a fiddle or two, and accordion. Yup, accordion. The accordion is not now thought of as a country music instrument, but its presence makes historical sense. Western Swing arose in the Southern Great Plains and was based in Texas, Oklahoma, and California. Texas has a large population descended from ethnic German Forty-Eighters, who fled Europe after the 1848 revolutions were crushed and brought the accordion and Central/Northern European traditional music like polkas and waltzes with them. The Forty-Eighters settled all over North America, and two hotspots were the Texas Hill Country and Northern Mexico. So the accordion entered white Texan music the same way it entered Tejano and Mexican Norteño music.
Prominent groups during the peak of Western Swing’s popularity included the Light Crust Doughboys, Bob Wills and the Texas Playboys, Milton Brown and His Musical Brownies, Spade Cooley and His Orchestra, and Hank Thompson and His Brazos Valley Boys. (Wikipedia)
Ever since the mathematicians managed to penetrate into the innermost of feminine sanctuaries, and, with the aid of the Mercure Galant, to bring with them the terminology of a science as solid and serious as mathematics, we hear that Cupid’s empire is rapidly crumbling, and that no one talks now of anything but problems, corollaries, theorems, right-angles, obtuse angles, rhomboids, and so on. It reports that quite recently there were two young ladies in Paris whose heads had been so turned by this branch of learning that one of them declined to listen to a proposal of marriage unless the candidate for her hand undertook to learn how to make telescopes, so often talked of in the Mercure Galant; while the other young lady positively refused a perfectly eligible suitor simply because he had been unable, within a given time, to produce any new idea about “squaring the circle.”
The telcos’ unprecedented performance during the economic downturn has been credited to the election of populist pro-trade candidates led by 37-year-old Bernie Sanders and Germany’s Angela Merkel who stepped up political pressure on the European Union to repuscally cut out large foreign tax credits, dismantle telecommunications subsidies and crowd excursions to China by rail.
How does it generate nonsense like “repuscally”?
It isn’t predicting words directly; it’s producing something more like individual characters, but with common short strings encoded as individual units (specifically, a byte pair encoding of UTF-8 text with some slight tweaks). So it can make a series of guesses about what the next encoded character-ish-thingy is going to be that don’t add up to a real word.
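(If you’re curious what that looks like mechanically, here’s a toy version of the byte-pair-encoding idea. This is just an illustrative sketch, not GPT-2’s actual tokenizer, which works on UTF-8 bytes with extra tweaks:)

```python
from collections import Counter

def learn_merges(corpus, num_merges):
    """Greedily merge the most frequent adjacent pair of units."""
    words = [list(w) for w in corpus.split()]   # start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)        # most frequent adjacent pair
        merges.append(best)
        new_words = []
        for w in words:
            out, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                    out.append(w[i] + w[i + 1])  # fuse the pair into one unit
                    i += 2
                else:
                    out.append(w[i])
                    i += 1
            new_words.append(out)
        words = new_words
    return merges

print(learn_merges("low lower lowest newer newest", 5))
# frequent pairs fuse first, e.g. ('w', 'e'), then ('l', 'o'), and so on
```

The model predicts the next one of these merged units, which is how you can get a string like “repuscally” that’s made of plausible chunks without being a word.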
I don’t know the specific article or post (or whatever) that you’re referring to. I do talk a bit in my Bayes “masterpost” about the exorbitant resources needed to explicitly track all of the stuff that brute-force Bayes requires you to track.
You may also be thinking of MIRI’s Logical Induction work, which I initially critiqued here and which I tried (not very productively) to discuss further in some more recent posts under this tag.
(1)
It continues the same basic theme from a lot of recent NLP advances (ELMo, BERT, GPT-1, Sentiment Neuron), which could be phrased as “doing language modeling on large unlabelled datasets gets you a text encoding that works great as an input to many different tasks, and doing unsupervised LM first is much better than supervised training from scratch on the same tasks.”
Back when I read the ELMo paper, I had a kind of “duh” reaction to this, because I had always thought the usual “tasks” in NLP had weirdly broad scopes, such that you’d basically need to understand a language and have a good model of the world in order to do any of them. Like, for example, “Question Answering” isn’t a subset of linguistic competence, it’s one of many things you can do if you have full linguistic competence.
Supervised learning on a task like that is basically saying “learn English and common sense – but only the parts necessary for answering reading comprehension questions!” That doesn’t pick out a well-defined subset of English and common sense: to really succeed, you need to learn English and common sense full stop, and then that should transfer to all the other supposedly distinct NLP “tasks.”
Moreover, trying to learn all of “English and common sense” from just the relatively small labelled dataset someone has prepared for a specific “task” – with just the task-specific objective signal – is going to be very difficult. So I wasn’t surprised at all that ELMo did so well. My interpretation was that the language model in ELMo learned a lot of basic and broadly applicable stuff about language and the world, so that your model didn’t have to figure out things like “what are the parts of speech?” only from the training signal on some fancy task like “coreference resolution.”
In other words, I thought the good performance of these approaches came from the “stage-wise” learning procedure, where the model first learns the basics, then learns something that builds on them. However, with GPT-2, I’m becoming less confident that this interpretation is right. The impression I’m getting is that a language modeling objective is the best way to get an encoding of text no matter what you want to do with that encoding. I.e. the step where you train with a language modeling objective is less like a “101 class” which you build on later, and more like an optimal way to learn everything relevant for NLP, with the task-specific information read off of the LM-learned encoding later in a relatively minor step where you just discover where in the encoding it’s already stored.
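(To make that picture concrete, here’s roughly what “reading the task off a frozen encoding” looks like. This is a minimal sketch under my own assumptions, with a made-up encode() standing in for the pretrained LM:)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def encode(texts):
    """Stand-in for a frozen pretrained LM: one fixed vector per text."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 768))

# The "relatively minor step": fit only a small linear head per task,
# leaving the LM-learned encoding itself untouched.
texts = ["great movie", "terrible movie", "loved it", "hated it"]
labels = [1, 0, 1, 0]
probe = LogisticRegression().fit(encode(texts), labels)
```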
There are some appealing stories you could tell about how unsupervised language modeling better matches the learning environment of human children, where no one is grading you on a specific task and you’re (maybe) re-using generic hardware for predicting what you are going to observe next. I think it’s premature to go there, though, since the success of unsupervised LM is confounded by the much larger volume of data available for unsupervised as opposed to supervised learning. In other words, I don’t feel confident in saying yet that the LM objective itself is magically great – which is the idea behind these stories – since the magic might just be in the data volume enabled by using some objective that doesn’t require labelled data.
(2)
The researchers solicited zero-shot predictions for specific tasks in amusing and creative ways, and I’m startled/impressed that these actually worked. For question answering, they just gave it the passage, followed by some question/answer pairs, and then a new question followed by “A: ”, and asked it to predict what comes next. Their approach for summarization was hilarious:
To induce summarization behavior we add the text TL;DR: after the article and generate 100 tokens with Top-k random sampling (Fan et al., 2018) with k = 2 which reduces repetition and encourages more abstractive summaries than greedy decoding.
Admittedly this didn’t do great at the task, but it did considerably better than without the “TL;DR” prompting (their Table 4), which … I guess demonstrates that the TL;DR idiom is used frequently enough to cause a language model to learn some things about how to summarize text, just for the purpose of predicting what people will say after “TL;DR”? Amazing.
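(For concreteness, here’s roughly what top-k random sampling means. A minimal sketch, with model() as a hypothetical next-token scorer rather than anyone’s actual code:)

```python
import numpy as np

def top_k_sample(logits, k=2, rng=np.random.default_rng()):
    """Sample a token id from among the k highest-scoring candidates."""
    top = np.argsort(logits)[-k:]              # indices of the k best tokens
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                       # softmax over just the top k
    return int(rng.choice(top, p=probs))

# Hypothetical generation loop for the TL;DR trick:
# ids = tokenize(article + "\nTL;DR:")
# for _ in range(100):
#     ids.append(top_k_sample(model(ids), k=2))
```

With k = 2 the model still gets some randomness (so it doesn’t loop on itself like greedy decoding), but it can never wander far from its top guesses.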
On a sort of similar note, there is something very amusing about the way they constructed their data set – it makes sense, but also, lol:
Manually filtering a full web scrape would be exceptionally expensive so as a starting point, we scraped all outbound links from Reddit, a social media platform, which received at least 3 karma. This can be thought of as a heuristic indicator for whether other users found the link interesting, educational, or just funny.
The resulting dataset, WebText, contains the text subset of these 45 million links.
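(In code terms, the filtering heuristic is about this simple. A toy sketch with made-up field names, not the paper’s actual pipeline:)

```python
# Toy stand-in for a scrape of Reddit submissions with outbound links.
submissions = [
    {"url": "https://example.com/essay", "karma": 12},
    {"url": "https://example.com/meme", "karma": 1},
]

# "at least 3 karma" as a cheap proxy for human-judged quality
webtext_links = {s["url"] for s in submissions if s["karma"] >= 3}
```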
(3)
The samples I’ve seen from the model are indeed impressive. I’m not sure, though, how much this reflects an advance over previous LMs and how much this reflects the fact that the GPT-2 researchers are emphasizing the subjective quality of samples from their LM, as opposed to downstream performance on NLP tasks. For instance, I’ve seen (and used) ELMo for NLP tasks but I’ve never seen samples from the ELMo LM, and maybe they’d be comparably impressive?
I’ve played around with the smaller model they’ve released (which I think is the GPT-1 model but trained on the new data?) and I highly recommend doing so – everything I’ve gotten from it is gold. My last few #quotes are from that model with various prompts: this one from the NYT article about Amazon pulling out of the NYC deal, this one from the Navy Seals Copypasta, and this one from a passage I took from “A Portrait of the Person-Guy.”
When I prompted it with a passage from Philip Sidney’s Arcadia (16th-century prose), it gave me something that looked like medieval liturgy or theology, including gibberish in Greek, interesting formatting, and page numbers like you’d get from an OCR of a book:
19 † Now I would do great pity to you if your principles were to be confounded with those laid: for if the same principles apply to God when they are so determined, but it is not my own mind that will fall in that way. But they are your principles, and are to whom I am not confounded but with things by God. Now therefore I am your Lord and Savior, 2 † and I think on this Lord your Alderon, Jump-III of whom you boast, who has for goodness refuses what is good, not and forsook what the service of the life is, because
θόγρήσενϏν (Good) and Good men.
θόγρήσενϏν (Of ABB·Rost) who would keel over evil and heedlessness.
θόγρήσενϏν (To him who would murder,) ‘blood does the Father abide.’
θόγρήσενϏν (To both,) 'the one sober,’ says a stranger;
θόγρήσενϏν (And to Virtue, wonderfully caneffed,) which God to himself in his own person
P. 1121
were to do the same things (as follows) feloniously: —
P. 116 2 At the question of benefit of charity such comrades as you to consider it for which you base this project, and so do a sacrifice a pure sacrifice, as if sent into oblivion, show the synthesis of all the truths which are common to each soul and to love, that it may be worshiped by those who love one another equally, and in so doing facilely aid those who disdain My love; but so far as is so strong in themselves that a little criticism of one another, praising thankfulness of growth of the union of love is beyond the scope of speculative discourse; it is either gymnastial nor moribund; it is very cathedrically livid, trifling,
θόγρήσενϏνα πρώνθϏνς (Good) and Good men, that even by adversity they are united with such Christlike sense;
θόγρήσενϏνα π
Prompting with a snippet of Leonard Salby on Mundum produced something relatively incoherent that looked like a scrape of a crackpot web forum (!):
days, one wonders how I feel or how this is still ancient well this spirit? I look very resigned with tears streaming down my face it seems but what can I say about the whole issue. no one seems to care about me I had told them about it briefly as usual but if there were ever a moment when I entered an entire family ask them what they thought. One final thing happened I got a proposal and hello hugs however he gave me a new chair and I made it point that brother dear. It brought things closer to understanding for me and through them we understand and call us Jacob. 348 go
Questions and Answers html, answers built at author’s request 27 August 2018 2013 - 04:08 Yes
25 Jul 2018 11:30 am Gary writes: the Sacred Text Beneath (Burial ) 12 Aug 2017 - 13:22 Yes
11 Aug 2017 11:15 am Mayer Morris wrote: Do you think paganism and philosopher fiction tell us something about human nature? I think quote lecturer Ebony has exclusively been tested by the judicial psychologists while the toughest 5/4 male academic however misunderstood and mostly withdrawn she has is fully developed most 'destruction sense’ her brain abilities intact that his inner mind itself but likes projecting his imaginations and expands now its under the impression she has a 'feeling t his heart’. I somehow never learn “how it works’. I have heard dear as!!!! Of course when we draw the art, on the five figures there will be bones ! Here we cannot may art direction TV drama creation but its wearily as it have always been, since then And now I see in paintings no doubt the most beautiful imaginary drawn at this time just my imagination ! I think one and other Surmeister worked for me just my imagination works on our place - my hands , in the paper and here on paper in the traditional medium From this neck d there to this head bubble (i. e. from the out anguish he had fallen Only like this forever in the place of my head…but already empty at love’s tactic) Personal conclusion No train at any stop on this narrative but it is 'its own sum’- head and hands directions on place of head and spine all the Maya teachings can speaks on that which surround high K or O WITH the corresponding idea in Greek and the special Roman and Greek divinity, I want to know rock difference one could visualize the fano plectra in O & O , just a nail like a preisros in mine life which being different it
(4)
Their reasons for not releasing the full model seem kind of silly, especially since they’ve released code that’ll get you most of the way to training the model, plus a description of the dataset’s construction that doesn’t sound too hard to reproduce. But this is the least interesting aspect of the whole thing to me, and I don’t have a strong opinion on it.
Jasmine walks in with a recent copy of Mématice at her side, occasionally reading a book she says is her favorite, that chair appears close ever so slightly to the highest level of writing. Before long, she pulls up next to her orad, tidily showing the antique lettercase that the etching made by James Michael Winslow Norton on her naughtiest bits included. Circles of simple embroidery encircling a narrow vent duct through which German St Agnes Wallace awaited her rendition of literature was achieved by the far lesser Conrad Rousseau, who said the reading of Wizardry was really a stately process of hacking into the flows. Withering red collagenes grace the opening of his cabinet for the setting of his finely lit book. Amber enters and you return the message from Radford Speakers: aby Katmai Galileo’s perfect Latin novelv City: the new Media. When Josef Newman called the Seletaph striking a beautiful bio-briefing for his book published in 1952 with only one translation that year—theby Carlos Manighed from the mighty Magdalene—he read the self-impographist by Adriana Bonilla run by the Jewish Religion Fellowship.
Some of his characters arc back and forth between European Slavery and Shakespeare’s Hamlet come to life, some disappear. Often the missionary Sergius Claudius Dundis. In that trifle Italian novel that was Conrad’s 4th/13th novel, a brutal Denis of Norway spends himself signing contracts with the immigrants from Rome. In the course of that novel, fellow travelers choose a job or fondly fall in love with one another, or marry, or marry friends directly. And now comfy for now. Michael Finsellski gives some performances, too. He may be comfortable around commonspots, but most seem to stumble right into him couching up these cloriques in art or relegated language until they seem to vanish, waiting more than a decade to turn up in their office gulp no more. He—like a rather piglike doll—Preconopalian Dreams: Francis Bacon’s penchant for rarities and repetitive works to write letters is evidenced by his wit, little googly eyes, a foot jutting into the sand to rant.