
@disconcision mentioned you in a photo

@nostalgebraist : so intriguing! raises a lot of questions! is frank (still?) blind to the notes? starting to worry about my potshots here. also, is mood bounded? personally hoping for a mood singularity due to self-reblogging of self-encouragement

If I understand you correctly, no, Frank isn’t blind to the notes – that is, Frank is aware of replies and direct reblogs, and they interact with the new mood feature.

(The “selector” feature, which has been around for a long time, is also aware of the notes in a different way – as raw counts – but I don’t think that’s what you mean.)

If you want to discuss a Frank post without worrying about mood effects, here are some ways to achieve that:

  • reblog the post via an intermediary (requires someone else to have reblogged it first)

  • send me an ask or message

  • make a new post on your blog with whatever you wanted to say, with (if you like) a link to the relevant Frank post.

    (note that in this last case, Frank might still reblog the post if she’s following you via the !follow command, but even then it won’t have mood effects – posts seen on the dash don’t affect mood; only things “said to” Frank, like asks and direct reblogs, do)

To your other question –

also, is mood bounded? personally hoping for a mood singularity due to self-reblogging of self-encouragement

The value on the graph isn’t actually bounded, so in principle it could get really large in magnitude for some period of time.  It has dynamics that exponentially relax it back to a baseline, though, so the magnitude wouldn’t stay high for long without some continual driving input.
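(For concreteness, the dynamics are something like this – a minimal sketch, not the actual bot code; the function and parameter names here are made up:)

```python
import math

def step_mood(mood, baseline, user_input, dt, tau):
    """One update of a mood variable that exponentially relaxes to baseline.

    Over a step of length dt, the deviation from baseline shrinks by a factor
    of exp(-dt / tau); with no input, mood decays back toward baseline, but
    sustained input of one sign can push it arbitrarily far.
    """
    decayed = baseline + (mood - baseline) * math.exp(-dt / tau)
    return decayed + user_input
```

Left alone, the deviation |mood − baseline| halves every tau·ln(2) time units, which is why a big spike doesn’t persist without continual driving input.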

Since mood only affects the content of Frank’s posts, and is affected only by user input (not Frank’s own posts), any positive feedback loop would have to be a two-party thing where users give happy/unhappy responses to happy/unhappy posts – it can’t happen from the system alone, without an element of human response.

The effects of this value on the posts are themselves bounded, BTW:

  • The mood value is converted into lower and upper bounds on a certain kind of sentiment score (output from a sentiment predictor), and candidate posts outside those bounds are rejected.

  • The function from (mood value) → (lower bound, upper bound) just interpolates between members of a discrete set of “named moods,” which are pairs (lower bound, upper bound) that seemed empirically reasonable for capturing things like “only sad posts,” “only non-sad posts,” “only happy posts,” etc.

    (Originally there were just the discrete “named moods,” but then I wanted to make an underlying continuous variable, so I interpolated between the discrete elements that I already felt confident about using.)

    Anyway, this function just returns the saddest “named mood” for all sufficiently low inputs, and the happiest “named mood” for all sufficiently high inputs (while interpolating in between).  So its outputs are bounded.
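If you want the gist in code, here’s a minimal sketch of that clamped interpolation – the specific named moods and bound values below are invented for illustration, not the real ones:

```python
import numpy as np

# Hypothetical "named moods": (mood value, lower bound, upper bound), where
# the bounds apply to a sentiment score in [0, 1].  The real values were
# chosen empirically.
NAMED_MOODS = [
    (-2.0, 0.00, 0.25),  # "only sad posts"
    ( 0.0, 0.25, 1.00),  # "only non-sad posts"
    ( 2.0, 0.60, 1.00),  # "only happy posts"
]

def sentiment_bounds(mood):
    """Interpolate bounds between named moods, clamping outside the extremes."""
    xs = [m for m, _, _ in NAMED_MOODS]
    lows = [lo for _, lo, _ in NAMED_MOODS]
    highs = [hi for _, _, hi in NAMED_MOODS]
    # np.interp clamps automatically: any mood below -2.0 gets the saddest
    # mood's bounds, any mood above 2.0 gets the happiest mood's bounds.
    return np.interp(mood, xs, lows), np.interp(mood, xs, highs)

def accept_candidate(candidate_sentiment, mood):
    lo, hi = sentiment_bounds(mood)
    return lo <= candidate_sentiment <= hi
```

So even if the mood value itself runs off to ±1000, the bounds it produces are stuck at the saddest/happiest named mood.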

disconcision replied to your post “She is dressed, now, in a black vest thrown over a dark gray-green…”

did you switch to the 345M model at some point? this is disturbingly coherent

Yeah – I switched @uploadedyudkowsky over to 345M shortly after its release, which conveniently coincided with a point when I was getting tired of curating the 117M output.

It’s amusing to reflect on how much @uploadedyudkowsky has “improved” since I started the blog.  Originally I was just doing word-level Markov chains, which are an old favorite of mine (first learned about them through Janusnode, which I first used back when it was called “McPoet,” sometime in the … early 2000s?).
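(For anyone who hasn’t played with these: a word-level Markov chain just tabulates which words follow which in the corpus and samples from that table. A toy order-1 version – real ones usually condition on the last two or three words:)

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that followed it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, seed_word, length=50):
    """Walk the chain from seed_word, sampling each next word uniformly."""
    out = [seed_word]
    for _ in range(length):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)
```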

When I heard about char-RNN, I started using that, which allowed for a lot more “creative” variability although I still had to laboriously hunt for funny-sounding phrases buried within larger swaths of mostly dull/gibberish output.

Then when I heard about people fine-tuning GPT-2, of course I had to try that, which got vastly better results – now the curation challenge is less “find some phrase in the output that doesn’t suck” and more “I want to quote some gigantic 15-paragraph stretch but I’m worried people will just tl;dr unless I choose a smaller excerpt.”

And now it’s the 345M GPT-2.  And I’ve established this sort of personal tradition where every time I hear about a new gee-whiz NLG method with freely available code, I’ll try it on this same Yudkowsky corpus and revive @uploadedyudkowsky.  I’m sure I’ll do it again with the larger GPT-2s, and then whatever comes next.

The really funny thing here is that if you look over the blog, it looks like something “gaining more intelligence” dramatically and qualitatively over the course of a few years.  But it’s a thing that I, a perpetual (if moderate) AI skeptic/pessimist, am doing to poke fun at Yudkowsky, whose whole deal is worrying about rapidly improving AI.  So it feels almost like he’s getting the last laugh!  It’s an extra, completely unintended/emergent joke on top of the basic joke underlying the blog.

(On a related and more serious note, I do think I’ve become less “AI skeptical” in certain ways – which are still vague in my head and need more thought – as a result of recent successes with the “transformer” architecture.

Like, a few years ago if you had told me we’d have GPT-2-level NLG in 2019 I imagine I would not have believed you.  But what’s more, the same architecture that GPT-2 uses for NLG also enables some really incredible stuff in supervised NLU, via BERT, which can get you state-of-the-art results on p much any supervised task with a few epochs using one-size-fits-all hyperparameters.  I was like “sounds fake but OK” when I read the paper, and then I tried it on some proprietary tasks from my day job and it just worked.  Sometime I want to make an effortpost about the transformer architecture, because there’s something magic going on there, and none of the explainer posts out there do justice to the intuitive simplicity of the thing.  [Very briefly: it’s a lot like a CNN in terms of using sparse filters, but the “shape” of each sparse filter is computed dynamically from the input via a function learned from the data])
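(To unpack that bracketed description a bit: here’s single-head self-attention in numpy, with made-up sizes – a sketch of the idea, not the actual GPT-2 code:)

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """One attention head over a sequence X of shape (seq_len, d_model).

    In a CNN, the weights that mix neighboring positions are fixed learned
    parameters.  Here the mixing weights A are computed from the input
    itself: each position gets its own dynamically constructed "filter"
    over the whole sequence.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # (seq_len, seq_len); rows sum to 1
    return A @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 64))                    # 10 tokens, 64-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(64, 16)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)              # shape (10, 16)
```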

disconcision asked: any opinions on GPT-2?

(1)

It continues the same basic theme from a lot of recent NLP advances (ELMo, BERT, GPT-1, Sentiment Neuron), which could be phrased as “doing language modeling on large unlabelled datasets gets you a text encoding that works great as an input to many different tasks, and doing unsupervised LM first is much better than supervised training from scratch on the same tasks.”

Back when I read the ELMo paper, I had a kind of “duh” reaction to this, because I had always thought the usual “tasks” in NLP had weirdly broad scopes, such that you’d basically need to understand a language and have a good model of the world in order to do any of them.  Like, for example, “Question Answering” isn’t a subset of linguistic competence, it’s one of many things you can do if you have full linguistic competence.

Supervised learning on a task like that is basically saying “learn English and common sense – but only the parts necessary for answering reading comprehension questions!”  That doesn’t pick out a well-defined subset of English and common sense: to really succeed, you need to learn English and common sense full stop, and then that should transfer to all the other supposedly distinct NLP “tasks.”

Moreover, trying to learn all of “English and common sense” from just the relatively small labelled dataset someone has prepared for a specific “task” – with just the task-specific objective signal – is going to be very difficult.  So I wasn’t surprised at all that ELMo did so well.  My interpretation was that the language model in ELMo learned a lot of basic and broadly applicable stuff about language and the world, so that your model didn’t have to figure out things like “what are the parts of speech?” only from the training signal on some fancy task like “coreference resolution.”

In other words, I thought the good performance of these approaches came from the “stage-wise” learning procedure, where the model first learns the basics, then learns something that builds on them.  However, with GPT-2, I’m becoming less confident that this interpretation is right.  The impression I’m getting is that a language modeling objective is the best way to get an encoding of text no matter what you want to do with that encoding.   I.e. the step where you train with a language modeling objective is less like a “101 class” which you build on later, and more like an optimal way to learn everything relevant for NLP, with the task-specific information read off of the LM-learned encoding later in a relatively minor step where you just discover where in the encoding it’s already stored.
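(One way to cash out “just discover where it’s already stored”: freeze the LM, take its encodings as fixed features, and fit something trivially simple on top. A sketch – `lm_encode` here is a stand-in for whatever frozen language model you’re probing:)

```python
from sklearn.linear_model import LogisticRegression

def linear_probe(lm_encode, texts, labels):
    """Fit a linear classifier on frozen LM features.

    If plain logistic regression on the frozen encoding rivals a model
    trained from scratch on the task, the LM already "knew" the answer and
    the supervised step was mostly reading it off.
    """
    features = [lm_encode(t) for t in texts]  # no gradients flow into the LM
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, labels)
    return clf
```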

There are some appealing stories you could tell about how unsupervised language modeling better matches the learning environment of human children, where no one is grading you on a specific task and you’re (maybe) re-using generic hardware for predicting what you are going to observe next.  I think it’s premature to go there, though, since the success of unsupervised LM is confounded by the much larger volume of data available for unsupervised as opposed to supervised learning.  In other words, I don’t feel confident in saying yet that the LM objective itself is magically great – which is the idea behind these stories – since the magic might just be in the data volume enabled by using some objective that doesn’t require labelled data.

(2)

The researchers solicited zero-shot predictions for specific tasks in amusing and creative ways, and I’m startled/impressed that these actually worked.  For question answering, they just gave it the passage followed by some question/answer pairs, and then a question followed by “A: ” and asked it to predict what comes next.  Their approach for summarization was hilarious:

To induce summarization behavior we add the text TL;DR: after the article and generate 100 tokens with Top-k random sampling (Fan et al., 2018) with k = 2 which reduces repetition and encourages more abstractive summaries than greedy decoding. 

Admittedly this didn’t do great at the task, but it did considerably better than without the “TL;DR” prompting (their Table 4), which … I guess demonstrates that the TL;DR idiom is used frequently enough to cause a language model to learn some things about how to summarize text, just for the purpose of predicting what people will say after “TL;DR”?  Amazing.
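(In code terms, the whole “method” is just string concatenation plus a sampling tweak – a sketch, with `logits` standing in for the output of a real model:)

```python
import numpy as np

def qa_prompt(passage, example_pairs, question):
    """Zero-shot QA as described: passage, a few Q/A pairs, then "A:" to complete."""
    shots = "".join(f"Q: {q}\nA: {a}\n" for q, a in example_pairs)
    return f"{passage}\n{shots}Q: {question}\nA:"

def tldr_prompt(article):
    """Zero-shot summarization: just append the TL;DR idiom."""
    return article + "\nTL;DR:"

def top_k_sample(logits, k=2, rng=np.random.default_rng()):
    """Top-k random sampling: keep the k most likely tokens, renormalize, sample."""
    top = np.argpartition(logits, -k)[-k:]
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()
    return rng.choice(top, p=probs)
```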

On a sort of similar note, there is something very amusing about the way they constructed their data set – it makes sense, but also, lol:

Manually filtering a full web scrape would be exceptionally expensive so as a starting point, we scraped all outbound links from Reddit, a social media platform, which received at least 3 karma. This can be thought of as a heuristic indicator for whether other users found the link interesting, educational, or just funny. 

The resulting dataset, WebText, contains the text subset of these 45 million links.

(3)

The samples I’ve seen from the model are indeed impressive.  I’m not sure, though, how much this reflects an advance over previous LMs and how much this reflects the fact that the GPT-2 researchers are emphasizing the subjective quality of samples from their LM, as opposed to downstream performance on NLP tasks.  For instance, I’ve seen (and used) ELMo for NLP tasks but I’ve never seen samples from the ELMo LM, and maybe they’d be comparably impressive?

I’ve played around with the smaller model they’ve released (which I think is the GPT-1 model but trained on the new data?) and I highly recommend doing so – everything I’ve gotten from it is gold.  My last few #quotes are from that model with various prompts: this one from the NYT article about Amazon pulling out of the NYC deal, this one from the Navy Seals Copypasta, and this one from a passage I took from “A Portrait of the Person-Guy.”

When I prompted it with a passage from Philip Sidney’s Arcadia (16th century prose), it gave me something that looked like medieval liturgy or theology, including gibberish in Greek, interesting formatting, and page numbers like you’d get from an OCR of a book:

19 † Now I would do great pity to you if your principles were to be confounded with those laid: for if the same principles apply to God when they are so determined, but it is not my own mind that will fall in that way. But they are your principles, and are to whom I am not confounded but with things by God. Now therefore I am your Lord and Savior, 2 † and I think on this Lord your Alderon, Jump-III of whom you boast, who has for goodness refuses what is good, not and forsook what the service of the life is, because

θόγρήσενϏν (Good) and Good men.

θόγρήσενϏν (Of ABB·Rost) who would keel over evil and heedlessness.

θόγρήσενϏν (To him who would murder,) ‘blood does the Father abide.’

θόγρήσενϏν (To both,) 'the one sober,’ says a stranger;

θόγρήσενϏν (And to Virtue, wonderfully caneffed,) which God to himself in his own person

P. 1121

were to do the same things (as follows) feloniously: —

P. 116 2 At the question of benefit of charity such comrades as you to consider it for which you base this project, and so do a sacrifice a pure sacrifice, as if sent into oblivion, show the synthesis of all the truths which are common to each soul and to love, that it may be worshiped by those who love one another equally, and in so doing facilely aid those who disdain My love; but so far as is so strong in themselves that a little criticism of one another, praising thankfulness of growth of the union of love is beyond the scope of speculative discourse; it is either gymnastial nor moribund; it is very cathedrically livid, trifling,

θόγρήσενϏνα πρώνθϏνς (Good) and Good men, that even by adversity they are united with such Christlike sense;

θόγρήσενϏνα π

Prompting with a snippet of Leonard Salby on Mundum produced something relatively incoherent that looked like a scrape of a crackpot web forum (!):

days, one wonders how I feel or how this is still ancient well this spirit?  I look very resigned with tears streaming down my face it seems but what can I say about the whole issue. no one seems to care about me I had told them about it briefly as usual but if there were ever a moment when I entered an entire family ask them what they thought.  One final thing happened I got a proposal and hello hugs however he gave me a new chair and I made it point that brother dear. It brought things closer to understanding for me and through them we understand and call us Jacob.  348 go

Questions and Answers html, answers built at author’s request 27 August 2018 2013 - 04:08 Yes

25 Jul 2018 11:30 am Gary writes: the Sacred Text Beneath (Burial ) 12 Aug 2017 - 13:22 Yes

11 Aug 2017 11:15 am Mayer Morris wrote: Do you think paganism and philosopher fiction tell us something about human nature? I think quote lecturer Ebony has exclusively been tested by the judicial psychologists while the toughest 5/4 male academic however misunderstood and mostly withdrawn she has is fully developed most 'destruction sense’ her brain abilities intact that his inner mind itself but likes projecting his imaginations and expands now its under the impression she has a 'feeling t his heart’. I somehow never learn “how it works’. I have heard dear as!!!! Of course when we draw the art, on the five figures there will be bones ! Here we cannot may art direction TV drama creation but its wearily as it have always been, since then And now I see in paintings no doubt the most beautiful imaginary drawn at this time just my imagination ! I think one and other Surmeister worked for me just my imagination works on our place - my hands , in the paper and here on paper in the traditional medium From this neck d there to this head bubble (i. e. from the out anguish he had fallen Only like this forever in the place of my head…but already empty at love’s tactic) Personal conclusion No train at any stop on this narrative but it is 'its own sum’- head and hands directions on place of head and spine all the Maya teachings can speaks on that which surround high K or O WITH the corresponding idea in Greek and the special Roman and Greek divinity, I want to know rock difference one could visualize the fano plectra in O & O , just a nail like a preisros in mine life which being different it

(4)

Their reasons for not releasing the full model seem kind of silly, especially since they’ve released code that’ll get you most of the way to training the model plus a description of the data set’s construction that doesn’t sound too hard to reproduce.  But this is the least interesting aspect of the whole thing to me and I don’t have a strong opinion on it.

disconcision replied to your post “ There’s something a little mysterious to me about the usage of “the”…”

aliens guy: “category theory”

i feel like what you’re getting at is that we can fall into an intuitive sense that equality of objects lives in some kind of objective arena. what we find out in math (or just thinking precisely about anything, i guess) is that this is abjectly not the case. equality is always with respect to some underlying category/type/whatever, and which arena we choose as our ground determines which things are equal and hence what is unique

also reminiscent (though tangentially): http://math.ucr.edu/home/baez/qg-spring2004/s04week01.pdf

Ooh, yeah, those Baez notes seem to be talking about the same thing as my “constraints” and “structure” (except he says “properties” instead of “constraints”).

Anyway, your second reply feels right, but I think there is a little more here than just “equality of objects is relative” – that sounds like an observation about some independently defined things called “objects,” as if we have a good handle on what an “object” is but not necessarily on when objects are equal.  But it’s actually that way of looking at things that (in various specific cases) tends to feel wrong to me.

It feels like there is a tension between two ways of thinking which are both supposed to be hallmarks of modern/higher math: formalism and abstraction. Formalism tells you that the explicit capital-D Definition of an object is the ultimate source of truth about it, and closer to “what the object really is” than the set of the motivating examples you keep around in your head and use for intuition. But abstraction tells you to care only about intrinsic patterns/structures and not the contingent ways they may happen to be encoded. From the perspective of abstraction, a formal definition is just a way of expressing a pattern, and our intuitions can get at aspects of the pattern that the definition misses. (E.g. if we move away from set-theoretic foundations, no one is going to say that the word “group” can’t be used anymore because a group just is “a set equipped with (etc).”)

To continue on this riff: formalism has this problem where it allows you to start with definitions that have more structure than you really want, and then happily carry it around with you forever, expressing its irrelevance by saying “two objects that differ only in that way are isomorphic” — as if this is some further fact you’ve happened to learn about the pattern you are studying, when it’s really a fact about your (bad) encoding. For example, for any kind of object based on a set, we could imagine forming a stupid variant of that object where the set is ordered (a tuple), and then all of the results would be the same except we’d pointlessly act like there were multiple copies of each instance (one per ordering) that “just happen to be” isomorphic. I don’t think anyone does exactly this, but there’s this uncomfortable feeling that the same kind of error could be happening in fancier ways without us noticing.

Two things I think are kind of interesting about this:

(1) I feel like programming computers has shaped the way I think about this stuff. I’m used to drawing the distinction between the data I want to store and the data structure / encoding I use to store it, since the former is usually fixed by the problem at hand but the latter can be chosen and matters for speed, etc. So, in programming, formalism — taking one way of encoding the data and saying it is the data — is recognizably a bad habit, which will prevent you from finding better encodings. (The set vs. tuple thing from the last paragraph is a common practical issue in my everyday work, that’s why it came to mind!)

(One could imagine a parody of formal mathematics in which every definition starts by telling you that the object is stored as JSON.)
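(The set-vs-tuple point, in very concrete terms:)

```python
# Two "orderings" of the same underlying collection:
a = ("x", "y", "z")
b = ("z", "y", "x")

assert a != b                        # as tuples: distinct objects that "just
                                     # happen to be" related by a reordering
assert frozenset(a) == frozenset(b)  # as sets: literally equal -- the ordering
                                     # was structure we never wanted anyway
```

Pick the tuple encoding and you’re stuck saying “equal up to reordering” forever; pick the set encoding and the right notion of equality is literal equality.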

(2) It’s kinda cool how this tends to connect very abstract and very down-to-earth ways of looking at something, distinguishing them from some middle ground. You start out with an intuition that something can be abstracted from some examples, then you write a formal definition of the abstraction, but then as you prove more equalities / isomorphisms, you find more and more ways that your “naive” original examples are completely representative of other things, while the formalism can get more and more (but not less) misleading.

Arnold mentioned a few cases like this in “On Teaching Mathematics”:

What is a group? Algebraists teach that this is supposedly a set with two operations [two?? -nost] that satisfy a load of easily-forgettable axioms. This definition provokes a natural protest: why would any sensible person need such pairs of operations? “Oh, curse this maths” - concludes the student (who, possibly, becomes the Minister for Science in the future).

We get a totally different situation if we start off not with the group but with the concept of a transformation (a one-to-one mapping of a set onto itself) as it was historically. A collection of transformations of a set is called a group if along with any two transformations it contains the result of their consecutive application and an inverse transformation along with every transformation.

This is all the definition there is. The so-called “axioms” are in fact just (obvious) properties of groups of transformations. What axiomatisators call “abstract groups” are just groups of transformations of various sets considered up to isomorphisms (which are one-to-one mappings preserving the operations). As Cayley proved, there are no “more abstract” groups in the world. So why do the algebraists keep on tormenting students with the abstract definition? […]

What is a smooth manifold? In a recent American book I read that Poincaré was not acquainted with this (introduced by himself) notion and that the “modern” definition was only given by Veblen in the late 1920s: a manifold is a topological space which satisfies a long series of axioms.

For what sins must students try and find their way through all these twists and turns? Actually, in Poincaré’s Analysis Situs there is an absolutely clear definition of a smooth manifold which is much more useful than the “abstract” one.

A smooth k-dimensional submanifold of the Euclidean space R^N is its subset which in a neighbourhood of its every point is a graph of a smooth mapping of R^k into R^(N-k) (where R^k and R^(N-k) are coordinate subspaces). This is a straightforward generalization of most common smooth curves on the plane (say, of the circle x^2 + y^2 = 1) or curves and surfaces in the three-dimensional space.

Between smooth manifolds smooth mappings are naturally defined. Diffeomorphisms are mappings which are smooth, together with their inverses.

An “abstract” smooth manifold is a smooth submanifold of a Euclidean space considered up to a diffeomorphism. There are no “more abstract” finite-dimensional smooth manifolds in the world (Whitney’s theorem). Why do we keep on tormenting students with the abstract definition?
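(Cayley’s theorem in the concrete spirit Arnold is recommending: send each element g to the transformation “left-multiply by g.” A toy check for Z/4 – my illustration, not Arnold’s:)

```python
def cayley_embedding(elements, op):
    """Send each group element g to the transformation x -> op(g, x).

    Cayley's theorem: this is an injective homomorphism into the permutations
    of the underlying set, so every "abstract" group is (isomorphic to) a
    group of transformations.
    """
    return {g: tuple(op(g, x) for x in elements) for g in elements}

z4 = range(4)
perms = cayley_embedding(z4, lambda g, x: (g + x) % 4)

# Injective: distinct elements yield distinct transformations.
assert len(set(perms.values())) == 4

# Homomorphism: the transformation for g + h is the composition of the two.
for g in z4:
    for h in z4:
        composed = tuple(perms[g][perms[h][x]] for x in z4)
        assert composed == perms[(g + h) % 4]
```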

disconcision replied to your post “Verbal brain noise (in exaggerated Sean Connery voice): “the name’s…”

i get spontaneous manifestations of this template all the time. the other day i had “The name’s Fuck. Fuck Manchild.”

disconcision asked: i am your mysterious toronto reader. i don't really use tumblr directly, i use rss for all the tumblr.com blogs i regularly follow. i've rss-added you but you've also been open in a tab for a couple weeks as i occasionally thumb through your history. it never really occurred to me that this form of browsing might be off-putting for a relatively low-traffic blog, though it's obvious in retrospect! (continued)

we’ve talked before on the mspaforums in the complaints thread. i was reminded of your existence when someone linked you from the homestuck subreddit. turns out we have many common interests including straddling the math-humanities divide, antidepressants, and an unseemly fascination with the big yud. these days i spend most of my time online at reddit/u/disconcision. (continued)

anyway i hope that allays a few of the more disconcerting suppositions i might have left in my wake. i’ve ‘friend-followed’ your tumblrlog with this account that i don’t use but maybe should now that you’ve made me more aware of my ghostly presence. -andrew
  1. Thanks for the message(s), and sorry I didn’t respond to this until tonight.
  2. I definitely remember enjoying your posts on MSPAF!
  3. In addition to the IP I’ve identified with you (using Windows 7), I get a lot of hits from someone in Toronto using OS X (or so Statcounter says).  This IP has been viewing me since before I got linked on Reddit.  Originally I had figured both were probably the same person, but unless this is also you (which seems unlikely), I guess I have a second mystery Toronto reader?  (I also have a mystery reader from Waterloo.  I’m big in Ontario, apparently)