Chapter 8 of Almost Nowhere is up here

The only thing lacking is some explanation of how someone from medieval Europe convinced the Chinese to create a fake dynasty complete with bogus archives.
I was reading a paper (“Information-Based Clustering”) and got confused, and it turned out I was confused because I had an intuition about information theory that wasn’t true. So I figured I would post about it here in case anyone else has the same intuition.
The paper is about clustering. It starts out with a formulation where you have some similarity measure for the N points you want to cluster, and you want to put them in a fixed number of clusters, N_c.
They write down the objective function I’m used to: the average within-cluster similarity. But instead of just maximizing that, they say they want to maximize it subject to a constraint, eq. 3 in their paper. This is the mutual information between the point labels i and the cluster labels C, which they’ve re-written so it looks like a K-L divergence between p(C|i) and p(C).
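(For concreteness: the rewrite is just the standard identity I(i;C) = Σ_i p(i) · KL(p(C|i) ‖ p(C)). Here’s a quick numeric sanity check of that identity – this is me, not the paper, and the soft cluster assignments are made-up random ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_clusters = 6, 3

# Made-up soft assignments p(C|i): each row is a distribution over clusters.
p_c_given_i = rng.dirichlet(np.ones(n_clusters), size=n_points)
p_i = np.full(n_points, 1.0 / n_points)   # uniform distribution over points
p_c = p_i @ p_c_given_i                   # marginal p(C)

# Mutual information written directly from the joint distribution.
joint = p_c_given_i * p_i[:, None]
mi_direct = np.sum(joint * np.log(joint / (p_i[:, None] * p_c[None, :])))

# Same quantity as an average K-L divergence: I(i;C) = E_i[ KL(p(C|i) || p(C)) ].
kl_per_point = np.sum(p_c_given_i * np.log(p_c_given_i / p_c[None, :]), axis=1)
mi_as_kl = np.dot(p_i, kl_per_point)

assert np.isclose(mi_direct, mi_as_kl)
```

The two computations agree to floating-point precision, which is all the “rewrite” amounts to.)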
Now, to be honest, I’m still not entirely sure what we get out of this term. I think what it does is encourage imbalanced clusters, so if we’re asked which cluster a point belongs to, we can express it concisely (with a prefix code where the more common clusters get shorter code words), but still I’m not clear on why that’s desirable. I think the idea is that we can let the number of clusters get very large, and still get readable results, with a few big clusters and then some smaller ones capturing fine-scale stuff. (It also does something to the way we assign fuzzy class probabilities to each point, I guess.)
Forget about that, though. What confused me originally was that this measure of information doesn’t know how close together our points are. That’s all in the similarity measure, which we’re balancing against this information thing.
That means that a clustering rule that seems really simple, like “all points with x>0 are red, all other points are blue,” could involve as much information as (or more than) one that seems really complicated, with lots of wiggly boundaries. I.e. whatever this is capturing, it isn’t how much information it would take to specify the clustering rule.
Thinking about this led me to realize that information theory per se is entirely about measure-related stuff, and only knows about metric stuff if you put that stuff in yourself, as a constraint.
The usual derivations of Shannon entropy, etc., start out with discrete distributions, where you have a bunch of discrete points that have nothing to do with one another. They have labels, but the labels are arbitrary, and you can permute them without changing anything.
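That label-invariance is easy to check numerically (my sketch, not from any source – the entropy function only ever sees the bag of probabilities, never the labels):

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # 0 * log(0) is taken to be 0
    return -np.sum(p * np.log2(p))

p = np.array([0.5, 0.25, 0.125, 0.125])
shuffled = np.random.default_rng(1).permutation(p)

# Permuting/relabeling the outcomes changes nothing.
assert np.isclose(entropy_bits(p), entropy_bits(shuffled))
print(entropy_bits(p))  # 1.75 bits either way
```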
Then, if you want, you can move to continuous distributions, although now there are subtleties (the differential entropy isn’t coordinate-independent, so you have to move to K-L divergence w/r/t some background measure that can stretch when you change the coordinates). Now, suddenly, we’re talking about functions on R^N, where we’re used to there being a metric.
But all of the information theory stuff has been carried over from the context of discrete, unrelated points, so if there’s a metric, it knows nothing about it. What it knows about are measurable sets, the equivalent here of the discrete points. But you can “permute the labels” of these sets all you want, without anything changing.
Like, take a standard normal distribution, and move parts of it around. Take the part on [-1, 1] and exchange it with the equal-length part on [1000, 1002]. This is a totally different distribution, with way different moments, and it’s discontinuous now … but it has the same (differential) entropy. Because if you had to make an optimal code for these distributions, all that matters is how likely one piece is relative to another piece – doesn’t matter where they are in relation to the other pieces.
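Here’s a quick numerical version of that swap (mine, not from the paper), using the equal-length pieces [-1, 1) and [1000, 1002) so the exchange is just a rearrangement of density values:

```python
import numpy as np

step = 1000                                    # grid points per unit length
x = np.arange(-10 * step, 1010 * step) / step  # grid covering [-10, 1010)
dx = 1.0 / step
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)     # standard normal density

def diff_entropy(density, dx):
    d = density[density > 0]
    return -np.sum(d * np.log(d)) * dx         # Riemann-sum approximation

# Swap the density values on [-1, 1) with those on [1000, 1002).
ia = slice((-1 + 10) * step, (1 + 10) * step)
ib = slice((1000 + 10) * step, (1002 + 10) * step)
q = p.copy()
q[ia], q[ib] = p[ib].copy(), p[ia].copy()

mean_p = (x * p).sum() * dx                    # ~0
mean_q = (x * q).sum() * dx                    # ~683: wildly different moments...
assert np.isclose(diff_entropy(p, dx), diff_entropy(q, dx))  # ...same entropy
```

The swapped distribution has a mean out near 683 instead of 0, but the differential entropies match, because the sums run over the exact same multiset of density values.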
As I said, you can bring metric stuff into the picture if you want, as in the paper with its similarities, or in maximum entropy where you fix things like the mean and standard deviation. But there’s something kind of weird about this to me. On the one hand, you’re saying that you care about metric stuff. On the other hand, you’re quantifying how efficiently you could code a sample from your distribution, using a measure of optimal coding efficiency that doesn’t know about distances.
But this does actually work, because once you write an objective function where distances matter, codes that exploit the distances start winning in your optimization problem over codes that don’t. What got me confused was that “exploiting distances” doesn’t actually let you use fewer bits per se – it just lets you convey distances better than another code using the same number of bits.
The paper cites something called rate-distortion theory, which seems to be the general theory of doing this. The idea is like: suppose I am using a lossy encoding, and I want to set a maximum on the squared error (“distortion”) between the reconstructed signal and the true signal. How many bits can I afford to throw away? And then you do this same kind of thing, where you minimize a mutual information with a constraint on the distortion (the paper basically does it the other way around, minimizing distortion with a constraint on mutual information).
As a concrete example of “exploiting distances doesn’t actually let you use fewer bits per se,” consider two distributions over x ∈ [0, 1000]. Distribution A is uniform over [0, 10] and then drops to zero for x > 10. Distribution B splits this into two far-apart bins: one uniform over [0, 5] and one uniform over [995, 1000]. These have the same differential entropy (we can turn one into another by “permuting labels”).
But if we want to minimize distortion for distribution B, our code should start out with a symbol saying which bin you’re in, and then refine from there. If our code corresponds to a continuous distribution (with less entropy than B, so it’s compressed), we’ll probably want it to be bimodal; as the entropy gets lower and lower it would tend to a Dirac delta in the center of each bin. We wouldn’t have the same bimodality if we were trying to code A, I think (although I haven’t actually done the problem – I don’t even know if it has a solution!). The point is, the distances tell us what shape our code should have, and then the information measure tells us what the cheapest code with that shape is, even though it doesn’t care about shapes per se.
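A cheap way to see the shape claim, at least in the extreme 1-bit case (again my sketch, not the paper’s): Lloyd’s algorithm for the squared-error-optimal two-level quantizer puts its codewords in very different places for A and B, even though the two distributions have the same differential entropy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Distribution A: uniform on [0, 10].
xA = rng.uniform(0, 10, n)
# Distribution B: equal mass, uniform on [0, 5] and on [995, 1000].
xB = np.where(rng.random(n) < 0.5,
              rng.uniform(0, 5, n),
              rng.uniform(995, 1000, n))

def lloyd_1bit(x, iters=50):
    """Lloyd's algorithm for a 2-level (1-bit) quantizer minimizing squared error."""
    c = np.array([x.min(), x.max()], dtype=float)
    for _ in range(iters):
        # Assign each sample to its nearest codeword, then recenter codewords.
        assign = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for k in range(2):
            if np.any(assign == k):
                c[k] = x[assign == k].mean()
    return np.sort(c)

cA = lloyd_1bit(xA)   # ends up near [2.5, 7.5]: splits the one interval
cB = lloyd_1bit(xB)   # ends up near [2.5, 997.5]: one codeword per faraway bin
```

For A the two codewords just split the single interval in half; for B they land at the centers of the two faraway bins – the Dirac-delta-per-bin behavior in embryo.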
I’m sorry but is the conflict theory / mistake theory post a really vicious satire that some troll uploaded after hacking slatestarcodex, or
elaborately coining new terms for obvious concepts, establishing a new binary with which to categorise all discourse for gods sakes, this bit:
But overall I’m less sure of myself than before and think this deserves more treatment as a hard case that needs to be argued in more specific situations. Certainly “everyone in government is already a good person, and just has to be convinced of the right facts” is looking less plausible these days.
I mean
Okay, I’m going to be kind of a jerk here, but I think this is important.
Here are some comments I got on the post. Each blockquote is from a different person:
I never thought about this consciously and I think it is an enormously useful concept. Like, easily top 5 among the posts of this blog in terms of making me go ‘So much makes sense now that didn’t 30 minutes ago.’ And that is not light praise. I am in the process of sending this to about eight people, most with some personalized variation of ‘Remember [that half-articulated idea I tried to explain last week/that fight we had a month ago/that 'why is the world the way it is’ conversation from some point last year]? Reading this made it click for the first time.’ Ya done good, Scott.
This is a really interesting post that I think crystallizes a number of similar dynamics for me.
I have a friend who is clearly conflict, and I realize now I was dimly aware of this axis, and that I’ve been trying to convince him in arguments using the mistake approach for years until I gave up because I realized he just resets. It’s useful to have this crystallized in such a concise form.
I think this is a really useful axis of distinction, and not one I’ve seen articulated well before…this is yet another obviously insightful sketch of a concept I wouldn’t have gotten a lasso around on my own, so thank you for that, Scott.
Amazing post, and definitely something I had never explicitly considered before. In fact, I made an account just to comment on this.
First : thank you for this post, it makes me see lot of conflicting views way more clearly. I think you make a mistake in seeing mistake vs conflict as rational vs emotional though ; to me it is more technocrat vs politics, or global optimum vs Pareto front.
I really appreciate this post. I wrote my masters thesis a while back about different kinds of writing “from persecution”, i.e. writing about the conflict between you and The Man or whatever. The two archetypal modes I discussed I called “complaint” and “refusal”. Complaint would stress common ground, i.e. “We have a disagreement but we can potentially solve it if I lay out my perspective and you respond.” Meanwhile refusal would stress difference rather than commonality, i.e. “We are never going to see eye-to-eye on this, and my act of writing is more about self-expression than any submissive attempt to convince you.” I think these concepts map pretty reasonably onto mistake theory and conflict theory respectively
I really liked this post, these are some really interesting observations. I used to be a daily reader of your blog, but stopped for reasons I’ll explain below. I wouldn’t call myself a Marxist or a socialist but I’m certainly further left than most of your audience, I think. Hard Mistake Theorist probably describes me best, although I’d hesitate to label myself that way. I do agree with a lot of what you write here generally, and find your opinions and observations insightful.
This post shows some major intellectual growth, and I’m glad to see that you’re finally coming to understand where a lot of the people from outside neoliberal technocracy world are coming from.
I hadn’t thought of this before, but I think this likely cleaves reality along the joints. Good post, thanks for writing.
Great, thoughtful, post. I haven’t commented in awhile and this motivated me to do it. While I certainly fall more on the mistake theorist side, I do think conflict theorists get something fundamentally right about power relations.
As I see many others saying, I’m a mistake theorist who never really considered the dichotomy, and thought conflict theory was either an easy mistake or badly-phrased mistake theory. (That’s my poll answer, rest of comment not directly related)
Fascinating post about the difference between the mistake and conflict theorists. This is one of those things that is perfectly obvious once pointed out, but until then is really easy to miss.
I think the world view insight on the Mistake/Conflict axis is great! It clarifies a lot of discussions. Like a lot of things, it is easier to describe the Mistake/Conflict differences by pointing to people who are very strongly one side or the other of the divide. But it seems to me that lots of people are mixed in the approach they prefer (though I would tend to guess that more people are more conflict oriented).
The distinction between conflict and mistake theorists never occurred to me before I read this post. If I classify myself, I am way far on the mistake theorist side. When I was much younger, and knew less about anything in general, I was more of a conflict theorist. A big turning point for me was reading the books The Dictator’s Handbook and The 48 Laws of Power, as well as listening to The Great Courses lectures “Thinking Like an Economist: A Guide to Rational Decision Making.” Before then, I had no real grasp of incentives, margins, or how politicians end up motivated to do whatever they do. Now I have a much clearer sense of the gears that make previously incomprehensible things
I never thought about this consciously and I think I’ve just been misunderstanding other people as behaving inexplicably badly my whole life.
This is a good post, and I’ve not thought about it in a named way before. I think it’s a good basis for building more complex models of situational conflict vs mistake dynamics with.
This almost feels like an early birthday present. FINALLY!
Very good, I found the distinction between mistake and conflict oriented perspectives enlightening, and will gladly add it to my conceptual toolbox.
Also, the post got over a thousand notes on Facebook, made it onto the front page of Hacker News and Pinboard Popular, was tweeted by journalists from BBC, The Atlantic, and The Financial Times, and got about 50,000 views (way more than average for an SSC post).
I’m not saying this to boast, and I’m not saying it because I think popularity alone proves anything. I’m saying this to try to make the claim that what’s obvious to you isn’t always obvious to everyone else.
Like, call my posts wrong, call them stupid, point out problems with their arguments, or whatever. But every time I post something that some people email me to say changed their life, I get a bunch of other people emailing me to say “Why are you wasting everyone’s time posting obvious things?”, “How do you think anyone could be so stupid not to know this?”, “Why are you wasting time rediscovering the wheel?”.
The way people parse concepts is really confusing and hard to communicate. For every concept that you 100% have and think is obvious, there are a whole bunch of people missing it.
Also, the way hard concepts hide is by disguising themselves as much easier concepts, so if you’re trying to explain a hard concept, anyone who doesn’t get it accuses you of mouthing platitudes and makes fun of you. I talk about this more in Concept Shaped Holes Can Be Impossible To Notice.
Also, I think people vary in the fundamental machinery of how they process concepts. For some people (and I think I’m in this group) it’s useful to first present an oversimplified black-and-white binary version so that you know what the space you’re in looks like, and then it’s easy to make the model more complex from there. My guess is that some people do it completely differently and don’t find that at all helpful. And maybe to them it looks like people trying to explain concepts are just stupid and black-and-white about everything.
But it really hurts to have to deal with people telling me how obvious I am all the time. One reason I put off writing this article for so long is that I knew r/SneerClub would make exactly the thread they made about it, and various people would helpfully message me about it in case I didn’t know how ashamed I should be, and I wanted to wait until I was ready to deal with all of it.
I wish people could just say “Huh, this was obvious to me, but apparently not to some other people, maybe they know less than I do, guess I’ll let them learn it.”
FWIW, my first reaction to the post was “oh, this is obvious” followed by “no, that’s not the right phrase for the reaction I’m having, but I don’t know what the right phrase is.” It’s tempting to round off this objection to “this is obvious” simply because it’s a little hard to articulate what it actually is, so I suspect it gets rounded off to “this is obvious” by other people too.
If I try to articulate it, though, it’s something like this:
When an essay presents a new idea, the reader doesn’t usually come away feeling like the idea was completely novel and unfamiliar. Usually it feels more like the essay took some patterns they had already noticed in some way, perhaps faintly, and amplified those weak signals in their mind. (Possibly also synthesizing, generalizing, and/or clarifying them.)
The world is full of patterns, and we are always making fleeting note of observed patterns – “oh, X keeps happening to me, maybe that’s important” or “huh, that’s the third time I’ve seen Y cause Z.” Often these don’t hold up after further observations, or just don’t seem useful to think about, and we either forget about them completely or (often) commit them to that long-term scratchpad of things you occasionally think about in idle moments but otherwise ignore.
Usually when I read an essay and come away with a new concept, it feels like someone has reminded me of some of these noticed associations, drawn links between several of them, and made a case that this group of associations deserves to be elevated above all the others on the scratchpad, as something worthy of a concept with a name. It’s rare for an essay to make the full leap from individual examples to a big idea without relying on me to have done any intermediate legwork already – noticing sub-patterns above the individual example level, so that I can look at the big idea and say “that feels important,” deriving the feeling from many bits of prior noticing. If I haven’t done any such legwork, then the writer has to get me all the way from isolated data points to a big idea, something that is really the domain of empirical research papers, not essays.
There are various ways this can go right. Sometimes there’s a single pattern I’ve already noticed, but which deserves amplification. Sometimes I’ve noticed some sub-patterns but haven’t seen the bigger pattern they are part of. And it can go wrong in the obvious way, if the proposed pattern just doesn’t fit my observations.
But it can also go wrong even if I agree that the proposed pattern is there. I see patterns all the time – it’s how I’m able to think at all, by constantly generating little hypotheses moment to moment. Not every one of these deserves amplification, though. Even the ones that are true, that name some actual regularity in the world, may not be important or useful enough to deserve promotion from the mental scratchpad to the tier of persisting concepts with names.
“This is obvious” is, I think, often about this. The implication is, “the author thinks that this has not been written about because it has not been noticed. The author overestimates their superiority in noticing. We have all noticed it, but we didn’t write about it because it didn’t deserve amplification.”
To get back to the topic at hand: a lot of your writing sounds like it is trying to convince the reader of the importance of a proposed concept. Often the concepts are given names, which is one way of helping a concept move up from the scratchpad and acquire a more solid and persistent existence. Motte/Bailey (I realize you didn’t invent those terms, but most readers hadn’t heard of them); the “Tribes” as more ingroups/outgroups distinct from named political or cultural groups; toxoplasma; Moloch. (I feel like I had more examples and I’m forgetting some of them.)
Often, when I read one of these posts, I have the pleasant feeling that someone has taken some things already on my mental scratchpad, shown me how they fit together, and convinced me that they deserve more attention than I’ve given them. With something like the Conflict Theory post, though, I feel like I’ve seen the pattern, but I haven’t really made much of it, and I still don’t see why I ought to.
People who had only seen some of the sub-patterns, and not the full thing, might experience it as a revelatory work of conceptual integration. But since these two functions – articulating a pattern, and arguing for its importance – are bound together in this kind of essay, the reader responses are a confusing mixture of “did I see a new pattern?” and “do I agree this pattern is important?”
(This is much less clear than I would like, so apologies if it doesn’t make any sense)
pulled this up for ${project} and realized some of my mutuals might not be aware of it, though they totally should
(via youarenotthewalrus)
Amanda hadn’t experienced a scene even remotely similar since her teen bootwoman years as a front fox for the all-girl cult band Angry Woman Cleaning House.
In the Kingdom of Javaland, where King Java rules with a silicon fist, people aren’t allowed to think the way you and I do. In Javaland, you see, nouns are very important, by order of the King himself. Nouns are the most important citizens in the Kingdom. They parade around looking distinguished in their showy finery, which is provided by the Adjectives, who are quite relieved at their lot in life. The Adjectives are nowhere near as high-class as the Nouns, but they consider themselves quite lucky that they weren’t born Verbs.
Because the Verb citizens in this Kingdom have it very, very bad.
In Javaland, by King Java’s royal decree, Verbs are owned by Nouns. But they’re not mere pets; no, Verbs in Javaland perform all the chores and manual labor in the entire kingdom. They are, in effect, the kingdom’s slaves, or at very least the serfs and indentured servants. The residents of Javaland are quite content with this situation, and are indeed scarcely aware that things could be any different.
Verbs in Javaland are responsible for all the work, but as they are held in contempt by all, no Verb is ever permitted to wander about freely. If a Verb is to be seen in public at all, it must be escorted at all times by a Noun.
Of course “escort”, being a Verb itself, is hardly allowed to run around naked; one must procure a VerbEscorter to facilitate the escorting. But what about “procure” and “facilitate?” As it happens, Facilitators and Procurers are both rather important Nouns whose job is the chaperonement of the lowly Verbs “facilitate” and “procure”, via Facilitation and Procurement, respectively.
The King, consulting with the Sun God on the matter, has at times threatened to banish entirely all Verbs from the Kingdom of Java. If this should ever come to pass, the inhabitants would surely need at least one Verb to do all the chores, and the King, who possesses a rather cruel sense of humor, has indicated that his choice would most assuredly be “execute”.
The Verb “execute”, and its synonymous cousins “run”, “start”, “go”, “justDoIt”, “makeItSo”, and the like, can perform the work of any other Verb by replacing it with an appropriate Executioner and a call to execute(). Need to wait? Waiter.execute(). Brush your teeth? ToothBrusher(myTeeth).go(). Take out the garbage? TrashDisposalPlanExecutor.doIt(). No Verb is safe; all can be replaced by a Noun on the run.
In the more patriotic corners of Javaland, the Nouns have entirely ousted the Verbs. It may appear to casual inspection that there are still Verbs here and there, tilling the fields and emptying the chamber pots. But if one looks more closely, the secret is soon revealed: Nouns can rename their execute() Verb after themselves without changing its character in the slightest. When you observe the FieldTiller till(), the ChamberPotEmptier empty(), or the RegistrationManager register(), what you’re really seeing is one of the evil King’s army of executioners, masked in the clothes of its owner Noun.
(Steve Yegge, “Execution in the Kingdom of Nouns”)
jesus CHRIST this is funny
this was way, way too realistic
Buckle up, because it’s time for some Bayesian inference.
(via nuclearspaceheater)
Later - in June, 1855 - a Sunday Trading Bill was passed which, in the interests of keeping the lower classes sober, deprived them of their Sunday beer; and the common people of London congregated every Sunday in Hyde Park to the number of from a quarter to half a million and insubordinately howled “Go to church!” at the holiday-making toffs. Marx was ready to believe that it was “the beginning of the English revolution” and took himself so active a part in the demonstration that on one occasion he was nearly arrested and only escaped by entangling the policeman in one of his irresistible disputations. But the government gave the people back their beer, and nothing came of the agitation.
That doesn’t necessarily mean you need a treadmill desk, which some research suggests can hinder learning, attention and typing skills. Rather, find ways to move while you’re sitting and standing. On your feet, for example, use a foot rest to take the weight off of one foot and then the other. On your caboose, try reclining so your legs and torso form a 135-degree angle – the healthiest seated position, Tameling says. (That’s opposed to, say, sitting straight up at a 90-degree angle, or hunching forward.) “It might be making you look like you’re lazy, but you might be coming up with that next big idea.”
The rigid, neurotic, unhealthy 90-degree-pose Virgin and the relaxed, healthy, doesn’t-care-who-judges-him, disruptively innovative 135-degree-pose Chad