
When the Culture War Comes for the Kids →

This is an odd article.  Despite the annoying (and probably editor-chosen) headline, it’s engrossing, entertaining to read, sometimes laugh-out-loud funny.  The quality of the prose is uncommonly good as newsmagazine articles go.  The tensions it talks about are recognizable and important, although I’m skeptical they’re really changed so much in recent years.

But … the author seems to be in a “fish can’t see water” situation with some aspects of the topic.  He talks about “meritocracy” as this thing he has no particular faith in as a mechanism, but whose hoops he will cynically jump through if it’ll help his child.  Yet he seems to believe – as the meritocracy does, but I at least don’t – that the academic experiences of very young children are actually very important in themselves, and not just important for their value in a subsequent competition.

He depicts the NYC school system as a bizarre world where kids and parents apply for selective preschools as though they’re applying for a job or college, and his averred attitude is like, “it’s super weird that 5-year-olds these days effectively have ‘resumes,’ but if the system will punish my child for not doing things that ‘look good on his resume,’ as a parent I will help him do those things, I guess.”

And yet he cares so much about every little nuance of his son’s experience at school, and his hypothetical experience at schools they apply to and then turn down or are rejected from.  If he cared only about the meritocracy, he’d want to get his son into a school with a good enough name, and want him to get good grades and test scores, but he wouldn’t be concerned with the details of the assigned term projects, the educational philosophy (“the school’s pedagogy emphasized learning through doing”), the vegetable garden in the playground.  It isn’t the meritocracy that’s making him talk like this, even with tongue perhaps partially lodged in cheek:

The school had delicious attributes. Two teachers in each class of 15 children; parents who were concert pianists or playwrights, not just investment bankers; the prospect later on of classes in Latin, poetry writing, puppetry, math theory [sic], taught by passionate scholars. 

In short, he seems to have internalized the meritocracy’s notion that the school experiences of very young children have vast intrinsic significance.

It’s noteworthy that (unless I missed it) the article contains no discussion of the author’s own childhood.  His identities as a parent and as someone interested in education crowd out his identity as an adult who was once a child himself, with very different priorities.  Doesn’t he remember the childhood world where none of this mattered much, where homework was simply a chore, where the mind would frequently leave the teacher’s monologue behind to dwell on the sting of playground slights, the allure of new toys or daydream worlds?  Does he remember forgetting like 70% of what he “learned” within a year or two, or was that just me?

As far as I can remember, my early schooling was largely glorified daycare.  Things didn’t get “academically serious” until 4th grade, and that was only by accident: my teacher was self-serious and kind of a hardass, as dedicated to teaching me American history and civics as I was dedicated to ignoring them.  One of her more elaborate conceits, which the article’s author would have loved, involved role-playing a debate over some 19th-century political issue [no, I can’t remember which]; the roles assigned to us included Henry Clay and, inexplicably, Thomas Hobbes, whose views the class handout dubiously summarized by saying he “believed all men were born evil.”  (My father was very impressed: “I didn’t know who Thomas Hobbes was until college!”)

And things relaxed again in 5th grade.  I could not for the life of me tell you anything I learned in 5th grade, although I vaguely remember some sort of project about insects; the only crisp memory from that year was a creative writing project, for which I scrawled a 20-page comedy-fantasy epic entitled “Legend of the Cheese-It” [sic], and I probably would have done that in my free time anyway if not prompted.  I was in an accelerated program you had to test into, and yet as far as I can remember, many or even most of my classmates exhibited the same bored disengagement and idle whimsy.  I mean, we were kids.

And just as the author describes, there were political squabbles over the curriculum, and strange and misguided policies enforced with heavy hand by school administrators.  (The phrase “zero-tolerance” was wielded with zeal, to predictably perverse effect.)  None of this is new, and all of it only mattered up to a point.  We had inner and outer lives which the contrivances and absurdities of school unpleasantly interfered with but did not override in importance.  If there were attempts to indoctrinate us politically, they merely confused us, and it was only a few years later that we entered into political consciousness on our own terms – generally via the concept, irresistible to the 12-year-old mind, that “George W. Bush is dumb.”

Most of the successful people I know remember school like this, I think?  They do not trace their adult success back to the student-teacher ratio of their kindergartens, that’s for sure.  I suspect the author is living in more of a New York engaged-parent bubble than he realizes.

And, too, I suspect he’s conflating the very top “heights” of the meritocracy, the paths that go through Harvard and Yale, with the mainstream of the meritocracy, the choices that will get you a shot at the middle class if not quite as fancy a pedigree.  I’m sure the institutions with the highest-status names care about all sorts of odd things in their applicants, because there are so few slots for so many strivers, and they have to make distinctions somehow.  But by the very same token, those institutions are subject to pathologies that make them less desirable.  The brand name “Harvard” creates too many conflicting pressures for any one school to sustain; as just one example, superstar researchers must be hired to maintain the brand called “Harvard professor” within academia, but superstar researchers don’t make the best teachers, which then dilutes the brand called “a Harvard education.”  Things considered the best to have on your resume, and not merely good, tend to distort under that strain.  The author seems to have observed this in action at the level of preschool (yes, preschool), kindergarten, and middle school, but he hasn’t applied the lesson forward to the rest of his son’s future.


uploadedyudkowsky:

the end of a world…

…a world with no more black holes, and no more stars…

…no more stars and no more black holes…

…no more worlds…

…no more universes…

…no more time…

…no more free will…

…no more inner peace…

…no more understanding…

…no more love…

…no more laughter…

…no more life…

…no more death…

…no more…

“What?” said the boy who would never be named.

“Something along the lines of 30,000,000,000,000,000,000,000 years,” his mother said, “or a complete paradox in the meaning of time. Just remember that, so you don’t start making too much sense of it, and that no one’s ever claimed that the gods were stupid.”

“Thank you, Miss Weasley,” said the old wizard. “I don’t think you’ve let me down yet. I must continue on with this endeavor. Now…” The old wizard’s voice trailed off. “Please.”

“Is this a message to the people of Earth?” the boy said, his voice level.

The old wizard’s face was impassive. “This is my final message to you. And I am sorry.”

The boy stared at the old wizard, expressionless.

“Please,” the boy said. “I don’t think you’ve let me down yet. Please, just tell me how to get back to Earth.”

“There is no Earth,” said the old wizard. “There never was.”

“That… doesn’t seem very Hermione-ish,” the boy said slowly. “You wouldn’t have said that if you’d been Hermione.”

“So be it,” said the old wizard, and bowed low. “I will not speak of this again.”

The boy turned back to his mother. “Can I get you something?”

The woman’s face was unreadable. “Not yet.”

“Please,” said the boy. “I want to go home, I really do.”

“You won’t,” said the woman, “because you’re too young to fly now, and if you don’t want to, I won’t tell you. I don’t want you worrying about being disobedient to a witch who is older and wiser than you, and will make things harder if you try to be disobedient.”

The boy looked at her again.

And then he stood up, and turned to go out the door, and her hand reached out towards him, and the boy said no word, and his hands went on on their way.

furioustimemachinebarbarian asked: I think, but don't know for sure, that the reason variational Bayes methods look weird is that they were derived from physical principles following people like Jaynes. In practice, optimizing in variational Bayes looks like minimizing a free energy. The factorization over variables isn't generally true, but is likely physically true when your variables are the positions of a bunch of particles in thermodynamic equilibrium. It looks like a physics based method getting in over its head.

Ah! Yeah, that makes sense.

As it happens, the Gibbs distribution in stat. mech. used to confuse me too – it was clearly just wrong about some things, most obviously whether more than one value of the total energy is possible, and the sources I originally read about it did not clarify which calculations it was supposed to be valid for. And the confusing choice is the same one: replacing a distribution where variables “compete” with one where they’re independent, and then doing calculations on it as if it’s the original one.

But in stat. mech., you can go out and find rigorous arguments about why this calculation technique is valid and useful for specific things, like computing the marginal over M variables out of N when M ≪ N and N → ∞. By contrast, variational Bayes is presented as a way of getting an “approximate posterior,” which you then use for whatever calculations you wanted to do with the real posterior. Which allows for the sort of invalid calculations I used to worry about with Gibbs, like getting a nonzero number for var(E).
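To illustrate the var(E) worry with a toy case (my own example, not from the exchange above): the canonical Gibbs distribution happily assigns nonzero variance to the total energy, even though a truly isolated system has one fixed energy value.

```python
import math

# Canonical (Gibbs) distribution over a toy two-level system,
# p(s) proportional to exp(-beta * E_s).  Values are illustrative.
energies = [0.0, 1.0]
beta = 1.0

weights = [math.exp(-beta * e) for e in energies]
Z = sum(weights)                      # partition function
p = [w / Z for w in weights]

mean_E = sum(pi * e for pi, e in zip(p, energies))
var_E = sum(pi * (e - mean_E) ** 2 for pi, e in zip(p, energies))

# var_E comes out strictly positive, though an isolated system has a
# single fixed total energy -- the "invalid calculation" worry above.
```

Whether that nonzero var(E) is an answer to a meaningful question, or an artifact of using the approximation outside its domain of validity, is exactly the issue.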

I suppose the Gibbs-valid calculations, of one or a few marginals from many variables, are what you want in statistics if you’re just trying to estimate the marginal for some especially interesting variable. Except… for any variable to be “especially interesting,” there must be something special about it that breaks the symmetry with the many others, which prevents the standard Gibbs argument from working. To put it another way, Gibbs tells you about what one variable does when there are very many variables and they’re all copies of each other, but a model like that in statistics won’t assign interesting interpretations to any given variable. It’s only in physics that you get collections of 10^23 identical things that you believe actually exist, individually, as objects of potential interest.

It doesn’t mention the word “variational,” but Shalizi’s notebook page about MaxEnt is about exactly this issue, and it was very helpful to me many years ago when I was trying to understand Gibbs and various non-textbook uses of it.

There’s something that seems really weird to me about the technique called “variational Bayes.”

(It also goes by various other names, like “variational inference with a (naive) mean-field family.”  Technically it’s still “variational” and “Bayes” whether or not you’re making the mean-field assumption, but the specific phrase “variational Bayes” is apparently associated with the mean-field assumption in the lingo, cf. Wainwright and Jordan 2008 p. 160.)

Okay, so, “variational” Bayesian inference is a type of method for approximately calculating your posterior from the prior and observations.  There are lots of methods for approximate posterior calculation, because nontrivial posteriors are generally impossible to calculate exactly.  This is what a mathematician or statistician is probably doing if they say they study “Bayesian inference.”

In the variational methods, the approximation is done as follows.  Instead of looking for the exact posterior, which could be any probability distribution, you agree to look within a restricted set of distributions you’ve chosen to be easy to work with.  This is called the “variational family.”

Then you optimize within this set, trying to pick the one that best fits the exact posterior.  Since you don’t know the exact posterior, this is a little tricky, but it turns out you can calculate a specific lower bound (cutely named ELBO) on the quality of the fit without actually knowing the value you’re fitting to.  So you maximize this lower bound within the family, and hope that gets you a good approximation.  (“Hope” because nothing is guaranteed in absolute terms – the gap between the bound and the true log evidence is exactly the KL divergence from your approximation to the posterior, so maximizing the bound does select the KL-closest member of the family, but nothing says that closest member is actually close.  That’s one of the weird and worrisome things about variational inference, but it’s not the one I’m here to talk about.)
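To make the ELBO concrete, here’s a minimal sketch (a toy example of my own, not from any source): take the conjugate model z ~ N(0,1), x|z ~ N(z,1), with a Gaussian variational family.  Since this family happens to contain the exact posterior N(x/2, 1/2), maximizing the ELBO recovers it exactly.

```python
import math

def elbo(m, s2, x):
    """ELBO for q = N(m, s2), prior z ~ N(0,1), likelihood x|z ~ N(z,1)."""
    # E_q[log p(x|z)]: expand the quadratic under q
    e_lik = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) ** 2 + s2)
    # E_q[log p(z)]
    e_prior = -0.5 * math.log(2 * math.pi) - 0.5 * (m ** 2 + s2)
    # entropy of q
    entropy = 0.5 * math.log(2 * math.pi * math.e * s2)
    return e_lik + e_prior + entropy

x = 1.0
# brute-force maximize over a grid; the optimum should sit at the
# exact posterior, N(x/2, 1/2)
best = max((elbo(m / 100, s2 / 100, x), m / 100, s2 / 100)
           for m in range(-300, 301) for s2 in range(1, 300))
_, m_star, s2_star = best
```

In real applications the family does not contain the exact posterior and the expectations aren’t available in closed form, which is where all the interesting trouble starts.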

The variational family is up to you.  There don’t seem to be many proofs about which sorts of variational families are “good enough” to approximate the posterior in a given type of problem.  Instead it’s more heuristic, with people choosing families that are “nice” and convenient to optimize and then hoping it works out.

This is another weird thing about variational inference: there are (almost) arbitrarily bad approximations that still count as “correctly” doing variational inference, just with a bad variational family.  But since the theory doesn’t tell you how to pick a good variational family – that’s done heuristically – the theory itself doesn’t give you any general bounds on how badly you can do when using it.

In practice, the most common sort of variational family, the one that gets called “variational Bayes,” is a so-called “mean field” or “naive mean field” family.  This is a family of distributions with an independence property.  Specifically, if your posterior is a distribution over variables z_1, …, z_N, then a mean-field posterior will be a product of marginal distributions p_1(z_1), …, p_N(z_N).  So your approximate posterior will treat all the variables as unrelated: it thinks the posterior probability of, say, “z_1 > 0.3” is the same no matter the value of z_2, or z_3, etc.

This just seems wrong.  Statistical models of the world generally don’t have independent posteriors (I think?), and for an important reason.  Generally the different variables you want to estimate in a model – say coefficients in a regression, or latent variable values in a graphical model – correspond to different causal pathways, or more generally different explanations of the same observations, and this puts them in competition.

You’d expect a sort of antisymmetry here, rather than independence: if one variable changes then the others have to change too to maintain the same output, and they’ll change in the “opposite direction,” with respect to how they affect that output.  In an unbiased regression with two positive variables, if the coefficient for z_1 goes up then the coefficient for z_2 should go down; you can explain the data with one raised and the other lowered, or vice versa, but not with both raised or lowered.
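That anticorrelation falls straight out of the algebra.  A sketch (hypothetical numbers, and a flat prior for simplicity): in linear regression the posterior covariance of the coefficients is proportional to (XᵀX)⁻¹, and for two positively correlated predictors its off-diagonal entry is negative.

```python
import math

# two unit-norm predictors with positive sample correlation r,
# so (up to scaling) X^T X = [[1, r], [r, 1]]
r = 0.7
det = 1 - r * r

# posterior covariance of the coefficients is proportional to (X^T X)^{-1}
cov = [[1 / det, -r / det],
       [-r / det, 1 / det]]

# posterior correlation between the two coefficients: exactly -r here,
# i.e. the more the data pushes one coefficient up, the more the other
# is pulled down, since they explain overlapping variation
corr = cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])
```

So the “competition between explanations” shows up directly as negative off-diagonal posterior covariance, which is precisely the structure a mean-field family cannot represent.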

This figure from Blei et al. shows what variational Bayes does in this kind of case:

[figure: an elongated, correlated exact posterior, with the mean-field approximation squashed into its center]

The objective function for variational inference heavily penalizes making things likely in the approximation if they’re not likely in the exact posterior, and doesn’t care as much about the reverse.  (It’s a KL divergence – and yes, you can also do the flipped version; that’s something else, called “expectation propagation.”)

An independent distribution can’t make “high x_1, high x_2” likely without also making “high x_1, low x_2” likely.  So it can’t put mass in the corners of the oval without also putting mass in really unlikely places (the unoccupied corners).  Thus it just squashes into the middle.

People talk about this as “variational Bayes underestimating the variance.”  And, yeah, it definitely does that.  But more fundamentally, it doesn’t just underestimate the variance of each variable, it also completely misses the competition between variables in model space.  It can’t capture any of the models that explain the data mostly with one variable and not another, even though these models are as likely as any.  Isn’t this a huge problem?  Doesn’t it kind of miss the point of statistical modeling?
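For a Gaussian target you can see both failures in closed form.  A sketch (this is the standard textbook result, with a hypothetical correlation value plugged in): against a bivariate Gaussian posterior with unit variances and correlation ρ, the optimal mean-field Gaussian matches the conditional precisions, so each marginal gets variance 1 − ρ² instead of 1, and the dependence between the variables is represented as exactly zero.

```python
# target posterior: bivariate Gaussian, unit marginal variances, correlation rho
rho = 0.9
true_marginal_var = 1.0

# the precision matrix Lambda = Sigma^{-1} has diagonal 1 / (1 - rho^2);
# the optimal mean-field Gaussian sets each q_i's precision to Lambda_ii,
# so its marginal variance is 1 - rho^2
mf_var = 1 - rho * rho

# for rho = 0.9 the mean-field marginal keeps only 19% of the true
# marginal variance, and the rho = 0.9 dependence is modeled as 0
shrinkage = mf_var / true_marginal_var
```

The stronger the competition between variables (the larger |ρ|), the more confidently narrow, and confidently independent, the approximation becomes, which is the opposite of what you’d want.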

(And it’s especially bad in cases like neural nets, where your variables have permutation symmetries.  What people call “variational Bayesian neural nets” is basically ordinary neural net fitting to find some local critical point, and placing a little blob of variation around that one critical point.  It’s nothing like a real ensemble, it’s just one member of an ensemble but smeared out a little.)

from BASIC to the fae realm, and back again

When I was a kid, I learned to program in a few different flavors of BASIC.  When I was a teenager, I tried to graduate from BASIC, which I knew was a sort of training-wheels language, to a “real” language, which at the time seemed to mean C, C++, or maybe Java.

This was a confusing and difficult process.  The first time, at age 13, I tried to learn C++ by reading one of those “Learn [language] in [X days]” books.  I read all the way through the book, but found most of it baffling, and never actually sat down to write any C++.  Then later, at 17, I said “I am going to learn C, dammit,” Googled my way to the About.com tutorial on C, and actually started programming in C.

I was able to use the language well enough to write some things like a simple raytracer, and I found the process entertaining, but mostly as a demonstration of will and dedication; I still found C itself pretty bizarre, and I felt proud of myself for managing to get things done using these strange alien invocations, rather than really grokking the language and the demands it made of me. My code was probably exquisitely terrible, although it did compile and run.

I gather this is a pretty typical experience from that time period. What’s funny, looking back, is that the “serious, adult programming” (i.e. programming for $) I currently do feels much closer to the Eden of BASIC than C did.

This isn’t just a high-level vs. low-level thing.  I think it’s more fundamentally about clarity of abstraction boundaries.

————

A high-level language like Python tries hard to put you in an abstract world of entities that feel stable and part of the same world, without the lower-level implementation poking its head through the cracks.  But (as far as my limited understanding goes) this is also true of the lowest-level languages.  The user of assembly or byte code doesn’t engage directly with any specific physical machine; they use an abstract interface called the instruction set architecture which can be implemented in multiple ways, and they engage with abstract entities like “logical registers” that may get mapped to different physical registers on the fly.  The mapping is invisible, outside of your control – securely “lower-level” than the programmer’s world.

C is not like this.  It doesn’t define any stable world that makes sense on its own terms.  Or rather, the only stable world that exists in C exists on a lower level than the entities the language seems to be talking about, and its constructs only make full sense as convenient and elaborate macros for common lower-level operations.  You can only really get C when you’ve learned to view, say, an “int” as a collection of shorthands for things people often do with little groups of bytes in memory, and not as an abstract data type.

————

The clearest example for me is arrays.  Does C “have” arrays?  Well, it has an array type, and you can create an array in stack memory and it behaves the ways you expect.  (Indeed, C’s interface to stack memory looks deceptively like high-level programming, which was very confusing to me at first.)

All right, let’s pass our array to another function.  C passes by value, but you generally don’t want to pass arrays by value, and indeed, C does not do so.  Instead, if you try to pass an array, you get something that’s almost an array, but lacks a piece of data (the length).

What is this thing?  Well, it’s actually a reference – but not a reference to the array (which would be passing the array by reference), it’s a reference to the first element.  But if you try using array indexing like a[i] on it, it works as if you had the array itself.  Why?  Because the “array indexing” notation in C works on any object that’s a reference: it’s actually shorthand for getting the value out of a reference that is “i units away” from the original reference.

Why does the concept of “i units away from this reference” make sense in C?  After all, you wouldn’t look at a URL and say “now give me the one to its left.” It makes sense in C … because C’s implementation of a reference is something called a “pointer” which stores a location in an actual address space (the floor is leaking!), and C’s implementation of an array allocates memory so the individual objects are actually next to each other in the address space, as opposed to say a linked list (the floor is disintegrating!).

…and then that all applies only for stack memory.  Creating the same array/“array” in heap memory looks totally different, and involves importing functions from the standard library, and those functions don’t know how much memory the different C objects need, so to get an array with 50 ints you need to look up how big an int is and convert that to bytes.  If you do that right, you get another one of the pseudo-array things (first element references), but again the function doesn’t know anything about C types so you’ll need to convert it to an int reference.

(If you think about it, these are steps that also need to happen when you’re making a stack array, but in that case, this rote and predictable bundle of operations was conveniently automated for you.  Not here – now it’s your turn to do them.  It takes a village, it seems, to implement array functionality.)

————

Sometimes in C the floor can look solid.  If you have an object of type double, then it takes up the right amount of memory (whatever it happens to be) to represent a double-precision floating-point number, and if you use arithmetic operators on it, floating-point arithmetic happens.  And you can just trust it to happen without thinking about how a specific computer is doing it.  C connects you with an implementation of floating-point numbers, as a high-level language would understand that phrase.  But the presence of such complete implementations makes it, if anything, harder to work simultaneously with the incomplete ones.  You’re constantly asking yourself, “wait, what level am I on right now?”

Some of my confusions when I first tried to learn C/C++ can be read as confusions about what I should expect to be “implemented” for me, exacerbated by a conflation between “fully implemented for you” and “possible without vast effort.”  One of the later chapters of the C++ book was an extended walkthrough of making a linked list class.  “Why am I supposed to be excited?”, I thought.  “At the end of the day this is basically an array, and I already have those.”  Rather than showing off nontrivial things you could do with C++, the example was about a trivial thing you couldn’t do with C++ without adding it to the language by hand.

In retrospect, these languages don’t look any less strange.  Now I’m used to hearing everything framed in terms of “APIs,” “information hiding,” “loose coupling” – an aspiration to make the expectation of my young self, of a single stable floor extending everywhere, as close to a truth as possible.

deusvulture replied to your post

“nightpool replied to your photo “As someone who keeps running into…”

fwiw I feel like I pretty frequently see those kinds of errors in rtf and doc parsing (a profusion of Âs is the classic example)

although maybe that’s a different thing

The Â thing in particular is a symptom of reading text in one character encoding as if it’s in another character encoding, see here.
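The mechanism is easy to reproduce (a sketch in Python; the example string is arbitrary): UTF-8 encodes most non-ASCII characters as multi-byte sequences whose lead bytes are things like 0xC2 and 0xC3, and reading those bytes as Latin-1 turns each byte into its own character, hence all the Âs and Ã©s.

```python
s = "caf\u00e9\u00a0ok"          # 'café', a non-breaking space, 'ok'
raw = s.encode("utf-8")          # é -> b'\xc3\xa9', NBSP -> b'\xc2\xa0'

# the classic mistake: the bytes are UTF-8, but something downstream
# decodes them as Latin-1, one character per byte
garbled = raw.decode("latin-1")  # 'cafÃ©Â\xa0ok'
```

As long as nothing destroys the bytes along the way, the damage is reversible by undoing the two steps in reverse, which is why this particular mojibake is at least diagnosable.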

Character encoding problems are common, but they’re at a lower level than what I’m talking about – the level of “what sequence of characters do these bytes even represent” rather than “given this sequence of characters, which ones are special control sequences.”  So they have the potential to come up in any kind of text processing, even in the hypothetical utopia where I have my ideal tabular data file format.

The direct analogue to the problems I have with CSVs would be something like interpreting an RTF as having lots of bold text because I typed “\b” somewhere in it, or conversely interpreting actual bold text in an RTF as regular text surrounded with “\b” and related sequences.  Or the like with margin sizes, other layout information, etc.

nightpool

replied to your photo

“As someone who keeps running into data issues at work that can be…”

isn’t there just a setting you can change in google sheets for this? I know there’s one in Excel although it may have downsides (computer-level instead of worksheet-level, etc)

Yeah, I think so.  IIRC, the issues I’ve had with Google Sheets aren’t with “smart” and not-strictly-necessary features like date parsing, but with escapes for field and line delimiters not getting handled correctly at some point in a larger pipeline.

This isn’t really an issue with Google Sheets per se, it’s an issue with the relative lack of investment in good exchange formats, and trustworthy software, for tabular data.  It’s conventional to talk as though a “CSV file” is a well-defined thing, and to provide “CSV” import/export options that don’t tell you exactly what they do, even though there are numerous distinct CSV standards (and non-standard implementations).  Google Sheets’ CSV functionality usually works, but by that I mean “its assumptions about CSVs usually turn out to agree with those in the other software I’m plugging it into,” which is not the same thing as being trustworthy in a predictable way.
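This is why, when I control both ends of a pipeline, I pin the dialect down explicitly instead of trusting defaults.  A sketch with Python’s csv module (Python is my assumption here; the tools in the original pipeline aren’t specified): spelling out the delimiter, quote character, and quoting rule at least guarantees the writer and the reader agree with each other.

```python
import csv
import io

row = ['he said "hi"', "a,b", "line\nbreak"]

buf = io.StringIO()
# spell out every dialect choice instead of relying on defaults
writer = csv.writer(buf, delimiter=",", quotechar='"',
                    quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerow(row)

buf.seek(0)
reader = csv.reader(buf, delimiter=",", quotechar='"')
round_tripped = next(reader)
# embedded quotes, delimiters, and newlines all survive the round trip
```

Of course this only protects the hop you control; the failure mode in the post is precisely that some other parser in the chain made different assumptions.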

I’ve actually concluded it’s safer to use Excel’s file format (really) when moving tabular data through software with poorly documented file parsers, because there’s a kind of dogfooding that works in my favor: Excel has to handle delimiters at least consistently enough to load its own files after saving them.  But for just handing around a table of strings this feels a bit ridiculous.

Contrast this with the formats that exist for formatted, but generally non-tabular, text.  The situation there isn’t perfect either, with plenty of non-open formats and incompatible format versions, yet somehow I’m able to open RTF and DOC files without ever seeing formatting escapes interpreted as text or vice versa.  And that’s a much more complicated case, with a much wider range of features to support.

dragon-in-a-fez:

today I learned that an estimated 20% of genetics papers may have errors because Excel automatically converted the names of genes into calendar dates


As someone who keeps running into data issues at work that can be traced back to the data going through Google Sheets at some point, I’m disturbed but not surprised that this type of issue is causing widespread havoc in academia too. (And I’m pessimistic that switching to Google Sheets is a good general solution, although it may fix this specific problem with date parsing.)

Literally just yesterday I was thinking, in response to the latest such incident, that there’s an oddly unfilled need for a simple, accessible table editor that just renders text organized into rows and columns, nothing else, with no parsing except for cell separators (and with that very carefully specified and controllable).

It’s easy to imagine this being a standard tool that came with every major OS, the way they all still have simple GUI text editors even though “word processors” also exist. But we don’t live in that parallel universe, and in ours there is this very simple, very common, apparently unfulfilled need. (*Gestures suggestively in the direction of any bored software developers who happen to be reading this*)

(via afloweroutofstone)

Those of you who have read The Instructions (my favorite novel) may be interested to know that – at long last, after I’d kind of assumed he’d disappeared in some way, decided to do something else with his life – Adam Levin has a new novel coming out next April.  See here.

Here’s the blurb:

The astonishing new novel by the NYPL Young Lions Fiction Award-winning author of The Instructions.

Bubblegum is set in an alternate present-day world in which the Internet does not exist, and has never existed. Rather, a wholly different species of interactive technology–a “flesh-and-bone robot” called the Curio–has dominated both the market and the cultural imagination since the late 1980s. Belt Magnet, who as a boy in greater Chicago became one of the lucky first adopters of a Curio, is now writing his memoir, and through it we follow a singular man out of sync with the harsh realities of a world he feels alien to, but must find a way to live in.
    At age thirty-eight, still living at home with his widowed father, Belt insulates himself from the awful and terrifying world outside by spending most of his time with books, his beloved Curio, and the voices in his head, which he isn’t entirely sure are in his head. After Belt’s father goes on a fishing excursion, a simple trip to the bank escalates into an epic saga that eventually forces Belt to confront the world he fears, as well as his estranged childhood friend Jonboat, the celebrity astronaut and billionaire.
    In Bubblegum, Adam Levin has crafted a profoundly hilarious, resonant, and monumental narrative about heartbreak, longing, art, and the search for belonging in an incompatible world. Bubblegum is a rare masterwork of provocative social (and self-) awareness and intimate emotional power.

Of the various author puff-quotes on the linked page, this one seems particularly and encouragingly on-brand:

“A book may be said to be a kind of fist, and the readers of such a fist-book as Bubblegum can surely not predict or prepare for the ecstatic bewilderment of the encounter, particularly when they are greeted in the depths of it by long-form theoretical analysis of their plight.”

Jesse Ball, author of Census