
jadagul:

nostalgebraist:

jadagul:

nostalgebraist:

I took linear algebra as an undergrad and then took a slightly fancier version in my first year of grad school, and I understood all the “matrices <==> linear transformations” stuff, but I never really felt comfortable interpreting the actual entries of a matrix until my second year of grad school, when I learned the rule

the matrix-vector product A*v is a linear combination of the columns of A, with the coefficients given by the entries of v

I learned this from the excellent book Numerical Linear Algebra by Trefethen and Bau, and I don’t think I’ve ever heard it mentioned by anyone else outside of that book.  Yet it’s been invaluable to me, and not just for numerics.  Did I just miss out, or is this simple fact not disseminated widely enough?
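(For concreteness, here is the rule checked numerically in numpy; the particular matrix and vector are arbitrary examples of mine, not anything from Trefethen and Bau:)

```python
import numpy as np

# A*v as a linear combination of the columns of A,
# with coefficients given by the entries of v.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([5.0, 6.0])

product = A @ v
column_combination = v[0] * A[:, 0] + v[1] * A[:, 1]

assert np.allclose(product, column_combination)  # same vector either way
```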

The difficult thing about teaching linear algebra (he says, procrastinating from writing the last week of notes for the linear algebra class he is teaching) is that the entire subject is, like, four actual facts, each of which is repeated twenty times in slightly different language.

And you have a great example! We could talk about:

  1. A linear transformation, as a function with certain properties.

  2. Matrix multiplication

  3. A system of linear equations

  4. A collection of dot products with the row vectors

  5. A linear combination of column vectors

  6. A hyperplane in some higher-dimensional space

  7. A semi-rigid geometric transformation of some space.

  8. A function determined entirely by what it does to some basis.

And those are all the same thing. I think typically students coming out of a (first) linear algebra class understand and have internalized a couple of those; can cite a couple others; and are completely oblivious to the rest. (And they may not have heard of some, because it’s hard to cover all eight; I know that my discussion of the geometric properties has been somewhat perfunctory.)

But for any given person, some of these perspectives will make much more sense than others; and if your class doesn’t get you to the ones that work for you, you won’t understand nearly as much as if it does.

(The goal, of course, is to understand all of the perspectives, and to switch among them fluently, but that’s hard and definitely not happening in a first course. So you have to pick your focuses. The reason I was so unhappy with my college’s choice of textbook is that its focus is exactly the opposite of what I would like.)

So, for instance, you say that the matrix product is a linear combination of the columns of the matrix, with coefficients given by the input vector. And you say that, and I think for a few seconds and say “huh, I guess that’s true.” But that’s not how I think about it; I think about it as a function that sends each standard basis element to the corresponding column vector.

Except those are literally the exact same thing. You write your input as a linear combination of your standard basis vectors, and then your function preserves linear combinations, and sends each basis vector to the corresponding column—so you get a linear combination of the column vectors.
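(The equivalence is easy to check numerically; a minimal numpy sketch, with made-up numbers:)

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([5.0, 6.0])

# "Sends each standard basis vector to the corresponding column":
e = np.eye(2)
assert np.allclose(A @ e[:, 0], A[:, 0])
assert np.allclose(A @ e[:, 1], A[:, 1])

# Write v = v[0]*e_0 + v[1]*e_1 and apply linearity:
result = v[0] * (A @ e[:, 0]) + v[1] * (A @ e[:, 1])
assert np.allclose(result, A @ v)  # the linear combination of the columns
```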

And I think the thing I just said is pretty common to mention. It’s certainly necessary for doing any sort of change-of-basis stuff. But if it made more sense to you in different language, that’s 100% unsurprising.

If you’re interested, here’s a stab at describing why I find the columns thing so useful.

In a lot of physics-like contexts, it’s natural to write vectors with respect to a basis which has a special physical importance, but whose basis vectors don’t.  For instance, the position x(t) of a damped harmonic oscillator obeys

x’(t) = v

v’(t) = -(cv + kx)

This can be written as a matrix equation y’(t) = Ay, with y = (x, v)^T.
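(Concretely, with sample values of c and k that I am choosing arbitrarily, not taking from the post:)

```python
import numpy as np

c, k = 0.5, 2.0  # sample damping and stiffness, chosen arbitrarily
A = np.array([[0.0, 1.0],
              [-k,  -c]])

x, v = 1.0, 0.3  # an arbitrary state
y = np.array([x, v])
y_prime = A @ y

# Recover the component equations x' = v and v' = -(c*v + k*x):
assert np.isclose(y_prime[0], v)
assert np.isclose(y_prime[1], -(c * v + k * x))
```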

There is clearly something uniquely nice about the basis being used here.  x is position and v is velocity, and it’s easier to interpret a solution written in terms of x and v than one in terms of (say) x + 2v and x - v.  In fact, since x and v are what you can actually measure, you have to specify how to transform to this particular basis, or you lose the physical meaning.

On the other hand, the basis vectors have little physical importance.  One basis vector is a state with zero velocity, and the other is a state with zero position, and there isn’t any interesting physical connection between the state (x, v) and the states (x, 0), (0, v).  So there’s no physical intuition you can attach to the question “how does A act on (x, 0)?”

In this type of problem, there is a different basis whose basis vectors have special physical meaning: the eigenbasis of A (each eigenspace is closed under time evolution and grows/decays/oscillates at rates given by the eigenvalue).  But you wouldn’t typically write down a problem in that basis at the outset, because you want to give the reader the directly measurable quantities first.
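(A quick numpy look at the eigenvalues for sample values of c and k, which I am picking arbitrarily, makes the grow/decay/oscillate reading explicit:)

```python
import numpy as np

c, k = 0.5, 2.0  # arbitrary sample damping and stiffness
A = np.array([[0.0, 1.0],
              [-k,  -c]])

eigenvalues, eigenvectors = np.linalg.eig(A)

# Underdamped case: a complex-conjugate pair.  The negative real part
# is the decay rate; the imaginary part is the oscillation frequency.
assert np.all(eigenvalues.real < 0)   # every mode decays
assert np.all(eigenvalues.imag != 0)  # and oscillates
```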


Now that I think about it, the above makes (x, v)^T feel like a covector: we naturally think about its coefficients (“how it acts on the basis elements”), not its decomposition with respect to some basis.  That suggests that this might all be less confusing if we wrote everything with row vectors instead of column vectors.  Vectors would multiply matrices on the left rather than the right, and we would naturally read this off as the vector transforming the matrix rather than vice versa.  But for whatever reason, column vectors are standard.
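(The two conventions carry the same information, related by a transpose; a small numpy illustration with made-up numbers:)

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([5.0, 6.0])

# Column-vector convention: the matrix acts on v from the left.
col_result = A @ v
# Row-vector convention: the vector multiplies from the left
# (against the transpose, so the same map is being applied).
row_result = v @ A.T

assert np.allclose(col_result, row_result)
```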

In fact, Trefethen and Bau’s comments on the columns thing can be viewed as trying to correct for the psychological effect of using column vectors instead of row vectors:

[image: the passage from Trefethen and Bau on interpreting matrix-vector products column-wise]

Will think more about this in the morning, after I’ve gotten some sleep. But my first reaction is to think that in some sense things are backwards. You have the matrix equation y’ = Ay, and you want to find y. So really your equation is A^(-1) y’ = y.

Basically, I think your second chunk is right; the reason this is feeling unnatural to you is that you’re never using the matrix to plug in y and get y’. Instead, you’re saying that the matrix gives you a (parametrized family of) functions, so you’re saying “I want to know position and velocity, and if I have this matrix I get that family of functions.”

You can always perform this sort of sleight of hand, of course. Any time you have a family of functions f_k: A -> B, you could instead think of this as a family of functions A_a: F -> B. Or as a single function from (F x A) -> B.
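(In code, this sleight of hand is just currying and flipping arguments; a toy Python example, with an invented family f_k standing in for the abstract one:)

```python
# A made-up two-parameter function standing in for the family f_k : A -> B.
def f(k, a):
    return k * a + 1

# View 1: for each k in F, a function A -> B.
def f_k(k):
    return lambda a: f(k, a)

# View 2 (the flip): for each a in A, a function F -> B.
def f_a(a):
    return lambda k: f(k, a)

# View 3: a single function (F x A) -> B is just f itself.
assert f_k(2)(3) == f_a(3)(2) == f(2, 3)
```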

But if you find yourself doing this thing a lot, I can see why you’d want to think of linear algebra in a way I find slightly odd. (And I still don’t think I’ve totally wrapped my head around the way you’re thinking of it, so I may come back and revisit this thought later on).


Hm, I’ve thought about this a bit more and I think I figured out why this is weird for me, but I haven’t quite understood it yet. But basically, I don’t think I would think of “position and velocity” as a “basis” at all. x and v aren’t numbers; they’re functions.

You’re working on an infinite-dimensional function space squared; a basis for the whole space will have way more than two elements. And you’re right that the partition into “the position and the velocity” is more natural to the problem than the division into, like, “the position plus the velocity and the position minus the velocity”. But that doesn’t have anything to do with them being a “basis”, which they’re not.

Unless I guess our field of scalars is a function field or something? But that setup is weird enough that I don’t trust myself to understand it on this little sleep either.

Hmmm.  I’m not sure I understand the first part here, but if this is relevant, I don’t think the key point here depends on the fact that we’re solving an ODE.  A plain old matrix equation with the same feature is the Leontief model in economics, which models how much a set of industries has to produce when their products can be factors of production.  It reads

x = Ax + d

where x_i is the quantity produced of good i, d_i is the quantity of good i demanded from outside, and A_ij is the quantity of good i needed to produce 1 unit of good j.

Here the x_i and d_i are just real numbers, but we still have the fact that it’s not very (economically) meaningful to think of the production x as “composed” of basis vector states where only one good is produced, or the demand d as “composed” of states where only one good is demanded.  These generally don’t arise in reality, so we don’t care about them per se.
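(Solving the model is a single linear solve; here is a toy numpy version in which the input-output coefficients and the demand are invented numbers:)

```python
import numpy as np

# Toy Leontief system: x = A x + d  =>  (I - A) x = d.
A = np.array([[0.2, 0.3],   # units of good 0 needed per unit of goods 0, 1
              [0.4, 0.1]])  # units of good 1 needed per unit of goods 0, 1
d = np.array([10.0, 5.0])   # outside demand for each good

x = np.linalg.solve(np.eye(2) - A, d)

# The production plan satisfies the original balance equation:
assert np.allclose(x, A @ x + d)
```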

(If we start with a solution x = (I-A)^(-1) d for some particular d, and we want to look at how a change to d will affect x, then our basis vectors are more meaningful because it’s natural to imagine changing demand for one good in isolation.  But this is really a different question than the original problem asked – we’re now asking about a tangent plane to the solution surface of the original problem.  It just so happens that this question uses the same equation as the original one, due to linearity.)

(via jadagul)


meisnewbie:

nostalgebraist:

I posted this as a comment over at Ozy’s blog, but I figured it was a long enough ramble that it might as well be a tumblr post.  Cut because it’s yet another post about incels and Nice Guys and all that stuff you don’t want to read yet another post about


IT’S ~well actually~ anime man time!!!!!! EXCITING PEDANTRY BELOW THE CUT.

OTAKU Credentials: I learned Japanese to read eroge, can mostly read simple Japanese without assistance (e.g. most manga) and can read college level stuff with some access to a dictionary. Know people who read 2ch pretty regularly and consume a large amount of eroge (think 50+ games and counting).


Ah, this is interesting and I think it does weaken that part of my argument a lot.  I am probably over-weighting Key because I’m only familiar with the VNs that I see recommended the most (due to curiosity about what “the best of the medium” looks like), and most others I see in that category (say on lists of “kamige”) are very far from the original romance themes and don’t seem to derive their appeal from any fantasy about male-female relations.  Plus Key did sell amazingly at least for a while.

Also, IMO Key’s work is just really bad* relative to other things commonly mentioned in the same breath, so I have to do more psychologizing than usual to reach a satisfying explanation for why it’s so well-regarded.  That does make my own preferences a hidden premise of the argument, though, which is not ideal.

I didn’t know anything about White Album 2, and wow yeah its popularity does seem like a big strike against my hypothesis.

*(except for Planetarian, which I really enjoyed when I read it, but that was around 8 years ago so the current error bars on that judgment are big)

(via meisnewbie)

nightpool:

nostalgebraist:

[cutting because dear god did this get long fast]

Thanks, this is also helpful.

I am wary of composability as an ideal, for reasons I stated here.  As you get more and more objects and methods involved in performing a given task, you’re allowing the actual code that performs that task to be spread more and more widely across the code base, and requiring the reader to trace back more steps in order to have full comprehension of what any line is actually doing.  And if you want to follow what the code is doing closely, you have to jump around nonlinearly more and more.  @gattsuru​ used the phrase “ravioli code” for this, and Googling it, it seems like other people have made the same complaint, e.g.:

I should have noted why I think that Ravioli Code is a bad thing (and hence that those who think it is good style are doing a disservice to their trade). The problem is that it tends to lead to functions (methods, etc.) without true coherence, and it often leaves the code to implement even something fairly simple scattered over a very large number of functions. Anyone having to maintain the code has to understand how all the calls between all the bits work, recreating almost all the badness of Spaghetti Code except with function calls instead of GOTO. It is far better to ensure that each function has a strong consistent description (e.g. “this function frobnicates the foobar”, which you should attach to the function somehow - in C, by a comment because there’s no stronger metadata scheme) rather than splitting it up into smaller pieces (“stage 1 of preparing to frobnicate the foo part of the foobar”, etc.) with less coherence. The principal reason why this is better is precisely that it makes the code easier overall to understand.

Of course, it makes sense to group things together if they actually tend to get re-used together (like having x and y coordinates be attributes of the same object), or to put some code in a function of its own if it forms a distinct conceptual block.  But my understanding is that the “ifs” in the previous sentence should be read as “if-and-only-ifs”; the point of abstractions like functions and objects is to group some things together as well as to separate them from other things, in order to exploit actual regularities of the task or conceptual domain in your code.

If, instead, you try to make everything as modular as possible all the time, you’re no longer making useful “these form a group, apart from these other things” distinctions; you’re just splitting everything up as finely as it possibly can be.
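(To make the contrast concrete, here is an invented toy in Python — not anyone’s real code — showing the same trivial task written ravioli-style and as one coherent function:)

```python
# "Ravioli" style: one simple task scattered across thin wrappers,
# each of which has little meaning on its own.
def _prepare_to_frobnicate(x):
    return x + 1

def _finish_frobnicating(x):
    return x * 2

def frobnicate_ravioli(x):
    return _finish_frobnicating(_prepare_to_frobnicate(x))

# Coherent style: one function with a strong, self-contained description.
def frobnicate(x):
    """Frobnicate x: shift by one, then double."""
    return (x + 1) * 2

assert frobnicate_ravioli(3) == frobnicate(3)
```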

The first thing is that it’s all about the abstractions you can make and how leaky those abstractions are. The idea of ISP is to have strong boundaries at the interface level so you don’t have to worry too much about what the code on the “other side” is doing—obviously this is easier in a strongly typed language, but I don’t think there’s ever going to be any confusion about what section.students() returns, for example.

the quoted post was talking about just functions, not objects, so I don’t know how germane it is here. if you don’t have objects, you don’t have interfaces, and you don’t have interface segregation. (modules don’t really count, imo. you need objects)

The other thing that Sandi tried to nail into my head was that, when you’re designing code, you can’t predict the future. This means you should never write code that is more complicated than what you need to solve the task in front of you today. Never refactor your code until you have a reason to. And I don’t think anyone here is saying “you should split everything up as finely as it can be split”. I’m saying “split things up such that it makes your code declarative, readable, and understandable”.

In re: “you’re only going to be able to say “I understand what this is doing insofar as I understand exactly what all of these abstractions do” from your other post: that’s…. how coding works? you’re never going to understand the “full system”. It’s too big for that. I don’t know anything about transistor design, for example. I know a little bit about assemblers and objcode, but if I had to think about such things when I was writing Ruby, I’d go absolutely crazy. And so on up the stack. I know a fair bit of SQL by now, but I didn’t know any of it when I started writing ActiveRecord code (and active record is pretty bare metal as far as ORMs go!), and that was okay. I definitely didn’t know the full codebase when I joined my job, but I was able to understand enough of the bits I was looking at to push two bugfixes on my first day, in a 140k loc app. Abstractions make the world go round. (also, at this point, we can also get into unit testing and similar ways of assuring yourself of code correctness that don’t rely on understanding the whole world)

The quoted post mentioned methods, so the poster is definitely not assuming we don’t have objects.  And Googling around, people do seem to use the term “ravioli code” in a germane way, e.g. (from these slides)

[image: slide defining “ravioli code”]

Although some people use it as a laudatory rather than derogatory term (as here, although that also contains “Risotto code” for “a confusing lot of very, very little objects all interacting together”).

Uh, pasta aside, I just mean to say that this seems like a topic of actual disagreement, and not just me misunderstanding something.  See also gattsuru’s post.

Re “Abstractions make the world go round”: there’s a distinction between abstractions everyone can already be trusted to understand, and abstractions we’ve introduced that have to be newly learned when someone reads the code.  There are lots of references to abstractions in your first block of code, but they’re Python and Beautiful Soup abstractions, and I was already familiar with them when I read it.  But in your second block of code, I of course was not familiar with with the “Section” and “Student” abstractions.  Becoming familiar with them – in the way that I am familiar with Python and Beautiful Soup – would require figuring out things like the following:

  1. the Student objects returned by section.students() actually originate from scraping the page for reviews; we don’t actually have a unique object for each student, and if the same student writes multiple reviews for some reason, we’ll have more than one Student corresponding to them
  2. student.reviews() returns a list of strings, but the identically named section.reviews() returns a dict of id/list-of-string pairs.  On the other hand, section.students() and section.reviews_for() return lists, not dicts.  There is no way (afaik) to infer whether we are getting a list or a dict from the method names.
  3. whereas student.reviews() gives you a list of strings, and section.reviews() gives you a dict of id/list-of-string pairs, section.reviews_for(student) gives you a list of Student objects.  We have three (!) different data structures these two classes can give us if we innocuously ask them for things called “reviews.”

(3 is true if section.reviews_for does the thing I think it does, but I’m not sure?  I think “for s in students()” should be “for s in self.students()”, in which case we indeed do get a list of Students.)

I don’t mean to pick on the specific choices you made in your second block, which you said was just a sketch (and may be more intuitive to people who know things I don’t).  But however we design the second block, we have to make a lot of weighty choices about what we are putting behind each abstract wall.  What are we calling a “review” and a “student”?  What will the reader expect these things to be?

We are also making these kinds of choices when we name variables in imperative programming, but if we declare variables right before we use them, the reader can hold the definitions in their short-term memory, like they’re reading a book.  In your first block, after I read the first line I know that by “student” we actually mean “a thing we get out of page.select,” and if I ever forget, I can just say “hey, what were we doing again?” and look up a few lines.  I never have to wonder what the author of this code thought a “student” should “transparently” be.

(This can all be made easier with docstrings, but in this example that amounts to writing a comment explaining literally every line of code you write, which is clearly overkill.  The code explains itself, if you put it all in close proximity.)

I realize I’m being obnoxious here, and possibly fixating on the example at the expense of seeing the broader point.  I do see the advantages of lots of modularity for large codebases worked on by many people.  But for a simple scraping task like the one in your example, I don’t understand how the benefits outweigh the costs.

I also admit I’m someone who doesn’t do enough OOP even when it’s obviously useful for the task at hand, so I may not have developed the right habits yet.

Anonymous asked: That intelligence post started out implying it'd argue against the claim "the educated public ... thinks intelligence research is all bunk. By contrast ... there is a solid scientific consensus on intelligence" and ended up roughly reaffirming my belief in it. Maybe the author isn't from a bubble that wouldn't even accept that forms of intelligence are relatively well-correlated.

youzicha:

nostalgebraist:

I’ve heard this a number of times – that the educated public denies the existence of the positive manifold itself.

And I’ve definitely been around plenty of educated people who’ve said things (e.g. ”IQ is meaningless”) that amount to this, if read in a particular, literal way.

But I suspect there is not much substance to this apparent disagreement.  Do people actually think different abilities are not correlated, or just that they’re different systems which are only incidentally correlated, as in the case of physical fitness?  When people say “IQ is meaningless,” are they really saying “there are no positive correlations between different intellectual abilities,” or are they saying what any of us might say about the single-factor theory of fitness: “this is a shitty scientific theory which doesn’t tell us anything about what is actually going on”?

Like, even Stephen Jay Gould didn’t deny the existence of the positive manifold.  He just said it was unsurprising.  Do people who read his book miss this and end up thinking the positive manifold isn’t there?  Or is it more likely that people realize it’s there (or admit it might well be), but dismiss IQ in broad terms for other reasons?
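(For readers unsure what the “positive manifold” even is: it’s just the observation that scores on different ability tests all correlate positively.  A toy simulation — all numbers invented, and offered only to illustrate the term, not to settle the causal question — shows how a single shared factor produces one:)

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 6 test scores that each load a little on one shared factor,
# plus independent noise.  All numbers here are invented.
n_people, n_tests = 5000, 6
g = rng.normal(size=(n_people, 1))
loadings = rng.uniform(0.3, 0.7, size=(1, n_tests))
scores = g * loadings + rng.normal(size=(n_people, n_tests))

corr = np.corrcoef(scores, rowvar=False)
off_diag = corr[~np.eye(n_tests, dtype=bool)]

# Every pair of tests is positively correlated: a positive manifold.
assert np.all(off_diag > 0)
```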

I’m not sure that’s my impression, but rather than guessing we should probably try to be empirical. So here’s a very crude experiment: I typed “IQ is meaningless” into google, and looked at the pages that came up. They are as follows (I reordered them to group themes together).

8 Reasons The IQ Is Meaningless - Listverse

The reasons given are: (8) IQ tests were developed to find intellectually disabled children, so we should not expect them to be valid for other purposes. (7) Some tests include “general knowledge” questions, which is unfair. (6) What good is it to be a Mensa member, when the society does nothing helpful? (5) “And how do we measure the IQ of Ludwig van Beethoven? He was good at music, but not good at mathematics. His mathematical education stopped at arithmetic. He couldn’t even do intermediate algebra. If he were to take the test, he would probably score low, but the absence of math and science from his mind didn’t hurt his career much.” (4) “Most IQ tests are timed, which means your speed is part of the score. Even if you answer every question correctly, your slow speed will pull your IQ down a few points, sometimes many. But is speed important in life?” (3) “[Einstein] aced the math and science sections, but failed French, Italian, history, and geography. He had to spend a year in a run-of-the-mill vocational college until they let him retake the exam. So how can we trust the single [IQ] number?” (2) IQ tests would probably not predict if you are good at learning boxing or not, even though such aptitude also seems to be a kind of intelligence. (1) “Of course intelligence is rather important to life as a human, and the higher one’s is, the better, but only if it is put to good use.”

Notably, the article never talks at all about whether IQ is causal, or representing a concrete biological mechanism. They talk about the measuring process being noisy (8,7), about how well IQ predicts worthwhile productivity (6, 4, 1), and about how different intellectual skills are not highly correlated (5, 3, 2). In general, I would summarize this as saying that IQ is meaningless because it will be a poor predictor of how well you will do in a given field.

Ignore the IQ Test: Your Level of Intelligence is Not Fixed for Life | IFLScience

(Google picked up this article because I fucking love science’s tagline for it is “Why your IQ score is meaningless.”)

“IQ tests measure our vocabulary, our ability to problem-solve, reason logically and so on. But what many people fail to understand is that if IQ tests measured only our skills at these particular tasks, no one would be interested in our score. The score is interesting only because it is thought to be fixed for life.”

However: people do get smarter as they get older, and this effect is hidden because the score is normalized compared to other people of the same age. Also, there is the Flynn effect. Also, early behavioural intervention can result in large IQ gains for children with autism. Also, increasing the length of compulsory schooling in Norway increased IQ. Also, dual n-back training improves IQ. And the author’s own research shows that you can improve IQ by “Relational Frame Training”, which you can buy from his relational frame training company.

Contrapositively, it seems that (according to the IFLScience editor) IQ would be meaningful if it did not respond to interventions. This is a testable claim, but it seems quite distinct from the questions of the causal status of g.

Why do so many people assert that IQ is a ‘meaningless’ test - Quora

The question asks, IQ correlates with life outcomes, so why do people say it’s meaningless? And the single answer says “High IQ does, to an extent, correlate with “success,” depending on how we define that. However, it is one of innumerable factors that do so, and it can be easily drowned in competing factors. A high IQ might make you more likely to get into Harvard, but has less of an impact than your parents’ alumni status […] But people act as though IQ is extremely important, and that minor variations in individuals will create noticeable and predictable differences in success, and that just isn’t true. IQ tests aren’t meaningless, but they certainly don’t tell you very much.”

I.e., both the questioner and the answer think that IQ is meaningful iff it predicts life outcomes.

Your IQ value is artificial, meaningless and does not matter, to anyone, whatsoever. This includes yourself.

This is a “change my view” post; the questioner summarizes their current position as

“But some people are objectively smarter than others” yes that’s true, obviously gifted people and geniuses are real and can objectively be pointed out as more talented than most people *in some areas*, but not all. What I’m saying is that trying to reduce all of a person’s skillsets into one standardized generalized comparable single one-size-fits-all numerical value is impossible. It is trying to measure the immeasurable.”

This seems to be a complaint that IQ compresses data poorly, because different skills are not correlated. It doesn’t seem to care about the causal mechanism.

Is IQ Meaningless in Business? - HubSpot Blog

The problem with relying on IQ – or even IQ-style, theoretical questions – to predict business success isn’t with what IQ actually measures. In fact, cognitive ability is certainly important for many jobs. Rather, the problem is with what IQ doesn’t measure. Your IQ score won’t tell you (or your boss, or your hiring manager) anything about your emotional intelligence, your creativity, or your practical intelligence, i.e. "street smarts.” […] If Not IQ, Then What? […] a 30-year study of more than 1,000 children found that cognitive control predicted success better than a child’s IQ

I could quote more, but in this article’s framing “is IQ meaningful” means exactly “is IQ a good predictor of success”.

IQ tests are ‘fundamentally flawed’ and using them alone to measure intelligence is a ‘fallacy’, study finds

The next two hits are particularly interesting for this discussion, because they are news reporting about a paper, Fractionating Human Intelligence, and they interview one of the paper authors, Roger Highfield.

The result in the paper is exactly the kind of thing that you have been writing a lot about: they say that performance on the IQ subtests derives from either two or three independent factors (‘short-term memory’, ‘reasoning’, and ‘verbal’) and that you can even see the responsible parts of the brain light up on fMRI. So this really goes directly towards the question of how the factor graph should be structured, and whether g has a biological mechanism or is just a statistical artefact. However, the reporting does not really focus on the causal structure.

Instead of a general measure of intelligence epitomised by the intelligence quotient (IQ), intellectual ability consists of short-term memory, reasoning and verbal agility. Although these interact with one another they are handled by three distinct nerve “circuits” in the brain, the scientists found.

“The results disprove once and for all the idea that a single measure of intelligence, such as IQ, is enough to capture all of the differences in cognitive ability that we see between people,” said Roger Highfield, director of external affairs at the Science Museum in London.

“Instead, several different circuits contribute to intelligence, each with its own unique capacity. A person may well be good in one of these areas, but they are just as likely to be bad in the other two,” said Dr Highfield, a co-author of the study published in the journal Neuron.

The scientists found that no single component, or IQ, could explain all the variations revealed by the tests. The researcher then analysed the brain circuitry of 16 participants with a hospital MRI scanner and found that the three separate components corresponded to three distinct patterns of neural activity in the brain.

“It has always seemed to be odd that we like to call the human brain the most complex known object in the Universe, yet many of us are still prepared to accept that we can measure brain function by doing a few so-called IQ tests,” Dr Highfield said.

“For a century or more many people have thought that we can distinguish between people, or indeed populations, based on the idea of general intelligence which is often talked about in terms of a single number: IQ. We have shown here that’s just wrong,” he said.

The whole way they are selling this result seems to focus on the data compression aspect, not the inquiry into the underlying mechanism. They are stressing that a single number is not a good explainer of variation. In particular, the statement that “A person may well be good in one of these areas, but they are just as likely to be bad in the other two” seems to come pretty close to denying the positive manifold, with the “just as likely” phrasing. (I guess he is referring to the components being uncorrelated; they were constructed to be orthogonal. But the subtests of course have loadings on all the components.)

IQ tests are 'meaningless and too simplistic’ claim researchers

This is another news report about the same paper, and it leans even further in the same direction. They say

They discovered that far from being down to one single factor, what is commonly regarded as intelligence is influenced by three different elements – short-term memory, reasoning, and verbal ability. But being good at one of these factors does not mean you are going to be equally gifted at the other two. […]

‘This really is a wake-up call. We have now shown that on the evidence, these tests are meaningless. ‘We need to stop trying to simplify the brain, which is very complicated organ, down to a number. ‘We need to think of intelligence like the Olympics. Is the gold medal winner in the marathon fitter than the gold medallist in the 100m sprint?’

This seems very reminiscent of the “8 Reasons” article: IQ is meaningless because there are different, uncorrelated skills, so one number can’t summarize intelligence. Even though the original paper is not about the positive manifold, the people reporting on it seem to wish that it were.

The IQ is Meaningless. | elephant journal

This is the last article that google found. It has two separate arguments. First, it again summarizes the “Fractionating Human Intelligence” paper: “short-term memory, reasoning and verbal agility. There are more, and they are often known by different names, but these three were found by Highfield’s study to be entirely handled by different nerve “circuits” in the brain. As such, there is no possible way that a solitary measure could capture intelligence. Each circuit would have it’s own individual capacity which would vary from person to person. Adding the circuits together comes out with a number that’s fundamentally meaningless; a number that tells you nothing about the individual’s ability and, as the study showed, cannot account for the variation between people and between tests.”

The first part of this quote, which says “Adding the circuits together comes out with a number that’s fundamentally meaningless”, is actually exactly in line with what you have been talking about; it makes the argument that IQ is meaningless because it’s not ontologically valid. Yay! On the other hand, the second part of the quote immediately undermines this again, because they state that IQ cannot tell you anything about an individual’s ability (a “predict outcomes” criterion) and cannot account for variation (a “data compression” criterion).

The second part of the article talks about how “IQ scores can change quite dramatically as a result of changes in family environment, work environment, historical environment, styles of parenting, and, most especially, shifts in level of schooling.” This is an argument we saw earlier, that IQ is only meaningful if it doesn’t respond to interventions.


Whew. So, if you want to change the mind of lay people about whether IQ is meaningful or not, what kind of evidence do you need? From the above, I feel there are three main things that leap out,

  • Does it predict job performance etc.
  • Does it respond to interventions.
  • And something which I have been glossing as “does it compress data well”; although the articles tend to use language like “immeasurable” or “cannot be reduced to a single number”.

I’m still unconvinced that the question of “what should the factor graph look like” is really at the core of what most people think about when they discuss whether IQ is meaningful or not.

Like, the position that you seem to attribute to most people is something like “of course IQ scores will predict outcomes, but only because everything is correlated”. But these examples are like “this person is really good at one thing, but that doesn’t mean he will be good at some other thing”. Muhammad Ali is great at boxing, Ludwig van Beethoven is bad at math, marathon runners are different from 100m sprinters, someone with good short-term memory is just as likely to be bad at reasoning. Such examples seem aimed to deny that there is any correlation at all.

Many thanks for writing this up – this is exactly the kind of concrete evidence this conversation needs.

I do disagree with you somewhat about the interpretation of these posts, particularly with your final paragraph.  If people are trying to use these examples to show there is no correlation, then they’re making a really bad argument: there are a lot of people in the world (and in history), so there are going to be some noteworthy exceptions to any correlation unless it’s very, very close to 1 (and social science correlations never are).  Plus, some of these examples are of extreme high-performers, and “the tails come apart.”

Now, if they were explicitly saying that there was no correlation, and exhibiting these examples as evidence, I would just say they’re making a bad argument.  But if there’s another interpretation that makes the argument better, the principle of charity favors taking it.  And indeed, I think these examples make more sense as evidence against the practical importance of IQ.  This also goes well with the “can’t be reduced to a single number” thing – of course you can reduce anything to a single number (if you try hard and believe in yourself), it’s just a really bad idea sometimes.  These examples are supposed to make you think “huh, look at all this important stuff I’d miss if I only cared about a one-dimensional representation of intellectual capacity.”

Putting it in fancy theoretical language, these claims are not just about the probability distribution itself (and how well it can be “compressed” per se), but about the probability distribution together with some utility function.  We don’t care about “amount of information lost” (as defined in some purely mathematical, utility-agnostic way), but about “value to us of information lost.”


Also, I think we have to take a lot of care before declaring that a claim is “purely correlational” or non-causal, if the person doesn’t specify.

I personally don’t think that people actually make purely correlational claims very often unless they’re literally using the word “correlation” and talking about studies.  This is because people naturally think in terms of causes and effects, not correlations, and in fact there is no well-defined way to do inference “purely in terms of correlations” without some model of the data-generating process.  (I give you the correlation coefficients between A and B and between B and C; what’s the correlation coefficient between A and C?  Not a well-defined question without further information.  Although you could maybe put a bound on it by requiring the correlation matrix to be pos. def.)
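The bound from requiring positive definiteness can actually be written down: if corr(A, B) = a and corr(B, C) = b, then corr(A, C) has to lie in [ab − √((1−a²)(1−b²)), ab + √((1−a²)(1−b²))]. A quick numpy sketch, with made-up example values:

```python
import numpy as np

def ac_bounds(r_ab, r_bc):
    """Range of corr(A, C) consistent with a valid (PSD) correlation matrix."""
    slack = np.sqrt((1 - r_ab**2) * (1 - r_bc**2))
    return r_ab * r_bc - slack, r_ab * r_bc + slack

def is_valid(r_ab, r_bc, r_ac):
    """A correlation matrix is valid iff all its eigenvalues are >= 0."""
    m = np.array([[1.0, r_ab, r_ac],
                  [r_ab, 1.0, r_bc],
                  [r_ac, r_bc, 1.0]])
    return np.linalg.eigvalsh(m).min() >= -1e-12

lo, hi = ac_bounds(0.8, 0.8)     # example values: corr(A,B) = corr(B,C) = 0.8
print(lo, hi)                    # corr(A,C) can be anywhere in [0.28, 1.0]
print(is_valid(0.8, 0.8, 0.5), is_valid(0.8, 0.8, 0.0))
```

So even two strong correlations pin down surprisingly little: with a = b = 0.8, corr(A, C) can still be anything from 0.28 to 1.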

This is probably why it’s so common to conflate correlation with causation – the former is an unfamiliar concept and impossible to work with on its own, so people reinterpret it as something they can work with.

Suppose it were the case that IQ was simply a measure of social class and not intelligence itself.  Because being in a higher social class makes it easier to get further in life, class will itself be correlated with various kinds of achievement, and thus IQ will be correlated with them as well.  So if “IQ is meaningless” really means IQ is uncorrelated with a bunch of other stuff, it can’t even be a proxy for other stuff that causes that stuff, or is caused by it.  In other words, IQ has to be hermetically sealed off from anything important.  This seems much stronger than what people actually mean when they say things like “IQ is meaningless.”  (“IQ isn’t meaningless: it’s a proxy for social class!  Checkmate, atheists!”)

I think this applies to the positive manifold / “single number” stuff as well as to predictions of outcomes from IQ.  Consider for example Shalizi’s pedagogical example of PCA, using a dataset about cars:


[…] all the variables except the gas-mileages have a negative projection on to the first component. This means that there is a negative correlation between mileage and everything else. The first principal component tells us about whether we are getting a big, expensive gas-guzzling car with a powerful engine, or whether we are getting a small, cheap, fuel-efficient car with a wimpy engine.

The second component is a little more interesting. Engine size and gas mileage hardly project on to it at all. Instead we have a contrast between the physical size of the car (positive projection) and the price and horsepower. Basically, this axis separates mini-vans, trucks and SUVs (big, not so expensive, not so much horse-power) from sports-cars (small, expensive, lots of horse-power).

If all we care about is compressing information, then “can the car data be boiled down to a single number?” depends only on how much of the variance is carried by that first PC alone.  But I think people would object to that one-dimensional model even if it explained a lot of variance, on the grounds that it’s a poor description of the “dynamics” of car production (i.e. the data-generating process).  Presumably, car manufacturers do not sit down and decide “are we making a big-expensive-inefficient-powerful car this time, or a small-cheap-efficient-wimpy car?”  They sit down and decide to make an SUV, or a sports car, or whatever, and then a bunch of society-wide factors about demand for different car features, costs of production, etc. end up producing this covariance structure in the end result.  Likewise, people who buy cars think things like “I want a minivan,” not “which of the two Car Genders do I want?”

I expect people would express this idea as “representing different car types on a single axis is silly/meaningless,” and they wouldn’t just be talking about % of variance explained.
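For anyone who wants to poke at this, here is a small sketch of the Shalizi-style exercise with synthetic data (the "car" features, factor loadings, and noise levels are all invented for illustration): two latent "dynamics" factors generate four observed features, and PCA still hands back one dominant axis that mixes the two factors together.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two hypothetical latent variables standing in for the data-generating
# process (not Shalizi's actual car dataset):
size   = rng.normal(size=n)   # minivan/SUV vs. compact
sporty = rng.normal(size=n)   # sports car vs. not

# Observed features are driven by both factors, plus noise.
price      =  0.5 * size + 0.8 * sporty + 0.3 * rng.normal(size=n)
horsepower =  0.4 * size + 0.9 * sporty + 0.3 * rng.normal(size=n)
length     =  0.9 * size - 0.2 * sporty + 0.3 * rng.normal(size=n)
mpg        = -0.8 * size - 0.5 * sporty + 0.3 * rng.normal(size=n)

X = np.column_stack([price, horsepower, length, mpg])
X = (X - X.mean(0)) / X.std(0)          # standardize: PCA on correlations

# PCA via SVD of the standardized data matrix.
_, s, vt = np.linalg.svd(X, full_matrices=False)
explained = s**2 / np.sum(s**2)
print("variance explained by each PC:", np.round(explained, 2))
print("PC1 loadings:", np.round(vt[0], 2))
```

Even when PC1's share of the variance is high, its loadings are a blend of the two factors the "manufacturers" actually decided on, which is exactly the objection above.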

somervta:

nostalgebraist:

voxette-vk:

nostalgebraist:

voxette-vk:

I wonder if you as a child thought “in the future, I will be the exact same person”

Um, yeah?

Like “when I grow up”, not “when they grind me up and replace me with an adult”…

Of course I have different attributes now: I’m taller, I have different interests, I’m more intelligent, etc. But I’m absolutely the same thinking being, causal agent, and subject of experience that inhabited my body 20 years ago.

If you think I’m wrong about that, fine. But I think my actions make sense on that premise. And yet the actions of (99% of) materialists do not make sense on their premises.

Or suppose you gave me a choice: I can be disintegrated and have a copy created with the same personality, memories, interests, etc., who will swear up and down she’s me; or I can have my mind “wiped” so that I lose all my memories and have a totally different personality, but I will continue to experience the pleasures and pains of the new personality. I would choose the latter in a heartbeat. I wouldn’t be pleased with losing my memories, but whether I’m the same thinking being is clearly the more important consideration. It’s not the only consideration (as there are some conditions under which I wouldn’t want to go on existing), but without that no other consideration could possibly matter.

How would you feel about it after the “wipe,” though?

Like, suppose you had been “wiped” a moment ago (in principle, you might well have been), and someone tells you about this.  They say:

“Hey, all your memories of being [the person who owns tumblr account voxette-vk] are falsehoods.  There was no being wandering around with your precise set of interests and inclinations, acting as you remember acting.  Instead, your consciousness is the consciousness of someone with a different personality, who did different things.  Here are some facts about them –“

– and then they relate a bunch of information about the life and ways of a person who is absolutely nothing like you, whose life story has no relation to your memories, whose behavior doesn’t sound like anything you would do.  (”I lose all my memories and have a totally different personality […]”)

What now is the relation you feel to this other person, if any?  Do you feel like you have “picked up where they left off” in any meaningful sense?  Do you feel like “the same thinking being” as this total stranger who you have never heard of until this moment?

After the wipe, I would of course feel at home in my new personality. So while I would certainly be very hesitant to switch my personality for a random new one (being very happy with my current one), if I had extremely good reason to think I’d be happier with the new one, I’d definitely switch.

Which is why I find it strange to hear the refrain of “I wouldn’t want to ‘cure’ my [X] because then I wouldn’t be me anymore.” I think that conception of identity is wrong and harmful. It suggests that any sort of major self-improvement (e.g. giving up an alcohol addiction) is a form of suicide and thus undesirable.

Anyway, as to your main point, I don’t see that how I feel about the situation is relevant? I’m sure if such a thing happened to me, I wouldn’t feel much affinity for my past self. But…if I am that person, then I am that person. It’s just a fact. And I would much rather that person be able to e.g. access my savings (assuming I had some, lol) live in my house, etc. than some copy of my past self who was exactly the same personality-wise.

And moreover, if I had committed some crimes or done some heroic deeds before my personality wipe, I would still deserve the respective punishment or reward regardless of whether I remembered the acts or was now disposed to similar acts…

What I’m trying to do is talk about statements like this

And I would much rather that person be able to e.g. access my savings (assuming I had some, lol) live in my house, etc. than some copy of my past self who was exactly the same personality-wise.

except looking back on past-you(s), rather than future you(s).

By hypothesis, the wiped person has a different personality and memories.  Which means that you, right now, could (in principle) have just been wiped.  Suppose you are told this, and told about what you were like before (which sounds like a description of a stranger, b/c different memories).  Now you are given a choice: you can change the past so that (1) the person who was wiped to produce you was better off, or (2) some other stranger was better off.  Would you feel a preference between these two, analogous to the one in the sentence I quoted?

(If not, then identity is asymmetric in time – why?)

Or, better yet, suppose there were *two* people (two different bodies, normal people), who are both wiped. Then, they are reloaded - the same personalities, memories etc are put back, but they’re given to the wrong bodies by mistake. Would you still maintain that Body-A should own all the savings, house etc, even though Body-B is the one who remembers them? I can understand thinking (say) that the two original people don’t *exist* any more, but to say that they do, but they’re *not* the person who has all their memories, psychological traits, etc just seems incredibly strange.

Nice – that’s better than my example.

(via somervta)



raginrayguns:

nostalgebraist:

nostalgebraist:

raginrayguns:

Seems like when someone writes like, “we care about this thing, so we used the standard quantitative measure of this thing,” @nostalgebraist is in the habit of asking, “why’s that standard?” Especially if that measure has some aura of goodness or rightness about it, that makes you question whether it’s being used for intellectual reasons.

One such question was, why do statistics people always measure distance between two distributions using Kullback-Leibler divergence? Besides, you know, “it’s from information theory, it means information.”

Above, I’ve illustrated the difference between using KL divergence, and another measure, L2 distance. I’ve shown a true distribution which has two bell curve peaks, but the orange and purple distributions only have one, so they can’t match it perfectly. The orange distribution has lower L2 distance (.022 vs .040), and the purple curve has lower KL divergence (2.1 vs 3.0). You can see that they’re quite different:

  • the orange low-L2 one matches one peak of the true distribution, but has the other one deep in the right tail
  • the purple low-KL one goes between them and spreads itself out, to make sure there’s no significant mass in the tails
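This behavior is easy to reproduce. A numpy sketch (the mixture weights and candidate fits are made up for illustration, not the exact curves from the plot): a two-peaked true density, one candidate that matches a single peak, and one that spreads out between them.

```python
import numpy as np

x = np.linspace(-8, 8, 4001)
dx = x[1] - x[0]

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# True distribution: two bell-curve peaks (unequal weights, chosen so that
# the orderings below come out cleanly).
p = 0.7 * normal_pdf(x, -2, 0.5) + 0.3 * normal_pdf(x, 2, 0.5)

q_peak  = normal_pdf(x, -2, 0.5)    # matches one peak; other peak deep in its tail
q_broad = normal_pdf(x, -0.8, 2.2)  # spreads out between the peaks

def l2_dist(p, q):
    return np.sum((p - q) ** 2) * dx          # grid estimate of integral (p-q)^2

def kl_div(p, q):
    return np.sum(p * np.log(p / q)) * dx     # grid estimate of integral p log(p/q)

print("L2:", l2_dist(p, q_peak), l2_dist(p, q_broad))  # peak-matcher wins
print("KL:", kl_div(p, q_peak), kl_div(p, q_broad))    # spread-out one wins
```

The KL penalty for q_peak is huge because it puts essentially zero density where the true distribution has real mass; L2 barely notices that.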

And this difference makes a real practical difference–using KL divergence actually is not always appropriate. When I’m doing statistical estimation, I often have a model for the data, but I don’t expect every data point to follow the model. So I expect the true distribution to have one peak which fits my model, plus some other stuff. So I don’t want to do maximum likelihood estimation, which is heavily influenced by that other stuff. And maximum likelihood estimation is actually choosing a model by minimizing a sample-based estimate of KL divergence. Instead, I minimize a sample-based estimate of L2 divergence–this is called L2 estimation, or L2E. (some papers about it here.) That way when I’ve inferred the parameters of my model, it matches the “main” peak of the data, and is robust to the other stuff.
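A minimal sketch of that contrast, with invented contaminated data: the L2E criterion, integral(f²) minus (2/n)·Σ f(x_i), is (up to a constant) an unbiased estimate of the L2 distance to the true density, and minimizing it mostly ignores the contamination that drags the MLE around.

```python
import numpy as np

rng = np.random.default_rng(0)
# 90 points from the model we care about, 10 from "some other process".
data = np.concatenate([rng.normal(0.0, 1.0, 90), np.full(10, 10.0)])

def normal_pdf(x, mu, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# MLE of location for a normal with known sigma is just the sample mean,
# and it gets dragged toward the contamination.
mu_mle = data.mean()

# L2E: minimize  integral(f_mu^2) - (2/n) * sum_i f_mu(x_i).
# For N(mu, 1) the integral term is the constant 1/(2*sqrt(pi)), so this
# amounts to maximizing the average model density at the data
# (grid search here for clarity, not efficiency).
grid = np.linspace(-2, 12, 2801)
crit = [1 / (2 * np.sqrt(np.pi)) - 2 * normal_pdf(data, m).mean() for m in grid]
mu_l2e = grid[np.argmin(crit)]

print(f"MLE (mean): {mu_mle:.2f}   L2E: {mu_l2e:.2f}")
```

The MLE lands near 1 (pulled a full unit by ten outliers), while the L2E estimate stays near the main peak at 0.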

The invention of L2E is actually informative about how standard KL divergence really is. Because it was invented by someone in a statistical community where L2 divergence is standard. Specifically, non-parametric density estimation–think histograms and kernel density estimators. The guy is actually David Scott, who’s also known for “Scott’s rule” for choosing the bin width of a histogram, which you may have used if you’ve ever done “hist(x, breaks=‘Scott’)” in R. Scott’s rule starts by looking at the mean and standard deviation of your sample, and then gives you the bin width that would be best for a sample of that size drawn from a normal distribution with that mean and sd. And how’s “best” quantified? It’s expected L2 distance between that normal distribution and the resulting histogram. Most papers you see on histograms and kernel density estimators will use L2 distance. He came up with L2E just by asking the question, what if we took the measure of fit used in nonparametric density estimation, and applied it to parametric models?

(code)
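Scott’s rule itself is a one-liner: the expected-L2-optimal (MISE-optimal) bin width under a normal reference is h = (24√π/n)^(1/3) · σ ≈ 3.49 σ n^(−1/3). A sketch checking it against numpy’s built-in version:

```python
import numpy as np

def scott_bin_width(x):
    """Scott's rule: the bin width minimizing expected L2 distance (MISE)
    between the histogram and a normal density with the sample's sd.
    The constant (24*sqrt(pi))**(1/3) is approximately 3.49."""
    n = len(x)
    return x.std() * (24.0 * np.sqrt(np.pi) / n) ** (1.0 / 3.0)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

h = scott_bin_width(x)
edges = np.histogram_bin_edges(x, bins="scott")  # numpy's built-in version
print(f"our width: {h:.3f}, numpy's bin count: {len(edges) - 1}")
```

Note the mean never actually enters: the optimal width for a normal reference depends only on the sd and the sample size.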

This is really interesting, thanks.  Especially the connection of MLE downsides to K-L downsides.

One thing that gets mentioned as a good quality of K-L is that it’s invariant to changes of coordinates.  L2 divergence doesn’t have this (I think? the squares ruin it, you get a squared factor and the “dx” can only cancel half of it).  How much of an issue is this in practice?  Like, it seems bad if you can totally change the distributions you get by squishing and stretching your coordinate system, but I guess if you have a really natural coordinate system to begin with … ?

Also, this made me think about how a sample distribution is going to have better resolution near the peak than in the tails, which could be one justification for caring more about the fit near the peak.  It seems like that could be put on a quantitative footing, too?  With theorems and stuff, even.  Maybe this is already a thing and I just don’t know it

re: change of coordinates, some observations about how L2 ends up being used:

  • in research on histograms and kernel density estimators, the problem is often to choose a bin width (for a histogram) or a bandwidth (for kernel density estimators), which are usually constant. So, then you’ve got the question, in what coordinate system does constant bin width/constant bandwidth make sense? One where the smoothness of the distribution is sufficiently close to constant, I guess.
  • I use L2E a lot and don’t think about the coordinates much. Usually I end up plotting the L2E fit over a histogram and being like, “yeah, that looks good.” If the distribution had really sharp spikes, or really long sparsely populated tails, I guess my histogram would look like crap and I might consider changing the coordinate system?

re: resolution near peaks vs near tails: yeah I guess but if we actually know the true form of the distribution MLE is more efficient. Scott describes the efficiency of L2E relative to MLE as similar to the efficiency of the median relative to the mean. So the responsiveness of MLE to the tails must be helping it, if we can trust that those values actually came from the theoretical distribution we’re fitting.

This kind of makes sense. Think of the extreme case of a uniform distribution on (μ-½, μ+½). Two data points near the edges completely determine μ, whereas two data points in the middle leave it ambiguous.

But maybe the uniform distribution is a bad example since it doesn’t seem to really generalize. In the case of a normal distribution with known sd, it doesn’t seem to matter where the data comes from. With a uniform prior over the mean, the posterior always has the same spread–always normal with variance equal to the sampling variance over the sample size. Doesn’t matter if you observe 2 data points and they’re both 0, or if one is -2 and the other 2–they both pin down an answer of “μ is around 0” with exactly equal confidence. That’s actually really weird now that I think of it.
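That equal-confidence claim is easy to verify by brute force: put a flat prior on μ, multiply up the likelihood on a grid, and compare the posterior spread for the two datasets (a sketch, with σ = 1):

```python
import numpy as np

mu_grid = np.linspace(-10, 10, 20001)

def posterior_sd(data, sigma=1.0):
    """Posterior sd of mu under a flat prior, computed by brute force on a grid."""
    log_post = sum(-0.5 * ((x - mu_grid) / sigma) ** 2 for x in data)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    mean = np.sum(mu_grid * post)
    return np.sqrt(np.sum((mu_grid - mean) ** 2 * post))

# Two points both at 0, vs. one at -2 and one at +2:
print(posterior_sd([0.0, 0.0]))   # ~ 1/sqrt(2)
print(posterior_sd([-2.0, 2.0]))  # ~ 1/sqrt(2), exactly the same spread
```

Both posteriors come out N(0, 1/2): the two likelihoods differ only by a constant factor, which normalization erases.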

Oh, and here’s a case where it’s the other way around. Consider the maximum likelihood estimate of location for a Cauchy distribution. We’re going to try and minimize the negative log likelihood. Which we do by solving

∑ 2(x_i-μ) / [(x_i-μ)^2 + 1] = 0

Each term is weighted inversely by its distance from the center. (sorry for saying this was the loglikelihood itself in an earlier version of the post!)

And this kind of looks like L2E in a way (don’t worry about the tails), and I think that this isn’t a coincidence. When I’m using L2E, I’m considering data points to have different reliability. Some tell me about the “primary mechanism” which I have a model for, whereas others don’t because they came from some other process. This is similar to how you can sample from a Cauchy distribution just by sampling from a normal distribution, but each time choosing the precision (1/variance) according to a chi-squared distribution with 1 degree of freedom. This captures that idea of “variation in reliability.” And the ones farthest from the center are likely to be the low-precision samples which carry little information.
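The mixture construction is easy to check numerically, a sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# "Variation in reliability": precision tau ~ chi-squared with 1 dof,
# then each observation is normal with variance 1/tau.
tau = rng.chisquare(df=1, size=n)
x = rng.normal(0.0, 1.0, size=n) / np.sqrt(tau)

# The result is exactly standard Cauchy; check the quartiles,
# which for a standard Cauchy are -1, 0, +1.
print(np.percentile(x, [25, 50, 75]))
```

(This is the same fact as "a ratio of two independent standard normals is Cauchy": the denominator squared is a chi-squared with 1 degree of freedom.)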

Hmm … I guess my idea about low resolution in tails only makes sense if you’re asking the question “how much does one function (the empirical PDF or CDF) look like another one (some theoretical PDF or CDF),” as opposed to “how likely is this sample to have come from this theoretical distribution?”  The latter is literally maximum likelihood, and the former seems like roughly what L2E does.

From the “how similar are these functions” perspective, extreme values aren’t really special.  Say you’re comparing your empirical PDF to, say, a standard normal PDF.  If you see an individual data point at, say, -100, or -1000, this doesn’t actually “make the functions look less similar” by very much: the normal PDF is basically zero at both -100 and -1000, so you’ll take a hit from the empirical PDF being nonzero there, but with an appreciable sample size a single point won’t make it very much larger than zero, and what’s more, it hardly matters whether the point is -100 or -1000 (or -1e12), since you’re comparing to “basically zero” in all those cases.

By contrast, in maximum likelihood, you really care about those points because it’s extremely unlikely that you’d observe them if sampling from the standard normal, and -1000 is vastly more unlikely than -100.  The idea of “resolution” I had in mind doesn’t apply here; you aren’t trying to say how confident you are about the shape of the function there.  Like it’s probably true that the true PDF doesn’t have some weird little bump at -1000, it’s probably more continuous than that, but that single observation still gives you a huge amount of information about the relation between the true distribution and your hypothetical one.
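The contrast is stark enough to show numerically. A sketch under one assumption: the L2 criterion below is the integrated squared distance between the N(0,1) density and the sample, reduced to ∫φ² − (2/n)Σφ(x_i) after dropping a term that doesn’t involve the hypothesized density. Moving the outlier from -100 to -1000 leaves the L2 criterion literally unchanged (the density underflows to zero at both points), while the log likelihood drops by nearly half a million:

```python
import math

def phi(x):
    # standard normal pdf
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def l2_criterion(xs):
    # integral(phi^2) - (2/n) * sum_i phi(x_i);
    # integral of phi^2 over the real line is 1 / (2 sqrt(pi))
    return 1 / (2 * math.sqrt(math.pi)) - (2 / len(xs)) * sum(phi(x) for x in xs)

def log_likelihood(xs):
    # N(0,1) log likelihood, computed on the log scale to avoid underflow
    return sum(-x * x / 2 - 0.5 * math.log(2 * math.pi) for x in xs)

bulk = [-1.5, -0.5, 0.0, 0.5, 1.5]
for outlier in (-100.0, -1000.0):
    xs = bulk + [outlier]
    print(outlier, l2_criterion(xs), log_likelihood(xs))
```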

It makes sense that L2E would be used for histograms, because with a histogram you really do want it to “look like the function,” since you’re going to be … looking at it, and treating it like it’s the function.

It seems like the resolution concept would be most relevant in comparing two empirical distributions, since there you don’t know the true probability of anything.  And the K-S test is used a lot for that, and it is less sensitive to extreme values, although I don’t know if there’s a principled connection between those two facts.  (Sometimes people say this is a flaw in the K-S test and use corrections or other tests because they want more sensitivity to extreme values.)
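At least part of that insensitivity has a mechanical explanation: the two-sample K-S statistic depends only on the ranks of the pooled data, so moving an extreme point further out changes nothing at all. A pure-Python sketch of the statistic (not any library’s implementation):

```python
def ks_statistic(a, b):
    # Two-sample Kolmogorov-Smirnov statistic:
    # the largest gap between the two empirical CDFs
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

base = [0.1, 0.4, 0.5, 0.9, 1.3]
sample_1 = [0.2, 0.6, 1.0, -100.0]
sample_2 = [0.2, 0.6, 1.0, -1_000_000.0]
# The outlier's magnitude is invisible to the statistic:
print(ks_statistic(base, sample_1) == ks_statistic(base, sample_2))  # True
```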

(via raginrayguns)

By the ignorant, for the ignorant: What is a rational discussion anyway? • r/slatestarcodex →

furioustimemachinebarbarian:

nostalgebraist:

jadagul:

bambamramfan:

But I do feel bad for some of these people. Specifically, my heart goes out to the ones who read the wrong book.

I think people don’t realize how dangerous a library is. It’s got a lot of books. Some of them belong in the Restricted Section. These books are dangerous if your powers aren’t sufficiently developed, these books contain shadows from the netherworld waiting to leap out and possess you in a moment of weakness, or they take over your mind as you read them and drive you to insanity.

It’s so easy to go wrong in economics. Give a bright, curious, smart, eager kid Zombie Economics, Debunking Economics, whatever Galbraith wrote, give him a bastardized version of Keynes or even just let him take a crack at the General Theory (it’s brilliant, but shouldn’t be the third book you read), and finish him off with a palatable version of Marx or some market socialism tract, and there you go, you’ve ruined him. He’s done. He is a hundred meters behind the starting line and accelerating away, and now any attempt to turn him in the right direction will be met with suspicion.

Knowledge is hard.

I basically endorse this. (The writing gets slightly obnoxious from time to time but I think that’s hard to separate from the message in a lot of ways).

It reminds me of some of the stuff I was saying about MIRI back when that was the Hot Topic.

I feel like I am basically “that kid” (about economics but also a lot of other subjects) and I am not entirely sure why this is bad.

The model of expertise this poster has in their mind is a familiar one, but the more experience I have among actual experts, and the closer I get to actual expertise in anything, the more wrong it seems to me.  The fact that the author is a grad student seems significant; they are in a position where their role is to become better at being an academic in their (sub)field, by the standards of that (sub)field, where they have to suppress their urge to critically question those standards because that might hurt their career and at best won’t help it.

The author’s emphasis on impact factors and citations is confusing to me.  Maybe this is because I just want to get a Ph.D and not become an actual academic, but when I look through the literature I’m mainly looking for papers about some precise question of interest to me at the moment.  Sometimes these have few citations or are in low-tier journals because I need the answer to a question that’s not asked very often, not because the papers are bad.  When I’ve been in journal clubs, we read … well, we tried not to read bad or crackpot work, but there wasn’t any talk of “oh we should read things that get published in Nature and stuff because those are The Best Papers.”  (Many of the biggest-name researchers basically have a specific research program they or their lab is working on – it helps to have a consistent brand and be able to continually cite your old work – and this is interesting if you’re interested in that research program, but only reading this stuff will give you a blinkered view of the range of research that is actually happening.  And the trendy research programs from decades ago are not always remembered positively today, if at all.)

I don’t mean this as Bulverism, but while reading the post I got the impression that the writer is trying to resolve the cognitive dissonance between the academic mindset currently required of them and other ideas they already had about how inquiry should work, and has tried so hard to convince themselves the academic mindset is correct that they’ve ended up espousing an exaggerated, doctrinaire version of it.  (If you are unsettled by the tension between two ideas, one way to resolve that tension is to take one side and become sufficiently hardline about it that there no longer appears to be any validity to the other side.)

Also, it’s weird because I’m looking at the books he cites as “going wrong.”  Zombie Economics was written by Quiggin who is, among other things, an incredibly prolific economics professor (with the requisite publications and what not); Galbraith was also a published Ph.D. economics professor at Harvard, etc.

I find the essay confusing because the author is defining a “core” so narrow that reading books his own source-judging methods (publications, impact, knowledge in the community, etc.) parse out as “mainstream” apparently starts you behind the starting line?

Yes, I found that confusing too.  As far as I can tell the key distinction is between “reading books” and “reading the research literature, with the knowledge of all the tacitly assumed shared knowledge used by researchers.”  By “books” he doesn’t just mean popularizations, as he mentions Mankiw’s textbook and Keynes’ General Theory.  And he allows the hypothetical “kid” to read some papers too.  But he says

You can’t read just one research paper and understand it if you’re not part of the field, not unless you’re a once-a-generation genius.

His whole position here seems incoherent to me.  If none of us can make sense of the real research literature without some sort of intensive many-years-long education process (which we can undergo in at most one field per lifetime), then surely the best we can do is trust what the experts tell us in popularizations?  After all, they have the mystical-ability-to-read-papers, if anyone does.  But apparently, trying to gain some (imperfect) knowledge in this way will, like, totally destroy your mind or something.  (”And he’s screwed. He will never be a good economist. He will never be much of a thinker at all.”  Good grief!)  What are we all to do?  Become radical skeptics?

I agree with a softened, de-hyperbolized version of what I think his point is: academics assume a lot of implicit background knowledge when talking amongst themselves, papers are a form of “academics talking amongst themselves,” and so they’re easy to misinterpret unless you’ve had a lot of actual real-time conversations with researchers themselves.  That’s true.

But IME, tacit knowledge is far less crucial than he’s making it out to be, and more importantly, tacit knowledge is often wrong, with its falsehood more difficult to notice because it’s not explicitly talked about.  Groundbreaking work in many fields often involves questioning implicit assumptions in prior research; examples in his own field include the Lucas critique and loss of Pareto optimality under information asymmetry (both of which led to Nobel prizes!).

(via furioustimemachinebarbarian)

there is no “mainstream consensus” among intelligence researchers

vaniver:

nostalgebraist:

vaniver:

nostalgebraist:

How’s that for a clickbait title? ;)

The motivation for this post was a tumblr chat conversation I had with @youzicha.  I mentioned that I had been reading this paper by John L. Horn, a big name in intelligence research, and that Horn was saying some of the same things that I’d read before in the work of “outsider critics” like Shalizi and Glymour.  @youzicha said it’d be useful if I wrote a post about this sort of thing, since they had gotten the impression that this was a matter of solid mainstream consensus vs. outsider criticism.

This post has two sides.  One side is a review of a position which may be familiar to you (from reading Shalizi or Glymour, say).  The other side consists merely of noting that the same position is stated in Horn’s paper, and that Horn was a mainstream intelligence researcher – not in the sense that his positions were mainstream in his field, but in the sense that he is recognized as a prominent contributor to that field, whose main contributions are not contested.

Horn was, along with Raymond Cattell, one of the two originators of the theory of fluid and crystallized intelligence (Gf and Gc).  These are widely accepted and foundational concepts in intelligence research, crucial to the study of cognitive aging.  They appear in Stuart Ritchie’s book (and in his research).  A popular theory that extends Gf/Gc is known as the “Cattell–Horn–Carroll theory.”

Horn is not just famous for the research he did with Cattell.  He made key contributions to the methodology of factor analysis; a paper he wrote (as sole author) on factor analysis has been cited 3977 times, more than any of his other papers.  Here’s a Google Scholar link if you want to see more of his widely cited papers.  And here’s a retrospective from two of his collaborators describing his many contributions.

I think Horn is worth considering because he calls into question a certain narrative about intelligence research.  That narrative goes something like this: “the educated public, encouraged by Gould’s misleading book The Mismeasure of Man, thinks intelligence research is all bunk.  By contrast, anyone who has read the actual research knows that Gould is full of crap, and that there is a solid scientific consensus on intelligence which is endlessly re-affirmed by new evidence.”

If one has this narrative in one’s head, it is easy to dismiss “outsider critics” like Glymour and Shalizi as being simply more mathematically sophisticated versions of Gould, telling the public what it wants to hear in opposition to literally everyone who actually works in the field.  But John L. Horn did work in the field, and was a major, celebrated contributor to it.  If he disagreed with the “mainstream consensus,” how mainstream was it, and how much of a consensus?  Or, to turn the standard reaction to “outsider critics” around: what right do we amateurs, who do not work in the field, have to doubt the conclusions of intelligence-research luminary John Horn?  (You see how frustrating this objection can be!)


So what is this critical position I am attributing to Horn?  First, if you have the interest and stamina, I’d recommend just reading his paper.  That said, here is an attempt at a summary.

Keep reading

I disagree with several parts of this, but on the whole they’re somewhat minor and I think this is a well-detailed summary.

Note how far this is from Spearman’s theory, in which the tests had no common causes except for g! 

Moving from a two-strata model, where g is the common factor of a bunch of cognitive tests, to a three-strata model, where g is the common factor of a bunch of dimensions, which themselves are the common factor of a bunch of cognitive tests, seems like a natural extension to me. This is especially true if the number of leaves has changed significantly–if we started off with, say, 10 cognitive tests, and now have 100 cognitive tests, then the existence of more structure in the second model seems unsurprising.

What would actually be far from Spearman’s theory is if the tree structure didn’t work. For example, a world in which the 8 broad factors were independent of each other would totally wreck the idea of g; a world in which the 8 broad factors were dependent, but had an Enneagram-esque graph structure as opposed to being conditionally independent given the general factor, would also do so.


When it comes to comparing g, Gf, and Gc, note this bit of Murray’s argument:

In diverse ways, they sought the grail of a set of primary and mutually independent mental abilities. 

So, the question is, are Gc and Gf mutually independent? Obviously not; they’re correlated. (Both empirically and in theory, since the investment of fluid intelligence is what causes increases in crystallized intelligence.) So they don’t serve as a replacement for g for Murray’s purposes. If you want to put them in the 3-strata model, for example, you need to have a horizontal dependency and also turn the tree structure into a graph structure (since it’s likely most of the factors in strata 2 will depend on both Gc and Gf).


Let’s switch to practical considerations, and for convenience let’s assume Carroll’s three-strata theory is correct. The question then becomes, do you talk about the third strata or the second strata? (Note that if you have someone’s ‘stat block’ of 8 broad factors, then you don’t need their general factor.)

This hinges on the correlation between the second and third strata. If it’s sufficiently high, then you only need to focus on the third strata, and it makes sense to treat g as ‘existing,’ in that it compresses information well.


This is the thing that I disagree with most strenuously:

In both cases, when one looks closely at the claim of a consensus that general intelligence exists, one finds something that does not look at all like such a consensus. 

Compared to what? Yes, psychometricians are debating how to structure the subcomponents of intelligence (three strata or four?). But do journalists agree with the things all researchers would agree on? How about the thugs who gave a professor a concussion for being willing to interview Charles Murray?

That’s the context in which it matters whether there’s a consensus that general intelligence exists, and there is one. Sure, talk about the scholarly disagreement over the shape or structure of general intelligence, but don’t provide any cover for the claim that it’s worthless or evil to talk about a single factor of intelligence.

Keep reading

The context I have in mind is me talking to other people who are normally interested in talking about thorny methodological issues and contrarian academic positions.

For my part, I think that if by “exists” we mean “compresses information well,” then we can automatically get “g exists” from “positive manifold + high correlations.”
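One way to cash out “compresses information well”: take a hypothetical positive-manifold correlation matrix (eight tests, every pair correlated 0.5, both numbers made up for illustration) and look at how much variance the top principal component captures. A power-iteration sketch:

```python
import math

def top_eigenvalue(mat, iters=500):
    # Power iteration for the largest eigenvalue of a symmetric
    # matrix with positive entries (as a positive manifold has)
    n = len(mat)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient at the converged vector
    return sum(v[i] * sum(mat[i][j] * v[j] for j in range(n)) for i in range(n))

n = 8
rho = 0.5  # hypothetical uniform positive correlation between tests
R = [[1.0 if i == j else rho for j in range(n)] for i in range(n)]
lam = top_eigenvalue(R)
print(lam, lam / n)  # 4.5 and 0.5625: one component holds over half the variance
```

For this uniform-correlation case the top eigenvalue is 1 + (n−1)ρ in closed form, so the single “g-like” component summarizes 4.5 of the 8 units of variance.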

My claim is twofold: first, “compresses information well” (along with some other claims about durability) is the standard usage of “g exists,” and if one wants to use a subtle meaning, one should use a subtle phrasing. The statement “g isn’t causal” can’t be misinterpreted in the way that “g doesn’t exist” can.

To borrow an example from climate change, saying “global warming has stalled,” while correct for the standard definition of “global warming,” is generally misleading because the defect is in the definition more than the prediction; recent energy imbalance has mostly been going into the deep oceans, which historically aren’t counted as part of that definition, but are probably still relevant to the overall problem. The statement “global warming has been mostly affecting the deep ocean recently” points at the same issue but in a way that makes clear that we’re talking about a subpoint that doesn’t contradict the main point, which is accumulating energy imbalance to the Earth.

(I have seen at least three people using Shalizi’s critique as support for the belief that IQ is meaningless, not the more specific claim that Spearman’s specific hypothesis of a single causal g is wrong. This is why I respond to disagreements like this, and I think that attempting to attach the standard meaning to the opaque phrase “positive manifold” is basically obscurantist.)

Other responses below the fold:

Keep reading
