
In a recent post, Scott linked an interesting paper about controlling for statistical confounders.  The paper draws some pretty damning conclusions, all based on the simple idea that you’re never really controlling for X, you’re controlling for your imperfect proxy for X.  Since the proxy is imperfect, if you’ve measured some associated variable Z, it’ll usually give you information about the true value of X above and beyond what your proxy tells you, and the usual approach mistakes this for an independent effect of Z above and beyond its association with X.
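To make the proxy problem concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration, not anything from the paper: the noise levels, the true effect of X on Y (2.0), the sample size, and the little `ols` helper. Z has no effect on Y at all; it is merely associated with X.

```python
import random

def ols(columns, y):
    """Least-squares coefficients for y ~ columns, via the normal equations."""
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    a = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(columns[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(0)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]           # true confounder (unobserved)
proxy = [xi + random.gauss(0, 1) for xi in x]        # noisy measurement of X
z = [xi + random.gauss(0, 1) for xi in x]            # associated with X, no effect on Y
y = [2 * xi + random.gauss(0, 1) for xi in x]        # Y depends on X only
ones = [1.0] * n

# Controlling for the true X: Z's coefficient should be near zero.
_, _, z_coef_true = ols([ones, x, z], y)
# Controlling for the proxy: Z picks up the part of X the proxy missed.
_, _, z_coef_proxy = ols([ones, proxy, z], y)

print(f"Z coefficient, controlling for true X: {z_coef_true:.3f}")
print(f"Z coefficient, controlling for proxy:  {z_coef_proxy:.3f}")
```

In this toy setup the second regression hands Z a sizable coefficient even though Z does nothing, which is the "independent effect" the usual approach would report.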

That’s very interesting, but it strikes me as just one facet of a bigger issue with statistical controls which has always unsettled me.  There is something oddly backwards about the whole idea.


You, the scientist, want to publish an exciting new study about some variable of interest called Y.  Everyone knows about ten different variables that “obviously” affect Y; call these, collectively, X.  A study saying “X affects Y!” would not be new or exciting.  No, you want to say that some other variable, Z, affects Y.  No one has discovered that yet.

A problem arises: Z is also associated, in various ways, with various of the ten components of X.  What if the correlation between Z and Y (or nonzero regression coefficient, or whatever) is just due to the already known X-to-Y association?  How can you tell?

The usual answer is: make some sort of model predicting Y from both X and Z, and show that the model uses some information from Z to predict Y, even though it knows about X, too.  Success!  Now you can claim that Z is associated with Y.  You are now free to forget about your model, which was merely a tool you used to draw this conclusion.  You didn’t really care about predicting Y, and you don’t care whether your model is the best model for predicting Y, or even a good one.  It has served its purpose, and into the dumpster it goes.


As I said, there is something backwards about this.  Your claim about Z and Y depended entirely on Z helping some model predict Y.  Clearly, the strength of your argument must depend on the quality of this model.  If the model is a bad model of the relationship between X and Y, before Z is even added to the picture, then it’s hard to conclude anything from what happens when you add in Z; if your model doesn’t capture the relationships we think are there in the first place, its use of Z could just be an attempt to “put them back in.”

(For example, someone’s BMI is inversely proportional to the square of their height.  The electrostatic force between an electron on someone’s head and an electron on their heel is also inversely proportional to the square of their height.  Suppose, absurdly, that someone tries to model the relationship between height and BMI by doing linear regression on the two.  This will fare poorly, because the relationship is inverse-square, not linear.  But if they add in the electrostatic force as a regressor, it will of course have a nonzero coefficient, and predict BMI much better than the height term.  This does not show that this force is associated with BMI “even controlling for height”!)
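The BMI example can be checked with a small sketch. The assumptions are all mine and deliberately absurd: everyone weighs the same 70 kg (so BMI really is a pure inverse-square function of height), heights are uniform between 1.5 and 2.0 m, and the force is expressed in units of 10^-28 N to keep the numbers near 1. The `ols` and `sse` helpers are hypothetical names for this illustration.

```python
import random

def ols(columns, y):
    """Least-squares coefficients for y ~ columns, via the normal equations."""
    k = len(columns)
    a = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(columns[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def sse(columns, coefs, y):
    """Sum of squared residuals of the fitted model."""
    fitted = [sum(c * col[i] for c, col in zip(coefs, columns)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

COULOMB_K = 8.9875e9   # N m^2 / C^2
E_CHARGE = 1.602e-19   # C

random.seed(0)
n = 2000
height = [random.uniform(1.5, 2.0) for _ in range(n)]             # metres
bmi = [70.0 / h ** 2 for h in height]                              # fixed 70 kg weight
force = [COULOMB_K * E_CHARGE ** 2 / h ** 2 * 1e28 for h in height]  # units of 1e-28 N
ones = [1.0] * n

# Misspecified model: BMI as a linear function of height.
cols_height = [ones, height]
coefs_height = ols(cols_height, bmi)
sse_height = sse(cols_height, coefs_height, bmi)

# Same model plus the head-to-heel electrostatic force as a regressor.
cols_both = [ones, height, force]
coefs_both = ols(cols_both, bmi)
sse_both = sse(cols_both, coefs_both, bmi)

print(f"SSE, height only:    {sse_height:.4g}")
print(f"SSE, height + force: {sse_both:.4g}")
print(f"force coefficient:   {coefs_both[2]:.4g}")
```

The force term soaks up essentially all of the fit, not because electrons matter to BMI but because it carries the inverse-square shape that the linear height term could not express.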


This was brought forcefully to my attention recently when I was reading a recent study about alcohol consumption and mortality.  The big punchline was that, in a huge meta-analysis, it only took something like 7 standard drinks / week (not the 14 specified in the US guidelines) to negatively impact mortality.

There was a big problem with this claim that has nothing to do with this post, namely that the researchers meant “the confidence interval for 7 drinks / wk just barely excluded no effect” (it was nearly symmetric about a hazard ratio of 1.0).  This is the same old problem where people try to figure out when an effect “turns on” or “turns off” by noticing when they start being able to reject the null, which is the kind of thing you are taught not to do in Stats 101 but which is nonetheless endemic in the medical literature.
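A toy calculation shows why "the dose where we can first reject the null" is not a property of the effect. This is a sketch under made-up assumptions, none of them from the study: the log hazard ratio rises smoothly by `beta` per drink with no threshold anywhere, and every dose level is estimated with standard error `sigma / sqrt(n_per_group)`.

```python
import math

def first_significant_dose(beta, sigma, n_per_group, doses):
    """Smallest dose whose 95% CI for the log hazard ratio excludes 0."""
    se = sigma / math.sqrt(n_per_group)  # same precision at every dose (assumption)
    for d in doses:
        lower = beta * d - 1.96 * se     # lower end of the 95% interval
        if lower > 0:
            return d
    return None

doses = range(1, 29)  # drinks per week
small = first_significant_dose(beta=0.01, sigma=2.0, n_per_group=1_000, doses=doses)
large = first_significant_dose(beta=0.01, sigma=2.0, n_per_group=100_000, doses=doses)

print(f"n = 1,000 per group:   first 'significant' dose = {small} drinks/wk")
print(f"n = 100,000 per group: first 'significant' dose = {large} drinks/wk")
```

The underlying dose-response curve is identical in both runs; only the sample size changes, and the apparent "turn-on" point moves with it.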

But anyway, even after facepalming over that, I was curious about how the study adjusted for confounders.  So many things are associated with mortality, and so many things are associated with alcohol consumption – how do you disentangle it all?  And the authors clearly tried to do their due diligence on this front.  My eyes started to glaze over as I read the list of confounders they controlled for:

HRs were adjusted for usual levels of available potential confounders or mediators, including body-mass index (BMI), systolic blood pressure, high-density-lipoprotein cholesterol (HDL-C), low-density-lipoprotein cholesterol (LDL-C), total cholesterol, fibrinogen, and baseline measures for smoking amount (in pack-years), level of education reached (no schooling or primary education only vs secondary education vs university), occupation (not working vs manual vs office vs other), self-reported physical activity level (inactive vs moderately inactive vs moderately active vs active), self-reported general health (scaled 0–1 where low scores indicate poorer health), self-reported red meat consumption, and self-reported use of anti-hypertensive drugs.

My first reaction upon reading this was to think, “okay, some of these may or may not have been poorly operationalized, and that may have affected their results in problematic ways not captured in the sensitivity analyses in their appendix, or maybe not, because how the fuck would I know when there’s so much going on in their mortality model?”

And then I was like, wait.  They have a “mortality model.”  They’re only focusing on the coefficients for one variable, but it’s got a zillion variables in it.  It sounds like it could be the sort of model used by the people who are actually interested in predicting mortality as accurately as possible – say, insurance companies – as opposed to people who are just interested in making claims about alcohol.

But they aren’t telling me how good their model is.  I have no idea if it’s similar to the models the insurance company people use, or if the insurance company people would turn up their noses at it.  Their model was created on the spot to make some claims about alcohol, and even if I spent a day scratching my head and trying to understand it, the next day I might read a paper with another mortality model, and have to repeat the process.  There must be hundreds of models like this, invented on the spot for the purposes of statistical controls, and then discarded.

It feels like there should be someone in charge of maintaining our best models of things like mortality.  Questions about individual variables, like alcohol, could be investigated on a common footing.  Instead, we have hundreds of claims about how some Z affects some other Y, derived from different models, which might not all be true if stitched together into a single framework.

argumate:

davidsevera:

argumate:

davidsevera:

This biography I’m reading unintentionally portrays young Napoleon as a pretentious douche who writes shitty self-insert novellas and keeps almost getting fired for constantly faking illnesses.

you sound almost envious

It’s impressive that he can go from writing stories about how he dies in battle, “pierced by a thousand blows”, after his girlfriend cheats on him to being one of the greatest military geniuses in history.

somewhere out there is an erotic fan-fiction author who will one day reunite the shattered American Republic.

(via argumate)

basslan:

nostalgebraist:

I could be totally off base here, but I am getting the feeling that “link sharing” functionality on social media websites is having a subtle, insidious negative effect on the quality of discussion.

It isn’t just that people are sharing too much “fake news” – i.e., the problem isn’t just that people are sharing bad links, and things are fine when they’re sharing good links instead.  I’m worried about the more basic fact that so much discussion takes the following form:

1. Someone links to an article (news, opinion, or some mix of the two), perhaps with some text expressing their opinion of its content

2. Other people comment, expressing their own opinions of the content

The problem with this is that it lets the news media set the agenda for online discussion.  When a bunch of journalists are writing about a topic, and writing about it in a way that generates sufficiently “viral” or “shareable” headlines, the internet starts discussing that topic.  When they stop, the internet stops.  (Not completely, but to a larger extent than one might want.)  The very fact that articles are produced by human beings, under the influence of all the usual power structures and human foibles, ends up getting ignored.

I’m not talking about anything super controversial here – it’s not like I think the news media is one big conspiracy or anything.  Just, like, basic media literacy.  Press releases exist, and if you are reading a brief, not very in-depth article about a company doing some new thing – even if the article takes a negative tone! – it is probably based on a press release.  Publicists exist.  I’m sure that, for example, Jordan Peterson has people managing his relations to the media, and that the recent flareup of Peterson coverage (he’s everywhere!) has been shaped and managed to some extent by those people.  Sponsors exist, and media outlets with gonzo, “unfiltered” aesthetics can in fact be some of the coziest with their sponsors:

Charles Davis, a former Vice freelancer who briefly served as an associate editor, said four stories he wrote or edited during his tenure with the company were killed because they ran counter to Vice’s business interests.

One of his pieces was about the South by Southwest festival in Austin, which relies on thousands of volunteers — potentially in violation of labor laws, according to Davis’s story. The story was in the final stages of being edited, Davis said in an interview, when an editor told him that the piece was being rejected because Vice had a co-sponsorship deal with AT&T at the festival.

“Marketing overruled editorial,” Davis said.

Some months before Davis submitted the South by Southwest story, editors killed another article by him, this one about unpaid labor in the commercial film and TV industry.

Those two rejections were preceded by a story by Davis that Vice did publish. This one was about the use of unpaid labor at competing publications.

Davis was subsequently fired after being told that the company no longer wanted to maintain an editorial staff in Los Angeles, where he was based. Thereafter, he went public with his concerns about editorial meddling, posting a series of screen captures on Twitter that he said were from emails from Vice editors to him enforcing Vice’s “brand” policies.

One of these reads, “Hey, [a senior editor] asked me to remind you that any ‘brand’ mention — basically any mention of a large entity that we might be making some kind of business deal with — should get run up the flagpole” for review by senior managers.

Said Davis: “What I kind of discovered is that Vice is looking to please so many investors and advertisers. You have the freedom [at Vice] to say, ‘Screw the police!’ or ‘Screw Israel!’ but if you say ‘Screw [a sponsor]!’ that’s a different story.”

I don’t think people don’t know these things, in the abstract.  But as long as the discussions happen within the sandboxes delineated by this or that article, the abstract knowledge doesn’t matter.  We have to start, sometimes, at some place other than the place the media spotlight is currently pointing.

but isnt this just essentially the bias and possible conflict of interest that happens … everywhere? idk like i feel like this is sort of similar to my brother claiming that all news sources are biased and thus cannot be trusted,,, like even so, you have to get your news from somewhere. also wouldn’t reading from multiple varied sources counteract a little of this?

also what would be other places to begin the conversation? and would these places be accessible to people unfamiliar with jargon or esoteric knowledge or people without a lot of time on their hands?

I’m not saying the news is less trustworthy than some other unspecified thing you should be paying attention to instead.  I’m saying that not all conversations should start with “hey, so how about this specific news article?”

Our conversations should (frequently) be driven by an interest in what is actually happening around us, not by a need to react to any specific piece of text that’s been written about it.  Reading multiple sources is good – what I’m talking about is speaking from the distinctive understanding that exists in your own head, what you get out of those multiple sources (and everything else), without just reacting first to this article, and then reacting later to this other one, and so on.

Hope that makes sense – I’m pretty tired and having trouble making myself clear

(via basslan)

fuuuuuuuuuuuuuuuuuuuuuuuuuuuuck asked: No worries if you don't have time or simply don't feel like it, but I'd be interested to hear any more thoughts you have on Harkaway's Gnomon.

nostalgebraist:

I’ve been meaning to write a review on Goodreads, but haven’t gotten around to it yet.

Review is here.

You will find no other product that has gone so far to achieve the ideal, and true, breakfast and snack food.