
prophecyformula:

I think people should [learn to use simple causal inference tools].  I want people who aren’t me to do it, because other people involved in these conversations know more about statistics than I do, and there are subtle statistical issues involved.  (As I found out yesterday, using two of the many different causal search algorithms in TETRAD gave me two completely different results, so one has to pay close attention to what background assumptions are made by each method.)  But the barrier to entry to simply using TETRAD, as opposed to using it shrewdly, is very low.  I learned it in 30 minutes.

tl;dr in this particular case I played the role of “the TETRAD guy” – someone else had a data set and did conventional methods to it, I took the data set and did TETRAD to it.  However, there is no reason I should have this role.  Other people would probably be better at it than I am, and TETRAD can be learned in half an hour.  And if we all learned to use it, we’d never need to waste any time arguing about “but what would Clark Glymour say?  does it matter?” because we could just do what he would do.

nostalgebraist

I heartily endorse this (and indeed I have brought shame upon my family by not learning to use TETRAD yet). 

I want to raise the question of: how much value is added by learning to use TETRAD vs. learning to use TETRAD shrewdly? To illustrate what I mean, consider sabermetrics. Bill James was not a professional statistician; he was just some dude working in a pork and beans plant. Most of the stuff he did is way less rigorous than what passes for statistical analysis in scientific journals today, and even those “more rigorous” analyses are pretty crap. But James’ work was vastly better than what came before, and, better yet, it opened up new vistas. Even today, most of the writers at Fangraphs limit themselves to tools that R.A. Fisher could have used in his lifetime, but they manage to use those tools to have interesting and fresh insights about the game of besoboru.

On the other hand. Consider the case of the Gaussian copula. David Li is a very smart man; he earned a Ph.D. in statistics, qualified as an actuary, worked as a quant for several I-banks. These are not things that dumb people do. But when he used a better-than-previously-existing but imperfect model to quantify the risk of certain financial derivatives, and people followed him in that, it ended up sort of blowing up the economy.

I guess the crucial question is, are we in a Jamesian regime, wherein playing with TETRAD can give us robust and useful insights even if we’re not all that rigorous about it? Or are we in a Li-type regime, where doing that will blow up in our faces?

I guess it depends on what kinds of results are considered “blowing up in one’s face” here?

Causal search looks at aspects of the data that regression doesn’t (specifically, conditional independences), and there’s a good argument to be made that this facet of the data can give you some good pointers about what sort of causal structure you’re seeing.
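To make the point about conditional independences concrete, here’s a minimal sketch (not TETRAD itself, just a hand-rolled illustration using partial correlation): for a simulated chain A → B → C, A and C are correlated, but that correlation vanishes once you control for B. Regression coefficients alone don’t flag this pattern; causal search algorithms are built around it.

```python
# Illustration: a conditional independence that hints at causal structure.
import math
import random

random.seed(0)
n = 20000

# Simulate a causal chain A -> B -> C with additive Gaussian noise.
A = [random.gauss(0, 1) for _ in range(n)]
B = [a + random.gauss(0, 1) for a in A]
C = [b + random.gauss(0, 1) for b in B]

def corr(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def partial_corr(x, y, z):
    # Correlation of x and y after controlling for z.
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

print(corr(A, C))             # clearly nonzero: A and C are marginally dependent
print(partial_corr(A, C, B))  # near zero: A is independent of C given B
```

The marginal correlation says only “A and C move together”; the vanishing partial correlation is the extra facet of the data that rules out, say, a direct A → C edge alongside the chain.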

I think one of the strengths of this is that it makes the causal stories part of the statistical analysis, rather than setting them off as a subsequent, qualitative “storytelling” step.  In practice, people often do a regression, which in itself says nothing about causation, and then make some problem-specific argument about the direction of causation (say, B can’t cause A because A temporally precedes B, or B probably doesn’t cause A because that would just be really weird, or … ).

TETRAD actually lets you include these kinds of “stories” (the GUI has “Knowledge Nodes” which let you encode things like “A can’t cause B” or “variables in this ‘tier’ are all downstream from those in this other ‘tier.’ ”), which means that the storytelling is part of the quantitative statistics.  You give it the arguments you would make if you were writing up an “interpretation” of a regression, and it quantitatively combines those with the data and tells you which causal structures the data probably has.
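The logic of those Knowledge Nodes can be sketched in a few lines. This is not TETRAD’s actual interface, and a real algorithm would also enforce acyclicity and use the data; it’s just a toy showing how “B can’t cause A” and tier constraints prune the space of candidate structures before the data ever gets a vote:

```python
# Toy sketch (not TETRAD's API): background knowledge as constraints that
# filter the possible orientations of an undirected skeleton.
from itertools import product

skeleton = [("A", "B"), ("B", "C")]   # undirected edges found in the data
forbidden = {("B", "A")}              # knowledge: B can't cause A
tiers = {"A": 0, "B": 1, "C": 1}      # lower tiers are upstream of higher ones

def allowed(dag):
    for cause, effect in dag:
        if (cause, effect) in forbidden:
            return False
        if tiers[cause] > tiers[effect]:  # edges can't point upstream
            return False
    return True

# Each undirected edge can be oriented either way; enumerate all orientations.
candidates = [tuple((u, v) if d else (v, u) for (u, v), d in zip(skeleton, ds))
              for ds in product([True, False], repeat=len(skeleton))]
surviving = [dag for dag in candidates if allowed(dag)]
for dag in surviving:
    print(dag)  # only orientations with A -> B survive the knowledge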

The danger here is the danger that always comes with operationalizing something, which is that it can lull you into not thinking.  At least with regressions everyone knows there’s interpretation going on; with something like TETRAD, it can feel like you just plug everything you know (data and background knowledge) into a magic black box, and out pops reality!  But of course, causal search algorithms are only as good as their assumptions (and even then, only guaranteed to be good in the asymptotic limit).  Just as with regressions, the operation can’t be treated as a black box that obviates thought.

I guess I am kind of blasé about this because I feel like the current situation, with regression and correlation, is so bad that it’s hard to complain about anything else.  Whether or not we use causal inference tools, we are going to weave causal stories anyway, and often we are going to do it very badly, using less information than we actually have (no conditional independence).  Causal search includes the cause-weaving as part of the statistical procedure rather than separating it out, which doesn’t make it infallible, but does remove “no one understands causation anyway” as a catch-all defense for any story – that is, if TETRAD says all sorts of causal structures are consistent with your data, then that is evidence in favor of agnosticism, and no one can say “well all causal stories are interpretation, so I’m sticking with mine, thanks.”  Operationalizing this stuff can, in other words, clarify our level of ignorance: sometimes the causal structure will be relatively clear, and sometimes it won’t, but never will you be able to fall back on “no one really knows anyway, so I’m believing X and you can’t stop me.”

In other words, as long as you keep all the background assumptions in mind and don’t treat TETRAD as a magic black box, the advantage of TETRAD is that sometimes it’ll tell you a really clear, unequivocal story, and sometimes it’ll tell you a murky, ambiguous story, and this is a distinction you could never get from a regression.  Regression and other standard tools have the flaw of making it look like all causal analysis is equally “sinful,” so your confidence in any causal story is just a matter of personal chutzpah.  TETRAD, by contrast, can let you say, “no, all the search methods orient that edge this way with low p value!” or “well really all sorts of graphs are consistent with the data … ”
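The clear-story/murky-story distinction comes straight from Markov equivalence. A tiny hand-worked sketch (the independence verdicts below are hand-encoded from d-separation, not computed): over the skeleton A – B – C, three of the four possible DAGs imply A ⊥ C given B, so that observation leaves an equivalence class of three; only the collider implies conditional *dependence*, so that observation pins down a unique graph.

```python
# Sketch: Markov equivalence over the skeleton A - B - C.
# Each DAG's verdict on "is A independent of C given B?" (via d-separation,
# encoded by hand here).
dags = {
    "chain    A -> B -> C": True,
    "chain    A <- B <- C": True,
    "fork     A <- B -> C": True,
    "collider A -> B <- C": False,  # conditioning on B *induces* dependence
}

def consistent(observed_ci):
    """DAGs consistent with the observed test result for A _|_ C | B."""
    return [g for g, ci in dags.items() if ci == observed_ci]

print(consistent(True))   # three graphs: the data can't distinguish them
print(consistent(False))  # one graph: the collider is pinned down
```

So “murky” isn’t a failure mode of the method; it’s the method honestly reporting that several structures fit equally well, which is exactly the information a bare regression never surfaces.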
