
In today’s edition of “I look up something on Google Scholar, read a highly cited paper in a good journal and it sucks”: Bolla et al., “Dose-related neurocognitive effects of marijuana use”

  • n = 22
  • binned arbitrarily (?) into three groups of sizes 7, 8 and 7
  • mean and std dev of sample are reported, but no histograms, no way to tell if the binning was at all natural
  • researchers are trying to look at the effects of marijuana use on cognition, but this is confounded because the people in their sample who used more marijuana had lower IQs; to deal with this, they regress all 35 of their cognitive tests on measured IQ, and subtract out the effect of IQ
  • that is, they do an IQ test, and then they do a bunch of other cognitive tests which are presumably correlated with IQ (some of them are literally taken from IQ test batteries other than the one they used)
  • so their variables are cognitive tests, controlled for IQ – which is itself the sum of a bunch of other cognitive tests
  • no principled distinction (as far as I can tell) between the “IQ” cognitive tests and the other ones, e.g. they note approvingly that their IQ battery is correlated with the WAIS-R (r=0.79), then include a test from the WAIS-R among their “other” / “non-IQ” tests
  • no controlling for multiple comparisons
  • that is, they plugged 3 (arbitrary) tiny groups into an ANOVA with 35 dependent variables, and judged comparisons significant each time they had (uncorrected) p < .05 on a post-hoc t-test
  • they found 14 significant comparisons (out of 105 total, 3 pairs times 35 variables)
  • there may be statistical reasons that it’s not just 105 raw comparisons, I’m not sure, but in any case, it’s hard to say this wouldn’t happen by chance when we’re so far from asymptotics (we’re comparing groups of sizes 7, 8 and 7) – see the quick simulation after this list
  • most of the significant results were .01 < p < .05 (they marked p < .01 separately)
  • the researchers also explored nonlinear effects, finding some significant ones (no report of how many tests were done in total)
  • the authors include two figures (“A” and “B”) to illustrate how IQ and marijuana consumption interact; these have some really weird features which are presumably due to the small sample size
  • like in figure A (Repetition of Numbers Task), for the higher IQ group, performance goes up and then down (it’s best at the “medium” consumption level)
  • while in figure B (Stroop test), the higher IQ group does monotonically better with increasing consumption, with the amusing result that if you take the plot literally, the way to do best on a Stroop test is to have a high IQ and also smoke 94 joints per week
  • wait what does that even mean though, like are these people literally hand-rolling more than 90 individual marijuana cigarettes every single week of their lives??
  • like I’m assuming “joints / wk” is an established technical measure that they can convert to, if people aren’t smoking literal joints, right?
  • the authors assessed this quantity by using a questionnaire called the “DUSQ,” citing a text called “Addictive drug survey manual” by S.S. Smith, which seems to only appear in citations, my university library doesn’t have it, Google Books doesn’t have it
  • trying to look up the “joints / wk” or “joints / day” concept in the literature leads to gems like this paper from 2011, which tried to actually empirically determine the conversion rates between joints and other marijuana consumption units, and found that they were wildly different from the ones assumed (on what basis, one wonders) in another standard questionnaire (not the one used in the paper under discussion, which again is inaccessible to mere mortals)
  • I give up
  • paper has been cited 534 times
  • paper was published in Neurology, which from a cursory glance appears to be the premier journal for, well, neurology
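
Here’s the quick simulation promised above: the paper’s design run in a world where marijuana has no effect on anything. The group sizes (7, 8, 7), the 35 outcome variables, and the uncorrected pairwise t-tests mirror the paper’s setup; the correlation between the outcome tests is my own guess, since they’re all supposed to be correlated with IQ and with each other but the paper doesn’t report the matrix.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

group_sizes = [7, 8, 7]   # the paper's three consumption groups
n_tests = 35              # cognitive outcome variables
rho = 0.5                 # assumed correlation between outcomes (my guess)
alpha = 0.05
n_sims = 1000

# equicorrelated covariance matrix for the 35 null outcomes
cov = np.full((n_tests, n_tests), rho)
np.fill_diagonal(cov, 1.0)

counts = []
for _ in range(n_sims):
    # null world: everyone's scores come from the same distribution,
    # so any group differences are pure noise
    groups = [rng.multivariate_normal(np.zeros(n_tests), cov, size=n)
              for n in group_sizes]
    n_sig = 0
    for i in range(3):
        for j in range(i + 1, 3):
            for t in range(n_tests):
                _, p = stats.ttest_ind(groups[i][:, t], groups[j][:, t])
                n_sig += p < alpha
    counts.append(n_sig)

counts = np.array(counts)
print(f"mean 'significant' comparisons under the null: {counts.mean():.1f} / 105")
print(f"fraction of null runs with >= 14 'significant': {(counts >= 14).mean():.3f}")
```

This doesn’t prove the paper’s 14 hits are noise, but it gives a sense of how overdispersed the uncorrected count gets when the outcomes are correlated and the groups are this small.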

baron-cohen miscellany

(Follow-up to this post)

I want to mention, in a concise way if possible, some of the other problems I’ve noticed in a quick review of the systemizing/empathizing research.  There are a lot of distinct problems, so I will try to be more terse than usual.


(In lieu of a longer post I’ve been planning to write, with the gist “the systemizing/empathizing research is elaborately terrible, and from a cursory glance the people-vs.-things literature looks a lot more rigorous, so it’s funny that the former is so much more popular”)


Act One

[image: scatter plot of the subjects’ scores in SQ-EQ space]

Simon Baron-Cohen and colleagues: “As can be seen, the results cluster in the SQ-EQ space and do not randomly fill the chart. This suggests that it may not be possible to score anywhere in SQ-EQ space, and that there may be constraints operating, such that SQ and EQ are not independent.”

Me: “uh that just looks like a big blob to me, where are the clusters”


Act Two

[image: the same SQ-EQ scatter plot, with male controls, female controls, and autistic subjects marked as separate groups]

Me: “Oh huh that’s interesting.  The male and female control groups overlap a lot, so they don’t form clusters per se – looks like the distribution for controls is unimodal? – but now that you mention it, the autistic group is very distinct.  I didn’t see it as a ‘cluster’ in the first plot because you had more control than autistic subjects, so it just looked like variance.  But if you wanted to train a binary classifier to distinguish autistic from control subjects, it’d work pretty well, you’d get a decision boundary covering like most of the far bottom of the graph but excluding a bit on the left – ”
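
(For what it’s worth, here’s the kind of classifier I mean – a minimal sketch on made-up (EQ, SQ) points standing in for the scatter plot, since I don’t have the actual scores. The blob locations and sizes are invented for illustration, not taken from the paper.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# made-up stand-ins for the plot: controls as one broad blob,
# autistic subjects off in their own corner -- numbers invented
# for illustration, not taken from the paper
controls = rng.normal(loc=[45.0, 25.0], scale=[12.0, 10.0], size=(60, 2))
autistic = rng.normal(loc=[18.0, 42.0], scale=[8.0, 10.0], size=(15, 2))

X = np.vstack([controls, autistic])          # columns: (EQ, SQ)
y = np.array([0] * len(controls) + [1] * len(autistic))

# a single linear boundary is enough if the autistic group really
# does sit apart from the control blob
clf = LogisticRegression().fit(X, y)
print("training accuracy:", clf.score(X, y))
w, b = clf.coef_[0], clf.intercept_[0]
print(f"decision boundary: {w[0]:.2f}*EQ + {w[1]:.2f}*SQ + {b:.2f} = 0")
```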


Act Three

Simon Baron-Cohen and colleagues: “Since there is no unique way to break up the results of our data analysis into identifiable groups along the D dimension, we propose a classification based upon the cumulant plot of Figure 2a. This generates 5 brain types […]”

Me: “Wait, five clusters?  I can’t see any support for having that many, but at least you have enough boundaries that you can get the green dots well separated from the others, so whatever, go ahead – ”

Simon Baron-Cohen and colleagues:

[image: the same scatter plot, now partitioned into the five proposed “brain type” regions]

Me: “what the fuck”
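
(For reference, the procedure being described is something like this: take the difference D = S - E for each subject and carve the D axis into five “brain types” at chosen percentile cutoffs. The cutoffs below are placeholders – which is sort of the point, since nothing in the data picks any particular boundaries out.)

```python
import numpy as np

# placeholder percentile cutoffs on D = S - E; nothing in the data
# singles these out, which is the complaint above
CUTS = (2.5, 35.0, 65.0, 97.5)
LABELS = ["Extreme E", "E", "Balanced", "S", "Extreme S"]

def brain_type(s, e, d_sample, cuts=CUTS):
    """Assign one of five 'brain types' from the difference D = S - E,
    using percentile boundaries computed on the whole sample."""
    bounds = np.percentile(d_sample, cuts)
    return LABELS[np.searchsorted(bounds, s - e)]

# fake scores, just to exercise the function
rng = np.random.default_rng(0)
S, E = rng.normal(size=200), rng.normal(size=200)
D = S - E
print(brain_type(S[0], E[0], D))
```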