@slatestarscratchpad’s latest post, “Against Individual IQ Worries,” is largely about the dangers of treating population means as though they were typical individual cases. It’s everywhere!
Is there any interesting (i.e. with non-trivial properties) way of defining metrics or measures over sets of differential equations? (Got onto thinking about this bc of the fine-tuning in cosmology thing, and wondering if there is any way to talk about a law [i.e. equation] being more or less fine-tuned, but now I’m just curious in general)
Hmmm. Convergence (in the sense that a Taylor series converges to the function it represents) is unmetrizable.
How is this handled in calculus of variations?
Also I’m not entirely sure what you’re thinking of as a criterion. Could you give an example of two very close differential equations?
No idea how it’s handled in calculus of variations. Huh.
Here is an example inspired by the fine-tuning thing. We have a differential equation with coefficients in front of the terms. We talk about how “if the coefficients were a tiny bit different, the behavior of the solution would be very different.” Now we can imagine keeping the coefficients fixed but varying the equation, by adding new small terms (i.e. slightly varying their coefficients starting at zero), or by changing an existing term (change an exponent in a continuous way, say). Now, for some sort of change in the solution, you can talk – in a casual way at least – about how little you need to change the equation to get a “comparable” or “comparably large” change in the solution.
(We may be talking about sudden phase transition-like changes in the solution, so the changes themselves may not be continuous, but you might have a sense of distance for equations like “it is this much change of exponent away from some given transition”)
For any one equation, this just seems like a mildly amusing game, but could there be any regularities across many equations (or many definitions of “comparable”), so that there might be general facts?
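One naive way to formalize the coefficient-wiggling game above (just a sketch of one possible definition, nothing canonical): identify each equation in a parametric family with its coefficient vector, use the Euclidean distance there, and measure fine-tuning as the distance to the nearest “qualitatively different” equation.

```latex
% Identify each member of a parametric family of equations, e.g.
%   y' = a_1 y + a_2 y^2 + \dots + a_k y^k,
% with its coefficient vector a = (a_1, \dots, a_k), and set
\[
  d(E_a, E_b) = \lVert a - b \rVert
              = \Big( \sum_{i=1}^{k} (a_i - b_i)^2 \Big)^{1/2}.
\]
% For a property P of solutions (e.g. "blows up in finite time"),
% the fine-tuning of E_a with respect to P is the distance to the
% nearest equation in the family that lacks P:
\[
  \operatorname{ft}_P(a)
    = \inf \{\, \lVert a - b \rVert \;:\; E_b \text{ does not have } P \,\}.
\]
```

This obviously depends on the chosen parametrization, which is part of what makes the question non-trivial: different coordinate choices on the same family give different distances.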
This is almost covered by the theory of stochastic differential equations. That’s the theory of differential equations where you add a random function of time to one side. Usually the random function is “white noise”, technically known as a Wiener process, but you can pick any distribution on the space of functions that you like. The theory of these SDEs is well understood, highly applicable, and moderately beautiful.
Adding a random function of space is also a known thing. Usually people use a Markov random field, which is a generalization of a Markov process to multiple dimensions. That’s usually used to perturb a PDE, but you’re interested in perturbing an ODE. People have done that too. I remember seeing some beautiful visualization of the motion of an electron beam through a potential given by a Markov random field.
It sounds like you’re asking about adding a random function of both space and time. I’m not familiar with SDEs perturbed by a random function of both space and time, but I figure somebody must have thought about it. It seems like a reasonable generalization, now that you point it out.
Either you are misreading me, or I’m confused what this has to do with my question. I know what SDEs are. I’m not necessarily interested in ODEs rather than PDEs, in fact the reverse (although either is fine). And I don’t see how SDEs or their extensions give us a metric or measure on the space of differential equations, or correspond to the sort of thing I wondered about the last paragraph of my second post.
(via identicaltomyself)
(via just-evo-now)
Esther [to the tune of “I’ve got soul, but I’m not a soldier”]: “I’ve got norms, but I’m not a normie … ”
Is there any interesting (i.e. with non-trivial properties) way of defining metrics or measures over sets of differential equations? (Got onto thinking about this bc of the fine-tuning in cosmology thing, and wondering if there is any way to talk about a law [i.e. equation] being more or less fine-tuned, but now I’m just curious in general)
Thus daseinisation ‘brings-a-quantum-proposition-into-existence’ (the hyphens are very important) by hurling it into the classical snap-shots of the world provided by the category of contexts.
(Note: I think I made this post too long by going on too many digressions. A short version capturing the main point would probably be better overall, although this at least gave me the chance to ride some entertaining hobby-horses.)
1.
I’ve noticed that several seemingly unrelated frustrations of mine can all be classified as “people should care more about entire probability (or frequency) distributions, rather than summary statistics like averages.”
This is frequently a problem in academic papers. Many of the problems with that godawful marijuana paper I posted about earlier involved the authors doing complicated things to dredge individual numbers (p < .05, etc.) out of their small sample, when with only n=22, a set of histograms would have been much more informative. With only 22 people, your statistical power probably isn’t very high, so it’s hard to tell what it means that you can or can’t get p < .05 for something. But if there’s an effect on a particular metric, we should be able to see it just by plotting a histogram of heavy users on that metric and one of light users.
Indeed, those histograms would contain many other interesting facts. They would tell you, for instance, whether a given “average” effect was the result of everyone experiencing roughly that effect, or the result of half of the people experiencing no effect while half experience one twice as big, or whatever. It would let you see differences that wouldn’t show up in a t-test, where a distribution changes shape while still having a similar mean. It would let you see the difference between statistical and practical significance – you could see when there’s a big effect that the study just doesn’t quite have the power to detect, and when there’s a statistically significant but tiny difference that’s swamped by individual variability. (You can usually infer the latter from std. devs. if they’re supplied, but this isn’t always possible, and the former is usually invisible.)
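To make this concrete, here’s a toy simulation (made-up numbers, nothing from any actual paper): two groups with essentially the same mean effect, where only the histogram reveals that one group is really two subpopulations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Group A: everyone experiences roughly the same effect, around +5.
uniform_effect = rng.normal(loc=5, scale=1, size=1000)

# Group B: half experience no effect, half experience one twice as big.
# Same mean as Group A, completely different story for individuals.
split_effect = np.concatenate([
    rng.normal(loc=0, scale=1, size=500),
    rng.normal(loc=10, scale=1, size=500),
])

print(uniform_effect.mean())  # ~5
print(split_effect.mean())    # ~5

# A comparison of means sees almost nothing; a histogram sees everything.
counts_a, edges = np.histogram(uniform_effect, bins=20, range=(-5, 15))
counts_b, _ = np.histogram(split_effect, bins=20, range=(-5, 15))
# counts_a peaks once, around 5; counts_b has two peaks and a hole at 5.
```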
You get all of these things for a simple reason: the distribution implicitly contains all of the other information. Every derived number you see in a statistical paper came from a distribution (or collection of them), and if you knew the distribution, you could re-derive all the numbers. But it doesn’t go the other way: you can’t re-derive the distribution from the numbers.
(Unless you know the distribution has a parametric form, but this is irrelevant to real data; “these frequency counts could have come from this Gaussian” does not let you reconstruct the original counts, and the original counts contain more information, e.g. information that might lead you to disagree with the assertion about Gaussianity.)
There’s kind of a tradeoff here, since I’m literally talking about exhibiting pictures and determining things by “seeing them in the picture,” which is uncomfortably subjective. If you wanted to make it more objective, you’d have to come up with numerical proxies for the judgments you’d be making visually, which gets you back to … exactly what I was trying to get away from.
But it isn’t really that stark. One thing that would vastly improve a lot of papers I read is just including more histograms, even if they also included all the same derived numbers. Additionally, even if we’re deriving numbers, it’s possible to have an attitude that pays more “due respect” to the distribution. This is a reason to prefer nonparametric tests, but parametric tests are fine too as long as you can justify using them. For instance, a lot of standard intuitions (about the interpretation of the mean and tests that rely on it) break down for data that is not unimodal. But a lot of data is unimodal, so this may not be a problem. But it’s very rare for authors to just tell me the data is unimodal, even though it’d just take a few words to do so. (A histogram would also help, and most of the cases where this information is included are ones where it’s included implicitly via histogram.)
I want to focus on this unimodality issue more, because it’s central to the problem. We have a habit of simplifying a distribution down to a single number, usually a mean; if a second number is included, it’s some measure of spread around the mean. A bimodal distribution can’t be captured in one number, and a mean-and-spread won’t capture it either. So implicit in our whole way of talking about results in social and medical science is that everything is unimodal, or else nothing would make sense. Indeed, this is a problem even for unimodal distributions that are skewed – since the mode, median, and mean are different, the mean (which is typically the number reported) is a poor guide to the “typical” case, either in the sense of “most common” (mode) or “50th percentile” (median).
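A quick sketch of the skew problem (toy data, purely illustrative): in a right-skewed distribution like the lognormal, the mean sits well above the median, so “the average” describes a case rarer than you’d think.

```python
import random
import statistics

random.seed(0)

# A right-skewed sample: most values small, a long tail of large ones.
# (Lognormal shapes show up in incomes, reaction times, usage levels, etc.)
sample = [random.lognormvariate(0, 1) for _ in range(10_000)]

mean = statistics.fmean(sample)
median = statistics.median(sample)

# The tail drags the mean upward; the median stays at the "typical" case.
print(round(mean, 2), round(median, 2))  # mean ~1.65, median ~1.0
```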
Here’s an example from another marijuana paper (PDF). This paper had a really intriguing result – previous studies had shown that cannabinoid receptor availability is suppressed in regular users but bounces back somewhat after 28 days of abstinence, and this study showed that most of the bounce-back (~75%) happens within just 2 days, with the remaining 26 days just adding some extra on top.
However, only means are reported, so that this time trajectory (“75% of the 28-day improvement in 2 days, the remaining 25% in 26 days”) may not have occurred in any experimental subject, like the proverbial “average American family” that has 2.5 kids even though literally no one has 2.5 kids. It could be that some people bounce back completely in 2 days (or fewer) while others improve slowly and linearly; it could be that some people improve fast while others don’t improve at all; it could be that everyone improves exponentially with the same half-life, so people who started lower bounce back faster on a linear scale; it could be that you can only bounce back if you’re above a certain threshold, but if you are then it’s fast. All of these possibilities have strong and different implications for individual users.
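To see how easily this happens, here’s a minimal simulation of the first scenario (hypothetical numbers, not the paper’s data): half the subjects bounce back fully within 2 days, half improve linearly over the 28 days, and the group average front-loads the recovery in a way that describes nobody.

```python
import numpy as np

days = np.array([0.0, 2.0, 28.0])

# Subgroup 1: complete recovery by day 2 (fraction of deficit recovered).
fast = np.array([0.0, 1.0, 1.0])

# Subgroup 2: slow, steady, linear recovery over the full 28 days.
slow = days / 28.0  # [0.0, ~0.07, 1.0]

# 50/50 mix: the averaged trajectory shows ~54% recovery by day 2,
# a pattern that no individual in either subgroup actually exhibits.
mean_traj = (fast + slow) / 2
print(mean_traj)
```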
When I was thinking about that other study, I joked with myself that you’d get higher-quality information just from talking to a few stoners. On reflection, I think this is less of a jokey exaggeration than I realized. These studies have small samples (the one I just talked about had 11 dependent subjects and 19 controls). These sample sizes are in kind of a transitional regime between case studies and proper statistical samples. Because tests with a small sample will have low power, I feel wary of drawing any conclusions from the observed patterns of significance and non-significance (although these are often presented as the main results). Since I don’t think you can do much with derived quantities (of the sort that usually get derived), I am correspondingly more interested in individual cases.
After all, if nothing else, we have a collection of individual cases here, and the “case study approach” can still be interesting with very few cases, while the statistical approach cannot. If your sample size is three, and you give me detailed info about all three cases, I have at least learned about 3 things that can happen to human beings. If the variance is high, all the better: now I know about 3 quite different things that can happen to human beings. But if your sample size is 3 and you only report the results of statistical tests – which are all going to turn out non-significant, probably – I have learned nothing from you.
So, if you know (or know people who know) 11 stoners, you have 11 (colloquially presented) case studies. This provides a lot of information, if not about overall trends, then about the sorts of things that can happen in individuals – which, after all, is what all of this (i.e. medical and public health research) is supposed to be about. I can even start to get a sense of the relative frequencies of distinct subgroups: if 3 of 11 stoners experience Pattern X while the other 8 experience Pattern Y, well, I’d like more data, but that’s already suggestive.
But if you form an averaged time trajectory over the 11 and never give me more detail about the distribution beyond that coarse average, I don’t have any Pattern Xs or Pattern Ys, I just have a mean pattern that may not correspond to anyone’s story. I have graphs like this (from the paper):

Here we have a picture of the Average Stoner During A Tolerance Break, who may not resemble any particular stoner at all, like the average family with 2.5 kids. Those error bars aren’t quantiles, BTW, they’re SEM, so we don’t even have any skew information here.
Rather than providing us with anything about individual trajectories, the authors concentrate instead on p-values. Their reasoning – and I hope I am committing a misreading here! – is based on one of those errors they warn you about in Stats 101 classes: they are interpreting non-significance as conveying positive information about the world. They present it as a big deal that while they got significance between stoners and non-stoners, the result is no longer significant after 2 days of marijuana abstinence:
Compared with HC subjects, [11C]OMAR volume of distribution was 15% lower in CD subjects (effect size Cohen’s d of 1.11) at baseline in almost all brain regions. However, these group differences in CB1R availability were no longer evident after just 2 days of monitored abstinence from cannabis. [my emphasis]
Of course, the p-values would all slide downwards with a bigger sample, so if anyone does larger studies of this, they will predictably “find” that it takes longer than 2 days to lose significance.
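This is easy to check back-of-the-envelope (a normal-approximation sketch, not the paper’s actual test): hold the effect size fixed and watch the p-value fall as n grows.

```python
import math

def two_sample_p(d, n):
    """Approximate two-sided p-value for a two-sample comparison with
    standardized effect size d and n subjects per group (normal
    approximation, so only a rough stand-in for the paper's tests)."""
    z = d * math.sqrt(n / 2)
    return math.erfc(z / math.sqrt(2))  # two-sided normal tail probability

# The same fixed effect (Cohen's d = 0.5) at growing sample sizes:
for n in (10, 30, 100, 300):
    print(n, two_sample_p(0.5, n))
# p slides from "non-significant" to "highly significant" with no change
# in the underlying effect at all.
```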
Right next to that figure is another one, with a more appropriate vertical axis, which shows what the authors mean by a “no longer evident” difference:

That’s right: even after 28 days of abstinence, they’d closed less than half the gap between them and non-users. But since the sample size is small, they could only get p=0.27 for this clearly-there difference. (Remember, the bars here are SEMs, so the SDs will be a lot bigger.)
This has been a bit of a digression, since I’m not sure these mistakes about null results have much to do with “respecting distributions.” But I do think I can justifiably use this as another example of “this is your brain on summary statistics.” Some of these mistakes are probably due to the emphasis on “significant = important” that is ingrained by publication criteria, but it also evinces a willingness to discard a lot of the information in your data.
To provide a Gallant to pair with the Goofus above, here’s yet another weed paper with a small sample. They provide a lot of fine-grained, bimodality-tolerant, case-study-like detail:
The general trend was a decrease in blood pressure within the first 43 min after onset of smoking, but an initial increase in blood pressure was observed among some participants. Concerning individual mean arterial blood pressure, the largest decreases in mean arterial blood pressure were observed with the high THC dose with drops up to 41% below baseline (from 121 to 71 mmHg). Subjects 2, 23 and 12 showed the greatest decreases in mean arterial blood pressure whilst their THC serum concentration was 34, 213, and 137 μl/L, respectively, at 43, 17 and 7 min after onset of smoking. Subjects 10, 22 and 19 showed limited initial increase in mean arterial blood pressure (up to 37% above baseline, from 87 to 119 mmHg). Mean arterial blood pressure was still below baseline levels 8 h post-smoking for a majority of the participants.
And they even make plots where they just throw together every single participant’s time course:

Admittedly these look ugly, and I’m sure there are much nicer ways of presenting the same information. Still: this is the “ask some stoners” of graphs, and I mean that as high praise. These graphs can answer many questions you might want to ask, even if the researchers didn’t ask them: what different types of trajectories are possible, the range at any time, where the distribution is peaked and how far away the unusually high/low trajectories are, etc. Admittedly, you could be asking these questions more rigorously than by eyeballing a figure – but the authors probably aren’t going to answer every such question rigorously, so these pictures (like histograms) provide an indispensable supplement.
2.
Speaking of public health issues, I think I also see the downstream effects of these bad habits on the doctors who consume medical research. (I would imagine there are similar effects on people who act upon social science research.)
I complained a while back about how I received different responses from different psychiatrists (and my GP) about benzodiazepines. It seemed like each doctor had a single opinion about benzos, and didn’t adapt it much to the patient. The “benzos are bad” doctors would be unmoved when I mentioned I’d been on the same dose for years, even though “people have to keep taking higher and higher doses” is one of the reasons the “benzos are bad” idea exists. I think there was an element of “cover your ass” here, but it felt like the usual presumption of expertise in doctor-patient relationships was breaking down, as each doctor would refer to a supposed “standard opinion” which happened to concur with their own, clearly non-universal opinion.
Benzos are a case where lack of unimodality is important. As I mentioned, one reason why doctors are wary of benzos is tolerance, specifically the need to ramp up the dose more and more over time. It is true that some patients exhibit this pattern when prescribed benzos. But then, there are those (like me) who don’t. From this article “reappraising” benzos [below, “BDZs”]:
Although there are occasional reports of patients with anxiety disorders who increase the dose of BDZs to continue experiencing the initial anti-anxiety effect or who experience a loss of therapeutic benefit with the continuing treatment with BDZs, a body of evidence shows that the vast majority of patients with anxiety disorders do not have a tendency to increase the dose during long-term treatment with BDZs [30,69–72]. Therefore, tolerance to anxiolytic effects of BDZs usually does not occur in the course of long-term treatment. When patients increase the dose of BDZs, this usually appears in the context of other substance misuse.
This suggests bimodality, or a unimodal distribution not very well represented by its peak. There is a population subtype that requests increasing doses, and people outside that subtype generally do not.
Likewise, a common concern is the withdrawal syndrome (or equivalently “dependence,” which is “characterized by the symptoms of withdrawal upon abrupt discontinuation and no tolerance”). But again, this is not universal, and this may be another issue of “population subtypes”:
Withdrawal symptoms occurring after an abrupt cessation of long-term BDZ use are not inevitable; such problems were reported in approximately 40% of individuals taking BDZs regularly [80,81] and they were more likely in people with personality disorders, especially those with passive-dependent personality traits [82,83] (ibid.)
(I myself have abruptly stopped taking BDZs and then not taken them for periods of several weeks, and I’ve never experienced withdrawal symptoms. I am the 60%.)
We have distinct desired and undesired patterns, so it would seem that the clinician’s task is to think about whether their patient is displaying (or likely to display) the undesired pattern, and act accordingly. Instead, what we have gotten is a one-size-fits-all concept which says that, “overall,” BDZs can cause worrying tolerance and dependence issues, and so one should treat them warily as a last resort. This means that even when evidence about my own personal BDZ use over 3+ years is available, doctors prefer to consult assessments of BDZs and SSRIs “overall,” throwing away the distribution in favor of a single number.
(I don’t want to play up my own frustrations about this, which are very minor as frustrating medical care goes; I’m using myself as an example solely because it’s a case I’ve read a bit about.)
(Some of the oddness I am trying to explain is no doubt the result of the pharma industry pushing heavily for the SSRIs, which are newer than BZDs. For instance, it’s gradually been realized (by the profession – more quickly by patients, one would assume) that SSRIs can have a bitchin’ withdrawal syndrome themselves.)
3.
This problem also appears in many conversations that are not, superficially, about statistics. A familiar example is conversations about attractiveness. A lot of people talk as though there’s just one scale of attractiveness, which would only be true if the assessors of attractiveness had unimodal (and strongly peaked) preferences. In concrete terms: if you think that the way for any straight man to become more attractive is to improve him on a single “what women want” metric, you are assuming that straight female preferences are (if not all literally identical) strongly peaked around a single mode, so that boiling the distribution down to a single number is a reasonable approximation.
Everything in my relevant experiences suggests this is false. There are people who like all sorts of things, and there are things far from the mean/median that are nonetheless interesting to many (perhaps there is another mode at these points). This does not mean that scales of attractiveness do not exist, but that there are multiple such scales worth considering (at least one per mode), and that climbing the nearest scale is probably better than climbing the mean/median scale.
There is something analogous in political attitudes that assume sociological groups are homogeneous blocks. This post is already too long, though.
Had a bad dream last night that was a long series of false awakenings. I would “wake up,” some weird/bad stuff would happen, I would realize I was dreaming, and then shortly thereafter I would “wake up” into another version of the dream. Eventually I wised up to the point that I would instantly gain lucidity whenever anything strange would happen, but that just made the cycle go faster.
I think this has happened to me before? It feels familiar. Anyway, it’s a cleverly horrifying concept, so good job brain
this has been stuck in my head all day but i don’t know what i’m meant to do with it
[image: a three by three alignment grid, similar to a dungeons and dragons alignment grid, of empty boxes. the first row is labeled same worm, same hat, and same mood, the second oh worm, oh hat, and oh mood, and the third big worm, big hat, and big mood.]
(via guywife)