
mind viruses about body viruses

I was going to write this as a Slate Star Codex comment, but I’m going to make it a tumblr post tagging @slatestarscratchpad instead, since experience suggests it’s likely to be more widely and carefully read in this form.  (Crossposting to LW too, so you may be reading this there, possibly with mangled formatting.)

The idea frontier

I am getting more and more concerned about the “information epidemiology” of the public conversation about Covid-19.

Here are some distinctive features I see in the public conversation:

1. Information intake must be triaged.

There is a very large amount of new publicly available information every day.  There are no slow news days.  “Keeping up with the story” in the way one would keep up with an evolving news story would be a full-time job.  

Many of us do not have time to do this, and I imagine many of those who do have time cannot tolerate the experience in practice.  In fact, there can be a tradeoff between one’s level of personal involvement in the crisis and one’s ability to “follow it” as a news story.

(I work for a telemedicine company, and after a day of dealing with the ever-changing impacts of Covid-19 on my work, I have relatively little patience left to read about its ever-changing impacts on absolutely everything else.  That’s just me, though, and I realize some people’s mental bandwidth does not work like this.)

2. Abstractions are needed, and the relevant abstractions are novel and contested.

Crucial and time-sensitive decisions must be made on the basis of simulations, abstract mental models, and other intellectual tools.

In some sense this is true of everything, but in most cases we have a better sense of how to map the situation onto some large reference class of past intellectual work.  When there is an economic downturn, the standard macroeconomic arguments that have existed for many decades pop back up and make the predictable recommendations they always make; even though there is no expert consensus, the two or three common expert stances are already familiar.

With Covid-19, this is not so.  All the intervention types currently under discussion would be, in their own ways, unprecedented.  As it struggles to follow the raw facts, the general public is also struggling to get its head around terms and concepts like “suppression,” “containment,” “contact tracing,” etc. which were (in the relevant senses) not part of our mental world at all until recently.

Thus, relative to most policy debates, this one has a strange frontier energy, a sense that we’re all discovering something for the first time.  Even the professional epidemiologists are struggling to translate their abstract knowledge into brief-but-clear soundbites.  (I imagine many of them have never needed to be public communicators at this kind of scale.)

3. There is no division of labor between those who make ideas and those who spread them.

There is a hunger for a clear big picture (from #1).  There are few pre-established intellectual furnishings (#2).  This means there’s a vacuum that people very much want to fill.  By ordinary standards, no one has satisfying answers, not even the experts; we are all struggling to do basically the same intellectual task, simultaneously.

None of us have satisfying answers – we are all the same in that respect.  But we differ in how good we are at public communication.   At communicating things that sound like they could be answers, clearly, pithily.  At optimizing our words for maximum replication.

It is remarkable to me, just as a bare observation, that (in my experience) the best widespread scientific communication on Covid-19 – I mean just in the sense of verbal lucidity and efficiency, effective use of graphs, etc., not necessarily in the sense of accuracy or soundness – has been done by Tomas Pueyo, a formerly obscure (?) expert on … viral marketing.

(To be clear, I am not dismissing Pueyo’s opinions by citing his background.  I am hypothesizing his background explains the spread of his opinions, and that their correctness level has been causally inert, or might well have been.)

The set of ideas we use to understand the situation, and the way we phrase those ideas, is being determined from scratch as we speak.  Determined by all of us.  For the most part, we are passively allowing the ideas to be determined by the people who determine ideas in the absence of selection – by people who have specialized, not in creating ideas, but in spreading them.

4. Since we must offload much of our fact-gathering (#1) and idea-gathering (#2) work onto others, we are granting a lot on the basis of trust.

Scott’s latest coronavirus links post contains the following phrases:

Most of the smart people I’ve been reading have converged on something like the ideas expressed in […]

On the other hand, all of my friends who are actually worried about getting the condition are […]

These jumped out at me when I read the post.  They feel worryingly like an “information cascade” – a situation where an opinion seems increasingly credible as more and more people take it partially on faith from other individually credible people, and thus spread it to those who find them credible in turn.

Scott puts some weight on these opinions on the basis of trust – i.e. not 100% from his independent vetting of their quality, but also to some extent from an outside view, because these people are “smart,” “actually worried.”  Likelier to be right than baseline, as a personal attribute.  So now these opinions get boosted to a much larger audience, who will take them again partially on trust.  After all, Scott Alexander trusts them, and he’s definitely smart and worried and keeping up with the news better than many of us.

What “most of the smart people … have been converging on,” by the way, is Tomas Pueyo’s latest post.

Is Tomas Pueyo right?  He is certainly good at seeming like a “smart” and “actually worried” person whose ideas you want to spread.  That in itself is enough.  I shared his first big article with my co-workers; at that time it seemed like a shining beacon of resolute, well-explained thought shining alone in a sea of fog.  I couldn’t pull off that effect as well if I tried, I think – not even if the world depended on it.  I’m not that good.  Are you?

My co-workers read that first post, and their friends did, and their friends.  If you’re reading this, I can be almost sure you read it too.  Meanwhile, what I am not doing is carefully reading the many scientific preprints that are coming out every week from people with more domain expertise, or the opinions the same people are articulating in public spaces (usually, alas, in tangled twitter threads).  That’s hard work, and I don’t have the time and energy.  Do you?

I don’t know if this is actually an effective metaphor – after all, I’m not a viral marketer – but I keep thinking of privilege escalation attacks.

It is not a bad thing, individually, to place some trust in a credible-sounding person without a clear track record.  We can’t really do otherwise, here.  But it is a bad thing when that trust spreads in a cascade, to your “smartest” friends, to the bloggers who are everyone’s smartest friends, to the levers of power – all on the basis of what is (in every individual transmission step) a tiny bit of evidence, a glimmer of what might be correctness rising above pure fog and static.  We would all take 51% accuracy over a coin flip – and thus, that which is accurate 51% of the time becomes orthodoxy within a week.

Most of the smart people you’ve been reading have converged on something like … 

#FlattenTheCurve: a case study of an imperfect meme

Keeping up with the lingo

A few weeks ago – how many? I can’t remember! – we were all about flattening the curve, whatever that means.

But this week?  Well, most of the smart people you’ve been reading have converged on something like: “flattening” is insufficient.  We must be “squashing” instead.  And (so the logic goes) because “flattening” is insufficient, the sound bite “flatten the curve” is dangerous, implying that all necessary actions fall under “flattening” when some non-flattening actions are also needed.

These are just words.  We should be wary when arguments seem to hinge on the meaning of words that no one has clearly defined.

I mean, you surely don’t need me to tell you that!  If you’re reading this, you’re likely to be a veteran of internet arguments, familiar from direct experience and not just theory with the special stupidity of merely semantic debates.  That’s to say nothing of the subset of my readership who are LessWrong rationalists, who’ve read the sequences, whose identity was formed around this kind of thing long before the present situation.  (I’m saying: you if anyone should be able to get this right.  You were made for this.)

It’s #FlattenTheCurve’s world, we just live in it

What did “flatten the curve” mean?  Did it mean that steady, individual-level non-pharmaceutical interventions would be enough to save hospitals from overload?  Some people have interpreted the memetic GIFs that way, and critiqued them on that basis.

But remember, #FlattenTheCurve went viral back when fretting about “coronavirus panic” was a mainstream thing, when people actually needed to be talked into social distancing.  The most viral of the GIFs does not contrast “flattening” with some other, more severe strategy; it contrasts it with nothing.  Its bad-guy Goofus character, the foil who must be educated into flattening, says: “Whatever, it’s just like a cold or flu.”

No one is saying that these days.  Why?  How did things change so quickly?  One day people were smugly saying not to panic, and then all of a sudden they were all sharing a string of words, a picture, something that captivated the imagination.  A meme performed a trick of privilege escalation, vaulted off of Facebook into the NYT and the WaPo and the WSJ and the public statements of numerous high officials.  Which meme? – oh, yes, that one.

We are only able to have this conversation about flattening-vs-squashing because the Overton Window has shifted drastically.  Shifted due to real events, yes.  But also due to #FlattenTheCurve.  The hand you bite may be imperfect, but it is the hand that feeds you.

Bach, the epidemiologists, and me

Joscha Bach thinks #FlattenTheCurve is a “lie,” a “deadly delusion.”  Because the GIF showed a curve sliding under a line, yet the line is very low, and the curve is very high, and we may never get there.

Is he right?  He is definitely right that the line is very low, and we may not slide under it.  Yet I was unimpressed.

For one thing, Bach’s argument was simply not formally valid: it depended on taking a static estimate of total % infected and holding it constant when comparing scenarios across which it would vary.

(This was one of several substantive, non-semantic objections I made.  One of them, the point about Gaussians, turned out to be wrong – in the sense that granting my point could not have affected Bach’s conclusion; he could have reached it with a different functional form.  This argument was my worst one, and the only one anyone seemed to notice.)

Something also seemed fishy about Bach’s understanding of “flatten the curve.”  The very expert from whom he got his (misused) static estimate was still tweeting about how we needed to flatten the curve.  All the experts were tweeting about how we needed to flatten the curve.  Which was more plausible: that they were all quite trivially wrong, about the same thing, at once?  Or that their words meant something more sensible?

The intersection of “world-class epidemiologists” and “people who argue on twitter” has now, inevitably, weighed in on Bach’s article.  For instance:

[screenshots of several epidemiologists’ tweets responding to Bach’s article]

And I can’t resist quoting one more Carl Bergstrom thread, this one about another Medium post by a viral marketer (not the other one), in which Carl B’s making the exact same damn point I made about the static estimate:

[screenshots from Carl Bergstrom’s thread]

Like me, these people make both substantive and semantic objections.  In fact, theirs are a strict superset of mine (see that last Bergstrom thread re: Gaussians!).

I am not saying “look, I was right, the experts agree with me, please recognize this.”  I mean, I am saying that.

But I’m also saying – look, people, none of this is settled.  None of us have satisfying answers, remember.  We are all stressed-out, confused glorified apes with social media accounts yelling at each other about poorly defined words as we try to respond to an invader that is ravaging our glorified-ape civilization.  Our minds cannot handle all this information.  We are at the mercy of viral sound bites, and the people who know how to shape them.

What is it the rationalists like to say?  “We’re running on corrupted hardware?”

Carl Bergstrom championed a meme, #FlattenTheCurve.  He believed it would work, and I think it in fact did.  But Carl Bergstrom, twitter adept though he may be, is still someone whose primary career is science, not consensus-making.  In a war of memes between him and (e.g.) Tomas Pueyo, I’d bet the bank on Pueyo winning.

And that is frightening.  I like Pueyo’s writing, but I don’t want to just let him – or his ilk – privilege-escalate their way into effective command of our glorified ape civilization.

I want us to recognize the kind of uncertainty we live under now, the necessity for information and idea triage, the resulting danger of viral soundbites winning our minds on virality alone because we were too mentally overwhelmed to stop the spread … I want us to recognize all of that, and act accordingly.

Not to retreat into the comfort of “fact-checking” and passive consultation of “the experts.”  That was always a mirage, even when it seemed available, and here and now it is clearly gone.  All of us are on an equal footing in this new frontier, all of us sifting through Medium articles, twitter threads, preprints we half understand.  There are no expert positions, and there are too many facts to count.

Not to trust the experts – but to exercise caution.  To recognize that we are letting a “consensus” crystalize and re-crystalize on the basis of cute dueling phrases, simplified diagrams and their counter-simplified-diagrams, bad takes that at least seem better than pure white noise, and which we elevate to greatness for that alone.  Maybe we can just … stop.  Maybe we can demand better.  Wash our minds’ hands, too.

Our intellectual hygiene might end up being as important as our physical hygiene.  Those who control the levers of power are as confused and stressed-out as you are, and as ready to trust viral marketers with firm handshakes and firm recommendations.  To trust whichever sound bite is ascendant this week.

Thankfully, you have some measure of control.  Because we are all on flat ground in this new frontier, your social media posts are as good as anyone’s; you can devote your mind to making ideas, or your rhetorical skill to promoting specifically those ideas you have carefully vetted.  You can choose to help those with power do better than the status quo, in your own little way, whatever that may be.  Or you can choose not to.

Okay, words aside, does the right strategy look like the famous GIF taken literally, or like a feedback system where we keep turning social distancing on and off so the graph looks like a heart rate monitor, or like a “hammer” reset followed by a successful emulation of South Korea, or

I don’t know and you don’t know and Tomas doesn’t know and Carl doesn’t know.  It’s hard!  I hadn’t even heard of “R_0” until like two months ago!  Neither had you, probably!

Marc Lipsitch’s group at Harvard has been putting out a bunch of preprints and stuff that look reputable to me, and are being widely shared amongst PhDs with bluechecks and university positions.  Their most recent preprint, from 3 days ago, appears to be advocating the heart rate monitor-ish thing, so yay for that, maybe.  But … this sounds like the same information cascade I warned against, so really, I dunno, man.

However, I will suggest that perhaps the marginal effect of sharing additional reputable-seeming takes and crystalizing weekly orthodoxies is negative in expectation, given an environment saturated with very viral, poorly vetted words and ideas.

And that your best chance of a positive marginal impact is to be very careful, like the people who won’t trust any medical intervention until it has 50+ p-hacked papers behind it, has been instrumental in the minting of many PhDs, and has thereby convinced the strange beings at FDA and the Cochrane Collaboration who move at 1/100 the speed of you and me.  Not because this is globally a good way to be, but because it locally is – given an environment saturated with very viral, poorly vetted words and ideas.

That you should sit down, take the outside view, think hard about whether you can make a serious independent intellectual contribution when literally everyone on earth, basically, is trying to figure out the same thing.

And you know, maybe you are really smart!  Maybe the answer is yes!  If so, do your homework.  Read everything, more than I am reading, and more carefully, and be ready to show your work.  Spend more time on this than the median person (or me) is literally capable of doing right now.  This is the value you are claiming to provide to me.

If you can’t do that, that is fine – I can’t either.  But if you can’t do that, and you still boost every week’s new coronavirus orthodoxy, you are an intellectual disease vector.  Don’t worry: I will hear it from other people if I don’t hear it from you.  But you will lend your credibility to it.  Whatever trust I place in you will contribute to the information cascade.

This work, this hard independent work collecting lots of raw undigested information, is actually what Tomas Pueyo seems to be doing – I mean, apart from framing everything in a very viral way, which is why you and I know of his work.  We are saturated with signal-boosts of the few such cases that exist.  We do not need more signal-boosts.  We need more independent work like this.  Please do it.  Or, if not that, then be like the lady in that very problematic GIF: don’t panic, but be careful, wash your mind’s hands, and (yes) flatten the intellectual curve.

Carl T. Bergstrom on Twitter →

This thread is a fun takedown of yet another entry in the “bad coronavirus Medium article” genre, and some of the points echo mine re: Bach

“Flattening the Curve” is a deadly delusion →


Reblogging this again, since I’ve added a bunch of clarifications and extensions at the top after it was linked on SSC today.

“Flattening the Curve” is a deadly delusion →

[EDIT: hello SSC readers!  This is a post I wrote quickly and with the expectation that the reader would fill in some of the unstated consequences of my argument.  So it’s less clear than I’d like.  My comment here should hopefully clarify things somewhat.]

———————–

[EDIT2: people seem really interested in my critique of the Gaussian curve specifically.

To be clear, Bach’s use of a Gaussian is not the core problem here, it’s just a symptom of the core problem.

The core problem is that his curves do not come from a model of how disease is acquired, transmitted, etc.  Instead they are a convenient functional form fitted to some parameters, with Bach making the call about which parameters should change – and how much – across different hypothetical scenarios.

Having a model is crucial when comparing one scenario to another, because it “keeps your accounting honest”: if you change one thing, everything causally downstream from that thing should also change.

Without a model, it’s possible to “forget” and not update a value after you change one of the inputs to that value.

That is what Bach does here: He assumes the number of total cases over the course of the epidemic will stay the same, whether or not we do what he calls “mild mitigation measures.”  But the estimate he uses for this total – like most if not all such estimates out there – was computed directly from a specific value of the replication rate of the disease.  Yet, all of the “mild mitigation measures” on the table right now would lower the replication rate of the disease – that’s what “slowing it down” means – and thus would lower the total.

I am not saying this necessarily means Bach is wrong, either in his pessimism about the degree to which slowing measures can decrease hospital overloading, or in his preference for containment over mitigation.  What I am saying is this: Bach does not provide a valid argument for his conclusions.

His conclusions could be right.  Since I wrote this, he has updated his post with a link to the recent paper from Imperial College London, whose authors are relatively pessimistic on mitigation.

I had seen this study yesterday, because an acquaintance in public health research linked it to me along with this other recent paper from the EPIcx lab in France, which is more optimistic on mitigation.  My acquaintance commented that the former seemed too pessimistic in its modeling assumptions and the latter too optimistic.  I am not an epidemiologist, but I get the impression that the research community has not converged to any clear conclusion here, and that the range of plausible assumptions is wide enough to drive a wide range of projected outcomes.  In any case, both these papers provide arguments that would justify their conclusions if their premises were true – something Bach does not do.

P. S. if you’re still curious what I was on about w/r/t the Gaussian, I recommend reading about thin-/heavy-/exponential-tailed distributions, and the logistic distribution as a nice example of an exponential-tailed one.]

———————–

I’ve seen this Medium post going around, so I’ll repost here what I wrote about it in a Facebook comment.

This article simply does not make sense.  Here are some of its flaws:

- It assumes the time course of the epidemic will have a Gaussian functional form.  This is not what exponential growth looks like, even approximately.  Exponential growth is y ~ e^x, while a Gaussian’s tail grows like y ~ e^(-x^2), with a slower onset – the famous “light tails” of the normal distribution – and a narrow, sudden peak.  I don’t know why you’d model something that infamously looks like y ~ e^x as though it were y ~ e^(-x^2), even as an approximation, and the author provides no justification.  (A numeric sketch of this contrast appears after this list.)

- Relative to a form that actually grows exponentially, most of the mass of a Gaussian is concentrated right around the peak.  So the top of the peak is higher, to compensate for the mass that’s absent from the light tails.  Since his conclusions depend entirely on how high the peak goes, the Gaussian assumption is doing a lot of work. [EDIT: I no longer think Bach would have drawn a different qualitative conclusion if he had used a different functional form.  See the step function argument from ermsta here.]

- No citation is provided for the 40%-to-70% figure, just the names and affiliations of two researchers.  As far as I can tell, the figure comes from Marc Lipsitch (I can’t find anything linking it to Christian Drosten).  Lipsitch derived this estimate originally in mid-February using some back-of-the-envelope math using R0, and has since revised it downward as lower R0 estimates have emerged – see here for details.

- In that Lipsitch thread, he starts out by saying “Simple math models with oversimple assumptions would predict far more than that given the R0 estimates in the 2-3 range (80-90%),” and goes on to justify a somewhat lower number.

The “simple math” he refers to here would be something like the SIR model, a textbook model under which the fraction S_inf of people never infected during an epidemic obeys the equation R_0 * (S_inf - 1) - ln(S_inf) = 0.  (Cf. page 6 of this.)

Indeed, with R_0=2 we get S_inf=0.2 (80% infected), and with R_0=3 we get S_inf=0.06 (94% infected).  So I’m pretty sure Lipsitch’s estimate takes the SIR model as a point of departure, and goes on to postulate some extra factors driving the number down.  (These final-size numbers are checked numerically after this list.)

But the SIR model, like any textbook model of an epidemic, produces solutions with actual exponential growth, not Gaussians!  There is no justification for taking a number like this and finding a Gaussian that matches it.  If you believe the assumptions behind the number, you don’t actually believe in the Gaussian; if you believe in the Gaussian (for some reason), you ought to ignore the number and compute your own, under whatever non-standard assumptions you used to derive the Gaussian.

- What’s more, he doesn’t say how his plotted Gaussian curves were derived from his other numbers.  Apparently he used the 40%-70% figure together with a point estimate of how long people spend in the ICU.  How do these numbers lead to the curves he plotted?  What does ICU duration determine about the parameters of a Gaussian?  Ordinarily we’d have some (simplified) dynamic model like SIR with a natural place for such a number, and the curve would be a solution to the model.  Here we appear to have a curve with no dynamics, somehow estimated from dynamical facts like ICU duration.

- Marc Lipsitch, on his twitter, is still pushing for social distancing and retweeting those “flatten the curve” infographics.  I suppose it’s conceivable that he doesn’t recognize the implications of his own estimate.  But that is a strong claim and requires a careful argument.
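(Two quick numeric sketches of the claims above – my own illustrative code, not anything from Bach or Lipsitch, and it assumes you have numpy and scipy.  First, the tail contrast: an exponential has a constant day-over-day growth factor, while a Gaussian’s growth factor keeps changing, so neither is even an approximation of the other.)

```python
# Tail contrast: constant vs. ever-changing growth factors.
import numpy as np

t = np.arange(0, 11)                      # days 0..10
exp_curve = np.exp(0.5 * t)               # y ~ e^(0.5 t)
gauss_curve = np.exp(-0.5 * (t - 10)**2)  # Gaussian peaking at day 10

print(exp_curve[1:] / exp_curve[:-1])     # constant ~1.65 every day
print(gauss_curve[1:] / gauss_curve[:-1]) # ~13360, ~4915, ... shrinking each day
```

(Second, the SIR final-size equation quoted above, solved numerically.  S_inf = 1 is always a root, so we bracket the nontrivial one.)

```python
# Nontrivial root of R_0 * (S_inf - 1) - ln(S_inf) = 0.
import numpy as np
from scipy.optimize import brentq

def final_susceptible_fraction(r0):
    f = lambda s: r0 * (s - 1) - np.log(s)
    return brentq(f, 1e-12, 1 - 1e-9)  # f changes sign across this interval

for r0 in (2.0, 3.0):
    s_inf = final_susceptible_fraction(r0)
    print(f"R_0 = {r0}: S_inf = {s_inf:.2f}, fraction infected = {1 - s_inf:.0%}")
# R_0 = 2.0 gives ~80% infected; R_0 = 3.0 gives ~94%, matching the text.
```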

I don’t know if Lipsitch has read this article, but if he has, I imagine he experienced that special kind of discomfort that happens when someone takes a few of your words out of context and uses them to argue against your actual position, citing your own reputation and credibility as though it were a point against you.

twocubes:

Why do people keep complaining about my plan to destroy the Earth? It’s largely uninhabited! If I shot a random point on its surface, do you know what the probability is that I’d hit any human? It’s less than one tenth of one percent of one percent!

(via transgenderer)

furioustimemachinebarbarian asked: I think, but don't know for sure, that the reason variational Bayes methods look weird is that they were derived from physical principles following people like Jaynes. In practice, optimizing in variational Bayes looks like minimizing a free energy. The factorization over variables isn't generally true, but is likely physically true when your variables are the positions of a bunch of particles in thermodynamic equilibrium. It looks like a physics based method getting in over its head.

Ah! Yeah, that makes sense.

As it happens, the Gibbs distribution in stat. mech. used to confuse me too – it was clearly just wrong about some things, most obviously whether more than one value of the total energy is possible, and the sources I originally read about it did not clarify which calculations it was supposed to be valid for. And the confusing choice is the same one: replacing a distribution where variables “compete” with one where they’re independent, and then doing calculations on it as if it’s the original one.

But in stat. mech., you can go out and find rigorous arguments about why this calculation technique is valid and useful for specific things, like computing the marginal over M variables out of N when M << N, N → ∞.  By contrast, variational Bayes is presented as a way of getting an “approximate posterior,” which you then use for whatever calculations you wanted to do with the real posterior.  Which allows for the sort of invalid calculations I used to worry about with Gibbs, like getting a nonzero number for var(E).

I suppose the Gibbs-valid calculations, of one or a few marginals from many variables, are what you want in statistics if you’re just trying to estimate the marginal for some especially interesting variable.  Except… for any variable to be “especially interesting,” there must be something special about it that breaks the symmetry with the many others, which prevents the standard Gibbs argument from working.  To put it another way, Gibbs tells you about what one variable does when there are very many variables and they’re all copies of each other, but a model like that in statistics won’t assign interesting interpretations to any given variable.  It’s only in physics that you get collections of 10^23 identical things that, you believe, actually exist as individual objects of potential interest.

It doesn’t mention the word “variational,” but Shalizi’s notebook page about MaxEnt is about exactly this issue, and it was very helpful to me many years ago when I was trying to understand Gibbs and various non-textbook uses of it.

There’s something that seems really weird to me about the technique called “variational Bayes.”

(It also goes by various other names, like “variational inference with a (naive) mean-field family.”  Technically it’s still “variational” and “Bayes” whether or not you’re making the mean-field assumption, but the specific phrase “variational Bayes” is apparently associated with the mean-field assumption in the lingo, cf. Wainwright and Jordan 2008 p. 160.)

Okay, so, “variational” Bayesian inference is a type of method for approximately calculating your posterior from the prior and observations.  There are lots of methods for approximate posterior calculation, because nontrivial posteriors are generally impossible to calculate exactly.  This is what a mathematician or statistician is probably doing if they say they study “Bayesian inference.”

In the variational methods, the approximation is done as follows.  Instead of looking for the exact posterior, which could be any probability distribution, you agree to look within a restricted set of distributions you’ve chosen to be easy to work with.  This is called the “variational family.”

Then you optimize within this set, trying to pick the one that best fits the exact posterior.  Since you don’t know the exact posterior, this is a little tricky, but it turns out you can calculate a specific lower bound on the log evidence (cutely named the ELBO) whose gap from the truth is exactly the badness of the fit – the KL divergence from your approximation to the exact posterior – without ever needing the value you’re fitting to.  So you maximize this lower bound within the family, and hope that gets you the best approximation available in the family.  (“Hope” because this is not guaranteed – the optimization is non-convex and can get stuck, and even the best fit in this KL sense can be bad in other senses.  That’s one of the weird and worrisome things about variational inference, but it’s not the one I’m here to talk about.)
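For reference, the identity behind all this (standard material; my notation, with q(z) as the approximating distribution):

$$\log p(x) \;=\; \underbrace{\mathbb{E}_{q(z)}\big[\log p(x,z) - \log q(z)\big]}_{\text{ELBO}(q)} \;+\; \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big)$$

Everything inside the ELBO term is computable (you know the joint p(x, z) and your own q), while the KL term contains the unknowable exact posterior.  But the KL is nonnegative, so the ELBO sits below log p(x); and since log p(x) doesn’t depend on q, pushing the ELBO up within the family is the same as pushing that KL down.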

The variational family is up to you.  There don’t seem to be many proofs about which sorts of variational families are “good enough” to approximate the posterior in a given type of problem.  Instead it’s more heuristic, with people choosing families that are “nice” and convenient to optimize and then hoping it works out.

This is another weird thing about variational inference: there are (almost) arbitrarily bad approximations that still count as “correctly” doing variational inference, just with a bad variational family.  But since the theory doesn’t tell you how to pick a good variational family – that’s done heuristically – the theory itself doesn’t give you any general bounds on how badly you can do when using it.

In practice, the most common sort of variational family, the one that gets called “variational Bayes,” is a so-called “mean field” or “naive mean field” family.  This is a family of distributions with an independence property.  Specifically, if your posterior is a distribution over variables z_1, …, z_N, then a mean-field posterior will be a product of marginal distributions p_1(z_1), …, p_N(z_N).  So your approximate posterior will treat all the variables as unrelated: it thinks the posterior probability of, say, “z_1 > 0.3” is the same no matter the value of z_2, or z_3, etc.

This just seems wrong.  Statistical models of the world generally don’t have independent posteriors (I think?), and for an important reason.  Generally the different variables you want to estimate in a model – say coefficients in a regression, or latent variable values in a graphical model – correspond to different causal pathways, or more generally different explanations of the same observations, and this puts them in competition.

You’d expect a sort of antisymmetry here, rather than independence: if one variable changes then the others have to change too to maintain the same output, and they’ll change in the “opposite direction,” with respect to how they affect that output.  In an unbiased regression with two positive variables, if the coefficient for z_1 goes up then the coefficient for z_2 should go down; you can explain the data with one raised and the other lowered, or vice versa, but not with both raised or lowered.

This figure from Blei et al. shows what variational Bayes does in this kind of case:

[figure from Blei et al.: an elongated elliptical exact posterior, with the mean-field approximation as a compact blob squashed into its middle]

The objective function for variational inference heavily penalizes making things likely in the approximation if they’re not likely in the exact posterior, and doesn’t care as much about the reverse.  (It’s a KL divergence – and yes you can also do the flipped version, that’s something else called “expectation propagation”).

An independent distribution can’t make “high x_1, high x_2” likely without also making “high x_1, low x_2” likely.  So it can’t put mass in the corners of the oval without also putting mass in really unlikely places (the unoccupied corners).  Thus it just squashes into the middle.

People talk about this as “variational Bayes underestimating the variance.”  And, yeah, it definitely does that.  But more fundamentally, it doesn’t just underestimate the variance of each variable, it also completely misses the competition between variables in model space.  It can’t capture any of the models that explain the data mostly with one variable and not another, even though these models are as likely as any.  Isn’t this a huge problem?  Doesn’t it kind of miss the point of statistical modeling?
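To make the variance point concrete, here is a minimal sketch (mine, not from any source above; assumes numpy) of the one case with a closed form: a correlated Gaussian target.  A standard result (e.g. Bishop’s PRML, §10.1.2) says that for a Gaussian posterior with precision matrix Λ, the optimal mean-field approximation has marginal variances 1/Λ_ii, which never exceed the true marginal variances.

```python
# Mean-field VI on a correlated 2D Gaussian target (the closed-form case).
import numpy as np

rho = 0.9                              # strong correlation, like the oval above
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])         # exact posterior covariance
Lambda = np.linalg.inv(Sigma)          # precision matrix

true_var = np.diag(Sigma)              # exact marginal variances: [1.0, 1.0]
vb_var = 1.0 / np.diag(Lambda)         # mean-field variances: 1 - rho**2 = 0.19

print(true_var, vb_var)
# The factorized q also has zero covariance by construction, so it misses the
# competition between the two variables entirely, not just their spread.
```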

(And it’s especially bad in cases like neural nets, where your variables have permutation symmetries.  What people call “variational Bayesian neural nets” is basically ordinary neural net fitting to find some local critical point, and placing a little blob of variation around that one critical point.  It’s nothing like a real ensemble, it’s just one member of an ensemble but smeared out a little.)

nostalgebraist:

nostalgebraist:

We predicted that individuals scoring highest on the Cautious/Social Norm Compliant scale would be significantly more likely to be members of an organized, conventional religious group, as this is consistent with genetic data associating aspects of the serotonin system with religiosity (Lorenzi et al., 2005; Ott et al., 2005) and traditionalism (Golimbet et al., 2004).

oh is that why, huh

I’m taking the personality questionnaire that this dumb study is about and

[screenshot of one of the questionnaire items]

🤔

[screenshot of the questionnaire’s result]

Your Hogwarts House: Female
