
eightyonekilograms:

shacklesburst:

sigmaleph:

nostalgebraist:

I imagine some people have been curious to hear more details about how @nostalgebraist-autoresponder works, so here’s a relatively complete post on that.  Very long.


this is quite interesting! and, separately, it’s quite validating that other people find tumblr’s API/pytumblr as frustrating as I do

Yeah, it’s what stopped me from starting the multiple bots I was thinking about implementing one time or another.

I find it highly suspicious that the Chinese characters you randomly chose almost perfectly encapsulate what you’re using them for (“friend” for username delimiting, “region” for post content delimiting, “meet” for ask stuff, “letter” for original post, … okay, simplified “duty” for tag delimiting is a bit of a stretch but it can also mean “post”, as in position, so still).

It sure is something when someone says “I want to build a machine learning model to imitate realistic human speech and then hook it up to Tumblr’s API” and the second part of that sentence is the harder technical challenge.

Hahaha… I mean, it is and it isn’t?  Like, there’s a similar reversal of intuitive difficulty when I do this kind of thing at work, even though we get to design the APIs there.

Doing impressive “machine learning” often amounts to script kiddie stuff – not much more than import StateOfTheArtModel; my_model = StateOfTheArtModel(); my_model.fit(x, y); – but creating a lasting, usable shared interface for anything is fundamentally hard and people spend their whole careers arguing about it.

I was going to say “this feels like that freshman year / senior year meme with Luke Skywalker,” but then I realized that “describing what it looks like in my head” is not the only thing one can do with a hypothetical meme image, so here’s this dumb thing I just made:

[image: the freshman year / senior year Luke Skywalker meme, remade]

mind viruses about body viruses

@slatestarscratchpad (thread clipped for length, responding to this)

First of all, thank you for the thoughtful and charitable response.

Re: my overall message

Second of all, yeah, my post is not too clear on a lot of things and went through some message drift as I was writing.  The message I had in mind when I started was 100% about being more careful in curation, not about doing independent work.

Then I ended up spinning this big theory of why curation was not being done carefully.   Roughly, I hypothesized that – although there is a large volume of material being produced – very little of it would qualify for curation under normal circumstances.  Either because the quality is too low (e.g. obviously bad amateur pet theories) or because the format is too indigestible (e.g. convoluted high-context twitter threads that are hard to even permalink clearly).  Hence, some of us are lowering our usual curation bars just to let anything through.

Since “maybe don’t curate anything at all” felt underwhelming as a recommendation, I added a suggestion that we could try improving the supply side.  I didn’t really mean that more independent work of any sort is good, since as you say we are glutted with independent work.  I meant more independent work good enough to pass even “peacetime” thresholds for curation, stuff that very clearly shows its work, collects scattered expert observations into an easily digestible whole without oversimplifying, doesn’t rely on misleading inflammatory phrases to get your attention, etc.

(I do think your masks post falls in this category, and thank you for writing it.)

Maybe the supply-side point is wrong – maybe, as you say in your final para, there are enough good takes out there and the limiting factor is finding and spreading them.  I don’t have a strong opinion either way there.  What I do see is the signal-boosting of stuff which I personally find “iffy” but would maybe provisionally endorse in the absence of anything better.  If better work is being done, we really need to start curating that instead.  If not, then whoever is capable of producing better work needs to produce it, and then we need to curate it.

Re: my objections to recent SSC posts (big picture)

Like I said, I got carried away with grand theorizing as I wrote.  But the original impetus for me writing the post was very simple and concrete: I read the “Hammer and dance” section in your latest post and was frustrated by it.

Taken together with my frustration about your previous discussion of Bach, it felt like there was a pattern where you were both sharing and endorsing some things without clearly understanding them or being able to summarize them adequately.

I worried that these endorsements would aid an information cascade.  But also, “an information cascade is happening” seemed like a relatively charitable option among potential explanations for the pattern.  That is, conditional on “Scott is endorsing this thing he doesn’t really understand,” your action is more defensible if it’s supported by an impression that many independent observers are converging on the same endorsement, rather than if it’s completely based on your (by hypothesis, insufficient) personal assessment.

But this “more defensible” reading still isn’t defensible enough.  When these decisions are being made on intellectual trust, and some of that trust is not well founded (e.g. the trust I suspect many people place in SSC on this topic), we are likely to see quick formation of consensus far beyond what is epistemically licensed.

Okay, you might say, but what’s the alternative – just sharing nothing?  I agree with what you wrote here:

If I stay inside and don’t spread the actual coronavirus, I’ve trivially made everyone’s lives better. If I shut up and don’t spread any intellectual memes, then that just means that people’s thoughts are being shaped by the set of everyone except me. This is good if I’m worse than average, bad if I’m better than average. Or to put it another way, I’m making a net contribution if I signal-boost true/important things disproportionately often compared to their base rate […].

This is true if we model you as a “pure transmitter” who propagates ideas without modifying them in the process.  What I’m worried about, though, is ideas acquiring an ever-growing halo of credibility/consensus as they’re endorsed by individually credible people who cite all the other credible people who believe them, etc.

As I’m writing this, I realize this is a key thing I didn’t adequately emphasize in OP: the concern isn’t about mere passing on of information, it’s about the side effects that can occur as it’s passed on.  This means my metaphor of an “information epidemic” just like a disease was, although entertainingly meta, not actually accurate or helpful. 

I would be happy with a bare link to Pueyo’s or even Bach’s pieces, without explicit endorsement, perhaps just with a note like “seems interesting but I can’t evaluate it.”  (You have said roughly that about many other things, and I approve of that.)  I would also be happy with a detailed “more than you want to know” type analysis of any of these pieces.

What I am not happy with is a link with a rider saying you endorse it, that the smart people you’re reading endorse it, that it’s the new consensus, etc., without an accompanying deep dive or evidence of good individual vetting.  When iterated, this is a cascade.

Re: my objections to recent SSC posts (specifics)

Here are the concrete cases I object to, which made me think I was seeing a bad pattern.

First, here is how you originally glossed Bach’s article in the 3/19 links post:

An article called Flattening The Curve Is A Deadly Delusion has been going around this part of the Internet, saying that it’s implausible to say R0 will ever be exactly 1, so you’re either eradicating the disease (good) or suffering continued exponential growth (bad) without a “flat curve” being much of a possibility.

I won’t explain here why this is not accurate, since I already wrote an SSC comment to that effect.  Shortly after I posted my comment, you modified what’s in the post to say something more accurate which also sounded much like the gloss I wrote in my comment.  (I guessed that this was a reaction to my comment, although I could be wrong.)

Although I appreciate that you made the correction, the damage was done: I was convinced that you had shared the Bach article without understanding it.  If you later came to understand it and still thought it was share-worthy, that’s fine in itself, but understanding was apparently not necessary for sharing.  Further, this called the other Coronalinks into question a la Gell-Mann amnesia: if there’s an error in the one case I happen to have already scrutinized for my own reasons, there are likely some errors in those I haven’t.

Then, in the 3/27 links post, you wrote:

I relayed some criticism of a previous Medium post, Flattening The Curve Is A Deadly Delusion, last links post. In retrospect, I was wrong, it was right (except for the minor math errors it admitted to), and it was trying to say something similar to this. There is no practical way to “flatten the curve” except by making it so flat that the virus is all-but-gone, like it is in South Korea right now. I think this was also the conclusion of the Imperial College London report that everyone has been talking about.

This appears to be an explicit endorsement of the entire article, except the “minor math errors.”  That is, “it was right (except for the minor math errors it admitted to)” implies “everything that was not one of the minor math errors was right.”

I don’t know how to square this with your comments on Bach in the post I’m responding to (I broadly agree with those comments, FWIW).  You describe being initially confused by Bach’s article, then only understanding it after reading other things that made the same point better.  If Bach’s article is confusing, and there are better substitutes, why continue to tout Bach’s article as something “right” and worth reading?

Perhaps a more useful way to say that is: it sounds like you are doing two separate things.  You’re reading articles, and you’re forming a mental model of the situation.  The model can update even when re-reading the same article, if it happens you come to understand it better.  If Bach’s article confused you, but it and things like it eventually caused a useful update to your mental model, then the valuable piece of information you have to transmit is the content of that model update, not the confusing and misleading texts from which you eventually, with effort, distilled that update.  Sharing the texts with endorsement will force others through the same confusion at best, and permanently confuse them at worst.

Remember, there is a lot of stuff in the Bach article beyond the one fact about how low the line is.  I too did not know how low the line was until I read Bach, and in that sense Bach’s meme – including its inflammatory, thus viral, title – was a kind of success.  But it’s a success at transmitting one fact which we didn’t know but every epidemiologist did.

We can take this fact on board and proceed, without – for instance – co-signing an article that explicitly advocates lockdown to stop geographic spread (i.e. creating effectively disease-free zones) as the only solution that will work, something not recommended in any of the ICL or Harvard papers, insofar as I’ve read and understood them.

Closing comments

I realize this is likely to sound like I’m picking nits with phrasing, or perhaps like I’m fixating on a case where you said I was wrong and bloviating until you concede I was “right.”

If I’m kind of unduly fixated on Bach’s article, well … I guess I just think Bach’s article was really bad, although it happened to teach many of us a 101-level fact for the first time.  I may be more confident in this judgment than you, but it doesn’t sound like you were incredibly impressed either – Bach was the first person you saw saying a true thing you didn’t understand until people said it less badly.  

If the best sources for basic information are this polluted with badness, then the supply-side is really messed up and someone less inadequate needs to step up and fix it.  Meanwhile, we should acknowledge the badness and accord no points for merely showing up, because that will mislead people and redistribute a maxed-out attention budget towards the consumption of misleading material.

Or, if there are better sources out there, they really need to be boosted and actively suggested as substitutes for their worse counterparts.  Until Carl Bergstrom gets a Medium account, the best distiller/synthesizer available who writes in a digestible format might well be Pueyo, and his confidence + lack of domain background make me wary.  And he’s the best – there are worse ones.  In relative terms these people may be the best we have, but absolute terms are the ones that matter, and the ones we should apply and communicate.

You are already forming your own model, distinct from these writers’, and in my opinion almost certainly better.  That model could be valuable.  Promoting worse models as stand-ins for it is not valuable.  If your defense of Bach is that he caused you to update a piece of your model, then you are not saying Bach is right – you’re saying, like it or not, that you are.

“Flattening the Curve” is a deadly delusion →

humanfist:

nostalgebraist:

I’ve seen this post going around, so I’ll repost here what I wrote about it in a Facebook comment.

This article simply does not make sense.  Here are some of its flaws:

- It assumes the time course of the epidemic will have a Gaussian functional form.  This is not what exponential growth looks like, even approximately.  Exponential growth is y ~ e^x, while a Gaussian’s tail grows like y ~ e^(-x^2), with a slower onset – the famous “light tails” of the normal distribution – and a narrow, sudden peak.  I don’t know why you’d model something that infamously looks like y ~ e^x as though it were y ~ e^(-x^2), even as an approximation, and the author provides no justification.

- Relative to a form that actually grows exponentially, most of the mass of a Gaussian is concentrated right around the peak.  So the top of the peak is higher, to compensate for the mass that’s absent from the light tails.  Since his conclusions depend entirely on how high the peak goes, the Gaussian assumption is doing a lot of work.

- No citation is provided for the 40%-to-70% figure, just the names and affiliations of two researchers.  As far as I can tell, the figure comes from Marc Lipsitch (I can’t find anything linking it to Christian Drosten).  Lipsitch derived this estimate originally in mid-February using some back-of-the-envelope math using R0, and has since revised it downward as lower R0 estimates have emerged – see here for details.

- In that Lipsitch thread, he starts out by saying “Simple math models with oversimple assumptions would predict far more than that given the R0 estimates in the 2-3 range (80-90%),” and goes on to justify a somewhat lower number.

The “simple math” he refers to here would be something like the SIR model, a textbook model under which the fraction S_inf of people never infected during an epidemic obeys the equation R_0 * (S_inf - 1) - ln(S_inf) = 0.  (Cf. page 6 of this.)

Indeed, with R_0=2 we get S_inf=0.2 (80% infected), and with R_0=3 we get S_inf=0.06 (94% infected).  So I’m pretty sure Lipsitch’s estimate takes the SIR model as a point of departure, and goes on to postulate some extra factors driving the number down.
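(If you want to check those numbers yourself, the final-size equation is easy to solve numerically.  A quick sketch using plain bisection – the bracket (0, 1/R_0] is where the non-trivial root lives when R_0 > 1, since the trivial root S_inf = 1 sits outside it:)

```python
import math

def final_size(R0, tol=1e-10):
    """Solve R_0 * (S_inf - 1) - ln(S_inf) = 0 for the fraction never infected."""
    f = lambda s: R0 * (s - 1.0) - math.log(s)
    lo, hi = 1e-12, 1.0 / R0   # f(lo) > 0 and f(hi) < 0 whenever R0 > 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(final_size(2.0))  # ~0.20, i.e. about 80% eventually infected
print(final_size(3.0))  # ~0.06, i.e. about 94% eventually infected
```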

But the SIR model, like any textbook model of an epidemic, produces solutions with actual exponential growth, not Gaussians!  There is no justification for taking a number like this and finding a Gaussian that matches it.  If you believe the assumptions behind the number, you don’t actually believe in the Gaussian; if you believe in the Gaussian (for some reason), you ought to ignore the number and compute your own, under whatever non-standard assumptions you used to derive the Gaussian.

- What’s more, he doesn’t say how his plotted Gaussian curves were derived from his other numbers.  Apparently he used the 40%-70% figure together with a point estimate of how long people spend in the ICU.  How do these numbers lead to the curves he plotted?  What does ICU duration determine about the parameters of a Gaussian?  Ordinarily we’d have some (simplified) dynamic model like SIR with a natural place for such a number, and the curve would be a solution to the model.  Here we appear to have a curve with no dynamics, somehow estimated from dynamical facts like ICU duration.

- Marc Lipsitch, on his twitter, is still pushing for social distancing and retweeting those “flatten the curve” infographics.  I suppose it’s conceivable that he doesn’t recognize the implications of his own estimate.  But that is a strong claim and requires a careful argument.

I don’t know if Lipsitch has read this article, but if he has, I imagine he experienced that special kind of discomfort that happens when someone takes a few of your words out of context and uses them to argue against your actual position, citing your own reputation and credibility as though it were a point against you.
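(The exponential-vs-Gaussian contrast in the first bullet is easy to check numerically.  The growth rate and the Gaussian’s peak location and width below are made up for illustration: on a log scale, exponential growth has a constant slope, while a Gaussian onset’s slope keeps changing as the peak approaches.)

```python
# log of e^(0.5 t): linear in t, so constant slope (constant doubling time).
# log of a Gaussian peaking at t = 10: quadratic in t, so the apparent
# growth rate is different at every point in time.
ts = [0, 1, 2, 3, 4]
exp_logs = [0.5 * t for t in ts]
gauss_logs = [-0.5 * (t - 10) ** 2 / 25 for t in ts]

exp_slopes = [exp_logs[i + 1] - exp_logs[i] for i in range(4)]
gauss_slopes = [gauss_logs[i + 1] - gauss_logs[i] for i in range(4)]
print(exp_slopes)    # [0.5, 0.5, 0.5, 0.5]
print(gauss_slopes)  # strictly decreasing as t nears the peak
```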

I dislike that this sloppiness is present in the main anti-flattening article, but at the same time I have yet to hear a single flattening proponent give any sort of model-based estimate for how long social distancing would have to last, despite this being one of the main factors determining whether flattening is a viable strategy.  And that’s despite my having read more flattening-related articles than is probably healthy and asked this question directly on several occasions (though the sheer firehose of information does mean I could have missed something).

I’ve probably read less of this stuff than you, but personally I get the sense that epidemiologists are being cautious about quoting concrete numbers because they tend to get misunderstood, misused, or just fixated on to an inappropriate degree.

The 40%-to-70% figure, for example, was a very rough estimate based on the reasoning “it should be somewhere below the number I get out of a simple SIR model, and somewhere above the numbers from 2 historical examples.”  It was based on an early estimate of R_0 that’s higher than more recent estimates, and it doesn’t capture how the outcome varies with the interventions you perform (because those change R).  But it’s still being widely quoted and used in other people’s back-of-the-envelope calculations.

I imagine that concrete numbers about social distancing, from a similarly reputable researcher or group, would likewise undergo “community spread” and acquire an aura of being “the estimate” – which could actually be a downgrade in public knowledge, insofar as the conclusion “social distancing is helpful” can be drawn much more confidently than any particular quantitative version of it.

I am not an epidemiologist myself and only know what I’ve read in the last few weeks, so take everything I say (including OP) with a correspondingly sized grain of salt, but … my impression is that model-based quantitative estimates are hard, because everything is sensitive to the details of numbers like R, which interventions will change to some extent, but not to an extent we can know with any quantitative precision.  Meanwhile, we have some compelling case studies – comparing US cities in 1918, or Hubei vs. the rest of China in 2020 – suggesting that social distancing works extremely well.

If we use a mathematical model, we have enough degrees of freedom (especially if it is even remotely realistic), and enough uncertainty associated with numeric inputs like R_0/R, that we can probably generate a whole range of estimates that make social distancing look relatively good/bad, short/long, etc.

Because social distancing and other interventions will push R downward to some extent, they will not just “flatten” a constant-mass curve but actually lower the total number of people who are ever infected (yet another problem with the OP is that it ignores this!).  So very optimistic estimates about this effect could yield very optimistic conclusions, e.g. the extreme case where R gets close to 1 and the thing just fizzles out.  That extreme may feel unrealistic, but rejecting it on the grounds of “feeling unrealistic” is not a model-driven conclusion, it’s guesswork based (at best) on case studies that kind of passes through a mathematical model, superfluously, on its way to becoming a conclusion.  Might as well just skip the model and say “the case studies show you should do social distancing fast and hard,” which is what the experts are doing.  See e.g. the paper on Wuhan vs. Guangzhou by Li, Lipsitch and others, which basically says “fast and hard interventions saved Guangzhou, so they should be done in the US” without explicitly modeling what the latter might look like.

It’s reminiscent of the persistent situation in some parts of economics, where it’s easy to make memorable and memetic qualitative arguments that something is good or bad – stuff like the broad idea of gains from trade, analogous here to “flatten the curve!” – and it’s also easy to produce compelling case studies in which something appeared to succeed or fail.  But if you try to bridge the two with a more quantitative, “crunchy” math model, you have enough degrees of freedom that you can paint in virtually whatever details you want between the lines given by the other available information, or even stray outside those lines if you aren’t careful.  The tail is wagging the dog: at best you get out what you already knew, but you have to do a lot of work to even achieve that, and even then you’ll end up with the false precision of the sci-fi character who reports “the ship has a 98.7594738% chance of blowing up in the next 60 seconds.”

(Final disclaimer: again, I am not an epidemiologist!!)

@necarion​ (thread clipped for space)

If you had a hyperbolic latent space model *(pun brain, being hyperbolic: absolutely the best possible approach, there is no latent space model better)*, where the encodings and relationships were learned by the classifier, isn’t there a problem where “depth” in the hyperbolic space would start to become an overwhelming factor in the distance metric? Like, if you allowed for lots of space between “oncologist” and “dermatologist”, wouldn’t you also end up with a lot of space between either and “doctor”? I could see some silly results, like there being a smaller distance between “doctor of philosophy” and “doctor” than between “doctor” and “oncologist”. Or am I getting the approach wrong?

I think you’re right about how the distance metric behaves (not completely sure), but you’re assuming we want the distance metric to measure conceptual similarity, and we don’t necessarily need that.

Intuitively, what makes concepts similar or dissimilar has a lot to do with the kind of thing they point to (their position on the non-depth axis), and not as much to do with the specificity level of their pointing (position on the depth axis).

This is like a continuous/fuzzy version of the child/ancestor relations in the underlying tree structure: “oncologist” is inherently similar to “medical doctor” because it’s a child of “medical doctor” in the tree, a property enjoyed by any sub-sub-subtype of doctor but not by any kind of non-doctor.  But if you can embed trees in a continuous space, hopefully you can also derive useful continuous versions of important tree relations like parent/child, and you can use this rather than just distance when needed.  IIUC, “hyperbolic entailment cones” purport to provide just this.
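(For concreteness, here’s the Poincaré-disk distance applied to a made-up toy embedding – the coordinates are hypothetical, not learned from anything.  With the shallow parent near the origin and the deep children near the boundary, the parent ends up farther from its own children than the children are from each other, which is the depth-dominance behavior you’re describing.)

```python
import math

def poincare_dist(u, v):
    """Distance between points u, v inside the unit disk (Poincare ball model)."""
    du2 = sum((a - b) ** 2 for a, b in zip(u, v))
    nu2 = sum(a * a for a in u)
    nv2 = sum(b * b for b in v)
    return math.acosh(1 + 2 * du2 / ((1 - nu2) * (1 - nv2)))

doctor = (0.1, 0.0)           # shallow node, near the origin
oncologist = (0.9, 0.05)      # deep nodes, near the boundary
dermatologist = (0.9, -0.05)

print(poincare_dist(oncologist, dermatologist))  # ~1.0
print(poincare_dist(doctor, oncologist))         # ~2.8
```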

So, if the hyperbolic metric doesn’t correspond better to intuitive similarity, what advantage am I claiming for it?  Well, the distances between things matter in NN training even before we impose any interpretation on them, because they affect gradients / interact with regularization.  This is hand-wavey, but IMO it’s bad if your parameters require tuning at too many different scales at once, and it will tend to leave some scales neglected by the optimizer in favor of others.

(Fine-tuning weights is costlier than setting them to just anyplace in a coarser range of values; learning a new fine-scale distinction costs about as much as refining the details of a coarse-scale distinction you mostly know already.  So the optimizer might never learn “oncologist,” preferring to invest ever further in refining the exact edge of the doctor vs. non-doctor boundary.  We think those aren’t equally important, and we need to convey that in the metric.)

(via necarion)

@marlemane (thread snipped for length)

Is it really necessary to embed your graph in a space? There’s a perfectly fine notion of distance on graphs you can define without respect to any embedding.

As I understand it, you’re mainly using the embedding of the graph into space so that you can classify by separation with hyperplanes, right? That is, a NN is a nonlinear map on your space that sends “x-like” to one side of the plane and everything else to the other side.

But couldn’t you hypothetically have a concept tree and a NN that tracks down branches based on the input? Something like object —> animate —> mammalish —> dog —> husky. A directed graph can even accommodate partially overlapping categories in a way that metric embedding necessarily cannot, so that you can also get to husky by object —> animate —> soft animate! —> husky.

As a purely anecdotal point, this model feels much more like how my daughter learned the world. She first learned objects, then animals as a category, then dogs, then specific breeds.

I’m not sure I understand your argument, but here are some stray comments:

The most interesting thing here is not picking nodes from graphs already known in advance, but learning graph structure automatically from data.  Although something that helps you do the latter will generally help with the former too.

There’s inherent value here in knowing that you can embed something in a differentiable manifold, because an NN is a machine for “learning” mappings between differentiable manifolds.  (They have to be differentiable because the “learning” involves using derivatives.)

Of course, lots of NNs have outputs that don’t live on manifolds.  Like discrete labels, or just True vs. False.  But if you look under the hood, these are really just compositions of two pieces:

  1. A map X -> Y between two manifolds, which is learned from data in a complicated way (with 99% of research energy going into the complications of this step)

  2. A simple, fixed, user-supplied map Y -> Z between the output manifold of step 1 and the actual output space Z

In classification by hyperplanes, for example, step #1 is everything up until the point where you have all the signed distances from the hyperplanes, and then step #2 is where you read off which of those distances is highest.
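(A minimal sketch of that decomposition – the weights are random placeholders standing in for whatever step #1 actually learns:)

```python
import random

random.seed(0)
# Step 1: learned map X -> Y.  Here, a toy linear layer producing signed
# distances from 3 hyperplanes in a 4-dimensional input space.
W = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
b = [random.gauss(0, 1) for _ in range(3)]

def step1(x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk
            for row, bk in zip(W, b)]

# Step 2: fixed, user-supplied readout Y -> Z: report which distance is highest.
def step2(y):
    return max(range(len(y)), key=lambda k: y[k])

label = step2(step1([0.5, -1.0, 2.0, 0.0]))  # a discrete label in {0, 1, 2}
```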

Thus, “under the hood,” an NN is always learning to select points on a manifold.  There may be an additional step of translation/interpretation which converts the thing the NN naturally does (“I have selected this point”) to a judgment we care about (“the picture is a dog,” or something).

But this only works insofar as these judgments are actually well-modeled by selecting points on a manifold.  If your output space Z has some property you care about, it matters whether that property can be “translated” into some property defined on manifolds.

——

Here’s an example.  Imagine the elements of Z are truth-assignments on a boolean algebra.  In principle, for your map Y -> Z from the manifold Y, you could choose anything whatsoever; you could carve up Y into whatever subsets you want and give each one some arbitrary truth-assignment.  But you’d have to make sure that all these truth-assignments were consistent, obeying the rules of Boolean algebra – this would be “your job,” and not something that happens automatically.

On the other hand, suppose you choose Y -> Z in a particular way, with conjunctions in the algebra always translating into set intersections on the manifold, and disjunctions translating into set unions.  Then the rules of the algebra will always be obeyed, “for free,” in the output you get.  A Boolean-algebraic structure was already there in the manifold, so the outputs of the manifold-learner already had that structure, even before you did any interpretation.
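(A toy version of that “for free” consistency, with propositions as regions of the manifold Y = R – both predicates are made up:)

```python
A = lambda y: y > 0.0              # region assigned to proposition A
B = lambda y: y < 2.0              # region assigned to proposition B
A_and_B = lambda y: A(y) and B(y)  # conjunction -> intersection
A_or_B = lambda y: A(y) or B(y)    # disjunction -> union

# Boolean laws now hold at every point of Y with no bookkeeping on our
# part: e.g. the conjunction entails each conjunct, and absorption holds.
for y in (-1.0, 1.0, 3.0):
    assert not A_and_B(y) or A(y)                     # (A and B) -> A
    assert A_or_B(y) == (A(y) or A_and_B(y) or B(y))  # absorption
```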

——

Likewise, in the case of graphs, you can always find some way to map a manifold Y onto some particular graph Z.

But if you know graphs of some kind can be embedded in Y without distortion, that means the structure is “already there” in Y, like the Boolean-algebraic one.

So, you can have hope that a generically powerful manifold learner for Y will also be a generically powerful learner for those graphs – by virtue of its manifold-learning powers alone.  You can have hope that manifold learning will naturally and automatically pick up this kind of pattern in the data (because it is a kind of pattern natural to manifolds, which a good manifold learner ought to care about).  You no longer need to worry about the tension between the problem you care about and the “manifold version” of it which the learner cares about – the “manifold version” of the problem just is the problem.

(via marlemane)

the-real-numbers:

necarion:

nostalgebraist:

jadagul:

This looks cool and I need to read it later.

the-real-numbers:

Just, uh, gonna leave this here for… reasons

https://arxiv.org/pdf/1610.08401.pdf

(Tagging @stumpyjoepete​ since he tagged me on this post)

This is definitely a cool result.

It’s an extension of previous adversarial example work, showing that you can find a single adversarial perturbation  – i.e. a very faint, nearly imperceptible pattern you can layer on top of an image that will cause neural net classifiers to mis-classify it – that works generically for any image in the standard ImageNet challenge dataset.  These even generalize across different classifiers, to some extent.
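(Mechanically, “universal” just means one fixed pattern v is reused for every input.  A toy sketch of the application step only – all values are made up, and the real work in the paper is *finding* a v that fools a trained classifier, which this doesn’t attempt:)

```python
def perturb(x, v, eps=0.04):
    """Add one fixed perturbation v (scaled to max magnitude eps) to image x."""
    m = max(abs(a) for a in v) or 1.0
    return [min(1.0, max(0.0, xi + eps * vi / m)) for xi, vi in zip(x, v)]

x = [0.2, 0.5, 0.99, 0.0]    # an "image" as a flat list of pixel intensities
v = [1.0, -1.0, 1.0, -0.5]   # the same v would be applied to every image
x_adv = perturb(x, v)        # stays in [0, 1], differs from x by at most eps
```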

My strong hunch is that this is a “feature, not a bug,” and reflects the inherent mismatch between the ImageNet challenge and real vision, rather than reflecting a flaw in neural net image classifiers.

The paper doesn’t draw this conclusion, but it contains various pieces of evidence pointing in that direction, IMO.  Namely:

  • As mentioned, if you design one of these “universal perturbations” to target one classifier, it will also tend to fool other classifiers, even those with very different architectures.

    This increases the burden of proof for someone arguing that this reflects a flaw in how these models classify images: this person would not be arguing just that some architecture has a blind spot, they’d be arguing that many apparently distinct architectures somehow have the exact same blind spot.

    On the other hand, the different architectures have this in common: they’re all good at the ImageNet challenge.  So if “susceptibility to universal perturbations” is actually a natural result of being good at ImageNet, it’s no surprise that all the architectures have that property.  (Humans find the ImageNet challenge difficult without special training, so it’s not a problem for this hypothesis that humans aren’t thus susceptible.)

  • The authors do a finetuning experiment that tried to teach the VGG-F architecture not to misclassify the perturbed images.  This helped a little, but could not get the model below a “fooling rate” of 76.2%, which is still high.

    To explain this as a defect in the architecture, one would have to imagine that the universal perturbations are somehow “invisible” to it in a way that prevents it from learning a signal correlated with them; this seems implausible.  [ETA: of course the perturbations aren’t invisible to the models, otherwise they wouldn’t work.]  But if “don’t misclassify the perturbed images” actually competes with “do well at ImageNet,” then of course you won’t get very far on the former while still trying to preserve the latter.  (In this connection, note also the following: “This fine-tuning procedure moreover led to a minor increase in the error rate on the validation set […]”)

  • The incorrect class labels given to perturbed images tend to come from some very small set of “dominant” labels, as visualized in the directed graph.

    This made me think of a hypothesis like “there are a few classes in the ImageNet challenge that have certain distinctive visual patterns not shared by any other classes, and so the optimal way to identify these classes (in the context of the challenge) is just to check for these patterns.”

    This seems a priori plausible.  The ImageNet challenge asks for classification at a very fine-grained level, without partial credit for getting the right general sort of thing.  Many of the 1000 ImageNet challenge classes are specific species (or other low-level taxonomic group) of animal.  The images themselves, largely scraped from Flickr, are photographs of the animals (or other things) from numerous angles, in numerous contexts, sometimes partially obscured, etc.  In this context, developing a high-level concept like “bird” is actually quite difficult, and of limited value (no partial credit for knowing it’s a bird unless you can tell exactly what kind of bird).  But identifying the distinctive markings that are the hallmark of one exact kind of bird will work.

    When you get points for saying “African grey” but not for another kind of parrot, and you have to do this across diverse pictures of African greys, and you’re a neural net that doesn’t know anything at the outset, of course you’re going to develop a detector for some exact textural feature that only African greys have and use that as your African grey detector, and skip over the much harder task of developing detectors for “parrot” or “bird.”

    (African grey is in fact one of the dominant labels.  Macaw is another.)
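
For concreteness, the “fooling rate” idea can be sketched in a few lines.  This is a toy setup of my own, with a random linear map standing in for a trained classifier; nothing here is from the paper’s code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained classifier: a fixed linear map from
# "images" (flattened to 64 dims) to logits over 10 classes.
W = rng.normal(size=(10, 64))

def predict(x):
    """Predicted class indices for a batch of inputs."""
    return (x @ W.T).argmax(axis=-1)

def fooling_rate(images, v):
    """Fraction of images whose predicted label changes when one
    fixed perturbation v is added to every image."""
    return float((predict(images) != predict(images + v)).mean())

images = rng.normal(size=(1000, 64))
v_big = 10.0 * rng.normal(size=64)  # one large perturbation for every image

print(fooling_rate(images, np.zeros(64)))  # 0.0: zero perturbation fools nothing
print(fooling_rate(images, v_big))         # large: most predictions flip
```

The “universal” part of the real result is that a single, *small* v does this across many images and many architectures at once; the toy version only shows how the metric itself is computed.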

The authors do this other thing where they look at the singular values of a matrix whose columns are the vectors from each image to its nearest decision boundary, and show that these vectors point in some orientations much more often than others.  I’m not sure I understand this part – isn’t it just a restatement of the result, rather than an explanation of it?  (If this were false, wouldn’t the result be impossible?)

Anyway, this way of describing the situation – “the nearest decision boundary is frequently in a specific direction” – needs to be interpreted in light of the dominant-labels thing.  It would be different, and arguably more interesting, if there weren’t dominant labels, or if they weren’t quite so dominant; in that case the result would mean that the models identify certain textural differences as inherently “salient for distinctions.”

Instead, it just means that the models make some distinctions differently than others.  Some distinctions are made in a more “realistic” way, on the basis of higher-level features that correspond to different pixel-level variations depending on what base image you’re varying.  And then, some are just simple pattern detectors that always look about the same on the pixel level.  And again, that’s not really surprising.  Distinguishing bird from non-bird is a high-level judgment, but distinguishing one species within birds really is a matter of looking for one telltale pattern that’s relatively stable across orientations.

Now, if you’re a human who has to track objects over time, understand salient categories like “is this animate?”, and so on, you will tend to make the “YES-bird” and “YES-African-grey” judgments simultaneously.  Thus it sounds bizarre for something to say “YES-African-grey” when it’s looking at a bathtub that happens to have a bit of the African grey texture sprinkled on top.  But if you’re an ImageNet challenge machine, the “YES-bird” judgment doesn’t even exist in your universe.  In the toy ImageNet universe, in fact, it is arguably not even wrong to classify that bathtub as an African grey – for in that universe, there are no birds as such, and there is no such thing as a bird for a bathtub to be distinctively not.

Are there CNN training sets that include these hierarchies?  So that something could be an African grey and a parrot and a bird?  Or modifying the network to go through some sort of word embedding, so that results that are particularly closely clustered might be “partly” acceptable to the training?

There are CNN data sets that have hierarchical classes in the DSP/ML space.  I’m not sure how available they are to laypeople.  Sometimes you can handle the subclass/superclass problem by classifying on the subclasses and adding a loss term for the superclasses/categories, although I imagine you could also try having one “head” CNN for superclasses that passes the processed images off to various trunk networks for subclassing.

But suppose, for example, that it’s hard to tell the difference between a titmouse and a pug.  The traditional superclass approach may send titmice to the wrong subclass net, and then you’re guaranteed to get a wrong answer.

Then again, you may find that you want to superclass based on the most-confused subclasses, which could mean training a subclassifier first and determining superclasses with a mutual-information approach (or by eyeballing a confusion matrix), then trying again.
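
That “additional loss term for superclasses” idea can be sketched like this, with a made-up hierarchy of six subclasses in two superclasses (everything here is illustrative, not from any particular library): superclass probabilities are obtained by summing subclass probabilities, and a weighted cross-entropy on them is added to the usual fine-grained loss.

```python
import numpy as np

# Hypothetical hierarchy: subclasses 0-2 belong to superclass 0 (birds,
# say), subclasses 3-5 to superclass 1 (dogs).
SUPER = np.array([0, 0, 0, 1, 1, 1])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hierarchical_loss(logits, sub_label, weight=0.5):
    """Cross-entropy on the fine label, plus a weighted cross-entropy on
    the superclass, whose probability is the sum of its subclasses'."""
    p_sub = softmax(logits)
    p_super = np.array([p_sub[SUPER == s].sum() for s in (0, 1)])
    return -np.log(p_sub[sub_label]) - weight * np.log(p_super[SUPER[sub_label]])

true_label = 1  # the second bird species
# Wrong species but right superclass vs. wrong about both:
right_super = np.array([5.0, 0.0, 0.0, -5.0, -5.0, -5.0])
wrong_super = np.array([-5.0, -5.0, -5.0, 5.0, 0.0, 0.0])
print(hierarchical_loss(right_super, true_label))  # smaller penalty
print(hierarchical_loss(wrong_super, true_label))  # larger penalty
```

This is the “partial credit” version: confusing one bird for another is cheaper than confusing a bird for a dog.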

A relevant, fairly new area of research that I find exciting is hyperbolic embeddings.  Some key papers are

  1. The original paper introducing them (or the one everyone cites, anyway)
  2. This paper which provided an important conceptual advance over #1
  3. This one which builds up some of the necessary building blocks for neural nets over these spaces

The idea behind hyperbolic embeddings is… hmm, let me describe it this way.  Suppose you have some hierarchically nested categories, and you’re trying to model them in Euclidean space in some way.

There are two (?) ways to do this (this distinction is mine, not from the above papers):

  • “Map” model: each category is a region of R^n, and the hierarchy’s nesting relation is represented by the R^n subset relation.  Like, “human” might be some blob of R^n, and “doctor” is a proper subset of that blob, and then “oncologist” is a proper subset of “doctor,” and so forth.

    This is like a map, where “doctor” is inside “human” the way “Colorado” is inside “U.S.”

  • “Tree” model: each category is a point in R^n, and the points are arranged like a literal picture of a branching tree structure.   If the tree(s) start at the origin, the nesting relation is represented by R^n vector magnitude, with more specific categories further from the origin.

Now, a downside of the “map” model is that finer-grained category distinctions are encoded as smaller distances in R^n.  This might sound natural (aren’t they “smaller” distinctions?), but the practical importance of a distinction doesn’t necessarily scale down with its specificity.  (Sometimes it’s very important whether a doctor is an oncologist or not, even though that’s a “fine-grained” distinction if your perspective also captures doctor vs. non-doctor and human vs. non-human.)

One might hope that the “tree” model could solve this problem: you can have each level “fan out” from the previous level in space, making its nodes just as far apart from one another as the nodes in the previous level.

But, in Euclidean space, there isn’t enough room to do this.  Deeper levels in the tree have exponentially more nodes, so you need exponentially more volume to put them in, but going further from the origin in R^n only gives you polynomially more volume.

However, hyperbolic space gives you just what you want: exponentially more volume as you go out.  Like in the famous Escher illustrations (visualizing the Poincare disk model of 2D hyperbolic space):

image

In the actual hyperbolic metric, the bats are all the same size.  A tree embedded in the Poincare disk model might look like (figure from the Poincare Embeddings paper):

image

where again, things don’t actually get closer together near the rim, they’re just visualized like that.
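
The Poincaré-ball distance formula (from the Poincaré Embeddings paper) makes the “things don’t actually get closer together” point concrete: two points the same Euclidean distance apart are far more distant, hyperbolically, when they sit near the rim.

```python
import math

def poincare_distance(u, v):
    """Distance between points inside the unit ball under the Poincare-ball
    metric: d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))."""
    sq = lambda p: sum(x * x for x in p)
    diff = [a - b for a, b in zip(u, v)]
    return math.acosh(1.0 + 2.0 * sq(diff) / ((1.0 - sq(u)) * (1.0 - sq(v))))

# Same Euclidean separation (0.02), near the origin vs. near the rim:
print(poincare_distance([0.00, 0.0], [0.02, 0.0]))  # ~0.04
print(poincare_distance([0.97, 0.0], [0.99, 0.0]))  # ~1.1, dozens of times larger
```

That blow-up near the boundary is exactly the “exponentially more volume as you go out” property that lets deep tree levels fan out without crowding.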

OK, so what does that have to do with the original topic?

Well, almost any classifier you encounter these days is going to do two things: map its inputs onto a (Euclidean) latent space in some complicated non-linear fashion, and then divide up that latent space into regions for the different labels.  (Usually the latter step is done with hyperplanes.)
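
As a sketch (random, untrained weights, purely to show the shape of the thing): the encoder is the complicated nonlinear part, and the readout’s class-vs-class boundaries are hyperplanes in the latent space, since the decision between classes i and j only depends on the sign of (W2[i] − W2[j]) · z.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(32, 64))  # encoder weights: input dim 64 -> latent dim 32
W2 = rng.normal(size=(10, 32))  # readout: one weight row (hyperplane normal) per class

def encode(x):
    """The complicated nonlinear map into latent space (here just ReLU(x W1^T))."""
    return np.maximum(x @ W1.T, 0.0)

def classify(x):
    """Linear readout over the latent space: the i-vs-j decision boundary
    is the hyperplane (W2[i] - W2[j]) . z = 0 -- linear in z, not in x."""
    return (encode(x) @ W2.T).argmax(axis=-1)

preds = classify(rng.normal(size=(5, 64)))
print(preds)  # five class indices in 0..9
```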

We’re discussing ways of letting the classifier “know” that the labels have a hierarchical structure, with some of them “going together” as part of a larger group, which might then be part of an even bigger group etc.

If we do this by allowing “partial credit” for labels in the same coarse class (as in @necarion​‘s word embedding proposal), this will encourage the network to put these labels close together in the latent space.  Which is like the “map” model: all the types of bird will get assigned to adjacent regions, and you could draw a big shape around them and say “this is ‘bird’.”  So at best we end up with the “map” model, with its “oncologist problem” as described above.

Alternately, you can actually change the model to explicitly encode the hierarchy – like what @the-real-numbers​ describes, where you have different classifiers for different levels.  This can let you get around the downsides of the Euclidean “map” model, because the different classifiers can operate only on their own scales: the coarse classifier that just has to output “bird” is free to squash lots of bird types close together in its latent space, while the intra-bird classifier gets a whole latent space just for birds, so it can make them far apart.

Suppose – as the hyperbolic embedding work suggests – that the discriminations we want out of the model cannot be mapped well onto distances in Euclidean space.

Then:

  • The partial-credit approach says “let’s just do the best we can in Euclidean space, with the nesting relation of an arbitrary hierarchy modeled by the subset relation on a Euclidean space learned from data with that hierarchy.”

    This provides an intrinsic model for “nesting” as a generic concept, but distances inside the same model don’t behave in all the ways we’d like (oncologist problem).

  • The multiple-classifier approach says “let’s give up on modeling the nesting relation of an arbitrary hierarchy; instead let’s tie ourselves down to one specific hierarchy, and design N copies of Euclidean space tailored for it.”

    This does not provide an intrinsic model for “nesting” as a concept – you’re tied to one particular case of nesting, expressed by the output code that maps the various latent spaces to parts of your specific hierarchy.

With hyperbolic latent space, hopefully you can model the nesting relation as a relation in the space (intrinsic) and still have the distinctions you want to make map naturally onto distances in the space (no oncologist problem).

tsutsifrutsi:

nostalgebraist:

I did finally finish the second season of Legion and … hoo boy

I was wondering where they were going with the mental illness theme, and uh, they definitely went somewhere with it, that’s for sure!  I kind of wish they hadn’t, now!

On the upside, the last few episodes were emotionally involving, had moments that felt real and raw, and made a (last-minute) attempt to move the show beyond mere stylish randomness.  On the downside, they were a complete mess that felt like two or more distinct storylines jammed together inconsistently and executed too fast, and – more egregiously – contained the most weirdly, brazenly incoherent and unreal portrayal of mental illness I’ve seen in mainstream “serious” fiction in a long time.

Honestly, I’m less angry about it than just plain confused how this thing got into the world in the first place.  Like, do the writers actually expect the audience to share their strange (and factually inaccurate) assumptions?  Are they knowingly straying from reality in favor of a stereotypical cartoon notion of “insanity,” and if so, how (and why) do they expect this to sync up with all the parts of the show that appear to be about real (albeit stylized) things happening to real humans?  (I am a bit angry that the social justice flavored critiques of the ending have taken this stuff completely in stride, but I guess that’s par for the course)

Specifically, the ending involves a long, elaborate set of conflations/confusions between:

1. Common, if awful, personality flaws that people can have without being mentally ill (and many do)

2. Psychopathy

3. Schizophrenia

For every pair of these (1+2, 2+3, 3+1), there are one or more moments where the two are implied to be the same thing, or to be connected by some deep link too obvious to spell out, or the like.  More on this under a cut because spoilers

Keep reading

(Stop me if I’m way off; I haven’t actually watched the show.)

That actually sounds kind of sensible? Like, he is a relatively normal “bad person”; but he is being gaslighted by a group of abusive “friends” into believing that he is a crazy bad person; and this is extremely traumatic, enough to cause a weird dissociative fugue in pretty much anyone, A Clockwork Orange style—but this person likely does have specific traumas that are being dredged up here, making this an even more triggering event. So he ends up painting an impressionistic portrait of a moment of feeling like a world-killing Evil Overlord (a Van Gogh mania in negative emotional valence—something usually more directly reacted to with self-harm, like with Van Gogh himself, or as described in, uh, that Nine Inch Nails album. You know the one. [All of them.])

Oh, and the beliefs of his “friends” are, I would guess, a statement about how society inescapably views mental illness and rotten character as two faces of the same coin. (See: everyone with NPD on Tumblr, who has to go around explaining all day that narcissism doesn’t somehow force you to do bad things to people, and nobody ever believes them and continues to think NPD by itself is a sufficient explanation for e.g. abusive parenting.)

Is this maybe a Poe’s Law thing? Is the story hitting you over the head insufficiently hard with the degree to which it’s implying that this is a societal satire: a portrait of a society that tries to both tell bad people they’re really just broken (medicalizing personality flaws) and broken people that they’re really just bad (moralizing mental illness)?

I like this from a pure “free play of interpretations” angle, but it is pretty inconsistent with the show’s moment-by-moment emotional cues (music, shot framing, etc.), and also inconsistent with stated authorial intent (in the interviews I link here).

I guess it’s conceivable that Noah Hawley is straight-up lying in interviews to maintain the sanctity of a planned, later twist – and this would have to be revealed as a twist, since if it’s “true” it has flown over the heads of every critic out there – but it seems implausible.

OTOH, now I’m thinking back to the end of the first season, when @disconcision predicted the show would eventually reveal the superhero stuff was all in his head the whole time.  Back then I was like “nah,” but after Season 2 I’m starting to think that’s the sort of “shocking” cliche this show would embrace, and it would provide room for Hawley’s statements to be technically true, correctly describing the current trajectory of the superhero narrative and implicitly silent about the mundane-reality narrative – or at least only applicable to it when translated across a bridge of metaphor.

Keep reading

(via tsutsifrutsi)

uncursedslimemold:

nostalgebraist:

I did finally finish the second season of Legion and … hoo boy

I was wondering where they were going with the mental illness theme, and uh, they definitely went somewhere with it, that’s for sure!  I kind of wish they hadn’t, now!

On the upside, the last few episodes were emotionally involving, had moments that felt real and raw, and made a (last-minute) attempt to move the show beyond mere stylish randomness.  On the downside, they were a complete mess that felt like two or more distinct storylines jammed together inconsistently and executed too fast, and – more egregiously – contained the most weirdly, brazenly incoherent and unreal portrayal of mental illness I’ve seen in mainstream “serious” fiction in a long time.

Honestly, I’m less angry about it than just plain confused how this thing got into the world in the first place.  Like, do the writers actually expect the audience to share their strange (and factually inaccurate) assumptions?  Are they knowingly straying from reality in favor of a stereotypical cartoon notion of “insanity,” and if so, how (and why) do they expect this to sync up with all the parts of the show that appear to be about real (albeit stylized) things happening to real humans?  (I am a bit angry that the social justice flavored critiques of the ending have taken this stuff completely in stride, but I guess that’s par for the course)

Specifically, the ending involves a long, elaborate set of conflations/confusions between:

1. Common, if awful, personality flaws that people can have without being mentally ill (and many do)

2. Psychopathy

3. Schizophrenia

For every pair of these (1+2, 2+3, 3+1), there are one or more moments where the two are implied to be the same thing, or to be connected by some deep link too obvious to spell out, or the like.  More on this under a cut because spoilers

Keep reading

By your description, this storyline sounds like something written by a person who was victimized by a (1) and then stewed about it for a long time afterwards.  Maybe someone who was in a relationship with a person with (1)-type problems who treated them badly.

While ruminating about it afterwards, they found it satisfying to imagine that the person who hurt them was also (2) and (3), that they were a truly twisted monster. (And thus also that there’s no doubt about who was in the right in their conflict.)

Then the person wants to share their thoughts about what a 1+2+3-monster the other person was. So they include such a character in a storyline of a script they’re writing. Then they fantasize about how the other person will watch the show and recognize that the insane, evil character is modeled after them, and be crushed as they realize what a bad person they are.

I doubt it’s that vindictive, as the show spent a lot of time (nearly 2 seasons over 2 years) treating the character as a sympathetic protagonist before this final “twist.”  But it would make sense for it to be driven by bad experiences with David’s brand of (1), since the depiction of that is so convincing at the same time everything else pointedly isn’t.

Having read some interviews with Noah Hawley, I get the sense he really is very confused about how mental illness works IRL – in this interview he seems to think it can be treated by “love,” but in a deterministic way, so that the moment the relationship goes bad, you relapse.  So he’s able to write a real and convincing story about a relationship going bad due to (1), but then he thinks that can cause (2) and (3), which he thinks are the same, and perhaps even the same as (1) in some sense.  There’s probably a more charitable reading of all this, but I’m honestly not sure what it is.

What I’m really curious about, though, is whether Hawley realizes how the (fascinating) final scene actually looked.  More spoilers

Keep reading

(via uncursedslimemold-blog-deactiva)

typicalacademic:

nostalgebraist:

Automatic parsers for natural language are pretty good these days.  I use the spacy one all the time, and although it occasionally makes mistakes, it’s reliable enough that almost all of my parsing-related bugs come from code I put on top of it (or from ungrammatical input).

This makes me very curious why people don’t use them as components in deep learning architectures for text.  For neural machine translation, chatbots, etc., the popular models all use “attention” modules that emphasize certain parts of the (representation of the) input when producing each part of the output, or “self-attention,” which does a similar thing inside the encoder and decoder (not between them).  This allows them to sort of learn how syntax works.  But everything is still tied to this idea of a sentence as a “sequence,” where you say “okay, I’m producing word #7, what information do I need to do that?”

This is a weird question, because “word #7 in a sentence” is not a natural category, and the relevant information depends on what word #7 is doing syntactically, among other things.  (N. B. there are fancier positional encodings than just word #, but they’re all positional.)

If you’ve written the six words “I, who enjoy tasty food, will” then the next word is going to be a verb with word #6 as its auxiliary and word #1 as its subject, and words #2-5 are only relevant for semantic context.  OTOH if you’ve written “When choosing a restaurant, I usually” then the next word will be a verb with word #5 as its subject, word #6 as an adverb, and words #1-4 are only relevant for semantic context.  Etc.

It seems much more natural to have a decoder that makes a syntactic tree piece-by-piece, rather than a sequence, with the words in the tree ending up wherever they have to be.  Likewise, we could have the encoder take a syntactic tree as input, and perhaps use tree-like structures for the latent representation.  This means we don’t have to learn grammar on top of the rest of the problem domain, it ensures grammatical output, and it gives us representations of long-range dependencies that don’t degrade as we insert arbitrary numbers of words (relative clauses, etc.) in between.  Since we have good automatic parsers, we can automatically make trees to feed to the encoder, and we can automatically make training data for the decoder even if we don’t have a hand-parsed corpus for the problem domain.
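
Here’s a toy illustration of the long-range-dependency point, in plain Python (not spaCy, and with a hand-assigned parse): tokens are stored with head pointers, and the subject–verb link is a single tree edge no matter how much material the relative clause inserts between them in the sequence.

```python
def tree_distance(heads, i, j):
    """Edges between tokens i and j in a dependency tree given as a list
    of head indices (head -1 marks the root)."""
    def path_to_root(k):
        path = [k]
        while heads[k] != -1:
            k = heads[k]
            path.append(k)
        return path
    pi, pj = path_to_root(i), path_to_root(j)
    common = set(pi) & set(pj)
    # steps from each token up to the lowest common ancestor
    return next(n for n, k in enumerate(pi) if k in common) + \
           next(n for n, k in enumerate(pj) if k in common)

# "I who enjoy tasty food will eat" (0-indexed); hand-assigned heads:
# I->eat, who->enjoy, enjoy->I (relative clause), tasty->food,
# food->enjoy, will->eat, eat = root.
heads = [6, 2, 0, 4, 2, 6, -1]

# Subject "I" (0) and verb "eat" (6): six words apart in the sequence,
# one edge apart in the tree.
print(tree_distance(heads, 0, 6))  # 1
```

Lengthening the relative clause pushes the sequence positions arbitrarily far apart, but the tree distance between subject and verb stays exactly 1.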

If I weren’t so busy I’d be trying this out myself (and probably running into all sorts of unexpected pitfalls, but that’s research for you).

#admittedly this is all only as good as the parser and the parser may well be the kind of model i’m arguing against

yep, spaCy is built on top of those sequence models :P it’s actually a really cool architecture that slightly gets away from the “everything is a sequence” thing: a sequence model/RNN produces a “summary” of the sentence, but those summary vectors then get used to make decisions about how to add each word to the tree structure you’re building. (And people put attention in here too of course.) But most of the processing still happens in the sequence model, with very generic rules to help ensure the tree ends up semi-grammatical.

That said, using syntax features is really useful and a lot of neural models do actually still do it for more complicated tasks. Giant neural stacks just sound cooler and get more press. (also they work better right now but I feel like that’s at least partly a byproduct of the hype which shifts research focus, not the cause)

@disconcision​ said: any engagement with the object-level is apparently considered cheating

I mean, there’s reasons for that! Grammar is complicated and has lots of exceptions, and it’s different for every language. Good parsers for English are the results of insane amounts of effort, both exhaustive-search-via-grad-student for the best techniques and vast amounts of linguistic annotation. If that effort hasn’t been put into another language—say Hindi—then your parser sucks and will put an upper bound on the accuracy of anything you try to do with it.

Yeah, that all makes sense.  I guess what really frustrates me is the current state of affairs for people (like me) who want to use these technologies to do things.

The vast majority of ~fancy neural~ stuff out there, both in available pre-trained models and even papers I read, is entirely end-to-end.  There are exceptions, like using Inception features as input to some other thing, but most of the time (certainly in the neural NLP stuff I know about) it seems like we treat every task as completely distinct and train it end-to-end.

This is fine if you want to do exactly what some group of researchers have already done with a neural model (although if they haven’t made pre-trained weights available, training data may be a problem), but usually you aren’t, and having so little freedom to compose anything is weird and frustrating.  I wish there was more interest in neural components that consume or produce things other than “end” input and output.  Kinda feels like a world with no APIs or libraries where we have to rewrite all functionality from scratch to make one product, and then again from scratch to make the next.

(ETA: I guess pretrained word embeddings are one exception, so that’s nice)

otter4hwpdumplings:

nostalgebraist:

@slatestarscratchpad​‘s new post on stimulant prescribing and ADHD is good.

One thing I’m curious about that was not addressed in the post is the role, in all of this, of computerized tests – specifically, “continuous performance tests.”

I had to take one of these – the TOVA (Test of Variables of Attention) – when I went in to get tested for ADHD in 2014.  (I was in grad school at the time, and wanted to get tested for the same reasons as the “Senior Regional Manipulators Of Tiny Numbers” Scott talks about.)  The tester said I didn’t have ADHD, and at the time I assumed my normal TOVA results weighed heavily in her decision, and (also) that this was normally how such things were decided.

But Scott’s post makes it sound like the usual procedure is a lot more of a human judgment call.  He mentions a variety of things that prescribers do to make themselves feel better about their decisions, but none of them are “administer a computerized test with no human oversight and always follow what it says (or always do so unless you can think of a really good reason not to).”  If nothing else, this would certainly reduce worries about human biases.

I say “if nothing else” there because the same thing would be true of any such test, even if it had no diagnostic value at all.  (Then your decisions would suck – but even then, not because of your biases!)  However, tests like the TOVA may indeed have a lot of diagnostic value.  That is, they may have good sensitivity and specificity in discriminating controls from people with ADHD diagnoses***.

(There are even some studies showing it can discriminate these groups from people who are “faking bad,” i.e. malingering.  This makes some sense if the distribution is light-tailed, e.g. normal, so that if you overdo your faking by just a little bit you’ll stray from a region where 5% of the population lives into a region where only 0.01% of it does.)
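
A quick check of those numbers under the normality assumption (the cutoffs 1.645 and 3.72 SDs below are just the z-values corresponding to those two tail sizes):

```python
from statistics import NormalDist

def tail_fraction(sds_above_mean):
    """Fraction of a normal population at least this many SDs above the mean."""
    return 1.0 - NormalDist().cdf(sds_above_mean)

# Overshooting a "5% of the population" score by about two SDs lands you
# in territory where almost nobody lives:
print(tail_fraction(1.645))  # ~0.05    (5% of the population)
print(tail_fraction(3.72))   # ~0.0001  (0.01%)
```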


For one thing, if this is true, it means that we could just automate the whole process and get roughly the same results we were getting before, but without worries about human factors getting in the way.

Additionally, if true this is scientifically interesting, in part because of what it says about existing (non-computerized) diagnostic techniques.  Scott’s post describes a very fuzzy, human process with a lot of variation between clinicians.  But apparently this process has enough reliability to agree with a computerized test a lot of the time, which would not be a priori obvious.

Moreover, if (as Scott says) ADHD is one extreme of a continuous/unimodal distribution, then we could use the TOVA to figure out where clinicians are already implicitly setting the cutoff.  Scott writes:

We could still have a principled definition of ADHD. It would be something like “People below the 5th percentile in ability to concentrate, as measured by this test.”

We aren’t doing this, but what we are doing may be accidentally similar to it.  The Schatz et al. 2001 study, discussed further below, includes an ROC curve showing us how many false and true positives we get for various thresholds.  The thresholds are for “T scores,” which are like z-scores except that the mean is set to 50 and the SD to 10, so that e.g. a threshold of 65 (the recommended one) means you say everyone who’s 1.5 SDs or more above the mean of the reference population has ADHD.

If everything were normally distributed, you could get quantiles out of this, and translate clinical behavior into cutoffs separating X% of the population from  (100-X)% of it.  (Well, sort of – the “reference population” here is neither the full population nor the non-ADHD population, it’s sort of a mixture determined by the selection criteria used to make the normative stats.)  Of course, as usual, the people who made the reference stats don’t say anything about whether the distribution was normal.  But this kind of analysis could be done by someone, in principle, anyway.
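
Under that normality assumption, the translation from a T-score cutoff to a population quantile is one line (using the usual T-score convention of mean 50, SD 10):

```python
from statistics import NormalDist

def t_cutoff_quantile(t, mean=50.0, sd=10.0):
    """Fraction of the (assumed normal) reference population scoring below
    a given T-score cutoff."""
    return NormalDist(mean, sd).cdf(t)

# The recommended cutoff of 65 (1.5 SDs above the mean) implicitly flags
# roughly the top 7% of the reference population:
print(t_cutoff_quantile(65))  # ~0.933
```

(With all the same caveats as above about what the “reference population” actually is.)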


(***Caveat: the most widely cited study I could find on this is Forbes 1988, which – astonishingly – was not blinded.  That is, the TOVA was administered in the process of making the diagnostic decisions against which it was later compared, and its results were [Forbes’ words] “usually known before the final diagnosis was made.”  Forbes goes on to claim that different TOVA results would not have flipped any of the diagnoses, to which my reaction is “okay, great, so if that was true, why did you show them to the clinicians at all?”

However, there are also studies like Schatz et al. 2001 that give the TOVA to people who have already had a formal diagnosis done before the study started, and also to controls.  There are still worries like “are we sure the original diagnoses didn’t use the TOVA or a similar test?” and “given our screening procedures for controls, what base rate of undiagnosed ADHD should we expect in our control population, i.e. how sure are we that some of our control ‘false positives’ weren’t true positives?”, so I’m still not impressed with the evidence quality I’ve seen.  That said, if you grant for the sake of argument that Schatz et al. did things right, they get good sensitivity/specificity results too.  Oddly, they interpret their results as bad news for the TOVA, on the basis that it does worse than a test based on parent ratings – but since the original diagnoses themselves involved parent ratings, this doesn’t seem like a fair/useful basis for comparison.)

I use executive function tests like the TOVA in my research. The idea of placing anything except a very small amount of weight on their results for the purposes of a diagnosis makes me pretty uncomfortable.

Most good executive function tasks have low between-subjects variability (like the TOVA, Go-NoGo task, Flanker task, etc.), but this is also why they make pretty poor tools for establishing clear individual differences. This idea was explored quite explicitly in a recent paper (Hedge, Powell, & Sumner, 2017), where they evaluated the variance and test-retest reliability of seven commonly used response tasks.

You should honestly consider getting re-evaluated, if you believe that the TOVA was the primary diagnostic tool used to evaluate you. “Real” adult ADHD diagnoses include parent interviews, several scales (e.g., Brown ADD scales, non-ADHD tests, etc.), a fairly comprehensive assessment of your personal background, and so forth.

Also, a cursory survey of the sample sizes for these TOVA studies is pretty damning. Any individual difference study with a sample size under 100 (per group) should be thought of as only preliminary.

I also want to push back on @slatestarscratchpad​‘s apparent trivializing of the DSM for the purposes of diagnosis, although this is only done kind of facetiously (I hope, anyway). The potential for people to malinger the DSM is altogether irrelevant when your main objective is to correctly diagnose individuals who do genuinely suffer from some kind of mental illness. Symptom clusters are, at present, the best tool we have to diagnose individuals and recommend appropriate treatments. In regards to the idea that ADHD could be defined as “people below the 5th percentile in ability to concentrate, as measured by this test,” that test will probably never exist for any mental illness ever. Ever. There is not a single neuropsychological test today for any mental illness that is better than, or even near to being as good as, symptoms and symptom clusters at diagnosing an individual or customizing their treatment. Because identical symptoms and symptom clusters emerge out of a wide and even non-overlapping range of breathtakingly complex neurocognitive abnormalities, the likelihood that we will stumble on some test that correctly diagnoses the cluster of symptoms we call OCD 99% of the time, or even 95% of the time, is low.

Granted, mere symptoms are still not good enough to get people the right treatments, which is why there is a massive push among researchers to get clinicians and clinical researchers to approach mental illness the way the NIMH’s Research Domain Criteria (RDoC) project does. Abandoning the categorical approach of the DSM (“ADHD,” “unipolar depression,” etc.) would not only do more to actually help patients treat their symptoms, but might even solve the problem of bullshitting/malingering from all the Senior Regional Manipulators Of Tiny Numbers trying to extract drugs from their exhausted psychiatrists, in one fell swoop.

Oh my god I need to lie down.

A few reactions:

(1) Thanks for the link to the Hedge, Powell, & Sumner paper – looks very interesting.

(2) When I said I thought my (non-)diagnosis was largely based on the TOVA, I don’t mean that the evaluator just did a quick TOVA and sent me on my way.  She did a bunch of stuff – including an intelligence test (prorated WAIS), getting questionnaires (BAARS-IV) from me and my father and my girlfriend, some other tests, and a conversation about my personal and mental history – and sent me a 9-page report on all of it afterwards.

From my perspective, though, most of this was clearly kinda useless.  She dutifully collected a lot of different kinds of information, but on the evidence of the written report, she didn’t use it to form some sophisticated multi-dimensional view of my case.  In a way, the opposite was true: if she had spent the entire several-hour interaction looking at exactly one aspect of my case, she might have been able to drill down into subtle details, but since she broke the interaction up into many smaller bits, each bit was – of necessity – a lot shallower.

For instance, on the questionnaires, each of the three respondents (me/girlfriend/father) gave markedly different answers from the other two, but instead of diving further into this discrepancy, she just noted it and went on with her interpretations.  Likewise, she had trouble reconciling my appearance of high life satisfaction in the interview with my relatively dark answers on an emotional functioning questionnaire, but rather than explore that further, she just decided on an interpretation (roughly, “he has a lot of problems but is unusually OK with that state of affairs”) and ran with it in the report.  And so on.

Now, perhaps this was just a bad clinician, and what she gave me was still not a “real” adult ADHD test.  But everything I said above could apply just as well to an earlier neuropsych evaluation I had as a teenager (not for ADHD), and to evaluations I’ve heard about from friends.  By which I mean, even if there’s a Right Way to do this stuff, I don’t think I trust actual working clinicians to execute it reliably in the real world.  (This is not necessarily an insult; they’re busy and there are a lot of people out there to treat.)

This is all a roundabout way of saying that I had hoped her assessment was largely based on the TOVA, since the whole “holistically integrate many streams of information” thing clearly failed, as I’ve seen it do in other cases, and pretty much expect it to do in the typical real world case.  A simple computerized test, or a set of them, may be worse than an evaluation done the Right Way by an ideal practitioner – but as a patient I can only access real practitioners, not ideal ones, and I’m not sure I trust them any more than I’d trust some well-designed but completely automatic test.  (Probably less, TBH.)

(3) Relatedly – I don’t think @slatestarscratchpad is arguing against symptom clusters.  He’s talking, in part, about how the understanding of “the ADHD symptom cluster” that is actually applied in practice does not fit the science very well, which seems like the same kind of concern that motivates RDoC.

Whether or not scientifically motivated mental illness categories will ever be diagnosable via “a single test” seems to depend largely on what we count as “a single test.”  I take your point that a single neuropsychological test, in the sense we currently understand that phrase, is not going to be fully diagnostic, because mental illnesses involve more than one dimension of neuropsychological function.  But that doesn’t mean it isn’t possible to take our best understanding of all the dimensions involved, distill it, and make a brief, effective diagnostic tool that would fit the ordinary English meaning of the phrase “a single test.”  Cf. Scott’s old post “Does the Glasgow Coma Scale exist? Do comas?” (although I still disagree with him about the IQ case specifically).

(via otter4dumplings)