this sounds fantastic????
:D
I highly recommend giving it a try – I went into it with a “I’ll never finish this but it might be fun for 50 pages” attitude, and then never looked back once I’d started

Finally finished A Glastonbury Romance, just in time for the new year. You may remember this as the book I kept #quoting in ways that made people curious where the quotes were from – I know @baroquespiral asked, I think there was at least one other person but I can’t remember who.
Anyway, I wrote a review, which you can read here if you like
Back when I was an undergrad, I remember becoming convinced at some point that the “Generalized Stokes’ Theorem” was just a dirty trick which involved setting up non-obvious definitions with an eye to getting the curl and divergence theorems as special cases, and then claiming that the consequences of those definitions for higher dimensions were correct.
(Sort of analogous to fitting a line through two points and then claiming that since the line fits the two points, the sample from which the two points were taken must fit along the line. [Or, substitute almost any curve you like for “line” here.])
But by the time I became mature enough to realize that I must not be the only person in history to have thought of this “brilliant” idea, I didn’t remember enough about differential forms to really decide whether it made sense. Does anyone know whether this objection has been developed seriously?
If you’re suggesting that the proof of the Generalized Stokes’ Theorem involves a sleight of hand where you prove a few cases and then claim you’ve completed the proof in full generality, this isn’t true. Rigorous proofs of the Generalized Stokes’ Theorem exist.
If you’re suggesting that the only interesting cases of the Generalized Stokes’ Theorem are the divergence and curl theorems, this also isn’t really true. I don’t know much about the physics applications, but in differential geometry you need the theorem in its full generality to get an understanding of de Rham cohomology and connect it to other cohomology theories.
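For reference, the statement being discussed: if M is a compact oriented n-dimensional manifold with boundary and ω is a smooth (n−1)-form on M, then

```latex
\int_M d\omega = \int_{\partial M} \omega
```

The divergence theorem and the Kelvin–Stokes (curl) theorem are the cases where ω encodes a vector field in three dimensions, and the fundamental theorem of calculus is the n = 1 case.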
I’m not sure either of those is actually what you’re getting at, though. If it’s something else, could you clarify?
I don’t mean either of those. What I mean is: imagine you’re a physicist or mathematician sometime around 1800, and “differential geometry” in its modern sense isn’t really something you know about. But the divergence and curl theorems are known to be useful.
So now you might think, “maybe there’s some broader structure that includes these two things as special cases. I wonder what it is?” The question is: would you then be led, uniquely, to differential forms and the Generalized Stokes’ Theorem? If all you knew was that you wanted to come up with “a generalization” of those two theorems, is that the only nontrivial one you could come up with?
Statements like “you need it to understand de Rham cohomology” aren’t relevant here, because if that isn’t the only possible (nontrivial) generalization, then there might be other directions you could go which would get you things that aren’t quite the same as differential forms. If differential forms are just one of the directions you can go, then “it helps you understand stuff based on differential forms” doesn’t justify going in that particular direction. (If we’d gone in some other direction, we’d be saying “well, it helps us understand [other thing based on something else]!”)
I hope this is not too unclear. I’m pretty tired
This exposition by Terence Tao (http://www.math.ucla.edu/~tao/preprints/forms.pdf) makes what I think is a pretty convincing case that trying to generalize the idea of a signed integral leads you naturally to differential forms. They’re not just an algebraic magic trick that happened to be useful in other ways. This doesn’t mean they’re the only natural way to generalize the curl and divergence theorems, though. All I can say for that is that I’ve never heard of any others.
Reblogging this to mention that I finally understood this a little while ago, while reading Lawrie’s Unified Grand Tour of Theoretical Physics. Specifically, I want to note that I wasn’t really noticing anything deep in the above posts, just not fully understanding differential forms.
What had always bothered me was the (-1)^k in the product/derivation rule for the exterior derivative d. This flips a sign when the degree k is odd but not when it is even, which struck me as the sort of contrivance one might introduce if trying to “fit a curve through” the “data points” given by pre-existing results for the cases k=1 (FTC), k=2 (Kelvin–Stokes theorem), and k=3 (divergence theorem).
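For concreteness, the rule in question is the graded Leibniz rule: for a k-form α and any differential form β,

```latex
d(\alpha \wedge \beta) = d\alpha \wedge \beta + (-1)^{k}\,\alpha \wedge d\beta
```

The sign comes from sliding d past the k one-form factors of α, with each transposition picking up a factor of −1 from antisymmetry.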
I was aware that if you simply excluded the (-1)^k, the output would no longer be a differential form, because differential forms have to be antisymmetric. But this just pushed my suspicion back to the antisymmetry of differential forms.
However, I had also independently accepted that it made perfect sense (for non-“curve fitting”-related reasons) to make differential forms antisymmetric. I just never remembered I had done so when I was going through the line of thought just described.
(Differential forms need to be antisymmetric because – as Tao explains very clearly – we are trying to formalize integrals over oriented regions. To compute a flux, say, you need a surface normal, and if you want to describe the computation by specifying an area element, that element needs to know [implicitly] about the surface normal. I was an undergrad physics major so this all feels very familiar to me.)
So what was missing from my “curve fitting” argument was that we are fitting a curve through three points with an extra constraint, which is forced on us by other considerations. This still doesn’t pin down the result uniquely, since (just for the sake of argument) we could define a different exterior derivative with stupidly contrived extra terms that equal zero when k<4. But that really would be contrived, unlike the (-1)^k thing, which fits our 3 data points and our constraint, doesn’t treat any particular k as special, and enables further concepts that work nicely in arbitrary dimension. So I’m fully satisfied now.
Yeah, that’s been my impression. His nationwide fame did begin with The Art of The Deal, but The Apprentice and other media appearances spread the same image of Trump (“emblem of the successful tycoon” as this article put it) to an even wider audience.
Even then, Trump’s fame was of course as “successful tycoon” and not “TV actor,” but the main vector for the spread of this impression was media appearances.
(Celebrity chefs seem like a close analogy; yes, they are famous “as chefs” and not just as TV stars, but most of us have never eaten their cooking or even read reviews of it, and are aware of them as chefs only via seeing them on TV.)
anon you have a potentially good point, and an admirable knack for grandiloquent insults, but alas these two qualities alone do not make for a person worth engaging with on the internet
nonspoilery Terra Ignota thought: the characterization in Will to Battle was leagues better than the earlier volumes. in TLtL and especially Seven Surrenders, it often felt like there were a billion characters to keep track of, but most of them felt like the same unlikeable personality, whereas here the characters seem both way more differentiated and way more sympathetic
I’m still only 2/3 of the way through, but I agree. What has especially struck me about Will to Battle is that it’s the first book where I feel at home in the fictional world, able to make my own evaluations of characters and situations without worrying that another paradigm shift will hit in 20 pages and rewrite all the rules I thought I knew.
From the characters’ perspective, WtB contains by far the most earthshaking and disorienting events, but from my perspective the trajectory was the opposite. The first book was the most disorienting, despite retrospectively being mostly setup for the real action (!), while now the characters are experiencing a wilder ride than ever while I’m luxuriating in the ability to finally parse this series the way I would any ordinary novel. And – perhaps in anticipation of this effect – WtB works harder than the other books to supply the ordinary-novel sorts of joys, like well-differentiated characters that elicit strong emotional responses (and oh god, do they ever).
The following seemed mildly interesting when it occurred to me, YMMV:
The idea of President Donald Trump would be shocking to many/most people 10 or 15 years ago, and part of their shocked reaction would stem from the sense that Trump is not a “serious person.” We expect reality TV stars (even if they are also real-life businessmen) to be the sort of people satisfied with the insulated pseudo-importance of reality TV stardom. Not the sort of people who would reach further, for the importance that comes from holding the levers of real power, from controlling life outside the TV.
(Trump did run for president in 2000, but I didn’t know it at the time, and I don’t think many did.)
That much is, I think, obvious. What is not as obvious is that the idea of Russian meddling in the election would ping the same absurdity sensor. Because people are worried about Russia having swung the election by doing things like spamming a web forum (Facebook), and 10 or 15 years ago, web forums were also part of the “unserious world.”
(Indeed everything about Facebook would have had this sense of disorienting unseriousness about it, from its sophomoric origins as a tipsy college nerd’s Hot or Not website to the way that nerd, now one of our foremost business magnates, seems perpetually fixed in a state of looking about 15 years old.)
(Sort of inspired by this post by @deusvulture)
“people are worried about Russia having swung the election by doing things like spamming a web forum” to the tune of $100,000 over 3 years. I’m fairly certain I know at least three specific individuals who had more impact on the election (in expectation) than Russia’s ad buy.
The idea of Donald Trump being president would have been shocking to many people 14 months ago, and to me it *still* has a sense of disorienting unseriousness to it. I can only infer that about 48% of the population doesn’t regard him as unserious, perhaps thinking of him as a businessman rather than a reality tv host?
Tbh, to the extent that I had an opinion about them at all, many media appearances by past presidents have struck me as cringey or otherwise unserious, though I tended to assume that they were the devices of younger staffers rather than their own hearts’ desires. (Obama in particular might have been the most introverted president since Coolidge, and it’s hard to imagine him coming up with dancing for a daytime TV audience on his own.)
I wonder if the difference is that past unseriousnesses took place on prestige or at least neutral media outlets (SNL, Ellen, &c) whereas Trump is into more red-tribey stuff (reality TV, pro wrestling). To the extent that the media market continues to fragment, providing fewer consensus outlets, opportunities will increase for each tribe to take the other’s media spots as merely further proof of their unseriousness.
Lurking somewhere in the background is the fact that, without separate heads of government and heads of state, the head of government must embody the National Seriousness singlehandedly. Having Oprah be Queen of the United States of America would open up more space for a soporific technocrat to rule by her consent as President, but oh well.
Or (Thiel mode), Trump is unserious, but popular sovereignty is also unserious, so we might as well just formalize it.
Looking at Trump’s TV appearances vs. (say) Obama’s misses the elephant in the room, which is that Trump was most famous for his TV appearances when he announced his candidacy, while Obama and most (?) past presidents were famous for being politicians when they announced theirs.
Whether or not Obama looked “serious” dancing on Ellen in 2007 (or whatever), the public’s impression of his “seriousness” was formed mostly by things other than daytime TV. (Indeed, this kind of awkwardness from politicians is often seen as almost a kind of humblebrag – of course they’re not naturals at this, because they’re used to more “serious” pursuits.)
So I don’t think we need to look at differences between TV shows to make sense of this. Nonetheless, there’s something involving differences between TV shows that I find interesting here, and I haven’t found anywhere else to put it, so here goes:
People tend to speak of “reality TV” as a monolithic block, but IME the category lumps together two very different things. (I’m sure there are like 1000 thinkpieces on this, but I don’t know of any of them.)
First, we have shows with a documentary pretense – shows that purport to provide a window into real things the viewer might be curious about, such as industries (Kitchen Nightmares, Bar Rescue, Cake Boss, Hotel Impossible), recreational pursuits (Storage Wars, Toddlers & Tiaras), or special demographics of one kind or another (Kardashians, Duck Dynasty, Sister Wives, Little People Big World).
Some of these are much worse than others – some of them are utterly terrible – but they all share a dedication to maintaining suspension of disbelief. Although “we all know” these shows involve writerly oversight and don’t really present unedited reality, they at least look real. If I let myself forget my skepticism, it’s quite easy to get my brain to think I’m watching an honest documentary, at least for the span of an episode. I honestly find a lot of these shows really entertaining, and can see why they’re so popular.
But there’s a second category as well: shows in which everyone, including the audience, knows that the premise is something unique contrived for the purposes of the show. This includes most shows with the “voting off the island” structure, including The Apprentice. These shows tend to have a lot of drama, but the viewer is not meant to think they are being shown the real drama taking place in some corner of the human world. These shows aren’t exploring some quaint cultural tradition of “voting people off islands”; they impose that structure on their participants.
IME these shows tend to be far worse than the former category. In particular, the acting tends to be terrible – although it would be possible to make a show like this that suspends disbelief, these ones generally don’t, for whatever reason.
If you’ve only heard the premise of The Apprentice, you might be surprised to find it in this category. Isn’t it supposed to be about a real (if unusual) event – an actual businessman looking for an actual new apprentice? In fact I had always assumed this show was more like the ones I listed in the first category, and I was kinda shocked when I actually sat down and watched some episodes of it.
The Apprentice makes almost no pretenses of being about real business work, even though the business-related first-category shows demonstrate that it is possible to do this convincingly. In each episode, the characters are asked to pull off some business-related project entirely by themselves – from management to marketing to physical labor – with nothing like the compartmentalized division of labor used by real companies. (In one episode I watched, two groups were tasked with “a new product campaign for Burger King,” which in practice meant deciding on a new burger to promote, designing a marketing angle, and then running the product opening at a single franchise, with one guy outside the building hawking the burger and others doing various jobs inside the restaurant. Trump yelled at one group for not assigning enough people to be cashiers.) The tasks feel more like high school group projects than anything. And the drama is mostly generated by who each contestant personally dislikes (and wants to get voted off the show next).
To relate this back to the original discussion, if only tangentially: I think it is possible to discuss the badness of The Apprentice (and it is really bad) without making it about red vs. blue stuff. The Apprentice is bad in the way second-category shows are bad, and I don’t think the first/second category distinction has an overall red/blue valence.
Duck Dynasty, whose viewership is famously skewed toward the South (and toward Trump voters), is a first-category show. I’ve never actually watched it, but I remember having a lot of fun watching Sister Wives, whose viewership also skews toward red states (even if you discount Utah, where it is especially popular for obvious reasons). Another first-category show I’ve enjoyed is Bar Rescue, a show whose premise (a tough-talking dude gives brutally honest advice to failing bars) doesn’t exactly scream “for coastal liberals,” and whose audience appears to span the red/blue divide.
For comparison, here’s the Google Trends data for The Celebrity Apprentice (which I’m using because the original Apprentice stopped airing before Google “improved its geographical assignment” on 1/1/11 – I don’t know what that means exactly, but it sounds relevant).
(There’s probably something sociologically relevant about which of these shows are on cable, but I’ve already thought about this stuff enough for one day.)
(via lambdaphagy)
@slatestarscratchpad‘s new post on stimulant prescribing and ADHD is good.
One thing I’m curious about that was not addressed in the post is the role, in all of this, of computerized tests – specifically, “continuous performance tests.”
I had to take one of these – the TOVA (Test of Variables of Attention) – when I went in to get tested for ADHD in 2014. (I was in grad school at the time, and wanted to get tested for the same reasons as the “Senior Regional Manipulators Of Tiny Numbers” Scott talks about.) The tester said I didn’t have ADHD, and at the time I assumed my normal TOVA results weighed heavily in her decision, and (also) that this was normally how such things were decided.
But Scott’s post makes it sound like the usual procedure is a lot more of a human judgment call. He mentions a variety of things that prescribers do to make themselves feel better about their decisions, but none of them are “administer a computerized test with no human oversight and always follow what it says (or always do so unless you can think of a really good reason not to).” If nothing else, this would certainly reduce worries about human biases.
I say “if nothing else” there because the same thing would be true of any such test, even if it had no diagnostic value at all. (Then your decisions would suck – but even then, not because of your biases!) However, tests like the TOVA may indeed have a lot of diagnostic value. That is, they may have good sensitivity and specificity in discriminating controls from people with ADHD diagnoses***.
(There are even some studies showing it can discriminate these groups from people who are “faking bad,” i.e. malingering. This makes some sense if the distribution is light-tailed, e.g. normal, so that if you overdo your faking by just a little bit you’ll stray from a region where 5% of the population lives to a region where 0.01% of it does.)
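That tail argument can be sanity-checked with a few lines of standard-library Python (a sketch, assuming a standard normal distribution; the particular z-values are illustrative, not from any TOVA norms):

```python
import math

def normal_tail(z):
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# The top 5% of a normal population starts around z = 1.645.
print(normal_tail(1.645))  # roughly 0.05

# Overshoot your "faking" by a couple of SDs and you land in a
# region where only about 1 in 10,000 of the population lives.
print(normal_tail(3.72))   # roughly 0.0001
```

The point is just how fast a light tail thins out: a modest overshoot in z moves you from plausible to vanishingly rare.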
For one thing, if this is true, it means that we could just automate the whole process and get roughly the same results we were getting before, but without worries about human factors getting in the way.
Additionally, if true, this is scientifically interesting, in part because of what it says about existing (non-computerized) diagnostic techniques. Scott’s post describes a very fuzzy, human process with a lot of variation between clinicians. But apparently this process has enough reliability to agree with a computerized test a lot of the time, which would not be a priori obvious.
Moreover, if (as Scott says) ADHD is one extreme of a continuous/unimodal distribution, then we could use the TOVA to figure out where clinicians are already implicitly setting the cutoff. Scott writes:
We could still have a principled definition of ADHD. It would be something like “People below the 5th percentile in ability to concentrate, as measured by this test.”
We aren’t doing this, but what we are doing may be accidentally similar to it. The Schatz et al. 2001 study, discussed further below, includes an ROC curve showing us how many false and true positives we get for various thresholds. The thresholds are for “T scores,” which are like z-scores except the mean is set to 50 and the SD to 10, so that e.g. a threshold of 65 (the recommended one) means you say everyone who’s 1.5 SDs or more above the mean of the reference population has ADHD.
If everything were normally distributed, you could get quantiles out of this, and translate clinical behavior into cutoffs separating X% of the population from (100-X)% of it. (Well, sort of – the “reference population” here is neither the full population nor the non-ADHD population, it’s sort of a mixture determined by the selection criteria used to make the normative stats.) Of course, as usual, the people who made the reference stats don’t say anything about whether the distribution was normal. But this kind of analysis could be done by someone, in principle, anyway.
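To make the quantile translation concrete: T scores are conventionally mean 50, SD 10 (which matches the “1.5 SDs” reading of the T = 65 cutoff), so under a normality assumption a threshold converts to a population fraction like this (a sketch; the normality assumption is exactly the one the reference stats don’t document):

```python
import math

def t_score_tail(t, mean=50.0, sd=10.0):
    """Fraction of the reference population scoring above T = t,
    assuming the reference distribution is normal."""
    z = (t - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# The recommended cutoff T = 65 sits 1.5 SD above the mean:
# under normality, roughly the top 6.7% of the reference population.
print(t_score_tail(65))
```

So a clinic that consistently applied the recommended threshold would, in effect, be drawing the ADHD line at about the 93rd percentile of its reference population, if that population were normal.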
(***Caveat: the most widely cited study I could find on this is Forbes 1988, which – astonishingly – was not blinded. That is, the TOVA was administered in the process of making the diagnostic decisions against which it was later compared, and its results were [Forbes’ words] “usually known before the final diagnosis was made.” Forbes goes on to claim that different TOVA results would not have flipped any of the diagnoses, to which my reaction is “okay, great, so if that was true, why did you show them to the clinicians at all?”
However, there are also studies like Schatz et al. 2001 that give the TOVA to people who have already had a formal diagnosis done before the study started, and also to controls. There are still worries like “are we sure the original diagnoses didn’t use the TOVA or a similar test?” and “given our screening procedures for controls, what base rate of undiagnosed ADHD should we expect in our control population, i.e. how sure are we that some of our control ‘false positives’ weren’t true positives?”, so I’m still not impressed with the evidence quality I’ve seen. That said, if you grant for the sake of argument that Schatz et al. did things right, they get good sensitivity/specificity results too. Oddly, they interpret their results as bad news for the TOVA, on the basis that it does worse than a test based on parent ratings, but since the original diagnoses themselves involved parent ratings, this doesn’t seem like a fair/useful basis for comparison.)
I use executive function tests like the TOVA in my research. The idea of placing anything except a very small amount of weight on their results for the purposes of a diagnosis makes me pretty uncomfortable.
Most good executive function tasks have low between-subjects variability (like the TOVA, Go-NoGo task, Flanker task, etc), but this is also why they make pretty poor tools for establishing clear individual differences. This idea was explored quite explicitly in a recent paper (Hedge, Powell, & Sumner, 2017), where they evaluated the variance and test-retest reliability of seven commonly used response tasks.
You should honestly consider getting re-evaluated, if you believe that the TOVA was the primary diagnostic tool used to diagnose you. “Real” Adult ADHD diagnoses include parent interviews, several scales (e.g., Brown ADD scales, non-ADHD tests, etc), a fairly comprehensive assessment of your personal background, and so forth.
Also, a cursory survey of the sample sizes for these TOVA studies is pretty damning. Any individual difference study with a sample size under 100 (per group) should be thought of as only preliminary.
I also want to push back on @slatestarscratchpad‘s apparent trivializing of the DSM for the purposes of diagnosis, although this is only done kind of facetiously (I hope, anyway). The potential for people to malinger the DSM is altogether irrelevant when your main objective is to correctly diagnose individuals who do genuinely suffer from some kind of mental illness. Symptom clusters are, at present, the best tool we have to diagnose individuals and recommend appropriate treatments. In regards to the idea that ADHD could be defined as “people below the 5th percentile in ability to concentrate, as measured by this test”, that test will probably never exist for any mental illness ever. Ever. There is not a single neuropsychological test today for any mental illness that is better or even near to being as good at diagnosing an individual–or customizing their treatment–compared to symptoms and symptom clusters. Because identical symptoms and symptom clusters emerge out of a wide and even non-overlapping range of breathtakingly complex neurocognitive abnormalities, the likelihood that we will stumble on some test that correctly diagnoses the cluster of symptoms we call OCD 99% of the time, or even 95% of the time, is low.
Granted, mere symptoms are still not good enough to help people get the right treatments, which is why there is a massive push among researchers to get clinicians and clinical researchers to consider mental illness with the kind of approach seen in the NIMH’s Research Domain Criteria (RDoC) project. Abandoning the categorical approach of the DSM (“ADHD”, “unipolar depression”, etc.) will not only do more to actually help patients treat their symptoms, but even has the potential to solve the issue of bullshitting/malingering from all the Senior Regional Manipulators Of Tiny Numbers trying to extract drugs from their exhausted psychiatrists in one fell swoop.
Oh my god I need to lie down.
A few reactions:
(1) Thanks for the link to the Hedge, Powell, & Sumner paper – looks very interesting.
(2) When I said I thought my (non-)diagnosis was largely based on the TOVA, I don’t mean that the evaluator just did a quick TOVA and sent me on my way. She did a bunch of stuff – including an intelligence test (prorated WAIS), getting questionnaires (BAARS-IV) from me and my father and my girlfriend, some other tests, and a conversation about my personal and mental history – and sent me a 9-page report on all of it afterwards.
From my perspective, though, most of this was clearly kinda useless. She dutifully collected a lot of different kinds of information, but on the evidence of the written report, she didn’t use it to form some sophisticated multi-dimensional view of my case. In a way, the opposite was true: if she had spent the entire several-hour interaction looking at exactly one aspect of my case, she might have been able to drill down into subtle details, but since she broke the interaction up into many smaller bits, each bit was – of necessity – a lot shallower.
For instance, on the questionnaires, each of the three respondents (me/girlfriend/father) gave markedly different answers from the other two, but instead of diving further into this discrepancy, she just noted it and went on with her interpretations. Likewise, she had trouble reconciling my appearance of high life satisfaction in the interview with my relatively dark answers on an emotional functioning questionnaire, but rather than explore that further, she just decided on an interpretation (roughly, “he has a lot of problems but is unusually OK with that state of affairs”) and ran with it in the report. And so on.
Now, perhaps this was just a bad clinician, and what she gave me was still not a “real” adult ADHD test. But everything I said above could apply just as well to an earlier neuropsych evaluation I had as a teenager (not for ADHD), and to evaluations I’ve heard about from friends. By which I mean, even if there’s a Right Way to do this stuff, I don’t think I trust actual working clinicians to execute it reliably in the real world. (This is not necessarily an insult; they’re busy and there are a lot of people out there to treat.)
This is all a roundabout way of saying that I had hoped her assessment was largely based on the TOVA, since the whole “holistically integrate many streams of information” thing clearly failed, as I’ve seen it do in other cases, and pretty much expect it to do in the typical real world case. A simple computerized test, or a set of them, may be worse than an evaluation done the Right Way by an ideal practitioner – but as a patient I can only access real practitioners, not ideal ones, and I’m not sure I trust them any more than I’d trust some well-designed but completely automatic test. (Probably less, TBH.)
(3) Relatedly – I don’t think @slatestarscratchpad is arguing against symptom clusters. He’s talking, in part, about how the understanding of “the ADHD symptom cluster” which is actually applied in practice does not fit the science very well, which seems like the same kind of concerns that motivates RDoC.
Whether or not scientifically motivated mental illness categories will ever be diagnosable via “a single test” seems largely to depend on what we count as “a single test.” I take your point that a single neuropsychological test, in the sense we currently understand the phrase, is not going to be fully diagnostic, because mental illnesses involve more than one dimension of neuropsychological function. But that doesn’t mean it isn’t possible to take our best understanding of all the dimensions involved, distill it, and make a brief effective diagnostic tool that would fit the normal English meaning of the phrase “a single test.” Cf. Scott’s old post “Does the Glasgow Coma Scale exist? Do comas?” (although I still disagree with him about the IQ case specifically).
(via otter4dumplings)
There are 2 great fiction books which I’ve arbitrarily decided I want to finish before the end of the year, one of which is extremely exciting and engrossing and the other of which I only have 20 pages left in (and it’s good too)
And yet crystallizing this for myself as a goal has suddenly made me feel like aimlessly browsing the internet to avoid reading, for all “responsibilities” must be avoided, this is the inviolable way of things
#I also had not heard of the mmpi
If you knew anyone who worked at the reactor, they all had to take it in the process of becoming operators. So you probably know a number of people who have taken it.
This is the one where they ask you whether you would enjoy the career of a florist?
Yes. IIRC it also asks you if you would enjoy repairing door hinges? (In both cases the “mentally healthy” answer is “no”)
….I think I would definitely enjoy repairing door hinges? Hinges are nice, it’s nice when they work properly. This sounds like a goofy caricature of actual psychology.
(”Would you enjoy repairing door hinges?”
“Yeah, probably.”
“Do you like cupcakes?”
“Sure.”
“How many Mountain Goats–”
“Do you make up these questions, or do they write ‘em down for you?”
“You’re in a desert. You look down and see a tortoise…”)

What I’ve heard is that these questions were deliberately put (and kept) on the test because they turn out to have a high correlation with the presence or absence of certain mental illnesses, but not in a way that is obvious to non-experts, so that if you were deliberately trying to look like you did or didn’t have a condition, you wouldn’t know which answer to pick.
I dug around a bit on the internet to find a scoring key for the test. The two MMPI-2 questions are:
74. I would like to be a florist.
465. I like repairing a door latch.

It seems that “florist” only counts towards one of the ten clinical scales: Scale 5, a.k.a. the “Mf” / Masculine-Feminine scale. Wanting to be a florist counts as feminine. The “door latch” question doesn’t count towards any of the ten scales.
The original source for both questions is a pioneering masculinity-femininity test from 1936, the Attitude-Interest Analysis Test by Terman and Miles. Here’s a description from Martin and Finn:
Terman and Miles pioneered the use of sex differences in item responses as the basis for measuring “mental masculinity and femininity.” Although they expressed reservations about relying solely on sex differences as the basis of masculinity-femininity, they did so, and this approach was subsequently used extensively by others in the development of masculinity-femininity measures. Items were selected for their Attitude-Interest Analysis Test (AIAT) if they showed significant male-female endorsement differentials. […]
Exercise 5 first asks subjects to rate whether they would like certain types of work, such as architect (masculine), nurse (feminine), florist (feminine), optician (feminine), preacher (feminine), and bookkeeper (feminine). The next section of Exercise 5 asks subjects if they like certain types of people, such as men with beards (masculine), infidels (masculine), very forgiving people (feminine), and very quiet people (masculine). The next section of this exercise asks about liking Charlie Chaplin (masculine), movie love scenes (feminine), adventure stories (masculine), dramatics (feminine), civics (feminine), hunting (masculine), Drop the Handkerchief (feminine), and repairing a door latch (masculine). (You may notice that a number of these items found their way into the MMPI.)
Exactly how they made it to the MMPI is apparently a bit unclear. Martin and Finn again:
A preliminary version of Scale 5 was presented in the original 1942 MMPI manual and then again in the revised manual published in 1943 by Hathaway and McKinley. Although Hathaway was interested in studying various forms of sexual deviance, apparently this was not the sole or even major motivation behind developing the scale (in contrast to what many contemporary books report). The 1942 manual labels the sixty-item scale “The Interest Scale” and states that it was constructed to assess “the tendency toward masculinity or femininity of interest pattern … a high score indicates a deviation of the basic interest pattern in the direction of the opposite sex” (p. 8). The 1942 manual then elaborates: “The Mf score is often important in reference to vocational choice. Generally speaking, it is well to match a subject vocationally with work that is appropriate to his Mf level” (p. 8).
Thirty-seven of the items on Scale 5 were drawn from the original 504 MMPI items; another twenty-three were added sometime during 1940–42, and were part of a set of fifty-five items adapted from “sections 5, 6, and 7 of the Terman and Miles Attitude-Interest Test” (Dahlstrom, Welsh, & Dahlstrom, 1972, p. 5). The twenty-three items did not survive Hathaway’s multiple comparisons and make it onto Scale 5. However, they were retained in the MMPI.
(I guess this is what happened to “door latch”: it was included in the set of questions, but then dropped from the scoring?)
Also, uh,
So, okay, problems with the MBTI:
1. The Jungian type theory on which it’s based predicts that we’ll see bimodal distributions of scores corresponding to each of the four factors. We don’t; instead we see distributions that are pretty much normal.
2. Related to (1): the MBTI has really poor test-retest reliability. If you’re in the squishy middle on a couple of axes, you might go from an ENFJ to an INFP in a month. (The reliability of the individual scores on each scale is higher, but that doesn’t matter if all you’re reporting is the type.)
3. Factor analyses basically never recover the MBTI’s structure. Usually they find that a five-factor model fits the data better, or a model with extra loadings, or something like that. In particular, the S/N and J/P scores are correlated, which screws with things.
I mean, the test definitely seems to measure something. Like, I’m pretty sure INTJs are massively overrepresented in Local Internet Subcultures. But it’s closer to sortinghatchats than to something like the MMPI.
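The point in (2) above — that reliable continuous scores can still produce unreliable type labels — is easy to see in a quick simulation. This is my own illustrative sketch, not anything from the MBTI literature; the noise level is an arbitrary assumption:

```python
import random

# Sketch (invented numbers): even when a continuous trait score has high
# test-retest correlation, cutting it at a midpoint into a binary "type"
# (E vs. I, say) makes people near the middle flip labels on retest.
random.seed(0)

n = 10_000
flips = 0
for _ in range(n):
    trait = random.gauss(0, 1)          # true standing on one axis
    t1 = trait + random.gauss(0, 0.4)   # first test score, with noise
    t2 = trait + random.gauss(0, 0.4)   # retest score, same noise level
    if (t1 >= 0) != (t2 >= 0):          # did the binary label flip?
        flips += 1

print(f"label flipped on retest: {flips / n:.1%}")
```

With these (made-up) parameters the two scores correlate around 0.86, which would be respectable reliability, yet a sizable minority of people still switch type between administrations — because all the flipping happens among people near the cut point.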
Dahlstrom et al. (1972) said the twenty-three added items were selected because of their “promise in identifying sexual inversion as shown in the studies of Terman and Miles” (p. 201), and this seems to agree with Hathaway’s (1956) general emphasis on the goal of identifying homosexuals. Constantinople (1973) posited that original MMPI items discriminating men and women among the Minnesota Normals (the original normative sample) were added to those derived from the AIAT and then subjected to further analyses. Although this hypothesis makes sense, there is no way to confirm it.
In another step, item responses were compared between a criterion group of “thirteen homosexual invert males”—for whom no demographic information was reported—and “average males” (Hathaway, 1956, p. 110). The thirteen homosexual men appear to have been selected on the basis of their overt effeminacy, which Hathaway and McKinley believed indicated a constitutional factor underlying their homosexuality (Hathaway, 1956). This was in contrast to “pseudo-homosexuals,” who were believed, in the prevailing clinical wisdom of the time, to be heterosexual men who engaged in homosexual behavior because of some form of psychopathology.
Subsequent research on the Mf scale does not seem to have been very productive. Early on there were attempts to use it to identify homosexuals, or to correlate it with vocational interests, neither of which was successful. More recently there was a project to rework the MMPI-2 into a “Restructured Form” (MMPI-2-RF) by keeping the questions but developing scoring rules using modern factor analysis; the Mf scale, however, was not restructured but simply omitted, presumably because it is generally considered useless.
On the other hand, Martin and Finn develop a set of seven Mf “subscales” — that is, they apply factor analysis to the Mf results, identify seven principal components, and then select the MMPI-2 questions which load onto those components. And here the “door latch” question finally makes an appearance! Answering no counts toward “Mf1 Denial of Stereotypic Masculine Interests”. But they now gloss it as “465. Like to fix things”, so it seems like somewhere along the way the wording got updated.
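The general procedure described there — factor-analyze the item responses, keep the biggest components, assign items by their loadings — can be sketched with toy data. Everything here (the two-factor setup, the item counts, the noise) is invented for illustration; it is not Martin and Finn’s actual analysis:

```python
import numpy as np

# Toy sketch of loading-based subscale construction: simulate yes/no
# answers driven by two latent factors, run PCA on the item correlation
# matrix, and assign each item to the component it loads on most strongly.
rng = np.random.default_rng(0)

n_people, n_items = 500, 8
latent = rng.normal(size=(n_people, 2))
weights = np.zeros((2, n_items))
weights[0, :4] = 1.5   # items 0-3 driven (strongly) by factor 0
weights[1, 4:] = 1.0   # items 4-7 driven by factor 1
# Dichotomize into yes/no responses, like real questionnaire items.
responses = (latent @ weights + rng.normal(size=(n_people, n_items)) > 0).astype(float)

corr = np.corrcoef(responses, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
top = eigvecs[:, ::-1][:, :2]             # loadings on the two largest components

# Assign each item to whichever retained component it loads on most.
assignment = np.argmax(np.abs(top), axis=1)
print(assignment)
```

Run on this toy data, items 0–3 end up grouped on one component and items 4–7 on the other, recovering the two “subscales” we built in. (Real scale construction uses proper factor analysis with rotation rather than raw PCA, but the item-assignment logic is the same shape.)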
The impression I got from all of this is that the questions are not necessarily all that informative, but that they are kept unchanged because there have been so many decades of research on the answers. Anyway, it’s good to know that the nation’s nuclear reactor operators are carefully vetted.