
To follow up on the last few posts: I think some people idolize physics too much, not just as an exact science, but as a sort of practice of pure reason, of using clear thinking to unravel the universe, etc.

From all my experiences in the field (and in related fields, because former physicists go on to do a lot of other things), the very strong impression I get is that what makes you really, really good at (theoretical) physics is having the right sort of brain in a very particular way, which does not necessarily make you good at “thinking” in general.

Some of it is almost a finely tuned sense of illogic: an ability to tell when strict logical correctness isn’t necessary.  There are plenty of physics arguments – like the ones I mentioned in the previous post – that don’t really make sense when you think about them, and I think part of being a “good physicist” is being able to distinguish these “bad arguments that work for some reason” from the kind of bad arguments that don’t work.

Rather than thinking of physicists as the Ultimate Smartpeople (cf. Feynman worship, people asking Stephen Hawking for his opinions about everything under the sun, etc.) it’s probably better to see physicists as stereotypical wizards or witches, who have some kind of special talent but aren’t necessarily the best people to ask about anything else, because they rely more on a kind of mystical capacity than the common sense the rest of us have to use to get by

Reading about statistical mechanics is always frustrating because many of the concepts involved are very philosophically subtle.  If you really want to wrap your head around why they work, you have to go to a lot of trouble.

Often, physicists don’t want to go to that kind of trouble – they just want to get around to deriving stuff.  As a result, you encounter a lot of very strange “arguments” that seem to just be someone’s half-assed way of justifying to themselves a thing that they already know “works.”

The second law of thermodynamics, for instance, is often said to “assign an arrow of time” to otherwise timeless physics.  Entropy increases in the forward time direction, and decreases in the backwards direction.  Sometimes you see glib “explanations” of this fact which give you an example like, say, a box that is half filled with gas and half filled with vacuum, and point out that if the gas is made up of particles bouncing around with random initial velocities, the gas as a whole will tend to “diffuse” and fill up the whole box, reaching a higher-entropy state.

The funny thing about this “argument” is that if the velocities are truly random (with no directional bias), then it shouldn’t matter if you reverse them all; after all, the “reversed” set of velocities might just as easily have been picked (by the random velocity-selector) as the original “unreversed” ones.  But reversing the velocities is the same thing as reversing time, so with truly random velocities we’d expect the gas to diffuse backward in time as well – i.e., we’d expect the configuration we picked to be the lowest-entropy state achieved by the gas, which had higher entropy in the past and will have higher entropy in the future.  That’s not what the second law says!
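The time-symmetry point can actually be checked numerically.  Here’s a minimal sketch (my own toy example, not from any of the posts above): non-interacting particles start in the left half of a box with symmetrically distributed random velocities, and they spread out whether you run time forward or backward – the starting moment is the most concentrated one.

```python
import random

# Toy model: N non-interacting particles in the unit box [0, 1] with
# reflecting walls, all starting in the left half, with velocities drawn
# from a symmetric (no directional bias) distribution.
random.seed(0)
N = 100_000
x0 = [random.uniform(0.0, 0.5) for _ in range(N)]   # left half only
v = [random.uniform(-1.0, 1.0) for _ in range(N)]   # symmetric velocities

def frac_left(t):
    """Fraction of particles in the left half after evolving for time t
    (t may be negative).  Reflecting walls are handled by the standard
    'unfolding' trick: motion on [0, 1] with reflection is free motion
    on a period-2 circle, folded back."""
    count = 0
    for xi, vi in zip(x0, v):
        x = (xi + vi * t) % 2.0
        if x > 1.0:
            x = 2.0 - x
        if x < 0.5:
            count += 1
    return count / N

print(frac_left(0.0))    # 1.0: all particles start on the left
print(frac_left(+3.0))   # ~0.5: gas has spread out forward in time
print(frac_left(-3.0))   # ~0.5: and it spreads backward in time too
```

So the naive “random velocities” story really does predict diffusion in both time directions, which is exactly the problem with using it as an explanation of the arrow of time.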

Or consider this argument found in a text about large deviations and entropy.  To motivate things like the Gibbs distribution, the author gives the example of an unfair die (some faces are more likely than others).  With a fair die the average value of many rolls should be about 3.5, because that’s the average of the numbers from 1 to 6.  But suppose you know that your unfair die has some other average, like 4.  How do you determine the most likely probabilities of the various faces?

This question obviously (?) has no real answer; there are many different ways one can contrive a die to have such an average, and any estimate of which is “most likely” would surely have to depend on some model for the process by which loaded dice are made.  (E.g. one would expect very different loaded dice for a game in which higher numbers are better than for one in which even numbers are better, even if the average might be the same in both cases!)

Instead of answering the unanswerable question, the author instead proceeds to assume that the die is fair, but that we live in one of the unlikely possible worlds in which after many, many die rolls, the average has not converged to 3.5.  He then assumes (why?) that the average is in some interval (which of course does not contain 3.5), and computes the most likely empirical frequencies after many rolls conditioned on the fact that we are in this very unlikely world.

This is a computation you can do if you want to, and it resembles the computation you do when you derive the Gibbs distribution, which is why he’s doing it.  It has nothing, however, to do with loaded dice.  (The derivation of the Gibbs distribution, by the way, generates some strange non-justifications of its own … )
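For what it’s worth, the distribution that kind of large-deviations computation spits out can be computed directly.  A minimal sketch (my own illustration, not the textbook’s code): the conditioned empirical frequencies take the “exponentially tilted” form p_k ∝ exp(βk), with β chosen so the mean comes out to 4 – the same functional form as the Gibbs distribution.

```python
import math

# Exponentially tilted distribution on the faces {1, ..., 6}:
# p_k proportional to exp(beta * k).
def tilted(beta):
    w = [math.exp(beta * k) for k in range(1, 7)]
    z = sum(w)                      # normalizing constant ("partition function")
    return [wk / z for wk in w]

def mean(p):
    return sum(k * pk for k, pk in zip(range(1, 7), p))

# The mean is increasing in beta and equals 3.5 at beta = 0,
# so we can bisect for the beta that gives mean 4.
lo, hi = 0.0, 5.0
for _ in range(100):
    mid = (lo + hi) / 2
    if mean(tilted(mid)) < 4.0:
        lo = mid
    else:
        hi = mid

p = tilted(lo)
print([round(pk, 4) for pk in p])   # probabilities increase with face value
print(round(mean(p), 6))            # 4.0
```

Note that nothing here is about loaded dice – it’s purely the answer to the “conditioned fair die” question the author actually solves.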

When I read things like this I wonder if some practical joke is being played on me, or whether there is some subtle, brilliant line of sense to the argument that I’m just not capable of grasping.  I keep wondering how people who are capable of being so extraordinarily precise about mathematical notation and the subtleties of proofs can make arguments that seem to fail on such a basic intuitive level.  I feel like the authors of texts like these must have minds very different from mine.

As has been said by others, we statisticians have a lot of difficulties trying to `sell our wares’ to physicists. Physicists are renowned for their arrogance. They believe there is nothing much to know about statistics and that they can easily invent it for themselves if necessary. And one has to admit they are damned clever and for instance do mathematical calculations with more ease and speed and originality than most mathematicians. This is certainly some cause for us to adopt some humility when dealing with them. The sheer amount of things they know is amazing and the finely tuned intuition about physical reality with which they home in on the right answer despite getting logico-mathematical arguments usually wrong is amazing too.

(Richard Gill on teaching statistics to physicists, quoted by Cosma Shalizi here)

(Posted partly for the amusing meanness but largely for the reminder that even within “mathy” academic subjects there are a number of different thinking styles, and that being good at these subjects is not the same as just being good at logic)

mathematica:

that-angle-of-refraction-though:

The Meissner Tetrahedron - a solid of constant diameter

Meissner body

(via raginrayguns)

eccentric-nucleus replied to your post: In general, the results were highly no…

is this about bubble configurations

Yes!

It’s from this paper which is highly entertaining even when I don’t understand it, which is most of the time

raginrayguns asked: thanks for the maxent paper -- i want to add, though, that in bringing up entropy, I was saying that we often see maximum entropy /frequency/ distributions. Which is questionable for a different set of reasons. I don't really understand entropy, but I do tend to expect it to go up when a bunch of things are evolving in ways that aren't strongly correlated with each other. I do know that entropy tends to go up when you spread out a distribution like with a butter knife, so that kinda makes sense

What do you mean by “frequency distributions”?  I don’t think I’ve heard that phrase before.  Do you mean empirical distributions?

I don’t really understand entropy either, but then I’m not sure anyone does.  The issue I was trying to call attention to is that physicists (and related scientists) use the term “entropy” in a lot of different ways, and talk about “entropy maximization” in a lot of different contexts, and in my experience a lot of this is very folkloric and intuitive and not really very well understood even by the people who talk about it.

In particular, there’s this supposed connection between

  1. maximizing Shannon entropy to get the Gibbs distribution in stat mech (a very well-established technique/ritual), and
  2. the idea of “maximum entropy” as some sort of general principle of inference, or general tendency of “systems” broadly (or vaguely) construed

E.g. sometimes people who are doing #1 will invoke #2 as the reason it works.  But the connection is actually pretty murky and there are people who think the “general principle of inference” stuff is pretty much BS and #1 works for different reasons.  (It can certainly be justified without use of #2.)  Cosma Shalizi has a page here expressing this viewpoint.

I can’t claim to understand this stuff especially well, but as far as I understand, the distinction between #1 and #2 comes down to the meaning of the “constraints,” e.g. the “given mean and variance” that gives you a Gaussian when you maximize entropy.  In #1, the constraint is an actual, physical thing: a closed system has a fixed energy.  In particular, as Shalizi says at the end of that page I linked:

If we draw a very large sample from a uniform distribution, and throw out all the samples which do not have certain average values, then with exponentially-large probability, the empirical distribution of the remaining samples will be very close to the one which maximizes the Gibbs-Shannon entropy under the constraints.

This is exactly what you want to do in the relevant places in stat mech: the particles do not have any tendencies to be anywhere in particular in state space, except that as a whole they must satisfy certain constraints.  (There’s a pretty readable discussion of this in section 2 of these notes — section 2.5 has the derivation of the Gibbs distribution, specifically in the context of a system interacting with a heat bath [what Shalizi calls an “environment”].)
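Shalizi’s statement can be illustrated with a toy Monte Carlo (my own sketch, with arbitrary parameters): draw blocks of fair die rolls, throw out every block whose average isn’t 4, and look at the pooled empirical frequencies of the survivors.  They come out tilted toward the high faces – close to the constrained-maxent form – even though every roll was sampled uniformly.

```python
import random

# Draw blocks of fair die rolls, then CONDITION on the sample mean being 4:
# keep only blocks of 10 rolls whose sum is exactly 40.  The pooled empirical
# frequencies of the kept rolls approximate the exponentially tilted
# (constrained-maxent) distribution, not the uniform one we sampled from.
random.seed(0)
n_rolls, n_blocks = 10, 200_000
counts = [0] * 6
kept = 0
for _ in range(n_blocks):
    block = [random.randint(1, 6) for _ in range(n_rolls)]
    if sum(block) == 40:            # sample mean exactly 4.0
        kept += 1
        for r in block:
            counts[r - 1] += 1

freqs = [c / (kept * n_rolls) for c in counts]
print(kept)                          # thousands of blocks survive the cut
print([round(f, 3) for f in freqs])  # frequencies increase with face value
```

The constraint does all the work here: it’s a fact about the sample we kept, not a belief about the die, which is why this says nothing about actual loaded dice.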

All of this is real physical fact: particles in the relevant systems really do have to obey the energy constraints, and they really do behave uniformly otherwise.  (The physics explains why we have to constrain the energy to be some constant E and not, say, the reciprocal of the energy to be 1/E, which would give a different distribution — in the heat bath case, the constraint you really have is E_1 + E_2 = E, and that is not the same statement as 1/E_1 + 1/E_2 = 1/E.)

The further this stuff gets away from specific physics, I think, the more you have to be careful about what philosophical assumptions are being made and whether the whole thing is a good idea.  In particular, I’m not sure MaxEnt is a good way to deal with any system that (in some intuitive sense) seems to be “patternless”; there are a lot of different ways to be random, and as Shalizi says, not all probability distributions observed in nature look like the kinds you get out of MaxEnt.  If there is a physical/theoretical reason to think that something really is “uniform but with constraints,” as is the case in stat mech, then MaxEnt will work.  If you just know nothing about a process except that it has a certain mean and variance (or some other such functionals), I think what you need is more theory and more observation, not MaxEnt.

(Part of the physics folklore about MaxEnt is that it gives you the “least biased” distribution consistent with the constraints, which is a whole other thing I don’t really get.)

Anyway, this post was too long and too rambling, and probably recounted to you some things you already knew, but I hope it was worthwhile in some way … 

ryanandmath:

How to solve a quartic polynomial (i.e. good luck)

(Source: plus.google.com, via thededekindadafunction-deactiva)

what is bayesianism? we (i) just don’t know

raginrayguns:

nostalgebraist:

hot-gay-rationalist:

somervta:

nostalgebraist:

*snop*

A few points:

  1. *snip*
  2. *snop*
  3. The justification of synchronic probabilism (i.e. what Cox purportedly does, though not everyone agrees that it actually does so) is the least questionable aspect of all of this to me.  I’m willing to accept intuitively that if I should be assigning a “plausibility” to every proposition, then my “plausibilities” should obey the probability axioms.  What I am less sure of is, first, that I should be assigning plausibilities, and second, that I should update these plausibilities by conditionalization (the “Bayesian update”).
  4. *snip*

Regarding obviousness: it’s obvious to me that, if I’ve got plausibilities for everything, and I accept Cox’s theorem, then I’m going to update my beliefs by conditionalization. When I read Cox’s theorem, I was reading plausibility(A|B) as how plausible A would be if I knew B to be true. And in fact I don’t think you could prove the theorem if that wasn’t what it meant.

So, I say it’s obvious to me, but of course it’s possible that I’m wrong. It’s possible that I read Cox’s theorem, in Jaynes’s book, and missed some subtle unjustified equivocation of plausibility(A|B) with how plausible A would be if I knew B. But it’s hard for me to understand how you could read a proof of Cox’s theorem and not get the impression that that’s what it means.

IMO Jaynes is not especially clear on this particular issue.  When he develops probability theory, he talks about a hypothetical “robot” which he refers to as “reasoning” and “deciding” (note the diachronic language).  This means that all of his results seem like diachronic ones, but only if you accept that the postulates he is using are diachronic ones themselves.  But Cox’s Theorem is often stated and proven, by others, using only synchronic postulates and synchronic conclusions.

Jaynes develops what he calls “the Cox theorems” in Chapter 2 of his book Probability Theory, using this somewhat ambiguous language.  Later on, in Chapter 4, when he starts talking about hypothesis testing, he makes the point that conditionalization is not the obvious way to respond to scientific evidence, and that this may even be true for those of us who have read Ch. 2 (or the equivalent) and absorbed Cox:

[screenshot: passage from Jaynes, Probability Theory, Ch. 4, stating conditionalization as principle (4-1)]

(His statement “In a sampling context […]” is the same point I made in the original post when I talked about pulling things out of a box vs. testing scientific hypotheses.)

So I think the reason Jaynes is not too clear on synchronic vs. diachronic in Ch. 2 is that he already believes in conditionalization, and hopes you will soon enough.  But it isn’t the point of Ch. 2 to convince you of it.

Indeed Jaynes goes on to argue in Ch. 4 (and it looks like elsewhere throughout the book, though I’ve only read the first few chapters) that “sampling theory” (the subject of Ch. 3) and “hypothesis testing” (the subject of Ch. 4) should really behave the same way.  This is what is not obvious to me, and I can’t find a place in Jaynes’ book where he provides an argument that convinces me of it, as opposed to an assertion that such an argument will appear, or that the conclusion will be evident once one understands things well enough.

Now, is it possible that even if Jaynes doesn’t think his Ch. 2 justifies conditionalization, it does nonetheless, if you interpret his postulates as diachronic ones, and deem them to be intuitive?  (This is what you did, if I’m reading you right.)  Certainly it’s possible.  I’ve been sitting here for a little while trying to think up an argument for why the hypotheses that lead to Cox’s theorem aren’t intuitive in diachronic form, and I can’t think of one.

TBH my only argument here is an external one, not an internal one.  If the “diachronic Cox” argument, so to speak, really did get you conditionalization, then it would be a very intuitive argument for conditionalization, and many people would have come upon it before and realized its power.  (If nothing else, they might have come upon it by interpreting Jaynes Ch. 2 as you did!)  You would expect to find it mentioned as one of the standard arguments for conditionalization in the philosophical literature (alongside e.g. Dutch books).  But you don’t – it isn’t mentioned in Earman’s book, in the Stanford Encyclopedia of Philosophy article, in the survey of Bayesianism I linked earlier, etc.  You’d also expect Jaynes himself to realize the power of his own argument, and simply claim that Ch. 2 establishes conditionalization, rather than stating it as an independent “principle” ((4-1) in the screenshot above) that may be non-obvious outside of sampling theory.

In other words, I don’t know where “diachronic Cox” goes wrong, but it would astonish me if it were some sort of perfectly sufficient argument – obviating Dutch Books and Teller’s “Principle P” and all the other arguments people have made – that had somehow slipped under everyone’s nose.  That’s a plausibility argument, not a knock-down proof, but as a Bayesian I’m sure you can appreciate that sometimes the former can be nearly as good as the latter.

(via raginrayguns)

differentialprincess:

why would it be dimensionless as a distribution? if it still physically represents a density then its units need to reflect the thing it is a density over. maybe i don’t understand what you mean

so as a distribution the delta function eats a function and spits out the value of that function at a point, say δ(f) = f(0)

so it doesn’t do anything to the units? maybe. it’s weird

it’s hard for me to reconcile the useful physical picture of a point mass/charge/impulse with the more general distribution picture.

I think the connection is that physically, δ(f) needs to know what f takes as an argument.

If you first think of δ as a limit of distributions of, say, mass over space, then for those distributions, you’d want to integrate them against a function f whose argument refers to points in space.  If you rescale your units, you’d have to rescale the distributions appropriately.  Or if you translated your coordinate system, you’d have to translate the distribution appropriately.

Likewise, after taking the limit, if you translate your coordinate system – or more generally switch to a system y(x) where y(0) != 0 – you need to change δ(f) so it picks out the (physically) correct value of f.

The delta function as a distribution just eats a function and spits out a value, but its origins in the physical picture affect its transformation properties under change of coordinates (which are essentially what we mean by “dimensions”/“units”).
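One concrete way to see the units claim (my gloss, not part of the exchange above) is the scaling identity, which follows from the substitution u = ax:

```latex
\int_{-\infty}^{\infty} \delta(ax)\, f(x)\, dx
  = \frac{1}{|a|} \int_{-\infty}^{\infty} \delta(u)\, f(u/a)\, du
  = \frac{1}{|a|}\, f(0)
  \quad\Longrightarrow\quad
  \delta(ax) = \frac{\delta(x)}{|a|}.
```

So if x carries units of length, δ(x) has to carry units of 1/length for ∫ δ(x) dx = 1 to stay dimensionless – exactly what you’d demand of a density over space.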

One of the issues I’m trying to figure out in this math book project is how to deal with derivatives.  I really want to avoid trying to explain calculus per se, for two reasons:

  1. I think people probably have negative associations with it, either as the math class they never got to, or the one they did get to but no longer cared about, or just because calculus is kind of a shibboleth that culturally separates the math nerds from everyone else – and I want to vault completely over all that stuff, into stuff the average person doesn’t know to feel anxious about because they’ve never heard of it
  2. Most of the details of calculus aren’t relevant to what I’m doing (the relevant derivatives are the easy ones, like exponentials)

So far I’m just referring to first and second derivatives as “speed” (should be velocity but I’m trying to be friendly) and “acceleration,” and relying on people’s intuitions about cars, which I assume are strong.  I think this will work well for time derivatives, but I’m worried what will happen when I get to spatial derivatives.  Simply saying “this is sort of like acceleration, except in space, which means it’s like curvature, trust me” is not very good at all.

The problem is that, although I feel comfortable saying things like “there is a linear operator that takes position to acceleration and here are its eigenfunctions,” I’m apprehensive about saying that that operator is called “differentiation” or talking about “rates of change,” because I have the impression that people associate that stuff with annoying nerds and/or teachers.  Maybe I’m being way too paranoid about this, I dunno.  (I could also be unconsciously responding to my own memories of taking calculus much later than everyone else I knew.)

Basically, I want people to know that “differentiation” produces a “rate of change” without having to know anything in detail about it, and I strongly want to avoid having some sort of little mini calculus primer because I think it would just make people stop reading.  I’m not sure how to make this happen.