
pde and algebraic geometry are already deeply connected, this isn’t new or scandalous; see Mumford’s lectures on theta functions, for example
Sure, but I’m talking more in a social sense – those aspects of PDEs are usually not of any interest to people dealing with PDEs in an applied context, but the inverse scattering transform is (sometimes)
E.g. usually in stuff like physical oceanography you just talk about Fourier modes and how a nonlinearity will “couple together” the Fourier modes and cause them to exchange energy. The inverse scattering transform provides a set of modes specialized for your nonlinear PDE, and this has oceanographic applications, so the bridge is between algebraic geometry and actual applications
Math shipping: I wonder if the inverse scattering transform could be seen as some sort of scandalous liaison between extremely applied math (Fourier transform, nonlinear wave equations) and extremely abstract/forbidding math (algebraic geometry)?

yOU SEE THEY KNOW THEY ALL KNow I’m not the only one
I need a ship name for the combinatorics-probability theory estranged-child-or-student-comes-back-to-their-parent-or-master headcanon, something along the lines of ‘the Isbell duality’ (that’s the geometryXalgebra ship name) but for the relationship between probability and combinatorics.
oh no
it’s happening
math shipping
(via sometheoryofsampling)
1. Core objection: People do not seem to “have” prior probabilities, even approximately. Real-world levels of belief in propositions aren’t well represented by numbers obeying the probability axioms. Being a Bayesian in the real world means trying to do something your mind is not well-suited for. Whether or not this approximation produces good reasoning is a thing that needs to be investigated, and my hunch is that often it doesn’t (see here).
1b. Even if we could have prior probabilities, I’m not sure this would be the right thing to do.
2. Shakier and less important objection: Dutch Book arguments don’t convince me that conditionalization (Bayesian updating) is right. Other arguments (e.g. Jaynes’ version of Cox’s theorem) may do the job but I am confused about why the Dutch Book is so famous if there are better arguments out there.
3. Rhetorical objection: it isn’t a part of “Bayesianism” per se, but something that often comes along for the ride is a focus on updating as the core of “rational thought,” with “prior construction” as this magical thing that happens once at the start and is not often thought about. This is misleading, since real thinking (e.g. in the history of science) is usually all about coming up with new ideas, that is (in Bayesian terms) expanding your prior to things that were not in its support to begin with (hence this can’t be framed as an update, because you didn’t have a prior probability for the idea before you came up with it). To me a theory of “rational thought” has to involve ideas about how to efficiently search through the space of possible explanations, and “Bayesianism = rationality” ignores this issue.

1. *nods* This is true, and I agree with the linked post. And I think I have never personally uttered the phrase “I have updated in the direction of [your position],” but I have thought that I ought to do stuff along these lines. My usual (internal) response to these situations is “try to remember a list of the actual evidence instead of some number,” and it works to varying but on average fairly good levels of success.
1b. Could you elaborate?
2. Yeah, totes agree here, I is confuse.
3. I agree that this is a problem that is often neglected/glossed over, but I disagree that it is necessarily equivalent to expanding your prior’s support to things that hadn’t previously been in it (though intuitively it seems like that is indeed a way of putting it). I wrote a thing that’s tangential to this and I think closer to what I’d call “normatively correct” in which logical omniscience isn’t assumed (it’s consistent with Jaynes’ proof, in any case).
In such a case, computations (like proving a theorem and thus finding out that e.g. ‘A → B’) can and should be used as evidence, and so the proposition “I gave hypothesis H2 some thought” could have a meaning and a meaningful impact on your knowledge. So instead of changing the support of your priors, you leave a sentence for “all hypotheses I haven’t thought of,” and thinking about some new hypothesis can meaningfully and drastically shift your probabilities.
Plus I’m pretty sure this is quite similar to what happens in real life. For example, most physicists know that Quantum Theory is incomplete and incorrect, and if they were to somehow meaningfully ascribe probabilities to their beliefs, P(“Some hypothesis I haven’t thought of”|X) would be a pretty large number. So there is some sense, I think, in reasoning about hypotheses you don’t know yet.
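A toy sketch of the catch-all scheme described above (my own illustration; the hypothesis names and the 50% carve-out fraction are made up, only the “reserve a bucket for hypotheses you haven’t thought of” idea is from the post):

```python
def introduce_hypothesis(beliefs, name, mass_fraction):
    """Carve a newly-conceived hypothesis out of the catch-all bucket.

    Instead of needing a prior whose support already contained the new
    hypothesis, we move probability mass out of the residual
    "hypotheses I haven't thought of yet" sentence.
    """
    carved = beliefs["catch_all"] * mass_fraction
    updated = dict(beliefs)
    updated["catch_all"] -= carved
    updated[name] = carved
    return updated

prior = {"H1": 0.4, "H2": 0.3, "catch_all": 0.3}

# Thinking up H3 shifts mass out of the catch-all, not out of thin air;
# total probability stays 1 throughout.
posterior = introduce_hypothesis(prior, "H3", 0.5)
print(posterior)
```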
(Incidentally, ever since I wrote that post I’ve had a tab with this paper open waiting for me to read it. Sssiiigghhhh.)
——-
So I think we pretty much agree about almost everything here? Although it seems you don’t think that, even if it were possible, we ought to emulate Bayesian reasoning, and I don’t know why that’d be, because I find this intuitively very appealing.
And elaborating on this emulation: while I agree that it’s not actually possible to be even reasonably close to a perfect Bayesian agent most of the time, there are still some insights that I think are useful and that come from it (though not necessarily exclusively from it), which I’ve listed here (this is a somewhat old post, I am going to rewrite it eventually). And by the way, those insights are not shared by a very large part of the people I’ve met, and even an idea as simple as that of quantitative reasoning is a fairly rare phenomenon.
Okay, for now I’m just going to try to address 1b, which I admit was totally mysterious as I stated it.
As usual, I’m taking cues from Cosma Shalizi, specifically the blog post “Bayes < Darwin-Wallace.”
So, since we’re considering Bayes as a normative ideal, let’s imagine we’re creatures that could assign “plausibilities” (which would then, by some standard argument, have to be probabilities) to hypotheses, without approximation. We could be Jaynesbots. But we could also do other things. What should we do?
Well, one simple alternative is the following: you start with a hypothesis space H, but no plausibilities. You wait for data to come in. Then, at any time, you simply “believe in” whichever hypothesis has the maximum likelihood of producing the data you saw.
This has a certain philosophical appeal: it’s just going with the “best” hypothesis in a certain sense. It also has a certain mathematical appeal: as Shalizi says, it can be seen as choosing the “point of closest approach” to the data in H, the closest H gets to touching/including what you saw.
The downside of doing this is that it’s vulnerable to sampling fluctuations. Your chosen hypothesis will bounce around a lot as data comes in and you keep choosing slightly odd hypotheses that over-fit the noise in the data. (Think about trying to determine the probability of H and T with a coin — maximum likelihood will just select the rates you’ve seen in the sample, which may not settle down to the true values until you have a lot of flips.) This is a practical, not a philosophical downside.
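The coin example above can be made concrete in a few lines (a minimal sketch of my own; the seed and sample sizes are arbitrary). Maximum likelihood just parrots the sample frequency, so with few flips the estimate bounces around the true value:

```python
import random

def mle_heads(flips):
    """Maximum-likelihood estimate of P(heads): the observed frequency."""
    return sum(flips) / len(flips)

random.seed(0)
true_p = 0.5
flips = [1 if random.random() < true_p else 0 for _ in range(1000)]

# With few flips the estimate over-fits the noise in the sample;
# it only settles toward 0.5 as the sample grows.
for n in (5, 20, 100, 1000):
    print(n, mle_heads(flips[:n]))
```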
Now Bayes is nicer than this, practically. Your “belief” if you use Bayes — in the sense of “the thing you’d use to make expected utility calculations and thus decisions” — is a sort of weighted average over every point in H. You believe a little bit in every point in H, just in some more than others. Practically speaking, this is nice because it means you’ll over-fit less.
I keep saying the words “philosophical” and “practical.” What I’m getting at here is that both Bayes and maximum likelihood have strong intuitions behind them that say “this is just the right thing to do,” which is what I mean by “philosophical” reasons for using them. (In the case of objective Bayes, there are even more such intuitions.) But if you get to pick which method you use — you could be a Jaynesbot, but you could also be something else — then you can also ask “which of these behaves better in practice”? This is what I’m concerned about here.
A certain sort of person might say “maximum likelihood just feels so right to me that I don’t care that it over-fits” — this person would have philosophical reasons that override all practical concerns. Likewise, one could feel the same about Bayes. I can’t argue with the personal choices of either of these people. But suppose that the philosophical intuitions are less than infinitely compelling, and we’re open to practical considerations, too.
Well, in this case, Bayes did better than maximum likelihood. The way in which it did better can be interpreted in terms of the “bias/variance tradeoff” — it decreased variance by adding bias. (Your results are less variable across different realizations of the process you’re looking at — less variance — but at the cost of everything being biased in the direction of looking like your prior.) But this trade-off is a familiar issue in statistics, and people have ideas about how to do it “right” — to find the sweet spot between bias and variance. And it’s not clear that Bayes, even objective Bayes, gets to the sweet spot. It is essentially one approach among many to the balance of bias and variance. (Maximum likelihood is an extreme case: all variance, no bias.)
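For the coin case, this “decrease variance by adding bias” move can be written out with a conjugate Beta prior (a standard computation; the specific Beta(2, 2) choice here is my own, for illustration):

```python
def posterior_mean(heads, n, alpha=2.0, beta=2.0):
    """Posterior mean of P(heads) under a Beta(alpha, beta) prior.

    This is a weighted average of the prior mean alpha/(alpha+beta)
    and the sample frequency heads/n: biased toward the prior, but
    less variable across samples than the raw frequency.
    """
    return (alpha + heads) / (alpha + beta + n)

# 4 heads in 5 flips: the MLE says 0.8, but the posterior mean
# is pulled toward the prior mean 0.5.
print(posterior_mean(4, 5))  # 6/9 ≈ 0.667
```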
Can it be shown that Bayes somehow gets the trade-off uniquely right? I don’t know this stuff well enough to know. Shalizi seems skeptical, though his reasoning is not spelled out very explicitly. In any case, these are the sort of concerns behind point 1b. If we see Bayes as a method for fitting data, then it must be compared to other possible methods that may not have anything like a prior, and it’s not clear that it’s the best one in terms of practical performance.
(Again, if one finds philosophical arguments for Bayes infinitely compelling, then this is all irrelevant. I don’t — in particular, the idea that one should have a prior before one has seen any data seems if anything a bit counter-intuitive to me. Objective Bayesianism, where one picks the prior uniquely by acting maximally ignorant of everything but what one knows, seems a bit more appealing, but it also seems like a way of minimizing the weirdness of having a prior before you see any data, when you could also just not have one at all. It might turn out to be the case that objective Bayesianism makes for methods with a nice bias/variance tradeoff, but that’s in the realm of practicality. In the realm of philosophy/intuition, objective Bayes feels to me like a weird half-measure — accept the weirdness of having a prior before you see any data, but then try to minimize that weirdness, rather than rejecting such a prior outright.)
I didn’t see anything wrong in the post (which, for others’ reference, is here).
I guess you could be wrong in your criticism of Harry’s statement about “Turing computability,” but only because I have no idea what he means by that statement.
Like, it seems obvious that physicists talk about GR possibly allowing closed timelike curves and this doesn’t make the equations of GR any less capable of being approximated on a Turing machine, so the idea “a computer couldn’t simulate this” seems clearly wrong? (There might be more than one self-consistent possibility for what happens on the CTC, but at worst that would be indeterminacy that means you’d need more initial data to uniquely solve the equations, which isn’t really a computability issue?)
On the other hand, maybe he’s assuming that the simulation is running in polynomial time (in some reasonable sense) in the external-to-simulation universe, and that by doing too much time travel you could slow it down a lot by forcing it to do NP computations in order to find a self-consistent state (the famous “you can do NP-complete computations with a time machine” thing)? But why would that be a problem? Even if you did something that made it take a million times longer (in the external universe) to compute more of the internal universe, you wouldn’t be able to notice that from the internal universe. (EY is a huge fan of Permutation City, so this should be a familiar point to him.)
I can’t think of any possible meaning for Harry’s statement that makes it true, but I’m so unsure about what it’s supposed to mean that I don’t feel like I can confidently declare it false.
[snip]
[snip]
[snip]
Or to put it another way: “well, P = NP if you make time irrelevant.” Which makes it meaningless. Because then P = PSPACE = EXP = DECIDABLE. Why stop at NP?
The whole point of the P = NP problem isn’t about how to break the rules to solve problems quickly. It’s that our understanding of complexity theory is so poor that we can’t even show whether there are problems where you can check a solution quickly but cannot solve them quickly, OR if being able to check a solution quickly implies you can solve them quickly.
I don’t think it’s true that introducing CTCs makes PSPACE = EXP = DECIDABLE. Aaronson says that P_CTC = PSPACE in his chapter on time travel. (P_CTC is the class of things you can do in polynomial time, where the relevant extent of time is the length of the CTC.)
CTCs don’t quite make time irrelevant, since the length of the CTC itself is still a variable. You still have to have enough time inside the CTC to do whatever magic trick is guaranteed to be consistent iff the answer came out at the end. In the case of NP, say, we can use the “magic trick” of just “checking answers,” which can be done in polynomial time. In some harder cases (I think?) the length of the CTC has to get non-polynomially longer with the size of the problem, unlike with NP.
I’m not entirely sure why this is true (I’m very sleep-deprived and can’t tell if Aaronson doesn’t really argue for PSPACE being an upper bound for P_CTC, or whether he does and I’m just not noticing it). Here’s the relevant chapter of the lecture notes, which I think is either identical or close to identical to the book chapter based on it.
[snip]
[snip]
No, the emphasis there was on the word straightforwardly. The most straightforward way of computing a causal universe is, well, in order. Compute thing, then compute consequences of thing, then compute consequences of that thing, and so on.
But that doesn’t work for CTCs, which need to be computed more like “compute all universes that start from these initial conditions, then discard all except the ones that are self-consistent.” You can’t do cause-effect chains when CTCs are involved. CTCs can only be computed by brute force, which is not the “standard” way of computing a thing, style of thing.
Okay, I see what you mean now, but this distinction seems kind of trivial to me. There is no “standard way of computing a thing”; there are just various algorithms that, say, approximate various differential equations to one order of accuracy or another.
If GR includes solutions with CTCs and we can approximate them to arbitrary accuracy numerically, then it’s Turing computable. It might not look quite like an ordinary method for solving PDEs (or whatever), but who cares? We want to solve the problem posed to us, that’s all. I don’t know where this cultural rule about “the most straightforward way of computing a causal universe” is coming from. I don’t live in a world where people have to “compute” various “causal universes” all the time and have a set of standards built up around this; I live in a world described by various equations where people sometimes have to find approximate solutions to those equations using computers, and do it however fits the task.
(Technically it might be the case that sometimes solutions with CTCs exist but the Cauchy problem for that spacetime is not well-posed, i.e. you can’t “predict” what the system is going to do from initial data? I don’t know much from Googling around, but it seems like we’re not sure about these kinds of things yet. I think if the Cauchy problem were ill-posed, that might mean that there are too many possible CTCs, and the physics needs to be completed with some extra information about how nature picks a unique one? But that’s a problem with the equations, not with your computer’s ability to integrate them.)
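The “compute everything, then discard the inconsistent” recipe can be sketched in a few lines (a toy of my own with a made-up evolution rule, not actual physics): the self-consistent states on a loop are just the fixed points of the evolution map.

```python
def consistent_states(evolve, states):
    """Brute-force "computation" of a CTC region: enumerate candidate
    states on the loop and keep only the self-consistent ones, i.e.
    the fixed points of the evolution map."""
    return [s for s in states if evolve(s) == s]

# Made-up evolution rule on states 0..9:
evolve = lambda s: (s * s) % 10

# No cause-effect chain here: we never "run the universe forward,"
# we just filter out every history that contradicts itself.
print(consistent_states(evolve, range(10)))  # [0, 1, 5, 6]
```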
I just read a formula that involved a set of indices m_i (e.g. m_1, m_2, etc.), each of which was summed over, and also an index just called m, which was also summed over. At one point there was a coefficient “m_m” meaning “the particular m_i with i equal to m.”
I guess you can, unambiguously, do that. But why would you
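For anyone who wants to see that it really is unambiguous, here is a hypothetical reconstruction in code (the ranges and the summand are made up; only the indexing scheme is taken from the formula described above):

```python
from itertools import product

# Indices m_1, m_2, m_3 each summed over 1..3, and an index m
# (0-based here) also summed over; "m_m" picks out the m-th of
# the m_i's. Unambiguous, but cruel to the reader.
total = 0
for m in range(3):
    for ms in product(range(1, 4), repeat=3):
        total += ms[m]  # the coefficient "m_m"
print(total)  # 162
```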
“I was observing the motion of a boat which was rapidly drawn along a narrow channel by a pair of horses, when the boat suddenly stopped – not so the mass of water in the channel which it had put in motion; it accumulated round the prow of the vessel in a state of violent agitation, then suddenly leaving it behind, rolled forward with great velocity, assuming the form of a large solitary elevation, a rounded, smooth and well-defined heap of water, which continued its course along the channel apparently without change of form or diminution of speed. I followed it on horseback, and overtook it still rolling on at a rate of some eight or nine miles an hour, preserving its original figure some thirty feet long and a foot to a foot and a half in height. Its height gradually diminished, and after a chase of one or two miles I lost it in the windings of the channel. Such, in the month of August 1834, was my first chance interview with that singular and beautiful phenomenon which I have called the Wave of Translation.”
(John Scott Russell chases a soliton, from Report on Waves, 1838)
Mathematics courses have given the average physics or engineering student a rather warped view of global expansion methods: The coefficients are everything, and values of f(x) at various points are but the poor underclass.