szhmidty:

nostalgebraist:

Shout-out to the Bellman equation, which is cool and useful and also has one of those great derivations that feels like a joke

Could you explain? Wikipedia isn’t revealing the humour.

(@digging-holes-in-the-river​ also asked)

I guess I find it funny because I first read about it in the context of Q-learning, where the goal is to find the best policy (the best action at every time), and at first I was like “what the hell?  instead of learning a policy directly, they’re doing all this work to learn this weird other thing called Q?”

But then the Bellman equation shows why if you know Q (as a function), you immediately know the optimal policy.  And the proof is like a punchline, where you suddenly see why you should care about Q.  As a dialogue:

A, a rash neophyte: “I want to always know which action to take to maximize my (time-integrated discounted) rewards.”

B, ancient and wise: “Ah!  Then you’ll be interested in this magical function I call Q.  It tells you the maximum time-integrated discounted reward you could possibly get, starting from the situation you’re in.”

A, a rash neophyte: “Why would I care about that?  If it tells me ‘you could achieve a time-integrated discounted reward of 104282.3,’ I still won’t know how to get that reward.  The function would just be teasing me!”

B, ancient and wise: “But tell me, do you agree that the maximum time-integrated discounted reward right now equals the maximum reward on the next step, plus the maximum time-integrated discounted reward from all the other steps?”

A, a rash neophyte: “… duh?  Are you trolling me?”

B, ancient and wise: “But if you pull a discount factor out of the second term, it’s just the maximum time-integrated discounted reward at the next state.”

A, a rash neophyte: “… and?”

B, ancient and wise: “We have a name for that.  It’s Q, evaluated at the next state.  So Q_t is just the maximum reward from the next step, plus the discount factor times Q_{t+1}.”

A, a rash neophyte: “Wait, so if I knew how to calculate Q, I could find the best action just by plugging actions and immediate results into the equation?  I don’t have to think about the entire infinite future, just the next step?  Why didn’t you tell me Q was so amazing?”

B, ancient and wise: “I did, young one.  I did.”
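(In standard symbols — one conventional way to write what B is getting at, with s′ the state you land in after taking action a, not notation from the original exchange — the punchline is the Bellman optimality equation:

$$Q^*(s, a) \;=\; r(s, a) \;+\; \gamma \max_{a'} Q^*(s', a'), \qquad \pi^*(s) \;=\; \arg\max_a Q^*(s, a)$$

so once you have Q, picking the best action is a one-step argmax over the right-hand side, with no need to think about the entire infinite future.)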

(via szhmidty)


wirehead-wannabe replied to your post “If you know undergrad-level physics and want a nice pleasant unified…”

Does “undergrad physics” mean “having taken a couple physics classes in undergrad” or “having majored in physics”?

Probably the latter.  (My first reading, the one where I found it hard to follow, was right after I got an undergrad degree in physics, which can be taken as providing a lower bound on the book’s difficulty)

Really, though, all of the physics is built from the ground up (that’s the point), so the truly necessary background is mathematical.  The book will probably be accessible to a person iff that person has spent some time reading and understanding the sort of texts that contain sentences like this:

[image: a cute moment from the Lawrie book]

If you know undergrad-level physics and want a nice pleasant unified textbook about all the stuff you missed by not taking graduate physics coursework, I strongly recommend “A Unified Grand Tour of Theoretical Physics” by Ian Lawrie

Friendly (but not, you know, too friendly), concise but not too concise, explains and contextualizes things rather than just asserting them (but without the explanatory/contextual material crowding out the content), etc.

I tried reading this book right after undergrad and found it hard going – which was humiliating, as I’d just gotten a physics degree and had imagined I was the intended audience – but now, 7 years later, it’s not hard going at all.  I’m not sure what happened to me in the interim?  Maybe grad school just made me better at reading technical texts of any kind, or maybe I just got better at reading, period.

(Although I haven’t gotten to the gauge theory chapter yet, which is where I quit the first time, so we’ll see)

After reading this handout, you should take a look at your friends’ multivariable calculus books and convince yourself that we really have proven the classical theorems, with the added benefit that our approach to integration avoids the mysticism which surrounds the pseudo-“definitions” of integration over surfaces and curves with “area elements” and “line elements” as in the big thick multivariable calculus books. This is not a point to be dismissed lightly: it is crucial that we have not just created an elaborate machine which spits out theorems that formally look like the classical results. You must convince yourself that the intuition lying behind the classical approach to (trying to) define the integrals on both sides of the classical theorems really is accurately captured by our precise definitions of how to integrate via partitions of unity (keeping in mind that all such sums are finite in the case of compact manifolds). More specifically, when our definition of integration of differential forms is combined with the vector calculus translation made possible by the Riemannian metric tensor, then you must convince yourself that the resulting precise definitions of surface integrals, etc. as in our general vector calculus theorems really do give what one intuitively wants to be working with in those multivariable calculus books. If you think about the recipes in those books for actually computing their fancy integrals in terms of local coordinate systems, you’ll see that it really is just our approach to integration in disguise (except that we don’t have any of the mathematical imprecision which is inherent in the obscure “definitions” of those books: such definitions are incapable of providing an adequate foundation to actually prove things in a convincing manner, and that’s why such books never present proofs for the classical theorems at a level of rigor that gets beyond a “plausibility argument”).

dude.  chill out
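(For reference, the “classical theorems” in question — Green, Kelvin–Stokes, divergence — are all instances of the generalized Stokes theorem

$$\int_M d\omega \;=\; \int_{\partial M} \omega,$$

with the Riemannian metric supplying the dictionary back to the gradients, curls, and fluxes of the big thick books.)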

identicaltomyself:

alexyar:

gininthecampari:

tag yourself, I’m topological Joe

i’m proving theorems without \infty-categories

I’m annihilated by a Morava K-theory. Not by all of them, but being annihilated once is enough.

(via identicaltomyself)

nostalgebraist:

Is there any interesting (i.e. with non-trivial properties) way of defining metrics or measures over sets of differential equations?  (Got onto thinking about this bc of the fine-tuning in cosmology thing, and wondering if there is any way to talk about a law [i.e. equation] being more or less fine-tuned, but now I’m just curious in general)

I think I failed to communicate what I was going for in this post.  Some rambling, which may or may not help clear it up:

Another way of putting my question is, “what would a ‘space of differential equations’ be like?  Is there a way this concept could be interesting / non-trivial?”

It seems like the big question here is “what do we mean by ‘differential equation’?”  I don’t mean ODE vs. PDE or something; let’s just say they’re ODEs.  So, is “the equation” the set of solutions (or {initial data: solution} pairs)?  Is it the actual string of symbols we write on paper?  Something in between?

The “trivial” option is defining the equation by its solutions, in which case we are just left considering a space of tuples like (v, f), where v is the initial data (a vector) and f is the solution (a function).  Then a “space of ODEs” would just be some banal analysis doodad.

But that doodad wouldn’t look anything like our colloquial notion of “ODE space,” where say we categorize equations into linear or nonlinear, look at eigenvalues of linear equations, etc.  The hope in making a “space of ODEs” would be that some of the structure in this colloquial concept could be formalized, and abstracted away from the symbol strings we write down on paper.  (Thus we might, e.g., be able to construct some formal version of the colloquial notion “add a new term to the equation,” but without actually counting terms in a written equation, where you can often change the number of terms by algebraic manipulations.)

When I asked about measures and metrics in the OP, the idea was that these would be structures on a hypothetical “space of differential equations.”  But the more fundamental question is, “can we make such a space interesting, in the sense of the previous paragraph?”


Here is an example of the kind of process I’m imagining.  Imagine we’ve never heard of linear algebra, but we’ve seen systems of linear equations, and we start thinking about what a “space of systems of linear equations” would look like.

So, we are considering problems Ax = b.  Let’s say A is square and invertible, so we’re only looking at problems with unique solutions.  In the hypothetical, we don’t know linear algebra and are mostly used to thinking about these problems as wholes – i.e. we may have a concept of a matrix “A,” but we think of it as always paired with a vector “b.”

When we ask how to parameterize our space, we might first think about the entries of A and b, but soon we will be clever and realize that all our equations could equivalently be written in “solution form” x = A^{-1} b (where our concept of “taking a matrix inverse” is “solve a problem”).  In other words, perhaps each problem just is its solution: two problems Ax = b with the same solution are really one problem, and our bad notation just makes it look like two.

This is the “trivial” option.  It gives us a “space of problems” that is straightforward (just the space of possible solutions), but it throws away all of our ideas about what problems are.

On the other hand, we could study the structure of the problem, and then we would realize that there is a lot going on in A if we look at it apart from any specific b (or x).  Then we’d develop linear algebra, which can (in various ways) formalize our intuitions about what makes problems similar or different, and can reveal a lot of structure beyond the mere numbers we write down on the page.
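(A minimal numerical illustration of the contrast — the toy matrices here are my own choices, nothing canonical:

```python
import numpy as np

# Two "different-looking" problems A x = b with the same solution x = (1, 3):
A1, b1 = np.array([[2.0, 0.0], [0.0, 1.0]]), np.array([2.0, 3.0])
A2, b2 = np.array([[1.0, 1.0], [1.0, 2.0]]), np.array([4.0, 7.0])

x1, x2 = np.linalg.solve(A1, b1), np.linalg.solve(A2, b2)
print(np.allclose(x1, x2))        # True -- the "trivial" view calls these one problem

# But the operators themselves have very different structure, visible
# without ever mentioning b or x:
print(np.linalg.eigvals(A1))      # 2 and 1
print(np.linalg.eigvals(A2))      # ~0.38 and ~2.62
print(np.linalg.cond(A1), np.linalg.cond(A2))
```

Everything the second half prints is “linear algebra” in the sense above: structure in A that the solution-form parameterization throws away.)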


So I guess the analogous thing would be spaces of differential operators?

identicaltomyself:

nostalgebraist:

identicaltomyself:

nostalgebraist:

evolution-is-just-a-theorem:

nostalgebraist:

Is there any interesting (i.e. with non-trivial properties) way of defining metrics or measures over sets of differential equations?  (Got onto thinking about this bc of the fine-tuning in cosmology thing, and wondering if there is any way to talk about a law [i.e. equation] being more or less fine-tuned, but now I’m just curious in general)

Hmmm. Convergence (in the sense that a Taylor series converges to the function it represents) is unmetrizable.

How is this handled in calculus of variations?

Also I’m not entirely sure what you’re thinking of as a criterion. Could you give an example of two very close differential equations?

No idea how it’s handled in calculus of variations.  Huh.

Here is an example inspired by the fine-tuning thing.  We have a differential equation with coefficients in front of the terms.  We talk about how “if the coefficients were a tiny bit different, the behavior of the solution would be very different.”  Now we can imagine keeping the coefficients fixed but varying the equation, by adding new small terms (i.e. slightly varying their coefficients starting at zero), or by changing an existing term (change an exponent in a continuous way, say).  Now, for some sort of change in the solution, you can talk – in a casual way at least – about how little you need to change the equation to get a “comparable” or “comparably large” change in the solution.

(We may be talking about sudden phase transition-like changes in the solution, so the changes themselves may not be continuous, but you might have a sense of distance for equations like “it is this much change of exponent away from some given transition”)

For any one equation, this just seems like a mildly amusing game, but could there be any regularities across many equations (or many definitions of “comparable”), so that there might be general facts?
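(The toy version of this, with numbers of my own choosing — a single coefficient crossing zero flips the long-run behavior of x′ = kx, even though the equations are arbitrarily close on paper:

```python
import numpy as np

# Solutions of x' = k*x from x(0) = 1, for coefficients straddling zero.
t = np.linspace(0.0, 500.0, 6)
for k in (-0.01, 0.0, 0.01):
    print(k, np.exp(k * t))   # decays to 0 / stays at 1 / grows without bound
```

)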

This is almost covered by the theory of stochastic differential equations. That’s the theory of differential equations where you add a random function of time to one side. Usually the random function is “white noise”, technically known as a Wiener process, but you can pick any distribution on the space of functions that you like. The theory of these SDEs is well understood, highly applicable, and moderately beautiful.

Adding a random function of space is also a known thing. Usually people use a Markov random field, which is a generalization of a Markov process to multiple dimensions. That’s usually used to perturb a PDE, but you’re interested in perturbing an ODE. People have done that too. I remember seeing some beautiful visualization of the motion of an electron beam through a potential given by a Markov random field.

It sounds like you’re asking about adding a random function of both space and time. I’m not familiar with SDEs perturbed by a random function of both space and time, but I figure somebody must have thought about it. It seems like a reasonable generalization, now that you point it out.
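(For concreteness, the kind of object described above — a deterministic drift plus a white-noise forcing term — looks like this in the simplest discretization; the drift, noise strength, and step sizes below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def euler_maruyama(drift, sigma, x0, dt, n_steps):
    """Simulate dX_t = drift(X_t) dt + sigma dW_t, with W_t a Wiener process."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))               # white-noise increment
        x[i + 1] = x[i] + drift(x[i]) * dt + sigma * dW
    return x

# e.g. the linear ODE x' = -x, perturbed by noise of strength 0.1:
path = euler_maruyama(lambda x: -x, sigma=0.1, x0=1.0, dt=0.01, n_steps=1000)
```

Each draw of the noise gives a different trajectory.)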

Either you are misreading me, or I’m confused what this has to do with my question.  I know what SDEs are.  I’m not necessarily interested in ODEs rather than PDEs, in fact the reverse (although either is fine).  And I don’t see how SDEs or their extensions give us a metric or measure on the space of differential equations, or correspond to the sort of thing I wondered about the last paragraph of my second post.

Well, I’m confused why you’re confused. Darn this low-bandwidth no-context medium!

It seems to me that SDEs are exactly a probability distribution over differential equations, so there’s your measure that you were looking for. And the theory of SDEs tries to answer your questions like “what happens when we add small terms to this equation” or “is there a phase-transition-like change in the solution”, so it seems related to what you’re thinking about.

SDE solutions are stochastic processes, i.e. probability distributions over trajectories, but in SDE solutions almost none of the trajectories could have been produced by a non-stochastic differential equation, because the set of  everywhere-differentiable functions has Wiener measure zero.  So I don’t see how this yields any useful measure over equations that aren’t stochastic to begin with.

… although now I realize you mention that “the stochastic term can be any distribution over the space of functions that you like.”  When I first read this, I was imagining that (like the usual Wiener process) our distribution would not depend on the solution, so (again like the usual Wiener process)  it’d be a “forcing term.”  But do you mean we can have things like dX_t = dW(X)_t where dW(X)_t is some stochastic term dependent in some complicated way on X?  I’ve never heard of this; to me “SDEs” always meant white noise, the Ito vs. Stratonovich thing, and not much else.

(via identicaltomyself)

the-moti:

nostalgebraist:

evolution-is-just-a-theorem:

nostalgebraist:

Is there any interesting (i.e. with non-trivial properties) way of defining metrics or measures over sets of differential equations?  (Got onto thinking about this bc of the fine-tuning in cosmology thing, and wondering if there is any way to talk about a law [i.e. equation] being more or less fine-tuned, but now I’m just curious in general)

Hmmm. Convergence (in the sense that a Taylor series converges to the function it represents) is unmetrizable.

How is this handled in calculus of variations?

Also I’m not entirely sure what you’re thinking of as a criterion. Could you give an example of two very close differential equations?

No idea how it’s handled in calculus of variations.  Huh.

Here is an example inspired by the fine-tuning thing.  We have a differential equation with coefficients in front of the terms.  We talk about how “if the coefficients were a tiny bit different, the behavior of the solution would be very different.”  Now we can imagine keeping the coefficients fixed but varying the equation, by adding new small terms (i.e. slightly varying their coefficients starting at zero), or by changing an existing term (change an exponent in a continuous way, say).  Now, for some sort of change in the solution, you can talk – in a casual way at least – about how little you need to change the equation to get a “comparable” or “comparably large” change in the solution.

(We may be talking about sudden phase transition-like changes in the solution, so the changes themselves may not be continuous, but you might have a sense of distance for equations like “it is this much change of exponent away from some given transition”)

For any one equation, this just seems like a mildly amusing game, but could there be any regularities across many equations (or many definitions of “comparable”), so that there might be general facts?

So the main fundamental difficulty in defining this is that, in certain regimes of the parameter space, a “tiny” change in one term can completely dominate. For instance, if we have a linear equation and we add a very tiny nonlinear term, we get completely new behavior for large values, and can get new behavior for small values over long time periods.

Let’s first examine what happens when this difficulty is removed. Suppose that our differential equation describes the evolution of a point on a compact manifold. Then we can just view it as a vector field on that manifold. If we put a Riemannian metric on that manifold, then we have a natural metric on the space of vector fields given by the L^p norm of the difference. To prove a theorem of the form “close vector fields have similar behavior for short times”, it looks like we also need some control on the derivatives of at least one of the two vector fields.
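(In symbols — my notation, for vector fields X, Y on a compact Riemannian manifold (M, g):

$$d_p(X, Y) \;=\; \left( \int_M \big|X(q) - Y(q)\big|_g^{\,p} \; d\mathrm{vol}_g(q) \right)^{1/p},$$

i.e. the L^p norm of the pointwise difference, measured with g.)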

What if the vector fields are on two different spaces? If we have metrics on both spaces, and they are diffeomorphic, we can take the minimum over all diffeomorphisms of some measure of the distortion of the metric + some measure of the distortion of the vector field.

However, I don’t see a reasonable measure of difference for vector fields on metricless smooth manifolds or vector fields on different spaces. 

Passing now to the more standard O.D.E.s in some number of real variables, we can view them as vector fields on R^n, and then the obvious thing to do is integrate the difference against some rapidly decaying function. Then the difference would only be finite for vector fields that don’t grow too rapidly. Ideally, this function represents some kind of probability of our physical system being in different states (which must somehow be independent of the laws of physics???). 
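(A one-dimensional sketch of that last proposal — the Gaussian weight and the exponent p = 2 are placeholder choices of mine, not anything specified above:

```python
import numpy as np

def weighted_distance(f, g, p=2, xs=np.linspace(-10.0, 10.0, 2001)):
    """Distance between two 1-D vector fields: integrate |f - g|^p against
    a rapidly decaying weight, then take the p-th root."""
    weight = np.exp(-xs**2)                        # the rapidly decaying function
    integrand = np.abs(f(xs) - g(xs))**p * weight
    return np.trapz(integrand, xs) ** (1.0 / p)

# x' = k*x for nearby and not-so-nearby values of k:
print(weighted_distance(lambda x: 1.0 * x, lambda x: 1.1 * x))   # small
print(weighted_distance(lambda x: 1.0 * x, lambda x: -1.0 * x))  # larger
```

The decaying weight is what keeps the growth of the fields at large x from making every distance infinite.)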

Thanks, I really like this.  One thing that’s interesting about the vector field perspective is that in cases where I think of “a small change in the equation having a big effect on the solutions,” the corresponding change in the vector field isn’t small.  I realize that’s something of a tautology – if the solutions change a lot, then by necessity their tangent vectors change a lot – but it means vector fields are a “naming scheme” for equations that varies continuously with the sets of solutions, unlike the “naming scheme” of writing down equations in symbols.

(Simple example: when I see x’ = k*x, I think of this as having either “growing solutions” or “decaying solutions,” and switching at k=0, which makes crossing zero sound like a small change that makes a big difference.  But the vector field for any nonzero k gets arbitrarily big for large enough x, so the differences as we approach zero will start to blow up in the L^p norm.

Another example: in singular perturbation problems like ϵx’’ + x’ + F(x) = 0, where the highest derivative term is multiplied by an ϵ, if you write it as a vector field (x’, x’’) on the phase space (x, x’), and retain both dimensions of the phase space even when ϵ=0 [where the problem is first order and this is normally superfluous], the expression for x’’ completely changes as we move from ϵ=0 to ϵ>0.  Again, this is as it has to be, since we know the solutions change and that means the vector field has to, but it’s nice to have the “similar equations, different solutions!” feeling just … dissolve.)
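(Spelling that out, with y = x’ and my own arrangement of the terms: for ϵ > 0 the phase-space vector field is

$$\frac{d}{dt}\begin{pmatrix} x \\ y \end{pmatrix} \;=\; \begin{pmatrix} y \\ -\tfrac{1}{\epsilon}\,\big(y + F(x)\big) \end{pmatrix},$$

while at ϵ = 0 the same equation collapses to the first-order relation x’ = −F(x); the second component of the field has no finite limit as ϵ → 0 except on the curve y + F(x) = 0.)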

(via the-moti)