
Nonlinear Dynamics 1: Geometry of Chaos →

the-axiom-of-hope:

what is this course about?

The theory developed here (that you will not find in any other course :) has much in common with (and complements) statistical mechanics and field theory courses; partition functions and transfer operators are applied to the computation of observables and spectra of chaotic systems.

Nonlinear dynamics 1: Geometry of chaos (see syllabus)

  • Topology of flows - how to enumerate orbits, Smale horseshoes
  • Dynamics, quantitative - periodic orbits, local stability
  • Role of symmetries in dynamics

Nonlinear dynamics 2: Chaos rules (see syllabus)

  • Transfer operators - statistical distributions in dynamics
  • Spectroscopy of chaotic systems
  • Dynamical zeta functions
  • Dynamical theory of turbulence

The course, which covers the same material and the same exercises as the Georgia Tech course PHYS 7224, is in part an advanced seminar in nonlinear dynamics, aimed at PhD students, postdoctoral fellows and advanced undergraduates in physics, mathematics, chemistry and engineering.

why this course

Most institutions have too few graduate students in any narrow speciality to offer a high-level specialized course tailored to them. This Specialized Open Online Course is an experiment in sharing such an advanced course with the off-campus research cohort. [A longer blurb]

prerequisites

A basic background in linear algebra, calculus, ordinary differential equations, probability theory, and classical and statistical mechanics: the ability to work with equations involving vectors and matrices, differentiate simple functions, and understand what a probability distribution is. Weekly homework assignments require both analytic and numerical work, so we will teach you Python as we go along. Knowledge of Matlab, Octave, or another programming language is very helpful. For introductory literature, check the book.

(via the-axiom-of-hope-deactivated20)

unknought:

nostalgebraist:

unknought:

nostalgebraist:

Back when I was an undergrad, I remember becoming convinced at some point that the “Generalized Stokes’ Theorem” was just a dirty trick which involved setting up non-obvious definitions with an eye to getting the curl and divergence theorems as special cases, and then claiming that the consequences of those definitions for higher dimensions were correct.

(Sort of analogous to fitting a line through two points and then claiming that since the line fits the two points, the sample from which the two points were taken must fit along the line.  [Or, substitute almost any curve you like for “line” here.])

But by the time I became mature enough to realize that I must not be the only person in history to have thought of this “brilliant” idea, I didn’t remember enough about differential forms to really decide whether it made sense.  Does anyone know whether this objection has been developed seriously?

If you’re suggesting that the proof of the Generalized Stokes’ Theorem involves a sleight of hand where you prove a few cases and then claim you’ve completed the proof in full generality, this isn’t true. Rigorous proofs of the Generalized Stokes’ Theorem exist.

If you’re suggesting that the only interesting cases of the Generalized Stokes’ Theorem are the divergence and curl theorems, this also isn’t really true. I don’t know much about the physics applications, but in differential geometry you need the theorem in its full generality to get an understanding of de Rham cohomology and connect it to other cohomology theories.

I’m not sure either of those is actually what you’re getting at, though. If it’s something else, could you clarify?

I don’t mean either of those.  What I mean is: imagine you’re a physicist or mathematician sometime around 1800, and “differential geometry” in its modern sense isn’t really something you know about.  But the divergence and curl theorems are known to be useful.

So now you might think, “maybe there’s some broader structure that includes these two things as special cases.  I wonder what it is?”  The question is: would you then be led, uniquely, to differential forms and the Generalized Stokes’ Theorem?  If all you knew was that you wanted to come up with “a generalization” of those two theorems, is that the only nontrivial one you could come up with?

Statements like “you need it to understand de Rham cohomology” aren’t relevant here because if that isn’t the only possible (nontrivial) generalization, then there might be other directions you could go that might get you things that aren’t quite the same as differential forms.  If differential forms are just one of the directions you can go, then “it helps you understand stuff based on differential forms” doesn’t justify going in that particular direction.  (If we’d gone in some other direction, we’d be saying “well, it helps us understand [other thing based on something else]!”)

I hope this is not too unclear.  I’m pretty tired

This exposition by Terence Tao (http://www.math.ucla.edu/~tao/preprints/forms.pdf) makes what I think is a pretty convincing case that trying to generalize the idea of a signed integral leads you naturally to differential forms. They’re not just an algebraic magic trick that happened to be useful in other ways. This doesn’t mean they’re the only natural way to generalize the curl and divergence theorems, though. All I can say for that is that I’ve never heard of any others.

Reblogging this to mention that I finally understood this a little while ago, while reading Lawrie’s Unified Grand Tour of Theoretical Physics.  Specifically, I want to note that I wasn’t really noticing anything deep in the above posts, just not fully understanding differential forms.

What had always bothered me was the (-1)^k in the product/derivation rule for the exterior derivative d.  This flips a sign when k is odd but not when it is even, which struck me as the sort of contrivance one might introduce if trying to “fit a curve through” the “data points” given by pre-existing results for the cases k=1 (FTC), k=2 (Kelvin-Stokes theorem), and k=3 (divergence theorem).
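For concreteness, the rule in question is the graded Leibniz rule, with the sign depending on the degree k of the first form (the formula itself is standard; I’m just quoting it here):

```latex
% graded Leibniz / product rule for the exterior derivative,
% with \alpha a k-form and \beta an arbitrary form
d(\alpha \wedge \beta) = d\alpha \wedge \beta + (-1)^{k}\, \alpha \wedge d\beta
```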

I was aware that if you simply excluded the (-1)^k, the output would no longer be a differential form, because differential forms have to be antisymmetric.  But this just pushed my suspicion back to the antisymmetry of differential forms.

However, I had also independently accepted that it made perfect sense (for non-“curve fitting”-related reasons) to make differential forms antisymmetric.  I just never remembered I had done so when I was going through the line of thought just described.

(Differential forms need to be antisymmetric because – as Tao explains very clearly – we are trying to formalize integrals over oriented regions.  To compute a flux, say, you need a surface normal, and if you want to describe the computation by specifying an area element, that element needs to know [implicitly] about the surface normal.  I was an undergrad physics major so this all feels very familiar to me.)
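In symbols, the most basic instance of that orientation-sensitivity (again standard material, quoted here just for concreteness):

```latex
% swapping dx and dy flips the orientation of the area element, so an
% integral changes sign when the region's orientation is reversed
dx \wedge dy = -\, dy \wedge dx,
\qquad
\int_{-S} \omega = -\int_{S} \omega
```

where -S means S with its orientation reversed.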

So what was missing from my “curve fitting” argument was that we are fitting a curve through three points with an extra constraint, which is forced on us by other considerations.  This still doesn’t pin down the result uniquely, since (just for the sake of argument) we could define a different exterior derivative with stupidly contrived extra terms that equal zero when k<4.  But that really would be contrived, unlike the (-1)^k thing, which fits our 3 data points and our constraint, doesn’t treat any particular k as special, and enables further concepts that work nicely in arbitrary dimension.  So I’m fully satisfied now.

Anonymous asked: I want to read your book about “the methods of mathematical physics” for a lay/popular audience! No matter how little of it you wrote! I assume I'm not the only one! But even if that's so, please consider posting it - either here or on lapsarianalyst!

I’ll think about it. Unfortunately, almost all of what I actually wrote was a grandiose introduction, which made big claims about how cool the finished book would be, but without enough specifics for it to be very interesting on its own. I think it was well-written in an aesthetic sense, but it’d be a bit embarrassing to post now, given that I never wrote the book and probably never will.

That said, I haven’t even thought about the project at all in, I dunno, maybe a year – so now that I’ve remembered its existence, there’s a chance I’ll pick it up again, perhaps with more modest goals. For reference, the original outline went

Chapter 1: Hooke’s law, “everything is a spring” via linearization of Newton’s 2nd law about equilibria

Chapter 2: Extending Ch. 1 to cover exponential growth/decay in addition to sinusoids, so we have a general solution to 1st order scalar linear ODEs, with introduction of complex variable method

Chapter 3: Linear algebra (so we can solve 1st order linear vector ODEs, and see that they’re just Ch. 2 material after eigendecomposition; a toy sketch of this step follows the outline)

Chapter 4: Fourier series/transform

Chapter 5: 1st order (nonlinear) PDEs, solution via characteristics (might merge this into Ch. 6 or skip it, since the applications are all linear 2nd order and it disrupts the flow)

Chapter 6: 2nd order linear PDEs, classification, solution via Fourier series/transform

Chapter 7: classical E&M as example application of above

Chapter 8: non-relativistic QM as example application of above

I was planning to explain as much of this as possible without equations, and using invented “easy” notation when equations were needed, trying to split the difference between an ordinary pop science book and a real textbook. This proved to be extremely difficult even for the simplest cases, which is why I never got very far.

studyinglogic:

Ulam has much the same sentiment as the man in the last two panels. Rota quotes Ulam (in his book Indiscrete Thoughts, p. 58) as saying:

What makes you so sure that mathematical logic corresponds to the way we think? Look at that bridge over there. It was built following logical principles. Suppose that a contradiction were to be found in set theory. Do you honestly believe that the bridge might then fall down?


My personal view is that the sentiment above is wrong: if 1 and 1 didn’t equal 2, then the bridge couldn’t have been built in the first place, and civilisation couldn’t have gotten off the ground, since counting couldn’t work.

What Ulam and the man above are suggesting is that it’s possible for mathematics as we know it to go wrong, and for the world to remain the same. But I cannot agree with that assumption: mathematical truths are (to me) paradigms of necessary truths.


To see which side you’re on, try this thought experiment (not original to me):

Imagine that in this world, whenever people add 2 objects and 2 objects together, a malicious demon always adds 1 object. So two oranges stacked with two oranges become five oranges, and so on. Do you think our mathematical calculations would come out any different? Would we conclude that 2+2=5, or would we still have our normal law of 2+2=4?

In 800+ notes I’m sure someone else has already made the following argument, but:

This thought experiment relies on a confusion about the relationship between math, physical law, and reality.

Before I flesh that out, here is my answer to the thought experiment.  We would still ultimately conclude that 2+2=4, because the demon apparently has a well-defined notion of what counts as an object.  For instance, if (as stated) you have to get four oranges together to make the demon do its magic, then the demon considers “an orange” to be an object, but not “half an orange” (since two oranges, put together, do not turn into 2 and a half oranges).  So, in this hypothetical world, the usual counting rules (including 2+2=4) would work except in special cases, and we could discover all of ordinary science by looking at the behavior of every piece of the world that is not a “complete object” according to the demon.

Now if I understand correctly, this is the conclusion OP wants me to draw: 2+2=4 is a necessary truth, so it’s true even in a world where it superficially appears to be false.  But I think this is confused.  We are conflating two very different things:

(1) which statements are true in a given formal system (in this case, some version of arithmetic with the natural numbers)

(2) which truths about reality can be reliably derived using a model based on that formal system

There are plenty of well-defined formal systems that don’t make good models of certain parts of reality, and this has no bearing on the a priori truth value of theorems in those systems.  In F_2, the finite field with 2 elements, 2+2=0 (indeed, 1+1=0).  This doesn’t make a good model of what happens when you put oranges together, but it’s still a true theorem.

Without specifying which physical system we are trying to model, the question “does 2 plus 2 equal 0?” doesn’t even have a well-defined answer.  It does in F_2, it doesn’t in Z.  If we are using addition to model putting oranges together, Z is the right system.  If we’re using addition to model XOR on binary digits, F_2 is the right system.
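A trivial illustration of that last point in code (my own, added just to make it concrete): the same plus sign is a good model of different physical situations depending on which system you pick.

```python
# "2 + 2" in Z versus in F_2: same symbol, different formal systems,
# each the right model for a different physical question
print(2 + 2)          # integer arithmetic (Z): models stacking oranges -> 4
print((2 + 2) % 2)    # arithmetic mod 2 (F_2): 2+2=0, and 1+1=0 models XOR of bits
```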

I think Ulam is referring to a rather different issue.  When we build bridges (etc.), we use some principles of reasoning and calculation that seem to work (i.e. to be good models); additionally, people have tried to axiomatize these principles.  Ulam is saying that some axiomatization may turn out to be bad, and that this may not in itself repudiate the principles, which seems undeniably right to me.  There is more involved in the axiomatizations than is necessary for individual applications like building a bridge, which is why (for instance) we are able to dispute whether to include the Axiom of Choice without physical applications simply settling the matter one way or the other.

(via studyinglogic)

xxxdragonfucker69xxx:

ok listen as a kid i thought science had to be super rigorous 100% exact all the time but it turns out at high levels it is all approximations all the time, theres not really a point to this post i just want to spread awareness of something we have to do

the equation for the energy stored in a spring is pretty well known, (1/2)kx^2 (quadratic, goes with the square of the distance). the graph looks like this

[image: an upward-opening parabola]

here comes part one of the bullshit: theres a mathematical technique called a taylor expansion where you can take any equation (as long as it doesnt do a few mathematically rude things) and turn it into a sum of polynomial terms (a+bx+cx^2+dx^3+...), which makes things a lot simpler mathematically

part two of the bullshit: if you zoom in real small near a low point (where the linear term drops out) you can ignore most of the later terms, so its basically just an x^2 equation

so basically if you draw any goddamn squiggle and zoom in REAL close it looks like a parabola and therefore any REALLY small thing acts like a spring
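a quick numerical version of the zoom-in claim, added for concreteness (the potential cos(x) + x^2/5 below is just an arbitrary smooth squiggle, not anything from the post):

```python
# Zoom in on a minimum of an arbitrary smooth "squiggle" and compare it to
# the spring (quadratic) approximation with k = V''(x0).
import numpy as np

def V(x):
    return np.cos(x) + x**2 / 5      # arbitrary squiggle; local minimum near x ~ 2.1

# find the minimum numerically
xs = np.linspace(1.0, 3.0, 200001)
x0 = xs[np.argmin(V(xs))]

# second derivative at the minimum via central differences = spring constant k
h = 1e-4
k = (V(x0 + h) - 2 * V(x0) + V(x0 - h)) / h**2

# the closer you zoom, the better the quadratic matches the real thing
for dx in (0.3, 0.1, 0.03, 0.01):
    exact = V(x0 + dx) - V(x0)
    spring = 0.5 * k * dx**2
    print(f"dx = {dx:5}: exact = {exact:.6f}, (1/2) k dx^2 = {spring:.6f}")
```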

atoms in a molecule. bumps on a road. electrons in an atom (before you get to the nasty stuff). just now i was thinking about bubble wrap. a weight on a pendulum. probably, like, interpersonal relationships. its all springs. boing

This is one of those physics things I wish more people knew

Back in 2014 I started trying to write a book about “the methods of mathematical physics” for a lay/popular audience, and I didn’t get very far, but this is how it started

[image]

(via prospitianescapee)

You guys successfully nerdsniped me with this trends-in-happiness stuff, and now I’m trying to back away from the rabbit hole before it pulls me in (I actually downloaded the General Social Survey and started playing with the data! so many variables!).  But here are the most salient things I’ve learned, for people curious about what this research means:


1. The paper that originally got me nerdsniped, “The Paradox of Declining Female Happiness” (Stevenson and Wolfers 2009), used data from the U.S. General Social Survey, so I’ve mostly looked at that.  There are other data sources (see e.g. this interesting response to S&W 2009) that don’t have some of the GSS’ flaws.  But I get the impression that the GSS is pretty popular with researchers.


2.  The most important thing you need to know about the happiness measures on the GSS is that they are extremely coarse-grained.  The survey item which produced the big “paradoxical” result about female happiness was the following question:

“Taken all together, how would you say things are these days – would you say that you are (3) very happy, (2) pretty happy, or (1) not too happy?”

Those are the only three options.  The GSS does also ask about satisfaction with some specific areas of life, like finances and work (with 4 possible responses), and also asks about whether you have a happy marriage (same exact 3 options as on the general happiness question).

The only observed trend here, then, is increases/decreases in the fraction of respondents occupying each of these three boxes.  Given that fact, I was really impressed with Stevenson and Wolfers 2008 (which I promo’d yesterday), in which the authors claim they can estimate, from just this information, the effects of time and demographic on the mean and variance of an underlying continuous distribution – without assuming the functional form of that distribution, and while simultaneously having to estimate the cutoffs that slice that continuum into the three boxes!  I still have a “sounds fake but okay” reaction to this – I’m surprised the model is identifiable at all, and am kinda concerned about the stability of the estimates.
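(Tangential, and not from the S&W paper: for anyone who wants to play with this kind of model, the textbook cousin of their estimator is an ordered probit, which does fix the latent distribution as normal – unlike their approach – while still estimating the cutpoints. A minimal sketch on simulated stand-in data, since I’m not going to redistribute the GSS here:)

```python
# Ordered-probit sketch: a continuous latent "happiness" variable is sliced
# into the 3 GSS boxes by two cutpoints, estimated along with the coefficients.
# The data below is simulated; the variable names ("decade", "female") are stand-ins.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 2000
female = rng.integers(0, 2, n)
decade = rng.integers(0, 5, n)                 # e.g. decades since the survey began
latent = 0.05 * decade - 0.1 * female + rng.standard_normal(n)
happy = np.digitize(latent, [-0.6, 1.0]) + 1   # 1 = not too, 2 = pretty, 3 = very

df = pd.DataFrame({"happy": happy, "decade": decade, "female": female})
model = OrderedModel(df["happy"], df[["decade", "female"]], distr="probit")
fit = model.fit(method="bfgs", disp=False)
print(fit.summary())                           # coefficients plus the two cutpoints
```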

Technicalities aside, I was really excited about being able to get the variance as well as the mean, because given these 3 boxes, “happiness inequality” seems more morally salient to me than mean/median happiness trends.

Why?  Well, think about the categories.  I honestly am not sure what to make of people opting for “pretty happy” instead of “very happy,” or vice versa.  If I imagine the General Social Survey people knocking on my door at various times in my past, I can imagine myself answering one or the other of those two on the basis of, like, how the past week had gone.  I don’t see myself as aiming, in life, for a state of being that is consistently “very happy” as distinguished from “pretty happy.”  Indeed, part of me reflexively bristles at the (callous?) indifference to outward circumstances that I imagine such a state would require!

On the other hand, the times in my life when I would have answered “not too happy” (the lowest possible option) are sharply distinguished from the others, and encompass some states of misery which I would very much like to prevent in others.

So, insofar as any “overall trend” here would mix together these two distinctions, it’s hard to interpret.  But a decrease in variance, toward a mean that is at least somewhere in the middle, implies that we are raising people up from the “not too happy” box – which is all I care about.

Hence I was encouraged to hear that variance on this question has declined greatly, across and especially within groups, to the point of swamping the mean shift.


3.  That still isn’t the full story though.  Because remarkably few people use the lowest category.  Either people are far happier than I (and the conventional wisdom) would imagine, or they are putting on an artificially happy face for the researchers.

Here are the male and female trend lines for the “not too happy” response (from the online data explorer, check it out):

[image: GSS trend lines for the “not too happy” response, men vs. women]

You’ll note that they line up very closely, which is interesting.  But also, they’re consistently between 10% and 20%.  Apparently the remaining 80–90% of the U.S. population has been either “pretty happy” or “very happy” for the past 4 and a half decades!  A golden age!

I first noticed this when I was working with the data offline and drilling down into a specific category – I think it was “married women who report their marriages are ‘not too happy’” (n.b. this is from the marital happiness question, not the general one).  And I noticed that suddenly everything was really noisy, because my sample sizes were as small as 20-40 people per year.  (For marital happiness this phenomenon is even more extreme – it’s more like 5% of women who say “not too happy,” with a full 60-70% reporting “very happy.”)

We appear to be studying, and fretting over, the slight variations in bliss level of a mostly blissed-out populace.  Since this does not resemble the actual country I live in, something must have gone wrong with our measuring apparatus.

Deep down, my mind seems to believe that true reality is continuous and discrete ontologies are just fake plastic toys we come up with because they’re sometimes easy to think about

And this has, unconsciously, influenced my choice of focus in physics/math stuff – even though it seems like a very suspect assumption, and even though it ultimately stems from a non-rational feeling that, like, we humans should be ~too sinful~ to grasp true reality and thus it can’t involve discrete elements, since a sufficiently small set of discrete elements can “fit in the human mind all at once with nothing left out”

(and even a structure that is too large to grasp can be divided into such pieces)

Anonymous asked: Just wanted to let you know I spent way too much time replying to your comment on Marginal Revolution concerning Logical Induction. I don't expect a reply, but hope at least somebody will read it. Should spend less time defending other people's papers. Best, Lee Wang.

Funnily enough, I saw this message just after returning to tumblr from MR, because I wanted to repost here the reply I wrote to you!  So, your time was definitely not wasted from my perspective, since it led me to think about an aspect of the topic I had not thought about before.

My reply:

Thanks for the reply, Lee. I agree with you that assigning probabilities to theorems *at all* is a nontrivial problem, and insofar as the LI paper moves forward our understanding of this problem, that is good.

However, I am not convinced that the paper ever gives us a satisfying “assignment of probabilities to theorems.” What it gives us is two things — the limiting probabilities P_{\infty}, and the finite-time “probabilities” P_n. The former has some pleasant properties like assigning 0 < P < 1 to sentences independent of the axioms, but is essentially irrelevant to the original motivation of “how do we reason about things we can’t prove yet?”, since it is obtained “at the end” after every possible deduction has been made. (So it can make use of every possible proof, and the only thing it does above and beyond deduction is to put numbers on independent sentences.)

On the other hand, the finite-time P_n do not even have to form a probability measure at any time, and we cannot use them to do the sorts of things we would like to do with probabilities, like decision theory. For instance, the authors define expectation values at time n but only prove that they behave well in the limit, and indeed the definition at time n involves an arbitrary choice which the authors justify because it washes out in the limit (see p. 40). Of course, any way that P_n fails to be a probability measure will disappear in the limit, because the limit is P_{\infty}; but if we let ourselves wait for P_{\infty} we forfeit any ability to reason probabilistically about theorems before they are proven. If we want to do that, we have to use the finite-time P_n, but in fact we cannot reason probabilistically with these at all.

notgrantpeters asked: Curious: what resources are you using to study Bayes-as-practiced? I've been putting it off for years (ever since Wasserman's All of Nonparametric Statistics tantalized me by excluding all of Nonparametric Bayes) but, especially if you'll be writing about it, now seems like a good time to learn

I haven’t looked into it in any organized way yet, so mostly random papers / blog posts / Wikipedia.  I’ll probably look into Gelman’s book sometime.

For nonparametric Bayes, I’ve just been reading random tutorial articles on Gaussian and Dirichlet processes (of which there are zillions).  Also, after @somervta mentioned David Duvenaud recently in another context, I’ve been looking into his research, esp. his PhD thesis (available on that page) about fancy things you can do with Gaussian processes by automatically building their kernels.

I’ve also been reading about variational autoencoders, which are a nice point of overlap between neural net stuff (finding a good low-dimensional encoding of a signal) and Bayes.  This post was helpful, although I don’t like some of the expositional choices there.

(The upshot is basically: in the Bayes perspective, you have some latent variable model where you’re willing to assume a distribution for the latent variables, but you don’t know the function that maps from them to the observed variables, so you have to do something fancy to learn that function while not knowing, for any particular data point, what value of the latent variables was actually realized.  In the neural net perspective, this is like training an autoencoder, where the “assumed distribution for the latent variables” appears as a regularizer encouraging your learned encoding to spread the training data out in a nice uniform way in the lower-dimensional space)
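To make that parenthetical concrete, here is a minimal sketch of the usual VAE objective (in PyTorch; this is my own toy version, not anything from the post linked above): a reconstruction term plus the KL regularizer that pushes the learned encodings toward a unit Gaussian in the latent space.

```python
# Minimal VAE skeleton: the encoder outputs a mean and log-variance for the
# latent code, the decoder maps a sampled code back to data space, and the
# loss is reconstruction error plus a KL term -- the "spread things out
# nicely in the low-dimensional space" regularizer described above.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=8, h_dim=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = nn.functional.mse_loss(x_hat, x, reduction="sum")     # decoder fit
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # the regularizer
    return recon + kl

# usage per batch: loss = vae_loss(x, *model(x)); then backprop as usual
```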

brief ignorant notes on bayesian methods

I have written a lot on this tumblr about (mostly against) “strong Bayesianism” or “Jaynesianism,” but I have mostly been silent about the pros and cons of Bayesian methods as they are actually practiced.  This is, honestly, because I don’t know much about Bayesian methods as they are actually practiced, although I am trying to learn more.

Back when I wrote that Bayes masterpost, @raginrayguns rightly took me to task for ignoring “hedging” as a virtue of Bayesian modeling.  Something that stands out to me when I read about Bayes-in-practice is that hedging is seen as extremely important – indeed, often as the whole point of the exercise.

This is quite different from the Jaynesian perspective, where both prior and posterior are representations of real beliefs, and hence it is important to get the prior “right” (through MaxEnt or something).  In practical Bayesian work, the prior is treated more as a way to do model averaging; what matters is not whether it philosophically “reflects our beliefs in the absence of evidence” but whether it leads to averaging over models in a way we like.

You have probably seen it before, but that Gelman/Shalizi paper is relevant here – it says you should do hypothetico-deductivism with Bayesian models, where both the model class and the prior are falsifiable hypotheses.

One very intuitive (to me) justification for model averaging is automatic quantification of variance (and its consequences).  If you just fit one “best” model, you can happily chug along making predictions with it, but you ought to worry about how much each of these predictions would have varied if you had fitted the model on slightly different data (with different noise, say).  Since a Bayesian method effectively uses every model in the model class and averages over them, it perhaps captures this variability?  I am used to seeing this done with the bootstrap, which directly generates “different data”; there is supposedly a connection between the bootstrap and the Bayesian thing (which uses only the real data but still uses multiple models), but I don’t fully understand it yet.
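A toy numerical illustration of that parallel (entirely mine, and deliberately the simplest possible case): quantify the uncertainty in a sample mean once by bootstrapping and once by averaging over a conjugate posterior, under a normal model with known noise sd = 1 and a flat prior.

```python
# Bootstrap vs. Bayesian averaging for the humble sample mean.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)

# (i) bootstrap: refit the "model" (here, just the mean) on resampled data
boot_means = [rng.choice(data, size=data.size, replace=True).mean()
              for _ in range(5000)]

# (ii) Bayes: with known sd = 1 and a flat prior, the posterior over the mean
#      is N(sample mean, 1/n) -- an average over all candidate means at once
post_means = rng.normal(data.mean(), 1.0 / np.sqrt(data.size), size=5000)

print("bootstrap sd of the mean:", np.std(boot_means))
print("posterior sd of the mean:", np.std(post_means))
# both land near the analytic standard error, 1/sqrt(50) ~ 0.14
```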

A superficially obvious “gotcha” argument goes like this: “even if some averaging is being done under the hood, the Bayesian model still just outputs conditional probabilities, like any probabilistic model.  Thus ‘Bayesian averaging over model class C’ produces a single model for each training data set, and is thus choosing a single ‘best’ model from some other model class (call it C-prime).  One could then argue that it would be better to average models from C-prime according to some prior, obtaining C-prime-prime, and so on ad infinitum.”

I haven’t really worked that through and I don’t know whether it truly makes sense.  It also seems misleading in that it dismisses “averaging under the hood” as though this is a mere computational choice and can’t be discerned from the resulting conditional probabilities, but that doesn’t seem like it’s true.  Except for special cases (involving Gaussianity/linearity), I have a hard time thinking of apparently non-Bayesian methods that can be re-written as Bayesian averages in a nontrivial way.  (Random forests might be Monte Carlo sampling from trees according to likelihood? not sure.)

This suggests that there may be special features conferred by the Bayesian averaging process which can be read off of the results even if you didn’t know there was averaging under the hood, but if so, I don’t know what they are (or how to look for info on this).

In a machine learning context, Bayesian methods (relative to others) feel less solidly rooted in Breiman’s “algorithmic modeling” culture – like they still have one foot in the “data modeling” culture.  There is a great deal of focus on technical methods for sampling from ~*~*the posterior*~*~, with the implication that it is clearly this great amazing thing and we are justified in going to great lengths to approximately compute it.  This is a bit confusing to me since the posterior is just a combination of a model class and a prior, and the prior is often just some computationally convenient distribution (Gaussian, Dirichlet), so it seems like we’re working very hard to compute something whose definition we chose for our own convenience rather than its optimality.

Discussions of the Dirichlet process, for instance, often start out with talk of “adaptively choosing the number of clusters” – leading me to say “great, so what’s the best way to do that?” – and then jump into discussions of the Chinese restaurant process without telling me why the clusters should be generated in this way rather than any other.
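Since the Chinese restaurant process keeps coming up, here is a minimal sketch of it (my own illustration of the “adaptively choosing the number of clusters” claim, not taken from any particular tutorial): each new point joins an existing cluster with probability proportional to that cluster’s size, or opens a new cluster with probability proportional to a concentration parameter alpha.

```python
# Chinese restaurant process: a prior over partitions in which the number of
# clusters grows "adaptively" with the data (roughly like alpha * log n).
import numpy as np

def crp(n_points, alpha, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    assignments, counts = [0], [1]                 # first customer sits alone
    for _ in range(1, n_points):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):                   # new table = new cluster
            counts.append(1)
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts

assignments, counts = crp(n_points=200, alpha=1.0)
print("clusters chosen 'adaptively':", len(counts), "sizes:", counts)
```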

(Actually, if someone can point me to a justification of the Dirichlet distribution that isn’t “it’s a conjugate prior, which is computationally convenient,” that would be helpful)