
identicaltomyself:

nostalgebraist:

evolution-is-just-a-theorem:

nostalgebraist:

Is there any interesting (i.e. with non-trivial properties) way of defining metrics or measures over sets of differential equations?  (Got onto thinking about this bc of the fine-tuning in cosmology thing, and wondering if there is any way to talk about a law [i.e. equation] being more or less fine-tuned, but now I’m just curious in general)

Hmmm. Convergence (in the sense that a Taylor series converges to the function it represents) is unmetrizable.

How is this handled in calculus of variations?

Also I’m not entirely sure what you’re thinking of as a criterion. Could you give an example of two very close differential equations?

No idea how it’s handled in calculus of variations.  Huh.

Here is an example inspired by the fine-tuning thing.  We have a differential equation with coefficients in front of the terms.  We talk about how “if the coefficients were a tiny bit different, the behavior of the solution would be very different.”  Now we can imagine keeping the coefficients fixed but varying the equation, by adding new small terms (i.e. slightly varying their coefficients starting at zero), or by changing an existing term (change an exponent in a continuous way, say).  Now, for some sort of change in the solution, you can talk – in a casual way at least – about how little you need to change the equation to get a “comparable” or “comparably large” change in the solution.

(We may be talking about sudden phase transition-like changes in the solution, so the changes themselves may not be continuous, but you might have a sense of distance for equations like “it is this much change of exponent away from some given transition”)

For any one equation, this just seems like a mildly amusing game, but could there be any regularities across many equations (or many definitions of “comparable”), so that there might be general facts?

This is almost covered by the theory of stochastic differential equations. That’s the theory of differential equations where you add a random function of time to one side. Usually the random function is “white noise”, technically known as a Wiener process, but you can pick any distribution on the space of functions that you like. The theory of these SDEs is well understood, highly applicable, and moderately beautiful.

Adding a random function of space is also a known thing. Usually people use a Markov random field, which is a generalization of a Markov process to multiple dimensions. That’s usually used to perturb a PDE, but you’re interested in perturbing an ODE. People have done that too. I remember seeing some beautiful visualization of the motion of an electron beam through a potential given by a Markov random field.

It sounds like you’re asking about adding a random function of both space and time. I’m not familiar with SDEs perturbed by a random function of both space and time, but I figure somebody must have thought about it. It seems like a reasonable generalization, now that you point it out.

Either you are misreading me, or I’m confused what this has to do with my question.  I know what SDEs are.  I’m not necessarily interested in ODEs rather than PDEs, in fact the reverse (although either is fine).  And I don’t see how SDEs or their extensions give us a metric or measure on the space of differential equations, or correspond to the sort of thing I wondered about the last paragraph of my second post.
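(To make the kind of thing I mean a bit more concrete, here is one crude, purely numerical way to play the game: pick a one-parameter family of equations, solve each member, and ask how far the solution moves per unit change in the parameter.  The specific equation and the sup-norm are my own arbitrary choices, purely for illustration:)

```python
# Toy sketch: how sensitive is an ODE's solution to a small change in one
# coefficient?  Family (arbitrary example): y'' + c*y' + y = 0.
import numpy as np
from scipy.integrate import solve_ivp

def solve_member(c, t_eval):
    """Solve y'' + c*y' + y = 0 with y(0) = 1, y'(0) = 0."""
    def rhs(t, state):
        y, v = state
        return [v, -c * v - y]
    return solve_ivp(rhs, (t_eval[0], t_eval[-1]), [1.0, 0.0], t_eval=t_eval).y[0]

t = np.linspace(0.0, 20.0, 2000)
base = solve_member(0.1, t)

for eps in [1e-3, 1e-2, 1e-1]:
    moved = solve_member(0.1 + eps, t)
    dist = np.max(np.abs(moved - base))          # sup-norm distance between solutions
    print(f"eps={eps:g}   solution moved by {dist:.4f}   ratio {dist/eps:.2f}")
```

The “ratio” column is just a finite-difference stand-in for sensitivity; a “fine-tuned” equation would be one where it blows up, or where the solution’s qualitative behavior flips, for some tiny eps.  None of which answers the question about a metric or measure on the space of equations itself, of course.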

(via identicaltomyself)

(Note: I think I made this post too long by going on too many digressions.  A short version capturing the main point would probably be better overall, although this at least gave me the chance to ride some entertaining hobby-horses.)


1.

I’ve noticed that several seemingly unrelated frustrations of mine can all be classified as “people should care more about entire probability (or frequency) distributions, rather than summary statistics like averages.”

This is frequently a problem in academic papers.  Many of the problems with that godawful marijuana paper I posted about earlier involved the authors doing complicated things to dredge individual numbers (p < .05, etc.) out of their small sample, when with only n=22, a set of histograms would have been much more informative.  With only 22 people, your statistical power probably isn’t very high, so it’s hard to tell what it means that you can or can’t get p < .05 for something.  But if there’s an effect on a particular metric, we should be able to see it just by plotting a histogram of heavy users on that metric and one of light users.

Indeed, those histograms would contain many other interesting facts.  They would tell you, for instance, whether a given “average” effect was the result of everyone experiencing roughly that effect, or the result of half of the people experiencing no effect while half experience one twice as big, or whatever.  It would let you see differences that wouldn’t show up in a t-test, where a distribution changes shape while still having a similar mean.  It would let you see the difference between statistical and practical significance – you could see when there’s a big effect that the study just doesn’t quite have the power to detect, and when there’s a statistically significant but tiny difference that’s swamped by individual variability.  (You can usually infer the latter from std. devs. if they’re supplied, but this isn’t always possible, and the former is usually invisible.)

You get all of these things for a simple reason: the distribution implicitly contains all of the other information.  Every derived number you see in a statistical paper came from a distribution (or collection of them), and if you knew the distribution, you could re-derive all the numbers.  But it doesn’t go the other way: you can’t re-derive the distribution from the numbers.

(Unless you know the distribution has a parametric form, but this is irrelevant to real data; “these frequency counts could have come from this Gaussian” does not let you reconstruct the original counts, and the original counts contain more information, e.g. information that might lead you to disagree with the assertion about Gaussianity.)
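(A toy illustration of the point, with made-up numbers rather than anything from the papers below: two samples with essentially the same mean and standard deviation, one unimodal and one bimodal.  A t-test can’t tell them apart; a pair of histograms makes the difference obvious.)

```python
# Two invented samples with matching mean/SD but very different shapes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
unimodal = rng.normal(loc=0.0, scale=1.0, size=1000)
bimodal = np.concatenate([rng.normal(-1.0, 0.3, 500), rng.normal(1.0, 0.3, 500)])

for name, x in [("unimodal", unimodal), ("bimodal", bimodal)]:
    print(f"{name}: mean={x.mean():.2f}  sd={x.std(ddof=1):.2f}")

# The t-test sees nothing, because it only compares means:
print(stats.ttest_ind(unimodal, bimodal))

# Whereas the histograms differ at a glance:
counts_uni, edges = np.histogram(unimodal, bins=30, range=(-3, 3))
counts_bi, _ = np.histogram(bimodal, bins=30, range=(-3, 3))
```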

There’s kind of a tradeoff here, since I’m literally talking about exhibiting pictures and determining things by “seeing them in the picture,” which is uncomfortably subjective.  If you wanted to make it more objective, you’d have to come up with numerical proxies for the judgments you’d be making visually, which gets you back to … exactly what I was trying to get away from.

But it isn’t really that stark.  One thing that would vastly improve a lot of papers I read is just including more histograms, even if they also included all the same derived numbers.  Additionally, even if we’re deriving numbers, it’s possible to have an attitude that pays more “due respect” to the distribution.  This is a reason to prefer nonparametric tests, but parametric tests are fine too as long as you can justify using them.  For instance, a lot of standard intuitions (about the interpretation of the mean and tests that rely on it) break down for data that is not unimodal.  A lot of data is unimodal, so this may not be a problem in a given case, but it’s very rare for authors to just tell me the data is unimodal, even though it would take only a few words to do so.  (A histogram would also help, and most of the cases where this information is included are ones where it’s included implicitly via histogram.)

I want to focus on this unimodality issue more, because it’s central to the problem.  We have a habit of simplifying a distribution down to a single number, usually a mean; if a second number is included, it’s some measure of spread around the mean.  A bimodal distribution can’t be captured in one number, and a mean-and-spread won’t capture it either.  So implicit in our whole way of talking about results in social and medical science is the assumption that everything is unimodal, or else nothing would make sense.  Indeed, this is a problem even for unimodal distributions that are skewed – since the mode, median, and mean are different, the mean (which is typically the number reported) is a poor guide to the “typical” case, either in the sense of “most common” (mode) or “50th percentile” (median).
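(The skew point in numbers, again with invented data: for a right-skewed distribution, the reported mean sits well above both the median and the mode.)

```python
# Right-skewed toy data: mean > median > mode, so the mean misdescribes the
# "typical" case in both of the senses mentioned above.
import numpy as np

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

counts, edges = np.histogram(x, bins=200, range=(0, 10))
mode = 0.5 * (edges[np.argmax(counts)] + edges[np.argmax(counts) + 1])  # crude mode estimate

print(f"mean={x.mean():.2f}  median={np.median(x):.2f}  mode≈{mode:.2f}")
# Theoretical values for lognormal(0, 1): e^0.5 ≈ 1.65, 1.00, e^-1 ≈ 0.37.
```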

Here’s an example from another marijuana paper (PDF).  This paper had a really intriguing result – previous studies had shown that cannabinoid receptor availability is suppressed in regular users but bounces back somewhat after 28 days of abstinence, and this study showed that most of the bounce-back (~75%) happens within just 2 days, with the remaining 26 days just adding a little extra on top.

However, only means are reported, so that this time trajectory (“75% of the 28-day improvement in 2 days, the remaining 25% in 26 days”) may not have occurred in any experimental subject, like the proverbial “average American family” that has 2.5 kids even though literally no one has 2.5 kids.  It could be that some people bounce back completely in 2 days (or fewer) while others improve slowly and linearly; it could be that some people improve fast while others don’t improve at all; it could be that everyone improves exponentially with the same half-life, so people who started lower bounce back faster on a linear scale; it could be that you can only bounce back if you’re above a certain threshold, but if you are then it’s fast.  All of these possibilities have strong and different implications for individual users.
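(Here’s how easy it is to get the reported group curve out of individual trajectories that look nothing like it.  The two “patterns” below are pure inventions, not claims about the actual subjects; the point is just that the mean is compatible with both of them coexisting.)

```python
# Two invented individual recovery patterns whose group mean reproduces the
# reported "~75% of the 28-day improvement within 2 days" curve.
import numpy as np

days = np.array([0, 2, 28])

# Pattern X: complete recovery within 2 days (recovery expressed as a fraction
# of each person's own 28-day improvement).
pattern_x = np.array([0.0, 1.0, 1.0])

# Pattern Y: slow, roughly linear recovery over the month.
pattern_y = np.array([0.0, 0.07, 1.0])

n_x, n_y = 8, 3                      # a made-up 8/3 split of 11 subjects
group_mean = (n_x * pattern_x + n_y * pattern_y) / (n_x + n_y)

for d, m in zip(days, group_mean):
    print(f"day {d:2d}: mean recovery = {m:.2f}")
# Prints ~0.75 at day 2, even though no individual recovered 75% in 2 days.
```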

When I was thinking about that other study, I joked with myself that you’d get higher-quality information just from talking to a few stoners.  On reflection, I think this is less of a jokey exaggeration than I realized.  These studies have small samples (the one I just talked about had 11 dependent subjects and 19 controls).  These sample sizes are in kind of a transitional regime between case studies and proper statistical samples.  Because tests with a small sample will have low power, I feel wary of drawing any conclusions from the observed patterns of significance and non-significance (although these are often presented as the main results).  Since I don’t think you can do much with derived quantities (of the sort that usually get derived), I am correspondingly more interested in individual cases.

After all, if nothing else, we have a collection of individual cases here, and the “case study approach” can still be interesting with very few cases, while the statistical approach cannot.  If your sample size is three, and you give me detailed info about all three cases, I have at least learned about 3 things that can happen to human beings.  If the variance is high, all the better: now I know about 3 quite different things that can happen to human beings.  But if your sample size is 3 and you only report the results of statistical tests – which are all going to turn out non-significant, probably – I have learned nothing from you.

So, if you know (or know people who know) 11 stoners, you have 11 (colloquially presented) case studies.  This provides a lot of information, if not about overall trends, then about the sorts of things that can happen in individuals – which, after all, is what all of this (i.e. medical and public health research) is supposed to be about.  I can even start to get a sense of the relative frequencies of distinct subgroups: if 3 of 11 stoners experience Pattern X while the other 8 experience Pattern Y, well, I’d like more data, but that’s already suggestive.

But if you form an averaged time trajectory over the 11 and never give me more detail about the distribution beyond that coarse average, I don’t have any Pattern Xs or Pattern Ys, I just have a mean pattern that may not correspond to anyone’s story.  I have graphs like this (from the paper):

[figure from the paper: the group-mean time course, with SEM error bars]

Here we have a picture of the Average Stoner During A Tolerance Break, who may not resemble any particular stoner at all, like the average family with 2.5 kids.  Those error bars aren’t quantiles, BTW, they’re SEM, so we don’t even have any skew information here.

Rather than providing us with anything about individual trajectories, the authors concentrate instead on p-values.  Their reasoning – and I hope I am committing a misreading here! – is based on one of those errors they warn you about in Stats 101 classes: they are interpreting non-significance as conveying positive information about the world.  They present it as a big deal that while they got significance between stoners and non-stoners, the result is no longer significant after 2 days of marijuana abstinence:

Compared with HC subjects, [11C]OMAR volume of distribution was 15% lower in CD subjects (effect size Cohen’s d of 1.11) at baseline in almost all brain regions. However, these group differences in CB1R availability were no longer evident after just 2 days of monitored abstinence from cannabis. [my emphasis]

 Of course, the p-values would all slide downwards with a bigger sample, so if anyone does larger studies of this, they will predictably “find” that it takes longer than 2 days to lose significance.
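(This isn’t a subtle effect.  For a fixed observed effect size, the two-sample t statistic grows like the square root of the sample size, so the p-value falls even though nothing about the effect has changed.  A generic illustration with a made-up effect size, not the paper’s numbers:)

```python
# Fixed effect size, growing sample: the p-value falls mechanically.
import numpy as np
from scipy import stats

d = 0.6  # made-up Cohen's d (group difference in SD units), held fixed

for n in [10, 20, 50, 100, 200]:          # per-group sample size
    t = d * np.sqrt(n / 2.0)              # t = d * sqrt(n/2) for two equal groups
    p = 2 * stats.t.sf(t, df=2 * n - 2)   # two-sided p-value
    print(f"n={n:3d} per group   t={t:.2f}   p={p:.4f}")
```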

Right next to that figure is another one, with a more appropriate vertical axis, which shows what the authors mean by a “no longer evident” difference: 

[figure from the paper: the same comparison of users and non-users on a wider vertical axis]

That’s right: even after 28 days of abstinence, they’d closed less than half the gap between them and non-users.  But since the sample size is small, they could only get p=0.27 for this clearly-there difference.  (Remember, the bars here are SEMs, so the SDs will be a lot bigger.)
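(If you want to do the conversion yourself: SD = SEM × √n, so with n = 11 the individual spread is roughly 3.3 times the plotted bars.)

```python
# SEM-to-SD conversion for the plotted error bars (SD = SEM * sqrt(n)).
import math

n = 11
sem_bar = 1.0            # length of a plotted SEM bar, in the figure's units
print(f"SD ≈ {sem_bar * math.sqrt(n):.2f} in the same units (factor {math.sqrt(n):.2f})")
```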

This has been a bit of a digression, since I’m not sure these mistakes about null results have much to do with “respecting distributions.”  But I do think I can justifiably use this as another example of “this is your brain on summary statistics.”  Some of these mistakes are probably due to the emphasis on “significant = important” that is ingrained by publication criteria, but it also evinces a willingness to discard a lot of the information in your data.

To provide a Gallant to pair with the Goofus above, here’s yet another weed paper with a small sample.  They provide a lot of fine-grained, bimodality-tolerant, case-study-like detail:

The general trend was a decrease in blood pressure within the first 43 min after onset of smoking, but an initial increase in blood pressure was observed among some participants. Concerning individual mean arterial blood pressure, the largest decreases in mean arterial blood pressure were observed with the high THC dose with drops up to 41% below baseline (from 121 to 71 mmHg). Subjects 2, 23 and 12 showed the greatest decreases in mean arterial blood pressure whilst their THC serum concentration was 34, 213, and 137 μl/L, respectively, at 43, 17 and 7 min after onset of smoking. Subjects 10, 22 and 19 showed limited initial increase in mean arterial blood pressure (up to 37% above baseline, from 87 to 119 mmHg). Mean arterial blood pressure was still below baseline levels 8 h post-smoking for a majority of the participants.

And they even make plots where they just throw together every single participant’s time course:

[figure from the paper: every participant’s blood-pressure time course plotted together]

Admittedly these look ugly, and I’m sure there are much nicer ways of presenting the same information.  Still: this is the “ask some stoners” of graphs, and I mean that as high praise.  These graphs can answer many questions you might want to ask, even if the researchers didn’t ask them: what different types of trajectories are possible, the range at any time, where the distribution is peaked and how far away the unusually high/low trajectories are, etc.  Admittedly, you could be asking these questions more rigorously than by eyeballing a figure – but the authors probably aren’t going to answer every such question rigorously, so these pictures (like histograms) provide an indispensable supplement.
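(For what it’s worth, this kind of “everyone’s trajectory at once” plot is only a few lines of matplotlib.  The data array below is a hypothetical stand-in, not the paper’s:)

```python
# A "spaghetti plot": one faint line per participant plus the group mean.
# `times` and `bp` are hypothetical stand-ins for the paper's data
# (bp has shape [n_participants, n_timepoints]).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
times = np.linspace(0, 480, 25)                      # minutes after smoking
bp = 90 + rng.normal(0, 8, (12, 1)) + rng.normal(0, 5, (12, times.size)).cumsum(axis=1) * 0.3

fig, ax = plt.subplots()
for row in bp:
    ax.plot(times, row, color="gray", alpha=0.4, linewidth=1)        # individuals
ax.plot(times, bp.mean(axis=0), color="black", linewidth=2, label="group mean")
ax.set_xlabel("minutes after onset of smoking")
ax.set_ylabel("mean arterial blood pressure (mmHg)")
ax.legend()
plt.show()
```

Fainter individual lines plus a heavier mean line keeps the “ask some stoners” information without getting much uglier than a mean-only plot.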


2.

Speaking of public health issues, I think I also see the downstream effects of these bad habits on the doctors who consume medical research.  (I would imagine there are similar effects on people who act upon social science research.)

I complained a while back about how I received different responses from different psychiatrists (and my GP) about benzodiazepines.  It seemed like each doctor had a single opinion about benzos, and didn’t adapt it much to the patient.  The “benzos are bad” doctors would be unmoved when I mentioned I’d been on the same dose for years, even though “people have to keep taking higher and higher doses” is one of the reasons the “benzos are bad” idea exists.  I think there was an element of “cover your ass” here, but it felt like the usual presumption of expertise in doctor-patient relationships was breaking down, as each doctor would refer to a supposed “standard opinion” which happened to concur with their own, clearly non-universal opinion.

Benzos are a case where lack of unimodality is important.  As I mentioned, one reason why doctors are wary of benzos is tolerance, specifically the need to ramp up the dose more and more over time.  It is true that some patients exhibit this pattern when prescribed benzos.  But then, there are those (like me) who don’t. From this article “reappraising” benzos [abbreviated “BDZs” below]:

Although there are occasional reports of patients with anxiety disorders who increase the dose of BDZs to continue experiencing the initial anti-anxiety effect or who experience a loss of therapeutic benefit with the continuing treatment with BDZs, a body of evidence shows that the vast majority of patients with anxiety disorders do not have a tendency to increase the dose during long-term treatment with BDZs [30,69–72]. Therefore, tolerance to anxiolytic effects of BDZs usually does not occur in the course of long-term treatment. When patients increase the dose of BDZs, this usually appears in the context of other substance misuse.

This suggests bimodality, or a unimodal distribution not very well represented by its peak.  There is a population subtype that requests increasing doses, and people outside that subtype generally do not.

Likewise, a common concern is the withdrawal syndrome (or equivalently “dependence,” which is “characterized by the symptoms of withdrawal upon abrupt discontinuation and no tolerance”).  But again, this is not universal, and this may be another issue of “population subtypes”:

Withdrawal symptoms occurring after an abrupt cessation of long-term BDZ use are not inevitable; such problems were reported in approximately 40% of individuals taking BDZs regularly [80,81] and they were more likely in people with personality disorders, especially those with passive-dependent personality traits [82,83] (ibid.)

(I myself have abruptly stopped taking BDZs and then not taken them for periods of several weeks, and I’ve never experienced withdrawal symptoms.  I am the 60%.)

We have distinct desired and undesired patterns, so it would seem that the clinician’s task is to think about whether their patient is displaying (or likely to display) the undesired pattern, and act accordingly.  Instead, what we have gotten is a one-size-fits-all concept which says that, “overall,” BDZs can cause worrying tolerance and dependence issues, and so one should treat them warily as a last resort. This means that even when evidence about my own personal BDZ use over 3+ years is available, doctors prefer to consult assessments of BDZs and SSRIs “overall,” throwing away the distribution in favor of a single number.

(I don’t want to play up my own frustrations about this, which are very minor as frustrating medical care goes; I’m using myself as an example solely because it’s a case I’ve read a bit about.) 

(Some of the oddness I am trying to explain is no doubt the result of the pharma industry pushing heavily for the SSRIs, which are newer than BZDs.  For instance, it’s gradually been realized (by the profession – more quickly by patients, one would assume) that SSRIs can have a bitchin’ withdrawal syndrome themselves.)


3.

This problem also appears in many conversations that are not, on the surface, about statistics.  A familiar example is conversations about attractiveness.  A lot of people talk as though there’s just one scale of attractiveness, which would only be true if the assessors of attractiveness had unimodal (and strongly peaked) preferences.  In concrete terms: if you think that the way for any straight man to become more attractive is to improve him on a single “what women want” metric, you are assuming that straight female preferences are (if not all literally identical) strongly peaked around a single mode, so that boiling the distribution down to a single number is a reasonable approximation.

Everything in my relevant experiences suggests this is false.  There are people who like all sorts of things, and there are things far from the mean/median that are nonetheless interesting to many (perhaps there is another mode at these points).  This does not mean that scales of attractiveness do not exist, but that there are multiple such scales worth considering (at least one per mode), and that climbing the nearest scale is probably better than climbing the mean/median scale.
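(A toy version of the claim, with entirely made-up preference curves: if each rater’s liking for a trait value is a bump around their own preferred value, and the preferred values are bimodal, then the population-average preference sits in a valley that almost nobody actually occupies.)

```python
# Toy model: each rater's rating of a trait value x is a bump centered on that
# rater's own preferred value.  With bimodal preferences, the average rating
# has two peaks, and the mean preferred value (x = 0) sits in the valley.
import numpy as np

rng = np.random.default_rng(3)
preferred = np.concatenate([rng.normal(-1.0, 0.2, 500), rng.normal(1.0, 0.2, 500)])

def avg_rating(x, width=0.4):
    """Average over raters of a Gaussian 'how much I like x' bump."""
    return np.exp(-((x - preferred) ** 2) / (2 * width**2)).mean()

for x in [-1.0, 0.0, 1.0]:
    print(f"trait value {x:+.1f}: average rating {avg_rating(x):.3f}")
# The mean preferred value is ~0, but x = 0 scores far worse than either mode.
```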

There is something analogous in political attitudes that assume sociological groups are homogeneous blocks.  This post is already too long, though.

oligopsoneia-deactivated2018051 asked: Of the resources list you reblogged, I suspect that you might get the most out of Shaikh's "Capitalism," though presumably as a longer read - there are higher technical barriers to entry compared to many resources on the list but fewer "cultural" ones, if that makes sense.

nostalgebraist:

nostalgebraist:

Damn this looks really good, thanks

(For those reading this post who don’t want to go back to the earlier one, here’s the PDF)

Hildebrand (1994) suggests that one should leave “preferences and choices [to] … psychiatrists” and focus instead on establishing the statistical conditions under which basic economic patterns such as downward sloping market demand curves can be derived (Dosi, Fagiolo, Aversi, Meacci, and Olivetti 1999, 141). Hildebrand (1994) and Trockel (1984) provide the pioneering work in this regard.

In each of these cases, economic shaping structures create limits and gradients that channel aggregate outcomes: the positive profit survival criterion in the case of the firm, individual economic characteristics in the case of income distribution, and the budget constraint in the case of individual consumer choice. Each of these gives rise to stable aggregate patterns which do not depend on the details of the underlying processes. And precisely because many roads can lead to any particular result, we cannot be content with considering a model valid simply because it yields some observed empirical pattern. Other facets of the model may yield conclusions which are empirically falsifiable, for which the model must also be held responsible. […]

In what follows, I will demonstrate that the major empirical patterns of consumer behavior can be derived from two key shaping structures: a given level of income, which restricts the choices that can be made; and a minimum level of consumption for necessary goods which introduces a crucial nonlinearity. The patterns in question are downward sloping market demand curves, income elasticities of less than one for necessary goods and more than one for luxury goods (Engel’s Law), and aggregate consumption functions that are linear in real income in the short run and include wealth effects in the long run (Keynesian type consumption functions). The analytical derivations will be supplemented by the simulation of four radically different models of individual behavior: (1) a standard neoclassical model of identical hyper-rational consumers in which a representative agent obtains; (2) a model of heterogeneous hyper-rational consumers in which a representative agent does not obtain; (3) a model with diverse consumers in which each one acts whimsically by choosing randomly within the choices afforded by his or her income (this is Becker’s irrational consumer); and (4) a model inspired by Dosi et al. (1999) in which consumers learn from those around them (their social neighborhood) and also develop new preferences (mutate) over time. Despite their differences, all of the models give rise to the very same aggregate patterns. The essential point is that the same macroscopic patterns can obtain from a great variety of individual behaviors. This way of proceeding harks back to an earlier approach initiated, and subsequently abandoned, by Becker (1962).

OK, this is already giving me cartoon heart-eyes, since this is the kind of approach that seems obviously necessary and important to me, but which I had given up hope of finding in actually-existing econ

(Or, more accurately, you can find little sketches of things like this in the work of various “heterodox” economists, but the danger of heterodoxy is that you spend your career poking thorns in the side of orthodoxy rather than getting anywhere yourself; this guy seems to have a big, substantial positive theory, just one that starts out in an unusually promising way)

OTOH, it is frustrating that he uses the word “turbulent” a lot and I can’t find anywhere where he defines it (if he does at all, it is only after having used it many times without definition).  In a less mathematical book, I could accept being asked to reconstruct the sense of a term like this, but this is the sort of book where you’d expect such a term to have a precise definition, and/or a definition close to the established technical one

Part of it is that I have a background in fluid mechanics, so I know all sorts of things that “turbulence” could connote, and have trouble returning to a “colloquial” definition if that’s what’s being used here
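Coming back to the quoted passage for a second: model (3), the “irrational” consumer who picks randomly within what their income allows, is simple enough to sketch in a few lines.  This is my own toy version, not Shaikh’s code, and the income distribution is invented; the point is just that a downward-sloping aggregate demand curve falls out of the budget constraint alone.

```python
# Becker-style "irrational" consumers: each spends a uniformly random share of
# a fixed income on good 1.  Individual behavior is pure noise, but aggregate
# demand for good 1 still slopes downward in its price.
import numpy as np

rng = np.random.default_rng(4)
incomes = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)   # made-up income distribution

def average_demand_for_good_1(p1):
    shares = rng.uniform(0.0, 1.0, size=incomes.size)   # random budget share spent on good 1
    quantities = shares * incomes / p1                   # quantity each consumer ends up buying
    return quantities.mean()

for p1 in [0.5, 1.0, 2.0, 4.0]:
    print(f"price of good 1 = {p1:4.1f}   average demand = {average_demand_for_good_1(p1):8.1f}")
```

Of course this only reproduces the qualitative “downward-sloping” pattern (the curve here is just proportional to 1/price); per the quote, the minimum-consumption nonlinearity is what gets you Engel’s Law and the rest.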

proofsaretalk:

shadowpeoplearejerks:

proofsaretalk:

the-irrationals:

overheard conversation between grad students

“…. this is the superpotential cone …. so take your geometric crystal and tropicalize it ….”

confirmed, math is fake

@valiantorange

https://arxiv.org/pdf/1606.06883.pdf


I found it :D

the only thing that i hate more than that you found an entire paper based on a dozen words of a conversation is that i actually understand the majority of the abstract

(via proofsaretalk)

the-moti:

nostalgebraist:

togglesbloggle replied to your post “It’s a bit unsettling to me how much of mathematics is grounded in the…”

Depending on how far you want to stretch ‘spatial’, you could include integers here as well- certainly we first got interested in numbers because they can be used for quantifying physical objects. So I’m not even sure how much number theory and algebra are in separate bins.

Huh, maybe.  When I think about why the definition of a ring (like the integers) includes multiplication but not exponentiation, the first argument that comes to my mind is spatial (grouping things in squares and cubes).

But do you get anything interesting out of the other “hyperoperations” (exponentiation, tetration, etc.), the way you get primes and stuff out of multiplication?  Everything I can find about discrete logs/roots (analogue of factorization for ^ rather than *) is in modular arithmetic.  I’ve never thought about this but I have the feeling there’s some trick that makes all these reduce to factorization, or something.  In which case, just having + and * makes sense.

The reason I am unsettled by these things is that sometimes I hear math characterized as “the study of abstract structures” or something like that, and I always wonder about that – if there are different types of “abstract structure,” do we know about all of them?  Are we grouping them in a natural way?

I try to sit down and think of “mathematical structures,” imagining that I am about to tell someone all about this exciting “study of abstract structures,” and I’m like, “well, there’s my old friend Squishy Space, and of course there’s Unsquishy Space, there’s Space That Holds Stuff, there’s Shapes You Had to Draw on Graph Paper in School, there’s Especially Spacey Space and its Special Hills, um,”

No, mathematics is emphatically not set up to study all abstract structures. Mathematical progress is incremental and we primarily study structures that have interesting relationships to structures we have already studied. We pretty much only study structures where our tools and thinking styles can say something powerful about them, and of course we have a chance to just miss them.

Furthermore, a lot of mathematicians want to actually study things that have relevance to concrete questions about the real world, or at least something closely related to the real world. A lot of the 1) problems about the real world that 2) are hard even when abstracted but 3) have solutions involve physical space.

But because mathematics is closely connected, we can give alternate justifications for these things. I think quite a lot of people who study Especially Spacey space, and almost all people who study its Special Hills, are motivated by trying to understand the actual real world we live in, with all its spacey spaciness.

On the other hand, it’s known that the study of Shapes You Had to Draw on Graph Paper in School is equivalent to a field of algebra (commutative ring theory) by a one-to-one equivalence. Why do people often study it in terms of spaces and not in terms of rings? One reason is because visualizing the space makes your thinking more aesthetically pleasing. Another is that it guides your intuition about how to solve problems. A third is it makes it easier to think up problems that aren’t too easy but aren’t too hard.

A lot of the modern theory of Squishy Spaces has gotten extremely abstract, and is now more like the study of a certain kind of category theory that can be applied to the original concept of Squishy Spaces but also to fields like algebra. Of course this is not how it started - but if I recall correctly it started with very concrete problems and people noticed that a particular kind of spatial structure was relevant to these problems.

Maybe part of it has to do with what we mean by structure? In its usage in the regular world, when people talk about the structure of something, they very often mean its arrangement in physical space. In mathematics we’re not so different.

But let me return to algebra and explain why exponentiation is not so important in modern algebra. I think there are several reasons it is not as nice as addition and multiplication. For instance, from your perspective on primes, observe that most integers are the product of two smaller integers, but almost all integers are not one integer raised to the power of another. So would everything but squares, cubes, etc. be primes? That seems a bit silly. Instead, when we want to study powers in number theory, we often do it using the usual prime factorizations - i.e. n is a square if all the exponents of the prime factorization of n are even. Or we use analytic tools.
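(To make that concrete, and to cash out the “trick that makes these reduce to factorization” guess above: n is a perfect k-th power exactly when every exponent in its prime factorization is divisible by k, so the largest such k is just the gcd of the exponents.  A quick sketch, leaning on sympy for the factoring:)

```python
# n is a perfect k-th power iff every exponent in its prime factorization is a
# multiple of k; so the largest such k is the gcd of the exponents.
from math import gcd
from functools import reduce
from sympy import factorint

def largest_power(n):
    """Return the largest k such that n is a perfect k-th power (k = 1 if none)."""
    exponents = factorint(n).values()        # prime -> exponent map
    return reduce(gcd, exponents)

for n in [12, 64, 216, 1000000, 2 ** 10 * 3 ** 15]:
    print(n, "->", largest_power(n))
# 12 -> 1, 64 -> 6, 216 -> 3, 1000000 -> 6, 2^10 * 3^15 -> gcd(10, 15) = 5
```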

Especially from an algebraic geometry perspective, the “point” of primes is that they allow us to quotient our ring and obtain a field. So a prime is really a special kind of ideal, and an ideal is an equivalence relation that agrees nicely with addition and multiplication. But there aren’t really any equivalence relations that agree nicely with addition, multiplication, and exponentiation. In the simplest case, if I mod out my numbers by p, I have to mod out my exponents by p-1, so I get two different kinds of numbers. I think the study of exponentiation in the ring Z/p is almost always better served by studying the ring theory of Z/p and Z/(p-1) separately.
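(Concretely: by Fermat’s little theorem, a^(p-1) ≡ 1 mod p whenever p doesn’t divide a, so bases live mod p while exponents only matter mod p-1.  A two-line check:)

```python
# By Fermat's little theorem, a^(p-1) ≡ 1 (mod p) when p doesn't divide a,
# so exponents only matter modulo p-1 while bases matter modulo p.
p = 101  # any prime
for a in [2, 3, 57]:
    for e in [5, 5 + (p - 1), 5 + 7 * (p - 1)]:
        assert pow(a, e, p) == pow(a, 5, p)
print("exponents that agree mod p-1 give the same residue mod p")
```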

So my closing statement: Mathematicians are not interested in abstract structures, but primarily in those that 1) are juicy, in that there are lots of nontrivial but tractable problems about them, and ideally hierarchies of such problems and relationships between them with long chains of simple steps coming to a stunning conclusion, 2) have relevance to solving problems about other mathematical structures, and ideally the physical world, and 3) humans can comprehend, and ideally can comprehend some aspects of without advanced training. There’s no way we’ve explored all of such structures but the only way to find more is to look, as some mathematicians are doing.

Kudos for the interesting response + double kudos for using my silly terminology :)

(Thanks also to the other people who replied to this post)

(via the-moti)

It’s a bit unsettling to me how much of mathematics is grounded in the description of something like physical space.  A lot of large areas of math (topology, geometry, various kinds of analysis) start out with one particular property of physical space (or of an intuitive idea of what physical space is), separate that property from all the others, and then generalize it.  The generalizations can get pretty far from the starting point, but even then it’s a strange way of classifying different abstractions: “which property of space did you abstract from?”

I guess one could ask “what else would we do?”, but there’s algebra and number theory.  So it’s like, we have the math that “starts with space” and then the other math, and each of those groups comprises closer to 50% of the math people think about than 90% or 10% (I have no idea how you would measure such a thing, or even define it, but I hope you see what I am getting at).  Which seems strange to me.

(Then again analysis is very useful in number theory, which I have always found spooky, but I’m too ignorant to know whether I should find it spooky.)

Really, I don’t know enough about the super-abstract parts of math subfields to talk about this sensibly, so I welcome input from the less ignorant + you should take this with a grain of salt.