
hot-gay-rationalist:

nostalgebraist:

hot-gay-rationalist:

nostalgebraist:

hot-gay-rationalist:

nostalgebraist:

raginrayguns:

nostalgebraist:

“Where do Bayesians get their numbers from anyway,” installment (n+1)

[cut cut]

[snup]

[snop]

[snip]

[snrp]

[snep]

I think it’s because the uniform Ap distribution does not represent a state of “full uncertainty.” It in fact represents the statement “I think any probability is just as plausible as any other one,” and for most possible hypotheses, that is not a good description of most agents’ state of uncertainty about them! And once again we run into the problem of conflicting intuitions because I think that states of really total uncertainty are the ones that need to obey the rules of probability the most because they’re the easiest to fuck up.

And this problem is just isomorphic to the “finding priors” problem, which is the greatest weakness of the method. The Solomonoff solution, priors proportional to the negative exponential of the hypothesis’ complexity, is one that also appeals to me intuitively and mathematically - the basic argument that adding one bit makes twice as many hypotheses available and therefore should cut probability in half looks very good to me.
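To make the bit-counting argument concrete, here's a toy sketch (mine, with made-up hypothesis strings - not Solomonoff's actual construction, which needs prefix-free programs and a universal machine):

```python
# Toy complexity-weighted prior: weight each hypothesis by 2^(-description length).
# The hypothesis strings and their bit-lengths here are invented for illustration.
hypotheses = {
    "0": 1,       # hypothesis name: description length in bits
    "10": 2,
    "110": 3,
    "1100": 4,
}

weights = {h: 2.0 ** -k for h, k in hypotheses.items()}
Z = sum(weights.values())
prior = {h: w / Z for h, w in weights.items()}

# Adding one bit of complexity halves the weight, and hence the prior odds
# relative to any fixed hypothesis.
assert weights["110"] == 2 * weights["1100"]
```

The real construction needs a prefix-free code so the total mass converges, but the "one extra bit cuts the prior in half" behavior is the same.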

But there is no such thing as total uncertainty. Even in the ice cream case, you know something about the human population in the present and how it’s likely to change in the next 25 years, you know something about the prevalence of ice cream, etc - and even if you were completely ignorant, I hardly think you’d believe that “it’s as likely as not that that proposition is true,” or that literally any probability assignment is as justifiable as any other. The uniform Ap distribution is not special, nor does it represent full uncertainty, because there is no such thing as full uncertainty, and because different amounts of information are represented by different Ap distributions (another example Jaynes uses is the existence of life on Mars at some point in the past, for which he says his own Ap distribution would look something like the Haldane prior).

—EDIT to add:

But also, as you and raginrayguns pointed out, it’s in practice impossible to really use the same heuristic rule about assigning more “reasonable” plausibilities to things when the thing is murky enough, as humans. But… all of my argumentation isn’t supposed to apply to humans :P I’m not concerned with what we can actually accomplish in the physical universe, I’m concerned with what’s the optimal way of reasoning, what’s the golden standard, hyper ideal, the Platonic Form of reasoning. Whether we can implement that and what we can do when we can’t is a completely different topic. But if I can determine that Bayes is indeed the ideal perfect unachievable golden standard of reasoning, then I can move on to see what approximations I can make and when and how I should deviate from the strict uncomputable solution.

OK, first of all a response to the last point: this conversation started with me objecting to some statements by a human, which are still floating up there above all the [snip]s and [snop]s.

As I’ve been interpreting it, this conversation has been about human approximations and not ideal reasoning, and to the extent that you’re only talking about ideal reasoning, you aren’t addressing the original question, which was “is the behavior of Robin Hanson (who is a human, not a JaynesBot) justified here?  Why or why not?”

Second: ultimately, I think the conflict here comes down to this question:

“Is it right to ‘add’ information we don’t feel like we know in order to make our representation of our uncertainty obey the probability axioms?”

It seems like our intuitions are diametrically opposed here.  I actively think this extra information shouldn’t be added, while you think that rationality demands that we add such information, and refer to not doing so as “fucking up.”

I’m going to give two examples of what I mean by “adding information.”  Note that I’m using “information” in an informal sense, not in the Shannon  sense or anything.  And also that, as always, I’m an amateur in these things and I wouldn’t ever want to imply that I’m the first to have ever thought of these objections – merely that I don’t know what the expert responses to them look like (though they presumably exist).

A thing I was getting at with my “conjunction” points yesterday is that if you know a probability distribution, you know information about the dependence structure of the events in it.  However, sometimes I don’t feel like I know the dependence structure of a set of “murky” events.  Representing my uncertainty as a distribution requires me to choose one, and this feels wrong.

For instance, let’s look at your example of the machine with two lights.  In that example there was no problem because the dependence structure was given (we know P(blue|red) = 0 and vice versa).

But now suppose I have the same machine, just as mysterious, except now I know any combination of the lights would be possible.  The possibilities are now {neither, red only, blue only, both}.  We could split this into various events like “red” which would be the set {red only, both}.  (I’ll give the event {both} the name “red&blue.”)

Knowing nothing about the machine, I don’t know what the dependence structure is.  Maybe the two lights are independent like two flipped coins, and P(red&blue) = P(red)*P(blue).  Or maybe they have some kind of dependence: maybe only “red only” and “blue only” are possible, or maybe only “neither” and “both” are possible.

What should my A_p distributions be here?  They can’t be uniform for each outcome because there are 4 outcomes and the means have to sum to 1, not 2.  (They still should have support over the whole interval [0,1] because maybe the machine just does blue every time or w/e.)  There’s probably a MaxEnt answer here?

In any case, whatever answer I choose, it will imply a dependence structure.  If I have a probability distribution over the space {neither, red only, blue only, both} then I can compute things like P(red|blue).
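To make that concrete, here's a quick check (my own sketch) that the uniform distribution over the four outcomes already commits me to a particular dependence structure - in fact, to independence:

```python
# Uniform distribution over the four joint outcomes of the two-light machine.
p = {"neither": 0.25, "red only": 0.25, "blue only": 0.25, "both": 0.25}

p_red = p["red only"] + p["both"]        # P(red)  = 0.5
p_blue = p["blue only"] + p["both"]      # P(blue) = 0.5
p_red_and_blue = p["both"]               # P(red & blue) = 0.25

# The uniform choice silently implies the two lights are independent...
assert p_red_and_blue == p_red * p_blue
# ...and lets us compute conditionals we were never told anything about:
p_red_given_blue = p_red_and_blue / p_blue   # = 0.5
```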

But actually I feel totally uncertain about those things, which I was not informed of at the outset.  They are “extra information,” and the idea that an object involving this information is the “right” representation of my state of uncertainty seems strange to me.

Is there any one dependence structure here that “correctly” represents my total lack of knowledge about the dependence structure of the machine’s behavior?  It feels counter-intuitive that there would be, though maybe there is.  Maybe I’m missing something here?

Here’s a second example of “extra information.”

Famously, if all you’re given are a mean and a variance, the MaxEnt prob. dist. on R is a Gaussian.

In going from “mean mu, variance sigma^2” to “N(mu,sigma^2)” I become able to compute many new things.  For instance, I can now compute any moment of this distribution.  I could tell you its fourth moment, say (and it would be finite).

However, the information provided is also consistent with other distributions, such as the Student’s t with nu > 2 (technically, a non-standardized Student’s t).  But the Student’s t does not have a defined nth moment for n >= nu.  So, the information provided is consistent with a Student’s t with nu = 3, yet the fourth moment of that distribution is undefined (in the sense of being “infinite,” i.e. the integral diverges to +infinity).

So, suppose you come along and say, “Rob, there’s a distribution with mean mu and variance sigma^2.”  And I think, okay, there's some nonzero chance it’s a Student’s t.  After all, that’s a distribution that comes up in real life.

Now you ask “okay, Rob, what’s its fourth moment?”  And if I were a MaxEnt machine I’d happily spit out the fourth moment of N(mu,sigma^2).  But I, Rob, know that Student’s t is out there in the world, and that its fourth moment is infinite!  How do I incorporate this knowledge into a probability distribution?  If I have, say, some distribution over distributions in which the relevant Student’s t has a nonzero probability epsilon of being the right one, even if epsilon is tiny, the “expected value” of the fourth moment will still be infinite (infinity * epsilon = infinity), and I’ll spit out “infinity,” not “fourth moment of N(mu,sigma^2).”

So it seems like doing MaxEnt makes me forget that Student’s t is a possibility, in that it leads me to draw conclusions that seem unreasonable unless I think a Student’s t is actually impossible.  The information “mean mu, variance sigma^2” doesn’t just fail to tell me the fourth moment; it fails to tell me that it’s even finite.  The state of uncertainty I’m in, having received that information, seems very poorly represented by a Gaussian.
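A quick numerical illustration of that point (my own sketch, not from the discussion): the truncated fourth-moment integral settles down for a Gaussian, but for a Student’s t with nu = 3 the integrand x^4 * f(x) tends to a positive constant, so the integral just keeps growing with the truncation point:

```python
import math

def trunc_fourth_moment(pdf, T, n=200_000):
    # Midpoint-rule approximation of the integral of x^4 * pdf(x) over [-T, T].
    dx = 2 * T / n
    total = 0.0
    for i in range(n):
        x = -T + (i + 0.5) * dx
        total += x ** 4 * pdf(x) * dx
    return total

def norm_pdf(x):
    # Standard normal density.
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t3_pdf(x):
    # Student's t density with nu = 3 degrees of freedom.
    return 2 / (math.pi * math.sqrt(3)) * (1 + x * x / 3) ** -2

# Gaussian: the truncated integral converges to 3, the 4th moment of N(0,1).
gauss = [trunc_fourth_moment(norm_pdf, T) for T in (10, 100)]

# t(3): no convergence; the truncated integral grows roughly linearly in T.
student = [trunc_fourth_moment(t3_pdf, T) for T in (10, 100, 1000)]
```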

Again, I don’t doubt that people have thought about these issues and come up with answers to them; I just don’t know what the answers are.  And in sum, the disagreement here seems to come down to the question of whether states of uncertainty should or shouldn’t be “altered” to give me a prior that acts like a probability distribution.

(via hot-queer-rationalist-deactivat)

It seems like the natural way to follow up on the probabilism concerns I was mentioning yesterday would be to look into Dempster-Shafer theory, which seems to have been developed on the basis of similar concerns.

Of course, even reading the Wikipedia page on Dempster-Shafer theory reveals that it has its own problems, which other people have then tried to resolve using their own modifications (or otherwise).

This is all very complicated, but hopefully I’ll understand it someday.

hot-gay-rationalist:

nostalgebraist:

hot-gay-rationalist:

nostalgebraist:

raginrayguns:

nostalgebraist:

“Where do Bayesians get their numbers from anyway,” installment (n+1)

[cut cut]

[snup]

[snop]

[snip]

tl;dr my subjective feelings about very inferentially distant propositions don’t feel like subjective plausibilities, and I get the sense that this is true for most people.  The way Bayesians quote numbers seems strange to many people, not just because it is unfamiliar, but because it seems to conflate the “sure uncertainty” one has about a fair coin with the “unsure uncertainty” one has about inferentially distant events.

About the difference in feeling, that’s indeed true, and in fact Jaynes has a whole chapter about this subjective difference and I wrote a post about it.

But I’m fairly certain that I wouldn’t say 50% to a question like that, because… well, I don’t know, full uncertainty doesn’t feel like “it could go either way” to me? And that’s part of the thing in the link, probabilities aren’t “brute numbers,” they have distributions. I’m also uncertain about my uncertainty, and if someone asked me that, I’d probably say something like “40% with a very very very wide tail.”

(By the way, I did read the whole thing, I just wanted to save space.)

It seems like you’re one of “nature’s plausibilists” — someone whose mind just naturally assigns a subjective plausibility to every proposition.  Which is pretty cool, don’t get me wrong, but my hunch is that this is not a common trait.  And the ultimate justifications for plausibilism are intuition-based, as ultimate justifications in philosophy tend to be.

Hmmm… yeah maybe. Though I don’t feel like those justifications are really intuition based, they’re more like “what I’d want to reason like”? I don’t know, maybe I’m projecting, or I’m just happy to have found my qualia represented in Bayesianism, but the Cox Axioms look like what I would want to reason like, even if I didn’t in fact reason like that - and of course I don’t really reason like that, I’m a biased human, and my reasoning deviates predictably from perfect Bayes, but it still looks like I’d want to be Bayesian and that whenever I reason in a way that’s inconsistent with that I’d feel bad about it. I do feel bad about it.

(Last point: I think there’s an even more fundamental state of uncertainty one can have, which is being uncertain about whether a proposition even describes a state of the territory at all.  For instance, if you asked me whether I thought Max Tegmark’s “Mathematical Universe Hypothesis” was true, I would feel a fundamental uncertainty caused in part by the fact that I’m not even sure yet what it would mean for it to be “true,” or whether that’s a meaningful question.  That is, uncertainty about whether or not a proposition is vacuous is a second kind of uncertainty that I don’t think can be captured well with plausibilities.  I have no idea if the Mathematical Universe Hypothesis is “plausible”; I don’t even know if it can be true or false, and will have to do more thinking to resolve that question.)

Well, here I think we sort of shrug and just go with logic? In logic, a meaningless sentence is always false (has no model, describes no possible world), so I think you can give that some form of probability. Maybe.

Though that’s in fact an Open Problem in FAI, of the same order as “how to reason about hypercomputers?” So this is a part where I confess complete epistemic confusion and say that this is an unsolved problem and that it may well be the point where Bayesianism unravels completely and we find out that we need some other, more universal form of reasoning to be our unattainable golden standard.

But I don’t know that Bayesianism can’t solve that, and I don’t know that simply giving potentially-meaningless propositions a probability will give me headaches. I haven’t yet read this paper but maybe there’s some potential stuff there.

I understand that such an A_p distribution can be constructed, and I guess then what I’d say is “for most futurological and other highly speculative propositions, my A_p distribution is nearly uniform.”

You can then take the first moment of all these uniform distributions and get 0.5 out and play games with the 0.5s, but this will at best just be a way of reflecting that I have no clue about the answers to any of these questions.  Saying “I have a subjective plausibility and it corresponds to probability 0.5” seems misleading; I don’t have a subjective plausibility.

Anyway, if we try to represent this state with uniform A_p distributions, doesn’t that run into the conjunction problem I mentioned?  "P and Q" should be (in the general case) less likely than P or Q alone, but supposing any of these are sufficiently far from my experience, I simply feel in a state of complete uncertainty about them.  So if you first asked me about “P and Q” I would give you “0.5” or “A_p is uniform” or whatever, but I would also have said that if you had presented me with P or Q alone.  (Taken literally, this set of judgments would seem to imply that all the events I’m totally uncertain about are really the same event, i.e. imply each other with probability 1, which is something I certainly don’t believe!)
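Spelling out the arithmetic behind that parenthetical (my own sketch): if I answer 0.5 for P, for Q, and for “P and Q,” the product rule forces both conditionals to be 1:

```python
# Suppose "total uncertainty" is reported as 0.5 for P, for Q, and for "P and Q".
p_P = 0.5
p_Q = 0.5
p_P_and_Q = 0.5

# The product rule P(A and B) = P(A|B) * P(B) then gives:
p_Q_given_P = p_P_and_Q / p_P   # = 1.0
p_P_given_Q = p_P_and_Q / p_Q   # = 1.0

# i.e. these three judgments together treat P and Q as (almost surely)
# the same event - not at all what "no idea about either" was meant to say.
assert p_Q_given_P == 1.0 and p_P_given_Q == 1.0
```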

For the above reason the A_p formalism seems like an awkward way to express the state of really total uncertainty I feel about many things; it still assumes my states of total uncertainty can play by the rules of probability when they can’t.  I don’t feel like they should, either – that would imply more knowledge than I really have, viz. some sense of which events are relatively big in the probability space and which are relatively small (to resolve the conjunction issue).

I guess what I’m looking for here is the mental state “more research is needed” – like the mental state I was in when I first learned about the “P=NP?” question but before I had learned that most people thought P != NP, and knew that if I really wanted to have a subjective sense of how plausible P=NP was, I should look up what experts thought about it.  I don’t think this can be captured by A_p because of the conjunction problem I mentioned, though maybe I’ve gotten that wrong.

(via hot-queer-rationalist-deactivat)

distract me from my emotions, scott aaronson!

I like this post about “bullet-dodgers vs. bullet-swallowers.”

I feel like I am a bullet-dodger, but one who has an emotional prejudice against knee-jerk bullet-dodging.  I think that for the most part, simple and general arguments have not worked very well over the course of human history, and their few fantastic successes (e.g. in fundamental physics) have been very misleading about how all other cases should be treated.

On the other hand, I feel (emotionally) that the rarity of simple and general arguments that work makes them precious; I want them to exist, and feel that it is a miracle and a wonder when they do.  I get mad at people who simply say “oh, it can’t be that simple, life is never that simple” – I want them to tell me specifically why it is not that simple in this case.  Because sometimes, in very magical and uncommon moments, it really is that simple, and we should hold out hope that we may again be blessed.

I’m constantly amazed that Newtonian mechanics, the basic laws of motion, turned out to be as simple as they are.  Or (classical) electromagnetism, the whole of which looks like this:

[image: Maxwell’s equations]

You can fit it on a t-shirt (and people have).  It takes less space on the page than it probably did to describe your favorite fictional character’s hair.  The bullet was bitten and the biters were right.  It didn’t have to be like this; it really didn’t.  The basic laws of motion could instead have looked like biochemistry (very large full-size version here):

[image: a wall-sized chart of biochemical pathways]

But they didn’t.

Sometimes, very rarely, biting the bullet works.  Celebrate these moments.  Hold out hope for them.

Over a lifetime, cortex performs a vast number of different cognitive actions, mostly dependent on experience. Previously it has not been known how such capabilities can be reconciled, even in principle, with the known resource constraints on cortex, such as low connectivity and low average synaptic strength. Here we describe neural circuits and associated algorithms that respect the brain’s most basic resource constraints and support the execution of high numbers of cognitive actions when presented with natural inputs. Our circuits simultaneously support a suite of four basic kinds of task, each requiring some circuit modification: hierarchical memory formation, pairwise association, supervised memorization, and inductive learning of threshold functions. The capacity of our circuits is established by experiments in which sequences of several thousand such actions are simulated by computer and the circuits created tested for subsequent efficacy. Our underlying theory is apparently the only biologically plausible systems-level theory of learning and memory in cortex for which such a demonstration has been performed, and we argue that no general theory of information processing in the brain can be considered viable without such a demonstration.

This paper looks cool as fuck

(Leslie Valiant’s work tends to be cool as fuck, as a rule)

[0904.1556] John Baez & John Huerta - The Algebra of Grand Unified Theories →

thesummerofmark:

adjoint-triple:

The Standard Model of particle physics may seem complicated and arbitrary, but it has hidden patterns that are revealed by the relationship between three “grand unified theories”: theories that unify forces and particles by extending the Standard Model symmetry group U(1) x SU(2) x SU(3) to a larger group. These three theories are Georgi and Glashow’s SU(5) theory, Georgi’s theory based on the group Spin(10), and the Pati-Salam model based on the group SU(2) x SU(2) x SU(4). In this expository account for mathematicians, we explain only the portion of these theories that involves finite-dimensional group representations. This allows us to reduce the prerequisites to a bare minimum while still giving a taste of the profound puzzles that physicists are struggling to solve.

Where has this paper been all my life? I finally feel like I have a solid grasp of how the standard model is organized.

(via eka-mark)

I love doing math because when I’m wrong I’m just wrong.  There’s no uncertainty, no worries about whether math is just tricking me by casting a rhetorical spell over me or trying to make me feel guilty, no wondering if there’s a “nicer” version of math I should be spending my time with instead of this jerk.  It just tells me I’m wrong and I know it’s right.  It’s so simple.  It is impossible for math to be either merciful or manipulative.

I just spent five minutes being confused because I forgot log(x) < 0 if 0 < x < 1

This is just downright shameful

Cool plot from the paper linked in the previous post


nostalgebraist asked: If the present uniquely determines the future, then specifying the present implies a specific future. You can't specify the present and the future at the same time, because the specified future might not be the one determined by the specified present. On the other hand, you can put walls around something without any paradoxes. But specifying the present and the future is like "putting walls" around something, except in time rather than in space. So time is not just like a fourth space dimension.

ghostdunk:

blurds:

I think this is a more elaborate logical justification than the conclusion requires!   Just trying to conceive of something called “the present” in spatial terms is enough to give me the heebie-jeebies.

I just finished reading Julian Barbour’s “The End of Time” and it was a pretty great read and inspired me to go back and relearn some physics stuff. This is pretty typical of the stuff he addresses in that book.

My serious physicist friend says Barbour’s kind of philosophizing is as useless as string theory until you can show how it improves or better explains anything.

(yes im just now making my way through Neal Stephenson’s reading list from Anathem)

I haven’t read anything by Barbour — thanks for pointing out the connection.

The argument I made in the ask is an attempt to say non-mathematically something I learned when I took a Partial Differential Equations class: that in some cases you can look at an equation that came from physics and tell whether it was supposed to be about “space and time” or just “space,” by looking at whether it “lets you put walls around things” so to speak.  This is a property of the equation itself, as an equation, so you can do this even if no one tells you that it came from physics, or that some of the coordinates in it are “supposed to be” time rather than space (or any other quantity, like “number of trees” or something).

(The “space only” equations are called “elliptic PDEs,” and there are two kinds of space-and-time equations called “parabolic PDEs” and “hyperbolic PDEs”; roughly, the difference is that for parabolic PDEs time has a definite direction.  There are also “ultrahyperbolic PDEs” which can describe worlds with more than one time dimension — see e.g. here where the author talks about how conscious observing beings probably wouldn’t be able to exist in a world that didn’t have exactly 3 space dimensions and 1 time dimension)
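For second-order equations in two variables, that classification is completely mechanical - here’s a sketch (mine) using the standard discriminant test for A·u_xx + B·u_xy + C·u_yy + (lower-order terms) = 0:

```python
def classify(A, B, C):
    """Classify A*u_xx + B*u_xy + C*u_yy + (lower-order terms) = 0
    by the sign of the discriminant B^2 - 4AC."""
    disc = B * B - 4 * A * C
    if disc < 0:
        return "elliptic"      # "space only", e.g. Laplace's equation
    elif disc == 0:
        return "parabolic"     # time with a definite direction, e.g. heat
    else:
        return "hyperbolic"    # time, e.g. wave equation

# Laplace: u_xx + u_yy = 0                  -> elliptic
assert classify(1, 0, 1) == "elliptic"
# Heat: u_t = u_xx (no u_tt term at all)    -> parabolic
assert classify(1, 0, 0) == "parabolic"
# Wave: u_tt = u_xx, i.e. u_xx - u_tt = 0   -> hyperbolic
assert classify(1, 0, -1) == "hyperbolic"
```

This is the sense in which you can read off “space-like vs. time-like” from the equation itself, without being told what the coordinates mean.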

I guess what I find cool about these ideas is that they show that there’s a basic difference between space and time conceptually, one that is more basic than anything having to do with classical physics vs. relativity.