
Edsger Dijkstra was a seminal computer scientist who had beliefs about computer programming that strike me as very strange and impractical

Roughly, “everything should be about treating programs as formal mathematical objects that can be proven to do what you want, and it is bad to think about them as things that might run on real hardware or even might execute over time – they are timeless static objects about which we write proofs”

Have fun spending years creating Formally Verified Photoshop – proven to always do exactly what it’s supposed to, given any sequence of inputs, including those you would never possibly give! – while Adobe spends those same years putting features you actually want in Actual Photoshop

Meanwhile Formally Verified Photoshop might be awfully slow but who cares, computer science isn’t engineering and isn’t about programs doing things over time (????)

The good old “when do you dismiss an expert’s weird belief and when do you keep trying to figure out why it might make sense after all” dilemma

hot-gay-rationalist:

nostalgebraist:

scientiststhesis reblogged your post and added:

Dunno. I think in the mathematical bits I gave…

A few quick notes:

First: I think we’re both talking about the same thing (i.e. when you say “Is that what I did?” the answer is “yes”).

Unless I am missing something here (and I could be), I think there is no difference between the following:

“calculat[ing] the expected utility for all possible distributions and then averag[ing] them according to your prior over these distributions”

and

“calculat[ing] some ‘weighted distribution’ based on that prior and us[ing] that distribution to calculate your expected utility”

These just correspond to two different orders of doing the integral.  The first one corresponds to  ∫ (∫x^4 p(x|H) dx) p(H) dH, and the second one corresponds to ∫ x^4 (∫ p(x|H) p(H) dH) dx.  These should be equal.

The only reason we might not be able to exchange the order of integration would be if the limits of one of the integrals depended on the integration variable in the other.  That’s not true here: the x integral is over all x for any H (all the distributions are distributions over the reals), and the H integral is over all H for any x (for the same reason — no distribution “excludes” any x).

That is incorrect. There is another reason we might not be able to exchange the order of integration: the actual function we’re integrating changes. p(x|H) isn’t a function of x and H, it’s just the function H(x), so to speak. So you can’t in fact choose to do the integration on x first, because (∫x^4 p(x|H) dx) is completely undefined. p(x|H) is a weird mutant mathematical object, and it cannot be properly computed, so there is only one order which makes sense here.

TBH I don’t get this at all.  (I don’t know what you mean by “the function H(x).”  Doesn’t p(x|H) take in two numbers, x and H, and return a probability?  How is that not a function of x and H?)

Let’s look at a simple discrete example.  Say you only have two hypotheses, H1 (90% prior probability) and H2 (10% prior probability).  H1 says the distribution is p1(x), and H2 says it’s p2(x).  In this case p(x|H1) = p1(x) and p(x|H2) = p2(x).  (Do you agree?  That’s what those symbols mean, right?)

Your preferred calculation would give us “weighted distribution” p(x) = 0.9*p1(x) + 0.1*p2(x).  If we now want to calculate the 4th moment, we’d compute ∫ x^4 (0.9*p1(x) + 0.1*p2(x)) dx.

But that splits up into 0.9*(∫ x^4 p1(x) dx) + 0.1*(∫ x^4 p2(x) dx)

which in turn is 0.9*(∫ x^4 p(x|H1) dx) + 0.1*(∫ x^4 p(x|H2) dx)

which is my preferred calculation.

This was on a discrete space of hypotheses, but that shouldn’t change anything.  We could turn H into a real-valued variable and do the same thing (which would reduce to the above if p(x|H) were piecewise constant with two pieces).
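(Since the two orders should agree, here’s a quick numeric check of the discrete example in Python with numpy.  The two component distributions are my own arbitrary picks – Normal(0, 1) and Normal(0, 2) – chosen only because their 4th moments are known in closed form.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical components: H1 says x ~ Normal(0, 1), H2 says x ~ Normal(0, 2).
# The 4th moment of Normal(0, sigma) is 3 * sigma**4.
m4_h1 = 3 * 1.0**4   # ∫ x^4 p(x|H1) dx = 3
m4_h2 = 3 * 2.0**4   # ∫ x^4 p(x|H2) dx = 48

# Order 1: compute each hypothesis's moment, then average with the prior.
order1 = 0.9 * m4_h1 + 0.1 * m4_h2   # analytically 0.9*3 + 0.1*48 = 7.5

# Order 2: build the weighted (mixture) distribution first, then take its
# 4th moment -- here by Monte Carlo sampling from the mixture.
n = 2_000_000
pick = rng.random(n) < 0.9
x = np.where(pick, rng.normal(0, 1, n), rng.normal(0, 2, n))
order2 = np.mean(x**4)

print("order 1:", order1)   # analytically 7.5
print("order 2:", order2)   # agrees up to Monte Carlo noise
```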

(via hot-queer-rationalist-deactivat)

geometry of RPG world maps

differentialprincess:

if you’ve ever played an RPG, a generic RPG world map is something you’re probably familiar with. you play on a square, but if you go through the top you come out the bottom, and if you go through a side you come out another side. it’s easier to see with an image:

[image: square world map with color-coded edge identifications]

the world map is the white region inside the colored boundary square. if you go through the red, you come out the other red (with the same orientation), if you go through the blue, you come out the other blue (again, same orientation). (the four purple marks are all the same point).

this world map is a torus

if you don’t believe that, here’s an animation that folds the above square into a torus

now in a classic RPG like Chrono Trigger, you have two things: 1) the above world map, which is a donut

and 2) an image of the world as a sphere in cutscenes.

these two things are very incompatible! it’s hard to think of ways a donut and a ball are the same thing.
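(The wrap rule is tiny to state in code.  A sketch in Python, with a made-up 64 x 48 tile map; the sizes and names are mine, not from any actual game:)

```python
# Toy wraparound for a W x H RPG world map: stepping off any edge
# re-enters from the opposite edge with the same orientation.
# This identification is exactly what turns the square into a torus.
W, H = 64, 48  # hypothetical map dimensions

def step(x, y, dx, dy):
    """Move one tile, wrapping both axes (torus topology)."""
    return (x + dx) % W, (y + dy) % H

print(step(63, 10, 1, 0))   # (0, 10): through the right edge, out the left
print(step(5, 0, 0, -1))    # (5, 47): through the top, out the bottom
print(step(0, 0, -1, -1))   # (63, 47): the four corner marks are one point
```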

anyways I realized this late last night and I thought it was pretty hilarious

scientiststhesis reblogged your post and added:

Dunno. I think in the mathematical bits I gave…

A few quick notes:

First: I think we’re both talking about the same thing (i.e. when you say “Is that what I did?” the answer is “yes”).

Unless I am missing something here (and I could be), I think there is no difference between the following:

“calculat[ing] the expected utility for all possible distributions and then averag[ing] them according to your prior over these distributions”

and

“calculat[ing] some ‘weighted distribution’ based on that prior and us[ing] that distribution to calculate your expected utility”

These just correspond to two different orders of doing the integral.  The first one corresponds to  ∫ (∫x^4 p(x|H) dx) p(H) dH, and the second one corresponds to ∫ x^4 (∫ p(x|H) p(H) dH) dx.  These should be equal.

The only reason we might not be able to exchange the order of integration would be if the limits of one of the integrals depended on the integration variable in the other.  That’s not true here: the x integral is over all x for any H (all the distributions are distributions over the reals), and the H integral is over all H for any x (for the same reason – no distribution “excludes” any x).

Second: I think we get “swamped” as long as the prior assigns any positive measure to the set of hypotheses with infinite fourth moment.  This is easy to see if you do the integral as ∫ (∫x^4 p(x|H) dx) p(H) dH.  Sometimes the inner integral is +infinity, and if the region in H-space where this happens has positive measure according to p(H), we get +infinity for the whole integral.

The only reason this might not happen is if there were a corresponding region of H-space where the inner integral worked out to -infinity, so the two would cancel.  (That’s informal, physics-y reasoning, but it could presumably be put on a rigorous footing.)  That’s why I chose an even moment here: that can’t happen because it’s impossible for a distribution to have a negative fourth moment.  There’s an asymmetry here: given this utility function, there are no “infinitely dispreferred” distributions out there to cancel the influence of the “infinitely preferred” distributions with infinite 4th moment.
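(A sketch of the swamping in Python with numpy.  I’m picking Student’s t with 3 degrees of freedom as a stand-in infinite-4th-moment hypothesis, since it has a finite mean and variance but a divergent x^4 integral; the 0.99/0.01 prior is arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_fourth_moment(n):
    """Monte Carlo estimate of 0.99 * E[x^4 | H1] + 0.01 * E[x^4 | H2],
    where H1 = Normal(0, 1) (4th moment 3) and H2 = Student's t with
    3 degrees of freedom (4th moment infinite)."""
    m4_h1 = np.mean(rng.normal(0, 1, n) ** 4)
    m4_h2 = np.mean(rng.standard_t(3, n) ** 4)
    return 0.99 * m4_h1 + 0.01 * m4_h2

# The H1 term settles near 0.99 * 3, but the t(3) term has infinite
# expectation, so 0.01 * (that term) never converges as n grows:
# the 1% hypothesis ends up controlling the whole answer.
for n in [10**4, 10**5, 10**6]:
    print(n, sample_fourth_moment(n))
```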

Smooth infinitesimal analysis - Wikipedia, the free encyclopedia →

youzicha:

nostalgebraist:

Someone decided to base a version of nonstandard analysis on … denying the law of the excluded middle?

????

It’s not really related to nonstandard analysis, it’s yet another formalization of the “intuitive” notion of infinitesimals.

But yeah! I think this is one of the best arguments for why constructive logic is cool/maybe-useful.

I guess that’s what I meant by “a version of nonstandard analysis” – it’s kind of awkward that one of these infinitesimal things is specifically called “nonstandard analysis,” yet all of them are analysis that is not standard [insert excluded middle joke here]

Anyway, it’s definitely been fun to read about, so far

(via youzicha)

Smooth infinitesimal analysis - Wikipedia, the free encyclopedia →

Someone decided to base a version of nonstandard analysis on … denying the law of the excluded middle?

????

twocubes asked: idk, it seems you might be able to make it make sense by thinking of recall of pleasure as a linear functional on experiences, so a dirac delta pleasure experience simply corresponds to just, well, suddenly discontinuously feeling happier?

Maybe, but why would someone find that so desirable in particular?  If anything I’d think gradually becoming happier would be preferable to discontinuously becoming happier.

My guess is that the guy’s reasoning went like this (note: it is bad reasoning):

  1. “I want to experience as much pleasure as possible.”
  2. “The amount of pleasure I can experience at any one time must be unbounded.”  (I don’t know why he would think this)
  3. “So really I’d like to just be in a state of infinite pleasure, like pleasure(t) = infinity for all t, but I’ve taken some math or physics classes that gave me the sense that infinities like that are somehow not Math Kosher.”
  4. “Oh, but there was that one thing I learned about in class that lets an infinite function value be sort of Math Kosher!  The Dirac Delta!  Yeah, let’s say I want to experience that kind of pleasure.  pleasure(t) = delta(t-t_0).  That’s a real math thing, not like the proposal in step 3.”
  5. (Not included: any concept of what the integral of pleasure over time is supposed to mean, or any sense of what the “shorter and better” experiences that limit to the Delta could conceivably feel like, once they get like 0.0001 seconds long or something)
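(For concreteness, the standard construction behind the “Math Kosher” feeling in step 4: box functions that get narrower and taller.  A numpy sketch; the grid resolution is my choice:)

```python
import numpy as np

# A delta-like spike: a box of width w and height 1/w centered at 0.
# As w -> 0 the peak blows up, but the integral (the "total") stays 1.
t = np.linspace(-1.0, 1.0, 2_000_001)   # grid spacing 1e-6
dt = t[1] - t[0]
for w in [1.0, 0.1, 0.001]:
    spike = np.where(np.abs(t) < w / 2, 1.0 / w, 0.0)
    area = spike.sum() * dt             # crude Riemann sum for ∫ spike dt
    print(f"width={w}: peak height={1.0/w:g}, area={area:.3f}")
```

The point being that the delta is only Kosher as a thing *under an integral sign*; the pointwise value “infinity at t_0” never shows up on its own.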

Hate-reading Luboš Motl’s blog is a kind of mental junk food even by hate-reading standards; I always feel bad afterwards

(“The physicist John Baez is rumored to have said: ‘It’s not easy to ignore Luboš, but it’s ALWAYS worth the effort.’ ” [Source])

hot-gay-rationalist reblogged your post and added:

But the thing is, even if it turns out that the…

Let me try to clarify my point about MaxEnt.

First, forget about MaxEnt entirely, and consider a totally different situation, which I hope will strike you as properly Bayesian.  In this situation, I have a prior over a number of hypotheses about the nature of a process.  I am trying to compute an expected utility given this prior.  Good so far?

Okay, now suppose that the hypotheses are different PDFs that might describe the statistics of a single outcome from this process.  (If this kind of hypothesis space is somehow disallowed by Bayesianism, then that might explain the disagreement?)  Some of these PDFs have finite 4th moment and some have infinite 4th moment.  My utility function assigns utility U = x^4 to real-valued outcomes x from this process.  (Edit: if you’re worried about U being non-negative everywhere, just make it x^4 - c for c > 0.  The argument below involves positive utilities of vastly different sizes, which could always be shifted down to get one positive utility and one negative one.)

Okay, so clearly if I try to calculate my expected utility, it will be “infinity.”  After all, I assign some nonzero probability to the infinite-4th-moment distributions, and that’s all it takes for their infinity to swamp the whole calculation.  (E.g. 0.99*5 + 0.01*infinity = infinity.)  The very presence of infinite-4th-moment PDFs ends up determining the whole calculation, with the other hypotheses having no influence.

(Note we’re taking an expectation value with respect to the hypothesis PDF in each hypothesis case to get the utilities in each case, and then we’re taking another expectation value over the utilities in each hypothesis case with respect to the prior.  That seems to me to be the clearly correct way to do this.)

(You could object that the problem here is that U is unbounded.  But I could do a similar thing with a bounded utility.  Take U_b = x^4 for |x| < x_b, and U_b = x_b^4 for |x| >= x_b.  If the integral of x^4 diverges, we can make this as big as we want by making x_b sufficiently big, so we can always construct a U_b where the utility of the infinite-4th-moment PDFs is 10^500 times bigger than any of the other numbers in the problem, or whatever.)
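(A numeric sketch of the bounded-utility point in Python, assuming scipy.  I use Student’s t with 3 degrees of freedom as the hypothetical infinite-4th-moment distribution; E[U_b] under it is finite for every cap, but keeps growing as the cap x_b is raised:)

```python
import numpy as np
from scipy import integrate, stats

def expected_Ub(x_b):
    """E[min(x^4, x_b^4)] under Student's t(3), whose raw x^4 integral
    diverges.  Finite for every cap x_b, but unbounded in x_b."""
    dist = stats.t(df=3)
    # Region |x| < x_b, where U_b = x^4:
    inner, _ = integrate.quad(lambda x: x**4 * dist.pdf(x), 0, x_b)
    # Region |x| >= x_b, where U_b is pinned at x_b^4:
    capped = x_b**4 * dist.sf(x_b)
    return 2 * (inner + capped)  # symmetric, so double the x > 0 half

for x_b in [10, 100, 1000]:
    print(x_b, expected_Ub(x_b))  # grows without bound as x_b grows
```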

Note that, for a utility function like U or U_b (with appropriate x_b), the only fact that really matters is “some infinite-4th-moment PDF is in the support of my prior.”  If we compared this prior to another one, then the expected utility would be (nearly) the same if this fact were true, and very different if it weren’t.

For instance, if we shrunk the prior so that it only contains one “best guess” distribution, with probability 1, then if that “best guess” had infinite 4th moment, I’d still have expected utility infinity or (really big num) or whatever.  On the other hand, if my “best guess” had finite 4th moment, all of a sudden everything is totally different.

Okay, now back to MaxEnt.  You talk about how the Gaussian is a “best guess,” and also how it’s a sort of average, that takes into account various possibilities about the distribution but doesn’t privilege them.  However, doing expected U calculations with the Gaussian is just like doing them in the case just mentioned in the last sentence of the previous paragraph: I get finite (or non-astronomical) answers.  This doesn’t seem to correctly “average” the behavior of my utility over various unknown possibilities.  After all, distributions with very low 4th moment can’t “cancel out” distributions with infinite 4th moment here.  (Imagine including them in the prior in the earlier part of this post; they don’t stop the infinity from “swamping” everything.)

In this contrived example, my expected utility with a Gaussian is a poor approximation of my expected utility in a world where I think any infinite-4th-moment distribution might be the true distribution.  The EU with the Gaussian can’t be seen as some kind of “average” of my EUs with various possible other distributions; as we’ve seen, once we introduce the infinite-4th-moment ones, they swamp out everything else, and can’t be “averaged out” by including any kind of “opposite” of them.

So: should I do my expected utility calculations with the distribution MaxEnt gives me in a case like this?

Quick response to hot-gay-rationalist’s most recent post – I have to leave to get on a plane soon so I don’t have as much time as I’d like to think and write about this, but I want to note down my initial, possibly stupid responses to your points so any easily resolvable confusions can be resolved.

First, about the principle of indifference: I agree about the relabeling and the idea that the 4 outcomes shouldn’t be treated differently.  However, this is (sort of) why I don’t think it’s appropriate to represent my state of uncertainty as a probability distribution.  Assigning probability 0.25 to each outcome implies that the behavior of the lights is independent, but that’s something I feel like I, in a definite/positive sense, do not know.

I would not be “more surprised” by non-independent behavior than independent behavior, it’s just that all of the possible non-independent behaviors “cancel out,” as it were (because of the invariances), so that the probability distribution has independent behavior.  But that doesn’t mean that independence is part of my knowledge.

Perhaps using the correct P_a distributions would reflect this fact?  What I want to capture is that I don’t believe any more strongly that the behavior of the lights is independent than that it isn’t.  I really do know nothing about the machine (let’s bracket the issue of whether total uncertainty is realistic for now – not enough time on my end).

Second, about MaxEnt.  I don’t think I’m doing the Mind Projection Fallacy.  I’m not looking for “what the real PDF is.”  I’m supposing that what I know about a phenomenon is limited to a mean and variance, and I’m trying to come up with a mathematical object that describes what I know about it.  For a Bayesian, this is “prior construction.”  Even if I’m not Bayesian, I might want some object that represents what I know.

What I’m saying is that using a Gaussian prior doesn’t feel like “what I should do based on what I know” in that situation.  If the variable is x and I care about E[(x-mean(x))^4], and want to make predictions about it, the fact that the variable might (in the real world as opposed to my head) be distributed like a Student’s t is a thing that would actually enter my head (like, in a real situation, if I were thinking about a thing and all I knew was its mean and variance).  That this fact seems to be “missing” from the Gaussian prior suggests that the Gaussian prior is somehow not a good representation of my actual understanding of the situation.

Note that my complaint is not that the Gaussian might not match the real world (which would be Mind Projection), but that my internal state of knowledge involves information that the Gaussian doesn’t seem to capture (all stuff in my own mind).  I don’t really have “the fourth moment of my uncertainty” because “my uncertainty” ranges over possibilities like “maybe the fourth moment isn’t even finite” and I can’t see how the Gaussian includes that.
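(This mismatch is easy to see numerically.  A Python/numpy sketch: a Gaussian and a rescaled Student’s t(3), my stand-in for a heavy-tailed possibility, share the same mean and variance, but their 4th moments behave completely differently:)

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10**6

# Same mean (0) and same variance (1) -- all MaxEnt-with-mean-and-variance
# gets to "see" -- but very different tails.
gauss = rng.normal(0, 1, n)
t3 = rng.standard_t(3, n) / np.sqrt(3)  # Var(t_3) = 3, so this has variance 1

print(np.var(gauss), np.var(t3))  # both near 1, the t's estimate noisier
print(np.mean(gauss**4))          # near 3, the Gaussian's exact 4th moment
print(np.mean(t3**4))             # typically huge, and jumps around by seed
```

So a state of knowledge that’s really just “mean 0, variance 1” leaves the 4th moment anywhere from 3 to infinity, and the Gaussian quietly commits to the smallest-tailed answer.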