
lisp-case-is-why-it-failed:

nostalgebraist:

michaelblume:

lisp-case-is-why-it-failed:

michaelblume:

…am I just too Bayesian to understand why this is supposed to be weird?

Yes. Imagine if you believed in the frequentist or propensity definitions of probability. Almost all of these questions are nonsensical then (although you should be able to handle the aliens question with propensities).

Alternatively: most of these questions are about the macro structure of the universe. How do you answer such questions without some kind of universal prior?

Ok but then how do you have beliefs of any kind about reality, or ever bet?

My opinions about things like this sometimes change, and are sometimes vague/confused, and I’ve rambled a lot about it

But my current opinion is something like:

I don’t have a prior with support over every conceivable outcome (using “outcome” in the broadest sense, so that things like those in the screenshot apply).  I don’t think anyone actually does.

What we have is more like a mental function we can query that outputs “how likely does this feel to me?”  We can, if we wish, try to translate these feelings into numbers in [0,1].  But calling these numbers “probabilities” is inappropriate in most cases, because the mental function isn’t consulting some underlying distribution obeying the probability axioms, except in toy problems like rolling fair dice (where the function will, so to speak, call another function that actually does the math).

In particular, the mental function generally doesn’t even use a consistent outcome space, and e.g. if A and B are both things I “have no idea about” it will also tell me that I equally “have no idea about” the event A&B.  (It makes the conjunction fallacy because it has no picture of outcome space with regions that could be labeled “A,” “B” and “A&B.”)

How does this relate to betting?  Well, I’m wary of making bets by using credences which generally will not (except by coincidence) obey the probability axioms, for the usual reasons.  (I don’t exactly mean “Dutch books” because I think that issue is a bit different from how it’s usually presented, but I think that human biases make it easy to get tricked into bad bets and incoherent credences only make it easier.)

One could then object that surely I’d take some sufficiently skew bets on any given question.  Would I really turn down a bet that costs me $1 if MWI is false and pays me $1 billion if MWI is true?  And couldn’t you back out a “revealed probability” from this?   I talk about this issue here – the upshot is that while I might take such a bet, this has nothing to do with the specific concept I’m being asked about, but is simply an instance of my generic, default betting behavior in response to questions where my mental function outputs “oh god who even knows.”

Even then, I’d probably reject all such bets in real life.  Partially because I’d be suspicious about why the other side is willing to offer them, but more fundamentally, because I try to do things that are designed for actual probabilities – like expected value calculations – only when I feel like my credences come from some actual knowledge about the underlying outcome space.

That is, I won’t pay a Pascal’s Mugger, not because I “believe there is probability zero that the mugger has magical powers,” but because I don’t have any informed breakdown of how the world could be, such that some parts of it are labelled “these magical powers are possible” – specifically such a breakdown I would have been able to give you before I ever encountered the mugger.  I file “the mugger has magical powers” under “hey, anything’s possible,” rather than under “I have a number of theories of how the world might work, and under this subset of them, the mugger could have magical powers.”

To sum up, I make decisions in various ways, and use ways that approximate EV maximization when I think I’m in the sort of domain where I can construct something like a probability distribution on a well-defined outcome space.  I think it’d be actively irrational, or at least totally without rational justification, to use that sort of technique when I can’t do this even approximately.

This includes cases like “MWI is more or less correct” and “God exists.”  I do have opinions about these questions – my sense is that MWI would at least need substantial revision to be correct, and that God almost certainly doesn’t exist.  But I have nothing like a probability space associated with these questions.  (For instance, it’s conceivable that the problem of evil is correctly resolved by “it’s all God’s plan and was all a good idea for some reason beyond our current understanding,” but I don’t have a picture of all the ways in which this could be true nor any sense about “how likely” it is for “a typical universe” to be configured in any one of these ways; again, for me this falls under “hey, anything’s possible.”)  Thus, I can’t provide numbers that I could justly call “probabilities.”

ETA: I don’t personally frame this as “frequentist defn. vs. Bayesian defn. of probability,” but rather as a distinction between beliefs about how to do inference correctly in real life.  It’s not that I think “degrees of belief” can’t be probabilities by definition, but rather that treating my degrees of belief like probabilities in all cases would be bad practice.

Longish response:


(Responding to the most recent reply, the one under a cut)

I worry this will sound arrogant or hostile, but I’ve been reading/thinking/talking about these issues for a long time, so I don’t think I’m just making some basic misunderstanding about what Bayesians mean by certain terms.  Relatedly, most of the issues you raise are things I’ve talked about on tumblr before at some point – see my Bayes tag (which I realize is long and disorganized, I just don’t want to repeat myself).

A few points (again, there is more in the tag):

I understand that “degrees of belief as coherent probabilities” is an ideal for rational agents rather than a description of human psychology.  The practical question is then “what should I do, given that I have degrees of belief that don’t work like probabilities?”  For instance, should I still do expected utility calculations (pretending my degrees of belief are probabilities)?  In all cases, or only in some?

In some cases our “failures of coherence” are just due to simple mistakes that can actually be patched in practice, with stuff like “don’t neglect the base rate.”  In other cases it has the much deeper cause that we don’t know what the outcome space looks like, so we can’t put a distribution over it, even a flat one.  (One consequence is that it is basically impossible to deal with conjunctions sensibly in these cases – I made some posts with more detail about this a while back.)

Since we are so very far from being coherent rational agents, it’s not clear that behaving more like those agents in any single, particular way will be good rather than bad for us.  In optimization terms, the ideal is far enough away that it doesn’t tell us much about the local gradient, so to speak.  I think the use of the word “probability” in things like the OP picture comes from a belief that in fact we are sufficiently close to the ideal that “moving towards the ideal” approximates “moving along the gradient,” i.e. “these aren’t probabilities, but they’re sort of close to being probabilities and we rational folks are trying to make them even closer.”

Incidentally, I think the Dutch book argument for coherence has serious problems, although there are other arguments for the same conclusion.

(via just-evo-now)


raginrayguns:

lambdaphagy:

nostalgebraist:

nostalgebraist:

vaniver:

nostalgebraist:

So: what’s the deal with the Akaike information criterion vs. the Bayesian information criterion?  “Information theory” and “Bayesianism” are both things with a lot of very devoted adherents, and here they appear, superficially, to give different answers.

They correspond to different priors. AIC has a bit better underlying framework (from an information theory point of view) and I believe better empirical validation.

Ah, OK.  I found this paper through Wikipedia, about AIC as Bayesian with a different (better?) prior, which looks good.

BIC has the advantage that it will converge asymptotically to the true model if the true model lies in the set of models being fitted, although it’s disputable how important this is.  And BIC can be derived using a minimum description length approach (can you get AIC this way too?).
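For reference, the two formulas under discussion are AIC = 2k − 2 ln L̂ and BIC = k ln(n) − 2 ln L̂, where k is the number of fitted parameters and L̂ the maximized likelihood.  A toy sketch (the nested-Gaussian setup and seed are mine, not from the thread) showing BIC’s heavier penalty on the extra parameter once n > e²:

```python
import math, random

# Toy model selection: data from N(0, 1); compare a 1-parameter model
# (fit mu, sigma fixed at 1) against a 2-parameter model (fit mu and sigma).
random.seed(0)
n = 200
data = [random.gauss(0.0, 1.0) for _ in range(n)]

def gauss_loglik(xs, mu, sigma):
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu)**2 / (2 * sigma**2) for x in xs)

# Model 1 (k = 1): fit mu only.
mu_hat = sum(data) / n
ll1 = gauss_loglik(data, mu_hat, 1.0)

# Model 2 (k = 2): fit mu and sigma by maximum likelihood.
s_hat = math.sqrt(sum((x - mu_hat)**2 for x in data) / n)
ll2 = gauss_loglik(data, mu_hat, s_hat)

aic1, aic2 = 2 * 1 - 2 * ll1, 2 * 2 - 2 * ll2          # AIC = 2k - 2 ln L
bic1, bic2 = 1 * math.log(n) - 2 * ll1, 2 * math.log(n) - 2 * ll2  # BIC = k ln n - 2 ln L
```

The richer model always achieves at least as high a likelihood; the two criteria differ only in how much that extra parameter must buy (2 nats of log-likelihood for AIC, ln(n) for BIC).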

One of the things I am wary of here is the sense that “information theory is magic” – e.g. in the paper linked above:

Their celebrated result, called Kullback-Leibler information, is a fundamental quantity in the sciences […] Clearly, the best model loses the least information relative to other models in the set […]

Using AIC, the models are then easily ranked from best to worst based on the empirical data at hand. This is a simple, compelling concept, based on deep theoretical foundations (i.e., entropy, K-L information, and likelihood theory).

Maybe I just don’t understand information theory, but I’m confused why I should care that the K-L divergence is “deep” and “fundamental,” here.  The question at hand is how to select a model based on some sort of estimate of how the model will generalize from the training set.  In practice I hear people justify using things like AIC by saying “well, obviously, you want the most information,” where “most information” is just a verbal tag we’ve associated with the K-L divergence and I’m not sure what mathematical weight I should give to it.  If AIC does well, and this is because it is based on information theory, I would like to understand this in a nonverbal way – what property of K-L divergence made it a good choice here, ignoring suggestive words like “information”?

Reblogging because I’m really curious about this – I’ve been aware of information theory for a long time but I’ve never been sure how it justified choices like this, and I feel like I must be just missing something major / “obvious.”

@su3su2u1, @lambdaphagy, @raginrayguns, et al.?

Oops, didn’t have a chance to get to this earlier.  Others have already chimed in with sensible responses, but here’s another way to think about it non-verbally, especially if you want to ask “why KL divergence in the first place?” rather than “why AIC?”

KL divergence arises naturally when you ask the question “what does it mean for two distributions in a parametric family to be ‘close’ to one another?”  Take univariate Gaussians parametrized by mu and sigma, and consider each measure as a point in a 2-D parameter space.  Consider some plausible things we’d like to say about this space.  First, for any two measures (mu1, sigma1) and (mu2, sigma2), the distance between them should vary with the difference between mu1 and mu2: the further apart the means are, the “further apart” the distributions are.  But secondly, as sigma1 and sigma2 grow larger, the difference in the means should matter less.  As sigma goes to infinity, the value of the mean washes out and there is really only one Gaussian distribution left, with its density smeared out over the entire real line.
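Both intuitions can be checked against the standard closed-form KL divergence between univariate Gaussians (the formula is textbook, not from the thread):

```python
import math

# KL(N(mu1, s1^2) || N(mu2, s2^2)) in closed form:
def kl_gauss(mu1, s1, mu2, s2):
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

# The same mean gap of 1.0 matters less and less as the shared sigma grows:
narrow = kl_gauss(0.0, 1.0, 1.0, 1.0)      # sigma = 1   -> 0.5
wide = kl_gauss(0.0, 10.0, 1.0, 10.0)      # sigma = 10  -> 0.005
wider = kl_gauss(0.0, 100.0, 1.0, 100.0)   # sigma = 100 -> 5e-05
```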

If we think about what this means for the geometry of the parameter space, we realize that it’s not Euclidean.  In fact it’s hyperbolic: we’ve got a half-plane that collapses to a single point as sigma goes to infinity.  This motivates us to ask what the appropriate metric tensor is.  It turns out (and here you must imagine my hands waving hard enough to achieve lift-off) that if you take the Hessian of the KL divergence with respect to the parameters, you get the Fisher information matrix and that does the job quite nicely.  The KL divergence is then, roughly, measuring our surprisal about the samples coming off of our distribution of interest as we move through parameter space.
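The Hessian-of-KL claim can be checked numerically; a finite-difference sketch of my own, using the fact that for N(mu, sigma²) the Fisher information in (mu, sigma) coordinates is diag(1/sigma², 2/sigma²):

```python
import math

# Closed-form KL between univariate Gaussians (textbook formula):
def kl(mu1, s1, mu2, s2):
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

mu0, s0, h = 0.0, 2.0, 1e-4
f = lambda mu, s: kl(mu0, s0, mu, s)   # KL from the fixed base point theta0

# Second partial derivatives at theta0, by central differences:
d2_mu = (f(mu0 + h, s0) - 2 * f(mu0, s0) + f(mu0 - h, s0)) / h**2
d2_s = (f(mu0, s0 + h) - 2 * f(mu0, s0) + f(mu0, s0 - h)) / h**2
# Should recover the Fisher information: 1/sigma^2 and 2/sigma^2.
```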

(This is backwards from the usual presentation, and I’m not sure what you’d get if you went through this exercise with some other notion of distance between distributions, like L1, L2 or TV.  KL divergence has so many other useful properties that I would expect the Fisher-Rao metric to be canonical in some sense, but I don’t know which.)

Okay, I tried to think this through a bit with L2 distance, and I think I’m dropping several levels in HabitRPG as a consequence, I really need to be writing a fellowship application, anyway….

so here’s the formula I got for L2 distance between two normals

[image: formula for the L2 distance between two normals]

So, as for the properties you described.

  1. Increases with |mu1 - mu2|. Yes it does.
  2. Rate of increase with |mu1-mu2| is lower with higher sigmas. I set sigma1=sigma2 and plotted it, and yup, the plot is less steep when sigma1=sigma2 is higher.
  3. Is zero when the sigmas are infinity. Yup.

So…. I guess the same argument… applies? You lost me at hyperbolic geometry ‘cause idk what that is. But definitely L2 fits the picture you painted as well as KL.
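A closed-form sketch of the same check (my own reconstruction of the squared L2 distance, not necessarily the formula from the image above), using ∫ N(x; m1, s1) N(x; m2, s2) dx = N(m1 − m2; 0, s1² + s2²):

```python
import math

# Squared L2 distance between two normal densities, in closed form.
def l2_sq(m1, s1, m2, s2):
    def cross(gap, var):  # normal density with variance `var`, evaluated at `gap`
        return math.exp(-gap**2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return (1 / (2 * s1 * math.sqrt(math.pi))
            + 1 / (2 * s2 * math.sqrt(math.pi))
            - 2 * cross(m1 - m2, s1**2 + s2**2))

# The three properties from the list above:
grows = l2_sq(0, 1, 2, 1) > l2_sq(0, 1, 1, 1)        # 1: grows with |m1 - m2|
flatter = (l2_sq(0, 5, 1, 5) - l2_sq(0, 5, 0, 5)
           < l2_sq(0, 1, 1, 1) - l2_sq(0, 1, 0, 1))  # 2: mean gap matters less at high sigma
vanishes = l2_sq(0, 1e6, 1, 1e6) < 1e-6              # 3: vanishes as sigma -> infinity
```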

There’s a difference though, which is that KL distance between two normals is a convex function of |mu1-mu2|, right? The bigger the difference already is, the more increasing it counts? L2 distance on the other hand is not. So, like, if we set mu1 to 0, and consider positive mu2, then d/dmu2 KL is an increasing function of mu2. But d/dmu2 L2 looks like this:

[image: plot of d/dmu2 of the L2 distance]

so, what’s that all mean? idk.

This is all very interesting.  Another property you’d want is invariance under general changes of variables, which L2 doesn’t have, but K-L has (the scaling cancels in the fraction, and outside the fraction it gets cancelled by dx).

(via raginrayguns)

@reddragdiva​ linked (here) to a post about something called Perceptual Control Theory, and how it ostensibly conflicts with both (1) Bayesianism and (2) the “stimulus-response” view of behavior.

The post claims that the stimulus-response theory is refuted by tasks in which people respond to external stimuli in a way that continually corrects for outside disturbances, like a thermostat does.  One finds that the pattern of behavior (over time) is highly correlated with the disturbance, but has a very low correlation with the stimulus itself (composed, at any time, of the disturbance plus the person’s correction).

The post’s author quotes numbers that appear to be Pearson correlation coefficients, but then makes the startling jump to mutual information (which can measure nonlinear dependence as well):

So in a control task, the “stimulus” – the perception – is uncorrelated with the “response” – the behaviour. To put that in different terminology, the mutual information between them is close to zero. But the behaviour is highly correlated with something that the subject cannot perceive.

This claim seemed startling to me, and I got kind of nerd-sniped by it.  (I mean, my air conditioner is a control system like this, and presumably there’s some dependence between its responses and the current temperature?? It shuts off when the temperature gets low enough!)  And I concluded that the statement above didn’t make sense.
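As a side point, the quoted slide from correlation to mutual information isn’t innocent even in general: zero Pearson correlation does not imply near-zero mutual information.  A toy sketch of my own, with X uniform on {−1, 0, 1} and Y = X²:

```python
import math

# X and Y are perfectly dependent (Y is a function of X) yet uncorrelated.
xs = [-1, 0, 1]
pairs = [(x, x * x) for x in xs]        # joint support; each point has p = 1/3

# Pearson correlation is zero: E[X] = 0 and E[XY] = E[X^3] = 0, so cov(X, Y) = 0.
exy = sum(x * y for x, y in pairs) / 3

# Mutual information is positive: knowing X pins down Y, so I(X;Y) = H(Y).
px = {x: 1 / 3 for x in xs}
py = {0: 1 / 3, 1: 2 / 3}
pxy = {pair: 1 / 3 for pair in pairs}
mi = sum(p * math.log(p / (px[x] * py[y])) for (x, y), p in pxy.items())
# mi = ln(3) - (2/3) ln(2) ~ 0.637 nats, despite the zero correlation.
```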

The post includes a link to a java demo where you can do such a task yourself.  A line moves around on the screen and you try to move your mouse to keep it fixed at a reference point.  At the end, you get a plot like this

image

The red trace, C, is where the line was relative to the reference point (my goal was to keep it at zero).  The blue trace, D, is the imposed offset I was trying to correct for, and the green trace, M, is the position of my mouse.  (The black trace is where M would be if I’d done perfectly.)

Correlations (I assume Pearson – nothing about mutual information on the page, anyway) are listed along the top.  The correlation between M and D is nearly -1, indicating I was doing a good job counteracting the disturbance.  OTOH, the correlation between C and M is only 0.198, which the demo page says is surprising:

When you are able to control the distance between cursor and target, keeping that controlled variable equal to zero, you will see that the cursor-mouse (C-M) correlation is rather low (usually between -.2 and .2). This is surprising if you think of cursor movements as the stimulus for the mouse movements (the response). All you can see in this task is cursor movement, which is at all times a combined result of disturbance and mouse movements. Nevertheless, mouse movements are strongly (negatively) correlated with the invisible disturbance rather than with the visible cursor movements.

This is also what the blog post means about stimulus and response being unrelated.

Does this interpretation make sense?  First, note that I definitely felt like I was responding to an immediate stimulus when I was playing the game – when the line moved right, I moved left, and vice versa.  Describing this in terms of the above variables is a little difficult, though.  When C (cursor position) changed, M (my mouse position) changed in response.  But C itself is the sum of M and D, so my own movements influence it, and you don’t want a measure of the relationship that thinks my own movements are responses to themselves.

What I was actually responding to was not where the cursor was, but how fast the cursor seemed to be moving when you subtracted out my own movements.  This is simply the time derivative of D, and you could model my behavior by writing dM/dt = -dD/dt.  But in the interpretation above, this is supposed to be something magical, because supposedly I “can’t see” D.  But of course I can see D, or rather its time derivative – it is precisely what I’m seeing when I say “hey, the cursor’s drifting off to the left now, time to move right.”
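This account is easy to reproduce in simulation.  A sketch of my own (all parameters made up): a sinusoidal disturbance and a simple proportional controller that, like the subject, responds only to the visible cursor error C = M + D.  The “mysterious” correlation pattern falls right out.

```python
import math

# Disturbance: a smooth sinusoid, period 500 steps.
n = 2000
omega = 2 * math.pi / 500
D = [10 * math.sin(omega * t) for t in range(n)]

# Controller: move the mouse against the visible cursor error.
k = 0.2                                 # gain (hypothetical)
M = [0.0]
for t in range(1, n):
    C_prev = M[-1] + D[t - 1]           # the only thing the subject can see
    M.append(M[-1] - k * C_prev)        # move opposite to the visible error
C = [m + d for m, d in zip(M, D)]       # resulting cursor trace

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r_md = pearson(M, D)   # mouse vs "invisible" disturbance: strongly negative
r_cm = pearson(C, M)   # cursor ("stimulus") vs mouse ("response"): near zero
```

Even though the controller only ever looks at C, it ends up nearly perfectly anticorrelated with D and nearly uncorrelated with C, just like the demo.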

So what does the C-M correlation actually represent?  If I’m doing well, C is close to zero.  M, on the other hand, spans a large range.  Looking at the image, we see that for one part of the game it was positive (canceling negative D) and for the other part it was negative. 

A strong negative C-M correlation would then mean that I tended to err on the left side of correct when my mouse was on the right of center, and vice versa for right/left.  This is a sort of tendency one could conceivably have, but it has nothing to do with stimulus and response.  When my mouse was far to the right, say, this was not because I thought “ah, the Stimulus is left, my Response will be right!”, it’s because I’d drifted over to the right as my previous motions were summed up.  (This would be the integral part of a PID controller.  The blog poster mentions PID controllers, but doesn’t seem to have realized how they refute what they’re saying.)

Nonetheless, the blog poster seems quite confident in their radical conclusions, which apparently overturn much of mainstream psychology:

This is 180 degrees around from the behavioural stimulus-response view, in which you apply a stimulus (a perception) to the organism, and that causes it to emit a response (a behaviour). I shall come back to why this is wrong below. But there is no doubt that it is wrong. Completely, totally wrong. To this audience I can say, as wrong as theism. That wrong. Cognitive psychology just adds layers of processing between stimulus and response, and fares little better.

Ah, good old Less Wrong!

another-normal-anomaly:

nostalgebraist:

another-normal-anomaly:

plain-dealing-villain:

shlevy:

theungrumpablegrinch:

shlevy:

We will cry “Justice!” and proclaim that rationality need not come as dear as [Bayesians] insist. More than this we shall argue that Bayesian principles cannot even be construed as an idealization of human rationality; in many cases applicable to the human condition, these principles disallow what is rational

[popcorn-eating gif, snipped by nostalgebraist]

Link? (If it’s any good.)

http://www.cs.toronto.edu/~fbacchus/Papers/BKTSYN90.pdf (via @nostalgebraist)

How does someone this clueless commute without being hit by a bus daily?

This seems to be mixing different domains (descriptive, normative, mathematical, practical) wildly, and also to be completely stupid. IIRC, if you can construct a dutch book that someone with axiom-violating beliefs would consider to have a net payout of 0, you can construct one that they would consider to have a positive net payout, unless their deviation from the axioms is super tiny. Then they take it, and you take all their money. And then the paper goes on to say that it’s okay to be dutch-bookable because you might never meet anybody interested in taking advantage! If I knew more social engineering, I would try to sucker the paper authors out of a pile of money, but I suspect their common sense will protect them where their explicit beliefs let them down.

I don’t understand your objection.  The point the authors are making is that one can recognize a Dutch book simply by looking at the odds, and note that betting at those odds gives you a loss, no matter what beliefs one holds about the actual events.  Then one just rejects it because it gives a loss.

For instance, in the Dutch book example given here, you can simply look at the odds given and note that the bookie has created a money pump, and refuse to take the bets.  You can always do this, even if you believe that the bookie’s probabilities are “true” (in some sense of that phrase).

To avoid getting Dutch booked, you must only agree to betting odds that obey the probability axioms.  The question is then, do the betting odds you post have to reflect your degrees of belief?  What goes wrong if they don’t?

(In practice, I’m well aware that my subjective degrees of belief don’t follow the probability axioms and can’t given the nature of my brain, so I wouldn’t base my betting behavior on them as if they were probabilities – then I would be susceptible to Dutch books!)

The general idea is that lots of people *do* use their subjective degrees of belief to decide what bets to take, and not just formally–in everyday life as well. When you decide to drive your car to work, you’re making a bet based on your degree of belief that you will die in a car accident. When you decide what route to take, you’re making a bet based on your beliefs about what route will get you there fastest. Practically every decision you make depends on your model of the world, which is in some sense made of degrees of belief. If your beliefs don’t obey the probability axioms, you could be dutch-booked without ever meeting a bookie. 

Side question: how do you personally make decisions without using subjective degrees of belief? You made me super curious!

“All decisions under uncertainty are bets” is a pretty close analogy, but it’s not perfect.  A (literal) bet is something you do with another person, who also presumably wants to gain utility from it, so there is reason to be more wary than usual of “tricks” like Dutch books.  If you are noticeably careless in exploitable ways, the bets you are presented with will be skewed towards those that make use of this.

But there is no such skewing tendency in nature.  If you’re just making decisions as life randomly presents them to you, you may be Dutch booked, but life isn’t deliberately selecting events for the sake of Dutch booking you.  So when I say I wouldn’t make bets on the basis of my degrees of belief, I mean that I wouldn’t make literal bets, against other people, on this basis, because my degrees of belief are a mess and people might use that to their advantage.  (What do I do instead?  In practice I just don’t make bets.)

Anyway, here is why I am skeptical of the “make your beliefs coherent so life doesn’t screw you over” idea.  A noteworthy fact about Dutch books is that you know they are bad, even if your probabilities are incoherent.  (You lose in every case, after all.)  If someone comes along and just straight-up offers the whole Dutch book to you, you’ll reject it.

To be “Dutch bookable” means to be willing to take several individual bets which, if taken together, are a Dutch book.  If you are paying perfect attention, there is no problem with this.  You accept the individual bets, you reject the Dutch book, if (A with B) is a Dutch book then you’ll reject B if you’ve already accepted A, etc.  The problem only happens when you are careless and say “hmm, B does sound good” without remembering that you accepted A at some point in the past.
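A minimal numeric sketch of this structure (the specific numbers are mine, for illustration): under incoherent credences P(A) = P(not-A) = 0.6, each $1 ticket priced at $0.55 looks good on its own, yet the pair together loses in every world.

```python
# Incoherent credences: these should sum to 1 but don't.
cred = {"A": 0.6, "not-A": 0.6}
price = 0.55                             # bookie's price per $1 ticket

def subjective_ev(event):
    """Agent's expected gain from buying a $1 ticket on `event`."""
    return cred[event] * 1.0 - price

# Each individual bet has positive subjective expected value...
ev_a, ev_not_a = subjective_ev("A"), subjective_ev("not-A")

# ...but buying both tickets loses money in every possible world:
def net_payoff(world):
    payout = (1.0 if world == "A" else 0.0) + (1.0 if world == "not-A" else 0.0)
    return payout - 2 * price            # exactly one ticket pays out

payoffs = [net_payoff(w) for w in ("A", "not-A")]   # -$0.10 either way
```

The combination is transparently bad once you look at both tickets at once; the danger is only in evaluating them one at a time.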

What this means, practically, is that having coherent probabilities gives you the freedom to pay less attention.  But also: paying more attention gives you the freedom to have less coherent probabilities.  Personally, I think that coherence is by far the trickier of these two (for reasons that would take a few more paragraphs) and so I’d prefer to just pay more attention.

(via another-normal-anomaly)

how to dutch book someone who has consistent probabilities

cccccppppp:

sometheoryofsampling:

cccccppppp:

nostalgebraist:

[long post snipped]

As an addendum, I believe the diversification preference is for bets that are uncorrelated, not negatively correlated.

In finance, the diversification preference is for minimum variance. A minimum variance portfolio can include positively correlated, uncorrelated and negatively correlated choices.

Of course. I was assuming some other things.

I am confused about the example of the bus ticket utility function. Isn’t it discontinuous?

The bus ticket utility is discontinuous, yes.  (From the way it’s presented, it looks like it’s supposed to be linear except for a jump at $1, and certainly there’s a jump at $1.)

I was thinking that it’s technically “risk-seeking” because it’s convex (but it’s not strictly convex on any interval, so maybe it’s really no more risk-seeking than a linear function?  idk).  Anyway, my intuitive/moral reason for calling it risk-seeking was that it produced the same sort of preference you’d see with more ordinary risk-seeking utility functions.

With diversification, I hastily compared two different things that I think are “morally” the same without talking about the differences.  But now that I think more closely about it, I don’t think what I said in the OP about correlations is quite right.

The relevant difference between portfolios and bets (of the kind discussed in the OP) is that when making a portfolio you assume you have a fixed amount of money to invest, so if you spread it over N assets, each asset only gets (1/N) of your money.  OTOH, accepting a combination of bets just means accepting each of the bets in sequence, so if A and B both have a $1 stake, you’d have to put in $2 to accept both.

A consequence of this is that while spreading a portfolio over copies of the same asset (correlation 1) is neutral, accepting multiple copies of the same bet (correlation 1) is worse than neutral.  For instance, with two copies you gain twice as much if you win and pay twice as much if you lose, but since you’re risk averse, the badness of the latter outweighs the goodness of the former.  The analogy in investing is putting more money in the same asset or portfolio: if you’re risk averse, this is bad.
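Here is a small numerical illustration of the correlation-1 point, with assumptions of my own choosing (square-root utility, $100 starting wealth, a fair ±$20 coin-flip bet). With a concave utility, each additional copy of the same bet lowers expected utility further, because the doubled loss hurts more than the doubled gain helps:

```python
# Assumptions (mine, not from the post): sqrt utility, fair +/-$20 coin bet,
# $100 starting wealth. All copies of the bet ride on the same coin, so they
# win together or lose together (correlation 1).
from math import sqrt

def expected_utility(wealth, stake, copies, u=sqrt):
    win = u(wealth + copies * stake)   # every copy wins at once
    lose = u(wealth - copies * stake)  # every copy loses at once
    return 0.5 * win + 0.5 * lose

for n in range(3):
    print(n, round(expected_utility(100, 20, n), 3))
# Expected utility falls as copies are added: 10.0, then ~9.949, then ~9.789.
```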

Likewise, spreading a portfolio over multiple uncorrelated assets is good (lower variance), while accepting multiple uncorrelated bets is … well, I thought it was neutral, but on reflection I don’t think so.  If you make the same bet twice on two independent coins, you win twice as much with p=¼, lose twice as much with p=¼ and break even with p=½.  Depending on the specific risk averse utility function, this could be better, worse, or the same.  I want to say that lower correlation is still always better all else being equal, but I’d have to work it out on paper to be sure.
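To make the comparison concrete, here is a sketch with assumed numbers (square-root utility, $100 wealth, fair ±$20 bets): two perfectly correlated copies of a bet versus two independent bets. For this particular utility, independence beats correlation 1, but both fair bets together are still worse than betting nothing at all, illustrating the "could be better, worse, or the same" point:

```python
# Assumptions (mine): sqrt utility, $100 wealth, fair +/-$20 bets.
from math import sqrt
from itertools import product

WEALTH, STAKE = 100, 20

def eu_independent(u=sqrt):
    # Two independent coins: each wins (+STAKE) or loses (-STAKE) with p = 1/2.
    return sum(0.25 * u(WEALTH + STAKE * sum(flips))
               for flips in product([+1, -1], repeat=2))

def eu_correlated(u=sqrt):
    # Two copies on the same coin: win both or lose both.
    return 0.5 * u(WEALTH + 2 * STAKE) + 0.5 * u(WEALTH - 2 * STAKE)

print(round(eu_correlated(), 3))   # ~9.789
print(round(eu_independent(), 3))  # ~9.895
print(round(sqrt(WEALTH), 3))      # 10.0 (no bets at all)
```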

I’m pretty sure that negative correlations are desirable in both cases.  In investing, if you can find two assets with positive expected return and perfect negative correlation, you can combine them to make a riskless asset with positive expected return (Corollary 1.2 here).  With bets, if you make a neutral bet on heads (say win $10 vs. lose $1 bc you’re risk averse), and combine it with the same bet on tails, you always win it once and lose it once and thus always gain $9, which is better than neutral.  I think this generalizes to initial bets that aren’t neutral but I’d have to work it out to be sure.
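The arithmetic for the perfectly negatively correlated pair just described can be checked directly: each bet wins $10 on its side of the coin and loses $1 otherwise.

```python
# Each bet wins $10 if its side comes up, loses $1 otherwise.
def heads_bet(outcome):
    return 10 if outcome == "H" else -1

def tails_bet(outcome):
    return 10 if outcome == "T" else -1

payoffs = {o: heads_bet(o) + tails_bet(o) for o in ("H", "T")}
print(payoffs)  # {'H': 9, 'T': 9}: a riskless $9 gain either way
```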

(via cccccppppp-deactivated20181228)

@automatic-ally (responding to this post)

One thing to note about the practical side of Dutch-bookability. Say you’re risk-neutral; you still don’t want to trade at your probabilities, cause in expectation you make zero, and other people might know more. So you always demand a margin of m on your trades–that is, if your probability is p, you demand p+m for a contract. Then if the only Dutch books you have are of margin (~incorrectness) m or less (like say your P(X) + P(~X) = 1 + m for some X), you can’t actually be Dutch booked on any of your contract prices.
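A numerical sketch of the margin point, using my own toy setup rather than anything in the reply: suppose you quote two-sided prices around your probabilities, buying a $1 contract at p − m and selling at p + m, and suppose your probabilities for X and not-X sum to 1 + k (incoherence k). The would-be Dutch bookie's only move is to sell you both contracts; you pay your bid prices and are guaranteed exactly $1 back.

```python
# Toy setup (mine): two-sided quotes at p +/- m around incoherent
# probabilities. The bookie sells you $1 contracts on both X and not-X.

def sure_profit_for_you(p_x, p_not_x, margin):
    cost = (p_x - margin) + (p_not_x - margin)  # your bids for both contracts
    return 1.0 - cost                           # guaranteed $1 payoff minus cost

# Incoherence k = 0.05 with margin m = 0.05: the margin swallows it.
print(round(sure_profit_for_you(0.55, 0.50, 0.05), 2))  # 0.05: no sure loss

# Same incoherence with no margin: a guaranteed 5-cent loss (a Dutch book).
print(round(sure_profit_for_you(0.55, 0.50, 0.00), 2))  # -0.05
```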

Of course, in a competitive market, you’re gonna have lots of people trying to shave off their m to the lowest possible while retaining a profit–so Dutch-bookability concerns, being relatively easy to deal with, are generally shaved away as much as possible. (The independence point is an interesting one, but doesn’t really apply to practical trading that depends on one’s current utility function.)

Of course, this is a lot weaker than saying “in general you should keep your margin as low as possible”, but it does point to why Bayesianism might be particularly well-suited to some areas.

And in general with high-stakes situations, I’d guess there are going to be analogous margin-shaving competitions. Maybe, say, nuclear plant contractors have to trade off safety with cost, and eliminating Dutch-bookability of your meltdown risk probabilities helps you know you can use them aggressively. (I’m not 100% convinced of this, but it seems like a reasonable start.)

This all sounds interesting but I’m not sure I understand any of it, except for the first paragraph.

If I understand you correctly, you’re saying in paragraphs 2-3 that in a competitive market, people will make trades they view as having close to zero expected utility, and so they are close to the threshold of being Dutch booked.  I’m not sure this is practically important, for several reasons.

First, Dutch books seem more important theoretically than practically.  “Dutch books” is a subset of “trades with negative expected utility,” and it’s not clear that a Dutch book is any worse, in practice, than a non-Dutch-book trade with the same (negative) expected utility.

A Dutch book is a trade that gives you negative utility in every possible outcome, rather than just on average.  This is theoretically important because it is especially easy to agree – even before we’ve defined “degrees of belief” or anything – that these trades are undesirable.  When making Dutch book arguments, we’re considering agents who may have strange beliefs about probabilities, and so it’s thorny to talk about what the “correct” expected values are without getting circular, since these depend on probabilities.  But if we can make the agent lose no matter what happens, that’s clearly bad no matter what you believe about probability [1].  This is why Dutch books in particular are used in this argument, and not just negative-EU trades.

What I’m trying to say is, when I see Dutch books get mentioned, it’s because they’re important in theory.  I don’t see them get brought up as a practical matter.  Maybe they are practically important, but I’d need more concrete examples to be convinced.  (@plain-dealing-villain talked briefly about a possible Dutch book in the real world here, but I’m having trouble imagining precisely how it would work.)

If we look at the real world, either we see a lot of people being Dutch booked or we don’t.  (To me it looks like the latter, but maybe I’m just not looking with the right conceptual glasses.)  If a lot of people are being Dutch booked, then that means that people are making bad decisions that at least might be fixable by making their beliefs more like probabilities, which supports the “Bayes is important and bad things will happen if you don’t do it” narrative.  If we don’t see many Dutch books happening, then either being Dutch bookable isn’t actually a problem, or it is but everyone’s already made themselves invulnerable to Dutch books.  In either case, the Dutch book arguments wouldn’t have much practical advice to give us.

When I look at trades that are specifically made in combination because they’re more advantageous that way, I don’t see Dutch books.  I’m thinking of diversification and arbitrage, both of which involve trades with multiple other parties, so no one is “on the other end” getting screwed.  But I understand these very hazily and I suspect someone is going to swoop in and tell me about some common practice that is clearly a Dutch book.


[1] OK, technically, if you can believe in negative probabilities, have negative degrees of belief, etc., then you can see a Dutch book as good – say, if you lose money “with probability -1,” then you gain money with probability 1.  It’s still possible here to go through case by case and show that you lose money in every possible case; the negative probability here amounts to a belief that those cases are not the real possibilities.  This is a bizarre way of thinking which people don’t do in practice AFAIK, and I don’t know why one would ever want to do it, so it’s mostly a curiosity.

@argumate

Am I underthinking this, or isn’t this just saying that your internal state and expected utilities depend on the bets you have taken, and thus after taking A you won’t take B, even though you would have gladly taken B if you hadn’t already taken A?

Yeah, that is one (correct) way of describing the issue.  Schick distinguishes “independence” (the negation of the thing you said) and “additivity” (utility of a combination is the sum of the utilities), but then he shows (fn 5) that they’re the same thing in this case.

I guess I prefer to state it in terms of additivity because it’s more obvious that it fails in the real world.  You aren’t additive if you’re risk averse, and saying that rationality requires you not to be risk averse is ridiculous.

how to dutch book someone who has consistent probabilities

I was reading about Dutch books today while avoiding work and I came upon something that I was startled to have never heard about before.  It’s the point made in Schick 1986, “Dutch Bookies and Money Pumps,” although I much prefer the presentation in section 4.5 of Maher’s book Betting on Theories, which I can access online through my university but can’t link to.

The issue is simple.  A standard Dutch book argument (like the one given for Additivity here) presents several bets, each of which seems fair to you individually, then shows that taking them all at once will lose you money in all possible cases.

The “gotcha” here depends on the assumption that if you’ll take several bets individually, you must also take them in combination.  But clearly this doesn’t hold if the events in question are correlated.  If you’re risk-averse, say, you get extra value from bets on outcomes which are negatively correlated – that’s what a diversified portfolio is.  And the reverse in the case of positive correlation.
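For reference, the standard additivity construction can be sketched as follows. This is my own rendering, not the linked post's: if A and B are mutually exclusive but your P(A) + P(B) exceeds your P(A or B), a bookie sells you $1 bets on A and B at your prices and buys a $1 bet on (A or B) from you at your price.

```python
# My rendering of the standard additivity Dutch book. You buy $1 bets on
# A and B at your prices, and sell a $1 bet on (A or B) at your price.

def net_payoff(outcome, p_a, p_b, p_a_or_b):
    in_a, in_b = outcome == "A", outcome == "B"
    bet_a = (1 if in_a else 0) - p_a                   # bought
    bet_b = (1 if in_b else 0) - p_b                   # bought
    bet_ab = p_a_or_b - (1 if (in_a or in_b) else 0)   # sold
    return bet_a + bet_b + bet_ab

# Incoherent prices: P(A) = P(B) = 0.4 but P(A or B) = 0.7.
for outcome in ("A", "B", "neither"):
    print(outcome, round(net_payoff(outcome, 0.4, 0.4, 0.7), 2))
# -0.1 in every outcome: a sure loss, whichever way the world goes.
```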

Maher gives a nice simple example of this:

Suppose that, after a night on the town, you want to catch the bus home. Alas, you find that you have only 60 cents in your pocket, and the bus costs $1. A bookie, learning of your plight, offers you the following deal: If you give him your 60 cents, he will toss a coin; and if the coin lands heads, he will give you $1; otherwise, you have lost your 60 cents. If you accept the bookie’s proposal, you stand a 50-50 chance of being able to take the bus home, while rejecting it means you will certainly have to walk. Under these conditions, you may well feel that the bookie’s offer is acceptable; let us suppose you do. Presumably the offer would have been equally acceptable if you were betting on tails rather than heads; there is no reason, we can suppose, to favor one side over the other.

As subjective probability was defined for the simple Dutch book argument, your probability for heads in the above scenario is at least .6, and so is your probability for tails; thus your probabilities violate the probability calculus. The Dutch book argument claims to deduce from this that you are irrational. And yet, given the predicament you are in, your preferences seem perfectly reasonable.

Looking back over the simple Dutch book argument I gave, we can see what has gone wrong. From the fact that you are willing to accept each of two bets that together would give a sure loss, that argument infers that you are willing to give away money to a bookie. This assumes that if you are willing to accept each of the bets, you must be willing to accept both of them. But that assumption is surely false in the present case. In being willing to bet at less than even money on either heads or tails, you are merely being sensible; but you would certainly have taken leave of your senses if you were willing to accept both bets together. Accepting both bets, like accepting neither, means you will have to walk home.

In this case, you are risk-seeking (the opposite of risk-averse), and so the less risky (“diversified”) combination of bets on heads and tails is worse to you than the sum of each individual bet.

What is startling here is that risk aversion and risk seeking are not “irrational” in the “expected utility maximizer” sense.  You can have this kind of utility function and still be an expected utility maximizer using consistent probabilities.  But then you will be “Dutch bookable,” as in the above example.


A standard reflex here would be to suppose that the translation from money to utility is confusing things somehow, and to restate the problem with bets on utilities.  Maher treats this case next:

In this version of the argument, if you pay $r for the right to receive $s if A is true, and you have utility function u, then the betting quotient is said to be u($r)/u($s). This will in general be different from the value r/s that was taken to be the betting quotient in the simple argument. Apart from that difference, everything proceeds as before […]

He goes on to point out that, of course, this runs into the exact same problem:

[S]uppose your utility function u is such that

u($0) = 0;   u($0.40) = .4;   u($0.60) = .6;   u($1.00) = 2.

Since you currently have $0.60, the expected utility of not accepting the bookie’s deal is .6. Assuming your subjective probability for heads is ½, the expected utility of accepting the bookie’s offer to bet on the coin landing heads is 1. (You have $1 if the coin lands heads, and nothing otherwise.) Thus you are willing to accept this bet. And the same is true if the bet is on tails rather than heads. But to accept both bets would leave you with only $0.40, and since the utility of that (.4) is less than the status quo, you are not willing to accept both bets.  Prom this perspective, the reason why bets that are severally acceptable need not be jointly acceptable is that utility need not be a linear function of money. More generally, utilities need not be additive: The utility of two things together need not be equal to the sum of the utilities of each separately.

One can get around this problem by stipulating that people are forced to post odds and take any combination of bets at those odds.  Maher says that even this isn’t enough, and you need to make some additional non-obvious stipulations about bookie behavior, your ability to make arbitrary bets, etc.  If one does all this, one does get the original conclusion back, but only in a contrived, highly specific scenario.  Maher rightly mocks this:

Imagine someone arguing that subjective probabilities must satisfy the axioms of probability because, if you are required to assign numbers to propositions, and are told you will be shot if they do not satisfy the axioms of probability, then you would be irrational to assign numbers that violate the axioms.  Clearly this argument has no force at all.  (An exactly parallel argument could be used to “show” that subjective probabilities must violate the axioms of probability: Just stipulate that if the numbers you assign satisfy the axioms, you will be shot.)


What does this all mean?  We should be careful to make the distinction between accepting the Dutch book, which is bad (by construction), and rejecting it, which is supposed to be inconsistent.

As the above shows, accepting “Dutch bookable” bets by themselves, while rejecting the bets together (the Dutch book), can be the expected utility-maximizing choice if you’re a VNM-rational agent with a utility function that is not linear.  This is also the choice that will be made by an agent that is trying to maximize expected utility, but has incoherent odds.

Think about that.  The “gotcha” in the Dutch book argument is that the Dutch book will lose you utility, because there’s no outcome that gains you utility*.  But its expected value is negative according to your own calculations, using your own probabilities.  If your probabilities are incoherent, then you will reject the Dutch book (you yourself believe it has negative expected value – it’s not like the bookie is tricking you), but you will accept any of the constituent bets on their own.  Is this horrible shameful inconsistent behavior?  Well, VNM-rational agents do it too, so if this argument works it must equally well be an argument against VNM-rationality (or against nonlinear utility functions).
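A toy illustration of that behavior, with numbers I've made up: an agent whose incoherent probabilities are P(heads) = P(tails) = 0.65 values each 60-cent bet paying $1 positively on its own, yet can see the package loses in every state, no probabilities required, and so rejects it just as the VNM agent does.

```python
# My toy numbers: incoherent probabilities summing to 1.3.
P = {"H": 0.65, "T": 0.65}
STAKE, PAYOFF = 0.60, 1.00

def ev_single(side):
    # The agent's own valuation of one bet: chance of winning times
    # the payoff, minus the stake paid up front.
    return P[side] * PAYOFF - STAKE

def package_payoff(outcome):
    # Both bets at once: pay both stakes; exactly one side pays off,
    # so the result is the same in every state of the world.
    return PAYOFF - 2 * STAKE

print(round(ev_single("H"), 2), round(ev_single("T"), 2))      # 0.05 each: accept alone
print({o: round(package_payoff(o), 2) for o in "HT"})          # -0.2 in every state
```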

This seems to me like a giant gaping hole in the Dutch book argument.  The “irrational” thing it makes you do is something that the “rational” agents it is advocating can also do.


One could try to save the Dutch book argument by seeing it as a pragmatic concern.  In life, by making decisions under uncertainty, we make many “bets,” but our attention is usually focused on just one at a time.  So we are always making many bets at once (with some time lapse between betting and payoff, e.g. betting that the local currency will or won’t be worthless in 10 years), and it is easier to see any of these bets individually than to see how they work in conjunction.  We may be getting Dutch booked by making multiple individual bets that all work in isolation, and this will be hard to notice.

But now note that Dutch books are only half of the picture.  It’s possible for a combination of bets to be worse than the sum of the individual bets, but it can also be better.  This is what a diversified portfolio is – better than any of its constituents, because it has less risk and you’re risk-averse.

If you can inadvertently choose combinations that make you a guaranteed loss, you can also inadvertently choose combinations that get you a guaranteed win.  It might be objected that no one would knowingly take the other side of these combined bets, but no single party is taking the other side anyway.  Remember, we’re talking about the cumulative effects of decisions that seem unrelated.  There is no single entity “on the other end” of the cumulative package, just as there is no single entity taking on extra risk when you reduce your risk with a diversified portfolio.

And you can also get guaranteed wins by virtue of having incoherent preferences – it’s just the Dutch book with a negative sign, what Alan Hajek jokingly calls the “Czech book.”  No one person would want to be milked by you in this way, but just as above, this is irrelevant if there’s no one person on the other end.
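The "Czech book" is just the earlier arithmetic with the sign flipped; a toy example with my own numbers: someone whose probabilities for heads and tails sum to only 0.8 will sell you a $1 contract on each side for 40 cents, and buying both locks in a win.

```python
# My toy numbers: their probabilities for H and T sum to 0.8, so they
# sell a $1 contract on each side for 40 cents. Buying both costs 80
# cents and pays $1 no matter how the coin lands.
prices = {"H": 0.40, "T": 0.40}
cost = sum(prices.values())
guaranteed_gain = {o: round(1.00 - cost, 2) for o in ("H", "T")}
print(guaranteed_gain)  # {'H': 0.2, 'T': 0.2}: a sure 20-cent win either way
```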

*(This sentence contained an error in an earlier version of this post which has been reblogged a few times.  Pointing out the edit for clarity.)