Why is it taken as obvious that EV is the right way to go for decision-making? Even granting that you can come up with a meaningful calculation of probability and utility for the different outcomes, it’s not clear, looking at a single case, why I should care what the weighted average of utility across outcomes would be. If I’m only ever going to do one EV calculation in my life, and choice A has a 25% chance of 100 utils and a 75% chance of -10 utils, it seems reasonable to me to say “I should generally expect to lose on this path, not win”. It’s only in the large-number regime that you can use the EV of a single case as a sort of shorthand for “if I hold to this strategy over time, in most outcomes the wins will overcome the losses”, but how often do people actually do EV?
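A quick sketch of the tension above, with the numbers taken straight from the example: the EV of choice A is positive, yet a single draw loses three times out of four; only under many repetitions does the running average reliably approach the EV.

```python
import random

# Choice A from the example above: 25% chance of +100 utils, 75% chance of -10.
def draw_a(rng):
    return 100 if rng.random() < 0.25 else -10

ev = 0.25 * 100 + 0.75 * (-10)    # expected value of a single play
print(ev)                          # 17.5, so EV says "take it"

rng = random.Random(0)

# A single play loses most of the time...
losses = sum(1 for _ in range(10_000) if draw_a(rng) < 0)
print(losses / 10_000)             # roughly 0.75

# ...but the long-run average of repeated plays converges on the EV.
total = sum(draw_a(rng) for _ in range(100_000))
print(total / 100_000)             # close to 17.5
```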
In science and business it’s used as a tool to make your computer calculate a plan with outcomes that you want, and that use does not require perfect agreement between the EV choices and the best choices.
(re: how often: it’s used in some Bayesian clinical trials to assign patients to treatments, and I met a guy whose PhD thesis was on algorithms to prioritize power lines to fix after a hurricane, which I understood to be EV calculations)
it’s also used as a model of actual decision-makers, though I think it’s acknowledged as not always the best one; for example, people sometimes use prospect theory instead
(re: how common: I’ve heard it’s very common in economics)
so, it’s used, but these uses are not because the doctors, economists, etc. think EV defines the “right way to go.” But there are arguments for that too.
My preferred way to define “the right way to go” here is “rationality upon reflection”. Like, you reflect on your preferences, and maybe you decide that they’re awful, which means they weren’t rational upon reflection. There are arguments that the EV choices are the rational-upon-reflection ones.
This is actually not very useful even if true, because
- in science and business, when calculating plans, we know there’s going to be a mismatch between the EV choices and the best choices, because we oversimplify the utility function and the probability distributions. This is true whether or not some other utility function and probability distribution would work perfectly
- in any individual case, we can decide whether the EV conclusion is rational on reflection by calculating it, and reflecting. It’d be nice to know if this always works, but either way you can always just try it
- and obviously, as a model of actual people, it fails, because people don’t have the rational-upon-reflection preferences
Someone who has a lot of reason to care, though, is Eliezer @yudkowsky, since he wants models of self-modifying decision-making programs. He thinks that such programs are accurately modeled, after their initial stages, only by theories which produce rational-upon-reflection conclusions. Because, I guess, the program is doing a lot of reflecting? I recall him saying he had to develop timeless decision theory because an AI foom would only be described by existing decision theories for at most a few seconds.
rational-upon-reflection choices are sort of normative, because you can use them as a guide for your own actions, but I don’t like to put it that way. And sometimes the use isn’t normative; for example, EY is using them as a model to predict reflective decision-makers
Anyway, I say there are “arguments” for it, but I only know a single argument, which is based on Savage’s theorem. Savage’s theorem is something like: “for any set of preferences that obeys constraints R, there exist a utility function U and a probability distribution P such that those preferences are exactly the EV-maximizing ones with U and P. Also, given U there’s just one P, and vice versa.” So such an argument can’t establish the right utility function or the right probability distribution, though constraints on one give info about the other. By a set of “preferences” I really mean a set of hypothetical choices you’d make, like “when given a choice between doing A and doing B, I choose doing B”. This is a little different from what we usually mean: usually by preferences people mean preferences between outcomes, but these are preferences between acts.
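To pin down the vocabulary with a toy example (all the states, outcomes, and numbers here are invented for illustration): an “act” assigns an outcome to each state of the world, and a given U and P induce a preference order over acts by expected utility. Savage’s theorem runs in the opposite direction, from preferences satisfying the constraints to a (U, P) pair, but the representation it delivers has this shape.

```python
# Toy Savage-style setup: acts map states to outcomes (numbers invented).
P = {"rain": 0.3, "shine": 0.7}             # probability over states
U = {"wet": -5, "dry": 0, "picnic": 10}     # utility over outcomes

acts = {
    "stay_home": {"rain": "dry", "shine": "dry"},
    "go_out": {"rain": "wet", "shine": "picnic"},
}

def expected_utility(act):
    return sum(P[s] * U[act[s]] for s in P)

# The (U, P) pair induces a preference order over the acts:
ranked = sorted(acts, key=lambda name: expected_utility(acts[name]), reverse=True)
print(ranked)  # go_out (EU = 5.5) is preferred to stay_home (EU = 0.0)
```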
So, the argument based on this theorem is something like
- One of two things is true by Savage’s theorem:
- the correct-upon-reflection preferences are EV
- the correct-upon-reflection preferences violate some constraint in R
- But surely, if we were reflecting and we noticed the violation of that constraint, we would alter our preferences to repair it, since violating that constraint seems really irrational
- Therefore the correct-upon-reflection preferences are EV
A description of the theorem: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.117.541&rep=rep1&type=pdf
Like Jaynes’s Cox’s-theorem argument, there’s some tricky infinities stuff going on. The theorem only works if you have this sort of continuous space of possible acts, with your preference partial order defined over all of them. But of course there’s only a finite set of acts that we actually care about. This is analogous to how Jaynes’s argument requires a sort of continuum of “plausibilities”, all of which are possessed by some proposition, in order to derive with Cox’s theorem that the plausibilities are isomorphic to probabilities. When there are only finitely many propositions, Halpern has constructed a counterexample to Cox’s theorem. @scientiststhesis argues that if you have a coin and beliefs about how any series of flips might come out, there’s your continuum of plausibilities (not really a continuum, but there’s a plausibility between every pair of plausibilities, which might be enough for Cox’s theorem; I’m not sure). I would counterargue that we don’t really care if we violate our rationality constraints for those, and the set of propositions we actually care about getting our beliefs right for is finite. But he may be right if we’re talking about rational-upon-reflection beliefs for all propositions, rather than just the ones we care about. In any case, to quote Halpern’s paper: “Another possibly interesting line of research is that of characterizing the functions that satisfy Cox’s assumptions. As the example given here shows, the class of such functions includes functions that are not isomorphic to any probability function. I conjecture that in fact it includes only functions that are in some sense ‘close’ to a function isomorphic to a probability distribution, although it is not clear exactly how ‘close’ should be defined (nor how interesting this class really is in practice)”.
Well, maybe not in practice, but in philosophy it’d be interesting if having beliefs about lots of propositions means that the rational-upon-reflection beliefs must be close to obeying probability theory. So something similar might be true for Savage’s theorem: maybe even with only finitely many possible acts to have preferences between, having enough of them means that our preferences are… close, somehow, to EV?
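The coin point can be made concrete (this is my own sketch of how I read the argument, not anything @scientiststhesis actually wrote): for n flips of a fair coin, every event gets plausibility k/2^n, and these values get arbitrarily fine as n grows, since the midpoint of any two plausibilities at one level is itself a plausibility at the next level.

```python
from fractions import Fraction

# For n flips of a fair coin, every event (a set of length-n flip sequences)
# gets plausibility k / 2^n for some k in 0..2^n.
def plausibilities(n):
    return {Fraction(k, 2**n) for k in range(2**n + 1)}

p3 = sorted(plausibilities(3))   # 0, 1/8, 2/8, ..., 1
p4 = plausibilities(4)

# The midpoint of any two level-3 plausibilities is a level-4 plausibility,
# so the available plausibilities get denser without bound as n grows.
for a, b in zip(p3, p3[1:]):
    assert (a + b) / 2 in p4
print("every midpoint at level 3 is a plausibility at level 4")
```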
Certainly, though, if you’re only going to make one choice, the argument doesn’t apply. So maybe the Savage’s-theorem argument isn’t any more general than the averaging-out argument. Though perhaps it applies to cases where one choice outweighs all the rest in consequences? But perhaps not. One advantage it does have is that it applies when you’ll only make one choice but you don’t know which choice it’s going to be, so you still have a rich set of preferences.
Anyway, some of the constraints in R imply these infinities, but what about the other constraints? Does violating them really make preferences irrational upon reflection? I don’t know, because I just haven’t examined them that closely.
AFAIK, Savage’s theorem is only necessary if you want to infer EV from preferences alone, without even assuming the agent knows about or believes in any probabilities. If you already know the probabilities of various outcomes – which seems like the case @shlevy is interested in? – then all you need is the VNM theorem, which works with finitely many acts/outcomes.
The VNM axioms only say that there is some function whose expectation your preferences maximize. In @shlevy’s example, preferring “don’t do A” to “do A” is consistent with maximizing the expectation of a function, just one that isn’t the same as the function the example refers to as “utils.”
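For instance (the transform v below is my own made-up example, not anything from VNM): take the choice from the top of the thread. Refusing it maximizes the expectation of a concave function v, even though it fails to maximize expected “utils”.

```python
import math

# Choice A: 25% chance of +100 "utils", 75% chance of -10; refusing gives 0.
outcomes = [(0.25, 100), (0.75, -10)]

def expectation(f):
    return sum(p * f(x) for p, x in outcomes)

# In raw utils, doing A wins (EV = 17.5 > 0)...
print(expectation(lambda x: x))     # 17.5

# ...but under this (hypothetical) risk-averse transform, refusing wins,
# because the concave v makes the 75%-likely loss loom large:
v = lambda x: 1 - math.exp(-x / 5)
print(expectation(v), "vs", v(0))   # E[v(A)] is negative, v(0) = 0
```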
(via raginrayguns)
