Data Science and Decisions 2022: Week 3

DATA SCIENCE AND DECISION MAKING
Of profit and loss
Will Lowe, Data Science Lab, Hertie School
2022-02-23

SCHEDULE
→ Quiz
→ Utilities and decisions
→ Decision theory for classification in data science
→ How to think about your future
→ How to think about other people’s future
→ How people actually think about the future
→ Making decisions in groups
→ Theological case study
→ Trouble in group decision making

UTILITY, THE VERY IDEA
Representation Theorem (Von Neumann & Morgenstern)
If you have preferences over outcomes that are
→ complete
→ reflexive
→ transitive
then they can be represented numerically with a utility function: U(.)
→ In classical economics preferences are only observed as choices
“Don’t tell me what you value. Show me your budget and I’ll tell you what you value.”
What you say you want is stated preference
What a utility function would construct you as wanting is revealed preference
→ Better try to keep them similar!
Corollary:
→ If your (algorithm, model, analytics department, decision support tool) also ‘makes choices’ then it has a utility function too
→ it can diverge from yours!

DATA SCIENCE: CLASSIFICATION
A ‘classifier’ is two things, often confused. In a simple two class classification
→ Estimating $E(C \mid X_1 \ldots X_K) = P(C = 1 \mid X_1 \ldots X_K)$
→ Deciding that $C = 1$ or $C = 0$ in the light of $P(C = 1 \mid X_1 \ldots X_K)$
It can seem natural to decide that $C = 1$ if
$$P(C = 1 \mid X_1 \ldots X_K) > P(C = 0 \mid X_1 \ldots X_K)$$
that is, if
$$P(C = 1 \mid X_1 \ldots X_K) > 0.5$$
This is
→ the ‘loss function I didn’t realize I was using’
[Figure: class densities p(x|C1) and p(x|C2) against x; posterior probabilities p(C1|x) and p(C2|x) against x]
The $P(C \mid X)$ side of things

CLASSIFICATION ERRORS
[Figure: class densities p(x|C1) and p(x|C2) against x; posterior probabilities p(C1|x) and p(C2|x) against x]
When $\hat{C} \neq C$ we are making an error
We mistake a $C = 1$ for $C = 0$ with probability $P(\hat{C} = 0 \mid C = 1)$
More generally there are two useful (and closely related) distributions (King & Lowe):
$$P(\hat{C} \mid C) = \frac{P(C \mid \hat{C})\, P(\hat{C})}{P(C)} \quad \text{(recall)}$$
$$P(C \mid \hat{C}) = \frac{P(\hat{C} \mid C)\, P(C)}{P(\hat{C})} \quad \text{(precision)}$$
that tell us about errors
Tip: read $C$ as “the true category” and $\hat{C}$ as “what category the machine said it was”
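The link between the two distributions can be checked on a small confusion table. The sketch below is illustrative only: the counts are invented, and the helper names (`n_true`, `n_said`, `recall`, `precision`) are hypothetical, not taken from the slides.

```python
# Invented confusion counts, keyed by (true category C, machine's category C_hat).
counts = {(1, 1): 30, (1, 0): 10, (0, 1): 20, (0, 0): 140}
total = sum(counts.values())

def n_true(c):
    """Number of cases whose true category is c."""
    return sum(n for (true, _), n in counts.items() if true == c)

def n_said(c_hat):
    """Number of cases the machine put in category c_hat."""
    return sum(n for (_, said), n in counts.items() if said == c_hat)

def recall(c_hat, c):
    """P(C_hat = c_hat | C = c)."""
    return counts[(c, c_hat)] / n_true(c)

def precision(c, c_hat):
    """P(C = c | C_hat = c_hat)."""
    return counts[(c, c_hat)] / n_said(c_hat)

# Bayes' rule turns recall into precision using the two marginals.
p_c1 = n_true(1) / total
p_said1 = n_said(1) / total
via_bayes = recall(1, 1) * p_c1 / p_said1

print(recall(1, 1), precision(1, 1), via_bayes)
```

With these made-up counts the directly computed precision and the Bayes-rule version agree up to floating point rounding, which is all the two formulas claim.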

SO. MANY. WORDS. FOR. THOSE. THINGS

BRINGING THE UTILITIES AND LOSSES BACK IN
However, it is often more costly to mistake a 1 for a 0 than a 0 for a 1, e.g.
→ 1 means a state will collapse in the next year (e.g. King & Zeng)
→ Intuitively we should require lower probability to choose 1 when mistaking a 1 for a 0 is very costly
Optimal choice:
$$\hat{C} = \begin{cases} 1 & \text{if } P(C = 1 \mid X_1 \ldots X_K) > \dfrac{1}{1 + \alpha} \\ 0 & \text{otherwise} \end{cases}$$
where
$$\alpha = \frac{L \text{ for mistaking a 1 for a 0}}{L \text{ for mistaking a 0 for a 1}}$$
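A minimal sketch of the cutoff rule, assuming the two losses are known positive numbers. The function names and the example losses are hypothetical; the only thing taken from the slide is the rule "choose 1 when P(C = 1 | X) exceeds 1 / (1 + α)".

```python
def cutoff(loss_1_as_0, loss_0_as_1):
    """Probability cutoff 1 / (1 + alpha), with alpha the ratio of the two losses."""
    alpha = loss_1_as_0 / loss_0_as_1
    return 1.0 / (1.0 + alpha)

def decide(p1, loss_1_as_0, loss_0_as_1):
    """Choose class 1 when P(C = 1 | X) exceeds the cutoff, otherwise class 0."""
    return 1 if p1 > cutoff(loss_1_as_0, loss_0_as_1) else 0

def expected_loss(choice, p1, loss_1_as_0, loss_0_as_1):
    """Choosing 0 risks mistaking a 1 for a 0; choosing 1 risks the reverse."""
    if choice == 0:
        return p1 * loss_1_as_0
    return (1.0 - p1) * loss_0_as_1

# Equal losses recover the familiar 0.5 cutoff.
print(cutoff(1, 1))
# If missing a 1 is nine times worse, a probability of 0.2 is already enough.
print(cutoff(9, 1), decide(0.2, 9, 1), decide(0.2, 1, 1))
```

The rule is just expected loss minimisation: with losses 9 and 1 and p = 0.2, choosing 1 has expected loss 0.8 while choosing 0 has expected loss 1.8.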

UNKNOWN LOSSES, UNKNOWN TRADEOFFS
Sometimes we don’t have (or can’t commit to) some loss matrix L or a preferred balance between precision and recall
However, since each value of α implies such a loss / balance, we can ask how well a classifier does for all possible cutoffs
We plot the resulting error rates in a Receiver Operating Characteristic (ROC) curve for a wide range of cutoffs
Traditionally, ROC curves plot recall (the true positive rate) against the false positive rate
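Sweeping the cutoff can be sketched in a few lines. This assumes the usual ROC axes (true positive rate against false positive rate); the function name `roc_points` and the toy probabilities and labels are invented for illustration.

```python
def roc_points(probs, labels, cutoffs):
    """Return (cutoff, false positive rate, true positive rate) for each cutoff."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for cut in cutoffs:
        said = [1 if p > cut else 0 for p in probs]
        true_pos = sum(1 for s, y in zip(said, labels) if s == 1 and y == 1)
        false_pos = sum(1 for s, y in zip(said, labels) if s == 1 and y == 0)
        points.append((cut, false_pos / n_neg, true_pos / n_pos))
    return points

# Toy predicted probabilities P(C = 1 | X) and true categories.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(probs, labels, [0.0, 0.25, 0.5, 0.75, 1.0])
for cut, fpr, tpr in points:
    print(f"cutoff {cut:.2f}: FPR {fpr:.2f}, TPR {tpr:.2f}")
```

Each cutoff corresponds to one value of α, so the list of points is a picture of how the classifier would behave under every loss ratio at once.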

ROC AND CALIBRATION

LOOKING FORWARD...
Apparently not...

TEMPORAL DISCOUNTING FOR REAL
How much are future (not yet existing) lives (and their quality) worth relative to current ones?
Two considerations:
→ The effect of an action on the existing population (or its quality of life) t years in the future
→ The effect on populations that would exist (or the quality of their lives) t years in the future, some of whom would not exist (or whose life quality would be different) depending on the action
However you feel about discount rates
→ policy behaviour embodies an implicit rate

FUTURE UTILITY
Many (all) policy decision problems have future consequences
→ How to weigh them relative to the present consequences?
Simple theory:
→ A reward $r$, $\tau$ time steps into the future, is worth $U(r)D(\tau)$
→ where $D$ is the discount function
→ in simple cases $D(\tau) = \exp(-\lambda\tau)$
→ where $\lambda$ is a discount rate
This works like a negative interest rate, $\delta^\tau$ with $0 \le \delta \le 1$, where $\delta = \exp(-\lambda)$
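As a small sketch of the simple theory, the exponential discount function and its per-step factor can be written down directly. The rate `lam = 0.1` and the function names are arbitrary illustrative choices, not values from the lecture.

```python
import math

def discount(tau, lam):
    """Exponential discount function D(tau) = exp(-lam * tau)."""
    return math.exp(-lam * tau)

def present_value(utility, tau, lam):
    """Value today of a utility received tau time steps from now."""
    return utility * discount(tau, lam)

lam = 0.1                 # assumed discount rate
delta = math.exp(-lam)    # equivalent per-step discount factor

# The same curve written two ways: exp(-lam * tau) and delta ** tau.
for tau in (0, 1, 5, 10):
    print(tau, round(discount(tau, lam), 4), round(delta ** tau, 4))
```

The defining feature is that one more step of delay always multiplies the value by the same factor `delta`, however far away the reward already is.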

SELF CONTROL
Mischel and Ebbesen gave children the following situation:
→ Favourite snack on a chair
→ Told they could eat it
→ But if they waited then they could have two
Results
→ Some delayed long enough to get the second
→ Age predicted ability to delay
→ Follow up: “preschool children who delayed gratification longer [...] were described [...] as adolescents who were significantly more competent”
→ Later follow up: ability to delay predicts SAT scores
Americans eat these things. Nobody really knows why

PUZZLE ABOUT DISCOUNTING
The kids aren’t special
In fact, as far as we know, no animals discount exponentially
→ Why? Irrationality?
Maybe. Maybe not. Let’s see what they do instead

HYPERBOLIC DISCOUNTING
Human discounting is usually not exponential but hyperbolic and time sensitive
Something more like
$$D(\tau) = \frac{1}{1 + k\tau}$$
with unintuitive and apparently irrational consequences
→ preference reversal!
For an interpretation in terms of ‘weakness of the will’ see Ainslie
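Preference reversal is easy to reproduce numerically. In the sketch below the rewards (50 sooner, 100 five steps later), the constants `k = 1.0` and `lam = 0.2`, and the function names are all invented for illustration.

```python
import math

def hyperbolic(tau, k=1.0):
    """Hyperbolic discount function 1 / (1 + k * tau)."""
    return 1.0 / (1.0 + k * tau)

def exponential(tau, lam=0.2):
    """Exponential discount function exp(-lam * tau)."""
    return math.exp(-lam * tau)

def prefers_later(discount, wait, small=50.0, large=100.0, gap=5.0):
    """True if the larger reward at wait + gap beats the smaller reward at wait."""
    return large * discount(wait + gap) > small * discount(wait)

# Hyperbolic: take the small reward when it is available now,
# but prefer to wait for the large one when both are far away.
print(prefers_later(hyperbolic, 0), prefers_later(hyperbolic, 20))

# Exponential: the ranking never changes as both rewards recede.
print({prefers_later(exponential, wait) for wait in range(0, 51)})
```

Under exponential discounting the ratio of the two discounted values is constant in `wait`, so the choice cannot flip; under the hyperbolic form the ratio drifts, which is the reversal.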

ECOLOGICAL VALIDITY
One way to understand what may be happening is to return to the last lecture on representing uncertainty
Sozou argues
→ Exponential discounting implies a belief in a known and constant hazard rate $\lambda$
→ hazard rate: the probability that the future good fails to appear as expected at $t + \tau$ despite being there at $t$
→ So consider priors over $\lambda$
Uncertainty about the future → uncertainty about the discount rate

ECOLOGICAL VALIDITY
And we know what to do with uncertain quantities, right?
Average over them in our decision calculations
Each value of $\lambda$ gives an exponential discount curve, but a probability weighted average of these is not exponential!
The exponential prior over $\lambda$ generates discounting behaviour that is hyperbolic, in exactly the functional form we saw before
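The averaging argument can be checked numerically. The sketch assumes an exponential prior over the hazard rate with mean `k`, and uses a plain midpoint rule; the integration range, step count, and function names are illustrative choices suited to `k = 0.5`, not part of the original argument.

```python
import math

def averaged_discount(tau, k, upper=40.0, steps=100_000):
    """Average exp(-lam * tau) over an exponential prior on lam with mean k.

    Uses a midpoint rule on [0, upper]; upper is assumed large enough
    that the prior mass beyond it is negligible.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        prior = math.exp(-lam / k) / k
        total += math.exp(-lam * tau) * prior * h
    return total

def hyperbolic(tau, k):
    """Hyperbolic discount function 1 / (1 + k * tau)."""
    return 1.0 / (1.0 + k * tau)

k = 0.5
for tau in (0, 1, 10):
    print(tau, round(averaged_discount(tau, k), 6), round(hyperbolic(tau, k), 6))
```

The agreement reflects the closed form: integrating exp(-λτ) against the density (1/k) exp(-λ/k) gives (1/k) / (τ + 1/k), which is 1 / (1 + kτ).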

ENVIRONMENTAL FACTORS
Self control with snack predicts SAT and other positive outcomes!
or
Reliable parental environment makes it rational to expect rewards to come and makes self-control a better idea
PS: Please don’t make graphs like this (it was some time ago, but still)

TEMPORAL DISCOUNTING
We need to weight the future somehow
→ There are mathematical problems with putting positive weight on the infinite future
→ Rational choices for the discount rate seem...under-constrained
→ For personal utilities the problem can seem straightforward
→ For utilities that range over others (real or potential) things get...difficult
→ Unfortunately these are the important policy cases
→ Temporal discounting involves questions of policy, psychology, and fairness
Now, about all those people we were making decisions about. Maybe they should be involved in the process?

GROUP DECISIONS
What is the ‘will’ of the people? (government, company, organization)
Operationally: How to aggregate individual preferences?

POPE CHOICE
Historically the pope was chosen by some combination of
→ God
→ a supermajority vote among the cardinals
Pope selection was historically difficult and controversial
→ One conclave produced a pope and an anti-pope
→ a cardinal grabbed the papal coat and tried to run off with it, afterwards reigning as Victor IV
→ Deadlock in Viterbo prompted the locals to remove the roof and put the cardinals on bread and water
This hasn’t happened recently but...
→ John Paul II changed the rules...

POPE SELECTION BIAS
Let’s take the case of Cardinal Ratzinger a.k.a. Benedict XVI
John Paul II changed the rules (Apostolic Constitution, ‘Universi Dominici Gregis’)
→ a simple majority and the possibility of a runoff vote

CARDINAL UTILITIES
Support and voting are secret. But we can conjecture on the basis of
→ constituencies
→ ideology
(Maltzman et al.)

DECISION TIME
Pairwise contests (M = Martini, B = Bergoglio, R = Ratzinger):

contest    wins
M vs B     M
M vs R     R
B vs M     M
B vs R     B
R vs B     B
R vs M     R

Martini beats Bergoglio, Bergoglio beats Ratzinger, Ratzinger beats Martini
→ There is no Condorcet winner
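The cycle can be checked mechanically from the pairwise results alone. A minimal sketch: the function name `condorcet_winner` and the data layout are hypothetical, and the second table, where Martini also beats Ratzinger, is an invented counterfactual added only for contrast.

```python
def condorcet_winner(candidates, pairwise_winner):
    """Return the candidate who wins every pairwise contest, or None."""
    for candidate in candidates:
        contests = [w for pair, w in pairwise_winner.items() if candidate in pair]
        if contests and all(w == candidate for w in contests):
            return candidate
    return None

candidates = ["M", "B", "R"]  # Martini, Bergoglio, Ratzinger

# Pairwise majority winners as stated on the slide.
conjectured = {
    frozenset({"M", "B"}): "M",
    frozenset({"B", "R"}): "B",
    frozenset({"R", "M"}): "R",
}
print(condorcet_winner(candidates, conjectured))

# Invented counterfactual: if Martini also beat Ratzinger, the cycle disappears.
counterfactual = dict(conjectured)
counterfactual[frozenset({"R", "M"})] = "M"
print(condorcet_winner(candidates, counterfactual))
```

With the slide's results every candidate loses at least one contest, so the function returns `None`; in the counterfactual it returns `"M"`.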

CONDORCET PROBLEMS GENERALIZED
Complete and transitive preferences over ≥ 3 options can lead to cycles in aggregate preferences
→ There need be no overall winner using a majority voting rule
Any preference aggregation method fulfilling
→ non-dictatorship: The wishes of multiple cardinals should be taken into consideration
→ unrestricted domain: Cardinals must be able to vote for any candidate
→ pareto optimality: If every cardinal prefers cardinal A over cardinal B, cardinal A should be the pope
→ independence of irrelevant alternatives: If a candidate is removed or drops out, then cardinals’ orderings don’t change
can have cycles (Arrow)

AVOIDING GRIDLOCK
John Paul II adjusted the rules to
→ Remove ‘unrestricted domain’ by allowing a runoff
→ Allow a simple majority after a number of rounds of voting
In principle, doesn’t matter:
→ rational cardinals backwards induce from the first simple majority round
Anecdotally, with some abstentions, over successive rounds:
→ Ratzinger gets votes
→ Ratzinger gets votes again (but no supermajority)
→ and then a clear winner

ARROW
Interpretation:
→ Cycles can be real
→ Cycles can represent the lack of a unique behavioural prediction
Note:
→ We are not guaranteed cycles. That depends on a lot of things, including the preferences of the individuals being aggregated
Most theorists get around this result by assuming more structure on preferences
→ i.e. structured utility functions
The most natural form of structured utility function is spatial

IDEAL POINT STRUCTURE
Black showed that if preferences were
→ single dimensional
→ single peaked
then choices are well behaved, e.g.
→ Linear: $-|x - x^*|$
→ Quadratic: $-(x - x^*)^2$
→ Gaussian: $\exp\left(-(x - x^*)^2 / w\right)$
Majority vote converges on the median voter and no cycles
“The further I get from the things that I care about, the less I care about how much further away I get.”
Robert Smith on Gaussian utility
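The median voter result can be illustrated with a handful of voters on a line. The ideal points below are invented, the linear utility is the one listed on the slide, and `majority_prefers` is a hypothetical helper that counts strict preferences only.

```python
import statistics

def utility(x, ideal):
    """Linear single-peaked utility: closer to the ideal point is better."""
    return -abs(x - ideal)

def majority_prefers(a, b, ideals):
    """True if strictly more voters prefer position a to position b."""
    for_a = sum(1 for ideal in ideals if utility(a, ideal) > utility(b, ideal))
    for_b = sum(1 for ideal in ideals if utility(b, ideal) > utility(a, ideal))
    return for_a > for_b

# Invented ideal points for five voters on a single dimension.
ideals = [0.1, 0.3, 0.4, 0.7, 0.9]
median = statistics.median(ideals)

# Pit every position on a grid against the median voter's ideal point.
alternatives = [i / 20 for i in range(21)]
beaten_by = [alt for alt in alternatives if majority_prefers(alt, median, ideals)]
print(median, beaten_by)
```

No position on the grid beats the median ideal point in a pairwise majority vote, so there is a stable winner and no cycle in this one-dimensional setting.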

IDEAL POINTS
Special case:
→ We can have preferences on several dimensions at once
In this (radially symmetrical) case, majority rule also converges on the median voter
→ yay!
But not so fast. It doesn’t work without the radial structure (Plott)
McKelvey showed that in general multi-dimensional settings there need be ‘no stable equilibrium’ at all, i.e. cycling

STRATEGY
McKelvey’s ‘chaos theorem’ shows that by careful choice of comparisons, a smart agenda controller can offer a series of votes that move naive spatial voters to support anything she wants
These seem to be rather dumb voters who have not taken a social choice course.
Can we design a system where it’s best to be as honest as these folk?

INTERLUDE: SOPHISTICATION
Not usually
Gibbard-Satterthwaite theorem (Gibbard; Satterthwaite)
→ For a group of individuals choosing among ≥ 3 alternatives (with no constraints on preferences and a non-dictatorial aggregation function) there is always some set of preferences it’s worth lying about
There’s just no avoiding strategy...
Riker noted that it was often possible to add a dimension in order to manipulate the outcome.
More generally ‘the heresthetic’ is a type of political strategy (Schonhardt-Bailey)
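One concrete instance of "preferences it's worth lying about" can be sketched with a Borda count. The Borda rule, the four options, and the three ballots are all invented here for illustration; the theorem itself covers any non-dictatorial rule, and the slides do not single out this one.

```python
def borda_winner(ballots):
    """Borda count: last place scores 0, each step up scores one more point.

    Ties are broken alphabetically (an arbitrary illustrative choice).
    """
    scores = {}
    for ballot in ballots:
        for points, option in enumerate(reversed(ballot)):
            scores[option] = scores.get(option, 0) + points
    best = max(scores.values())
    winner = sorted(o for o, s in scores.items() if s == best)[0]
    return winner, scores

# Sincere ballots, best option first. Voter 3 truly prefers B to A.
sincere = [
    ["A", "B", "C", "D"],
    ["A", "B", "C", "D"],
    ["B", "A", "C", "D"],
]
print(borda_winner(sincere))

# Voter 3 misreports by burying A at the bottom of the ballot.
strategic = sincere[:2] + [["B", "C", "D", "A"]]
print(borda_winner(strategic))
```

Reported honestly, the ballots elect A by 8 points to 7. When voter 3 buries A, A drops to 6 points and B wins with 7, which voter 3 prefers, so honesty was not that voter's best reply.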

GROUP DECISION MAKING
Being a rational group and making rational group decisions may be harder than you think
Lots of possibilities for making it easier
→ Restrict the choice domain: dictators, committees, departments, runoffs
→ Induce structured utilities, e.g. elite leadership, psychological heuristics
Shot: The concept of rational group preferences may just be a mistake (or an impossibility)?
Chaser: You are often, in a psychological sense, a group decision maker and may suffer the same problems making decisions
(Utility elicitation may be as hard as probability elicitation)

REFERENCES
Ainslie, G. “Breakdown of will.” Cambridge University Press.
Arrow, K. J. “Social choice and individual values.” John Wiley & Sons.
Black, D. “On the rationale of group decision-making.” Journal of Political Economy.
Gibbard, A. “Manipulation of voting schemes: A general result.” Econometrica.
King, G., & Lowe, W. “An automated information extraction tool for international conflict data with performance as good as human coders: A rare events evaluation design.” International Organization.
King, G., & Zeng, L. “Logistic regression in rare events data.”
Maltzman, F., Schwartzberg, M., & Sigelman, L. “Vox populi, vox dei, vox sagittae.” PS - Political Science & Politics.
McKelvey, R. D. “Intransitivities in multidimensional voting models and some implications for agenda control.” Journal of Economic Theory.
Mischel, W., & Ebbesen, E. B. “Attention in delay of gratification.” Journal of Personality and Social Psychology.
Plott, C. R. “A notion of equilibrium and its possibility under majority rule.” The American Economic Review.
Riker, W. H. “The art of political manipulation.” Yale University Press.
Satterthwaite, M. A. “Strategy-proofness and Arrow’s conditions: Existence and correspondence theorems for voting procedures and social welfare functions.” Journal of Economic Theory.
Schonhardt-Bailey, C. “From the corn laws to free trade: Interests, ideas, and institutions in historical perspective.” MIT Press.
Sozou, P. D. “On hyperbolic discounting and uncertain hazard rates.” Proceedings of the Royal Society of London. Series B: Biological Sciences.
Von Neumann, J., & Morgenstern, O. “Theory of games and economic behavior.” Princeton University Press.
