Source: cox-associates.com/6330/class8.pdf

Math 6330: Statistical Consulting, Class 8

Tony Cox, tcoxdenver@aol.com

University of Colorado at Denver

Course web site: http://cox-associates.com/6330/

Agenda

• Projects and schedule

• Prescriptive (decision) analytics (cont.)
– Decision trees
– Simulation-optimization
– Newsvendor problem and applications
– Decision rules, optimal statistical decisions
– Quality control, SPRT

• Evaluation analytics

• Learning analytics

• Decision psychology
– Heuristics and biases

2

Recommended readings

• Charniak (1991) (rest of paper)
– Build the network in Figure 2: www.aaai.org/ojs/index.php/aimagazine/article/viewFile/918/836

• Pearl (2009) http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf

• Methods to Accelerate the Learning of Bayesian Network Structures, Daly and Shen (2007) https://pdfs.semanticscholar.org/e7d3/029e84a1775bb12e7e67541beaf2367f7a88.pdf

• Distinguishing cause from effect using observational data (Mooij et al., 2016), www.jmlr.org/papers/volume17/14-518/14-518.pdf

• Probabilistic computational causal discovery for systems biology (Lagani et al., 2016) www.mensxmachina.org/files/publications/Probabilistic%20Causal%20Discovery%20for%20Systems%20Biology_prePrint.pdf

3

Projects

4

Papers and projects: 3 types

• Applied: Analyze an application (description, prediction, causal analysis, decision, evaluation, learning) using high-value statistical consulting methods

• Research/develop software
– R packages, algorithms, CAT modules, etc.

• Research/review a book or papers (3-5 articles)
– Explain a topic within statistical consulting
– Examples: Netica’s Bayesian inference algorithms, multicriteria decision-making, machine learning algorithms, etc.

5

Projects (cont.)

• A typical report paper is about 10-20 pages, 12-point font, 1.5 line spacing. (This is typical, not required.)

• Content matters; length does not

• A typical in-class presentation is 20-30 minutes
– Can run longer if needed

• Purposes:
1. Learn something interesting and useful;

2. Either explain/show what you learned, or show how to use it in practice (or both)

6

Project proposals due March 17

• If you have not yet done so, please send me a succinct description of what you want to do (and perhaps what you hope to learn by doing it).
– Problem to be addressed
– Methods to be researched/applied
– Hoped-for results

• Due by end of day on Friday, March 17th (though sooner is welcome)

• Key dates: April 14 for rough draft (or very good outline)

• In-class presentations/discussions start April 18

• Final due May 4, 8:00 PM

7

Course schedule

• March 14: No class. (Work on project idea)

• March 17: Project/paper proposals due

• March 21: No class (Spring break)

• April 14: Draft of project/term paper due

• April 18, 25, May 2, (May 9): In-class presentations

• May 4: Final project/paper due by 8:00 PM

8

Prescriptive analytics (cont.)

9

Algorithms for optimizing actions

• Decision analysis framework: Choose act a from choice set A to maximize expected utility of consequence c, given a causal model c(a, s), Pr(s) or Pr(c | a, s), Pr(s)
– s = state = random variable = things other than the choice of act a that affect c

• Influence diagram algorithms
– Learning ID structure from data
– Validating causal mechanisms
– Using for inference and recommendations

• Simulation-optimization

• Robust optimization

• Adaptive optimization/learning algorithms

10

Prescriptive analytics methods

• Optimization
– Decision trees
– Stochastic dynamic programming, optimal control
– Gittins indices
– Reinforcement learning (RL) algorithms

• Influence diagram solution algorithms

• Simulation-optimization

• Adaptive learning and optimization
– EVOP (evolutionary operations)
– Multi-arm bandit problems, UCL strategies

11

Decision tree ingredients

• Three types of nodes

– Choice nodes (squares)

– Chance nodes (circles)

– Terminal nodes / value nodes

• Arcs show how decisions and chance events can unfold over time

– Uncertainties are resolved as time passes and choices are made

12

Solving decision trees

• “Backward induction”

• “Stochastic dynamic programming”
– “Average out and roll back”; implicitly, the tree determines Pr(c | a)

• Procedure:

– Start at tips of tree, work backward

– Compute expected value at each chance node

• “Averaging out”

– Choose maximum expected value at each choice node

13
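The averaging-out-and-roll-back procedure above can be sketched in a few lines of Python. The tree encoding and the payoffs in the example are illustrative, not from the slides.

```python
def solve(node):
    """Return the optimal expected value of a decision-tree node."""
    kind = node[0]
    if kind == "terminal":   # ("terminal", value)
        return node[1]
    if kind == "chance":     # ("chance", [(prob, subtree), ...])
        # Averaging out: probability-weighted expected value
        return sum(p * solve(child) for p, child in node[1])
    if kind == "choice":     # ("choice", [subtree, ...])
        # Roll back: pick the act with maximum expected value
        return max(solve(child) for child in node[1])
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical example: a safe $50 vs. a 50/50 gamble between $120 and -$40
tree = ("choice", [
    ("terminal", 50),
    ("chance", [(0.5, ("terminal", 120)),
                (0.5, ("terminal", -40))]),
])
print(solve(tree))   # -> 50 (the gamble averages out to only 40)
```

Starting at the tips and working backward, each chance node is replaced by its expectation and each choice node by its best branch, exactly as in the procedure above.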

Obtaining Pr(s) from Decision trees
http://www.eogogics.com/talkgogics/tutorials/decision-tree

Decision 1: Develop or Do Not Develop
Expected value of Develop = (Development Successful) + (Development Unsuccessful)
= (70% × $172,000) + (30% × (−$500,000))
= $120,400 + (−$150,000) = −$29,600

What happened to act a and state s?

What are the 3 possible acts in this tree?

(a) Don’t develop; (b) Develop, then rebuild if successful; (c) Develop, then new line if successful.

Optimize decisions!
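The slide’s arithmetic can be checked directly (values from the eogogics example above):

```python
# Expected value of the Develop branch: success and failure outcomes
# weighted by their probabilities.
p_success, v_success = 0.70, 172_000
p_failure, v_failure = 0.30, -500_000

expected_value = p_success * v_success + p_failure * v_failure
print(round(expected_value))   # -> -29600
```

With these numbers the Develop branch averages to −$29,600, so on expected value alone Do Not Develop would be preferred.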

Key points

• Solving decision trees (with decisions) requires embedded optimization
– Make future decisions optimally, given the information available when they are made

• Event trees = decision trees with no decisions
– Can be solved, to find outcome probabilities, by forward Monte-Carlo simulation, or by multiplication and addition

• In general, sequential decision-making cannot be modeled well using event trees.
– Must include (optimal choice | information)

What happened to state s?
http://www.eogogics.com/talkgogics/tutorials/decision-tree

What are the 4 possible states?

C1 can succeed or not; C2 can be high or low demand

Key theoretical insight

• A complex decision model can be viewed as a (possibly large) simple Pr(c | a) model.

– s = selection of branch at each chance node

– a = selection of branch at each choice node

– c = outcome at terminal node for (a, s)

– Pr(c | a) = Σs Pr(c | a, s)·Pr(s)

• Other complex decision models can also be interpreted as c(a, s), Pr(c | a, s), or Pr(c | a) models

– s = system state & information signal

– a = decision rule (information act)

– c may include changes in s and in possible a.

Real decision trees can quickly become “bushy messes” (Raiffa, 1968) with many duplicated sub-trees

[Figure: a BSE-surveillance decision tree. Decision D1 is Track Imports vs. Don’t Track Imports; chance outcomes (Y1 | d1) include No BSE, BSE in CA, and BSE in US (from US or from CA); follow-on decisions (D2 | d1) are Test All, Test CA only, or Repeat Test; duplicated sub-trees are marked A, B, and C.]

Influence Diagrams help to avoid large trees
http://en.wikipedia.org/wiki/Decision_tree

Often much more compact than decision trees

Limitations of decision trees

• Combinatorial explosion

– Example: Searching for a prize in one of N boxes or locations involves a tree with N! = N(N – 1)…·2·1 possible inspection orders.

• Infinite trees

– Continuous variables

– When to stop growing a tree?

• How to evaluate utilities and probabilities?

29

Optimization formulations of decision problems

• Example: Prize is in location j with prior probability p(j), j = 1, 2, …, N

• It costs c(j) to inspect location j

• What search strategy minimizes expected cost of finding prize?

– What is a strategy? Order in which to inspect

– How many are there? N!

30

With two locations, 1 and 2

Strategy 1: Inspect 1, then 2 if needed
– Expected cost: c1 + (1 – p1)c2 = c1 + c2 – p1c2

Strategy 2: Inspect 2, then 1 if needed
– Expected cost: c2 + (1 – p2)c1 = c1 + c2 – p2c1

Strategy 1 has lower expected cost if:

• p1c2 > p2c1, i.e., p1/c1 > p2/c2

• So, look first at location with highest success probability per unit cost

31

With N locations

• Optimal decision rule: Always inspect next the (as-yet uninspected) location with the greatest success probability-to-cost ratio

– Example of an “index policy,” “Gittins index”

– If M players take turns, competing to find prize, each should still use this rule.

• A decision table or tree can be unwieldy even for such simple optimization problems

32
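The index policy on this slide can be checked by brute force for a small instance. The probabilities and costs below are illustrative, not from the slides.

```python
from itertools import permutations

def expected_cost(order, p, c):
    """Expected total cost of inspecting locations in the given order:
    location j is inspected (and paid for) only if the prize was not
    found at any earlier location."""
    total, prob_not_found = 0.0, 1.0
    for j in order:
        total += prob_not_found * c[j]
        prob_not_found -= p[j]
    return total

p = {1: 0.5, 2: 0.3, 3: 0.2}   # prior probability prize is at location j
c = {1: 4.0, 2: 1.0, 3: 1.0}   # cost to inspect location j

# Index policy: inspect in decreasing order of p[j]/c[j]
greedy = sorted(p, key=lambda j: p[j] / c[j], reverse=True)

# Brute force over all N! = 6 orders confirms the policy is optimal here
best = min(permutations(p), key=lambda order: expected_cost(order, p, c))
print(greedy)   # -> [2, 3, 1]: highest probability-per-unit-cost first
assert expected_cost(greedy, p, c) == expected_cost(best, p, c)
```

Note that location 1 is the most likely hiding place, yet it is inspected last because its cost makes its probability-to-cost ratio the worst.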

Other optimization formulations

• maxa∈A EU(a)
– Typically, a is a vector, A is the feasible set
– More generally, a is a strategy/policy/decision rule, A is the choice set of feasible strategies
– In previous example, A = set of permutations

• maxa∈A EU(a) s.t.
EU(a) = Σc Pr(c | a)·u(c)
Pr(c | a) = Σs Pr(c | a, s)·p(s)
g(a) ≤ 0 (feasible set, A)

33
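As a concrete instance of the maxa∈A EU(a) formulation above, here is a tiny example with two acts, two states, and two consequences; all acts, states, and numbers are hypothetical.

```python
p_state = {"s1": 0.6, "s2": 0.4}                       # Pr(s)
pr_c = {  # Pr(c | a, s)
    ("a1", "s1"): {"good": 0.9, "bad": 0.1},
    ("a1", "s2"): {"good": 0.2, "bad": 0.8},
    ("a2", "s1"): {"good": 0.5, "bad": 0.5},
    ("a2", "s2"): {"good": 0.5, "bad": 0.5},
}
utility = {"good": 100, "bad": -50}                    # u(c)

def eu(a):
    """EU(a) = sum_c Pr(c|a) u(c), with Pr(c|a) = sum_s Pr(c|a,s) p(s)."""
    return sum(p_state[s] * pr_c[(a, s)][c] * utility[c]
               for s in p_state for c in utility)

best_act = max(["a1", "a2"], key=eu)
print(best_act, round(eu(best_act), 2))   # -> a1 43.0 (vs. EU(a2) = 25)
```

The expectation over states and consequences is exactly the double sum in the constraints above, collapsed into one comprehension.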

Introduction to evaluation analytics

34

Evaluation analytics: How well are policies working?

• Algorithms for evaluating effects of actions, events, conditions
– Intervention analysis/interrupted time series

• Key idea: Compare predicted outcomes with no action to observed outcomes with it

– Counterfactual causal analysis

– Google’s new CausalImpact algorithm

• Quasi-experimental designs and analysis
– Refute non-causal explanations for data

– Compare to control groups to estimate effects

35
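The "compare predicted to observed" idea can be sketched as a bare-bones interrupted time series: fit a pre-intervention trend, project it forward as the counterfactual, and take observed minus predicted as the estimated effect. The series below is simulated, and this is only the core idea, not Google's CausalImpact itself.

```python
pre  = [10.1, 10.9, 12.2, 12.8, 14.1, 15.0]   # before intervention
post = [18.0, 19.2, 20.1]                     # after intervention

# Least-squares line y = a + b*t fit on the pre-period (closed form)
n = len(pre)
t_mean = (n - 1) / 2
y_mean = sum(pre) / n
b = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(pre))
     / sum((t - t_mean) ** 2 for t in range(n)))
a = y_mean - b * t_mean

# Counterfactual: what the pre-period trend predicts with no intervention
counterfactual = [a + b * t for t in range(n, n + len(post))]
effects = [obs - cf for obs, cf in zip(post, counterfactual)]
print([round(e, 2) for e in effects])   # -> [2.01, 2.22, 2.13]
```

A real analysis would add uncertainty intervals and, ideally, a control series; the point here is only the counterfactual-comparison logic.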

How did the U.K. National Institute for Health and Clinical Excellence (NICE) recommendation, in March 2008, of complete cessation of antibiotic prophylaxis for prevention of infective endocarditis affect the incidence of infective endocarditis?

36
www.thelancet.com/journals/lancet/article/PIIS0140-6736(14)62007-9/fulltext?rss=yes

Nonlinear models complicate inference of intervention effects

38

Solution: Non-parametric models, gradient boosting

Algorithms for evaluating effects of combinations of factors

• Classification trees
– Boosted trees, Random Forest, MARS

• Bayesian network algorithms
– Discovery
• Conditional independence tests

– Validation

– Inference and explanation

• Response surface algorithms
– Adaptive learning, design of experiments

40

Learning analytics

• Learn to predict better
– Create an ensemble of models, algorithms

• Use multiple machine learning algorithms
– Logistic regression, Random Forest, SVM, ANN, deep learning, gradient boosting, KNN, lasso, etc.
– “Stack” models (hybridize multiple predictions)

• Cross-validation assesses model performance
– Meta-learner combines performance-weighted predictors to produce an improved predictor

• Theoretical guarantees, practical successes (Kaggle competitions)

• Learn to decide better
– Low-regret learning of decision rules

• Theoretical guarantees (MDPs)

• Practical performance

41
http://www2.hawaii.edu/~chenx/ics699rl/grid/rl.html

42
http://groups.inf.ed.ac.uk/agents/index.php/Main/Projects
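The "meta-learner combines performance-weighted predictors" idea above can be sketched by weighting each base model by its inverse validation error. The models and data are toy stand-ins, not a full super learner with cross-validation.

```python
def model_a(x): return 2.0 * x           # underestimates slightly
def model_b(x): return 2.5 * x + 1.0     # overestimates

# Held-out validation data from the (unknown) truth y = 2.2*x + 0.5
val = [(x, 2.2 * x + 0.5) for x in range(1, 6)]

def mse(model):
    """Mean squared error of a predictor on the validation set."""
    return sum((model(x) - y) ** 2 for x, y in val) / len(val)

# Inverse-MSE weights, normalized to sum to 1
inv = {m: 1.0 / mse(m) for m in (model_a, model_b)}
total = sum(inv.values())
weights = {m: w / total for m, w in inv.items()}

def ensemble(x):
    """Performance-weighted combination of the base predictors."""
    return sum(w * m(x) for m, w in weights.items())

# In this toy example the weighted ensemble beats both base models
print(mse(ensemble) < min(mse(model_a), mse(model_b)))   # -> True
```

The improvement is not guaranteed in general; super-learner methods get their guarantees by choosing the combining weights via cross-validation rather than this simple inverse-error rule.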

Collaborative risk analytics: Multiple interacting learning agents

Collaborative risk analytics

• Global performance metrics

• Local information, control, tasks, priorities, rewards

– Hierarchical distributed control

– Collaborative sensing, filtering, deliberation, and decision-control networks of agents

• Mixed human and machine agents

• Autonomous agents vs. intelligent assistants

43
http://www.cities.io/news/page/3/

Collaborative risk analytics: Games as labs for distributed AI

• Local information, control, tasks, priorities
– Hierarchical distributed control

– Collaborative sensing, deliberation, control networks

• From decentralized agents to effective risk analytics teams and HCI support
– Trust, reputation, performance

– Sharing information, attention, control, evaluation, learning

44
http://people.idsia.ch/~juergen/learningrobots.html

Risk analytics toolkit: Summary

1. Descriptive analytics
– Change-point analysis, likelihood ratio (CPA)
– Machine learning, response surfaces (ML: LR, RF, GBM, SVM, ANN, KNN, etc.)

2. Predictive analytics
– Bayesian networks, dynamic BNs (BN, DBN)
– Bayesian model averaging (BMA, ML)

3. Causal analytics & principles
– Causal BNs, systems dynamics (DAGs, SD simulation)
– Time series causation

4. Prescriptive analytics: IDs, simulation-optimization, robust

5. Evaluation analytics: QE, credit assignment, attribution

6. Learning analytics
– Machine learning, superlearning (ML)
– Low-regret learning of decision rules; collaborative learning

45

Applied risk analytics toolkit: Toward more practical analytics

Reorientation: From solving well-posed problems to discovering how to act more effectively

1. Descriptive analytics: What’s happening?

2. Predictive analytics: What’s coming next?

3. Causal analytics: What can we do about it?

4. Prescriptive analytics: What should we do?

5. Evaluation analytics: How well is it working?

6. Learning analytics: How to do better?

7. Collaboration: How to do better together?

46
