
Prediction and Change Detection

Mark Steyvers Scott Brown Mike Yi

University of California, Irvine

This work is supported by a grant from the US Air Force Office of Scientific Research (AFOSR grant number FA9550-04-1-0317)

Perception of Random Sequences

• People perceive too much structure:

– Coin tosses: Gambler’s fallacy

– Sports scoring sequence: Hot hand belief

• Sequences are (mostly) stationary but people perceive non-stationarity

Bias to detect too much change?

Our Approach

• Non-stationary random sequences – changes in parameters over time.

• How well can people make inferences about underlying changes?

• How well can people make predictions about future outcomes?

• Compare data to:

– Bayesian (ideal observer) models

– Descriptive models

Two Tasks

Inference task: what caused the latest observation?

Prediction task: what is the next most likely outcome?

[Diagram: Observed Data → Internal State (Unobserved) → Future Data]

[Figure: example trial-by-trial sequence over pipes A–D, e.g. A A A A A B B B D D D D D D A A ...]

Sequence Generation

• Start with one of four normal distributions

• Draw samples from this distribution

• With probability alpha, switch to a new generating distribution (uniformly chosen)

• Alpha determines number of change points

[Figure: example generated sequence with change points marked]
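A minimal sketch of this generation process, in Python (the pipe positions, noise level, and trial count below are illustrative assumptions, not the experiment's values):

import random

def generate_sequence(n_trials=50, alpha=0.1, means=(1.0, 2.0, 3.0, 4.0), sd=0.5):
    """Non-stationary sequence: with probability alpha the generating
    distribution switches to one chosen uniformly at random."""
    z = random.randrange(len(means))              # index of the current distribution
    states, observations = [], []
    for _ in range(n_trials):
        if random.random() < alpha:               # change point
            z = random.randrange(len(means))      # new distribution, chosen uniformly
        states.append(z)
        observations.append(random.gauss(means[z], sd))   # noisy sample
    return states, observations

With alpha = 0.1, a 50-trial block contains roughly five change points on average.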

Tomato Cans Experiment

• Cans roll out of pipes A, B, C, or D

• Machine perturbs position of cans (normal noise)

• Curtain obscures sequence of pipes

(real experiment has response buttons and is subject-paced)

Tasks

• Inference: what pipe produced the last can? A, B, C, or D?

• Prediction: in what region will the next can arrive? 1, 2, 3, or 4?

[Figure: pipes A, B, C, D and response regions 1, 2, 3, 4]

Experiment 1

• 63 subjects

• 12 blocks

– 6 blocks of 50 trials for inference task

– 6 blocks of 50 trials for prediction task

– Identical trials for inference and prediction

• Alpha = 0.1

Accuracy vs. Number of Perceived Changes

[Figure: scatter plots for the inference and prediction tasks; x-axis: % changes (subject), y-axis: accuracy (against true). Each dot is a subject; the ideal observer is marked.]

[Figure: example sequence (pipes A–D over trials) with the ideal observer's and individual subjects' responses, shown side by side for the inference and prediction tasks.]

Exp. 1b

• Alpha = .08, .16, .32

• 136 subjects

• Inference judgments only

• Subjects track changes in alpha

[Figure: three panels (low, medium, and high alpha); x-axis: % changes, y-axis: % accuracy (against true); the ideal observer is marked in each panel.]

Experiment 2: Plinko

[Figure: the Plinko device, open and closed]

(view full screen to see animation)

Familiarization Trials

• Input pipe changes at each trial with prob. alpha

Observed Distributions Match Theory

             Input A           Input B           Input C           Input D
Output bin   Emp.    Theo.     Emp.    Theo.     Emp.    Theo.     Emp.    Theo.
A            82.1%   82.7%     19.5%   17.2%      0.0%    0.1%      0.0%    0.0%
B            17.9%   17.2%     65.9%   65.6%     13.0%   17.1%      0.0%    0.1%
C             0.0%    0.1%     14.5%   17.1%     68.9%   65.6%     21.1%   17.2%
D             0.0%    0.0%      0.0%    0.1%     18.1%   17.2%     78.9%   82.7%

Note: the mode of the output distribution is centered on the input bin.
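For the Plinko device the theoretical column comes from the device itself; as a rough illustration of why the mode sits on the input bin, the sketch below bins a normally perturbed position into four output regions (the pipe positions, bin boundaries, and sigma are illustrative assumptions, not fitted values):

from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def output_bin_probs(input_pos, boundaries, sigma):
    """Probability that a can/ball starting at input_pos lands in each output bin
    when its position is perturbed by Normal(0, sigma^2) noise."""
    edges = [float("-inf")] + list(boundaries) + [float("inf")]
    return [normal_cdf(hi, input_pos, sigma) - normal_cdf(lo, input_pos, sigma)
            for lo, hi in zip(edges[:-1], edges[1:])]

# Pipes A-D at positions 1..4, bin boundaries halfway between them,
# sigma chosen only for illustration:
for pipe, pos in zip("ABCD", (1, 2, 3, 4)):
    probs = output_bin_probs(pos, boundaries=(1.5, 2.5, 3.5), sigma=0.53)
    print(pipe, [round(p, 3) for p in probs])

With these made-up values the diagonal comes out near 83% for the edge pipes and 65% for the middle pipes, similar in shape to the table above.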


Decision Phase

• Main phase of experiment uses closed device

• Inference task: which input pipe was used, A, B, C, or D?

• Prediction task: where will the next ball arrive, A, B, C, or D?

Accuracy vs. Number of Perceived Changes

• 44 subjects

[Figure: inference and prediction panels; x-axis: % changes, y-axis: accuracy (against true).]

Main Finding

• Ideal observer:

# changes in prediction = # changes in inference

• Subjects:

# changes in prediction >> # changes in inference

• Explanation?

Variability Matching

• Example output sequence:

– A B A A B C

• Strategy: match the observed variability in the prediction sequence

• Suboptimal! Part of the variability is due to noise that is useless for prediction (see the sketch below)
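A toy discrete simulation of the cost of matching (all parameters below are illustrative): a "maximize" predictor that always repeats the last observed bin, as a crude stand-in for predicting the currently inferred pipe, versus a predictor that reproduces the observed switch rate by guessing a different bin on some trials.

import random

N_BINS = 4

def simulate(n_trials=10_000, alpha=0.1, p_correct=0.8):
    """Toy discrete sequence: each observation lands in the current pipe's own bin
    with probability p_correct, otherwise in a uniformly chosen other bin."""
    pipe = random.randrange(N_BINS)
    obs = []
    for _ in range(n_trials):
        if random.random() < alpha:                       # change point
            pipe = random.randrange(N_BINS)
        if random.random() < p_correct:
            obs.append(pipe)
        else:
            obs.append(random.choice([b for b in range(N_BINS) if b != pipe]))
    return obs

def accuracy(pred, outcome):
    return sum(p == o for p, o in zip(pred, outcome)) / len(outcome)

obs = simulate()
targets = obs[1:]
maximize = obs[:-1]                                       # always repeat the last bin
switch_rate = sum(a != b for a, b in zip(obs, obs[1:])) / (len(obs) - 1)
matching = [o if random.random() > switch_rate            # reproduce the observed variability
            else random.choice([b for b in range(N_BINS) if b != o])
            for o in obs[:-1]]
print("maximize:", accuracy(maximize, targets))
print("matching:", accuracy(matching, targets))

The maximizing predictor should come out well ahead: the injected switches only pay off when they happen to coincide with a real change and land on the right bin.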

Conclusion

• Subjects are able to track changes in dynamic decision environments

• Individual differences

– Over-reaction: perceiving too much change

– Under-reaction: perceiving too little change

• More over-reaction in prediction task

Do the experiments yourself:

http://psiexp.ss.uci.edu

LEFT OVER SLIDES

Digital Plinko – open curtain

Digital Plinko – closed curtain

Analogy to Hot Hand Belief

• Inference task: does a player have a hot hand?

• Prediction task: will a player make the next shot?

Process Model

• Memory buffer for K samples

• Calculate prob. of new sample under normal distribution of buffer

• If prob. < τ,

– Assume a change

– Flush the buffer

– Put new sample in buffer

• Inference responses based on buffer mean

• Prediction responses are the same, except the model tries to anticipate changes by making a purely random response on some fraction X of trials
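A minimal sketch of this process model (K, τ, the guess fraction X, and the pipe positions below are free illustrative parameters; the two-tailed tail probability of the new sample is used as a stand-in for "prob. of new sample under the buffer's normal distribution"):

import math, random

def process_model(observations, K=5, tau=0.05, X=0.2, pipe_positions=(1.0, 2.0, 3.0, 4.0)):
    """Buffer-based change detection: keep the last K samples and flag a change
    when the newest sample is too improbable under a normal fitted to the buffer."""
    buffer, inferences, predictions = [], [], []
    for y in observations:
        if len(buffer) >= 2:
            mu = sum(buffer) / len(buffer)
            sd = max(math.sqrt(sum((b - mu) ** 2 for b in buffer) / len(buffer)), 1e-6)
            p = math.erfc(abs(y - mu) / (sd * math.sqrt(2)))   # two-tailed tail probability
            if p < tau:                    # assume a change: flush the buffer
                buffer = []
        buffer = (buffer + [y])[-K:]       # put the new sample in, keep at most K
        mean = sum(buffer) / len(buffer)
        # inference: the pipe whose position is closest to the buffer mean
        inferred = min(range(len(pipe_positions)),
                       key=lambda i: abs(pipe_positions[i] - mean))
        inferences.append(inferred)
        # prediction: same, except a purely random response on a fraction X of trials
        predictions.append(random.randrange(len(pipe_positions))
                           if random.random() < X else inferred)
    return inferences, predictions

Fed with sequences from the generate_sequence sketch above, the random guesses inflate the number of changes in the prediction responses without improving accuracy, mimicking the over-reaction in prediction.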

[Figure: process model fit; inference and prediction panels, % changes vs. accuracy (against truth), with model and subject values overlaid.]

Sweeping Alpha and Sigma in Bayesian Model

[Figure: inference and prediction panels; % changes (subject) vs. accuracy (against true), showing subjects together with the ideal observer as alpha and sigma are swept.]

Optimal Prediction Strategy

• Best Prediction = Last Inference

Subject:
  inference:   A A B B B D …
  prediction:  … A B A B D C …

Shifted inference judgments: A A B B B D …

Using shifted inference judgments for prediction, 70% of subjects improve in prediction performance
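A small sketch of how that comparison can be scored (the responses and outcomes below are made up purely for illustration):

def score_shifted_inference(inferences, predictions, next_outcomes):
    """Compare a subject's actual prediction accuracy with the accuracy obtained by
    reusing the inference from trial t as the prediction for trial t+1.
    Lists are aligned so that next_outcomes[t] is the outcome being predicted on trial t."""
    def acc(responses):
        return sum(r == o for r, o in zip(responses, next_outcomes)) / len(next_outcomes)
    return acc(predictions), acc(inferences)

# Hypothetical, made-up example (letters stand for pipes/regions):
inferences    = list("AABBBD")
predictions   = list("ABABDC")
next_outcomes = list("AABBDD")
actual, shifted = score_shifted_inference(inferences, predictions, next_outcomes)
print(actual, shifted)   # the subject's own predictions vs. shifted inference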

[Diagram: Observed Data → Internal State (Unobserved) → Future Data; inference maps the observed data to the internal state, prediction maps the internal state to future data. Locus of Gambler's fallacy?]

Generating Model

[Graphical model: change probability α → changepoints x_t → distribution parameters z_t → observed data y_t, for trials t = 1, ..., T]

x_t | α ~ Bernoulli(α)

z_t = z_{t-1}                  if x_t = 0
z_t ~ Uniform({1, ..., P})     if x_t = 1

y_t | z_t ~ Normal(z_t, σ²)

Bayesian Inference

Given observed sequence y, what are the latent states z and change points x?

Cannot calculate this complex posterior distribution. Use posterior simulation instead: MCMC with Gibbs sampling

[Graphical model as above: x_t → z_t → y_t for each trial t]

P(x, z | y) = P(y | x, z) P(x, z) / Σ_{x', z'} P(y | x', z') P(x', z')
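For intuition, the posterior can be enumerated exactly for a toy sequence of a few trials (illustrative pipe positions, σ and α below; the first pipe is simply drawn uniformly here). The number of (x, z) combinations grows exponentially with sequence length, which is why the slides turn to MCMC:

from itertools import product
import math

def exact_posterior(y, pipe_positions=(1.0, 2.0, 3.0, 4.0), sigma=0.5, alpha=0.1):
    """Brute-force P(x, z | y) for a very short sequence y (tomato-cans likelihood).
    x[t] indicates a change between trials t+1 and t+2; the first pipe is uniform."""
    T, P = len(y), len(pipe_positions)

    def lik(t, z):                                   # unnormalised P(y_t | z_t)
        return math.exp(-(y[t] - pipe_positions[z]) ** 2 / (2 * sigma ** 2))

    joint = {}
    for z in product(range(P), repeat=T):
        for x in product((0, 1), repeat=T - 1):
            w = (1.0 / P) * lik(0, z[0])
            for t in range(1, T):
                if x[t - 1] == 1:
                    w *= alpha * (1.0 / P) * lik(t, z[t])
                elif z[t] == z[t - 1]:
                    w *= (1 - alpha) * lik(t, z[t])
                else:                                # no change, but the pipe moved: impossible
                    w = 0.0
                    break
            if w > 0.0:
                joint[(x, z)] = w
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}

# e.g. the posterior probability of a change between the 2nd and 3rd observation:
post = exact_posterior([1.1, 0.9, 3.2])
print(sum(p for (x, z), p in post.items() if x[-1] == 1))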

Gibbs Sampling

• Simulate the high-dimensional distribution by sampling on lower-dimensional subsets of variables where each subset is conditioned on the value of all others. The sampling is done sequentially and proceeds until the sampled values approximate the target distribution.

[Graphical model as above: x_t → z_t → y_t for each trial t]

Use the subset {z_t, x_t, x_{t+1}}

Why include x_{t+1}? To preserve consistency. For example, suppose that before sampling z_{t+1} ≠ z_t, and therefore x_{t+1} = 1. If the new sample leads to z_t = z_{t+1}, then x_{t+1} needs to be updated.

Gibbs Sampling

• Assume α is a constant (for now)

• The set of variables {z_t, x_t, x_{t+1}} is conditionally dependent only on these variables: {y_t, z_{t-1}, z_{t+1}}

• Sample values of {z_t, x_t, x_{t+1}} from this distribution:

P(z_t, x_t, x_{t+1} | y_t, z_{t-1}, z_{t+1}) ∝ P(y_t | z_t) P(z_t | x_t, z_{t-1}) P(z_{t+1} | x_{t+1}, z_t) P(x_t) P(x_{t+1})

Gibbs Sampling

For the tomato cans experiment:   P(y_t | z_t) ∝ exp( −(y_t − z_t)² / (2σ²) )

For the Plinko experiment, P(y_t | z_t) is looked up from a table.

P(z_t | x_t, z_{t-1}) =  1      if x_t = 0 and z_t = z_{t-1}
                         0      if x_t = 0 and z_t ≠ z_{t-1}
                         1/P    if x_t = 1        (P = number of input pipes)

P(x_t) and P(x_{t+1}) follow from the Bernoulli(α) prior on change points: P(x = 1) = α, P(x = 0) = 1 − α.
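A minimal sketch of these updates for the tomato cans version (pipe positions, σ, α and the number of sweeps are illustrative; for Plinko, lik() would instead look P(y_t | z_t) up in the output table). x_{t+1} is resampled together with (z_t, x_t) to keep it consistent with z_{t+1}, as noted above:

import math, random

def gibbs_change_points(y, pipe_positions=(1.0, 2.0, 3.0, 4.0), sigma=0.5,
                        alpha=0.1, n_sweeps=200):
    """Resample {z_t, x_t, x_{t+1}} given {y_t, z_{t-1}, z_{t+1}} for every t, repeatedly."""
    T, P = len(y), len(pipe_positions)

    def lik(t, z):                          # P(y_t | z_t), up to a constant
        return math.exp(-(y[t] - pipe_positions[z]) ** 2 / (2 * sigma ** 2))

    def p_x(x):                             # Bernoulli(alpha) prior on change points
        return alpha if x == 1 else 1.0 - alpha

    def trans(z_new, x_new, z_prev):        # P(z_t | x_t, z_{t-1})
        if x_new == 1:
            return 1.0 / P
        return 1.0 if z_new == z_prev else 0.0

    # initialise at the nearest pipe, with x set consistently
    z = [min(range(P), key=lambda k: abs(y[t] - pipe_positions[k])) for t in range(T)]
    x = [1] + [int(z[t] != z[t - 1]) for t in range(1, T)]

    for _ in range(n_sweeps):
        for t in range(T):
            cands, weights = [], []
            for z_t in range(P):
                for x_t in ([1] if t == 0 else [0, 1]):
                    w = lik(t, z_t) * p_x(x_t) * trans(z_t, x_t, z[t - 1] if t else None)
                    if w == 0.0:
                        continue
                    if t + 1 < T:           # resample x_{t+1} jointly, for consistency
                        for x_n in (0, 1):
                            w_n = w * p_x(x_n) * trans(z[t + 1], x_n, z_t)
                            if w_n > 0.0:
                                cands.append((z_t, x_t, x_n))
                                weights.append(w_n)
                    else:
                        cands.append((z_t, x_t, None))
                        weights.append(w)
            z[t], x[t], x_n = random.choices(cands, weights=weights)[0]
            if x_n is not None:
                x[t + 1] = x_n
    return z, x

Keeping every sweep's z and x (rather than only the final state, as this sketch does) and averaging gives posterior estimates such as the per-trial change probability.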

Plinko as a Hidden Markov Model

[Figure: HMM trellis over time, from Start to End; hidden states are the input pipes, and the observations y_1, y_2, ..., y_T are the output pipe sequence]

Example comparing HMM Viterbi algorithm to Gibbs sampling algorithm

[Figure: a 100-trial observed output sequence; the true input sequence with the HMM (Viterbi) inferred sequence (accuracy = 87%) and with the Gibbs-sampled inferred sequence (accuracy = 89%).]
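A minimal Viterbi sketch for this HMM formulation, assuming a uniform initial pipe, transitions derived from the generating model (stay with probability 1 − α + α/P, move to a specific other pipe with probability α/P), and the theoretical output table above as the emission matrix; an illustration of the comparison, not the authors' code:

import math

def viterbi(obs, emit, alpha=0.1):
    """Most likely sequence of input pipes given observed output bins.
    obs: output-bin indices (0..P-1); emit[z][b] = P(output bin b | input pipe z)."""
    P = len(emit)
    stay, move = 1 - alpha + alpha / P, alpha / P      # from the generating model

    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    delta = [logp(1.0 / P) + logp(emit[z][obs[0]]) for z in range(P)]   # uniform start
    back = []
    for b in obs[1:]:
        ptr, new_delta = [], []
        for z in range(P):
            scores = [delta[k] + logp(stay if k == z else move) for k in range(P)]
            k_best = max(range(P), key=lambda k: scores[k])
            ptr.append(k_best)
            new_delta.append(scores[k_best] + logp(emit[z][b]))
        back.append(ptr)
        delta = new_delta
    path = [max(range(P), key=lambda z: delta[z])]     # best final state, then trace back
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Emission table taken from the theoretical columns above (rows: input pipe A-D):
emit = [[0.827, 0.172, 0.001, 0.000],
        [0.172, 0.656, 0.171, 0.001],
        [0.001, 0.171, 0.656, 0.172],
        [0.000, 0.001, 0.172, 0.827]]
print(viterbi([0, 0, 1, 1, 3, 3, 3, 2], emit, alpha=0.1))

The Gibbs sampler above works on the same model; in the example figure the two reach similar accuracy (87% vs. 89%).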