
Page 1:

Recognition

stimulus input → Observer (information transmission channel) → response

• Response: which category does the stimulus belong to?

What is the "information value" of recognizing the category?

Page 2:

[Figure: a field of possible signal locations. Labels: "NOT HERE" (area reduced to 63/64), area reduced to 1/64, area reduced to 1/2.]

Page 3:

• The amount of information gained by receiving the signal is proportional to the ratio of these two areas:

prior information (the possible space of signals) versus posterior (the possible space after the signal is received).

The less likely the outcome, the more information is gained!

The information in a symbol s should be inversely proportional to the probability p of the symbol.

Page 4:

Basics of Information Theory: Claude Elwood Shannon (1916-2001)

• Observe the output message.
• Try to reconstruct the input message (gain new information).

(Shannon also built a juggling machine, rocket-powered Frisbees, motorized Pogo sticks, a device that could solve the Rubik's Cube puzzle, ...)

Page 5:

Measuring the information

1) Information must be positive: i(p) > 0.

2) Information from independent events (i.e. when probabilities multiply) must add: i(p1·p2) = i(p1) + i(p2).

The logarithm of the inverse probability satisfies both requirements:

i(p) = log(1/p) = −log(p)   (information in an event)

Multiplication turns into addition, and −log(p) is always positive (since p < 1).
Page 6:

If the message consists of (very many) M characters from the alphabet S, which consists of n symbols, there will be about M·p1 occurrences of the first symbol, M·p2 of the second, etc.

Then the probability of the message is

P = p1^(M·p1) · p2^(M·p2) · ... · pn^(M·pn) = (p1^p1 · p2^p2 · ... · pn^pn)^M

The information (logarithm of the inverse probability) in the whole message will be

log(1/P) = −M · Σ_{i=1}^{n} p_i · log(p_i)

and the information per character (the entropy of the alphabet S) is

H(S) = (1/M) · log(1/P) = −Σ_{i=1}^{n} p_i · log(p_i)
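A quick numerical check of these formulas (a minimal Python sketch; the 4-symbol alphabet and its probabilities are invented for illustration):

```python
import math
import random
from collections import Counter

def entropy(probs, base=2):
    # H(S) = -sum p_i * log(p_i); terms with p = 0 contribute nothing
    return -sum(p * math.log(p, base) for p in probs if p > 0)

symbols, probs = "abcd", [0.5, 0.25, 0.125, 0.125]
print(entropy(probs))                       # 1.75 bits per character

# Information per character of a long random message approaches H(S):
random.seed(0)
M = 100_000
msg = random.choices(symbols, weights=probs, k=M)
counts = Counter(msg)
info_per_char = -sum(counts[s] * math.log2(p)
                     for s, p in zip(symbols, probs)) / M   # = log2(1/P) / M
print(info_per_char)                        # close to 1.75
```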

Page 7:

Entropy

H(S) = −Σ_{i=1}^{n} p_i · log(p_i)

When log₂ is used, the entropy is in bits. 1 bit of information reduces the area of possible messages to half.

Information gained when deciding among N (equally likely) alternatives:

Number of stimulus alternatives N    Number of bits (log₂ N)
2¹ = 2                               1
2² = 4                               2
8                                    3
16                                   4
32                                   5
64                                   6
128                                  7
2⁸ = 256                             8

Page 8:

H = −Σ_{i=1}^{n} p_i · log₂(p_i)

Consider experiments with two possible outcomes, with probabilities p1 and p2. The total probability must be 1, so p2 = 1 − p1:

H = −p1 log₂ p1 − (1 − p1) log₂ (1 − p1)

Since lim_{p→0} p log₂ p = 0, we get H = 0 for p1 = 0 (the second outcome certain) or p1 = 1 (the first outcome certain).

For p1 = 0.5, p2 = 0.5:

H = −0.5 log₂ 0.5 − 0.5 log₂ 0.5 = −log₂ 0.5 = 1 bit

Entropy H (information) is maximum when the outcome is the least predictable!

For a given alphabet S, the entropy (i.e. the information) is highest when all symbols are equally likely:

H_max = −Σ_{i=1}^{n} (1/n) · log(1/n) = log(n)
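A minimal sketch (Python) of this binary entropy curve, confirming H = 0 at the certain ends and the maximum of 1 bit at p1 = 0.5:

```python
import math

def binary_entropy(p1):
    # H = -p1*log2(p1) - (1-p1)*log2(1-p1), with 0*log2(0) taken as 0
    H = 0.0
    for p in (p1, 1.0 - p1):
        if p > 0:
            H -= p * math.log2(p)
    return H

for p1 in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p1, round(binary_entropy(p1), 3))
# 0.0 -> 0.0, 0.1 -> 0.469, 0.5 -> 1.0, 0.9 -> 0.469, 1.0 -> 0.0
```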

Page 9:

1st or 2nd half?

With an equal prior probability of each category, each binary question halves the space of possibilities: we need 3 binary numbers (3 bits) to describe 2³ = 8 categories, and 5 bits for 2⁵ = 32 categories.

We need more bits when dealing with symbols that are not all equally likely.

Page 10:

The Bar Code

The number of "digits" is k = 6:
H = log₂ k = log₂ 6 ≈ 2.58 bits.

If the order of the fingers ("digits") is meaningful, there can be 2⁵ = 32 combinations:
H' = log₂ 32 = 5 bits!!!!!

More efficient (but less "robust") code.

Page 11:

Information transfer through a communication channel

transmitter (source) → channel → receiver
p(X) → p(Y|X) → p(Y), with noise entering the channel.

Two-element (binary) channel: input x1 reaches output y1 with probability p(y1|x1) and y2 with p(y2|x1); input x2 reaches y2 with p(y2|x2) and y1 with p(y1|x2).

With no noise in the channel, p(xi|yi) = 1 and p(xi,yj) = 0 for i ≠ j, so
p(y1) = p(x1) and p(y2) = p(x2).

With noise, p(xi|yi) < 1 and p(xi,yj) > 0 for i ≠ j. Example: p(x1) = 0.8, p(x2) = 0.2, with p(y1|x1) = 5/8, p(y2|x1) = 3/8, p(y1|x2) = 1/4, p(y2|x2) = 3/4:

p(y1) = (5/8 × 0.8) + (1/4 × 0.2) = 0.55
p(y2) = (3/8 × 0.8) + (3/4 × 0.2) = 0.45
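A minimal sketch (Python, reusing the numbers above) of pushing the source probabilities through the channel's transition matrix to get the output probabilities:

```python
# p(y_k) = sum over j of p(y_k | x_j) * p(x_j)
p_x = [0.8, 0.2]                 # source probabilities p(x1), p(x2)
p_y_given_x = [[5/8, 3/8],       # row j holds p(y1|xj), p(y2|xj)
               [1/4, 3/4]]

p_y = [sum(p_y_given_x[j][k] * p_x[j] for j in range(len(p_x)))
       for k in range(len(p_y_given_x[0]))]
print(p_y)                       # [0.55, 0.45]
```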

Page 12:

Binary Channel

Counting stimuli and responses:

             response 1   response 2   number of stimuli
stimulus 1      N11          N12          Nstim 1
stimulus 2      N21          N22          Nstim 2
number of
responses      Nres 1       Nres 2       N  (total number of stimuli, or responses)

probability of the j-th stimulus: p(xj) = Nstim j / N
joint probability that both xj and yk happen: p(xj,yk) = Njk / N
conditional probability that xj was sent when yk was received: p(xj|yk) = Njk / Nres k
probability of the k-th response: p(yk) = Nres k / N

Page 13:

Stimulus-Response Confusion Matrix
(rows: called stimulus; columns: received response)

        y1      y2     ...   yn      total
x1      N11     N12    ...   N1n     Nstim 1
x2      N21     N22    ...   N2n     Nstim 2
...
xn      Nn1     ...          Nnn     Nstim n
total   Nres 1  ...          Nres n  N

number of j-th stimuli: Σk Njk = Nstim j
number of k-th responses: Σj Njk = Nres k
number of called stimuli = number of responses: Σk Nres k = Σj Nstim j = N

probability of the xj-th symbol: p(xj) = Nstim j / N
joint probability that both xj and yk happen: p(xj,yk) = Njk / N
conditional probability that xj was sent when yk was received: p(xj|yk) = Njk / Nres k
probability of the yk-th symbol: p(yk) = Nres k / N
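These definitions translate directly into code; a small sketch (Python, with a made-up 2×2 count matrix) for estimating the probabilities from a confusion matrix:

```python
# Rows = stimuli x_j, columns = responses y_k; entries are counts N_jk.
counts = [[10,  0],
          [ 0, 10]]     # hypothetical 2x2 example

N    = sum(sum(row) for row in counts)            # total trials
p_x  = [sum(row) / N for row in counts]           # p(x_j) = Nstim_j / N
p_y  = [sum(col) / N for col in zip(*counts)]     # p(y_k) = Nres_k / N
p_xy = [[n / N for n in row] for row in counts]   # p(x_j,y_k) = N_jk / N
p_x_given_y = [[counts[j][k] / sum(col)           # p(x_j|y_k) = N_jk / Nres_k
                for k, col in enumerate(zip(*counts))]
               for j in range(len(counts))]
print(p_x, p_y, p_xy, p_x_given_y)
```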

Page 14:

entropy of the input:
H(X) = −Σ_{j=1}^{n} p(xj) log₂ p(xj)

entropy of the output:
H(Y) = −Σ_{k=1}^{n} p(yk) log₂ p(yk)

joint entropy of the input and the output:
H(X,Y) = −Σ_{j=1}^{n} Σ_{k=1}^{n} p(xj,yk) log₂ p(xj,yk)

maximum entropy given the input and the output:
Hmax(X,Y) = −Σ_{j=1}^{n} Σ_{k=1}^{n} p(xj)p(yk) log₂ [p(xj)p(yk)]

This maximum occurs when the input and the output are independent (the joint probabilities are given by the products of the individual probabilities). Then there is no relation of the output to the input, i.e. no information transfer.

information transferred by the system:
I(X;Y) = Hmax(X,Y) − H(X,Y)
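A compact sketch (Python) of this recipe, starting from the joint probability matrix p(xj,yk); the helper treats 0·log₂ 0 as 0:

```python
import math

def transferred_information(p_xy):
    # I(X;Y) = Hmax(X,Y) - H(X,Y), from a joint probability matrix
    p_x = [sum(row) for row in p_xy]              # marginals p(x_j)
    p_y = [sum(col) for col in zip(*p_xy)]        # marginals p(y_k)

    def H(cells):
        # entropy of a list of probabilities; zero cells contribute nothing
        return -sum(p * math.log2(p) for p in cells if p > 0)

    H_joint = H([p for row in p_xy for p in row])         # H(X,Y)
    H_max   = H([px * py for px in p_x for py in p_y])    # Hmax(X,Y)
    return H_max - H_joint
```

The worked examples on the next slides can all be checked with this function.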

Page 15:

Run the experiment 20 times and get it always RIGHT:

          stim 1   stim 2
resp 1      10       0      10
resp 2       0      10      10
            10      10      20

input probabilities: p(x1) = 0.5, p(x2) = 0.5
output probabilities: p(y1) = 0.5, p(y2) = 0.5

joint probabilities p(xj,yk):
0.5   0
0     0.5

joint entropy of the input and the output:
H(X,Y) = −Σ_{j} Σ_{k} p(xj,yk) log₂ p(xj,yk) = 0.5 × 1 + 0 × 0 + 0.5 × 1 + 0 × 0 = 1 bit

probabilities of independent events:
0.25  0.25
0.25  0.25

maximum entropy given the input and the output:
Hmax(X,Y) = −Σ_{j} Σ_{k} p(xj)p(yk) log₂ [p(xj)p(yk)] = 0.25 × 2 + 0.25 × 2 + 0.25 × 2 + 0.25 × 2 = 2 bits

transferred information:
I(X;Y) = Hmax(X,Y) − H(X,Y) = 2 − 1 = 1 bit

Page 16:

Run the experiment 20 times and get it always WRONG:

          stim 1   stim 2
resp 1       0      10      10
resp 2      10       0      10
            10      10      20

input probabilities: p(x1) = 0.5, p(x2) = 0.5
output probabilities: p(y1) = 0.5, p(y2) = 0.5

joint probabilities p(xj,yk):
0     0.5
0.5   0

joint entropy of the input and the output:
H(X,Y) = −Σ_{j} Σ_{k} p(xj,yk) log₂ p(xj,yk) = 0 × 0 + 0.5 × 1 + 0 × 0 + 0.5 × 1 = 1 bit
(each 0.5 entry contributes −0.5 log₂ 0.5 = 0.5 bit)

probabilities of independent events:
0.25  0.25
0.25  0.25

maximum entropy given the input and the output:
Hmax(X,Y) = −Σ_{j} Σ_{k} p(xj)p(yk) log₂ [p(xj)p(yk)] = 0.25 × 2 + 0.25 × 2 + 0.25 × 2 + 0.25 × 2 = 2 bits

transferred information:
I(X;Y) = Hmax(X,Y) − H(X,Y) = 2 − 1 = 1 bit

A consistently wrong response transfers just as much information as a consistently right one: knowing the output still determines the input.

Page 17:

Run the experiment 20 times and get it 10 times right and 10 times wrong:

          stim 1   stim 2
resp 1       5        5     10
resp 2       5        5     10
            10       10     20

input probabilities: p(x1) = 0.5, p(x2) = 0.5
output probabilities: p(y1) = 0.5, p(y2) = 0.5

joint probabilities p(xj,yk):
0.25  0.25
0.25  0.25

joint entropy of the input and the output:
H(X,Y) = −Σ_{j} Σ_{k} p(xj,yk) log₂ p(xj,yk) = 0.25 × 2 + 0.25 × 2 + 0.25 × 2 + 0.25 × 2 = 2 bits

probabilities of independent events:
0.25  0.25
0.25  0.25

maximum entropy given the input and the output:
Hmax(X,Y) = −Σ_{j} Σ_{k} p(xj)p(yk) log₂ [p(xj)p(yk)] = 0.25 × 2 + 0.25 × 2 + 0.25 × 2 + 0.25 × 2 = 2 bits

transferred information:
I(X;Y) = Hmax(X,Y) − H(X,Y) = 2 − 2 = 0 bits
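The three cases above, checked against the transferred_information() sketch from Page 14:

```python
always_right = [[0.50, 0.00],
                [0.00, 0.50]]
always_wrong = [[0.00, 0.50],
                [0.50, 0.00]]
half_half    = [[0.25, 0.25],
                [0.25, 0.25]]

for p_xy in (always_right, always_wrong, half_half):
    print(transferred_information(p_xy))   # 1.0, 1.0, 0.0 bits
```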

Page 18:

Stimulus/response confusion matrix (5 stimulus categories, 5 response categories):

           y1   y2   y3   y4   y5   number of stimuli
x1         20    5    0    0    0         25
x2          5   15    5    0    0         25
x3          0    6   17    2    0         25
x4          0    0    5   12    8         25
x5          0    0    0    6   19         25
number of
responses  25   26   27   20   27        125

Page 19:

Matrix of Joint Probabilities
(the stimulus-response matrix divided by the total number of stimuli)

Counts:

      y1    y2   ...  yn
x1    N11   N12  ...  N1n
x2    N21   N22  ...  N2n
...
xn    Nn1   ...       Nnn

Joint probabilities (stimuli-responses):

      y1        y2        ...  yn
x1    p(x1,y1)  p(x1,y2)  ...  p(x1,yn)
x2    p(x2,y1)  p(x2,y2)  ...  p(x2,yn)
...
xn    p(xn,y1)  p(xn,y2)  ...  p(xn,yn)

number of called stimuli = number of responses = N
p(xi,yj) = Nij / N

Page 20:

Stimulus/response confusion matrix with stimulus and response probabilities:

           y1   y2   y3   y4   y5   number of stimuli   probability of stimulus
x1         20    5    0    0    0         25             25/125 = 0.2
x2          5   15    5    0    0         25             25/125 = 0.2
x3          0    6   17    2    0         25             25/125 = 0.2
x4          0    0    5   12    8         25             25/125 = 0.2
x5          0    0    0    6   19         25             25/125 = 0.2
number of
responses  25   26   27   20   27        125

probability of response: 25/125 = 0.2, 26/125 = 0.208, 27/125 = 0.216, 20/125 = 0.16, 27/125 = 0.216

Page 21:

Matrix of joint probabilities p(xj,yk)
(total number of stimuli (responses) N = 125; joint probability p(xj,yk) = Njk / N)

      y1              y2              y3              y4              y5
x1    20/125 = 0.16    5/125 = 0.04    0               0               0
x2     5/125 = 0.04   15/125 = 0.12    5/125 = 0.04    0               0
x3     0               6/125 = 0.048  17/125 = 0.136   2/125 = 0.016   0
x4     0               0               5/125 = 0.04   12/125 = 0.096   8/125 = 0.064
x5     0               0               0               6/125 = 0.048  19/125 = 0.152

joint entropy of the stimulus/response system:
H(X,Y) = −Σ_{j=1}^{n} Σ_{k=1}^{n} p(xj,yk) log₂ p(xj,yk) = 3.43 bits

Page 22:

When xi and yj are independent events (i.e. the output does not depend on the input), the joint probability would be given by the product of the probabilities of these independent events, P(xi,yj) = p(xi)·p(yj), and the entropy of the system would be the maximum, Hmax. The system would then be entirely useless for transmission of information, since its output would not depend on its input.

For the confusion matrix above, p(xj) = 0.2 for every stimulus and p(yk) = 0.2, 0.208, 0.216, 0.16, 0.216, so the maximum joint entropy of the stimulus/response system is

Hmax(X,Y) = −Σ_{j=1}^{n} Σ_{k=1}^{n} P(xj,yk) log₂ P(xj,yk) = 4.63 bits

Page 23:

The information that is transmitted by the system is given by the difference between the maximum joint entropy of the matrix of independent events, Hmax(X,Y), and the joint entropy of the real system (derived from the confusion matrix), H(X,Y):

I(X;Y) = Hmax(X,Y) − H(X,Y) = 4.63 − 3.43 = 1.2 bits
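The same numbers drop out of the transferred_information() sketch from Page 14 when it is fed the joint probabilities of the 5×5 confusion matrix:

```python
counts = [[20,  5,  0,  0,  0],
          [ 5, 15,  5,  0,  0],
          [ 0,  6, 17,  2,  0],
          [ 0,  0,  5, 12,  8],
          [ 0,  0,  0,  6, 19]]
N = sum(map(sum, counts))                          # 125
p_xy = [[n / N for n in row] for row in counts]    # joint probabilities
print(transferred_information(p_xy))               # about 1.20 bits (4.63 - 3.43)
```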

Page 24:

Capacity of human channel for one-dimensional stimuli

Page 25:

Magic number 7 ± 2, i.e. between 2 and 3 bits (George Miller, 1956)

Page 26:

Magic number 7 ± 2, i.e. between 2 and 3 bits (George Miller, 1956)

• Human perception seems to distinguish only among 7 (plus or minus 2) different entities along one perceptual dimension.
• To recognize more items:
  – long training (musicians)
  – use more than one perceptual dimension (e.g. pitch and loudness)
  – chunk the items into larger chunks (phonemes to words, words to phrases, ...)