
Why simple organisms can cope with complex environments?

NIPS 2009 Workshop
The curse of dimensionality – how can the brain solve it?

Naftali Tishby

Interdisciplinary Center for Neural Computation
Hebrew University, Jerusalem, Israel

Outline

• Is the RL "curse of dimensionality" real?
…the "number of parameters" debate revisited…

• The brain's primary task: making valuable predictions
– The perception-action cycle of information
– Optimal solution: the Past-Future Information Bottleneck

• Predictive information is rare
– Only a tiny fraction of the world's complexity is relevant
– How difficult is it to extract?

• The brain's complexity reflects behavior (not the world)
– New bounds on predictive representation complexity
– Information-bounded Reinforcement Learning
– Robustness and generalization theorems

The brain’s primary task is making valuable predictions

Perception is goal-oriented, directed by active predictions

Hierarchies and reverse hierarchies

Tsotsos 1990; Hochstein and Ahissar 2002

The auditory pathways

[Diagram: feed-forward hierarchy with feedback along a reverse hierarchy.]

Low-level representations are sensitive to fine temporal cues, at μs resolution.

Phonological/semantic level: day / bay … night / dream …

Initial perception is based on high-level, phonological representations.

Nelken et al., 2005

Perception-Action Cycles

Multiple cycles with multiple time scales!

The Perception-Action Cycle

The circular flow of information that takes place between the organism and its environment in the course of a sensory-guided sequence of behavior towards a goal. (JM Fuster)

Why Predictability? Life is all about making good predictions…

The essence of the cycle

[Diagram: sensing costs and prediction value, linked through internal representations, centered on NOW.]

The Environment: stationary stochastic process

Internal Representations

[Diagram: an internal representation T mediating between the PAST (X) and the FUTURE (Y).]

(Optimal) Internal Representations: we like to think probabilistically

[Diagram: X → T → Y, with P(X,Y), P(T|X), P(Y|T).]

• Environment: P(X,Y)

• Internal representation: P(T|X), P(Y|T)

[Diagram: X → T → Y, with I(X;Y), I(T;X), I(T;Y).]

• Environment: I(X;Y) – predictive information

• Internal representation: I(T;X) , I(T;Y) - compression & prediction

(Optimal) Internal Representations: and we want a computational principle…

[Diagram: X → T → Y, with I(X;Y), I(T;X), I(T;Y).]

Model Quantifiers:

• Complexity ("cost"): I(T;X)

• Predictive Info (“value”): I(T;Y)

Optimality Trade-off:

• minimize complexity

• maximize predictive-info

[Diagram: past → model → future.]

(Optimal) Internal Representations: and a computational principle…

• Environment: I(X;Y) – predictive information

• Internal representation: I(T;X) , I(T;Y) - compression & prediction
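This trade-off is exactly the Information Bottleneck problem, and it can be solved by iterating its self-consistent equations. Below is a minimal Blahut-Arimoto-style sketch (my own illustrative implementation, not the talk's code; function and variable names are invented):

```python
import numpy as np

def ib_encoder(p_xy, n_t, beta, n_iter=300, seed=0):
    """Iterate the IB self-consistent equations for a joint p(x, y):
    minimize I(T;X) - beta * I(T;Y) over the encoder p(t|x)."""
    rng = np.random.default_rng(seed)
    n_x = p_xy.shape[0]
    p_x = p_xy.sum(axis=1)
    p_y_x = p_xy / p_x[:, None]                    # conditional p(y|x)
    p_t_x = rng.random((n_x, n_t))
    p_t_x /= p_t_x.sum(axis=1, keepdims=True)      # random initial encoder
    for _ in range(n_iter):
        p_t = p_x @ p_t_x                          # marginal p(t)
        p_x_t = (p_t_x * p_x[:, None]) / (p_t + 1e-12)   # Bayes: p(x|t)
        p_y_t = p_x_t.T @ p_y_x                    # decoder p(y|t)
        # D_KL[p(y|x) || p(y|t)] for every (x, t) pair
        log_ratio = np.log(p_y_x[:, None, :] + 1e-12) - np.log(p_y_t[None, :, :] + 1e-12)
        kl = (p_y_x[:, None, :] * log_ratio).sum(axis=2)
        p_t_x = p_t[None, :] * np.exp(-beta * kl)  # p(t|x) ~ p(t) exp(-beta KL)
        p_t_x /= p_t_x.sum(axis=1, keepdims=True)
    return p_t_x
```

Sweeping β from 0 upward and recording I(T;X) and I(T;Y) of the resulting encoder traces out an information curve.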

A simple illustration

x ∈ X = {1, 2, …, 18}, y ∈ Y = {A, B}; the environment is specified by the joint distribution P(X,Y), shown as P(Y = B | X).

[Figures: for decreasing representation complexity, the information curve I(T;Y) vs. I(T;X), the encoder P(T|X), and the resulting predictions P('B'|X), shown at I(T;X) = H(X) (perfect copy: most complex, perfect predictions), 3 bits, 2 bits, 1 bit, 0.5 bit, and 0 bits.]

How much of the past does the brain really need?

Predictive Information: The Capacity of the Future-Past Channel
(with Bialek and Nemenman, 2001)

– Estimate P_T(W(−), W(+)): the T-window past-future distribution

[Diagram: signal W(t); a T-window W(−) over the past and a T-window W(+) over the future, meeting at t = 0.]

I_{pred}(T) = \left\langle \log \frac{p(W_{future} \mid W_{past})}{p(W_{future})} \right\rangle_{p(W_{past},\, W_{future})}
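For a concrete sense of this quantity, the predictive information can be computed in closed form for simple processes. A sketch for a two-state Markov chain with one-step windows (the transition matrix is made up for illustration):

```python
import numpy as np

def mutual_information(p_joint):
    """I(A;B) in bits from a joint distribution over (a, b)."""
    pa = p_joint.sum(axis=1, keepdims=True)
    pb = p_joint.sum(axis=0, keepdims=True)
    nz = p_joint > 0
    return float(np.sum(p_joint[nz] * np.log2(p_joint[nz] / (pa @ pb)[nz])))

# Illustrative two-state Markov chain (parameters are made up)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # transition matrix p(x_{t+1} | x_t)

# stationary distribution: left eigenvector of P for eigenvalue 1
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

# joint over (past, future) for one-step windows: p(w-, w+) = pi_i * P_ij
p_joint = pi[:, None] * P
print(f"I(W-; W+) ~ {mutual_information(p_joint):.4f} bits")
```

Longer windows W(−), W(+) can only increase this value, up to the full predictive information of the process.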

Entropy of words in a Spin Chain

S(N) = -\sum_{K=0}^{2^N - 1} P_N(W_K) \log_2 P_N(W_K)

Entropy of spin chains (1 · 10^9 total spins), three cases:

• nearest-neighbor couplings only: J_{ij} = 0 for |i − j| > 1, fixed J
• J taken at random from N(0, 1) every 400000 spins
• J_{ij} taken at random from N(0, 1/(i − j)) every 400000 spins

Entropy is extensive: it shows no distinction between the cases!

Predictive Information – the Subextensive Component of the Entropy

shows a qualitative distinction between the cases! The growth of the subextensive component reflects the underlying complexity.

Logarithmic growth for finite-dimensional processes

• Finite-parameter processes (e.g. Markov chains)
• Similar to stochastic complexity (MDL)

I_{pred}(T) \approx \frac{\dim}{2} \log T
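The (dim/2)·log T law can be checked numerically in the simplest one-parameter case: an exchangeable binary sequence whose bias θ is unknown (the uniform prior over θ is my illustrative assumption, not part of the slide). The subextensive part of the block entropy then grows like (1/2) log₂ N:

```python
from math import lgamma, log, exp

def block_entropy_bits(N):
    """Entropy (bits) of N-symbol words for Bernoulli(theta), theta ~ Uniform(0,1).

    The marginal probability of a word with k ones is the beta-binomial
    marginal B(k+1, N-k+1) = k!(N-k)!/(N+1)!.
    """
    S = 0.0
    for k in range(N + 1):
        logp = lgamma(k + 1) + lgamma(N - k + 1) - lgamma(N + 2)  # log P(word)
        logC = lgamma(N + 1) - lgamma(k + 1) - lgamma(N - k + 1)  # log C(N, k)
        S -= exp(logC + logp) * logp / log(2)
    return S

# entropy rate of the mixture: E_theta[h2(theta)] = 1/(2 ln 2) bits per symbol
s = 0.5 / log(2)
for N in (64, 256, 1024):
    print(N, round(block_entropy_bits(N) - N * s, 3))
```

Each factor of 4 in N adds roughly one bit to the subextensive part, i.e. (1/2) log₂ N growth, matching dim = 1.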

Power-law growth

• Fast growth is a signature of infinite-dimensional processes (e.g. speech)
• Power laws appear in cases where the interactions/correlations have long range.

I_{pred}(T) \propto T^{\alpha}, \quad \alpha < 1

Efficient predictors: Prediction Suffix Trees

deep sparse trees do better than full trees

[Ron, Singer, Tishby, 1994,95]

– Most of the past is irrelevant to the future!

– The "relevant components" can typically be extracted efficiently from small samples – much smaller than required for reliable entropy estimation!
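A toy variable-order predictor illustrates the idea (a deliberate simplification of the Ron-Singer-Tishby PST algorithm: it keeps all contexts up to a maximum depth and backs off to the longest suffix seen, rather than learning a pruned, sparse tree):

```python
from collections import defaultdict

class SuffixTreePredictor:
    """Minimal variable-order predictor: counts symbols following every
    context up to max_depth and predicts from the longest observed suffix."""

    def __init__(self, max_depth=5):
        self.max_depth = max_depth
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, seq):
        for i in range(len(seq)):
            for d in range(self.max_depth + 1):
                if i - d < 0:
                    break
                self.counts[seq[i - d:i]][seq[i]] += 1

    def predict(self, context):
        # back off to shorter suffixes until one has been seen in training
        for d in range(min(self.max_depth, len(context)), -1, -1):
            ctx = context[len(context) - d:]
            if ctx in self.counts:
                dist = self.counts[ctx]
                total = sum(dist.values())
                return {sym: c / total for sym, c in dist.items()}
        return {}

m = SuffixTreePredictor(max_depth=3)
m.train("ab" * 100)
m.predict("a")   # concentrates on "b"
```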

But WHAT – in the past – is predictive?

[Diagram: signal W(t); a T-window W(−) over the past and a T-window W(+) over the future, meeting at t = 0.]

How much information is needed for valuable behavior?

Bellman meets Shannon

Perception-Action-Cycles © 2009 Naftali Tishby

Richard Ernest Bellman (August 26, 1920 – March 19, 1984)

Claude Elwood Shannon (April 30, 1916 – February 24, 2001)


Value and Information parallels …

Agent and environment interact at time steps t = 0, 1, 2, …:

• Agent observes the TRUE state s_t ∈ S at step t
• produces action a_t ∈ A(s_t) at step t, with policy \pi(a \mid s)
• gets the resulting reward r_t
• resulting next state s_{t+1}, with P^a_{ss'} = P(s_{t+1} = s' \mid s_t = s, a_t = a)

Bellman equation for value:

V^{\pi}(s) = \sum_a \pi(a \mid s) \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + V^{\pi}(s') \right]

solved for V^{\pi}(s) by DP, given \pi, P and R.
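The value recursion can be solved by simple fixed-point iteration (dynamic programming). A minimal sketch on a made-up two-state MDP; the discount factor γ = 0.9 is my addition to guarantee convergence and is not part of the slide's undiscounted equation:

```python
import numpy as np

# Made-up 2-state, 2-action MDP for illustration
n_s, n_a, gamma = 2, 2, 0.9
P = np.zeros((n_s, n_a, n_s))              # P[s, a, s'] transition probs
P[0, 0] = [0.8, 0.2]; P[0, 1] = [0.1, 0.9]
P[1, 0] = [0.5, 0.5]; P[1, 1] = [0.9, 0.1]
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.5, 0.5], [1.0, 0.0]]])   # R[s, a, s'] rewards
pi = np.full((n_s, n_a), 0.5)              # uniform policy pi(a|s)

# Iterate V(s) = sum_a pi(a|s) sum_s' P(s'|s,a) [R + gamma V(s')]
V = np.zeros(n_s)
for _ in range(500):
    V = (np.einsum("sa,sap,sap->s", pi, P, R)
         + gamma * np.einsum("sa,sap,p->s", pi, P, V))
print(V)
```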

Agent: internal knowledge. Environment: complex external states. Action/sensing a_t yields information gain ΔI_t; the "state of knowledge" about the goal g lives on a simplex – the estimate of p(g | s_t).

Agent has a goal variable g ∈ G and interacts with the environment at time steps t = 0, 1, 2, …:

• estimates/infers an internal state \hat{s}_t ∈ \hat{S}, characterized by p(\hat{s} \mid s_t) and p(g \mid \hat{s})
• produces action a_t at step t, with \pi(a \mid \hat{s})
• gets/estimates the information gain ΔI_t
• resulting world next state s_{t+1}, with P^a_{ss'}

Bellman equation for I:

I(\hat{s}_t; g) = \sum_a \pi(a \mid \hat{s}) \sum_{s'} P^a_{ss'} \left[ \Delta I + I(\hat{s}_{t+1}; g) \right]

solved for I using DP and probabilistic inference.


Combining (future) Value and Information

In cases where information is free, we can maximize value irrespective of its information cost. In general, however, we want to:

(1) reduce decision complexity (get home in the simplest way)
(2) maximize the environment information gain (e.g. with the coins)
(3) increase robustness to model fluctuations

All three can be obtained by combining the Information and Value equations.


Trading Value and (future) Information

With the information gained per step along a trajectory s_1, a_1, s_2, a_2, …:

\Delta I_t = \log\frac{\pi(a_t \mid s_t)}{\pi(a_t)} + \log\frac{p(s_{t+1} \mid s_t, a_t)}{p(s_{t+1})}

(control information plus environment information), we want:

\pi^* = \arg\min_\pi F^\pi(s,a), \qquad F^\pi(s,a) = I^\pi(s,a) - \beta\, Q^\pi(s,a)

where I^\pi(s,a) is the expected accumulated information and Q^\pi(s,a) = E\big[\sum_t R^{a_t}_{s_t s_{t+1}}\big] the expected accumulated reward. F^\pi satisfies a Bellman-type recursion:

F^\pi(s_t,a_t) = \sum_{s_{t+1}} p(s_{t+1}\mid s_t,a_t)\Big[\log\frac{p(s_{t+1}\mid s_t,a_t)}{p(s_{t+1})} - \beta R^{a_t}_{s_t s_{t+1}} + \sum_{a_{t+1}} \pi(a_{t+1}\mid s_{t+1})\Big(\log\frac{\pi(a_{t+1}\mid s_{t+1})}{\pi(a_{t+1})} + F^\pi(s_{t+1},a_{t+1})\Big)\Big]


Information bounded RL

Define the "optimal" (reward-tilted) transition probabilities and the state prior p(s'):

q(s' \mid s, a) := \frac{p(s')}{Z(s,a)} \exp\!\left(\beta R^a_{ss'}\right), \qquad Z(s,a) = \sum_{s'} p(s') \exp\!\left(\beta R^a_{ss'}\right)

(q and p(s') are sufficient statistics; Z(s,a) is the local partition function).

Then the state-action free energy satisfies the Bellman equation:

F(s_t, a_t) = \sum_{s_{t+1}} p(s_{t+1} \mid s_t, a_t) \left[ \log\frac{p(s_{t+1} \mid s_t, a_t)}{q(s_{t+1} \mid s_t, a_t)} + \sum_{a_{t+1}} \pi(a_{t+1} \mid s_{t+1}) \left( \log\frac{\pi(a_{t+1} \mid s_{t+1})}{\pi(a_{t+1})} + F(s_{t+1}, a_{t+1}) \right) \right]

and the desired optimal policy is (somewhat surprisingly):

\pi(a \mid s) = \frac{\pi(a)}{Z(s)} \exp\!\left(-F(s,a)\right), \qquad Z(s) = \sum_a \pi(a) \exp\!\left(-F(s,a)\right)

These 3 equations should be iterated till convergence for every state (like Blahut-Arimoto).
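A sketch of that Blahut-Arimoto-style iteration on a made-up MDP. The variable names, the uniform state/action priors, and the discount factor added to keep the recursion bounded are all assumptions of this sketch, not part of the slide:

```python
import numpy as np

def info_bounded_rl(P, R, beta, gamma=0.9, n_iter=300):
    """Iterate the three self-consistent equations:
      q(s'|s,a) = p(s') exp(beta R) / Z(s,a)       (reward-tilted transitions)
      F(s,a)    = sum_s' P [ log(P/q) + gamma * sum_a' pi (log(pi/rho) + F) ]
      pi(a|s)   = rho(a) exp(-F(s,a)) / Z(s)       (softmax over free energy)
    State prior p(s') and action prior rho(a) are held uniform (a sketch choice).
    """
    n_s, n_a, _ = P.shape
    p_s = np.full(n_s, 1.0 / n_s)
    rho = np.full(n_a, 1.0 / n_a)
    q = p_s[None, None, :] * np.exp(beta * R)
    q /= q.sum(axis=2, keepdims=True)
    F = np.zeros((n_s, n_a))
    pi = np.full((n_s, n_a), 1.0 / n_a)
    for _ in range(n_iter):
        # continuation term per next state s'
        ctrl = (pi * (np.log(pi / rho[None, :]) + F)).sum(axis=1)
        F = np.einsum("sap,sap->sa", P,
                      np.log(P / q + 1e-12) + gamma * ctrl[None, None, :])
        # row-shift F for numerical stability; the shift cancels on normalization
        pi = rho[None, :] * np.exp(-(F - F.min(axis=1, keepdims=True)))
        pi /= pi.sum(axis=1, keepdims=True)
    return pi, F
```

The returned policy is soft-greedy with respect to the free energy: actions with lower F(s, a) get exponentially more probability mass, with β trading reward against information cost.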

Biological evidence?

Auditory cortex encodes surprise

(with Eli Nelken and Jonathan Rubin)

The predictive bottleneck

[Figure: Predictive Power (bits) vs. Model Complexity (bits), with the encoders P(T|X) along the curve.]

Information curve showing the optimal predictive information (surprise) as a function of the complexity of the internal model (memory bits) for the next-tone prediction of oddball sequences using a memory duration of 5 tones back.

Left: scatter plots of the neural responses to either 'A' (blue) or 'B' (red) against the surprise values calculated for a specific model. Dots mark the mean response at a given surprise level, and the error bars represent the 25th and 75th percentiles of the data. Right: (1) PSTH for stimulus 'A'; each row is the averaged PSTH corresponding to a single point in the scatter plot, sorted from low to high surprise level. (2) PSTH for stimulus 'B'. (3) Correlations for 'A' (as explained before). (4) Correlations for 'B'.

The PSTH plots help to see which part of the signal is correlated with the surprise. For instance, the onset seems fairly constant (and is absent in the responses to 'B'), whereas the sustained part seems to be strongly correlated with the surprise.


Conclusions

- Prediction complexity is governed by the "predictive information" of the environment, NOT by its complexity (entropy). The predictive information is a tiny (exponentially small) fraction of the full entropy of the environment.

- The brain can extract/learn efficient (good-enough) predictors from small samples. There is no need to capture the full complexity of the world.

- There is accumulating experimental evidence that the brain represents predictive information (surprises).

- This view is in full agreement with the top-down (reverse hierarchy) models of perception and attention.

- Bellman's "curse of dimensionality" is avoided (not solved) by the brain, because the brain's main task is making predictions, not modeling the world.
