
Post on 11-Jul-2020


Page 1: Visual deep learning models,cs.wellesley.edu/~vision/slides/Tommy_class.pdf · • Human Brain –1010-1011 neurons (~1 million flies) –1014- 1015 synapses Vision:’what’iswhere’

Visual deep learning models, in particular for face recognition, and models of invariant recognition in the ventral stream

Towards a theory of the above

Tomaso Poggio, CBMM, BCS, CSAIL, McGovern MIT


Plan

• Recognition in visual cortex
• DCLNs
• Deep Face systems
• i-theory


Second Annual NSF Site Visit, June 2–3, 2015

Theoretical/conceptual framework for vision

• The first 100ms of vision: feedforward and invariant: what, who, where

• Top-down needed for verification step and more complex questions: generative models, probabilistic inference, top-down visual routines.

Following this conceptual framework we are working on:

1. a theory of invariance in cortical computation (i-theory)
2. a generative approach, probabilistic in nature
3. visual routines, and how they may be learned


Object recognition


Vision: what is where

• Human brain
  – 10^10 to 10^11 neurons (~1 million flies)
  – 10^14 to 10^15 synapses

• Ventral stream in rhesus monkey
  – ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
  – ~15 × 10^6 neurons in AIT (anterior inferotemporal) cortex

• ~200M in V1, ~200M in V2, 50M in V4

Van Essen & Anderson, 1990


Source: Lennie, Maunsell, Movshon

Vision: what is where


Recognition in Visual Cortex: “classical model”, selective and invariant

[software available online] Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007

• It is in the family of “Hubel-Wiesel” models (Hubel & Wiesel, 1959: qual; Fukushima, 1980: quant; Oram & Perrett, 1993: qual; Wallis & Rolls, 1997; Riesenhuber & Poggio, 1999; Thorpe, 2002; Ullman et al., 2002; Mel, 1997; Wersing & Koerner, 2003; LeCun et al., 1998: not-bio; Amit & Mascaro, 2003: not-bio; Hinton, LeCun, Bengio: not-bio; Deco & Rolls, 2006…)

• As a biological model of object recognition in the ventral stream, from V1 to PFC, it is perhaps the most quantitatively faithful to known neuroscience data


Feedforward Models: “predict” rapid categorization (82% model vs. 80% humans)

Hierarchical feedforward models of the ventral stream


Why do these networks, including DCLNs, work so well?

Models are not enough… we need a theory!


Plan

• Recognition in visual cortex
• DCLNs
• Deep Face systems
• i-theory


Invariance via pooling


New name for virtual examples


A poor man's regularization!


Mobileye


Plan

• Recognition in visual cortex
• DCLNs
• Deep Face systems
• i-theory




i-theory

Learning of Invariant & Selective Representations in Sensory Cortex


i-theory: exploring a new hypothesis

A main computational goal of the feedforward ventral stream hierarchy — and of vision — is to compute a representation for each incoming image which is invariant to transformations previously experienced in the visual environment.


Empirical demonstration: an invariant representation leads to lower sample complexity for a supervised classifier

Theorem (translation case). Consider a space of images of dimensions $d \times d$ pixels which may appear in any position within a window of size $rd \times rd$ pixels. The usual image representation yields a sample complexity (of a linear classifier) of order $m_{image} = O(r^2 d^2)$; the oracle (invariant) representation yields (because of much smaller covering numbers) a sample complexity of order

$$m_{oracle} = O(d^2) = \frac{m_{image}}{r^2}$$
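As a worked instance of the translation theorem, take the illustrative values $d = 32$ and $r = 4$ (these numbers are assumptions chosen for the arithmetic, not values from the slides). Constants hidden by the $O(\cdot)$ notation are ignored, so only the ratio is meaningful:

```latex
\begin{align*}
m_{image}  &= O(r^2 d^2) = O(4^2 \cdot 32^2) = O(16384) \\
m_{oracle} &= O(d^2)     = O(32^2)           = O(1024) \\
\frac{m_{image}}{m_{oracle}} &= r^2 = 16
\end{align*}
```

Under these assumptions the invariant representation would need on the order of 16 times fewer labeled examples, one factor for each position the object can take along each axis of the window.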


An algorithm that learns in an unsupervised way to compute invariant representations

[Figure: axes $\nu$ and $P(\nu)$]

$$\mu^k_n(I) = \frac{1}{|G|} \sum_{i=1}^{|G|} \sigma\left(\langle I, g_i t^k \rangle + n\Delta\right)$$


Invariant signature from a single image of a new object
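A minimal sketch of how such a signature might be computed, under stated assumptions: the group G is taken to be the cyclic translations of a 1-D signal, the nonlinearity is a logistic sigmoid, and the names `learn_orbits`, `signature`, `n_values`, and `delta` are hypothetical. The "unsupervised" stage here simply stores every transformed template; the signature of a new image is then the group average of the thresholded dot products.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def learn_orbits(templates):
    """Unsupervised stage: store every cyclic shift g_i t^k of each template.

    Assumption: G is the group of cyclic translations of a 1-D signal.
    Returns an array of shape (K, |G|, d).
    """
    d = templates.shape[1]
    return np.array([[np.roll(t, i) for i in range(d)] for t in templates])

def signature(image, orbits, n_values=(-2, -1, 0, 1, 2), delta=0.5):
    """mu[k, n] = mean over i of sigmoid(<I, g_i t^k> + n * delta)."""
    dots = orbits @ image                    # shape (K, |G|)
    offsets = delta * np.asarray(n_values)   # shape (N,)
    # Pool over the group axis; the result has shape (K, N).
    return sigmoid(dots[:, :, None] + offsets).mean(axis=1)

rng = np.random.default_rng(0)
templates = rng.standard_normal((3, 16))     # K = 3 random templates
orbits = learn_orbits(templates)

image = rng.standard_normal(16)              # a single image of a "new object"
sig = signature(image, orbits)
sig_shifted = signature(np.roll(image, 5), orbits)
```

In this sketch `sig` and `sig_shifted` agree up to floating-point error, because shifting the image only permutes the set of dot products that the average pools over.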


We need only a finite number of projections, K, to distinguish among n images.

Similar in spirit to Johnson-Lindenstrauss
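To give a feel for the Johnson-Lindenstrauss flavor of this statement, the sketch below applies a standard Gaussian random projection to n points and checks how well pairwise distances survive. This is the textbook JL construction, not the i-theory result itself, and the sizes `n`, `d`, and `K` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 20, 1000, 500            # illustrative sizes, not from the slides

X = rng.standard_normal((n, d))    # n "images" as points in R^d
# Gaussian random projection, scaled so squared norms are preserved in expectation.
P = rng.standard_normal((d, K)) / np.sqrt(K)
Y = X @ P                          # each image is now described by K projections

def pairwise_distances(A):
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Ratio of projected to original distance for every distinct pair of points.
iu = np.triu_indices(n, k=1)
ratios = pairwise_distances(Y)[iu] / pairwise_distances(X)[iu]
```

With these sizes the ratios stay close to 1, so the K projections still separate the n points even though K is smaller than d.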

 

 

 


Local and global invariance: whole-parts theorem

[Figure: hierarchy of HW modules, layers l=1 to l=4]


biophysics: prediction


...

Basic machine: a HW module (dot products and histograms/moments for image seen through RF)

• The cumulative histogram (empirical cdf) can be computed as

$$\mu^k_n(I) = \frac{1}{|G|} \sum_{i=1}^{|G|} \sigma\left(\langle I, g_i t^k \rangle + n\Delta\right)$$

• This maps directly into a set of simple cells with threshold

• …and a complex cell indexed by n and k summing the simple cells

The nonlinearity can be rather arbitrary for invariance provided it is stationary in time
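The simple-cell / complex-cell reading of an HW module can be sketched as follows. Assumptions, none of which come from the slides: the group is the cyclic shifts of a 1-D signal, `thresholds` plays the role of nΔ, and `hw_module` is a hypothetical helper. With a step nonlinearity each output is the fraction of dot products at or above minus the threshold, which is a cumulative histogram; swapping in a square gives a second moment instead.

```python
import numpy as np

def hw_module(image, template, thresholds, nonlinearity):
    """One HW module over cyclic shifts (an assumed stand-in for the group G)."""
    d = image.shape[0]
    # "Simple cells": one dot product per transformed template g_i t.
    dots = np.array([image @ np.roll(template, i) for i in range(d)])
    # "Complex cell" n: pools nonlinearity(dot + threshold_n) over all shifts.
    return np.array([np.mean(nonlinearity(dots + th)) for th in thresholds])

def step(x):
    return (x >= 0).astype(float)

rng = np.random.default_rng(1)
image = rng.standard_normal(32)
template = rng.standard_normal(32)
thresholds = np.linspace(-3.0, 3.0, 7)      # plays the role of n * Delta

# Step nonlinearity: cumulative-histogram values, nondecreasing in the threshold.
cdf_like = hw_module(image, template, thresholds, step)
# A different nonlinearity (square, zero threshold): second moment of the dot products.
second_moment = hw_module(image, template, [0.0], np.square)

# The same pooled quantities for a shifted copy of the image.
shifted = np.roll(image, 3)
cdf_like_shifted = hw_module(shifted, template, thresholds, step)
second_moment_shifted = hw_module(shifted, template, [0.0], np.square)
```

Both choices of nonlinearity give the same output for the shifted image in this sketch, which illustrates why the slide can call the nonlinearity "rather arbitrary" as far as invariance is concerned.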


Dendrites of a complex cell as simple cells…

Active properties in the dendrites of the complex cell


Plan

• i-theory
• DCLNs
• equivalence to DCLNs, theory notes
• Some predictions of i-theory
• Deep Face systems