What can computational models tell us about face processing?

Garrison W. Cottrell
Gary's Unbelievable Research Unit (GURU)
Computer Science and Engineering Department
Institute for Neural Computation, UCSD

Collaborators, Past, Present and Future: Ralph Adolphs, Luke Barrington, Serge Belongie, Kristin Branson, Tom Busey, Andy Calder, Eric Christiansen, Matthew Dailey, Piotr Dollar, Michael Fleming, AfmZakaria Haque, Janet Hsiao, Carrie Joyce, Brenden Lake, Kang Lee, Joe McCleery, Janet Metcalfe, Jonathan Nelson, Nam Nguyen, Curt Padgett, Angelina Saldivar, Honghao Shan, Maki Sugimoto, Matt Tong, Brian Tran, Keiji Yamada, Lingyun Zhang

TRANSCRIPT

Page 7: What can computational models tell us about face processing?

IEEE Computational Intelligence Society

4/12/2006

7/100

And now for something completely different…

• The CIS goal is to “mimic nature for problem solving”

• My goal is to mimic nature in order to understand nature

• In fact, as a cognitive scientist, I am glad when my models make the same mistakes people do…

• Because that means the model is fitting the data better -- so maybe I have a better model!

• So: don't look for a better problem solver here… but do, hopefully, look for some insights into how people process faces.

Page 8: What can computational models tell us about face processing?


Why use models to understand thought?

• Models rush in where theories fear to tread.

• Models can be manipulated in ways people cannot

• Models can be analyzed in ways people cannot.

Page 9: What can computational models tell us about face processing?


Models rush in where theories fear to tread

• Theories are high-level descriptions of the processes underlying behavior.

• They are often not explicit about the processes involved.

• They are difficult to reason about if no mechanisms are explicit -- they may be too high level to make explicit predictions.

• Theory formation itself is difficult.

• Using machine learning techniques, one can often build a working model of a task for which we have no theories or algorithms (e.g., expression recognition).

• A working model provides an "intuition pump" for how things might work, especially if it is "neurally plausible" (e.g., development of face processing - Dailey and Cottrell).

• A working model may make unexpected predictions (e.g., the Interactive Activation Model and SLNT).

Page 10: What can computational models tell us about face processing?


Models can be manipulated in ways people cannot

• We can see the effects of variations in cortical architecture (e.g., split (hemispheric) vs. non-split models (Shillcock and Monaghan word perception model)).

• We can see the effects of variations in processing resources (e.g., variations in number of hidden units in Plaut et al. models).

• We can see the effects of variations in environment (e.g., what if our parents were cans, cups or books instead of humans? I.e., is there something special about face expertise versus visual expertise in general? (Sugimoto and Cottrell, Joyce and Cottrell)).

• We can see variations in behavior due to different kinds of brain damage within a single “brain” (e.g. Juola and Plunkett, Hinton and Shallice).

Page 11: What can computational models tell us about face processing?


Models can be analyzed in ways people cannot

In the following, I specifically refer to neural network models.

• We can do single unit recordings.

• We can selectively ablate and restore parts of the network, even down to the single unit level, to assess the contribution to processing.

• We can measure the individual connections -- e.g., the receptive and projective fields of a unit.

• We can measure responses at different layers of processing (e.g., which level accounts for a particular judgment: perceptual, object, or categorization? (Dailey et al., J. Cog. Neurosci., 2002)).

Page 12: What can computational models tell us about face processing?


How (I like) to build Cognitive Models

• I like to be able to relate them to the brain, so “neurally plausible” models are preferred -- neural nets.

• The model should be a working model of the actual task, rather than a cartoon version of it.

• Of course, the model should nevertheless be simplifying (i.e., it should be constrained to the essential features of the problem at hand):

• Do we really need to model the (supposed) translation invariance and size invariance of biological perception?

• As far as I can tell, NO!

• Then, take the model "as is" and fit the experimental data: 0 fitting parameters is preferred over 1, 2, or 3.

Page 13: What can computational models tell us about face processing?


The other way (I like) to build Cognitive Models

• Same as above, except:

• Use them as exploratory models -- in domains where there is little direct data (e.g., no single cell recordings in infants or undergraduates) to suggest what we might find if we could get the data. These can then serve as "intuition pumps."

• Examples:
• Why we might get specialized face processors
• Why those face processors get recruited for other tasks

Page 14: What can computational models tell us about face processing?


Outline
• Review of our model of face and object processing
• Some insights from modeling:
• What could "holistic processing" mean?
• Does a specialized processor for faces need to be innately specified?
• Why would a face area process BMWs?
• Some new directions:
• How do we select where to look next?
• How is information integrated across saccades?

Page 16: What can computational models tell us about face processing?


The Face Processing System

[Diagram: Pixel (Retina) Level -> Gabor Filtering -> Perceptual (V1) Level -> PCA -> Object (IT) Level -> Neural Net -> Category Level (Happy, Sad, Afraid, Angry, Surprised, Disgusted)]

Page 17: What can computational models tell us about face processing?


The Face Processing System

[Diagram: the same architecture, now with identity outputs: Bob, Carol, Ted, Alice]

Page 18: What can computational models tell us about face processing?


The Face Processing System

[Diagram: the same architecture, with mixed identity and object outputs (Bob, Carol, Ted, Cup, Can, Book); the PCA layer is labeled the Feature level]

Page 19: What can computational models tell us about face processing?


The Face Processing System

[Diagram: the same architecture, with the PCA layer split into low spatial frequency (LSF) and high spatial frequency (HSF) channels; outputs: Bob, Carol, Ted, Cup, Can, Book]

Page 20: What can computational models tell us about face processing?


The Gabor Filter Layer
• Basic feature: the 2-D Gabor wavelet filter (Daugman, 1985)
• These model the processing in early visual areas

[Diagram: the image is convolved with the Gabor filters; the response magnitudes are subsampled on a 29x36 grid]
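As a concrete sketch of this layer, here is a minimal numpy implementation. The bank size (2 wavelengths x 4 orientations), kernel size, and sigma are illustrative assumptions, not the parameters of the model in the talk; only the 29x36 subsampling grid comes from the slide.

```python
import numpy as np

def gabor_kernel(wavelength, theta, sigma, size=31):
    """Complex 2-D Gabor wavelet: a sinusoidal carrier under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
            * np.exp(1j * 2 * np.pi * xr / wavelength))

def conv_same(image, kern):
    """'same'-size 2-D convolution via zero-padded FFTs (numpy only)."""
    s = (image.shape[0] + kern.shape[0] - 1, image.shape[1] + kern.shape[1] - 1)
    full = np.fft.ifft2(np.fft.fft2(image, s) * np.fft.fft2(kern, s))
    r0, r1 = (kern.shape[0] - 1) // 2, (kern.shape[1] - 1) // 2
    return full[r0:r0 + image.shape[0], r1:r1 + image.shape[1]]

def gabor_magnitudes(image, wavelengths=(4, 8), n_orient=4, grid=(29, 36)):
    """Filter with a small Gabor bank, keep the response magnitudes,
    and subsample each magnitude map on a coarse grid."""
    rows = np.linspace(0, image.shape[0] - 1, grid[0]).astype(int)
    cols = np.linspace(0, image.shape[1] - 1, grid[1]).astype(int)
    maps = []
    for lam in wavelengths:
        for k in range(n_orient):
            kern = gabor_kernel(lam, k * np.pi / n_orient, sigma=0.5 * lam)
            resp = np.abs(conv_same(image, kern))   # magnitudes, as on the slide
            maps.append(resp[np.ix_(rows, cols)])
    return np.stack(maps)

feats = gabor_magnitudes(np.random.rand(64, 64))
print(feats.shape)   # (8, 29, 36)
```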

Page 21: What can computational models tell us about face processing?


Principal Components Analysis

• The Gabor filters give us 40,600 numbers

• We use PCA to reduce this to 50 numbers

• PCA is like Factor Analysis: it finds the underlying directions of maximum variance

• PCA can be computed in a neural network through a competitive Hebbian learning mechanism

• Hence this is also a biologically plausible processing step

• We suggest this leads to representations similar to those in Inferior Temporal cortex
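The reduction step can be sketched with a plain SVD-based PCA. The 40,600-to-50 dimensions come from the slide; the data here are random placeholders standing in for Gabor-magnitude vectors.

```python
import numpy as np

def pca_reduce(X, n_components=50):
    """Mean-center X (one Gabor-magnitude vector per row) and project it
    onto the top principal components, found via an economy SVD."""
    mu = X.mean(axis=0)
    Xc = X - mu
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]          # leading principal directions
    return Xc @ components.T, components, mu

X = np.random.rand(60, 40600)               # e.g., 60 images x 40,600 Gabor numbers
codes, components, mu = pca_reduce(X)
print(codes.shape)                          # (60, 50)
```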

Page 22: What can computational models tell us about face processing?


How to do PCA with a neural network

(Cottrell, Munro & Zipser, 1987; Cottrell & Fleming 1990; Cottrell & Metcalfe 1990; O’Toole et al. 1991)

A self-organizing network that learns whole-object representations (features, Principal Components, Holons, eigenfaces)

[Diagram: input from the Perceptual Layer feeds a layer of Holons (the Gestalt layer)]
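The claim that PCA can be learned by a Hebbian network can be illustrated with Sanger's generalized Hebbian algorithm, one standard way to do it; the learning rate, epoch count, and toy data here are illustrative assumptions, not details from the talk.

```python
import numpy as np

def sangers_rule(X, n_units=1, lr=0.005, epochs=100, seed=0):
    """Generalized Hebbian (Sanger's) rule: the rows of W converge toward
    the leading principal components of the input data."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_units, X.shape[1]))
    for _ in range(epochs):
        for x in X:
            y = W @ x
            # Hebbian growth minus a decorrelating (Gram-Schmidt-like) term
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

# toy data: much more variance along the first axis than the second
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) * np.array([2.0, 0.3])
w = sangers_rule(X)[0]
w /= np.linalg.norm(w)
print(abs(w[0]) > 0.95)   # True: the unit found the high-variance direction
```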

Page 28: What can computational models tell us about face processing?


The “Gestalt” Layer: Holons
(Cottrell, Munro & Zipser, 1987; Cottrell & Fleming 1990; Cottrell & Metcalfe 1990; O’Toole et al. 1991)

A self-organizing network that learns whole-object representations (features, Principal Components, Holons, eigenfaces)

[Diagram: input from the Perceptual Layer feeds a layer of Holons (the Gestalt layer)]

Page 29: What can computational models tell us about face processing?


Holons
• They act like face cells (Desimone, 1991):
• The response of single units is strong despite occlusion of, e.g., the eyes
• The response drops off with rotation
• Some fire to my dog's face

• A novel representation: distributed templates --
• each unit's optimal stimulus is a ghostly looking face (template-like),
• but many units participate in the representation of a single face (distributed).
• For this audience: neither exemplars nor prototypes!

• They explain holistic processing:
• Why? If stimulated with a partial match, the firing represents votes for this template: units "downstream" don't know what caused this unit to fire.

(more on this later…)

Page 30: What can computational models tell us about face processing?


The Final Layer: Classification

(Cottrell & Fleming 1990; Cottrell & Metcalfe 1990; Padgett & Cottrell 1996; Dailey & Cottrell, 1999; Dailey et al. 2002)

The holistic representation is then used as input to a categorization network trained by supervised learning.

Excellent generalization performance demonstrates the sufficiency of the holistic representation for recognition

[Diagram: input from the Perceptual Layer feeds the Holons, which feed the category output units: Cup, Can, Book, Greeble, Face, Bob, Carol, Ted, Happy, Sad, Afraid, etc.]

Page 31: What can computational models tell us about face processing?


The Final Layer: Classification

• Categories can be at different levels: basic, subordinate.

• Simple learning rule (~delta rule). It says (mild lie here):
• add inputs to your weights (synaptic strengths) when you are supposed to be on,
• subtract them when you are supposed to be off.
• This makes your weights "look like" your favorite patterns -- the ones that turn you on.

• When there are no hidden units, there is no back propagation of error.

• With hidden units, we get task-specific features (most interesting when we use the basic/subordinate distinction).
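The "add inputs when on, subtract when off" rule can be sketched as follows. The sigmoid output units and the toy two-cluster task are illustrative assumptions, not the networks from the talk.

```python
import numpy as np

def delta_rule_train(X, T, lr=0.1, epochs=100, seed=0):
    """One-layer sigmoid classifier trained with the delta rule: weights move
    toward an input when the unit should be on, away when it should be off."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(T.shape[1], X.shape[1]))
    for _ in range(epochs):
        for x, t in zip(X, T):
            y = 1.0 / (1.0 + np.exp(-W @ x))    # unit activations
            W += lr * np.outer(t - y, x)        # the delta-rule update
    return W

# toy task: two linearly separable clusters, one output unit per class
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
T = np.vstack([np.tile([1.0, 0.0], (20, 1)), np.tile([0.0, 1.0], (20, 1))])
W = delta_rule_train(X, T)
pred = (X @ W.T).argmax(axis=1)
acc = (pred == T.argmax(axis=1)).mean()
print(acc)   # 1.0 on this toy set
```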

Page 32: What can computational models tell us about face processing?


Outline
• Review of our model of face and object processing
• Some insights from modeling:
• What could "holistic processing" mean?
• Does a specialized processor for faces need to be innately specified?
• Why would a face area process BMWs?
• Some new directions:
• How do we select where to look next?
• How is information integrated across saccades?

Page 33: What can computational models tell us about face processing?


Holistic Processing
• Holistic processing refers to a type of processing in which visual stimuli are treated "as a piece" -- in fact, we are unable to ignore other apparent "parts" of an image.

• Face processing, in particular, is thought to be "holistic" in nature:
• We are better at recognizing "Bob's nose" when it is on his face
• Changing the spacing between the eyes makes the nose look different
• We are unable to ignore conflicting information from other parts of a face

• All of these might be summarized as "context influences perception," but the context is obligatory.

Page 34: What can computational models tell us about face processing?


Who do you see?

• Context influences perception

Page 35: What can computational models tell us about face processing?


Same/Different Task

Page 36: What can computational models tell us about face processing?


Page 37: What can computational models tell us about face processing?


Page 38: What can computational models tell us about face processing?


Page 39: What can computational models tell us about face processing?


These look like very different women

Page 40: What can computational models tell us about face processing?


But all that has changed is the height of the eyes, right?

Page 41: What can computational models tell us about face processing?


Take the configural processing test!

What emotion is being shown in the top half of the image below?

Happy, Sad, Afraid, Surprised, Disgusted, or Angry?

Now, what do you see?

Answer: Sad

Page 42: What can computational models tell us about face processing?


Do Holons explain these effects?

• Recall that they are templates --

• each unit’s optimal stimulus is a ghostly looking face (template-like)

• What will happen if there is a partial match?

• Suppose there is a holon that “likes happy faces”.

• The mouth will match, causing this unit to fire.

• Units downstream have learned to associate this firing with a happy face.

• They will “think” the top of the face is happier than it is…
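The voting intuition above can be sketched with dot-product template matching; the two-feature "templates" here are purely illustrative, not the model's actual holons.

```python
import numpy as np

def holon_votes(stimulus, templates):
    """Each holon fires in proportion to how well the stimulus matches its
    template; downstream units see only the firing, not which parts matched."""
    return np.array([t @ stimulus for t in templates])

happy = np.array([1.0, 1.0])     # toy template: [top-of-face, mouth] features
sad   = np.array([1.0, -1.0])
stim  = np.array([0.0, 1.0])     # happy mouth, ambiguous top half
votes = holon_votes(stim, [happy, sad])
print(votes)   # [ 1. -1.]: the "happy" template wins on the mouth alone
```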

Page 43: What can computational models tell us about face processing?


Do Holons explain these effects?

• Clinton/Gore: The outer part of the face votes for Gore.

• The nose effect: a match at the eyes votes for that template’s nose.

• Expression/identity configural effects:

• Split faces: the bottom votes for one person, the top for another, but both vote for the WHOLE face…

• Split expressions: the bottom votes for one expression, the top for another…

Page 44: What can computational models tell us about face processing?


Attention to half an image

[Diagram: the input pixel image is Gabor filtered; the Gabor-pattern components for the unattended half are attenuated, yielding the attenuated pattern]
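A minimal sketch of this attenuation manipulation, assuming the Gabor responses are kept as per-filter grids and the unattended half is simply scaled down (the attenuation factor is an assumption):

```python
import numpy as np

def attenuate_half(gabor_maps, factor=0.5, attend_top=True):
    """Model attention to half an image by scaling down the Gabor responses
    whose grid rows fall in the unattended half of the image."""
    out = gabor_maps.copy()
    mid = gabor_maps.shape[1] // 2
    if attend_top:
        out[:, mid:, :] *= factor   # attend top: attenuate the bottom rows
    else:
        out[:, :mid, :] *= factor
    return out

maps = np.ones((8, 29, 36))         # toy bank of 8 filters on a 29x36 grid
att = attenuate_half(maps)
print(att[0, 0, 0], att[0, 28, 0])  # 1.0 0.5
```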

Page 45: What can computational models tell us about face processing?

Composite vs. non-composite facial expressions (Calder et al. 2000)

[Figure: network errors alongside human reaction times; error bars indicate one standard deviation]

Page 46: What can computational models tell us about face processing?


Is Configural Processing of Identity and Expression Independent?

• Calder et al. (2000) found that adding additional inconsistent information that is not relevant to the task didn’t further slow reaction times.

• E.g., when the task is “who is it on the top?”, having a different person's face on the bottom hurts your performance, but additionally having a different expression does not hurt any further.

Same Identity, Different Expression

Different Identity, Same Expression

Different Identity, Different Expression

Page 47: What can computational models tell us about face processing?


(Lack of) Interaction between expression and identity

[Figure: network "reaction time" (1 – correct output activation) plotted against human reaction time (ms)]

Cottrell, Branson, and Calder, 2002

Page 48: What can computational models tell us about face processing?


Why does this work?

[Annotated diagram: attenuated, inconsistent information in the Gabor pattern leads to a weaker representation at the object (IT) level, because the wrong template is only weakly activated; shifted (non-configural) information has little impact, because the bottom half doesn't match any template]

Page 49: What can computational models tell us about face processing?


Configural/holistic processing phenomena accounted for

• Interference from incorrect information in other half of image.

• Lack of interference from misaligned incorrect information.

• We have shown this for identity and expression, as well as the lack of interaction between these.

• Calder suggested from his data that we must have two representations, one for expression and one for identity; but our model accounts for the data with a single representation.

Page 50: What can computational models tell us about face processing?


Outline
• Review of our model of face and object processing
• Some insights from modeling:
• What could "holistic processing" mean?
• Does a specialized processor for faces need to be innately specified?
• Why would a face area process BMWs?
• Some new directions:
• How do we select where to look next?
• How is information integrated across saccades?

Page 51: What can computational models tell us about face processing?


Introduction
• The brain appears to devote specialized resources to face processing.
• The issue: innate or learned?
• Our approach: computational models guided by neuropsychological and experimental data.
• The model: competing neural networks + biologically plausible task and input biases.
• Results: the interaction between face discrimination and low visual acuity leads to networks specializing for face recognition.
• No innateness necessary!

Page 52: What can computational models tell us about face processing?


Step one: a model with parts

• Independent networks compete to perform new tasks
• A mediator rewards winners
• The question: What might cause a specialized face processor?

[Diagram: the stimulus feeds feature extraction units, which feed competing Face Processing and Object Processing modules (and possibly others, marked "?"); a Mediator combines their outputs into a decision]

Page 53: What can computational models tell us about face processing?


Developmental biases in learning

• The task: we have a strong need to discriminate between faces, but not between baby bottles.
• Mother's face is recognized at 4 days (Pascalis et al., 1995)

• The input: low spatial frequencies -- which tend to be more holistic in nature
• Infant sensitivity to high spatial frequencies is low at birth

[Figure from Banks and Salapatek, 1981]

Page 54: What can computational models tell us about face processing?


Neural Network Implementation

• Separate nets in competition
• Output mixed by a gate network
• More error feedback to the "winner"

[Diagram: the input stimulus image is preprocessed into high and low spatial frequency channels feeding two expert networks; a gate network mixes their outputs through multiplicative connections]
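This architecture is a gated mixture of experts. Here is a minimal sketch in the spirit of a Jacobs-et-al.-style mixture-of-experts rule; all dimensions, the learning rates, the responsibility formula, and the toy task are illustrative assumptions rather than the talk's exact implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class MixtureOfExperts:
    """Linear experts mixed by a softmax gate; the currently better-fitting
    expert receives more error feedback, so experts can specialize."""
    def __init__(self, n_in, n_out, n_experts=2, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(n_experts, n_out, n_in))  # experts
        self.G = rng.normal(scale=0.01, size=(n_experts, n_in))          # gate
        self.lr = lr

    def forward(self, x):
        g = softmax(self.G @ x)                    # mixing proportions
        y = np.einsum('eoi,i->eo', self.W, x)      # each expert's output
        return g, y, g @ y                         # blended output

    def update(self, x, t):
        g, y, _ = self.forward(x)
        err = ((t - y) ** 2).sum(axis=1)           # per-expert squared error
        h = g * np.exp(-0.5 * err)                 # responsibilities
        h /= h.sum()
        for e, he in enumerate(h):                 # the winner gets most feedback
            self.W[e] += self.lr * he * np.outer(t - y[e], x)
        self.G += self.lr * np.outer(h - g, x)     # the gate tracks the winner

# toy task: map each of two input patterns to a different target
net = MixtureOfExperts(n_in=2, n_out=2)
data = [(np.array([1.0, 0.0]), np.array([1.0, 0.0])),
        (np.array([0.0, 1.0]), np.array([0.0, 1.0]))]
before = sum(((t - net.forward(x)[2]) ** 2).sum() for x, t in data)
for _ in range(500):
    for x, t in data:
        net.update(x, t)
after = sum(((t - net.forward(x)[2]) ** 2).sum() for x, t in data)
print(after < before)   # True: the mixture has learned the mapping
```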

Page 55: What can computational models tell us about face processing?


Experimental methods

• Image data: 12 faces, 12 books, 12 cups, 12 soda cans, five examples each.

• 8-bit grayscale, cropped and scaled to 64x64 pixels

Page 56: What can computational models tell us about face processing?


Image Preprocessing

[Diagram: the Gabor jet filter responses (512x5 elements) are reduced, one PCA per scale, to a pattern vector (8x5 elements)]

Page 57: What can computational models tell us about face processing?


Task Manipulation

Trained networks for two types of task:
• Superordinate: four-way classification (book? face?)
• Subordinate: classification within one class, simple classification for the others (book? John?)

[Network output units -- Task 1 (Superordinate): Face, Book, Cup, Can; Task 2 (Subordinate): Book, Cup, Can, Bob, Carol, Ted, ..., Alice]

Page 58: What can computational models tell us about face processing?


Input spatial frequency manipulation

Used two input pattern formats:
• Each module receives the same full pattern vector
• One module receives the low spatial frequencies; the other receives the high spatial frequencies

[Diagram: the full pattern vector vs. versions with the high- or low-frequency components removed]
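A sketch of the split-input manipulation, assuming the pattern vector is ordered coarse-to-fine by scale and that the boundary scale is shared in attenuated form (the exact boundary handling on the slide is unclear, so both the boundary index and the 0.5 factor are assumptions):

```python
import numpy as np

def split_spatial_frequencies(pattern, n_scales=5, boundary=2, attenuate=0.5):
    """Split a pattern vector into low- and high-frequency versions by
    zeroing the scales outside each module's range."""
    per_scale = len(pattern) // n_scales
    low, high = pattern.copy(), pattern.copy()
    for s in range(n_scales):
        sl = slice(s * per_scale, (s + 1) * per_scale)
        if s < boundary:
            high[sl] = 0.0              # coarse scales dropped for the high-f net
        elif s == boundary:
            low[sl] *= attenuate        # boundary scale shared, attenuated
            high[sl] *= attenuate
        else:
            low[sl] = 0.0               # fine scales dropped for the low-f net
    return low, high

p = np.arange(1.0, 11.0)                # toy 10-dim pattern: 5 scales x 2 values
low, high = split_spatial_frequencies(p)
print(low)    # [1.  2.  3.  4.  2.5 3.  0.  0.  0.  0. ]
print(high)   # [0.  0.  0.  0.  2.5 3.  7.  8.  9. 10. ]
```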

Page 59: What can computational models tell us about face processing?


Measuring specialization

• Train the network
• Record how the gate network's outputs change with each pattern

[Diagram: for one stimulus the gate weights Net 1 at 0.2 and Net 2 at 0.8; for another stimulus, Net 1 at 0.7 and Net 2 at 0.3]

Page 60: What can computational models tell us about face processing?


Specialization Results

[Figure: average gating-unit weight per module across Faces, Books, Cups, and Cans, for three tasks -- four-way classification (Face, Book, Cup, Can?), book identification (Face, Cup, Can, Book1, Book2, ...?), and face identification (Book, Cup, Can, Bob, Carol, Ted, ...?) -- under all-frequency inputs (Module 1 vs. Module 2) and the Hi/Lo split (high-frequency vs. low-frequency module)]

Page 61: What can computational models tell us about face processing?


Modeling prosopagnosia

We can "damage" the specialized network.

[Figure: generalization accuracy for Faces, Books, Cups, and Cans as a function of % damage (0-100%) to the low-f module and to the high-f module, in the face identification task with split inputs]

• Damage to the high spatial frequency network degrades object classification
• Damage to the low spatial frequency network degrades face identification
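One common way to model such "damage" is to zero out a random fraction of a module's hidden units; this sketch assumes that ablation method (the talk does not specify its lesioning procedure).

```python
import numpy as np

def ablate(weights, fraction, seed=0):
    """Simulate damage by zeroing a random fraction of a module's
    hidden-unit weight rows (one row per hidden unit)."""
    rng = np.random.default_rng(seed)
    damaged = weights.copy()
    n = weights.shape[0]
    kill = rng.choice(n, size=int(round(fraction * n)), replace=False)
    damaged[kill] = 0.0
    return damaged

W = np.ones((20, 8))                    # toy module: 20 hidden units, 8 inputs
for frac in (0.0, 0.25, 0.5):
    Wd = ablate(W, frac)
    print(frac, int((Wd.sum(axis=1) == 0).sum()))   # 0, 5, then 10 units removed
```

Sweeping `frac` from 0 to 1 and re-measuring generalization accuracy reproduces the kind of damage curves shown in the figure.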

Page 62: What can computational models tell us about face processing?


Conclusions so far…
• There is a strong interaction between task and spatial frequency in the degree of specialization for face processing.

• The model suggests that the infant's low visual acuity and the need to discriminate between faces but not other objects could "lock in" a special face processor early in development.

• => General mechanisms (competition, known innate biases) could lead to a specialized face processing "module"
• No need for an innately specified processor


Outline
• Review of our model of face and object processing
• Some insights from modeling:
• What could "holistic processing" mean?
• Does a specialized processor for faces need to be innately specified?
• Why would a face area process BMW's?
• Some new directions:
• How do we select where to look next?
• How is information integrated across saccades?


Are you a perceptual expert? Take the expertise test!**

"Identify this object with the first name that comes to mind."

**Courtesy of Jim Tanaka, University of Victoria


“Car” - Not an expert

“2002 BMW Series 7” - Expert!


“Bird” or “Blue Bird” - Not an expert

“Indigo Bunting” - Expert!


“Face” or “Man” - Not an expert

“George Dubya”- Expert!


Greeble Experts (Gauthier et al. 1999)

• Subjects trained over many hours to recognize individual Greebles.

• Activation of the FFA increased for Greebles as the training proceeded.


The visual expertise mystery

If the so-called “Fusiform Face Area” (FFA) is specialized for face processing, then why would it also be used for cars, birds, dogs, or Greebles?

Our view: the FFA is an area associated with a process: fine level discrimination of homogeneous categories.

But the question remains: why would an area that presumably starts as a face area get recruited for these other visual tasks? Surely, they don’t share features, do they?

Sugimoto & Cottrell (2001), Proceedings of the Cognitive Science Society


Solving the mystery with models

Main idea:• There are multiple visual areas that could compete to be the Greeble expert - “basic” level areas and the “expert” (FFA) area.

• The expert area must use features that distinguish similar looking inputs -- that’s what makes it an expert

• Perhaps these features will be useful for other fine-level discrimination tasks.

We will create • Basic level models - trained to identify an object’s class

• Expert level models - trained to identify individual objects.

• Then we will put them in a race to become Greeble experts.

• Then we can deconstruct the winner to see why it won. Sugimoto & Cottrell (2001), Proceedings of the Cognitive Science Society


Model Database

• A network that can differentiate faces, books, cups and cans is a "basic level network."
• A network that can also differentiate individuals within ONE class (faces, cups, cans OR books) is an "expert."


Model

• Pretrain two groups of neural networks on different tasks.

• Compare the abilities to learn a new individual Greeble classification task.

[Diagram: two groups of networks share the same input and hidden layer. Expert networks output individual labels (Bob, Carol, Ted, ...) plus book, cup, and can; non-expert networks output the class labels face, book, cup, and can. Each is then given new output units Greeble1, Greeble2, Greeble3.]


Expertise begets expertise

• Learning to individuate cups, cans, books, or faces first, leads to faster learning of Greebles (can’t try this with kids!!!).

• The more expertise, the faster the learning of the new task!
• Hence in a competition with the object area, the FFA would win.
• If our parents were cans, the FCA (Fusiform Can Area) would win.

[Graph: amount of training required to become a Greeble expert vs. training time on the first task.]


Entry Level Shift: Subordinate RT decreases with training

(RT = uncertainty of response = 1.0 - max(output))

[Graphs: human data and network data; RT vs. number of training sessions, subordinate (---) vs. basic.]


How do experts learn the task?
• Expert level networks must be sensitive to within-class variation: representations must amplify small differences.
• Basic level networks must ignore within-class variation: representations should reduce differences.


Observing hidden layer representations
• Principal Components Analysis on hidden unit activations: PCA allows us to reduce the dimensionality (to 2) and plot representations.
• We can then observe how tightly clustered stimuli are in a low-dimensional subspace.
• We expect basic level networks to separate classes, but not individuals.
• We expect expert networks to separate classes and individuals.
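As a rough sketch of this analysis, hidden-unit activations (hypothetical random data here, standing in for the network's) can be projected onto their first two principal components via the SVD:

```python
import numpy as np

# Hypothetical activations: rows = stimuli, columns = hidden units.
rng = np.random.default_rng(1)
hidden = rng.normal(size=(40, 50))           # 40 stimuli x 50 hidden units

# Center the data; the SVD of the centered matrix gives principal directions in Vt.
centered = hidden - hidden.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# 2-D coordinates, one row per stimulus; these are what get scatter-plotted,
# colored by class or individual, to see how tightly stimuli cluster.
coords = centered @ Vt[:2].T
print(coords.shape)                          # (40, 2)
```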


Subordinate level training magnifies small differences within object representations

[Figure: 2-D hidden representations at 1, 80, and 1280 epochs for the face-expert, basic, and Greeble networks.]


Greeble representations are spread out prior to Greeble training

[Figure: 2-D hidden representations of Greebles in the face-expert and basic networks before Greeble training.]


Variability decreases learning time

[Graph: Greeble learning time vs. Greeble variance prior to learning Greebles (r = -0.834).]


Examining the Net’s Representations

• We want to visualize “receptive fields” in the network.

• But the Gabor magnitude representation is noninvertible.

• We can learn an approximate inverse mapping, however.

• We used linear regression to find the best linear combination of Gabor magnitude principal components for each image pixel.

• Then projecting each hidden unit’s weight vector into image space with the same mapping visualizes its “receptive field.”
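A minimal sketch of this inversion, with hypothetical random data standing in for the Gabor-magnitude principal components and the image pixels:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical training data: PCA-space codes and corresponding pixels.
n_images, n_components, n_pixels = 200, 40, 64 * 64
pcs = rng.normal(size=(n_images, n_components))   # Gabor-PCA code per image
pixels = rng.normal(size=(n_images, n_pixels))    # corresponding image pixels

# Least-squares linear regression: find M minimizing ||pcs @ M - pixels||^2.
# Each column of M is the best linear combination of components for one pixel.
M, *_ = np.linalg.lstsq(pcs, pixels, rcond=None)

# A hidden unit's incoming weight vector lives in PCA space; pushing it through
# the same mapping yields an image we can display as its "receptive field."
hidden_weights = rng.normal(size=n_components)
receptive_field = (hidden_weights @ M).reshape(64, 64)
print(receptive_field.shape)   # (64, 64)
```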


Two hidden unit receptive fields

[Images: receptive fields of hidden units 16 and 36, after training as a face expert and after further training on Greebles.]

NOTE: These are not face-specific!


Controlling for the number of classes

• We obtained 13 classes from hemera.com:

• 10 of these are learned at the basic level.

• 10 faces, each with 8 expressions, make the expert task

• 3 (lamps, ships, swords) are used for the novel expertise task.


Results: Pre-training
• New initial tasks of similar difficulty: in previous work, the basic level task was much easier.
• These are the learning curves for the 10 object classes and the 10 faces.


Results
• As before, experts still learned new expert level tasks faster.

[Graph: number of epochs to learn swords after learning faces or objects, vs. number of training epochs on faces or objects.]


Outline
• Review of our model of face and object processing
• Some insights from modeling:
• What could "holistic processing" mean?
• Does a specialized processor for faces need to be innately specified?
• Why would a face area process BMW's?
• Some new directions:
• How do we select where to look next?
• How is information integrated across saccades?


Issues I haven't addressed…
1. Development: what is the trajectory of the system from infant to adult? How do representations change over development?
2. How do earlier acquired representations differ from later ones? I.e., what is the representational basis of Age of Acquisition effects?
3. How do representations change based on familiarity?
4. Does the FFA participate in basic level processing?
5. Dynamics of expertise: eye movements
   1. How do they change with expertise?
   2. Are there visual routines for different tasks?
   3. How much does the stimulus influence eye movements? I.e., how flexible are the routines?
   4. How do we decide where to look next?
6. How are samples integrated across saccades?


How do we decide where to look next?
• Both bottom up and top down influences:
• Local stimulus complexity == "interestingness"
• Task requirements: look for discriminative features
• We've looked at at least two ideas:
1. Gabor filter response variance
2. Mutual information between the features and the categories


[Images: interest points created using Gabor filter variance.]


Where do we look next #2: Mutual Information

• Ullman et al. (2002) proposed that features of intermediate complexity are best for classification.

• They used mutual info. between patches in images and categories to find patches that were good discriminators:• Faces vs. non-faces; Cars vs. non-cars• They found that medium-sized and medium-resolution patches were best for these tasks.

• Our question: what features are best for subordinate-level classification tasks that need expertise, like facial identity recognition?

• We found that traditional features such as eyes, noses, and mouths are informative for identity ONLY in the context of each other: I.e., in a configuration.

• Conclusion: Holistic processing develops because “it is good.”


Ullman et al 2002

• Features of intermediate complexity (size and resolution) are best for classification.

• These were determined by computing the mutual information between an image patch and the class


Facial Identity Classification

• Will features that are good for telling faces from objects be good for identification?

• We expect that more specific features will be needed for face identification.


Data Set• We used 36 frontal images of 6 individuals (6 images each) from FERET [Phillips et al., 1998]. The images were aligned.

• Gabor filter responses were extracted from rectangular grids


Patches

• Rectangular patches of different centers, sizes and Gabor filter frequencies were taken from images.


Corresponding Patches• Patches are defined as “corresponding” when

they are in the same position, size and Gabor filter frequency across images.

• If a “Fred patch” matches the corresponding patch in another image, this is evidence for the “Fredness” of the new image.

• We can then use some measure of how many Fred patches match, and a threshold, to decide if this face is “Fred.”


Mutual Information

• How useful the patches were for face identification was measured by mutual information:• I(C,F) = H(C) - H(C|F)

• C, F are binary variables standing for class and feature • C=1 when the image is of the individual• F=1 when the patch is present in the image
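For binary C and F this quantity is easy to compute from a 2x2 joint distribution over class and feature presence; the two example tables below are made up to show the extremes:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def mutual_information(joint):
    """I(C;F) = H(C) - H(C|F), where joint[c, f] = P(C=c, F=f)."""
    pf = joint.sum(axis=0)                     # marginal over feature presence
    h_c = entropy(joint.sum(axis=1))           # H(C) from the class marginal
    # H(C|F) = sum_f P(F=f) * H(C | F=f)
    h_c_given_f = sum(pf[f] * entropy(joint[:, f] / pf[f])
                      for f in range(2) if pf[f] > 0)
    return h_c - h_c_given_f

# A perfectly diagnostic patch: present exactly when the image is of the person.
perfect = np.array([[0.5, 0.0],
                    [0.0, 0.5]])
# A useless patch: present half the time regardless of class.
useless = np.array([[0.25, 0.25],
                    [0.25, 0.25]])
print(mutual_information(perfect))   # 1.0 bit
print(mutual_information(useless))   # 0.0 bits
```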


Results: Best Patches

The 6 patches with the highest mutual information. Frequencies 1 through 5 denote the highest to the lowest Gabor filter frequency.

These are similar to each other because we do not eliminate redundancy in the patches (they are not independent).


Conclusions so far…

• Against intuition, local features of eyes and mouths by themselves are not very informative for face identity.

• Local features need to be processed in medium-sized face areas for identification, where they appear in a particular configuration with other features.

• This may explain why holistic processing has developed for face processing: simply because it is good, or even necessary, for identification.


Integration across saccades• Now, given these patches sampled from an image, what to do with them?

• Joyca LaCroix’s (2004) Natural Input Memory (NIM) model of recognition memory:• At study, sample the image at random points. Store the patches.

• At test, sample the new image at random points.• Count how many stored patches fall inside a ball of radius R around the new patches. The average of this is the recognition score.

• This is a kernel density estimation model, like the GCM, but the exemplars are patches, not whole images.

• I.e., the “NIM” answer to the integration problem is:

Don’t integrate!• NIM is a natural partner to our eye movement modeling.


Implications of the NIM model

• What would this mean for expertise?• Lots of experience -> lots of fragments -> better discrimination

• Familiarity also means lots of fragments, under many lighting conditions, all associated with one name

• Augmentation with an interest operator (e.g., look at high variance points on the face (see previous slides and Yamada & Cottrell 1994)) could easily lead to parts-based representations!


Wrap up• We are able to explain a variety of results in face processing.

• We have a mechanistic way of talking about “holistic processing.”

• How a specialized area might arise for faces, and why low spatial frequencies (LSF) appear to be important in face processing (specialization model: LSF -> better learning and generalization).

• Why a face area would be recruited to be a Greeble area: expert level (fine discrimination) processing leads to highly differentiated features useful for other discrimination tasks.

• And…we have plans to go beyond simple passive recognition models…


END