Post on 13-Jan-2016

“Now! – That should clear up a few things around here!”

The Challenge of Recognition

The Importance of Recognition

Navigation

Social interactions

Sexual selection

Foraging

Offspring care

Danger avoidance

Pavlovian conditioning

Object recognition

The Importance of Recognition

“Not only did Dr. P fail to see faces, but he saw faces when there were no faces to see. In the street he might pat the heads of water hydrants and parking meters, taking these to be the heads of children; he would amiably address carved knobs on the furniture and be astounded when they did not reply. Such incidents multiplied, causing embarrassment, perplexity and fear.”

From ‘The Man Who Mistook His Wife for a Hat’ by Dr. Oliver Sacks

Brain Mechanisms of Recognition

Pawan Sinha
Department of Brain and Cognitive Sciences

MIT

• Current understanding
• Ongoing research
• Real-world applications
• What the future holds

Current Understanding – Lesion Studies

Which parts of the brain are involved in visual recognition?

Initial clues: Klüver-Bucy syndrome (1939)

Temporal lobe lesions in humans and monkeys lead to:
1. Visual agnosia, and
2. Hypersexuality

Responses of a Patient with Temporal Lobe Damage

(Farah, 1995)

Current Understanding – Electrophysiology

Face-cell 1

Face-cell 2

Desimone, 1984

Are there specific neurons involved in visual recognition?

Current Understanding – Brain Imaging

Kanwisher et al, 1997

Are there specific brain regions involved in visual recognition?

“Face area”

Current Understanding – Summary

The temporal lobe is involved in visual recognition.

So what?

This doesn’t tell us how the brain recognizes objects.

How Does the Brain Recognize Objects?

Primary Visual Cortex

Oriented bar and edge detector neurons

Hubel and Wiesel (1977)

A Proposal for How the Brain Recognizes Objects

Edges!

Further processing

Marr (1979)

A Proposal for How the Brain Recognizes Objects

Image → Edge map → Binocular processing → 3D estimate → Recognition!

A Proposal for How the Brain Recognizes Objects

What underlies the researchers’ fascination for edges?

Fine edges, but not the coarse structure, are expected to be invariant to imaging variations.

In principle, this can make the recognition task very easy.

The Problem with Edges…

In practice, fine edges turn out to be highly unstable.

Even after more than two decades of research, we have been unable to create a robust recognition system based on the proposed model.

The Problem with the Rest of the Model…

Image → Edge map → Binocular processing → 3D estimate → Recognition!

Recent experimental results suggest that recognition may precede the intermediate steps.

Sinha & Poggio, Nature, 1996
Jones, Sinha, Poggio & Vetter, Current Biology, 1997
Bulthoff, Bulthoff & Sinha, Nature Neuroscience, 1998

The Million Dollar Question…

If recognition has to happen before the image can be finely analyzed, what then is the minimum image information that suffices for recognition?

Image → Edge map → Binocular processing → 3D estimate → Recognition!

?

Reducing Image Information

An ecologically sound method: progressive blurring (equivalent to recognition at increasing distances)

[Plot: recognition performance versus amount of blur, with a criterion level marked]
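A progressive blur series of the kind described above can be sketched in a few lines. This is only an illustration, not the procedure used in the experiments: the slides give the blur amount as the "radius of Gaussian", and treating that radius as the standard deviation `sigma` is an assumption, as are the function names `gaussian_kernel`, `blur` and `blur_series`.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Return a 1-D Gaussian kernel (truncated at 3 sigma) that sums to 1."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def blur(image, sigma):
    """Blur a 2-D grayscale image with a separable Gaussian of width sigma."""
    image = np.asarray(image, dtype=float)
    if sigma <= 0:
        return image.copy()  # blur level 0 is the unblurred image
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    # Replicate border pixels so the output keeps the input's size.
    padded = np.pad(image, radius, mode="edge")
    # Filter along rows, then along columns ("valid" trims the padding).
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def blur_series(image, sigmas):
    """Return one blurred copy of the image per blur level (assumed: sigma)."""
    return [blur(image, sigma) for sigma in sigmas]
```

Calling `blur_series(image, [0, 2, 4, 6, 8])` on a grayscale array would give a stimulus set at increasing blur levels; larger values remove progressively more fine detail, much as viewing from farther away would.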

Face Detection – Experimental Protocol

Subject’s task: Given an image, to determine whether it is a face.

Targets: faces

Distractors: random patterns, symmetric patterns, false alarms from an artificial detection system

All stimuli are presented at several blur levels.

Face Detection – Results

[Plot: face detection performance (%, axis 0 to 120) versus blur amount (radius of Gaussian, 0 to 16). Curves: Face Hit Rate, FP FA, Symm FA, Random FA. Reference lines: Hit Rate Criterion, FA Criterion.]

Face Detection – Analysis

Are there any useful invariants at such high levels of blur?

Yes! Sinha, 1994, 1995; Lipson, Grimson, Sinha, 1997

Concept learning algorithms

Ratio-template: A stable face-signature that comprises pairwise ordinal brightness relationships

http://www.ai.mit.edu/projects/cbcl/web-pawan/cartoon/cartoon.html
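To make the idea of pairwise ordinal brightness relationships concrete, here is a minimal sketch. The region layout and the list of relations below are hypothetical placeholders chosen for illustration (eyes and mouth darker than forehead and cheeks); they are not the template from the cited work, and the scoring rule (fraction of relations satisfied) is likewise an assumption.

```python
import numpy as np

# Hypothetical regions on a 12-row x 10-column patch: (row slice, column slice).
REGIONS = {
    "forehead": (slice(0, 3), slice(1, 9)),
    "left_eye": (slice(3, 5), slice(1, 4)),
    "right_eye": (slice(3, 5), slice(6, 9)),
    "left_cheek": (slice(6, 9), slice(1, 4)),
    "right_cheek": (slice(6, 9), slice(6, 9)),
    "mouth": (slice(9, 11), slice(3, 7)),
}

# Hypothetical ordinal relations: (region expected brighter, region expected darker).
RELATIONS = [
    ("forehead", "left_eye"),
    ("forehead", "right_eye"),
    ("left_cheek", "left_eye"),
    ("right_cheek", "right_eye"),
    ("left_cheek", "mouth"),
    ("right_cheek", "mouth"),
]


def region_means(patch):
    """Average brightness of every region in the patch."""
    patch = np.asarray(patch, dtype=float)
    return {name: patch[index].mean() for name, index in REGIONS.items()}


def ratio_template_score(patch, relations=RELATIONS):
    """Fraction of the ordinal brightness relations that the patch satisfies."""
    means = region_means(patch)
    satisfied = sum(means[brighter] > means[darker] for brighter, darker in relations)
    return satisfied / len(relations)
```

Because only the ordering of region brightnesses matters, the score is unchanged when the whole patch is made brighter or dimmer, which is the kind of stability under imaging variations that the slide refers to.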

Face Detection – Analysis (contd.)

Does the brain use a ‘ratio-template’-like invariant for detecting faces?

There is no direct evidence yet. However, there is some indirect evidence.

1. Neurons in the visual cortex have the properties needed to implement this model.

2. Computer implementations of this model yield good performance.

Performance of Ratio-templates

Benefits of Low-Resolution Approach

1. Permits face detection at a distance
2. Is robust to image degradations
3. Can generalize across facial variations
4. Is computationally simple

An Application of Our Face-detection System

The Nielsen People Meter

Beyond Mere Detection – Face Recognition

Low-resolution images may suffice for detection, but, surely, we must have fine detailed information for recognition…

A popular approach to face recognition – feature matching

Prediction of such an approach:

[Plot: predicted recognition performance versus amount of blur]

Face Recognition – Experimental Protocol

Subject’s task: To recognize celebrity images subjected to different levels of blur.

Stimuli: Blur series for 36 celebrity faces.

Face Recognition - Results

[Plot: percent correct (0 to 100) versus blur level (REF, 0 to 14). Series: Full_Face_NC_11, Internal_Intact_7, Internal_Broken_7, REF_Intact, REF_Broken. Data labels in extraction order: 29.3%, 37.9%, 56.1%, 66.7%, 82.1%, 87.4%, 91.2%, 91.4%; 0%, 0%, 0%, 2.4%, 7.1%, 29.8%, 65.9%, 79%; 0%, 0%, 0%, 0%, 0%, 2%, 28.6%, 52.4%; 94.1%; 85.3%. Annotations: 10 x 12 pixels/face, 70 x 70 pixels/face.]

Face Recognition - inference

Overall face configuration supports much more robust recognition than individual features do.

“By and large, Dr. P recognized nobody: neither his family, nor his colleagues, nor his pupils. He recognized Einstein because he picked up the characteristic moustache, and the same thing happened with one or two other people. ‘Ach, Paul!’ he said, when shown a portrait of his brother. ‘That square jaw, those big teeth – I would know Paul anywhere!’”

From ‘The Man Who Mistook His Wife for a Hat’ by Dr. Oliver Sacks

Sinha & Poggio, Nature, 1996

The Two Million Dollar Question

Which aspects of facial configuration are important and which are not?

Caricaturists probably know the answer to this question…

…but it is difficult for them to articulate this intuitive knowledge, sometimes even to themselves.

Caricaturists in their own words…

“When I’m having difficulty caricaturing someone, I just keep drawing by doing ten, twenty, thirty, forty sketches of the subject…”

- Bill Plympton

“I once spent about eighteen hours trying to caricature Barry Manilow. It was frustrating not being able to draw someone who is so funny looking in the first place.”

- Taylor Jones

The Hirschfeld Project

An attempt to make explicit the intuitive knowledge that caricaturists possess and, in the process, to gain insights into the brain’s face recognition strategies.

The Hirschfeld Project - Goal

Given multiple caricatures corresponding to every face image in a large set…

[Diagram: m face images, each with n caricatures]

…the goal is to determine which facial measurements caricaturists consistently emphasize (or de-emphasize) and how the extent of distortion relates to the deviation of a given face from the population average.

The Hirschfeld Project - Caveats

“To effectively caricature a subject, I must feel that person’s personality. A caricature to me is not just a big nose, big ears and a big head on a little body – it is much more.”

- Gerald Scarfe, Caricaturist

“I wouldn’t be surprised if some day they would create a computer program to do caricature, but it would be terrible. What a caricaturist does involves his whole life experience, education and some indefinable thing that makes it all work.”

- Robert Grossman, Caricaturist

The Hirschfeld Project – in 10 Steps

Step 1: Caricature database creation (~50 caricaturists; ~100 faces of celebrities and others)

Step 2: Assessment of recognizability of caricatures

Step 3: Database digitization

Step 4: Measurements for each database entry

Step 5: Average face construction and measurement

Step 6: Over-complete attribute set creation for each database entry

Attributes (a0, a1, a2, …, an): point coordinates, lengths, ratios of lengths, angles, areas, ratios of areas

Step 7: Determination of ‘deviations’ of input face attributes w.r.t. average face

$d_k = \dfrac{a_k^{\mathrm{inp}}}{a_k^{\mathrm{avg}}} - 1$

$d_k = 0$: face attribute same as average
$d_k < 0$: face attribute smaller than average
$d_k > 0$: face attribute larger than average

Step 8: Determination of attribute exaggeration in caricatures

$e_k = \dfrac{a_k^{\mathrm{caric}}}{a_k^{\mathrm{avg}}} - 1$
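Steps 7 and 8 use the same formula, applied once to the input face and once to its caricature. A small numerical sketch may help; the measurements below are made up for illustration, and the helper name `relative_deviation` is an assumption rather than anything named in the slides.

```python
import numpy as np


def relative_deviation(attributes, average_attributes):
    """Return a_k / a_k_avg - 1 for every attribute k."""
    a = np.asarray(attributes, dtype=float)
    avg = np.asarray(average_attributes, dtype=float)
    return a / avg - 1.0


# Made-up values for two attributes (say, one length and one ratio of lengths).
a_avg = np.array([60.0, 0.50])    # average face
a_inp = np.array([66.0, 0.45])    # input face
a_caric = np.array([78.0, 0.35])  # caricature of the input face

d = relative_deviation(a_inp, a_avg)    # Step 7: deviation of the input face
e = relative_deviation(a_caric, a_avg)  # Step 8: exaggeration in the caricature
```

In this made-up case the first attribute is 10% larger than average in the face and 30% larger in the caricature, while the second is 10% smaller in the face and 30% smaller in the caricature, so both deviations are exaggerated in the direction they already had.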

Step 9: Plot e vs. d for each attribute across all inputs and perform linear regression.

[Plot: e versus d, with best linear fit]

The slope of the regression line provides a measure of the emphasis assigned to an attribute.

[Plot: e versus d, with regression lines labeled Emphasis, De-emphasis, Anti-emphasis]

Step 10: Rank order attributes using regression line slopes. This provides an estimate of their relative salience to the caricaturist and, perhaps, to the visual system.
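The regression and ranking in Steps 9 and 10 could be sketched as follows. This assumes the deviations d and exaggerations e have already been collected into arrays with one row per face and one column per attribute; the function names and the attribute names used in the usage note are hypothetical, and ordinary least squares via `np.polyfit` is an assumed choice of fitting method.

```python
import numpy as np


def emphasis_slopes(d, e):
    """Fit e = slope * d + intercept for each attribute and return the slopes.

    d and e have shape (n_faces, n_attributes).
    """
    d = np.asarray(d, dtype=float)
    e = np.asarray(e, dtype=float)
    slopes = []
    for k in range(d.shape[1]):
        # Degree-1 least-squares fit; polyfit returns [slope, intercept].
        slope, _intercept = np.polyfit(d[:, k], e[:, k], 1)
        slopes.append(slope)
    return np.array(slopes)


def rank_attributes(names, slopes):
    """Return (name, slope) pairs ordered from largest to smallest slope."""
    order = np.argsort(slopes)[::-1]
    return [(names[i], float(slopes[i])) for i in order]
```

For example, `rank_attributes(["eye_spacing", "nose_length"], emphasis_slopes(d, e))` would list first the attribute whose caricature exaggeration grows fastest with its deviation from the average face.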

The Hirschfeld Project – Early Results

Prediction: The width and height dimensions of a face can be independently scaled without adversely affecting recognizability.

[Diagram: face with measurements labeled a through i]

Important attributes: $A_{\mathrm{hair}}/A_{\mathrm{face}}$, a/b, c/d, e/d, f/g, h/g, i/(g+h)
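One possible reading of why this prediction follows: if each important attribute is a ratio of two lengths measured along the same axis, or a ratio of two areas, then stretching the face independently in x and y leaves every such ratio unchanged. The slides do not say which direction each of a through i is measured in, so the landmarks, outlines and attribute definitions below are purely hypothetical and serve only to illustrate the arithmetic.

```python
import numpy as np


def scale_xy(points, sx, sy):
    """Scale x coordinates by sx and y coordinates by sy."""
    return np.asarray(points, dtype=float) * np.array([sx, sy])


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    p = np.asarray(points, dtype=float)
    x, y = p[:, 0], p[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def attributes(landmarks, hair_outline, face_outline):
    """Hypothetical attributes: same-axis length ratios, an area ratio, one mixed ratio."""
    eye_l, eye_r, mouth_l, mouth_r, nose, chin = np.asarray(landmarks, dtype=float)
    return {
        "width_ratio": (eye_r[0] - eye_l[0]) / (mouth_r[0] - mouth_l[0]),
        "height_ratio": (nose[1] - eye_l[1]) / (chin[1] - nose[1]),
        "area_ratio": polygon_area(hair_outline) / polygon_area(face_outline),
        # Mixes a horizontal and a vertical length, so it is not preserved.
        "aspect": (eye_r[0] - eye_l[0]) / (chin[1] - eye_l[1]),
    }


# Made-up coordinates (x, y): eyes, mouth corners, nose tip, chin.
LANDMARKS = [(30, 40), (70, 40), (38, 80), (62, 80), (50, 62), (50, 100)]
HAIR = [(20, 0), (80, 0), (80, 30), (20, 30)]
FACE = [(20, 30), (80, 30), (80, 100), (20, 100)]
```

Comparing `attributes(LANDMARKS, HAIR, FACE)` with the same call on coordinates passed through `scale_xy(..., 1.3, 0.8)` shows the same-axis and area ratios unchanged, while the mixed `aspect` ratio shifts by the factor 1.3/0.8.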

Testing the Prediction of Independent Scalability in x & y

Artificially Created Caricatures

We can also specialize this approach to generate caricatures in the style of a particular caricaturist.

One day, the Hirschfeld Project may allow us to create an artificial Mr. Hirschfeld.

Besides yielding clues about the brain’s recognition strategies, these results also provide a prescription for automatically creating caricatures.

The Hirschfeld Project - Hurdles

Getting good caricaturists to help populate the database.

sinha@ai.mit.edu

Interim Summary

1. The human brain can recognize objects well even in very low-resolution images.
2. In contrast to previous proposals, we are formulating a recognition scheme designed to utilize low-resolution image information. We call this the ‘Configural Recognition Scheme’.
3. We have made some headway on the task of face-detection and are currently exploring face-recognition.

Understanding the brain’s strategies for recognition may allow us to create useful artificial vision systems…

Pedestrian Detection

Collaboration with T. Poggio
Partly funded by Daimler-Benz

Driver’s attentional window

Achtung!

Pedestrian Detection – Early Results

Logo Search

The USPTO has more than 2 million logos on file. New logo registration requires a search to prevent design infringement.

Existing logo search method:

Retrieval via numerical design annotation system (USPTO)

Five pointed star: 01-01-03

Logo Search - Challenges

Problem: Many logos do not have simple annotations!

Logo Search - Results

Query

Query

Industrial Inspection

3000 components/PC board

60 seconds for inspection

Yield rate: 20%

Industrial Inspection - Results

Venturing into the Real World…

1st

MIT $50K Entrepreneurship Competition

What Does the Future Hold?

- A great deal more research
- More sophisticated artificial systems

The COG project:
Brooks, Scassellati et al.
MIT AI Lab

What Does the Future Hold?

Face detection and tracking by Cog using Ratio-templates

Scassellati, 1998

What Does the Future Hold?

Face motion imitation by Cog

What Does the Future Hold?

- Better psycho-forensic systems

Current IdentiKit systems use a piecemeal approach:

Some IdentiKit composites generated by a police operator:

What Does the Future Hold?

- Better tools for visual information management

[Plot: number of digital images on the web by year (tick labels 1992, 1993, 1994, 1995, 1996, 1997, 1999), with a 100 million level marked]

What Does the Future Hold?

Current tools for visual information management

If an image is worth a thousand words, how can we hope to describe it with just a few?

Textual annotations

What Does the Future Hold? (contd.)

Desirable features for an image retriever

- Graphical queries (“bring me images like this one”)
- No annotation required (content-based search)
- Quantitative measure of perceptual similarity

What Does the Future Hold? (contd.)

A content-based image retriever

What Does the Future Hold? (contd.)

Non-configural image retrievers

What Does the Future Hold?

- Better tools for visual information filtering (the flip side of searching)

[Diagram: training data and web images feed a content-based image filter, which outputs filtered content]

What Does the Future Hold?

- Understanding recognition in the other sensory modalities

Summary and Conclusion

• Current understanding
• Ongoing research
  • Face detection
  • Face recognition
• Real-world applications
• The future holds very exciting prospects
