parsing images with context/content sensitive grammars eran borenstein, stuart geman, ya jin, wei...

40
Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Upload: harold-godfrey-morgan

Post on 13-Dec-2015

227 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Parsing Images with Context/Content Sensitive

Grammars

Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Page 2: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

I. Structured Representation in Neural Systems

II. Vision is Hard

III. Why is Vision Hard?

IV. Hierarchies of Reusable Parts

V. Demonstration System: Reading License Plates

VI. Generalization: Face Detection

Page 3: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Artificial Intelligence

• Knowledge Engineering

• Learning Theory

• Both Lack Model

engineer everything, learn nothing

engineer nothing, learn everything

Page 4: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Natural Intelligence

• Strong Representation

• Hierarchy and Reusability

simulation and semantics

ventral visual pathway, linguistics, compositionality

Page 5: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

I. Structured Representation in Neural Systems

II. Vision is Hard

III. Why is Vision Hard?

IV. Hierarchies of Reusable Parts

V. Demonstration System: Reading License Plates

VI. Generalization: Face Detection

Page 6: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Machines still can’t reliably read license plates

License plate images from Logan Airport

Page 7: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Machines can’t read fixed-font fixed-scale characters as well as humans

Wafer ID’s

Page 8: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Machines can’t find the bad guys at the Super Bowl

Super Bowl

Page 9: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

I. Structured Representation in Neural Systems

II. Vision is Hard

III. Why is Vision Hard?

IV. Hierarchies of Reusable Parts

V. Demonstration System: Reading License Plates

VI. Generalization: Face Detection

Page 11: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Background is structured, and made of the same stuff!

“Clutter”

Human Interactive Proofs

Page 12: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

I. Structured Representation in Neural Systems

II. Vision is Hard

III. Why is Vision Hard?

IV. Hierarchies of Reusable Parts

V. Demonstration System: Reading License Plates

VI. Generalization: Face Detection

Page 13: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

e.g. discontinuities, gradient

e.g. linelets, curvelets, T-junctions

e.g. contours, intermediate objects

e.g. animals, trees, rocks

Hierarchical of Reusable Parts

“Bricks”

Page 14: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Hierarchy of Disjunctions of Conjunctions

Page 15: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Hierarchy of Disjunctions of Conjunctions

Page 16: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Hierarchy of Disjunctions of Conjunctions

Page 17: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Hierarchy of Disjunctions of Conjunctions

Page 18: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Hierarchy of Disjunctions of Conjunctions

Page 19: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Hierarchy of Disjunctions of Conjunctions

Page 20: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Hierarchy of Disjunctions of Conjunctions

Page 21: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Interpretations and Probabilities

Interpretation

I

selected subgraph

Page 22: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Interpretations and Probabilities

selected subgraph

IInterpretation

Page 23: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Interpretations and Probabilities

)(IP GRAPHICAL MODEL (Markov)

X LIKELIHOOD RATIO (non-Markov)

Page 24: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Generative (Bayesian) Model

Page 25: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

I. Structured Representation in Neural Systems

II. Vision is Hard

III. Why is Vision Hard?

IV. Hierarchies of Reusable Parts

V. Demonstration System: Reading License Plates

VI. Generalization: Face Detection

Page 26: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Test set: 385 images, mostly from Logan Airport

Courtesy of Visics Corporation

Page 27: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

characters, plate sides

generic letter, generic number, L-junctions of sides

license plates

Architecture

parts of characters, parts of plate sides

plate boundaries, strings (2 letters, 3 digits, 3 letters, 4 digits)

license numbers (3 digits + 3 letters, 4 digits + 2 letters)

Page 28: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Original Image Top object

Top 10 objects Top 25 objects

Image interpretation

Page 29: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Test image Top objects

Image interpretation

Page 30: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

• 385 images

• Six plates read with mistakes (>98%)

• Approx. 99.5% characters read correctly

• Zero false positives

Performance

Page 31: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Original image Zoomed license region

Top object under Markov distribution

Top object under content-sensitive distribution

Efficient discrimination: Markov versus Content-Sensitive dist.

Page 32: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

9 active “8” bricks under whole model 1 active “8” brick under parts model

Test image

Efficient discrimination: testing objects against their parts

Page 33: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Vision is Content Sensitive

Summary

Background is Structured, and Made of the Same Stuff

Non-Markovian probability models

Objects come equipped with their own background models

Page 34: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

I. Structured Representation in Neural Systems

II. Vision is Hard

III. Why is Vision Hard?

IV. Hierarchies of Reusable Parts

V. Demonstration System: Reading License Plates

VI. Generalization: Face Detection

Page 35: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Plates Face Detection

Rigid Deformable

“Black/White” Data Model Intensity Model

Hand-Crafted Probabilities Learned Probabilities

Page 36: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Face Hierarchy

Page 37: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang
Page 38: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Sampling from Data Model 0.6 1

Page 39: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

Sampling faces from the distribution

Page 40: Parsing Images with Context/Content Sensitive Grammars Eran Borenstein, Stuart Geman, Ya Jin, Wei Zhang

PATTERN SYNTHESIS

= PATTERN RECOGNITION

Ulf Grenander