Carsten Rother Microsoft Research Cambridge


Page 1: Carsten Rother Microsoft Research Cambridge

Carsten Rother Microsoft Research Cambridge

Page 2: Carsten Rother Microsoft Research Cambridge

~140 employees (~100 Researchers, ~30 RSDEs, ~10 Admin)

Six different groups:

Computer-Mediated Living

Machine Learning & Perception

Cambridge Innovation Development

Computational Science

Programming Principles & Tools

Systems & Networking

Page 3: Carsten Rother Microsoft Research Cambridge

• Computer Vision group: medical vision, recognition, reconstruction, image editing, …

• Machine learning group: Infer.Net, Online Services and Advertisement, Xbox Ranking

• Constrained Reasoning group: Planning and Optimization

• Socio-Digital Systems: Understanding human needs for future technology

• Sensors and Devices: SenseCam, Gadgeteer, …

• Interactive 3D Technologies group

Page 4: Carsten Rother Microsoft Research Cambridge

Machine Learning

Hardware design

Human studies

I3D mission: new user experiences

Graphics

Computer Vision

Intersection workshop (May 2012, Cambridge): http://research.microsoft.com/en-us/events/intersection12/

Page 7: Carsten Rother Microsoft Research Cambridge

• All factors in the graph are trees

• Discriminative training of millions of parameters

• We can handle many loss functions

Decision/Regression Trees + Random Fields

Page 8: Carsten Rother Microsoft Research Cambridge

Discrete labelling tasks:

Noisy input Ours [Zoran, Weiss, ICCV ‘11]

Continuous labelling tasks:

Test input Ground Truth

Trees Trees & Field

Page 9: Carsten Rother Microsoft Research Cambridge

• PatchMatch stereo, BMVC ’11

• PatchMatchBP stereo, BMVC ’12

• SurfaceStereo, ObjectStereo, CVPR ’10, ’11 – a review

• SceneStereo, ECCV ’12

• Learning interactive image segmentation, IJCV ’12

Page 12: Carsten Rother Microsoft Research Cambridge

Left view Right view

Page 13: Carsten Rother Microsoft Research Cambridge

Depth map

Page 14: Carsten Rother Microsoft Research Cambridge

Left view Right view

Page 15: Carsten Rother Microsoft Research Cambridge

Local stereo matching: check photo-consistency over a rectangular region (patch)
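
A minimal sketch of this kind of local matching, assuming rectified grayscale images stored as nested lists, integer disparities, and a sum-of-absolute-differences cost. The names `patch_cost` and `local_stereo` and the winner-take-all selection are illustrative choices, not taken from the talk.

```python
def patch_cost(left, right, x, y, d, r):
    """Sum of absolute differences between the patch around (x, y) in the
    left image and the patch around (x - d, y) in the right image."""
    h, w = len(left), len(left[0])
    cost = 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy = min(max(y + dy, 0), h - 1)        # clamp at the image border
            xl = min(max(x + dx, 0), w - 1)
            xr = min(max(x + dx - d, 0), w - 1)
            cost += abs(left[yy][xl] - right[yy][xr])
    return cost


def local_stereo(left, right, max_disp, r=1):
    """Winner-take-all: per pixel, keep the integer disparity with lowest cost."""
    h, w = len(left), len(left[0])
    return [[min(range(max_disp + 1),
                 key=lambda d: patch_cost(left, right, x, y, d, r))
             for x in range(w)] for y in range(h)]
```

Every pixel tests every disparity in `0..max_disp`, and the patch is assumed to lie on a fronto-parallel surface with a single integer disparity.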

Page 17: Carsten Rother Microsoft Research Cambridge

Fails at discontinuities

Fails at non-fronto-parallel planes

No continuous depth label

Slow

Page 20: Carsten Rother Microsoft Research Cambridge

Adaptive support weights [Yoon, CVPR ‘05]
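
As a rough illustration of adaptive support weights: each pixel q in the patch is weighted by how similar it is in colour, and how close it is in position, to the centre pixel p, using w(p, q) = exp(-(Δc/γc + Δg/γp)). The sketch below assumes colours and positions are plain tuples; the `gamma_c` and `gamma_p` defaults and the helper `aggregated_cost` are placeholders for illustration, not values from the talk.

```python
import math


def support_weight(color_p, color_q, pos_p, pos_q, gamma_c=10.0, gamma_p=10.0):
    """Weight of pixel q in the support window of pixel p."""
    dc = math.dist(color_p, color_q)   # colour difference
    dg = math.dist(pos_p, pos_q)       # spatial distance
    return math.exp(-(dc / gamma_c + dg / gamma_p))


def aggregated_cost(center, neighbours, gamma_c=10.0, gamma_p=10.0):
    """center: (color, pos); neighbours: list of (color, pos, raw_cost).
    Returns the weight-normalised matching cost over the window."""
    num = den = 0.0
    for color, pos, raw in neighbours:
        w = support_weight(center[0], color, center[1], pos, gamma_c, gamma_p)
        num += w * raw
        den += w
    return num / den
```

Pixels across a colour edge receive almost no weight, which is why this weighting helps at depth discontinuities.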

Page 23: Carsten Rother Microsoft Research Cambridge

3 continuous parameters (depth + normal) for each pixel
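
One way to read the three parameters, sketched under the assumption that the plane lives in disparity space: a point (x0, y0, d0) on the plane plus a normal (nx, ny, nz) give a plane d(x, y) = a·x + b·y + c. The function names are illustrative.

```python
def plane_from_normal(x0, y0, d0, normal):
    """Convert 'disparity d0 at pixel (x0, y0) plus normal' into (a, b, c)
    such that d(x, y) = a * x + b * y + c. Requires normal[2] != 0."""
    nx, ny, nz = normal
    a = -nx / nz
    b = -ny / nz
    c = (nx * x0 + ny * y0 + nz * d0) / nz
    return a, b, c


def disparity(plane, x, y):
    """Evaluate the plane at pixel (x, y); the result is a continuous value."""
    a, b, c = plane
    return a * x + b * y + c
```

A normal of (0, 0, 1) reproduces the fronto-parallel case with constant disparity; any other normal gives a slanted surface with sub-pixel disparities.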

Page 27: Carsten Rother Microsoft Research Cambridge

Depth map

Page 30: Carsten Rother Microsoft Research Cambridge

A red pixel means that a better solution exists in its 4-neighbourhood

Page 36: Carsten Rother Microsoft Research Cambridge

1. Random initialization

2. Go through pixels in sequential order:

2a. Consider the solution from the left/top neighbour

2b. Sample around the current solution

Left image: Reindeer (Middlebury)

Left and right disparity maps (intermediate step of iteration 1)
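
The steps above can be sketched as follows. This is a simplified toy, not the talk's implementation: each pixel carries a single scalar label instead of a 3-parameter plane, the matching cost is passed in as a hypothetical callback `cost(x, y, label)`, and the halving search radius in step 2b is an assumption.

```python
import random


def patchmatch(width, height, cost, lo, hi, iters=3, seed=0):
    rng = random.Random(seed)
    # 1. Random initialization
    lab = [[rng.uniform(lo, hi) for _ in range(width)] for _ in range(height)]
    best = [[cost(x, y, lab[y][x]) for x in range(width)] for y in range(height)]

    def try_label(x, y, cand):
        # Keep the candidate only if it lowers the cost at this pixel.
        c = cost(x, y, cand)
        if c < best[y][x]:
            lab[y][x], best[y][x] = cand, c

    for _ in range(iters):
        # 2. Go through pixels in sequential (raster) order
        for y in range(height):
            for x in range(width):
                # 2a. Consider the solution from the left/top neighbour
                if x > 0:
                    try_label(x, y, lab[y][x - 1])
                if y > 0:
                    try_label(x, y, lab[y - 1][x])
                # 2b. Sample around the current solution, shrinking the radius
                radius = (hi - lo) / 2
                while radius > 1e-3:
                    cand = lab[y][x] + rng.uniform(-radius, radius)
                    try_label(x, y, min(max(cand, lo), hi))
                    radius /= 2
    return lab
```

Because a candidate is accepted only when it lowers the per-pixel cost, the total cost never increases from one pass to the next. Published PatchMatch variants typically also reverse the scan order on alternate iterations so that right/bottom neighbours can propagate; that is omitted here to match the slide.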

Page 38: Carsten Rother Microsoft Research Cambridge

Left image: Sawtooth (Middlebury)

Ground truth disparities

The image consists of 3 planes: ~80,000 guesses for the yellow plane

Randomization is in our favour

No cost volume needed: well suited for large images and large depth range

Page 39: Carsten Rother Microsoft Research Cambridge

Left view Right view

Page 40: Carsten Rother Microsoft Research Cambridge

PatchMatch Stereo result

Page 41: Carsten Rother Microsoft Research Cambridge

Add a Markov Random Field:

Unary term (photo-consistency)

Pairwise term (local curvature)

Continuous, 3-dimensional labels

Page 42: Carsten Rother Microsoft Research Cambridge

Cost ≠ 0: local curvature or discontinuity

Cost = 0: both planes are aligned in 3D
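
The slide does not spell out the pairwise term. One plausible form, given here purely as an assumption, compares the two pixels' planes at both pixel positions: the cost is zero when the planes agree there and grows with curvature or a depth jump. Planes are assumed to be (a, b, c) triples with d(x, y) = a·x + b·y + c; the names are illustrative.

```python
def plane_disparity(plane, x, y):
    a, b, c = plane
    return a * x + b * y + c


def pairwise_cost(plane_s, pos_s, plane_t, pos_t):
    """Hypothetical curvature/discontinuity penalty between neighbours s and t:
    disagreement of the two planes, evaluated at both pixel positions."""
    xs, ys = pos_s
    xt, yt = pos_t
    return (abs(plane_disparity(plane_s, xs, ys) - plane_disparity(plane_t, xs, ys))
            + abs(plane_disparity(plane_s, xt, yt) - plane_disparity(plane_t, xt, yt)))
```

Two neighbouring pixels on the same slanted plane pay nothing under this term, whereas a penalty on the disparity difference alone would charge them for the slant.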

Page 43: Carsten Rother Microsoft Research Cambridge

So far, we have been running with λ = 0

For non-zero λ, with super high-dimensional u:

Gradient descent

Gradient descent + Fusion move

Relaxation + Gradient descent

Simulated Annealing

Continuous Belief Propagation

Page 44: Carsten Rother Microsoft Research Cambridge

Sequential schedule (messages M1->2, M2->3, M1->4)

Operation 1: compute the neg-log belief at node s

Operation 2: re-compute the message from t to s

Final output: $u_s^* = \arg\min_{u_s} B_s(u_s)$
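
To make the two operations concrete, here is a small min-sum (neg-log max-product) sketch on a 1-D chain with a finite label set. The chain topology, the discrete labels, and the function name are simplifying assumptions; the talk applies the idea to an image grid with continuous labels.

```python
def min_sum_chain(unary, pairwise):
    """unary[s][k]: cost of label k at node s; pairwise(k, l): smoothness cost.
    Returns the minimising label index per node (exact on a chain)."""
    n, num_labels = len(unary), len(unary[0])
    labels = range(num_labels)
    fwd = [[0.0] * num_labels for _ in range(n)]   # message into s from s-1
    bwd = [[0.0] * num_labels for _ in range(n)]   # message into s from s+1
    # Operation 2: re-compute messages with a sequential schedule.
    for s in range(1, n):
        for k in labels:
            fwd[s][k] = min(unary[s - 1][l] + fwd[s - 1][l] + pairwise(l, k)
                            for l in labels)
    for s in range(n - 2, -1, -1):
        for k in labels:
            bwd[s][k] = min(unary[s + 1][l] + bwd[s + 1][l] + pairwise(l, k)
                            for l in labels)
    # Operation 1: neg-log belief = unary + all incoming messages.
    beliefs = [[unary[s][k] + fwd[s][k] + bwd[s][k] for k in labels]
               for s in range(n)]
    # Final output: label with the smallest neg-log belief at each node.
    return [min(labels, key=b.__getitem__) for b in beliefs]
```

With a strong enough smoothness cost, a node whose unary term weakly prefers a different label than its neighbours is pulled to agree with them.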

Page 45: Carsten Rother Microsoft Research Cambridge

target

Page 46: Carsten Rother Microsoft Research Cambridge

Source (shifted 4.0 + noise)

Ground Truth

Page 47: Carsten Rother Microsoft Research Cambridge

Error: 0.618; Unary only

Error: 0.251

Ground Truth

12x12 discrete labels

Page 48: Carsten Rother Microsoft Research Cambridge

target

Page 49: Carsten Rother Microsoft Research Cambridge

Source (shifted 4.2 + noise)

GT

Page 50: Carsten Rother Microsoft Research Cambridge

Error: 0.66

Error: 1.9; unary only

GT

12x12 discrete labels

Page 51: Carsten Rother Microsoft Research Cambridge

Error: 5.68

Error: 3.46; unary only

GT

12x12 discrete labels

Page 52: Carsten Rother Microsoft Research Cambridge

Sequential schedule (messages M1->2, M2->3, M1->4)

Each pixel has a different set of particles:

[Figure: particles $u_s$ and $u_t$ on [0, 1] at nodes s and t, with neg-log beliefs $B_s(u_s)$ and $B_t(u_t)$]

Comment: we do max-product, hence we may not want to approximate the true continuous distributions

Page 53: Carsten Rother Microsoft Research Cambridge

Sequential schedule (messages M1->2, M2->3, M1->4)

[Figure: particles $u_t$ and $u_s$ on [0, 1] at nodes t and s]

Smoothness term: $(u_s - u_t)^2$

Page 54: Carsten Rother Microsoft Research Cambridge

Sequential schedule (messages M1->2, M2->3, M1->4)

Sample around current particles

[Figure: particles $u_s$ on [0, 1] at node s]

Final output: $u_s^* = \arg\min_{u_s} B_s(u_s)$

Page 55: Carsten Rother Microsoft Research Cambridge

GT

Discrete: Error: 5.68

Random init: Energy: 47308; Error: 0.9713

Best unary init (144 discrete): Energy: 42628; Error: 0.8259

Page 56: Carsten Rother Microsoft Research Cambridge

The message $M_{t \to s}$ has high values for s = t, since the smoothness term is $(u_s - u_t)^2$

PM idea: also sample at your neighbours’ solutions!

We call this variant of Particle BP “PatchMatch BP” (PMBP)
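
A toy sketch of the PMBP idea on a 1-D chain with labels in [0, 1], under several assumptions not stated in the talk: a quadratic unary term around hypothetical observations `obs`, Gaussian perturbations with width `sigma`, alternating sweep directions, and messages stored only at each node's current particles. All names are illustrative.

```python
import random


def pmbp_chain(obs, lam=1.0, K=3, sweeps=5, sigma=0.1, seed=0, init=None):
    """Approximately minimise sum_s (u_s - obs[s])**2 + lam * sum_s (u_s - u_{s+1})**2."""
    rng = random.Random(seed)
    n = len(obs)
    # Each node keeps its own set of K particles.
    P = [[rng.random() if init is None else init[s] for _ in range(K)]
         for s in range(n)]
    # inc[s][t][j]: stored message M_{t->s} evaluated at particle P[s][j].
    inc = [{t: [0.0] * K for t in (s - 1, s + 1) if 0 <= t < n} for s in range(n)]

    def unary(s, u):
        return (u - obs[s]) ** 2

    def message(t, s, u):
        # M_{t->s}(u): minimise over the particles of t.
        r = 2 * t - s                      # the neighbour of t that is not s
        return min(unary(t, p) + (inc[t][r][j] if r in inc[t] else 0.0)
                   + lam * (p - u) ** 2 for j, p in enumerate(P[t]))

    def update(s):
        nbrs = list(inc[s])
        cands = list(P[s])                 # current particles
        for t in nbrs:
            cands += P[t]                  # PM idea: the neighbours' solutions
        cands += [min(max(p + rng.gauss(0.0, sigma), 0.0), 1.0)
                  for p in P[s]]           # sample around current particles
        scored = []
        for u in cands:
            msgs = {t: message(t, s, u) for t in nbrs}
            scored.append((unary(s, u) + sum(msgs.values()), u, msgs))  # neg-log belief
        scored.sort(key=lambda item: item[0])
        keep = scored[:K]                  # keep the K best candidates
        P[s] = [u for _, u, _ in keep]
        for t in nbrs:
            inc[s][t] = [m[t] for _, _, m in keep]

    for it in range(sweeps):
        order = range(n) if it % 2 == 0 else range(n - 1, -1, -1)
        for s in order:
            update(s)
    return [P[s][0] for s in range(n)]     # best particle at each node's last update
```

Setting `lam=0` and `K=1` reduces each update to "take the best of my own label, my neighbours' labels, and a nearby random sample", which is the PatchMatch update.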

Page 57: Carsten Rother Microsoft Research Cambridge

GT

Best unary init: Energy: 42628; Error: 0.8259

Random init, 50 particles: Energy: 21959; Error: 0.4159

Random init, 1 particle: Energy: 22593; Error: 0.3864

Page 58: Carsten Rother Microsoft Research Cambridge

1 particle: Energy: 22593; Error: 0.3864

50 particles: Energy: 21959; Error: 0.4159

Page 60: Carsten Rother Microsoft Research Cambridge

PatchMatch is a special form of Particle BP:

λ = 0

1 particle per node

Sample from neighbour nodes

Page 62: Carsten Rother Microsoft Research Cambridge

Iterate two steps (in a nutshell):

1) Run full BP until convergence (convex version which solves the LP relaxation)

2) Sample all nodes individually

Page 63: Carsten Rother Microsoft Research Cambridge

Highly ranked in Middlebury Table

Page 66: Carsten Rother Microsoft Research Cambridge

• PatchMatch stereo, BMVC ’11

• PatchMatchBP stereo, BMVC ’12

• SurfaceStereo, ObjectStereo, CVPR ’10, ’11

• SceneStereo, ECCV ’12

• Learning interactive image segmentation, IJCV ’12

Page 67: Carsten Rother Microsoft Research Cambridge

Ultimate Goal:
• Recover: geometry, light, material
• Recognise: object instances, attributes
• … and do that jointly

Theoretical Challenges:
• Statistical models of the world and the captured images
• Combines statistical priors and physical constraints

Practical Challenges:
• Robustness
• Real-time inference
• Task-driven, e.g. robotics

To achieve this:
• Latest machine learning
• Latest optimization techniques

Page 68: Carsten Rother Microsoft Research Cambridge

Assignment of pixels to surfaces

Simple explanation: describe the scene by a few low-degree surfaces (splines, planes)

Goal: depth estimation improves

Without prior With prior

Page 69: Carsten Rother Microsoft Research Cambridge

Simple explanation: describe the scene by a few objects:
- compact in 3D
- connected in 3D
- each object has a color model

Goal: depth estimation improves

Objects o

Depth d

Objects o

Page 70: Carsten Rother Microsoft Research Cambridge

Simple explanation: describe the scene by a few objects:
- compact in 3D (use bbox)
- each object has a color model
- physical constraints

Goals:
1) depth estimation improves
2) object extraction improves

Page 71: Carsten Rother Microsoft Research Cambridge

1) Create proposal pool

2) Rank proposal pool

3) Combine best objects and recognize

Use stereo images

boat

sky

water

Goals:

• Reason in 3D with physical constraints

• Improve depth estimation

Page 72: Carsten Rother Microsoft Research Cambridge

Left input image

Object labelling proposal 1

Object labelling proposal 2

Output:
- Object labelling
- Depth labelling
- Object 3D bounding boxes
- Object colour distribution

Page 73: Carsten Rother Microsoft Research Cambridge

Stereo: photo-consistency

Objects: colour model

Prior on number of objects

Left input image PatchMatch Stereo Result

Object mask

Depth map

Page 74: Carsten Rother Microsoft Research Cambridge

Physical properties:

Bounding Box tightness

Bounding Box intersection

Bounding Box Gravity
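
As an illustration of how a bounding-box intersection term could be measured, the sketch below computes the overlap volume of two axis-aligned 3D boxes. Axis alignment, the box representation, and the function name are assumptions made for this example; the talk does not specify them.

```python
def box_intersection_volume(box_a, box_b):
    """Each box is ((xmin, ymin, zmin), (xmax, ymax, zmax)).
    Returns the volume shared by the two boxes (0.0 if they do not overlap)."""
    volume = 1.0
    for axis in range(3):
        lo = max(box_a[0][axis], box_b[0][axis])
        hi = min(box_a[1][axis], box_b[1][axis])
        if hi <= lo:
            return 0.0          # separated along this axis
        volume *= hi - lo
    return volume
```

A penalty proportional to this volume would discourage two solid objects from occupying the same space.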

Page 75: Carsten Rother Microsoft Research Cambridge

Merging (simulated annealing, patchmatch)

Exploration (mean-shift, patchmatch)

Object maps

Multiple Scene Proposals by varying the prior on number of objects

Page 76: Carsten Rother Microsoft Research Cambridge

Good rank in Middlebury table

Green: this term is useful

All Terms are useful

Page 77: Carsten Rother Microsoft Research Cambridge

[Figure: input images, ground truth, and our labelling; comparison of 2D, Object stereo, ours, and GT]

Page 78: Carsten Rother Microsoft Research Cambridge

Large Scale Train and Test

Real-time

Do full 3D reconstruction (KinectFusion)

Model all physical properties: Light, Material

Use a graphics engine for training and testing (“analysis by synthesis”)

Page 79: Carsten Rother Microsoft Research Cambridge

• PatchMatch stereo, BMVC ’11

• PatchMatchBP stereo, BMVC ’12

• SurfaceStereo, ObjectStereo, CVPR ’10, ’11

• SceneStereo, ECCV ’12

• Learning interactive image segmentation, IJCV ’12

Page 80: Carsten Rother Microsoft Research Cambridge

Weights w

Training Time

How much user input shall we use for learning?

predictions

Testing Time

prediction

prediction

Page 81: Carsten Rother Microsoft Research Cambridge

Static brush

Static trimap

Training Time Testing Time

Goal: User should reach a satisfying result in as few interactions as possible

Define: “interaction” and “satisfying”

Page 82: Carsten Rother Microsoft Research Cambridge

Human (averaged over 6 users)

Computer (simulated brush strokes)

Algorithmic State

Suggested action

Ground Truth

Current Solution

Page 83: Carsten Rother Microsoft Research Cambridge

What type of user? (novice user, advanced user)

Adjusting weights with the learning curve of the user

Other interactive systems

Page 84: Carsten Rother Microsoft Research Cambridge

• PatchMatch stereo, BMVC ’11

• PatchMatchBP stereo, BMVC ’12

• SurfaceStereo, ObjectStereo, CVPR ’10, ’11

• SceneStereo, ECCV ’12

• Learning interactive image segmentation, IJCV ’12