Multi-View Stereo for Community Photo Collections Michael Goesele, Noah Snavely, Brian Curless, Hugues Hoppe, Steven M. Seitz

Post on 24-Feb-2016

Page 2: Multi-View Stereo for Community Photo Collections

Photos vary substantially in lighting, foreground clutter, and scale because of different cameras, times, and weather

Page 3: Multi-View Stereo for Community Photo Collections

Images of Notre Dame (sampling rate varies by a factor of more than 1,000)

Page 4: Multi-View Stereo for Community Photo Collections

Images taken in the wild—wide variety

Lots of photographers

Different cameras

Sampling rates

Occlusion

Different time of day, weather

Post processing

Page 5: Multi-View Stereo for Community Photo Collections

The problem statement

Design an adaptive view selection process
- Given the massive number of images, find a compatible subset

Multi-view stereo (MVS)
- Reconstruct robust and accurate depth maps from this subset

Page 6: Multi-View Stereo for Community Photo Collections

Previous work

Global view selection: assume a relatively uniform viewpoint distribution and simply choose the k nearest images for each reference view

Local view selection: use shiftable windows in time to adaptively choose frames to match

Page 7: Multi-View Stereo for Community Photo Collections

Images in a community photo collection (CPC) are non-uniformly distributed in 7D viewpoint space (translation, rotation, focal length)
- represents an extreme case of unorganized image sets

Algorithm overview:
- Calibrating Internet Photos
- Global View Selection
- Local View Selection
- Multi-View Stereo Reconstruction

Page 8: Multi-View Stereo for Community Photo Collections

Calibrating Internet Photos
• PTLens extracts camera and lens information and corrects for radial distortion based on a database of camera and lens properties
• Discard images that cannot be corrected
• Remaining images are entered into a robust, metric structure-from-motion (SfM) system (uses the SIFT feature detector)
- generates a sparse scene reconstruction from the matched features
- for each feature, keeps a list of the images in which it was detected

Remove Radiometric Distortions
- convert all input images into a linear radiometric space (sRGB color space assumed)

Page 9: Multi-View Stereo for Community Photo Collections

Global View Selection

For each reference view R, global view selection seeks a set N of neighboring views that are good candidates for stereo matching in terms of scene content, appearance, and scale

SIFT selects features with similar appearance
- Shared feature points: collocation problem
- Scale invariance: stereo matching problem

A measure is needed that deals with both problems

Page 10: Multi-View Stereo for Community Photo Collections

Global score gR for each view V within a candidate neighborhood N (which includes R)

FV: set of feature points in View V

FV ∩ FR: common feature points of View V and R

wN(f): measures the angular separation of the views at f; the larger the value, the wider the separation

ws(f): measures the similarity in scale of the two views at f; the larger the value, the more similar the scale

Page 11: Multi-View Stereo for Community Photo Collections

Calculating wN(f)

α is the angle between the lines of sight from Vi and Vj to f

αmax set to 10 degrees
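
The weighting formula on this slide did not survive extraction. The sketch below assumes the form given in the published paper: w_α = min((α/αmax)², 1) for each pair of views, multiplied over all pairs to give wN(f). The function names and the representation of each view by its 3D camera center are illustrative choices, not taken from the slides.

```python
import math
from itertools import combinations

ALPHA_MAX_DEG = 10.0  # value quoted on the slide


def angle_deg(center_i, center_j, f):
    """Angle in degrees between the lines of sight from two camera centers to point f."""
    di = [a - b for a, b in zip(f, center_i)]
    dj = [a - b for a, b in zip(f, center_j)]
    dot = sum(a * b for a, b in zip(di, dj))
    ni = math.sqrt(sum(a * a for a in di))
    nj = math.sqrt(sum(a * a for a in dj))
    cos_a = max(-1.0, min(1.0, dot / (ni * nj)))  # clamp against rounding error
    return math.degrees(math.acos(cos_a))


def w_alpha(alpha_deg, alpha_max_deg=ALPHA_MAX_DEG):
    """Assumed pairwise weight: small angles are penalized, large ones saturate at 1."""
    return min((alpha_deg / alpha_max_deg) ** 2, 1.0)


def w_N(f, centers):
    """Assumed wN(f): product of the pairwise weights over all pairs of views."""
    weight = 1.0
    for ci, cj in combinations(centers, 2):
        weight *= w_alpha(angle_deg(ci, cj, f))
    return weight
```

With αmax = 10 degrees, two views separated by 5 degrees at f contribute a factor of 0.25, while any pair separated by 10 degrees or more contributes 1.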

Page 12: Multi-View Stereo for Community Photo Collections

Calculating ws(f)

r = sR(f) / sV(f)

sV(f): diameter of a sphere centered at f whose projected diameter in view V equals the pixel spacing in V (sR(f) is defined the same way for view R)

- favors the case 1 ≤ r < 2
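
The slide's formula for ws(f) is missing from the transcript. The sketch below assumes the piecewise form from the published paper (2/r for r ≥ 2, 1 for 1 ≤ r < 2, r for r < 1) and a global score that sums wN(f)·ws(f) over the shared features FV ∩ FR. The function names and the tuple layout are illustrative.

```python
def w_s(s_R, s_V):
    """Assumed scale weight for the ratio r = s_R / s_V."""
    r = s_R / s_V
    if r >= 2.0:
        return 2.0 / r   # V has much finer resolution than R
    if r >= 1.0:
        return 1.0       # favored case: 1 <= r < 2
    return r             # V has coarser resolution than R


def global_score(shared_features):
    """Sum of w_N * w_s over features seen in both V and R.

    shared_features: iterable of (w_N, s_R, s_V) tuples, one per shared feature.
    """
    return sum(w_n * w_s(s_r, s_v) for w_n, s_r, s_v in shared_features)
```

For example, a feature with r = 1.5 gets full weight, while a feature with r = 4 or r = 0.5 is weighted by 0.5.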

Page 13: Multi-View Stereo for Community Photo Collections

Sum the scores of all shared feature points for each view V and select the top-scoring views as the neighborhood N

Rescaling views

Find the lowest-resolution view Vmin relative to R

If scaleR(Vmin) is smaller than the threshold 0.6 (a 5x5 window in R maps to a 3x3 window in Vmin), resample R

Resample every view with scaleR(V) > 1.2 to match the scale of R
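
The rescaling rule can be sketched as a small planning function. This is an interpretation under stated assumptions: scaleR(V) is treated as a plain number per view, R is shrunk just enough that scaleR(Vmin) reaches 0.6, and the 1.2 test is applied to the scales after R has been resampled. The function name and the returned resize factors are hypothetical.

```python
def plan_rescaling(scales, t_low=0.6, t_high=1.2):
    """Return (resize factor for R, {view: resize factor}) from scale_R(V) values.

    scales: dict mapping a view name to scale_R(V).
    A factor below 1 means the image is downsampled.
    """
    v_min = min(scales, key=scales.get)
    ref_factor = 1.0
    if scales[v_min] < t_low:
        # Shrink R so that scale_R(V_min) becomes exactly t_low afterwards.
        ref_factor = scales[v_min] / t_low

    view_factors = {}
    for view, s in scales.items():
        s_after = s / ref_factor          # scale relative to the resampled R
        if s_after > t_high:
            view_factors[view] = 1.0 / s_after  # bring V down to the scale of R
    return ref_factor, view_factors
```

With scales of 0.3, 0.9 and 1.0, R is halved, after which the second and third views exceed 1.2 and are downsampled to match.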

Page 14: Multi-View Stereo for Community Photo Collections

Local View Selection

Global view selection determines a set N of good matching candidates for a reference view R

Select a smaller set A ⊆ N (|A| = 4) of active views for stereo matching at a particular location in the reference view

Page 16: Multi-View Stereo for Community Photo Collections

Stereo Matching

Use an n×n window centered on a point in R

Goal: maximize the photometric consistency between this patch and its projections into the neighboring views

Scene Geometry Model

Photometric Model

Page 17: Multi-View Stereo for Community Photo Collections

Scene Geometry Model

Window centered at pixel (s, t)

oR is the center of projection of view R

rR(s,t) is the normalized ray direction through the pixel

Pixel (s, t) in the reference view corresponds to a point xR(s,t) at a distance h(s,t) along the viewing ray rR(s,t)
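
The geometry described above amounts to xR(s,t) = oR + h(s,t)·rR(s,t). A minimal sketch of that relation follows; the function names and the use of plain tuples for vectors are illustrative assumptions.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def point_on_ray(o_R, r_R, h):
    """3D point at distance h from the center of projection o_R along unit ray r_R."""
    return tuple(o + h * r for o, r in zip(o_R, r_R))
```

For a camera at the origin looking down the z axis, a depth of 3 gives the point (0, 0, 3).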

Page 19: Multi-View Stereo for Community Photo Collections

Photometric Model

Simple model for reflectance effects—a color scale factor ck for each patch projected into the k-th neighboring view

- Models Lambertian reflectance for constant illumination over planar surfaces

- Fails for shadow boundaries, caustics, specular highlights, bumpy surfaces
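
One way to read the color scale factor ck: for a single color channel, it is the scalar that best maps the patch seen in neighboring view k onto the reference patch. The sketch below fits that scalar by least squares on flattened patches. It is an illustration only; the slides do not say how ck is estimated, and the function names are hypothetical.

```python
def fit_color_scale(ref_patch, nbr_patch):
    """Least-squares c minimizing sum((ref - c * nbr)^2) over one color channel."""
    num = sum(r * n for r, n in zip(ref_patch, nbr_patch))
    den = sum(n * n for n in nbr_patch)
    if den == 0:
        raise ValueError("neighbor patch is all zeros; scale is undefined")
    return num / den


def photometric_residual(ref_patch, nbr_patch, c):
    """Sum of squared differences after applying the color scale c."""
    return sum((r - c * n) ** 2 for r, n in zip(ref_patch, nbr_patch))
```

A patch that is uniformly twice as bright in the reference view yields c = 2 with zero residual, whereas a shadow edge crossing the patch leaves a residual that no single scale factor removes.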

Page 25: Multi-View Stereo for Community Photo Collections

Results

Several Internet CPCs gathered from Flickr, varying widely in size, number of photographers, and scale

Page 26: Multi-View Stereo for Community Photo Collections

Output

Page 31: Multi-View Stereo for Community Photo Collections

Thank you!

Page 32: Multi-View Stereo for Community Photo Collections

Reconstructing Building Interiors from Images

Yasutaka Furukawa, Brian Curless, Steven M. Seitz
University of Washington, Seattle, USA

Richard Szeliski
Microsoft Research, Redmond, USA

Page 33: Multi-View Stereo for Community Photo Collections

Reconstruction & Visualization of Architectural Scenes

• Manual (semi-automatic)
– Google Earth & Virtual Earth
– Façade [Debevec et al., 1996]
– CityEngine [Müller et al., 2006, 2007]

• Automatic
– Ground-level images [Cornelis et al., 2008, Pollefeys et al., 2008]
– Aerial images [Zebedin et al., 2008]

(Figures: Google Earth, Virtual Earth, Zebedin et al., Müller et al.)

Page 35: Multi-View Stereo for Community Photo Collections

Reconstruction & Visualization of Architectural Scenes

(Figures: Google Earth, Virtual Earth, Zebedin et al., Müller et al.)

Little attention paid to indoor scenes

Page 36: Multi-View Stereo for Community Photo Collections

Our Goal
• Fully automatic system for indoors/outdoors
– Reconstructs a simple 3D model from images
– Provides real-time interactive visualization

Page 37: Multi-View Stereo for Community Photo Collections

Challenges - Reconstruction

• Multi-view stereo (MVS) typically produces a dense model

• We want the model to be
– Simple for real-time interactive visualization of a large scene (e.g., a whole house)
– Accurate for high-quality image-based rendering

Page 38: Multi-View Stereo for Community Photo Collections

Challenges – Indoor Reconstruction

Texture-poor surfaces

Complicated visibility

Prevalence of thin structures (doors, walls, tables)

Page 39: Multi-View Stereo for Community Photo Collections

Outline

• System pipeline (system contribution)
• Algorithmic details (technical contribution)
• Experimental results
• Conclusion and future work

Page 40: Multi-View Stereo for Community Photo Collections

System pipeline

Images

Page 41: Multi-View Stereo for Community Photo Collections

System pipeline

Structure-from-Motion

Images

Bundler by Noah Snavely
Structure from Motion for unordered image collections
http://phototour.cs.washington.edu/bundler/

Page 42: Multi-View Stereo for Community Photo Collections

System pipeline

Images SFM

PMVS by Yasutaka Furukawa and Jean Ponce
Patch-based Multi-View Stereo Software
http://grail.cs.washington.edu/software/pmvs/

Multi-view Stereo

Page 43: Multi-View Stereo for Community Photo Collections

System pipeline

Images SFM MVS

Manhattan-world Stereo [Furukawa et al., CVPR 2009]

Page 48: Multi-View Stereo for Community Photo Collections

System pipeline

Images SFM MVS MWS

Axis-aligned depth map merging (our contribution)

Page 49: Multi-View Stereo for Community Photo Collections

System pipeline

Images SFM MVS MWS Merging

Rendering: simple view-dependent texture mapping

Page 51: Multi-View Stereo for Community Photo Collections

Axis-aligned Depth-map Merging

• Basic framework is similar to volumetric MRF [Vogiatzis 2005, Sinha 2007, Zach 2007, Hernández 2007]

Page 58: Multi-View Stereo for Community Photo Collections

Key Feature 1 - Penalty terms

Binary is smoothness
Unary encodes data

Binary penalty

Binary encodes smoothness & data
Unary is often constant (inflation)

Page 62: Multi-View Stereo for Community Photo Collections

Axis-aligned Depth-map Merging

• Align voxel grid with the dominant axes
• Data term (unary)
• Smoothness (binary)
• Graph-cuts

Page 64: Multi-View Stereo for Community Photo Collections

Kitchen - 22 images, 1364 triangles

Hall - 97 images, 3344 triangles

House - 148 images, 8196 triangles

Gallery - 492 images, 8302 triangles

Page 65: Multi-View Stereo for Community Photo Collections

Demo

Page 66: Multi-View Stereo for Community Photo Collections

Conclusion & Future Work

• Conclusion
– Fully automated 3D reconstruction/visualization system for architectural scenes
– Novel depth-map merging to produce piece-wise planar axis-aligned model with sub-voxel accuracy
• Future work
– Relax Manhattan-world assumption
– Larger scenes (e.g., a whole building)

Page 67: Multi-View Stereo for Community Photo Collections

Any Questions?

Images SFM MVS MWS Merging

Page 68: Multi-View Stereo for Community Photo Collections

KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera

Page 69: Multi-View Stereo for Community Photo Collections

[0] KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera, Microsoft Research

Page 70: Multi-View Stereo for Community Photo Collections

A) Depth Map Conversion

Reduce noise in the raw depth map (bilateral filtering [1]), then use the inferred camera intrinsic matrix to convert each depth pixel into a 3D point in camera coordinates.

[1] C. Tomasi, R. Manduchi, "Bilateral Filtering for Gray and Color Images", Sixth International Conference on Computer Vision, pp. 839-846, New Delhi, India, 1998.
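
The back-projection half of this step can be sketched as follows, assuming a standard pinhole model with focal lengths fx, fy and principal point (cx, cy). Those parameter names, the nested-list depth map, and the rule that non-positive depth means "no measurement" are assumptions made for illustration; the noise-reduction filter is not shown.

```python
def back_project(depth, fx, fy, cx, cy):
    """Convert a depth map (rows of depth values) into 3D points in camera coordinates.

    Pixels with depth <= 0 are treated as missing measurements and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Each valid pixel (u, v) with depth z becomes the point ((u - cx)·z/fx, (v - cy)·z/fy, z).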

Page 71: Multi-View Stereo for Community Photo Collections

B) Camera Tracking (ICP)

[2] Zhang, Zhengyou (1994). "Iterative point matching for registration of free-form curves and surfaces". International Journal of Computer Vision (Springer)
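
As a rough illustration of the iterative closest point (ICP) idea, the sketch below alternates nearest-neighbor matching with a closed-form rigid alignment. It is deliberately simplified to 2D point sets with brute-force matching, whereas the actual system tracks a camera in 3D from depth maps, so treat it as a teaching sketch rather than the method on the slide. All names are hypothetical.

```python
import math


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired 2D points src -> dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, (dx - (c * sx - s * sy), dy - (s * sx + c * sy))


def icp_2d(src, dst, iterations=10):
    """Repeatedly match each point to its nearest neighbor in dst and realign."""
    cur = list(src)
    for _ in range(iterations):
        matches = [min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in cur]
        theta, (tx, ty) = best_rigid_2d(cur, matches)
        c, s = math.cos(theta), math.sin(theta)
        cur = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in cur]
    return cur
```

Like any ICP variant, this only converges when the initial misalignment is small enough for nearest-neighbor matches to be mostly correct, which is the usual situation between consecutive frames.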

Page 72: Multi-View Stereo for Community Photo Collections

C) Volumetric Integration

[3] B. Curless and M. Levoy. A volumetric method for building complex models from range images. ACM Trans. Graph., 1996.

Signed distance field: divide the world into voxels; each voxel stores the signed distance to the nearest surface.

2D example
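
Since the 2D figure is not part of the transcript, here is a small stand-in: a grid of cells, each storing the signed distance to a circle. The circle, the grid layout, and the sign convention (negative inside, positive outside) are assumptions chosen for illustration; the truncation helper reflects the common practice of clamping distances far from the surface.

```python
import math


def circle_sdf_grid(size, cx, cy, radius):
    """size x size grid; cell [y][x] holds the signed distance to a circle's boundary."""
    return [[math.hypot(x - cx, y - cy) - radius for x in range(size)]
            for y in range(size)]


def truncate(grid, limit):
    """Clamp every distance to [-limit, +limit]."""
    return [[max(-limit, min(limit, d)) for d in row] for row in grid]
```

The surface is where the stored value crosses zero, so it can be located between cells even though the grid itself is coarse.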

Page 73: Multi-View Stereo for Community Photo Collections

3D example

Page 74: Multi-View Stereo for Community Photo Collections

D) Ray Casting

Page 75: Multi-View Stereo for Community Photo Collections

Demonstration

Page 76: Multi-View Stereo for Community Photo Collections

Building Rome in a Day

Paper Summary

Page 77: Multi-View Stereo for Community Photo Collections

Outcome
• A system that can reconstruct 3D geometry from large, unorganized collections of photographs
• Uses new distributed computer vision algorithms for image matching and 3D reconstruction
  • Algorithms designed to maximize parallelism at each stage of the pipeline
  • Algorithms designed to scale well with the size of the problem
  • Algorithms designed to scale well with the amount of available computation

Page 78: Multi-View Stereo for Community Photo Collections

Challenges

• Images collected from photo sharing websites
• Images are unstructured
  • Images taken in no specific order
  • No control over the distribution of camera viewpoints
• Images are uncalibrated
  • Different photographers
  • Different cameras
  • Little knowledge of camera settings for each image
• Scale of project is 2-3 orders of magnitude larger than with prior methods
• Algorithms must be fast to complete reconstruction in one day

Page 79: Multi-View Stereo for Community Photo Collections

Applications
• Government sector uses city models
  • Urban planning and visualization
• Academic disciplines use city models
  • History
  • Archeology
  • Geography
• Consumer mapping technology
  • Google Earth
  • GPS navigation systems
  • Online map sites

Page 80: Multi-View Stereo for Community Photo Collections

Recover 3D Geometry
• Given scene geometry and camera geometry, we can predict where the 2D projection of each point should be in each image, then compare these projections to the original measurements.
  • Scene geometry represented as 3D points
  • Camera geometry represented as 3D position and orientation for each camera
• Equations:

P(x, y, z) = (x/z, y/z)
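
The comparison between predicted projections and measurements can be sketched with the perspective division shown above. The example assumes the 3D point is already expressed in the camera's coordinate frame and ignores focal length and lens distortion; the function names are illustrative.

```python
import math


def project(point_cam):
    """Perspective projection of a 3D point in camera coordinates: (x/z, y/z)."""
    x, y, z = point_cam
    if z == 0:
        raise ValueError("point lies in the camera plane; projection is undefined")
    return (x / z, y / z)


def reprojection_error(point_cam, observed):
    """Distance between the predicted projection and the measured 2D location."""
    u, v = project(point_cam)
    return math.hypot(u - observed[0], v - observed[1])
```

Reconstruction then amounts to adjusting the points and cameras so that the sum of these errors over all observations is as small as possible.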

Page 81: Multi-View Stereo for Community Photo Collections

Correspondence Problem
• Definition: automatically estimate 2D correspondences between input images
• Detect the most distinctive, repeatable features in each image
• Match features across image pairs by finding similar-looking features using approximate nearest neighbor search
  • For each pair of images, insert the features of one image into a k-d tree
  • Use features from the second image as queries
  • For each query, if the nearest neighbor is sufficiently closer than the second-nearest neighbor, declare a match
• Clean up matches
  • Rigid scenes impose strong geometric constraints on the locations of matching features
  • 3x3 Fundamental Matrix, F, such that corresponding points xij, xik from images j and k satisfy:

xik^T F xij = 0
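
The two stages on this slide, ratio-test matching and the epipolar check, can be sketched in a few lines. Two simplifications are swapped in and should be read as assumptions: the nearest-neighbor search is brute force rather than an approximate k-d tree query, and the ratio threshold of 0.8 is an illustrative value that does not come from the slides. Function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ratio_test_matches(desc_first, desc_second, ratio=0.8):
    """Return (index in first image, index in second image) pairs that pass the ratio test."""
    matches = []
    if len(desc_first) < 2:
        return matches
    for q_idx, query in enumerate(desc_second):
        ranked = sorted(range(len(desc_first)),
                        key=lambda i: squared_distance(desc_first[i], query))
        d_best = squared_distance(desc_first[ranked[0]], query)
        d_second = squared_distance(desc_first[ranked[1]], query)
        # Accept only if the best match is clearly closer than the runner-up.
        if d_best < (ratio ** 2) * d_second:
            matches.append((ranked[0], q_idx))
    return matches


def epipolar_residual(F, x_j, x_k):
    """Value of x_k^T F x_j for 2D points in homogeneous form; zero for a perfect match."""
    xj = (x_j[0], x_j[1], 1.0)
    xk = (x_k[0], x_k[1], 1.0)
    Fx = [sum(F[r][c] * xj[c] for c in range(3)) for r in range(3)]
    return sum(xk[r] * Fx[r] for r in range(3))
```

Matches whose residual is far from zero for the estimated F are the ones the clean-up step discards.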

Page 82: Multi-View Stereo for Community Photo Collections

City Scale Matching
• Goal: find correspondences spanning the entire collection
• Solve as a graph estimation problem
  • “Match Graph”
  • Graph vertices = images
  • Graph edge exists between two vertices iff they are looking at the same part of the scene and have a sufficient number of feature matches
• Multiround scheme
  • In each round, propose a set of edges in the match graph
    • Whole Image Similarity
    • Query Expansion
  • Verify each edge through feature matching

Page 83: Multi-View Stereo for Community Photo Collections

City Scale Matching: Whole Image Similarity

• Used for first round edge proposal
• Metric to compute overall similarity of two images
  • Cluster features into visual words
  • Visual words weighted using the Term Frequency Inverse Document Frequency (TF-IDF) method
  • Apply document retrieval algorithms to match data sets
  • Each photo represented as a sparse histogram of visual words
  • Compare histograms by taking the inner product
• For each image, determine the k1 + k2 most similar images
• Verify the top k1 images
• Result: sparsely connected match graph
• Goal: minimize the number of connected components
  • For each image, consider the next k2 images and verify pairs which straddle different connected components
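
The histogram comparison can be sketched as follows. Each image is given as a list of visual-word ids; the particular TF-IDF variant (term frequency times log of inverse document frequency, followed by unit normalization) is a common textbook choice and an assumption here, since the slides do not spell out the weighting. Function names are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(images):
    """Turn lists of visual-word ids into sparse, unit-length TF-IDF vectors (dicts)."""
    n = len(images)
    doc_freq = Counter()
    for words in images:
        doc_freq.update(set(words))
    vectors = []
    for words in images:
        counts = Counter(words)
        vec = {w: (c / len(words)) * math.log(n / doc_freq[w])
               for w, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        if norm > 0:
            vec = {w: v / norm for w, v in vec.items()}
        vectors.append(vec)
    return vectors


def similarity(a, b):
    """Inner product of two sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(w, 0.0) for w, v in a.items())
```

Ranking all other images by this score for a given image yields the candidate list from which the top k1 and the next k2 are taken.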

Page 84: Multi-View Stereo for Community Photo Collections

City Scale Matching: Query Expansion

• Result from first round: sparse match graph, insufficiently dense to produce a good reconstruction
• Definition, Query Expansion: find all vertices within two steps of the query vertex
• If vertices i and k are both connected to j, propose that i and k are also connected
• Verify edge (i, k)
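
The proposal step of query expansion can be sketched on an adjacency-set graph. The dictionary representation and the function name are illustrative assumptions; verification of each proposed edge by feature matching is not shown.

```python
def propose_expansion_edges(graph):
    """Propose edges (i, k) for every pair i, k that share a verified neighbor j.

    graph: dict mapping each vertex to the set of its verified neighbors.
    Pairs that are already connected are not proposed again.
    """
    proposals = set()
    for j, neighbors in graph.items():
        ordered = sorted(neighbors)
        for a in range(len(ordered)):
            for b in range(a + 1, len(ordered)):
                i, k = ordered[a], ordered[b]
                if k not in graph.get(i, set()):
                    proposals.add((i, k))
    return proposals
```

On a chain 1-2-3-4 this proposes (1, 3) and (2, 4), the pairs that sit exactly two steps apart.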

Page 85: Multi-View Stereo for Community Photo Collections

City Scale Matching: Implementation

1. Pre-processing
2. Verification
3. Track Generation

• System runs on a cluster of computers (“nodes”)
• “Master node” makes job scheduling decisions

Page 86: Multi-View Stereo for Community Photo Collections

Implementation: Pre-processing
• Images distributed to cluster nodes in chunks of fixed size
• Node down-samples images to a fixed size
• Node extracts features

Page 87: Multi-View Stereo for Community Photo Collections

Implementation: Verification
• Use whole image similarity for first two rounds
• Use query expansion for remaining rounds
• Solve with greedy bin-packing algorithm
  • Bin = set of jobs sent to a node
  • Drawback: requires multiple sweeps over remaining image pairs
  • Solution: consider only a fixed-size subset of image pairs for scheduling

Page 88: Multi-View Stereo for Community Photo Collections

Implementation: Track Generation
• Definition (track): a group of features corresponding to a single 3D point
• Combine all pairwise matching information to generate consistent tracks across images
• Solved by finding connected components in a graph
  • Vertex = features in images
  • Edge = connect matching features
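
Finding connected components over pairwise matches is commonly done with a union-find structure; the sketch below uses one, although the slides do not say which method the system uses. Features are identified here by hypothetical (image, feature index) pairs.

```python
def build_tracks(matches):
    """Group matched features into tracks (connected components).

    matches: iterable of (feature_a, feature_b) pairs, each feature a hashable id.
    Returns a sorted list of tracks, each a sorted list of feature ids.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    tracks = {}
    for feature in list(parent):
        tracks.setdefault(find(feature), set()).add(feature)
    return sorted(sorted(t) for t in tracks.values())
```

Matches A0-B3 and B3-C1 end up in one track even though A0 and C1 were never matched directly, which is the point of combining the pairwise information.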

Page 89: Multi-View Stereo for Community Photo Collections

Recover camera poses
• Find and reconstruct the skeletal set, a minimal subset of photographs capturing the essential geometry of a scene
• Add remaining images to the scene by estimating each camera’s pose with respect to known 3D points matched to the image

Page 90: Multi-View Stereo for Community Photo Collections

Multiview Stereo
• Estimate depths for every pixel in every image
• Merge resulting 3D points into a single model
• Scale exceeds the ability of MVS algorithms
  • Group photos into clusters that each reconstruct part of the scene

Page 91: Multi-View Stereo for Community Photo Collections

Results