Visual 3D Modeling using Cameras and Camera Networks
Marc PollefeysUniversity of North Carolina at Chapel Hill
Talk outline
• Introduction
• Visual 3D modeling with a hand-held camera
– Acquisition of camera motion
– Acquisition of scene structure
– Constructing visual models
• Camera Networks
– Camera Network Calibration
– Camera Network Synchronization
– Towards Active Camera Networks…
• Conclusion
What can be achieved?
• Can we get 3D models from images?
• How much do we need to know about the camera?
• Can we freely move around? Hand-held?
• Do we need to keep parameters fixed? Zoom?
• What about auto-exposure?
• What about camera networks?
• Can we provide more flexible systems? Avoid calibration?
• What about using IP-based PTZ cameras? Hand-held camcorders?
• Unsynchronized or even asynchronous?
Talk outline
• Introduction
• Visual 3D modeling with a hand-held camera
– Acquisition of camera motion
– Acquisition of scene structure
– Constructing visual models
• Camera Networks
– Camera Network Calibration
– Camera Network Synchronization
– Towards Active Camera Networks…
• Conclusion
(Pollefeys et al. ’98)
(Pollefeys et al. ’04)
More efficient RANSAC
Fully projective
Bundle adjustment
Deal with dominant planes
Improved self-calibration
Polar stereo rectification
Deal with radial distortion
Volumetric 3D integration
Image-based rendering
Faster stereo algorithm
Video Key-frame selection
Deal with specularities
Deal with Auto-Exposure
Feature tracking/matching
• Shape-from-Photographs: match Harris corners
• Shape-from-Video: track KLT features
Problem: insufficient motion between consecutive video frames to compute the epipolar geometry accurately and to use it effectively as an outlier filter
Key-frame selection
• Select a key-frame when F yields a better model than H
– Use the Robust Geometric Information Criterion (GRIC = bad-fit penalty + model complexity) (Torr ’98)
– Given view i as a key-frame, pick as the next key-frame the first view j for which GRIC(Fij) < GRIC(Hij) (or a few views later)
[Plot: H-GRIC and F-GRIC as a function of view separation]
(Pollefeys et al.’02)
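The GRIC comparison can be sketched as follows, a minimal sketch following Torr ’98 in which lower GRIC means a better model; the residual values and noise level below are made-up for illustration:

```python
import math

def gric(residuals_sq, sigma2, d, k, r=4):
    """Geometric Robust Information Criterion (Torr '98): robustified
    bad-fit penalty plus model-complexity terms. Lower is better.
    d: dimension of the model manifold (H: 2, F: 3)
    k: number of model parameters (H: 8, F: 7)
    r: dimension of each data point (a point pair in two images -> 4)"""
    n = len(residuals_sq)
    lam1, lam2, lam3 = math.log(r), math.log(r * n), 2.0
    badness = sum(min(e / sigma2, lam3 * (r - d)) for e in residuals_sq)
    return badness + lam1 * d * n + lam2 * k

# Hypothetical squared residuals for 100 tracked features between views i, j:
res_h = [4.0] * 100   # homography fits poorly
res_f = [0.25] * 100  # epipolar geometry fits well
# View j becomes the next key-frame once F beats H:
is_keyframe = gric(res_f, 1.0, d=3, k=7) < gric(res_h, 1.0, d=2, k=8)
```

With enough camera translation the epipolar residuals drop while the homography residuals grow, which is exactly when the comparison flips.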
Epipolar geometry
[Figure: two-view geometry with camera centers C1 and C2, 3D point P, image points m1 and m2, epipolar lines l1 and l2, and epipoles e1 and e2]

m2ᵀ F m1 = 0

Fundamental matrix F (3×3 rank-2 matrix):
1. Computable from corresponding points
2. Simplifies matching
3. Allows detection of wrong matches
4. Related to calibration

Underlying structure in a set of matches for rigid scenes
Epipolar geometry computation:robust estimation (RANSAC)
Step 1. Extract features
Step 2. Compute a set of potential matches
Step 3. do
  Step 3.1 select a minimal sample (i.e. 7 matches) (generate hypothesis)
  Step 3.2 compute solution(s) for F
  Step 3.3 count inliers, if not promising stop (verify hypothesis)
until P(at least one all-inlier sample) = 1 − (1 − (#inliers/#matches)^7)^#samples > 95%

#inliers: 90% | 80% | 70% | 60% | 50%
#samples:  5  | 13  | 35  | 106 | 382

Step 4. Compute F based on all inliers
Step 5. Look for additional matches
Step 6. Refine F based on all correct matches
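The stopping criterion fixes the number of random samples as a function of the inlier ratio; a quick sketch that reproduces the table above:

```python
import math

def ransac_samples(inlier_ratio, sample_size=7, confidence=0.95):
    """Smallest N such that 1 - (1 - w^s)^N >= confidence, i.e. at least
    one all-inlier 7-point sample with the requested probability."""
    return math.ceil(math.log(1.0 - confidence)
                     / math.log(1.0 - inlier_ratio ** sample_size))

samples = {w: ransac_samples(w) for w in (0.9, 0.8, 0.7, 0.6, 0.5)}
# -> {0.9: 5, 0.8: 13, 0.7: 35, 0.6: 106, 0.5: 382}
```

The count explodes as the inlier ratio drops, which is why the early-abort check in Step 3.3 matters in practice.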
Epipolar geometry computation
The geometric relation between two views is fully described by the recovered 3×3 matrix F
Sequential Structure and Motion Computation
• Initialize motion (P1, P2 compatible with F)
• Initialize structure (minimize reprojection error)
• Extend motion (compute pose through matches seen in 2 or more previous views)
• Extend structure (initialize new structure, refine existing structure)
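The "initialize structure" step refines points by minimizing reprojection error; a common linear (DLT) triangulation provides the starting point. A sketch with hypothetical projection matrices and image points:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen at pixel x1 in camera
    P1 and x2 in camera P2; the result seeds the non-linear refinement."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # null vector of A is the homogeneous point
    X = Vt[-1]
    return X / X[3]

# Two hypothetical cameras one unit apart, observing a point at depth 5:
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X = triangulate_dlt(P1, P2, (0.0, 0.0), (-0.2, 0.0))
```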
Dealing with dominant planar scenes
• Uncalibrated structure-and-motion (USaM) fails when the common features all lie in a plane
• Solution, part 1: model selection to detect the problem
(Pollefeys et al., ECCV‘02)
Dealing with dominant planar scenes
• Uncalibrated structure-and-motion (USaM) fails when the common features all lie in a plane
• Solution, part 2: delay ambiguous computations until after self-calibration (couple self-calibration over all 3D parts)
(Pollefeys et al., ECCV‘02)
Refine Structure and Motion
• Use projective bundle adjustment
– Sparse bundle structure allows very efficient computation (2 levels)
– Take radial distortion into account (1 or 2 parameters)
Self-calibration using the absolute conic

Absolute conic projection: ωᵢ* ∝ Pᵢ Ω* Pᵢᵀ

Euclidean projection matrix: Pᵢ = Kᵢ [Rᵢ | tᵢ], hence ωᵢ* = Kᵢ Kᵢᵀ

Translate constraints on K (e.g. constant parameters, no skew, ...) through the projection equation to constraints on Ω*

Upgrade from projective to metric: transform structure and motion so that Ω* → diag(1,1,1,0)
(Faugeras ECCV’92; Triggs CVPR’97; Pollefeys et al. ICCV’98; etc.)
Practical linear self-calibration
ω* ∝ P Ω* Pᵀ ≈ K̂ K̂ᵀ = diag(f̂², f̂², 1), with K̂ = diag(f̂, f̂, 1)

These encode priors on K = [f s cx; 0 f cy; 0 0 1] after normalization: zero skew s, principal point (cx, cy) near the image center, aspect ratio near 1, and only a rough prior on f.

Don't treat all constraints equally after normalization! With pᵢᵀ the i-th row of P, the weighted linear constraints on Ω* are:

(1/0.01) · p1ᵀ Ω* p2 = 0 (zero skew)
(1/0.1) · p1ᵀ Ω* p3 = 0 (principal point near the image center)
(1/0.1) · p2ᵀ Ω* p3 = 0
(1/0.2) · (p1ᵀ Ω* p1 − p2ᵀ Ω* p2) = 0 (aspect ratio ≈ 1)
(relatively accurate for most cameras)

(1/9) · (p1ᵀ Ω* p1 − p3ᵀ Ω* p3) = 0
(1/9) · (p2ᵀ Ω* p2 − p3ᵀ Ω* p3) = 0
(only a rough approximation, but still useful to avoid degenerate configurations)

(Pollefeys et al., ECCV’02)

Caveat: when fixating a point at the image center, not only the absolute quadric diag(1,1,1,0) satisfies the ICCV’98 equations, but also diag(1,1,1,a), i.e. real or imaginary spheres!
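These equations are linear in the ten entries of the symmetric 4×4 dual absolute quadric Ω*. A sketch of how one camera contributes weighted rows to the linear system (the weights are taken from the slide; stacking the rows of all cameras and taking the smallest singular vector yields Ω*):

```python
import numpy as np

TRIU = np.triu_indices(4)

def daq_row(P, i, j):
    """Row r such that r @ vech(Omega*) = (P Omega* P^T)_{ij},
    with Omega* symmetric, stored as its 10 upper-triangle entries."""
    M = np.outer(P[i], P[j])
    M = 0.5 * (M + M.T)                                  # symmetrize
    scale = np.where(TRIU[0] == TRIU[1], 1.0, 2.0)       # off-diagonals appear twice
    return M[TRIU] * scale

def selfcal_rows(P):
    """Weighted constraints of one normalized camera on Omega*."""
    return np.array([
        (1 / 0.01) * daq_row(P, 0, 1),                       # zero skew
        (1 / 0.1) * daq_row(P, 0, 2),                        # principal point
        (1 / 0.1) * daq_row(P, 1, 2),                        #   near center
        (1 / 0.2) * (daq_row(P, 0, 0) - daq_row(P, 1, 1)),   # aspect ratio ~ 1
        (1 / 9.0) * (daq_row(P, 0, 0) - daq_row(P, 2, 2)),   # rough focal prior
        (1 / 9.0) * (daq_row(P, 1, 1) - daq_row(P, 2, 2)),
    ])

# A metric camera with K = I satisfies every equation exactly:
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
P = np.hstack([R, np.array([[0.2], [0.1], [1.0]])])
omega_metric = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0], float)  # vech(diag(1,1,1,0))
residual = selfcal_rows(P) @ omega_metric
```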
Refine Metric Structure and Motion
• Use metric bundle adjustment
– Use a Euclidean parameterization for the projection matrices
– Same sparseness advantages; also use radial distortion
Mixing real and virtual elements in video
Virtual reconstruction of ancient fountain
Preview fragment of the Sagalassos TV documentary. Similar to 2D3’s Boujou and RealViz’ MatchMover.
Intermezzo: Auto-calibration of Multi-Projector System
Hard because the screens are planar, but still possible (Raij and Pollefeys, submitted)
Stereo rectification
• Resample image to simplify matching process
Stereo rectification
• Resample image to simplify matching process
Also take into account radial distortion!
Polar stereo rectification
Standard homography-based approaches do not work here.
Solution: polar reparameterization of the images around the epipoles (Pollefeys et al. ICCV’99)
General iso-disparity surfaces(Pollefeys and Sinha, ECCV’04)
Example: polar rectification preserves disparities
Application: Active vision
Also interesting relation to human horopter
Stereo matching
Optimal path(dynamic programming )
Similarity measure(SSD or NCC)
Constraints• epipolar
• ordering
• uniqueness
• disparity limit
• disparity gradient limit
Trade-off
• Matching cost
• Discontinuities
(Cox et al. CVGIP’96; Koch ’96; Falkenhagen ’97; Van Meerbergen, Vergauwen, Pollefeys, Van Gool IJCV’02)
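A minimal sketch of the dynamic-programming scanline optimization: each left pixel is matched to a right pixel or declared occluded, the monotone DP enforces the ordering constraint, and the matching cost trades off against discontinuities via the occlusion penalty. The squared-difference cost and flat occlusion penalty are illustrative choices:

```python
def scanline_dp(left, right, occ=10.0):
    """Optimal alignment of one rectified scanline pair; returns the
    minimal path cost (Needleman-Wunsch-style DP)."""
    n, m = len(left), len(right)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            best = D[i][j]
            if i > 0:
                best = min(best, D[i - 1][j] + occ)   # left pixel occluded
            if j > 0:
                best = min(best, D[i][j - 1] + occ)   # right pixel occluded
            if i > 0 and j > 0:                        # match i-1 <-> j-1
                best = min(best, D[i - 1][j - 1] + (left[i - 1] - right[j - 1]) ** 2)
            D[i][j] = best
    return D[n][m]
```

Recovering the disparities would additionally require backtracking the chosen transitions; only the cost is returned here to keep the sketch short.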
Hierarchical stereo matching

Downsampling (Gaussian pyramid)
Disparity propagation

Allows faster computation
Deals with large disparity ranges
Disparity map
image I(x,y)   image I′(x′,y′)   disparity map D(x,y)
(x′, y′) = (x + D(x,y), y)
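A tiny sketch of applying the disparity map: forward-warping I into the neighboring view via (x′, y′) = (x + D(x, y), y), which is exactly how an image can be reconstructed from its neighbors:

```python
def warp_with_disparity(image, disp):
    """Forward-warp a grayscale image (list of rows) into the second
    view; pixels that map outside or are never hit stay None."""
    h, w = len(image), len(image[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xp = x + disp[y][x]
            if 0 <= xp < w:
                out[y][xp] = image[y][x]
    return out

warped = warp_with_disparity([[10, 20, 30]], [[1, 1, 1]])  # -> [[None, 10, 20]]
```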
Example: reconstruct image from neighbors
Multi-view depth fusion
• Compute depth for every pixel of the reference image
– Triangulation
– Use multiple views
– Up- and down-sequence
– Use Kalman filter
(Koch, Pollefeys and Van Gool. ECCV‘98)
Also allows computing a robust texture
Real-time stereo on GPU
• Plane-sweep stereo
• Computes sum-of-squared differences (using the pixel shader)
• Hardware mip-map generation for aggregation over a window
• Trade-off between small and large support windows
(Yang and Pollefeys, CVPR2003)
150M disparity hypotheses/sec (Radeon 9700 Pro), e.g. 512×512×20 disparities at 30 Hz
(Demo GeForce4)
GPU is great for vision too!
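The per-pixel computation is easy to sketch on the CPU: a squared-difference cost volume over disparity hypotheses and a winner-take-all selection (no mip-map window aggregation here; for rectified views each swept "plane" is one disparity):

```python
import numpy as np

def plane_sweep_ssd(left, right, max_disp):
    """Cost volume over disparity hypotheses d, matching left[:, x]
    against right[:, x - d], then per-pixel winner-take-all."""
    h, w = left.shape
    cost = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        cost[d, :, d:] = (left[:, d:] - right[:, : w - d]) ** 2
    return cost.argmin(axis=0)

# A synthetic pair with a uniform true disparity of 2:
left = np.array([[0.0, 10.0, 20.0, 30.0, 40.0, 50.0]])
right = left + 20.0          # right[x] equals left[x + 2]
disp = plane_sweep_ssd(left, right, max_disp=3)
```

The GPU version evaluates the same hypothesis cost in a pixel shader and uses mip-maps for the support-window aggregation this sketch omits.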
Dealing with specular highlights
Extend photo-consistency model to include highlights
(Yang, Pollefeys and Welch, ICCV’03)
3D surface model
Depth image Triangle mesh Texture image
Textured 3D wireframe model
Volumetric 3D integration
Multiple depth images Volumetric integration Texture integration
(Curless and Levoy, Siggraph´96)
patchwork texture map
Dealing with auto-exposure
• Estimate the camera’s radiometric response curve, exposure, and white-balance changes
• Extends prior HDR work at Columbia, CMU, etc.to moving camera
(Kim and Pollefeys, submitted)
[Figure: brightness transfer curve between auto-exposure and fixed-exposure frames; response-curve model; robust estimate using DP]
Dealing with auto-exposure
Applications:
• Photometric alignment of textures (or HDR textures)
• HDR video
(Kim and Pollefeys, submitted)
Part of Jain temple
Recorded during post-ICCV tourist trip in India
(Nikon F50; Scanned)
Example: DV video 3D model
accuracy ~1/500 from DV video (i.e. 140 kB JPEGs, 576×720)
Unstructured lightfield rendering
demo
(Heigl et al.’99)
Talk outline
• Introduction
• Visual 3D modeling with a hand-held camera
– Acquisition of camera motion
– Acquisition of scene structure
– Constructing visual models
• Camera Networks
– Camera Network Calibration
– Camera Network Synchronization
– Towards Active Camera Networks…
• Conclusion
Camera Networks
• CMU’s Dome, 3D Room, etc.
• MIT’s Visual Hull
• Maryland’s Keck lab, ETHZ’s BLUE-C and more
• Recently, Shape-from-Silhouette/Visual-Hull systems have been very popular
Camera Networks
• Offline calibration procedure
• Special calibration data
– Planar pattern
– Moving LED
• Requires physical access to the environment
• Active camera networks
– How do we maintain calibration?
An example
• 4 NTSC videos recorded by 4 computers for 4 minutes
• Manually synchronized and calibrated using a MoCap system
P. Sand, L. McMillan, and J. Popovic. Continuous Capture of Skin Deformation.ACM Transactions on Graphics 22, 3, 578-586, 2003.
Can we do without explicit calibration?
• Feature-based?
– Hard to match features between very different views
– Not many features on the foreground
– Background often doesn’t overlap much between views
• Silhouette-based?
– Necessary for the visual hull anyway
– But the approach is not obvious
Multiple View Geometry of Silhouettes
• Frontier points
• Epipolar tangents
• Points on the silhouettes in two views do not correspond in general, except for the projected frontier points
• Always at least 2 extremal frontier points per silhouette
• In general, correspondence only over two views
[Figure: two silhouettes with frontier points x1, x2 and x′1, x′2 at the outer epipolar tangents]

x2ᵀ F x1 = 0
x′2ᵀ F x′1 = 0
Calibration from Silhouettes: prior work
Epipolar geometry from silhouettes
• Porrill and Pollard ’91
• Astrom, Cipolla and Giblin ’96

Structure-and-motion from silhouettes
• Joshi, Ahuja and Ponce ’95 (trinocular rig / rigid object)
• Vijayakumar, Kriegman and Ponce ’96 (orthographic)
• Wong and Cipolla ’01 (circular motion, at least to start)
• Yezzi and Soatto ’03 (only refinement)

None is really applicable for calibrating a visual-hull system
Camera Network Calibration from Silhouettes
• 7 or more corresponding frontier points needed to compute epipolar geometry for general motion
• Hard to find on a single silhouette, and possibly occluded
• However, Visual Hull systems record many silhouettes!
(Sinha, Pollefeys and McMillan, submitted)
Camera Network Calibration from Silhouettes
• If we know the epipoles, it is simple
• Draw 3 outer epipolar tangents (from two silhouettes)
• Compute the corresponding line homography H⁻ᵀ (not unique)
• Epipolar geometry: F = [e]ₓ H
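Given a hypothesized epipole e′ and any line homography H compatible with the tangents, F = [e′]ₓ H is rank-2 by construction and every epipolar line l′ = F x passes through e′. A sketch with made-up numeric values for e′ and H:

```python
import numpy as np

def skew(e):
    """Cross-product matrix [e]_x, so that skew(e) @ v == cross(e, v)."""
    return np.array([[0.0, -e[2], e[1]],
                     [e[2], 0.0, -e[0]],
                     [-e[1], e[0], 0.0]])

e2 = np.array([120.0, 80.0, 1.0])      # hypothesized epipole (homogeneous)
H = np.array([[1.0, 0.1, 5.0],         # some compatible line homography
              [0.0, 1.2, -3.0],
              [0.0, 0.0, 1.0]])
F = skew(e2) @ H                       # fundamental matrix hypothesis

x = np.array([30.0, 40.0, 1.0])        # any image point in view 1
line = F @ x                           # its epipolar line passes through e2
```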
Let’s just sample: RANSAC
• Repeat
– Generate a random hypothesis for the epipoles
– Compute the epipolar geometry
– Verify the hypothesis and count inliers (use a conservative threshold, e.g. 5 pixels, but abort early if not promising)
until a satisfying hypothesis is found
• Refine the hypothesis
– minimize the symmetric transfer error of the frontier points (use a strict threshold, e.g. 1 pixel)
– include more inliers
until error and inliers are stable

We’ll need an efficient representation, as we are likely to need many trials!
A Compact Representation for Silhouettes: Tangent Envelopes

• Convex hull of the silhouette
• Tangency points for a discrete set of angles
• Approx. 500 bytes/frame, hence a whole video sequence easily fits in memory
• Tangency computations are efficient
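A sketch of building a tangent envelope: the convex hull of the silhouette plus the tangency point for a given tangent direction (pure Python; the point set is illustrative):

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def tangency_point(hull, angle):
    """Hull vertex extremal in direction (cos a, sin a): where the tangent
    line with that outward normal touches the silhouette."""
    d = (math.cos(angle), math.sin(angle))
    return max(hull, key=lambda p: p[0] * d[0] + p[1] * d[1])

hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])  # interior point dropped
```

Storing only these tangency points for a discrete set of angles is what keeps the per-frame footprint to a few hundred bytes.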
Epipole Hypothesis and Computing H
Model Verification
Remarks
• RANSAC allows efficient exploration of the 4D parameter space (i.e. the epipole pair) while being robust to imperfect silhouettes
• Select key-frames to avoid having too many identical constraints (when the silhouette is static)
Reprojection Error and Epipole Hypothesis Distribution
Residual distribution:
– hypotheses along the y-axis
– sorted residuals along the x-axis
– pixel error along the z-axis

40 best hypotheses out of 30,000

Typically, 1 in 5000 samples converges to the global minimum after non-linear refinement (corresponding to 15 s of computation time)
Computed Fundamental Matrices
Computed Fundamental Matrices
F computed directly (black epipolar lines) F after consistent 3D reconstruction (color)
Computed Fundamental Matrices
F computed directly (black epipolar lines) F after consistent 3D reconstruction (color)
From epipolar geometry to full calibration
• Not trivial because there are only matches between two views
• Approach similar to Levi et al. CVPR’03, but practical
• Key step is to solve for a camera triplet
• Assemble the complete camera network
• Projective bundle, self-calibration, metric bundle
– (also linear in v, where v is a 4-vector)
– Choose the P3 corresponding to the closest …
Experiment
4 video sequences at 30 fps
All F matrices computed from silhouettes
Full calibration computed
Metric Cameras and Visual-Hull Reconstruction from 4 views
Final calibration quality comparable to explicit calibration procedure
What if the videos are unsynchronized?
For videos recorded at a constant frame rate, the same constraints remain valid, up to some extra unknown temporal offsets
Synchronization and calibration from silhouettes (Sinha and Pollefeys, submitted)
• Add a random temporal offset to the RANSAC hypothesis generation; sample more
• Use a multi-resolution approach:
– Key-frames with slow motion, rough synchronization
– Add key-frames with faster motion, refine synchronization
Synchronization experiment
• Total temporal offset search range [−500, +500] frames (i.e. ±15 s)
• Unique peaks for the correct offsets
• Possibility of sub-frame synchronization
Synchronize camera network
• Consider an oriented graph with the offsets as branch values
• For consistency, loops should add up to zero
• MLE by minimizing Σ (t̂ − t)²

[Figure: camera-network graph with branch offsets such as +3, −5, +8, +6, +2, 0 compared to ground truth, in frames (= 1/30 s)]
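The loop-consistency MLE is a small linear least-squares problem; a sketch with made-up pairwise offsets (not the slide's graph values):

```python
import numpy as np

def network_offsets(n, pairwise):
    """Least-squares camera offsets t (with t_0 fixed as gauge) from noisy
    pairwise measurements d_ij ~ t_j - t_i; minimizing the squared loop
    errors reduces to exactly this overdetermined linear system."""
    rows, b = [], []
    for (i, j), d in pairwise.items():
        r = np.zeros(n)
        r[i], r[j] = -1.0, 1.0
        rows.append(r)
        b.append(d)
    r0 = np.zeros(n)
    r0[0] = 1.0                      # gauge fixing: t_0 = 0
    rows.append(r0)
    b.append(0.0)
    t, *_ = np.linalg.lstsq(np.array(rows), np.array(b), rcond=None)
    return t

# A consistent loop: (+3) + (+2) + (-5) = 0
t = network_offsets(3, {(0, 1): 3.0, (1, 2): 2.0, (0, 2): 5.0})
```

With inconsistent measurements the residual is distributed over the loop instead of accumulating, which is the point of the MLE formulation.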
Towards active camera networks
• Provide much more flexibility by making use of the pan-tilt-zoom range of networked cameras
• (Maintaining) calibration is a challenge
up to 3Gpix!
Calibration of PTZ cameras
Similar to Collins and Tsin ’99, but with varying radial distortion
Conclusion
• 3D models from video: more flexibility, more general
• Camera network synchronization and calibration, just from silhouettes: great for visual-hull systems
Future plans
• Deal with sub-frame offsets for VH reconstruction
• Extend to active camera networks (PTZ cameras)
• Extend to asynchronous video streams (IP cameras)
Acknowledgment
• NSF Career, NSF ITR on 3D-TV, DARPA seedling, Link foundation• EU ACTS VANGUARD, ITEA BEYOND, EU IST MURALE, FWO-Vlaanderen
• Sudipta Sinha, Ruigang Yang, Seon Joo Kim, Andrew Raij, Greg Welch, Leonard McMillan (UNC)
• Maarten Vergauwen, Frank Verbiest, Kurt Cornelis, Jan Tops, Luc Van Gool (KULeuven), Reinhard Koch (UKiel), Benno Heigl