CVPR 2012 Review Seminar - Multi-View Hair Capture using Orientation Fields
TRANSCRIPT
MULTI-VIEW HAIR CAPTURE USING ORIENTATION FIELDS
LINJIE LUO, HAO LI, SYLVAIN PARIS, THIBAUT WEISE, MARK PAULY, SZYMON RUSINKIEWICZ
PROCEEDINGS OF THE 25TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2012)
CVPR 2012 Review Seminar 2012/06/23
Jun Saito @dukecyto
Objective
Build 3D hair geometry from a small number of photographs
Related Work
Paris, S. et al. “Hair photobooth: geometric and photometric acquisition of real hairstyles.” SIGGRAPH 2008.
Requires thousands of images for a single reconstruction
Related Work
Jakob, W., Moon, J. T., & Marschner, S. "Capturing hair assemblies fiber by fiber." ACM SIGGRAPH Asia 2009.
Capture individual hair strands using focal sweep
[Figure: photographed vs. rendered hair]
Contributions
Passive multi-view stereo approach capable of reconstructing finely detailed hair geometry
Robust matching criterion based on the local orientation of hair
Aggregation scheme to gather local evidence while taking hair structure into account
Progressive template fitting procedure to fuse multiple depth maps
Quantitative evaluation of our acquisition system
System Overview
• Robotic camera gantry w/ Canon EOS 5D Mark II
System Overview
• Every 4 views are grouped into a cluster to construct partial depth maps
System Overview
• Multi-resolution orientation fields are computed for each view
System Overview
• Partial depth maps are constructed by minimizing an MRF energy with graph cuts, along with structure-aware aggregation and depth map refinement to improve quality
System Overview
• Partial depth maps from different views are merged together
2. Local Hair Orientation
• Filter bank of many (e.g. 180) oriented Difference-of-Gaussians (DoG) filters
DoG graph from http://fourier.eng.hmc.edu/e161/lectures/gradient/node11.html
Oriented DoG
2. Local Hair Orientation
• Orientation map (argmax over the filter bank), maximum response, and orientation field in the complex domain (a Python sketch follows below):

θ(x, y) = argmax_θ |(K_θ ∗ I)(x, y)|,  K_θ: oriented filters  (orientation map)
m(x, y) = max_θ |(K_θ ∗ I)(x, y)|  (maximum response)
Θ(x, y) = exp(2iθ(x, y))  (orientation field)
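To make the filter-bank step concrete, here is a minimal Python/NumPy sketch (not the authors' code) of the three formulas above: convolve a grayscale image with a bank of oriented DoG kernels K_θ, take the per-pixel argmax, and map the winning angle to the complex domain. The kernel size, the Gaussian widths, and the 1.6 DoG ratio are assumptions; only the orientation count (e.g. 180) comes from the slide.

```python
import numpy as np
from scipy.ndimage import convolve, rotate

def oriented_dog_bank(size=15, sigma=1.0, elongation=4.0, n_orient=180):
    """Bank of rotated, elongated DoG kernels K_theta (assumed parameters)."""
    ax = np.arange(size) - size // 2
    u, v = np.meshgrid(ax, ax)
    gauss = lambda su: np.exp(-u**2 / (2 * su**2) - v**2 / (2 * elongation**2))
    base = gauss(sigma) - gauss(1.6 * sigma)  # DoG across u, elongated along v
    angles = np.linspace(0.0, 180.0, n_orient, endpoint=False)
    return [rotate(base, a, reshape=False) for a in angles]

def orientation_field(image, bank):
    """theta = argmax_theta |K_theta * I|; Theta = exp(2i*theta)."""
    responses = np.stack([np.abs(convolve(image, k)) for k in bank])
    theta = np.argmax(responses, axis=0) * (np.pi / len(bank))  # orientation map
    m = responses.max(axis=0)                                   # max response
    return np.exp(2j * theta), m                                # orientation field
```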
2. Local Hair Orientation
• Multi-resolution pyramid of orientation fields
[Figure: orientation field pyramid, from coarse to fine]
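A small sketch of how the pyramid might be assembled, reusing orientation_field() from above. The simple 2x box downsampling and the number of levels are assumptions, since the slides do not specify the pyramid construction.

```python
def orientation_pyramid(image, bank, n_levels=3):
    """Orientation fields from fine to coarse via repeated 2x downsampling."""
    levels = []
    for _ in range(n_levels):
        levels.append(orientation_field(image, bank))
        # Crop to even dimensions, then average each 2x2 block.
        h, w = (image.shape[0] // 2) * 2, (image.shape[1] // 2) * 2
        image = image[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return levels  # [(Theta, m), ...], index 0 = finest level
```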
3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation
E(x) = E_d(x) + λ E_s(x)
y_i: observed noisy pixels; x_i: denoised (latent) pixels
Approximate global minimization by graph cuts with α-expansion
Image from Pattern Recognition and Machine Learning (C. M. Bishop)
3. Partial Geometry Construction
Approximate global minimization using graph cuts
• Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.
• コンピュータビジョン最先端ガイド 1 (Advanced Computer Vision Guide 1, in Japanese), Chapter 2: Graph Cuts
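To make the MRF-plus-graph-cuts machinery concrete, below is a hedged sketch of the binary image-denoising example from Bishop's PRML that the earlier slide alludes to, using the PyMaxflow library. A single s-t cut solves the two-label case exactly; the multi-label depth labeling in this paper needs the α-expansion loop of Boykov et al. cited above, which repeatedly solves binary cuts of this form. The weights are illustrative.

```python
import numpy as np
import maxflow  # PyMaxflow: pip install PyMaxflow

def denoise_binary(noisy, unary=1.0, pairwise=2.0):
    """MAP labeling of a binary Potts MRF: E(x) = E_d(x) + lambda*E_s(x).

    noisy: 2D array with values in {0, 1}.
    """
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(noisy.shape)
    # Smoothness term: constant penalty between 4-connected neighbors.
    g.add_grid_edges(nodes, pairwise)
    # Data term: cost 'unary' for disagreeing with the observed pixel y_i.
    # Sink-side nodes pay the source capacity, so source cap = cost of label 1.
    g.add_grid_tedges(nodes, unary * (1 - noisy), unary * noisy)
    g.maxflow()
    # True = sink segment = label 1.
    return g.get_grid_segments(nodes).astype(np.uint8)
```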
3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation • Smoothness term
E(D) = E_d(D) + λ E_s(D)
3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation • Smoothness term
E(D) = E_d(D) + λ E_s(D)
E_s(D) = Σ_{p ∈ pixels} Σ_{p′ ∈ N(p)} w_s(p, p′) |D(p) − D(p′)|
Depth continuity constraint between adjacent pixels p and p′
3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation • Smoothness term
! ! = !! ! + !!! !
!! !,!! = exp!(− !!"# ! − !!"# !! !
2!!! )
Enforce strong depth continuity by Gaussian of orientation distance
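A hedged one-function sketch of w_s. The σ_s value and the use of the complex orientation difference |Θ(p) − Θ(p′)| as the orientation distance are assumptions; the complex encoding exp(2iθ) conveniently makes the distance ignore the 180° ambiguity of hair orientations.

```python
import numpy as np

def smoothness_weight(theta_p, theta_q, sigma_s=0.5):
    """w_s(p, p') = exp(-d(p, p')^2 / (2*sigma_s^2)) on complex orientations."""
    d = np.abs(theta_p - theta_q)  # distance between unit complex numbers
    return np.exp(-(d ** 2) / (2.0 * sigma_s ** 2))
```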
3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation • Data term
E(D) = E_d(D) + λ E_s(D)
3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation • Data term
E(D) = E_d(D) + λ E_s(D)
E_d(D) = Σ_{p ∈ pixels} Σ_{l ∈ levels} Σ_{v ∈ views} C_v^l(p, D)
C_v^l(p, D) = d( Θ_ref^l(p), Θ_v^l(π_v(p, D)) )
Θ_ref^l: orientation field at level l of the reference view
Θ_v^l: orientation field at level l of adjacent view v
π_v(p, D): projection onto view v of the 3D point given by depth map D at pixel p
3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation • Data term
E(D) = E_d(D) + λ E_s(D)
E_d(D) = Σ_{p ∈ pixels} Σ_{l ∈ levels} Σ_{v ∈ views} C_v^l(p, D)
C_v^l(p, D) = d( Θ_ref^l(p), Θ_v^l(π_v(p, D)) )
d(·, ·) is a cost function measuring the deviation between the two orientation fields; an exp(·) factor accounts for the camera pair's different tilting angles (a simplified sketch follows below)
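A hedged sketch of the data cost for one pixel and one candidate depth, summing orientation deviations over pyramid levels and adjacent views. project_to_view() is a hypothetical stand-in for the calibrated projection π_v, and the plain |a − b| distance omits the exp(·) tilt compensation mentioned above.

```python
import numpy as np

def data_cost(p, depth, ref_pyr, adj_pyrs, project_to_view):
    """Sum_l Sum_v d(Theta_ref^l(p), Theta_v^l(pi_v(p, depth))).

    ref_pyr:  [(Theta, m), ...] for the reference view (output of
              orientation_pyramid above).
    adj_pyrs: one such pyramid per adjacent view.
    """
    cost = 0.0
    for level, (theta_ref, _) in enumerate(ref_pyr):
        for v, adj_pyr in enumerate(adj_pyrs):
            theta_adj, _ = adj_pyr[level]
            q = project_to_view(p, depth, v, level)  # hypothetical pi_v
            cost += np.abs(theta_ref[p] - theta_adj[q])
    return cost
```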
3. Partial Geometry Construction
• Structure-Aware Aggregation
• Before summing the data term, guided filtering is applied at each level l, guided by the orientation field:
W(p, p′) = (1/|ω|²) Σ_{k: (p, p′) ∈ ω_k} [ 1 + ℜ{ (Θ(p) − μ_k)* (Θ(p′) − μ_k) } / (σ_k² + ε) ]
|ω|: number of pixels in the window; ε: structure awareness; μ_k: mean orientation in window ω_k; σ_k: standard deviation of orientation in ω_k
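The kernel above is the guided-filter kernel of He et al. adapted to a complex-valued guide. Below is a minimal sketch of the equivalent fast form (local linear model plus box filters) rather than the explicit per-pair kernel; treating the orientation field as the guide and one label's cost slice as the filter input is my reading of the slide, and the window radius and ε are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def box_mean(x, r):
    """Mean over a (2r+1)x(2r+1) window; splits complex into re/im parts."""
    if np.iscomplexobj(x):
        return box_mean(x.real, r) + 1j * box_mean(x.imag, r)
    return uniform_filter(x.astype(float), size=2 * r + 1)

def structure_aware_aggregate(cost, theta, r=4, eps=0.01):
    """Guided filtering of a cost slice, guided by complex orientations."""
    mu = box_mean(theta, r)                                  # mu_k
    var = box_mean(np.abs(theta) ** 2, r) - np.abs(mu) ** 2  # sigma_k^2
    mean_c = box_mean(cost, r)
    cov = box_mean(np.conj(theta) * cost, r) - np.conj(mu) * mean_c
    a = cov / (var + eps)                                    # eps: structure awareness
    b = mean_c - (a * mu).real
    return (box_mean(a, r) * theta).real + box_mean(b, r)
```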
3. Partial Geometry Construction
• Sub-pixel depth map refinement, similar to T. Beeler et al. "High-quality single-shot capture of facial geometry." ACM TOG 29(4)
3. Partial Geometry Construction
• Aggregation and refinement results
[Figure: no refinement / with refinement / with refinement and aggregation]
4. Final Geometry Reconstruction
• Merge depth maps from multiple views using:
• Kazhdan, M. et al. "Poisson surface reconstruction." SGP 2006
• Li, H. et al. "Robust single-view geometry and motion reconstruction." SIGGRAPH Asia 2009
5. Evaluation
• Quantitative evaluation using synthetic data
• (a), (f): synthetic data
• (b): this method
• (c): (a) overlaid on (b)
• (d): difference between (a) and (b), on the order of millimeters
• (g): PMVS + Poisson
• (h): T. Beeler et al. "High-quality single-shot capture of facial geometry." ACM TOG 29(4)
• (i): this method
Figure 7: Reconstruction results on different levels. From left to right, the resolution of the depth map increases from 0.4M to 1.5M and 6M pixels, respectively.
Each group consists of four camera positions in a T-pose: center, left, right, and bottom. Each position is 10 degrees apart from the neighboring position in terms of gantry arm rotation. The left and right cameras in the T-pose provide balanced coverage with respect to the center reference camera. Since our system employs orientation-based stereo, matching will fail for horizontal hair strands (more specifically, strands parallel to epipolar lines). To address this problem, a bottom camera is added to extend the stereo baselines and prevent the "orientation blindness" for horizontal strands.
We use 8 groups (32 views in total) for all examples in this paper. Three of these groups are in the upper hemisphere, while a further five are positioned in a ring configuration on the middle horizontal plane, as shown in Figure 2. We calibrate the camera positions with a checkerboard pattern [19], then perform foreground-background segmentation by background color thresholding combined with a small amount of additional manual keying. A large area light source was used for these datasets.
Qualitative Evaluation The top two rows of Figure 11 show reconstructions for two different hairstyles, demonstrating that our method can accommodate a variety of hairstyles, from straight to curly, and handle various hair colors. We also compare our results on these datasets with [4] and [7] in Figure 6. Note the significant details present in our reconstructions: though we do not claim to perform reconstruction at the level of individual hair strands, small groups of hair are clearly visible thanks to our structure-aware aggregation and detail-preserving merging algorithms.
In Figure 7 and Figure 8, we show how our reconstruction algorithm scales with higher resolution input and more camera views. Higher resolution and more views greatly increase the detail revealed in the reconstructed results.
Figure 8: Comparison between the depth maps reconstructed with 2, 3, and 4 cameras.

Quantitative Evaluation To evaluate our reconstruction accuracy quantitatively, we hired a 3D artist to manually create a highly detailed hair model as our ground truth model. We then rendered 8 groups (32 images in total) of this model with the same camera configuration as in the real capture session. We ran our algorithm on the images and compared the depth maps of our reconstruction and the ground truth model from the same viewpoints. The results are shown in Figure 9. On average, the distance between our result and the ground truth model is 5 mm, and the median distance is 3 mm. We also ran state-of-the-art multi-view algorithms [4, 7, 1] on the synthetic dataset, and the statistics of their numerical accuracy are similar to ours. However, as shown in Figure 9, their visual appearance is a lot worse, with the presence of blobs and spurious discontinuities.
Timings Our algorithm performs favorably in terms of efficiency. On a single thread of a Core i7 2.3 GHz CPU, each ...
Figure 9: We evaluate the accuracy of our approach by running it on synthetic data (a), (f). The result is shown in (b), and is overlaid on the synthetic 3D model in (c). The difference between our reconstruction and the ground-truth 3D model is on the order of a few millimeters (d). We show a horizontal slice of the depth map in (e): the ground-truth strands are shown in red and our reconstruction result in blue. Compared to PMVS + Poisson [4, 7] (g) and [1] (h), our reconstruction result (i) is more stable and accurate.
Dynamic Hair Capture
• Being completely passive, this method is applicable to dynamic hair capture
• Capture setup: 4 high-speed cameras, 640×480 at 100 fps
• Lower quality due to the low resolution of the high-speed cameras
Conclusions
• Quantitative evaluation shows that passive, multi-view reconstruction of hair geometry based on multi-resolution orientation fields achieves accurate measurements
• Combined with structure-aware aggregation, this method achieves superior quality compared to other methods
• This method can be applied to capturing hair in motion
Latest Related Work
• Chai M. et al. “Single-View Hair Modeling for Portrait Manipulation.” To appear in ACM TOG 31(4), to be presented at SIGGRAPH 2012.