CVPR 2012 Review Seminar - Multi-View Hair Capture using Orientation Fields
TRANSCRIPT
MULTI-VIEW HAIR CAPTURE USING ORIENTATION FIELDS
LINJIE LUO, HAO LI, SYLVAIN PARIS, THIBAUT WEISE, MARK PAULY, SZYMON RUSINKIEWICZ
PROCEEDINGS OF THE 25TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2012)
CVPR 2012 Review Seminar 2012/06/23
Jun Saito @dukecyto
Objective
Build 3D hair geometry from a small number of photographs
Related Work
Paris, S. et al. “Hair photobooth: geometric and photometric acquisition of real hairstyles.” SIGGRAPH 2008.
Requires thousands of images for a single reconstruction
Related Work
Jakob, W., Moon, J. T., & Marschner, S. "Capturing hair assemblies fiber by fiber." ACM SIGGRAPH Asia 2009.
Capture individual hair strands using focal sweep
[Figure: photographed vs. rendered hair]
Contributions
Passive multi-view stereo approach capable of reconstructing finely detailed hair geometry
Robust matching criterion based on the local orientation of hair
Aggregation scheme to gather local evidence while taking hair structure into account
Progressive template fitting procedure to fuse multiple depth maps
Quantitative evaluation of our acquisition system
System Overview
• Robotic camera gantry w/ Canon EOS 5D Mark II
System Overview
• Every 4 views are grouped into a cluster to construct partial depth maps
System Overview
• Multi-resolution orientation fields are computed for each view
System Overview
• Partial depth maps are constructed by minimizing an MRF energy with graph cuts, along with structure-aware aggregation and depth map refinement to improve quality
System Overview
• Partial depth maps from different views are merged together
2. Local Hair Orientation
• Filter bank of many (e.g. 180) oriented Difference-of-Gaussians (DoG) filters
DoG graph from http://fourier.eng.hmc.edu/e161/lectures/gradient/node11.html
Oriented DoG
2. Local Hair Orientation
• Orientation map (argmax over the filter bank), maximum response, and orientation field in the complex domain (a Python sketch follows below):

θ(x, y) = argmax_θ |(K_θ ∗ I)(x, y)|,  K_θ: oriented filters  (orientation map)
m(x, y) = max_θ |(K_θ ∗ I)(x, y)|  (maximum response)
Θ(x, y) = exp(2iθ(x, y))  (orientation field)
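To make the filter-bank step concrete, here is a minimal Python/NumPy sketch (not the authors' code) of the three formulas above: convolve a grayscale image with a bank of oriented DoG kernels K_θ, take the per-pixel argmax, and map the winning angle to the complex domain. The kernel size, the Gaussian widths, and the 1.6 DoG ratio are assumptions; only the orientation count (e.g. 180) comes from the slide.

```python
import numpy as np
from scipy.ndimage import convolve, rotate

def oriented_dog_bank(size=15, sigma=1.0, elongation=4.0, n_orient=180):
    """Bank of rotated, elongated DoG kernels K_theta (assumed parameters)."""
    ax = np.arange(size) - size // 2
    u, v = np.meshgrid(ax, ax)
    gauss = lambda su: np.exp(-u**2 / (2 * su**2) - v**2 / (2 * elongation**2))
    base = gauss(sigma) - gauss(1.6 * sigma)  # DoG across u, elongated along v
    angles = np.linspace(0.0, 180.0, n_orient, endpoint=False)
    return [rotate(base, a, reshape=False) for a in angles]

def orientation_field(image, bank):
    """theta = argmax_theta |K_theta * I|; Theta = exp(2i*theta)."""
    responses = np.stack([np.abs(convolve(image, k)) for k in bank])
    theta = np.argmax(responses, axis=0) * (np.pi / len(bank))  # orientation map
    m = responses.max(axis=0)                                   # max response
    return np.exp(2j * theta), m                                # orientation field
```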
2. Local Hair Orientation
• Multi-resolution pyramid of orientation fields
[Figure: orientation field pyramid, from coarse to fine]
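A small sketch of how the pyramid might be assembled, reusing orientation_field() from above. The simple 2x box downsampling and the number of levels are assumptions, since the slides do not specify the pyramid construction.

```python
def orientation_pyramid(image, bank, n_levels=3):
    """Orientation fields from fine to coarse via repeated 2x downsampling."""
    levels = []
    for _ in range(n_levels):
        levels.append(orientation_field(image, bank))
        # Crop to even dimensions, then average each 2x2 block.
        h, w = (image.shape[0] // 2) * 2, (image.shape[1] // 2) * 2
        image = image[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return levels  # [(Theta, m), ...], index 0 = finest level
```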
3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation
E(x) = E_d(x) + λ E_s(x)
y_i: observed noisy pixels; x_i: denoised (latent) pixels
Approximate global minimization by graph cuts with α-expansion
Image from Pattern Recognition and Machine Learning (C. M. Bishop)
3. Partial Geometry Construction
Approximate global minimization using graph cuts
• Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.
• コンピュータビジョン最先端ガイド 1 (Advanced Computer Vision Guide 1, in Japanese), Chapter 2: Graph Cuts
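To make the MRF-plus-graph-cuts machinery concrete, below is a hedged sketch of the binary image-denoising example from Bishop's PRML that the earlier slide alludes to, using the PyMaxflow library. A single s-t cut solves the two-label case exactly; the multi-label depth labeling in this paper needs the α-expansion loop of Boykov et al. cited above, which repeatedly solves binary cuts of this form. The weights are illustrative.

```python
import numpy as np
import maxflow  # PyMaxflow: pip install PyMaxflow

def denoise_binary(noisy, unary=1.0, pairwise=2.0):
    """MAP labeling of a binary Potts MRF: E(x) = E_d(x) + lambda*E_s(x).

    noisy: 2D array with values in {0, 1}.
    """
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(noisy.shape)
    # Smoothness term: constant penalty between 4-connected neighbors.
    g.add_grid_edges(nodes, pairwise)
    # Data term: cost 'unary' for disagreeing with the observed pixel y_i.
    # Sink-side nodes pay the source capacity, so source cap = cost of label 1.
    g.add_grid_tedges(nodes, unary * (1 - noisy), unary * noisy)
    g.maxflow()
    # True = sink segment = label 1.
    return g.get_grid_segments(nodes).astype(np.uint8)
```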
3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation • Smoothness term
E(D) = E_d(D) + λ E_s(D)
3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation • Smoothness term
E(D) = E_d(D) + λ E_s(D)
E_s(D) = Σ_{p ∈ pixels} Σ_{p′ ∈ N(p)} w_s(p, p′) |D(p) − D(p′)|
Depth continuity constraint between adjacent pixels p and p′
3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation • Smoothness term
! ! = !! ! + !!! !
!! !,!! = exp!(− !!"# ! − !!"# !! !
2!!! )
Enforce strong depth continuity by Gaussian of orientation distance
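A hedged one-function sketch of w_s. The σ_s value and the use of the complex orientation difference |Θ(p) − Θ(p′)| as the orientation distance are assumptions; the complex encoding exp(2iθ) conveniently makes the distance ignore the 180° ambiguity of hair orientations.

```python
import numpy as np

def smoothness_weight(theta_p, theta_q, sigma_s=0.5):
    """w_s(p, p') = exp(-d(p, p')^2 / (2*sigma_s^2)) on complex orientations."""
    d = np.abs(theta_p - theta_q)  # distance between unit complex numbers
    return np.exp(-(d ** 2) / (2.0 * sigma_s ** 2))
```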
3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation • Data term
E(D) = E_d(D) + λ E_s(D)
3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation • Data term
E(D) = E_d(D) + λ E_s(D)
E_d(D) = Σ_{p ∈ pixels} Σ_{l ∈ levels} Σ_{v ∈ views} C_v^l(p, D)
C_v^l(p, D) = d( Θ_ref^l(p), Θ_v^l(π_v(p, D)) )
Θ_ref^l: orientation field at level l of the reference view
Θ_v^l: orientation field at level l of adjacent view v
π_v(p, D): projection onto view v of the 3D point given by depth map D at pixel p
3. Partial Geometry Construction
• MRF (Markov Random Field) energy formulation • Data term
E(D) = E_d(D) + λ E_s(D)
E_d(D) = Σ_{p ∈ pixels} Σ_{l ∈ levels} Σ_{v ∈ views} C_v^l(p, D)
C_v^l(p, D) = d( Θ_ref^l(p), Θ_v^l(π_v(p, D)) )
d(·, ·) is a cost function measuring the deviation between the two orientation fields; an exp(·) factor accounts for the camera pair's different tilting angles (a simplified sketch follows below)
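A hedged sketch of the data cost for one pixel and one candidate depth, summing orientation deviations over pyramid levels and adjacent views. project_to_view() is a hypothetical stand-in for the calibrated projection π_v, and the plain |a − b| distance omits the exp(·) tilt compensation mentioned above.

```python
import numpy as np

def data_cost(p, depth, ref_pyr, adj_pyrs, project_to_view):
    """Sum_l Sum_v d(Theta_ref^l(p), Theta_v^l(pi_v(p, depth))).

    ref_pyr:  [(Theta, m), ...] for the reference view (output of
              orientation_pyramid above).
    adj_pyrs: one such pyramid per adjacent view.
    """
    cost = 0.0
    for level, (theta_ref, _) in enumerate(ref_pyr):
        for v, adj_pyr in enumerate(adj_pyrs):
            theta_adj, _ = adj_pyr[level]
            q = project_to_view(p, depth, v, level)  # hypothetical pi_v
            cost += np.abs(theta_ref[p] - theta_adj[q])
    return cost
```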
3. Partial Geometry Construction
• Structure-Aware Aggregation
• Before summing the data term, guided filtering is applied at each level l, guided by the orientation field:
W(p, p′) = (1/|ω|²) Σ_{k: (p, p′) ∈ ω_k} [ 1 + ℜ{ (Θ(p) − μ_k)* (Θ(p′) − μ_k) } / (σ_k² + ε) ]
|ω|: number of pixels in the window; ε: structure awareness; μ_k: mean orientation in window ω_k; σ_k: standard deviation of orientation in ω_k
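The kernel above is the guided-filter kernel of He et al. adapted to a complex-valued guide. Below is a minimal sketch of the equivalent fast form (local linear model plus box filters) rather than the explicit per-pair kernel; treating the orientation field as the guide and one label's cost slice as the filter input is my reading of the slide, and the window radius and ε are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def box_mean(x, r):
    """Mean over a (2r+1)x(2r+1) window; splits complex into re/im parts."""
    if np.iscomplexobj(x):
        return box_mean(x.real, r) + 1j * box_mean(x.imag, r)
    return uniform_filter(x.astype(float), size=2 * r + 1)

def structure_aware_aggregate(cost, theta, r=4, eps=0.01):
    """Guided filtering of a cost slice, guided by complex orientations."""
    mu = box_mean(theta, r)                                  # mu_k
    var = box_mean(np.abs(theta) ** 2, r) - np.abs(mu) ** 2  # sigma_k^2
    mean_c = box_mean(cost, r)
    cov = box_mean(np.conj(theta) * cost, r) - np.conj(mu) * mean_c
    a = cov / (var + eps)                                    # eps: structure awareness
    b = mean_c - (a * mu).real
    return (box_mean(a, r) * theta).real + box_mean(b, r)
```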
3. Partial Geometry Construction
• Sub-pixel depth map refinement, similar to T. Beeler et al. "High-quality single-shot capture of facial geometry." ACM TOG 29(4)
3. Partial Geometry Construction
• Aggregation and refinement results
[Figure: no refinement / with refinement / with refinement and aggregation]
4. Final Geometry Reconstruction
• Merge depth maps from multiple views using:
• Kazhdan, M. et al. "Poisson surface reconstruction." SGP 2006
• Li, H. et al. "Robust single-view geometry and motion reconstruction." SIGGRAPH Asia 2009
5. Evaluation
• Quantitative evaluation using synthetic data
• (a), (f): synthetic data
• (b): this method
• (c): (a) overlaid on (b)
• (d): difference between (a) and (b), on the order of millimeters
• (g): PMVS + Poisson
• (h): T. Beeler et al. "High-quality single-shot capture of facial geometry." ACM TOG 29(4)
• (i): this method
Figure 7: Reconstruction results on different levels. From left to right, the resolution of the depth map increases from 0.4M to 1.5M and 6M pixels, respectively.
Each group consists of four camera positions in a T-pose: center, left, right, and bottom. Each position is 10 degrees apart from the neighboring position in terms of gantry arm rotation. The left and right cameras in the T-pose provide balanced coverage with respect to the center reference camera. Since our system employs orientation-based stereo, matching will fail for horizontal hair strands (more specifically, strands parallel to epipolar lines). To address this problem, a bottom camera is added to extend the stereo baselines and prevent the "orientation blindness" for horizontal strands.
We use 8 groups (32 views in total) for all examples in this paper. Three of these groups are in the upper hemisphere, while a further five are positioned in a ring configuration on the middle horizontal plane, as shown in Figure 2. We calibrate the camera positions with a checkerboard pattern [19], then perform foreground-background segmentation by background color thresholding combined with a small amount of additional manual keying. A large area light source was used for these datasets.
Qualitative Evaluation The top two rows of Figure 11 show reconstructions for two different hairstyles, demonstrating that our method can accommodate a variety of hairstyles, from straight to curly, and handle various hair colors. We also compare our results on these datasets with [4] and [7] in Figure 6. Note the significant details present in our reconstructions: though we do not claim to perform reconstruction at the level of individual hair strands, small groups of hair are clearly visible thanks to our structure-aware aggregation and detail-preserving merging algorithms.
In Figure 7 and Figure 8, we show how our reconstruction algorithm scales with higher resolution input and more camera views. Higher resolution and more views greatly increase the detail revealed in the reconstructed results.
Figure 8: Comparison between the depth maps reconstructed with 2, 3, and 4 cameras.

Quantitative Evaluation To evaluate our reconstruction accuracy quantitatively, we hired a 3D artist to manually create a highly detailed hair model as our ground truth model. We then rendered 8 groups (32 images in total) of this model with the same camera configuration as in the real capture session. We ran our algorithm on the images and compared the depth maps of our reconstruction and the ground truth model from the same viewpoints. The results are shown in Figure 9. On average, the distance between our result and the ground truth model is 5 mm, and the median distance is 3 mm. We also ran state-of-the-art multi-view algorithms [4, 7, 1] on the synthetic dataset, and the statistics of their numerical accuracy are similar to ours. However, as shown in Figure 9, their visual appearance is a lot worse, with the presence of blobs and spurious discontinuities.
Timings Our algorithm performs favorably in terms of efficiency. On a single thread of a Core i7 2.3 GHz CPU, each ...
Figure 9: We evaluate the accuracy of our approach by running it on synthetic data (a), (f). The result is shown in (b), and is overlaid on the synthetic 3D model in (c). The difference between our reconstruction and the ground-truth 3D model is on the order of a few millimeters (d). We show a horizontal slice of the depth map in (e): the ground-truth strands are shown in red and our reconstruction result in blue. Compared to PMVS + Poisson [4, 7] (g) and [1] (h), our reconstruction result (i) is more stable and accurate.
Dynamic Hair Capture
• Being completely passive, this method is applicable to dynamic hair capture
• Capture setup: 4 high-speed cameras, 640×480 at 100 fps
• Lower quality due to the low resolution of the high-speed cameras
Conclusions
• Quantitative evaluation shows that passive, multi-view reconstruction of hair geometry based on multi-resolution orientation fields achieves accurate measurements
• Combined with structure-aware aggregation, this method achieves superior quality compared to other methods
• This method can be applied to capturing hair in motion
Latest Related Work
• Chai M. et al. “Single-View Hair Modeling for Portrait Manipulation.” To appear in ACM TOG 31(4), to be presented at SIGGRAPH 2012.