Automatic scene inference for 3D object compositing
Kevin Karsch (UIUC), Kalyan Sunkavalli, Sunil Hadap, Nathan Carr, Hailin Jin, Rafael Fonte, Michael Sittig, David Forsyth
SIGGRAPH 2014
What is this system
• Image editing system
• Drag-and-drop object insertion
• Place objects in 3D and relight them
• Fully automatic recovery of a comprehensive 3D scene model: geometry, illumination, diffuse albedo, and camera parameters
• From a single low dynamic range (LDR) image
Existing problems
• It is the artist's job to create photorealistic effects by reasoning about the physical space
• Lighting, shadow, perspective
• Needed: camera parameters, scene geometry, surface materials, and sources of illumination
State-of-the-art
• http://www.popularmechanics.com/technology/digital/visual-effects/4218826
• http://en.wikipedia.org/wiki/The_Adventures_of_Seinfeld_%26_Superman
What this system cannot handle
• Works best when scene lighting is diffuse; therefore generally works better indoors than outdoors
• Errors in geometry, illumination, or materials may be prominent
• Does not handle object insertion behind existing scene elements
Contributions
• Illumination inference: recovers a full lighting model, including light sources not directly visible in the photograph
• Depth estimation: combines data-driven depth transfer with geometric reasoning about the scene layout
How to do this
• Needed: geometry, illumination, surface reflectance
• Even though the estimates are coarse, the composites still look realistic, because even large changes in lighting are often not perceivable
Indoor/outdoor scene classification
• K-nearest-neighbor matching of GIST features (see the sketch after this list)
• Indoor dataset: NYUv2
• Outdoor dataset: Make3D
• Different training images and classifiers are chosen depending on whether the scene is indoor or outdoor
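As a rough illustration, a k-nearest-neighbor vote over GIST descriptors might look like the following sketch. The descriptors and indoor/outdoor labels are assumed to be precomputed from NYUv2 and Make3D; GIST extraction itself is left to an external implementation.

    # Minimal sketch: indoor/outdoor classification by k-NN over GIST
    # descriptors. train_gists: (n, d) array of precomputed descriptors;
    # train_labels: (n,) array with 1 = indoor, 0 = outdoor (assumed).
    import numpy as np

    def knn_indoor_outdoor(query_gist, train_gists, train_labels, k=5):
        """Label a query image by majority vote among its k nearest
        GIST neighbors under Euclidean distance."""
        dists = np.linalg.norm(train_gists - query_gist, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = train_labels[nearest]
        return "indoor" if votes.mean() >= 0.5 else "outdoor"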
Single image reconstruction
• Camera parameters, geometry
– The focal length f, camera center (cx, cy), and extrinsic parameters are computed from three orthogonal vanishing points detected in the scene (see the sketch below)
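A sketch of the standard calibration step this relies on: for a camera with square pixels and zero skew, two orthogonal vanishing points v1, v2 satisfy (v1 - c) · (v2 - c) = -f², where c is the principal point; with three orthogonal vanishing points, c can itself be recovered as the orthocenter of the triangle they form. Here c is taken as given.

    # Focal length from two orthogonal vanishing points, assuming square
    # pixels and zero skew, with the principal point c already known.
    import numpy as np

    def focal_from_vanishing_points(v1, v2, c):
        """v1, v2: orthogonal vanishing points (pixel coords); c: principal point."""
        d = np.dot(np.asarray(v1, float) - c, np.asarray(v2, float) - c)
        if d >= 0:
            raise ValueError("points are inconsistent with orthogonal directions")
        return float(np.sqrt(-d))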
Data-driven depth estimation
• Database: RGB-D images
• Appearance cues for correspondences: multi-scale SIFT features
• Incorporates geometric information
Data-driven depth estimation
• The depth map is obtained by minimizing a weighted sum of four energy terms (sketched below):
– E_t: depth transfer
– E_m: Manhattan world
– E_o: orientation
– E_3s: spatial smoothness in 3D
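The slide names the terms only; the paper combines them into a single objective over the depth map D. A schematic of that weighted-sum structure, with placeholder term functions and weights:

    # Schematic of the depth objective: a weighted sum of the four energy
    # terms named above, minimized over the per-pixel depth map D. The
    # term functions and the lam_* weights are placeholders, not the
    # paper's actual definitions.
    def depth_energy(D, E_t, E_m, E_o, E_3s, lam_m=1.0, lam_o=1.0, lam_3s=1.0):
        """E_t: depth transfer; E_m: Manhattan world; E_o: orientation;
        E_3s: spatial smoothness in 3D."""
        return E_t(D) + lam_m * E_m(D) + lam_o * E_o(D) + lam_3s * E_3s(D)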
Visible sources
• Segment the image into superpixels
• Compute features for each superpixel:
– Location in the image
– The 340 features used in Make3D
• Train a binary classifier on annotated data to predict whether or not a superpixel is emitting/reflecting a significant amount of light (see the sketch below)
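A minimal sketch of that training step. The slide does not name the classifier, so logistic regression stands in here; the feature matrix is assumed to hold one row per superpixel (image location plus the Make3D-style features), with binary labels from human annotation.

    # Hedged sketch of the visible-light-source classifier.
    # features: (n_superpixels, n_features); labels: 1 if the superpixel
    # emits/reflects significant light, else 0.
    from sklearn.linear_model import LogisticRegression

    def train_light_classifier(features, labels):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(features, labels)
        return clf

    def predict_light_sources(clf, features, threshold=0.5):
        """Boolean mask of superpixels predicted to be light emitters."""
        return clf.predict_proba(features)[:, 1] >= threshold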
Out-of-view sources
• Data-driven: annotated SUN360 panorama dataset
• Assumption: if photographs are similar, then the illumination environment beyond the photographed region will be similar as well
Out-of-view sources
• Features: geometric context, orientation maps, spatial pyramids, HSV histograms, output of the light classifier
• Matching scores: histogram intersection score, per-pixel inner product (sketched below)
• Similarity metric for IBLs: how similar the rendered canonical objects are
• Ranking function: trained with 1-slack, linear SVM-ranking optimization
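Two of the matching scores named above are simple enough to sketch directly; feature extraction is assumed to happen upstream, with histogram-type features (HSV histograms, spatial pyramids) scored by intersection and dense per-pixel maps (geometric context, orientation) by inner product.

    # Histogram intersection and per-pixel inner product scores.
    import numpy as np

    def histogram_intersection(h1, h2):
        """Intersection of two (normalized) histograms: sum of bin-wise minima."""
        return float(np.minimum(h1, h2).sum())

    def per_pixel_inner_product(a, b):
        """Inner product of two per-pixel feature maps of equal shape."""
        return float((a * b).sum())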
Relative intensities of the light sources
• Intensity estimation through rendering: adjust intensities until a rendered version of the scene matches the original image (see the sketch below)
• Humans cannot distinguish between a range of illumination configurations, suggesting that there is a family of lighting conditions that produce the same perceptual response
• Among these, simply choose the lighting configuration that renders fastest
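Because rendering is linear in each source's intensity, one simplified way to realize "adjust until the render matches the image" is to render the scene once per source at unit intensity and solve a non-negative least-squares problem for the weights. The paper drives intensities through a rendering loop; this closed-form linear version is only an approximation of that idea.

    # Simplified intensity estimation exploiting superposition:
    # unit_renders[i] is the scene rendered with only source i at unit
    # intensity; the weighted sum of these renders should match the photo.
    import numpy as np
    from scipy.optimize import nnls

    def estimate_light_intensities(unit_renders, image):
        """Returns one non-negative intensity per light source."""
        A = np.stack([r.ravel() for r in unit_renders], axis=1)  # (pixels, n_lights)
        b = image.ravel()
        intensities, _ = nnls(A, b)
        return intensities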
Physically grounded image editing
• Drag-and-drop insertion
• Lighting adjustment
• Synthetic depth-of-field (sketched below)
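As a toy version of the last item, the recovered depth map can drive a layered defocus blur: pixels are blurred more the farther their depth lies from a chosen focal depth. Real lens blur needs proper scatter-based compositing, so this Gaussian-layer approximation, with all names hypothetical, is only illustrative.

    # Layered synthetic depth-of-field from a per-pixel depth map.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def synthetic_dof(image, depth, focal_depth, max_sigma=6.0, n_layers=8):
        """Blend blurred copies of `image`; each pixel takes the blur
        level matching its normalized defocus |depth - focal_depth|."""
        coc = np.abs(depth - focal_depth)
        coc = coc / (coc.max() + 1e-8)  # normalize defocus to [0, 1]
        layer = np.minimum((coc * n_layers).astype(int), n_layers - 1)
        out = np.zeros_like(image, dtype=float)
        for i in range(n_layers):
            sigma = max_sigma * i / (n_layers - 1)
            blurred = image.astype(float) if sigma == 0 else \
                gaussian_filter(image.astype(float), sigma=(sigma, sigma, 0))
            out[layer == i] = blurred[layer == i]
        return out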