
Volume 0 (1981), Number 0 pp. 1–11

Peek-in-the-Pic: Flying Through Architectural Scenes From A Single Image

Amit Shesh and Baoquan Chen†

University of Minnesota at Twin Cities

Abstract

Many casually taken “tourist” photographs comprise architectural objects like houses, buildings, etc. Reconstructing such 3D scenes captured in a single photograph is a very challenging problem. We propose a novel approach to reconstruct such architectural scenes with minimal and simple user interaction, with the goal of providing 3D navigational capability to an image rather than acquiring accurate geometric detail. Our system, Peek-in-the-Pic, is based on a sketch-based geometry reconstruction paradigm. Given an image, the user simply traces out objects from it. Our system regards these as perspective line drawings, automatically completes them and reconstructs geometry from them. We make basic assumptions about the structure of traced objects and provide simple gestures for placing additional constraints. We also provide a simple sketching tool to progressively complete parts of the reconstructed buildings that are not visible in the image and cannot be automatically completed. Finally, we fill holes created in the original image when reconstructed buildings are removed from it by automatic texture synthesis; users can spend more time with interactive texture synthesis to refine the image further. Thus, instead of looking at flat images, a user can fly through them after some simple processing. Minimal manual work, ease of use and interactivity are the salient features of our approach.

Categories and Subject Descriptors (according to ACM CCS): I.3.5 [Computer Graphics]: Geometric Algorithms and Systems

1. Introduction

In the current age of digital photography, people take many casual photographs of places visited. Although these photographs serve as visual records, they do not create a sense of “being there”. It would be much more exciting to experience the actual 3D scene by flying through it, rather than looking at a flat image. This paper addresses this goal.

Extracting geometry from images is a very challenging task. As a single method is unlikely to reconstruct the various entities contained in photographs, like buildings, vegetation, etc., most approaches concentrate on particular classes of entities (faces [LTM96], trees [RMMD04], etc.). We observe that many photographs largely capture buildings and other forms of architecture. Our system is designed to create geometry from such casually captured/created images.

† ashesh|[email protected]

Geometry captured in images can be acquired by instantiation (associating and aligning pre-defined building blocks with the image), or reconstruction (actually reconstructing the captured geometry). Instantiation may work well in many situations, but conceptually it offers a very technical user interaction, unlikely to be intuitive for the typical user. Reconstruction of geometry from images has been rigorously studied. If multiple images are available, photogrammetry and computational stereopsis [MB95, Sze94] can be effective. The problem is much more challenging if only one image of a scene is available (typical for the casual photographer). If very crude approximations of geometry suffice for an application, automatic (Photo Pop-up [HEH05]) and interactive methods (spidery-mesh interfaces [HiAA97]) can be employed to quickly construct them. For more complete geometry, ortho-rectification approaches based on homographies [LCZ99] can be useful. However, these methods require significantly “technical” human intervention (specify-

© The Eurographics Association and Blackwell Publishing 2008. Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.


A. Shesh & B. Chen / Peek-in-the-Pic


Figure 1: Partial geometry reconstruction of lower Manhattan, New York City from a single image. (a) the original image. (b) all line drawings made by the user (the figure shows all traced lines; in practice, one building is traced and reconstructed at a time). (c) eight reconstructed buildings, with the ground relief and the background. (d) the original image with holes to be synthesized. (e) the synthesized image for background and ground geometry. (f) an alternate view of the city.

ing relationships between planes, understanding the concept of vanishing points/lines, etc.) and in some cases may require carefully planned photographs. We strive to devise a method that requires minimal and simple user interaction. Indeed, our system combines disparate bodies of research in sketch-based geometry reconstruction, vision-based geometry reconstruction and image completion to provide a complete image-to-navigable-geometry tool. Many operations of our pipeline are fully automatic, with interactive alternatives to improve the automatic results.

In our system, user interaction is limited to simply tracing out buildings with lines; we treat these traces as 2D line drawings and reconstruct them. This unique treatment of reconstructing geometry from an image finds some common ground with reconstructing geometry from freehand sketches. But most of the work done in this area [LS96, SC04] applies only to orthographic drawings. Peek-in-the-Pic extends their approaches to perspective line drawings. It is designed such that the image ends up being used only as a guide to trace objects well. Without the context of image navigation, it functions as a general tool to reconstruct geometry from perspective sketches. This broadens its scope to many design-by-sketch applications.

This paper expands on a short conference paper [SC05], and discusses our approach, formulation and implementation of the problem in detail. The paper is organized as follows:

Section 2 provides an overview of related work in two contexts, image-based modeling and sketch-based reconstruction. Section 3 provides an overview of our system. We introduce a simple camera calibration step in Section 4. Section 5.1 explains how various constraints are placed on an object to be reconstructed. Section 5.2 first summarizes reconstruction of geometry by optimization, and then provides details of how we re-formulate it for perspective line drawings. Section 6 explains how ground and background geometry is added to complete the 3D scene. Section 7 discusses how textures are obtained for the reconstructed geometry from the original image. We discuss results in Section 8 and conclude with a discussion in Section 9.

2. Related Work

2.1. Image-based Modeling

Image-based modeling reconstructs a scene from multiple photographs taken from various viewpoints by identifying correspondences between points in different images. If a sufficient number of photographs of a scene are available, reconstruction is a mathematically soluble, albeit semi-automatic, problem. The PhotoModeler program† works on

† http://www.photomodeler.com




multiple images and concentrates on getting precise architectural measurements. The ARBA3D program‡ uses only two images, but their absolute positions and the correspondence between them have to be manually specified. Software like SilverEye§ is limited to reconstructing geometry from satellite images. The Canoma∗∗ program works on only one image, but is based on instantiation, whose potential drawbacks have been discussed earlier.

Herman et al.’s work [HK87] attempts to reconstruct 3D models of large scenes using multiple aerial images. Debevec et al. [DTM96] interactively build an approximate model of the photographed scene that is then refined by locating image correspondences from multiple images. Reconstructing 3D models from single images, being mathematically insoluble, usually involves making assumptions about the apparent geometry in an image [Kan81], or attempting to fit simple 3D objects (cubes, wedges, etc.) of known topology to a line drawing obtained from the image [Rob63]. Seminal work by Huffman [Huf71] on the notion of labeling contours in an image to infer their geometric nature forms the basis of many approaches to reconstruct 3D geometry from 2D inputs [SG00, KH06, KC06]. Tour-Into-The-Picture [HiAA97] provides a spidery-mesh interface for the user to manually locate vanishing points of the image. Automatic Photo Pop-up [HEH05] automatically creates billboards and “folds” from a single image to create a “pop-up” effect. Both these approaches generate very crude approximations of geometry in the scene, which limits the types of scenes they can navigate and the freedom of navigation itself. Typical vision-based approaches [LCZ99, CDR99, CRZ00] perform reconstruction from a single image by calculating camera parameters, using vanishing lines and point correspondences on each plane. Such a “reconstruct-plane-by-plane” interface may be excessive for our application, where accurate geometry is not desired. Also, such methods may not work correctly without a lot of interaction for planes that do not contain parallel edges from which vanishing lines can be computed (like pyramids). Oh et al. [OCDD01] regard reconstruction as a Photoshop-like operation termed depth painting. They segment the photograph into layers and assign depth coordinates to every pixel of every layer to produce convincing depth images, but at the cost of heavy and time-consuming user interaction. Peek-in-the-Pic works at the object level instead of the pixel level.

2.2. Reconstruction of Line Drawings

Reconstruction of geometry from freehand 2D sketches is another related area of research and is related to modeling from images in many ways. Although many sketching

‡ http://www.arba3d.com
§ http://www.geotango.com/products/silvereye.htm
∗∗ http://www.canoma.com

metaphors exist (Teddy [IMT99] for making rotund objects, SKETCH [ZHH96] for design by extrusion), approaches that use actual 2D projections to infer 3D geometry are more relevant to our work. Reconstruction of 3D models from line drawings is a very old research problem (Barrow et al. [BT81], Marill [Mar91], Leclerc et al. [LF92] and Varley [Var03] are excellent examples). Lipson et al. [LS96] and Shesh et al. [SC04] formulate reconstruction as an optimization problem by evaluating the 2D input for various 2D image regularities, like parallelism and orthogonality of lines, and replicating their corresponding 3D properties in the geometry. Moreover, while both reconstruct wireframe drawings (with hidden parts drawn), we reconstruct “line drawings” (with hidden parts unavailable). Most image regularities in many of these approaches are robust only if the underlying sketch is orthographic. We extend the optimization formulation to support perspective images and use gestures to provide geometric regularities that could otherwise be obtained implicitly from orthographic images. After the user “traces” out a building, we use this formulation to reconstruct its 3D geometry. Several other interesting approaches use a minimal set of lines (for example, only silhouettes and creases) to produce a plausible set of 3D curved-surface models using simple mathematical constraints [PF06]. Kaplan et al. [KC06] propose an iterative user interaction process to progressively refine the constraint space and make the overall problem tractable.

3. Algorithm Overview

The general theme of Peek-in-the-Pic is “trace and reconstruct”. Figure 1 illustrates our pipeline. An object is converted into a line drawing by tracing out its edges (Figure 1(b) shows all such tracings). Note that only the visible parts of an object can be drawn this way. The traced lines are consolidated to form a 2D graph, and loops representing the faces of the object are determined. The graph is then analyzed to infer structural constraints on the object to be reconstructed. The user can also specify constraints via simple gestures (Section 5.1). The object is then reconstructed by solving an optimization problem (Section 5.2) that considers the various structural constraints placed on it and the camera parameters obtained in Section 4. Hidden parts of the object can be completed automatically or by interactive sketching (Section 5.3). The ground geometry is obtained after all desired objects have been reconstructed (Section 6; reconstructed geometry shown in Figure 1(c)). Holes in the image resulting from reconstruction (Figure 1(d)) are filled automatically and can be refined using a simple interface (Section 7, Figure 1(e)). Finally, textures are mapped onto all constructed geometry using the original and the synthesized images for navigation.

4. Camera Calibration

Camera parameters like the focal length f of the lens and the principal point p0 are required to correct the perspective distortion




in the image before it is reconstructed. Camera calibration by planting a known object in the scene [WT91] cannot be performed in our case, as the photographs we strive to reconstruct are unplanned and casually taken. There are several existing methods to estimate camera parameters from a single image. We use the method by Cipolla et al. [CDR99] to calculate camera parameters assuming a simple camera. This method requires knowledge of the vanishing points of an image.

Automatic detection of vanishing points in an image has been a subject of research in computer vision [Rot00, MA84, EMJK94]. A common approach is to detect lines in the image and determine points in the image domain where a large number of such lines intersect. Intersections can be determined by considering the domain to be the infinite image plane [Rot00], or by projecting the lines as great circles onto a Gaussian sphere and clustering their intersections [EMJK94]. We follow the former approach and use a RANSAC algorithm [FB81] to determine likely vanishing points.

We begin by finding all edges in the image using a Canny edge filter [Can86] and linking them to find line segments in the image. We find the three vanishing points using the RANSAC algorithm as follows: we randomly choose two lines and determine their point of intersection v. We then count the number of lines in the remaining set that this point is close to (a line is considered close if the angle subtended at one of its endpoints, between the other endpoint and v, is below 5◦). If this count is greater than that of the previous iteration, we select v as the candidate vanishing point. In the next iteration, we randomly pick another pair of lines and proceed similarly. After a sufficient number of iterations (one-third of the total number of detected lines in our implementation), we record the resulting point as a vanishing point and remove all lines that pass through or near this point from the set of lines. We repeat the algorithm twice to retrieve the remaining two vanishing points. If the randomly selected pair of lines is parallel, the intersection point is at infinity and we use their common direction to test the point against all remaining line segments. The figure below shows the lines used to obtain the three vanishing points in three different colors.
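As a concrete illustration, the RANSAC loop described above can be sketched as follows. This is a minimal sketch and not the paper's implementation: the helper names (`intersect`, `supports`, `ransac_vanishing_point`) and the iteration parameters are ours; only the 5◦ angular threshold is taken from the text.

```python
import math
import random

def intersect(l1, l2):
    # Each line is ((x1, y1), (x2, y2)); intersect via homogeneous coordinates.
    def homog(l):
        (x1, y1), (x2, y2) = l
        return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)
    a1, b1, c1 = homog(l1)
    a2, b2, c2 = homog(l2)
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-9:
        return None  # parallel pair: vanishing point at infinity
    return ((b1 * c2 - b2 * c1) / w, (a2 * c1 - a1 * c2) / w)

def supports(line, v, thresh_deg=5.0):
    # A segment supports v if the angle subtended at one endpoint,
    # between the other endpoint and v, is below the threshold.
    (x1, y1), (x2, y2) = line
    a = math.atan2(y2 - y1, x2 - x1)
    b = math.atan2(v[1] - y1, v[0] - x1)
    d = abs(a - b) % (2 * math.pi)
    d = min(d, 2 * math.pi - d)
    return min(d, math.pi - d) < math.radians(thresh_deg)

def ransac_vanishing_point(lines, iters=200, seed=0):
    # Repeatedly intersect a random pair of segments and keep the
    # intersection supported by the largest number of segments.
    rng = random.Random(seed)
    best_v, best_count = None, -1
    for _ in range(iters):
        v = intersect(*rng.sample(lines, 2))
        if v is None:
            continue  # a fuller version would test the common direction instead
        count = sum(supports(l, v) for l in lines)
        if count > best_count:
            best_v, best_count = v, count
    return best_v, best_count
```

Removing the supporting lines and re-running the loop twice, as the text describes, yields the remaining two vanishing points.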

Previous work and our own experiments indicate that this algorithm is prone to failures and errors in some cases: spurious edges, sensitivity to the Canny thresholds, and vanishing points approaching infinity leading to precision errors. When this occurs, we revert to a manual approach: the user selects three pairs of parallel lines, the pairs being mutually perpendicular to each other. These pairs give the three vanishing points.

The three vanishing points constitute a triangle T whose sides are the vanishing lines. Then p0 is the orthocenter of T, while f is a function of λ1², λ2² and λ3², where the λi are the areas of the triangles into which p0 subdivides T. A perspective matrix P is constructed from these two parameters and is then used during optimization. We refer the reader to [CDR99] for further details.

5. Reconstruction of Object Geometry From Perspective Line Drawings

The user now traces the visible edges of a building to be reconstructed from the image, forming a perspective line drawing (since photographs are perspective by nature)††. All lines are consolidated into a 2D graph G of vertices and edges. Clustering [SL97] is used for this consolidation. (Visible) faces of the object are then determined using the modified Dijkstra algorithm proposed by Shesh et al. [SC04]. All vertices of the graph that lie on the ground are determined by starting at the lowest vertex of the graph (obviously on the ground). A breadth-first search of the graph is initiated from this vertex. Any edge from the current vertex that makes a small enough angle (say Θ) with the horizontal is assumed to lie on the ground. In our current implementation, we use Θ = 35◦. In case the program does not select these vertices correctly, the user can specify them manually.
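The ground-vertex search can be sketched as a short breadth-first traversal. This is a minimal sketch with our own helper names; image y is assumed to grow downward, so the lowest vertex on screen has the largest y, and Θ = 35◦ is taken from the text.

```python
import math
from collections import deque

def ground_vertices(verts, edges, theta_deg=35.0):
    # verts: list of (x, y) image points; edges: list of (i, j) index pairs.
    # Start at the lowest vertex and spread along near-horizontal edges.
    adj = {i: [] for i in range(len(verts))}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    start = max(range(len(verts)), key=lambda i: verts[i][1])
    ground, queue = {start}, deque([start])
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if j in ground:
                continue
            dx = verts[j][0] - verts[i][0]
            dy = verts[j][1] - verts[i][1]
            # Angle of the edge with the horizontal, in [0, 90] degrees.
            if math.degrees(math.atan2(abs(dy), abs(dx))) <= theta_deg:
                ground.add(j)
                queue.append(j)
    return ground
```

Near-vertical edges (wall edges) are never followed, so only the base loop of the traced building is marked as on the ground.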

5.1. Constraint Specification

Once a 2D graph representing the traced object is compiled, constraints are imposed on its 3D structure that are used during optimization. Although similar constraints have been used to refine crudely approximated 3D geometry from photographs [CFD02], the aim of our constraints is to approximate 3D geometry from 2D inputs. Techniques explained in [LS96, SC04] cannot be directly applied to perspective images because such images cannot provide the same image regularity cues. For example, two parallel 3D lines are almost never parallel in perspective projection. We assume some constraints implicitly and also allow the user to specify them explicitly.

Some general observations can be made about architectural objects: they are attached to the ground, the walls touching the ground most likely rise vertically upwards, many edges tend to be parallel or perpendicular to each other, etc. These observations can be implicitly used as constraints on the geometry of the object. As the line drawing is in perspective projection, only a few constraints can be obtained from image regularities. The user can specify more

†† Our experiments indicated that detecting only the visible edges of a building is much more difficult to achieve through automatic edge detection than detecting all straight-line edges, as was needed to automatically infer the vanishing points. This is largely because of the large number of “spurious” edges produced by textures.




constraints explicitly through gestures. The three vanishing points (and hence the vanishing lines) obtained during camera calibration are used to derive more parallel lines. In any face, if two edges intersect at or near a vanishing line, they are assumed to be parallel to each other. It is also assumed that all edges that have exactly one vertex on the ground are perpendicular to the edges on the ground. This observation is not true for all objects – a notable example being a pyramid. But as these constraints are used as penalties in our optimization, they are not hard constraints. In fact, Figure 5(d-f) shows the construction of the pyramidal head of a tower.

The supported gestures (illustrated in the accompanying figures) are:
- making two edges parallel to each other;
- making two edges perpendicular to each other;
- making two edges congruent to each other;
- making two faces perpendicular to each other.

Additional geometric constraints can be specified through gestures. We currently support four types of constraints. These constraints can also be specified after the object has been reconstructed; the reconstruction algorithm is rerun in this case. Constraints assumed implicitly or imposed by the user are used to derive other constraints. For example, (e1 ‖ e2) ∧ (e2 ‖ e3) ⇒ (e1 ‖ e3).
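Deriving constraints transitively, as in the parallelism example above, is naturally expressed with a union-find structure over edge identifiers. A minimal sketch (our own class, not the paper's implementation):

```python
class ParallelClasses:
    """Union-find over edge ids: e1 || e2 and e2 || e3 imply e1 || e3."""

    def __init__(self):
        self.parent = {}

    def find(self, e):
        # Find the representative of e's parallel class, with path halving.
        self.parent.setdefault(e, e)
        while self.parent[e] != e:
            self.parent[e] = self.parent[self.parent[e]]
            e = self.parent[e]
        return e

    def declare_parallel(self, e1, e2):
        # Merge the parallel classes of e1 and e2.
        self.parent[self.find(e1)] = self.find(e2)

    def are_parallel(self, e1, e2):
        return self.find(e1) == self.find(e2)
```

Each implicit or gestured parallelism declaration merges two classes, and all derived pairs can then be enumerated per class when the penalty terms are assembled.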

5.2. Optimization for Geometry Reconstruction

Given a 2D graph G of the traced object and structural constraints, we reconstruct 3D geometry that represents the object's projection in the photograph. Specifically, we “inflate” G by assigning a suitable depth to each vertex.

This problem finds some common ground with that of reconstructing 3D geometry from 2D sketches. Sketch reconstruction by inflation was used earlier by Leclerc et al. [LF92] and subsequently by Lipson et al. [LS96] and Shesh et al. [PMC03, SC04]. Various constraints like parallelism and perpendicularity of edges and faces are set up by evaluating 2D image regularities in the sketched image, and Z-coordinates are given to each vertex of the graph to satisfy these constraints. However, all these methods assume orthographic drawings with image regularities that provide most of the constraints used by their optimization process. Most of the assumed image regularities do not exist for perspective line drawings. We extend their work by following a similar framework to reconstruct perspective line drawings through the constraints explained in the previous section. All geometric constraints are incorporated into the compliance function as penalties.

It must be noted that inflation of the graph G by assigning a Z-coordinate to each vertex does not result in the actual correct geometry, as G originally represents a distorted projection of the object. Therefore, we use the perspective matrix P (Section 4) to undistort a candidate 3D graph before evaluating all characteristics. The resulting 3D graph projects onto the region occupied by the object in the image.

The compliance function that the optimization attempts to minimize is of the form:

f = \sum_i w_i f_i

where w = [w_i] is a weighting vector and the f_i are terms calculated as below. In practice, we use the weighting vector (1/3, 2/3), which was obtained empirically. The following notation is used henceforth:

v_i : i-th vertex of the graph G
F_i : i-th face of the graph G
\vec{v}_i : 3D vector representing edge e_i
\hat{v}_i : normalized 3D vector representing edge e_i
\|\vec{v}_i\| : magnitude of vector \vec{v}_i
\hat{n}_i : unit normal vector of face F_i
n(G), e(G), f(G) : number of vertices, edges and faces of the graph G

The various terms f_i used are as follows:

1. Face Planarity
This constraint ensures that all faces of the object are planar. A plane is fit to all vertices of each face F_i, and the sum of distances of each vertex from its fitted plane comprises this term:

f_1 = \sum_{i=1}^{f(G)} \sum_{v_j \in F_i} |a_i x_j + b_i y_j + c_i z_j + d_i|

2. Geometry Constraints
This set of terms is used to evaluate all the geometric constraints compiled earlier.

a. Parallelism of edges. For all pairs e_i and e_j of edges (n_parallel in total) that are supposed to be parallel to each other,

t_1 = \frac{\sum_{e_i \parallel e_j} (1 - |\hat{v}_i \cdot \hat{v}_j|)}{n_{parallel}}

b. Perpendicularity of edges. For all pairs e_i and e_j of edges (n_perp in total) that are supposed to be perpendicular to each other,

t_2 = \frac{\sum_{e_i \perp e_j} |\hat{v}_i \cdot \hat{v}_j|}{n_{perp}}

c. Congruence of edges. For all pairs e_i and e_j of edges (n_cong in total) that are supposed to be equal in length,

t_3 = \frac{\sum_{e_i = e_j} |\|\vec{v}_i\| - \|\vec{v}_j\|| / \max(\|\vec{v}_i\|, \|\vec{v}_j\|)}{n_{cong}}

d. Perpendicularity of faces. For all pairs F_i and F_j of faces (n_fperp in total) that are supposed to be perpendicular to each other,

t_4 = \frac{\sum_{F_i \perp F_j} |\hat{n}_i \cdot \hat{n}_j|}{n_{fperp}}

e. Edge-face perpendicularity. For all pairs e_i and F_j (n_efperp in total) such that edge e_i is perpendicular to face F_j,

t_5 = \frac{\sum_{e_i \perp F_j} (1 - |\hat{v}_i \cdot \hat{n}_j|)}{n_{efperp}}

f_2 = 0.2 * \sum_{i=1}^{5} t_i
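The penalty terms above can be sketched as a small evaluator. This is an illustrative sketch under our own naming, covering only the edge parallelism, perpendicularity and congruence terms; a full implementation would add the planarity and face terms and weight the groups as in the compliance function.

```python
import numpy as np

def _unit(v):
    return v / np.linalg.norm(v)

def _edge(P, e):
    i, j = e
    return P[j] - P[i]

def compliance(P, parallel_pairs, perp_pairs, congruent_pairs):
    # P: (n, 3) candidate 3D vertex positions (already undistorted by the
    # perspective matrix). Each pair is (edge, edge), an edge being a
    # (vertex index, vertex index) tuple. Every satisfied constraint
    # contributes zero; each violated pair contributes a penalty in [0, 1].
    def avg(vals):
        vals = list(vals)
        return sum(vals) / len(vals) if vals else 0.0
    t1 = avg(1.0 - abs(np.dot(_unit(_edge(P, a)), _unit(_edge(P, b))))
             for a, b in parallel_pairs)       # parallel edges: |dot| -> 1
    t2 = avg(abs(np.dot(_unit(_edge(P, a)), _unit(_edge(P, b))))
             for a, b in perp_pairs)           # perpendicular edges: dot -> 0
    t3 = avg(abs(np.linalg.norm(_edge(P, a)) - np.linalg.norm(_edge(P, b)))
             / max(np.linalg.norm(_edge(P, a)), np.linalg.norm(_edge(P, b)))
             for a, b in congruent_pairs)      # congruent edges: lengths match
    return t1 + t2 + t3
```

The optimizer then varies the Z-coordinates of the vertices to drive this value toward zero.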

We use Brent's minimization [PTVF02] to solve the above optimization problem, as it offers a good tradeoff between speed and accuracy. We use the layered method of Shesh et al. [SC04] for a good initial guess: all vertices on the silhouette form a middle layer, visible vertices not on the silhouette form the front layer and hidden vertices not on the silhouette form the back layer. Although this initial guess works well for closed objects, it tends to fail in the case of incomplete objects composed of facades, like Figure 5(j-l). In that case we resort to the trivial initial guess (all vertices with the same Z-coordinate) and rely more on implied and gestured hints to reconstruct the 3D model.
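One plausible way to apply a scalar minimizer to the multivariate compliance function is coordinate descent with a derivative-free 1D line minimization per depth value. This sketch is ours, not the paper's: it uses golden-section search as a simple stand-in for the Brent routine of [PTVF02], and the search span and sweep count are illustrative choices.

```python
import numpy as np

def golden_min(g, a, b, tol=1e-7):
    # Golden-section search: a derivative-free 1D minimizer on [a, b].
    phi = (5 ** 0.5 - 1) / 2
    while b - a > tol:
        c, d = b - phi * (b - a), a + phi * (b - a)
        if g(c) < g(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def coordinate_descent(f, z0, sweeps=25, span=5.0):
    # Minimize f over the vector z of vertex depths by repeated 1D line
    # minimizations along each coordinate in turn.
    z = np.array(z0, float)
    for _ in range(sweeps):
        for i in range(len(z)):
            def g(t, i=i):
                zt = z.copy()
                zt[i] = t
                return f(zt)
            z[i] = golden_min(g, z[i] - span, z[i] + span)
    return z
```

The quality of the initial guess matters exactly as described above: a good layered starting point keeps each 1D search inside the basin of the desired minimum.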

5.3. Completing Object Geometry

The user can trace only those parts of the objects that are visible in the image. In order to complete the building geometry, hidden edges must be estimated. In general, this is a difficult problem because there can theoretically be an infinite number of configurations for hidden geometry. However, in the case of architectural buildings, a reasonable hidden topology can be estimated automatically.

If we assume that all buildings are trihedral, this problem can be solved automatically. Our solution is similar to that offered by Varley et al. [VM00], but for perspective line drawings. We first detect vertices that have degree two (i.e. they have an edge missing). Then, we estimate the direction of the missing edge by pairing the visible edges at these vertices with the known vanishing points. Then, we iteratively select two incomplete vertices and determine the intersection of the two rays along their respective hidden-edge directions. In order to prune incorrect pairs of vertices, we constrain all these intersections to lie within the convex hull of the building (implicitly sketched by the user). Thus we keep eliminating vertices until we are left with only two, which we simply connect to each other. Below, the figures in the top row show the input sketched by the user (in blue) and the automatically completed line drawing (in green).


Figure 2: Geometry completion. First row: automatic geometry completion. If the objects are assumed to be trihedral, the simplest hidden geometry can be automatically estimated using a variant of the technique by Varley et al. [VM00]. Second row: interactive geometry completion. (a) an incomplete building in its original position in the scene. (b) it is rotated and a missing face is drawn. (c) as the face is completed, it is reconstructed. (d) the augmented object can be rotated to complete other missing faces similarly.
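Two pieces of the automatic completion can be sketched directly: detecting incomplete (degree-two) vertices, and intersecting two hidden-edge rays in a least-squares sense, since estimated rays rarely meet exactly. The helper names are ours, and the convex-hull pruning step is omitted.

```python
import numpy as np

def incomplete_vertices(n, edges):
    # Under the trihedral assumption every vertex has three incident
    # edges, so a degree-two vertex has exactly one hidden edge missing.
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return [v for v in range(n) if deg[v] == 2]

def ray_meet(p1, d1, p2, d2):
    # Least-squares "intersection" of two 3D rays: the midpoint of the
    # closest points on each ray. Found by minimizing
    # |p1 + t1*d1 - (p2 + t2*d2)|^2 over (t1, t2).
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    b = np.array([(p2 - p1) @ d1, (p2 - p1) @ d2])
    t1, t2 = np.linalg.solve(A, b)
    return ((p1 + t1 * d1) + (p2 + t2 * d2)) / 2
```

Pairing incomplete vertices and keeping only meeting points inside the building's convex hull, as described above, would then eliminate vertices until only two remain.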

In case the trihedral assumption does not hold for a building, the user can resort to completing its geometry manually. In this case, although the user could “guess” the invisible edges while tracing an object, our initial experiments indicated that such guesses can affect the overall geometry reconstruction adversely. This is mainly because invisible edges also define vertices on the ground, so guessing them can adversely affect the resulting ground geometry. Also, it is difficult to draw accurately in perspective, and it may be easier to specify these edges from a different view.

The figures in the bottom row above provide an example for the L-shaped building in Figure 1(c). The object is rotated and completed progressively. The user sketches strokes for missing edges. When the program detects a new face formed by them, it reconstructs its 3D geometry. The user can then rotate the partially completed object and continue sketching.

It may be observed that this functionality of progressively constructing an object can be used independently as a design tool. In fact, the input image is used only as a guide for setting up the camera (Section 4), tracing out buildings (Section 5) and texturing the reconstructed buildings (Section 7). Thus, geometry can be progressively reconstructed from perspective sketches once a perspective camera is set up. This is an extension of the work done in [SC04, LS96] and can be used as a tool for reconstruction of sketches.

© The Eurographics Association and Blackwell Publishing 2008.


A. Shesh & B. Chen / Peek-in-the-Pic


Figure 3: Determination of ground and background geometry. (a) the image with reconstructed objects in wireframe; the vertices marked in red lie on the ground. (b) a least-squares plane is constructed from these vertices as shown in grey. As it is an approximate plane, it passes through some objects (shown in light blue). A horizon curve is sketched by the user. (c) points on the horizon line are projected on the plane and are used with the red vertices to get the ground geometry shown as an orange mesh. The background relief is raised from the horizon curve as shown in Figure 1(c).

6. Ground and Background Geometry

Once all the buildings have been constructed, the ground geometry must be determined to complete the environment. All the edges of the buildings that should be on the ground are heuristically determined as discussed in Section 5. Figure 3(a) shows these vertices in red. Let the set of such vertices be denoted by S. A least-squares plane P (shown in grey in Figure 3(b)) is fit through points in S. It can be seen that P may pass through some objects.
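The paper does not specify how the least-squares plane P is computed; a standard choice is total least squares via the SVD of the centered point set, sketched below (function name is illustrative).

```python
import numpy as np

def fit_ground_plane(points):
    """Fit a total-least-squares plane through the 3D ground vertices S.
    Returns (centroid, unit normal): the plane passes through the
    centroid and is orthogonal to the normal."""
    P = np.asarray(points, dtype=float)
    centroid = P.mean(axis=0)
    # The plane normal is the right singular vector of the centered
    # points with the smallest singular value.
    _, _, vt = np.linalg.svd(P - centroid)
    normal = vt[-1]
    return centroid, normal
```

Minimizing orthogonal (rather than vertical) distance keeps the fit well behaved even when the ground plane is steeply foreshortened in the image.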

The user then sketches out a pseudo-horizon curve (intersection of the ground and the background, see Figure 3(b)). This curve should be drawn just under the buildings in the background, so that the image does not “fold” in the middle of a building. The curve is then projected on P to obtain its 3D coordinates. These projected points are added to S, along with two points on the bottom corners of the screen. All points in S are triangulated in P. These form the ground relief (shown in orange in Figure 3(c)). Points of the pseudo-horizon curve are raised up to form the background, as seen in Figure 1(c). This completes the geometry of the 3D scene.
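Projecting a sketched curve point on P amounts to intersecting its viewing ray with the plane, a standard ray-plane intersection. The sketch below assumes a known camera center and per-point ray direction; parameter names are illustrative.

```python
import numpy as np

def project_to_plane(camera_center, ray_dir, plane_point, plane_normal):
    """Intersect the viewing ray camera_center + t*ray_dir with the
    plane through plane_point with normal plane_normal.
    Returns the 3D intersection, or None for a parallel ray."""
    denom = np.dot(plane_normal, ray_dir)
    if abs(denom) < 1e-9:
        return None  # ray parallel to the ground plane
    t = np.dot(plane_normal, plane_point - camera_center) / denom
    return camera_center + t * ray_dir
```

Each pseudo-horizon point is projected this way before being added to S for triangulation.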

7. Image Completion

As a building is constructed, it has to be removed from the original image, creating a hole (Figure 1(d)). Such holes are usually filled manually by cloning textures (such as in [OCDD01]), a user-intensive and cumbersome process.


Figure 4: Hole-filling by texture synthesis. Top row: automatic texture synthesis. (a) when a building is sketched by the user, its outline is used as the region for synthesis. The bounding box of this region is scaled up and used as the source region for the texture synthesis. (b) holes from Figure 1(d) filled using this method. Note that although this image looks patchy, the reconstructed buildings overlap these regions for most view points. Row 2: interactive texture synthesis. (c) the source and target regions are specified in green and red respectively. (d) the synthesized output for (c). (e) holes from Figure 1(d) filled using this interactive method.

Many image completion methods exist in the computer graphics and vision literature. Many texture synthesis algorithms exist that complete a given “target region” of an image from source regions contained within the same or different images. Recent research has even led to real-time [LLX∗01] and controllable [LH05] texture synthesis. However, texture synthesis primarily works well for amorphous images (flowers, stones, fuzzy background scenery, etc.), and its matching techniques tend to break down when synthesizing well-defined geometric structures like buildings. Image inpainting [BSCB00] works well for filling small holes in images, but it is not as effective on large missing regions. Although automatic, both texture synthesis and image inpainting are too slow for the size of images in this paper, mitigating most advantages of the automation.

From our initial experiments with cloning, we observed that buildings tend to be filled up using regions immediately surrounding them, not from distant image regions. Intuitively, a building must be replaced by the background it occludes, which can be approximated from the image regions adjacent to the building. We use this observation to define the source region for a given hole. We calculate the bounding box of the hole to be filled, and scale it up by 25%. We use the region in this scaled bounding box (minus the hole) as the source for the texture synthesis process. We use the texture synthesis algorithm proposed in [EL99, LLX∗01]. As the hole is automatically derived from the user’s sketch, the synthesis is done with no extra input. Row 1 of Figure 4 shows how this method produces a synthesized image. Each hole was filled in less than 30 seconds.
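The source-region construction described above (scale the hole's bounding box up by 25% and subtract the hole) can be sketched as follows. This is an illustrative sketch: the mask convention and the clipping of the scaled box to image bounds are assumptions.

```python
import numpy as np

def source_region(hole_mask, scale=1.25):
    """Derive the texture-synthesis source region from a boolean hole
    mask: take the hole's bounding box, scale it up by 25% about its
    center, clip to the image, and exclude the hole itself."""
    ys, xs = np.nonzero(hole_mask)
    y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
    cy, cx = (y0 + y1) / 2.0, (x0 + x1) / 2.0
    hy = (y1 - y0 + 1) * scale / 2.0  # scaled half-height
    hx = (x1 - x0 + 1) * scale / 2.0  # scaled half-width
    H, W = hole_mask.shape
    Y0, Y1 = max(0, int(cy - hy)), min(H, int(cy + hy) + 1)
    X0, X1 = max(0, int(cx - hx)), min(W, int(cx + hx) + 1)
    box = np.zeros_like(hole_mask, dtype=bool)
    box[Y0:Y1, X0:X1] = True
    return box & ~hole_mask.astype(bool)  # scaled box minus the hole
```

The resulting mask selects only pixels adjacent to the building, matching the observation that holes are best filled from their immediate surroundings.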

Although the image in Row 1 contains very obvious patches and discontinuities, the reconstructed buildings occlude large parts of these patches from most view points, and thus they are not as disturbing. However, because of the limitations of texture synthesis discussed earlier, the results produced may sometimes be too disturbing. In these cases, the user can resort to an interactive process. We take inspiration from the paint-by-numbers interface [Hae90] by augmenting this process with a simple interface to interactively synthesize these holes (see Row 2 of Figure 4). The user marks the source and target regions in the image with poly-lines to start texture synthesis. As the user marks small regions at a time, this synthesis is fast.

8. Implementation and Results

All results in this paper were produced on a desktop system with a 2.6 GHz Pentium Xeon processor and 1 GB RAM, with an external tablet device (Wacom Cintiq PL-550).

Figure 5(c) shows a part of Manhattan, New York reconstructed using Figure 5(a)‡‡ as input. Figure 5(b) shows how the constructed 3D environment looks from a distant view point. Eight buildings have been reconstructed in this example. The total time from (a) to (c) took about 15 minutes (including interaction), while the image completion took another 10 minutes. The actual optimization time is 2–3 seconds per building.

Figure 5(d) shows a drawing of Foshay Tower in Minneapolis, MN (USA)§§ from the 1930s. Figure 5(f) shows an alternate view obtained by reconstruction (note the correctly reconstructed pyramidal top of the tower). Three buildings were reconstructed in this view. The total time taken for geometry reconstruction was 5 minutes (including interaction), while the actual optimization time is 2–3 seconds per object.

Figure 5(g) shows a drawing of the 1305 Church of Aston Cantlow, Warwickshire, England¶¶. Figure 5(i) shows a zoom and change of angle towards the church. The image synthesis in this case was done by cloning, as the background is fairly homogeneous. Please see the accompanying video for a visual illustration of our pipeline and fly-throughs of these reconstructed scenes.

Figure 5(j) shows a photograph from Liebowitz et al. [LCZ99] (used with permission from the authors). This photograph is challenging because of some radial camera distortion and because it breaks the layered initial guess assumption of Shesh et al. [SC04] (the corner between the walls and ground is further away from the viewer than the vertices on the silhouette, instead of being nearer to it). The models in (k) and (l) were produced in two steps: first reconstructing the two walls and then reconstructing the roof and ground (1 second each).

‡‡ Source: http://www.pilotlist.org/balades/manhattan/manhattan.html
§§ Source: http://www.minneapolishistory.com/marriott3.htm
¶¶ Source: http://www.holoweb.net/~liam/pictures/oldbooks/OldEngland/pages/1305-Church-of-Aston-Cantlow/

9. Discussion and Future Work

Peek-in-the-Pic creates “navigable” 3D models from a single image by amalgamating camera calibration techniques and work done in the area of sketch reconstruction. Since it takes mostly “tracing” operations from the user and works on general photographs taken casually, it is suitable for non-technical users as well. Peek-in-the-Pic works best for modern architectural buildings that have polyhedral shapes. Reconstructing more involved architecture like buildings with ancient carvings is more difficult. However, as the final geometry is textured, artifacts are not noticeable even if a building with details is approximated by simpler geometry, unless one zooms in closely near a building.

In many ways, Peek-in-the-Pic can be seen as proof that by compromising on the accuracy of the acquired 3D model, one can create a system that works with a much simpler user interface that expects a lower degree of expertise from the user. Thus it lies in the middle of the spectrum of image-based modeling techniques, ranging from the completely automatic but often unsuccessful [HEH05] to the more accurate but also user-intensive [LCZ99], both in terms of the quality and extent of the user interaction and the quality of the resulting 3D models. Although working towards common goals, it is difficult to quantitatively compare the user interactions involved in such systems and Peek-in-the-Pic because the nature of user interactions tends to be very different.

As our method is based on inflation-based reconstruction, it shares some of its limitations. As the traced line drawing becomes more complex, the probability of convergence to a visually incorrect local minimum increases. In many cases this can be remedied by specifying constraints after reconstruction, but we acknowledge that this approach may not always succeed. The user may also successfully construct a building part-by-part (Figure 5(j–l)), as is common in case of inflation-based sketch reconstruction like SMARTPAPER [SC04]. However, such an approach often follows an earlier failed attempt to construct the whole building at once and thus represents a limitation in the context of Peek-in-the-Pic. We believe an interesting extension would be to devise implicit gestures built into the tracing process itself, that may aid the reconstruction process (specifically in the initial guess estimation that affects the optimization the most). For example, users could provide hints about concavity/convexity of an edge of the building by tracing it using a different line style. As lines are being traced using a simple clicking interface, we believe such gestures would be more tolerable than having to sketch different line styles as in SMARTPAPER. Alternatively, such hints may be built in as post-process gestures. However, a more formal user study would be needed to ascertain which gestures, if any, are acceptable and meaningful to the general user so as to not disturb the intuitiveness of the user interface that we have strived to maintain.

Figure 5: Results. (a) photograph of Manhattan, NY. (b) view from a distant point. (c) a unique view of the reconstructed 3D scene. (d) painting of 1930s downtown Minneapolis, MN. (e) view from a distant point. (f) a unique view of the reconstructed 3D scene. (g) painting of the 1305 church. (h) view from a distant point. (i) a unique view of the reconstructed 3D scene. (j) the original photograph of Merton College, Oxford, taken from [LCZ99]. (k) view from a distant point. (l) a unique close-up view of the 3D reconstructed scene.

Our optimization framework currently uses no domain-based knowledge about the geometry that it attempts to reconstruct. We believe our approach can be combined with learning-based approaches [LS02, HEH05] to learn local characteristics of typical buildings and their projections in images that will aid in the reconstruction process, and also to fix parameters like Θ. However, some user interaction is unavoidable for any method to work for general cases.

Finally, although tracing out buildings is simple and easy, it is still the most time-consuming task in Peek-in-the-Pic. Automating this process by devising smart edge-detection methods fails because of strong edges produced by textures. Extending methods like Lazy snapping [LSTS04] to segment buildings automatically is worthy of further study.

References

[BSCB00] BERTALMIO M., SAPIRO G., CASELLES V., BALLESTER C.: Image inpainting. In Proc. SIGGRAPH (2000), pp. 417–424.

[BT81] BARROW H., TENENBAUM J.: Interpreting line drawings as three-dimensional surfaces. Artificial Intelligence 17 (1981), 75–116.

[Can86] CANNY J.: A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (Nov. 1986), 679–698.

[CDR99] CIPOLLA R., DRUMMOND T., ROBERTSON D. P.: Camera calibration from vanishing points in images of architectural scenes. In Proc. BMVC (1999).

[CFD02] CANTZLER H., FISHER R. B., DEVY M.: Improving architectural 3D reconstruction by plane and edge constraining. In Proc. BMVC (2002).

[CRZ00] CRIMINISI A., REID I. D., ZISSERMAN A.: Single view metrology. International Journal of Computer Vision 40, 2 (2000), 123–148.

[DTM96] DEBEVEC P. E., TAYLOR C. J., MALIK J.: Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. Proc. SIGGRAPH (1996), 11–20.

[EL99] EFROS A. A., LEUNG T. K.: Texture synthesis by non-parametric sampling. IEEE Intl. Conf. Computer Vision (September 1999), 1033–1038.

[EMJK94] LUTTON E., MAITRE H., LOPEZ-KRAHE J.: Contribution to the determination of vanishing points using Hough transform. IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI) 16, 4 (1994), 430–438.

[FB81] FISCHLER M., BOLLES R.: Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM 24 (1981), 381–395.

[Hae90] HAEBERLI P.: Paint by numbers: Abstract image representations. Proc. SIGGRAPH (1990), 207–214.

[HEH05] HOIEM D., EFROS A. A., HEBERT M.: Automatic photo pop-up. Proc. SIGGRAPH (2005), 577–584.

[HiAA97] HORRY Y., ICHI ANJYO K., ARAI K.: Tour into the picture: Using a spidery mesh interface to make animation from a single image. Proc. SIGGRAPH (1997), 225–232.

[HK87] HERMAN M., KANADE T.: The 3D MOSAIC scene understanding system: Incremental reconstruction of 3D scenes for complex images. Readings in Computer Vision: Issues, Problems, Principles, and Paradigms (1987), 471–482.

[Huf71] HUFFMAN D.: Impossible objects as nonsense sentences. Machine Intelligence 6 (1971), 295–303.

[IMT99] IGARASHI T., MATSUOKA S., TANAKA H.: Teddy: A sketching interface for 3D freeform design. Proc. SIGGRAPH (1999), 409–416.

[Kan81] KANADE T.: Recovery of the three-dimensional shape of an object from a single view. Artificial Intelligence 17 (1981), 409–460.

[KC06] KAPLAN M., COHEN E.: Producing models from drawings of curved surfaces. In Proc. SBIM (2006), pp. 51–58.

[KH06] KARPENKO O. A., HUGHES J. F.: SmoothSketch: 3D free-form shapes from complex sketches. ACM Transactions on Graphics 25, 3 (2006), 589–598.

[LCZ99] LIEBOWITZ D., CRIMINISI A., ZISSERMAN A.: Creating architectural models from images. In Proc. Eurographics (1999), pp. 39–50.

[LF92] LECLERC Y., FISCHLER M.: An optimization-based approach to the interpretation of single line drawings as 3D wire frames. IJCV 9, 2 (November 1992), 113–136.

[LH05] LEFEBVRE S., HOPPE H.: Parallel controllable texture synthesis. Proc. SIGGRAPH 24, 3 (2005), 777–786.

[LLX∗01] LIANG L., LIU C., XU Y.-Q., GUO B., SHUM H.-Y.: Real-time texture synthesis by patch-based sampling. ACM Trans. Graphics 20, 3 (2001), 127–150.

[LS96] LIPSON H., SHPITALNI M.: Optimization-based reconstruction of a 3D object from a single freehand line drawing. J. Computer Aided Design 28, 8 (1996), 651–663.

[LS02] LIPSON H., SHPITALNI M.: Correlation-based reconstruction of a 3D object from a single freehand sketch. AAAI Spring Symposium on Sketch Understanding (2002), 99–104.

[LSTS04] LI Y., SUN J., TANG C.-K., SHUM H.-Y.: Lazy snapping. Proc. SIGGRAPH (2004), 303–308.

[LTM96] LENGAGNE R., TAREL J., MONGA O.: From 2D images to 3D face geometry. pp. 301–306.


[MA84] MAGEE M. J., AGGARWAL J.: Determining vanishing points from perspective images. Proc. CVGIP 26, 2 (1984), 256–267.

[Mar91] MARILL T.: Emulating the human interpretation of line drawings as three-dimensional objects. IJCV 6, 2 (1991), 147–161.

[MB95] MCMILLAN L., BISHOP G.: Plenoptic modeling: An image-based rendering system. Proc. SIGGRAPH (1995), 39–46.

[OCDD01] OH B. M., CHEN M., DORSEY J., DURAND F.: Image-based modeling and photo editing. Proc. SIGGRAPH (2001), 433–442.

[PF06] PRASAD M., FITZGIBBON A.: Single view reconstruction of curved surfaces. In Proc. CVPR (2006), pp. 1345–1354.

[PMC03] PIQUER A., MARTIN R., COMPANY P.: Using skewed mirror symmetry for optimisation-based 3D line-drawing recognition. Proc. 5th IAPR Intl. Workshop on Graphics Recognition (2003), 182–193.

[PTVF02] PRESS W., TEUKOLSKY S., VETTERLING W., FLANNERY B.: Numerical Recipes in C++: The Art of Scientific Computing. Cambridge University Press, New York, NY, 2002.

[RMMD04] RECHE-MARTINEZ A., MARTIN I., DRETTAKIS G.: Volumetric reconstruction and interactive rendering of trees from photographs. Proc. SIGGRAPH (2004), 720–727.

[Rob63] ROBERTS L.: Machine Perception of 3-D Solids. PhD thesis, MIT, 1963.

[Rot00] ROTHER C.: A new approach for vanishing point detection in architectural environments. In Proc. BMVC (2000), pp. 382–391.

[SC04] SHESH A., CHEN B.: SMARTPAPER: An interactive and user-friendly sketching system. Computer Graphics Forum (Proc. Eurographics) 23, 3 (2004), 301–310.

[SC05] SHESH A., CHEN B.: Peek-in-the-Pic: Architectural scene navigation from a single picture using line drawing cues. Proc. Pacific Graphics (Short Paper) (2005).

[SG00] SCHWEIKARDT E., GROSS M. D.: Digital clay: Deriving digital models from freehand sketches. Automation in Construction 9 (2000), 107–115.

[SL97] SHPITALNI M., LIPSON H.: Classification of sketch strokes and corner detection using conic sections and adaptive clustering. ASME J. Mechanical Design 119, 2 (1997), 131–135.

[Sze94] SZELISKI R.: Image mosaicing for tele-reality applications. WACV94 (1994), 44–53.

[Var03] VARLEY P.: Automatic Creation of Boundary-Representation Models from Single Line Drawings. PhD thesis, 2003.

[VM00] VARLEY P., MARTIN R.: Constructing boundary representation solid models from a two-dimensional sketch: Topology of hidden parts. Proc. First UK-Korea Workshop on Geometric Modeling and Computer Graphics (2000), 129–144.

[WT91] WANG L.-L., TSAI W.-H.: Camera calibration by vanishing lines for 3-D computer vision. IEEE Trans. Pattern Analysis and Machine Intelligence 13, 4 (Apr. 1991), 370–376.

[ZHH96] ZELEZNIK R. C., HERNDON K. P., HUGHES J. F.: SKETCH: An interface for sketching 3D scenes. Proc. SIGGRAPH (1996), 163–170.
