
Flowlab - an interactive tool for editing dense image correspondences

F. Klose, K. Ruhl, C. Lipski, M. Magnor

{klose,ruhl,lipski,magnor}@cg.tu-bs.de
Computer Graphics Lab, TU Braunschweig

Abstract

Finding dense correspondences between two images is a well-researched but still unsolved problem. For various tasks in computer graphics, e.g. image interpolation, obtaining plausible correspondences is a vital component. We present an interactive tool that allows the user to modify and correct dense correspondence maps between two given images. Incorporating state-of-the-art algorithms in image segmentation, correspondence estimation and optical flow, our tool assists the user in selecting and correcting mismatched correspondences.

Keywords: Dense Image Correspondences, Optical Flow

1 Introduction

Dense image correspondences are a core component of many computer graphics, vision and image processing applications. In the last decade, an overwhelming amount of high-quality research has been conducted on improving the quality of image correspondence estimation. Especially in the optical flow community, new optimization techniques and quantitative benchmarks have led to remarkable improvements [1]. However, the ill-posed nature of the problem and the wide range of applications leave the general problem of correspondence estimation unresolved. To evaluate the results of any given correspondence estimation technique meaningfully, it is therefore important to have a specific application in mind.

The applicability of the presented work is not limited to one scenario, but in this work we focus on the use of correspondences for rendering synthetic in-between images. This process is often referred to as image interpolation, and it can take place in multiple domains. Given two consecutive video frames from a static camera, the image correspondences encode the motion in the scene, and interpolation takes place in the temporal domain. Given a single moving camera or multiple stationary cameras, the interpolation can also take place in the spatial domain; the correspondence problem can then be tackled by a variety of structure-from-motion or 3D reconstruction techniques. In the most general case, both camera and scene motion are present, and the image correspondences span the spatio-temporal domain.

For all image correspondence algorithms, pathological cases exist where the underlying model assumptions are violated and the results degrade dramatically. The most prominent examples include glare on non-diffuse surfaces and poorly textured regions. Other scenarios include long-range correspondences, illumination changes, complex scenes, and occlusions, where any contemporary algorithm has failure cases. Purely automatic solutions are therefore unable to produce convincing results when used for image interpolation. On the other hand, user-driven editing tools often require a lot of manual interaction. With our interactive tool, called Flowlab, user corrections can be applied to precomputed correspondences, aided by automatic methods that refine the user input. Provided with an initial solution, the user selects and corrects erroneous regions of the correspondence maps. To keep user interaction at a minimum, we employ state-of-the-art techniques that assist in segmenting distinct regions and locally refining their correspondence values.

The rest of the paper is structured as follows. After reviewing related work (Sec. 1.1), the workflow is presented from a user's perspective (Sec. 2), followed by a description of the editing operations (Sec. 3). Applications are shown in Sec. 4, and results on different data sets in Sec. 5.

1.1 Related Work

Multiple research areas in computer vision and computer graphics are concerned with dense image correspondences.

Optical Flow. One of the most prominent areas is optical flow estimation. For ground truth generation, correspondence fields are traditionally created in a user-assisted workflow [27] or are derived from other data, such as depth maps [5, 7]. As an automatic approach, optical flow algorithms estimate correspondence fields between two images, usually based on an energy minimization functional. A survey of recent optical flow algorithms was composed by Baker et al. [1].

Problem areas of contemporary algorithms are primarily due to optical ambiguities: low-textured regions and objects with non-Lambertian reflectance cannot be followed well, as they violate the brightness constancy assumption.

Another active area is long-range correspondence estimation, particularly of small objects. Brox et al. [4] address large displacements using pre-segmentation, and Lipski et al. [13] use large-scale belief propagation. While the problem of long-range correspondences can be mitigated by using a dense camera array (spatial resolution) or high-speed cameras (temporal resolution), other model violations such as non-Lambertian surfaces remain unsolved.

The Flowlab tool uses the anisotropic Huber-L1 optical flow by Werlberger et al. [25] as one possible local optimization strategy, applied in regions where good correspondences can be computed.

Geometric Models. Using geometric models as a means of estimating image correspondences is a wide research area, and an exhaustive survey is outside the scope of this paper. An evaluation of static multi-view stereo algorithms and their different basic assumptions has been composed by Seitz et al. [21]. More recently, large-scale reconstructions of outdoor scenes based on point or patch models have been proposed [8, 10]. These approaches handle static scenes. To capture the motion and structure of a scene, the notion of scene flow was introduced by Vedula et al. [24]. To determine the scene flow, multiple precomputed optical flow fields can be merged [29, 23], or static 3D reconstructions at discrete time steps can be registered to recover the motion data [28, 16, 17]. Klose et al. [11] combine the 3D reconstruction and the motion estimation into a single step.

Image Correspondences. Dense image correspondence estimation and optical flow are often overlapping problems. The previously mentioned long-range optical flow algorithm by Brox et al. [4] can be considered an approach to both. In our experiments, the initial correspondences are estimated using an efficient correspondence estimator based on an MRF solved by belief propagation [12].

Correction Tools. Precious few editing tools exist for correspondences, because correction is mostly performed in the image domain after interpolation. A pioneering approach by Beier and Neely [3] employs sparse correspondences in the form of lines at salient edges in human faces. Rohr et al. [19] use thin-plate splines with user-selected landmark points to register medical CT images. The commercial movie production tool Ocula [26] provides an editing function for stereo disparity mapping, which considers correspondences only for the stereo case.

Applications. One use of dense correspondence fields is image interpolation in movie production, e.g. for free-viewpoint video generation. Several image-based free-viewpoint approaches exist. Germann et al. [9] represent soccer players as a collection of articulated billboards. Ballan et al. [2] presented an image-based view interpolation that uses billboards to represent a moving actor. Lipski et al. proposed an image-based free-viewpoint system [12] based upon multi-image morphing. Image interpolation has been used in various movies, e.g. in "2 Fast 2 Furious" or "Swimming Pool" [18], with all corrections of visual artifacts performed in the image domain.

General Correspondence Editing. Flowlab is novel in that it is the first general tool to edit not the images themselves, but the relationship between images. It is general in that it is not directly linked to a specific algorithm. This contrasts with previous ad-hoc approaches to manual correspondence correction, which have been tied to one specific approach [1].

Figure 1: (left) The source image. (middle) The target image. (right) The automatically computed correspondences from the first to the second image are used for forward warping the source image. Due to incorrect long-range correspondence estimation, the two bean bags are heavily distorted.

2 Workflow

The general Flowlab workflow is as follows: The user passes two images I1 and I2 as input to the application. The user can assess the quality of the initial correspondences by rendering in-between images, and may apply editing operations to mismatched areas, switching between rendering and editing until all visible errors are corrected. The overall interface is designed to provide a very efficient and fast workflow.

2.1 Initial Correspondences

Flowlab starts with an initial set of two correspondence maps. The map from image I1 to I2 is called the forward correspondences ω1→2, and the opposing direction from I2 to I1 is called the backward correspondences ω2→1. When sufficient computing time is available for pre-processing, initial flows can be calculated in advance and passed to Flowlab for manual refinement. For a relatively quick initial flow estimation, we embedded the freely available GPU-based Flowlib [25]. We also employ this optical flow method for fast interactive local refinement of the user-specified corrections.
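For illustration, a minimal stand-in for this initialization step, using OpenCV's Farneback flow on the CPU instead of the GPU-based Flowlib (all function and variable names here are ours, not Flowlab's):

```python
import cv2

def initial_flows(img1_gray, img2_gray):
    """Estimate forward and backward correspondence maps.
    Farneback flow is only a CPU stand-in for Flowlib [25]."""
    params = dict(pyr_scale=0.5, levels=3, winsize=15,
                  iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    fw = cv2.calcOpticalFlowFarneback(img1_gray, img2_gray, None, **params)
    bw = cv2.calcOpticalFlowFarneback(img2_gray, img1_gray, None, **params)
    return fw, bw  # ω1→2 and ω2→1, each an H x W x 2 array of (dx, dy)
```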

2.2 Rendering

To be able to improve the current correspondences, it is necessary to assess their current quality. Because we target image interpolation, Flowlab contains a renderer to assist the user in the visual inspection of the current results. Using mouse dragging, in-between images are rendered: a triangle mesh with one vertex per pixel from I1 (respectively I2) is warped using the forward correspondences ω1→2 (respectively the backward correspondences ω2→1).

In order to rate the accuracy of correspondences in one direction only, the user can choose to show only the result of one of the two warped meshes. To judge the overall quality of both correspondence directions, a blended version is generated in which the two warped images are combined into one interpolated image. Mismatching regions are visible as ghosting artifacts in the latter rendering.

Figure 2: Selection stages. (left) In the source image, a polygonal region is selected, (middle) possibly supported by zooming. (right) In the target image, the polygon can be projectively transformed with four control points. A transparent overlay showing the current transformation can be used for aligning regions.

For the interpolated image, the blending weights depend on the warping distance from the source image, as proposed by Seitz et al. [22]. An example of two source images and an initial warp in the forward direction is shown in Fig. 1. It is easily possible to add other visualisation tools to the framework, e.g. for evaluating correspondences in other application domains.
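To make the rendering step concrete, the following is a minimal nearest-neighbor splatting sketch of forward warping and distance-dependent blending. It is not Flowlab's actual triangle-mesh renderer, and the simple linear weighting is our reading of the distance-dependent blend; all names are our own:

```python
import numpy as np

def forward_warp(img, flow, alpha):
    """Splat every source pixel to its position a fraction alpha of the way
    along its correspondence vector. A crude stand-in for Flowlab's
    per-pixel triangle-mesh renderer (no hole filling)."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.clip(np.round(xs + alpha * flow[..., 0]), 0, w - 1).astype(int)
    yt = np.clip(np.round(ys + alpha * flow[..., 1]), 0, h - 1).astype(int)
    out[yt, xt] = img[ys, xs]
    return out

def interpolate(img1, img2, flow_fw, flow_bw, alpha):
    """Blend the two one-way warps; the weights depend on the warping
    distance from each source image (cf. Seitz and Dyer [22])."""
    w1 = forward_warp(img1.astype(np.float32), flow_fw, alpha)
    w2 = forward_warp(img2.astype(np.float32), flow_bw, 1.0 - alpha)
    return ((1.0 - alpha) * w1 + alpha * w2).astype(np.uint8)
```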

2.3 Editing Operations

Flowlab supports three basic editing operations: removal of small outliers, direct region mapping, and optical-flow-assisted region mapping.

Small outliers can be eliminated with median filtering. The user specifies a polygonal region, and a median filter with a fixed window size is applied to the correspondences within the selected region. This removes very high-frequency noise, which can appear around object boundaries, from the correspondence map.

Region mapping is performed by first selecting a polygon in the source image, defined by an arbitrary number of control points. These can be moved to better approximate, for example, a distinct object. A zoom function can be used to improve the matching accuracy in complex regions.

The selected region is then mapped onto the destination image. A bounding box with four control points allows a projective transformation of the selection. The opacity of the polygon overlay can be adjusted to allow a more comfortable alignment. Figure 2 shows the selection stages.

As a further user convenience, the selection of foreground objects can be supported by GrabCut [20]. The selected polygon border marks the definite background, whereas the center region is assumed to be the desired foreground object. The GrabCut selection is shown in Fig. 3. A more detailed discussion of the editing operations can be found in Section 3.

Figure 3: GrabCut selection. (left) A foreground object is approximately selected. (middle, right) The GrabCut algorithm is used to segment foreground from background.

2.4 Interface

The Flowlab application is called from the command line. The only mandatory parameters are the two images. Optionally, two correspondence maps (ω1→2 and ω2→1) can be specified as well. As a third optional parameter, two alpha masks matching the two input images can be supplied; these are used in the rendering only and have no influence on the editing itself.

Flowlab is targeted at HD images on a single monitor. Therefore, both images are incorporated in one view and can be switched. Since editing operations are typically applied to many images (e.g. the frames of a video sequence), input speed is a key consideration; we thus opted for hotkeys for all commands. In our experimental prototype, we omitted menu bars and buttons to preserve a clean user interface and keep as much screen space as possible for the images.

The task of Flowlab differs from that of existing image editing tools such as Adobe Photoshop [6] in that not the images themselves are modified, but the relationship between the images. Normal image editing operations are not applicable, so the workflow had to be designed from the ground up.

Typically, a user loads two images and pre-computed image correspondence maps, either from the command line or from a larger video processing framework. After assessing the quality of the blended image, and then the quality of the one-way warped images, the user begins to map regions forward and backward.

In some cases, the correspondences within ω1→2 and ω2→1 are symmetric, but one edit operation only modifies one correspondence direction. To avoid the duplicate work of reselecting the region for the reverse direction, a "selection swap" command exists. When the user has mapped a source to a target region, the swap makes the targeted region the new source selection and inverts the projective transformation. The process is shown in Fig. 4.
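A sketch of what the swap amounts to, assuming the projective transformation is stored as a 3×3 homography; the control points and polygon below are hypothetical:

```python
import numpy as np
import cv2

# Hypothetical control points: bounding-box corners in the source image and
# their user-placed counterparts in the target image.
src_quad = np.float32([[10, 10], [90, 10], [90, 80], [10, 80]])
dst_quad = np.float32([[15, 12], [95, 8], [98, 85], [12, 78]])
poly = np.float32([[20, 20], [70, 25], [60, 70]])  # selected polygon vertices

pi = cv2.getPerspectiveTransform(src_quad, dst_quad)

# "Selection swap": map the polygon into the target image, make it the new
# source selection, and invert the projective transformation.
new_src_poly = cv2.perspectiveTransform(poly.reshape(-1, 1, 2), pi).reshape(-1, 2)
pi_inv = np.linalg.inv(pi)
pi_inv /= pi_inv[2, 2]  # keep the homography normalized
```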

In practice, the tool is mainly operated by novice users on a temporary basis for video production purposes. On average, 15 minutes of supervised training and about 30 minutes of practice are required to become productive with the tool. Despite the hotkey-based approach, the productivity bottleneck is still the accurate selection of fine details.

Figure 4: Region swapping for symmetrically corresponding regions. (left) A previously selected region is mapped onto the target image. (middle) Upon region switch, the target selection becomes the new source selection. (right) The first image is now the target for the editing operation.

3 Operations

Operations are performed on the correspondences between two images I1 and I2. These correspondences are stored in two maps, ω1→2 and ω2→1. An operation has a source and a target image; the effect of the operation is stored in the correspondence map ωs→t from source to target.

3.1 Selecting Polygons

Each operation starts with a polygon defining a set of enclosed pixels p. A projective transformation π can be specified, where π ∘ p is the transformed pixel set. The user selects a polygon by specifying a list of vertices in Is such that $\vec{x} \in p$ for all enclosed points $\vec{x} = (x, y)$.
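A possible rasterization of the enclosed pixel set p, sketched with matplotlib's point-in-polygon test (the function name is ours):

```python
import numpy as np
from matplotlib.path import Path

def pixels_in_polygon(vertices, height, width):
    """Boolean mask of the pixel set p enclosed by the polygon vertices.
    mask[y, x] is True exactly when (x, y) lies inside the polygon."""
    ys, xs = np.mgrid[0:height, 0:width]
    pts = np.column_stack([xs.ravel(), ys.ravel()])
    return Path(vertices).contains_points(pts).reshape(height, width)
```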

3.2 Filtering Flows

Within the selected polygon, filter operations can be applied. The median filter is useful for removing high-frequency noise from the correspondence field. We use a 5×5 kernel on ωs→t for all $\vec{x} \in p$.
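A sketch of this filtering step, assuming the correspondence map is stored as an H x W x 2 array and the polygon as a boolean mask (names are ours):

```python
import numpy as np
from scipy.ndimage import median_filter

def median_filter_flow(flow, mask, size=5):
    """Median-filter each flow channel with a size x size kernel,
    writing the result back only inside the selected polygon."""
    filtered = np.stack([median_filter(flow[..., c], size=size)
                         for c in range(2)], axis=-1)
    out = flow.copy()
    out[mask] = filtered[mask]
    return out
```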

3.3 Projective Transformation

Using four control points, the polygon p can be transformed to closely match the object shape in the target image. Using the vertices of the bounding box in Is and allowing the user to modify its counterpart in It, the projective transformation π is specified. The transformation

$$\forall \vec{x} \in p:\quad \omega_{s \to t}(\vec{x}) = \pi \circ \vec{x} - \vec{x} \qquad (1)$$

is applied to the pixel coordinates within the polygon: the displacement vectors resulting from the subtraction are written back into the correspondence map.
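Eq. (1) translates almost directly into code. A sketch, assuming the transformation is represented as a 3×3 homography computed from the four control-point pairs (names are ours):

```python
import numpy as np
import cv2

def apply_projective_edit(flow, mask, src_quad, dst_quad):
    """Eq. (1): for every pixel x in p, store the displacement pi∘x − x
    in the correspondence map ω_{s→t}."""
    pi = cv2.getPerspectiveTransform(np.float32(src_quad),
                                     np.float32(dst_quad))
    ys, xs = np.nonzero(mask)
    pts = np.float32(np.column_stack([xs, ys]))
    warped = cv2.perspectiveTransform(pts.reshape(-1, 1, 2), pi).reshape(-1, 2)
    out = flow.copy()
    out[ys, xs] = warped - pts
    return out
```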

3.4 Automatic Local Match

To achieve a refined local solution, we use π from Sec. 3.3 as the flow initialization. We define two subimages I′s and I′t which contain only the selected region and its transformed counterpart:

$$I'_s(\vec{x}) = \begin{cases} I_s(\vec{x}) & \vec{x} \in p \\ 0 & \text{else} \end{cases} \qquad (2)$$

Figure 5: GrabCut masks. (a) User selection of a foreground object. (b) Binary segmentation mask. (c) Dilated mask, whose outside is considered "definite background". (d) Eroded mask, whose inside is considered "definite foreground". (e) Selected object after GrabCut.

$$I'_t(\vec{x}) = \begin{cases} I_t(\pi \circ \vec{x}) & \vec{x} \in p \\ 0 & \text{else} \end{cases} \qquad (3)$$

Using those two subimages, the local correspondence refinement is calculated with Flowlib [25]:

$$\forall \vec{x} \in p:\quad \omega_{s \to t}(\vec{x}) = \mathrm{flowlib}(I'_s, I'_t)(\vec{x}) \qquad (4)$$

Ultimately, the selected region of ωs→t is overwritten by the newly computed correspondences.
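One plausible reading of Eqs. (2)-(4) in code: mask out both subimages, align the target through π, compute a residual flow between them (Farneback below is only a stand-in for Flowlib), and compose it with π inside the polygon. The names and the composition detail are our assumptions:

```python
import numpy as np
import cv2

def local_match(img_s, img_t, pi, mask, flow):
    """Refine ω_{s→t} inside the polygon, initialized by the homography pi."""
    h, w = mask.shape
    # I'_s: source image restricted to the polygon (Eq. 2)
    sub_s = np.where(mask, img_s, 0).astype(np.uint8)
    # I'_t: target image resampled at pi∘x, restricted to the polygon (Eq. 3)
    aligned_t = cv2.warpPerspective(
        img_t, pi, (w, h), flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
    sub_t = np.where(mask, aligned_t, 0).astype(np.uint8)
    # Residual flow between the pi-aligned subimages (stand-in for Flowlib)
    residual = cv2.calcOpticalFlowFarneback(sub_s, sub_t, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
    # Overwrite the selection: ω_{s→t}(x) = (pi∘x + residual(x)) − x  (Eq. 4)
    ys, xs = np.nonzero(mask)
    pts = np.float32(np.column_stack([xs, ys]))
    warped = cv2.perspectiveTransform(pts.reshape(-1, 1, 2), pi).reshape(-1, 2)
    out = flow.copy()
    out[ys, xs] = warped + residual[ys, xs] - pts
    return out
```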

3.5 Selection using GrabCut

To ease the selection of foreground objects, the user can apply an automatic segmentation: within the current selection, the GrabCut algorithm [20] is applied. Its input is specified as follows: the user selection is used to create an image mask m (Fig. 5(b)). We use morphological operations to create two additional masks, m+ (dilated) and m− (eroded) (Fig. 5(c), (d)). The region outside m+ is marked as "definite background", between m+ and m as "probably background", between m and m− as "probably foreground", and inside m− as "definite foreground".

We modify Eq. 4 to assign calculated correspondences only in the foreground region:

$$\forall \vec{x} \in \mathrm{foreground}(p):\quad \omega_{s \to t}(\vec{x}) = \mathrm{flowlib}(I'_s, I'_t)(\vec{x}) \qquad (5)$$

In effect, the user is allowed coarser selections without sacrificing accuracy. This further speeds up the workflow and decreases the effort needed for corrections.
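A sketch of the trimap construction and GrabCut call with OpenCV, following Fig. 5 (the kernel size and function names are our choices):

```python
import numpy as np
import cv2

def grabcut_foreground(img_bgr, sel_mask, ksize=9, iters=5):
    """Build the four-level trimap of Fig. 5 from the user's polygon mask m
    and run GrabCut; returns a boolean foreground mask."""
    kernel = np.ones((ksize, ksize), np.uint8)
    m = sel_mask.astype(np.uint8)
    m_plus = cv2.dilate(m, kernel)   # Fig. 5(c)
    m_minus = cv2.erode(m, kernel)   # Fig. 5(d)

    trimap = np.full(m.shape, cv2.GC_BGD, np.uint8)  # outside m+: definite bg
    trimap[m_plus == 1] = cv2.GC_PR_BGD              # between m+ and m
    trimap[m == 1] = cv2.GC_PR_FGD                   # between m and m-
    trimap[m_minus == 1] = cv2.GC_FGD                # inside m-: definite fg

    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(img_bgr, trimap, None, bgd, fgd, iters, cv2.GC_INIT_WITH_MASK)
    return (trimap == cv2.GC_FGD) | (trimap == cv2.GC_PR_FGD)
```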

4 Applications

Manual image correspondence correction is often needed to optimize the correspondences for a concrete target application. On the one hand, there are applications where correspondences are needed to produce visually convincing results, e.g. in special effects creation pipelines. On the other hand, manually corrected correspondences can also be used to rate the performance of automatic correspondence estimation algorithms. The development of Flowlab has been conducted primarily with the former in mind.

Currently available automatic algorithms can handle quite a lot of scenarios. In some cases, the performance can be enhanced even further by changing the hardware setup: increasing the sampling rate generally improves correspondence estimation performance, and dense multi-view camera setups in the spatial domain or high-speed cameras in the temporal domain can reduce the pixel-wise distances between images. For such scenarios, impressive results can be found in the optical flow community [1]. However, for more affordable and therefore typical setups with sparse camera placement or slower frame rates, larger pixel distances are very common. Here, automated results may initially not be of sufficient quality.

Considering image interpolation or image-based rendering techniques, the quality of correspondences can be rated by viewing the interpolation results. If the results are not visually convincing, Flowlab can be used to correct the erroneous regions within the underlying correspondences. This often makes a later correction pass on the output images much faster or even unnecessary. Furthermore, if the correspondences are used for multiple output frames, e.g. in a slow motion scenario, the effort for correction is reduced even more: only one correspondence pair has to be modified instead of each frame. Ultimately, it is up to the user to decide whether an artifact is best corrected in the correspondences or at a later stage.

While Flowlab was primarily designed with movie production in mind, there are other possible applications. For example, ground truth generation of correspondence maps is a non-trivial task due to the lack of tools. Pixel-exact ground truth can be generated with Flowlab by mapping each region manually, down to the pixel level.

5 Results

To evaluate the effectiveness of manual correspondence field correction, we examine multiple image pairs. For small movements (in general < 1 pixel), most optical flow algorithms produce very good results and make further corrections unnecessary. We therefore examine situations where editing is necessary because the classical optical flow assumptions are violated: large-distance correspondences, non-Lambertian surfaces, and low-textured regions. Unless stated otherwise, the initial correspondence fields were generated by Flowlib, a GPU implementation of the optical flow by Werlberger et al. [25].

It should be noted that the renderer used in Flowlab is not designed to give the best achievable rendering result, but to render in a way that exposes the deficiencies in the correspondence fields. When pixels are warped, noticeable streaking artifacts appear. Those artifacts could be avoided by discarding mesh triangles that stretch too much, but this would obscure the underlying errors in the correspondences.

Figure 6: Basketball sequence from the Middlebury data set. The upper left shows the original first image, the upper right the fully warped second image without correction, the lower left the fully warped second image with correction, and the lower right the fully warped second image with an alternative large displacement optical flow [4]. Although the ball is big enough to be captured by the pyramid scheme, its spherical shape is not preserved by either of the two algorithms.

Figure 7: Bean bags sequence from the Middlebury data set. The upper left shows the original first image, the upper right the fully warped second image without correction, the lower left the fully warped second image with correction, and the lower right the fully warped second image with an alternative large displacement optical flow. The ball is too small to be captured even by the large displacement optical flow, and needs manual correction.

Figure 8: Calibration scene from a soccer tricks short video recording. The upper left shows the original first image, the upper right the fully warped second image without correction, the lower left the fully warped second image with correction, and the lower right the fully warped second image with an alternative large displacement optical flow. The specular highlight on the checkerboard cannot be tracked very well, with noticeable distortions in both the fast and the large displacement optical flow.

Larger movements are in general not problematic as long as the size of the moving object is larger than the displacement. This scenario is usually addressed by the pyramid scheme of most optical flow algorithms, which recovers large-scale motion from downsampled versions of the original images. Smaller details, however, are smoothed over during downsampling and therefore lost.

Fig. 6 shows the Basketball scene from the Middlebury data set [1]. Although the basketball itself is captured, it is heavily distorted; a local user correction shows that it can be improved considerably. A direct comparison to the large displacement optical flow algorithm by Brox et al. [4] shows that although the ball remains largely intact, its spherical shape is not preserved.

For small moving objects over increasing distances, at some point even algorithms specifically designed for large displacements cannot provide good results, as shown in the Middlebury Bean Bags sequence depicted in Fig. 7. The motion of the two bean bags can be corrected well; streaking artifacts, which would be removed by triangle discarding in a production renderer, highlight the motion. However, occlusions and disocclusions (e.g. the fingers of the left hand catching the ball) remain open issues even with an editor, because this information is simply not encoded in 2D pixel correspondences.

Non-Lambertian cases violate the brightness constancy assumption and lead any optical flow algorithm based on that assumption astray. Fig. 8 shows the calibration scene for a short video featuring soccer tricks. The scene includes a specular checkerboard with prominent highlights. Neither the performance-oriented Flowlib nor the quality-oriented large displacement algorithm can solve this ill-posed problem, making manual intervention mandatory.

Figure 9: Highway sequence from the Heidelberg stereo data set. The upper left shows the original first image, the upper right the fully warped second image without correction, the lower left the fully warped second image with correction, and the lower right the fully warped second image with an alternative large displacement optical flow. The low-textured street cannot be tracked well at all, as can be seen in the non-advancing street. The alternative large displacement optical flow is a full failure case. In the corrected image, the lower part is black because the data does not exist in the source image.

Fig. 9, part of the gray-scale highway sequence from the Heidelberg data set [14], shows two low-textured region effects. First, the performance-oriented Flowlib algorithm errs on the side of too little motion and does not match, for example, the road stripes at all. The large displacement optical flow, with its relaxed regularizer, exhibits significant local distortions in all directions. In the manual correction, the street has been projectively mapped; due to the planar nature of the street, this works particularly well in this case. The black strip at the bottom border is caused by a recording disocclusion: this part of the street was simply not photographed in the source image.

It should be noted that while it may seem unfair to compare automatic estimation methods to manual corrections, the results do demonstrate typical failure cases where manual intervention is necessary. In the future, we expect the number of failure cases to diminish with improved algorithms, but they will remain for pathological cases, leaving the justification for a correspondence editing tool intact.

The per-frame effort for manual correction ranges from 10 seconds for simple objects to several minutes for fine structures in complex scenes.

6 Conclusion

We presented Flowlab, an interactive tool for editing dense image correspondences. In cases that are difficult for optical flow and correspondence estimation algorithms, the tool facilitates manual corrections to the resulting correspondence maps. This represents a change of editing paradigm: previously, manual correction had mostly been performed on the synthesized images, not on the relationship between images.

Our results show typical failure cases for current state-of-the-art algorithms: small objects over large distances, non-Lambertian objects, and low-textured regions. In all these cases, the correspondence maps are considerably improved by user editing.

Manually specified correspondence regions can be projectively transformed, or a local optical flow can be computed. This avoids bleeding across edges, even non-visible ones. Convenience tools like the GrabCut selection further ease the editing task at hand.

Use cases in short video production demonstrated the shallow learning curve of the tool: a new user takes less than an hour to become proficient, enabling rapid parallelization of editing tasks. Coupled with the considerable improvements in the resulting correspondence maps, Flowlab has become a standard editing tool in our production pipeline. In the future, we plan to release the Flowlab software to the general public in order to facilitate correspondence editing tasks for groups interested in this technology.

Acknowledgements

Funding by the European Research Council ERC under contract No. 256941 "Reality CG" and by the German Science Foundation DFG MA 2555/1-3 is gratefully acknowledged. We would like to thank the GPU4vision project [15] for making the Flowlib library [25] openly available.

References

[1] Simon Baker, Daniel Scharstein, J. P. Lewis, Stefan Roth, Michael J. Black, and Richard Szeliski. A database and evaluation methodology for optical flow. In IEEE International Conference on Computer Vision (ICCV), pages 1–8. IEEE Computer Society, 2007.

[2] Luca Ballan, Gabriel J. Brostow, Jens Puwein, and Marc Pollefeys. Unstructured video-based rendering: Interactive exploration of casually captured videos. ACM Trans. on Graphics (Proc. SIGGRAPH), 29(3):87, July 2010.

[3] Thaddeus Beier and Shawn Neely. Feature-based image metamorphosis. In Proceedings of the 19th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '92, pages 35–42, New York, NY, USA, 1992. ACM.

[4] T. Brox, C. Bregler, and J. Malik. Large displacement optical flow. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 41–48, 2009.

[5] Shenchang Eric Chen and Lance Williams. View interpolation for image synthesis. In Proc. of ACM SIGGRAPH '93, pages 279–288. ACM Press/ACM SIGGRAPH, 1993.

[6] Adobe Photoshop CS5. http://www.adobe.com/products/photoshop.html, 2010.

[7] Piotr Didyk, Tobias Ritschel, Elmar Eisemann, Karol Myszkowski, and Hans-Peter Seidel. Adaptive image-space stereo view synthesis. In Vision, Modeling and Visualization Workshop, pages 299–306, Siegen, Germany, 2010.

[8] Yasutaka Furukawa and Jean Ponce. Accurate, dense, and robust multi-view stereopsis. IEEE Trans. on Pattern Analysis and Machine Intelligence, 32(8):1362–1376, 2009.

[9] Marcel Germann, Alexander Hornung, Richard Keiser, Remo Ziegler, Stephan Würmlin, and Markus Gross. Articulated billboards for video-based rendering. Computer Graphics Forum (Proc. Eurographics), 29(2):585, 2010.

[10] M. Goesele, N. Snavely, B. Curless, H. Hoppe, and S. M. Seitz. Multi-view stereo for community photo collections. In IEEE International Conference on Computer Vision (ICCV), 2007.

[11] Felix Klose, Christian Lipski, and Marcus Magnor. Reconstructing shape and motion from asynchronous cameras. In Proc. Vision, Modeling and Visualization (VMV) 2010, pages 171–177, Siegen, Germany, 2010.

[12] Christian Lipski, Christian Linz, Kai Berger, Anita Sellent, and Marcus Magnor. Virtual video camera: Image-based viewpoint navigation through space and time. Computer Graphics Forum, 29(8):2555–2568, 2010.

[13] Christian Lipski, Christian Linz, Thomas Neumann, and Marcus Magnor. High resolution image correspondences for video post-production. In CVMP 2010, pages 33–39, London, 2010.

[14] Stephan Meister, Bernd Jähne, and Daniel Kondermann. An outdoor stereo camera system for the generation of real-world benchmark datasets with ground truth. Technical report, Universitätsbibliothek der Universität Heidelberg, 2011.

[15] University of Graz. GPU4vision project. http://www.gpu4vision.org/.

[16] J. P. Pons, R. Keriven, and O. Faugeras. Modelling dynamic scenes by registering multi-view image sequences. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, 2005.

[17] J. P. Pons, R. Keriven, and O. Faugeras. Multi-view stereo reconstruction and scene flow estimation with a global image-based matching score. International Journal of Computer Vision, 72(2):179–193, 2007.

[18] Evan Powell. Is frame interpolation important? Projector Central, Whitepaper, 2009.

[19] K. Rohr, H. Stiehl, R. Sprengel, W. Beil, T. Buzug, J. Weese, and M. Kuhn. Point-based elastic registration of medical image data using approximating thin-plate splines. In Visualization in Biomedical Computing, volume 1131 of Lecture Notes in Computer Science, pages 297–306. Springer Berlin/Heidelberg, 1996.

[20] C. Rother, V. Kolmogorov, and A. Blake. GrabCut: Interactive foreground extraction using iterated graph cuts. In ACM Transactions on Graphics (TOG), volume 23, pages 309–314. ACM, 2004.

[21] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages 519–528, 2006.

[22] S. M. Seitz and C. R. Dyer. View morphing. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pages 21–30. ACM, 1996.

[23] S. Vedula, S. Baker, P. Rander, R. Collins, and T. Kanade. Three-dimensional scene flow. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27:475–480, 2005.

[24] Sundar Vedula, Simon Baker, Robert Collins, Takeo Kanade, and Peter Rander. Three-dimensional scene flow. In IEEE International Conference on Computer Vision, volume 2, page 722, 1999.

[25] Manuel Werlberger, Werner Trobin, Thomas Pock, Andreas Wedel, Daniel Cremers, and Horst Bischof. Anisotropic Huber-L1 optical flow. In Proceedings of the British Machine Vision Conference (BMVC), London, UK, September 2009.

[26] Lucy Wilkes. The role of Ocula in stereo post production. The Foundry, Whitepaper, 2009.

[27] G. Wolberg. Image morphing: a survey. The Visual Computer, 14(8):360–372, 1998.

[28] L. Zhang, B. Curless, and S. Seitz. Spacetime stereo: Shape recovery for dynamic scenes. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 367–374, 2003.

[29] Y. Zhang and C. Kambhamettu. On 3D scene flow and structure estimation. In Proc. of CVPR '01, volume 2, pages 778–785. IEEE Computer Society, 2001.