

On Refractive Optical Flow

Sameer Agarwal, Satya P. Mallick, David Kriegman, and Serge Belongie

University of California, San Diego, La Jolla, CA 92093, USA, {sagarwal@cs, spmallick@graphics, kriegman@cs, sjb@cs}.ucsd.edu, http://vision.ucsd.edu/

T. Pajdla and J. Matas (Eds.): ECCV 2004, LNCS 3022, pp. 483–494, 2004. © Springer-Verlag Berlin Heidelberg 2004

Abstract. This paper presents a novel generalization of the optical flow equation to the case of refraction, and it describes a method for recovering the refractive structure of an object from a video sequence acquired as the background behind the refracting object moves. By structure here we mean a representation of how the object warps and attenuates (or amplifies) the light passing through it. We distinguish between the cases when the background motion is known and unknown. We show that when the motion is unknown, the refractive structure can only be estimated up to a six-parameter family of solutions without additional sources of information. Methods for solving for the refractive structure are described in both cases. The performance of the algorithm is demonstrated on real data, and results of applying the estimated refractive structure to the task of environment matting and compositing are presented.

1 Introduction

The human visual system is remarkable in its ability to look at a scene through a transparent refracting object and to deduce the structural properties of that object. For example, when cleaning a wine glass, imperfections or moisture may not be visible at first, but they become apparent when one holds the glass up and moves or rotates it. We believe that the primary cue here is the optical flow of the background image as observed through the refracting object, and our aims in this paper are to build a theory of how motion can be used for recovering the structure of a refracting object, to introduce algorithms for estimating this structure, and to empirically validate these algorithms. By structure here we mean a representation of how the object warps and attenuates (or amplifies) the light passing through it. Recall that as a light ray enters or exits the object, its direction changes according to Snell's Law (known as Descartes' Law in France). Furthermore, the emitted radiance may differ from the incident radiance due to the difference in solid angle caused by the geometry of the interfaces between the object and the air, as well as absorption along the light ray's path through the object. The geometric shape of the object itself, while of independent interest, is not the focus of our inquiry here.

The primary contribution of this paper is to generalize the optical flow equation to account for the warping and attenuation caused by a refractive object, and to present algorithms for solving for the warp and attenuation using a sequence of images obtained as a planar background moves behind the refracting object.


Both the case where the background motion is known and the case where it is unknown are considered. We demonstrate the performance of these algorithms on both synthetic and real data.

While there is a vast literature on motion understanding, including transparency, the warp induced by refraction appears to have been neglected hitherto [1,2,3,4]. The ability to recover the refractive structure of an object has a number of applications. Recently, environment matting and compositing have been introduced as techniques for rendering images of scenes that contain foreground objects that refract and reflect light [5,6,7]. The proposed approach offers a new method for creating a matte of real objects without the need for extensive apparatus besides a video camera. Refractive optical flow may also be useful for visual inspection of transparent or translucent objects.

This paper was inspired by the work of Zongker et al. [7] on environment matting and its subsequent extension by Chuang et al. [5]. We discuss this method in the next section. However, the work that comes closest to ours in spirit is that of Murase [8], who uses optical flow to recover the shape of a flexible transparent object, e.g., water waves that change over time. To make the problem tractable he makes a number of simplifying assumptions: (a) the camera is orthographic; (b) the refracting object is in contact with the background plane, which is flat and carries a static unknown pattern; (c) the average shape of the refracting surface over time is known a priori to have zero slope. In this paper our interest is in scenes where the refracting object is rigid and stationary. Beyond that, we make no assumptions about the number of objects, their refractive indices, or the positioning of the refracting object in the scene. We do not address effects due to specular, Fresnel, or total internal reflections in this work.

The rest of the paper is organized as follows. In the next section we begin by describing our image formation model; we then introduce the notion of an optical kernel and describe how the choice of a particular optical kernel leads to a new generalization of the optical flow equation. Section 3 describes algorithms for solving for the refractive structure using this equation. Section 4 demonstrates the performance of our algorithm on synthetic and real data, and its application to matting. We conclude in Section 5 with a discussion and directions for future work.

2 A Theory of Refractive Optical Flow

In this section we describe our image formation model and use it to derive the refractive optical flow equation. As illustrated in Figure 1(a), we assume that the scene consists of three entities.

1. A background image plane B, where B(u, t) denotes the emitted radiance at the point $u = (u, v)^\top$ and time t.

2. A foreground image plane I, where I(x, t) denotes the incident radiance at the point $x = (x, y)^\top$ and time t.

3. A collection of one or more refracting objects between I and B.


Fig. 1. The image formation model: (a) the general image formation model, where the incident irradiance at a point x in the foreground plane I is a result of the emitted radiance of some number of points in the background plane B; (b) the single-ray image formation model, where the incident irradiance at x is a linear function of the emitted radiance at the point T(x) in the background plane.

The illumination in the scene is assumed to be ambient or coming from a directional light source, and the background plane is assumed to be an isotropic radiator. The background and the foreground planes are not constrained to be parallel. No assumptions are made about the shape of the refracting objects, or the optical properties of their constituent materials. We treat the refracting objects as a "black-box" function that warps and attenuates the light rays passing through it. It is our aim to recover this function from a small set of images of the scene. Given the assumptions about scene illumination stated earlier, the incident radiance at a point x in the foreground is a result of the reflected and emitted radiance from the background plane. Now let the function that indicates what fraction of the light intensity observed at a point x in the foreground comes from the point u in the background be denoted by the optical kernel K(u, x). The total light intensity at x can now be expressed as an integral over the entire background plane,

$$I(x, t) = \int K(u, x)\, B(u, t)\, du \tag{1}$$

The problem of recovering the refractive structure of an object can now be restated as the problem of estimating the optical kernel associated with it.
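
To make Equation (1) concrete: on a discrete pixel grid the integral becomes a sum, and the optical kernel becomes a (typically very sparse) matrix that maps the flattened background image to the flattened foreground image. The sketch below is ours, not from the paper, and all names are illustrative.

```python
import numpy as np

def render_through_kernel(K, B):
    """Discretized Eq. (1): I(x) = sum over u of K(u, x) * B(u).

    K : (n_fg, n_bg) array; row x holds the kernel weights K(u, x)
        over all flattened background pixels u.
    B : (n_bg,) flattened background image at one instant t.
    Returns the flattened foreground image I.
    """
    return K @ B

# Tiny example: 2 foreground pixels, 4 background pixels.
K = np.array([[0.5, 0.5, 0.0, 0.0],   # pixel 0 averages bg pixels 0 and 1
              [0.0, 0.0, 1.0, 0.0]])  # pixel 1 sees only bg pixel 2
B = np.array([1.0, 3.0, 2.0, 5.0])
print(render_through_kernel(K, B))    # [2. 2.]
```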

The set of all functions of the form K(u, x) is huge. A large part of this set consists of functions that violate the laws of physics. However, the set of physically plausible optical kernels is still very large, and the reconstruction of K using a small number of images is an ill-posed problem. Additional constraints must be used to make the problem tractable.

One such direction of inquiry is to assume a low-dimensional form for K described by a small set of parameters. Zongker et al., in their work on environment matting, assume a parametric box form for K(u, x),

$$K(u, x) = \begin{cases} 1/\mu(x) & \text{if } a(x) \le u \le b(x) \text{ and } c(x) \le v \le d(x) \\ 0 & \text{otherwise} \end{cases} \tag{2}$$


where a(x), b(x), c(x), d(x) are functions of x, and µ(x) is the area of the rectangle enclosed by (a, c) and (b, d). The kernel maps the average intensity of a rectangular region in the background to a point in the foreground. The values of the parameters {a(x), b(x), c(x), d(x)} for each point x are determined using a set of calibrated background patterns and a combinatorial search that minimizes the reconstruction error of the foreground image. Chuang et al. [5] generalize K to the class of oriented two-dimensional Gaussians. In both of these cases, knowledge of the background image is assumed.
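
As a point of reference, here is a minimal sketch (ours) of the box kernel of Equation (2), evaluated over a background grid for a single foreground pixel; the per-pixel parameters a, b, c, d are assumed given.

```python
import numpy as np

def box_kernel(u, v, a, b, c, d):
    """Eq. (2): uniform weight 1/mu inside the rectangle [a,b] x [c,d],
    zero outside; mu is the rectangle's area."""
    mu = (b - a) * (d - c)
    inside = (a <= u) & (u <= b) & (c <= v) & (v <= d)
    return np.where(inside, 1.0 / mu, 0.0)

# Kernel weights over a 4x4 background grid for one foreground pixel
# whose rectangle is [0, 2] x [0, 1] (area mu = 2).
u, v = np.meshgrid(np.arange(4.0), np.arange(4.0), indexing="ij")
print(box_kernel(u, v, a=0.0, b=2.0, c=0.0, d=1.0))
```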

In this paper we choose to pursue an alternate direction. We consider optical kernels of the form

$$K(u, x) = \alpha(x)\, \delta(u - T(x)) \tag{3}$$

where δ(·) is Dirac's delta function and T(x) is a piecewise differentiable function that serves as the parameter for the kernel, indicating the position in the background plane where the kernel is placed when calculating the brightness at x in the foreground image plane. The function α(x) is a positive scalar function that accounts for the attenuation of light reaching x. Figure 1(b) illustrates the setup.

In the following we will show that if we restrict ourselves to this class of optical kernels, we can recover the refractive structure without any knowledge of the background plane. We will also demonstrate with experiments that this subclass of kernels, despite having a very simple description, is capable of capturing refraction through a variety of objects.

The image formation equation can now be rewritten as

$$I(x, t) = \int \alpha(x)\, \delta(u - T(x))\, B(u, t)\, du \tag{4}$$

$$= \alpha(x)\, B(T(x), t) \tag{5}$$
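
Equation (5) says each foreground pixel samples a single warped background location and scales it. Below is a minimal sketch of this single-ray model for discrete images, using bilinear interpolation; the function and variable names are ours, not from the paper.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def single_ray_image(B, T_u, T_v, alpha):
    """Eq. (5): I(x) = alpha(x) * B(T(x)) for a discrete background image.

    B        : background image at one instant, indexed B[u, v].
    T_u, T_v : arrays of the foreground shape giving, per pixel x,
               the background coordinates T(x) = (T_u[x], T_v[x]).
    alpha    : attenuation map of the same foreground shape.
    """
    # Bilinear sampling of B at the warped, generally non-integer, positions.
    warped = map_coordinates(B, [T_u, T_v], order=1, mode="nearest")
    return alpha * warped
```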

We begin by differentiating the above equation with respect to x, to get

$$\nabla_x I(x, t) = (\nabla_x \alpha(x))\, B(T(x), t) + \alpha(x)\, J^\top(T(x))\, \big(\nabla_{T(x)} B(T(x), t)\big) \tag{6}$$

Here, $J^\top(T(x))$ is the transpose of the Jacobian of the transformation T(x). Using Equation (5), the above equation can be written as

$$\nabla_x I(x, t) = I(x, t)\, \frac{\nabla_x \alpha(x)}{\alpha(x)} + J^\top(T(x)) \big[\alpha(x)\, \nabla_{T(x)} B(T(x), t)\big]$$

$$J^{-\top}(T(x)) \left[\nabla_x I(x, t) - I(x, t)\, \frac{\nabla_x \alpha(x)}{\alpha(x)}\right] = \alpha(x)\, \nabla_{T(x)} B(T(x), t) \tag{7}$$

Taking temporal derivatives of Equation (5) gives

$$I_t(x, t) = \alpha(x)\, B_t(T(x), t) \tag{8}$$

Now, let c(u, t) denote the velocity of the point u at time t in the background plane. Then from the brightness constancy equation [9] we know

$$c(u, t)^\top \nabla_u B(u, t) + B_t(u, t) = 0 \tag{9}$$


Now, assuming that the background image undergoes in-plane translation with velocity c(u, t) = c(T(x), t), we take the dot product of Equation (7) with c(T(x), t) and add it to Equation (8) to get

$$c(T(x), t)^\top J^{-\top}(T(x)) \left[\nabla_x I(x, t) - I(x, t)\, \frac{\nabla_x \alpha(x)}{\alpha(x)}\right] + I_t(x, t) = \alpha(x) \left[c(T(x), t)^\top \nabla_{T(x)} B(T(x), t) + B_t(T(x), t)\right] \tag{10}$$

From Equation (9) we know that the right hand side vanishes everywhere, giving us

$$c(T(x), t)^\top J^{-\top}(T(x)) \left[\nabla_x I(x, t) - I(x, t)\, \frac{\nabla_x \alpha(x)}{\alpha(x)}\right] + I_t(x, t) = 0 \tag{11}$$

Now, for simplification's sake, let β(x) = log α(x). Dropping the subscript on ∇ we get

$$c(T(x), t)^\top J^{-\top}(T(x)) \left[\nabla I(x, t) - I(x, t)\, \nabla \beta(x)\right] + I_t(x, t) = 0 \tag{12}$$

This is the refractive optical flow equation.
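
A quick numerical sanity check of Equation (12), constructed by us and not part of the paper: for a synthetic affine warp, a smooth attenuation map, and a background pattern translating at a known constant velocity (so brightness constancy, Equation (9), holds exactly), the residual of Equation (12) evaluated with finite differences should vanish.

```python
import numpy as np

# Synthetic ground truth: affine warp T(x) = M x + t0 (so its Jacobian
# is the constant matrix M), smooth attenuation alpha, and a background
# pattern translating with constant velocity c.
M = np.array([[1.2, 0.1], [-0.05, 0.9]])
t0 = np.array([0.5, -0.2])
c = np.array([0.3, -0.2])

def f(u):                      # background pattern
    return np.sin(u[0]) * np.cos(u[1])

def B(u, t):                   # translating background: B(u, t) = f(u - c t)
    return f(u - c * t)

def alpha(x):
    return np.exp(-0.1 * (x[0] ** 2 + x[1] ** 2))

def I(x, t):                   # Eq. (5): single-ray image formation
    return alpha(x) * B(M @ x + t0, t)

# Central finite-difference derivatives of I at a test point.
x, t, h = np.array([0.4, -0.3]), 0.0, 1e-5
Ix = (I(x + [h, 0], t) - I(x - [h, 0], t)) / (2 * h)
Iy = (I(x + [0, h], t) - I(x - [0, h], t)) / (2 * h)
It = (I(x, t + h) - I(x, t - h)) / (2 * h)

grad_beta = -0.2 * x           # gradient of beta = log alpha = -0.1 |x|^2
J_inv_T = np.linalg.inv(M).T   # J^{-T} for the affine warp

# Residual of the refractive optical flow equation, Eq. (12).
residual = c @ J_inv_T @ (np.array([Ix, Iy]) - I(x, t) * grad_beta) + It
print(residual)                # numerically ~0, up to finite-difference error
```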

2.1 Properties of the Refractive Optical Flow Equation

Before we dive into the solution of Equation (12) for recovering the refractive structure, we comment on its form and some of its properties. The first observation is that if there is no distortion or attenuation, i.e.,

$$T(x) = x, \qquad \alpha(x) = 1,$$

the Jacobian of T reduces to the identity and the gradient of β(x) vanishes everywhere, giving us the familiar optical flow equation in the foreground image:

$$c(x, t)^\top \nabla I(x, t) + I_t(x, t) = 0 \tag{13}$$

The second observation is that the equation is independent of B(u, t). This means that we can solve for the refractive flow through an object just by observing a distorted version of the background image. Knowledge of the background image itself is not needed.

Finally, we observe that Equation (12) is in terms of the Jacobian of T, i.e., any function $T'(x) = T(x) + u_0$ will result in the same equation. This implies that T can only be solved for up to a translation ambiguity. Visually, this is equivalent to viewing the scene through a periscope: the visual system has no way of discerning whether or not an image was taken through a periscope. A second ambiguity is introduced into the solution when we note that the velocity c(u, t) is in the background plane, and nothing constrains the two coordinate systems from having different scales along each axis. Hence T(x) can only be recovered up to a four-parameter family of ambiguities corresponding to translation and scaling. The attenuation function β(x) is not affected by the scaling in c(u, t) and hence is recovered up to a translation factor, which in turn means that α(x) is recovered up to a scale factor.


3 Solving the Equation of Refractive Flow

In this section, we further analyze Equation (12) for the purposes of solving it. We begin by considering a further simplification: we assume that the background plane translates with in-plane velocity c(u, t) = c(t) that is constant over the entire background plane. We consider two cases, the calibrated case (when the motion of the background c(t) is known) and the uncalibrated case (when the motion of the background is unknown). In each case we describe methods for recovering T(x) and α(x), and note the ambiguities in the resulting solution. Let T(x) be denoted by

$$T(x, y) = (g(x, y), h(x, y))^\top \tag{14}$$

The Jacobian of T and its inverse transpose are

$$J(T(x)) = \begin{bmatrix} g_x & g_y \\ h_x & h_y \end{bmatrix}, \qquad J^{-\top}(T(x)) = \frac{1}{g_x h_y - g_y h_x} \begin{bmatrix} h_y & -h_x \\ -g_y & g_x \end{bmatrix}$$

The translation velocity in the background plane is $c(t) = (\xi(t), \eta(t))^\top$. Substituting these in Equation (12) we get

$$\begin{bmatrix} \xi & \eta \end{bmatrix} \frac{1}{g_x h_y - h_x g_y} \begin{bmatrix} h_y & -h_x \\ -g_y & g_x \end{bmatrix} \begin{bmatrix} I_x - \beta_x I \\ I_y - \beta_y I \end{bmatrix} + I_t = 0 \tag{15}$$

which rearranges to

$$\frac{\eta I_y g_x - \eta I_x g_y - \xi I_y h_x + \xi I_x h_y + \eta I (g_y \beta_x - g_x \beta_y) - \xi I (h_y \beta_x - h_x \beta_y)}{g_x h_y - g_y h_x} + I_t = 0 \tag{16}$$

Now let

$$p = \frac{g_x}{g_x h_y - g_y h_x}, \qquad q = \frac{g_y}{g_x h_y - g_y h_x}, \qquad a = \frac{g_y \beta_x - g_x \beta_y}{g_x h_y - g_y h_x} \tag{17}$$

$$r = \frac{h_x}{g_x h_y - g_y h_x}, \qquad s = \frac{h_y}{g_x h_y - g_y h_x}, \qquad b = \frac{h_y \beta_x - h_x \beta_y}{g_x h_y - g_y h_x} \tag{18}$$

we get

$$\eta I_y p - \eta I_x q - \xi I_y r + \xi I_x s + \eta I a - \xi I b + I_t = 0 \tag{19}$$

3.1 The Calibrated Case

In the case where the velocity of the background plane c(u, t) is known, Equation (19) is linear in p, q, r, s, a, b. Given 7 or more successive frames of a video, or 14 or more pairs of successive frames, and the associated motion of the background plane, we can solve the equation point-wise over the entire image. Given n + 1 successive frames, we get n equations in 6 variables at each point in the foreground plane, giving us an over-constrained linear system of the form

$$A\mathbf{p} = m \tag{20}$$

where the i-th row of the matrix A is given by

$$A_i = \begin{bmatrix} \eta(i) I_y(i) & -\eta(i) I_x(i) & -\xi(i) I_y(i) & \xi(i) I_x(i) & \eta(i) I(i) & -\xi(i) I(i) \end{bmatrix}$$

$$m = \begin{bmatrix} -I_t(1), \ldots, -I_t(n) \end{bmatrix}^\top, \qquad \mathbf{p} = \begin{bmatrix} p & q & r & s & a & b \end{bmatrix}^\top \tag{21}$$

The above system of equations is solved simply by the method of linear least squares. The corresponding values of $g_x, g_y, h_x, h_y, \beta_x, \beta_y$ can then be obtained as follows.

$$g_x = \frac{p}{ps - qr}, \qquad g_y = \frac{q}{ps - qr}, \qquad \beta_x = \frac{bp - ar}{ps - qr} \tag{22}$$

$$h_x = \frac{r}{ps - qr}, \qquad h_y = \frac{s}{ps - qr}, \qquad \beta_y = \frac{bq - as}{ps - qr} \tag{23}$$
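
A minimal per-pixel sketch of the calibrated solver (our own illustration; the robustness thresholding described in Section 4 is omitted): stack Equation (19) over frames into the system of Equations (20)-(21), solve by linear least squares, and invert the substitution using Equations (22)-(23).

```python
import numpy as np

def solve_calibrated_pixel(Ix, Iy, I, It, xi, eta):
    """Point-wise calibrated solve of Eqs. (19)-(21).

    Ix, Iy, I, It : length-n arrays of image derivatives at one pixel,
                    one entry per frame pair.
    xi, eta       : length-n arrays of the known background velocity.
    Returns (gx, gy, hx, hy, beta_x, beta_y) via Eqs. (22)-(23).
    """
    # Rows of A as in Eq. (21); right hand side m = -It.
    A = np.stack([eta * Iy, -eta * Ix, -xi * Iy, xi * Ix, eta * I, -xi * I],
                 axis=1)
    m = -It
    p, q, r, s, a, b = np.linalg.lstsq(A, m, rcond=None)[0]

    # Invert the substitution, Eqs. (22)-(23).
    D = p * s - q * r
    gx, gy, beta_x = p / D, q / D, (b * p - a * r) / D
    hx, hy, beta_y = r / D, s / D, (b * q - a * s) / D
    return gx, gy, hx, hy, beta_x, beta_y
```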

3.2 The Uncalibrated Case

If the motion of the background image is not known, Equation (19) is bilinear in the variables p, q, r, s, a, b and ξ(t), η(t). If we consider frames i = 1, . . . , n + 1 and pixels j = 1, . . . , m in the foreground image, then we can rewrite Equation (19) in the following form

$$c_i^\top A_{ij}\, p_j = 1, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, m \tag{24}$$

where

$$c_i = \begin{bmatrix} \xi(i) & \eta(i) \end{bmatrix}^\top, \qquad p_j = \begin{bmatrix} p(j) & q(j) & r(j) & s(j) & a(j) & b(j) \end{bmatrix}^\top$$

$$A_{ij} = \frac{-1}{I_t(i, j)} \begin{bmatrix} 0 & 0 & -I_y(i, j) & I_x(i, j) & 0 & -I(i, j) \\ I_y(i, j) & -I_x(i, j) & 0 & 0 & I(i, j) & 0 \end{bmatrix}$$

This gives us nm equations in 2n + 6m variables, and we can solve them whenever nm > 2n + 6m.

Equation (24) is a system of bilinear equations, i.e., given the ci the system reduces to a linear system in the pj, and vice versa. The overall problem, however, is highly non-linear. Using this observation, a simple alternating procedure can be used to solve the system: starting with a random initialization, it alternates between updating the ci and the pj using linear least squares. Even with this fast iterative procedure, solving for the structure over the entire image in this manner is not computationally feasible. Hence we use a small slice through the image stack to solve for the ci, which are then used as input to the calibrated algorithm described in the previous section to recover the flow over the entire foreground plane.
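
A sketch of the alternating scheme (ours; convergence checks and the slice-based speedup are left out): each half-step is an ordinary linear least-squares problem.

```python
import numpy as np

def alternate_uncalibrated(A, n_iters=50, seed=0):
    """Alternating least squares for the bilinear system of Eq. (24),
    c_i^T A_ij p_j = 1, over frames i = 1..n and pixels j = 1..m.

    A : array of shape (n, m, 2, 6) holding the matrices A_ij.
    Returns (c, p), determined only up to the ambiguity of Eq. (25).
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape[:2]
    p = rng.standard_normal((m, 6))   # random initialization
    c = np.empty((n, 2))
    for _ in range(n_iters):
        # Fix the p_j and update each c_i: pixel j contributes the row
        # (A_ij p_j)^T, so c_i solves an m x 2 least-squares problem.
        for i in range(n):
            rows = np.einsum("jkl,jl->jk", A[i], p)           # (m, 2)
            c[i] = np.linalg.lstsq(rows, np.ones(m), rcond=None)[0]
        # Fix the c_i and update each p_j: frame i contributes the row
        # c_i^T A_ij, so p_j solves an n x 6 least-squares problem.
        for j in range(m):
            rows = np.einsum("ik,ikl->il", c, A[:, j])        # (n, 6)
            p[j] = np.linalg.lstsq(rows, np.ones(n), rcond=None)[0]
    return c, p
```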


Properties of the solution. Observe that $Q A_{ij} R = A_{ij}\ \forall i, j$, where

$$Q = \begin{bmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{bmatrix}, \qquad R = \frac{1}{q_{11} q_{22} - q_{12} q_{21}} \begin{bmatrix} q_{11} & 0 & q_{21} & 0 & 0 & 0 \\ 0 & q_{11} & 0 & q_{21} & 0 & 0 \\ q_{12} & 0 & q_{22} & 0 & 0 & 0 \\ 0 & q_{12} & 0 & q_{22} & 0 & 0 \\ 0 & 0 & 0 & 0 & q_{11} & q_{21} \\ 0 & 0 & 0 & 0 & q_{12} & q_{22} \end{bmatrix} \tag{25}$$

Hence, not knowing c(u, t) gives rise to a solution which is ambiguous up to a 2 × 2 invertible transformation, and a corresponding ambiguity in the values of the various gradient estimates. Coupled with this is the translation ambiguity in g and h. This gives rise to a six-parameter family of ambiguities in the overall solution.

3.3 Integration

The methods described in the previous two sections give us estimates of the partial derivatives of g(x, y), h(x, y), and β(x, y). The final step is integrating the respective gradient fields to recover the actual values of the functions. Reconstruction of a function from its first partial derivatives is a widely studied problem. We use an iterative least squares optimization that minimizes the reconstruction error [10].
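
The paper defers to [10] for this step. One common iterative least-squares formulation solves the Poisson equation that arises from the normal equations of the reconstruction error; the sketch below is our own simple Jacobi-iteration variant under that assumption, not necessarily the exact scheme the authors used.

```python
import numpy as np

def integrate_gradients(fx, fy, n_iters=2000):
    """Recover f (up to an additive constant) from estimates of its
    partial derivatives by least squares, i.e. by iteratively solving
    the Poisson equation  lap(f) = d(fx)/dx + d(fy)/dy.

    fx, fy : (H, W) arrays of estimated partial derivatives, with
             axis 0 playing the role of x and axis 1 of y.
    """
    # Divergence of the gradient field via backward differences.
    div = np.zeros_like(fx)
    div[1:, :] += fx[1:, :] - fx[:-1, :]
    div[:, 1:] += fy[:, 1:] - fy[:, :-1]

    f = np.zeros_like(fx)
    for _ in range(n_iters):
        # Jacobi update: each value moves toward the average of its
        # four neighbors minus the local divergence.
        padded = np.pad(f, 1, mode="edge")
        neighbors = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                     padded[1:-1, :-2] + padded[1:-1, 2:])
        f = (neighbors - div) / 4.0
    return f - f.mean()   # fix the free additive constant
```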

4 Experiments

All experiments were done with video sequences of 200 frames each. The synthetic data were generated using a combination of MATLAB and POV-Ray, a public domain ray tracer. The real data was captured by placing a refracting object in front of an LCD screen and imaging the setup using a firewire camera. Figure 2 illustrates the data acquisition setup.

Fig. 2. (a) A photograph of the data acquisition system used in our experiments. It consists of a Samsung 19" LCD display screen on the left, a Sony DFW-VL500 firewire camera on the right, and a refracting object between the screen and the camera. (b) A frame from the background image sequence used in all our experiments.


Fig. 3. Estimation of background motion in the uncalibrated case. This figure plots the true and estimated motion in the background plane; the two curves show ξ(t) and η(t), the motion along the u and v axes respectively. As can be seen, there is virtually no difference between the true and estimated values of ξ(t) and η(t).

Calculation of image derivatives is very sensitive to noise. Smoothing was performed on the images using anisotropic diffusion [11], which has superior behavior along sharp edges as compared to Gaussian filtering. This is important for objects that cause light rays to be inverted, which in turn causes the optical flow across the boundary to be of opposite sign; a naive Gaussian-based smoothing procedure will result in significant loss of signal. The least squares estimation step in the calibrated estimation algorithm was made robust by only considering equations for which the temporal gradient term $I_t$ was within 85% of the maximum temporal gradient at that pixel over time. This choice results in only those constraints being active where some optical flow can be observed.

The boundaries of refracting objects typically have little or no visible optical flow. This causes the refractive optical flow constraint to break down along the boundary, as well as at certain medium interfaces. We mask these pixels out by considering the average absolute temporal gradient at each point in the foreground image plane and ignoring those pixels that fall below a chosen threshold. This results in a decomposition of the image into a number of connected components. All subsequent calculations are carried out separately for each connected component, as sketched below.
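
A minimal sketch of this masking step (ours; the threshold value and array layout are illustrative), using connected-component labeling from SciPy:

```python
import numpy as np
from scipy import ndimage

def mask_and_label(It_stack, threshold):
    """Mask out pixels with little observable flow and split the rest
    into connected components, as described above.

    It_stack  : (n_frames, H, W) stack of temporal derivatives.
    threshold : minimum average absolute temporal gradient.
    Returns (mask, labels, n_components).
    """
    mean_abs_It = np.abs(It_stack).mean(axis=0)
    mask = mean_abs_It >= threshold
    labels, n_components = ndimage.label(mask)
    # Refractive structure is then estimated separately per component:
    # for k in range(1, n_components + 1): pixels = (labels == k), ...
    return mask, labels, n_components
```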

4.1 Results

We begin by considering a synthetic example to illustrate the performance of our algorithm in the calibrated and the uncalibrated case. The warping function used was $T(x, y) = (x e^{-(x^2+y^2)}, y e^{-(x^2+y^2)})^\top$ and the attenuation factor was $\alpha(x, y) = e^{-(x^2+y^2)}$. Figure 3 shows a comparison between the estimated and the true motion in the uncalibrated case. Figure 4 illustrates the results of the experiment. The estimated warp and attenuation functions are virtually identical to the true warp and attenuation functions. In the uncalibrated case, the estimated motion was disambiguated by doing a least squares fit to the ground truth for the purposes of illustration.


Figure 5 illustrates the result of applying the refractive structure of a glass, with and without water inside it, to the task of environment matting. Note that the region along the rim of the glass is missing; this is more so in the case of the glass filled with water. These are the regions where the refractive optical flow equation breaks down. The black band in the case of the filled glass is due to a combination of the breakdown of the refractive optical flow equation along the air/water/glass interface and the finite vertical extent of the image. More results can be seen at http://vision.ucsd.edu/papers/rof/.

Fig. 4. This figure compares the performance of refractive structure estimation in the calibrated and the uncalibrated case. (a) and (d) show the true attenuation factor α(x) and the true warp T(x) applied to a checkerboard pattern; (b) and (e) show the estimated attenuation and warp for the calibrated case; (c) and (f) show the estimated attenuation and warp for the uncalibrated case.

5 Discussion

We have introduced a generalization of the optical flow constraint, described methods for solving for the refractive structure of objects in the scene, and shown that this structure can be readily computed from images. We now comment on the limitations of our work and directions for future work.

First, our method does not address Fresnel and total internal reflection. This limits our analysis to the case where all the illumination comes from behind the object being observed. Methods for recovering surface geometry from specular reflections are an active area of research and are better suited for this task [12,13,14].


Fig. 5. Results of using the refractive structure for environment matting. Without water: (a) and (c) show the true warping of a background image when an empty glass is placed in front of it; (b) and (d) show the estimated refractive structure applied to the same images. With water: (e) and (g) show the true warping of a background image when a glass filled with water is placed in front of it; (f) and (h) show the estimated refractive structure applied to the same images.

Second, the presented approach, like [5,7], formulates the objective as determining a plane-to-plane mapping, i.e., it only informs us about how the object distorts a plane in 3-space. A more satisfactory solution would be to recover how the object distorts arbitrary light rays. We believe this is solvable using two or more views of the background plane, and it is the subject of our ongoing work.

The choice of the optical kernel that resulted in the single-ray model was made for reasons of tractability. This, however, leaves open the possibility of other optical kernels that account for multiple-ray models.

Our experiments were carried out using a 19 inch LCD screen as the background plane. For most refracting objects, as their surface curvature increases, the area of the background plane that projects onto a small region in the foreground image can increase rapidly. If one continues to use a flat background, one would ideally require an infinite background plane to be able to capture all the optical flow. Obviously that is not practical. An alternate approach is to use a curved surface as the background, perhaps a mirror which reflects an image projected onto it.

The current work only considers gray scale images; the extension to color or multi-spectral images is straightforward. There are two cases here: if the distortion T is assumed to be the same across spectral bands, the generalization can be obtained by modeling α(x) as a vector-valued function that accounts for attenuation in each spectral band. In case T is dependent on the wavelength of light, each spectral band results in an independent version of Equation (12) and can be solved using the methods described.


Finally, the refractive optical flow equation is a very general equation describing optical flow through a distortion function. This allows us to address distortion not only due to transmission through transparent objects, but also due to reflection from non-planar mirrored and specular surfaces. We believe that the problem of specular surface geometry can be addressed using this formalism, and this is also a subject of our future work.

Acknowledgements. The authors would like to thank Josh Wills and Kristin Branson for helpful discussions. This work was partially supported under the auspices of the U.S. DOE by the LLNL under contract No. W-7405-ENG-48 to S.B. and National Science Foundation grant IIS-0308185 to D.K.

References

1. Irani, M., Rousso, B., Peleg, S.: Computing occluding and transparent motion. International Journal of Computer Vision 12 (1994) 5–16

2. Ju, S.X., Black, M.J., Jepson, A.D.: Skin and bones: Multilayer, locally affine, optical flow and regularization with transparency. In: CVPR 1996. (1996) 307–314

3. Levin, A., Zomet, A., Weiss, Y.: Learning to perceive transparency from the statistics of natural scenes. In: Neural Information Processing Systems. (2002)

4. Schechner, Y., Kiryati, N., Shamir, J.: Blind recovery of transparent and semireflected scenes. In: CVPR 2000. Volume 1., IEEE Computer Society (2000) 38–43

5. Chuang, Y.Y., Zongker, D.E., Hindorff, J., Curless, B., Salesin, D.H., Szeliski, R.: Environment matting extensions: Towards higher accuracy and real-time capture. In: Proc. of ACM SIGGRAPH 2000, ACM Press (2000) 121–130

6. Wexler, Y., Fitzgibbon, A.W., Zisserman, A.: Image-based environment matting. In: Proc. of the 13th Eurographics Workshop on Rendering. (2002)

7. Zongker, D.E., Werner, D.M., Curless, B., Salesin, D.H.: Environment matting and compositing. In: Proc. of ACM SIGGRAPH 99, ACM Press (1999) 205–214

8. Murase, H.: Surface shape reconstruction of a nonrigid transparent object using refraction and motion. IEEE Trans. on Pattern Analysis and Machine Intelligence 14 (1992) 1045–1052

9. Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artificial Intelligence 17 (1981)

10. Horn, B.: Robot Vision. MIT Press (1986)

11. Perona, P., Malik, J., Shiota, T.: Anisotropic Diffusion. In: Geometry-Driven Diffusion in Computer Vision. Kluwer, Amsterdam (1995) 73–92

12. Savarese, S., Perona, P.: Local analysis for 3D reconstruction of specular surfaces. In: CVPR 2001. (2001) 738–745

13. Savarese, S., Perona, P.: Local analysis for 3D reconstruction of specular surfaces: Part II. In: ECCV 2002. (2002)

14. Oren, M., Nayar, S.: A theory of specular surface geometry. IJCV 24 (1997) 105–124