Appears in: Proc. International Conference on Computer Vision, October 2003.
Unsupervised Image Translation
Romer Rosales, Kannan Achan, and Brendan Frey
Probabilistic and Statistical Inference Laboratory
University of Toronto, Toronto, ON, Canada
{romer,kannan,frey}@psi.toronto.edu
Abstract
An interesting and potentially useful vision/graphics task is to render an input image in an enhanced form, or also in an unusual style; for example with increased sharpness or with some artistic qualities. In previous work [10, 5], researchers showed that by estimating the mapping from an input image to a registered (aligned) image of the same scene in a different style or resolution, the mapping could be used to render a new input image in that style or resolution. Frequently a registered pair is not available, but instead the user may have only a source image of an unrelated scene that contains the desired style. In this case, the task of inferring the output image is much more difficult since the algorithm must both infer correspondences between features in the input image and the source image, and infer the unknown mapping between the images. We describe a Bayesian technique for inferring the most likely output image. The prior on the output image P(X) is a patch-based Markov random field obtained from the source image. The likelihood of the input P(Y|X) is a Bayesian network that can represent different rendering styles. We describe a computationally efficient, probabilistic inference and learning algorithm for inferring the most likely output image and learning the rendering style. We also show that current techniques for image restoration or reconstruction proposed in the vision literature (e.g., image super-resolution or de-noising) and image-based non-photorealistic rendering can be seen as special cases of our model. We demonstrate our technique using several tasks, including rendering a photograph in the artistic style of an unrelated scene, de-noising, and texture transfer.
1 Introduction and Related Work
We pursue a formal method for modifying image statistics while maintaining image content. By appropriately manipulating image statistics, one could, for example, improve image clarity or modify the image appearance into a more convenient, preferable, or useful one. For instance, one application of our approach is demonstrated in Fig. 1. We use a known painting (top) to specify the attributes that we would like to transfer to our input image (middle). Our algorithm infers the latent image (bottom), which displays the style specified by the painting.
Many fundamental problems in image processing are specific cases of the above problem. In image de-noising one seeks to remove unwanted noise from a given image to achieve a visually clearer one. In image super-resolution, given a low-resolution image, the goal is to estimate a high-resolution version of that same image. More generally, in image restoration we seek to discover what the original image looked like before it underwent the effect of some unknown (or partially known) process.
In this paper, we keep the scope of this problem general. Thus, we refer to our method as one of image translation, suggesting a process similar to that of language translation (despite their clear conceptual differences). Besides approaching the above restoration problems, our method allows one, in general, to intervene in the properties of an image. For example, one could try to increase
Figure 1: Image µ, Starry Night by van Gogh, is used to specify the source patches (top), input image Y (center), inferred latent image X (bottom).
the image photorealism (e.g., relative to an originally non-photorealistic version) or change the artistic appearance of images (e.g., change a painting style into another).
A variety of recent and past research works have been proposed to approach specific problems among those mentioned above, e.g., [9, 26, 5, 10, 16, 23, 28, 24, 21]. Of these, our approach is most closely related to [5, 26, 9], since the joint distribution of the latent image (i.e., the image to be estimated) is represented as a Markov Random Field (MRF) with pairwise potentials. However, there are significant differences in our work (including the latent image representation). The primary differences pertain to:
(1) the inclusion of unknown transformation functions that relate (probabilistically) latent and observed image patches. Thus, the joint distribution of the model is different from that of [5, 26, 9], where we can think of these functions as known, as explained in Sec. 5. Moreover, here we allow for multiple transformation functions. We also show how these transformations can be estimated, instead of assuming that they are given.
(2) the conditional probability distribution of a patch in the latent image given a patch in the observed image cannot be estimated directly. In our case this distribution is unknown. In previous work, it was assumed that for training, an original and modified version of the source or example images were given (supervised learning). In our work we drop this assumption: we only have patches from a source image with different properties than the input image. This difference is related to the first one, since knowledge of the transformations would make this conditional probability partially (or fully) known.
(3) the derivation of new algorithms for inference. The estimation of new parameters, the use of different conditional independence assumptions, and the incorporation of new hidden random variables make estimation and inference a more complex task.
In the graphics literature, our approach is also related to [10], in the area of non-photorealistic rendering (also related to [3, 4]). In [10], a user presents two (source) images with the same content and aligned, but with two different styles. Given a new (input) image in one of the above styles, the system then transforms it into a new image displaying the other image style. A nearest neighbor algorithm is then used to match image features/pixels locally (similar to [3, 21]) and globally. Excellent results were obtained using this method. However, full supervision is required, since the user has to present two well-aligned images displaying the desired relationship.
One common disadvantage of previous methods is that frequently a registered image pair is not available; instead, the user may only have the input image and a source image of an unrelated scene that contains the appropriate style. In this case, the task of inferring the output image is much more difficult, since the algorithm must both infer correspondences between features in the input image and the source image, and infer the unknown mapping between the images. We propose a novel approach to solving this problem.
We formalize the problem and solution using a probabilistic approach, in the context of graphical models [20, 6]. The full joint distribution in our model is represented by a chain graph [15], a class of graph-represented distributions that is a superset of Bayes networks and Markov random fields [20]. In our chain graph, some of the nodes are associated with image patches which are to be inferred. Also, a set of patch transformations is used to enforce consistency in the transformation used across the image. These transformations are estimated by our algorithm, thus enabling us to discover transformations that relate the observed and estimated image.
We cast this problem into an approximate probabilistic inference/learning problem and show how it can be approached using belief propagation and expectation maximization (EM). Our image translation method is also appealing because of its generality. Most of the above approaches for image reconstruction or transformation can be seen as instances of the approach presented here, as will be discussed later.
2 Image Translation as Probabilistic Inference
We will now introduce the image translation problem. First, the problem is presented at an intuitive level. Despite several simplifications, the simple description in this section may become helpful later in the paper. We then make these ideas more precise by introducing a formal mathematical formulation.
2.1 Overview: Informal Problem Description
An intuitive way to summarize the image translation problem in the context proposed in this paper is to think of the task of finding an image X that satisfies certain inter-patch (e.g., smoothness) and within-patch (e.g., contrast) properties, and that produces the observed image Y after every patch undergoes one of several transformations.
In contrast with previous work, we do not want to assume that we know these patch transformations in advance. Also, we will most likely not be able to satisfy exactly all the above properties for the new image X. Thus, we consider a probabilistic approach where, given an original image Y, we try to construct a probable image X (1) which is formed by taking sample patches from a patch data-set (or dictionary), (2) whose patches are constrained to approximately satisfy certain local inter-patch properties, and (3) whose patches are related to corresponding patches in the original image Y by one or more unknown but limited patch transformations. We would also like to estimate these patch transformations, the degree of certainty we should have in these transformations for each patch, and how probable a reconstructed image is with respect to another, e.g., a posterior marginal probability distribution over X.
2.2 Definitions and Setup
We consider image representations based on a set of local, possibly overlapping, patches defined simply as a set of pixels. Let Y ∈ I_Y be an image formed by a set of patches y_p, with p ∈ P, P = {1, ..., P} and y_p ∈ ℝ^S. In this paper, S is the number of pixels in each patch (this assumes one real value per pixel; however, S could also account for representations using multiple channels)¹. We will call Y the input or observed image. Consider also a latent image X ∈ I_X, with patches x_p ∈ ℝ^T; X will be the image to be estimated, and therefore it is unknown.
¹In general, y_p does not have to represent pixels, but could contain, for example, filter responses.
Let µ ∈ I_X denote a known image (or concatenation of images), here simply referred to as the source or dictionary image. Assume that the set of patches in µ is a representative sample which possesses the patch statistics that we wish X to display.
Consider a set of patch transformations (i.e., patch translators) Λ = {Λ_l}, l ∈ {1, ..., L}, where Λ_l : ℝ^T → ℝ^S. In our model, the task of a single Λ_l is to transform a latent patch into an observed patch. These transformations are initially unknown, and we will try to estimate them. Estimating them intuitively accounts for discovering the set of 'painters' (or styles) that were used to paint image Y from an image X (note that we would like to achieve the inverse process). There can be multiple painters, perhaps each of them specializing in a particular image transformation. The finite random variable l_p will represent the index of the transformation employed to transform patch x_p; l = (l_1, ..., l_P) is thus the vector with the indices of the patch transformations for every patch in X.
We will consider another class of transformations, topological transformations, that can perform, for example, horizontal and vertical patch translation. These will be used as follows: a patch from µ will be moved horizontally and vertically to be positioned at a new location in image X. This class of 2D transformations (2D translations) is defined to be the finite set of sparse permutation matrices Γ, in a manner similar to [7]. In our case, each element is a matrix Γ_k of dimensions S × |µ|, with |µ| the number of pixels in the dictionary image µ. Thus, for simplicity, in the future the dictionary image µ is represented as a long 1D vector formed by stacking all the patches. We then restrict Γ_k to be a binary sparse permutation matrix of the form:

[ 0 ... 0 1 0 0 0 ... 0
  0 ... 0 0 1 0 0 ... 0
  0 ... 0 0 0 1 0 ... 0 ],   (1)

which only accounts for copying and translating a group of neighboring pixels by an integer number of pixels horizontally and vertically. This restriction is not necessary; it is straightforward to consider other classes of transformations, such as rotation or shearing, using the same representation. We use the random variable t_p to denote the t_p-th element of the set Γ, and t = (t_1, ..., t_P) to represent the topological transformations for all patches in the image.
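As a concrete illustration of Eq. 1, the product Γ_k µ never needs the sparse matrix to be formed explicitly: each Γ_k simply copies an S-pixel patch at an integer offset of the dictionary image. The following is a minimal sketch in Python/NumPy; the function names and the toy 6×6 dictionary are our own illustrations, not from the paper.

```python
import numpy as np

def gamma_patch(mu_img, top, left, s):
    """Crop an s-by-s patch at integer offset (top, left); equivalent to
    multiplying the stacked dictionary vector by a binary sparse
    permutation matrix Gamma_k that copies neighboring pixels."""
    return mu_img[top:top + s, left:left + s].reshape(-1)

def gamma_matrix(mu_shape, top, left, s):
    """Build the explicit S x |mu| permutation matrix for the same crop."""
    h, w = mu_shape
    G = np.zeros((s * s, h * w))
    for i in range(s):
        for j in range(s):
            G[i * s + j, (top + i) * w + (left + j)] = 1.0
    return G

mu = np.arange(36.0).reshape(6, 6)                     # toy 6x6 dictionary image
p1 = gamma_patch(mu, 2, 1, 3)                          # direct crop
p2 = gamma_matrix(mu.shape, 2, 1, 3) @ mu.reshape(-1)  # same patch via Gamma_k
assert np.allclose(p1, p2)
```

In practice one would enumerate all valid (top, left) offsets to obtain the finite set Γ of candidate transformations.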
2.3 Probabilistic Model
We defined our model to have a joint probability distribution represented by the chain graph of Fig. 2, which can be factorized as the product of local conditional probabilities associated with the directed edges and positive potential functions associated with the undirected edges [15, 20, 6, 14]:

p(Y, X, l, t | Γ, Θ, µ) = ∏_{p∈P} p(y_p | l_p, t_p, Γ, Θ, µ) ∏_{p∈P} P(t_p | x_p) ∏_{p∈P} P(l_p) (1/Z) ∏_{c∈C} ψ_c(x_{p∈c}),   (2)
with x_{p∈c} denoting the set of latent image patches that belong to clique c in the MRF at the upper layer in Fig. 2, C the set of cliques in the MRF sub-graph, and ψ_c the clique potential functions.
Figure 2: Chain graph representing the model joint probability distribution.
In this paper, every image patch y_p follows a Gaussian distribution given the patch transformation index l_p and the topological transformation t_p, with the conditional independence properties as shown in the graph in Fig. 3:

p(y_p | l_p, t_p, Γ, Θ, µ) = N(y_p; Λ_{l_p}(Γ_{t_p}µ), Ψ_p),   (3)

where Θ = {Λ, Ψ} is used to succinctly denote the distribution parameters and Ψ = {Ψ_p}, p ∈ P. This is represented in Fig. 3, a sub-graph taken from the full graph in Fig. 2, representing local conditional independences for each observed image patch. For this paper, we set each Λ_l to be a linear transformation². This is not a restriction of our framework; it may be advantageous to also allow non-linear transformations, and they could be incorporated. However, even with linear transformations, the probabilistic model as a whole is non-linear.
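The Gaussian likelihood of Eq. 3 has to be evaluated for every combination of style index l and candidate dictionary patch Γ_t µ. A small sketch of that evaluation, assuming NumPy and toy data (the function names, the two toy "painters," and the candidate list are illustrative; the paper does not prescribe an implementation):

```python
import numpy as np

def log_gaussian(y, mean, cov):
    """log N(y; mean, cov) for a full covariance matrix."""
    d = y - mean
    _, logdet = np.linalg.slogdet(cov)
    k = y.size
    return -0.5 * (k * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

def patch_log_likelihoods(y_p, lambdas, candidates, psi_p):
    """log p(y_p | l, t) = log N(y_p; Lambda_l (Gamma_t mu), Psi_p)
    for every style l and every candidate dictionary patch Gamma_t mu."""
    out = np.empty((len(lambdas), len(candidates)))
    for l, Lam in enumerate(lambdas):
        for t, x in enumerate(candidates):
            out[l, t] = log_gaussian(y_p, Lam @ x, psi_p)
    return out

rng = np.random.default_rng(0)
S = 4                                          # toy patch dimensionality
lambdas = [np.eye(S), 0.5 * np.eye(S)]         # two toy linear 'painters'
candidates = [rng.normal(size=S) for _ in range(3)]  # toy Gamma_t mu patches
psi = np.eye(S)
ll = patch_log_likelihoods(candidates[0], lambdas, candidates, psi)
assert ll.shape == (2, 3)
assert np.argmax(ll[0]) == 0   # identity painter best explains its own patch
```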
In our case, we consider p(X) to be a pairwise MRF; thus every clique c only contains two patches, c_1 and c_2. Thus we simply have:

ψ_c(x_c) ∝ e^{−d(x_{c1}, x_{c2})/2σ²}   (4)

We will define d as a square distance function; thus ψ defines a Gaussian MRF. However, d is computed only on overlapping areas in the associated patches, in a way similar to [5].
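Since d is computed only on the overlapping areas of neighboring patches, the log-potential of Eq. 4 reduces to a squared difference over a thin shared strip. A sketch under the assumption of horizontally adjacent square patches with a fixed-width overlap (the four-pixel overlap matches the setting of Sec. 4, but the function name and patch layout here are illustrative):

```python
import numpy as np

def overlap_log_potential(x1, x2, overlap, sigma):
    """log psi(x1, x2) up to a constant: minus the squared distance between
    the two patches on their shared strip, divided by 2*sigma^2.
    x1 is assumed to sit immediately left of x2; `overlap` is the strip width."""
    strip1 = x1[:, -overlap:]   # right edge of the left patch
    strip2 = x2[:, :overlap]    # left edge of the right patch
    d = np.sum((strip1 - strip2) ** 2)
    return -d / (2.0 * sigma ** 2)

a = np.ones((15, 15))
b = np.ones((15, 15))
assert overlap_log_potential(a, b, 4, 1.0) == 0.0   # identical overlap: maximal potential
b[:, :4] = 0.0
assert overlap_log_potential(a, b, 4, 1.0) < 0.0    # mismatched overlap is penalized
```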
In order to link our discrete transformation random variable with the continuous latent random variable X, we use the deterministic relationship:

p(t_p | x_p) = { 1 if x_p = Γ_{t_p}µ
                 0 otherwise.   (5)

This is equivalent to saying that a transformation t_p will have conditional probability one only if its patch x_p is equal to the patch taken from the dictionary µ using transformation t_p (here we assume that the set Γ is such that it produces unique patches).
²Since the inferred patches are not the result of a linear function of the observed image patches, this is different than simply transforming the image patches linearly. Instead, by this we are imposing a constraint on the flexibility that each of the L transformations is allowed to have.
Figure 3: Sub-graph representing local conditional independences for each observed image patch. The variable µ is fixed for all patches and it is usually given in the form of the source image(s).
3 Algorithms for Learning/Inference
Let us analyze the system associated with the chain graph in Fig. 2. First, we can see that it contains an undirected sub-graph with loops. Even if all the model parameters Θ were known³, inference (computing the conditional distribution over the image X, given the image Y) is still computationally intractable in general (in the same way as MAP estimation is). More specifically, this problem has complexity O(|K|^|P|), with |K| the number of possible states that each patch x_p can take. Learning the model parameters is also intractable, as can be seen from computing the derivatives of Eq. 2 with respect to the variables of interest. However, there exist approximation algorithms; one of them, based on alternating optimizations [1], is Expectation Maximization (EM). EM requires computing posterior distributions over l and t, which in turn requires computing conditional marginal distributions for each node of the undirected portion of our chain graph. As we have seen, this is computationally intractable. Thus, it seems the key problem is to compute the conditional marginal distribution over the latent patches x_p. In the following section we explain how this is done using an approximation.
Even though learning can be seen as an instance of inference, we divide this section into (1) inferring the latent image (usually referred to as inference) and (2) estimating the model parameters (usually referred to as learning).
3.1 Inference and Approximate E-step
We assume the reader has some familiarity with the EM algorithm (see e.g., [18, 2]). The intractability of computing the E-step exactly can be seen as follows. In our model, the E-step is equivalent to computing the posterior:

P(l, t | Y) ∝ ∫_X ∏_{p∈P} p(y_p | l_p, t_p) P(t_p | x_p) p(X) dX,   (6)

but we cannot solve this integral; thus we are forced to find an approximate way to perform the E-step.

Approximating patch conditional distributions. Let us say that Λ_l and Ψ_p have some value (we could initialize Λ_l and Ψ_p to a random matrix for each l and p). Then, for every p we can select the set K_p of the K most likely topological transformations given the dictionary µ. This can be easily done for each patch in X once P(t_p, l_p | y_p) (see below) is computed, by taking those topological transformations with highest probability per hidden image patch. This takes O(TL) probability evaluations per patch and a sorting operation among T elements. The approximation is necessary for computational reasons and can be made as exact as desired if we are willing to pay the associated computational cost. This approximation was also used in a similar way in [5]; it accounts for cutting off the long tails of the distribution P(t_p | y_p) (or the approximation to P(t_p | Y)) by ignoring very unlikely topological transformations.

³More importantly, if the transformation indices t_p were known.

Using this succinct representation, we can then compute an approximate E-step. This is equivalent to inferring the image patches x_p given the parameters so far estimated and the current posterior distributions over t_p. Note that (1) X is conditionally independent of the rest of the model variables given t, and (2) we can compute the marginal-conditional distributions over t_p alone by simply using P(t_p | y_p) = Σ_{l_p} P(t_p, l_p | y_p).

Inferring the latent image. We have reduced our problem to that of inferring a distribution over the states of X given a distribution over t_p for all p. One way to perform this computation is by performing loopy belief propagation in the MRF for X to compute the posterior marginals over each p(x_p). Loopy belief propagation accounts for approximating p(x_p) by using the belief propagation message passing updates m_{i→j} [20] for several iterations, which in our model can be formally written as follows:
m_{i→j}(x_j) = Σ_{x_i = s_i} P(x_i | t_i) ψ_{ij}(x_i, x_j) ∏_{k∈N(i)\j} m_{k→i}(x_i)

b_i(x_i) = P(x_i | t_i) ∏_{k∈N(i)} m_{k→i}(x_i),
with N(i) the neighbors and s_i the candidates for x_i. These updates guarantee that upon convergence the marginal probabilities p(x_i | Y), obtained by simply normalizing b_i(x_i), would be at least at a local minimum of the corresponding Bethe free energy [27] of the (conditional) MRF. The domain of x_p is in practice discrete, since the probability distribution is concentrated only at the candidate patches given by K_p. Thus, every full iteration has complexity O(K²). With knowledge of the (approximate) conditional marginals, the E-step from Eq. 6 is then given by:

P(t_p, l_p | y_p) ∝ Σ_{x_p} p(x_p | Y) P(t_p | x_p) P(l_p) p(y_p | l_p, t_p).   (7)

Once this is done, we can then perform the M-step (as explained next) and iterate the EM algorithm as usual. Even though loopy belief propagation is not exact (clearly, since this problem cannot be solved exactly) and is not guaranteed to converge, some recent work supports this approximation [8, 13, 17]. Other approximations have also been suggested (e.g., [25, 12, 9]). In our case, loopy belief propagation seems to provide accurate posterior marginals for the experiments performed next.
3.2 Learning the Translation Parameters
Once we have performed the E-step, the M-step optimizes the expected value of the model joint distribution under P(t_p, l_p | y_p) with respect to the model parameters Λ_l and Ψ_p. This can be done by computing first derivatives, as in a MAP estimate setting. For linear transformation functions, we can obtain a closed-form solution for the optimal values of the parameters:

Λ_l = [ Σ_p Σ_{t_p} P(t_p, l_p = l | y_p) y_p (Γ_{t_p}µ)^⊤ ] [ Σ_p Σ_{t_p,l_p} P(t_p, l_p | y_p) (Γ_{t_p}µ)(Γ_{t_p}µ)^⊤ ]^{-1}   (8)

Ψ_p = (1/Z_p) Σ_{t_p} Σ_{l_p} P(t_p, l_p | y_p) (y_p − Λ_{l_p}(Γ_{t_p}µ))(y_p − Λ_{l_p}(Γ_{t_p}µ))^⊤,   (9)

with Z_p = Σ_{t_p} Σ_{l_p} P(t_p, l_p | y_p).
For non-linear transformation functions, we need to use non-linear function optimization, such as gradient methods. In any case, the gradient can be computed efficiently.
In summary, learning/inference can be done exactly only up to inferring the marginal distribution of the latent patches (a problem that is computationally intractable in general). One way around this problem is to approximate these marginal distributions and use them to update the model parameters.
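The closed-form update for Λ_l in Eq. 8 amounts to a posterior-weighted least-squares solve: an outer-product accumulation over patches and candidate transformations, followed by a matrix inverse. A sketch assuming dense NumPy arrays; the array layout, names, and the toy generative check are our own, and the denominator sums over all l_p, as printed in Eq. 8.

```python
import numpy as np

def update_lambda(l, posts, Y, D):
    """Closed-form M-step for a linear transformation Lambda_l (Eq. 8).
    posts: (P, T, L) array, posterior P(t_p, l_p | y_p) per patch
    Y:     (P, S) array, observed patches y_p
    D:     (T, S) array, candidate dictionary patches Gamma_t mu
    (names and array layout are illustrative, not from the paper)."""
    P, T, _ = posts.shape
    num = np.zeros((Y.shape[1], D.shape[1]))
    den = np.zeros((D.shape[1], D.shape[1]))
    for p in range(P):
        for t in range(T):
            num += posts[p, t, l] * np.outer(Y[p], D[t])          # y_p (Gamma_t mu)^T
            den += posts[p, t, :].sum() * np.outer(D[t], D[t])    # (Gamma_t mu)(Gamma_t mu)^T
    return num @ np.linalg.inv(den)

rng = np.random.default_rng(1)
P, T, L, S = 6, 5, 2, 3
D = rng.normal(size=(T, S))
true_lam = np.diag([2.0, 0.5, 1.0])
# toy posteriors: patch p was generated from candidate t = p % T with style 0
posts = np.zeros((P, T, L))
for p in range(P):
    posts[p, p % T, 0] = 1.0
Y = np.array([true_lam @ D[p % T] for p in range(P)])
lam_hat = update_lambda(0, posts, Y, D)
assert np.allclose(lam_hat, true_lam, atol=1e-6)   # exact recovery on noiseless toy data
```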
4 Experimental Results
We illustrate the image translation method in diverse tasks. Using our method, these tasks can all be seen as instances of the same problem. In each task, between four and seven transformations L were chosen. The dimensionality of the latent and observed image patches was reduced by 25% using Principal Component Analysis. This is mainly of numerical concern, since even a 20 × 20 patch will generate a vector of dimensionality 400, for which it is hard to compute statistics using finite precision computations and small datasets. All the patches considered in these experiments are square patches. The overlap between neighboring patches used in Eq. 4 to compute the clique potentials is set to four pixels deep. We use the luminance value (from the YIQ color space) as our representation for each pixel (instead of its RGB values) [10]. For color images, the color components (IQ) are then simply copied to the final estimated image. Most images can be (much) better perceived directly on the computer screen due to resolution / space limitations (these and more tests are also available at http://www.psi.toronto.edu).
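The 25% PCA reduction mentioned above can be sketched with a plain SVD on the matrix of flattened patches; the keep fraction, function name, and toy 4×4 patches below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def pca_reduce(patches, keep_frac=0.75):
    """Project flattened patches onto the top principal components,
    keeping keep_frac of the original dimensionality (a 25% reduction
    keeps 75% of the components)."""
    mean = patches.mean(axis=0)
    X = patches - mean
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    k = max(1, int(round(keep_frac * patches.shape[1])))
    basis = Vt[:k]                # top-k principal directions
    return X @ basis.T, basis, mean

rng = np.random.default_rng(2)
patches = rng.normal(size=(50, 16))        # 50 flattened 4x4 toy patches
coords, basis, mean = pca_reduce(patches, 0.75)
assert coords.shape == (50, 12)            # 16 -> 12 dimensions (75% kept)
recon = coords @ basis + mean              # approximate reconstruction
assert recon.shape == patches.shape
```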
4.1 De-blurring / De-noising
In our first experiment, we strongly down-sampled a photographic image, as seen in Fig. 4 (left). This will be our observed image Y. We used four transformations and 15 × 15 patches for both observed and latent image. Our goal is to obtain a higher-resolution version of the observed image. Thus, as our dictionary, we used photographic patches with the desired resolution (female face photos). The result is shown in Fig. 4 (right). The system was able to infer the high-resolution patches significantly well given the information present in the considerably degraded input image (also, see video for MAP estimates at each iteration). In previous work's experimental evaluation, it is rare to see tests performed with this level of degradation in the input image. Note that our model is meant to solve the more general task of learning arbitrary convolution kernels, i.e., not just a de-blurring kernel.
4.2 Photo-to-Line-Art Translation
Now the goal is to make our input image acquire the line-art attributes of an unrelated source image. For this task we
Figure 4: Input image (left) and inferred image (MAP) (right).
Figure 5: Source line art examples: engraving from Gustave Dore's illustrations of Don Quixote, stippling by Claudia Nice [19], and Cupid [11].
used patches of size 15 × 15 and L = 5. We chose three source examples, shown in Fig. 5, and applied them to the image in Fig. 6 (left). The results show that our method is suitable for performing this task also. On the positive side, note that even larger scale properties are accurately displayed by the corresponding inferred images; this provides excellent line consistency across the image. On the negative side, the algorithm has some difficulty in regions with strong depth discontinuities, such as the outer face contour, perhaps because the source image does not contain patches with enough detail.
4.3 Photo-to-Paint Translation
We now apply renowned artistic styles to the input images. We use an image of a well-known painting by van Gogh, shown in Fig. 1 (top), as the image with the desired attributes, and the photo in Fig. 1 (center) as the input. We used L = 4 and a 30 × 30 patch size. The inferred image in Fig. 1 (bottom) inherited the local patch statistics and also global features found in the source painting. We repeated the experiment using the same painting but a different input image; results are shown in Fig. 7. This image also acquired the source style. However, Lena's eyes could not be inferred with enough accuracy (or pleasing artistic detail), perhaps also because of the lack of patches with the correct properties in the original painting. We also employed paintings with different properties; one example is shown in Fig. 8. In order to more carefully observe the details in the image translation, zoomed-in areas of previous examples are shown in Fig. 9.
Figure 6: Input image Y (left) and inferred images using the three source line art example images in Fig. 5 (in same order).
Figure 8: Source image µ, Cistern in the Park at Chateau Noir by Cezanne (left), input image Y, shore photo by John Shaw [22] (center), inferred image X (right).
5 Relation to Previous Methods
We now briefly compare our model with several relevant approaches proposed to solve more specific problems. The model in [5] was proposed to perform image super-resolution. Using our framework, this approach can be obtained by (1) setting L = 1, i.e., using only one transformation, and (2) setting Λ_1 equal to a fixed and known low-pass filter; in [5] there is no need to estimate Λ because it is given. Another important difference is that the input to this algorithm is different from the input to our algorithm. This model assumes that there exists a set of image pairs for training. Each pair consists of the high and the low resolution version of an image (the low-res version can be simply generated using the low-pass filter transformation). Our algorithm does not assume that we have access to these image pairs for training, but only that we have access to one image with the desired statistics. This is a much more difficult computational and modeling task, which also has important practical implications: it is more restrictive to have to find a pair of perfectly aligned images, one with the desired statistics (e.g., style) and the other obtained in the same mode as the input image we hold.
The model presented in [10], proposed for non-photorealistic rendering, is also, like [5], a model where it is assumed that there exists a set of image pairs for training, with the statistics of Y and X. Λ is not given, but in the supervised approach taken in this work, Λ can be easily found. More specifically, Λ is the equivalent of a nearest neighbor algorithm, which is easy to compute (given the training image pair). Large scale consistency was achieved using also a nearest neighbor approach, which in our model is equivalent to learning the distribution over X from our sample images and using it to define the MRF energy function (i.e., effectively replacing Eq. 4).
Inference in [9] can be seen as inference in our model using only the MRF sub-graph, with observations y_p directly linked to unobserved nodes x_p. Also, instead of belief propagation, in [9] Monte Carlo techniques were proposed to infer the hidden state of the image patches x_p.
6 Conclusions
The image translation approach proposed here provides a general formalism for the analysis of a variety of problems in image processing where the goal is to estimate an unobserved image from another (observed) one. These include a number of fundamental problems previously approached using separate methods. In this sense, the image translation approach can be of practical and theoretical interest. Several practical extensions are possible, for example in the choice of a different approximate inference method, in the choice of the clique potential functions used in the MRF, or in the form of the patch transformations. These changes are likely to be application dependent, and can be easily incorporated within the framework presented here.
Acknowledgments
We thank Aaron Hertzmann for helpful discussions on non-photorealistic rendering and for evaluation of results.
Figure 7: Input image Y (top), inferred latent image X (bottom).
Figure 9: Two zoomed-in translations from previous examples: source image (real painting) (left) and inferred image (right); input images (not shown here) were the example landscape photos.
References
[1] I. Csiszar and G. Tusnady. Information geometry and alternating minimization procedures. Statistics and Decisions, 1:205–237, 1984.
[2] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood estimation from incomplete data. Journal of the Royal Statistical Society (B), 39(1), 1977.
[3] A. Efros and W. Freeman. Image quilting for texture synthesis and transfer. In SIGGRAPH, 2001.
[4] A. Efros and T. Leung. Texture synthesis by non-parametric sampling. In International Conference on Computer Vision, 1999.
[5] W. Freeman, E. Pasztor, and O. Carmichael. Learning low-level vision. IJCV, 2000.
[6] B. Frey. Graphical Models for Machine Learning and Digital Communication. MIT Press, 1998.
[7] B. Frey and N. Jojic. Learning graphical models of images, videos and their spatial transformations. In Uncertainty in Artificial Intelligence, 2000.
[8] B. Frey and D. MacKay. A revolution: Belief propagation in graphs with cycles. In Advances in Neural Information Processing Systems, volume 10. The MIT Press, 1998.
[9] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. PAMI, 6(6):721–741, 1984.
[10] A. Hertzmann, C. Jacobs, N. Oliver, B. Curless, and D. Salesin. Image analogies. In SIGGRAPH 2001, pages 327–340, 2001. http://www.mrl.nyu.edu/projects/image-analogies/.
[11] A. Hertzmann and D. Zorin. Illustrating smooth surfaces. In SIGGRAPH 2000, pages 517–526, 2000.
[12] M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul. An introduction to variational methods for graphical models. Learning in Graphical Models, M. Jordan (editor), 1998.
[13] F. Kschischang and B. Frey. Iterative decoding of compound codes by probability propagation in graphical models. IEEE J. Selected Areas in Communications, 16:219–230, 1998.
[14] F. Kschischang, B. Frey, and H. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 2001.
[15] S. L. Lauritzen and N. Wermuth. Graphical models for associations between variables, some of which are qualitative and some quantitative. Annals of Statistics, 17:31–57, 1989.
[16] S. Li. Markov Random Field Modeling in Computer Vision. Springer-Verlag, 1995.
[17] R. McEliece, D. MacKay, and J. Cheng. Turbo decoding as an instance of Pearl's belief propagation algorithm. IEEE J. Selected Areas in Communications, 16, 1998.
[18] R. Neal and G. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in Graphical Models, M. Jordan (editor), 1998.
[19] C. Nice. Sketching Your Favorite Subjects in Pen and Ink. F-W Publications, 1993.
[20] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan-Kaufman, 1988.
[21] A. Pentland and B. Horowitz. A practical approach to fractal-based image compression. Digital Images and Human Vision, 1993.
[22] F. Petrie and J. Shaw. The Big Book of Painting Nature in Watercolor. Watson-Guptill, 1990.
[23] J. Portilla and E. Simoncelli. Statistics of complex wavelet coefficients. International Journal of Computer Vision, 40(1), 2000.
[24] R. Schultz and R. Stevenson. A Bayesian approach to image expansion for improved definition. IEEE Transactions on Image Processing, 3(3):233–242, May 1994.
[25] M. Wainwright, T. Jaakkola, and A. Willsky. A new class of upper bounds on the log partition function. In Uncertainty in Artificial Intelligence, 2002.
[26] Y. Weiss. Interpreting images by propagating Bayesian beliefs. In Advances in Neural Information Processing Systems, volume 9, page 908, 1997.
[27] J. Yedidia. An idiosyncratic journey beyond mean field theory. Advanced Mean Field Methods, pages 21–36, 2001.
[28] S. Zhu and D. Mumford. Prior learning and Gibbs reaction-diffusion. IEEE Trans. PAMI, 1997.