
Example-Based Style Synthesis

Iddo Drori Daniel Cohen-Or Hezy Yeshurun

School of Computer Science, Tel Aviv University

Tel Aviv, Israel, 69978

Abstract

We introduce an example-based synthesis technique that extrapolates novel styles for a given input image. The technique is based on separating the style and content of image fragments. Given an image with a new style and content, it is first adaptively partitioned into fragments. Stitching together novel fragments produces a coherent image in a new style for a given content. The aggregate of synthesized fragments approximates a globally non-linear model with a set of locally linear models. We show the results of our method for various artistic, sketch, and texture filters and painterly styles applied to different image content classes.

1. Introduction

Images and paintings can be regarded as a composition of content and style. An illustrative example is shown in Figure 1, where the four images on the top left are examples consisting of two pairs of images which have the same "content" and two pairs with the same "style". Let $p_i$ represent the different underlying content of each row $i = 1, \ldots, m$, and $f_j$ the different style operator of each column $j = 1, \ldots, n$. Then the $m \times n$ matrix of images is $A = (a_{ij}) = f_j(p_i)$.

Problem statement: Given the sub-matrix of images $A(1:m-1, 1:n-1)$ as a small training set, and the input image $A(m,n)$ shown on the lower right of Figure 1, which has a different content and a different style, the task of the algorithm is to complete the remaining images $A(m, 1:n-1)$ and $A(1:m-1, n)$. Loosely speaking, we would like to synthesize new images that contain the same content but with novel styles. In particular, the goal is to complete the image matrix and synthesize images with the same content as the input image, in the styles of the images of the training set.

Figure 1. A training set of painterly rendered images and an input from a Still Life by Paul Cezanne form an image matrix. Each column consists of various styles of the same image, and each row results from applying the same style to different images. The goal is to synthesize the remaining images.

Synthesizing images in various styles by example is related to the problem known as image analogies [20]: given training images $f_1(p_1)$ and $f_2(p_1)$ in different styles, and another example image in one of the known styles, $f_1(p_2)$, the goal is to synthesize that example in the other known style, $f_2(p_2)$. In contrast, we deal with a two-factor problem, since the image $f_n(p_n)$ has both unknown content $p_n \neq p_1, p_2$ and unknown style $f_n \neq f_1, f_2$. Notice that once any of the images in the image matrix is synthesized, the remaining images can then be generated by analogy. In image analogies [20], the training set is used as a look-up table, which shows explicitly the mapping between styles of a given pixel in a given local content. Here we take a different approach: the training set is used to generate a model that factors style and content, and novel images are then synthesized based on the computed model. Factorization of style and content has been applied successfully to aligned images [17, 29]. However, in our setting there is no natural correspondence between the training images of different content, as shown in the rows of Figure 1, and no natural correspondence with the given input image, and thus the images cannot be aligned. Moreover, the input image has a different content, and its style is not necessarily present in any of the examples in the training set.

Separating style and content in the general case is a difficult problem. It has been successfully applied to factors, such as illumination, that can be approximated by a linear subspace [4]. The method introduced in this paper deals with models which are globally non-linear. The idea is to approximate a globally non-linear model by multiple locally linear models. The input image is decomposed into a set of overlapping fragments of various sizes, and fragments of the training images are considered at multiple positions, scales, and orientations. At the fragment level, similar fragments are approximately locally linear. Composing the local fragments back into a full image approximates the globally non-linear model by locally linear models. This property is maintained by an adaptive choice of fragments, as described in detail below.

Building upon recent example-based image synthesis techniques [13, 15, 20, 22, 28, 29], our approach is to adaptively decompose the input image into fragments, and then synthesize a coherent image by stitching together synthesized fragments that satisfy multiple constraints. Taking an adaptive approach captures features at multiple scales and avoids drastic blocking artifacts. It should be emphasized that this work takes into account image content that includes both stationary and local regions, such as textures, as well as the non-stationary and global information present in general images. However, our approach is limited to capturing styles that are local, and at the fragment level the model is simple and fixed.

2. Previous work

Many operations, ranging from low-level vision tasks to high-end graphics applications, have been efficiently performed based on examples [3, 6, 8, 9, 10, 15, 16, 20, 21, 27, 29].

The idea of defining and factoring image style and content using a bilinear model was introduced to computer vision by Freeman and Tenenbaum [17]. Given a training set of aligned face images of different people under varying illumination, Tenenbaum and Freeman [29] refer to the identity of a face as content, whereas the illumination is considered as style. Style and content are factored pixel-wise by fitting a symmetric bilinear model to the training images. Combinations of style and content are synthesized from a new face image under novel illumination by changing the estimated parameters of the new image.

Given a set of parameters describing facial expressions, such as the degree of happiness or surprise, along with a relative motion field of the face, Beymer et al. [5, 6] use a regression function that estimates a mapping from parameters to motion for synthesizing novel facial expressions from a set of labelled training images.

Rather than defining individual filters by hand, Hertzmann et al. [20] automatically learn a mapping of spatially local filters from image pairs in pixel-wise correspondence, one a filtered version of the other. A new target image is then filtered analogously to the training images. Pixels are rendered by comparing their neighborhood, and those of a coarser level in a multi-resolution pyramid, to neighborhoods of a training image pair. Considering the original image as output and taking its segmentation map as input allows texture to be painted by numbers [18]: a new image composed of the various textures is synthesized by painting a new segmentation map. Swapping the output segmentation map with the original input image results in example-based segmentation [8]. In our case, we cannot simply approximate a common representation for images by blurring and median filtering and then proceed by analogy, as this results in the loss of image detail.

Freeman et al. [15, 16] derive a model for performing super-resolution by example. The technique is based on examining many pairs of high-resolution and low-resolution versions of image patches from several training images. Baker and Kanade [3] apply this technique restrictively, to a class of face images: given a new low-resolution face image, its corresponding high-resolution image is inferred by re-using the existing mapping between individual low-resolution and high-resolution face patches.

The term image quilting [13] was coined to describe the process of synthesizing a new image by stitching together blocks of existing images. The technique focuses on handling boundaries between blocks and is applied to texture synthesis. It enforces the constraint that the error between overlapping blocks is minimal, and blocks are stitched along a minimum-cost path [11] that is computed by dynamic programming. The results depend on the size of a block, which varies according to the texture properties. Hierarchical pattern mapping [28] takes into account that the patterns in a texture often come in many different sizes: a 3D surface is progressively covered by texture patches of various shapes and sizes, where a large patch is split into smaller ones based on the texture fitting error with already textured neighborhoods. By adding the constraint that the synthesized texture match an example image, Ashikhmin [2] presented texture transfer, in which an example image is rendered with a training texture. This technique was extended to color transfer [30], a special case of image analogies, by matching local image statistics. In our setting, we cannot simply transfer color and texture from one style to another, as applying properties of one style over the other could result in artifacts.

3. Image manifolds

A general mathematical perspective for describing the variability of images is to regard an image, represented by a vector of pixel intensities, as a point in an abstract image space, and then to characterize a set of images by a manifold in this high-dimensional image space [26]. The statistics of natural images are correlated [14] and far from random. Recently, several researchers [4, 23] have independently shown that irradiance environment maps, for rendering Lambertian objects under distant illumination, are well approximated by a nine-dimensional linear subspace. Thus, from any given view, the appearance of an arbitrary Lambertian shape, under arbitrarily distributed distant illumination, lies approximately near a ten-dimensional manifold. By adding more parameters the dimensionality of the manifold increases, and it becomes highly complex and non-linear, although its dimensionality remains far less than that of the image space.

In our work, we would like to be able to extrapolate between images in parameter space by moving along such a manifold and extrapolating novel points. This requires addressing three issues: (i) the manifold is complex and has a highly non-linear global structure; (ii) the "curse of dimensionality": the dimensionality of the image space is too high; and (iii) there is no natural correspondence between the images. Classical approaches such as Principal Component Analysis or Multiple Discriminant Analysis, which find projections that best represent or separate data in a least-squares sense, are unsuitable for our purposes since they require correspondence.

Our approach is based on the observation that although the manifold is globally non-linear, locally it is approximately linear. This alleviates the problem of the complexity of the manifold: it permits applying simple local linear models as a good approximation of a globally highly non-linear model. To reduce the high dimensionality of the space and to overcome the lack of correspondence, the images are decomposed into multi-scale fragments. A fragment is defined as a connected region cut from an image tile. The tiles are overlapping, and their aggregation is an over-complete representation of the image. The key idea is to apply the synthesis at the tile level, in such a way that the composition of their fragments into a full image is coherent.

4. Style synthesis

As explained in the introduction, we would like to synthesize novel images based on the training set. Our solution is based on adaptively decomposing the input image into fragments and applying the synthesis locally. We construct a feature pyramid for each image to capture various features at multiple scales. The training set constitutes an over-complete representation, considering fragments at all scales and positions and in eight orientations. The input image is adaptively subdivided by a quad-tree whose leaves consist of overlapping tiles that are traversed in Morton order [25]. Images are then synthesized from coarse to fine, where each level is based on a coarser level. This captures features at multiple scales and accelerates the computations.

At each level, tiles are synthesized from a number of example tiles. The choice of these examples is based on multiple constraints, which include the underlying image geometry. The extrapolated tiles are then cut into fragments which together form a coherent tessellation of the novel image. In the following we elaborate on each of these stages.

4.1. Image decomposition

We adaptively decompose the input image $A(m,n)$ into overlapping tiles. Our algorithm maintains a quad-tree for the input image, where each node represents an image tile (element), and its four children represent a partition of that element. The leaves of the tree correspond to elements that together cover the image. Each element is extended to include overlapping boundaries. These boundaries form the constraints among adjacent elements which are necessary to generate a coherent image. During image generation, elements are traversed in an order that preserves coherence with previously synthesized regions.

An adaptive subdivision requires a method for deciding whether an element should be split. Our initial subdivision of the input is based on the luminance and color values across an element: if any of the absolute differences between the maximum and minimum values is above a threshold $\delta$, and the minimum element size $\varepsilon_{\min}$ has not yet been reached, then the element is recursively subdivided. In all the results shown in this paper we set $\delta$ between 0.25 and 0.4, and $\varepsilon_{\min}$ to either 4 by 4 or 8 by 8 pixels.
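For concreteness, a minimal sketch of this split criterion follows; the function and parameter names are ours, not the paper's, and the tile is assumed to be an array of luminance and color channels with values in [0, 1].

    import numpy as np

    def should_split(tile, delta=0.3, eps_min=8):
        # Split if the max-min range of any channel exceeds delta and
        # the element is still larger than the minimum size eps_min.
        # `tile` is an (h, w, channels) array with values in [0, 1].
        h, w = tile.shape[:2]
        if min(h, w) <= eps_min:
            return False
        channel_range = np.ptp(tile.reshape(-1, tile.shape[2]), axis=0)
        return bool((channel_range > delta).any())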

This coarse subdivision is further refined at runtime in any of the following stages: (i) multi-dimensional search, if a suitable set of candidate tiles is not found; (ii) fragment synthesis, if the model fails to accurately reconstruct (validate) the input tile; and (iii) image reconstruction, if the synthesized fragments do not agree with previously synthesized regions. There is a trade-off here, since finer elements more easily meet the above conditions but do not capture large features.


4.2. Multi-dimensional search

An important part of the algorithm is the selection of a set of fragments which has a 2D (style and content) embedding that reflects the relation among the training-set images in the higher dimension. At the same time, the synthesized fragments must agree with their neighbors. Moreover, in a single step, the algorithm synthesizes a set of fragments, one for each style and content. Thus, a multi-dimensional search must simultaneously take into consideration constraints from multiple tiles and boundary values. Boundaries are crucial for capturing geometric configurations and maintaining consistency with previously synthesized regions.

A search vector is the concatenation of components from multiple origins, as illustrated in Figure 2. Formally, let $A(i,j)_l$ denote an image in row $i$ and column $j$ of the matrix, at scale $l$, where $l = 0$ is the coarsest level. Let $T_{p,\varepsilon,o}$ denote an extended tile around position $p$, of size $\varepsilon$, in orientation $o$, and similarly, let $B_{p,\varepsilon,o}$ denote the corresponding boundary. Each search vector is a concatenation of the boundaries $B$ of previously synthesized fragments in styles $j = 1, \ldots, n-1$, the tile $T$ in style $n$, and the corresponding tiles at a coarser scale $l-1$, for $l > 0$:

$$B_{p,\varepsilon,o}(A(i,1)_l),\ \ldots,\ B_{p,\varepsilon,o}(A(i,n-1)_l),\ T_{p,\varepsilon,o}(A(i,n)_l),\ T_{\frac{p}{2},\frac{\varepsilon}{2},o}(A(i,1)_{l-1}),\ \ldots,\ T_{\frac{p}{2},\frac{\varepsilon}{2},o}(A(i,n)_{l-1}). \tag{1}$$

Given the vector defined in Eq. (1), with input content $i = m$ and upright orientation, we find its nearest neighbors in each content row $i = 1, \ldots, m-1$, over each position and orientation. This enforces the same position $p$ and orientation $o$ for the same content across different styles $j = 1, \ldots, n$, but allows different positions and orientations for different contents $i = 1, \ldots, m$.


Figure 2. The components of a search vector for a single content row are highlighted. Notice that the extension of a uniform tile captures the edge.
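A sketch of how such a vector might be assembled for one content row, under simplifying assumptions (upright orientation only, and an L-shaped boundary of already-synthesized pixels above and to the left of the tile); the crop helpers are ours and assume $p$ lies at least $w$ pixels from the image border.

    import numpy as np

    def crop_tile(img, p, eps):
        # eps-by-eps tile with top-left corner p = (row, col).
        r, c = p
        return img[r:r + eps, c:c + eps]

    def crop_boundary(img, p, eps, w=2):
        # L-shaped strip of width w above and to the left of the tile,
        # standing in for the paper's extended tile boundaries.
        r, c = p
        top = img[r - w:r, c - w:c + eps]
        left = img[r:r + eps, c - w:c]
        return np.concatenate([top.ravel(), left.ravel()])

    def search_vector(level_l, level_l1, p, eps):
        # Eq. (1) for one content row: boundaries B of styles 1..n-1
        # and tile T of style n at scale l, followed by the tiles at
        # the coarser scale l-1. level_l and level_l1 are lists of n
        # feature images at scales l and l-1.
        r, c = p
        parts = [crop_boundary(img, p, eps) for img in level_l[:-1]]
        parts.append(crop_tile(level_l[-1], p, eps).ravel())
        parts += [crop_tile(img, (r // 2, c // 2), eps // 2).ravel()
                  for img in level_l1]
        return np.concatenate(parts)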

The above multi-dimensional search is applied at multiple scales over several features. Our algorithm maintains a feature pyramid, created in a pre-processing stage, for each image in the matrix $A$. The feature pyramid consists of color and luminance Gaussian pyramids, as well as four gradient pyramids in the horizontal, vertical, and two diagonal directions, and a Laplacian pyramid, which are derived from the luminance values. Features are selected according to properties of the training set and the component type. Color is effective for a rich training set, which contains a large distribution of samples for a high-dimensional search. Luminance is suitable for both the tiles ($T$) and the boundaries ($B$). In addition, gradients and the Laplacian are effective features for tiles, but are not used for thin boundaries. Gradients are computed by the Sobel operator for horizontal, vertical, and diagonal edges, and the Laplacian is invariant under rotations in increments of 45 degrees. In this work we experimented with the $L_1$ and $L_2$ norms, the normalized correlation coefficient, and ordinal measures [7] for vector similarity. The normalized correlation coefficient is linearly invariant to contrast, and ordinal measures are too expensive for our purposes. Therefore, we use the $L_1$ and $L_2$ norms, depending on the selected features.

The search vectors can be regarded as points in a multi-dimensional space, and finding a best match is then equivalent to finding the nearest neighbor. Three main strategies for nearest-neighbor search are pre-structuring, partial distances, and pruning [12]. We use partial distances, which constitute a monotonically non-decreasing function, and test intermediate distances twice. To further improve matching performance, the search for a similar vector is performed hierarchically from coarse to fine. The position of the best matching vector is successively refined based on the best position found at a coarser level, and the search is confined to a window that decreases exponentially in size while traversing the pyramid from coarse to fine. For a full pyramid, each search is logarithmic in the number of pixels in the finest level times the number of features. We choose the coarsest level in the search for each vector such that its size is at least $\varepsilon_{\min}$. Overall, the entire search is linear in the resolution of the input image times a logarithmic factor in the number of training images, their resolution, the number of orientations, and the number of features.
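The partial-distance strategy can be sketched as follows; this is a brute-force illustration under the $L_2$ norm, with our own names, and it omits the paper's hierarchical coarse-to-fine refinement.

    import numpy as np

    def nearest_partial_distance(query, candidates, checks=2):
        # Running sums of squared differences are monotonically
        # non-decreasing, so a candidate can be abandoned as soon as
        # its partial distance exceeds the best full distance so far.
        # `candidates` is an (N, D) array; `checks` intermediate tests
        # echo the "twice" mentioned in the text.
        best, best_d = -1, np.inf
        step = max(1, query.size // (checks + 1))
        for n, cand in enumerate(candidates):
            d, pruned = 0.0, False
            for s in range(0, query.size, step):
                diff = cand[s:s + step] - query[s:s + step]
                d += float(diff @ diff)
                if d >= best_d:
                    pruned = True
                    break
            if not pruned:
                best, best_d = n, d
        return best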

4.3. Fragment synthesis

In this section we describe the mathematical tool used to locally factor the style and content of tiles and synthesize novel fragments. We begin by briefly reviewing multilinear and bilinear forms, and the factoring equations for symmetric bilinear forms (the reader is referred to Tenenbaum and Freeman [29] for detailed derivations).


A multilinear form on $V^n$ is a mapping $f : V^n \to F$ which is linear in each argument, that is, for all $x_i, x_i' \in V$ and all $\lambda \in F$:

$$f(x_1, \ldots, x_i + x_i', \ldots, x_n) = f(x_1, \ldots, x_i, \ldots, x_n) + f(x_1, \ldots, x_i', \ldots, x_n)$$
$$f(x_1, \ldots, \lambda x_i, \ldots, x_n) = \lambda f(x_1, \ldots, x_i, \ldots, x_n).$$

The special case of a 1-form is a linear mapping $f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$. A 2-form is a bilinear form. Defined on $V \times W$, it is a mapping $f : V \times W \to F$ which is linear in each argument, that is, for all $x, x' \in V$, all $y, y' \in W$, and all $\lambda \in F$:

$$f(x + x', y) = f(x, y) + f(x', y)$$
$$f(x, y + y') = f(x, y) + f(x, y')$$
$$f(\lambda x, y) = \lambda f(x, y) = f(x, \lambda y).$$

If $f : V \times V \to F$ is a bilinear form, then the matrix $M$ of $f$ relative to the basis $(v_1, \ldots, v_n)$ is given by $m_{ij} = f(v_i, v_j)$. If $x = \sum_{i=1}^n x_i v_i$ and $y = \sum_{i=1}^n y_i v_i$, then the polynomial and matrix representations of the form are:

$$f(x, y) = \sum_{i,j=1}^{n} x_i y_j m_{ij} = x^t M y. \tag{2}$$

A bilinear form is symmetric if $f(x, y) = f(y, x)$ for all $x, y \in V$, and in matrix form $M = M^t$.
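For concreteness, a small two-dimensional example (ours):

$$f(x, y) = x^t \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix} y = x_1 y_1 + 2 x_1 y_2 + 2 x_2 y_1 + 3 x_2 y_2,$$

which is symmetric since $M = M^t$, so indeed $f(x, y) = f(y, x)$.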

As described above, a bilinear form maps two vectors $x$ and $y$ by a matrix $M$ to a scalar. Introducing subscripts for multiple vectors, matrices, and scalars:

$$a_{ijk} = x_i^t M_k y_j. \tag{3}$$

Now consider the inverse problem. Given only a set of scalars $a_{ijk}$, a bilinear model can be fitted by minimizing the total squared error:

$$\min_{x_i, M_k, y_j} \sum_i \sum_j \sum_k \left(a_{ijk} - x_i^t M_k y_j\right)^2. \tag{4}$$

A solution consists of a set of bilinear forms, defined by matrices $\{M_k\}$, together with sets of vectors $\{x_i\}$ and $\{y_j\}$.

We use an iterative SVD method as presented in [29] for Eq. (4). An exact solution exists when the number of vectors $\{x_i\}$ and $\{y_j\}$ equals their dimension. For example, for general matrices $M_k$, an exact solution exists when choosing each set $\{x_i\}$ and $\{y_j\}$ to be a basis. For the special case of the standard basis, the matrices are simply defined by the scalar values themselves, $M_k(i,j) = a_{ijk}$.
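As an illustration of the fitting step, the sketch below minimizes Eq. (4) by plain alternating least squares rather than the iterative SVD of [29]; all names are ours, and `A` holds the observed scalars $a_{ijk}$.

    import numpy as np

    def fit_bilinear(A, dx, dy, n_iter=50, seed=0):
        # Fit a_ijk ~ x_i^t M_k y_j from an (I, J, K) array of scalars
        # by alternating least squares over {x_i}, {y_j}, {M_k}.
        I, J, K = A.shape
        rng = np.random.default_rng(seed)
        X = rng.standard_normal((I, dx))
        Y = rng.standard_normal((J, dy))
        M = rng.standard_normal((K, dx, dy))
        for _ in range(n_iter):
            # Each x_i solves the stacked equations a_ijk = (M_k y_j)^t x_i.
            B = np.einsum('kxy,jy->jkx', M, Y).reshape(J * K, dx)
            for i in range(I):
                X[i] = np.linalg.lstsq(B, A[i].reshape(-1), rcond=None)[0]
            # Each y_j solves a_ijk = (M_k^t x_i)^t y_j.
            C = np.einsum('kxy,ix->iky', M, X).reshape(I * K, dy)
            for j in range(J):
                Y[j] = np.linalg.lstsq(C, A[:, j].reshape(-1), rcond=None)[0]
            # Each M_k solves a_ijk = vec(x_i y_j^t) . vec(M_k).
            D = np.einsum('ix,jy->ijxy', X, Y).reshape(I * J, dx * dy)
            for k in range(K):
                M[k] = np.linalg.lstsq(D, A[:, :, k].reshape(-1),
                                       rcond=None)[0].reshape(dx, dy)
        return X, M, Y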

Next, we would like to estimate the two vectors $u$ and $v$, derived from the established mappings $\{M_k\}$, for another vector $b$:

$$b_k = u^t M_k v. \tag{5}$$

The above is solved iteratively, where the initial guess for the vector $v$ is the mean of the vectors $y_j$, and the superscript $t$ denotes the vector transpose of a block matrix [29]:

$$u = \left((Mv)^t\right)^{-1} b, \qquad v = \left((M^t u)^t\right)^{-1} b. \tag{6}$$

Finally, the derivation above is applied to a style and content matrix of tiles. The tile positions, orientations, and scales are those of the boundaries and tile at scale $l$ in Eq. (1), for the best match in each content row $i = 1, \ldots, m-1$, and of the original search vector. The $k$th pixel value of a tile in content row $i$ and style column $j$ is denoted $a_{ijk}$. The vectors $x_i$ and $y_j$ determine the style and content of each tile. The vector $b$ denotes the input tile, and its estimated style and content parameters are $u$ and $v$. In case the model fails to reconstruct (validate) the input tile, Eqs. (5)-(6), we first extend the number of nearest neighbors considered in the search, and then refine the input subdivision.

Having estimated the style and content parameters $u, v$ of the remaining row and column, the tile matrix is completed by applying $x_i^t M_k v$ and $u^t M_k y_j$.
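A sketch of the estimation and completion steps, continuing the alternating least-squares illustration above; least squares stands in for the block-matrix inverses of Eq. (6).

    import numpy as np

    def estimate_and_complete(b, X, M, Y, n_iter=20):
        # Estimate u, v for the input tile b (Eq. 5) by alternating
        # between the two updates of Eq. (6), starting from the mean
        # of the training vectors, then complete the tile matrix.
        v = Y.mean(axis=0)
        for _ in range(n_iter):
            Bu = np.einsum('kxy,y->kx', M, v)   # rows are (M_k v)^t
            u = np.linalg.lstsq(Bu, b, rcond=None)[0]
            Bv = np.einsum('kxy,x->ky', M, u)   # rows are (M_k^t u)^t
            v = np.linalg.lstsq(Bv, b, rcond=None)[0]
        new_col = np.einsum('ix,kxy,y->ik', X, M, v)  # x_i^t M_k v
        new_row = np.einsum('x,kxy,jy->jk', u, M, Y)  # u^t M_k y_j
        return u, v, new_row, new_col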

As explained, the multi-dimensional search is based on a number of selected features. However, after the search finds a set of tiles, we consider only luminance, and color for a sufficiently varied training set. Synthesis is performed in the perceptually-based $(l, \alpha, \beta)$ color space [24]. This space minimizes the correlation between channels, so tile synthesis is performed separately for each channel. The results are then transformed back to $(r, g, b)$ values for display.
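A sketch of the color transform, using the RGB-to-LMS and LMS-to-$(l,\alpha,\beta)$ matrices from Reinhard et al. [24]; the small epsilon guarding the logarithm is our addition.

    import numpy as np

    RGB2LMS = np.array([[0.3811, 0.5783, 0.0402],
                        [0.1967, 0.7244, 0.0782],
                        [0.0241, 0.1288, 0.8444]])
    LMS2LAB = np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)]) @ \
              np.array([[1.0, 1.0, 1.0],
                        [1.0, 1.0, -2.0],
                        [1.0, -1.0, 0.0]])

    def rgb_to_lab(rgb, eps=1e-6):
        # (h, w, 3) RGB in [0, 1] -> decorrelated (l, alpha, beta).
        lms = np.log10(rgb @ RGB2LMS.T + eps)
        return lms @ LMS2LAB.T

    def lab_to_rgb(lab, eps=1e-6):
        # Inverse transform for display.
        lms = 10.0 ** (lab @ np.linalg.inv(LMS2LAB).T)
        return (lms - eps) @ np.linalg.inv(RGB2LMS).T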

4.4. Image reconstruction

Prior to synthesis, the search for a set of tiles enforces that they match previously synthesized regions. This is not sufficient to guarantee that the synthesized tiles match previously synthesized regions as well. The error after synthesis is computed for the overlapping boundaries, over all features, across the different styles. If it is large in proportion to the error before synthesis, then we repeat the synthesis process, extending the number of nearest neighbors. If a small number of nearest neighbors is exhausted, then the search and synthesis process is refined by recursively subdividing tiles. Once a set of matching tiles is synthesized, we find the minimum-cost path along the error surface of the overlapping boundaries by dynamic programming [13], breaking ties by taking the center values.
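A sketch of the seam computation over a boundary-error surface, in the spirit of image quilting [13]; the tiny centering bias is our way of realizing the tie-breaking by center values.

    import numpy as np

    def min_cost_seam(error):
        # Vertical minimum-cost path through an (h, w) error surface,
        # by dynamic programming with moves of at most one column.
        h, w = error.shape
        cost = error + 1e-9 * np.abs(np.arange(w) - w / 2.0)
        back = np.zeros((h, w), dtype=int)
        for r in range(1, h):
            for c in range(w):
                lo, hi = max(0, c - 1), min(w, c + 2)
                j = lo + int(np.argmin(cost[r - 1, lo:hi]))
                back[r, c] = j
                cost[r, c] += cost[r - 1, j]
        seam = np.empty(h, dtype=int)
        seam[-1] = int(np.argmin(cost[-1]))
        for r in range(h - 1, 0, -1):
            seam[r - 1] = back[r, seam[r]]
        return seam  # one column index per row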

Traversal of the entire input image guarantees a cover of the images in the $m$th row, but not of the images in the $n$th column of $A$. The synthesized tiles of the $n$th column are transformed back to their original orientation, and may appear at any image position according to the search. Thus, for each scale $l$, after synthesizing the images $A(m,j)_l$ for columns $j = 1, \ldots, n-1$, the images $A(i,n)_l$ for rows $i = 1, \ldots, m-1$ are completed by extending image analogies. The roles of the training images and the inputs are then reversed: the images of each row $i < m$ serve separately as input, and the $m$th row serves as the training images. Additional modifications, compared with image analogies [20], consist of (i) adaptively decomposing the input, (ii) stitching together fragments instead of synthesizing pixels, and (iii) searching for a simultaneous match of previously synthesized boundaries and candidate regions, at multiple scales and across all styles, as shown in Figure 2.

Since style synthesis is performed from coarse to fine, we specify the initial conditions for the search and synthesis, and the inductive step applied from each level to the next. First, if the coarsest level $l = 0$ in the pyramid consists of a single tile, then the search is empty, and the remaining tiles are synthesized. Next, for $l > 0$, the image boundaries for each style in the remaining $m$th row are initialized, and the coarser level $l-1$ serves as an initial guess for the remaining $n$th column. In particular, for different luminance across styles, after initializing $A(m,j)$ with the target, linearly matching the histograms to $A(1:m-1, j)$ improves the result; in this case, $A(i,n)$ is initialized with random samples of $A(m,n)$. For the same colors across styles, initializing $A(i,n)$ with the mean of $A(i, 1:n-1)$ yields comparable results.
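The linear histogram matching used for initialization can be sketched as a mean and standard-deviation transfer; this is our reading of "linearly matching the histograms", applied per channel.

    import numpy as np

    def match_linear(src, ref, eps=1e-8):
        # Shift and scale src so its mean and standard deviation
        # match those of ref.
        return (src - src.mean()) / (src.std() + eps) * ref.std() + ref.mean()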

Finally, we consider two special cases: (i) a single content training row: the search is performed under multiple constraints simultaneously, finding a single fragment for each image, so instead of extrapolation the nearest neighbor is selected. Notice that if there is no underlying model across the different content rows, then images in the same style and of different contents are regarded as a single image; and (ii) a single style training column: content rows are considered as one image, and the search finds a single fragment, selecting the nearest neighbor.

5. Results

We have experimented with our method on various sources and styles: (i) artistic, sketch, and texture filters from an image editing tool [1]; (ii) painterly renderings generated by applying layers of curved brush strokes [19]; and (iii) real paintings. The filters and painterly styles were applied to a variety of image classes from a database of stock photographs. The subjects of these photographs are varied and include landscapes, buildings, people, products, and more. For all of these families of styles and image content classes, our method produces visually plausible results. Computation times range from one hour (Figure 4) to ten hours (Figure 3) for synthesizing a set of 256 by 256 images on a 1.8 GHz PC running a Java implementation.

The transposed matrix in Figure 3 shows natural images in four different artistic styles, in each row from top to bottom: poster edges, fresco, paint daubs, and dry brush. The poster-edges filter reduces the number of colors in an image and accentuates edges with black lines. The fresco filter applies short and rounded dabs, creating a coarse style. The paint-daubs filter paints an image with simple round brush strokes. The dry-brush filter simplifies an image by painting its edges with round brush strokes and reducing the range of colors.

The transposed matrix in Figure 4 shows landscape images in three different sketch styles: charcoal, reticulation, and graphic pen. The charcoal filter draws major edges in bold, while mid-tones are sketched using diagonal strokes. The reticulation filter creates an image that appears clumped in the shadow areas and lightly grained in the highlights. The graphic-pen filter uses fine linear strokes to capture image details. Only the upright orientation was considered for synthesizing the images shown in the right column, to capture the directional strokes of the graphic pen. We compare our results with applying the filter directly to the original images, as shown in the bottom row of Figure 4.

The matrix in Figure 5 shows landscape images, each with a different appearance. We applied three different artistic and texture styles: craquelure, sponge, and water color. The craquelure filter paints an image onto a relief surface, generating cracks that follow contours. The sponge filter creates images with highly textured areas of contrasting color. The water-color filter simplifies image details by saturating color at edges.

The matrix in Figure 6 shows images in two different painterly styles [19], impressionist and expressionist, and a real painting. The impressionist style applies moderately curved brush strokes without random jitter. The expressionist style paints an image with elongated brush strokes with random color jitter. The input image is from a painting in a post-impressionist style.

6. Conclusions and future work

We have introduced a method for style synthesis that utilizes a training set of images. While the specific implementation here is image synthesis, the main motivation is to explore the separability of content and style, which can be viewed as a fundamental topic in image pattern analysis. Our method is based on an adaptive scheme that synthesizes local fragments of an image by extrapolating multiple fragments. Future work will focus on problems with more than two factors, using a local multilinear model to approximate highly complex factors of appearance, such as changing weather conditions.

Our approach is example-based, which means that its performance depends on the richness of the available fragments in the training set. In addition, it is an image-based 2D method and does not incorporate high-level information about the underlying scene. Although we have applied several techniques for accelerating the search, our algorithm is still very slow. To speed up the search we would like to apply geometric hashing, and then use larger training sets and consider tile deformations.

Figure 3. A set of natural images and artistic filters (poster edges, fresco, paint daubs, and dry brush) form an image matrix. The two images in dry brush on the left of the lower row, and the three images in poster edges, fresco, and paint daubs in the right column, were synthesized by our algorithm.

Finally, an interesting extension of this work is to 3D surface patches of 3D models. This will require partitioning a mesh into patches and incorporating mesh coordinates in the synthesis process.

Acknowledgments

This work was supported by a grant from the Israeli Ministry of Science and by a grant from the Israeli Academy of Sciences (center of excellence).

Figure 4. A set of landscape images and sketch filters (charcoal, reticulation, and graphic pen; bottom row: ground truth) form an image matrix. The two images in graphic pen on the left of the third row, and the two images in charcoal and reticulation in the right column, were synthesized by our algorithm.

Figure 5. A set of landscape images and artistic and texture filters (sponge, craquelure, and water color) form an image matrix. The images in sponge and craquelure in the lower row, and the two images in water color at the top of the right column, were synthesized by our algorithm.

Figure 6. A set of painterly rendered images (impressionist, expressionist) and an input from a Still Life by Paul Cezanne form an image matrix. The images in impressionist and expressionist style in the lower row, and the two images at the top of the right column, were synthesized by our algorithm.

References

[1] Adobe Systems Incorporated. http://www.adobe.com.
[2] M. Ashikhmin. Synthesizing natural textures. In ACM Symposium on Interactive 3D Graphics, pages 217-226, 2001.
[3] S. Baker and T. Kanade. Limits on super-resolution and how to break them. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(9):1167-1183, 2002.
[4] R. Basri and D. Jacobs. Lambertian reflectance and linear subspaces. In IEEE International Conference on Computer Vision, volume 2, pages 383-390, 2001.
[5] D. Beymer and T. Poggio. Image representation for visual learning. Science, 272:1905-1909, 1996.
[6] D. Beymer, A. Shashua, and T. Poggio. Example based image analysis and synthesis. MIT AI Lab Technical Memo 1431, 1993.
[7] D. N. Bhat and S. K. Nayar. Ordinal measures for image correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20:415-423, 1998.
[8] E. Borenstein and S. Ullman. Class-specific, top-down segmentation. In European Conference on Computer Vision, pages 109-124, 2002.
[9] M. Brand and A. Hertzmann. Style machines. In ACM Siggraph, pages 183-192, 2000.
[10] C. Bregler, M. Covell, and M. Slaney. Video rewrite: driving visual speech with audio. In ACM Siggraph, pages 353-360, 1997.
[11] J. Davis. Mosaics of scenes with moving objects. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 354-360, 1998.
[12] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley-Interscience, 2nd ed., 2001.
[13] A. A. Efros and W. T. Freeman. Image quilting for texture synthesis and transfer. In ACM Siggraph, pages 341-346, 2001.
[14] D. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America, 4(12):2379-2394, 1987.
[15] W. T. Freeman, T. R. Jones, and E. Pasztor. Example-based super-resolution. IEEE Computer Graphics and Applications, pages 56-65, 2002.
[16] W. T. Freeman and E. Pasztor. Learning low-level vision. In IEEE International Conference on Computer Vision, pages 182-189, 1998.
[17] W. T. Freeman and J. B. Tenenbaum. Learning bilinear models for two-factor problems in vision. In IEEE Conference on Computer Vision and Pattern Recognition, pages 554-560, 1997.
[18] P. Haeberli. Paint by numbers: abstract image representations. In ACM Siggraph, pages 207-214, 1990.
[19] A. Hertzmann. Painterly rendering with curved brush strokes of multiple sizes. In ACM Siggraph, pages 453-460, 1998.
[20] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin. Image analogies. In ACM Siggraph, pages 327-340, 2001.
[21] A. Hertzmann, N. Oliver, B. Curless, and S. M. Seitz. Curve analogies. In Proceedings of the 13th Eurographics Workshop on Rendering, pages 233-245, 2002.
[22] L. Liang, C. Liu, Y.-Q. Xu, B. Guo, and H.-Y. Shum. Real-time texture synthesis by patch-based sampling. ACM Transactions on Graphics, 20(3):127-150, 2001.
[23] R. Ramamoorthi and P. Hanrahan. A signal-processing framework for inverse rendering. In ACM Siggraph, pages 117-128, 2001.
[24] E. Reinhard, M. Ashikhmin, B. Gooch, and P. Shirley. Color transfer between images. IEEE Computer Graphics and Applications, pages 34-40, 2001.
[25] H. Samet. Applications of Spatial Data Structures. Addison-Wesley, 1989.
[26] S. H. Seung and D. D. Lee. The manifold ways of perception. Science, 290:2268-2269, 2000.
[27] P.-P. Sloan, C. Rose, and M. Cohen. Shape by example. In ACM Symposium on Interactive 3D Graphics, pages 135-143, 2001.
[28] C. Soler, M.-P. Cani, and A. Angelidis. Hierarchical pattern mapping. In ACM Siggraph, pages 673-680, 2002.
[29] J. B. Tenenbaum and W. T. Freeman. Separating style and content with bilinear models. Neural Computation, 12(6):1247-1283, 2000.
[30] T. Welsh, M. Ashikhmin, and K. Mueller. Transferring color to greyscale images. In ACM Siggraph, pages 277-280, 2002.