Hiroshi Ishikawa - NTHU
TRANSCRIPT
Hiroshi Ishikawa
Department of Computer Science & Engineering, Waseda University
Joint work with: Satoshi Iizuka, Edgar Simo-Serra, Kazuma Sasaki
Colorization of Black‐and‐white Pictures
Fully‐automatic colorization
Colorization of Old Films
Related Work
Scribble-based [Levin+ 2004; Yatziv+ 2004; An+ 2009; Xu+ 2013; Endo+ 2016]
- Specify colors with scribbles
- Requires manual input
Reference image-based [Chia+ 2011; Gupta+ 2012]
- Transfers colors from reference images
- Requires very similar images
(Figure: [Levin+ 2004] scribble example; [Gupta+ 2012] Input / Reference / Output)
Related Work
Automatic colorization with hand-crafted features [Cheng+ 2015]
- Uses multiple existing image features
- Computes chrominance via a shallow neural network
- Depends on the performance of semantic segmentation
- Only handles simple outdoor scenes
(Figure: Input → image features → neural network → chroma → output)
In Our Work
- End-to-end network jointly learns global and local features for automatic image colorization
- Fusion layer merges the global and local features
- Classification labels are exploited for learning

Iizuka, Simo-Serra, Ishikawa: "Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification", SIGGRAPH 2016.
Network Model
- Two branches: local features and global features
- Composition of four networks
(Figure: input luminance → Low-Level Features Network (weights shared between the two branches) → Mid-Level Features Network and Global Features Network → Fusion Layer → Colorization Network → upsampling → output chrominance)
Low-Level Features Network
- Convolutional layers (3×3 kernels)
- No pooling; resolution is lowered with strides > 1
(Figure: stack of convolutional layers with shared weights; spatial resolution decreases 112 → 56 → 28 via strided convolutions while the number of feature maps grows through 128 and 256 to 512)
Scaling by Strides
- Down-convolution: stride > 1 reduces the resolution
- Flat-convolution: stride 1 keeps the resolution
- Up-convolution: fractional stride increases the resolution (not used in colorization; used in the sketch simplification)
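The effect of the stride on resolution follows the standard convolution output-size formula. A minimal sketch (layer count and stride pattern are illustrative, not taken from the slides) showing how strided 3×3 convolutions take a 224 px input down to the 1/8-resolution feature maps used later:

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

# Alternating down-convolutions (stride 2) and flat-convolutions (stride 1):
# each stride-2 layer halves the resolution, no pooling layers needed.
size = 224
for stride in (2, 1, 2, 1, 2, 1):  # hypothetical six-layer stack
    size = conv_out(size, kernel=3, stride=stride, pad=1)
print(size)  # 28
```

With 3×3 kernels and padding 1, a stride-2 layer maps 224 → 112 → 56 → 28, matching the resolutions shown in the diagram.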
Global Features Network
- Computes a global 256-dimensional vector representation of the image
(Figure: 28×28×512 features → strided convolutions down to 14×14 and 7×7 with 512 maps → fully connected layers (1024 → 256) → 256-dim vector)
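The global branch ends by flattening the 7×7×512 feature maps and passing them through fully connected layers. A minimal numpy sketch with random (untrained) weights, using the layer sizes from the diagram:

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, out_dim):
    """Fully connected layer with random placeholder weights and ReLU."""
    w = rng.standard_normal((x.size, out_dim)) * 0.01
    return np.maximum(w.T @ x.ravel(), 0.0)

feats = rng.standard_normal((7, 7, 512))  # output of the conv stack
h = fc(feats, 1024)                        # first fully connected layer
g = fc(h, 256)                             # global 256-dim feature vector
print(g.shape)  # (256,)
```

The 7×7×512 = 25,088 activations collapse into a single vector, so this branch summarizes the whole image rather than any particular location.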
Mid-Level Features Network
- Extracts mid-level features such as texture
(Figure: convolutional layers reduce the 512 feature maps to 256 at 1/8 resolution)
Fusion Layer
- Combines the global and mid-level features
(Figure: the 256-dim global vector is fused with the 256-map mid-level features from the Low-Level Features Network at every spatial position, giving 512 feature maps at 1/8 resolution, which feed the Colorization Network)
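The key idea of the fusion layer is that the global vector, having no spatial extent, can be replicated at every spatial position of the mid-level feature map and concatenated along the channel axis. A minimal numpy sketch of that tiling step (the learned layer that then mixes the 512 fused channels is omitted):

```python
import numpy as np

def fuse(mid, global_vec):
    """Tile the global vector over every spatial position of the
    mid-level feature map and concatenate along the channel axis."""
    h, w, _ = mid.shape
    tiled = np.broadcast_to(global_vec, (h, w, global_vec.size))
    return np.concatenate([mid, tiled], axis=-1)

mid = np.zeros((28, 28, 256))  # mid-level features at 1/8 resolution
g = np.ones(256)               # global 256-dim vector
fused = fuse(mid, g)
print(fused.shape)  # (28, 28, 512)
```

Because the same global vector appears at every position, every local decision made by the colorization network can be conditioned on the scene as a whole.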
Colorization Network
- Computes chrominance from the fused features
- Restores the image to the input resolution via upsampling
Training
- Mean Squared Error (MSE) as the loss function for chrominance
- Cross-entropy loss for the classification network
- Optimization using ADADELTA [Zeiler 2012], which adaptively sets the learning rate
(Figure: Input → model (forward) → Output compared against Ground truth via MSE → backward pass)
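The two losses above can be sketched in numpy. The combined objective shown here, MSE on chrominance plus a weighted classification term, is an assumption about how the joint training is balanced; the weight `alpha` is a hypothetical value, not quoted from the talk:

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over the predicted chrominance channels."""
    return np.mean((pred - target) ** 2)

def cross_entropy(logits, label):
    """Softmax cross-entropy for the classification branch."""
    z = logits - logits.max()               # stabilized softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

alpha = 1.0 / 300                           # assumed balancing weight
pred = np.zeros((32, 32, 2))                # predicted a/b chrominance
target = np.full((32, 32, 2), 0.1)          # ground-truth chrominance
logits = np.zeros(205)                      # 205 scene classes (MIT Places)
loss = mse(pred, target) + alpha * cross_entropy(logits, 3)
```

With uniform (zero) logits the classification term equals log(205), so the weight keeps it from swamping the small chrominance errors.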
Joint Training with Classification
- Training for classification jointly with the colorization
- Classification network connected to the global features
(Figure: predicted labels, e.g. 20.60% Formal Garden, 16.13% Arch, 13.50% Abbey, 7.07% Botanical Garden, 6.53% Golf Course)
Dataset
MIT Places Scene Dataset [Zhou+ 2014]
- 2.3M training images with 205 scene labels
- Example categories: Abbey, Airport terminal, Aquarium, Baseball field, Dining room, Forest road, Gas station, Gift shop
Computational Time
- Colorizes an image within a few seconds (80 ms in the fastest case)
Colorization of MIT Places Dataset
Comparisons
(Figure: Input / [Cheng+ 2015] / Ours (w/ global features) / Ours (w/o global features))
Effectiveness of Global Features
(Figure: Input / w/o global features / w/ global features)
Colorization of Historical Photographs
Mount Moran, 1941 Scott's Run, 1937 Youngsters, 1912 Burns Basement, 1910
Colorization of Historical Photographs
California National Park, 1936 Homes, 1936 Spinners, 1910 Doffer Boys, 1909
Colorization of Historical Photographs
Norris Dam, 1933 North Dome, 1936 Miner, 1937 Community Center, 1936
Limitations
- Saturation tends to be low
- Does not recover true colors
(Figure: Input / Ground truth / Output)
Rough Sketch Simplification
Rough Sketch Simplification
(Figure: input rough sketch → output simplified sketch)

Simo-Serra, Iizuka, Sasaki, Ishikawa: "Learning to Simplify: Fully Convolutional Networks for Rough Sketch Cleanup", SIGGRAPH 2016.

(Figure: example pairs of rough and target sketches)
Related Work: Sketch Simplification
- Progressive online modification [Igarashi et al. 1997; Bae et al. 2008; Grimm and Joshi 2012; Fišer et al. 2015]
- Stroke reduction [Deussen and Strothotte 2000; Wilson and Ma 2004; Grabli et al. 2004; Cole et al. 2006]
- Stroke grouping [Barla et al. 2005; Orbay and Kara 2011; Liu et al. 2015]
These all require vector input.
Vectorization
- Model fitting (Bézier, …) and gradient-based approaches
- Require fairly clean input sketches
(Figure: results of [Liu et al. 2015] and [Noris et al. 2015])
Network Model
- 23 convolutional layers
- Output has the same resolution as the input
(Figure: encoder-decoder of down-, flat-, and up-convolutions; resolution is reduced to 1/2, 1/4, and 1/8, then restored)
Learning
- Trained from scratch (about 3 weeks) using 424×424 px patches
- Weighted Mean Squared Error loss
- Batch Normalization [Ioffe and Szegedy 2015] is critical
- Optimized with ADADELTA [Zeiler 2012]
(Figure: Input / Output / Target)
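A weighted MSE scales each pixel's squared error by a per-pixel weight. One plausible form, sketched in numpy, up-weights the rare dark line pixels relative to the white background; the specific weight map here is an assumption for illustration, not the talk's actual weighting:

```python
import numpy as np

def weighted_mse(pred, target, weights):
    """Per-pixel weighted mean squared error."""
    return np.sum(weights * (pred - target) ** 2) / np.sum(weights)

# Hypothetical weighting: line (dark) pixels of the target count 4x,
# since they are vastly outnumbered by white background pixels.
target = np.ones((64, 64))
target[30:34, :] = 0.0                      # a thin horizontal stroke
pred = np.ones((64, 64))                    # prediction: all white
weights = np.where(target < 0.5, 4.0, 1.0)  # assumed weight map
loss = weighted_mse(pred, target, weights)
```

A plain MSE would barely notice a missed thin stroke; the weighting makes erasing real lines much more expensive than leaving faint background noise.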
Sketch Dataset
- 68 pairs of rough and target sketches
- 5 illustrators
(Figure: sketch dataset → extracted patches)

Inverse Dataset Creation
- Data quality is critical
- Creating target sketches from rough sketches causes misalignments
- Creating rough sketches from target sketches keeps the pairs properly aligned
(Figure: standard vs. inverse creation)
Data Augmentation
- 68 pairs is insufficient
- Scaling, random cropping, flipping, and rotation
- Additional augmentation: tone, slur, and noise
(Figure: Input / Tone / Slur / Noise)
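A few of these augmentations can be sketched directly in numpy. This is an illustrative pipeline, not the authors' implementation: flips, 90-degree rotations, a global tone change, and additive noise (the slur distortion and arbitrary-angle rotation mentioned above are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(patch):
    """Illustrative augmentations for a grayscale patch in [0, 1]."""
    if rng.random() < 0.5:
        patch = patch[:, ::-1]             # random horizontal flip
    patch = np.rot90(patch, rng.integers(4))  # random 90-degree rotation
    patch = np.clip(patch * rng.uniform(0.8, 1.2), 0.0, 1.0)   # tone
    patch = np.clip(patch + rng.normal(0.0, 0.02, patch.shape),
                    0.0, 1.0)              # additive noise
    return patch

out = augment(np.ones((424, 424)))         # a 424x424 training patch
print(out.shape)  # (424, 424)
```

Each call yields a different variant of the same patch, which is how 68 sketch pairs stretch into enough data to train the network from scratch.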
Sketch Simplification: Experimental Environment
- Intel Core i7-5960X CPU @ 3.00 GHz
- NVIDIA GeForce TITAN X GPU
- Torch framework
- About 3 weeks for training
- Simplification time
Comparison
(Figure: Input / Adobe Live Trace / Potrace / Ours)
Comparison
(Figure: Input / Adobe Live Trace / Potrace / Ours)
Comparison
(Figure: Input / [Liu et al. 2015] (vector input) / Ours (raster input); (a) Fairy, (b) Mouse, (c) Duck, (d) Car)
Results
Conclusion
Automatic image colorization
- Fuses global and local information
- Joint training of colorization and classification
Automatic rough sketch simplification
- Convolutional networks are well suited to image processing
- Proper data is crucial for training