Hiroshi Ishikawa - NTHU

Post on 16-Jan-2022


TRANSCRIPT

Hiroshi Ishikawa
Department of Computer Science & Engineering, Waseda University

Joint work with: Satoshi Iizuka, Edgar Simo-Serra, Kazuma Sasaki

3

Colorization of Black‐and‐white Pictures

4

Fully‐automatic colorization 

5

Colorization of Old Films

6

Related Work
Scribble-based [Levin+ 2004; Yatziv+ 2004; An+ 2009; Xu+ 2013; Endo+ 2016]: colors are specified with scribbles; requires manual input.

Reference image-based [Chia+ 2011; Gupta+ 2012]: transfers colors from a reference image; requires a very similar image.

(Figure: scribble-based result [Levin+ 2004]; input, reference, and output of [Gupta+ 2012])

7

Related Work
Automatic colorization with hand-crafted features [Cheng+ 2015]:
- Uses multiple existing image features
- Computes chrominance via a shallow neural network
- Depends on the performance of semantic segmentation
- Only handles simple outdoor scenes

(Figure: input → image features → shallow neural network → chroma output)

8

In our work:
- An end-to-end network jointly learns global and local features for automatic image colorization
- A fusion layer merges the global and local features
- Classification labels are exploited for learning

Iizuka, Simo-Serra, Ishikawa: “Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification”, SIGGRAPH 2016.

9

Network Model
- Two branches: local features and global features
- Composition of four networks
(Figure: the input luminance passes through the low-level features network, which feeds both the mid-level features network and the global features network with shared weights; the fusion layer merges them, and the colorization network with upsampling produces the output chrominance)

Low-Level Features Network
- Convolutional layers (3x3 kernels)
- No pooling; resolution is lowered with strides > 1
- Weights are shared with the global branch
(Figure: resolutions 112, 56, and 28 with 128, 256, and 512 feature maps)

Scaling by Strides
- Flat-convolution: stride 1 keeps the resolution
- Down-convolution: stride > 1 lowers the resolution
- Up-convolution: fractional stride raises the resolution (not used in colorization; used in the sketch simplification)
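The resolution arithmetic behind these three convolution types can be sketched with the standard output-size formulas. A minimal illustration; the kernel sizes and paddings here are typical choices, not necessarily the ones used in the papers:

```python
def conv_out_size(n, kernel=3, stride=1, pad=1):
    """Spatial output size of a convolution (floor formula)."""
    return (n + 2 * pad - kernel) // stride + 1

def upconv_out_size(n, kernel=4, stride=2, pad=1):
    """Spatial output size of a transposed ("up") convolution."""
    return (n - 1) * stride - 2 * pad + kernel

# Flat-convolution: stride 1 keeps the resolution.
assert conv_out_size(224, stride=1) == 224
# Down-convolution: stride 2 halves it.
assert conv_out_size(224, stride=2) == 112
# Up-convolution: doubles it back.
assert upconv_out_size(112) == 224
```

Chaining two down-convolutions, as in the low-level branch, takes 224 to 56; the same formulas predict every resolution in the diagrams.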

12

Network Model
(Architecture overview figure; the low-level features network's weights are shared between the two branches)

13

Global Features Network

Computes a global 256-dimensional vector representation of the image.
(Figure: convolutions reduce the map from 28x28 to 14x14 to 7x7 at 512 channels; fully connected layers of 1024, 512, and 256 units produce the 256-dim vector)
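As a shape-level sketch of the global branch (weights are random and purely illustrative), the final 7x7x512 feature map is flattened and passed through fully connected layers down to 256 dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
feat = rng.standard_normal((7, 7, 512))   # final conv feature map

def fc(x, out_dim):
    """Fully connected layer with ReLU (random weights for illustration)."""
    w = rng.standard_normal((x.size, out_dim)) * 0.01
    return np.maximum(x.reshape(-1) @ w, 0.0)

h = fc(feat, 1024)
h = fc(h, 512)
g = fc(h, 256)      # global 256-dim image representation
assert g.shape == (256,)
```

Whatever the input content, this branch always emits a fixed-length 256-dim summary, which is what makes it fusible with spatial features later.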

14

Mid‐Level Features Network

Extracts mid-level features such as texture.
(Figure: convolutional layers reduce the shared low-level features from 512 to 256 channels at 8x8 resolution)

15

Fusion Layer
(Architecture overview figure, highlighting the fusion layer)

16

Fusion Layer

Combines the global and mid-level features.
(Figure: the 256-dim global vector is combined with the 8x8x256 mid-level features into an 8x8x512 fused representation that feeds the colorization network)

17

Colorization Network

(Architecture overview figure, highlighting the colorization network)

Computes chrominance from the fused features and restores the image to the input resolution.
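A toy sketch of this final step, with nearest-neighbour upsampling standing in for the network's upsampling layers and illustrative sizes: the predicted chrominance is brought to the input resolution and stacked with the input luminance to form the Lab output.

```python
import numpy as np

rng = np.random.default_rng(2)
L = rng.random((224, 224))          # input luminance (the grayscale image)
ab = rng.random((112, 112, 2))      # predicted chrominance at half resolution

# Upsample the chrominance to the input resolution (nearest neighbour here).
ab_full = ab.repeat(2, axis=0).repeat(2, axis=1)

# Final Lab image: input luminance plus predicted chrominance.
lab = np.dstack([L, ab_full[..., 0], ab_full[..., 1]])
assert lab.shape == (224, 224, 3)
```

Keeping the original luminance channel untouched is what lets the method colorize without altering the structure of the photograph.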

18

Training
- Mean squared error (MSE) as the loss function for chrominance
- Cross-entropy loss for the classification network
- Optimization with ADADELTA [Zeiler 2012], which adaptively sets the learning rate

(Figure: forward pass from the input through the model to the output; the MSE against the ground truth is backpropagated)
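A minimal numpy sketch of the two loss terms combined into one training objective (the weighting factor `alpha` and the tensor sizes are illustrative, not necessarily the paper's values):

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over chrominance values."""
    return np.mean((pred - target) ** 2)

def cross_entropy(logits, label):
    """Softmax cross-entropy for the classification branch."""
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -logp[label]

rng = np.random.default_rng(3)
chroma_pred = rng.random((112, 112, 2))
chroma_gt = rng.random((112, 112, 2))
logits = rng.standard_normal(205)   # one score per Places scene label

alpha = 1.0 / 300                   # illustrative weighting of the class term
loss = mse(chroma_pred, chroma_gt) + alpha * cross_entropy(logits, 0)
assert loss > 0
```

Both terms are differentiable, so a single backward pass updates the colorization and classification branches together.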

19

Joint Training with Classification

(Architecture overview figure, with the classification network attached to the global features branch)

- The network is trained for classification jointly with colorization
- The classification network is connected to the global features

Example predicted labels: 20.60% formal garden, 16.13% arch, 13.50% abbey, 7.07% botanical garden, 6.53% golf course
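Turning the classification branch's raw scores into a ranked list of labels like the one above is a standard softmax top-k; a small sketch (the label names and scores here are made up for illustration):

```python
import numpy as np

def top_k_labels(logits, names, k=5):
    """Softmax the class scores and return the k most likely scene labels."""
    z = np.exp(logits - logits.max())
    probs = z / z.sum()
    order = np.argsort(probs)[::-1][:k]
    return [(names[i], float(probs[i])) for i in order]

names = ["formal garden", "arch", "abbey", "botanical garden", "golf course"]
logits = np.array([2.0, 1.75, 1.57, 0.93, 0.85])
for name, p in top_k_labels(logits, names):
    print(f"{100 * p:5.2f}% {name}")
```

The subtraction of the maximum logit before exponentiating keeps the softmax numerically stable for large scores.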

20

Dataset
MIT Places Scene Dataset [Zhou+ 2014]: 2.3M training images with 205 scene labels

Example categories: abbey, airport terminal, aquarium, baseball field, dining room, forest road, gas station, gift shop

22

Computational Time
Colorizes an image within a few seconds

80 ms

23

Colorization of MIT Places Dataset

24

Comparisons

(Figure: input, [Cheng+ 2015], ours with global features, and ours without global features)

25

Effectiveness of Global Features

(Figure: input, with global features, without global features)

26

Colorization of Historical Photographs

Mount Moran, 1941; Scott's Run, 1937; Youngsters, 1912; Burns Basement, 1910

27

Colorization of Historical Photographs

California National Park, 1936; Homes, 1936; Spinners, 1910; Doffer Boys, 1909

28

Colorization of Historical Photographs

Norris Dam, 1933; North Dome, 1936; Miner, 1937; Community Center, 1936

29

Limitations
- Saturation tends to be low
- Does not recover the true colors
(Figure: input, ground truth, output)

31

Rough Sketch Simplification

32

Rough Sketch Simplification
(Figure: input and output examples)

Simo-Serra, Iizuka, Sasaki, Ishikawa: “Learning to Simplify: Fully Convolutional Networks for Rough Sketch Cleanup”, SIGGRAPH 2016.

Rough Sketch Simplification
(Figure: pairs of rough sketches and their targets)

Related Work: Sketch Simplification
- Progressive online modification: Igarashi et al. 1997, Bae et al. 2008, Grimm and Joshi 2012, Fišer et al. 2015
- Stroke reduction: Deussen and Strothotte 2000, Wilson and Ma 2004, Grabli et al. 2004, Cole et al. 2006
- Stroke grouping: Barla et al. 2005, Orbay and Kara 2011, Liu et al. 2015
All of these require vector input.
Vectorization: model fitting (Bezier, ...) and gradient-based approaches require fairly clean input sketches.
(Figure: results of Liu et al. 2015 and Noris et al. 2015)

Network Model
- 23 convolutional layers
- Output has the same resolution as the input
(Figure: encoder-decoder of down-, flat-, and up-convolutions, lowering the resolution by factors of 2, 4, and 8 before restoring it)

Learning
- Trained from scratch (about 3 weeks)
- Uses 424x424 px patches
- Weighted mean squared error loss
- Batch normalization [Ioffe and Szegedy 2015] is critical
- Optimized with ADADELTA [Zeiler 2012]

(Figure: input, output, target)
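A weighted MSE simply lets a per-pixel weight map decide how much each pixel contributes to the loss. A minimal sketch; the weight map here, which emphasizes line pixels because most of a sketch is blank, is a hypothetical choice rather than the paper's exact scheme:

```python
import numpy as np

def weighted_mse(pred, target, w):
    """MSE where each pixel contributes in proportion to its weight."""
    return np.sum(w * (pred - target) ** 2) / np.sum(w)

rng = np.random.default_rng(4)
pred = rng.random((64, 64))
target = (rng.random((64, 64)) > 0.9).astype(float)  # sparse line pixels

# Illustrative weight map: up-weight line pixels relative to background.
w = np.where(target > 0.5, 10.0, 1.0)
loss = weighted_mse(pred, target, w)
assert loss >= 0.0
```

With a uniform weight map this reduces to the ordinary MSE, so the weighting is a strict generalization.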

Sketch Dataset
- 68 pairs of rough and target sketches
- 5 illustrators
(Figure: example sketch pairs and extracted patches)

Inverse Dataset Creation
- Data quality is critical
- Creating target sketches from rough sketches yields misalignments
- Creating rough sketches from target sketches aligns properly
(Figure: standard vs. inverse creation)

Data Augmentation
- 68 pairs is insufficient
- Scaling, random cropping, flipping, and rotation
- Additional augmentation: tone, slur, and noise
(Figure: input with tone, slur, and noise variants)
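The geometric part of this augmentation can be sketched as follows (toy 64x64 images and 32x32 crops for brevity; the actual training uses 424x424 patches, and the tone, slur, and noise augmentations are not shown):

```python
import numpy as np

def augment(img, rng):
    """Random flip, 90-degree rotation, and crop (geometric augmentations)."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                      # horizontal flip
    img = np.rot90(img, k=rng.integers(4))      # random 90-degree rotation
    h, w = img.shape[:2]
    y = rng.integers(h - 32 + 1)
    x = rng.integers(w - 32 + 1)
    return img[y:y + 32, x:x + 32]              # random 32x32 crop

rng = np.random.default_rng(5)
patch = augment(np.zeros((64, 64)), rng)
assert patch.shape == (32, 32)
```

Each epoch therefore sees a different set of transformed patches, which stretches the 68 sketch pairs much further.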

41

Sketch Simplification: Experimental Environment
- Intel Core i7-5960X CPU @ 3.00 GHz
- NVIDIA GeForce TITAN X GPU
- Torch framework
- About 3 weeks for training
(Table: simplification times)

42

Comparison

(Figure: input, ours, Adobe Live Trace, Potrace)

43

Comparison

(Figure: input, ours, Adobe Live Trace, Potrace)

44

Comparison
(Figure: input, [Liu et al. 2015] with vector input, and ours with raster input, on (a) Fairy, (b) Mouse, (c) Duck, (d) Car)

45

Results

46

Results

47

Results

48

Conclusion
Automatic image colorization:
- Fuses global and local information
- Joint training of colorization and classification
Automatic rough sketch simplification:
- Convolutional networks are well suited to image processing
- Proper data is crucial for training
