Hiroshi Ishikawa - NTHU
TRANSCRIPT
Hiroshi Ishikawa
Department of Computer Science & Engineering, Waseda University
Joint work with: Satoshi Iizuka, Edgar Simo-Serra, Kazuma Sasaki
Colorization of Black‐and‐white Pictures
Fully‐automatic colorization
Colorization of Old Films
Related Work
Scribble-based [Levin+ 2004; Yatziv+ 2004; An+ 2009; Xu+ 2013; Endo+ 2016]
- Specify colors with scribbles
- Requires manual input
Reference image-based [Chia+ 2011; Gupta+ 2012]
- Transfers colors from reference images
- Requires very similar images
(Figure: [Levin+ 2004] scribble example; [Gupta+ 2012] Input / Reference / Output)
Related Work
Automatic colorization with hand-crafted features [Cheng+ 2015]
- Uses multiple existing image features
- Computes chrominance via a shallow neural network
- Depends on the performance of semantic segmentation
- Only handles simple outdoor scenes
(Figure: Input → image features → neural network → chroma → output)
In Our Work
- End-to-end network jointly learns global and local features for automatic image colorization
- Fusion layer merges the global and local features
- Classification labels are exploited for learning

Iizuka, Simo-Serra, Ishikawa: "Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification", SIGGRAPH 2016.
Network Model
- Two branches: local features and global features
- Composition of four networks
(Figure: input luminance → Low-Level Features Network (weights shared between the two branches) → Mid-Level Features Network and Global Features Network → Fusion Layer → Colorization Network → upsampling → output chrominance)
Low-Level Features Network
- Convolutional layers (3×3 kernels)
- No pooling; resolution is lowered with strides > 1
(Figure: stack of convolutional layers with shared weights; spatial resolution decreases 112 → 56 → 28 via strided convolutions while the number of feature maps grows through 128 and 256 to 512)
Scaling by Strides
- Down-convolution: stride > 1 reduces the resolution
- Flat-convolution: stride 1 keeps the resolution
- Up-convolution: fractional stride increases the resolution (not used in colorization; used in the sketch simplification)
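The effect of the stride on resolution follows the standard convolution output-size formula. A minimal sketch (layer count and stride pattern are illustrative, not taken from the slides) showing how strided 3×3 convolutions take a 224 px input down to the 1/8-resolution feature maps used later:

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

# Alternating down-convolutions (stride 2) and flat-convolutions (stride 1):
# each stride-2 layer halves the resolution, no pooling layers needed.
size = 224
for stride in (2, 1, 2, 1, 2, 1):  # hypothetical six-layer stack
    size = conv_out(size, kernel=3, stride=stride, pad=1)
print(size)  # 28
```

With 3×3 kernels and padding 1, a stride-2 layer maps 224 → 112 → 56 → 28, matching the resolutions shown in the diagram.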
Global Features Network
- Computes a global 256-dimensional vector representation of the image
(Figure: 28×28×512 features → strided convolutions down to 14×14 and 7×7 with 512 maps → fully connected layers (1024 → 256) → 256-dim vector)
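The global branch ends by flattening the 7×7×512 feature maps and passing them through fully connected layers. A minimal numpy sketch with random (untrained) weights, using the layer sizes from the diagram:

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, out_dim):
    """Fully connected layer with random placeholder weights and ReLU."""
    w = rng.standard_normal((x.size, out_dim)) * 0.01
    return np.maximum(w.T @ x.ravel(), 0.0)

feats = rng.standard_normal((7, 7, 512))  # output of the conv stack
h = fc(feats, 1024)                        # first fully connected layer
g = fc(h, 256)                             # global 256-dim feature vector
print(g.shape)  # (256,)
```

The 7×7×512 = 25,088 activations collapse into a single vector, so this branch summarizes the whole image rather than any particular location.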
Mid-Level Features Network
- Extracts mid-level features such as texture
(Figure: convolutional layers reduce the 512 feature maps to 256 at 1/8 resolution)
Fusion Layer
- Combines the global and mid-level features
(Figure: the 256-dim global vector is fused with the 256-map mid-level features from the Low-Level Features Network at every spatial position, giving 512 feature maps at 1/8 resolution, which feed the Colorization Network)
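The key idea of the fusion layer is that the global vector, having no spatial extent, can be replicated at every spatial position of the mid-level feature map and concatenated along the channel axis. A minimal numpy sketch of that tiling step (the learned layer that then mixes the 512 fused channels is omitted):

```python
import numpy as np

def fuse(mid, global_vec):
    """Tile the global vector over every spatial position of the
    mid-level feature map and concatenate along the channel axis."""
    h, w, _ = mid.shape
    tiled = np.broadcast_to(global_vec, (h, w, global_vec.size))
    return np.concatenate([mid, tiled], axis=-1)

mid = np.zeros((28, 28, 256))  # mid-level features at 1/8 resolution
g = np.ones(256)               # global 256-dim vector
fused = fuse(mid, g)
print(fused.shape)  # (28, 28, 512)
```

Because the same global vector appears at every position, every local decision made by the colorization network can be conditioned on the scene as a whole.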
Colorization Network
- Computes chrominance from the fused features
- Restores the image to the input resolution via upsampling
Training
- Mean Squared Error (MSE) as the loss function for chrominance
- Cross-entropy loss for the classification network
- Optimization using ADADELTA [Zeiler 2012], which adaptively sets the learning rate
(Figure: Input → model (forward) → Output compared against Ground truth via MSE → backward pass)
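The two losses above can be sketched in numpy. The combined objective shown here, MSE on chrominance plus a weighted classification term, is an assumption about how the joint training is balanced; the weight `alpha` is a hypothetical value, not quoted from the talk:

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over the predicted chrominance channels."""
    return np.mean((pred - target) ** 2)

def cross_entropy(logits, label):
    """Softmax cross-entropy for the classification branch."""
    z = logits - logits.max()               # stabilized softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

alpha = 1.0 / 300                           # assumed balancing weight
pred = np.zeros((32, 32, 2))                # predicted a/b chrominance
target = np.full((32, 32, 2), 0.1)          # ground-truth chrominance
logits = np.zeros(205)                      # 205 scene classes (MIT Places)
loss = mse(pred, target) + alpha * cross_entropy(logits, 3)
```

With uniform (zero) logits the classification term equals log(205), so the weight keeps it from swamping the small chrominance errors.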
Joint Training with Classification
- Training for classification jointly with the colorization
- Classification network connected to the global features
(Figure: predicted labels, e.g. 20.60% Formal Garden, 16.13% Arch, 13.50% Abbey, 7.07% Botanical Garden, 6.53% Golf Course)
Dataset
MIT Places Scene Dataset [Zhou+ 2014]
- 2.3M training images with 205 scene labels
- Example categories: Abbey, Airport terminal, Aquarium, Baseball field, Dining room, Forest road, Gas station, Gift shop
Computational Time
- Colorizes an image within a few seconds (80 ms in the fastest case)
Colorization of MIT Places Dataset
Comparisons
(Figure: Input / [Cheng+ 2015] / Ours (w/ global features) / Ours (w/o global features))
Effectiveness of Global Features
(Figure: Input / w/o global features / w/ global features)
Colorization of Historical Photographs
Mount Moran, 1941 Scott's Run, 1937 Youngsters, 1912 Burns Basement, 1910
Colorization of Historical Photographs
California National Park, 1936 Homes, 1936 Spinners, 1910 Doffer Boys, 1909
Colorization of Historical Photographs
Norris Dam, 1933 North Dome, 1936 Miner, 1937 Community Center, 1936
Limitations
- Saturation tends to be low
- Does not recover true colors
(Figure: Input / Ground truth / Output)
Rough Sketch Simplification
Rough Sketch Simplification
(Figure: input rough sketch → output simplified sketch)

Simo-Serra, Iizuka, Sasaki, Ishikawa: "Learning to Simplify: Fully Convolutional Networks for Rough Sketch Cleanup", SIGGRAPH 2016.

(Figure: example pairs of rough and target sketches)
Related Work: Sketch Simplification
- Progressive online modification [Igarashi et al. 1997; Bae et al. 2008; Grimm and Joshi 2012; Fišer et al. 2015]
- Stroke reduction [Deussen and Strothotte 2000; Wilson and Ma 2004; Grabli et al. 2004; Cole et al. 2006]
- Stroke grouping [Barla et al. 2005; Orbay and Kara 2011; Liu et al. 2015]
These all require vector input.
Vectorization
- Model fitting (Bézier, …) and gradient-based approaches
- Require fairly clean input sketches
(Figure: results of [Liu et al. 2015] and [Noris et al. 2015])
Network Model
- 23 convolutional layers
- Output has the same resolution as the input
(Figure: encoder-decoder of down-, flat-, and up-convolutions; resolution is reduced to 1/2, 1/4, and 1/8, then restored)
Learning
- Trained from scratch (about 3 weeks) using 424×424 px patches
- Weighted Mean Squared Error loss
- Batch Normalization [Ioffe and Szegedy 2015] is critical
- Optimized with ADADELTA [Zeiler 2012]
(Figure: Input / Output / Target)
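A weighted MSE scales each pixel's squared error by a per-pixel weight. One plausible form, sketched in numpy, up-weights the rare dark line pixels relative to the white background; the specific weight map here is an assumption for illustration, not the talk's actual weighting:

```python
import numpy as np

def weighted_mse(pred, target, weights):
    """Per-pixel weighted mean squared error."""
    return np.sum(weights * (pred - target) ** 2) / np.sum(weights)

# Hypothetical weighting: line (dark) pixels of the target count 4x,
# since they are vastly outnumbered by white background pixels.
target = np.ones((64, 64))
target[30:34, :] = 0.0                      # a thin horizontal stroke
pred = np.ones((64, 64))                    # prediction: all white
weights = np.where(target < 0.5, 4.0, 1.0)  # assumed weight map
loss = weighted_mse(pred, target, weights)
```

A plain MSE would barely notice a missed thin stroke; the weighting makes erasing real lines much more expensive than leaving faint background noise.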
Sketch Dataset
- 68 pairs of rough and target sketches
- 5 illustrators
(Figure: sketch dataset → extracted patches)

Inverse Dataset Creation
- Data quality is critical
- Creating target sketches from rough sketches causes misalignments
- Creating rough sketches from target sketches keeps the pairs properly aligned
(Figure: standard vs. inverse creation)
Data Augmentation
- 68 pairs is insufficient
- Scaling, random cropping, flipping, and rotation
- Additional augmentation: tone, slur, and noise
(Figure: Input / Tone / Slur / Noise)
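A few of these augmentations can be sketched directly in numpy. This is an illustrative pipeline, not the authors' implementation: flips, 90-degree rotations, a global tone change, and additive noise (the slur distortion and arbitrary-angle rotation mentioned above are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(patch):
    """Illustrative augmentations for a grayscale patch in [0, 1]."""
    if rng.random() < 0.5:
        patch = patch[:, ::-1]             # random horizontal flip
    patch = np.rot90(patch, rng.integers(4))  # random 90-degree rotation
    patch = np.clip(patch * rng.uniform(0.8, 1.2), 0.0, 1.0)   # tone
    patch = np.clip(patch + rng.normal(0.0, 0.02, patch.shape),
                    0.0, 1.0)              # additive noise
    return patch

out = augment(np.ones((424, 424)))         # a 424x424 training patch
print(out.shape)  # (424, 424)
```

Each call yields a different variant of the same patch, which is how 68 sketch pairs stretch into enough data to train the network from scratch.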
Sketch Simplification: Experimental Environment
- Intel Core i7-5960X CPU @ 3.00 GHz
- NVIDIA GeForce TITAN X GPU
- Torch framework
- About 3 weeks for training
- Simplification time
Comparison
(Figure: Input / Adobe Live Trace / Potrace / Ours)
Comparison
(Figure: Input / Adobe Live Trace / Potrace / Ours)
Comparison
(Figure: Input / [Liu et al. 2015] (vector input) / Ours (raster input); (a) Fairy, (b) Mouse, (c) Duck, (d) Car)
Results
Conclusion
Automatic image colorization
- Fuses global and local information
- Joint training of colorization and classification
Automatic rough sketch simplification
- Convolutional networks are well suited to image processing
- Proper data is crucial for training