Hiroshi Ishikawa - NTHU (48 slides, posted 16-Jan-2022)

TRANSCRIPT

Page 1: Hiroshi Ishikawa - NTHU

Hiroshi Ishikawa
Department of Computer Science & Engineering, Waseda University

Joint work with: Satoshi Iizuka, Edgar Simo-Serra, Kazuma Sasaki

Page 2:

Page 3:

Colorization of Black‐and‐white Pictures

Page 4:

Fully‐automatic colorization 

Page 5:

Colorization of Old Films

Page 6:

Related Work
- Scribble-based [Levin+ 2004; Yatziv+ 2004; An+ 2009; Xu+ 2013; Endo+ 2016]: colors are specified with user scribbles; requires manual input.
- Reference-image-based [Chia+ 2011; Gupta+ 2012]: transfers the colors of a reference image; requires a very similar reference.

(Figures: scribble example from [Levin+ 2004]; input, reference, and output from [Gupta+ 2012].)

Page 7:

Related Work
- Automatic colorization with hand-crafted features [Cheng+ 2015]: uses multiple existing image features; computes chrominance via a shallow neural network; depends on the performance of semantic segmentation; handles only simple outdoor scenes.

(Figure: input → image features → neural network → chroma → output.)

Page 8:

In our work
- An end-to-end network jointly learns global and local features for automatic image colorization
- A fusion layer merges the global and local features
- Classification labels are exploited for learning

Iizuka, Simo-Serra, Ishikawa: "Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification", SIGGRAPH 2016.

Page 9:

Network model
- Two branches: local features and global features
- Composition of four networks

(Architecture diagram: input luminance → Low-Level Features Network (shared weights) → Mid-Level Features Network and, via scaling, Global Features Network → Fusion Layer → Colorization Network → upsampling → output chrominance, recombined with the input luminance.)

Page 10:

Low-Level Features Network
- Convolutional layers (3×3 kernels)
- No pooling; resolution is lowered with strides > 1
- Weights are shared between the two branches

(Diagram: spatial resolutions 112×112, 56×56, 28×28 at scales 1/2, 1/4, 1/8, with 128, 256, and 512 feature maps.)
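The strided downsampling named above can be sketched as follows. This is a minimal, illustrative convolution in pure Python, not the paper's code: it uses a placeholder 3×3 box kernel with zero padding 1, and only demonstrates how stride 2 halves the spatial resolution in place of pooling.

```python
def conv3x3(image, stride):
    """2-D convolution of a single-channel image with a 3x3 box kernel,
    zero padding 1, and the given stride (illustrative weights only)."""
    h, w = len(image), len(image[0])
    # Zero-pad by one pixel on every side.
    padded = [[0.0] * (w + 2)] + \
             [[0.0] + list(row) + [0.0] for row in image] + \
             [[0.0] * (w + 2)]
    out_h = (h + 2 - 3) // stride + 1
    out_w = (w + 2 - 3) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    acc += padded[i * stride + di][j * stride + dj] / 9.0
            row.append(acc)
        out.append(row)
    return out

image = [[1.0] * 8 for _ in range(8)]  # toy 8x8 "luminance" patch
half = conv3x3(image, stride=2)        # stride 2: 8x8 -> 4x4
same = conv3x3(image, stride=1)        # stride 1: resolution kept at 8x8
```

With stride 2 the output grid is half the size on each axis, which is exactly how the network lowers resolution without pooling layers.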

Page 11:

Scaling by Strides
- Down-convolution: stride > 1 reduces resolution
- Flat-convolution: stride 1 keeps resolution
- Up-convolution: fractional stride increases resolution (not used in colorization; used in the sketch simplification)
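The three layer types differ only in how they map input to output resolution. The arithmetic can be sketched as below; the kernel sizes and paddings (3/1 for down- and flat-convolutions, 4/1 for the transposed up-convolution) are illustrative defaults I have assumed, not values taken from the slide.

```python
def down_conv_size(n, stride=2, kernel=3, pad=1):
    # Strided convolution: resolution divided by the stride.
    return (n + 2 * pad - kernel) // stride + 1

def flat_conv_size(n, kernel=3, pad=1):
    # Stride 1 with "same" padding: resolution unchanged.
    return (n + 2 * pad - kernel) + 1

def up_conv_size(n, stride=2, kernel=4, pad=1):
    # Transposed ("up") convolution: resolution multiplied by the stride.
    return (n - 1) * stride - 2 * pad + kernel

# 224 -> 112 (down), 112 -> 112 (flat), 112 -> 224 (up).
sizes = (down_conv_size(224), flat_conv_size(112), up_conv_size(112))
```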

Page 12:

Network model

(Architecture diagram repeated, highlighting the low-level features networks with shared weights.)

Page 13:

Global Features Network
- Computes a global 256-dimensional vector representation of the image

(Diagram: convolutions at 28×28, 14×14, and 7×7 with 512 feature maps, followed by fully connected layers (1024, then 256 units) yielding the 256-dim vector.)

Page 14:

Mid-Level Features Network
- Extracts mid-level features such as texture

(Diagram: the 512 low-level feature maps are reduced to 256 maps at 1/8 of the input resolution.)

Page 15:

Fusion Layer

(Architecture diagram repeated, highlighting where the fusion layer sits in the network.)

Page 16:

Fusion Layer
- Combines the global and mid-level features

(Diagram: the 256-dim global vector is replicated at every spatial position of the 256 mid-level feature maps (1/8 resolution) and concatenated with them, giving 512-dim fused features that feed the Colorization Network.)
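The fusion step itself is just a per-position concatenation. A toy sketch (dimensions shrunk from 256/512 so the example stays small; at paper scale the grid would be the mid-level feature map and the vector 256-dimensional):

```python
def fuse(mid_features, global_vec):
    """mid_features: H x W grid of local feature vectors;
    global_vec: one vector shared by all positions.
    Returns an H x W grid of concatenated (local + global) vectors."""
    return [[list(local) + list(global_vec) for local in row]
            for row in mid_features]

H, W, C = 2, 2, 3                       # toy sizes for illustration
mid = [[[0.1 * c for c in range(C)]     # toy local feature vectors
        for _ in range(W)] for _ in range(H)]
g = [1.0, 2.0, 3.0]                     # toy stand-in for the 256-dim vector
fused = fuse(mid, g)                    # each position now has 2*C channels
```

Every spatial position ends up carrying both its local evidence and the same global scene summary, which is what lets global context steer local color choices.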

Page 17:

Colorization Network
- Computes chrominance from the fused features
- Restores the image to the input resolution by upsampling
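As one simple stand-in for the upsampling steps that restore the chrominance maps to the input resolution, here is a minimal nearest-neighbour 2× upsampling (the actual layer configuration is the paper's and is not reproduced here):

```python
def upsample2x(channel):
    """Repeat every pixel of a 2-D channel twice along both axes."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in (0, 1)]  # duplicate columns
        out.append(wide)
        out.append(list(wide))                   # duplicate the row
    return out

chroma = [[0.2, 0.4],
          [0.6, 0.8]]
big = upsample2x(chroma)   # 2x2 -> 4x4
```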

Page 18:

Training
- Mean squared error (MSE) as the loss function for chrominance
- Cross-entropy loss for the classification network
- Optimization with ADADELTA [Zeiler 2012], which adaptively sets the learning rate

(Diagram: input → model (forward) → output; MSE against the ground truth drives the backward pass.)
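The two losses above can be sketched with toy numbers as follows. The relative weighting between them is not shown on the slide, so an unweighted sum is used here purely as an assumption.

```python
import math

def mse(pred, target):
    """Mean squared error over flat lists of chrominance values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy(probs, label):
    """Negative log-likelihood of the correct class index."""
    return -math.log(probs[label])

chroma_loss = mse([0.1, 0.4, 0.5], [0.0, 0.5, 0.5])
class_loss = cross_entropy([0.7, 0.2, 0.1], 0)
total = chroma_loss + class_loss  # assumed unweighted combination
```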

Page 19:

Joint Training with Classification
- Classification is trained jointly with the colorization
- The classification network is connected to the global features

(Example predicted labels: 20.60% formal garden, 16.13% arch, 13.50% abbey, 7.07% botanical garden, 6.53% golf course.)

Page 20:

Dataset
- MIT Places Scene Dataset [Zhou+ 2014]
- 2.3M training images with 205 scene labels

(Example scenes: abbey, airport terminal, aquarium, baseball field, dining room, forest road, gas station, gift shop.)

Page 21:

Page 22:

Computational Time
- Colorizes within a few seconds (80 ms in one reported configuration)

Page 23:

Colorization of MIT Places Dataset

Page 24:

Comparisons

(Figure columns: input, [Cheng+ 2015], ours with global features, ours without global features.)

Page 25:

Effectiveness of Global Features

(Figure columns: input, with global features, without global features.)

Page 26:

Colorization of Historical Photographs

Mount Moran, 1941 Scott's Run, 1937 Youngsters, 1912 Burns Basement, 1910

Page 27:

Colorization of Historical Photographs

California National Park, 1936  Homes, 1936  Spinners, 1910 Doffer Boys, 1909

Page 28:

Colorization of Historical Photographs

Norris Dam, 1933  North Dome, 1936  Miner, 1937  Community Center, 1936

Page 29:

Limitations
- Saturation tends to be low
- Does not recover the true colors

(Figure rows: input, ground truth, output.)

Page 30:

Page 31:

Rough Sketch Simplification

Page 32:

Rough Sketch Simplification

(Figure: input rough sketch and simplified output.)

Simo-Serra, Iizuka, Sasaki, Ishikawa: "Learning to Simplify: Fully Convolutional Networks for Rough Sketch Cleanup", SIGGRAPH 2016.

Page 33:

Rough Sketch Simplification

(Figure: three rough/target sketch pairs.)

Page 34:

Related work
- Sketch simplification
  - Progressive online modification: Igarashi et al. 1997, Bae et al. 2008, Grimm and Joshi 2012, Fišer et al. 2015
  - Stroke reduction: Deussen and Strothotte 2000, Wilson and Ma 2004, Grabli et al. 2004, Cole et al. 2006
  - Stroke grouping: Barla et al. 2005, Orbay and Kara 2011, Liu et al. 2015
  - Require vector input
- Vectorization
  - Model fitting (Bezier, ...), gradient-based approaches
  - Require fairly clean input sketches

(Figures: Liu et al. 2015; Noris et al. 2015.)

Page 35:

Network model
- 23 convolutional layers
- Output has the same resolution as the input

(Diagram: flat-, down-, and up-convolutions; the encoder reduces resolution through 1/2, 1/4, 1/8, and the decoder restores it through 1/4 and 1/2 back to full resolution.)

Page 36:

Learning
- Trained from scratch (about 3 weeks)
- Uses 424×424 px patches
- Weighted mean squared error loss
- Batch Normalization [Ioffe and Szegedy 2015] is critical
- Optimized with ADADELTA [Zeiler 2012]

(Figure: input, output, target.)
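The slides name a weighted MSE loss but not the weighting scheme, so this sketch makes one plausible assumption: pixels that are part of a line in the target (dark pixels) receive a larger weight than background pixels, so errors on strokes cost more than errors on blank paper.

```python
def weighted_mse(pred, target, line_weight=3.0, threshold=0.5):
    """Per-pixel squared error, weighted more heavily on line pixels.
    pred/target are flat lists of grey values in [0, 1] (0 = ink).
    The weight 3.0 and threshold 0.5 are assumed, illustrative values."""
    total = 0.0
    for p, t in zip(pred, target):
        w = line_weight if t < threshold else 1.0
        total += w * (p - t) ** 2
    return total / len(pred)

# The same absolute error costs more on a line pixel than on background.
loss_line = weighted_mse([0.2], [0.0])   # target is ink
loss_bg = weighted_mse([0.8], [1.0])     # target is paper
```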

Page 37:

Sketch dataset
- 68 pairs of rough and target sketches
- 5 illustrators

(Figure: sketch dataset → extracted patches.)

Page 38:

Inverse Dataset Creation
- Data quality is critical
- Drawing target sketches from rough sketches introduces misalignments
- Drawing rough sketches from target sketches keeps the pair properly aligned

(Figure: standard vs. inverse creation.)

Page 39:

Data augmentation
- 68 pairs are insufficient
- Scaling, random cropping, flipping, and rotation
- Additional augmentation: tone, slur, and noise

(Figure: input with tone, slur, and noise augmentations.)
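One of the named augmentations, tone, can be sketched as a simple gamma adjustment; the deck does not specify the exact transforms, so this particular choice is an illustrative assumption.

```python
def tone_augment(channel, gamma):
    """Apply a gamma curve to a 2-D grey-value channel in [0, 1].
    gamma > 1 darkens mid-tones; gamma < 1 lightens them."""
    return [[v ** gamma for v in row] for row in channel]

patch = [[0.25, 0.5],
         [0.75, 1.0]]
darker = tone_augment(patch, gamma=2.0)   # mid-tones pushed towards ink
lighter = tone_augment(patch, gamma=0.5)  # mid-tones pushed towards paper
```

Applying several such random tone curves to each training pair multiplies the effective size of the 68-pair dataset without changing the target alignment.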

Page 40:

Page 41:

Sketch Simplification: Experimental Environment
- Intel Core i7-5960X CPU @ 3.00 GHz
- NVIDIA GeForce TITAN X GPU
- Torch framework
- About 3 weeks for training

(Table: simplification times.)

Page 42:

Comparison

(Figure columns: input, ours, Adobe Live Trace, Potrace.)

Page 43:

Comparison

(Figure columns: input, ours, Adobe Live Trace, Potrace.)

Page 44:

Comparison

(Figure: (a) Fairy, (b) Mouse, (c) Duck, (d) Car; rows: input, [Liu et al. 2015] on vector input, ours on raster input.)

Page 45:

Results

Page 46:

Results

Page 47:

Results

Page 48:

Conclusion
- Automatic image colorization
  - Fuses global and local information
  - Joint training of colorization and classification
- Automatic rough sketch simplification
  - Convolutional networks are well suited to image processing
  - Proper data is crucial for training