
Page 1: Lecture 19 Computer Vision II - Yale University (tba3/stat665/lectures/lec19/lecture19.pdf)

Lecture 19: Computer Vision II
06 April 2016

Taylor B. Arnold
Yale Statistics
STAT 365/665

1/52


Convolutional Models in Computer Vision

There is a long history of specific advances and uses of convolutional neural networks. Today, I’ll focus on the following set of models:

▶ LeNet-5 (1998)
▶ AlexNet (2012)
▶ OverFeat (2013)
▶ VGG-16, VGG-19 (2014)
▶ GoogLeNet (2014)
▶ PReLUnet (2015)
▶ ResNet-50, ResNet-101, ResNet-152 (2015)
▶ SqueezeNet (2016)
▶ Stochastic Depth (2016)
▶ ResNet-200, ResNet-1001 (2016)

When you hear about these models, people may be referring to: the architecture, the architecture and weights, or just the general approach.

2/52


AlexNet (2012)

A model out of the University of Toronto, now known as AlexNet, became the first CNN to produce state-of-the-art classification rates on the ILSVRC-2012 dataset:

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet classification with deep convolutional neural networks.” In Advances in Neural Information Processing Systems, pp. 1097-1105. 2012.

3/52


AlexNet contributions

AlexNet was the first to put together several key advances, all of which we have already used or discussed in this class:

1. ReLU units

2. dropout

3. data augmentation

4. multiple GPUs

While the AlexNet group did not invent all of these, they were the first to put them all together and figure out how to train a deep neural network.
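The first two of these ingredients are simple enough to sketch directly. The following is a minimal NumPy illustration of ReLU and (inverted) dropout, written for this transcript rather than taken from the course materials:

```python
import numpy as np

def relu(x):
    # Rectified linear unit: elementwise max(0, x).
    return np.maximum(0.0, x)

def dropout(x, p=0.5, train=True, rng=None):
    # Inverted dropout: during training, zero each unit with
    # probability p and rescale the survivors by 1 / (1 - p) so the
    # expected activation matches test time, where x passes through.
    if not train:
        return x
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # negative entries are clipped to zero
```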

4/52



Visualizing CNNs (2013)

Following the success of AlexNet, the year 2013 saw a much larger number of neural network entrants into the ILSVRC competition. The winning entry came about due to the visualization techniques described in the following paper:

Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional networks.” Computer Vision-ECCV 2014. Springer International Publishing, 2014. 818-833.

Their incredibly diverse set of techniques allowed the team to tweak the AlexNet architecture to get even better results.
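The deconvnet visualizations of the paper are too involved to reproduce here, but the simplest member of this family of techniques, rescaling learned first-layer filters into displayable images, fits in a few lines. This is my own hypothetical NumPy helper, not code from the paper (the 11x11x3x96 shape mimics AlexNet's first layer):

```python
import numpy as np

def filters_to_images(w):
    # Rescale each first-layer filter in a (h, w, channels, n_filters)
    # weight tensor to the [0, 1] range, independently per filter,
    # so the filters can be rendered as small RGB images.
    w = np.moveaxis(w, -1, 0)                      # (n, h, w, c)
    lo = w.min(axis=(1, 2, 3), keepdims=True)
    hi = w.max(axis=(1, 2, 3), keepdims=True)
    return (w - lo) / (hi - lo + 1e-8)

rng = np.random.default_rng(0)
imgs = filters_to_images(rng.normal(size=(11, 11, 3, 96)))
print(imgs.shape)  # (96, 11, 11, 3), one image per filter
```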

6/52



A demo of applying these techniques to the MNIST dataset with ConvNetJS:

http://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html

12/52


OverFeat (2013)

The 2013 competition also brought about the incredibly influential OverFeat model from a team based at NYU:

Sermanet, Pierre, David Eigen, Xiang Zhang, Michaël Mathieu, Rob Fergus, and Yann LeCun. “OverFeat: Integrated recognition, localization and detection using convolutional networks.” arXiv preprint arXiv:1312.6229 (2013).

They won the image localization task by trying to solve localization and identification in a unified process. I’ll give a very simplified version of what they did (the paper is a great read, and I suggest working through it if you are interested in computer vision).

13/52



Python demo I: OverFeat adaptation of AlexNet (2012)
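The demo code itself is not part of this transcript. As a stand-in sketch of the idea OverFeat builds on, dense sliding-window evaluation of a fixed-size classifier over a larger image (which the actual model performs efficiently by running the network convolutionally), here is a hypothetical NumPy helper:

```python
import numpy as np

def sliding_windows(image, size, stride):
    # Extract square crops of side `size` at every `stride` offset;
    # each crop would be fed to the fixed-size classifier, and the
    # per-crop scores combined for localization.
    h, w = image.shape[:2]
    crops, offsets = [], []
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            crops.append(image[i:i + size, j:j + size])
            offsets.append((i, j))
    return np.stack(crops), offsets

img = np.arange(8 * 8).reshape(8, 8).astype(float)
crops, offs = sliding_windows(img, size=4, stride=2)
print(crops.shape)  # (9, 4, 4): nine 4x4 windows over an 8x8 image
```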

18/52


VGG-16, VGG-19 (2014)

One of the top entries from 2014, by an Oxford-based team, took advantage of significantly deeper models.

Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014).

19/52



Python demo II: Pre-trained VGG-19 Model
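The demo is likewise missing from the transcript. One calculation worth having on hand for VGG is why stacks of 3x3 convolutions suffice: with stride 1, each added k x k layer grows the receptive field by k - 1, so small filters stacked deeply match a single large filter while using fewer parameters per output channel. A quick sketch (my own illustration, not the original demo):

```python
def receptive_field(kernel_sizes):
    # Receptive field of a stack of stride-1 convolutions: start from
    # a single pixel and add (k - 1) pixels per k x k layer.
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

print(receptive_field([3, 3]))     # 5: two 3x3 layers see a 5x5 patch
print(receptive_field([3, 3, 3]))  # 7: three 3x3 layers match one 7x7
```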

21/52


GoogLeNet (2014)

The winning entry from 2014, by Google, also took advantage of much deeper architectures:

Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. “Going deeper with convolutions.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9. 2015.

They called their model GoogLeNet in honor of the original LeNet architecture.

22/52



Python demo III: GoogLeNet - Inception Module
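The demo is not reproduced in the transcript. Structurally, an Inception module runs parallel 1x1, 3x3, and 5x5 convolution branches plus a pooling branch over the same input and concatenates the results along the channel axis. With NumPy arrays standing in for the branch outputs (the branch widths below are illustrative, not a claim about any specific GoogLeNet layer):

```python
import numpy as np

def inception_concat(branches):
    # The module's output stacks every branch's feature maps along
    # the channel (last) axis; spatial dimensions must agree.
    return np.concatenate(branches, axis=-1)

h, w = 28, 28
branches = [np.zeros((h, w, c)) for c in (64, 128, 32, 32)]
out = inception_concat(branches)
print(out.shape)  # (28, 28, 256): channel widths add up
```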

27/52



Batch Normalization (2015)

Not a model architecture itself, but one very useful new tweak from the past year has been batch normalization, first presented in this paper:

Ioffe, Sergey, and Christian Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” arXiv preprint arXiv:1502.03167 (2015).

29/52


Python demo IV: Batch normalization
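The demo itself is absent, but the training-time forward pass of batch normalization is short enough to sketch in NumPy (running averages for test time are omitted; gamma and beta would be learned in practice):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature to zero mean and unit variance over the
    # batch axis, then apply the learned scale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(5.0, 3.0, size=(64, 4))
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0))  # per-feature means are ~0 after normalization
```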

30/52


PReLUnet (2015)

Microsoft’s first contribution in 2015 was the idea of using a modified ReLU activation function:

He, Kaiming, et al. “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification.” Proceedings of the IEEE International Conference on Computer Vision. 2015.
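The modification is the parametric ReLU: f(x) = x for positive inputs and alpha * x otherwise, with alpha learned per channel rather than fixed at zero. A NumPy sketch using a fixed illustrative alpha:

```python
import numpy as np

def prelu(x, alpha=0.25):
    # Parametric ReLU: identity on positives, slope alpha on
    # negatives; alpha is a learned parameter in the actual model.
    return np.where(x > 0, x, alpha * x)

print(prelu(np.array([-4.0, -1.0, 0.0, 2.0])))  # negatives scaled by 0.25
```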

31/52



ResNet-50, -101, -152 (2015)

Finally, in the 2015 competition, Microsoft produced a model far deeper than any previously used. These models are known as ResNet, with their depth given as a suffix.

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image Recognition.” arXiv preprint arXiv:1512.03385 (2015).

33/52



Python demo V: ResNet Unit (2015)
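The demo is not in the transcript. The essential structure of a residual unit is y = x + F(x), where F is a small stack of convolution, batch normalization, and ReLU layers; the skip connection lets gradients flow directly through very deep networks. A toy sketch with a stand-in F:

```python
import numpy as np

def residual_unit(x, f):
    # Residual learning: the block outputs its input plus a learned
    # correction F(x), rather than a full transformation of x.
    return x + f(x)

def toy_f(x):
    # Stand-in for the conv-BN-ReLU stack inside a real ResNet unit.
    return 0.1 * x

print(residual_unit(np.ones(3), toy_f))  # every entry is 1.1
```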

38/52


Stochastic Depth Models (2016)

Another tweak on the ResNet architecture, which randomly subsamples the layers used in the network during training:

Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, Kilian Weinberger. “Deep Networks with Stochastic Depth.” arXiv preprint arXiv:1603.09382 (2016).

Notice how this seems like an almost obvious thing to try given the ResNet architecture, but less so in a generic neural network.
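As a hypothetical sketch of the rule (the function names and the toy residual branch are mine): during training each residual branch is skipped outright with probability 1 - p, reducing the unit to the identity, and at test time the branch is always kept but scaled by its survival probability p:

```python
import numpy as np

def stochastic_depth_unit(x, f, survival_prob, train, rng):
    # Training: keep the residual branch with probability
    # survival_prob, otherwise pass x through unchanged.
    # Testing: keep the branch but scale it by survival_prob.
    if train:
        if rng.random() < survival_prob:
            return x + f(x)
        return x
    return x + survival_prob * f(x)

f = lambda x: 0.5 * x
rng = np.random.default_rng(0)
print(stochastic_depth_unit(np.ones(4), f, 0.8, train=False, rng=rng))
# test-time output is x + 0.8 * f(x), i.e. 1.4 everywhere
```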

39/52



ResNet-200, -1001 (2016)

Microsoft’s update to last year’s model. Posted only two weeks ago!

He, Kaiming, et al. “Identity Mappings in Deep Residual Networks.” arXiv preprint arXiv:1603.05027 (2016).

41/52



SqueezeNet (2016)

A new line of research involves looking at ways to produce near state-of-the-art results with a minimal model size (or minimal computational cost):

Iandola, Forrest N., et al. “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size.” arXiv preprint arXiv:1602.07360 (2016).
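The savings come from SqueezeNet's "fire" module: a 1x1 "squeeze" convolution cuts the channel count before parallel 1x1 and 3x3 "expand" convolutions restore it. A quick parameter count (ignoring biases; the channel sizes are illustrative, not taken from the paper) against a plain 3x3 convolution with the same input and output widths:

```python
def conv_params(k, c_in, c_out):
    # Weight count of a single k x k convolution, ignoring biases.
    return k * k * c_in * c_out

def fire_params(c_in, squeeze, expand):
    # Fire module: 1x1 squeeze down to `squeeze` channels, then
    # parallel 1x1 and 3x3 expand layers of `expand` filters each,
    # whose outputs are concatenated (2 * expand output channels).
    return (conv_params(1, c_in, squeeze)
            + conv_params(1, squeeze, expand)
            + conv_params(3, squeeze, expand))

print(conv_params(3, 128, 128))  # 147456 weights for a plain 3x3 conv
print(fire_params(128, 16, 64))  # 12288 weights, about 12x fewer
```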

43/52



Microsoft Common Objects in Context (MS COCO)

47/52



More information is found on their website, http://mscoco.org/, and in the paper describing the dataset:

Lin, Tsung-Yi, et al. “Microsoft COCO: Common Objects in Context.” Computer Vision-ECCV 2014. Springer International Publishing, 2014. 740-755.

I should mention that Microsoft’s own entry (ResNet) essentially swept every metric with their technique.

51/52


Video Classification

I don’t want to make it sound as though the only interesting research in computer vision with neural networks involves one of these large public competitions (though the biggest conceptual advances have come out of these). For example, an influential paper on video scene classification:

Karpathy, Andrej, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. “Large-scale video classification with convolutional neural networks.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725-1732. 2014.

The paper comes with a fantastic video demonstration of the output:

https://www.youtube.com/watch?v=qrzQ_AB1DZk

52/52