
Page 1: ECS 289G – UC Davis, Paper Presentation #1 · web.cs.ucdavis.edu/.../ecs289g-fall2015/Presentation_MM.pdf · 2015. 10. 2. · Mohammad Motamedi · ECS 289G Paper Presentation · UC Davis

ECS 289G – UC Davis Paper Presentation #1

ImageNet Classification with Deep Convolutional Neural Networks

Mohammad Motamedi

Mohammad Motamedi   ECS 289G PAPER PRESENTATION - UC DAVIS   1


Convolutional Neural Networks (CNNs)
• Easier to train
• Far fewer connections than a fully connected network with similarly sized layers
• Exploit the locality of pixel dependencies
• Capacity is a function of depth and breadth


Image  source:  stackexchange.com  


Training Examples
• ImageNet
  ◦ Dataset of 15 million labeled high-resolution images
  ◦ 22,000 categories
  ◦ Various image resolutions
• Data size (ILSVRC subset, 1000 categories)
  ◦ 1.2 million training examples
  ◦ 50,000 validation images
  ◦ 150,000 testing images
• Preprocessing
  ◦ Down-sampled to 256×256
  ◦ Subtracting the mean activity over the training set from each pixel
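The mean-subtraction step can be sketched as follows. This is a minimal illustration that assumes each image is already down-sampled and stored as a flat list of pixel values. The names `mean_image` and `center` are hypothetical, and the three-value "images" are toy data.

```python
def mean_image(images):
    """Per-position mean over a list of equally sized flat images."""
    n = len(images)
    return [sum(values) / n for values in zip(*images)]


def center(image, mean):
    """Subtract the training-set mean from one image, position by position."""
    return [p - m for p, m in zip(image, mean)]


# Toy training set: two "images" of three pixel values each (assumed data).
train = [[0.0, 10.0, 20.0], [2.0, 14.0, 26.0]]
mean = mean_image(train)
centered = [center(img, mean) for img in train]
```

The mean is computed once over the training set and then reused unchanged for validation and test images.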


Image  source:  image-­‐net.org  


The Architecture: Innovations and Details


Rectified Linear Units (ReLU)
• Using f(x) = max(0, x) instead of tanh(x)
  ◦ No input normalization is required to prevent saturation
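The saturation point can be made concrete with a small sketch that compares the slope of the two activations as the input grows. The function names are illustrative and not taken from the slides.

```python
import math


def relu(x):
    """f(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    """Slope of tanh: 1 - tanh(x)^2, which shrinks toward 0 for large |x|."""
    return 1.0 - math.tanh(x) ** 2


for x in (0.5, 2.0, 5.0):
    print(x, relu_grad(x), tanh_grad(x))
```

For large positive inputs the tanh slope approaches zero while the ReLU slope stays at 1, which is why ReLU does not need its inputs rescaled to avoid saturation.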


Image  source:  cs231n.github.io  


Local Response Normalization
• Normalizing over n adjacent feature maps at the same spatial position
  ◦ It is performed after applying ReLU
• Effect
  ◦ Reduces the top-1 error by 1.4%
  ◦ Reduces the top-5 error by 1.2%
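A minimal sketch of the normalization at a single spatial position follows. The slide does not give the constants; the defaults k = 2, n = 5, alpha = 1e-4 and beta = 0.75 are the values reported in the original paper by Krizhevsky et al. The function name is hypothetical.

```python
def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize ReLU outputs across adjacent feature maps.

    a: activations of all N feature maps at one spatial position.
    Each activation is divided by (k + alpha * sum of squares over
    up to n neighboring maps) ** beta.
    """
    num_maps = len(a)
    half = n // 2
    out = []
    for i in range(num_maps):
        lo = max(0, i - half)
        hi = min(num_maps - 1, i + half)
        sum_sq = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(a[i] / (k + alpha * sum_sq) ** beta)
    return out
```

A feature map with strong neighbors is scaled down more than one with weak neighbors, which is a form of competition between adjacent maps.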


Image  source:  computer.org  


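Overlapping Pooling
• Pooling units spaced 2 pixels apart (stride 2) each summarize a 3×3 neighborhood, so adjacent pooling windows overlap
• Effects (compared with non-overlapping 2×2 pooling with stride 2)
  ◦ Reduces the top-1 error rate by 0.4%
  ◦ Reduces the top-5 error rate by 0.3%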
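The difference between overlapping and non-overlapping pooling can be sketched on a small grid. This is an illustrative pure-Python max pooling; the function name and the 5×5 toy input are assumptions.

```python
def max_pool2d(x, z=3, s=2):
    """Max pooling over z-by-z windows placed every s pixels (no padding)."""
    height, width = len(x), len(x[0])
    out = []
    for r in range(0, height - z + 1, s):
        row = []
        for c in range(0, width - z + 1, s):
            row.append(max(x[r + i][c + j] for i in range(z) for j in range(z)))
        out.append(row)
    return out


# 5x5 toy input holding the values 0..24 in row-major order.
grid = [[r * 5 + c for c in range(5)] for r in range(5)]

overlapping = max_pool2d(grid, z=3, s=2)      # windows share a row/column
non_overlapping = max_pool2d(grid, z=2, s=2)  # windows are disjoint
```

With z = 3 and s = 2 each window shares one row or column with its neighbor, whereas z = 2 and s = 2 tiles the input with disjoint windows.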


Architecture


• Response normalization: after the first and second convolutional layers
• Max pooling: after both response normalizations and after the fifth convolutional layer
• ReLU: after each layer
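The ordering rules above can be written out as a short sketch that lists the operations in sequence. The layer counts (five convolutional and three fully connected layers) come from the original paper rather than from this slide, the operation names are hypothetical labels, and the sketch assumes that the last fully connected layer feeds the softmax directly.

```python
def layer_ops(n_conv=5, n_fc=3, lrn_after=(1, 2), pool_after=(1, 2, 5)):
    """Return the sequence of operations as a list of label strings."""
    ops = []
    for i in range(1, n_conv + 1):
        ops += [f"conv{i}", "relu"]
        if i in lrn_after:
            ops.append("lrn")       # response normalization follows ReLU
        if i in pool_after:
            ops.append("maxpool")   # pooling follows normalization, if any
    for j in range(1, n_fc + 1):
        ops.append(f"fc{n_conv + j}")
        if j < n_fc:                # assumption: no ReLU before the softmax
            ops.append("relu")
    ops.append("softmax")
    return ops


ops = layer_ops()
print(" -> ".join(ops))
```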


Overfitting: Techniques to Reduce Overfitting


Data Augmentation
• Data is augmented by
  ◦ Extracting random 224×224 patches
  ◦ Using both the patches and their horizontal reflections
  ◦ The same approach is used at test time (10 patches: the four corner patches, the center patch, and their horizontal reflections)
• Altering the intensity of the RGB channels
  ◦ Add multiples of the principal components of the RGB pixel values, each scaled by the corresponding eigenvalue times a random variable
• Effect
  ◦ Reduces the top-1 error by over 1%
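Both augmentation schemes can be sketched in a few lines. This is an illustration only: images are nested lists, all function names are hypothetical, and the eigenvectors and eigenvalues of the RGB covariance are assumed to be computed elsewhere. The standard deviation of 0.1 for the random variable is the value given in the original paper, not on the slide.

```python
import random


def crop(img, top, left, size):
    """Cut a size-by-size patch out of a 2-D image (list of rows)."""
    return [row[left:left + size] for row in img[top:top + size]]


def hflip(img):
    """Horizontal reflection."""
    return [row[::-1] for row in img]


def random_training_patch(img, size=224, rng=random):
    """Random patch, reflected horizontally with probability 0.5."""
    h, w = len(img), len(img[0])
    patch = crop(img, rng.randint(0, h - size), rng.randint(0, w - size), size)
    return hflip(patch) if rng.random() < 0.5 else patch


def ten_test_patches(img, size=224):
    """Four corner patches, the center patch, and their reflections."""
    h, w = len(img), len(img[0])
    origins = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
               ((h - size) // 2, (w - size) // 2)]
    patches = [crop(img, top, left, size) for top, left in origins]
    return patches + [hflip(p) for p in patches]


def pca_color_shift(eigvecs, eigvals, sigma=0.1, rng=random):
    """RGB offset added to every pixel of one image.

    eigvecs[c][k] is channel c of the k-th principal component.
    """
    alphas = [rng.gauss(0.0, sigma) for _ in eigvals]
    return [sum(eigvecs[c][k] * alphas[k] * eigvals[k] for k in range(3))
            for c in range(3)]
```

At test time the network's predictions on the ten patches are averaged. A new color offset is drawn each time an image is used for training.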

 


Dropout
• Setting the output of each hidden neuron to zero with probability 0.5
  ◦ A dropped neuron does not contribute to the forward pass and does not take part in backpropagation
• Reduces complex co-adaptations
  ◦ No neuron can rely on the presence of another neuron
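A minimal sketch of dropout on one layer's activations follows. The function names are hypothetical. The test-time rule (keep every neuron and multiply its output by 0.5) is the one described in the original paper and is not stated on the slide.

```python
import random


def dropout_train(acts, p_drop=0.5, rng=random):
    """Zero each activation independently with probability p_drop.

    Returns the masked activations and the mask for the backward pass.
    """
    mask = [0.0 if rng.random() < p_drop else 1.0 for _ in acts]
    return [a * m for a, m in zip(acts, mask)], mask


def dropout_backward(grads, mask):
    """Dropped neurons receive no gradient."""
    return [g * m for g, m in zip(grads, mask)]


def dropout_test(acts, p_drop=0.5):
    """At test time all neurons are used, scaled by the keep probability."""
    return [a * (1.0 - p_drop) for a in acts]
```

Because a different random mask is drawn for every training example, each presentation effectively trains a different thinned network that shares weights with all the others.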


Implementation
• Training time: five to six days on two GTX 580 3 GB GPUs
• Effect on the network
  ◦ The network is split across the two GPUs, so inter-GPU communication must be minimized
• Augmenting the data on the CPU in parallel with training on the GPU
  ◦ Augmented data does not need to be stored on disk
• Effect of the two-GPU network (compared with a one-GPU network with half as many kernels per convolutional layer)
  ◦ Reduces the top-1 error by 1.7%
  ◦ Reduces the top-5 error by 1.2%
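The idea of augmenting on the CPU while the GPU trains can be sketched as a producer and a consumer connected by a bounded queue. This is only an illustration of the overlap using Python threads. `augment` and `train_step` are hypothetical callables standing in for the real augmentation and training code.

```python
import queue
import threading


def _producer(batches, augment, q):
    """CPU side: augment each batch and hand it over in memory."""
    for batch in batches:
        q.put(augment(batch))
    q.put(None)  # sentinel: no more batches


def train_with_prefetch(batches, augment, train_step, max_prefetch=2):
    """Consume augmented batches while the next ones are being prepared."""
    q = queue.Queue(maxsize=max_prefetch)
    worker = threading.Thread(target=_producer, args=(batches, augment, q),
                              daemon=True)
    worker.start()
    results = []
    while True:
        item = q.get()
        if item is None:
            break
        results.append(train_step(item))
    worker.join()
    return results


# Toy usage with stand-in functions.
losses = train_with_prefetch([1, 2, 3], lambda b: b * 10, lambda b: b + 1)
```

Augmented batches only ever live in the queue, so nothing is written to disk.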


Training
• The network is trained with stochastic gradient descent
  ◦ Weight decay: 0.0005
  ◦ Momentum: 0.9
  ◦ Weights are initialized from a zero-mean Gaussian distribution with a standard deviation of 0.01
  ◦ The learning rate is divided by 10 when the validation error stops improving
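The hyperparameters above fit together as in the sketch below. The update rule follows the form given in the original paper (momentum applied to the velocity, weight decay scaled by the learning rate). The function names are hypothetical, and `next_lr` is an assumed stand-in for the learning-rate decision, which the authors made by hand.

```python
import random


def init_weights(n, std=0.01, rng=random):
    """Zero-mean Gaussian initialization."""
    return [rng.gauss(0.0, std) for _ in range(n)]


def sgd_step(w, v, grad, lr, momentum=0.9, weight_decay=0.0005):
    """One update: v <- momentum*v - weight_decay*lr*w - lr*grad; w <- w + v."""
    v_new = [momentum * vi - weight_decay * lr * wi - lr * gi
             for wi, vi, gi in zip(w, v, grad)]
    w_new = [wi + vi for wi, vi in zip(w, v_new)]
    return w_new, v_new


def next_lr(lr, val_errors):
    """Divide lr by 10 if the newest validation error is no better
    than the best earlier one (assumed heuristic)."""
    if len(val_errors) >= 2 and val_errors[-1] >= min(val_errors[:-1]):
        return lr / 10.0
    return lr


# Toy usage: one weight, zero initial velocity, gradient 0.5.
w, v = sgd_step([1.0], [0.0], [0.5], lr=0.01)
```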


Results


Kernel values after training


ILSVRC 2010

[Bar chart: top-1 and top-5 error (%) on ILSVRC 2010 for CNN, SIFT + FVs [24], and Sparse coding [2]]


ILSVRC 2012

[Bar chart: top-5 error (%) on ILSVRC 2012 for SIFT + FVs [7], 5 CNNs, and 7 CNNs]


Model            Top-5 Error (val)   Top-5 Error (test)
SIFT + FVs [7]   -                   26.2%
1 CNN            18.2%               -
5 CNNs           16.4%               16.4%
7 CNNs           15.4%               15.3%


ILSVRC  2010


ILSVRC
