“Deep” Learning


TRANSCRIPT

  • “Deep” Learning

  • 2

    natural language analyzer

    Big picture: natural language analyzers

    Natural language input signal:
    - Web page
    - Questions …

  • 3  

    sentiment …

  • 4  

    sentiment …

  • Agenda

    • Big picture

    • Why deep learning?

    • Building blocks of a deep neural network

    • How to train deep neural networks

    • Important results

  • 6  

    do  

    classification …

  • 7  

    classification …

  • 8  

    How to define f(l, d): linear models

    Linear models: f(l, d) = w · g(l, d)

    [Figure: a sparse binary feature vector g(l, d) = (0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, …) paired with a learned weight vector w = (0.4, -1.2, 0.2, 0.2, -0.4, -1.0, 5.1, 1.1, 2.3, 0.8, -0.1, …); the label "Number of …" is truncated.]

  • 9  

    How to define f(l, d): linear models

    Linear models: f(l, d) = w · g(l, d)
    - Easy to implement
    - Easy to optimize …
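    To make this concrete, a minimal numpy sketch of the linear scorer; the hashed (label, word) feature map, the label set, and the feature-space size are illustrative assumptions, not from the slides:

        import numpy as np

        NUM_FEATURES = 2 ** 16             # illustrative feature-space size
        LABELS = ["positive", "negative"]  # illustrative label set

        def g(label, document):
            """Sparse binary feature vector g(l, d): hashed (label, word) indicators."""
            x = np.zeros(NUM_FEATURES)
            for word in document.split():
                x[hash((label, word)) % NUM_FEATURES] = 1.0
            return x

        def f(label, document, w):
            """Linear score: f(l, d) = w . g(l, d)."""
            return w @ g(label, document)

        def predict(document, w):
            """argmax_l f(l, d)."""
            return max(LABELS, key=lambda l: f(l, document, w))

        w = np.random.default_rng(0).standard_normal(NUM_FEATURES) * 0.1  # would normally be learned
        print(predict("a truly great movie", w))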

  • Agenda

    • Big picture

    • Why deep learning?

    • Building blocks of a deep neural network

    • How to train deep neural networks

    • Important results

  • 11  

    Linear models: f(l, d) = w · g(l, d) = w(l) · x(d)
    e.g., y1 = x1 w1,1 + x2 w2,1 + x3 w3,1 + x4 w4,1 + x5 w5,1 = w(1) · x(d)

    [Figure: the weight matrix W, entries w1,1 through w5,3, mapping a 5-dimensional input x to a 3-dimensional output y; the label "Number of …" is truncated.]

  • 12  

    neural network v1.0: linear model

    Linear models: f(l, d) = w · g(l, d) = w(l) · x(d)
    e.g., y1 = x1 w1,1 + x2 w2,1 + x3 w3,1 + x4 w4,1 + x5 w5,1 = w(1) · x(d)

    [Figure: the same model drawn as a one-layer network, y = W x, with W the same weight matrix as before.]

    similar words s…
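    As a quick check of the matrix form, a small numpy sketch; the 5-input, 3-output shapes follow the slide's example, and the random weights are stand-ins for learned ones:

        import numpy as np

        rng = np.random.default_rng(0)
        W = rng.standard_normal((3, 5))  # 3 outputs x 5 inputs; W[0, i] plays the slide's w_i,1
        x = rng.standard_normal(5)

        y = W @ x  # the whole linear model is one matrix-vector product

        # y1 written out as on the slide: x1 w1,1 + x2 w2,1 + x3 w3,1 + x4 w4,1 + x5 w5,1
        y1 = sum(x[i] * W[0, i] for i in range(5))
        assert np.isclose(y[0], y1)
        print(y)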

  • 13  

    neural network v2.0: representations …

  • 14  

    neural network v2.1: representations …

  • 15  

    neural network v3.0: complex functions

  • 16  

    neural network v3.0: complex functions

  • 17  

    neural network v3.0: complex functions

  • 18  

    neural network v3.0: complex functions

  • 19  

    neural network v3.5: “deeper” networks

    [Figure: a three-layer network, x → W1 → h1 → W2 → h2 → W3 → y.]

    y = W3 h2 = W3 a2( W2 a1( W1 x ) )

    Wait, but why do we need more layers?

  • 20  

    neural network v3.5: “deeper” networks

    [Figure: the same three-layer network, x → W1 → h1 → W2 → h2 → W3 → y.]

    y = W3 h2 = W3 a2( W2 a1( W1 x ) )
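    A numpy sketch of that forward pass; the layer sizes and the use of tanh for both a1 and a2 are illustrative stand-ins for whatever the slides chose:

        import numpy as np

        rng = np.random.default_rng(0)
        W1 = rng.standard_normal((8, 5))  # illustrative layer sizes: 5 -> 8 -> 8 -> 3
        W2 = rng.standard_normal((8, 8))
        W3 = rng.standard_normal((3, 8))

        def forward(x):
            """y = W3 a2( W2 a1( W1 x ) ), with tanh standing in for a1 and a2."""
            h1 = np.tanh(W1 @ x)   # a1
            h2 = np.tanh(W2 @ h1)  # a2
            return W3 @ h2

        print(forward(rng.standard_normal(5)))

    Note that if a1 and a2 were the identity, W3 W2 W1 would collapse into a single matrix and the whole stack would still be a linear model; the nonlinearities are what the extra layers buy.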

  • 21  

    neural network v4.0: recurrent neural networks

    Big idea: use hidden layers to represent sequential …

  • 22  

    neural network v4.0: recurrent neural networks

    Figure credits: Christopher Olah

    How to compute the hidden layers?
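    One standard answer is the vanilla RNN recurrence h_t = tanh(W_hh h_{t-1} + W_xh x_t + b); a numpy sketch with illustrative sizes (the slides may parameterize it differently):

        import numpy as np

        rng = np.random.default_rng(0)
        d_in, d_h = 4, 6  # illustrative sizes
        W_xh = rng.standard_normal((d_h, d_in)) * 0.1
        W_hh = rng.standard_normal((d_h, d_h)) * 0.1
        b = np.zeros(d_h)

        def rnn(xs):
            """h_t = tanh(W_hh h_{t-1} + W_xh x_t + b): the same weights at every step."""
            h = np.zeros(d_h)
            hs = []
            for x in xs:
                h = np.tanh(W_hh @ h + W_xh @ x + b)
                hs.append(h)
            return hs

        hs = rnn(rng.standard_normal((7, d_in)))  # a length-7 input sequence
        print(hs[-1])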

  • 23  

    neural network v4.1: output sequences

    Figure credits: Andrej Karpathy

  • 24  

    neural network v4.1: output sequences

    Figure credits: Andrej Karpathy

    Example: character-level language models
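    To make the character-level idea concrete, a toy sketch that predicts a distribution over next characters and samples from it; it uses bigram counts instead of an RNN purely to keep the example short, so it illustrates the interface rather than the model on the slide:

        import numpy as np
        from collections import defaultdict

        # Toy character-level language model: count next-character frequencies,
        # then repeatedly sample "the next character" to generate text.
        text = "the quick brown fox jumps over the lazy dog. " * 50  # stand-in corpus
        counts = defaultdict(lambda: defaultdict(int))
        for a, b in zip(text, text[1:]):
            counts[a][b] += 1

        rng = np.random.default_rng(0)

        def sample(prev, n=60):
            out = [prev]
            for _ in range(n):
                chars = list(counts[out[-1]])
                probs = np.array([counts[out[-1]][c] for c in chars], dtype=float)
                out.append(rng.choice(chars, p=probs / probs.sum()))
            return "".join(out)

        print(sample("t"))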

  • 25  

    neural network v4.1: output sequences

    Credits: Andrej Karpathy

    Sample output: Copyright was the succession of independence in the slop of Syrian influence that was a famous German movement based on a more popular servicious, non-doctrinal and sexual power post. Many governments recognize the military housing of the [[Civil Liberalization…

  • 26  

    neural network v4.2: Long Short-Term Memory

    Figure credits: Christopher Olah, http://colah.github.io/posts/2015-08-Understanding-LSTMs/

    LSTMs

    Regular RNNs

  • 27  

    neural network v4.2: Long Short-Term Memory

    Figure credits: Christopher Olah, http://colah.github.io/posts/2015-08-Understanding-LSTMs/
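    For reference, the standard LSTM step described in the cited post, sketched in numpy; the sizes and initialization are illustrative:

        import numpy as np

        rng = np.random.default_rng(0)
        d_in, d_h = 4, 6
        # one weight matrix per gate, each acting on the concatenation [h_{t-1}; x_t]
        Wf, Wi, Wo, Wc = (rng.standard_normal((d_h, d_h + d_in)) * 0.1 for _ in range(4))
        bf, bi, bo, bc = (np.zeros(d_h) for _ in range(4))

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def lstm_step(x, h, c):
            """One LSTM step: gates decide what to forget, write, and expose."""
            z = np.concatenate([h, x])
            f = sigmoid(Wf @ z + bf)              # forget gate
            i = sigmoid(Wi @ z + bi)              # input gate
            o = sigmoid(Wo @ z + bo)              # output gate
            c = f * c + i * np.tanh(Wc @ z + bc)  # cell state: forget old, add new
            h = o * np.tanh(c)                    # hidden state exposed to the next layer
            return h, c

        h, c = np.zeros(d_h), np.zeros(d_h)
        for x in rng.standard_normal((7, d_in)):
            h, c = lstm_step(x, h, c)
        print(h)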

  • 28  

    neural network v4.3: bidirectional …

  • 29  

    neural network v4.4: attention …

  • 30  

    neural network v5: convolutional …

  • 31  

    neural network v5: convolutional …

  • 32  

    neural network v5: convolutional …

  • convolutional …
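    A sketch of the operation as commonly used in NLP: 1-D convolution over a sequence of word vectors followed by max-pooling; all sizes here are illustrative and the slides' exact variant may differ:

        import numpy as np

        rng = np.random.default_rng(0)
        d_emb, width, d_out, T = 4, 3, 5, 9  # illustrative sizes
        X = rng.standard_normal((T, d_emb))  # a sentence: T word vectors
        F = rng.standard_normal((d_out, width * d_emb)) * 0.1  # filter bank

        # slide over the sequence: the same filters are applied at every position
        windows = np.stack([X[t:t + width].ravel() for t in range(T - width + 1)])
        feature_maps = windows @ F.T             # (T - width + 1, d_out)
        sentence_vec = feature_maps.max(axis=0)  # max-pool over positions
        print(sentence_vec)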

  • 34  

    neural network v5.1: recursive NNs

  • 35  

    neural network v6: dropout
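    A sketch of (inverted) dropout applied to a hidden layer; the keep probability of 0.8 is an illustrative choice:

        import numpy as np

        rng = np.random.default_rng(0)

        def dropout(h, p_keep=0.8, train=True):
            """Inverted dropout: zero each unit with prob 1 - p_keep, rescale at train time."""
            if not train:
                return h  # no-op at test time
            mask = rng.random(h.shape) < p_keep
            return h * mask / p_keep

        h = rng.standard_normal(6)
        print(dropout(h))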

  • Agenda

    • Big picture

    • Why deep learning?

    • Building blocks of a deep neural network

    • How to train deep neural networks

    • Important results

  • How to train NN models?
    • argmax_l f(l, d) only tells us which label to predict.
    • Supervised learning (need input/output pairs)
    • Loss function …
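    One standard concrete choice for the truncated "Loss function" bullet is softmax cross-entropy, sketched here; it is a common pick, not necessarily the loss the slides used:

        import numpy as np

        def cross_entropy(scores, gold):
            """Negative log-likelihood of the gold label under softmax(scores)."""
            scores = scores - scores.max()  # for numerical stability
            log_probs = scores - np.log(np.exp(scores).sum())
            return -log_probs[gold]

        print(cross_entropy(np.array([2.0, -1.0, 0.5]), gold=0))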

  • How to optimize …

  • How to optimize …

  • How to optimize …

  • How to optimize …

  • How to optimize …

  • How to optimize …

  • How to optimize …
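    The standard answer is stochastic gradient descent; below, a sketch of the update rule on a toy least-squares objective (the objective, data, and learning rate are illustrative stand-ins):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.standard_normal((100, 5))  # toy least-squares problem
        true_w = rng.standard_normal(5)
        y = X @ true_w

        w = np.zeros(5)
        lr = 0.05                               # illustrative learning rate
        for epoch in range(30):
            for i in rng.permutation(len(X)):   # stochastic: one example at a time
                grad = (w @ X[i] - y[i]) * X[i] # gradient of 0.5 * (w.x - y)^2
                w -= lr * grad                  # the SGD update: w <- w - lr * grad
        print(np.linalg.norm(w - true_w))       # should be near 0 after training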

  • Agenda

    • Big picture

    • Why deep learning?

    • Building blocks of a deep neural network

    • How to train deep neural networks

    • Important results

  • 46  

    Major results: language modeling

  • 47

    Major results: image classification

    Krizhevsky et al. (2012)

  • 48  

    Major results: ImageNet

    Krizhevsky et al. (2012): posi…

  • 49  

    Major results: ImageNet

    Krizhevsky et al. (2012): sample convolutional …

  • 50  

    Major results: speech recognition

  • 51  

    Major results: translation

  • 52  

    Major results: translation

  • 53  

    Major results: dependency parsing

    Chen and Manning (2014)

  • 54  

    Major results: dependency parsing

    Dyer et al. (2015)

  • Important things we didn’t cover

    • Dark knowledge
    • Connec…

  • Agenda

    • Big picture

    • Why deep learning?

    • Building blocks of a deep neural network

    • How to train deep neural networks

    • Important results

  • 57  

    sentiment …