CS388: Natural Language Processing Lecture 15: Attention Greg Durrett


Page 1: CS388: Natural Language Processing Lecture 15: Attention (gdurrett/courses/fa2018/lectures/lec15-1pp.pdf)

CS388: Natural Language Processing
Lecture 15: Attention

Greg Durrett

Page 2

This Lecture

‣ Graham Neubig (CMU) talk this Friday at 11am in 6.302. “Towards Open-domain Generation of Programs from Natural Language”

‣ Mini 2 graded by this weekend

‣ Project 2 out by the end of today; due *Friday* November 2

Page 3

Recall: Seq2seq Model

‣ Generate next word conditioned on previous word as well as hidden state

[Figure: encoder reads "the movie was great"; decoder starts from <s>]

$$P(y \mid x) = \prod_{i=1}^{n} P(y_i \mid x, y_1, \ldots, y_{i-1})$$

$$P(y_i \mid x, y_1, \ldots, y_{i-1}) = \mathrm{softmax}(W \bar{h})$$

‣ W size is |vocab| x |hidden state|, softmax over entire vocabulary

‣ Decoder has separate parameters from encoder, so this can learn to be a language model (produce a plausible next word given current one)
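The two formulas on this slide can be sketched in a few lines of NumPy. This is only an illustration under assumed toy sizes: the decoder states are random stand-ins for real RNN states, and `target_ids` is a hypothetical output sequence.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
vocab_size, hidden_size = 5, 3           # toy sizes (assumed)
W = rng.normal(size=(vocab_size, hidden_size))   # |vocab| x |hidden state|

# One decoder hidden state per output position (random stand-ins).
decoder_states = rng.normal(size=(4, hidden_size))
target_ids = [2, 0, 4, 1]                # hypothetical word ids y_1..y_n

# P(y_i | x, y_<i) = softmax(W h_bar_i): one distribution over the vocabulary per step.
step_probs = softmax(decoder_states @ W.T)       # shape (n, |vocab|)

# Chain rule: log P(y | x) = sum_i log P(y_i | x, y_<i).
log_p_y = sum(np.log(step_probs[i, y]) for i, y in enumerate(target_ids))
```

Working in log space turns the product over positions into a sum, which is the form the training objective uses.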

Page 4

Recall: Seq2seq Training

‣ Objective: maximize

$$\sum_{(x,y)} \sum_{i=1}^{n} \log P(y_i^* \mid x, y_1^*, \ldots, y_{i-1}^*)$$

[Figure: encoder reads "the movie was great"; decoder inputs "<s> le film était bon"; predicted outputs include "le", "était", "[STOP]"]

‣ One loss term for each target-sentence word; feed the correct word regardless of model's prediction
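A minimal sketch of this objective for a single sentence pair, assuming a toy tanh RNN decoder rather than an LSTM. The names `E`, `U`, `V`, `h0`, and `start_id` are hypothetical and not from the slides. The key line is `prev = gold`: the gold word is fed back at each step whatever the model predicted.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def teacher_forced_log_likelihood(gold_ids, h0, E, U, V, W, start_id=0):
    """Sum of log P(y*_i | x, y*_<i) for one target sentence.

    h0 stands in for the encoder's final state (it carries x); E holds word
    embeddings; U and V are toy RNN weights; W is the output layer.
    """
    h, prev, total = h0, start_id, 0.0
    for gold in gold_ids:
        h = np.tanh(U @ h + V @ E[prev])   # toy RNN step, not an LSTM
        probs = softmax(W @ h)
        total += np.log(probs[gold])       # one loss term per target word
        prev = gold                        # feed the gold word, not argmax(probs)
    return total

rng = np.random.default_rng(0)
vocab_size, d = 6, 4                       # toy sizes; id 0 = <s>, id 5 = [STOP] (assumed)
E = rng.normal(size=(vocab_size, d))
U = rng.normal(size=(d, d))
V = rng.normal(size=(d, d))
W = rng.normal(size=(vocab_size, d))
h0 = rng.normal(size=d)

gold = [1, 2, 3, 4, 5]                     # stands for: le film était bon [STOP]
ll = teacher_forced_log_likelihood(gold, h0, E, U, V, W)
```

Training would sum this quantity over all (x, y) pairs and maximize it with a gradient-based optimizer, which is omitted here.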

Page 5

Recall: Semantic Parsing as Translation

‣ Write down a linearized form of the semantic parse, train seq2seq models to directly translate into this representation

“what states border Texas”

lambda x ( state ( x ) and border ( x , e89 ) ) )

‣ No need to have an explicit grammar, simplifies algorithms

‣ Might not produce well-formed logical forms, might require lots of data

Jia and Liang (2016)
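Because the decoder emits the logical form as a flat token sequence with no grammar, nothing forces its output to be well formed. As a rough illustration, the check below tests only parenthesis balance, which is one necessary condition. The function name and example strings are hypothetical.

```python
def is_well_formed(tokens):
    """Return True if the parentheses in a linearized logical form are balanced."""
    depth = 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:          # closed a parenthesis that was never opened
                return False
    return depth == 0              # every opened parenthesis was closed

good = "( state ( x ) and border ( x , e89 ) )".split()
missing_close = "( state ( x ) and border ( x , e89 )".split()
extra_close = "state ( x ) )".split()
```

A fuller check would also validate predicate names and arity, which a grammar-based parser gets for free.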

Page 6

Regex Prediction

‣ Can use for other semantic parsing-like tasks

‣ Predict regex from text

‣ Problem: requires a lot of data: 10,000 examples needed to get ~60% accuracy on pretty simple regexes

Locascio et al. (2016)

Page 7

SQL Generation

‣ Convert natural language description into a SQL query against some DB

‣ How to ensure that well-formed SQL is generated?

‣ Three seq2seq models

‣ How to capture column names + constants?

‣ Pointer mechanisms

Zhong et al. (2017)

Page 8

This Lecture

‣ Attention

‣ Copying

‣ Transformers

Page 9

Attention

Page 10

Problems with Seq2seq Models

‣ Encoder-decoder models like to repeat themselves:

Un garçon joue dans la neige -> A boy plays in the snow boy plays boy plays

‣ Often a byproduct of training these models poorly

‣ Need some notion of input coverage or what input words we've translated

Page 11

Problems with Seq2seq Models

‣ Unknown words:

‣ No matter how much data you have, you'll need some mechanism to copy a word like Pont-de-Buis from the source to target

Page 12

Problems with Seq2seq Models

‣ Bad at long sentences: 1) a fixed-size representation doesn't scale; 2) LSTMs still have a hard time remembering for really long periods of time

‣ RNNsearch: introduces attention mechanism to give “variable-sized” representation

Bahdanau et al. (2014)

Page 13

Aligned Inputs

‣ Suppose we knew the source and target would be word-by-word translated

‣ Can look at the corresponding input word when translating; this could scale!

[Figure: "the movie was great" aligned word by word with "le film était bon"; decoder inputs "<s> le film était bon", outputs "le film était bon [STOP]"]

‣ Much less burden on the hidden state

‣ How can we achieve this without hardcoding it?

Page 14

Attention

‣ At each decoder state, compute a distribution over source inputs based on current decoder state

[Figure: encoder over "the movie was great"; decoder at <s> producing "le"; at each decoder step, a distribution over the source words "the movie was great"]

‣ Use that in output layer

Page 15

Attention

‣ For each decoder state, compute weighted sum of input states

[Figure: encoder states h1 h2 h3 h4 over "the movie was great"; decoder state \bar{h}_1 at <s>; context vector c1; output "le"]

‣ Unnormalized scalar weight

$$e_{ij} = f(\bar{h}_i, h_j)$$

‣ Normalized scalar weight

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{j'} \exp(e_{ij'})}$$

‣ Weighted sum of input hidden states (vector)

$$c_i = \sum_j \alpha_{ij} h_j$$

‣ No attn:

$$P(y_i \mid x, y_1, \ldots, y_{i-1}) = \mathrm{softmax}(W \bar{h}_i)$$

‣ With attention:

$$P(y_i \mid x, y_1, \ldots, y_{i-1}) = \mathrm{softmax}(W [c_i; \bar{h}_i])$$
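The three quantities on this slide (scores, normalized weights, context vector) can be traced for one decoder step as below. This is a sketch under assumptions: the encoder and decoder states are random stand-ins, the dot product is used as the scoring function f, and the sizes are arbitrary.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def attend(h_bar_i, H, score=np.dot):
    """One attention step.

    h_bar_i: decoder state, shape (d,)
    H:       encoder states h_1..h_m stacked as rows, shape (m, d)
    score:   f(h_bar_i, h_j) -> scalar; dot product assumed by default
    """
    e = np.array([score(h_bar_i, h_j) for h_j in H])   # e_ij, unnormalized
    alpha = softmax(e)                                 # alpha_ij, sums to 1 over j
    c = alpha @ H                                      # c_i = sum_j alpha_ij h_j
    return alpha, c

rng = np.random.default_rng(0)
d, m, vocab_size = 4, 4, 6               # toy sizes (assumed)
H = rng.normal(size=(m, d))              # stand-ins for h_1..h_4
h_bar_1 = rng.normal(size=d)             # stand-in for the decoder state at <s>
alpha, c_1 = attend(h_bar_1, H)

# Output layer with attention: softmax(W [c_i; h_bar_i]).
W = rng.normal(size=(vocab_size, 2 * d))
p_next = softmax(W @ np.concatenate([c_1, h_bar_1]))
```

Note that W now has 2d columns, because the output layer sees the context vector concatenated with the decoder state.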

Page 16

Attention

[Figure: decoder state \bar{h}_1 at <s>; context vector c1; output "le"]

$$e_{ij} = f(\bar{h}_i, h_j)$$

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{j'} \exp(e_{ij'})}$$

$$c_i = \sum_j \alpha_{ij} h_j$$

‣ Bahdanau+ (2014): additive

$$f(\bar{h}_i, h_j) = \tanh(W [\bar{h}_i, h_j])$$

‣ Luong+ (2015): dot product

$$f(\bar{h}_i, h_j) = \bar{h}_i \cdot h_j$$

‣ Luong+ (2015): bilinear

$$f(\bar{h}_i, h_j) = \bar{h}_i^\top W h_j$$

‣ Note that this all uses outputs of hidden layers

Luong et al. (2015)
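The three scoring functions could be written as follows. This is a sketch, not the papers' exact parameterizations: for the additive form, W is assumed to have a single row so that the tanh output is a scalar, and all weights are random placeholders.

```python
import numpy as np

def score_additive(h_bar, h, W_add):
    """tanh(W [h_bar, h]); W_add has shape (1, 2d) here so the result is a scalar."""
    return np.tanh(W_add @ np.concatenate([h_bar, h])).item()

def score_dot(h_bar, h):
    """h_bar . h; needs encoder and decoder states of the same size."""
    return float(h_bar @ h)

def score_bilinear(h_bar, h, W_bi):
    """h_bar^T W h; W_bi lets the two state spaces differ."""
    return float(h_bar @ W_bi @ h)

rng = np.random.default_rng(0)
d = 3                                    # toy state size (assumed)
h_bar, h = rng.normal(size=d), rng.normal(size=d)
W_add = rng.normal(size=(1, 2 * d))
W_bi = rng.normal(size=(d, d))

scores = {
    "additive": score_additive(h_bar, h, W_add),
    "dot": score_dot(h_bar, h),
    "bilinear": score_bilinear(h_bar, h, W_bi),
}
```

The dot product has no parameters, and the bilinear form reduces to it when W is the identity.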

Page 17

What can attention do?

‣ Learning to copy: how might this work?

[Figure: input sequence 0 3 2 1 copied to output sequence 0 3 2 1]

‣ LSTM can learn to count with the right weight matrix

‣ This is effectively position-based addressing

Luong et al. (2015)

Page 18

What can attention do?

‣ Learning to subsample tokens

[Figure: input sequence 0 3 2 1, output sequence 3 1]

‣ Need to count (for ordering) and also determine which tokens are in/out

‣ Content-based addressing

Luong et al. (2015)

Page 19

Attention

‣ Decoder hidden states are now mostly responsible for selecting what to attend to

‣ Doesn't take a complex hidden state to walk monotonically through a sentence and spit out word-by-word translations

‣ Encoder hidden states capture contextual source word identity

Page 20

Batching Attention

[Figure: encoder over "the movie was great"; decoder step at <s>]

token outputs: batch size x sentence length x dimension

sentence outputs: batch size x hidden size

hidden state: batch size x dimension

attention scores = batch size x sentence length

$$e_{ij} = f(\bar{h}_i, h_j)$$

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{j'} \exp(e_{ij'})}$$

c = batch size x hidden size

$$c_i = \sum_j \alpha_{ij} h_j$$

‣ Make sure tensors are the right size!

Luong et al. (2015)
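The shape bookkeeping on this slide can be checked with a short batched sketch. It assumes dot-product scoring, equal encoder and decoder dimensions, and no padding mask, since real batches of unequal-length sentences would need one. The function name is hypothetical.

```python
import numpy as np

def batched_dot_attention(enc_outputs, dec_state):
    """Attention for one decoder step over a whole batch.

    enc_outputs: token outputs, shape (batch, src_len, dim)
    dec_state:   decoder hidden state, shape (batch, dim)
    """
    # e_ij for every batch element and source position: (batch, src_len)
    scores = np.einsum("bsd,bd->bs", enc_outputs, dec_state)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # c_i = sum_j alpha_ij h_j: (batch, dim)
    context = np.einsum("bs,bsd->bd", weights, enc_outputs)
    return weights, context

rng = np.random.default_rng(0)
batch, src_len, dim = 2, 5, 3            # toy sizes (assumed)
enc_outputs = rng.normal(size=(batch, src_len, dim))
dec_state = rng.normal(size=(batch, dim))
weights, context = batched_dot_attention(enc_outputs, dec_state)
```

Writing the index strings out in `einsum` makes each tensor's expected shape explicit, which helps catch silent broadcasting mistakes.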

Page 21

Alternatives

‣ When do we compute attention? Can compute before or after RNN cell

[Figure: two decoder diagrams at <s> producing "le"; in one the context c1 is computed from \bar{h}_1 after the RNN cell, in the other c1 is fed into the RNN cell]

‣ After RNN cell: Luong et al. (2015)

‣ Before RNN cell; this one is a little more convoluted and less standard: Bahdanau et al. (2015)

Page 22

Results

‣ Machine translation: BLEU score of 14.0 on English-German -> 16.8 with attention, 19.0 with smarter attention (we'll come back to this later)

‣ Summarization/headline generation: bigram recall from 11% -> 15%

‣ Semantic parsing: ~30% accuracy -> 70+% accuracy on Geoquery

Luong et al. (2015), Chopra et al. (2016), Jia and Liang (2016)

Page 23

Copying Input/Pointers

Page 24

Unknown Words

‣ Want to be able to copy named entities like Pont-de-Buis

$$P(y_i \mid x, y_1, \ldots, y_{i-1}) = \mathrm{softmax}(W [c_i; \bar{h}_i])$$

(c_i: from attention; \bar{h}_i: from RNN hidden state)

‣ Still can only generate from the vocabulary

Jean et al. (2015), Luong et al. (2015)

Page 25

Copying

[Figure: output space made of the "normal" vocabulary {the, a, zebra, ...} plus the input words {Pont-de-Buis, ecotax}]

‣ Vocabulary contains “normal” vocab as well as words in input. Normalizes over both of these:

$$P(y_i = w \mid x, y_1, \ldots, y_{i-1}) \propto \begin{cases} \exp\left(W_w [c_i; \bar{h}_i]\right) & \text{if } w \text{ in vocab} \\ \exp\left(h_j^\top V \bar{h}_i\right) & \text{if } w = x_j \end{cases}$$

‣ Bilinear function of input representation + output hidden state
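One way to read this slide is as a single softmax over the concatenation of vocabulary scores and per-input-word copy scores. The sketch below follows that reading with random placeholder weights. Summing the mass of a word that is both in the vocabulary and in the input is an assumption of this sketch, as are the function and variable names.

```python
import numpy as np

def copy_distribution(c_i, h_bar_i, H, W, V, vocab, source_words):
    """Joint distribution over 'normal' vocab words and words in the input.

    H: encoder states, one row per source word, shape (m, d)
    W: output layer over the vocab, shape (|vocab|, 2d)
    V: bilinear copy weights, shape (d, d)
    """
    gen_scores = W @ np.concatenate([c_i, h_bar_i])   # W_w [c_i; h_bar_i], one per vocab word
    copy_scores = H @ V @ h_bar_i                     # h_j^T V h_bar_i, one per source word
    scores = np.concatenate([gen_scores, copy_scores])
    scores = scores - scores.max()                    # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()     # normalize over both parts
    dist = {}
    for word, p in zip(list(vocab) + list(source_words), probs):
        dist[word] = dist.get(word, 0.0) + p          # merge duplicates (assumption)
    return dist

rng = np.random.default_rng(0)
d = 4                                    # toy state size (assumed)
vocab = ["the", "a", "zebra"]
source_words = ["Pont-de-Buis", "ecotax"]
H = rng.normal(size=(len(source_words), d))
h_bar = rng.normal(size=d)
c = H.mean(axis=0)                       # placeholder context; normally from attention
W = rng.normal(size=(len(vocab), 2 * d))
V = rng.normal(size=(d, d))
dist = copy_distribution(c, h_bar, H, W, V, vocab, source_words)
```

With this output space, a rare word such as Pont-de-Buis can receive probability even though it has no row in W.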

Page 26

Pointer Networks

‣ Only point to the input, don't have any notion of vocabulary

‣ Used for tasks including summarization and sentence ordering

Vinyals et al. (2015)
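A pointer step can be sketched as a softmax over input positions only, so the size of the output space follows the input length. The bilinear scoring used here is an assumption carried over from the copying formulation and is not necessarily the scoring used by Vinyals et al. All names and sizes are hypothetical.

```python
import numpy as np

def pointer_distribution(h_bar_i, H, V):
    """Softmax over input positions only; the output space has size len(H)."""
    scores = H @ V @ h_bar_i             # one score per input position
    scores = scores - scores.max()       # numerical stability
    e = np.exp(scores)
    return e / e.sum()

rng = np.random.default_rng(0)
d = 4                                    # toy state size (assumed)
V = rng.normal(size=(d, d))
h_bar = rng.normal(size=d)

short_input = rng.normal(size=(3, d))    # encoder states for a 3-item input
long_input = rng.normal(size=(7, d))     # encoder states for a 7-item input
p_short = pointer_distribution(h_bar, short_input, V)
p_long = pointer_distribution(h_bar, long_input, V)
pointed_position = int(np.argmax(p_long))
```

The same parameters handle both inputs, which is what makes pointing suitable for tasks such as ordering, where the answer is a selection of input items.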

Page 27

Results

‣ For semantic parsing, copying tokens from the input (texas) can be very useful

‣ In many settings, attention can roughly do the same things as copying

Jia and Liang (2016)

Page 28

Transformers

Page 29

Self-Attention

[Figure: the sentence "the movie was great"]

‣ LSTM abstraction: maps each vector in a sentence to a new, context-aware vector

‣ CNNs did something similar with filters

‣ Attention can give us a third way to do this

Vaswani et al. (2017)

Page 30

Self-Attention

[Figure: over "the movie was great", the vector x_4 attends to every word to produce x'_4]

‣ Each word forms a “query” which then computes attention over each word

$$\alpha_{i,j} = \mathrm{softmax}(x_i^\top x_j) \quad \text{(scalar)}$$

$$x'_i = \sum_{j=1}^{n} \alpha_{i,j} x_j \quad \text{(vector = sum of scalar * vector)}$$

‣ Multiple “heads” analogous to different convolutional filters. Use parameters W_k and V_k to get different attention values + transform vectors

$$\alpha_{k,i,j} = \mathrm{softmax}(x_i^\top W_k x_j)$$

$$x'_{k,i} = \sum_{j=1}^{n} \alpha_{k,i,j} V_k x_j$$

Vaswani et al. (2017)
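The single-head and multi-head formulas can be sketched in matrix form as below. This follows the slide's simplified equations, not the full Transformer layer: there is no scaling by the square root of the dimension and no separate query and key projections. Concatenating the heads is an assumption, since the slide does not say how they are combined.

```python
import numpy as np

def softmax_rows(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X):
    """X: (n, d). alpha[i, j] = softmax_j(x_i^T x_j); x'_i = sum_j alpha[i, j] x_j."""
    alpha = softmax_rows(X @ X.T)        # (n, n); row i is word i's attention
    return alpha @ X                     # (n, d)

def multi_head_self_attention(X, Ws, Vs):
    """Head k uses W_k for the attention values and V_k to transform vectors."""
    heads = []
    for W_k, V_k in zip(Ws, Vs):
        alpha_k = softmax_rows(X @ W_k @ X.T)    # alpha_{k,i,j}
        heads.append(alpha_k @ (X @ V_k.T))      # row i is sum_j alpha_{k,i,j} V_k x_j
    return np.concatenate(heads, axis=-1)        # combining by concatenation (assumption)

rng = np.random.default_rng(0)
n, d, d_v, num_heads = 4, 3, 2, 2        # 4 words, e.g. "the movie was great"; toy sizes
X = rng.normal(size=(n, d))
Ws = [rng.normal(size=(d, d)) for _ in range(num_heads)]
Vs = [rng.normal(size=(d_v, d)) for _ in range(num_heads)]

single = self_attention(X)
multi = multi_head_self_attention(X, Ws, Vs)
```

With one head and identity matrices for W_k and V_k, the multi-head version reduces to the single-head formula.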

Page 31

Deep Transformers

‣ Supervised: transformer can replace LSTM; will revisit this when we discuss MT

‣ Unsupervised: transformers work better than LSTM for unsupervised pre-training of embeddings: predict word given context words

‣ Devlin et al. October 11, 2018 “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”

‣ Stronger than similar methods, SOTA on ~11 tasks (including NER: 92.8 F1)

Page 32

Takeaways

‣ Attention is very helpful for seq2seq models

‣ Used for tasks including summarization and sentence ordering

‣ Explicitly copying input can be beneficial as well

‣ Transformers are strong models we'll come back to later

Page 33

Where are we going

‣ We've now talked about most of the important core tools for NLP

‣ Rest of the class: more focused on applications

‣ Information extraction, then MT, then a grab bag of things