
Question Answering and Machine Comprehension with Neural Attention

Minjoon Seo
PhD Student, Computer Science & Engineering, University of Washington

Two End-to-End Question Answering Systems with Neural Attention

• Bidirectional Attention Flow (BiDAF)
  • On the Stanford Question Answering Dataset and the CNN/DailyMail Cloze Test

• Query-Reduction Networks (QRN)
  • On bAbI QA, bAbI dialog, and DSTC2 datasets

Two Question Answering Systems with Neural Attention

• Bidirectional Attention Flow (BiDAF)
  • On the Stanford Question Answering Dataset and the CNN/DailyMail Cloze Test

• Query-Reduction Networks (QRN)
  • On bAbI QA and dialog datasets

Question Answering Task (Stanford Question Answering Dataset, 2016)

Q: Which NFL team represented the AFC at Super Bowl 50?
A: Denver Broncos

Why Neural Attention?

Q: Which NFL team represented the AFC at Super Bowl 50?

Attention allows a deep learning architecture to focus on the phrase of the context that is most relevant to the query, in a differentiable manner.
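To make "differentiable" concrete, here is a minimal dot-product attention sketch in NumPy (an illustrative toy, not BiDAF's similarity function, which is described later): relevance scores become a softmax distribution over context positions, so the focus itself can be trained by gradient descent.

```python
# Toy soft attention: differentiable focus over context positions (illustrative only).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(context, query):
    """context: (T, d) context vectors; query: (d,) query vector."""
    scores = context @ query            # relevance score per context position
    weights = softmax(scores)           # differentiable "focus" over positions
    return weights @ context, weights   # weighted summary of the context

context = np.random.randn(5, 8)         # T=5 context words, d=8
query = np.random.randn(8)
summary, weights = attend(context, query)
print(weights)                           # peaks at the most query-relevant position
```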

Our Model: Bi-directional Attention Flow (BiDAF)

[Overview diagram: the context "Barack Obama is the president of the U.S." and the query "Who leads the United States?" pass through attention and modeling layers; an MLP + softmax predicts the answer-span indices, here start index 0 and end index 1, i.e. "Barack Obama".]

(Bidirectional) Attention Flow

[BiDAF architecture diagram. From bottom to top: Character Embed Layer (Char-CNN) and Word Embed Layer (GloVe) over context words x1 … xT and query words q1 … qJ; Phrase Embed Layer (LSTMs) producing h1 … hT and u1 … uJ; Attention Flow Layer with Context2Query (softmax) and Query2Context (max + softmax) attention over h and u, producing g1 … gT; Modeling Layer (LSTMs) producing m1 … mT; Output Layer (Dense + softmax for Start, LSTM + softmax for End).]

Char/Word Embedding Layers

[BiDAF architecture diagram repeated, highlighting the Character and Word Embed Layers (Char-CNN and GloVe).]

Character and Word Embedding

• Word embedding is fragile against unseen words
• Character embedding can't easily learn the semantics of words
• Use both!

• Character embedding as proposed by Kim (2015), sketched below

[Diagram: the characters of "Seattle" pass through a CNN and max pooling; the result is concatenated with the word embedding of "Seattle" to form the final embedding vector.]
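A minimal NumPy sketch of the character-CNN path described above, under assumed sizes (16-d character vectors, 50 filters of width 3, and a 100-d word vector standing in for GloVe); the lookup tables here are random stand-ins:

```python
# Char-CNN embedding sketch: convolve over character vectors, max-pool, concat with word vector.
import numpy as np

rng = np.random.default_rng(0)
char_emb = {c: rng.standard_normal(16) for c in "abcdefghijklmnopqrstuvwxyz"}
word_emb = {"seattle": rng.standard_normal(100)}            # stand-in for a GloVe vector
filters = rng.standard_normal((50, 3, 16))                  # 50 filters of width 3

def char_cnn(word):
    chars = np.stack([char_emb[c] for c in word.lower()])   # (len, 16)
    windows = np.stack([chars[i:i + 3] for i in range(len(chars) - 2)])  # (len-2, 3, 16)
    feats = np.einsum("wkc,fkc->wf", windows, filters)       # convolution responses
    return feats.max(axis=0)                                  # max-pool over positions -> (50,)

def embed(word):
    # The character path still produces a useful vector for words missing from word_emb.
    return np.concatenate([word_emb[word.lower()], char_cnn(word)])

print(embed("Seattle").shape)    # (150,)
```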

Phrase Embedding Layer

[BiDAF architecture diagram repeated, highlighting the Phrase Embed Layer.]

Phrase Embedding Layer

• Inputs: the char/word embeddings of the query and context words
• Outputs: word representations aware of their neighbors (phrase-aware words)

• Apply a bidirectional RNN (LSTM) to both the query and the context (see the sketch below)
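An illustrative sketch of this layer, assuming PyTorch and placeholder sizes: a single bidirectional LSTM applied to both the context and the query embeddings, so each output vector reflects its neighbors.

```python
# Phrase embedding sketch: a shared bidirectional LSTM over context and query word vectors.
import torch
import torch.nn as nn

d_in, d_hidden = 150, 100
phrase_lstm = nn.LSTM(d_in, d_hidden, bidirectional=True, batch_first=True)

context_words = torch.randn(1, 20, d_in)   # (batch, T, d): char+word embeddings of the context
query_words = torch.randn(1, 6, d_in)      # (batch, J, d): char+word embeddings of the query

H, _ = phrase_lstm(context_words)          # (1, T, 2*d_hidden) phrase-aware context vectors
U, _ = phrase_lstm(query_words)            # (1, J, 2*d_hidden) phrase-aware query vectors
print(H.shape, U.shape)
```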


Attention Layer

[BiDAF architecture diagram repeated, highlighting the Attention Flow Layer (Context2Query and Query2Context attention).]

Attention Layer

• Inputs: the phrase-aware context and query words
• Outputs: query-aware representations of the context words

• Context-to-query attention: for each (phrase-aware) context word, choose the most relevant word among the (phrase-aware) query words
• Query-to-context attention: choose the context words that are most relevant to any of the query words (both directions are sketched below)
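A minimal NumPy sketch of both directions, using a plain dot product for the similarity matrix (BiDAF's actual similarity is a trainable function of the context vector, the query vector, and their elementwise product); the final concatenation follows the query-aware context representation used in the paper.

```python
# C2Q and Q2C attention sketch over phrase-aware context H and query U.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

H = np.random.randn(20, 200)     # phrase-aware context, T x d
U = np.random.randn(6, 200)      # phrase-aware query,   J x d

S = H @ U.T                      # similarity matrix, T x J (dot product for illustration)

# Context-to-query: for each context word, a distribution over query words.
a = softmax(S, axis=1)           # T x J
U_att = a @ U                    # T x d: attended query vector per context word

# Query-to-context: attend to the context words most relevant to any query word.
b = softmax(S.max(axis=1))       # T
h_att = b @ H                    # d: single attended context vector
H_att = np.tile(h_att, (H.shape[0], 1))   # broadcast over context positions

G = np.concatenate([H, U_att, H * U_att, H * H_att], axis=1)   # query-aware context words
print(G.shape)                   # (20, 800)
```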


Context-to-Query Attention (C2Q)

Q: Who leads the United States?
C: Barack Obama is the president of the USA.

For each context word, find the most relevant query word.

Query-to-Context Attention (Q2C)

C: While Seattle's weather is very nice in summer, its weather is very rainy in winter, making it one of the most gloomy cities in the U.S. LA is …

Q: Which city is gloomy in winter?

Modeling Layer

[BiDAF architecture diagram repeated, highlighting the Modeling Layer.]

Modeling Layer

• Attention layer: models interactions between the query and the context
• Modeling layer: models interactions among the (query-aware) context words via an RNN (LSTM)

• Division of labor: the attention and modeling layers each focus solely on their own task
• We show experimentally that this leads to better results than intermixing attention and modeling

Output Layer

[BiDAF architecture diagram repeated, highlighting the Output Layer (Dense + softmax for the start index, LSTM + softmax for the end index).]

Training

• Minimize the negative log probabilities of the true start index and the true end index, averaged over all examples:

  L(θ) = −(1/N) Σᵢ [ log p¹(yᵢ¹) + log p²(yᵢ²) ]

  yᵢ¹ = true start index of example i
  yᵢ² = true end index of example i
  p¹ = probability distribution over the start index
  p² = probability distribution over the end index
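A sketch of this objective assuming PyTorch, with random placeholder logits standing in for the output layer's start/end scores:

```python
# Span loss sketch: cross-entropy on predicted start/end distributions vs. true indices.
import torch
import torch.nn.functional as F

T = 20                                    # context length
start_logits = torch.randn(4, T)          # placeholder start scores for a batch of 4 examples
end_logits = torch.randn(4, T)            # placeholder end scores
y_start = torch.tensor([0, 3, 7, 2])      # true start indices y_i^1
y_end = torch.tensor([1, 5, 7, 4])        # true end indices   y_i^2

# cross_entropy = softmax + negative log likelihood of the true index, averaged over the batch
loss = F.cross_entropy(start_logits, y_start) + F.cross_entropy(end_logits, y_end)
print(loss.item())
```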

Previous Work

• Using neural attention as a controller (Xiong et al., 2016)
• Using neural attention within an RNN (Wang & Jiang, 2016)
• Most of these attention mechanisms are uni-directional

• BiDAF (our model)
  • uses neural attention as a layer,
  • is separated from the modeling part (RNN),
  • and is bidirectional

Image Classifier and BiDAF

VGG-16 | BiDAF (ours)

[Side-by-side comparison of the VGG-16 image-classification architecture and the BiDAF architecture (repeated from above).]

Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016)

• The most popular articles from Wikipedia
• Questions and answers from Turkers
• 90k train, 10k dev, ? test (hidden)
• The answer must lie in the context
• Two metrics: Exact Match (EM) and F1

SQuAD Results (http://stanford-qa.com), as of 12pm today

SQuAD Results

System                                   EM     F1
Stanford [1] (baseline)                  40.4   51.0
IBM [2]                                  62.5   71.0
CMU [3]                                  62.5   73.3
Singapore Management [4] (ensemble)      67.9   77.0
IBM Research (ensemble)                  68.2   77.2
Salesforce Research [6] (ensemble)       71.6   80.4
Microsoft Research Asia (ensemble)       72.1   79.7
Ours (ensemble)                          73.3   81.1

[1] Rajpurkar et al. (2016)  [2] Yu et al. (2016)  [3] Yang et al. (2016)  [4] Wang & Jiang (2016)  [6] Xiong et al. (2016)

Ablations on dev data

[Bar chart of EM and F1 on the dev set for: No Char Embedding, No Word Embedding, No C2Q Attention, No Q2C Attention, Dynamic Attention, and the Full Model (y-axis 50–80).]

Interactive Demo

http://allenai.github.io/bi-att-flow/demo

Attention Visualizations

[Attention map for Q: "How many natural reserves are there in Warsaw?" over the passage: "There are 13 natural reserves in Warsaw – among others, Bielany Forest, Kabaty Woods, Czerniaków Lake. About 15 kilometres (9 miles) from Warsaw, the Vistula river's environment changes strikingly and features a perfectly preserved ecosystem, with a habitat of animals that includes the otter, beaver and hundreds of bird species. There are also several lakes in Warsaw – mainly the oxbow lakes, like Czerniaków Lake, the lakes in the Łazienki or Wilanów Parks, Kamionek Lake. There are lot of small lakes in the parks, but only a few are permanent – the majority are emptied before winter to clean them of plants and sediments." The most-attended context words per question word include "hundreds, few, among, 15, several, only, 13, 9", "natural, of", "reserves", "are, are, are, are, are, includes", "Warsaw, Warsaw, Warsaw", and "winter, species".]

[Attention map for Q: "Where did Super Bowl 50 take place?" over the passage: "Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50." The most-attended context words per question word include "at, the, at, Stadium, Levi, in, Santa, Ana", "Super, Super, Super, Super, Super", "Bowl, Bowl, Bowl, Bowl, Bowl", "50", and "initiatives".]

Embedding Visualization at Word vs Phrase Layers

[Visualization of embeddings of "May"/"may" and other months (January, September, August, July). At the word layer the month "May" and the modal "may" share one embedding; at the phrase layer, contexts such as "effect and may result in", "the state may not aid", "of these may be more" separate from "Opening in May 1852 at", "debut on May 5 ,", "from 28 January to 25", and "but by September had been".]

How does it compare with feature-based models?

CNN/DailyMail Cloze Test (Hermann et al., 2015)

• Cloze test (predicting missing words)
• Articles from CNN/DailyMail
• Human-written summaries
• Missing words are always entities
• CNN: 300k article–query pairs
• DailyMail: 1M article–query pairs

CNN/DailyMail Cloze Test Results

Some limitations of SQuAD

Two Question Answering Systems with Neural Attention

• Bidirectional Attention Flow (BiDAF)
  • On the Stanford Question Answering Dataset and the CNN/DailyMail Cloze Test

• Query-Reduction Networks (QRN)
  • On bAbI QA and dialog datasets

Reasoning Question Answering

Dialog System

U: Can you book a table in Rome in Italian cuisine?
S: How many people in your party?
U: For four people please.
S: What price range are you looking for?

Dialog Task vs QA

• A dialog system can be viewed as a QA system:
  • The last user utterance is the query
  • All previous conversation is the context for the query
  • The system's next response is the answer to the query

• It also poses a few unique challenges:
  • A dialog system requires state tracking
  • A dialog system needs to look at multiple sentences in the conversation
  • Building an end-to-end dialog system is more challenging

Our Approach: Query-Reduction

<START> Sandra got the apple there. Sandra dropped the apple. Daniel took the apple there. Sandra went to the hallway. Daniel journeyed to the garden.

Q: Where is the apple?

Reduced query: Where is the apple? → Where is Sandra? → Where is Sandra? → Where is Daniel? → Where is Daniel? → Where is Daniel? → garden

A: garden

Query-Reduction Networks

• Reduce the query into an easier-to-answer query over the sequence of state-changing triggers (sentences), in vector space

[QRN unrolled over the story "Sandra got the apple there." / "Sandra dropped the apple." / "Daniel took the apple there." / "Sandra went to the hallway." / "Daniel journeyed to the garden.", with input query "Where is the apple?". The hidden state at each step holds the reduced query ("Where is Sandra?", "Where is Sandra?", "Where is Daniel?", "Where is Daniel?", "Where is Daniel?"); intermediate outputs are ∅ and the final state is decoded to the answer "garden".]

QRN Cell

[Cell diagram: the sentence vector x_t and query vector q_t enter an update function α, producing the update gate z_t, and a reduction function ρ, producing the candidate reduced query h̃_t; the new reduced query (hidden state) is h_t = z_t × h̃_t + (1 − z_t) × h_{t−1}.]
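A hedged NumPy sketch of one QRN layer run over a story; the exact parameterizations of the update function α and reduction function ρ here are simplified assumptions, but the gating recurrence matches the cell above.

```python
# QRN cell sketch: a scalar update gate decides how much each sentence rewrites the reduced query.
import numpy as np

rng = np.random.default_rng(0)
d = 8
W_alpha = rng.standard_normal((1, 2 * d))    # assumed parameters of the update function alpha
W_rho = rng.standard_normal((d, 2 * d))      # assumed parameters of the reduction function rho

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def qrn_cell(x_t, q_t, h_prev):
    xq = np.concatenate([x_t, q_t])
    z_t = sigmoid(W_alpha @ xq)              # update gate (local attention), scalar in [0, 1]
    h_cand = np.tanh(W_rho @ xq)             # candidate reduced query
    return z_t * h_cand + (1.0 - z_t) * h_prev

query = rng.standard_normal(d)               # e.g. "Where is the apple?" in vector space
h = np.zeros(d)                              # initial reduced query
for x_t in rng.standard_normal((5, d)):      # one vector per story sentence
    h = qrn_cell(x_t, query, h)              # the query is progressively reduced
print(h)                                     # final reduced query, decoded to the answer
```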

Characteristics of QRN

• The update gate can be considered local attention
  • The QRN chooses to consider or ignore each candidate reduced query
  • The decision is made locally (as opposed to global softmax attention)

• A subclass of recurrent neural networks (RNNs)
  • Two inputs, a hidden state, and a gating mechanism
  • Able to handle sequential dependency (attention cannot)

• The simpler recurrent update enables parallelization over time
  • The candidate hidden state (reduced query) is computed from the inputs only
  • The hidden state can be explicitly computed as a function of the inputs

Parallelization

• The candidate reduced queries are computed from the inputs only, so they can be computed in parallel
• The hidden state can be explicitly expressed as a geometric sum of the previous candidate hidden states (reconstructed below)
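A reconstruction of the closed form behind this claim, assuming the gating recurrence above with h_0 = 0 and a scalar gate z_i:

```latex
h_t \;=\; \sum_{i=1}^{t} \Big( \prod_{j=i+1}^{t} (1 - z_j) \Big)\, z_i \, \tilde{h}_i
```

Because each gate z_i and candidate reduced query h̃_i depends only on the inputs (x_i, q), every term can be computed independently and combined with cumulative products and sums, which is what allows parallelization over time.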


Characteristics of QRN

• The update gate can be considered local attention
• A subclass of recurrent neural networks (RNNs)
• The simpler recurrent update enables parallelization over time

QRN sits between neural attention mechanisms and recurrent neural networks, taking advantage of both paradigms.

bAbI QA Dataset

• 20 different tasks
• 1k story–question pairs for each task (10k also available)
• Synthetically generated
• Many questions require looking at multiple sentences
• For end-to-end systems supervised by answers only

What's different from SQuAD?

• Synthetic
• Requires more than lexical/syntactic understanding
• Different kinds of inference: induction, deduction, counting, path finding, etc.
• Reasoning over multiple sentences
• An interesting testbed towards developing complex QA (and dialog) systems

bAbI QA Results (1k)

[Bar chart of average error (%) for LSTM, DMN+, MemN2N, GMemN2N, and QRN (Ours); y-axis 0–60.]

bAbI QA Results (10k)

[Bar chart of average error (%) for MemN2N, DNC, GMemN2N, DMN+, and QRN (Ours); y-axis 0–4.5.]

Dialog Datasets

• bAbI Dialog Dataset
  • Synthetic
  • 5 different tasks
  • 1k dialogs for each task

• DSTC2* Dataset
  • Real data
  • Evaluation metric differs from the original DSTC2: response generation instead of state tracking
  • Each dialog is 800+ utterances
  • 2407 possible responses

bAbI Dialog Results (OOV)

[Bar chart of average error (%) for MemN2N, GMemN2N, and QRN (Ours); y-axis 0–35.]

DSTC2* Dialog Results

[Bar chart of average error (%) for MemN2N, GMemN2N, and QRN (Ours); y-axis 0–70.]

bAbI QA Visualization

[Visualization of zˡ, the local attention (update gate) at each layer l.]

DSTC2 (Dialog) Visualization

[Visualization of zˡ, the local attention (update gate) at each layer l.]

Conclusion

• Presented two novel approaches to QA tasks using neural attention

• Bidirectional Attention Flow: uses attention as a layer, in both directions (context-to-query and query-to-context)
• Query-Reduction Networks: a sequential model that takes advantage of both attention and RNNs to reason over multiple sentences

Thanks!

Why do we need attention?

• RNNs have a long-term dependency problem
  • Vanishing gradients (Pascanu et al., 2013)
  • Inherently unstable over a long period of time (Weston et al., 2016)

• Attention provides shortcut access to relevant information
  • It directly retrieves the context vector from a distant location

• Attention is critical to most modern sequence models
  • Machine translation
  • Question answering, machine comprehension

Neural Attention in Sequence Modeling (Bahdanau et al., 2015)

• Apply an RNN to the context vectors
• Apply an RNN to the query vectors
• At each time step, use neural attention to soft-select a single context vector
• Use the selected context vector, along with the current query vector and the current hidden state, to obtain the next hidden state (sketched below)
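An illustrative NumPy sketch of one such step (not Bahdanau et al.'s exact parameterization): attention soft-selects a context vector, which is combined with the current query vector and the previous hidden state to produce the next hidden state.

```python
# One attention-augmented recurrent step over encoded context vectors (illustrative toy).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 16
contexts = rng.standard_normal((10, d))        # RNN outputs over the context
W = rng.standard_normal((d, 3 * d))            # assumed recurrent parameters

def step(h_prev, q_t):
    scores = contexts @ h_prev                 # attention scores from the current state
    c_t = softmax(scores) @ contexts           # soft-selected context vector
    return np.tanh(W @ np.concatenate([c_t, q_t, h_prev]))   # next hidden state

h = np.zeros(d)
for q_t in rng.standard_normal((6, d)):        # RNN-style pass over the query vectors
    h = step(h, q_t)
print(h.shape)
```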
