Document Modeling and Discourse Analysis
Lili Mou, [email protected]
(Transcript of a 52-page slide deck: resource › discourse.pdf)

Page 1:

Document Modeling and Discourse Analysis

Lili [email protected]

Page 2:

● Task: Document-level sentiment analysis
● Datasets: IMDB (official split) & Yelp Dataset Challenge (4:1:1 split)

LONG

Page 3:

Architecture

● Sentence-level: CNN or LSTM
● Document-level: GRU + average pooling
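To make the two-level architecture concrete, here is a minimal numpy sketch (not the paper's code): a toy window convolution with max-pooling stands in for the sentence-level CNN, a hand-rolled GRU runs over the sentence vectors, and average pooling gives the document vector. All dimensions and parameters are invented.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sentence_encoder(word_vecs, filt):
    # Stand-in for the sentence-level CNN: convolve each window of 2 words,
    # then max-pool over time to get one vector per sentence.
    conv = [np.tanh(filt @ np.concatenate([word_vecs[i], word_vecs[i + 1]]))
            for i in range(len(word_vecs) - 1)]
    return np.max(conv, axis=0)

def gru_step(x, h, p):
    # Standard GRU cell: update gate z, reset gate r, candidate state h_tilde.
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))
    return (1 - z) * h + z * h_tilde

def document_vector(sentences, filt, p, hidden):
    # Document level: run the GRU over sentence vectors, then average-pool
    # the hidden states into a single document representation.
    h = np.zeros(hidden)
    states = []
    for s in sentences:
        h = gru_step(sentence_encoder(s, filt), h, p)
        states.append(h)
    return np.mean(states, axis=0)

rng = np.random.default_rng(0)
d_word, d_sent, d_hid = 8, 6, 5                 # invented toy dimensions
filt = rng.normal(size=(d_sent, 2 * d_word)) * 0.1
p = {k: rng.normal(size=(d_hid, d_sent if k[0] == "W" else d_hid)) * 0.1
     for k in ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]}
doc = [rng.normal(size=(4, d_word)), rng.normal(size=(6, d_word))]
vec = document_vector(doc, filt, p, d_hid)
print(vec.shape)  # (5,): one fixed-size vector per document
```

Swapping the toy encoder for an LSTM gives the other sentence-level variant mentioned above.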

Page 4:

Comparing with Other Methods

Page 5:

Model Analysis

Page 6:

Remarks

● It seems odd to use LSTM and GRU differently for sentence-level and document-level modeling

● Consensus: LSTM ≈ GRU

Page 7:

● Task: To classify the discourse relation between two sentences (possibly successive in a paragraph)

Arg1: Our competitors say we overbid
Arg2: Who cares
Label: COMPARISON

● Other labels: TEMPORAL, CONTINGENCY, EXPANSION

SHORT

Page 8:
Page 9:

Statistics

Page 10:

Results

Page 11:

Remarks

● Related topic: sentence pair modeling

● Related task: paraphrase detection

● New dataset: "A large annotated corpus for learning natural language inference" (the SNLI corpus; EMNLP best data set or resource paper)

Page 12:

● Task: To classify the role of a sentence in an essay

● Dataset: high-school essays, annotated by two volunteers (NOT publicly available)

SHORT

Page 13:
Page 14:

Approach

● Individual classifier for each sentence
● Structure learning: linear CRF
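The structure-learning alternative is a linear-chain CRF, where decoding is Viterbi over per-sentence label scores plus label-transition scores. A minimal decoding sketch with invented scores (not the paper's features):

```python
import numpy as np

def viterbi(emit, trans):
    """Most likely label sequence under a linear-chain CRF.
    emit[t, y]: score of label y for sentence t; trans[y, y2]: transition score."""
    T, K = emit.shape
    score = emit[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[prev, cur] = best score ending in `prev`, then moving to `cur`
        cand = score[:, None] + trans + emit[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: 3 sentences, 2 roles; transitions strongly favor staying in role 0.
emit = np.array([[2.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
trans = np.array([[1.0, -2.0], [-2.0, 0.0]])
print(viterbi(emit, trans))  # [0, 0, 0]
```

The transition scores are what the per-sentence classifier lacks: the middle sentence prefers role 1 on its own, but the chain keeps it in role 0.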

Page 15:

Local features

Page 16:

Cohesive Features

● Define cohesive chains
– Identity chain: NER persons, third-person coreference resolution
– Lexical chain: word2vec → cluster words by a similarity threshold → add a chain link between sentences that contain words from the same cluster

● Distinguish two kinds of cohesive chains
– Local chain: a chain connecting <= 2 paragraphs
– Global chain: a chain connecting >= 3 paragraphs

● Features: # of chains of each kind (global identity, global lexical, local identity, local lexical)
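A runnable sketch of the lexical-chain idea above, under stated assumptions: toy 2-d stand-ins for the word2vec vectors, greedy clustering against each cluster's first word, and an invented 0.7 cosine threshold.

```python
import numpy as np

def build_lexical_chains(sent_words, word_vecs, threshold=0.7):
    """Greedy threshold clustering of word vectors; a chain then links all
    sentences containing a word from the same cluster."""
    words = sorted({w for s in sent_words for w in s})
    clusters = []
    for w in words:
        v = word_vecs[w]
        for c in clusters:
            u = word_vecs[c[0]]
            cos = v @ u / (np.linalg.norm(v) * np.linalg.norm(u))
            if cos >= threshold:
                c.append(w)
                break
        else:
            clusters.append([w])
    chains = []
    for c in clusters:
        sents = [i for i, s in enumerate(sent_words) if set(s) & set(c)]
        if len(sents) >= 2:          # a chain needs at least two sentences
            chains.append(sents)
    return chains

def chain_scope(chain, sent_para, max_local=2):
    """'local' if the chain spans <= 2 paragraphs, else 'global'."""
    n_paras = len({sent_para[i] for i in chain})
    return "local" if n_paras <= max_local else "global"

# Toy data: 4 sentences in 3 paragraphs; 'cat' and 'kitten' get similar vectors.
vecs = {"cat": np.array([1.0, 0.0]), "kitten": np.array([0.9, 0.1]),
        "economy": np.array([0.0, 1.0])}
sents = [["cat"], ["economy"], ["kitten"], ["cat"]]
paras = [0, 0, 1, 2]
chains = build_lexical_chains(sents, vecs)
print([(c, chain_scope(c, paras)) for c in chains])  # [([0, 2, 3], 'global')]
```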

Page 17:
Page 18:

More Heuristic Features

● Global-title (binary): is the sentence in a global chain AND does it overlap with the title?

● Chain interaction (2 binary features)
– Two chains containing multiple sentences
– Distinguish between GLOBAL and LOCAL interaction

● Strength features (for a sentence)
– The number of chains, and the maximum and average number of covered sentences and paragraphs over chains

Page 19:

Experiments

● CRF > SVM
● What is the baseline? SVM?

Page 20:

Model Analysis

● How is AUC computed?
● To compute AUC, we need a hyperparameter to balance between precision and recall

Page 21:

● Dataset: Penn Discourse Treebank
● Main finding: dense vectors > sparse features
● What is special about DRR (discourse relation recognition)?

LONG

Page 22:
Page 23:

(Computational psycholinguistics track)

● Continuous relation:
– E.g., causal, temporal succession, topic succession

● Discontinuous relation:
– E.g., contradiction

● "I was tired, so I drank a cup of coffee." (continuous)
● "I drank a cup of coffee, but I was still tired." (discontinuous)

SHORT

Page 24:

● Continuity hypothesis:
– Sentences connected by continuous relations are easier to understand than ones connected by discontinuous relations

● The goal of the paper:
– To propose a measure (markedness) that fits the continuity hypothesis

Page 25:

● Re-weight the contribution of each discourse unit based on its position in a dependency-like representation of the discourse structure

● Recursively propagate sentiment up through the RST parse (like an RNN)

SHORT

Page 26:
Page 27:

Dataset

● Pang and Lee (2004): ~2,000 reviews
● Socher et al. (2013): ~50,000 reviews

???

Page 28:

Discourse depth reweighting

● lambda_i: reweighting coefficient; d_i: depth of unit i
● The overall prediction
● w_i: BoW vector; theta = 1 if positive, -1 if negative
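The prediction rule above can be sketched as the sign of a depth-weighted sum of per-unit bag-of-words scores. The decay function used here (linear in depth with a 0.5 floor) is an assumption for illustration, not necessarily the paper's exact form.

```python
import numpy as np

def depth_weight(d, floor=0.5):
    # Assumed reweighting: units deeper in the discourse tree count less.
    return max(floor, 1.0 - d / 6.0)

def predict(unit_bows, depths, theta):
    """Overall prediction: sign of the depth-weighted sum of per-unit
    bag-of-words sentiment scores (theta[w] = +1 positive, -1 negative)."""
    score = sum(depth_weight(d) * (theta @ w) for w, d in zip(unit_bows, depths))
    return 1 if score >= 0 else -1

theta = np.array([1.0, -1.0, 0.0])      # toy vocab: [good, bad, movie]
units = [np.array([0.0, 1.0, 1.0]),     # "bad movie"
         np.array([1.0, 0.0, 0.0])]     # "good"
print(predict(units, [0, 5], theta))    # -1: the shallow negative unit wins
```

Swapping the depths flips the prediction, which is exactly the point of the reweighting: surface position in the discourse tree, not raw word counts, decides the outcome.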

Page 29:

Rhetorical Recursive Neural Network

● Overall document representation

Long propagation path also addressed in the paper
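The recursive propagation can be sketched as a generic bottom-up composition over an RST tree; the composition function and dimensions are invented, and the paper's relation-specific parameters are not reproduced.

```python
import numpy as np

def compose(node, W, b):
    """Bottom-up composition over an RST parse: each internal node's vector
    is tanh(W [left; right] + b); the root is the document representation."""
    if isinstance(node, np.ndarray):     # leaf: an EDU vector
        return node
    left, right = (compose(c, W, b) for c in node)
    return np.tanh(W @ np.concatenate([left, right]) + b)

rng = np.random.default_rng(0)
d = 4
W, b = rng.normal(size=(d, 2 * d)) * 0.1, np.zeros(d)
# ((e1, e2), e3): a tiny RST tree over three EDU vectors
tree = ((rng.normal(size=d), rng.normal(size=d)), rng.normal(size=d))
doc_vec = compose(tree, W, b)
print(doc_vec.shape)  # (4,)
```

The depth of this recursion is the long propagation path the slide refers to.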

Page 30:

Results

● Lesson: Combining the idea of RNN (or TBCNN) with traditional surface features

Page 31:

● Bob gave Tina the burger. She was hungry.
● Bob gave Tina the burger. He was hungry.

TACL

Page 32:

Architecture

Page 33:

● Decision function (scoring function)
– A(m, n): aligned entity mentions (why does it matter?)
– Phi: surface features
– A, B: low-rank approximation

● Cost function: hinge loss
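A sketch of a low-rank scoring function with a hinge loss, under assumptions: random stand-ins for the sentence vectors and surface features Phi, and the entity-alignment term A(m, n) omitted.

```python
import numpy as np

def score(s, t, A, B, w, phi):
    """Low-rank bilinear score between two sentence vectors plus surface
    features: s^T (A^T B) t + w . phi. A and B are k x d with small k, so the
    d x d interaction matrix is never formed explicitly."""
    return (A @ s) @ (B @ t) + w @ phi

def hinge_loss(y, sc, margin=1.0):
    # y in {+1, -1}; zero loss once the correct side clears the margin
    return max(0.0, margin - y * sc)

rng = np.random.default_rng(1)
d, k, f = 6, 2, 3                      # embedding dim, rank, # surface features
A, B = rng.normal(size=(k, d)), rng.normal(size=(k, d))
w = rng.normal(size=f)
s, t, phi = rng.normal(size=d), rng.normal(size=d), rng.normal(size=f)
sc = score(s, t, A, B, w, phi)
print(hinge_loss(+1, sc) >= 0.0)  # True: hinge loss is never negative
```

The low-rank factorization is the point: storing A and B costs 2kd parameters instead of the d^2 a full bilinear matrix would need.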

Page 34:

● Explicit relations vs. implicit relations

● Weak supervision: use explicit data to train models for implicit data

● Poor performance <== the two are linguistically dissimilar (?)
● The goal of this paper: domain adaptation
– Feature representation learning + resampling

SHORT

Page 35:

Learning robust features by Denoising AE

● x_tilde: corrupted features
– Gaussian noise for continuous features
– Dropout for binary features

● x: features, ~1e5 dimensions
● W: prohibitively large
– Trick: kappa pivot features (Blitzer et al., 2006)

● Features (proposed in previous work)
– Lexical
– Syntactic
– Others
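The corruption step can be sketched directly; the noise level and dropout rate below are invented.

```python
import numpy as np

def corrupt(x, is_binary, sigma=0.1, drop_p=0.3, rng=None):
    """Input corruption for the denoising autoencoder:
    Gaussian noise on continuous features, dropout on binary features."""
    rng = rng or np.random.default_rng()
    noise = rng.normal(0.0, sigma, size=x.shape)
    keep = rng.random(x.shape) >= drop_p
    return np.where(is_binary, x * keep, x + noise)

rng = np.random.default_rng(0)
x = np.array([0.5, 1.0, 1.0, 0.0, 2.3])
is_binary = np.array([False, True, True, True, False])
x_tilde = corrupt(x, is_binary, rng=rng)
# Binary features stay in {0, original value}; continuous ones are jittered.
print(np.all((x_tilde[is_binary] == 0) | (x_tilde[is_binary] == x[is_binary])))  # True
```

The autoencoder is then trained to reconstruct x from x_tilde, forcing the learned representation to be robust to exactly this kind of feature noise.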

Page 36:

Resampling with minimal supervision

● Matching the distribution

● Instance weighting
– Require each sampled instance to have cosine similarity of at least tau with at least one sample in the target domain
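A sketch of the tau-cosine filter: a source-domain instance is kept only if it is close enough to at least one target-domain sample. The vectors and threshold are invented.

```python
import numpy as np

def resample(source, target, tau=0.8):
    """Keep a source (explicit-relation) instance only if it has cosine
    similarity >= tau with at least one target-domain (implicit) instance."""
    src = source / np.linalg.norm(source, axis=1, keepdims=True)
    tgt = target / np.linalg.norm(target, axis=1, keepdims=True)
    sims = src @ tgt.T                      # pairwise cosine similarities
    keep = sims.max(axis=1) >= tau
    return np.flatnonzero(keep)

source = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
target = np.array([[0.9, 0.1]])
print(resample(source, target))  # [0]
```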

Page 37:

Results

Page 38:

● Generate a comparison story based on an ontology
– Pattern: "the [predicate(s)] of [subject] (is/are) [object(s)]"

● Main idea: using discourse relations to improve discourse planning
● Evaluation: crowd-sourced human evaluation

SHORT

Page 39:

Message (Multi-)graph

● Vertex: a message
● Edge: a potential relation (annotated according to predicates)

● Goal: to find a Hamiltonian path through the selected subgraph

Page 40:

N­gram model on relations

● Choosing the order: only 4 messages in total, so even brute-force search is tractable
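With only 4 messages, every permutation can be scored under the relation n-gram model directly; here is a bigram sketch with invented edge labels and probabilities, where orders that are not Hamiltonian paths in the message graph are rejected.

```python
from itertools import permutations
from math import log

# Hypothetical relation label for each ordered pair of messages (edges of the
# message graph), and bigram log-probabilities over relation labels.
relation = {(0, 1): "elaboration", (1, 2): "contrast", (2, 3): "elaboration",
            (1, 0): "contrast", (2, 1): "contrast", (3, 2): "contrast"}
bigram_logp = {"elaboration": log(0.6), "contrast": log(0.4)}

def order_score(order):
    """Sum of relation-model log-probs over adjacent message pairs; orders
    with a missing edge (no Hamiltonian path) get -inf."""
    total = 0.0
    for a, b in zip(order, order[1:]):
        if (a, b) not in relation:
            return float("-inf")
        total += bigram_logp[relation[(a, b)]]
    return total

best = max(permutations(range(4)), key=order_score)
print(best)  # (0, 1, 2, 3)
```

Only 4! = 24 candidate orders exist, which is why brute force is enough here; a realistic planner with more messages would need search or the DP mentioned later.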

Page 41:
Page 42:

Results

● Base: random order
● PDTB: n-gram model on PDTB
● Wiki: n-gram model on Wikipedia, annotated by a discourse parser
● Equal: the two models are rated the same in human eval

Page 43:

● Goal: To encode a sentence, and to decode it 

ACL­LONG

Page 44:
Page 45:

● Evaluation
– BLEU, ROUGE, Coherence

Page 46:
Page 47:

● Dataset: Rhetorical Structure Theory Discourse Treebank (RST-DT)
● 385 documents: 347 for training (5-fold), 49 for testing
● Each document represented as a tree
– Elementary Discourse Units (EDUs): clauses
– Relations: hypotactic vs. paratactic

Page 48:

EDU Modeling

● Standard RAE (recursive autoencoder)

Page 49:

Discourse Parsing

● 2-step strategy
– Binary classifier: to determine whether two adjacent text units should be merged to form a new subtree
– Multi-class classifier: to determine which relation holds
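The 2-step strategy can be sketched greedily: repeatedly pick the best-scoring adjacent pair to merge, then label the new subtree. The scoring heuristics below are invented stand-ins for the trained binary and multi-class classifiers, and the greedy loop replaces the paper's dynamic-programming inference.

```python
# Hypothetical stand-ins for the two classifiers: a merge score for adjacent
# spans (binary step) and a relation label for a merged pair (multi-class step).
def merge_score(left, right):
    # toy heuristic: prefer merging short spans first
    return -(len(left) + len(right))

def relation_label(left, right):
    return "paratactic" if len(left) == len(right) else "hypotactic"

def parse(edus):
    """Greedy bottom-up parsing: repeatedly merge the best-scoring adjacent
    pair of subtrees until one tree covers the document."""
    trees = [(e,) for e in edus]         # each subtree as a tuple of EDUs
    while len(trees) > 1:
        i = max(range(len(trees) - 1),
                key=lambda j: merge_score(trees[j], trees[j + 1]))
        left, right = trees[i], trees[i + 1]
        print("merge", left, right, "->", relation_label(left, right))
        trees[i:i + 2] = [left + right]
    return trees[0]

print(parse(["e1", "e2", "e3", "e4"]))
```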

Page 50:

Inference

● Choose the parse tree with maximum probability
● Dynamic programming, keeping 10 candidates at each step

Page 51:

Wrap Up

Discourse Analysis
– Document-level classification
● Traditional document classification → trivial
● Topic modeling → LDA or variants
● Document-level sentiment analysis → a new research topic
● Shall we use discourse parse tree structures?

– Discourse relation classification
● Sentence pair modeling
● Paraphrase detection

– Discourse parsing (TODO)

Page 52:

TODO list

● Datasets
– Penn Discourse Treebank
– IMDB, Yelp

● Discourse parser
– …