
SegBot: A Generic Neural Text Segmentation Model with Pointer Network

Jing Li, Aixin Sun and Shafiq Joty

School of Computer Science and Engineering, Nanyang Technological University (NTU), Singapore

IJCAI-ECAI 2018


Motivation

Text segmentation is a fundamental task in NLP that has been addressed at different levels of granularity.

▶ Topic Segmentation

▶ Elementary Discourse Unit (EDU) Segmentation

[A person]EDU [who never made a mistake]EDU [never tried anything new]EDU


Motivation

Unsupervised topic segmentation models exploit the strong correlation between topic and lexical usage. However, they fail to utilize training data and are unable to learn an accurate model.

✓ Hearst et al., TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 1997.
✓ Choi et al., Advances in domain independent linear text segmentation. NAACL, 2000.

Supervised models are more flexible in using richer features (e.g., cue phrases, length, and similarity scores) and generally perform better than unsupervised models; however, they come at the cost of manually designing informative features.

✓ Hsueh et al., Automatic segmentation of multiparty dialogue. EACL, 2006.
✓ Joty et al., CODRA: A novel discriminative framework for rhetorical analysis. Computational Linguistics, 2015.


Motivation

Due to the sparsity of 'yes' boundary tags in EDU and topic segmentation tasks, CRFs did not provide any additional gain over simple classifiers like MaxEnt.


The limitation of sequence-to-sequence models is that the output vocabulary (i.e., the vocabulary from which O1 and O2 are drawn) is fixed, so different models have to be trained for different vocabularies.



SegBot: Neural Text Segmentation - Model Architecture

Figure 1: The model architecture of SegBot. Input sequence: U0, U1, ..., U8. Identified boundaries: U3, U6, U8.


SegBot: Neural Text Segmentation - Encoding Phase

(Figure: encoder hidden states over input units U0–U8 and decoder hidden states for the decoding steps started at U0, U4, and U7; a boundary distribution is produced at each step. Training: teacher forcing. Test: copy the boundary's neighbor as the next decoder input.)

z_n = σ(W_z x_n + R_z h_{n−1} + b_z)   (1)

r_n = σ(W_r x_n + R_r h_{n−1} + b_r)   (2)

n_n = tanh(W_h x_n + R_h (r_n ⊙ h_{n−1}) + b_h)   (3)

h_n = z_n ⊙ h_{n−1} + (1 − z_n) ⊙ n_n   (4)

h_n = →h_n ⊕ ←h_n   (5)
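For illustration only, here is a minimal sketch of the encoding phase, assuming PyTorch and hypothetical layer sizes: a bidirectional GRU applies the gates of Eqs. (1)–(4) internally, and its output concatenates the forward and backward states as in Eq. (5).

    import torch
    import torch.nn as nn

    input_size, hidden_size = 300, 128   # hypothetical sizes, not the paper's configuration
    encoder = nn.GRU(input_size, hidden_size, bidirectional=True, batch_first=True)

    x = torch.randn(1, 9, input_size)    # stand-in embeddings for units U0..U8
    h, _ = encoder(x)                    # h has shape (1, 9, 2 * hidden_size)
    # h[:, n, :] is the concatenation ->h_n (+) <-h_n of Eq. (5);
    # the GRU cell itself implements the gates of Eqs. (1)-(4).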


SegBot: Neural Text Segmentation - Decoding


d_m = GRU(x_m, θ)   (6)

In the decoding phase, at each step the decoder takes the start unit of a segment in the input sequence as input. The decoder hidden state at each time step is computed by a unidirectional RNN (GRU).
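Continuing the sketch above (again assuming PyTorch; setting the decoder hidden size to match the concatenated encoder states is an assumption made here for simplicity, not the paper's stated configuration):

    import torch
    import torch.nn as nn

    input_size, hidden_size = 300, 128
    decoder = nn.GRU(input_size, 2 * hidden_size, batch_first=True)   # unidirectional GRU, Eq. (6)

    # Embeddings of the segment-start units fed at each decoding step
    # (U0, U4, U7 in Figure 1); random tensors stand in for real embeddings.
    x_starts = torch.randn(1, 3, input_size)
    d, _ = decoder(x_starts)   # d[:, m, :] is the decoder state d_m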


SegBot: Neural Text Segmentation - Pointing


u_j^m = v^T tanh(W_1 h_j + W_2 d_m), for j ∈ (m, ..., M)   (7)

p(y_m | x_m) = softmax(u^m)   (8)

where j ∈ [m, M] indicates a possible position in the input sequence, and softmax normalizes u_j^m, giving the probability that unit U_j is a boundary given the start unit U_m.
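A rough sketch of this pointing mechanism, consistent with the shapes in the sketches above (assuming PyTorch; W1, W2, and v stand for the trainable parameters of Eq. (7)):

    import torch
    import torch.nn as nn

    hidden_size = 128
    W1 = nn.Linear(2 * hidden_size, hidden_size, bias=False)   # applied to encoder states h_j
    W2 = nn.Linear(2 * hidden_size, hidden_size, bias=False)   # applied to decoder state d_m
    v = nn.Linear(hidden_size, 1, bias=False)                  # the vector v^T

    h = torch.randn(9, 2 * hidden_size)   # encoder states for U0..U8
    d_m = torch.randn(2 * hidden_size)    # decoder state at step m
    m = 4                                 # current segment start, e.g. U4

    # Eq. (7): score only the candidate positions j = m..M.
    u = v(torch.tanh(W1(h[m:]) + W2(d_m))).squeeze(-1)
    # Eq. (8): boundary distribution over U_m..U_M; its argmax is the predicted boundary.
    p = torch.softmax(u, dim=-1)
    boundary = m + int(p.argmax())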


SegBot: Neural Text Segmentation - Model Training

The loss function L is the negative log-likelihood of the boundary distribution over the whole training set D, and can be written as:

L(ω) = Σ_D Σ_{m=1}^{M} −log p(y_m | x_m; ω) + (λ/2) ||ω||₂²   (9)
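As a sketch of how Eq. (9) could be computed, assuming PyTorch, treating the per-step scores u^m of Eq. (7) as logits, and handling the λ/2 ||ω||² term through the optimizer's weight decay (an assumption, not necessarily the authors' exact setup):

    import torch
    import torch.nn.functional as F

    # Toy example: three decoding steps, each scoring the same 9 candidate units.
    logits = torch.randn(3, 9, requires_grad=True)   # u^m from Eq. (7)
    targets = torch.tensor([3, 6, 8])                # true boundaries U3, U6, U8

    nll = F.cross_entropy(logits, targets, reduction='sum')   # -sum_m log p(y_m | x_m)
    nll.backward()
    # The (lambda/2) ||w||^2 term is commonly realized via weight_decay in the
    # optimizer, e.g. torch.optim.Adam(params, weight_decay=1e-4).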


Experiments

1. Topic Segmentation

2. Elementary Discourse Unit (EDU) Segmentation


Experiment: Topic Segmentation

Data: Choi Dataset

700 documents, each being a concatenation of 10 segments.

A segment of a document consists of the first n sentences (3 ≤ n ≤ 11; four subsets in total) of a randomly selected document from the Brown corpus.

Evaluation Metric: Pk

Pk = Σ_{1≤s≤t≤T} 1(δ_tru(s, t) ≠ δ_hyp(s, t))   (10)

Pk is the most commonly used metric for evaluating topic segmentation models. It compares the inferred segmentation with the gold standard. Note that lower Pk means higher accuracy.
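For reference, a minimal implementation of the sliding-window form of Pk that is standard in the literature (the exact variant and window size used in the paper may differ):

    def pk(gold, pred, k=None):
        """Pk error. gold and pred are 0/1 lists marking, for each sentence,
        whether it ends a segment. Lower Pk means better segmentation."""
        def seg_ids(bounds):
            ids, cur = [], 0
            for b in bounds:
                ids.append(cur)
                if b:
                    cur += 1
            return ids
        g, p = seg_ids(gold), seg_ids(pred)
        if k is None:
            # Conventional choice: half the average gold segment length.
            k = max(1, round(len(gold) / (2.0 * max(1, sum(gold)))))
        total = len(gold) - k
        errors = sum((g[i] == g[i + k]) != (p[i] == p[i + k]) for i in range(total))
        return errors / max(1, total)

    # Example: 10 sentences, gold segments of length 5 + 5, prediction 3 + 7.
    print(pk([0, 0, 0, 0, 1, 0, 0, 0, 0, 1], [0, 0, 1, 0, 0, 0, 0, 0, 0, 1]))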


Experiment: Topic Segmentation

Group  Method                                  Pk (%)
A      TextTiling [Hearst, 1997]               45.25
A      C99 [Choi, 2000]                        10.50
A      U00 [Utiyama and Isahara, 2001]          7.75
A      ADDP [Ji and Zha, 2003]                  5.68
A      TSM [Du et al., 2013]                    0.92
A      GraphSeg [Glavas et al., 2016]           6.64
B      TopSeg [Brants et al., 2002]             8.22
B      F04 [Fragkou et al., 2004]               4.20
B      M09 [Misra et al., 2009]                 2.72
B      SegBot (our model)                       0.33
C      TopicTiling [Riedl and Biemann, 2012]    0.88
C      BiLSTM-CRF [Lample et al., 2016]         0.67
C      SegBot (our model)                       0.11*

Table 1: Segmentation results on the Choi dataset. Significant improvement over BiLSTM-CRF is marked with * (p-value < 0.01).


Experiment: Topic Segmentation

(Figure: visualization of topic segmentation. Ground-truth boundaries: S8, S13, S16, S20, S25, S35, S38, S41, S52, S62. Predicted boundaries: S0→S8, S9→S13, S14→S16, S17→S20, S21→S25, S26→S35, S36→S38, S39→S41, S42→S52, S53→S62. For the sake of space, the figure shows the text of three segments from S14 to S25 for illustration: S14→S16, S17→S20, S21→S25.)


Experiment: EDU Segmentation

Data: Rhetorical Structure Theory Discourse Treebank (RST-DT)

A training set of 347 articles (6,132 sentences)

A test set of 38 articles (991 sentences)

Evaluation Metric: P, R, F1

Following previous work [Hernault et al., 2010, Joty et al., 2015], we measure segmentation accuracy with respect to the sentence-internal segmentation boundaries.

Precision = c/h,   Recall = c/g,   F-score = 2c/(g + h)   (11)

where c is the number of correctly predicted sentence-internal boundaries, h is the number of predicted boundaries, and g is the number of gold boundaries.
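A small sketch of Eq. (11), treating boundaries as sets of sentence-internal positions (illustrative only, not the paper's evaluation script):

    def edu_prf(gold_boundaries, pred_boundaries):
        """Precision, recall and F-score over sentence-internal EDU boundaries."""
        gold, pred = set(gold_boundaries), set(pred_boundaries)
        c = len(gold & pred)   # correctly predicted boundaries
        h = len(pred)          # boundaries in the hypothesis
        g = len(gold)          # boundaries in the gold standard
        precision = c / h if h else 0.0
        recall = c / g if g else 0.0
        f_score = 2 * c / (g + h) if (g + h) else 0.0
        return precision, recall, f_score

    # Example: gold boundaries after tokens 4 and 9, predicted after 4 and 11.
    print(edu_prf({4, 9}, {4, 11}))   # (0.5, 0.5, 0.5)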


Experiment: EDU Segmentation

Method                              Precision  Recall  F-score
HILDA [Hernault et al., 2010]            77.9    70.6     74.1
SPADE [Soricut and Marcu, 2003]          83.8    86.8     85.2
F&R [Fisher and Roark, 2007]             91.3    89.7     90.5
DS [Joty et al., 2015]                   88.0    92.3     90.1
BiLSTM-CRF [Lample et al., 2016]         89.1    87.8     88.5
SegBot (our model)                       91.6*   92.8*    92.2*

Table 2: Segmentation results on the RST-DT dataset. Significant improvements over BiLSTM-CRF are marked with * (p-value < 0.01).


Experiment: EDU Segmentation

(Figure: gold sentence "Sheraton and Pan Am said they are assured under the Soviet joint-venture law that they can repatriate profits from their hotel venture." Predicted boundaries: Sheraton→said, they→law, that→.)

Figure 2: Visualization of EDU segmentation.

[Sheraton and Pan Am said]EDU [they are assured under the Soviet joint-venture law]EDU [that they can repatriate profits from their hotel venture.]EDU


Web Application

Interface: http://138.197.118.157:8000/segbot/

Figure 3: Interface of SegBot for EDU Segmentation.

User API: http://138.197.118.157:8000/segbot/api/?q=

    import urllib.parse, urllib.request

    # The text to segment goes after ?q= in the endpoint above (example query shown).
    url = "http://138.197.118.157:8000/segbot/api/?q=" + urllib.parse.quote("A person who never made a mistake never tried anything new")
    response = urllib.request.urlopen(url).read()
    print(response)


Conclusion

SegBot, an end-to-end neural model for text segmentation at different levels of granularity.

SegBot does not need hand-crafted features or any prior knowledge of the given texts.

We have demonstrated the effectiveness of SegBot against state-of-the-art solutions.

A web application.


Q&A

[email protected]


References

Brants, T., Chen, F., and Tsochantaridis, I. (2002). Topic-based document segmentation with probabilistic latent semantic analysis. In CIKM, pages 211–218.

Choi, F. Y. (2000). Advances in domain independent linear text segmentation. In NAACL, pages 26–33.

Du, L., Buntine, W. L., and Johnson, M. (2013). Topic segmentation with a structured topic model. In HLT-NAACL, pages 190–200.

Fisher, S. and Roark, B. (2007). The utility of parse-derived features for automatic discourse segmentation. In ACL, volume 45, page 488.

Fragkou, P., Petridis, V., and Kehagias, A. (2004). A dynamic programming algorithm for linear text segmentation. JIIS, 23(2):179–197.

Glavas, G., Nanni, F., and Ponzetto, S. P. (2016). Unsupervised text segmentation using semantic relatedness graphs. In SEM@ACL.

Hearst, M. A. (1997). TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1):33–64.

Hernault, H., Bollegala, D., and Ishizuka, M. (2010). A sequential model for discourse segmentation. In CICLing, pages 315–326.

Ji, X. and Zha, H. (2003). Domain-independent text segmentation using anisotropic diffusion and dynamic programming. In SIGIR, pages 322–329.


Joty, S., Carenini, G., and Ng, R. T. (2015). CODRA: A novel discriminative framework for rhetorical analysis. Computational Linguistics, 41:385–435.

Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., and Dyer, C. (2016). Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360.

Misra, H., Yvon, F., Jose, J. M., and Cappe, O. (2009). Text segmentation via topic modeling: an analytical study. In CIKM, pages 1553–1556.

Riedl, M. and Biemann, C. (2012). TopicTiling: A text segmentation algorithm based on LDA. In ACL, pages 37–42.

Soricut, R. and Marcu, D. (2003). Sentence level discourse parsing using syntactic and lexical information. In NAACL, pages 149–156.

Utiyama, M. and Isahara, H. (2001). A statistical model for domain-independent text segmentation. In ACL, pages 499–506.
