Leveraging discourse information effectively for authorship attribution
Elisa Ferracane, Su Wang, Raymond J. Mooney
University of Texas at Austin
Task
• Authorship Attribution: identify the author of a text, given a set of author-labeled training texts.
Authorship Attribution
• Neural networks (e.g., character-level CNNs) have proven very powerful…
  • they capture stylometric cues at the surface level:
“My very photogenic mother died in a freak accident (picnic, lightning) when I was three...” (Lolita, Nabokov)
“But what principally attracted attention of Nicholas, was the old gentleman’s eye… Grafted upon the quaintness and oddity of his appearance, was something…” (Nicholas Nickleby, Dickens)
Authorship Attribution
• Authors also have particular rhetorical styles…
• But how do you incorporate discourse into a neural net?
Our Contributions
1) How can you featurize discourse information?
2) How can you integrate discourse information into the network?
3) Can discourse help a state-of-the-art model (a character-bigram CNN)?
Q1: How can you featurize discourse information?
• Use an entity grid model (Barzilay & Lapata, 2008) with either:
• grammatical relations, or
• RST discourse relations
(1) My father was a clergyman of the north of England, who was deservedly respected by all who knew him; and, in his younger days, lived pretty comfortably on the joint income of a small incumbency and a snug little property of his own.
(2) My mother, who married him against the wishes of her friends, was a squire’s daughter, and a woman of spirit.
(3) In vain it was represented to her, that if she became the poor parson’s wife, she must relinquish her carriage and her lady’s-maid, and all the luxuries and elegancies of affluence; which to her were little less than the necessaries of life.
Entity grid (Barzilay and Lapata, 2008): each row is a sentence, (1), (2), (3); each column is a salient entity, here father and mother.
(1) [My father]SUBJECT was a clergyman of the north of England, who was deservedly respected by all who knew him; and, in his younger days, lived pretty comfortably on the joint income of a small incumbency and a snug little property of his own.
(2) [My mother]SUBJECT, who married [him]OBJECT against the wishes of her friends, was a squire’s daughter, and a woman of spirit.
(3) In vain it was represented to her, that if [she]SUBJECT became the [poor parson]OTHER’s wife, she must relinquish her carriage and her lady’s-maid, and all the luxuries and elegancies of affluence; which to her were little less than the necessaries of life.
Grammatical relations (Barzilay and Lapata, 2008)
       father  mother
(1)    S       -
(2)    O       S
(3)    X       S
S = subject, O = object, X = other, - = entity not mentioned
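Once mentions are resolved and labeled, filling the grid is mechanical. A minimal sketch, assuming coreference and grammatical roles are already available as hand-coded `(sentence, entity, role)` tuples taken from the bracketed annotations of the example (a real pipeline would obtain them from a parser and a coreference resolver, which are not shown). When an entity appears more than once in a sentence, the highest-ranked role wins (subject > object > other), following Barzilay and Lapata (2008). `build_entity_grid` and `RANK` are hypothetical names.

```python
# Role precedence when an entity is mentioned more than once in a sentence:
# subject > object > other (Barzilay and Lapata, 2008).
RANK = {"S": 3, "O": 2, "X": 1}


def build_entity_grid(mentions, num_sentences, entities):
    """Build an entity grid from (sentence_index, entity, role) mentions.

    Rows are sentences, columns are entities; "-" marks an absent entity.
    """
    grid = [["-" for _ in entities] for _ in range(num_sentences)]
    column = {entity: j for j, entity in enumerate(entities)}
    for sent, entity, role in mentions:
        j = column[entity]
        current = grid[sent][j]
        # Keep the highest-ranked role seen for this entity in this sentence.
        if current == "-" or RANK[role] > RANK[current]:
            grid[sent][j] = role
    return grid


# Hand-coded mentions from the annotated example (0-based sentence index).
mentions = [
    (0, "father", "S"),  # [My father]SUBJECT
    (1, "mother", "S"),  # [My mother]SUBJECT
    (1, "father", "O"),  # [him]OBJECT
    (2, "mother", "S"),  # [she]SUBJECT
    (2, "father", "X"),  # [poor parson]OTHER
]
grid = build_entity_grid(mentions, 3, ["father", "mother"])
for i, row in enumerate(grid, start=1):
    print(f"({i})", *row)
```

This prints the same three rows as the grid above: `(1) S -`, `(2) O S`, `(3) X S`.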
• Discourse relations: Rhetorical Structure Theory (RST)
  • Divide a document into elementary discourse units (EDUs), usually clauses
  • Organize the EDUs into a tree structure:
    • edges are discourse relation types
    • each node in a relation is either the nucleus (the more “important” unit) or a satellite
if she became the poor parson’s wife, she must relinquish her carriage and her lady’s-maid, and all the luxuries and elegancies of affluence; which to her were little less than the necessaries of life.
if she became the poor parson’s wife,
she must relinquish her carriage and her lady’s-maid, and all the luxuries and elegancies of affluence;
which to her were little less than the necessaries of life.
[Figure: RST tree over these EDUs, linked by a condition relation (condition-n, condition-s) and an interpretation relation (interpretation-s, interpretation-n); n = nucleus, s = satellite]
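RST discourse relations (Feng and Hirst, 2014)
(1) father: background.N, TopicShift, elaboration.S, background.S; mother: -
(2) father: elaboration.S; mother: elaboration.N, circumstance.N, TopicShift
(3) father: condition.N; mother: attribution.S, condition.N, interpretation.S
.N = the entity's EDU is the nucleus of the relation; .S = it is the satellite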
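Unlike the grammatical grid, each RST cell holds a list of labels. A minimal sketch of assembling such cells, assuming the RST parser's output has already been reduced to hypothetical `(sentence, entity, relation, nuclearity)` tuples; that reduction step is not shown, and treating labels such as TopicShift as having no nuclearity is an assumption of this sketch. Only sentences (2) and (3) are encoded, so row (1) comes out empty. `build_rst_grid` and `render` are hypothetical names.

```python
def build_rst_grid(participations, num_sentences, entities):
    """Collect relation labels per (sentence, entity) cell.

    participations: (sentence_index, entity, relation, nuclearity) tuples,
    where nuclearity is "N" (nucleus), "S" (satellite), or None for labels
    such as TopicShift that this sketch treats as carrying no nuclearity.
    """
    grid = [[[] for _ in entities] for _ in range(num_sentences)]
    column = {entity: j for j, entity in enumerate(entities)}
    for sent, entity, relation, nuclearity in participations:
        label = relation if nuclearity is None else f"{relation}.{nuclearity}"
        cell = grid[sent][column[entity]]
        if label not in cell:  # keep first-seen order, drop repeats
            cell.append(label)
    return grid


def render(grid):
    """Format each cell as a comma-separated label list, "-" when empty."""
    return [[", ".join(cell) or "-" for cell in row] for row in grid]


# Hypothetical tuples for sentences (2) and (3) only (0-based indices 1, 2).
participations = [
    (1, "father", "elaboration", "S"),
    (1, "mother", "elaboration", "N"),
    (1, "mother", "circumstance", "N"),
    (1, "mother", "TopicShift", None),
    (2, "father", "condition", "N"),
    (2, "mother", "attribution", "S"),
    (2, "mother", "condition", "N"),
    (2, "mother", "interpretation", "S"),
]
rows = render(build_rst_grid(participations, 3, ["father", "mother"]))
for i, row in enumerate(rows, start=1):
    print(f"({i})", " | ".join(row))
```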
Q2: How can you integrate discourse information into the network?
• Use a probability vector of entity-grid transitions
• Use embeddings!
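A minimal sketch of a transition probability vector, assuming the classic Barzilay and Lapata (2008) representation: count every two-sentence role transition down each entity column, then divide by the total number of transitions. Whether the talk's model uses exactly this length and role set is an assumption; `transition_probabilities` and `ROLES` are hypothetical names.

```python
from collections import Counter
from itertools import product

ROLES = ["S", "O", "X", "-"]


def transition_probabilities(grid, length=2):
    """Fraction of each role transition of the given length, counted down
    every entity column of the grid."""
    counts = Counter()
    num_rows, num_cols = len(grid), len(grid[0])
    for j in range(num_cols):
        column = [grid[i][j] for i in range(num_rows)]
        for i in range(num_rows - length + 1):
            counts["".join(column[i:i + length])] += 1
    total = sum(counts.values())
    # Fixed key order over all len(ROLES) ** length transitions, so every
    # document maps to a vector of the same size.
    return {
        "".join(t): counts["".join(t)] / total
        for t in product(ROLES, repeat=length)
    }


grid = [["S", "-"], ["O", "S"], ["X", "S"]]  # rows (1)-(3); columns father, mother
vec = transition_probabilities(grid)
print({k: v for k, v in vec.items() if v > 0})
```

For the example grid there are four transitions (father: SO, OX; mother: -S, SS), so each gets probability 0.25 and the other 12 entries are 0.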
[Figure: CNN without discourse (Ruder et al., 2016; Shrestha et al., 2017; Sari et al., 2017)]
[Figure: CNN with discourse probability vector]
[Figure: CNN with discourse embeddings]
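A minimal sketch of one way a discourse representation can be combined with CNN features, assuming late fusion: the discourse vector is concatenated with the CNN's pooled character features just before the softmax over authors. This fusion point is an assumption for illustration; the random weights stand in for trained parameters, and `fuse_and_classify` and `softmax` are hypothetical helpers.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(char_features, discourse_features, weights, bias):
    """Concatenate character-CNN features with a discourse vector, then apply
    one linear layer and a softmax to get a distribution over authors."""
    fused = char_features + discourse_features  # list concatenation
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("weight rows must match the fused feature size")
    scores = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)


random.seed(0)
num_authors = 3
char_features = [random.random() for _ in range(8)]  # stand-in for pooled CNN output
discourse_features = [0.25, 0.25, 0.25, 0.25]        # stand-in for a discourse vector
dim = len(char_features) + len(discourse_features)
weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_authors)]
bias = [0.0] * num_authors
probs = fuse_and_classify(char_features, discourse_features, weights, bias)
print(probs)
```

Concatenation keeps the two feature sources separate until the final layer, so the classifier can learn how much weight to give discourse relative to character n-grams.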
• Use embeddings
  • Local vs. global
    • Local: how are entities changing across contiguous sentences?
    • Global: how is each entity changing across the document?
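Using embeddings means treating each grid transition as a token (e.g. `so` for an entity that is the subject of one sentence and the object of the next) and mapping it to a vector. A minimal sketch, assuming a simple token-to-id vocabulary and a randomly initialized lookup table; in a real network the table would be a trainable embedding layer. `build_vocab`, `embed`, and `EMBED_DIM` are hypothetical names.

```python
import random

EMBED_DIM = 4  # hypothetical size; real models use larger vectors


def build_vocab(sequences):
    """Assign an integer id to each distinct transition token."""
    vocab = {}
    for seq in sequences:
        for token in seq:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def embed(sequence, vocab, table):
    """Replace each token by its embedding vector (one row of the table)."""
    return [table[vocab[token]] for token in sequence]


random.seed(0)
sequences = [["so", "-s", "ox", "ss"]]  # transition tokens from one entity grid
vocab = build_vocab(sequences)
# Randomly initialized table; in a network these rows would be learned.
table = [[random.gauss(0.0, 0.1) for _ in range(EMBED_DIM)] for _ in vocab]
vectors = embed(sequences[0], vocab, table)
print(vocab)
```

The resulting sequence of vectors can then be fed to a convolutional layer, just like a sequence of character embeddings.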
Local: by contiguous sentences
       father  mother
(1)    S       -
(2)    O       S
(3)    X       S
Each token pairs an entity's roles in two consecutive sentences (e.g. so = subject, then object). Tokens are read one sentence pair at a time: father (1→2), mother (1→2), father (2→3), mother (2→3).
Sequence: so, -s, ox, ss
Global: by entity
       father  mother
(1)    S       -
(2)    O       S
(3)    X       S
Tokens are read one entity at a time: father (1→2), father (2→3), mother (1→2), mother (2→3).
Sequence: so, ox, -s, ss
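The two orderings differ only in which loop is outermost. A minimal sketch, assuming the grid is stored as a list of rows (sentences) of role strings; `transitions`, `local_sequence`, and `global_sequence` are hypothetical names.

```python
def transitions(grid):
    """Map (sentence_pair_index, entity_index) to a lowercase two-role token,
    e.g. "so" for an entity that is S in one sentence and O in the next."""
    return {
        (i, j): (grid[i][j] + grid[i + 1][j]).lower()
        for i in range(len(grid) - 1)
        for j in range(len(grid[0]))
    }


def local_sequence(grid):
    """Local: sentence pair by sentence pair, every entity within each pair."""
    t = transitions(grid)
    return [t[i, j] for i in range(len(grid) - 1) for j in range(len(grid[0]))]


def global_sequence(grid):
    """Global: entity by entity, every sentence pair for each entity."""
    t = transitions(grid)
    return [t[i, j] for j in range(len(grid[0])) for i in range(len(grid) - 1)]


grid = [["S", "-"], ["O", "S"], ["X", "S"]]  # rows (1)-(3); columns father, mother
print(local_sequence(grid))   # ['so', '-s', 'ox', 'ss']
print(global_sequence(grid))  # ['so', 'ox', '-s', 'ss']
```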
Datasets
Dataset    # authors   mean words/author   mean words/text
IMDB62     62          349,004             349
Novel-50   50          709,880             2,000
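Dividing the two means gives the average number of texts per author, assuming both means are taken over the same collection (total words cancels out); the results are approximate because the table's values are rounded:

```latex
\frac{\text{mean words/author}}{\text{mean words/text}}
  = \frac{W/\#\text{authors}}{W/\#\text{texts}}
  = \frac{\#\text{texts}}{\#\text{authors}},
\qquad
\text{IMDB62: } \frac{349{,}004}{349} \approx 1000,
\qquad
\text{Novel-50: } \frac{709{,}880}{2{,}000} \approx 355
```

So IMDB62 offers many short texts per author, while Novel-50 offers fewer but much longer ones.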
Results
1) How to featurize? Grammatical relations vs. RST discourse relations
[Figure: F1 (axis 90 to 100) on IMDB62 and Novel-50]
2) How to integrate? Probability vector vs. discourse embedding
[Figure: F1 (axis 90 to 100) on IMDB62 and Novel-50]
2) How to integrate? Local vs. global
[Figure: F1 (axis 91 to 100) on IMDB62 and Novel-50]
3) Does discourse help on IMDB62? It depends…
[Figure: F1 (axis 90 to 100) on IMDB62 for the no-discourse baseline, probability vector, discourse embedding (gr. rels.), and discourse embedding (RST)]
3) Does discourse help on Novel-50? Yes!
[Figure: F1 (axis 95 to 100) on Novel-50 for the no-discourse baseline, probability vector, discourse embedding (gr. rels.), and discourse embedding (RST)]
Error Analysis
• The least-represented author (Ambrose Bierce) obtains the biggest improvement from discourse:
  • the discourse feature is more robust than character bigrams when there are fewer, smaller samples
• Two authors who gained large improvements from discourse wrote in a variety of genres (e.g., both supernatural horror and love stories):
  • character bigrams do not generalize well across these different vocabularies, but discourse captures the rhetorical style the authors share across genres
Conclusion
• Discourse improves authorship attribution over a strong character-level CNN baseline
• Embeddings of RST discourse relations at the global level perform best
• Discourse works better on longer documents (Novel-50 texts average 2,000 words vs. 349 for IMDB62)
Thank you!