Improving Human Text Comprehension through Semi-Markov CRF-based Neural Section Title Generation

Sebastian Gehrmann, Steven Layne, Franck Dernoncourt

@SebGehr [email protected]

Long texts are hard to comprehend

When another old cave is discovered in the south of France, it is not usually news. Rather, it is an ordinary event. Such discoveries are so frequent these days that hardly anybody pays heed to them. However, when the Lascaux cave complex was discovered in 1940, the world was amazed. Painted directly on its walls were hundreds of scenes showing how people lived thousands of years ago. The scenes show people hunting animals, such as bison or wild cats. Other images depict birds and, most noticeably, horses, which appear in more than 300 wall images, by far outnumbering all other animals.

Early artists drawing these animals accomplished a monumental and difficult task. They did not limit themselves to the easily accessible walls but carried their painting materials to spaces that required climbing steep walls or crawling into narrow passages in the Lascaux complex. Unfortunately, the paintings have been exposed to the destructive action of water and temperature changes, which easily wear the images away. Because the Lascaux caves have many entrances, air movement has also damaged the images inside. Although they are not out in the open air, where natural light would have destroyed them long ago, many of the images have deteriorated and are barely recognizable.

To prevent further damage, the site was closed to tourists in 1963, 23 years after it was discovered.

Can Summaries Help?

One short summary per paragraph of the text above:

(1) Lascaux cave complex discovered

(2) Paintings exposed to destructive action

(3) Site closed to tourists

We hypothesize that summaries, presented as section titles, can improve fact retention, fact retrieval, and text comprehension.

Dooling et al. (1971), Kintsch et al. (1978), Smith et al. (1992)

Goals

(1) A low-resource approach to generating section titles

(2) An evaluation framework for comprehension tasks

Section Titles in Two Steps

(1) Select the most important sentence: $\operatorname{argmax}_{\text{sent} \in \text{para}} \text{saliency}(\text{sent})$

(2) Compress the sentence: keep $x_i$, for all $(x_1, y_1), \ldots, (x_n, y_n)$, iff $y_i = 1$

When another old cave is discovered in the south of France, it is not usually news. Rather, it is an ordinary event. Such discoveries are so frequent these days that hardly anybody pays heed to them. However, when the Lascaux cave complex was discovered in 1940, the world was amazed. Painted directly on its walls were hundreds of scenes showing how people lived thousands of years ago. The scenes show people hunting animals, such as bison or wild cats. Other images depict birds and, most noticeably, horses, which appear in more than 300 wall images, by far outnumbering all other animals.

However, when the Lascaux cave complex was discovered in 1940, the world was amazed.

Lascaux cave complex discovered

Compressive Deletion

However, when the Lascaux cave complex was discovered in 1940, the world was amazed.

x : However when the Lascaux cave complex was discovered in 1940 the world was amazed
y : 0       0    0   1       1    1       0   1          0  0    0   0     0   0
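The deletion step above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `compress` is hypothetical, punctuation is assumed to be stripped before labeling, and the positions of the four 1-labels are an assumption inferred from the title "Lascaux cave complex discovered" shown earlier in the deck.

```python
def compress(tokens, labels):
    """Deletion-based compression: keep token x_i iff its label y_i is 1."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return [tok for tok, keep in zip(tokens, labels) if keep == 1]


# Example sentence from the slides, with punctuation removed (assumption).
sentence = ("However when the Lascaux cave complex was discovered "
            "in 1940 the world was amazed")
x = sentence.split()

# Ten 0s and four 1s; the 1 positions are inferred from the resulting title.
y = [0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

title = " ".join(compress(x, y))
print(title)  # Lascaux cave complex discovered
```

Because the model only deletes words, the title is always a subsequence of the selected sentence.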

Data-Efficiency through SCRFs

However, when the Lascaux cave complex was discovered in 1940, the world was amazed.

$$\phi_{\text{Emission}}(x, \langle y, \text{start}, \text{end} \rangle) = \sum_{i=\text{start}}^{\text{end}} W_E^{\top} \left[ h_i,\; h_{\text{end}} - h_{\text{start}},\; \text{emb}_{\text{end}-\text{start}} \right]$$

$$\phi_{\text{Emission}}(x, \langle y, 3, 5 \rangle) = \sum_{i=3}^{5} W_E^{\top} \left[ h_i,\; h_{\text{complex}} - h_{\text{Lascaux}},\; \text{emb}_{2} \right]$$
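To make the emission potential concrete, here is a minimal pure-Python sketch of the sum for one segment. It is written under several assumptions that the slides do not state: `h` holds one hidden vector per word, `length_emb[k]` is an embedding of the segment length offset `k = end - start`, `w_e` is a single weight vector for the label being scored, and square brackets denote concatenation. All names and the toy values are hypothetical.

```python
def emission_potential(h, w_e, length_emb, start, end):
    """Sum over i in [start, end] of w_e . [h_i, h_end - h_start, emb_(end-start)]."""
    # Span-level features are shared by every position in the segment.
    span_diff = [a - b for a, b in zip(h[end], h[start])]
    length_vec = list(length_emb[end - start])

    total = 0.0
    for i in range(start, end + 1):
        features = list(h[i]) + span_diff + length_vec
        if len(features) != len(w_e):
            raise ValueError("weight vector does not match feature size")
        total += sum(w * f for w, f in zip(w_e, features))
    return total


# Toy inputs: 14 words, 2-dimensional hidden states, 1-dimensional length embedding.
h = [[float(i), 1.0] for i in range(14)]
length_emb = [[float(k)] for k in range(14)]
w_e = [1.0] * 5  # 2 (h_i) + 2 (h_end - h_start) + 1 (length embedding)

# Segment from word 3 ("Lascaux") to word 5 ("complex"), as in the slide example.
score = emission_potential(h, w_e, length_emb, start=3, end=5)
print(score)  # 27.0
```

Scoring whole segments rather than single words is what lets the span difference and the length embedding enter the potential at all.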

Comparison to S2S

On 200,000 data points

SP after Filippova et al. (2015)

[Figure: bar chart with axis from 60 to 90. Bars: Sequential Pointer w/ features; SCRF; + Features (POS, NER…); + LM reranking]

Comparison to S2S

On limited data

SCRF gains

But does it improve comprehension?

National Geographic interactive reading practice: 33 texts, 4-7 paragraphs, two reading difficulties, various topics

We compare no titles, human-written titles, and generated titles

144 participants completed six tasks, each 2-3 minutes long

Retention

Ask people to answer questions after reading

Retrieval

Ask people to find information

Comprehension

Ask people to summarize the text

Retrieval and Retention

                        Accuracy    Time Taken
Baseline (no titles)    0           0
Human-written           -0.01       -2.2 secs
Generated               -0.01       -27.1 secs (p < 0.01)

Comprehension

                        Readability    Relevance      Length         Time Taken
Baseline (no titles)    4.66 ± 0.65    4.11 ± 0.86    0              0
Human-written           4.55 ± 0.76    4.09 ± 0.95    +8.6 words*    -20.9 secs*
Generated               4.52 ± 0.72    4.12 ± 1.02    +5.3 words*    -2.6 secs*

Section titles help with text comprehension

The type of title influences what is remembered about a text

Extractive (Generated) titles: Fact retention is easier

Human-written titles: The overall story is easier to understand

Schallert (1975), Kozminsky (1977), Lorch Jr (2011)

Conclusion

• We introduced a data-efficient title-generation pipeline

• We found that the SCRF-based compression outperforms S2S models in low-data settings

• We developed an evaluation framework and confirmed the positive effect of titles on text comprehension

But…

• Deletion-based compression only works for languages in which deleting words retains grammaticality; even English has problems at times

• The efficacy of the low-resource model in an interface is still unknown

Generated Section Titles Improve Text Comprehension

Sebastian Gehrmann, Steven Layne, Franck Dernoncourt

@SebGehr [email protected]

Sentence Selection through Word-Level Extractive Summarization

Gehrmann et al. (2018)

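Source document: $x = x_1, \ldots, x_n$

Summary: $y = y_1, \ldots, y_m$

Extraction function: $t = t_1, \ldots, t_n$

Objective: Learn

$$\log p(t \mid x) = \sum_{i=1}^{n} \log p(t_i \mid x)$$

Selection: Pick

$$\text{saliency}(\text{sent}) = \frac{1}{|\text{sent}|} \sum_{i=1}^{|\text{sent}|} p(t_i \mid \text{sent})$$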
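The selection rule can be illustrated with a short Python sketch: average the per-word extraction probabilities of each sentence and pick the sentence with the highest average. The probabilities below are made up for illustration; in the talk they would come from the word-level extractive model of Gehrmann et al. (2018). The function names are hypothetical.

```python
def saliency(word_probs):
    """Mean per-word extraction probability p(t_i | sent) of one sentence."""
    if not word_probs:
        raise ValueError("sentence must contain at least one word")
    return sum(word_probs) / len(word_probs)


def select_sentence(paragraph_probs):
    """Return the index of the most salient sentence and all saliency scores."""
    scores = [saliency(probs) for probs in paragraph_probs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# One list of made-up word probabilities per sentence in a paragraph.
paragraph_probs = [
    [0.1, 0.2, 0.0],
    [0.5, 0.75, 1.0, 0.75],
    [0.25, 0.25],
]

best, scores = select_sentence(paragraph_probs)
print(best)  # 1
```

Dividing by the sentence length keeps long sentences from winning simply because they contain more words.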