Human summary production operations for computer-aided summarisation

Laura Hasler, University of Wolverhampton, 30 May 2007

Human summary production operations for computer-aided summarisation

Laura Hasler, University of Wolverhampton

30 May 2007

Overview

• Original contributions of my thesis

• Human summarisation (HS)

• Automatic summarisation (AS)

• Computer-aided summarisation (CAS)

• Classification of human summary production operations

• Guidelines derived from the classification

• Evaluation of guidelines and classification

Original contributions

• Reliable ways of creating abstracts from extracts, improving coherence/readability

• Set of guidelines for annotating source texts for important information, producing the extracts for a corpus of extract/abstract pairs

• Corpus of extract/abstract pairs for analysis

• Corpus-based classification of human summary production operations that successfully transform extracts into abstracts by improving coherence and readability

Original contributions 2

• Set of summary production guidelines, derived from the classification, which can be issued to users of a CAS system

• Development of Centering Theory (Grosz, Joshi & Weinstein 1995) as an evaluation metric, since existing methods were unsuitable

• Evaluation of the coherence and readability of abstracts produced using the summary production operations, and therefore of the guidelines and operations themselves

Human summarisation: 3 stages (Endres-Niggemeyer 1998)

• Document exploration: summariser explores layout and organisation of document to identify position of important information

• Relevance assessment: summariser assesses information in document to see if it is relevant to summary by recognising the theme (what it is ‘about’)

• Summary production: summariser cuts and pastes relevant information from document and edits it to form a coherent summary

Automatic summarisation

Extracting

• Units extracted from source verbatim, leading to problems with coherence and unnecessary information

• Methods can easily be used across domains

• Currently more popular (e.g. CAST)

Abstracting

• Additional knowledge can be used, e.g. concepts

• Not restricted to linguistic realisation of source, so more coherent and concise

• Needs a knowledge base, so domain dependent
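
To make the extracting side concrete, here is a minimal Python sketch of sentence extraction, assuming each sentence already carries an importance score (how CAST actually scores sentences is not covered here, and the name `make_extract` is hypothetical). The chosen sentences are copied verbatim and kept in source order, which is exactly why dangling references and unnecessary detail can survive into an extract.

```python
import math


def make_extract(sentences, scores, rate=0.3):
    """Return a verbatim extract containing about `rate` of the sentences.

    Sentences are ranked by score (ties go to the earlier sentence) and the
    chosen ones are returned unchanged, in their original order.
    """
    if len(sentences) != len(scores):
        raise ValueError("need exactly one score per sentence")
    if not sentences:
        return []
    # Number of sentences to keep; at least one for a non-empty source.
    k = max(1, math.ceil(rate * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return [sentences[i] for i in sorted(ranked[:k])]
```

With ten sentences and `rate=0.3`, the three highest-scoring sentences are returned in their original order; nothing inside them is edited.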

Computer-aided summarisation

• A feasible alternative to fully automatic summarisation given current technology – problems of coherence and readability with automatic extracts

• Uses automatic summarisation methods to produce an extract (stages 1 & 2), which is then post-edited by a human summariser/user (stage 3)

• Focus of this research: post-editing (from extract to abstract) to improve coherence/readability

Aim of the research

A) Chernobyl reactor number 4 was ripped apart by an explosion on 26 April 1986. Last September, the IAEA and the WHO released a report. Its headline conclusion that radiation from the accident would kill a total of 4000 people was widely reported.

B) Last September, the IAEA/WHO released a report on the explosion of Chernobyl reactor number 4 on 26 April 1986, concluding that radiation from the accident would kill a total of 4000 people. (h03-ljh)

How can we consistently transform extracts into abstracts?

• Guidelines: available for other aspects/types of summarisation

• Investigation of what exactly a human summariser does to get from an extract to an abstract (and improve coherence)

• Corpus to allow analysis and classification

• Set of guidelines derived from classification

• Application and evaluation of classification/ guidelines to prove they work

Corpus of extract/abstract pairs

• 43 pairs of news texts (extract, abstract)

• Source texts manually annotated for important information, giving higher-quality extracts

• Annotated using adapted CAST guidelines (Hasler et al. 2003): 30% extracts produced

• Extracts transformed into 20% abstracts - no guidelines given
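
The two compression rates imply that post-editing has to shorten the extract, not just polish it. A small sketch, assuming both percentages are measured in words relative to the source text (the slide does not say which unit is used, and `length_budgets` is a hypothetical helper):

```python
def length_budgets(source_words, extract_rate=0.3, abstract_rate=0.2):
    """Word budgets for the extract and the abstract of one source text.

    Both rates are assumed to be proportions of the source length.
    Returns (extract_words, abstract_words, share_of_extract_to_cut).
    """
    extract_words = round(source_words * extract_rate)
    abstract_words = round(source_words * abstract_rate)
    if extract_words == 0:
        raise ValueError("source too short for a non-empty extract")
    # How much of the extract must disappear to reach the abstract length.
    share_to_cut = 1 - abstract_words / extract_words
    return extract_words, abstract_words, share_to_cut
```

For a 600-word source this gives a 180-word extract and a 120-word abstract, so roughly a third of the extract has to go during post-editing.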

Classification of operations

• 5 general classes of operations

• Atomic and complex

• Atomic: deletion, insertion

• Complex: replacement, reordering, merging

• Each split into sub-operations (26 in total)

• Sub-operations linked to triggers, or recognisable surface forms

• Function of units also important

Classification

Atomic operations and sub-operations

• Deletion: complete sentences, subordinate clauses, PPs, adverb phrases, reporting clauses, NPs, determiners, the verb be, specially formatted text, punctuation

• Insertion: connectives, formulaic units, modifiers, punctuation

Classification 2

Complex operations and sub-operations

• Replacement: pronominalisation, lexical substitution, NP restructuring, nominalisation, referred sentences, VPs, passivisation, abbreviations

• Reordering: emphasising, coherence

• Merging: clause/sentence restructuring, punctuation/connectives
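
One way to make the classification usable in a CAS tool is to encode it as data. The sketch below is an illustration, not the thesis's own representation: the dictionary layout and the `AnnotatedEdit` record are hypothetical, while the class and sub-operation names are copied from the two classification slides. Counting the entries reproduces the 26 sub-operations stated earlier ("punctuation" appears under both deletion and insertion).

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    ATOMIC = "atomic"
    COMPLEX = "complex"


# Operation class -> (kind, sub-operations), as listed on the slides.
CLASSIFICATION = {
    "deletion": (Kind.ATOMIC, [
        "complete sentences", "subordinate clauses", "PPs", "adverb phrases",
        "reporting clauses", "NPs", "determiners", "the verb be",
        "specially formatted text", "punctuation",
    ]),
    "insertion": (Kind.ATOMIC, [
        "connectives", "formulaic units", "modifiers", "punctuation",
    ]),
    "replacement": (Kind.COMPLEX, [
        "pronominalisation", "lexical substitution", "NP restructuring",
        "nominalisation", "referred sentences", "VPs", "passivisation",
        "abbreviations",
    ]),
    "reordering": (Kind.COMPLEX, ["emphasising", "coherence"]),
    "merging": (Kind.COMPLEX, [
        "clause/sentence restructuring", "punctuation/connectives",
    ]),
}


def sub_operation_count():
    """Total number of sub-operations across all classes."""
    return sum(len(subs) for _, subs in CLASSIFICATION.values())


def classes_of(kind):
    """Names of the operation classes of the given kind, in slide order."""
    return [name for name, (k, _) in CLASSIFICATION.items() if k is kind]


@dataclass(frozen=True)
class AnnotatedEdit:
    """Hypothetical record for one edit observed in an extract/abstract pair."""
    operation: str
    sub_operation: str

    def __post_init__(self):
        entry = CLASSIFICATION.get(self.operation)
        if entry is None or self.sub_operation not in entry[1]:
            raise ValueError(
                f"unknown operation: {self.operation} / {self.sub_operation}")
```

For instance, `AnnotatedEdit("deletion", "determiners")` would record the removal of "[the]" in the "Britain [is] among [the] front runners" example; a pair not on the slides raises `ValueError`.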

Deletion

• “The process of removing a unit from a certain place in the extract so it does not appear in the same place in the abstract”

• Used alone or as part of complex operations

• Very useful for reducing text when used alone

• Deletes non-essential units e.g. details, repetitions

• Complete sentences, subordinate clauses, PPs, reporting clauses, determiners, be

Deletion examples

• [I suspect that] the set would be the ideal book for a physicist to be cast away with on a desert island. (new-sci-B7L-54-ljh)

• Three papers published recently in Science move us a little closer to understanding the basis of the disease[, which turns out to be highly complex]. (sci04done-an)

• Britain [is] among [the] front runners as tomorrow’s supercomputers take shape. (sci05done-an)
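
The examples use square brackets for material deleted from the extract. Assuming that notation (and no nested brackets), applying the deletions is mostly string clean-up. `apply_deletions` is a hypothetical helper, and restoring a sentence-initial capital is a naive heuristic rather than part of the classification.

```python
import re

# Square-bracketed spans mark material deleted from the extract.
DELETED_SPAN = re.compile(r"\[[^\]]*\]")


def apply_deletions(annotated):
    """Return the abstract text left after removing every bracketed deletion."""
    text = DELETED_SPAN.sub("", annotated)
    text = re.sub(r"\s{2,}", " ", text)           # collapse the gaps left behind
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)  # no space before punctuation
    text = text.strip()
    # Naive: capitalise the first character in case a deletion removed it.
    return text[:1].upper() + text[1:]
```

Applied to the "[I suspect that]" example, this yields a sentence starting "The set would be the ideal book…"; the reporting clause disappears and the new first word is capitalised.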

Insertion

• “The process of adding a unit which is not present in the extract into the abstract”

• Used alone or as part of complex operations

• Interesting because it adds text to something which is supposed to be reduced

• Used to add coherence and to clarify whilst saving space

• Connectives, modifiers, ‘formulaic units’, punctuation

Insertion examples

• He sees the need to raise public awareness and demystify science and technology as a key point… (new-sci-B7L-75-ljh) [X sees Y as Z]

• The TV series Men of Science is now being shown in a few other areas. (new-sci-B7L-69-ljh)

Replacement

• “The deletion of one unit and the insertion of a different one in the same place in the text”

• Complex operation, can be used in combination with other complex operations

• Useful for avoiding repetition and saving space

• Pronominalisation, lexical substitution, NP restructuring, nominalisation, VPs, passivisation, abbreviations

Replacement examples

• [Zhanat Carr, a radiation scientist with the WHO in Geneva,] The WHO [says] admits the 5000 deaths were omitted because the report was a "political communication tool". (h03-ljh)

• [All this] [is] hardly Culver’s fault. [The same difficulties are to be found in all other parts of evolutionary ecology.] These general difficulties of evolutionary ecology are hardly Culver’s fault. (new-sci-B7L-63-ljh)

Reordering

• “The deletion of a unit from one place in the extract and its insertion in a different place in the abstract”

• Complex operation, can be used in combination with other complex operations

• Sub-functions rather than operations – difficult to sub-classify

• Emphasises information, improves coherence and readability
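
The definitions treat replacement and reordering as a deletion followed by an insertion, differing only in where the insertion happens. A minimal sketch of that composition over a list of units (sentences, clauses or words); the function names are hypothetical, and each function returns a new list rather than editing in place.

```python
def delete(units, i):
    """Atomic deletion: remove the unit at position i."""
    return units[:i] + units[i + 1:]


def insert(units, i, unit):
    """Atomic insertion: add a unit not present in the extract at position i."""
    return units[:i] + [unit] + units[i:]


def replace(units, i, new_unit):
    """Replacement: delete the unit at i and insert a different one in the same place."""
    return insert(delete(units, i), i, new_unit)


def reorder(units, i, j):
    """Reordering: delete the unit at i and insert it at position j of the result."""
    unit = units[i]
    return insert(delete(units, i), j, unit)
```

`reorder(["S1", "S2", "S3", "S4"], 1, 3)` gives `["S1", "S3", "S4", "S2"]`, the kind of move made in the face-transplant example.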

Reordering example

• Text about the world’s second face transplant; all other sentences are about a specific person/operation, so S2 (below) becomes the last sentence

• Experts predict the number of these operations will rise rapidly as centres around the world gear up to perform the procedure. (h01-ljh)

Merging

• “Taking information from different units in the extract and presenting them as one unit in the abstract”

• All other operations can be used

• Large class, most difficult to sub-classify – anything (appropriate) goes!

• Best embodies abstracting as opposed to extracting – conciseness

• Restructuring of clauses/sentences, punctuation/ connectives
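
Merging is open-ended, but its punctuation/connectives sub-operation is easy to sketch. The helper below is hypothetical: it joins two adjacent sentences with a connective, dropping the first full stop and, optionally, the second sentence's initial capital. The `lowercase_next` flag exists because a proper noun such as "Zuccarelli" must keep its capital; deciding that automatically is beyond this sketch.

```python
def merge(units, i, connective=", and", lowercase_next=True):
    """Merge units[i] and units[i + 1] into a single unit.

    The first unit loses its final full stop, the connective is inserted,
    and the second unit's initial capital is optionally lowered.
    """
    if not 0 <= i < len(units) - 1:
        raise IndexError("need two adjacent units to merge")
    first = units[i].rstrip()
    if first.endswith("."):
        first = first[:-1]
    second = units[i + 1].lstrip()
    if lowercase_next and second:
        second = second[0].lower() + second[1:]
    merged = f"{first}{connective} {second}"
    # The two source units are replaced by the single merged unit.
    return units[:i] + [merged] + units[i + 2:]
```

For example, "The report was released." and "It concluded that radiation would kill 4000 people." become "The report was released, and it concluded that radiation would kill 4000 people."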

Merging example

• In October 1980 Zuccarelli filed [an expensive] European patent application, covering nine countries including Britain [. … The cost of pushing a European patent through in nine countries is around $10000. The cost of application alone is around $2000 and Zuccarelli has already paid an extra $500 for a further stage of official examination]. (new-sci-B7K-37)

Evaluation

• Applied guidelines to a different set of extracts

• 25 human-produced extracts + corresponding abstracts

• 25 automatically produced extracts + corresponding abstracts

• Developed Centering Theory as an evaluation method due to unsuitability of existing methods

Centering Theory (CT) (Grosz, Joshi & Weinstein 1995)

• Theory of local coherence and salience

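• Accounts for coherence using repetitions of entities across consecutive utterances (forward-looking centres, Cfs; preferred centres, Cps; backward-looking centres, Cbs)

• Uses the relationship between these repetitions, i.e. the positions of the repeated entities in each utterance, to derive ‘transitions’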

• Transitions are ordered in preference from most to least coherent (continue, retain, smooth shift, rough shift, no transition/no Cb)

Centering Theory: an example

John[Cp] went to his favorite music store to buy a piano.
He[Cp], [Cb] had frequented the store for many years.
He[Cp], [Cb] was excited that he could finally buy a piano.
He[Cp], [Cb] arrived just as the store was closing for the day.
Continue, continue, continue

John[Cp] went to his favorite music store to buy a piano.
It[Cp] was a store John[Cb] had frequented for many years.
He[Cp], [Cb] was excited that he could finally buy a piano.
It[Cp] was closing just as John[Cb] arrived.
Retain, continue, retain

(Grosz, Joshi & Weinstein 1995: 206)
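
The transitions in both John discourses can be reproduced mechanically. The sketch below assumes each utterance is given as its Cf list, ranked by hand from most to least salient (so the first entity is the Cp), and uses the four-way continue/retain/smooth shift/rough shift table common in the CT literature. Two simplifications are assumptions of this sketch: an undefined previous Cb is treated as matching, and indirect realisation (the 'no transition (indirect)' case) is not modelled. Function names are hypothetical.

```python
def backward_center(prev_cf, cf):
    """Cb(Un): highest-ranked entity of Cf(Un-1) that is realised in Un."""
    realised = set(cf)
    for entity in prev_cf:
        if entity in realised:
            return entity
    return None


def transitions(utterances):
    """Label the transition into each utterance after the first.

    `utterances` is a list of Cf lists, each ranked from most to least
    salient, so cf[0] is the preferred centre (Cp).
    """
    labels = []
    prev_cb = None
    for prev_cf, cf in zip(utterances, utterances[1:]):
        cb = backward_center(prev_cf, cf)
        if cb is None:
            labels.append("no transition (no Cb)")
            prev_cb = None
            continue
        cp = cf[0]
        # Assumption: an undefined Cb(Un-1) counts as "the same" Cb.
        same_cb = prev_cb is None or cb == prev_cb
        if same_cb:
            labels.append("continue" if cb == cp else "retain")
        else:
            labels.append("smooth shift" if cb == cp else "rough shift")
        prev_cb = cb
    return labels
```

With the Cf lists `[John, store, piano]`, `[store, John]`, `[John, piano]`, `[store, John]`, `transitions` returns retain, continue, retain, matching the second discourse.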

Centering Theory: a real example

1. (Everybody)[Cp] should be ready for ((Monday)'s national championship game), despite (casualties in ((Saturday night)'s NCAA semifinal battles)). no transition (indirect)

2. (Jason Terry of (Arizona))[Cp], [Cb] was injured. retain

3. “(We)[Cp] were going to put (him)[Cb] in late in (the game),” said (Arizona coach (Lute Olson)). rough shift

4. “(He)[Cp] had played a lot before (that), of course, but when (we)'re protecting (a lead), (we)[Cb] like getting (four perimeter guys) in there and (that) gives (us) (another ball handler), gives (us) (another free throw shooter).” retain

5. (Kentucky coach (Rick Pitino))[Cp] predicted that ((Monday)'s championship game) would also be physical, in view of (((Kentucky)'s all-out pressure defence) and ((Arizona)[Cb]'s blazing speed)).

CT evaluation metric

Transition                  Weight
Continue                    +3
Retain                      +2
No transition (indirect)    +1
Smooth shift                -1
Rough shift                 -2
No transition (no Cb)       -5
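
The slide gives the weights but not how they are combined into a score for a text. Assuming they are simply summed, with a per-transition mean so that a shorter abstract can be compared with its longer extract, a sketch could look like this (the function names and the comparison rule are hypothetical):

```python
# Transition weights from the CT evaluation-metric table.
WEIGHTS = {
    "continue": 3,
    "retain": 2,
    "no transition (indirect)": 1,
    "smooth shift": -1,
    "rough shift": -2,
    "no transition (no Cb)": -5,
}


def ct_score(labels):
    """Sum of the weights of a text's transitions."""
    return sum(WEIGHTS[label] for label in labels)


def ct_mean(labels):
    """Mean weight per transition, so texts of different lengths can be compared."""
    return ct_score(labels) / len(labels) if labels else 0.0


def improved(extract_labels, abstract_labels):
    """Hypothetical verdict: does the abstract score higher than its extract?"""
    return ct_mean(abstract_labels) > ct_mean(extract_labels)
```

On this reading, the first John discourse scores 9 (three continues) and the second 7 (retain, continue, retain), and an abstract counts as improved when its mean weight beats its extract's.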

Evaluation 2

• Human judgment obtained to complement CT

• Overall, human summary production operations improve texts: 78% of texts improved according to CT; 82% according to the judge

• Agreement between CT and judge = 70%

• Classification and resulting guidelines can be reliably used during post-editing in CAS

• CT is useful as an evaluation method
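
Figures such as the 78%, 82% and 70% above are proportions over the evaluated texts. A minimal sketch, assuming one binary "improved?" verdict per extract/abstract pair from CT and one from the judge; the function names are hypothetical and the verdicts in the test are toy values, not data from the thesis. Raw percentage agreement does not correct for chance agreement, which a statistic such as Cohen's kappa would.

```python
def percent_improved(verdicts):
    """Percentage of texts whose abstract was judged better than its extract."""
    if not verdicts:
        raise ValueError("no verdicts")
    return 100 * sum(verdicts) / len(verdicts)


def percent_agreement(ct_verdicts, judge_verdicts):
    """Percentage of texts on which CT and the human judge give the same verdict."""
    if len(ct_verdicts) != len(judge_verdicts) or not ct_verdicts:
        raise ValueError("need two equal-length, non-empty verdict lists")
    matches = sum(c == j for c, j in zip(ct_verdicts, judge_verdicts))
    return 100 * matches / len(ct_verdicts)
```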

Directions for future work

• To use more human summarisers/judges to further validate classification/guidelines

• To further explore/improve CT for evaluation

• To investigate the feasibility of automating certain elements of summary production operations for CAS

• To look at scientific texts (also popular in AS)