What is computer-aided summarisation and does it really work?

Constantin Orasan

http://clg.wlv.ac.uk/projects/CAST/

Structure

1. Introduction
2. CAST
3. Evaluation
4. Conclusions

Computer-aided summarisation

Combines automatic methods with human input

Relies on automatic methods to identify the important information

Humans can decide to include this information and/or additional information

Humans post-edit the information to produce a coherent summary

Automatic summarisation (AS)

Produces summaries automatically with the help of computers
Does not require human input
The quality is low (especially when compared to human summaries)

Computer-aided summarisation (CAS)

Uses automatic methods to produce summaries, but allows humans to post-edit the result
High quality, but with less effort than producing summaries manually

Why can CAS work?

Endres-Niggemeyer (1998) identifies three stages in human summarisation: document exploration, relevance assessment and summary production
We hypothesise that the first two stages can be replaced by automatic methods
Craven (1996) and Narita (2000) tried to help human summarisers using automatic means

Computer-aided summarisation tool (CAST)

Work funded by the Arts and Humanities Research Council
Work done together with Laura Hasler
The most important outcome of the project is the tool
It allows the user to run automatic methods to identify important sentences
In order to produce an abstract, the user can take sentences and edit them

CAST - the tool (II)

At present CAST contains the following methods:
– Keyword method
– Indicating phrases
– Surface clues
– Lexical cohesion

These methods were chosen because they are highly customisable and domain independent
The user can select the settings that are most appropriate for a particular text/genre
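The keyword (term-based) method can be illustrated with a minimal term-frequency scorer. This is a generic sketch, not CAST's actual implementation: the stop-word list, the scoring function (average document frequency of a sentence's content words) and the `ratio` parameter (set to 0.3 to mirror the 30% summaries mentioned in the user feedback) are all assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (assumption, not CAST's).
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "to", "is",
              "are", "was", "were", "it", "that", "this", "for", "with", "by"}

def tokenize(text):
    """Lower-case word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def keyword_summary(sentences, ratio=0.3):
    """Return indices of the top-scoring sentences, in document order.

    A sentence's score is the average frequency (over the whole document)
    of its content words, so sentences full of recurring terms rank higher.
    """
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    n = max(1, round(len(sentences) * ratio))  # number of sentences to keep
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return sorted(ranked[:n])

sentences = [
    "Summarisation tools help summarisers.",
    "The weather was nice yesterday.",
    "Summarisation tools reduce the time summarisers need.",
    "Lunch was served at noon.",
]
print(keyword_summary(sentences))  # [0]
```

In a computer-aided setting the selected indices would only be highlighted for the human, who then decides which sentences to keep and edit.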

Feedback from the user

We analysed the work of our human summariser:
– Term-based summarisation was used first to produce 30% summaries
– Whenever a useful sentence was found, lexical chains were used to identify related sentences
– Running too many automatic methods was avoided because it became confusing
– The summariser requested a way to know which sentences had already been included in the summary
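The step of finding sentences related to a useful one can be sketched as follows. Proper lexical chains link words through repetition and semantic relations such as synonymy (typically looked up in a thesaurus like WordNet); this sketch is a deliberate simplification that only uses word repetition, and the function names and the `min_shared` threshold are hypothetical.

```python
import re

# Hypothetical stop-word list (assumption).
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "to", "is",
              "are", "was", "were", "it", "that", "this", "for", "with", "by"}

def content_words(sentence):
    """Set of lower-cased content words in a sentence."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS}

def related_sentences(sentences, selected, min_shared=1):
    """Indices of sentences sharing at least `min_shared` content words
    with sentences[selected] (repetition only, no synonyms)."""
    target = content_words(sentences[selected])
    return [i for i, s in enumerate(sentences)
            if i != selected and len(target & content_words(s)) >= min_shared]

sentences = [
    "Summarisation tools help summarisers.",
    "The weather was nice yesterday.",
    "Summarisation tools reduce the time summarisers need.",
    "Lunch was served at noon.",
]
print(related_sentences(sentences, 0))  # [2]
```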

EvaluationEvaluation

Our assumption about CAS is that it makes it possible to produce summaries in less time without any loss in quality

Two experiments were carried out:
– We recorded the time needed to produce summaries with and without CAST
– We showed pairs of summaries to humans and asked them to pick the better one

Experiment 1Experiment 1

One professional summariser was used

69 texts from the CAST corpus were used

Summaries were produced with and without the tool, one year apart

Text type            Without CAST   With CAST   Reduction
Newswire texts       498 s          382 s       23.29%
New Scientist texts  771 s          623 s       19.19%
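The reduction column is the time saved relative to working without the tool. The short check below recomputes it from the table; the function name is illustrative only. Note that 148/771 is about 19.196%, so the table's 19.19% appears truncated rather than rounded, and both values are in line with the "about 20%" figure in the conclusions.

```python
def time_reduction(without_cast, with_cast):
    """Percentage of summarisation time saved when using CAST."""
    return (without_cast - with_cast) / without_cast * 100

# Times in seconds, taken from the table above.
times = {"Newswire": (498, 382), "New Scientist": (771, 623)}

for genre, (without, with_) in times.items():
    print(f"{genre}: {time_reduction(without, with_):.2f}%")
# Newswire: 23.29%
# New Scientist: 19.20%
```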


We evaluated the term-based summariser used in the process

We found a correlation between the success of the automatic summariser and the time reduction


Experiment 2

Turing-like experiment in which humans were asked to pick the better summary in a pair

Each pair contained one summary produced with CAST and one produced without CAST

17 judges were each shown 4 randomly selected pairs
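The assignment of pairs to judges can be sketched as below. The slides only state that 17 judges each saw 4 randomly selected pairs; the assumption that there is one pair per text (69 pairs), the random order of the two summaries within a pair, and all names and the seed are hypothetical. The total of 17 × 4 = 68 judgements matches the 41 + 27 preferences reported in the results.

```python
import random

def assign_pairs(pair_ids, n_judges=17, pairs_per_judge=4, seed=0):
    """Give each judge a random sample of distinct pairs.

    Each entry is (pair_id, cast_first), where cast_first says whether the
    CAST summary is shown first; randomising the order (an assumption) keeps
    the position of the summary from revealing how it was produced.
    """
    rng = random.Random(seed)
    assignments = {}
    for judge in range(n_judges):
        chosen = rng.sample(pair_ids, pairs_per_judge)  # no repeats per judge
        assignments[judge] = [(p, rng.random() < 0.5) for p in chosen]
    return assignments

assignments = assign_pairs(list(range(69)))  # hypothetical: one pair per text
print(sum(len(v) for v in assignments.values()))  # 68
```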


In 41 pairs the summary produced with CAST was preferred
In 27 pairs the summary produced without CAST was preferred
Our assumption was that there is no difference between them
The chi-square test shows that there is no statistically significant difference at the 0.05 significance level
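The significance test can be reproduced with a chi-square goodness-of-fit test against an even 34/34 split, which is what "no difference" predicts for 68 judgements. The slides do not say which chi-square variant was used, so this plain, uncorrected version is an assumption; applying Yates' continuity correction would only lower the statistic and would not change the conclusion.

```python
def chi_square_uniform(observed):
    """Chi-square statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

observed = [41, 27]  # pairs preferring the CAST / non-CAST summary
stat = chi_square_uniform(observed)
CRITICAL_DF1_ALPHA_05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05

print(f"chi2 = {stat:.3f}, significant: {stat > CRITICAL_DF1_ALPHA_05}")
# chi2 = 2.882, significant: False
```

If SciPy is available, `scipy.stats.chisquare([41, 27])` computes the same statistic (it assumes equal expected frequencies by default) together with a p-value.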

Conclusions

Computer-aided summarisation really works for professional summarisers: it reduces the time necessary to produce summaries by about 20%, with no statistically significant difference in quality
It would be interesting to:
– try it with non-professional summarisers
– try it on other texts
– compare it with other computer-aided methods