Summarisation Work at Sheffield
Robert Gaizauskas
Natural Language Processing Group
Department of Computer Science
University of Sheffield
January, 2001 AKT Workshop
Outline
Terminology
Approach 1: Generation from Templates
Approach 2: Coreference Chains
Approach 3: Statistical
Terminology
Extract vs Abstract
• Extract - subset of the sentences in the original
• Abstract - fusion of topics in original + text generation
Generic vs User-focused
• Generic - captures essence of text, independent of user’s interests
• User-focused - summarises content wrt a particular user interest
Indicative vs Informative
• Indicative - indicates whether document should be examined in more detail
• Informative - serves as a surrogate for original
Approach 1: Generation from Templates
To generate user-focused informative abstracts
we have used an IE system + simple NL generation techniques to produce simple summaries
Example: A Wall Street Journal Article
<DOC>
<DOCID> wsj94_008.0212 </DOCID>
<DOCNO> 940413-0062. </DOCNO>
<HL> Who's News:@ Burns Fry Ltd. </HL>
<DD> 04/13/94 </DD>
<SO> WALL STREET JOURNAL (J), PAGE B10 </SO>
<CO> MER </CO>
<IN> SECURITIES (SCR) </IN>
<TXT>
<p>
BURNS FRY Ltd. (Toronto) -- Donald Wright, 46 years old, was named executive vice president and director of fixed income at this brokerage firm. Mr. Wright resigned as president of Merrill Lynch Canada Inc., a unit of Merrill Lynch & Co., to succeed Mark Kassirer, 48, who left Burns Fry last month. A Merrill Lynch spokeswoman said it hasn't named a successor to Mr. Wright, who is expected to begin his new position by the end of the month.
</p>
</TXT>
</DOC>
Example: BNF Definition of a Management Succession Event Template (MUC-6)
<TEMPLATE> :=
    DOC_NR: "NUMBER" ^
    CONTENT: <SUCCESSION_EVENT> *
<SUCCESSION_EVENT> :=
    ORGANIZATION: <ORGANIZATION> ^
    POST: "POSITION TITLE" | "no title" ^
    IN_AND_OUT: <IN_AND_OUT> +
    VACANCY_REASON: {DEPART_WORKFORCE, REASSIGNMENT, NEW_POST_CREATED, OTH_UNK} ^
<IN_AND_OUT> :=
    PERSON: <PERSON> ^
    NEW_STATUS: {IN, IN_ACTING, OUT, OUT_ACTING} ^
    ON_THE_JOB: {YES, NO, UNCLEAR}
    OTHER_ORG: <ORGANIZATION> -
    REL_OTHER_ORG: {SAME_ORG, RELATED_ORG, OUTSIDE_ORG} -
<ORGANIZATION> :=
    ORG_NAME: "NAME" -
    ORG_ALIAS: "ALIAS" *
    ORG_DESCRIPTOR: "DESCRIPTOR" -
    ORG_TYPE: {GOVERNMENT, COMPANY, OTHER} ^
    ORG_LOCALE: LOCALE_STRING {{CITY, PROVINCE, COUNTRY, REGION, UNK} *
    ORG_COUNTRY: NORMALIZED-COUNTRY-or-REGION | COUNTRY-or-REGION-STRING *
<PERSON> :=
    PER_NAME: "NAME" -
    PER_ALIAS: "ALIAS" *
    PER_TITLE: "TITLE" *
Example: A (Partially) Filled Management Succession Event Template
<TEMPLATE-9404130062> :=
    DOC_NR: "9404130062"
    CONTENT: <SUCCESSION_EVENT-1>
<SUCCESSION_EVENT-1> :=
    SUCCESSION_ORG: <ORGANIZATION-1>
    POST: "executive vice president"
    IN_AND_OUT: <IN_AND_OUT-1> <IN_AND_OUT-2>
    VACANCY_REASON: OTH_UNK
<IN_AND_OUT-1> :=
    IO_PERSON: <PERSON-1>
    NEW_STATUS: OUT
    ON_THE_JOB: NO
<IN_AND_OUT-2> :=
    IO_PERSON: <PERSON-2>
    NEW_STATUS: IN
    ON_THE_JOB: NO
    OTHER_ORG: <ORGANIZATION-2>
    REL_OTHER_ORG: OUTSIDE_ORG
<ORGANIZATION-1> :=
    ORG_NAME: "Burns Fry Ltd."
    ORG_ALIAS: "Burns Fry"
    ORG_DESCRIPTOR: "this brokerage firm"
    ORG_TYPE: COMPANY
    ORG_LOCALE: Toronto CITY
    ORG_COUNTRY: Canada
<ORGANIZATION-2> :=
    ORG_NAME: "Merrill Lynch Canada Inc."
    ORG_ALIAS: "Merrill Lynch"
    ORG_DESCRIPTOR: "a unit of Merrill Lynch & Co."
    ORG_TYPE: COMPANY
<PERSON-1> :=
    PER_NAME: "Mark Kassirer"
<PERSON-2> :=
    PER_NAME: "Donald Wright"
    PER_ALIAS: "Wright"
    PER_TITLE: "Mr."
Example: One Use for a Template - Generating a Summary
From the completely filled version of the preceding template the LaSIE system generates the following natural language summary:
BURNS FRY Ltd. named Donald Wright as executive vice president.
Donald Wright resigned as president of Merrill Lynch Canada Inc..
Mark Kassirer left as president of BURNS FRY Ltd.
Producing summaries in other languages is relatively easy (compared to full machine translation).
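To make the template-to-text step concrete, here is a minimal sketch of slot-filling generation from a succession event. It is not the LaSIE generator: the class names, the `generate_summary` function and the sentence patterns (including the "previously worked at" wording) are assumptions chosen for illustration, and only a few slots of the MUC-6 template are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InAndOut:
    person: str
    new_status: str                  # "IN" or "OUT", mirroring the NEW_STATUS slot
    other_org: Optional[str] = None  # mirrors OTHER_ORG; may be unfilled


@dataclass
class SuccessionEvent:
    organization: str
    post: str
    in_and_out: List[InAndOut]


def end_sentence(text: str) -> str:
    """Add a full stop unless the text already ends with one (e.g. 'Ltd.')."""
    return text if text.endswith(".") else text + "."


def generate_summary(event: SuccessionEvent) -> List[str]:
    """Turn each IN_AND_OUT entry into one or two canned sentences.

    The sentence patterns are hypothetical; a real generator would
    choose verbs and tense from more of the template slots.
    """
    sentences = []
    for io in event.in_and_out:
        if io.new_status == "IN":
            sentences.append(end_sentence(
                f"{event.organization} named {io.person} as {event.post}"))
            if io.other_org:
                sentences.append(end_sentence(
                    f"{io.person} previously worked at {io.other_org}"))
        elif io.new_status == "OUT":
            sentences.append(end_sentence(
                f"{io.person} left as {event.post} of {event.organization}"))
    return sentences
```

Because the sentence patterns are the only language-specific part, swapping them for patterns in another language is enough to produce a summary in that language from the same filled template.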
Approach 2: Coreference Chains
To generate generic informative extracts
we have used coreference chains
Approach 2: Coreference Chains (cont)
Background:
• Morris and Hirst (’94) investigated lexical chains - chains of lexically-related words in a text that serve to make texts cohere
• Barzilay + Elhadad (’97) suggested using lexical chains as a basis for selecting sentences to form a summary - rank chains based on number of links + extent over text
• Halliday and Hasan (’76) proposed coreference as another major factor contributing to coherence of NL texts
Idea:
• Explore use of coreference chains to produce summaries
Approach 2: Coreference Chains (cont)
Technique
• Use LaSIE to carry out discourse analysis of text, including coreference resolution
• Extract all coreference chains
• Rank chains by a metric which counts chain length + extent + starting point
    • Intuition: entities which occur most frequently and most widely in a text are those which the text is most “about”
• Depending on desired summary length, select m sentences from top n chains
• Details in Azzam, Humphreys and Gaizauskas ’99
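The ranking and selection steps can be sketched as follows. This assumes coreference resolution has already been done and that each chain is given as the list of sentence indices in which its entity is mentioned. The exact metric is in the cited paper; the unweighted sum used in `chain_score` below, and both function names, are assumptions for illustration only.

```python
from typing import Dict, List


def chain_score(mentions: List[int], num_sentences: int) -> float:
    """Score a chain by length + extent + starting point (illustrative metric).

    mentions: sentence index of every mention in the chain (repeats allowed).
    """
    length = len(mentions)                                     # how often the entity occurs
    extent = (max(mentions) - min(mentions) + 1) / num_sentences  # how widely it spreads
    start = 1.0 - min(mentions) / num_sentences                # earlier start scores higher
    return length + extent + start


def select_sentences(chains: Dict[str, List[int]], num_sentences: int,
                     n: int, m: int) -> List[int]:
    """Pick up to m sentence indices from the top n chains, in document order."""
    ranked = sorted(chains,
                    key=lambda name: chain_score(chains[name], num_sentences),
                    reverse=True)
    chosen: List[int] = []
    for name in ranked[:n]:
        for idx in sorted(set(chains[name])):
            if idx not in chosen:
                chosen.append(idx)
            if len(chosen) == m:
                return sorted(chosen)
    return sorted(chosen)
```

Returning the chosen indices in document order keeps the extract readable, since sentences appear in the same sequence as in the source text.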
Approach 3: Statistical
To generate generic indicative extracts
we have used a statistical approach based on a set of factors
Approach 3: Statistical (cont)
Factors which have been examined in selecting sentences for inclusion in extractive summaries include:
• number of content words shared with title/headings (T)
• presence of “cue words” (C)
• location of sentence in text (L)
• number of content words discriminative of current text as opposed to corpus of texts from which it is drawn, using, e.g. tf-idf measure (K)
Approach 3: Statistical (cont)
Assign a weight to each sentence according to a weighted linear combination of these factors
Learn weights to optimise sentence selection as measured against a corpus of extracts + texts
Select top ranked sentences up to desired summary length
Sentence weight:
$$W(s) = w_T\,T(s) + w_C\,C(s) + w_L\,L(s) + w_K \sum_{i=1}^{numkey(s)} w_{keyword(s,i)}$$
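A minimal sketch of the scoring and selection steps, assuming the sentence weight is a linear combination of the title (T), cue-word (C) and location (L) scores plus a weighted sum of per-keyword weights (K). The feature dictionary layout, the function names and the idea of passing keyword weights as a lookup table (for example precomputed tf-idf values) are assumptions for illustration; learning the factor weights from a corpus is not shown.

```python
from typing import Dict, List


def sentence_weight(features: Dict, weights: Dict[str, float],
                    keyword_weights: Dict[str, float]) -> float:
    """Weighted linear combination of the T, C, L and K factors for one sentence.

    features: {"T": float, "C": float, "L": float, "keywords": [str, ...]}
    keyword_weights: per-keyword weight, e.g. a tf-idf value (assumed precomputed).
    """
    keyword_sum = sum(keyword_weights.get(k, 0.0) for k in features["keywords"])
    return (weights["T"] * features["T"]
            + weights["C"] * features["C"]
            + weights["L"] * features["L"]
            + weights["K"] * keyword_sum)


def top_sentences(sentences: List[Dict], weights: Dict[str, float],
                  keyword_weights: Dict[str, float],
                  max_sentences: int) -> List[int]:
    """Return indices of the highest-weighted sentences, in document order."""
    scored = sorted(range(len(sentences)),
                    key=lambda i: sentence_weight(sentences[i], weights, keyword_weights),
                    reverse=True)
    return sorted(scored[:max_sentences])
```

Here the summary length is expressed as a sentence count; a word or character budget would work the same way, stopping once the budget is reached.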