Processing the Scope of Negation and Modality Cues in Biomedical Texts Roser Morante, Walter Daelemans CNTS-Language Technology Group University of Antwerp


Page 1: Processing the Scope of Negation and Modality Cues in ...feast.coli.uni-saarland.de/slides/MoranteR010709.pdf · Processing the Scope of Negation and Modality Cues in Biomedical Texts

Processing the Scope of Negation and Modality Cues in Biomedical Texts

Roser Morante, Walter Daelemans

CNTS-Language Technology Group

University of Antwerp

Framework

• The BIOGRAPH project (www.biograph.be), University of Antwerp:

- Text Mining: CNTS, Department of Linguistics, Walter Daelemans

- Data Mining: ADReM, Department of Mathematics and Computer Science, Bart Goethals

- Genetics: AMG, Department of Molecular Genetics, Jurgen Del-Favero

Framework

• The BIOGRAPH project aims at:

- Assisting researchers in ranking candidate disease-causing genes by putting forward a new methodology for combined text analysis and data mining from heterogeneous information sources

- Mining biomedical texts: providing accurate relations automatically extracted from text and weighted according to their reliability

• Treatment of negation, modality and quantification

Framework

The BIOGRAPH flow

Gene Prioritization

• Candidate region

- The gene responsible for a disease (e.g. schizophrenia or Alzheimer's disease) is in a known area of the genome

- Many genes (> 200) are in this candidate region

• Experimental validation is needed

- Very expensive in time and cost

• Combine information in literature and in databases!

- Which genes in the candidate region could be most relevant for the disease, and why?

- Provide a prioritization (ranking problem)

Event Extraction

MEDLINE:7747440

Epstein-Barr virus replicative gene transcription during de novo infection of human thymocytes: simultaneous early expression of BZLF-1 and its repressor RAZ.

Epstein-Barr virus (EBV) is known to infect B cells and epithelial cells. We and others have shown that EBV can also infect a subset of thymocytes. Infection of thymocytes was accompanied by the appearance of linear EBV genome within 8 hr of infection. Circularization of the EBV genome was not detected. This is in contrast to the infection in B cells where the genome can circularize within 24 hr of infection. The appearance of the BamHI ZLF-1 gene product, ZEBRA, by RT-PCR, was observed within 8 hr of infection. The appearance of a novel fusion transcript (RAZ), which comprised regions of the BZLF-1 locus and the adjacent BRLF-1 locus, was detected by RT-PCR. ZEBRA protein was also identified in infected thymocytes by immunoprecipitation. In addition, we demonstrated that the EBNA-1 gene in infected thymocytes was transcribed from the Fp promoter, rather than from the Cp/Wp promoter which is used in latently infected B cells. Transcripts encoding gp350/220, the major coat protein of EBV, were identified, but we did not find any evidence of transcription from the LMP-2A or EBER-1 loci in infected thymocytes. These observations suggest that de novo EBV infection of thymocytes differs from infection of B cells. The main difference is that with thymocytes, no evidence could be found that the virus ever circularizes. Rather, EBV remains in a linear configuration from which replicative genes are transcribed.

Event Extraction

MEDLINE:7747440 ... In addition, we demonstrated that the EBNA-1 gene in infected thymocytes was transcribed from the Fp promoter, rather than from the Cp/Wp promoter which is used in latently infected B cells. Transcripts encoding gp350/220, the major coat protein of EBV, were identified, but we did not find any evidence of transcription from the LMP-2A or EBER-1 loci in infected thymocytes. These observations suggest that de novo EBV infection of thymocytes differs from infection of B cells.

<event id="E10" source="7747440" neg="1" spec="1"> <predicate type="Transcription" begin="1216" end="1229"> transcription </predicate> <patient type="Theme" begin="1239" end="1245"> LMP-2A </patient></event>
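A minimal Python sketch of how an event annotation like E10 above could be consumed downstream, so that negated or speculative events are not reported as facts. It assumes (our reading; the slide does not define the attributes) that `neg="1"` and `spec="1"` mark the event as negated and speculative. The helpers `read_event` and `is_factual` are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Event E10 from the slide (MEDLINE:7747440).
EVENT_XML = """<event id="E10" source="7747440" neg="1" spec="1">
  <predicate type="Transcription" begin="1216" end="1229"> transcription </predicate>
  <patient type="Theme" begin="1239" end="1245"> LMP-2A </patient>
</event>"""


def read_event(xml_text: str) -> dict:
    """Flatten one <event> element into a dict (hypothetical helper)."""
    event = ET.fromstring(xml_text)
    predicate = event.find("predicate")
    patient = event.find("patient")
    return {
        "id": event.get("id"),
        "type": predicate.get("type"),
        "trigger": predicate.text.strip(),
        "theme": patient.text.strip(),
        # Assumption: "1" means the event lies in a negation / speculation scope.
        "negated": event.get("neg") == "1",
        "speculative": event.get("spec") == "1",
    }


def is_factual(event: dict) -> bool:
    """An event may be reported as a fact only if it is neither negated nor speculative."""
    return not (event["negated"] or event["speculative"])


e10 = read_event(EVENT_XML)
print(e10["type"], e10["theme"], is_factual(e10))  # Transcription LMP-2A False
```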

Contents

• Motivation

• Negation

- Task description

- Related work

- Corpus

- System description

- Results

• Modality

- Related work

- Results

• Negation vs. modality

• Conclusions

• Further Research

Motivation

• Extracted information that falls in the scope of hedge or negation cues cannot be presented as factual information

• Vincze et al. (2008) report that 17.70% of the sentences in the BioScope corpus contain hedge cues and 13% contain negation cues

• Light et al. (2004) estimate that 11% of sentences in MEDLINE abstracts contain speculative fragments

Finding the scope of negation

• Finding the scope of a negation cue means determining, at the sentence level, which words in the sentence are affected by the negation(s)

Analysis at the phenotype and genetic level showed that lack of CD5 expression was due neither to segregation of human autosome 11, on which the CD5 gene has been mapped, nor to deletion of the CD5 structural gene.

Related work

• Most of the related work focuses on detecting whether a term is negated or not

- Rule- or regular-expression-based systems like NegEx (Chapman et al. 2001) and NegFinder (Mutalik et al. 2001)

- Machine learning systems like Averbuch et al. (2004)

- Huang and Lowe (2007) develop a hybrid system that combines regular expression matching with parsing in order to locate negated concepts

Corpus

Corpus

• Medical and biological texts annotated with information about negation and speculation

PMA treatment, and <xcope id="X1.4.1"><cue type="negation" ref="X1.4.1">not</cue> retinoic acid treatment of the U937 cells</xcope> acts in inducing NF-KB expression in the nuclei.

• Corpora

Corpus       #Docs.   #Sent.   #Words
Abstracts      1273    11871   282243
Papers            9     2670    60935
Clinical       1954     6383    41985
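A small sketch of how the cue/scope annotation shown above can be read with Python's standard `xml.etree.ElementTree`. The `<sentence>` wrapper element and its `id` are assumptions made here so the fragment parses as one document; `cues_with_scopes` is a hypothetical helper that pairs each `cue` with the `xcope` it points to through `ref`.

```python
import xml.etree.ElementTree as ET

# Sentence from the slide; the <sentence> wrapper and its id are assumed here.
SENTENCE_XML = (
    '<sentence id="S1.4">PMA treatment, and '
    '<xcope id="X1.4.1"><cue type="negation" ref="X1.4.1">not</cue> '
    "retinoic acid treatment of the U937 cells</xcope> "
    "acts in inducing NF-KB expression in the nuclei.</sentence>"
)


def cues_with_scopes(sentence_xml: str):
    """Return (cue type, cue text, scope text) triples for one annotated sentence."""
    root = ET.fromstring(sentence_xml)
    # itertext() collects the xcope's text, including the nested cue, but not its tail.
    scopes = {x.get("id"): "".join(x.itertext()).strip() for x in root.iter("xcope")}
    return [
        (cue.get("type"), cue.text.strip(), scopes.get(cue.get("ref")))
        for cue in root.iter("cue")
    ]


print(cues_with_scopes(SENTENCE_XML))
# [('negation', 'not', 'not retinoic acid treatment of the U937 cells')]
```

Because `itertext()` also descends into nested elements, the same helper returns the full text of nested scopes, which hedge annotations often contain.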

Experimental Setting

• Abstracts corpus:

- 10-fold cross-validation experiments

• Clinical and papers corpora: robustness test

- Training on abstracts

- Testing on clinical and papers

System Description

• We model the scope finding task as two consecutive classification tasks:

- Finding negation cues: a token is classified as being at the beginning of a negation signal, inside it, or outside it

- Finding the scope: a token is classified as being the first element of a scope sequence, the last, or neither

• Supervised machine learning approach
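The two label sets described above can be sketched as follows, assuming gold cue and scope spans given as token offsets `[start, end)`. The label strings `FIRST`/`LAST`/`NONE`, the whitespace tokenization and the offsets in the example are our own illustration on the corpus sentence, not the system's actual data format.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # token indices [start, end), end exclusive


def cue_tags(n_tokens: int, cue: Span) -> List[str]:
    """Phase 1 labels: B(eginning), I(nside) or O(utside) a (possibly multiword) cue."""
    start, end = cue
    tags = ["O"] * n_tokens
    tags[start] = "B"
    for i in range(start + 1, end):
        tags[i] = "I"
    return tags


def scope_tags(n_tokens: int, scope: Span) -> List[str]:
    """Phase 2 labels: FIRST, LAST or NONE for one scope."""
    start, end = scope
    if end - start < 2:
        # A one-token scope would need FIRST and LAST on the same token.
        raise ValueError("one-token scopes are not handled in this sketch")
    tags = ["NONE"] * n_tokens
    tags[start] = "FIRST"
    tags[end - 1] = "LAST"
    return tags


TOKENS = ("PMA treatment , and not retinoic acid treatment of the U937 cells "
          "acts in inducing NF-KB expression in the nuclei .").split()
cues = cue_tags(len(TOKENS), (4, 5))       # "not"
scopes = scope_tags(len(TOKENS), (4, 12))  # "not ... cells"
for tok, c, s in zip(TOKENS[3:13], cues[3:13], scopes[3:13]):
    print(f"{tok:10} {c} {s}")
```

Multiword cues such as "rather than" get one `B` followed by `I` tags, e.g. `cue_tags(5, (1, 3))` gives `O B I O O`.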

System Architecture

Preprocessing

Finding Negation Cues

• We filter out negation cues that are unambiguous in the training corpus (17 out of 30)

• For the rest, a classifier predicts whether a token is the first token of a negation signal, inside it, or outside it

- Algorithm: IGTREE as implemented in TiMBL (Daelemans et al. 2007)

- Instances represent all tokens in a sentence

- Features about the token in focus and its context

Features for negation cue finding

• Of the token

- Lemma, word, POS and IOB chunk tag

• Of the token context

- Word, POS and IOB chunk tag of 3 tokens to the right and 3 to the left
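A sketch of the feature vector listed above for one token in focus: 4 features of the token itself plus word, POS and chunk tag of 3 tokens on each side, giving 22 values. The padding symbol `_` and the POS/chunk tags in the toy sentence are illustrative assumptions, not output of the tools used in the system.

```python
from typing import Dict, List

Token = Dict[str, str]  # keys: word, lemma, pos, chunk
PAD = {"word": "_", "pos": "_", "chunk": "_"}  # filler outside the sentence


def cue_features(sent: List[Token], i: int, window: int = 3) -> List[str]:
    """Features for token i: its lemma, word, POS and IOB chunk tag, plus
    word, POS and chunk tag of `window` tokens on each side."""
    tok = sent[i]
    feats = [tok["lemma"], tok["word"], tok["pos"], tok["chunk"]]
    for offset in [*range(-window, 0), *range(1, window + 1)]:
        j = i + offset
        ctx = sent[j] if 0 <= j < len(sent) else PAD
        feats += [ctx["word"], ctx["pos"], ctx["chunk"]]
    return feats


# Toy input; POS and chunk tags are illustrative, not tagger output.
SENT = [
    {"word": "lack", "lemma": "lack", "pos": "NN", "chunk": "B-NP"},
    {"word": "of", "lemma": "of", "pos": "IN", "chunk": "B-PP"},
    {"word": "CD5", "lemma": "CD5", "pos": "NN", "chunk": "B-NP"},
    {"word": "expression", "lemma": "expression", "pos": "NN", "chunk": "I-NP"},
]
print(cue_features(SENT, 0))
```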

Ambiguous Negation Cues in the Abstracts Corpus

Results

• Baseline: tagging as negation signals the tokens that are negation signals in at least 50% of their occurrences in the training corpus

BASELINE TOKENS: absence, absent, cannot, could not, fail, failure, impossible, instead of, lack, miss, neither, never, no, none, nor, not, rather than, unable, with the exception of, without

BASELINE     PREC     RECALL   F1       IAA
Abstracts    82.00    95.17    88.09    94.46
Papers       84.01    92.46    88.03    79.42
Clinical     97.31    97.53    97.42    90.70
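The 50% baseline can be sketched by counting, per word form, how often it is annotated as a negation signal in the training data. With a threshold of 1.0 the same counts give the cues that are unambiguous in the training corpus, which are filtered out before classification. The toy training data is hypothetical, and this sketch only handles single-token cues (multiword cues like "rather than" would need n-gram counts).

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Sentence = Tuple[List[str], List[bool]]  # tokens, gold "is a negation signal" flags


def cue_lexicon(training: Iterable[Sentence], threshold: float) -> Set[str]:
    """Word forms that are negation signals in at least `threshold` of their occurrences."""
    seen, as_cue = Counter(), Counter()
    for tokens, flags in training:
        for tok, flag in zip(tokens, flags):
            seen[tok.lower()] += 1
            as_cue[tok.lower()] += int(flag)
    return {w for w in seen if as_cue[w] / seen[w] >= threshold}


def tag(tokens: List[str], lexicon: Set[str]) -> List[bool]:
    return [tok.lower() in lexicon for tok in tokens]


# Hypothetical toy training data.
TRAIN = [
    ("No signs of tuberculosis".split(), [True, False, False, False]),
    ("not all true interactions are known".split(), [False] * 6),
    ("circularization was not detected".split(), [False, False, True, False]),
    ("transcription was not found".split(), [False, False, True, False]),
]
baseline = cue_lexicon(TRAIN, threshold=0.5)     # "not" qualifies: 2 of 3
unambiguous = cue_lexicon(TRAIN, threshold=1.0)  # only "no" is always a cue
print(sorted(baseline), sorted(unambiguous))     # ['no', 'not'] ['no']
```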

SYSTEM       PREC     RECALL   F1
Abstracts    84.72    98.75    91.20 (+3.11)
Papers       87.18    95.72    91.25 (+3.22)
Clinical     97.33    98.09    97.71 (+0.29)
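Since F1 is the harmonic mean of precision and recall, the F1 column of the table above can be checked against the other two. A quick sketch using the SYSTEM rows; small differences of this size come from rounding of the reported precision and recall.

```python
def f1(precision: float, recall: float) -> float:
    """F1 = harmonic mean of precision and recall (both in %)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) from the SYSTEM rows of the table above.
SYSTEM = {
    "Abstracts": (84.72, 98.75, 91.20),
    "Papers": (87.18, 95.72, 91.25),
    "Clinical": (97.33, 98.09, 97.71),
}
for corpus, (p, r, reported) in SYSTEM.items():
    print(f"{corpus}: recomputed {f1(p, r):.2f}, reported {reported:.2f}")
```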

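Results: system vs. baseline in the abstracts corpus

• The system performs better than the baseline: F1 91.20 vs. 88.09 (+3.11), with higher precision (+2.72) and recall (+3.58)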

Results in the three corpora

• The system is portable: trained on abstracts, it reaches F1 91.25 on papers and 97.71 on clinical reports

Discussion

• Cause of lower recall on the papers corpus: the negation signal not

NOT          % negation signals   % classified correctly
Abstracts         58.89                 98.25
Papers            53.22                 93.68
Clinical           6.72                 91.22

• Errors: not is wrongly classified as a negation signal in sentences such as

However, programs for tRNA identification [...] do not necessarily perform well on unknown ones

The evaluation of this ratio is difficult because not all true interactions are known

Finding Scopes

• Three classifiers predict whether a token is the first token in the scope sequence, the last, or neither

- MBL (Daelemans et al. 2007)

- SVMlight (Joachims 1999)

- CRF++ (Lafferty et al. 2001)

• A fourth classifier (CRF++) predicts the same, taking as input the output of the previous classifiers

• The features used by the object classifiers and the metalearner are different

Finding Scopes

Finding Scopes

• Previous attempts gave lower results:

- Chunk-based classification, instead of word-based

- BIO classification of tokens (EMNLP'08), instead of FOL (First, Other, Last)

- Single-classifier approach, instead of a metalearner

Features for Scope Finding: Object Classifiers

• Of the negation signal: Chain of words.

• Of the paired token: Lemma, POS, chunk IOB tag, type of chunk; lemma of the second and third tokens to the left; lemma, POS, chunk IOB tag, and type of chunk of the first token to the left and three tokens to the right; first word, last word, chain of words, and chain of POSs of the chunk of the paired token and of two chunks to the left and two chunks to the right.

• Of the tokens between the negation signal and the token in focus: Chain of POS types, distance in number of tokens, and chain of chunk IOB tags.

• Others: A feature indicating the location of the token relative to the negation signal (pre, post, same).
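Three of the positional features above can be sketched as follows: the location of the token relative to the negation signal, the distance in tokens, and the chain of POS tags in between. The exact conventions (distance counted from the nearest edge of the signal, 0 inside it; `_` as chain separator) are our assumptions, as are the POS tags of the toy sentence.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # negation signal tokens [start, end)


def location(i: int, cue: Span) -> str:
    """Position of token i relative to the negation signal: pre, post or same."""
    start, end = cue
    if i < start:
        return "pre"
    if i >= end:
        return "post"
    return "same"


def distance(i: int, cue: Span) -> int:
    """Number of token steps from the nearest edge of the signal (0 inside it)."""
    start, end = cue
    if i < start:
        return start - i
    if i >= end:
        return i - (end - 1)
    return 0


def pos_chain_between(pos: List[str], i: int, cue: Span) -> str:
    """POS tags of the tokens strictly between token i and the signal."""
    start, end = cue
    between = pos[i + 1:start] if i < start else pos[end:i]
    return "_".join(between)


POS = ["NN", "VBD", "RB", "VBN", "IN", "NN", "NNS"]  # expression was not detected in B cells
CUE = (2, 3)  # "not"
for i in (0, 2, 5):
    print(i, location(i, CUE), distance(i, CUE), pos_chain_between(POS, i, CUE))
```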

Features for Scope Finding: Metalearner

• Of the negation signal: Chain of words, chain of POS, word of the two tokens to the right and two tokens to the left, token number divided by the total number of tokens in the sentence.

• Of the paired token: Lemma, POS, word of two tokens to the right and two tokens to the left, token number divided by the total number of tokens in the sentence.

• Of the tokens between the negation signal and the token in focus: Binary features indicating if there are commas, colons, semicolons, verbal phrases or one of the following words between the negation signal and the token in focus: whereas, but, although, nevertheless, notwithstanding, however, consequently, hence, therefore, thus, instead, otherwise, alternatively, furthermore, moreover.

• About the predictions of the three classifiers: prediction, previous and next predictions of each of the classifiers, full sequence of previous and full sequence of next predictions of each of the classifiers.

• Others: A feature indicating the location of the token relative to the negation signal (pre, post, same).
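The prediction-based feature group of the metalearner can be sketched as below: for each object classifier, the token's own prediction, the previous and next predictions, and the full sequences before and after it. The feature names, the boundary symbols `<s>`/`</s>` and the toy predictions are hypothetical; how these values are written out for CRF++ (column file plus template) is not shown here.

```python
from typing import Dict, List


def meta_features(preds: Dict[str, List[str]], i: int) -> Dict[str, str]:
    """Prediction-based metalearner features for token i, one group per object classifier."""
    feats = {}
    for name, seq in preds.items():
        feats[f"{name}_pred"] = seq[i]
        feats[f"{name}_prev"] = seq[i - 1] if i > 0 else "<s>"
        feats[f"{name}_next"] = seq[i + 1] if i + 1 < len(seq) else "</s>"
        feats[f"{name}_all_prev"] = "-".join(seq[:i])      # full sequence before i
        feats[f"{name}_all_next"] = "-".join(seq[i + 1:])  # full sequence after i
    return feats


# Hypothetical outputs of the three object classifiers on a 4-token sentence.
PREDS = {
    "mbl": ["NONE", "FIRST", "NONE", "LAST"],
    "svm": ["NONE", "FIRST", "LAST", "NONE"],
    "crf": ["NONE", "FIRST", "NONE", "LAST"],
}
for key, value in meta_features(PREDS, 2).items():
    print(key, value)
```

Disagreements such as `svm_pred = LAST` against `mbl_pred = NONE` on the same token are exactly what the metalearner can learn to arbitrate.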

Classifier Parameters

• TiMBL: IB1

- Similarity metric: overlap

- Feature weighting: gain ratio

- k: 7 nearest neighbors

- Weighting class vote of neighbors as a function of their inverselinear distance

• SVM

- Classification

- Cost factor: 1

- Biased hyperplane

- Linear kernel function

• CRF

- Regularisation algorithm for training: L2

- Cut-off threshold of features: 1

- Unchanged hyper-parameter
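A compact sketch of the TiMBL IB1 settings listed above: overlap distance with gain-ratio feature weights, and a class vote in which each neighbor is weighted by inverse linear distance (Dudani's scheme: the nearest neighbor gets weight 1, the farthest of the k gets 0). This is our own re-implementation for illustration, not TiMBL code. One simplification: TiMBL's k is, to our understanding, a count of nearest distances rather than of instances; this sketch keeps the k nearest instances. The toy instances are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple

Instance = Tuple[str, ...]


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(X: List[Instance], y: List[str]) -> List[float]:
    """Information gain of each feature divided by its split info."""
    n, h_class = len(y), entropy(y)
    weights = []
    for f in range(len(X[0])):
        by_value = defaultdict(list)
        for x, label in zip(X, y):
            by_value[x[f]].append(label)
        cond = sum(len(g) / n * entropy(g) for g in by_value.values())
        split = -sum(len(g) / n * math.log2(len(g) / n) for g in by_value.values())
        weights.append((h_class - cond) / split if split > 0 else 0.0)
    return weights


def classify(query: Instance, X: List[Instance], y: List[str],
             weights: List[float], k: int = 7) -> str:
    """Weighted overlap distance + inverse-linear (Dudani) distance-weighted vote."""
    neighbours = sorted(
        (sum(w for w, a, b in zip(weights, query, x) if a != b), label)
        for x, label in zip(X, y)
    )[:k]
    d_near, d_far = neighbours[0][0], neighbours[-1][0]
    votes = Counter()
    for d, label in neighbours:
        votes[label] += 1.0 if d_far == d_near else (d_far - d) / (d_far - d_near)
    return votes.most_common(1)[0][0]


# Hypothetical (word, POS) instances labelled B (cue start) or O.
X = [("not", "RB"), ("not", "RB"), ("no", "DT"), ("the", "DT"), ("was", "VBD"), ("cells", "NNS")]
y = ["B", "B", "B", "O", "O", "O"]
w = gain_ratio(X, y)
print(w, classify(("never", "RB"), X, y, w, k=3), classify(("was", "VBD"), X, y, w, k=3))
```

For `("was", "VBD")` the single exact match wins although two of the three neighbors are `B`, because the farther neighbors receive weight 0 under inverse linear weighting.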

Post-processing

• Scope is always a consecutive block of scope tokens, including the negation signal

• The classifiers predict the first and last token of the scope sequence, but they may predict no FIRST or LAST element, or more than one

• In the post-processing we apply some rules to select one FIRST and one LAST token

Example:

- If more than one token has been predicted as FIRST, take as FIRST the first token of the negation signal
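A sketch of the post-processing step. Only the rule for several FIRST predictions comes from the slide; the fallback when no FIRST is predicted and the rule for choosing LAST are hypothetical stand-ins for the unlisted rules. The final line enforces that the scope is one consecutive block containing the negation signal.

```python
from typing import List, Tuple


def postprocess(tags: List[str], cue: Tuple[int, int]) -> Tuple[int, int]:
    """Select one FIRST and one LAST; return the scope as inclusive (first, last)."""
    cue_start, cue_end = cue  # negation signal tokens [cue_start, cue_end)
    firsts = [i for i, t in enumerate(tags) if t == "FIRST"]
    lasts = [i for i, t in enumerate(tags) if t == "LAST"]
    # Rule from the slide: several FIRSTs -> first token of the negation signal.
    # Assumption: the same fallback is used when no FIRST is predicted.
    first = firsts[0] if len(firsts) == 1 else cue_start
    # Hypothetical rule: keep the farthest LAST after FIRST, else the signal's last token.
    candidates = [i for i in lasts if i >= first]
    last = max(candidates) if candidates else cue_end - 1
    # The scope is one consecutive block that always contains the negation signal.
    return min(first, cue_start), max(last, cue_end - 1)


print(postprocess(["NONE", "FIRST", "FIRST", "NONE", "LAST", "NONE"], (2, 3)))  # (2, 4)
print(postprocess(["FIRST", "NONE", "NONE", "NONE"], (2, 3)))                     # (0, 2)
```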

Results

• Baseline: calculating the average length of the scope to the right of the negation signal and tagging that number of tokens as scope tokens

- Motivation: 85.70% of scopes are to the right of the negation signal

BASELINE     PCS     PCS-2   IAA
Abstracts     7.11   37.45   92.46
Papers        4.76   24.86   70.86
Clinical     12.95   62.27   76.29

SYSTEM       PCS     PCS-2
Abstracts    66.07   66.93
Papers       41.00   44.44
Clinical     70.75   71.21

SYSTEM with gold negation signals (gain over SYSTEM)
             PCS     PCS-2
Abstracts    +7.29   +7.17
Papers       +9.26   +9.79
Clinical    +16.52  +16.74
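A sketch of the scope baseline and of PCS (percentage of correct scopes, where a scope only counts if it matches the gold scope exactly). Assumptions made here: the baseline scope includes the negation signal itself, the average length is rounded to a whole number of tokens, and scopes are given as inclusive token spans. PCS-2 is not implemented because its definition is not given on the slides. The training pairs are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive token indices (first, last)


def right_length(train: List[Tuple[Span, Span]]) -> int:
    """Average number of scope tokens to the right of the cue, rounded."""
    return round(mean(max(0, scope[1] - cue[1]) for cue, scope in train))


def baseline_scope(cue: Span, n_tokens: int, length: int) -> Span:
    """Cue plus `length` tokens to its right, clipped at the end of the sentence."""
    return cue[0], min(n_tokens - 1, cue[1] + length)


def pcs(gold: List[Span], predicted: List[Span]) -> float:
    """Percentage of correct scopes: only exact matches count."""
    return 100.0 * sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical training pairs (cue span, gold scope span).
TRAIN = [((2, 2), (2, 6)), ((0, 0), (0, 3)), ((5, 6), (5, 10))]
n = right_length(TRAIN)  # mean(4, 3, 4) rounds to 4
print(n, baseline_scope((1, 1), 20, n), baseline_scope((8, 8), 10, n))
print(pcs([(1, 5), (0, 3)], [(1, 5), (0, 4)]))  # 50.0
```

The exact-match criterion explains why PCS is much stricter than token-level F1: a scope that is off by a single token scores zero.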

Results on the abstracts corpus

The system performs clearly better than the baseline (PCS 66.07 vs. 7.11)

There is a higher upper bound, calculated with gold-standard negation signals

Results on the three corpora

The system is portable

Lower results in the papers corpus

Discussion

• Clinical reports are easier to process than abstracts and papers

- The negation signal no is very frequent (76.65%) and has a high PCS (73.10%)

No findings to account for symptoms

No signs of tuberculosis

- Sentences are shorter than in abstracts and papers

• Average length: 7.8 tokens vs. 26.43 and 26.24 in abstracts and papers

• 75.85% of the sentences have 10 or fewer tokens

Discussion

• Papers are more difficult to process than abstracts

- The negation signal not is frequent (53.22%) and has a low PCS (39.50) in papers. Why?

NOT                      Abstracts   Papers
Ambiguity (%¬neg)          14.29      25.56
Av. scope left              8.82       5.60
Av. scope length            8.85       6.45
% Scopes left              16.41      23.28

PCS results of the metalearner compared to the object classifiers

The metalearner performs better than the three object classifiers (except SVMs on the clinical corpus)

Finding the scope of modality

• Finding the scope of a hedge cue means determining, at the sentence level, which words in the sentence are affected by the hedge cue(s)

These results [suggest that expression of c-jun, jun B and jun D genes [might be involved in terminal granulocyte differentiation [or in regulating granulocyte functionality]]].

Related Work

• Theoretical descriptions that define hedging and modality (Lakoff 1972, Palmer 1986), and descriptions based on corpora (Hyland 1998, Saurí et al. 2006, Thompson et al. 2008)

• Machine learning experiments that focus on classifying a sentence as speculative or definite (Medlock and Briscoe 2007, Medlock 2008, Szarvas 2008, Kilicoglu and Bergler 2008)

Related work

• The system that we present here is based on the system developed for processing the scope of negation cues

• Our goal is to check whether the same approach can be applied to processing hedge cues

System Architecture

SAME SYSTEM

Preprocessing

Finding Hedge Cues

• A classifier predicts whether a token is at the beginning of a hedge cue, inside it, or outside it

- Algorithm: IGTREE as implemented in TiMBL (Daelemans et al. 2007)

- Instances represent all tokens in a sentence

- Features about the token in focus and its context

SAME SYSTEM

Ambiguity in Hedge Cues: Sample from the Abstracts Corpus

# Hedge cues: 110

# Non-ambiguous hedge cues: 40

Results

• Baseline: tagging as hedge cues a list of words extracted from the abstracts corpus

BASELINE TOKENS: appear, apparent, apparently, believe, estimate, hypothesis, hypothesize, if, imply, likely, may, might, or, perhaps, possible, possibly, postulate, potentially, presumably, probably, propose, putative, should, seem, speculate, suggest, support, suppose, suspect, think, uncertain, unclear, unknown, unlikely, whether, would

BASELINE     PREC     RECALL   F1       IAA
Abstracts    55.62    71.77    62.67    79.12
Papers       54.39    61.21    57.60    77.60
Clinical     66.55    40.78    50.57    84.01

SYSTEM       PREC     RECALL   F1
Abstracts    90.81    79.84    84.77
Papers       75.35    68.18    71.59
Clinical     88.10    27.51    41.92

Results: system vs. baseline in the abstracts corpus

• The system performs better than the baseline, mainly because of an increase in precision (+35.19)

Results in the three corpora

• The system is portable in terms of precision, but less so in terms of recall, which in the clinical corpus falls 13.27 points below the baseline (27.51 vs. 40.78). Why?

Discussion

• Cause of lower recall on the clinical corpus: OR

OR           total #   # as hedge   % as hedge   % of hedges   recall
Abstracts      1062        118         11.29          4.42      0.129
Papers          153         27         16.99          4.04      0.137
Clinical        281        276         98.22         24.62      0.007

• The use of OR as a hedge cue is difficult to interpret

+CUE: Nucleotide sequence and PCR analyses demonstrated the presence of novel duplications or deletions involving the NF-kappa B motif.

-CUE: In nuclear extracts from monocytes or macrophages, induction of NF-KB occurred only if the cells were previously infected with HIV-1. (= AND)

Finding Scopes

• Three classifiers predict whether a token is the first token in the scope sequence, the last, or neither

- MBL (Daelemans et al. 2007)

- SVMlight (Joachims 1999)

- CRF++ (Lafferty et al. 2001)

• A fourth classifier (CRF++) predicts the same, taking as input the output of the previous classifiers

• The features used by the object classifiers and the metalearner are different

SAME SYSTEM

Finding Scopes

Postprocessing

• Scope is always a consecutive block of scope tokens, including the hedge cue

• The classifiers predict the first and last token of the scope sequence, but they might predict no FIRST or LAST element, or more than one

• In the postprocessing we apply some rules to select one FIRST and one LAST token

Example:

- If more than one token has been predicted as FIRST, take as FIRST the first token of the hedge cue

SAME SYSTEM

Results

• Baseline: calculating the average length of the scope to the right of the hedge cue and tagging that number of tokens as scope tokens

- Motivation: 82.45% of scopes are to the right of the hedge cue

BASELINE     PCS     PCS-2
Abstracts     3.15    3.17
Papers        2.19    2.26
Clinical      2.72    3.53

SYSTEM       PCS     PCS-2
Abstracts    65.55   66.10
Papers       35.92   42.37
Clinical     26.21   27.44

SYSTEM with gold hedge cues (gain over SYSTEM)
             PCS     PCS-2
Abstracts   +11.58  +12.11
Papers      +12.02  +15.84
Clinical    +34.38  +36.50

Baseline: Modality vs. Negation

• Baseline results are much lower for the hedge scope finder (abstracts PCS 3.15, vs. 7.11 for negation)

Results on the abstracts corpus

The system performs clearly better than the baseline (PCS 65.55 vs. 3.15)

There is a higher upper bound, calculated with gold-standard hedge cues

Results on the three corpora

Results are lower than in abstracts for papers (PCS -29.63) and clinical reports (PCS -39.34). Why?

Discussion

• Why are the results in papers lower?

- 41 cues (47.00%) in papers are not in abstracts

- Some cues that are in abstracts and are frequent in papers get low scores

• Example: suggest (92.33 PCS in abstracts vs. 62.85 PCS in papers)

Discussion

• Errors with suggest:

- Bibliographic references

- Sentences with a format typical of papers and not of abstracts

The conservation from Drosophila to mammals of these two structurally distinct but functionally similar E3 ubiquitin ligases is likely to reflect a combination of evolutionary advantages associated with: (i) specialized expression pattern, as evidenced by the cell-specific expression of the neur gene in sensory organ precursor cells [52]; (ii) specialized function, as suggested by the role of murine MIB in TNF?? signaling [32]; (iii) regulation of protein stability, localization, and/or activity.

Discussion

• Why are the results in clinical lower?

- 68 cues (35.45%) in clinical are not in abstracts

- Frequent hedge cues in clinical are not represented in abstracts

CUE               % Clinical   % Abstracts   PCS Clinical
or                   21.41         3.99           0.00
evaluate for          6.67         0.00           0.00
consistent with       5.28         0.00           3.84
rule out              5.12         0.00           0.00

Hedge Scope Finder Compared to Negation Scope Finder

• Gold hedge cues = no error propagation from the first phase

• The abstracts results show that the same system can be applied to finding the scope of both negation and hedge cues

• The systems are equally portable to the papers corpus

• The negation system is more portable to the clinical corpus

Hedge Scope Finder Compared to Negation Scope Finder

• Error propagation from the first phase:

- The hedge system is much less portable to the clinical corpus than the negation system

Conclusions

• We have presented a metalearning approach to processing the scope of negation cues. The metalearner performs better than the object classifiers

- We achieve a 32.07% error reduction over previous results (Morante et al. 2008)

• We have shown that the same scope finding approach can be applied to both negation and modality

- Finding the scope of modality cues is more difficult

- Modality cues are more diverse and ambiguous than negation cues

Conclusions

• We have shown that the system is portable to different corpora, although:

- Negation & modality: results are worse for the papers corpus

- In general, modality cues are less portable across corpora (Szarvas 2008)

• Negation: results per corpus are mostly determined by the scores of the negation signals no and not

• Modality: results per corpus are determined by corpus-specific cues

Further Research

• Error analysis to explain:

- why the metalearner performs better than the object classifiers

- why the papers corpus is more difficult to process

- why some negation signals are more difficult to process than others

• Experimenting with more features

- dependency syntax

• Testing on general-domain corpora

• Experimenting with other machine learning approaches (constraint satisfaction)

References

• R. Morante and W. Daelemans. A metalearning approach to processing the scope of negation. Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL), pages 21–29, Boulder, Colorado, June 2009. ACL.

• R. Morante and W. Daelemans. Learning the scope of hedge cues in biomedical texts. Proceedings of the Workshop on BioNLP, pages 28–36, Boulder, Colorado, June 2009. ACL.

• R. Morante, A. Liekens, and W. Daelemans. Learning the scope of negation in biomedical texts. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 715–724, Honolulu, Hawaii, October 2008. ACL.

Acknowledgements

• GOA project BIOGRAPH of the University of Antwerp

- www.biograph.be

• BioScope team

• Thanks for your attention!

Results Scope Finding

• Baseline: calculating the average length of the scope to the right of the negation signal and tagging that number of tokens as scope tokens (85.70% of scopes are to the right)

BASELINE     PREC    RECALL   F1      IAA     PCS     PCS-2
Abstracts    76.68   78.26    77.46   92.46    7.11   37.45
Papers       69.34   66.92    68.11   70.86    4.76   24.86
Clinical     86.85   74.96    80.47   76.29   12.95   62.27

SYSTEM       PREC    RECALL   F1      PCS     PCS-2
Abstracts    81.76   83.45    82.60   66.07   66.93
Papers       72.21   69.72    70.94   41.00   44.44
Clinical     86.38   82.14    84.20   70.75   71.21

SYSTEM with gold negation signals (gain over SYSTEM)
             PREC    RECALL   F1      PCS     PCS-2
Abstracts    +8.92   +7.23    +8.07   +7.29   +7.17
Papers      +12.26  +15.23   +13.77   +9.26   +9.79
Clinical     +5.27  +10.36    +7.87  +16.52  +16.74