Learning Subjective Nouns using Extraction Pattern Bootstrapping
Ellen Riloff, School of Computing, University of Utah; Janyce Wiebe, Theresa Wilson, Computing Science, University of Pittsburgh
CoNLL-03
Introduction (1/2)
Many Natural Language Processing applications can benefit from being able to distinguish between factual and subjective information.
Subjective remarks come in a variety of forms, including opinions, rants, allegations, accusations, and speculation.
Question answering (QA) systems should distinguish between factual and speculative answers.
Multi-document summarization systems need to summarize different opinions and perspectives.
Spam filtering systems must recognize rants and emotional tirades, among other things.
Introduction (2/2)
In this paper, we use the Meta-Bootstrapping (Riloff and Jones 1999) and Basilisk (Thelen and Riloff 2002) algorithms to learn lists of subjective nouns.
Both bootstrapping algorithms automatically generate extraction patterns to identify words belonging to a semantic category.
We hypothesize that extraction patterns can also identify subjective words.
The pattern "expressed <direct_object>" often extracts subjective nouns such as "concern", "hope", and "support".
Both bootstrapping algorithms require only a handful of seed words and unannotated texts for training; no annotated data is needed at all.
Annotation Scheme
The goal of the annotation scheme is to identify and characterize expressions of private states in a sentence.
A private state is a general covering term for opinions, evaluations, emotions, and speculations.
Example: "The time has come, gentlemen, for Sharon, the assassin, to realize that injustice cannot last long." → the writer expresses a negative evaluation.
Annotators are also asked to judge the strength of each private state; a private state can have low, medium, high, or extreme strength.
Corpus and Agreement Results
Our data consist of English-language versions of foreign news documents from FBIS (the Foreign Broadcast Information Service).
The annotated corpus used to train and test our subjective classifiers (the experiment corpus) consists of 109 documents with a total of 2197 sentences.
We use a separate annotated tuning corpus to establish experiment parameters.
Extraction Patterns
In the last few years, two bootstrapping algorithms have been developed to create semantic dictionaries by exploiting extraction patterns.
Extraction patterns represent lexico-syntactic expressions that typically rely on shallow parsing and syntactic role assignment, e.g., "<subject> was hired".
A bootstrapping process looks for words that appear in the same extraction patterns as the seeds and hypothesizes that those words belong to the same semantic category.
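As a minimal illustration of the "<verb> <direct_object>" idea, the sketch below approximates pattern matching with a regular expression over raw text. The real systems use shallow parsing and syntactic role assignment; the regex and the skip-word list here are hypothetical simplifications.

```python
import re

# Toy stand-in for the pattern "expressed <direct_object>". Real
# extraction patterns use shallow parsing and syntactic roles; the
# regex and the skip-word list here are illustrative only.
def extract_direct_objects(text, verb="expressed"):
    # Optionally skip determiners/adjectives before the head noun.
    skip = r"(?:(?:his|her|their|its|the|a|an|deep|great)\s+)*"
    return re.findall(rf"\b{verb}\s+{skip}(\w+)", text, flags=re.IGNORECASE)

sentences = [
    "The senator expressed deep concern over the proposal.",
    "They expressed hope that talks would resume.",
    "The minister expressed support for the plan.",
]
extracted = [n for s in sentences for n in extract_direct_objects(s)]
print(extracted)  # ['concern', 'hope', 'support']
```

Even this crude matcher pulls out the subjective nouns "concern", "hope", and "support" mentioned earlier, which is the intuition behind using extraction patterns for subjectivity.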
Meta-Bootstrapping (1/2)
The Meta-Bootstrapping process begins with a small set of seed words that represent a targeted semantic category (e.g., "seashore" is a location) and an unannotated corpus.
Step 1: MetaBoot automatically creates a set of extraction patterns for the corpus by applying syntactic templates.
Step 2: MetaBoot computes a score for each pattern based on the number of seed words among its extractions.
The best pattern is saved, and all of its extracted noun phrases are automatically labeled with the targeted semantic category.
Meta-Bootstrapping (2/2)
MetaBoot then re-scores the extraction patterns using the original seed words plus the newly labeled words, and the process repeats (mutual bootstrapping).
When the mutual bootstrapping process is finished, all nouns that were put into the semantic dictionary are re-evaluated:
Each noun is assigned a score based on how many different patterns extracted it.
Only the five best nouns are allowed to remain in the dictionary.
The mutual bootstrapping process then starts over again using the revised semantic dictionary.
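The loop described on these two slides can be sketched as follows. This is a toy rendition under simplifying assumptions: patterns are scored by a plain seed-hit count rather than the RlogF-style metric of the original work, and the pattern-to-noun table is given up front instead of being generated from a corpus.

```python
from collections import defaultdict

def mutual_bootstrap(pattern_extractions, seeds, iterations=3, keep_top=5):
    # pattern_extractions: pattern -> set of noun phrases it extracts.
    known = set(seeds)
    used_patterns = []
    for _ in range(iterations):
        # Score each unused pattern by how many known words it extracts,
        # then save the best one (a stand-in for the original scoring).
        best = max(
            (p for p in pattern_extractions if p not in used_patterns),
            key=lambda p: len(pattern_extractions[p] & known),
            default=None,
        )
        if best is None:
            break
        used_patterns.append(best)
        # Label everything the best pattern extracts, then re-score (repeat).
        known |= pattern_extractions[best]
    # Final re-evaluation: rank nouns by how many saved patterns
    # extracted them and keep only the best few in the dictionary.
    counts = defaultdict(int)
    for p in used_patterns:
        for noun in pattern_extractions[p]:
            counts[noun] += 1
    return sorted(counts, key=lambda n: (-counts[n], n))[:keep_top]

patterns = {  # hypothetical toy data
    "expressed <dobj>": {"concern", "hope", "support"},
    "voiced <dobj>": {"concern", "support", "objection"},
    "bought <dobj>": {"car", "house"},
}
print(mutual_bootstrap(patterns, {"concern", "hope"}, iterations=2))
# ['concern', 'support', 'hope', 'objection']
```

Note how nouns extracted by more saved patterns ("concern", "support") rank ahead of single-pattern nouns, mirroring the re-evaluation step above.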
Basilisk (1/2)
Step 1: Basilisk automatically creates a set of extraction patterns for the corpus and scores each pattern based on the number of seed words among its extractions. Basilisk puts the best patterns into a pattern pool.
Step 2: All nouns extracted by a pattern in the pattern pool are put into a candidate word pool. Basilisk scores each noun based on the set of patterns that extracted it and their collective association with the seed words.
Step 3: The top 10 nouns are labeled with the targeted semantic class and are added to the dictionary.
Basilisk (2/2)
The bootstrapping process then repeats, using the original seeds and the newly labeled words.
The major difference between Basilisk and Meta-Bootstrapping:
Basilisk scores each noun based on collective information gathered from all patterns that extracted it.
Meta-Bootstrapping identifies a single best pattern and assumes that everything it extracts belongs to the same semantic category.
In comparative experiments, Basilisk outperformed Meta-Bootstrapping.
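Basilisk's collective noun scoring can be sketched as below, assuming an AvgLog-style formula (the average of log2(seed hits + 1) over the patterns that extracted the noun); the pattern table is hypothetical toy data.

```python
import math

def basilisk_score(noun, pattern_extractions, seeds):
    # Collective evidence from every pattern that extracted the noun:
    # average of log2(seed hits + 1) over those patterns (AvgLog-style).
    extractors = [p for p, ns in pattern_extractions.items() if noun in ns]
    if not extractors:
        return 0.0
    total = sum(math.log2(len(pattern_extractions[p] & seeds) + 1)
                for p in extractors)
    return total / len(extractors)

patterns = {  # hypothetical toy data
    "expressed <dobj>": {"concern", "hope", "support"},
    "voiced <dobj>": {"concern", "support", "objection"},
    "bought <dobj>": {"car", "house"},
}
seeds = {"concern", "hope"}
print(round(basilisk_score("support", patterns, seeds), 3))  # 1.292
print(basilisk_score("car", patterns, seeds))  # 0.0
```

Unlike Meta-Bootstrapping's single-best-pattern step, every extracting pattern contributes to a noun's score here, which is the difference the slide highlights.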
Experimental Results (1/2)
We created the bootstrapping corpus by gathering 950 new texts from FBIS, and manually selected 20 high-frequency words as seed words.
We ran each bootstrapping algorithm for 400 iterations, generating 5 words per iteration. Basilisk generated 2000 nouns and Meta-Bootstrapping generated 1996 nouns.
Experimental Results (2/2)
Next, we manually reviewed the 3996 words proposed by the algorithms and classified each word as strong subjective, weak subjective, or objective.
[Graph: x-axis — the number of words generated; y-axis — the percentage of those words that were manually classified as subjective.]
Subjective Classifier (1/3)
To evaluate the subjective nouns, we train a Naïve Bayes classifier using the nouns as features. We also incorporate previously established subjectivity clues and add some new discourse features.
Subjective noun features: We define four features, BA-Strong, BA-Weak, MB-Strong, and MB-Weak, to represent the sets of subjective nouns produced by the bootstrapping algorithms.
For each set, we create a three-valued feature based on the presence of 0, 1, or >=2 words from that set in the sentence.
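The three-valued SubjNoun feature can be sketched as below; whitespace tokenization and the example noun set are simplifying assumptions.

```python
def subjnoun_feature(sentence_tokens, noun_set):
    # Count how many tokens come from the learned set, then bucket
    # into the three feature values: "0", "1", ">=2".
    hits = sum(1 for tok in sentence_tokens if tok.lower() in noun_set)
    if hits == 0:
        return "0"
    if hits == 1:
        return "1"
    return ">=2"

# e.g., against a hypothetical stand-in for the BA-Strong set:
print(subjnoun_feature("They expressed hope and concern".split(),
                       {"hope", "concern"}))  # >=2
```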
Subjective Classifier (2/3)
WBO features: from Wiebe, Bruce, and O'Hara (1999), a machine learning system to classify subjective sentences.
Manual features:
Entries from (Levin 1993; Ballmer and Brennenstuhl 1981).
Some FrameNet lemmas with the frame element "experiencer" (Baker et al. 1998).
Adjectives manually annotated for polarity (Hatzivassiloglou and McKeown 1997).
Some subjective clues listed in (Wiebe 1990).
Subjective Classifier (3/3)
Discourse features: We use discourse features to capture the density of clues in the text surrounding a sentence.
First, we compute the average number of subjective clues and objective clues per sentence.
Next, we characterize the number of subjective and objective clues in the previous and next sentences as higher-than-expected (high), lower-than-expected (low), or expected (medium).
We also define a feature for sentence length.
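The high/medium/low bucketing can be sketched as below; the cutoff of expected ± 0.5 is an illustrative assumption, since the slides do not give the exact thresholds.

```python
def density_feature(clue_count, expected, tolerance=0.5):
    # Bucket a neighboring sentence's clue count relative to the
    # corpus-wide average number of clues per sentence.
    if clue_count > expected + tolerance:
        return "high"    # higher than expected
    if clue_count < expected - tolerance:
        return "low"     # lower than expected
    return "medium"      # about as expected

# e.g., with an average of 2.0 subjective clues per sentence:
print(density_feature(4, expected=2.0))  # high
print(density_feature(0, expected=2.0))  # low
```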
Classification Result (1/3)
We evaluate each classifier using 25-fold cross-validation on the experiment corpus and use a paired t-test to measure significance at the 95% confidence level.
We compute accuracy (Acc) as the percentage of sentences that match the gold standard, and precision (Prec) and recall (Rec) with respect to subjective sentences.
Gold standard: a sentence is subjective if it contains at least one private-state expression of medium or higher strength.
The objective class consists of everything else.
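The metrics as defined above, with precision and recall computed with respect to the subjective class, can be sketched as (the example labels are hypothetical):

```python
def evaluate(gold, predicted, positive="subjective"):
    # Accuracy over all sentences; precision and recall with
    # respect to the subjective (positive) class.
    correct = sum(g == p for g, p in zip(gold, predicted))
    tp = sum(g == p == positive for g, p in zip(gold, predicted))
    pred_pos = sum(p == positive for p in predicted)
    gold_pos = sum(g == positive for g in gold)
    acc = correct / len(gold)
    prec = tp / pred_pos if pred_pos else 0.0
    rec = tp / gold_pos if gold_pos else 0.0
    return acc, prec, rec

gold = ["subjective", "subjective", "objective", "subjective"]
pred = ["subjective", "objective", "objective", "subjective"]
print(evaluate(gold, pred))  # (0.75, 1.0, 0.6666666666666666)
```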
Classification Result (2/3)
We train a Naïve Bayes classifier using only the SubjNoun features. This classifier achieves good precision (77%) but only moderate recall (64%).
We find that the subjective nouns are good indicators when they appear, but not every subjective sentence contains a subjective noun.
Classification Result (3/3)
There is a synergy between these feature sets: using both types of features achieves better performance than either one alone.
In Table 8, Row 1, we use the WBO + SubjNoun + manual + discourse features. This classifier achieves 81.3% precision, 77.4% recall, and 76.1% accuracy.
Conclusion
We demonstrate that weakly supervised bootstrapping techniques can learn subjective terms from unannotated texts.
Bootstrapping algorithms can learn not only general semantic categories, but any category for which words appear in similar linguistic phrases.
The experiments suggest that reliable subjectivity classification requires a broad array of features.