Slot Filler Validation
Mark Sammons, John Wieting, Subhro Roy, Chizeng Wang, and Dan Roth
Computer Science Department, University of Illinois
OUTLINE
Slot Filler Validation (SFV): Task and Background
SFV Data
Our approach
  Argument Checking
  Argument Matching
  Relation Matching
  Learning
Results
Future Directions
Conclusion
Slot Filler Validation (SFV) Task
Teams build KBP Slot Filler (SF) systems that provide answers to queries about specific entities
The Slot Filler Validation task is to review the answers proposed by the KBP SF systems and filter out those that are incorrect
An example candidate answer:
QUERY: Paul Gray employee_or_member_of 38
Slot Filler Validation (SFV) Task
Large set of relations relating to either persons or organizations
Examples for subject type PER:
  Age
  Alternate_names
Examples for subject type ORG:
  Date_founded
  Member_of
Slot Filler Validation (SFV) Task
From a per-example perspective, this is closely related to Recognizing Textual Entailment
Differences:
  The data set is heavily skewed (many more negative examples than positive: about 9:1 in 2011, 5:1 in 2012)
  The source documents are longer than the short texts in traditional RTE
  There are many more examples (on the order of 50x as much data)
  The examples are produced by NLP systems
SFV DATA
Data Statistics
KBP 2011: ~48,000 candidate answers split into dev and test sets
  Dev set was labeled
  We inferred the labels for the test set from the KBP SF human gold standard
KBP 2012: 22,885 candidate answers
  Some changes to the set of relations
  About 20% of answers were from web log/newsgroup/web forum documents
KBP 2013: 52,641 candidate answers
  About 9% were from web sources
Newsgroup, Web log, Discussion forum data
Very noisy: teribble speling an grammmer w/ lot’s of abbr. and typos and f****ng
A lot of HTML markup, different standards from different sources
A lot of structure: quoting, post nesting
  Within each genre, tags are consistent, but to use the structure would require additional parsing/logic
A lot of ad-hoc formatting with a wide range of characters
A lot of non-standard punctuation
Some documents are ludicrously large (>100k characters)
OUR APPROACH
Our SFV System
We look at the SFV task as a long-term project
  We built a relatively simple baseline system
  It is a modular system that can improve through the years to create a more advanced system
  This was our first year participating in the task, so we started from scratch
We approach the task as a continuation of research in RTE
  Can be seen as essentially equivalent to RTE
  Source (or Text) is the entire document
  Target (or Hypothesis) in RTE can often be viewed as a structured query
Trista Sutter works for B’s Purses.
QUERY: Trista Sutter employee_of B's Purses
PRE-PROCESSING
Pre-Processing
The document collection was first processed to remove XML and HTML markup and stored in a Lucene database.
Then, at runtime, queried documents were further cleaned
  To use NLP tools optimally, we needed to remove or alter problematic character sequences
    Remove repeated punctuation
    Map characters to ASCII equivalents if possible; otherwise remove them
    Normalize quotation marks/single quotes/apostrophes
    Remove any XML/HTML tags in queries
  All large documents (> 100k characters) were truncated at 100k characters (nearest sentence break).
Before applying heuristics, 4,395 documents of the 2012 data set could not be processed with the Curator. After heuristics, this was down to 2,007, still significant at ~9%.
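The runtime cleaning steps listed on this slide might look roughly like the Python sketch below. It is only an illustration: the function names, the punctuation set, the quote mapping, and the choice to cut at the last sentence break before the limit are assumptions, since the slides do not give the actual heuristics.

```python
import re
import unicodedata

MAX_CHARS = 100_000  # truncation limit stated on the slide

# Assumed mapping for typographic quotes; the real system's table is not given.
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'", "`": "'",
    "\u201c": '"', "\u201d": '"',
})


def clean_text(text: str) -> str:
    """Apply the cleaning steps in a plausible (assumed) order."""
    text = re.sub(r"<[^>]+>", " ", text)            # drop leftover XML/HTML tags
    text = text.translate(QUOTE_MAP)                 # normalize quotes/apostrophes
    text = unicodedata.normalize("NFKD", text)       # split accents from base letters
    text = text.encode("ascii", "ignore").decode("ascii")  # drop what has no ASCII form
    text = re.sub(r"([!?.,;:\-*_=~])\1+", r"\1", text)     # collapse repeated punctuation
    text = re.sub(r"[ \t]+", " ", text)              # collapse runs of spaces
    return text.strip()


def truncate_at_sentence(text: str, limit: int = MAX_CHARS) -> str:
    """Cut over-long text at the last sentence break before `limit` (assumed policy)."""
    if len(text) <= limit:
        return text
    cut = max(text.rfind(mark, 0, limit) for mark in (". ", "! ", "? "))
    return text[: cut + 1] if cut != -1 else text[:limit]
```

Note that collapsing repeated punctuation also turns an ellipsis into a single period, which may or may not match what the original heuristics did.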
Pre-Processing
Lastly, use NLP tools to annotate text.
  Process with Illinois tokenizer, POS tagger, Shallow parser, NER, and Wikifier
Processing was done using the Curator
  NLP tool management system designed to simplify the use of and aggregate a variety of NLP resources.
  Curator caches results of annotations
ARGUMENT CHECKING
Argument Compatibility Checking
Goal: filter out bad candidates without too much computation
  High recall, and as high precision as possible
Generated type constraints for each relation
  For instance, city_of_birth must have a person as the subject and a city as the object
  school_attended must have a person as subject.
A constraint is satisfied if
  The constituent belonged to that type
  The constituent did not belong to any other type.
Also used gazetteers for some entity classes (e.g. cities)
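A minimal sketch of such a type-constraint check is given below. The constraint table, the toy gazetteer, and all function names are hypothetical. The slide does not say whether the two satisfaction conditions are alternatives; this sketch assumes they are (the constituent has the required type, or it has no competing type at all), which favors recall. A stricter conjunctive reading is equally possible.

```python
# Hypothetical constraint table: relation -> required type per argument.
TYPE_CONSTRAINTS = {
    "city_of_birth": {"subject": "PER", "object": "CITY"},
    "school_attended": {"subject": "PER"},
}

# Toy gazetteer standing in for the real city list, which is not given.
CITY_GAZETTEER = {"louisville", "new york", "vail"}


def observed_types(text, ner_types):
    """Combine NER labels with a gazetteer lookup."""
    types = set(ner_types)
    if text.lower() in CITY_GAZETTEER:
        types.add("CITY")
    return types


def satisfies(required, types):
    """Assumed reading: required type present, or no type assigned at all."""
    if required is None:
        return True
    return required in types or not types


def arguments_compatible(relation, subj_types, obj_types):
    """Return False only when an argument clearly has the wrong type."""
    constraint = TYPE_CONSTRAINTS.get(relation, {})
    return (satisfies(constraint.get("subject"), subj_types)
            and satisfies(constraint.get("object"), obj_types))
```

Under this reading, untyped arguments pass through to later stages, so the checker only removes candidates whose arguments carry an explicitly conflicting type.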
Argument Checker Examples
Below are some of the queries filtered by the argument checker
QUERY: Billy Mays title Ed Wood
QUERY: Blake Edwards parents mother
QUERY: Ko Yong-Hi spouse Japanese
QUERY: Paul Gray employee_or_member_of 38
QUERY: Scorpions city_of_headquarters James.The
ARGUMENT MATCHING
Argument Matching
Used Shallow Parser, NER, and Wikifier to generate candidate arguments from the query reference document
  Infer types from constituent labels
Some arguments were missed by these NLP tools. We therefore checked the documents for an exact match of subjects and objects and created an appropriate constituent when they were found.
Argument Matching – Example
QUERY: Global Infrastructure Partners stateorprovince_of_headquarters New York
BAA has been ordered by Britain's Competition Commission to also dispose of London Stansted airport and either its airport in Edinburgh or Glasgow in Scotland. The company is allowed to keep London Heathrow airport. BAA, owned by a consortium headed by Grupo Ferrovial S.A. of Spain, said it would use the proceeds to pay off debt. Global Infrastructure Partners, based in New York, also owns a 75 percent stake in London City Airport. The sale, subject to clearance by regulators, is expected to be completed in December.
Argument Matching
Problem: NER, Wikifier etc. may miss an entity mention
QUERY: Trista Sutter employee_of B's Purses
The baby is due this summer, said publicist Yani Chang. The couple tied the knot in 2003 in a two-hour ABC special. Ryan Sutter, 32, is a fireman in Vail, Colorado. Trista Sutter, 34, is a designer for B's Purses, an online boutique.
Solution: look for exact match for query subject and object, and create NER-like mentions for them
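The exact-match fallback could be sketched as follows. The `Mention` record and the function name are hypothetical; the slides do not describe the system's actual constituent data structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mention:
    """Hypothetical NER-like mention: surface text, character span, label."""
    text: str
    start: int
    end: int
    label: str


def exact_match_mentions(document: str, argument: str, label: str) -> List[Mention]:
    """Create one mention for every exact occurrence of `argument`."""
    mentions: List[Mention] = []
    if not argument:
        return mentions
    pos = document.find(argument)
    while pos != -1:
        mentions.append(Mention(argument, pos, pos + len(argument), label))
        pos = document.find(argument, pos + len(argument))
    return mentions
```

For the example sentence, searching for "B's Purses" yields a single mention whose span can then be treated like any NER output by the later stages.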
Argument Matching
Use NESim to find matches between candidates and query subject/object
  Context-free, high-recall Named Entity similarity metric
  Matches some acronyms, partial names (e.g. surname or first name only)
Require at least one mention in the document to match the entire argument
  Avoid e.g. last-name-only matches, or homonymous acronyms
In cases where the object is not a Named Entity (e.g. charges) we can rule out some candidates by checking if they are a number, date, person, etc.
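The "at least one full match" requirement can be illustrated with the sketch below. The similarity function here is a crude stand-in (token subset or acronym) and is not NESim; its behavior and all names are assumptions made only to show the filtering logic.

```python
def is_loose_match(candidate: str, argument: str) -> bool:
    """Stand-in for a high-recall name similarity test (NOT the real NESim)."""
    cand_tokens = candidate.lower().split()
    arg_tokens = argument.lower().split()
    if cand_tokens and all(tok in arg_tokens for tok in cand_tokens):
        return True  # partial name, e.g. surname only
    if len(arg_tokens) > 1:
        acronym = "".join(tok[0] for tok in arg_tokens)
        return candidate.lower() == acronym
    return False


def matched_mentions(candidates, argument):
    """Keep loose matches only if some mention matches the whole argument."""
    loose = [c for c in candidates if is_loose_match(c, argument)]
    has_full_match = any(c.lower() == argument.lower() for c in loose)
    return loose if has_full_match else []
```

With this filter, a document that only mentions "Sutter" in the context of "Ryan Sutter" produces no match for the query subject "Trista Sutter", while a document that names her in full also keeps the surname-only mentions.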
RELATION MATCHING
Relation Matching
Due to a lack of training data to train a relation classifier and a lack of outside resources, we created hand-coded rules specifying lexical patterns
  Each relation has its own set of rules
  Control matching precision with a set of parameters to allow emphasis on precision or on recall
Around 600 rules for 41 relations
  Some rules are duplicated over relations
  Rules were created based on analysis of 3,000-example subsets from the 2011 and 2012 Slot Filler Validation data sets.
Rules
Two main types of rules
  Note that subject and object have already been type checked by argument checker and argument matching components
Position based:
  alternate_names @@@ adj:OBJ; (; adj:SUBJ; )
  age @@@ adj:SUBJ; *; adj:OBJ
Window based:
  charges @@@ charge; with
  employee_or_member_of @@@ CEO; anti:.
Martha Stewart was charged with insider trading.
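To make the window-based rule concrete, here is a rough sketch of how a rule line such as `charges @@@ charge; with` could be parsed and matched. Everything beyond the rule text itself is assumed: the window size, prefix matching as a cheap substitute for lemmatization, and the helper names. Prefixed components such as `adj:` and `anti:` are not interpreted here.

```python
import re


def parse_rule(line):
    """Split 'relation @@@ comp1; comp2' into (relation, [components])."""
    relation, _, body = line.partition("@@@")
    return relation.strip(), [c.strip() for c in body.split(";") if c.strip()]


def tokenize(text):
    """Lowercased word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def find_span(tokens, phrase):
    """Return (start, end) token offsets of the first occurrence, or None."""
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            return i, i + n
    return None


def window_match_fraction(components, sentence, subject, obj, window=5):
    """Fraction of rule components found near the subject and object."""
    tokens = tokenize(sentence)
    subj = find_span(tokens, tokenize(subject))
    ob = find_span(tokens, tokenize(obj))
    if not components or subj is None or ob is None:
        return 0.0
    lo = max(0, min(subj[0], ob[0]) - window)
    hi = max(subj[1], ob[1]) + window
    span = tokens[lo:hi]
    # Prefix match lets 'charge' hit 'charged' (assumed stand-in for lemmas).
    hits = sum(any(tok.startswith(c.lower()) for tok in span) for c in components)
    return hits / len(components)
```

On the slide's example sentence, with "Martha Stewart" as subject and "insider trading" as object, both components of the `charges` rule are found inside the window.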
Relation Matching – Example
QUERY: Global Infrastructure Partners stateorprovince_of_headquarters New York
RULE: ID: 476, Components: [[base]] [[in, at]]
BAA, owned by a consortium headed by Grupo Ferrovial S.A. of Spain, said it would use the proceeds to pay off debt. Global Infrastructure Partners, based in New York, also owns a 75 percent stake in London City Airport. The sale, subject to clearance by regulators, is expected to be completed in December.
Relation Matching – Example
QUERY: River Road Asset Management city_of_headquarters Louisville
RULE: ID: 95, Components: [[SUBJ]] [[*]] [[OBJ]]
British insurance company Aviva PLC said Tuesday that it has agreed to buy all the shares in River Road Asset Management of Louisville, Kentucky. Aviva did not say what it was paying for the company, which has gross assets of $6 million and $3.6 billion of assets under management. Aviva has $549 billion of assets under management, not including the River Road acquisition.
Rule behavior
Constrained via several parameters
  How far away can subject, object be from rule match term?
  Do all the rule components have to match?
  Does the order of components matter?
  Can arguments be matched separately?
The last parameter is a weak proxy for coreference
  Too expensive to use coreference with current system.
QUERY: John Smith cause_of_death cancer of the liver
John Smith, former leader of the Labour Party, has died. Known for his aggressive stance on…
…he died after a long battle with cancer of the liver.
DECISION MAKING
Rule Decision
A rule was considered to be triggered if the number of matching components surpassed some learned threshold.
The threshold was tuned on a held-out set of data, and the same threshold was used for all rules and all relations.
  Used 2011 or 2012 SFV data to tune
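The decision step and its tuning might be sketched as below. The slide speaks of the number of matching components; this sketch assumes the threshold is applied to the fraction of components matched, which is consistent with the 0.55 and 0.85 settings reported in the results but is not confirmed by the source. Selecting the threshold by F1 on held-out data is likewise an assumption, and the function names are hypothetical.

```python
def rule_triggered(matched: int, total: int, threshold: float) -> bool:
    """A rule fires when the matched fraction surpasses the threshold."""
    return total > 0 and matched / total > threshold


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 expressed directly in terms of confusion counts."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def tune_threshold(fractions, labels, candidates):
    """Pick the single global threshold with the best held-out F1."""
    best_threshold, best_f1 = None, -1.0
    for threshold in candidates:
        preds = [frac > threshold for frac in fractions]
        tp = sum(p and y for p, y in zip(preds, labels))
        fp = sum(p and not y for p, y in zip(preds, labels))
        fn = sum((not p) and y for p, y in zip(preds, labels))
        score = f1_from_counts(tp, fp, fn)
        if score > best_f1:
            best_threshold, best_f1 = threshold, score
    return best_threshold
```

Under this reading, a rule with two of three components matched fires at the 0.55 setting but not at 0.85, which is how a single parameter can trade recall for precision.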
Learning component
Strategy with the learning-based system was to correct mistakes made by the rule-based system and also expand its coverage.
Used SVM model trained with libSVM
  Weighted positive examples 6 times more than negative examples due to imbalanced data
  One weight vector for entire model, not separate for each relation
    Features were conjoined with relation type
    Features were not necessarily conjoined with specific rules
  Used linear kernel
  Optimal parameters were found with grid search
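As an illustration of how binary features could be handed to libSVM, the sketch below writes examples in libSVM's sparse text format (`label index:value ...`, with ascending indices starting at 1). The helper names and the toy feature strings are hypothetical. With the standard `svm-train` tool, a linear kernel and a 6x weight on the positive class would typically be requested with `-t 0` and `-w1 6`; the actual invocation used by the system is not given in the slides.

```python
def build_index(examples):
    """Assign a 1-based integer id to each feature name, in first-seen order."""
    index = {}
    for feats, _label in examples:
        for name in feats:
            index.setdefault(name, len(index) + 1)
    return index


def to_libsvm_line(feats, label, index):
    """Render one example as '+1 3:1 7:1' (binary features, ascending ids)."""
    ids = sorted(index[name] for name in set(feats) if name in index)
    return " ".join(["+1" if label else "-1"] + [f"{i}:1" for i in ids])


def to_libsvm(examples):
    """Convert (feature_names, is_correct) pairs into libSVM text lines."""
    index = build_index(examples)
    return [to_libsvm_line(feats, label, index) for feats, label in examples]
```

Because every relation shares one index, the result is a single model with one weight vector, while relation-specific behavior comes only from feature names that include the relation.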
Features
Coarse model
  Used coarser features in hopes of better generalization with less data
  Features included:
    whether a rule was triggered
    whether a sentence had ended between query arguments
    which rule was triggered
    binned minimum distance between arguments when rule was not triggered
    unigrams and POS tags between arguments in same sentence when rule not triggered
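The coarse feature set could be generated along the following lines, with each feature conjoined with the relation type as described for the learning component. The bin edges, the feature-name format, and the function signature are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def distance_bin(distance: int) -> str:
    """Bucket a token distance; the bin edges are illustrative assumptions."""
    for upper, name in ((5, "0-5"), (10, "6-10"), (20, "11-20")):
        if distance <= upper:
            return name
    return "21+"


def coarse_features(relation: str,
                    triggered_rule_id: Optional[int],
                    sentence_break_between: bool,
                    min_arg_distance: int,
                    tokens_between: Sequence[str],
                    pos_between: Sequence[str]) -> List[str]:
    """Build coarse features and conjoin each one with the relation type."""
    feats = [
        f"rule_triggered={triggered_rule_id is not None}",
        f"sentence_break={sentence_break_between}",
    ]
    if triggered_rule_id is not None:
        feats.append(f"rule_id={triggered_rule_id}")
    else:
        # Back-off features are used only when no rule fired.
        feats.append(f"dist_bin={distance_bin(min_arg_distance)}")
        if not sentence_break_between:
            feats += [f"unigram={tok.lower()}" for tok in tokens_between]
            feats += [f"pos={tag}" for tag in pos_between]
    return sorted({f"{relation}&{feat}" for feat in feats})
```

The conjunction with the relation name lets one shared weight vector still learn, for example, that a short argument distance is more informative for some relations than for others.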
Features
Expressive model
  Used more expressive features and tried to correct mistakes of specific rules
  Features included those in the coarse model with the addition of:
    maximum distance between all components of a specific rule and the subject and object
    tokens of the specific rule found in the document
SYSTEM PERFORMANCE
System performance (2012)
| System | Notes | Prec | Rec | F1 |
| --- | --- | --- | --- | --- |
| Baseline | Always label “YES” | 0.181 | 1.0 | 0.307 |
| Arg Checker 1 | Only argument checker | 0.184 | 0.986 | 0.310 |
| Arg Checker 2 | With argument match | 0.280 | 0.798 | 0.415 |
| Rules (0.55) | Low threshold | 0.449 | 0.667 | 0.537 |
| Rules (0.85) | High threshold | 0.475 | 0.558 | 0.513 |
| Learning 1 | Coarse features | 0.402 | 0.766 | 0.527 |
| Learning 2 | Expressive features | 0.402 | 0.687 | 0.507 |
| Learning 3 | Coarse features, 10 CV | 0.447 | 0.827 | 0.581 |
| Learning 4 | Expressive features, 10 CV | 0.489 | 0.775 | 0.600 |
All systems except the learning-based ones were evaluated on all of the 2012 data. The learning systems were either cross-validated on 2012 (the rows marked "10 CV") or trained on the 2012 development set and evaluated on the 2012 test set (both sets had ~12k examples).
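For reference, the F1 column is the harmonic mean of precision and recall, and for the always-"YES" baseline the precision is simply the fraction of candidates that are correct. A small sketch (function names are illustrative) that reproduces two of the 2012 rows from their rounded precision and recall:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def always_yes_baseline(positive_fraction: float):
    """Labeling everything 'YES' gives recall 1.0 and precision = positive rate."""
    precision, recall = positive_fraction, 1.0
    return precision, recall, f1_score(precision, recall)
```

Because the table reports rounded precision and recall, recomputing F1 from those values can differ from the reported F1 in the last digit for some rows.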
System performance (2013)
| System | Notes | Prec | Rec | F1 |
| --- | --- | --- | --- | --- |
| Baseline | Always label “YES” | 0.248 | 1.0 | 0.398 |
| Arg Checker 1 | Only argument checker | 0.254 | 0.967 | 0.402 |
| Arg Checker 2 | With argument match | 0.305 | 0.825 | 0.446 |
| Rules (0.55) | Low threshold | 0.349 | 0.708 | 0.467 |
| Rules (0.85) | High threshold | 0.367 | 0.604 | 0.456 |
| Learning 1 | Expressive features, 2012 | 0.330 | 0.398 | 0.361 |
| Learning 2 | Coarse features, 2012 | 0.338 | 0.783 | 0.472 |
| Learning 3* | Expressive features, 2013 | 0.469 | 0.604 | 0.528 |
| Learning 4* | Coarse features, 2013 | 0.400 | 0.863 | 0.547 |
All systems except the learning-based ones were evaluated on all of the 2013 data. The learning systems were either trained on all of 2012 (~24k examples) and evaluated on all of 2013, or trained on half of the 2013 data (~26k examples) and evaluated on the other half.
NEXT STEPS
Future Directions
Improve the rule abstraction
  There are common patterns in text that would make useful abstractions
    E.g. apposition, possessive constructions
Improve rule application
  Use a lexical similarity measure for greater generalization
  Often, there are multiple compatible arguments in close proximity and the relations between arguments constrain each other
    Can identify other instances of the relations near the arguments and use these as constraints.
In 2008, Omnicorp CEO John Armitage fired his VP, Steve Nickelson.
Future Directions
Push the “purposeful inference” direction
  Selectively apply deeper, more expensive analysis, e.g. SRL, Coreference, processing only special cases
  Can cluster rules into types that would benefit from features derived from these NLP tools.
    For instance, rules having a verb and an object would benefit from verb SRL
Enrich the token-based representation
  Allow external constraints on tokenization: e.g. “treat ‘E!’ and ‘will.i.am’ as single tokens”
  Do this in a straightforward way that doesn’t break existing tools
CONCLUSION
Conclusion
The SFV task can be seen as equivalent to the RTE task; however, the large scale of the SFV task requires a different treatment.
We have presented a modular system that is robust to large changes in the dataset, provides an abstraction for generating useful features, and can be easily modified to accommodate new relations.
Future work will build upon the current system and seek to improve it by selectively using heavier NLP tools, lexical similarity and a more abstract rule syntax.
THANK YOU!