information extractors
Post on 23-Feb-2016
Hassan A. Sleiman
Information Extractors
RoadMap
• Introduction
• Comparison
• IE Framework
• Conclusions
We are talking about IEs
Wrapper
Form Filler
Navigator
Information Extractor
Ontologiser
Verifier
IE in action
• Input:
  • Web pages
  • Rules/patterns
• Output:
  • Extracted data
Extraction rules
Information extractor
Document
Data
The Da Vinci Code
Dan Brown
15.95 €
2006
Robert Langdon…
Doubleday
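The slide above can be read as: extraction rules plus a document go into the information extractor, and structured data comes out. A minimal sketch of that flow, assuming regular expressions as the rule language. The page markup, the field names, and the `extract` helper are all hypothetical; only the field values are taken from the slide.

```python
import re

# Hypothetical page markup; only the field values come from the slide.
DOCUMENT = """
<div class="book">
  <h1>The Da Vinci Code</h1>
  <span class="author">Dan Brown</span>
  <span class="price">15.95 €</span>
  <span class="year">2006</span>
  <span class="publisher">Doubleday</span>
</div>
"""

# Hypothetical extraction rules: one regular expression per field,
# with a single capture group around the value to extract.
RULES = {
    "title": r"<h1>(.*?)</h1>",
    "author": r'<span class="author">(.*?)</span>',
    "price": r'<span class="price">(.*?)</span>',
    "year": r'<span class="year">(.*?)</span>',
    "publisher": r'<span class="publisher">(.*?)</span>',
}


def extract(document, rules):
    """Apply each rule to the document and collect the first match per field."""
    data = {}
    for field, pattern in rules.items():
        match = re.search(pattern, document, re.DOTALL)
        data[field] = match.group(1).strip() if match else None
    return data


if __name__ == "__main__":
    for field, value in extract(DOCUMENT, RULES).items():
        print(f"{field}: {value}")
```

Hand-written rules like these are tied to one page layout, which is one reason rules are often learned from examples instead.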
Comparison
...
...
Framework
• IE framework.
• Reusable.
• Comparable results.
RoadMap
• Introduction
• Our work:
  • Survey
  • Framework
• Conclusions
Survey
• 62 Information Extractors identified.
• 43 IEs studied.
Components
DataSet
Resultset
RuleSet
Learner
InfoExtractor
Preprocessor
Utilities
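One way the components listed above might fit together, as a rough sketch only: a Learner turns an annotated DataSet into a RuleSet, and an InfoExtractor applies that RuleSet to a new document to produce a ResultSet. The slides do not define these interfaces, so every class, method, and rule format here is an assumption. The learner is a deliberately naive left/right delimiter learner, the second training document is invented sample data, and the Preprocessor and Utilities components are not modelled.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule format: the value sits between two delimiters."""
    field: str
    left: str   # text expected immediately before the value
    right: str  # text expected immediately after the value


def _common_suffix(strings):
    """Longest suffix shared by all strings."""
    reversed_strings = [s[::-1] for s in strings]
    return os.path.commonprefix(reversed_strings)[::-1]


class Learner:
    """Learns one delimiter rule per field from annotated documents."""

    def learn(self, dataset):
        # dataset: list of (document, {field: value}) pairs.
        ruleset = []
        for field in list(dataset[0][1]):
            lefts, rights = [], []
            for document, annotation in dataset:
                value = annotation[field]
                start = document.index(value)
                lefts.append(document[:start])
                rights.append(document[start + len(value):])
            ruleset.append(
                Rule(field, _common_suffix(lefts), os.path.commonprefix(rights))
            )
        return ruleset


class InfoExtractor:
    """Applies a RuleSet to a document and returns a ResultSet (a dict)."""

    def __init__(self, ruleset):
        self.ruleset = ruleset

    def extract(self, document):
        resultset = {}
        for rule in self.ruleset:
            start = document.find(rule.left)
            if start == -1:
                resultset[rule.field] = None
                continue
            start += len(rule.left)
            end = document.find(rule.right, start)
            resultset[rule.field] = document[start:end] if end != -1 else None
        return resultset


# Hypothetical DataSet: the first entry reuses values from the earlier slide,
# the second is invented sample data.
DATASET = [
    ("<b>The Da Vinci Code</b><i>Dan Brown</i>",
     {"title": "The Da Vinci Code", "author": "Dan Brown"}),
    ("<b>Example Book Two</b><i>Jane Doe</i>",
     {"title": "Example Book Two", "author": "Jane Doe"}),
]

if __name__ == "__main__":
    ruleset = Learner().learn(DATASET)
    extractor = InfoExtractor(ruleset)
    print(extractor.extract("<b>Some Title</b><i>Some Author</i>"))
```

Keeping the Learner and the InfoExtractor behind separate interfaces is what would let different IE proposals be swapped in and compared on the same DataSet.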
Tokenisation
• Tag & Text
• Word & No-Word
• Chars
Example:
<a href="http://example.com"> the _<span> Times </span></a>
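The three tokenisation granularities can be sketched as follows. The slides only name them, so the exact token boundaries below are one plausible reading rather than the framework's definition, and the function names are hypothetical. The sample string is a cleaned-up version of the slide's snippet.

```python
import re

# Cleaned-up version of the snippet on the slide.
SAMPLE = '<a href="http://example.com"> the <span> Times </span></a>'


def tokenise_tag_text(html):
    """Tag & Text: each tag is one token, each run of text between tags is one token."""
    pieces = re.findall(r"<[^>]+>|[^<]+", html)
    # Drop whitespace-only text runs and trim the rest.
    return [piece.strip() for piece in pieces if piece.strip()]


def tokenise_word_noword(text):
    """Word & No-Word: alternating runs of word and non-word characters."""
    return re.findall(r"\w+|\W+", text)


def tokenise_chars(text):
    """Chars: every character is its own token."""
    return list(text)


if __name__ == "__main__":
    print(tokenise_tag_text(SAMPLE))
    print(tokenise_word_noword("the Times"))
    print(tokenise_chars("the"))
```

The coarser the tokens, the shorter the sequences a learner has to handle; the finer the tokens, the more precisely a rule can locate the start and end of a value.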
DataSet 1/2
DataSet 2/2
RuleSet
Keep in mind!
Dataset
Conclusions
• Goals for 2010:
  • IE Framework.
  • Survey.
  • Comparable IE implementations.
  • Marking tool.
  • Tokeniser.
• Achievements 2009:
  • Studying 43 IEs.
  • Framework Modules definition.
Looking for a paper? Try the TDG Scholar at
http://scholar.tdg-seville.info/
Thanks!