Information Extractors

Post on 23-Feb-2016

DESCRIPTION

Information Extractors. Hassan A. Sleiman. RoadMap. Introduction Comparison IE Framework Conclusions. Wrapper. Form Filler. Navigator. Information Extractor. Ontologiser. Verifier. We are talking about IEs. The Da Vinci Code. Doubleday. 2006. Dan Brown. 15.95 €.

TRANSCRIPT

Hassan A. Sleiman

Information Extractors

RoadMap

• Introduction
• Comparison
• IE Framework
• Conclusions

We are talking about IEs

• Wrapper
• Form Filler
• Navigator
• Information Extractor
• Ontologiser
• Verifier

IE in action

• Input:
  • Web pages
  • Rules/patterns
• Output:
  • Extracted data

[Figure: a document and a set of extraction rules are fed to the information extractor, which outputs data. Values shown on the slide: The Da Vinci Code, Dan Brown, 15.95 €, 2006, Robert Langdon…, Doubleday]
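The input/output relationship above (web page plus rules in, extracted data out) can be illustrated with a minimal Python sketch. The page markup, the class names and the regular-expression rules below are assumptions made for illustration; the slides do not show the actual page or rule format. Only the extracted values come from the slide.

```python
import re

# Hypothetical web page: the markup is invented for illustration,
# only the values (title, author, ...) appear on the slide.
PAGE = """
<div class="book">
  <h1>The Da Vinci Code</h1>
  <span class="author">Dan Brown</span>
  <span class="publisher">Doubleday</span>
  <span class="year">2006</span>
  <span class="price">15.95 €</span>
</div>
"""

# Hypothetical extraction rules: one regular expression per attribute,
# each with a single capturing group around the value to extract.
RULES = {"title": re.compile(r"<h1>(.*?)</h1>")}
for attribute in ("author", "publisher", "year", "price"):
    RULES[attribute] = re.compile(
        r'<span class="%s">(.*?)</span>' % attribute
    )


def extract(document, rules):
    """Apply every rule to the document and collect the first match of each."""
    record = {}
    for attribute, pattern in rules.items():
        match = pattern.search(document)
        record[attribute] = match.group(1).strip() if match else None
    return record


record = extract(PAGE, RULES)
for attribute, value in record.items():
    print(attribute, "=", value)
```

Regular expressions are only one possible rule formalism; they are used here because they keep the sketch short.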

Comparison

...

...

Framework

• IE framework.
• Reusable.
• Comparable results.

RoadMap

• Introduction
• Our work:
  • Survey
  • Framework
• Conclusions

Survey

• 62 Information Extractors identified.
• 43 IEs are studied.

Components

• DataSet
• Resultset
• RuleSet
• Learner
• InfoExtractor
• Preprocessor
• Utilities
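As a rough illustration of how these components might fit together, here is a minimal Python sketch. The slides only name the components, so every class body, method name and the toy learning strategy (using the tags that enclose an annotated value as delimiters) is an assumption and not the framework's actual design. Utilities is left out because the slides give no detail about it.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """Documents plus, for each one, the annotated values to extract."""
    documents: list
    annotations: list  # one dict per document: attribute name -> value


@dataclass
class RuleSet:
    """Attribute name -> compiled extraction pattern."""
    rules: dict = field(default_factory=dict)


@dataclass
class ResultSet:
    """One extracted record (a dict) per processed document."""
    records: list = field(default_factory=list)


class Preprocessor:
    def clean(self, document):
        # Collapse whitespace so learned rules do not depend on layout.
        return " ".join(document.split())


class Learner:
    """Toy learner (assumed): delimits each annotated value of the first
    training document by the tags that immediately enclose it."""

    def learn(self, dataset, preprocessor):
        doc = preprocessor.clean(dataset.documents[0])
        ruleset = RuleSet()
        for name, value in dataset.annotations[0].items():
            start = doc.find(value)
            if start < 0:
                continue  # annotated value not present in the document
            end = start + len(value)
            left = doc.rfind("<", 0, start)
            right = doc.find(">", end)
            if left < 0 or right < 0:
                continue  # no enclosing tags to use as delimiters
            prefix, suffix = doc[left:start], doc[end:right + 1]
            ruleset.rules[name] = re.compile(
                re.escape(prefix) + "(.*?)" + re.escape(suffix)
            )
        return ruleset


class InfoExtractor:
    def __init__(self, ruleset, preprocessor):
        self.ruleset = ruleset
        self.preprocessor = preprocessor

    def extract(self, documents):
        results = ResultSet()
        for document in documents:
            doc = self.preprocessor.clean(document)
            record = {}
            for name, pattern in self.ruleset.rules.items():
                match = pattern.search(doc)
                if match:
                    record[name] = match.group(1)
            results.records.append(record)
        return results


# Hypothetical training and test documents.
training = DataSet(
    documents=["<b>The Da Vinci Code</b> by <i>Dan Brown</i>"],
    annotations=[{"title": "The Da Vinci Code", "author": "Dan Brown"}],
)
pre = Preprocessor()
rules = Learner().learn(training, pre)
results = InfoExtractor(rules, pre).extract(
    ["<b>Some Other Title</b>  by <i>Some Author</i>"]
)
print(results.records)
```

Keeping the Learner and the InfoExtractor behind the same DataSet, RuleSet and ResultSet types is what would let different extractors be swapped in and their results compared, which matches the reusability and comparability goals stated for the framework.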

Tokenisation

<a href="http://example.com"> the <span> Times </span></a>

• Tag & Text

• Word & No-Word

• Chars
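The three tokenisation granularities listed above can be sketched in Python as follows. The function names, the token labels and the regular expressions are assumptions made for illustration; the slides do not specify how the framework's tokeniser is implemented.

```python
import re

HTML = '<a href="http://example.com"> the <span> Times </span></a>'

TAG_RE = re.compile(r"<[^>]+>")


def tokenise_tag_text(html):
    """Tag & Text: each tag is one token, each text run between tags is one token."""
    tokens = []
    position = 0
    for match in TAG_RE.finditer(html):
        text = html[position:match.start()].strip()
        if text:
            tokens.append(("TEXT", text))
        tokens.append(("TAG", match.group()))
        position = match.end()
    tail = html[position:].strip()
    if tail:
        tokens.append(("TEXT", tail))
    return tokens


def tokenise_word_noword(html):
    """Word & No-Word: runs of word characters versus runs of other
    non-whitespace characters (whitespace is dropped)."""
    tokens = []
    for match in re.finditer(r"\w+|[^\w\s]+", html):
        token = match.group()
        kind = "WORD" if re.match(r"\w", token) else "NOWORD"
        tokens.append((kind, token))
    return tokens


def tokenise_chars(html):
    """Chars: every character is its own token."""
    return list(html)


for kind, token in tokenise_tag_text(HTML):
    print(kind, repr(token))
```

The coarser the tokens, the shorter the sequences a learner has to handle; the finer the tokens, the more precisely a rule can locate the boundaries of a value.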

Example:

DataSet 1/2

DataSet 2/2

RuleSet

Keep in mind!

Dataset

Conclusions

• Goals for 2010:
  • IE Framework.
  • Survey.
  • Comparable IE implementations.
  • Marking tool.
  • Tokeniser.

• Achievements 2009:
  • Studying 43 IEs.
  • Framework Modules definition.

Looking for a paper? Try The TDG Scholar at

http://scholar.tdg-seville.info/

Thanks!
