Information Extractors

Hassan A. Sleiman Information Extractors


Post on 23-Feb-2016


DESCRIPTION

Information Extractors. Hassan A. Sleiman. RoadMap. Introduction Comparison IE Framework Conclusions. Wrapper. Form Filler. Navigator. Information Extractor. Ontologiser. Verifier. We are talking about IEs. The Da Vinci Code. Doubleday. 2006. Dan Brown. 15.95 €.

TRANSCRIPT

Page 1: Information Extractors

Hassan A. Sleiman

Information Extractors

Page 2: Information Extractors

RoadMap

• Introduction
• Comparison
• IE Framework
• Conclusions

Page 3: Information Extractors

We are talking about IEs

Wrapper
Form Filler
Navigator
Information Extractor
Ontologiser
Verifier

Page 4: Information Extractors

IE in action

• Input:
  • Web pages
  • Rules/patterns

• Output:
  • Extracted data

[Diagram: a document and the extraction rules feed the information extractor, which outputs data.]

Data:
• The Da Vinci Code
• Dan Brown
• 15.95 €
• 2006
• Robert Langdon…
• Doubleday
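The input/output flow on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the slides do not show the rule format or the source page, so the regular-expression rules (`RULES`), the sample markup (`PAGE`), and the `extract` function are all hypothetical. Only the book values come from the slide.

```python
import re

# Hypothetical extraction rules: one regular expression per attribute.
# The rule format and the class names in the markup are assumptions.
RULES = {
    "title": re.compile(r'<h1 class="title">(.*?)</h1>'),
    "author": re.compile(r'<span class="author">(.*?)</span>'),
    "publisher": re.compile(r'<span class="publisher">(.*?)</span>'),
    "year": re.compile(r'<span class="year">(\d{4})</span>'),
    "price": re.compile(r'<span class="price">(.*?)</span>'),
}

# Hypothetical web page holding the values shown on the slide.
PAGE = """
<h1 class="title">The Da Vinci Code</h1>
<span class="author">Dan Brown</span>
<span class="publisher">Doubleday</span>
<span class="year">2006</span>
<span class="price">15.95 €</span>
"""


def extract(document, rules):
    """Apply every rule to the document and keep the first match of each."""
    record = {}
    for attribute, pattern in rules.items():
        match = pattern.search(document)
        if match:
            record[attribute] = match.group(1).strip()
    return record


record = extract(PAGE, RULES)
print(record)
```

Hand-written rules like these break as soon as the page layout changes, which is one reason many information extractors learn their rules from examples instead.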

Page 5: Information Extractors

Comparison

...

...

Page 6: Information Extractors

Framework

• IE framework.
• Reusable.
• Comparable results.

Page 7: Information Extractors

RoadMap

• Introduction
• Our work:
  • Survey
  • Framework
• Conclusions

Page 8: Information Extractors

Survey

• 62 Information Extractors identified.
• 43 IEs are studied.

Page 9: Information Extractors

RoadMap

• Introduction
• Our work:
  • Survey
  • Framework
• Conclusions

Page 10: Information Extractors

Components

DataSet

Resultset

RuleSet

Learner

InfoExtractor

Preprocessor

Utilities
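One way these components might fit together is sketched below. The slide only names the components, so everything else is an assumption: the fields, the method names (`learn`, `extract`), and the toy learning strategy, which simply uses the few characters around each marked value as delimiters. The sample documents are made up for the illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """Documents plus the values a user marked in each of them (assumed shape)."""
    documents: list
    annotations: list  # one dict (attribute -> marked value) per document


@dataclass
class RuleSet:
    rules: dict = field(default_factory=dict)  # attribute -> compiled regex


@dataclass
class ResultSet:
    records: list = field(default_factory=list)  # one dict per document


class Learner:
    """Toy learner: the text around each marked value becomes its delimiters."""

    def learn(self, dataset, context=3):
        ruleset = RuleSet()
        document = dataset.documents[0]
        for attribute, value in dataset.annotations[0].items():
            start = document.index(value)
            end = start + len(value)
            left = document[max(0, start - context):start]
            right = document[end:end + context]
            pattern = re.escape(left) + "(.*?)" + re.escape(right)
            ruleset.rules[attribute] = re.compile(pattern)
        return ruleset


class InfoExtractor:
    """Applies a learnt RuleSet to new documents and fills a ResultSet."""

    def extract(self, ruleset, documents):
        results = ResultSet()
        for document in documents:
            record = {}
            for attribute, pattern in ruleset.rules.items():
                match = pattern.search(document)
                if match:
                    record[attribute] = match.group(1)
            results.records.append(record)
        return results


# Made-up training and test documents sharing the same layout.
training = DataSet(
    documents=["<b>Dan Brown</b><i>2006</i>"],
    annotations=[{"author": "Dan Brown", "year": "2006"}],
)
ruleset = Learner().learn(training)
results = InfoExtractor().extract(ruleset, ["<b>Umberto Eco</b><i>1980</i>"])
print(results.records)
```

Keeping the learner and the extractor behind small, shared data types like these is what would let different IE techniques be swapped in and compared on the same DataSet.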

Page 11: Information Extractors

Tokenisation

• Tag & Text
• Word & No-Word
• Chars

Example:

<a href="http://example.com"> the <span> Times </span></a>
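The three tokenisation levels can be illustrated on the example string with a short Python sketch. The function names and the exact splitting rules are assumptions, since the slide only names the levels: here "Tag & Text" separates tags from the text between them, "Word & No-Word" separates runs of word characters from single non-word characters, and "Chars" yields every character.

```python
import re

HTML = '<a href="http://example.com"> the <span> Times </span></a>'


def tag_text_tokens(html):
    """Tag & Text: each tag is one token, each text run between tags is one token."""
    parts = re.split(r"(<[^>]+>)", html)
    return [part.strip() for part in parts if part.strip()]


def word_tokens(html):
    """Word & No-Word: runs of word characters, or single non-word characters."""
    return re.findall(r"\w+|[^\w\s]", html)


def char_tokens(html):
    """Chars: every character, whitespace included, is its own token."""
    return list(html)


tag_text = tag_text_tokens(HTML)
words = word_tokens(HTML)
chars = char_tokens(HTML)

print(tag_text)
print(words[:6])
print(len(chars))
```

Coarser tokens give shorter sequences and simpler rules, while finer tokens let a rule anchor on details inside a tag, such as an attribute name.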

Page 12: Information Extractors

DataSet 1/2

Page 13: Information Extractors

DataSet 2/2

Page 14: Information Extractors

RuleSet

Page 15: Information Extractors

Keep in mind!

Page 16: Information Extractors

Dataset

Page 17: Information Extractors

RoadMap

• Introduction
• Our work:
  • Survey
  • Framework
• Conclusions

Page 18: Information Extractors

Conclusions

• Goals for 2010:
  • IE Framework.
  • Survey.
  • Comparable IE implementations.
  • Marking tool.
  • Tokeniser.

• Achievements 2009:
  • Studying 43 IEs.
  • Framework Modules definition.

Page 19: Information Extractors

Looking for a paper? Try The TDG Scholar at

http://scholar.tdg-seville.info/

Thanks!