Machine Learning for Information Extraction

Machine Learning for Information Extraction Li Xu

Upload: lyle-walker

Post on 31-Dec-2015



TRANSCRIPT

Page 1: Machine Learning for Information Extraction

Machine Learning for Information Extraction

Li Xu


Objective

• Learn how to apply machine learning concepts to an application

• Learn how to improve the performance of an existing application by applying machine learning algorithms


Introduction

• Information Extraction (IE) is concerned with extracting the relevant data from a collection of documents.

• Key component: extraction patterns.

• Machine Learning algorithms.
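
To make the notion of an extraction pattern concrete, here is a minimal sketch in Python. It assumes a pattern is simply a slot name paired with a regular expression that has one capture group. The `ExtractionPattern` class, the `price` slot, and the sample ad are hypothetical illustrations and are not taken from any of the systems covered in this talk.

```python
import re
from dataclasses import dataclass


@dataclass
class ExtractionPattern:
    """Hypothetical representation: a slot name plus a regex with one capture group."""
    slot: str
    regex: str

    def apply(self, text: str) -> list[str]:
        # Return every string captured for this slot in the text.
        return re.findall(self.regex, text)


# Hand-written pattern for illustration; a learning system would induce
# such patterns from annotated examples instead of having them coded by hand.
price_pattern = ExtractionPattern(slot="price", regex=r"\$(\d+)")

# Made-up rental ad text.
ad = "Capitol Hill 1 br twnhme, $675. Also 3 br upper flr, $995."
print(price_pattern.slot, price_pattern.apply(ad))
```

The role of the machine learning algorithm is then to produce patterns like `price_pattern` automatically from labeled documents.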


IE for Free Text

• Syntactic and semantic constraints

• AutoSlog

• LIEP

• PALKA

• CRYSTAL

• CRYSTAL + Webfoot

• HASTEN


IE from Online Documents

• WHISK (Soderland 1998)

– Domain: rental ads

– Precision: ~95%; Recall: 73%-90%

• RAPIER (Califf & Mooney 1997)

– Domain: software jobs

– Precision: 84%; Recall: 53%

• SRV (Freitag 1998)

– Domain: seminar announcements

– Precision: speaker, 75%; location, 75%; start time, 99%; end time, 96%.
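
For readers unfamiliar with the two measures quoted for these systems, the following sketch shows how precision and recall are typically computed for extracted slot values. The function name and the sample data are made up for illustration; they do not reproduce any of the reported experiments.

```python
def precision_recall(extracted: set, gold: set) -> tuple[float, float]:
    """Precision = correct / extracted; recall = correct / gold."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Made-up (document id, slot value) pairs, for illustration only.
gold = {("d1", "3:30 pm"), ("d2", "noon"), ("d3", "10 am"), ("d4", "4 pm")}
extracted = {("d1", "3:30 pm"), ("d2", "noon"), ("d3", "11 am")}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

A system with high precision but lower recall extracts mostly correct values while missing some of the values that are present in the documents.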


WHISK


RAPIER


SRV


Problems

• Bottom-up search

– RAPIER

– WHISK

• Single-slot extraction rules

– SRV

– RAPIER

• Heavy dependence on layout patterns
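
The limitation of single-slot rules can be illustrated with a small sketch. It assumes, purely for illustration, that rules are regular expressions over a made-up rental ad; none of the patterns below come from SRV, RAPIER, or WHISK.

```python
import re

# Made-up text with three listings; the second one has no price.
text = "Capitol Hill 1 br, $675. Fremont 2 br, call for price. Ballard 3 br, $995."

# Single-slot rules: each rule fills one slot independently.
bedrooms = re.findall(r"(\d+) br", text)
prices = re.findall(r"\$(\d+)", text)

# Multi-slot rule: one rule fills both slots together, so each
# bedroom count stays linked to the price of the same listing.
listings = re.findall(r"(\d+) br, \$(\d+)", text)

print(bedrooms, prices)  # three bedroom counts but only two prices
print(listings)          # only the listings where both slots were found
```

With single-slot rules the two result lists have different lengths, so there is no reliable way to tell which price belongs to which listing. The multi-slot rule keeps the association, which also shows why such rules depend on how the slots are laid out relative to each other.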


Obituary Ontology


Improvement


Lexical Object

• Relational Learning

– FOIL

– Feature design

• Regular expression

• Rote Learning
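
The slide names rote learning without further detail. Assuming it means memorizing the slot fillers seen in training and recognizing them verbatim in new text, a minimal sketch could look like the following. The `RoteLearner` class, the `location` slot, and the training strings are all hypothetical.

```python
from collections import defaultdict


class RoteLearner:
    """Hypothetical rote learner: memorizes slot fillers seen in training."""

    def __init__(self) -> None:
        self.memory: dict[str, set[str]] = defaultdict(set)

    def train(self, examples: list[tuple[str, str]]) -> None:
        # Each example is a (slot, filler) pair taken from annotated text.
        for slot, filler in examples:
            self.memory[slot].add(filler.lower())

    def extract(self, slot: str, text: str) -> list[str]:
        # Report every memorized filler that occurs verbatim in the text.
        lowered = text.lower()
        return sorted(f for f in self.memory[slot] if f in lowered)


learner = RoteLearner()
learner.train([("location", "Smith Hall 101"), ("location", "Main Auditorium")])
print(learner.extract("location", "The talk will be held in Smith Hall 101."))
```

Such a learner cannot recognize fillers it has never seen, which is one reason to combine it with regular expressions or relational learning for lexical objects.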


Multi-slot Hierarchy


Multi-slot Boundary

• Relational Learning

• Feature Design

– Individual heuristics

– Combining heuristics


Conclusion

• How can machine learning algorithms be applied to IE?

• What are the problems with each system?

• How can an existing IE approach be improved through machine learning, and how can the problems that appear in other machine-learning-based IE systems be avoided?