machine learning for information extraction
Post on 31-Dec-2015
Machine Learning for Information Extraction
Li Xu
Objective
• Learn how to apply machine learning concepts to an application
• Learn how to improve the performance of an existing application by applying machine learning algorithms
Introduction
• Information Extraction (IE) is concerned with extracting the relevant data from a collection of documents.
• Key component: extraction patterns.
• Machine Learning algorithms.
IE for Free Text
• Syntactic and semantic constraints
• AutoSlog
• LIEP
• PALKA
• CRYSTAL
• CRYSTAL + Webfoot
• HASTEN
IE from Online Documents
• WHISK (Soderland 1998)
– Domain: rental ads
– Precision: ~95%; Recall: 73%-90%
• RAPIER (Califf & Mooney 1997)
– Domain: software jobs
– Precision: 84%; Recall: 53%
• SRV (Freitag 1998)
– Domain: seminar announcements
– Precision: speaker, 75%; location, 75%; start time, 99%; end time, 96%
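The figures above are precision and recall scores. As a reminder of how such scores are usually computed, here is a minimal sketch that compares a set of extracted slot fillers against a hand-labeled answer key. The function name `precision_recall` and the rental-ad data are invented for illustration and are not taken from any of the cited systems.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for two collections of slot fillers."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)  # fillers that are both extracted and in the key
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical rental-ad answer key and system output (invented data).
gold = {("bedrooms", "2"), ("price", "$675"), ("neighborhood", "Capitol Hill")}
extracted = {("bedrooms", "2"), ("price", "$675"), ("price", "$3")}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision drops when the system extracts wrong fillers; recall drops when it misses fillers that are in the answer key.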
WHISK
RAPIER
SRV
Problems
• Bottom-up search
– RAPIER
– WHISK
• Single-slot extraction rules
– SRV
– RAPIER
• Heavy dependence on layout patterns
Obituary Ontology
Improvement
Lexical Object
• Relational Learning
– FOIL
– Feature design
• Regular expression
• Rote Learning
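Two of the techniques listed above are simple enough to sketch. The example below assumes that a lexical object such as a date can be recognized by a regular expression, and that rote learning means memorizing the fillers seen in training text and looking them up in new text. The names `DATE_RE`, `extract_dates`, and `RoteLearner`, along with the obituary text, are hypothetical and only serve as an illustration.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Assumed date format for illustration, e.g. "March 3, 1998".
DATE_RE = re.compile(rf"\b(?:{MONTHS})\s+\d{{1,2}},\s+\d{{4}}\b")


def extract_dates(text):
    """Return every substring of text that matches the assumed date format."""
    return DATE_RE.findall(text)


class RoteLearner:
    """Memorize fillers seen in training and find them verbatim in new text."""

    def __init__(self):
        self.known = set()

    def train(self, fillers):
        self.known.update(f.lower() for f in fillers)

    def extract(self, text):
        lowered = text.lower()
        # Case-insensitive substring lookup of each memorized filler.
        return sorted(f for f in self.known if f in lowered)


# Invented obituary text and training fillers.
text = "John Smith died March 3, 1998. Services will be held at Berg Mortuary."
learner = RoteLearner()
learner.train(["Berg Mortuary", "Walker Mortuary"])

dates = extract_dates(text)
places = learner.extract(text)
```

The regular expression generalizes to dates never seen before, while the rote learner only recognizes fillers it has already memorized. That difference is one reason the two are listed as separate options.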
Multi-slot Hierarchy
Multi-slot Boundary
• Relational Learning
• Feature Design
– Individual heuristics
– Combining heuristics
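Relational learners in the FOIL family build a rule one literal at a time and score each candidate literal by its information gain. The sketch below shows that standard gain measure in its simplified form, where the literal introduces no new variables. How the slides apply it to boundary features is not stated, so the feature mentioned in the comments is purely hypothetical.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """Simplified FOIL gain for adding one literal to a rule.

    p0, n0: positive and negative examples covered before adding the literal.
    p1, n1: positive and negative examples covered after adding it.
    Assumes the literal introduces no new variables, so the number of
    positives that remain covered is p1.
    """
    if p0 == 0 or p1 == 0:
        return 0.0
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)


# Hypothetical candidate feature: "the previous token ends a sentence".
# Before: the rule covers 4 true boundaries and 4 false ones.
# After adding the feature: it covers 2 true boundaries and no false ones.
gain = foil_gain(4, 4, 2, 0)
print(gain)
```

A literal that keeps many positive examples while excluding negatives scores high. A literal that leaves the positive ratio unchanged scores zero.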
Conclusion
• How can machine learning algorithms be applied to IE?
• What are the problems with each system?
• How can an existing IE approach be improved through machine learning, and how can the problems that appeared in other machine-learning-based IE systems be avoided?