
Page 1

Information Extraction with HMM Structures Learned by Stochastic Optimization

Dayne Freitag and Andrew McCallum
Presented by Tal Blum

For the course: Machine Learning Approaches to Information Extraction and Information Integration

Page 2

Outline

• Background on HMM transition-structure selection
• The algorithm for the sparse IE task
• Comparison between their algorithm and the Borkar et al. algorithm
• Discussion
• Results

Page 3

HMMs for IE

• HMMs have been used successfully in many tasks:
– Speech recognition
– Information extraction (Bikel et al., Borkar et al.)
– IE in bioinformatics (Leek)
– POS tagging (Ratnaparkhi)

Page 4

Sparse Extraction Task

• Fields are extracted from a long document
• Most of the document is irrelevant
• Examples:
– NE
– Conference time & location

Page 5

Learning HMM Structure?

[Figure: a generic Bayesian network over variables X, Y, Z, W; an HMM drawn as a BN with state S and observation Obs; and the same HMM unrolled as a dynamic BN with states S1, S2, S3 emitting Obs1, Obs2, Obs3.]

Transition matrix: A = P(S_t | S_{t-1})
Emission matrix: B = P(Obs_t | S_t)
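To fix this notation, here is a minimal sketch of an HMM as the pair (A, B) plus an initial distribution, with the forward algorithm computing sequence likelihood. The matrices are made-up for illustration; nothing here comes from the paper.

```python
import numpy as np

# A minimal 2-state, 2-symbol HMM (values are illustrative):
# A[i, j] = P(S_t = j | S_{t-1} = i), B[i, k] = P(Obs_t = k | S_t = i).
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])   # initial state distribution

def sequence_likelihood(obs):
    """Forward algorithm: P(obs), summed over all hidden state paths."""
    alpha = pi * B[:, obs[0]]            # alpha_1(i) = pi_i * B[i, obs_1]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate, then weight by emission
    return alpha.sum()

print(sequence_likelihood([0, 1, 1, 0]))
```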

Page 6

Constrained Transition

[Figure: a constrained four-state transition graph over X1, X2, X3, X4 and its transition matrix A = P(S_t | S_{t-1}); zero entries correspond to edges absent from the graph.]

        X1    X2    X3    X4
  X1   0.8    0     0    0.2
  X2    0     0    0.5   0.5
  X3    0    0.3   0.5   0.2
  X4   0.2    0    0.8    0
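A constrained structure can be implemented as a boolean mask over A. The sketch below is illustrative only (the mask mirrors the example matrix above): forbidden transitions get zero probability and each row is renormalized over the allowed edges.

```python
import numpy as np

# mask[i, j] = True iff the edge X_i -> X_j exists in the structure.
mask = np.array([[True,  False, False, True ],
                 [False, False, True,  True ],
                 [False, True,  True,  True ],
                 [True,  False, True,  False]])

def constrained_mle(counts, mask):
    """ML transition estimates that respect the structure: forbidden
    transitions are clamped to 0; rows renormalize over allowed edges."""
    allowed = np.where(mask, counts, 0.0)
    row_sums = allowed.sum(axis=1, keepdims=True)
    return np.divide(allowed, row_sums,
                     out=np.zeros_like(allowed), where=row_sums > 0)

counts = np.random.default_rng(0).integers(1, 50, size=(4, 4)).astype(float)
print(constrained_mle(counts, mask).round(2))  # zeros exactly where mask forbids
```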

Page 7

HMM Structure Learning

• Unlike BN structure learning
• Learn the structure of the transition matrix A
• Learn structures with different numbers of states

[Figure: candidate transition structures over address fields (street, St. #, Zip code, country); one variant splits the Zip code field into two states, Zip C1 and Zip C2.]

Page 8

HMM Structure Example

Page 9

Example: Hierarchical HMM

Page 10

Why Learn HMM Structure?

• HMMs are not specifically suited to IE tasks
• Including structural bias reduces the number of parameters to learn, and therefore requires less data
• The parameters will be more accurate
• Structure can constrain the number of times a class appears in a document
• Structure can represent class lengths more accurately
• The emission probability might be multimodal
• Structure can model the left and right context of a class for the sparse IE task

Page 11

Fully Observed vs. Partially Observed

• Structure learning is only required when the data is partially observed
• Partially observed – a field is represented by several states, where the label is the field
• With fully observed data we can let the probabilities “learn” the structure
• Edges that are never observed get zero probability (see the sketch below)
• Learning the transition structure involves incorporating new states
• Naively allowing arbitrary transitions does not generalize well
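A minimal sketch of the fully observed case (assumed, not from the paper): transition probabilities are just normalized counts over labeled state sequences, so any edge never seen in training ends up with probability zero.

```python
from collections import Counter

def learn_transitions(state_sequences):
    """ML transition estimates from fully observed state sequences.
    Transitions never observed simply get no probability mass."""
    counts, totals = Counter(), Counter()
    for seq in state_sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[(prev, cur)] += 1
            totals[prev] += 1
    return {edge: c / totals[edge[0]] for edge, c in counts.items()}

# Toy labeled data: 'bg' = background state, 'tgt' = target-field state.
seqs = [["bg", "bg", "tgt", "tgt", "bg"],
        ["bg", "tgt", "bg", "bg"]]
print(learn_transitions(seqs))
# Only observed edges appear; e.g. ('tgt', 'tgt') has mass, while any
# unobserved transition is absent from the table, i.e. probability 0.
```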

Page 12

The Problem

• How to select the additional states and the state-transition structure?
• Manual selection doesn’t scale well
• Human intuition does not always correspond to the best structures

Page 13

The Solution

• A system that automatically selects an HMM transition structure
• The system starts from a simple initial model and extends it sequentially with a set of operations, searching for a better model
• Model quality is measured by its discrimination on a validation dataset
• The best model found is returned
• The system is competitive with human-constructed HMM structures and on average outperforms them

Page 14

IE with HMMs

• Each extracted field has its own HMM
• Each HMM contains two kinds of states:
– Target states
– Non-target states
• All of the field HMMs are concatenated into a single consistent HMM
• The entire document is used to train the models, with no need for pre-processing (extraction then reduces to decoding; see the sketch below)
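To illustrate how extraction falls out of decoding (a sketch with assumed state labels, not the paper's code): run Viterbi over the document, then emit the token spans aligned with target states.

```python
def extract(tokens, viterbi_states, target_states):
    """Return the token spans whose decoded state is a target state."""
    spans, current = [], []
    for tok, state in zip(tokens, viterbi_states):
        if state in target_states:
            current.append(tok)
        elif current:                      # a target span just ended
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = "the talk is at 3 pm in wean 5409 today".split()
states = ["bg"] * 3 + ["pre", "tgt", "tgt", "bg", "bg", "bg", "bg"]
print(extract(tokens, states, {"tgt"}))    # ['3 pm']
```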

Page 15

Parameter Estimation

• Transition probabilities are estimated with maximum likelihood:
– Unique path – ratio of counts
– Non-unique path – use EM
• Emission probabilities require smoothing with priors:
– Shrinkage, with mixture weights estimated by EM (implemented in the sketch below)

P(w|s) = λ1 P_ML(w|s) + λ2 P_ML(w|a(s)) + λ3 (1/K)

where P_ML is the maximum-likelihood estimate, a(s) is the parent of state s in the shrinkage hierarchy, K is the vocabulary size, and λ1 + λ2 + λ3 = 1.
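A minimal sketch of the shrinkage step (function and variable names are assumptions, not the paper's code): each state's sparse emission estimate is interpolated with its parent's pooled estimate and the uniform distribution.

```python
import numpy as np

def shrink_emissions(p_state, p_parent, lambdas):
    """Shrinkage-smoothed emission distribution:
    P(w|s) = l1*P_ml(w|s) + l2*P_ml(w|a(s)) + l3*(1/K).
    lambdas must sum to 1; in the paper they are fit by EM."""
    l1, l2, l3 = lambdas
    K = len(p_state)                        # vocabulary size
    uniform = np.full(K, 1.0 / K)
    return l1 * p_state + l2 * p_parent + l3 * uniform

p_state  = np.array([0.6, 0.4, 0.0, 0.0])   # sparse ML estimate for one state
p_parent = np.array([0.3, 0.3, 0.2, 0.2])   # pooled estimate over sibling states
print(shrink_emissions(p_state, p_parent, (0.5, 0.4, 0.1)))
# Note: words unseen in the state (the zeros) now get nonzero probability.
```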

Page 16

Learning State-Transition Structure

States:

• Target

• Prefix

• Suffix

• Background

Page 17

Model Expansion Choices

States:
– Target
– Prefix
– Suffix
– Background

Model expansion choices (see the sketch after this list):
– Lengthen a prefix
– Split a prefix
– Lengthen a suffix
– Split a suffix
– Lengthen a target string
– Split a target string
– Add a background state
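As an illustration of these search moves, each operation maps one structure to a candidate neighbor. The representation below (chains of state names per part) is invented for the sketch; the paper operates on actual HMM states.

```python
import copy

# A structure as lists of state chains, one entry per part of the model.
structure = {"prefix": [["p1"]], "suffix": [["s1"]],
             "target": [["t1"]], "background": ["b1"]}

def lengthen(structure, part, chain_idx=0):
    """Append one state to a chain (e.g. a longer prefix window)."""
    new = copy.deepcopy(structure)
    chain = new[part][chain_idx]
    chain.append(f"{part[0]}{len(chain) + 1}")
    return new

def split(structure, part, chain_idx=0):
    """Duplicate a chain as a parallel alternative path."""
    new = copy.deepcopy(structure)
    new[part].append(list(new[part][chain_idx]))
    return new

def add_background(structure):
    new = copy.deepcopy(structure)
    new["background"].append(f"b{len(new['background']) + 1}")
    return new

candidates = [lengthen(structure, "prefix"), split(structure, "suffix"),
              lengthen(structure, "target"), add_background(structure)]
```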

Page 18

The Algorithm
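The slide's figure is not recoverable here, but the search loop it depicts can be sketched as greedy hill climbing of the kind the paper describes. This is a hedged reconstruction: `train` and `score` are abstract callbacks standing in for parameter estimation and validation-set scoring, and the paper's actual procedure may differ in restarts and stopping criteria.

```python
def learn_structure(initial_model, expansions, train, score, n_iters=25):
    """Greedy structure search: start from a simple model, apply every
    expansion operation, train each candidate, keep the one that scores
    best on validation data, and return the best model seen overall."""
    best = current = train(initial_model)
    for _ in range(n_iters):
        candidates = [train(op(current)) for op in expansions]
        current = max(candidates, key=score)
        if score(current) > score(best):
            best = current
    return best
```

In the paper, model quality is measured by discrimination on a validation dataset, which is what `score` stands in for here.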

Page 19

Discussion

• Structure learning is similar to rule learning for word or boundary classification
• The search for the best structure is not exhaustive
• There is no attempt to generalize better by sharing emission probabilities across states

Page 20

Comparison with the Borkar et al. Algorithm

Differences:
• Segmentation vs. sparse extraction
• Modeling of the background and of boundaries
• Unique path – no need for EM
• Backward search vs. forward search

Similarity:
• Both assume boundaries, and that position is the most relevant feature for distinguishing different states

Page 21

Experimental Results

• Tested on 8 extraction tasks over 4 datasets:
– Seminar Announcements (485)
– Reuters Corporate Acquisition articles (600)
– Job Announcements (298)
– Calls for Papers (363)
• Training and test sets were of equal size
• Performance is averaged over 10 splits

Page 22

Learned Structure

Page 23

Experimental Results

The learned structure (Grown HMM) was compared to 4 other approaches:
– SRV – rule learning (Freitag 1998)
– Rapier – rule learning (Califf 1998)
– Simple HMM
– Complex HMM

Page 24

Experimental Results

Page 25

Conclusions

• HMMs have proved to be a state-of-the-art method for IE
• Constraining the transition structure has a crucial effect on performance
• Automatic transition-structure learning matches, and on average outperforms, manually crafted HMM structures, which require hard manual labor to construct

Page 26

The End!

Questions?