CONSTRAINED CONDITIONAL MODELS TUTORIAL Jingyu Chen, Xiao Cheng


TRANSCRIPT

Page 1: Constrained Conditional Models Tutorial

CONSTRAINED CONDITIONAL MODELS TUTORIAL
Jingyu Chen, Xiao Cheng

Page 2: Constrained Conditional Models Tutorial

INTRODUCTION

Page 3: Constrained Conditional Models Tutorial

Main ideas:
• Idea 1: Modeling. Separate modeling and problem formulation from algorithms, similar to the philosophy of probabilistic modeling.
• Idea 2: Inference. Keep the model simple; make expressive decisions (via constraints). Unlike probabilistic modeling, where the model itself becomes more expressive, here background knowledge is injected as constraints.
• Idea 3: Learning. Expressive structured decisions can be supported by simply learned models. Global inference can be used to amplify the simple models (and even minimal supervision).

Page 4: Constrained Conditional Models Tutorial

Task of interest: Structured Prediction
• Common formulation (see below)
  • e.g. HMM, CRF, structured perceptron, etc.
• Covers a lot of NLP problems:
  • Parsing; semantic parsing; summarization; transliteration; co-reference resolution; textual entailment…
  • IE problems: entities, relations, attributes…
• How can we improve expressivity without incurring performance issues?
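As a point of reference, the common formulation scores a structured output with a linear model over joint features (a standard form, consistent with the CCM objective introduced on the later slides):

$y^* = \arg\max_{y \in \mathcal{Y}} w^T f(x, y)$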

Page 5: Constrained Conditional Models Tutorial

Pipeline?
• A very crude approximation to the real problem; propagates errors.
• Ignores dependencies:
  • e.g. in relation extraction, the label of an entity depends on the relations it is involved in, and the relation label depends on the labels of its arguments.

Page 6: Constrained Conditional Models Tutorial

Model Formulation
• Typical models capture local dependencies (e.g. HMM, CRF), with regularization:
  $\arg\max_y w^T f(x, y)$
• With CCM we choose:
  $\arg\max_y w^T f(x, y) - \rho^T d(x, y)$
  where $\rho$ is the penalty and $d(x, y)$ the violation measure.

Page 7: Constrained Conditional Models Tutorial

Constraint expressivity

• Multiclass problem
• One-vs-all approximation
• The ideal classification can be expressed through constraints (see below).
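A minimal sketch of what the slide's formulas likely convey (my reconstruction): with indicator variables $y_i \in \{0,1\}$, one per class, one-vs-all scores each class independently and picks the best, while the ideal single-label decision is enforced directly by the linear constraint

$\sum_i y_i = 1$,

i.e. exactly one class is selected jointly with the scores.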

Page 8: Constrained Conditional Models Tutorial

Implementations

• Modeling: objective function
  $\arg\max_y w^T f(x, y) - \rho^T d(x, y)$
• Constrained optimization solver: integer linear programming
• Inference: exact ILP, heuristic search, relaxation, dynamic programming
• Learning: learn $w$ and $\rho$; they can be learnt jointly or separately, with semi-supervised learning, etc.
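To make the objective concrete, here is a minimal, illustrative sketch (not from the tutorial; all names are hypothetical) that computes $\arg\max_y w^T f(x,y) - \rho^T d(x,y)$ by brute force over a tiny output space; real systems replace the loop with an ILP solver:

import itertools
import numpy as np

def ccm_argmax(x, labels, length, w, rho, f, d):
    """Brute-force CCM inference: argmax_y w.f(x,y) - rho.d(x,y).

    f(x, y) returns a feature vector, d(x, y) a constraint-violation
    vector (both numpy arrays). Pass rho=None to ignore constraints.
    """
    best_y, best_score = None, -np.inf
    for y in itertools.product(labels, repeat=length):
        score = w @ f(x, y) - (rho @ d(x, y) if rho is not None else 0.0)
        if score > best_score:
            best_y, best_score = y, score
    return best_y, best_score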

Page 9: Constrained Conditional Models Tutorial

How do we use CCM to learn?

Page 10: Constrained Conditional Models Tutorial

EXAMPLE 1: JOINT INFERENCE-BASED LEARNING
Constrained HMM in Information Extraction

Page 11: Constrained Conditional Models Tutorial

Typical work flow
• Define basic classifiers
• Define constraints as linear inequalities
• Combine the two into an objective function

Page 12: Constrained Conditional Models Tutorial

HMMCCM Example
• Information extraction without prior knowledge
• Use an HMM

Page 13: Constrained Conditional Models Tutorial

HMMCCM Example

AUTHOR       Lars Ole Andersen . Program analysis and
TITLE        specialization for the
EDITOR       C
BOOKTITLE    Programming language
TECH-REPORT  . PhD thesis .
INSTITUTION  DIKU , University of Copenhagen , May
DATE         1994 .

This prediction violates a lot of natural constraints.

Page 14: Constrained Conditional Models Tutorial

HMMCCM Example
• Each field must be a consecutive list of words and can appear at most once in a citation.
• State transitions must occur on punctuation marks.
• The citation can only start with AUTHOR or EDITOR.
• The words pp. and pages correspond to PAGE.
• Four digits starting with 20xx or 19xx are DATE.
• Quotations can appear only in TITLE.
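For illustration (my encoding, not shown on the slide), the start constraint becomes a linear equality over indicator variables $x_{t,s}$ (token $t$ is in state $s$):

$x_{1,\text{AUTHOR}} + x_{1,\text{EDITOR}} = 1$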

Page 15: Constrained Conditional Models Tutorial

HMMCCM Example
• How do we use constraints with an HMM?
• Standard HMM:
  • Learn the joint probability of the label sequence and the input:
  • Inference: take the most likely label sequence:
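The slide's formulas are the standard HMM equations; for reference:

$P(x, y) = \prod_{t=1}^{T} P(y_t \mid y_{t-1}) \, P(x_t \mid y_t), \qquad y^* = \arg\max_y P(y \mid x)$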

Page 16: Constrained Conditional Models Tutorial

HMMCCM Example
• New objective function involving constraints
• Penalize the probability of a sequence if it violates a constraint
• A penalty is incurred each time a constraint is violated
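A sketch of the constrained objective, following the CCM form used throughout (the exact slide formula is not in the transcript): for constraints $C_1, \dots, C_K$ with penalties $\rho_k$,

$y^* = \arg\max_y \; \log P(x, y) - \sum_{k=1}^{K} \rho_k \, d_{C_k}(x, y)$

where $d_{C_k}(x, y)$ counts how many times $y$ violates constraint $C_k$.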

Page 17: Constrained Conditional Models Tutorial

HMMCCM Example
• Transform to a linear model
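The transformation is the standard one for HMMs: taking logs turns the product of probabilities into a linear score,

$\log P(x, y) = \sum_t \left[ \log P(y_t \mid y_{t-1}) + \log P(x_t \mid y_t) \right] = w^T f(x, y)$,

where $w$ collects the log transition and emission probabilities and $f(x, y)$ counts how often each transition and emission occurs, so the constrained objective matches the CCM form $w^T f(x,y) - \rho^T d(x,y)$.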

Page 18: Constrained Conditional Models Tutorial

HMMCCM Example
• We need to learn the new parameters that maximize the scoring function.

• Although the scoring function is no longer the log-likelihood of the dataset, it is still a smooth concave function with a unique global maximum at zero gradient.

Page 19: Constrained Conditional Models Tutorial

HMMCCM Example

The penalties are estimated by simply counting the probability of the constraints being violated.
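One natural estimate consistent with this description (my formulation; the slide's exact formula is not in the transcript) is $\rho_k = -\log \hat{P}(C_k \text{ is violated})$, the negative log of the empirical violation rate in the training data.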

Page 20: Constrained Conditional Models Tutorial

HMMCCM Example

Page 21: Constrained Conditional Models Tutorial

Are there other ways to learn?

Can this paradigm be generalized?

Page 22: Constrained Conditional Models Tutorial

TRAINING PARADIGMS

Page 23: Constrained Conditional Models Tutorial

Training paradigms

[Diagram: the problem is decomposed into Learning and Inference stages]

Page 24: Constrained Conditional Models Tutorial

Prior knowledge: Features vs. Constraints

                        Feature              Constraint
Data dependent          Yes                  No (if not learnt)
Learnable               Yes                  Yes
Size                    Large                Small
Improvement approach    Higher-order model   Post-processing for L+I
Domain
Penalty type            Soft                 Hard & soft
Common usage            Local                Global
Formulation             Propositional        First-order logic (FOL)

Page 25: Constrained Conditional Models Tutorial

Comparison with MLN
• In MLN, constraints are formulated as explicit probabilities, jointly with the overall distribution.

• Constraints in CCM are formulated as linear inequalities.

• Theoretically the same, but very different in practice.
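For reference (standard forms; the slide's own examples are not in the transcript): an MLN defines a joint distribution $P(y) \propto \exp\left( \sum_i w_i \, n_i(y) \right)$, where $n_i(y)$ counts the true groundings of formula $i$, so a constraint is just a heavily weighted formula; a CCM instead states the constraint as a linear inequality enforced at inference time, e.g. $x_{\{R12 = spouse\_of\}} \le x_{\{E1 = per\}}$.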

Page 26: Constrained Conditional Models Tutorial

Training paradigms
• Learning + Inference (L+I): train with some constraints, apply all constraints only in inference.
  • No need to retrain an existing system
  • Fast and modular
• Inference-Based Training (IBT): train jointly with constraints and dependencies (e.g. graphical models).
  • Better when there are strong interactions between the output variables
• Other training paradigms:
  • Pipeline-like sequential model [Roth, Small, Titov: AI&Stats'09]
  • Constraint-Driven Learning (CODL) [Chang et al. '07, '12]

Page 27: Constrained Conditional Models Tutorial

Which paradigm is better?

Page 28: Constrained Conditional Models Tutorial

Algorithmic view of the differences (perceptron-style training):

For each iteration
  For each (x, y) in the training data
    If $y_{PRED} \ne y$
      update the weights
    endif
  endfor
endfor

with the prediction rule

$y_{PRED} = \arg\max_y w^T f(x, y) \; [\, - \rho^T d(x, y) \,]$

IBT: the bracketed constraint term is included during training.
L+I: train without it; the term $-\rho^T d(x, y)$ is applied only at decision time.
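A minimal sketch of one training pass under either paradigm (my illustration; it reuses the brute-force ccm_argmax sketched earlier, and all names are hypothetical):

def perceptron_epoch(data, labels, length, w, rho, f, d, ibt=True):
    """One structured-perceptron pass, illustrating IBT vs. L+I.

    Assumes numpy-array features and the ccm_argmax sketch above.
    IBT: constraints shape the training-time prediction (rho is used).
    L+I: train unconstrained; rho enters only at decision time.
    """
    for x, y_gold in data:
        y_pred, _ = ccm_argmax(x, labels, length, w,
                               rho if ibt else None, f, d)
        if y_pred != tuple(y_gold):
            # Standard perceptron update toward the gold structure.
            w = w + f(x, y_gold) - f(x, y_pred)
    return w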

Page 29: Constrained Conditional Models Tutorial

L+I vs. IBT tradeoffs

[Plot: relative performance of L+I and IBT as the number of features grows]

• In some cases problems are hard due to a lack of training data; this motivates semi-supervised learning.

Page 30: Constrained Conditional Models Tutorial

Choice of paradigm
• IBT:
  • Better when the interaction between the output labels is strong
• L+I:
  • Faster computationally
  • Modular; no need to retrain an existing classifier, and works with simple models

Page 31: Constrained Conditional Models Tutorial

PARADIGM 2: LEARNING + INFERENCE
An example with Entity-Relation Extraction

Page 32: Constrained Conditional Models Tutorial

Entity-Relation Extraction [RothYi07]

Dole 's wife, Elizabeth , is a native of N.C.

[Diagram: entities E1 = Dole, E2 = Elizabeth, E3 = N.C., with candidate relations R12, R23, …]

Decision-time inference

Page 33: Constrained Conditional Models Tutorial

Entity-Relation Extraction [RothYi07]

• Formulation 1: Joint Global Model

Intractable to learn; needs to be decomposed.

Page 34: Constrained Conditional Models Tutorial

Entity-Relation Extraction [RothYi07]

• Formulation 2: Local learning + global inference

Page 35: Constrained Conditional Models Tutorial

Entity-Relation Extraction [RothYi07]

Cost function:

c{E1 = per}·x{E1 = per} + c{E1 = loc}·x{E1 = loc} + … + c{R12 = spouse_of}·x{R12 = spouse_of} + … + c{R12 = }·x{R12 = } + …

[Diagram: entity nodes E1 (Dole), E2 (Elizabeth), E3 (N.C.) with pairwise relation variables R12, R21, R23, R32, R13, R31]

Page 36: Constrained Conditional Models Tutorial

Entity-Relation Extraction [RothYi07]

Exactly one label for each relation and entity

Relation and entity type constraints

Integrality constraints: the variables are 0/1, in effect Boolean

Page 37: Constrained Conditional Models Tutorial

Entity-Relation Extraction [RothYi07]

• Each entity is either a person, organization or location:
  x{E1 = per} + x{E1 = loc} + x{E1 = org} + x{E1 = } = 1

• (R12 = spouse_of) ⇒ (E1 = per) ∧ (E2 = per):
  x{R12 = spouse_of} ≤ x{E1 = per}
  x{R12 = spouse_of} ≤ x{E2 = per}
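A runnable sketch of this inference step with an off-the-shelf ILP solver (PuLP here; the scores are invented for illustration, and only a subset of the slide's variables is encoded):

import pulp

# Hypothetical local classifier scores (e.g. log-probabilities).
ent_scores = {("E1", "per"): 0.6, ("E1", "loc"): 0.3, ("E1", "org"): 0.1,
              ("E2", "per"): 0.8, ("E2", "loc"): 0.1, ("E2", "org"): 0.1}
rel_scores = {("R12", "spouse_of"): 0.7, ("R12", "none"): 0.3}

prob = pulp.LpProblem("entity_relation", pulp.LpMaximize)
x_ent = {k: pulp.LpVariable(f"x_{k[0]}_{k[1]}", cat="Binary") for k in ent_scores}
x_rel = {k: pulp.LpVariable(f"x_{k[0]}_{k[1]}", cat="Binary") for k in rel_scores}

# Objective: sum of cost * indicator, as in the slide's cost function.
prob += (pulp.lpSum(s * x_ent[k] for k, s in ent_scores.items())
         + pulp.lpSum(s * x_rel[k] for k, s in rel_scores.items()))

# Exactly one label per entity and per relation.
for e in ("E1", "E2"):
    prob += pulp.lpSum(v for k, v in x_ent.items() if k[0] == e) == 1
prob += pulp.lpSum(v for k, v in x_rel.items() if k[0] == "R12") == 1

# (R12 = spouse_of) implies both arguments are persons.
prob += x_rel[("R12", "spouse_of")] <= x_ent[("E1", "per")]
prob += x_rel[("R12", "spouse_of")] <= x_ent[("E2", "per")]

prob.solve()
print({"_".join(k): v.value() for k, v in {**x_ent, **x_rel}.items()})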

Page 38: Constrained Conditional Models Tutorial

Entity-Relation Extraction [RothYi07]

• Entity classification results

Page 39: Constrained Conditional Models Tutorial

Entity-Relation Extraction [RothYi07]

• Relation identification results

Page 40: Constrained Conditional Models Tutorial

Entity-Relation Extraction [RothYi07]

• Relation identification results

Page 41: Constrained Conditional Models Tutorial

INNER WORKINGS OF INFERENCE

Page 42: Constrained Conditional Models Tutorial

Constraints Encoding
• Atoms
• Existential quantification
• Negation
• Conjunction
• Disjunction
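The standard encodings of these connectives as linear constraints over 0/1 variables (added for reference; the slide's formulas are not in the transcript):

• Atom $a$: a binary variable $x_a \in \{0, 1\}$
• Negation $\neg a$: $1 - x_a$
• Conjunction $z = a \wedge b$: $z \le x_a$, $z \le x_b$, $z \ge x_a + x_b - 1$
• Disjunction $a \vee b$: $x_a + x_b \ge 1$
• Existential quantification $\exists i : a_i$: $\sum_i x_{a_i} \ge 1$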

Page 43: Constrained Conditional Models Tutorial

Integer Linear Programming (ILP)
• Powerful tool, very general

• NP-hard even in the binary case, but efficient in practice for most NLP problems

• If ILP cannot solve the problem efficiently, we can fall back to approximate solutions using heuristic search

Page 44: Constrained Conditional Models Tutorial

Integer Linear Programming (ILP)

Page 45: Constrained Conditional Models Tutorial

Integer Linear Programming (ILP)

Page 46: Constrained Conditional Models Tutorial

SENTENCE COMPRESSION

Page 47: Constrained Conditional Models Tutorial

Sentence Compression Example
Modelling Compression with Discourse Constraints, James Clarke and Mirella Lapata, COLING/ACL 2006

• 1. What is sentence compression?
• Sentence compression is commonly expressed as a word deletion problem: given an input sentence of words W = w1, w2, . . . , wn, the aim is to produce a compression by removing any subset of these words (Knight and Marcu 2002).

Page 48: Constrained Conditional Models Tutorial

A trigram language model: maximize a scoring function by ILP.

• $p_i$: word $i$ starts the compression
• $q_{i,j}$: the sequence $w_i, w_j$ ends the compression
• $x_{i,j,k}$: the trigram $w_i, w_j, w_k$ is in the compression
• $y_i$: word $i$ is in the compression
• Each $p, q, x, y$ is either 0 or 1
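The objective then has the following form (my rendering of Clarke and Lapata's formulation, adapted to the variables above):

$\max \; \sum_i p_i \log P(w_i \mid \text{start}) + \sum_{i<j<k} x_{i,j,k} \log P(w_k \mid w_i, w_j) + \sum_{i<j} q_{i,j} \log P(\text{end} \mid w_i, w_j)$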

Page 49: Constrained Conditional Models Tutorial

Sentential Constraints:
• 1. Disallows the inclusion of modifiers without their head words.

• 2. Requires certain modifiers to be present when their head is retained in the compression.

• 3. Constrains that if a verb is present in the compression then so are its arguments.

Page 50: Constrained Conditional Models Tutorial

Modifier Constraint Example

Page 51: Constrained Conditional Models Tutorial

Modifier Constraint Example

Page 52: Constrained Conditional Models Tutorial

Sentential Constraints:
• 4. Preserve personal pronouns in the compressed output.

Page 53: Constrained Conditional Models Tutorial

Discourse Constraints:
• 1. The center of a sentence is retained in the compression, and the entity realised as the center in the following sentence is also retained.

• The center of a sentence is the entity with the highest rank.
• Entities may be ranked by many features, e.g. grammatical role (subjects > objects > others).

Page 54: Constrained Conditional Models Tutorial

Discourse Constraints:
• 2. Lexical chain constraints:

• A lexical chain is a sequence of semantically related words.

• Often the longest lexical chain is the most important one.

Page 55: Constrained Conditional Models Tutorial

SEMANTIC ROLE LABELING

Page 56: Constrained Conditional Models Tutorial

Semantic Role Labeling Example:

• What is SRL?
• SRL identifies all constituents that fill a semantic role, and determines their roles.

Page 57: Constrained Conditional Models Tutorial

General information:
• Both models (the argument identifier and the argument classifier) are trained with SNoW.

• Idea: maximize the scoring function.
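The missing formula is presumably the usual constrained argmax over argument assignments, e.g. $\hat{a} = \arg\max_a \sum_i \text{score}(a_i)$ subject to the structural constraints listed on the following slides (my rendering; the slide's exact notation is not in the transcript).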

Page 58: Constrained Conditional Models Tutorial

SRL: Argument Identification
• Uses a learning scheme with two classifiers: one predicts the beginnings of possible arguments, the other the ends. The predictions are combined to form argument candidates.

• Why: when only shallow parsing is available, the system does not have constituents to begin with; conceptually, it has to consider all possible subsequences.

Page 59: Constrained Conditional Models Tutorial

SRL: List of features
• POS tags
• Length
• Verb class
• Head word and POS tag of the head word
• Position
• Path
• Chunk pattern
• Clause relative position
• Clause coverage
• NEG
• MOD

Page 60: Constrained Conditional Models Tutorial

SRL: Constraints
• 1. Arguments cannot overlap with the predicate.

• 2. Arguments cannot exclusively overlap with the clauses.

• 3. If a predicate is outside a clause, its arguments cannot be embedded in that clause.

• 4. No overlapping or embedded arguments.

• 5. No duplicate argument classes for core arguments.
  • Note: conjunction is an exception:
  • [A0 I] [V left] [A1 my pearls] [A2 to my daughter] and [A1 my gold] [A2 to my son].
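For illustration (my encoding, consistent with standard ILP formulations of SRL): with binary variables $x_{c,r}$ meaning candidate span $c$ takes role $r$, constraint 5 becomes $\sum_c x_{c,A0} \le 1$ (and similarly for each core argument class), and constraint 4 becomes, for every word position $t$, $\sum_{c \ni t} \sum_r x_{c,r} \le 1$, i.e. each word is covered by at most one argument.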

Page 61: Constrained Conditional Models Tutorial

SRL: Constraints
• 6. If an argument is a reference to some other argument arg, then the referenced argument must exist in the sentence.

• 7. If there is a C-arg argument, then there has to be an arg argument; in addition, the C-arg argument must occur after arg.
  • The label C-arg is used to specify the continuity of an argument.

• 8. Given a specific verb, some argument types should never occur.

Page 62: Constrained Conditional Models Tutorial
Page 63: Constrained Conditional Models Tutorial

SRL Results:

Page 64: Constrained Conditional Models Tutorial

QA
• Questions?