Markov Logic in Natural Language Processing
Hoifung PoonDept. of Computer Science & Eng.
University of Washington
Overview
Motivation
Foundational areas
Markov logic
NLP applications: basics, supervised learning, unsupervised learning
Languages Are Structural
governments
lm$pxtm (according to their families)
Languages Are Structural
govern-ment-s
l-m$px-t-m (according to their families)
[Parse tree for "IL-4 induces CD11B": S → NP VP, VP → V NP]

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 …

[Event graph: involvement → up-regulation (Theme); up-regulation → IL-10 (Theme), human monocyte (Site), activation (Cause); activation → p70(S6)-kinase (Theme), gp41 (Cause)]
George Walker Bush was the 43rd President of the United States. … Bush was the eldest son of President G. H. W. Bush and Barbara Bush. … In November 1977, he met Laura Welch at a barbecue.
Languages Are Structural
Objects are not just feature vectors:
They have parts and subparts
Which have relations with each other
They can be trees, graphs, etc.

Objects are seldom i.i.d. (independent and identically distributed):
They exhibit local and global dependencies
They form class hierarchies (with multiple inheritance)
Objects' properties depend on those of related objects

Deeply interwoven with knowledge
First-Order Logic
Main theoretical foundation of computer science
General language for describing complex structures and knowledge
Trees, graphs, dependencies, hierarchies, etc. easily expressed
Inference algorithms (satisfiability testing, theorem proving, etc.)
G. W. Bush … Laura Bush … Mrs. Bush …
Languages Are Statistical
I saw the man with the telescope

[Two parses: the PP "with the telescope" attaches either to "saw" or to "the man"]
Here in London, Frances Deek is a retired teacher … In the Israeli town …, Karen London says … Now London says …

London: PERSON or LOCATION?
Microsoft buys Powerset
Microsoft acquires Powerset
Powerset is acquired by Microsoft Corporation
The Redmond software giant buys Powerset
Microsoft’s purchase of Powerset, …
…
Which one?
Languages Are Statistical
Languages are ambiguous
Our information is always incomplete
We need to model correlations
Our predictions are uncertain
Statistics provides the tools to handle this
Probabilistic Graphical Models
Mixture models
Hidden Markov models
Bayesian networks
Markov random fields
Maximum entropy models
Conditional random fields
Etc.
The Problem
Logic is deterministic, requires manual coding
Statistical models assume i.i.d. data, objects = feature vectors
Historically, statistical and logical NLP have been pursued separately
We need to unify the two!
Burgeoning field in machine learning: statistical relational learning
Costs and Benefits of Statistical Relational Learning
Benefits:
Better predictive accuracy
Better understanding of domains
Enables learning with less or no labeled data

Costs:
Learning is much harder
Inference becomes a crucial issue
Greater complexity for user
Progress to Date
Probabilistic logic [Nilsson, 1986]
Statistics and beliefs [Halpern, 1990]
Knowledge-based model construction [Wellman et al., 1992]
Stochastic logic programs [Muggleton, 1996]
Probabilistic relational models [Friedman et al., 1999]
Relational Markov networks [Taskar et al., 2002]
Etc.
This talk: Markov logic [Domingos & Lowd, 2009]
Markov Logic: A Unifying Framework
Probabilistic graphical models and first-order logic are special cases
Unified inference and learning algorithms
Easy-to-use software: Alchemy
Broad applicability
Goal of this tutorial: quickly learn how to use Markov logic and Alchemy for a broad spectrum of NLP applications
Overview

Motivation
Foundational areas: probabilistic inference, statistical learning, logical inference, inductive logic programming
Markov logic
NLP applications: basics, supervised learning, unsupervised learning
Markov Networks

Undirected graphical models
[Graph: Smoking, Cancer, Cough, Asthma]
Potential functions defined over cliques:

Smoking | Cancer | Φ(S,C)
--- | --- | ---
False | False | 4.5
False | True | 4.5
True | False | 2.7
True | True | 4.5

P(x) = (1/Z) ∏_c Φ_c(x_c)

Z = ∑_x ∏_c Φ_c(x_c)
Markov Networks

Undirected graphical models
Log-linear model:

P(x) = (1/Z) exp( ∑_i w_i f_i(x) )

w_i: weight of feature i; f_i: feature i

Example:
f_1(Smoking, Cancer) = 1 if Smoking ⇒ Cancer, 0 otherwise
w_1 = 1.5

[Graph: Smoking, Cancer, Cough, Asthma]
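The same kind of distribution in log-linear form, sketched with the single feature Smoking ⇒ Cancer and w₁ = 1.5 (enumerating the four states; only a toy illustration):

```python
import math
from itertools import product

w1 = 1.5

def f1(smoking, cancer):
    # 1 if the formula Smoking => Cancer holds, 0 otherwise
    return 1.0 if (not smoking) or cancer else 0.0

def unnorm(s, c):
    return math.exp(w1 * f1(s, c))

states = list(product([False, True], repeat=2))
Z = sum(unnorm(s, c) for s, c in states)
P = {(s, c): unnorm(s, c) / Z for s, c in states}
```

The three satisfying states each get weight e^1.5 ≈ 4.48 and the violating state (True, False) gets e^0 = 1, so the violating state is the least likely but not impossible.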
Markov Nets vs. Bayes Nets
Property | Markov Nets | Bayes Nets
--- | --- | ---
Form | Prod. potentials | Prod. potentials
Potentials | Arbitrary | Cond. probabilities
Cycles | Allowed | Forbidden
Partition func. | Z = ? | Z = 1
Indep. check | Graph separation | D-separation
Indep. props. | Some | Some
Inference | MCMC, BP, etc. | Convert to Markov
Inference in Markov Networks

Goal: compute marginals & conditionals of

P(X) = (1/Z) exp( ∑_i w_i f_i(X) ),   Z = ∑_X exp( ∑_i w_i f_i(X) )

Exact inference is #P-complete
Conditioning on the Markov blanket is easy:

P(x_i | MB(x_i)) = exp( ∑_j w_j f_j(x) ) / [ exp( ∑_j w_j f_j(x[x_i = 0]) ) + exp( ∑_j w_j f_j(x[x_i = 1]) ) ]

Gibbs sampling exploits this
MCMC: Gibbs Sampling

state ← random truth assignment
for i ← 1 to num-samples do
    for each variable x
        sample x according to P(x | neighbors(x))
        state ← state with new value of x
P(F) ← fraction of states in which F is true
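A minimal executable sketch of this loop, run on the two-variable log-linear model from earlier (one feature Smoking ⇒ Cancer, weight 1.5; the variable names "S" and "C" are mine):

```python
import math
import random

random.seed(0)
W = 1.5

def feature(s, c):
    # 1 if Smoking => Cancer holds
    return 1.0 if (not s) or c else 0.0

def p_true(var, state):
    # P(var = True | other variables): exponentiate weighted features
    # with var set each way, then normalize over the two options.
    hi, lo = dict(state), dict(state)
    hi[var], lo[var] = True, False
    e_hi = math.exp(W * feature(hi["S"], hi["C"]))
    e_lo = math.exp(W * feature(lo["S"], lo["C"]))
    return e_hi / (e_hi + e_lo)

state = {"S": False, "C": False}
num_samples, cancer_count = 20000, 0
for _ in range(num_samples):
    for var in ("S", "C"):
        state[var] = random.random() < p_true(var, state)
    cancer_count += state["C"]

p_cancer = cancer_count / num_samples  # Monte Carlo estimate of P(Cancer)
```

The exact marginal here is 2e^1.5 / (3e^1.5 + 1) ≈ 0.62, and the sampler's estimate should land close to it.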
Other Inference Methods
Belief propagation (sum-product)
Mean field / variational approximations
MAP/MPE Inference
Goal: Find most likely state of world given evidence
argmax_y P(y | x)

(y: query; x: evidence)
MAP Inference Algorithms
Iterated conditional modes
Simulated annealing
Graph cuts
Belief propagation (max-product)
LP relaxation
Overview

Motivation
Foundational areas: probabilistic inference, statistical learning, logical inference, inductive logic programming
Markov logic
NLP applications: basics, supervised learning, unsupervised learning
Generative Weight Learning
Maximize likelihood
Use gradient ascent or L-BFGS
No local maxima
Requires inference at each step (slow!)

∂/∂w_i log P_w(x) = n_i(x) − E_w[n_i(x)]

n_i(x): no. of times feature i is true in data
E_w[n_i(x)]: expected no. of times feature i is true according to model
Pseudo-Likelihood
PL(x) = ∏_i P(x_i | neighbors(x_i))

Likelihood of each variable given its neighbors in the data
Does not require inference at each step
Widely used in vision, spatial statistics, etc.
But PL parameters may not work well for long inference chains
Discriminative Weight Learning
Maximize conditional likelihood of query (y) given evidence (x)

∂/∂w_i log P_w(y | x) = n_i(x, y) − E_w[n_i(x, y)]

n_i(x, y): no. of true groundings of clause i in data
E_w[n_i(x, y)]: expected no. of true groundings according to model

Approximate expected counts by counts in MAP state of y given x
Voted Perceptron

Originally proposed for training HMMs discriminatively
Assumes network is a linear chain
Can be generalized to arbitrary networks

w_i ← 0
for t ← 1 to T do
    y_MAP ← Viterbi(x)
    w_i ← w_i + η [count_i(y_Data) − count_i(y_MAP)]
return w_i / T
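A sketch of the perceptron update on a toy problem; with no chain structure, the Viterbi step degenerates to an exact argmax over two labels. The data and feature layout are made up for illustration (η = 1):

```python
# Toy training set: label is 1 iff x1 + x2 - x3 > 0 (linearly separable).
data = [((x1, x2, x3), int(x1 + x2 - x3 > 0))
        for x1 in (0, 1) for x2 in (0, 1) for x3 in (0, 1)]

def counts(x, y):
    # Joint feature counts; features fire only for label 1, plus a bias.
    return [xi * y for xi in x] + [y]

def predict(w, x):
    # Stand-in for Viterbi: exact argmax over the two candidate labels.
    return max((0, 1), key=lambda y: sum(wi * ci
                                         for wi, ci in zip(w, counts(x, y))))

w = [0.0] * 4
w_total = [0.0] * 4
T = 50
for _ in range(T):
    for x, y in data:
        y_hat = predict(w, x)
        # Perceptron update: add data counts, subtract predicted counts.
        w = [wi + (cd - cm)
             for wi, cd, cm in zip(w, counts(x, y), counts(x, y_hat))]
        w_total = [wt + wi for wt, wi in zip(w_total, w)]

# Averaging (the "return w / T" step) stabilizes the final weights.
w_avg = [wt / (T * len(data)) for wt in w_total]
```

Because the toy data is linearly separable, the weights converge and classify the training set perfectly after a few epochs.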
Overview

Motivation
Foundational areas: probabilistic inference, statistical learning, logical inference, inductive logic programming
Markov logic
NLP applications: basics, supervised learning, unsupervised learning
First-Order Logic
Constants, variables, functions, predicates
E.g.: Anna, x, MotherOf(x), Friends(x, y)
Literal: predicate or its negation
Clause: disjunction of literals
Grounding: replace all variables by constants
E.g.: Friends(Anna, Bob)
World (model, interpretation): assignment of truth values to all ground predicates
Inference in First-Order Logic

Traditionally done by theorem proving (e.g.: Prolog)
Propositionalization followed by model checking turns out to be faster (often by a lot)
Propositionalization: create all ground atoms and clauses
Model checking: satisfiability testing
Two main approaches:
Backtracking (e.g.: DPLL)
Stochastic local search (e.g.: WalkSAT)
Satisfiability
Input: set of clauses (convert KB to conjunctive normal form (CNF))
Output: truth assignment that satisfies all clauses, or failure
The paradigmatic NP-complete problem
Solution: search
Key point: most SAT problems are actually easy
Hard region: narrow range of #Clauses / #Variables
Stochastic Local Search
Uses complete assignments instead of partial
Start with random state
Flip variables in unsatisfied clauses
Hill-climbing: minimize # unsatisfied clauses
Avoid local minima: random flips
Multiple restarts
The WalkSAT Algorithm
for i ← 1 to max-tries do
    solution ← random truth assignment
    for j ← 1 to max-flips do
        if all clauses satisfied then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes # satisfied clauses
return failure
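The pseudocode above can be turned into a small executable sketch; clauses are lists of signed integers (a standard DIMACS-like convention, chosen here for convenience):

```python
import random

def walksat(clauses, n_vars, max_tries=10, max_flips=10000, p=0.5, seed=0):
    """Clauses are lists of nonzero ints: v for variable v, -v for its negation."""
    rng = random.Random(seed)

    def is_sat(clause, assign):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    for _ in range(max_tries):
        assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
        for _ in range(max_flips):
            unsat = [c for c in clauses if not is_sat(c, assign)]
            if not unsat:
                return assign          # all clauses satisfied
            clause = rng.choice(unsat)
            if rng.random() < p:       # random-walk step
                var = abs(rng.choice(clause))
            else:                      # greedy step
                def sat_after_flip(v):
                    assign[v] = not assign[v]
                    n = sum(is_sat(c, assign) for c in clauses)
                    assign[v] = not assign[v]
                    return n
                var = max((abs(lit) for lit in clause), key=sat_after_flip)
            assign[var] = not assign[var]
    return None

# (x1 v x2) ^ (~x1 v x3) ^ (~x2 v ~x3)
model = walksat([[1, 2], [-1, 3], [-2, -3]], n_vars=3)
```

For this satisfiable three-clause instance, the search returns a model quickly.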
Overview

Motivation
Foundational areas: probabilistic inference, statistical learning, logical inference, inductive logic programming
Markov logic
NLP applications: basics, supervised learning, unsupervised learning
Rule Induction
Given: set of positive and negative examples of some concept
Example: (x1, x2, …, xn, y)
y: concept (Boolean)
x1, x2, …, xn: attributes (assume Boolean)
Goal: induce a set of rules that cover all positive examples and no negative ones
Rule: xa ∧ xb ∧ … ⇒ y (xa: literal, i.e., xi or its negation)
Same as Horn clause: Body ⇒ Head
Rule r covers example x iff x satisfies body of r
Eval(r): accuracy, info gain, coverage, support, etc.
Learning a Single Rule
head ← y
body ← Ø
repeat
    for each literal x
        r_x ← r with x added to body
        Eval(r_x)
    body ← body ∧ best x
until no x improves Eval(r)
return r
Learning a Set of Rules
R ← Ø
S ← examples
repeat
    learn a single rule r
    R ← R ∪ {r}
    S ← S − positive examples covered by r
until S = Ø
return R
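A compact sketch of the covering loop with the greedy single-rule learner inside; examples are (attribute-tuple, label) pairs and Eval is accuracy on the covered set (the toy data and helper names are illustrative):

```python
def covers(body, x):
    # body is a list of (attribute index, required value) tests
    return all(x[i] == v for i, v in body)

def accuracy(body, examples):
    covered = [(x, y) for x, y in examples if covers(body, x)]
    if not covered:
        return 0.0
    return sum(y for _, y in covered) / len(covered)

def learn_rule(examples, n_attrs):
    # Greedily add attribute = value tests while Eval improves.
    body, best = [], accuracy([], examples)
    while True:
        used = {i for i, _ in body}
        cands = [body + [(i, v)] for i in range(n_attrs) if i not in used
                 for v in (0, 1)]
        if not cands:
            return body
        cand = max(cands, key=lambda b: accuracy(b, examples))
        if accuracy(cand, examples) <= best:
            return body
        body, best = cand, accuracy(cand, examples)

def learn_rules(examples, n_attrs):
    rules, remaining = [], list(examples)
    while any(y for _, y in remaining):
        r = learn_rule(remaining, n_attrs)
        if not any(y and covers(r, x) for x, y in remaining):
            break                      # no progress; avoid looping forever
        rules.append(r)
        remaining = [(x, y) for x, y in remaining if not (y and covers(r, x))]
    return rules

# Toy concept: y = x0 AND x1
examples = [((a, b), int(a and b)) for a in (0, 1) for b in (0, 1)]
rules = learn_rules(examples, n_attrs=2)
```

On this toy concept, one rule testing both attributes suffices to cover all positives and no negatives.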
First-Order Rule Induction
y and x_i are now predicates with arguments
E.g.: y is Ancestor(x,y), x_i is Parent(x,y)
Literals to add are predicates or their negations
Literal to add must include at least one variable already appearing in rule
Adding a literal changes # groundings of rule
E.g.: Ancestor(x,z) ∧ Parent(z,y) ⇒ Ancestor(x,y)
Eval(r) must take this into account
E.g.: multiply by # positive groundings of rule still covered after adding literal
Overview
Motivation
Foundational areas
Markov logic
NLP applications: basics, supervised learning, unsupervised learning
Markov Logic
Syntax: weighted first-order formulas
Semantics: feature templates for Markov networks
Intuition: soften logical constraints
Give each formula a weight (higher weight ⇒ stronger constraint)

P(world) ∝ exp( ∑ weights of formulas it satisfies )
Example: Coreference Resolution
Mentions of Obama are often headed by "Obama"
Mentions of Obama are often headed by "President"
Appositions usually refer to the same entity
Barack Obama, the 44th President of the United States, is the first African American to hold the office. …
∀x MentionOf(x, Obama) ⇒ Head(x, "Obama")

∀x MentionOf(x, Obama) ⇒ Head(x, "President")

∀x,y,c Apposition(x, y) ⇒ (MentionOf(x, c) ⇔ MentionOf(y, c))
Example: Coreference Resolution
Example: Coreference Resolution
1.5  ∀x MentionOf(x, Obama) ⇒ Head(x, "Obama")

0.8  ∀x MentionOf(x, Obama) ⇒ Head(x, "President")

100  ∀x,y,c Apposition(x, y) ⇒ (MentionOf(x, c) ⇔ MentionOf(y, c))
Example: Coreference Resolution
1.5  ∀x MentionOf(x, Obama) ⇒ Head(x, "Obama")
0.8  ∀x MentionOf(x, Obama) ⇒ Head(x, "President")
100  ∀x,y,c Apposition(x, y) ⇒ (MentionOf(x, c) ⇔ MentionOf(y, c))

Two mention constants: A and B

Ground atoms: MentionOf(A, Obama), MentionOf(B, Obama), Head(A, "Obama"), Head(A, "President"), Head(B, "Obama"), Head(B, "President"), Apposition(A, B), Apposition(B, A)
Markov Logic Networks

An MLN is a template for ground Markov networks
Probability of a world x:

P(x) = (1/Z) exp( ∑_i w_i n_i(x) )

w_i: weight of formula i; n_i(x): no. of true groundings of formula i in x

Typed variables and constants greatly reduce size of ground Markov net
Functions, existential quantifiers, etc.
Can handle infinite domains [Singla & Domingos, 2007] and continuous domains [Wang & Domingos, 2008]
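A brute-force sketch of this definition on a tiny domain: one hypothetical formula, Smokes(x) ⇒ Cancer(x), with weight 1.5 and two constants. Every world is enumerated, so Z is exact (only feasible at this scale):

```python
import math
from itertools import product

CONSTANTS = ["A", "B"]
W = 1.5  # weight of the formula Smokes(x) => Cancer(x)

atoms = [(pred, c) for pred in ("Smokes", "Cancer") for c in CONSTANTS]

def n_true_groundings(world):
    # One grounding of the formula per constant.
    return sum(1 for c in CONSTANTS
               if (not world[("Smokes", c)]) or world[("Cancer", c)])

worlds = [dict(zip(atoms, vals))
          for vals in product([False, True], repeat=len(atoms))]
Z = sum(math.exp(W * n_true_groundings(wd)) for wd in worlds)

def prob(world):
    return math.exp(W * n_true_groundings(world)) / Z
```

Worlds that violate a ground formula lose a factor of e^1.5 per violated grounding but are not ruled out; that is exactly the softening of logical constraints.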
Relation to Statistical Models
Special cases:
Markov networks
Markov random fields
Bayesian networks
Log-linear models
Exponential models
Max. entropy models
Gibbs distributions
Boltzmann machines
Logistic regression
Hidden Markov models
Conditional random fields

Obtained by making all predicates zero-arity
Markov logic allows objects to be interdependent (non-i.i.d.)
Relation to First-Order Logic
Infinite weights ⇒ first-order logic
Satisfiable KB, positive weights ⇒ satisfying assignments = modes of distribution
Markov logic allows contradictions between formulas
MLN Algorithms: The First Three Generations

Problem | First generation | Second generation | Third generation
--- | --- | --- | ---
MAP inference | Weighted satisfiability | Lazy inference | Cutting planes
Marginal inference | Gibbs sampling | MC-SAT | Lifted inference
Weight learning | Pseudo-likelihood | Voted perceptron | Scaled conj. gradient
Structure learning | Inductive logic progr. | ILP + PL (etc.) | Clustering + pathfinding
MAP/MPE Inference
Problem: Find most likely state of world given evidence
argmax_y P(y | x)

(y: query; x: evidence)
MAP/MPE Inference
Problem: Find most likely state of world given evidence
max_y (1/Z) exp( ∑_i w_i n_i(x, y) )
MAP/MPE Inference
Problem: Find most likely state of world given evidence
max_y ∑_i w_i n_i(x, y)
MAP/MPE Inference
Problem: Find most likely state of world given evidence
max_y ∑_i w_i n_i(x, y)

This is just the weighted MaxSAT problem
Use a weighted SAT solver (e.g., MaxWalkSAT [Kautz et al., 1997])
The MaxWalkSAT Algorithm
for i ← 1 to max-tries do
    solution ← random truth assignment
    for j ← 1 to max-flips do
        if weights(sat. clauses) > threshold then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes weights(sat. clauses)
return failure, best solution found
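A compact executable sketch of weighted WalkSAT; instead of the threshold test in the pseudocode, this simplified version just tracks the best-scoring assignment seen (a deliberate illustrative shortcut):

```python
import random

def maxwalksat(clauses, n_vars, max_flips=2000, p=0.5, seed=0):
    """clauses: list of (weight, literals); returns (best assignment, its score)."""
    rng = random.Random(seed)

    def is_sat(clause, assign):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    def score(assign):
        return sum(w for w, c in clauses if is_sat(c, assign))

    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    best, best_score = dict(assign), score(assign)
    for _ in range(max_flips):
        unsat = [c for _, c in clauses if not is_sat(c, assign)]
        if not unsat:
            break                      # everything satisfied; cannot improve
        clause = rng.choice(unsat)
        if rng.random() < p:           # random-walk step
            var = abs(rng.choice(clause))
        else:                          # greedy step on total satisfied weight
            def score_after_flip(v):
                assign[v] = not assign[v]
                s = score(assign)
                assign[v] = not assign[v]
                return s
            var = max((abs(lit) for lit in clause), key=score_after_flip)
        assign[var] = not assign[var]
        if score(assign) > best_score:
            best, best_score = dict(assign), score(assign)
    return best, best_score

# Contradictory unit clauses: x1 (weight 2.0) vs. ~x1 (weight 1.0)
best, best_score = maxwalksat([(2.0, [1]), (1.0, [-1])], n_vars=1)
```

With contradictory clauses no assignment satisfies everything, so the search settles on the heavier side: x1 = True with score 2.0.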
Computing Probabilities
P(Formula | MLN, C) = ?
MCMC: sample worlds, check formula holds
P(Formula1 | Formula2, MLN, C) = ?
If Formula2 = conjunction of ground atoms:
First construct min subset of network necessary to answer query (generalization of KBMC)
Then apply MCMC
But … Insufficient for Logic
Problem:
Deterministic dependencies break MCMC
Near-deterministic ones make it very slow

Solution:
Combine MCMC and WalkSAT → MC-SAT algorithm [Poon & Domingos, 2006]
Auxiliary-Variable Methods
Main ideas:
Use auxiliary variables to capture dependencies
Turn difficult sampling into uniform sampling

Given distribution P(x), define

f(x, u) = 1 if 0 < u < P(x), 0 otherwise

Then ∫ f(x, u) du = P(x): sample from f(x, u), then discard u
Slice Sampling [Damien et al. 1999]
[Figure: sample u(k) uniformly from (0, P(x(k))), then sample x(k+1) uniformly from the slice {x : P(x) > u(k)}]
Slice Sampling
Identifying the slice may be difficult
Introduce an auxiliary variable u_i for each Φ_i:

P(x) = (1/Z) ∏_i Φ_i(x)

f(x, u_1, …, u_n) = 1 if 0 ≤ u_i ≤ Φ_i(x) for all i, 0 otherwise
The MC-SAT Algorithm

Select a random subset M of satisfied clauses
Each clause C_i is selected with probability 1 − exp(−w_i):
Larger w_i ⇒ C_i more likely to be selected
Hard clause (w_i → ∞): always selected
Slice: states that satisfy all clauses in M
Uses a SAT solver to sample x | u
Orders of magnitude faster than Gibbs sampling, etc.
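A brute-force sketch of the MC-SAT step on a two-variable problem: clause selection follows the 1 − exp(−w) rule, and enumeration over the four states stands in for the SAT-solver-based uniform sampler (the weights and clauses are made up):

```python
import math
import random

random.seed(0)
# Weighted clauses over x1, x2 (positive/negative integer literals).
weighted_clauses = [(1.5, [-1, 2]),   # x1 => x2
                    (0.7, [1])]       # x1

states = [(a, b) for a in (False, True) for b in (False, True)]

def is_sat(clause, state):
    return any(state[abs(l) - 1] == (l > 0) for l in clause)

def mc_sat(n_samples):
    x = states[0]
    counts = {s: 0 for s in states}
    for _ in range(n_samples):
        # Select each currently satisfied clause with prob 1 - exp(-w).
        m = [c for w, c in weighted_clauses
             if is_sat(c, x) and random.random() < 1.0 - math.exp(-w)]
        # Sample uniformly from states satisfying every clause in m
        # (enumeration replaces the SAT-solver-based sampler).
        x = random.choice([s for s in states if all(is_sat(c, s) for c in m)])
        counts[x] += 1
    return counts

counts = mc_sat(20000)
```

The empirical state frequencies should approach the target distribution, which is proportional to exp(total weight of satisfied clauses); here (True, True) satisfies both clauses and is the most frequent state.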
But … It Is Not Scalable
1000 researchers
Coauthor(x,y): 1 million ground atoms
Coauthor(x,y) ∧ Coauthor(y,z) ⇒ Coauthor(x,z): 1 billion ground clauses
Exponential in arity
Sparsity to the Rescue

1000 researchers
Coauthor(x,y): 1 million ground atoms
But … most atoms are false
Coauthor(x,y) ∧ Coauthor(y,z) ⇒ Coauthor(x,z): 1 billion ground clauses
Most trivially satisfied if most atoms are false
No need to explicitly compute most of them
Lazy Inference
LazySAT [Singla & Domingos, 2006a]
Lazy version of WalkSAT [Selman et al., 1996]
Grounds atoms/clauses as needed
Greatly reduces memory usage
The idea is much more general [Poon & Domingos, 2008a]
General Method for Lazy Inference
If most variables assume the default value, it is wasteful to instantiate all variables / functions
Main idea:
Allocate memory for a small subset of "active" variables / functions
Activate more if necessary as inference proceeds
Applicable to a diverse set of algorithms: satisfiability solvers (systematic, local-search), Markov chain Monte Carlo, MPE / MAP algorithms, maximum expected utility algorithms, belief propagation, MC-SAT, etc.
Reduces memory and time by orders of magnitude
Lifted Inference
Consider belief propagation (BP)
Often in large problems, many nodes are interchangeable: they send and receive the same messages throughout BP
Basic idea: Group them into supernodes, forming a lifted network
Smaller network → Faster inference
Akin to resolution in first-order logic
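A minimal sketch of supernode formation (the signatures and atom names below are hypothetical, and real lifted BP refines these groups iteratively): variables that see identical factor structure and evidence send and receive identical messages, so they can be merged.

```python
from collections import defaultdict

# Group ground atoms into supernodes by (evidence status, factor types).
def supernodes(var_signature):
    groups = defaultdict(list)
    for v, sig in var_signature.items():
        groups[sig].append(v)
    return list(groups.values())

sig = {
    "Smokes(A)": ("no-evidence", ("smoking-causes-cancer",)),
    "Smokes(B)": ("no-evidence", ("smoking-causes-cancer",)),
    "Smokes(C)": ("evidence", ("smoking-causes-cancer",)),
}
# A and B are interchangeable; C has evidence and stays separate.
assert sorted(map(len, supernodes(sig))) == [1, 2]
```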
![Page 66: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/66.jpg)
66
Belief Propagation

Nodes (x)
Features (f)

$$\mu_{x \to f}(x) = \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x)$$

$$\mu_{f \to x}(x) = \sum_{\sim \{x\}} \left( e^{w f(\mathbf{x})} \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \right)$$
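A numeric sanity check of the two message equations on the smallest possible graph, assuming a single binary ground atom x attached to one ground formula f(x) = x with hypothetical weight w: the feature-to-node message sums over no other variables, and the resulting belief is the logistic function of w.

```python
import math

# mu_{f->x}(x) = e^{w f(x)} with no other neighbors to multiply in.
w = 1.5
mu_f_to_x = {0: math.exp(w * 0), 1: math.exp(w * 1)}

# Belief at x: product of incoming messages, normalized.
Z = mu_f_to_x[0] + mu_f_to_x[1]
p_true = mu_f_to_x[1] / Z
assert abs(p_true - 1 / (1 + math.exp(-w))) < 1e-12  # logistic in w
```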
![Page 67: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/67.jpg)
67
Lifted Belief Propagation

Nodes (x)
Features (f)

$$\mu_{x \to f}(x) = \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x)$$

$$\mu_{f \to x}(x) = \sum_{\sim \{x\}} \left( e^{w f(\mathbf{x})} \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \right)$$
![Page 68: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/68.jpg)
68
Lifted Belief Propagation

Nodes (x)
Features (f)

$$\mu_{x \to f}(x) = \mu_{f \to x}(x)^{\,\alpha} \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x)^{\,\beta}$$

$$\mu_{f \to x}(x) = \sum_{\sim \{x\}} \left( e^{w f(\mathbf{x})} \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \right)$$

α, β: Functions of edge counts
![Page 69: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/69.jpg)
69
Learning
Data is a relational database
Closed world assumption (if not: EM)
Learning parameters (weights)
Learning structure (formulas)
![Page 70: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/70.jpg)
70
Parameter Learning

Parameter tying: Groundings of same clause
Generative learning: Pseudo-likelihood
Discriminative learning: Conditional likelihood, use MC-SAT or MaxWalkSAT for inference

$$\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w\!\left[ n_i(x) \right]$$

n_i(x): No. of times clause i is true in data
E_w[n_i(x)]: Expected no. of times clause i is true according to MLN
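As an illustrative sketch of this gradient (not Alchemy itself), consider an MLN with one unit clause Heads(f) over 20 independent flips: the expected count has the closed form 20 · sigmoid(w), and the gradient vanishes exactly at the log-odds weight. The data count of 13 heads is hypothetical.

```python
import math

def grad(w, heads_in_data=13, flips=20):
    expected = flips / (1 + math.exp(-w))   # E_w[n_i(x)]
    return heads_in_data - expected         # n_i(x) - E_w[n_i(x)]

assert grad(0.0) == 3.0                     # at w = 0, model expects 10 heads
w_star = math.log(13 / 7)                   # log odds of the data
assert abs(grad(w_star)) < 1e-12            # gradient vanishes at the optimum
```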
![Page 71: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/71.jpg)
71
Parameter Learning
Pseudo-likelihood + L-BFGS is fast and robust but can give poor inference results
Voted perceptron: Gradient descent + MAP inference
Scaled conjugate gradient
![Page 72: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/72.jpg)
72
Voted Perceptron for MLNs

HMMs are a special case of MLNs
Replace Viterbi by MaxWalkSAT
Network can now be an arbitrary graph

    wi ← 0
    for t ← 1 to T do
        yMAP ← MaxWalkSAT(x)
        wi ← wi + η [counti(yData) – counti(yMAP)]
    return wi / T
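The loop above can be sketched in runnable form. Everything here is illustrative: MAP inference is stubbed as an exact argmax over a tiny candidate set, where an MLN implementation would call MaxWalkSAT, and the feature counts are hypothetical.

```python
def voted_perceptron(x, y_data, candidates, counts, T=100, eta=0.1):
    w = [0.0] * len(counts(x, y_data))
    total = [0.0] * len(w)
    for _ in range(T):
        # yMAP <- MaxWalkSAT(x), stubbed here as an exact argmax
        y_map = max(candidates,
                    key=lambda y: sum(wi * c for wi, c in zip(w, counts(x, y))))
        diff = [a - b for a, b in zip(counts(x, y_data), counts(x, y_map))]
        w = [wi + eta * d for wi, d in zip(w, diff)]
        total = [t + wi for t, wi in zip(total, w)]
    return [t / T for t in total]  # averaged (voted) weights

# Toy problem: one feature that fires only for the correct label.
counts = lambda x, y: [1.0 if y == 1 else 0.0]
w = voted_perceptron(None, 1, [0, 1], counts)
assert w[0] > 0  # learned to prefer the correct label
```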
![Page 73: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/73.jpg)
73
Problem: Multiple Modes
Not alleviated by contrastive divergence
Alleviated by MC-SAT
Warm start: Start each MC-SAT run at previous end state
![Page 74: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/74.jpg)
74
Problem: Extreme Ill-Conditioning

Solvable by quasi-Newton, conjugate gradient, etc., but line searches require exact inference
Solution: Scaled conjugate gradient [Lowd & Domingos, 2008]
Use Hessian to choose step size
Compute quadratic form inside MC-SAT
Use inverse diagonal Hessian as preconditioner
![Page 75: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/75.jpg)
75
Structure Learning
Standard inductive logic programming optimizes the wrong thing
But can be used to overgenerate for L1 pruning
Our approach: ILP + Pseudo-likelihood + Structure priors
For each candidate structure change:
Start from current weights & relax convergence
Use subsampling to compute sufficient statistics
![Page 76: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/76.jpg)
76
Structure Learning
Initial state: Unit clauses or prototype KB
Operators: Add/remove literal, flip sign
Evaluation function: Pseudo-likelihood + Structure prior
Search: Beam search, shortest-first search
![Page 77: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/77.jpg)
77
Alchemy
Open-source software including:
Full first-order logic syntax
Generative & discriminative weight learning
Structure learning
Weighted satisfiability, MCMC, lifted BP
Programming language features
alchemy.cs.washington.edu
![Page 78: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/78.jpg)
78
| | Alchemy | Prolog | BUGS |
|---|---|---|---|
| Representation | F.O. logic + Markov nets | Horn clauses | Bayes nets |
| Inference | Model checking, MCMC, lifted BP | Theorem proving | MCMC |
| Learning | Parameters & structure | No | Params. |
| Uncertainty | Yes | No | Yes |
| Relational | Yes | Yes | No |
![Page 79: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/79.jpg)
79
Constrained Conditional Model
Representation: Integer linear programs
Local classifiers + Global constraints
Inference: LP solver
Parameter learning: None for constraints
Weights of soft constraints set heuristically
Local weights typically learned independently
Structure learning: None to date
But see latest development in NAACL-10
![Page 80: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/80.jpg)
80
Running Alchemy
Programs: infer, learnwts, learnstruct
Options
MLN file: Types (optional), Predicates, Formulas
Database files
![Page 81: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/81.jpg)
81
Overview
Motivation
Foundational areas
Markov logic
NLP applications
Basics
Supervised learning
Unsupervised learning
![Page 82: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/82.jpg)
82
Uniform Distribn.: Empty MLN
Example: Unbiased coin flips
Type: flip = { 1, … , 20 }
Predicate: Heads(flip)
$$P(\text{Heads}(f)) = \frac{e^{0}}{e^{0} + e^{0}} = \frac{1}{2}$$
![Page 83: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/83.jpg)
83
Binomial Distribn.: Unit Clause
Example: Biased coin flips
Type: flip = { 1, … , 20 }
Predicate: Heads(flip)
Formula: Heads(f)
Weight: Log odds of heads:
By default, MLN includes unit clauses for all predicates
(captures marginal distributions, etc.)
$$P(\text{Heads}(f)) = \frac{e^{w}}{e^{w} + e^{0}} = p$$

$$w = \log \frac{p}{1-p}$$
84
Multinomial Distribution
Example: Throwing die
Types: throw = { 1, … , 20 }
face = { 1, … , 6 }
Predicate: Outcome(throw,face)
Formulas: Outcome(t,f) ^ f != f’ => !Outcome(t,f’).
Exist f Outcome(t,f).
Too cumbersome!
![Page 85: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/85.jpg)
85
Multinomial Distrib.: ! Notation
Example: Throwing die
Types: throw = { 1, … , 20 }
face = { 1, … , 6 }
Predicate: Outcome(throw,face!)
Formulas:
Semantics: Arguments without “!” determine arguments with “!”.
Also makes inference more efficient (triggers blocking).
![Page 86: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/86.jpg)
86
Multinomial Distrib.: + Notation
Example: Throwing biased die
Types: throw = { 1, … , 20 }
face = { 1, … , 6 }
Predicate: Outcome(throw,face!)
Formulas: Outcome(t,+f)
Semantics: Learn weight for each grounding of args with “+”.
![Page 87: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/87.jpg)
87
Logistic Regression (MaxEnt)

Logistic regression:
$$\log \frac{P(C=1 \mid F=f)}{P(C=0 \mid F=f)} = a + \sum_i b_i f_i$$

Type: obj = { 1, ... , n }
Query predicate: C(obj)
Evidence predicates: Fi(obj)
Formulas:
C(x)  (weight a)
Fi(x) ^ C(x)  (weight bi)

Resulting distribution:
$$P(C=c, F=f) = \frac{1}{Z} \exp\!\left( a c + \sum_i b_i f_i c \right)$$

Therefore:
$$\log \frac{P(C=1 \mid F=f)}{P(C=0 \mid F=f)} = \log \frac{\exp\left( a + \sum_i b_i f_i \right)}{\exp(0)} = a + \sum_i b_i f_i$$

Alternative form: Fi(x) => C(x)
88
Hidden Markov Models
obs = { Red, Green, Yellow }
state = { Stop, Drive, Slow }
time = { 0, ..., 100 }

State(state!,time)
Obs(obs!,time)

State(+s,0)
State(+s,t) ^ State(+s',t+1)
Obs(+o,t) ^ State(+s,t)
Sparse HMM: State(s,t) => State(s1,t+1) v State(s2, t+1) v ... .
![Page 89: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/89.jpg)
89
Bayesian Networks
Use all binary predicates with same first argument (the object x).
One predicate for each variable A: A(x,v!)
One clause for each line in the CPT and value of the variable
Context-specific independence: One clause for each path in the decision tree
Logistic regression: As before
Noisy OR: Deterministic OR + Pairwise clauses
![Page 90: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/90.jpg)
90
Relational Models
Knowledge-based model construction
Allow only Horn clauses
Same as Bayes nets, except arbitrary relations
Combining function: Logistic regression, noisy-OR or external

Stochastic logic programs
Allow only Horn clauses
Weight of clause = log(p)
Add formulas: Head holds => Exactly one body holds

Probabilistic relational models
Allow only binary relations
Same as Bayes nets, except first argument can vary
![Page 91: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/91.jpg)
91
Relational Models

Relational Markov networks
SQL → Datalog → First-order logic
One clause for each state of a clique
+ syntax in Alchemy facilitates this

Bayesian logic
Object = Cluster of similar/related observations
Observation constants + Object constants
Predicate InstanceOf(Obs,Obj) and clauses using it

Unknown relations: Second-order Markov logic
S. Kok & P. Domingos, “Statistical Predicate Invention”, in Proc. ICML-2007.
![Page 92: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/92.jpg)
92
Overview
Motivation
Foundational areas
Markov logic
NLP applications
Basics
Supervised learning
Unsupervised learning
![Page 93: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/93.jpg)
93
Text Classification
The 56th quadrennial United States presidential election was held on November 4, 2008. Outgoing Republican President George W. Bush's policies and actions and the American public's desire for change were key issues throughout the campaign. ……
The Chicago Bulls are an American professional basketball team based in Chicago, Illinois, playing in the Central Division of the Eastern Conference in the National Basketball Association (NBA). ……
……
Topic = politics
Topic = sports
![Page 94: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/94.jpg)
94
Text Classification

page = {1, ..., max}
word = { ... }
topic = { ... }

Topic(page,topic)
HasWord(page,word)

Topic(p,t)
HasWord(p,+w) => Topic(p,+t)

If topics mutually exclusive: Topic(page,topic!)
![Page 95: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/95.jpg)
95
Text Classification

page = {1, ..., max}
word = { ... }
topic = { ... }

Topic(page,topic)
HasWord(page,word)
Links(page,page)

Topic(p,t)
HasWord(p,+w) => Topic(p,+t)
Topic(p,t) ^ Links(p,p') => Topic(p',t)
Cf. S. Chakrabarti, B. Dom & P. Indyk, “Hypertext ClassificationUsing Hyperlinks,” in Proc. SIGMOD-1998.
![Page 96: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/96.jpg)
96
Entity ResolutionAUTHOR: H. POON & P. DOMINGOSTITLE: UNSUPERVISED SEMANTIC PARSINGVENUE: EMNLP-09
AUTHOR: Hoifung Poon and Pedro DomingsTITLE: Unsupervised semantic parsingVENUE: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
AUTHOR: Poon, Hoifung and Domings, PedroTITLE: Unsupervised ontology induction from textVENUE: Proceedings of the Forty-Eighth Annual Meeting of the Association for Computational Linguistics
AUTHOR: H. Poon, P. DomingsTITLE: Unsupervised ontology inductionVENUE: ACL-10
SAME?
SAME?
![Page 97: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/97.jpg)
97
Problem: Given database, find duplicate records
HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r’) => SameField(f,r,r’)
SameField(f,r,r’) => SameRecord(r,r’)
Entity Resolution
![Page 98: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/98.jpg)
98
Problem: Given database, find duplicate records
HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r’) => SameField(f,r,r’)
SameField(f,r,r’) => SameRecord(r,r’)
SameRecord(r,r’) ^ SameRecord(r’,r”) => SameRecord(r,r”)
Cf. A. McCallum & B. Wellner, “Conditional Models of Identity Uncertaintywith Application to Noun Coreference,” in Adv. NIPS 17, 2005.
Entity Resolution
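The transitivity rule above makes SameRecord an equivalence relation. As a minimal sketch (record IDs hypothetical, and only for the hard, deterministic version of the rule), the resulting clusters can be read off with union-find:

```python
def closure(pairs):
    parent = {}

    def find(r):
        parent.setdefault(r, r)
        while parent[r] != r:          # path-compressing find
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in pairs:                 # union each SameRecord pair
        parent[find(a)] = find(b)
    groups = {}
    for r in parent:
        groups.setdefault(find(r), set()).add(r)
    return sorted(map(sorted, groups.values()))

assert closure([("c1", "c2"), ("c2", "c3"), ("c4", "c5")]) == [
    ["c1", "c2", "c3"], ["c4", "c5"]]
```

The soft MLN version differs: transitivity is only a weighted preference, so inference trades it off against the evidence.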
![Page 99: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/99.jpg)
99
Can also resolve fields:
HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r’) => SameField(f,r,r’)
SameField(f,r,r’) <=> SameRecord(r,r’)
SameRecord(r,r’) ^ SameRecord(r’,r”) => SameRecord(r,r”)
SameField(f,r,r’) ^ SameField(f,r’,r”) => SameField(f,r,r”)
More: P. Singla & P. Domingos, “Entity Resolution with Markov Logic”, in Proc. ICDM-2006.
Entity Resolution
![Page 100: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/100.jpg)
100
UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS. EMNLP-2009.
Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Singapore: ACL.
Information Extraction
![Page 101: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/101.jpg)
101
UNSUPERVISED SEMANTIC PARSING. H. POON & P. DOMINGOS. EMNLP-2009.
Unsupervised Semantic Parsing, Hoifung Poon and Pedro Domingos. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Singapore: ACL.
Information Extraction
Author Title Venue
SAME?
![Page 102: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/102.jpg)
102
Information Extraction
Problem: Extract database from text or semi-structured sources
Example: Extract database of publications from citation list(s) (the “CiteSeer problem”)
Two steps:
Segmentation: Use HMM to assign tokens to fields
Entity resolution: Use logistic regression and transitivity
![Page 103: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/103.jpg)
103
Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ InField(i+1,+f,c)

Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’) ^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
Information Extraction
![Page 104: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/104.jpg)
104
Token(token, position, citation)
InField(position, field!, citation)
SameField(field, citation, citation)
SameCit(citation, citation)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ !Token(“.”,i,c) ^ InField(i+1,+f,c)

Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’) ^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
More: H. Poon & P. Domingos, “Joint Inference in InformationExtraction”, in Proc. AAAI-2007.
Information Extraction
![Page 105: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/105.jpg)
105
Biomedical Text Mining
Traditionally, named entity recognition or information extraction
E.g., protein recognition, protein-protein interaction identification
BioNLP-09 shared task: Nested bio-events
Much harder than traditional IE
Top F1 around 50%
Naturally calls for joint inference
![Page 106: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/106.jpg)
106
Bio-Event Extraction
Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...
[Figure: nested bio-event graph for this sentence, with “involvement”, “up-regulation”, and “activation” events linked by Theme / Cause / Site edges to IL-10, human monocytes, gp41, and p70(S6)-kinase]
![Page 107: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/107.jpg)
107
Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)

Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)
…
Bio-Event Extraction
Logistic regression
![Page 108: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/108.jpg)
108
Token(position, token)
DepEdge(position, position, dependency)
IsProtein(position)
EvtType(position, evtType)
InArgPath(position, position, argType!)

Token(i,+w) => EvtType(i,+t)
Token(j,w) ^ DepEdge(i,j,+d) => EvtType(i,+t)
DepEdge(i,j,+d) => InArgPath(i,j,+a)
Token(i,+w) ^ DepEdge(i,j,+d) => InArgPath(i,j,+a)
…

InArgPath(i,j,Theme) => IsProtein(j) v (Exist k k!=i ^ InArgPath(j, k, Theme)).
…
More: H. Poon and L. Vanderwende, “Joint Inference for Knowledge Extraction from Biomedical Literature”, 10:40 am, June 4, Gold Room.
Bio-Event Extraction
Adding a few joint inference rules doubles the F1
![Page 109: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/109.jpg)
109
Temporal Information Extraction
Identify event times and temporal relations (BEFORE, AFTER, OVERLAP)
E.g., who is the President of the U.S.A.?
Obama: 1/20/2009 to present
G. W. Bush: 1/20/2001 to 1/19/2009
Etc.
![Page 110: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/110.jpg)
110
DepEdge(position, position, dependency)
Event(position, event)
After(event, event)
DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)
After(p,q) ^ After(q,r) => After(p,r)
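The hard version of this transitivity rule is simply a transitive closure over the After relation. A minimal sketch, with hypothetical event names drawn from the presidency example above:

```python
# Saturate After with the rule After(p,q) ^ After(q,r) => After(p,r).
def transitive_closure(after):
    after = set(after)
    changed = True
    while changed:
        changed = False
        for p, q in list(after):
            for q2, r in list(after):
                if q == q2 and (p, r) not in after:
                    after.add((p, r))
                    changed = True
    return after

facts = {("bush-term", "clinton-term"), ("obama-term", "bush-term")}
assert ("obama-term", "clinton-term") in transitive_closure(facts)
```

In the MLN the rule is soft, so inference can leave a pair unlinked when other evidence outweighs transitivity.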
Temporal Information Extraction
![Page 111: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/111.jpg)
111
DepEdge(position, position, dependency)
Event(position, event)
After(event, event)
Role(position, position, role)

DepEdge(i,j,+d) ^ Event(i,p) ^ Event(j,q) => After(p,q)
Role(i,j,ROLE-AFTER) ^ Event(i,p) ^ Event(j,q) => After(p,q)
After(p,q) ^ After(q,r) => After(p,r)
More:
K. Yoshikawa, S. Riedel, M. Asahara and Y. Matsumoto, “Jointly Identifying Temporal Relations with Markov Logic”, in Proc. ACL-2009.
X. Ling & D. Weld, “Temporal Information Extraction”, in Proc. AAAI-2010.
Temporal Information Extraction
![Page 112: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/112.jpg)
112
Semantic Role Labeling
Problem: Identify arguments for a predicate
Two steps:
Argument identification: Determine whether a phrase is an argument
Role classification: Determine the type of an argument (agent, theme, temporal, adjunct, etc.)
![Page 113: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/113.jpg)
113
Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)

Token(i,+t) => IsPredicate(i)
DepPath(i,j,+p) => Role(i,j,+r)

HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)
Cf. K. Toutanova, A. Haghighi, C. Manning, “A global joint model for semantic role labeling”, in Computational Linguistics 2008.
Semantic Role Labeling
![Page 114: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/114.jpg)
114
Token(position, token)
DepPath(position, position, path)
IsPredicate(position)
Role(position, position, role!)
HasRole(position, position)
Sense(position, sense!)

Token(i,+t) => IsPredicate(i)
DepPath(i,j,+p) => Role(i,j,+r)
Sense(i,s) => IsPredicate(i)

HasRole(i,j) => IsPredicate(i)
IsPredicate(i) => Exist j HasRole(i,j)
HasRole(i,j) => Exist r Role(i,j,r)
Role(i,j,r) => HasRole(i,j)
Token(i,+t) ^ Role(i,j,+r) => Sense(i,+s)
More: I. Meza-Ruiz & S. Riedel, “Jointly Identifying Predicates, Arguments and Senses using Markov Logic”, in Proc. NAACL-2009.
Joint Semantic Role Labeling and Word Sense Disambiguation
![Page 115: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/115.jpg)
115
Practical Tips: Modeling
Add all unit clauses (the default)
How to handle uncertain data: R(x,y) ^ R’(x,y) (the “HMM trick”)
Implications vs. conjunctions
For soft correlation, conjunctions are often better
Implication A => B is equivalent to !(A ^ !B)
Shares cases with others like A => C
Makes learning unnecessarily harder
![Page 116: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/116.jpg)
116
Practical Tips: Efficiency
Open/closed world assumptions
Low clause arities
Low numbers of constants
Short inference chains
![Page 117: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/117.jpg)
117
Practical Tips: Development
Start with easy components
Gradually expand to full task
Use the simplest MLN that works
Cycle: Add/delete formulas, learn and test
![Page 118: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/118.jpg)
118
Overview
Motivation
Foundational areas
Markov logic
NLP applications
Basics
Supervised learning
Unsupervised learning
![Page 119: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/119.jpg)
119
Unsupervised Learning: Why?
Virtually unlimited supply of unlabeled text
Labeling is expensive (cf. Penn Treebank)
Often difficult to label with consistency and high quality (e.g., semantic parses)
Emerging field: Machine reading
Extract knowledge from unstructured text with high precision/recall and minimal human effort
Check out the LBR Workshop (WS9) on Sunday
![Page 120: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/120.jpg)
120
Unsupervised Learning: How?
I.i.d. learning: Sophisticated model requires more labeled data
Statistical relational learning: Sophisticated model may require less labeled data
Relational dependencies constrain problem space
One formula is worth a thousand labels
Small amount of domain knowledge → large-scale joint inference
![Page 121: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/121.jpg)
121
Ambiguities vary among objects
Joint inference: Propagate information from unambiguous objects to ambiguous ones
E.g.:
G. W. Bush …
He …
…
Mrs. Bush …
Unsupervised Learning: How?
Are they coreferent?
![Page 122: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/122.jpg)
122
Ambiguities vary among objects
Joint inference: Propagate information from unambiguous objects to ambiguous ones
E.g.:
G. W. Bush …
He …
…
Mrs. Bush …
Unsupervised Learning: How
Should be coreferent
![Page 123: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/123.jpg)
123
Ambiguities vary among objects
Joint inference: Propagate information from unambiguous objects to ambiguous ones
E.g.:
G. W. Bush …
He …
…
Mrs. Bush …
Unsupervised Learning: How
So must be singular male!
![Page 124: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/124.jpg)
124
Ambiguities vary among objects
Joint inference: Propagate information from unambiguous objects to ambiguous ones
E.g.:
G. W. Bush …
He …
…
Mrs. Bush …
Unsupervised Learning: How
Must be singular female!
![Page 125: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/125.jpg)
125
Ambiguities vary among objects
Joint inference: Propagate information from unambiguous objects to ambiguous ones
E.g.:
G. W. Bush …
He …
…
Mrs. Bush …
Unsupervised Learning: How
Verdict: Not coreferent!
![Page 126: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/126.jpg)
126
Parameter Learning

Marginalize out hidden variables

$$\frac{\partial}{\partial w_i} \log P_w(x) = E_{z|x}\!\left[ n_i(x,z) \right] - E_{x,z}\!\left[ n_i(x,z) \right]$$

E_{z|x}: sum over z, conditioned on observed x
E_{x,z}: summed over both x and z

Use MC-SAT to approximate both expectations
May also combine with contrastive estimation [Poon & Cherry & Toutanova, NAACL-2009]
![Page 127: Markov Logic in Natural Language Processing Hoifung Poon Dept. of Computer Science & Eng. University of Washington](https://reader034.vdocument.in/reader034/viewer/2022051819/55147649550346b0158b52ff/html5/thumbnails/127.jpg)
127
Unsupervised Coreference Resolution
Predicates: Head(mention, string), Type(mention, type), MentionOf(mention, entity)
Mixture model:
MentionOf(+m,+e)
Type(+m,+t)
Head(+m,+h) ^ MentionOf(+m,+e)
Joint inference formulas: Enforce agreement
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
… (similarly for Number, Gender, etc.)
128
Unsupervised Coreference Resolution
Predicates: Head(mention, string), Type(mention, type), MentionOf(mention, entity), Apposition(mention, mention)
Mixture model:
MentionOf(+m,+e)
Type(+m,+t)
Head(+m,+h) ^ MentionOf(+m,+e)
Joint inference formulas: Enforce agreement and leverage apposition
MentionOf(a,e) ^ MentionOf(b,e) => (Type(a,t) <=> Type(b,t))
… (similarly for Number, Gender, etc.)
Apposition(a,b) => (MentionOf(a,e) <=> MentionOf(b,e))
More: H. Poon and P. Domingos, “Joint Unsupervised Coreference Resolution with Markov Logic”, in Proc. EMNLP-2008.
129
Relational Clustering: Discover Unknown Predicates
Cluster relations along with objects
Use second-order Markov logic [Kok & Domingos, 2007, 2008]
Key idea: Cluster combination determines likelihood of relations:
InClust(r,+c) ^ InClust(x,+a) ^ InClust(y,+b) => r(x,y)
Input: Relational tuples extracted by TextRunner [Banko et al., 2007]
Output: Semantic network
130
Recursive Relational Clustering
Unsupervised semantic parsing [Poon & Domingos, EMNLP-2009]
Text → Knowledge: start directly from text
Identify meaning units + resolve variations
Use higher-order Markov logic (variables over arbitrary lambda forms and their clusters)
End-to-end machine reading: read text, then answer questions
131
Semantic Parsing
[Figure: dependency parse of “IL-4 protein induces CD11b” (nsubj: induces → protein, nn: protein → IL-4, dobj: induces → CD11b), partitioned and assigned to the clusters INDUCE / INDUCER / INDUCED to yield the logical form INDUCE(e1), INDUCER(e1,e2), INDUCED(e1,e3), IL-4(e2), CD11B(e3)]
Structured prediction: Partition + Assignment
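The partition + assignment view can be sketched in a few lines: once the parse is partitioned into a verb unit and argument units, assignment maps each unit to a cluster and each dependency to a role (the dictionaries here are hypothetical stand-ins for what USP learns):

```python
def semantic_parse(verb, args, core_forms, roles):
    """Sketch of the assignment step: map each meaning unit to its cluster
    (core_forms) and each dependency label to a role (roles), emitting the
    logical-form atoms. Both dictionaries are hypothetical hand-built
    stand-ins; USP learns these clusters rather than reading a lexicon."""
    event = "e1"
    atoms = [f"{core_forms[verb]}({event})"]
    for i, (dep, phrase) in enumerate(args, start=2):
        var = f"e{i}"
        atoms.append(f"{roles[dep]}({event},{var})")  # role atom, e.g. INDUCER
        atoms.append(f"{core_forms[phrase]}({var})")  # argument atom
    return atoms
```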
132
Challenge: Same Meaning, Many Variations
IL-4 up-regulates CD11b
Protein IL-4 enhances the expression of CD11b
CD11b expression is induced by IL-4 protein
The cytokine interleukin-4 induces CD11b expression
IL-4’s up-regulation of CD11b, …
……
133–138
Unsupervised Semantic Parsing
USP: Recursively cluster arbitrary expressions composed with / by similar expressions:
IL-4 induces CD11b
Protein IL-4 enhances the expression of CD11b
CD11b expression is enhanced by IL-4 protein
The cytokine interleukin-4 induces CD11b expression
IL-4’s up-regulation of CD11b, …
First cluster same forms at the atom level, then recursively cluster forms in composition with same forms.
139
Unsupervised Semantic Parsing
Exponential prior on number of parameters
Event/object/property cluster mixtures: InClust(e,+c) ^ HasValue(e,+v)
[Figure: object/event cluster INDUCE (core forms: induces 0.1, enhances 0.4, …) with property cluster INDUCER holding distributions over syntax (nsubj 0.5, agent 0.4, …), argument fillers (IL-4 0.2, IL-8 0.1, …), and argument number (None 0.1, One 0.8, …)]
140
But … State Space Too Large
Coreference: #-clusters ≈ #-mentions
USP: #-clusters ≈ exp(#-tokens)
Also, meaning units are often small and many clusters are singletons
Use combinatorial search
141
Inference: Hill-Climb Probability
Initialize
Search operator: Lambda reduction
[Figure: the fragment (nn: protein → IL-4) within the parse of “IL-4 protein induces CD11b” is lambda-reduced into a single meaning unit during search]
142
Learning: Hill-Climb Likelihood
Initialize: one cluster per form (enhances: 1, induces: 1, protein: 1, IL-4: 1)
Search operators:
MERGE, e.g., merge “induces” and “enhances” into one cluster {induces 0.2, enhances 0.8}
COMPOSE, e.g., compose “IL-4” and “protein” into the unit “IL-4 protein”: 1
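The MERGE operator can be sketched as greedy hill-climbing over clusterings, with a caller-supplied score standing in for the MLN likelihood (a toy, hypothetical objective; USP's real search also includes COMPOSE and evaluates the full model):

```python
import itertools

def hill_climb_merge(clusters, score):
    """Greedy hill-climbing with a MERGE operator: repeatedly merge the
    first pair of clusters that improves the score, stopping at a local
    optimum. `score` maps a list of frozensets to a number."""
    current = [frozenset(c) for c in clusters]
    best = score(current)
    improved = True
    while improved:
        improved = False
        for a, b in itertools.combinations(current, 2):
            candidate = [c for c in current if c not in (a, b)] + [a | b]
            s = score(candidate)
            if s > best:
                best, current, improved = s, candidate, True
                break  # restart the scan from the new clustering
    return current, best
```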
143
Unsupervised Ontology Induction
Limitations of USP: no ISA hierarchy among clusters, little smoothing, limited capability to generalize
OntoUSP [Poon & Domingos, ACL-2010]
Extends USP to also induce an ISA hierarchy
Joint approach for ontology induction, population, and knowledge extraction
To appear in ACL (see you in Uppsala :-)
144
OntoUSP: Modify the cluster mixture formula:
InClust(e,c) ^ ISA(c,+d) ^ HasValue(e,+v)
Hierarchical smoothing + clustering
New operator in learning: ABSTRACTION
[Figure: ABSTRACTION creates a parent cluster (induces 0.3, enhances 0.1, inhibits 0.2, suppresses 0.1, …) with ISA links to the child clusters INDUCE (induces 0.6, up-regulates 0.2, …) and INHIBIT (inhibits 0.4, suppresses 0.2, …); the learner then asks: MERGE with REGULATE?]
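The hierarchical smoothing idea can be illustrated by shrinking a child cluster's maximum-likelihood distribution toward its ISA parent (the interpolation weight alpha is a hypothetical stand-in; OntoUSP realizes smoothing through the weighted formulas above):

```python
def smoothed_dist(child_counts, parent_dist, alpha):
    """Interpolate a child cluster's MLE distribution with its ISA parent's
    distribution: p(w) = (1 - alpha) * mle_child(w) + alpha * p_parent(w)."""
    total = sum(child_counts.values())
    words = set(child_counts) | set(parent_dist)
    out = {}
    for w in words:
        mle = child_counts.get(w, 0) / total if total else 0.0
        out[w] = (1 - alpha) * mle + alpha * parent_dist.get(w, 0.0)
    return out
```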
145
End of The Beginning …
Not merely a user guide to MLNs and Alchemy
Statistical relational learning: a growth area for machine learning and NLP
146
Future Work: Inference
Scale up inference:
Cutting-plane methods (e.g., [Riedel, 2008])
Unify lifted inference with sampling
Coarse-to-fine inference
Alternative technology: e.g., linear programming, Lagrangian relaxation
147
Future Work: Supervised Learning
Alternative optimization objectives: e.g., max-margin learning [Huynh & Mooney, 2009]
Learning for efficient inference: e.g., learning arithmetic circuits [Lowd & Domingos, 2008]
Structure learning to improve accuracy and scalability: e.g., [Kok & Domingos, 2009]
148
Future Work: Unsupervised Learning
Model: learning objective, formalism, etc.
Learning: local optima, intractability, hyperparameter tuning, etc.
Leverage available resources: semi-supervised learning, multi-task learning, transfer learning (e.g., domain adaptation)
Human in the loop: e.g., interactive ML, active learning, crowdsourcing
149
Future Work: NLP Applications
Existing application areas: more joint inference opportunities, additional domain knowledge, combining multiple pipeline stages
A “killer app”: machine reading
Many, many more awaiting YOU to discover
150
Summary
We need to unify logical and statistical NLP; Markov logic provides a language for this:
Syntax: weighted first-order formulas
Semantics: feature templates of Markov networks
Inference: satisfiability, MCMC, lifted BP, etc.
Learning: pseudo-likelihood, VP, PSCG, ILP, etc.
Growing set of NLP applications
Open-source software: Alchemy
Book: Domingos & Lowd, Markov Logic,Morgan & Claypool, 2009.
alchemy.cs.washington.edu
151
References
[Banko et al., 2007] Michele Banko, Michael J. Cafarella, Stephen
Soderland, Matt Broadhead, Oren Etzioni, "Open Information Extraction From the Web", In Proc. IJCAI-2007.
[Chakrabarti et al., 1998] Soumen Chakrabarti, Byron Dom, Piotr Indyk, "Hypertext Classification Using Hyperlinks", in Proc. SIGMOD-1998.
[Damien et al., 1999] Paul Damien, Jon Wakefield, Stephen Walker, "Gibbs sampling for Bayesian non-conjugate and hierarchical models by auxiliary variables", Journal of the Royal Statistical Society B, 61:2.
[Domingos & Lowd, 2009] Pedro Domingos and Daniel Lowd, Markov Logic, Morgan & Claypool.
[Friedman et al., 1999] Nir Friedman, Lise Getoor, Daphne Koller, Avi Pfeffer, "Learning probabilistic relational models", in Proc. IJCAI-1999.
152
References
[Halpern, 1990] Joseph Y. Halpern, "An analysis of first-order logics of
probability", Artificial Intelligence 46.
[Huynh & Mooney, 2009] Tuyen Huynh and Raymond Mooney, "Max-Margin Weight Learning for Markov Logic Networks", In Proc. ECML-2009.
[Kautz et al., 1997] Henry Kautz, Bart Selman, Yuejun Jiang, "A general stochastic approach to solving problems with hard and soft constraints", In The Satisfiability Problem: Theory and Applications. AMS.
[Kok & Domingos, 2007] Stanley Kok and Pedro Domingos, "Statistical Predicate Invention", In Proc. ICML-2007.
[Kok & Domingos, 2008] Stanley Kok and Pedro Domingos, "Extracting Semantic Networks from Text via Relational Clustering", In Proc. ECML-2008.
153
References
[Kok & Domingos, 2009] Stanley Kok and Pedro Domingos, "Learning
Markov Logic Network Structure via Hypergraph Lifting", In Proc. ICML-2009.
[Ling & Weld, 2010] Xiao Ling and Daniel S. Weld, "Temporal Information Extraction", In Proc. AAAI-2010.
[Lowd & Domingos, 2007] Daniel Lowd and Pedro Domingos, "Efficient Weight Learning for Markov Logic Networks", In Proc. PKDD-2007.
[Lowd & Domingos, 2008] Daniel Lowd and Pedro Domingos, "Learning Arithmetic Circuits", In Proc. UAI-2008.
[Meza-Ruiz & Riedel, 2009] Ivan Meza-Ruiz and Sebastian Riedel, "Jointly Identifying Predicates, Arguments and Senses using Markov Logic", In Proc. NAACL-2009.
154
References
[Muggleton, 1996] Stephen Muggleton, "Stochastic logic programs", in
Proc. ILP-1996.
[Nilsson, 1986] Nils Nilsson, "Probabilistic logic", Artificial Intelligence 28.
[Page et al., 1998] Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd, "The PageRank Citation Ranking: Bringing Order to the Web", Tech. Rept., Stanford University, 1998.
[Poon & Domingos, 2006] Hoifung Poon and Pedro Domingos, "Sound and Efficient Inference with Probabilistic and Deterministic Dependencies", In Proc. AAAI-06.
[Poon & Domingos, 2007] Hoifung Poon and Pedro Domingos, "Joint Inference in Information Extraction", In Proc. AAAI-07.
155
References
[Poon & Domingos, 2008a] Hoifung Poon, Pedro Domingos, Marc
Sumner, "A General Method for Reducing the Complexity of Relational Inference and its Application to MCMC", In Proc. AAAI-08.
[Poon & Domingos, 2008b] Hoifung Poon and Pedro Domingos, "Joint Unsupervised Coreference Resolution with Markov Logic", In Proc. EMNLP-08.
[Poon & Domingos, 2009] Hoifung Poon and Pedro Domingos, "Unsupervised Semantic Parsing", In Proc. EMNLP-09.
[Poon & Cherry & Toutanova, 2009] Hoifung Poon, Colin Cherry, Kristina Toutanova, "Unsupervised Morphological Segmentation with Log-Linear Models", In Proc. NAACL-2009.
156
References
[Poon & Vanderwende, 2010] Hoifung Poon and Lucy Vanderwende,
"Joint Inference for Knowledge Extraction from Biomedical Literature", In Proc. NAACL-10.
[Poon & Domingos, 2010] Hoifung Poon and Pedro Domingos, "Unsupervised Ontology Induction From Text", In Proc. ACL-10.
[Riedel, 2008] Sebastian Riedel, "Improving the Accuracy and Efficiency of MAP Inference for Markov Logic", In Proc. UAI-2008.
[Riedel et al., 2009] Sebastian Riedel, Hong-Woo Chun, Toshihisa Takagi and Jun'ichi Tsujii, "A Markov Logic Approach to Bio-Molecular Event Extraction", In Proc. BioNLP 2009 Shared Task.
[Selman et al., 1996] Bart Selman, Henry Kautz, Bram Cohen, "Local search strategies for satisfiability testing", In Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge. AMS.
157
References
[Singla & Domingos, 2006a] Parag Singla and Pedro Domingos,
"Memory-Efficient Inference in Relational Domains", In Proc. AAAI-2006.
[Singla & Domingos, 2006b] Parag Singla and Pedro Domingos, "Entity Resolution with Markov Logic", In Proc. ICDM-2006.
[Singla & Domingos, 2007] Parag Singla and Pedro Domingos, "Markov Logic in Infinite Domains", In Proc. UAI-2007.
[Singla & Domingos, 2008] Parag Singla and Pedro Domingos, "Lifted First-Order Belief Propagation", In Proc. AAAI-2008.
[Taskar et al., 2002] Ben Taskar, Pieter Abbeel, Daphne Koller, "Discriminative probabilistic models for relational data", in Proc. UAI-2002.
158
References
[Toutanova & Haghighi & Manning, 2008] Kristina Toutanova, Aria
Haghighi, Chris Manning, "A global joint model for semantic role labeling", Computational Linguistics.
[Wang & Domingos, 2008] Jue Wang and Pedro Domingos, "Hybrid Markov Logic Networks", In Proc. AAAI-2008.
[Wellman et al., 1992] Michael Wellman, John S. Breese, Robert P. Goldman, "From knowledge bases to decision models", Knowledge Engineering Review 7.
[Yoshikawa et al., 2009] Katsumasa Yoshikawa, Sebastian Riedel, Masayuki Asahara and Yuji Matsumoto, "Jointly Identifying Temporal Relations with Markov Logic", In Proc. ACL-2009.