Markov Logic Networks: A Step Towards a Unified Theory of Learning and Cognition
Pedro Domingos, Dept. of Computer Science & Eng., University of Washington
Joint work with Jesse Davis, Stanley Kok, Daniel Lowd, Hoifung Poon, Matt Richardson, Parag Singla, Marc Sumner
One Algorithm
Observation: The cortex has the same architecture throughout
Hypothesis: A single learning/inference algorithm underlies all cognition
Let’s discover it!
The Neuroscience Approach
Map the brain and figure out how it works
Problem: We don’t know nearly enough
The Engineering Approach
Pick a task (e.g., object recognition)
Figure out how to solve it
Generalize to other tasks
Problem: Any one task is too impoverished
The Foundational Approach
Consider all tasks the brain does
Figure out what they have in common
Formalize, test and repeat
Advantage: Plenty of clues
Where to start?
The grand aim of science is to cover the greatest number of experimental facts by logical deduction from the smallest number of hypotheses or axioms.
Albert Einstein
Recurring Themes
Noise, uncertainty, incomplete information
→ Probability / Graphical models
Complexity, many objects & relations
→ First-order logic
Examples
| Field | Logical approach | Statistical approach |
|---|---|---|
| Knowledge representation | First-order logic | Graphical models |
| Automated reasoning | Satisfiability testing | Markov chain Monte Carlo |
| Machine learning | Inductive logic programming | Neural networks |
| Planning | Classical planning | Markov decision processes |
| Natural language processing | Definite clause grammars | Prob. context-free grammars |
Research Plan
Unify graphical models and first-order logic
Develop learning and inference algorithms
Apply to wide variety of problems
This talk: Markov logic networks
Markov Logic Networks
MLN = Set of first-order formulas with weights
Formula = Feature template (Vars → Objects)
E.g., Ising model: Up(x) ^ Neighbor(x,y) => Up(y)
P(x) = (1/Z) exp(Σ_i w_i n_i(x)), where w_i is the weight of formula i and n_i(x) is the number of true instances of formula i in x
Most graphical models are special cases
First-order logic is the infinite-weight limit
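To make the definition concrete, here is a minimal brute-force sketch (not from the talk): a four-spin chain with the single formula Up(x) ^ Neighbor(x,y) => Up(y), at an illustrative weight w = 1.5, with P(x) computed by enumerating every possible world.

```python
from itertools import product
from math import exp

# Assumed toy model: 4 "spins" on a chain, one weighted formula
# Up(x) ^ Neighbor(x,y) => Up(y), with an illustrative weight w = 1.5
N = 4
NEIGHBORS = [(i, i + 1) for i in range(N - 1)]
w = 1.5

def n_true(state):
    # The implication is false only when Up(x) holds and Up(y) does not
    return sum(0 if (state[x] and not state[y]) else 1 for x, y in NEIGHBORS)

# P(x) = (1/Z) exp(w * n(x)); Z sums over all 2^N worlds (feasible only for toys)
worlds = list(product([0, 1], repeat=N))
Z = sum(exp(w * n_true(s)) for s in worlds)

def prob(state):
    return exp(w * n_true(state)) / Z
```

Worlds satisfying more groundings get exponentially more probability, so here prob((1, 1, 1, 1)) exceeds prob((1, 0, 1, 1)).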
MLN Algorithms: The First Three Generations
| Problem | First generation | Second generation | Third generation |
|---|---|---|---|
| MAP inference | Weighted satisfiability | Lazy inference | Cutting planes |
| Marginal inference | Gibbs sampling | MC-SAT | Lifted inference |
| Weight learning | Pseudo-likelihood | Voted perceptron | Scaled conj. gradient |
| Structure learning | Inductive logic progr. | ILP + PL (etc.) | Clustering + pathfinding |
Weighted Satisfiability
SAT: Find truth assignment that makes all formulas (clauses) true
Huge amount of research on this problem
State of the art: Millions of vars/clauses in minutes
MaxSAT: Make as many clauses true as possible
Weighted MaxSAT: Clauses have weights; maximize satisfied weight
MAP inference in MLNs is just weighted MaxSAT
Best current solver: MaxWalkSAT
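The MaxWalkSAT idea can be sketched as follows (a hedged illustration, not the actual solver; clause encoding, parameters, and names are assumptions): repeatedly pick an unsatisfied clause and flip one of its variables, at random or greedily, tracking the best assignment seen.

```python
import random

def maxwalksat(clauses, weights, n_vars, max_flips=10000, p=0.5, seed=0):
    """Sketch of MaxWalkSAT: stochastic local search that maximizes the
    total weight of satisfied clauses. A clause is a list of literals;
    literal +v means variable v is true, -v means variable v is false."""
    rng = random.Random(seed)
    assign = [rng.choice([False, True]) for _ in range(n_vars)]

    def sat(clause):
        return any(assign[abs(l) - 1] == (l > 0) for l in clause)

    def cost():  # total weight of unsatisfied clauses
        return sum(w for c, w in zip(clauses, weights) if not sat(c))

    best, best_cost = list(assign), cost()
    for _ in range(max_flips):
        unsat = [i for i, c in enumerate(clauses) if not sat(c)]
        if not unsat:
            break
        c = clauses[rng.choice(unsat)]
        if rng.random() < p:            # random-walk move
            v = abs(rng.choice(c)) - 1
        else:                           # greedy move: flip the var that helps most
            def delta(v):
                assign[v] = not assign[v]
                d = cost()
                assign[v] = not assign[v]
                return d
            v = min((abs(l) - 1 for l in c), key=delta)
        assign[v] = not assign[v]
        if cost() < best_cost:
            best, best_cost = list(assign), cost()
    return best, best_cost

# Toy instance: [1] and [-1] conflict, so the optimum leaves weight 1.0 unsatisfied
clauses = [[1], [-1], [2]]
weights = [2.0, 1.0, 5.0]
best, cost = maxwalksat(clauses, weights, n_vars=2)
```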
MC-SAT
Deterministic dependences break MCMC
In practice, even strong probabilistic ones do
Swendsen-Wang: Introduce aux. vars. u to represent constraints among x; alternately sample u | x and x | u
But Swendsen-Wang only works for Ising models
MC-SAT: Generalize S-W to arbitrary clauses
Uses SAT solver to sample x | u
Orders of magnitude faster than Gibbs sampling, etc.
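The sampling scheme can be sketched as below; brute-force enumeration stands in for the SampleSAT step a real implementation uses, and the clause encoding and function names are illustrative assumptions.

```python
import random
from itertools import product
from math import exp

def mc_sat(clauses, weights, n_vars, n_samples=2000, seed=0):
    """Sketch of MC-SAT: each clause satisfied by the current state is
    selected with probability 1 - exp(-w); the next state is drawn
    uniformly from assignments satisfying all selected clauses."""
    rng = random.Random(seed)
    worlds = list(product([False, True], repeat=n_vars))

    def sat(clause, state):
        return any(state[abs(l) - 1] == (l > 0) for l in clause)

    state = worlds[rng.randrange(len(worlds))]
    counts = [0] * n_vars
    for _ in range(n_samples):
        selected = [c for c, w in zip(clauses, weights)
                    if sat(c, state) and rng.random() < 1 - exp(-w)]
        # Stand-in for SampleSAT: enumerate the satisfying assignments
        candidates = [s for s in worlds if all(sat(c, s) for c in selected)]
        state = candidates[rng.randrange(len(candidates))]
        for v in range(n_vars):
            counts[v] += state[v]
    return [c / n_samples for c in counts]  # estimated P(var is true)

# One unit clause of weight 2 on var 1; var 2 is unconstrained
marginals = mc_sat(clauses=[[1]], weights=[2.0], n_vars=2)
```

For this toy instance the exact marginal of var 1 is e²/(1 + e²) ≈ 0.88, and var 2 stays near 0.5.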
Lifted Inference
Consider belief propagation (BP)
Often in large problems, many nodes are interchangeable: they send and receive the same messages throughout BP
Basic idea: Group them into supernodes, forming lifted network
Smaller network → Faster inference
Akin to resolution in first-order logic
Belief Propagation
Nodes (x), Features (f)
μ_{x→f}(x) = ∏_{h ∈ n(x)\{f}} μ_{h→x}(x)
μ_{f→x}(x) = Σ_{~{x}} e^{w f(x)} ∏_{y ∈ n(f)\{x}} μ_{y→f}(y)
(Σ_{~{x}} sums over all variables of f except x)
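These message updates can be checked on a tiny tree, where BP is exact; the two-node model, weights, and feature choices below are illustrative assumptions, not from the talk.

```python
import numpy as np
from itertools import product

w, u = 1.2, 0.7                                  # assumed weights
phi_f = np.exp(w * np.eye(2))                    # pairwise feature: exp(w * [x0 == x1])
phi_g = np.exp(u * np.array([0.0, 1.0]))         # unary feature on x0: exp(u * x0)

# mu_{x->f}: product of messages from the variable's other neighbors
mu_x0_f = phi_g.copy()       # x0's only other neighbor is the unary feature g
mu_x1_f = np.ones(2)         # x1 has no other neighbors: the empty product is 1

# mu_{f->x}: sum out the other variable, weighting by exp(w * f)
mu_f_x1 = phi_f.T @ mu_x0_f
mu_f_x0 = phi_f @ mu_x1_f

belief_x1 = mu_f_x1 / mu_f_x1.sum()
belief_x0 = phi_g * mu_f_x0
belief_x0 = belief_x0 / belief_x0.sum()

# On a tree BP is exact, so beliefs must match brute-force marginals
p0, p1 = np.zeros(2), np.zeros(2)
for x0, x1 in product([0, 1], repeat=2):
    weight = np.exp(u * x0 + w * (x0 == x1))
    p0[x0] += weight
    p1[x1] += weight
p0, p1 = p0 / p0.sum(), p1 / p1.sum()
```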
Lifted Belief Propagation
Nodes (x), Features (f)
Same sum-product messages, now passed over the lifted network:
μ_{x→f}(x) = ∏_{h ∈ n(x)\{f}} μ_{h→x}(x)
μ_{f→x}(x) = Σ_{~{x}} e^{w f(x)} ∏_{y ∈ n(f)\{x}} μ_{y→f}(y)
Lifted Belief Propagation
Nodes (x), Features (f)
μ_{x→f}(x) = ∏_{h ∈ n(x)\{f}} μ_{h→x}(x)
μ_{f→x}(x) = Σ_{~{x}} e^{w f(x)} ∏_{y ∈ n(f)\{x}} μ_{y→f}(y)
μ_{x→f}, μ_{f→x}: functions of edge counts
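One way to sketch the supernode construction is iterative coloring: group nodes by (own evidence, multiset of neighbor colors) until the grouping stabilizes; nodes sharing a color send and receive identical BP messages. The graph, evidence encoding, and the name `lift` are assumptions for illustration.

```python
def lift(neighbors, evidence):
    """Group nodes that would exchange identical BP messages."""
    nodes = list(neighbors)
    palette = {}
    color = {v: palette.setdefault(evidence.get(v), len(palette)) for v in nodes}
    for _ in range(len(nodes)):
        # Signature: own color plus sorted multiset of neighbor colors
        sig = {v: (color[v], tuple(sorted(color[u] for u in neighbors[v])))
               for v in nodes}
        palette = {}
        new = {v: palette.setdefault(sig[v], len(palette)) for v in nodes}
        if new == color:   # stable coloring reached
            break
        color = new
    groups = {}
    for v in nodes:
        groups.setdefault(color[v], []).append(v)
    return list(groups.values())

# Star graph: the three leaves are interchangeable, the hub is not
neighbors = {'hub': ['a', 'b', 'c'], 'a': ['hub'], 'b': ['hub'], 'c': ['hub']}
supernodes = lift(neighbors, evidence={})
```

Here `lift` returns two supernodes: the hub alone, and the three leaves merged.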
Weight Learning
Pseudo-likelihood + L-BFGS is fast and robust but can give poor inference results
Voted perceptron: Gradient descent + MAP inference
Problem: Multiple modes
Not alleviated by contrastive divergence
Alleviated by MC-SAT
Start each MC-SAT run at previous end state
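The voted-perceptron gradient, n_i(data) minus n_i(MAP state), can be sketched on a toy model; brute-force search stands in for the MAP inference step, and the count functions and names are illustrative assumptions.

```python
from itertools import product

def map_state(weights, count_fns, n_vars):
    # Brute-force MAP (a real system would use MaxWalkSAT or MC-SAT)
    return max(product([0, 1], repeat=n_vars),
               key=lambda s: sum(w * n(s) for w, n in zip(weights, count_fns)))

def perceptron_step(weights, count_fns, data, n_vars, lr=0.1):
    # Approximate gradient: true counts in the data minus counts in the mode
    mode = map_state(weights, count_fns, n_vars)
    return [w + lr * (n(data) - n(mode))
            for w, n in zip(weights, count_fns)]

def n_up(s): return sum(s)                                    # count of true atoms
def n_agree(s): return sum(s[i] == s[i + 1] for i in range(len(s) - 1))

data = (1, 1, 1)            # observed world: all atoms true
w = [0.0, 0.0]
history = []
for _ in range(10):
    w = perceptron_step(w, [n_up, n_agree], data, 3)
    history.append(list(w))
# The "voted" part: average the weights across iterations
w_avg = [sum(ws[i] for ws in history) / len(history) for i in range(2)]
```

After a few steps the MAP state under the learned weights matches the data.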
Weight Learning (contd.)
Problem: Extreme ill-conditioning
Solvable by quasi-Newton, conjugate gradient, etc.
But line searches require exact inference
Stochastic gradient not applicable because data not i.i.d.
Solution: Scaled conjugate gradient
Use Hessian to choose step size
Compute quadratic form inside MC-SAT
Use inverse diagonal Hessian as preconditioner
Structure Learning
Standard inductive logic programming optimizes the wrong thing
But can be used to overgenerate for L1 pruning
Our approach: ILP + Pseudo-likelihood + Structure priors
For each candidate structure change: start from current weights & relax convergence; use subsampling to compute sufficient statistics
Search methods: Beam, shortest-first, etc.
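Pseudo-likelihood, the score underlying this approach (structure priors omitted here), can be sketched as follows; the count-function encoding and names are illustrative assumptions. Its appeal is that each term needs only one variable flip, so no partition function Z is computed.

```python
from math import exp, log

def pseudo_ll(state, weights, count_fns):
    """log PL(x) = sum over variables v of log P(x_v | rest of x)."""
    def score(s):
        return sum(w * n(s) for w, n in zip(weights, count_fns))
    total = 0.0
    for v in range(len(state)):
        # Flip only variable v; compare the two resulting scores
        flipped = tuple(1 - b if i == v else b for i, b in enumerate(state))
        sv, sf = score(state), score(flipped)
        total += sv - log(exp(sv) + exp(sf))
    return total

def n_agree(s):  # illustrative count function: agreeing neighbor pairs
    return sum(s[i] == s[i + 1] for i in range(len(s) - 1))
```

Under a positive agreement weight, the all-agree world scores higher than the alternating one, as a candidate-scoring function should reward.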
Applications to Date
Natural language processing
Information extraction
Entity resolution
Link prediction
Collective classification
Social network analysis
Robot mapping
Activity recognition
Scene analysis
Computational biology
Probabilistic Cyc
Personal assistants
Etc.
Unsupervised Semantic Parsing
Goal: Map text to logical form, e.g., “Microsoft buys Powerset.” → BUYS(MICROSOFT, POWERSET)
Challenge: Many surface forms for the same fact: “Microsoft buys Powerset”; “Microsoft acquires semantic search engine Powerset”; “Powerset is acquired by Microsoft Corporation”; “The Redmond software giant buys Powerset”; “Microsoft’s purchase of Powerset, …”
USP: Recursively cluster expressions composed of similar subexpressions
Evaluation: Extract knowledge from biomedical abstracts and answer questions; substantially outperforms state of the art (three times as many correct answers; accuracy 88%)
Research Directions
Compact representations: deep architectures, Boolean decision diagrams, arithmetic circuits
Unified inference procedure
Learning MLNs with many latent variables
Tighter integration of learning and inference
End-to-end NLP system
Complete agent
Resources
Open-source software/Web site: Alchemy
Learning and inference algorithms
Tutorials, manuals, etc.
MLNs, datasets, etc.
Publications
Book: Domingos & Lowd, Markov Logic, Morgan & Claypool, 2009.
alchemy.cs.washington.edu