probabilistic models for structured...
TRANSCRIPT
![Page 1: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/1.jpg)
PROBABILISTIC MODELS FOR STRUCTURED DATA
Instructor: Yizhou [email protected]
January 6, 2020
1: Introduction
![Page 2: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/2.jpg)
Instructor
• Yizhou Sun
• http://web.cs.ucla.edu/~yzsun/
•Research areas
• graph mining, social/information network mining, text mining, web mining
• Data mining, machine learning
2
![Page 3: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/3.jpg)
Logistics of the Course
•Grading
• Participation: 5%
• Homework: 30%
• Paper presentation: 25%• Group-based
• Course project: 40%• Group-based
3
![Page 4: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/4.jpg)
Lectures
•Part I: Lectures by the instructor (5 weeks)
• Cover the basic materials
•Part II: paper presentation by students (4 weeks)
• Extended materials, which require in-depth reading of papers
•Part III: course project presentation (Week 10)
4
![Page 5: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/5.jpg)
Homework
•Weekly quick homework in Part I
•A quiz style homework for each paper, due every lecture in Part II
• The paper presenters are in charge of homework question, solution, and discussion, which is expected to finish in class
5
![Page 6: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/6.jpg)
Paper Presentation• What to present
• Each student sign-up for one group of research papers• Every group can be signed by at 3-4 students
• How long for each presentation?• 1 lecture, including Q&A, homework time, and homework
discussion
• When to present• From Week 6 to Week 9
• How to present• Make slides, when necessary, using blackboard
• What else?• Design a in-class homework with 1-2 well designed questions• Send the slides and homework (with correct answer) to me
the day before the lecture• Provide the discussion to the solution in class
6
![Page 7: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/7.jpg)
Course Project
• Research project• Goal: design a probabilistic graphical model to solve the candidate problems or problems of your own choice, and write a report that is potentially submitted to some venue for publication
• Teamwork• 3-4 people per group
• Timeline• Team formation due date: Week 2• Proposal due date: Week 5• Presentation due date: 3/12/2019 (10-12pm)• Final report due date: 3/13/2019• What to submit: project report and code
7
![Page 8: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/8.jpg)
Content
•What are probabilistic models
•What are structured data
•Applications
•Key tasks and challenges
8
![Page 9: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/9.jpg)
A Typical Machine Learning Problem
•Given a feature vector x, predict its label y (discrete or continuous)
𝑦 = 𝑓 𝒙
• Example: Text classification• Given a news article, which category does it belong to?
9
Argentina played to a frustrating 1-1 ties against Iceland on Saturday. A stubborn Icelandic defense was increasingly tough to penetrate, and a Lionel MESSI missed penalty was a huge turning point in the match, because it likely would’ve given Argentina three points.
SportsPoliticsEducation…
?
![Page 10: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/10.jpg)
Probabilistic Models
• Data: 𝐷 = 𝒙𝑖 , 𝑦𝑖 𝑖=1𝑛
• n: number of data points
• Model: 𝑝 𝐷 𝜃 𝑜𝑟 𝑝𝜃(𝐷)• Use probability distribution to address uncertainty
• 𝜃: parameters in the model
• Inference: ask questions about the model• Marginal inference: marginal probability of a variable
• Maximum a posteriori (MAP) inference: most likely assignment of variables
• Learning: learn the best parameters 𝜃
10
![Page 11: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/11.jpg)
The I.I.D. Assumption
• Assume data points are independent and identically distributed (i.i.d.)• 𝑝 𝐷|𝜃 = ς𝑖 𝑝(𝒙𝑖 , 𝑦𝑖|𝜃) (if modeling joint distribution)
• 𝑝 𝐷|𝜃 = ς𝑖 𝑝(𝑦𝑖|𝒙𝑖 , 𝜃) (if modeling conditional distribution, conditional i.i.d.)
• Example: linear regression
• 𝑦𝑖|𝒙𝑖 , 𝜷~𝑁(𝒙𝑖𝑇𝜷, 𝜎2)
• 𝑦𝑖 = 𝒙𝑖𝑇𝜷 + ε𝑖 , where ε𝑖~𝑁 0, 𝜎2
𝑝 𝐷 𝜷 =ෑ
𝑖
𝑝 𝑦𝑖 𝒙𝑖 , 𝜷) =ෑ
𝑖
1
2𝜋𝜎2exp{−
𝑦𝑖 − 𝒙𝑖𝑇𝜷
2
2𝜎2}
11
𝐿 𝜷 : 𝑙𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛
![Page 12: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/12.jpg)
Content
•What are probabilistic models
•What are structured data
•Applications
•Key tasks and challenges
12
![Page 13: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/13.jpg)
Structured Data
•Dependency between data points• Dependency are described by links
• Example: paper citation network• Citation between papers introduces dependency
13
![Page 14: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/14.jpg)
Examples of Structured Data
• Text
• sequence
• Image
• Grid / regular graph
• Social/Information Network
• General graph
14
The cat thesat on mat
![Page 15: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/15.jpg)
Roles of Data Dependency
• I.I.D. or conditional I.I.D. assumption no longer holds
• 𝑝 𝐷|𝜃 ≠ ς𝑖 𝑝 𝒙𝑖 , 𝑦𝑖 𝜃 , or
• 𝑝 𝐷|𝜃 ≠ ς𝑖 𝑝 𝑦𝑖 𝒙𝑖 , 𝜃
• Example
• In paper citation network, a paper is more likely to share the same label (research area) of its references
15
Paper i’s label Paper j’s label Probability
0 0 0.4
0 1 0.1
1 0 0.1
1 1 0.4
Suppose i cites j or j cites i
![Page 16: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/16.jpg)
Scope of This Course• A subset of probabilistic graphical model• Consider data dependency• Markov Random Fields, Conditional Random Fields, Factor Graph, and their applications in text, image, knowledge graph, and social/information networks
• Recent development of integrating deep learning and graphical models
• A full cover of probabilistic graphical models can be found:• Stanford course
• Stefano Ermon, CS 228: Probabilistic Graphical Models• Daphne Koller, Probabilistic Graphical Models, YouTube
• CMU course• Eric Xing, 10-708: Probabilistic Graphical Models
16
![Page 17: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/17.jpg)
Content
•What are probabilistic models
•What are structured data
•Applications
•Key tasks and challenges
17
![Page 18: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/18.jpg)
Text NER
•Named-Entity Recognition
• Given a predefined label set, determine each word’s label• E.g., B-PER, I-PER, O
• Possible solution: Conditional random field
• https://nlp.stanford.edu/software/CRF-NER.html
18
![Page 19: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/19.jpg)
Image Semantic Labeling
•Determine the label of each pixel
• Given a predefined label set, determine each pixel’s label
• Possible solution: Conditional random field
19
![Page 20: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/20.jpg)
Social Network Node Classification
•Attribute prediction of Facebook users
• E.g., gender
• Zheleva et al., Higher-order Graphical Models for Classification in Social and Affiliation Networks, NIPS’2010
20
![Page 21: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/21.jpg)
Content
•What are probabilistic models
•What are structured data
•Applications
•Key tasks and challenges
21
![Page 22: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/22.jpg)
Key Tasks
• Model• From data model to graphical model
• Define joint probability of all the data according to graphical model • 𝑝 𝐷 𝜃 𝑜𝑟 𝑝𝜃(𝐷)
• Inference• Marginal inference: marginal probability of a variable
• Maximum a posteriori (MAP) inference: most likely assignment of variables
• Learning• Learn the best parameters 𝜃
22
![Page 23: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/23.jpg)
Key Challenges
•Design challenges in modeling
• How to use heuristics to design meaningful graphical model?
•Computational challenges in inference and learning
• Usually are NP-hard problems
• Need approximate algorithms
23
![Page 24: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/24.jpg)
Course Overview• Preliminary
• Introduction
• Basic probabilistic models• Naïve Bayes• Logistic Regression
• Warm up: Hidden Markov Models• Forward Algorithm, Viterbi Algorithm, The Forward-Backward
Algorithm
• Markov Random Fields• General MRF, Pairwise MRF• Variable elimination, sum-product message passing, max-product
message passing, exponential family, pseudo-likelihood
• Conditional Random Fields• General CRF, Linear Chain CRF
• Factor Graph
24
![Page 25: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/25.jpg)
Probability Review
• Follow Stanford CS229 Probability Notes
• http://cs229.stanford.edu/section/cs229-prob.pdf
25
![Page 26: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/26.jpg)
Major Concepts• Elements of Probability
• Sample space, event space, probability measure• Conditional probability• Independence, conditional independence
• Random variables• Cumulative distribution function, Probability mass function (for discrete
random variable), Probability density function (for continuous random variable)
• Expectation, variance• Some frequently used distributions
• Discrete: Bernoulli, binomial, geometric, Poisson• Continuous: uniform, exponential, normal
• More random variables• Joint distribution, marginal distribution, joint and marginal probability mass
function, joint and marginal density function• Chain rule• Bayes’ rule• Independence• Expectation, conditional expectation, and covariance
26
![Page 27: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/27.jpg)
Summary
•What are probabilistic models
• Model uncertainty
•What are structured data
• Use links to capture dependency between data
•Applications
• Text, image, social/information network
•Key tasks and challenges
• Modeling, inference, learning
27
![Page 28: PROBABILISTIC MODELS FOR STRUCTURED DATAweb.cs.ucla.edu/~yzsun/classes/2020Winter_CS249/Slides/01Intro.pdf · •1 lecture, including Q&A, homework time, and homework discussion •When](https://reader036.vdocument.in/reader036/viewer/2022063021/5fe4c2c766fa4714b019c0a2/html5/thumbnails/28.jpg)
References
• Daphne Koller and Nir Friedman (2009). Probabilistic Graphical Models. The MIT Press.
• Kevin P. Murphy (2012). Machine Learning: A Probabilistic Perspective. The MIT Press.
• Charles Sutton and Andrew McCallum (2014). An Introduction to Conditional Random Fields. Now Publishers.
• Zheleva et al., Higher-order Graphical Models for Classification in Social and Affiliation Networks, NIPS’2010
• https://cs.stanford.edu/~ermon/cs228/index.html
• https://nlp.stanford.edu/software/CRF-NER.html
• http://cs229.stanford.edu/section/cs229-prob.pdf
28