6.899 Relational Data Learning
Yuan Qi, MIT Media Lab
[email protected]
May 7, 2002
Outline

- Structure Learning Using Stochastic Logic Programming (SLP)
- Text Classification Using Probabilistic Relational Models (PRM)
Part 1: Structure Learning Using SLP

- SLP defines a prior over BN structures
- MCMC sampling of BN structures
- A new sampling method
An SLP Defining a Prior over BN Structures

```prolog
bn([], [], []).
bn([RV|RVs], BN, AncBN) :-
    bn(RVs, BN2, AncBN2),
    connect_no_cycles(RV, BN2, AncBN2, BN, AncBN).

% An edge: RV parent of H
1/3 :: which_edge([H|T], RV, [H-RV|Rest]) :- choose_edges(T, RV, Rest).
% An edge: H parent of RV
1/3 :: which_edge([H|T], RV, [RV-H|Rest]) :- choose_edges(T, RV, Rest).
% No edge
1/3 :: which_edge([_H|T], RV, Rest) :- choose_edges(T, RV, Rest).
```
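Read declaratively, the bn/3 clauses add one random variable at a time, with connect_no_cycles keeping the growing structure acyclic, while the three probabilistic which_edge/3 clauses decide, for each candidate pair of nodes, between an edge in one direction, an edge in the other, or no edge, each with probability 1/3. Sampling a derivation of this program therefore samples a DAG from the prior.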
Metropolis-Hastings Sampling
$p(T)$ specifies a tree prior over BN structures. Sample $T^*$ from the transition distribution $q(T_i, T^*)$, and set $T_{i+1} = T^*$ with acceptance probability

$$\alpha(T_i, T^*) = \min\!\left(\frac{q(T^*, T_i)\, p(Y \mid X, T^*)\, p(T^*)}{q(T_i, T^*)\, p(Y \mid X, T_i)\, p(T_i)},\; 1\right);$$

otherwise set $T_{i+1} = T_i$.
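For concreteness, here is a minimal sketch of this accept/reject step in Python. The callables propose, log_q, log_lik, and log_prior stand for $q$, $p(Y \mid X, T)$, and $p(T)$; these names are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def mh_step(T_i, propose, log_q, log_lik, log_prior):
    """One Metropolis-Hastings step over BN structures T."""
    T_star = propose(T_i)  # draw T* from q(T_i, .)
    # log of the acceptance ratio alpha(T_i, T*)
    log_alpha = (log_q(T_star, T_i) + log_lik(T_star) + log_prior(T_star)
                 - log_q(T_i, T_star) - log_lik(T_i) - log_prior(T_i))
    if math.log(random.random()) < min(0.0, log_alpha):
        return T_star  # accept: T_{i+1} = T*
    return T_i         # reject: T_{i+1} = T_i
```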
The Transition Kernel (1)

The transition kernel can be implemented by generating a new derivation (yielding a new model M*) from the derivation that yields the current model M_i. Specifically:

- Backtrack one step to the most recent choice point in the SLD-tree (i.e., the probability tree).
- If at the top of the tree, stop. Otherwise, backtrack one more step to the next choice point with a predefined backtrack probability $p_b$.
The Transition Kernel (2)

Once backtracking stops, choose a new leaf M* from the choice point by selecting branches according to the probabilities attached to them (loglinear sampling). However, we may not choose the branch that leads back to M_i.
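A rough sketch of the full kernel (both steps combined), assuming an explicit probability-tree representation with .parent, .children, and .prob attributes. This data structure is an illustration only; the slides describe the procedure, not a concrete implementation.

```python
import random

def propose_leaf(leaf, p_b):
    """Backtrack from the current leaf (model M_i), then resample a new leaf M*."""
    came_from = leaf
    node = leaf.parent                      # step 1: most recent choice point
    while node.parent is not None and random.random() < p_b:
        came_from = node                    # remember the branch back to M_i
        node = node.parent                  # backtrack one more choice point
    # loglinear sampling: pick a branch by its attached probability,
    # excluding the branch that leads back to M_i
    branches = [c for c in node.children if c is not came_from]
    node = random.choices(branches, weights=[c.prob for c in branches])[0]
    while node.children:                    # follow random choices down to a leaf
        node = random.choices(node.children,
                              weights=[c.prob for c in node.children])[0]
    return node                             # new leaf, i.e., new model M*
```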
Sampling Problems

The previous Metropolis-Hastings sampler is inefficient: with $p_b = 0.8$, the acceptance ratio is only 4%.

- If $p_b$ is small, the samples move slowly but the acceptance ratio is higher.
- If $p_b$ is large, the samples make large moves but the acceptance ratio is lower.

A fixed $p_b$ therefore cannot balance local jumps to neighboring models against big jumps to distant ones.

An improvement: a cyclic transition kernel with $p_b = 1 - 2^{-n}$ for $n = 1, \ldots, 28$.
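A one-line sketch of such a schedule (the function name is illustrative):

```python
def cyclic_pb(step, n_max=28):
    """Cycle p_b through 1 - 2^{-n}, n = 1..n_max, mixing local and global moves."""
    n = (step % n_max) + 1
    return 1.0 - 2.0 ** (-n)
```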
Adaptive Sampling Strategy: Re-Try the Proposals

Suppose a proposal $T_1$ drawn from the proposal distribution $q_1(T_0, T_1)$ is tried and rejected. The rejection suggests that this proposal distribution may not be good and that a different proposal could be tried. So suppose a new sample $T_2$ is drawn from a new proposal $q_2(T_0, T_1, T_2)$.

But how do we get a valid Markov sampling chain?
Adaptive Sampling Strategy: New Acceptance Ratio
If we use the following acceptance ratio:
$$\alpha_2(T_0, T_1, T_2) = \min\!\left(\frac{p(Y \mid X, T_2)\, p(T_2)\, q_1(T_2, T_1)\,\bigl(1 - \alpha_1(T_2, T_1)\bigr)\, q_2(T_2, T_1, T_0)}{p(Y \mid X, T_0)\, p(T_0)\, q_1(T_0, T_1)\,\bigl(1 - \alpha_1(T_0, T_1)\bigr)\, q_2(T_0, T_1, T_2)},\; 1\right)$$

(where $\alpha_1$ denotes the first-stage acceptance ratio from the previous slide),
then we have a valid MCMC sampler for the target distribution, that is, the posterior of BN structures.
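A minimal sketch of the two-stage step, with samplers and densities for $q_1$ and $q_2$ passed in as callables, log_post standing for $\log p(Y \mid X, T) + \log p(T)$, and alpha1 computing the first-stage ratio $\alpha_1$. All names are placeholders, and edge cases such as $\alpha_1 = 1$ are ignored.

```python
import math
import random

def retry_step(T0, sample_q1, dens_q1, sample_q2, dens_q2, log_post, alpha1):
    """One step of the re-try sampler: propose via q1, and on rejection via q2."""
    T1 = sample_q1(T0)
    if random.random() < alpha1(T0, T1):
        return T1                            # first-stage proposal accepted
    T2 = sample_q2(T0, T1)                   # re-try with the adapted proposal
    log_a2 = (log_post(T2) + math.log(dens_q1(T2, T1))
              + math.log(1.0 - alpha1(T2, T1)) + math.log(dens_q2(T2, T1, T0))
              - log_post(T0) - math.log(dens_q1(T0, T1))
              - math.log(1.0 - alpha1(T0, T1)) - math.log(dens_q2(T0, T1, T2)))
    if math.log(random.random()) < min(0.0, log_a2):
        return T2                            # second-stage proposal accepted
    return T0                                # both rejected: stay at T0
```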
Part 1: Conclusion

To adaptively sample BN structures, we can start with a large backtrack probability $p_b$ and, whenever a sample is rejected, reduce $p_b$ and draw a new structure with the reduced backtrack probability. This process can be repeated.

The adaptive proposal distribution allows the SLP sampler to locally tune its parameter, achieving a good balance between local jumps to neighboring models and big jumps to distant ones. We therefore expect much more efficient sampling.
Part 2: Text Classification Using Probabilistic Relational Models (PRM)

Why use PRMs? SLP handles only discrete random variables, while PRMs handle both discrete and continuous random variables.

Why model text relationally? To capture author relations and citation relations.
Modeling Relational Text Data

Figure 1: PRM modeling of text, by Taskar, Segal, and Koller.
Unrolled Bayesian network.
Transduction: Training and Testing Together

The test data are also included in the model.

Transduction uses the EM algorithm:
- E step: belief propagation
- M step: maximum-likelihood re-estimation
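A schematic of this loop, as a sketch only: the E and M steps are passed in as callables because the slides do not specify an implementation.

```python
def transductive_em(unrolled_bn, init_params, e_step_bp, m_step_ml, n_iters=20):
    """Transductive EM: the unrolled BN contains both train and test documents."""
    params = init_params(unrolled_bn)
    for _ in range(n_iters):
        # E step: belief propagation over the unrolled network yields
        # posterior class beliefs for the unlabeled (test) documents
        beliefs = e_step_bp(unrolled_bn, params)
        # M step: maximum-likelihood re-estimation of the parameters
        # from the expected sufficient statistics
        params = m_step_ml(unrolled_bn, beliefs)
    return params
```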
Several Problems with the Modeling in Figure 1

- The naïve Bayes (independence) assumption on generating words
- Wrong edge direction between the word and topic nodes
- Wrong edge direction between a paper and its citations
Drawbacks of EM Training and Transduction

- High-dimensional data with relatively few training points
- Transduction helps training but is very expensive at test time, since the whole model must be retrained for each new data point.
New Modeling and Bayesian Training

The new node, h, models a classifier that takes as input the words, the aggregated citations, and the aggregated authors.
Training the New PRM

Unrolling this new PRM gives a Bayesian network that models the text data.

Training: expectation propagation, an extension of belief propagation.

We can also easily incorporate the kernel trick, as in SVMs or Gaussian processes, into the classifier h. Note that h models the conditional relation between the text class and the words, citations, and authors.
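A sketch of what the inputs to h might look like, assuming mean aggregation and an RBF kernel; both choices are illustrative assumptions, not from the slides.

```python
import numpy as np

def h_input(word_vec, citation_vecs, author_vecs):
    """Concatenate a paper's word vector with aggregated citation/author features."""
    agg_cite = np.mean(citation_vecs, axis=0)   # aggregate over cited papers
    agg_auth = np.mean(author_vecs, axis=0)     # aggregate over authors
    return np.concatenate([word_vec, agg_cite, agg_auth])

def rbf_kernel(x, z, gamma=0.1):
    """Kernel trick for h, as in SVMs or Gaussian processes."""
    return np.exp(-gamma * np.sum((x - z) ** 2))
```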
Part 2: Conclusion

Benefits of the new approach:
- No overfitting, unlike maximum-likelihood (ML) approaches
- The choice of whether or not to use transduction
- A much more powerful classifier, a Bayes point machine with kernel expansion, compared to the naïve Bayes method
- Better relational modeling