Oregon State University – CS539 PRMs Learning Probabilistic Models of Link Structure Getoor, Friedman, Koller, Taskar

Post on 21-Dec-2015


Example Application: WebKB

Classify each web page as course, student, professor, project, or none, using:
- the words on the web page;
- links from other web pages (and the classes of those pages, recursively);
- the words in the "anchor text" of links on other pages: <a href="url">anchor text</a>.

Web pages were obtained from Cornell, Texas, Washington, and Wisconsin.
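The anchor-text feature for a page comes from links on other pages. Below is a minimal sketch of pulling (URL, anchor text) pairs out of one page with Python's standard html.parser module. The class and function names are hypothetical, and mapping each URL to its target page and tokenizing the text into words are left out.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collect (href, anchor text) pairs from the <a> tags of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so the anchor text is a clean word sequence
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None


def anchor_texts(html):
    """Return every (href, anchor text) pair found in an HTML string."""
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    return parser.links


page = '<p>Taught by <a href="http://example.edu/~smith">Prof. Smith</a></p>'
print(anchor_texts(page))  # [('http://example.edu/~smith', 'Prof. Smith')]
```

The words of each anchor text would then be credited to the page the href points to, not to the page that contains the link.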

Example Application: CORA

Classify documents according to topic (7 levels) using:
- the words in the document;
- the papers cited by the document;
- the papers citing the document.

Standard PRM

parents(Doc.class) = {MODE(Doc.citers.class),MODE(Doc.cited.class)}

[Figure: a central Document (class, words) whose class depends, through two MODE aggregates, on the classes of the Documents that cite it (citers) and of the Documents it cites (cited)]
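A minimal sketch of how the two aggregate parents of Doc.class could be computed in a standard PRM, assuming a small hypothetical corpus stored as a class dictionary plus a list of (citing, cited) pairs. The helper names are illustrative, and the "none" default for a document with no links is an assumption, not something the slides specify.

```python
from collections import Counter

# Hypothetical toy corpus: document id -> topic class
DOC_CLASS = {"d1": "Theory", "d2": "Learning", "d3": "Learning", "d4": "Theory"}
# Hypothetical citation links as (citing, cited) pairs
CITES = [("d1", "d2"), ("d3", "d2"), ("d4", "d2"),
         ("d2", "d1"), ("d2", "d4"), ("d2", "d3")]


def mode(values, default="none"):
    """Most frequent value; ties go to the value seen first; default if empty."""
    if not values:
        return default
    return Counter(values).most_common(1)[0][0]


def class_parents(doc):
    """Values of MODE(Doc.citers.class) and MODE(Doc.cited.class)."""
    citers = [DOC_CLASS[src] for src, dst in CITES if dst == doc]
    cited = [DOC_CLASS[dst] for src, dst in CITES if src == doc]
    return mode(citers), mode(cited)


for d in DOC_CLASS:
    print(d, class_parents(d))
```

The CPT for Doc.class is then indexed by this pair of aggregate values, whatever the number of links behind them.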

Problem: The Citation Structure is Fixed

- The existence (or non-existence) of a link cannot serve as evidence.
- Individually linked papers influence the class only through the MODE.
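To illustrate the second point, here is a small sketch with two hypothetical sets of cited-paper classes. They differ in size and content but collapse to the same MODE value, so the CPT for Doc.class cannot tell them apart.

```python
from collections import Counter


def mode(values):
    """Most frequent value (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]


# Two hypothetical sets of cited-paper classes for the same document
few_links = ["Theory", "Theory", "Learning"]
many_links = ["Theory", "Theory", "Theory", "Graphics", "Learning", "Learning"]

# Both yield the same parent value, so the number and identity of the
# individual links leave no trace in P(Doc.class | parents)
print(mode(few_links), mode(many_links))  # Theory Theory
```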

Possible Solution: Link Uncertainty

- Model the existence of links as random variables.
- Create a Link instance for each pair of possibly linked objects.
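A minimal sketch of why this lets links act as evidence. It assumes every ordered pair of distinct documents is possibly linked and that Cites.Exists depends on the classes of its two endpoints; the CPT numbers below are hypothetical. Observed links are Exists = true and every other pair is Exists = false, so the same link structure scores differently under different class assignments.

```python
import itertools
import math

# Hypothetical CPT: P(Cites.Exists = true | citing.class, cited.class)
P_EXISTS = {
    ("Theory", "Theory"): 0.10, ("Theory", "Learning"): 0.05,
    ("Learning", "Theory"): 0.05, ("Learning", "Learning"): 0.20,
}


def link_log_likelihood(classes, links):
    """Log-probability of the whole link structure given each document's class."""
    total = 0.0
    for a, b in itertools.permutations(classes, 2):  # one Exists variable per ordered pair
        p = P_EXISTS[(classes[a], classes[b])]
        total += math.log(p if (a, b) in links else 1.0 - p)
    return total


links = {("d1", "d2"), ("d3", "d2")}
as_theory = link_log_likelihood({"d1": "Theory", "d2": "Learning", "d3": "Learning"}, links)
as_learning = link_log_likelihood({"d1": "Learning", "d2": "Learning", "d3": "Learning"}, links)
print(as_learning > as_theory)  # True: d1's link to d2 favors d1 being Learning
```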

Unrolled Network

[Figure: three Document objects (class, words) and three Cites objects, each Cites object carrying an Exists attribute]
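A sketch of building the ground (unrolled) network as a map from each node to its parents. The node naming is hypothetical, and two modeling choices are assumptions: Doc.words depends on Doc.class, and every ordered pair of distinct documents gets a Cites object whose Exists depends on both endpoints' classes. The figure shows only three Cites objects; enumerating all ordered pairs gives n(n - 1) of them.

```python
import itertools


def unroll_link_uncertainty(doc_ids):
    """Map each ground node to the list of its parents (hypothetical naming)."""
    parents = {}
    for d in doc_ids:
        parents[f"{d}.class"] = []
        parents[f"{d}.words"] = [f"{d}.class"]
    for a, b in itertools.permutations(doc_ids, 2):
        parents[f"Cites({a},{b}).Exists"] = [f"{a}.class", f"{b}.class"]
    return parents


net = unroll_link_uncertainty(["d1", "d2", "d3"])
print(len(net))  # 12: 3 documents x 2 attributes + 6 ordered pairs
print(net["Cites(d1,d2).Exists"])  # ['d1.class', 'd2.class']
```

The quadratic number of Exists variables is what makes this approach expensive for large collections.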

Getoor's Diagram

- Entity classes (e.g., Paper)
- Relation classes (e.g., Cites)
- Technically, every instance has an Exists variable, which is always true for Entity instances.

Semantics

- P is the basic CPT.
- P* is the equivalent unrolled CPT.
- An object is required not to exist if any of the objects it points to does not exist.
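The constraint can be read as: P*(x.Exists = true | ...) equals P(x.Exists = true | ...) when every object x points to exists, and 0 otherwise. A minimal sketch, assuming the basic CPT entry has already been looked up as a single number; the function name is hypothetical.

```python
def unrolled_exists_prob(base_p_true, referenced_exist):
    """P*(x.Exists = true): the basic CPT value if every object x points to
    exists, and 0 if any of them does not."""
    return base_p_true if all(referenced_exist) else 0.0


# A Cites object whose citing and cited Papers both exist keeps its basic CPT value
print(unrolled_exists_prob(0.3, [True, True]))   # 0.3
# If the cited Paper does not exist, the Cites object cannot exist either
print(unrolled_exists_prob(0.3, [True, False]))  # 0.0
# An entity points to nothing (all([]) is True), so with P = 1 its Exists stays true
print(unrolled_exists_prob(1.0, []))             # 1.0
```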

WebKB Network

Experimental Results

Cora and WebKB

WebKB with Various Features

A Second Approach: Reference Uncertainty

- Treat reference attributes as random variables.
- Each reference attribute takes as its value an object of the indicated class.

Citation
- Citing: reference attribute; its value is a Paper
- Cited: reference attribute; its value is a Paper

Problems

1. How many Citation objects exist? Consequently, how many reference random variables exist?
2. How do we represent P(Citation.cites | …)?
   - Citation.cites could take on thousands of possible values.
   - This requires a huge conditional probability table.
   - Inference at run time is costly.
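To make "huge" concrete: a table over every Paper needs one distribution over N values per parent configuration, i.e. K × (N − 1) free parameters. A sketch of the arithmetic, assuming a hypothetical parent with 7 values (as for the CORA topics) and a hypothetical 3,000 candidate Papers.

```python
def naive_reference_cpt_params(num_parent_configs, num_papers):
    """Free parameters in P(Citation.cites | parents) when the table ranges over
    every Paper: num_papers - 1 per parent configuration."""
    return num_parent_configs * (num_papers - 1)


# Hypothetical sizes: 7 parent values, 3,000 candidate Papers
print(naive_reference_cpt_params(7, 3000))  # 20993
```

The table also grows linearly with the collection, so every new Paper adds parameters that must be estimated.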

Solutions

Problem 1: How many citations?

- Fix the number of Citation objects.
- This gives the “object skeleton”.
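A minimal sketch of an object skeleton, assuming Papers are identified by string ids. The Citation objects are fixed in number, while their Citing and Cited reference attributes remain unknown (None here) until they are sampled or inferred. This fixes the number of reference random variables at two per Citation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Citation:
    # Reference attributes: each value is a Paper id, None until sampled or inferred
    citing: Optional[str] = None
    cited: Optional[str] = None


def object_skeleton(num_citations: int) -> List[Citation]:
    """Fix the number of Citation objects; their reference attributes stay uncertain."""
    return [Citation() for _ in range(num_citations)]


skeleton = object_skeleton(4)
num_reference_vars = 2 * len(skeleton)  # one Citing and one Cited variable per Citation
print(num_reference_vars)  # 8
```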

Problem 2: Too many potential values for a reference attribute

- Attach to each reference attribute a set of partition attributes.
- The reference attribute chooses a partition.
- A Paper is then chosen uniformly at random from the partition.

[Figure: the Citing and Cited attributes of a Citation point into the set of Papers, which is partitioned by topic into Theory, Graphics, and Learning]
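A sketch of the two-stage choice, assuming Topic is the partition attribute and using hypothetical Papers and partition probabilities. Under this scheme P(ref = p) = P(partition of p) / (size of that partition), so the distribution needs one entry per partition rather than one per Paper.

```python
import math
import random
from collections import defaultdict

# Hypothetical Papers with their partition attribute (Topic)
PAPERS = {"p1": "Theory", "p2": "Theory", "p3": "Graphics",
          "p4": "Learning", "p5": "Learning", "p6": "Learning"}
# Hypothetical distribution over partitions chosen by the reference attribute
PARTITION_DIST = {"Theory": 0.5, "Graphics": 0.2, "Learning": 0.3}


def partitions(papers):
    """Group Paper ids by the value of their partition attribute."""
    parts = defaultdict(list)
    for pid, topic in papers.items():
        parts[topic].append(pid)
    return dict(parts)


def reference_prob(pid):
    """P(ref = pid) = P(partition of pid) / size of that partition."""
    topic = PAPERS[pid]
    return PARTITION_DIST[topic] / len(partitions(PAPERS)[topic])


def sample_reference(rng):
    """Choose a partition, then a Paper uniformly at random from it."""
    parts = partitions(PAPERS)
    topics = list(PARTITION_DIST)
    topic = rng.choices(topics, weights=[PARTITION_DIST[t] for t in topics])[0]
    return rng.choice(parts[topic])


print(reference_prob("p1"))  # 0.25
print(math.isclose(sum(reference_prob(p) for p in PAPERS), 1.0))  # True
print(sample_reference(random.Random(0)))
```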

Representing Constraints Between Citing and Cited Papers

Parents(Cites.Cited) = {Cites.Citing.Topic}
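One way to read this dependency, as a sketch: the choice of partition for Cited is conditioned on the citing paper's topic, and the cited Paper is then drawn uniformly from the chosen partition. The Papers and the CPT rows below are hypothetical numbers chosen so that papers tend to cite within their own topic.

```python
# Hypothetical Papers and their Topic (the partition attribute)
PAPERS = {"p1": "Theory", "p2": "Theory", "p3": "Graphics",
          "p4": "Learning", "p5": "Learning", "p6": "Learning"}
# Hypothetical CPT: P(partition chosen for Cites.Cited | Cites.Citing.Topic); rows sum to 1
SELECTOR_CPT = {
    "Theory":   {"Theory": 0.7, "Graphics": 0.1, "Learning": 0.2},
    "Graphics": {"Theory": 0.1, "Graphics": 0.8, "Learning": 0.1},
    "Learning": {"Theory": 0.2, "Graphics": 0.1, "Learning": 0.7},
}


def cited_prob(cited_id, citing_topic):
    """P(Cites.Cited = cited_id | Cites.Citing.Topic = citing_topic)."""
    topic = PAPERS[cited_id]
    partition_size = sum(1 for t in PAPERS.values() if t == topic)
    return SELECTOR_CPT[citing_topic][topic] / partition_size


print(cited_prob("p1", "Theory"))    # 0.35
print(cited_prob("p1", "Learning"))  # 0.1
```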

Details

Each reference attribute has a selector attribute S that chooses the partition.

[Figure: a Citation with selector attributes Sciting and Scited; each selector chooses the Learning, Theory, or Graphics partition of Papers from which Citing or Cited, respectively, is drawn]

Class-level Dependency Graph

Five types of edges:
- Type I: edges within a single object
- Type II: edges between objects
- Type III: edges from every reference attribute along a reference path to the attribute that reaches its parent through that path
- Type IV: edges from every partition attribute to the selector attributes that use it to choose a partition
- Type V: edges from each selector attribute to its corresponding reference attribute

Movie Theater Example

- Type I: Genre → Popularity
- Type II: Shows.Movie.Genre → Shows.Profit; Shows.Theater.Type → SMovie
- Type III: Movie → Profit; Theater → SMovie
- Type IV: Genre → SMovie
- Type V: STheater → Theater; SMovie → Movie
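A sketch of encoding these class-level edges and checking them for cycles with Python's graphlib (Python 3.9+). An acyclic class-level graph is what guarantees the model defines a coherent distribution (a sufficient condition). The Class.attribute node names are a hypothetical spelling of the slide's labels.

```python
from graphlib import CycleError, TopologicalSorter

# Class-level edges of the movie theater example: (edge type, parent, child)
EDGES = [
    ("I",   "Movie.Genre",    "Movie.Popularity"),
    ("II",  "Movie.Genre",    "Shows.Profit"),   # Shows.Movie.Genre -> Shows.Profit
    ("II",  "Theater.Type",   "Shows.SMovie"),   # Shows.Theater.Type -> SMovie
    ("III", "Shows.Movie",    "Shows.Profit"),
    ("III", "Shows.Theater",  "Shows.SMovie"),
    ("IV",  "Movie.Genre",    "Shows.SMovie"),
    ("V",   "Shows.STheater", "Shows.Theater"),
    ("V",   "Shows.SMovie",   "Shows.Movie"),
]


def predecessors(edges):
    """Map each node to the set of its parents, the form TopologicalSorter expects."""
    graph = {}
    for _, parent, child in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set())
    return graph


def is_acyclic(edges):
    try:
        list(TopologicalSorter(predecessors(edges)).static_order())
        return True
    except CycleError:
        return False


print(is_acyclic(EDGES))  # True
# A hypothetical extra edge Profit -> Genre would create the cycle Genre -> Profit -> Genre
print(is_acyclic(EDGES + [("II", "Shows.Profit", "Movie.Genre")]))  # False
```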

Unrolled Graph?

- The unrolled graph can have a huge number of edges.
- Are learning and inference really feasible?
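A back-of-the-envelope count, under the assumption that in the unrolled graph each reference attribute depends on the partition attribute of every candidate Paper (because partition membership is determined by all Papers' topics). The sizes are hypothetical.

```python
def partition_edges(num_citations, num_papers, refs_per_citation=2):
    """Ground edges from Paper partition attributes into reference attributes,
    assuming every reference attribute depends on every candidate Paper."""
    return num_citations * refs_per_citation * num_papers


# Hypothetical sizes: 10,000 Citation objects over 3,000 Papers
print(partition_edges(10_000, 3_000))  # 60000000
print(partition_edges(10_000, 6_000))  # 120000000: doubling the Papers doubles the edges
```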

Homework Exercise

- Construct the dependency graph for the citation example.
- Construct an unrolled network for a reference uncertainty example.