experiments in graph-based semi-supervised learning for ...partha pratim talukdar * (search labs,...

Partha Pratim Talukdar* (Search Labs, MSR)

Fernando Pereira (Google)

Experiments in Graph-based Semi-Supervised Learning

for Class-Instance Acquisition

ACL, 2010 (Uppsala, Sweden) *Work done while at the Univ. of Pennsylvania

Class-Instance Acquisition

• Given an entity, assign human readable descriptors to it– e.g., Toyota is a car manufacturer, japanese

company, multinational company, ...

• Large scale, open domain

• Applications– Web search, Advertising, etc.

Graph-based Class-Instance Acquisition

Bob Dylan (0.95)Johnny Cash (0.87)

Billy Joel (0.82)

Set 2Set 1Billy Joel (0.75)

Johnny Cash (0.73)

PatternTable Mining

Billy Joel (0.82)

Johnny Cash (0.73)

PatternTable Mining

Musician Singer

Billy Joel (0.82)

Johnny Cash (0.73)

PatternTable Mining

Cluster ID

Musician Singer

Billy Joel (0.82)

Johnny Cash (0.73)

PatternTable Mining

Cluster ID

ExtractionConfidence

Musician Singer

Billy Joel (0.82)

Johnny Cash (0.73)

Bob Dylan

Johnny Cash

Billy Joel

PatternTable Mining

Cluster ID

Musician Singer

Billy Joel (0.82)

Johnny Cash (0.73)

Bob Dylan

Johnny Cash

Billy Joel

Musician 1.0

SeedClasses

PatternTable Mining

Cluster ID

Musician Singer

Billy Joel (0.82)

Johnny Cash (0.73)

Bob Dylan

Johnny Cash

Billy Joel

Musician 1.0

SeedClasses

PatternTable Mining

Cluster ID

Musician Singer

Can we infer that Bob Dylan is also a Musician, as that is missing in current

extractions?

Graph-based Class-Instance Acquisition [Talukdar et al., EMNLP 2008]

Bob Dylan

Johnny Cash

Billy Joel

InitializationSet 2

Bob Dylan

Johnny Cash

Billy Joel

Musician 1.0

SeedLabels

PredictedLabels

Iteration 1Set 2

Bob Dylan

Johnny Cash

Billy Joel

Musician 1.0

Musician 0.8Musician 1.0

Musician 1.0

SeedLabels

PredictedLabels

Iteration 2Set 2

Bob Dylan

Johnny Cash

Billy Joel

Musician 1.0

Musician 0.8Musician 1.0

Musician 1.0

SeedLabels

Musician 0.6

PredictedLabels

• Graph-based Class-Instance Acquisition– Overview & previous work

• Review & comparison of three SSL algorithms– LP-ZGL (Zhu+, 2003)

– Adsorption (Baluja+, 2008)– Modified Adsorption (MAD) (Talukdar and Crammer, 2009)

• Additional Semantic Constraints

• Summary

Outline

• Summary

Outline

Comments on the Constructed Graph

Bob Dylan

Johnny Cash

Billy Joel

Musician 1.0

Bob Dylan

Johnny Cash

Billy Joel

Musician 1.0

Smoothness:Nodes connected by an edge should be assigned similar classes, as enforced by edge weight.

Bob Dylan

Johnny Cash

Billy Joel

Musician 1.0

Initial Clusters

Bob Dylan

Johnny Cash

Billy Joel

Musician 1.0

Initial Clusters

Coupling Node:Force (softly) all instance nodes connected to it to have similar class labels, exploiting the Smoothness requirement.

Bob Dylan

Johnny Cash

Billy Joel

Musician 1.0

Initial Clusters

Seed classes can be different from the cluster IDs

Coupling Node:Force (softly) all instance nodes connected to it to have similar class labels, exploiting the Smoothness requirement.

LP-ZGL[Zhu et al., ICML 2003]

• m labels

• W : edge weight matrix

• Yul: weight of label l on node u

• Yul: seed weight of label l on node u

• S: diagonal matrix, non-zero for seed nodes

• m labels

arg minY

Wuv(Yul ! Yvl)2; s.t. SYl = SYl

• m labels

arg minY

Smooth

• m labels

arg minY

Match Seeds (hard)Smooth

Modified Adsorption (MAD)[Talukdar and Crammer, ECML 2009]

argminY

m+1�

��SY l − SY l�2 + µ1

Muv(Y ul − Y vl)2 + µ2�Y l −Rl�2

• m labels, +1 dummy label

• M = W� +W is the symmetrized weight matrix

• Y vl: weight of label l on node v

• Y vl: seed weight for label l on node v

• S: diagonal matrix, nonzero for seed nodes

• Rvl: regularization target for label l on node v

argminY

m+1�

��SY l − SY l�2 + µ1

Match Seeds (soft)

argminY

m+1�

��SY l − SY l�2 + µ1

Match Seeds (soft) Smooth

argminY

m+1�

��SY l − SY l�2 + µ1

Match Seeds (soft) SmoothMatch Priors(Regularizer)

argminY

m+1�

��SY l − SY l�2 + µ1

MAD has extra regularization compared

to LP-ZGL

Match Seeds (soft) SmoothMatch Priors(Regularizer)

MAD (contd.)

Inputs Y ,R : |V |× (|L|+ 1), W : |V |× |V |, S : |V |× |V | diagonalY ← YM = W +W�

Zv ← Svv + µ1�

u �=v Mvu + µ2 ∀v ∈ Vrepeat

for all v ∈ V doY v ← 1

�(SY )v + µ1Mv·Y + µ2Rv

end foruntil convergence

MAD (contd.)

Zv ← Svv + µ1�

• Easily Parallelizable: Scalable

MAD (contd.)

Zv ← Svv + µ1�

• Easily Parallelizable: Scalable• Importance of a node can be discounted

MAD (contd.)

Zv ← Svv + µ1�

• Easily Parallelizable: Scalable• Importance of a node can be discounted• Convergence guarantee under mild conditions

Experiments with Public Datasets

• Previous research used proprietary datasets– difficult to reproduce and extend

• Public datasets are used in the paper– Freebase: relational tables from multiple sources

– TextRunner (Ritter+, 2009): Open-domain IE System

– YAGO (Suchanek+, 2007): KB curated from Wikipedia and WordNet

– Gold standard hypernyms: [Pantel+, 2009], WordNet

• Public datasets are used in the paper– Freebase: relational tables from multiple sources

– TextRunner (Ritter+, 2009): Open-domain IE System

– YAGO (Suchanek+, 2007): KB curated from Wikipedia and WordNet

– Gold standard hypernyms: [Pantel+, 2009], WordNet

• Available at: http://www.talukdar.net/datasets/class_inst/10

Graph Stats

Statistics of Graphs used in Experiments

Evaluation Metric: Mean Reciprocal Rank (MRR)

MRR =1

|test-set|�

v∈test-set

rankv(class(v))

Evaluation Metric: Mean Reciprocal Rank (MRR)

MRR =1

|test-set|�

v∈test-set

rankv(class(v))

Billy JoelLinguist 0.4Musician 0.3

Gold Label: MusicianMRR: 0.5 (1/2)

Graph-based SSL Comparisons (1)

170 x 2 170 x 10

TextRunner Graph, 170 WordNet Classes

Amount of Supervision

LP-ZGL Adsorption MAD

Graph with175k nodes,529k edges.

Graph-based SSL Comparisons (2)

192 x 2 192 x 10

Freebase-2 Graph, 192 WordNet Classes

Amount of Supervision

LP-ZGL Adsorption MAD

Graph with303k nodes,2.3m edges.

When is MAD most effective?

0 7.5 15 22.5 30

Average Degree

When is MAD most effective?

0 7.5 15 22.5 30

Average Degree

MAD is most effective in dense graphs, where there is greater need for

regularization.

• Summary

Outline

• Summary

Outline

Can Additional Semantic Constraints Help ?

Isaac Newton

Johnny Cash

Bob Dylan

Instances with shared

attributes are likely to be

from the same class.

Isaac Newton

Johnny Cash

Bob Dylan

has_attribute-albums

Isaac Newton

Johnny Cash

Bob Dylan

has_attribute-albums

Isaac Newton

Johnny Cash

Bob Dylan

Graph-based representation makes it easy to incorporate such constraints!

Better Classes with YAGO Attributes

18 0.3

170 WordNet Classes, 10 Seeds per Class, using Adsorption

TextRunner GraphYAGO GraphTextRunner + YAGO Graph

18 0.3

Graph constructed from TextRunner (UWash) output, 175k nodes, 529k edges

18 0.3

Graph constructed from output of YAGO Knowledge Base, 142k nodes, 777k edges

18 0.3

Combined graph, with 237k nodes, 1.3m edges

18 0.3

Additional semantic constraints help improve performance significantly.

Classes for Attributes

• Classes assigned to attribute nodes

has_isbn

wordnet_bookwordnet_maganize

has_isbn

Evidence that attributes propagate right classes.

Summary

• Compared three graph-based SSL algorithms for class-instance acquisition– MAD is consistently most effective, particularly in

dense graphs

Summary

dense graphs

• Better class-instance acquisition with additional semantic constraints– easy to add such constraints in graph-based

representation

Summary

dense graphs

• Better class-instance acquisition with additional semantic constraints– easy to add such constraints in graph-based

representation

• Datasets available: www.talukdar.net/datasets/class_inst/

– for reproducibility and extension

Thank You!http://www.talukdar.net/datasets/class_inst/

experiments in graph-based semi-supervised learning for ...partha pratim talukdar * (search labs,...

Documents

semi-supervised clustering

augmenting feature-driven fmri analyses: semi-supervised...

federated semi-supervised learning with inter-client ...2.1....

learning semi-supervised representation … semi-supervised...

semi-supervised semantic segmentation needs strong, varied...

a semi-supervised learning algorithm via adaptive laplacian...

semi-supervised learning tutorial

semi-supervised information extraction

semi-supervised text categorization.pdf

semi-supervised factored logistic regression for...

phenotype prediction with semi-supervised...

multilingual metaphor processing: experiments with semi...

semi-supervised learningzhuxj/tmp/book.pdf1.1.2...

large graph construction for scalable semi …...motivation...

semi-supervised support vector...

semi-supervised learning - ubc computer...

supervised nonlinear factorizations excel in semi-supervised...

inductive semi-supervised learning

s4l: self-supervised semi-supervised learning...and existing...

semi-supervised deep learning with...