deepwalk: online learning of socialwlh/comp766/files/carlos_oliver.pdf · deepwalk: online learning...

32
DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20, 2020 1 Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. ”Deepwalk: Online learning of social representations.” Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014.

Upload: others

Post on 05-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

DeepWalk: Online Learning of SocialRepresentations 1

Presented by Carlos Oliver for COMP 766

January 20, 2020

1Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. ”Deepwalk: Online learning of social representations.”

Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM,2014.

Page 2: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Motivation

I Can we predict the label of a node given the graph?

I Problem: labels not i.i.d so traditional methods can’t beused. (MRF, Graph Kernels and other structured learningmodels needed)

Page 3: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Motivation

I Can we predict the label of a node given the graph?

I Problem: labels not i.i.d so traditional methods can’t beused. (MRF, Graph Kernels and other structured learningmodels needed)

Page 4: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Motivation

I Can we predict the label of a node given the graph?

I Problem: labels not i.i.d so traditional methods can’t beused. (MRF, Graph Kernels and other structured learningmodels needed)

Page 5: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

IdeaI Separate labels from underlying structure.

I Existing Approaches:I Graph statistics (neighbourhood overlap...)I Spectral clustering

I Limitations: often require full graph to compute ordomain-specific knowledge.

Page 6: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

IdeaI Separate labels from underlying structure.

I Existing Approaches:I Graph statistics (neighbourhood overlap...)I Spectral clustering

I Limitations: often require full graph to compute ordomain-specific knowledge.

Page 7: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

IdeaI Separate labels from underlying structure.

I Existing Approaches:I Graph statistics (neighbourhood overlap...)I Spectral clustering

I Limitations: often require full graph to compute ordomain-specific knowledge.

Page 8: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Relationship between social graphs and natural language

I This tells us that modeling the co-occurence of vertices givesus similar information to measuring co-occurence of words.

I Seeing words co-occur gives us information about thestructure of the language (or graph).

Page 9: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Relationship between social graphs and natural language

I This tells us that modeling the co-occurence of vertices givesus similar information to measuring co-occurence of words.

I Seeing words co-occur gives us information about thestructure of the language (or graph).

Page 10: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Relationship between social graphs and natural language

I This tells us that modeling the co-occurence of vertices givesus similar information to measuring co-occurence of words.

I Seeing words co-occur gives us information about thestructure of the language (or graph).

Page 11: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Random walks ∼ sentences

I Since co-occurence tells us about the structural context.Sample random walks from each node.

I Bonus: natural way to split up graph (i.e. don’t need wholegraph to get a node’s embedding)

Page 12: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Random walks ∼ sentences

I Since co-occurence tells us about the structural context.Sample random walks from each node.

I Bonus: natural way to split up graph (i.e. don’t need wholegraph to get a node’s embedding)

Page 13: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Learning objective

I Given a random walk, (v1, .., vi ), we update representationΦ(vi ) ∈ Rd to maximizes the likelihood of the walk.

arg minΦ− logP(v1, .., vi−1|Φ(vi ))

I Probabilities are given by a multi-label classifier which mapsΦ(vi )→ V k where k is the size of the walks.

I Nodes with similar Φ will have similar local graph ‘structure’.

Page 14: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Learning objective

I Given a random walk, (v1, .., vi ), we update representationΦ(vi ) ∈ Rd to maximizes the likelihood of the walk.

arg minΦ− logP(v1, .., vi−1|Φ(vi ))

I Probabilities are given by a multi-label classifier which mapsΦ(vi )→ V k where k is the size of the walks.

I Nodes with similar Φ will have similar local graph ‘structure’.

Page 15: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Learning objective

I Given a random walk, (v1, .., vi ), we update representationΦ(vi ) ∈ Rd to maximizes the likelihood of the walk.

arg minΦ− logP(v1, .., vi−1|Φ(vi ))

I Probabilities are given by a multi-label classifier which mapsΦ(vi )→ V k where k is the size of the walks.

I Nodes with similar Φ will have similar local graph ‘structure’.

Page 16: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Speedups: SkipGram

I Skip-gram: lets us split the walk into sliding windows size wand update nodes in each window using SGD.

I arg minΦ− logP(v1, .., vi−1|Φ(vi )) becomes

I arg minΦ− logP(vi−w , vi−1, vi+1.., vi+w |Φ(vi ))

Page 17: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Speedups: SkipGram

I Skip-gram: lets us split the walk into sliding windows size wand update nodes in each window using SGD.

I arg minΦ− logP(v1, .., vi−1|Φ(vi )) becomes

I arg minΦ− logP(vi−w , vi−1, vi+1.., vi+w |Φ(vi ))

Page 18: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Speedups: SkipGram

I Skip-gram: lets us split the walk into sliding windows size wand update nodes in each window using SGD.

I arg minΦ− logP(v1, .., vi−1|Φ(vi )) becomes

I arg minΦ− logP(vi−w , vi−1, vi+1.., vi+w |Φ(vi ))

Page 19: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Speedups: SkipGram

I Skip-gram: lets us split the walk into sliding windows size wand update nodes in each window using SGD.

I arg minΦ− logP(v1, .., vi−1|Φ(vi )) becomes

I arg minΦ− logP(vi−w , vi−1, vi+1.., vi+w |Φ(vi ))

Page 20: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Speedups: Hierarchical Softmax

I Hierarchical Softmax: reduces softmax normalization fromO(|V |) to O(log |V |) by building tree of binary classifiers.

I

P(bk |Φ(vj)) = Πlog |V |l=1 P(bl |Φ(vj))

Page 21: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Speedups: Hierarchical Softmax

I Hierarchical Softmax: reduces softmax normalization fromO(|V |) to O(log |V |) by building tree of binary classifiers.

I

P(bk |Φ(vj)) = Πlog |V |l=1 P(bl |Φ(vj))

Page 22: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Speedups: Hierarchical SoftmaxI Hierarchical Softmax: reduces softmax normalization fromO(|V |) to O(log |V |) by building tree of binary classifiers.

I

P(bk |Φ(vj)) = Πlog |V |l=1 P(bl |Φ(vj))

Page 23: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Training loop

Page 24: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Evaluation

I Task: Multi-label node classification on large social graphs.I Evaluation: micro,macro F1 score

I F1 score is the harmonic mean of precision and recallI Macro is the arithmetic mean of F1 over all classesI Micro is total proportion of correct labels over all samples.

Page 25: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Parameter Sensitivity

Page 26: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Summary

I Modelling random walks as sentences in a language gives us:

I Continuous & Low Dimensional representations →robustness

I Online training → walks naturally partition the graphI Scalable implementation → SkipGram & Hierarchical

Softmax

Page 27: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Summary

I Modelling random walks as sentences in a language gives us:

I Continuous & Low Dimensional representations →robustness

I Online training → walks naturally partition the graphI Scalable implementation → SkipGram & Hierarchical

Softmax

Page 28: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Summary

I Modelling random walks as sentences in a language gives us:

I Continuous & Low Dimensional representations →robustness

I Online training → walks naturally partition the graph

I Scalable implementation → SkipGram & HierarchicalSoftmax

Page 29: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Summary

I Modelling random walks as sentences in a language gives us:

I Continuous & Low Dimensional representations →robustness

I Online training → walks naturally partition the graphI Scalable implementation → SkipGram & Hierarchical

Softmax

Page 30: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Limitations

I Representations themselves left unexplored.

I Number of parameters ∝ number of nodes in the graph.

I Similarity is only defined on a single graph.

Page 31: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Limitations

I Representations themselves left unexplored.

I Number of parameters ∝ number of nodes in the graph.

I Similarity is only defined on a single graph.

Page 32: DeepWalk: Online Learning of Socialwlh/comp766/files/carlos_oliver.pdf · DeepWalk: Online Learning of Social Representations 1 Presented by Carlos Oliver for COMP 766 January 20,

Limitations

I Representations themselves left unexplored.

I Number of parameters ∝ number of nodes in the graph.

I Similarity is only defined on a single graph.