DeepWalk: Online Learning of Social Representations 1

Presented by Carlos Oliver for COMP 766

January 20, 2020

1 Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. "DeepWalk: Online learning of social representations." Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014.

Motivation

I Can we predict the label of a node given the graph?

I Problem: labels are not i.i.d., so traditional methods can't be used. (MRFs, graph kernels and other structured learning models are needed.)

Motivation

Idea
I Separate labels from underlying structure.

I Existing approaches:
  I Graph statistics (neighbourhood overlap...)
  I Spectral clustering

I Limitations: often require the full graph to compute, or domain-specific knowledge.

Relationship between social graphs and natural language

I This tells us that modeling the co-occurrence of vertices gives us similar information to measuring the co-occurrence of words.

I Seeing words co-occur gives us information about the structure of the language (or graph).

Random walks ∼ sentences

I Since co-occurrence tells us about structural context, sample random walks from each node.

I Bonus: a natural way to split up the graph (i.e. we don't need the whole graph to get a node's embedding).
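
The walk sampling described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the adjacency-list dictionary `adj`, the helper names `random_walk` and `sample_walks`, and the toy graph are all assumptions made for the example. Walks are uniform (each neighbour is equally likely), and each pass visits the start nodes in shuffled order.

```python
import random

def random_walk(adj, start, length, rng):
    """Sample one uniform random walk of up to `length` vertices from `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = adj[walk[-1]]
        if not neighbours:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbours))
    return walk

def sample_walks(adj, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every vertex, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)
        for v in nodes:
            walks.append(random_walk(adj, v, length, rng))
    return walks

# Hypothetical toy graph: two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
walks = sample_walks(adj, walks_per_node=2, length=5)
```

Each walk only touches the neighbourhoods it passes through, which is why a node's training data can be generated without loading the whole graph.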

Learning objective

I Given a random walk $(v_1, \ldots, v_i)$, we update the representation $\Phi(v_i) \in \mathbb{R}^d$ to maximize the likelihood of the walk:

$$\arg\min_{\Phi} \; -\log P(v_1, \ldots, v_{i-1} \mid \Phi(v_i))$$

I Probabilities are given by a multi-label classifier that maps $\Phi(v_i) \to V^k$, where $k$ is the length of the walk.

I Nodes with similar Φ will have similar local graph ‘structure’.

Speedups: SkipGram

I Skip-gram: lets us split the walk into sliding windows of size $w$ and update the nodes in each window using SGD.

I $\arg\min_{\Phi} \; -\log P(v_1, \ldots, v_{i-1} \mid \Phi(v_i))$ becomes

I $\arg\min_{\Phi} \; -\log P(v_{i-w}, \ldots, v_{i-1}, v_{i+1}, \ldots, v_{i+w} \mid \Phi(v_i))$
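
To make the windowed objective concrete, the sketch below enumerates the (centre, context) pairs that a window of size $w$ produces for one walk. The function name `window_pairs` and the example walk are assumptions for illustration. Under the usual Skip-gram assumption that context vertices are conditionally independent given $\Phi(v_i)$, the windowed probability factorises into one term per pair, so each pair corresponds to one SGD update.

```python
def window_pairs(walk, w):
    """Return (centre, context) pairs for every vertex within w steps of the centre."""
    pairs = []
    for i, centre in enumerate(walk):
        lo = max(0, i - w)                 # clip the window at the walk start
        hi = min(len(walk), i + w + 1)     # and at the walk end
        for j in range(lo, hi):
            if j != i:                     # the centre is not its own context
                pairs.append((centre, walk[j]))
    return pairs

# Hypothetical walk over four vertices with window size 1.
walk = ["a", "b", "c", "d"]
pairs = window_pairs(walk, w=1)
# pairs == [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("c", "d"), ("d", "c")]
```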

Speedups: Hierarchical Softmax

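I Hierarchical Softmax: reduces the softmax normalization cost from $O(|V|)$ to $O(\log |V|)$ by arranging the vertices as the leaves of a tree of binary classifiers.

$$P(u_k \mid \Phi(v_j)) = \prod_{l=1}^{\lceil \log |V| \rceil} P(b_l \mid \Phi(v_j))$$

where $b_1, \ldots, b_{\lceil \log |V| \rceil}$ are the tree nodes on the path from the root to the leaf for vertex $u_k$.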
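
The product over the tree path can be sketched as follows. This is an illustrative toy, not the authors' code: it assumes a complete binary tree stored in heap layout (root at index 1, children of node `n` at `2n` and `2n + 1`, leaves at indices `n_leaves` to `2 * n_leaves - 1`), one hypothetical parameter vector `psi[n]` per internal node, and a logistic classifier at each node that gives the probability of branching right.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def leaf_probability(u, phi, psi, n_leaves):
    """P(u | phi) as a product of binary decisions on the root-to-leaf path."""
    node = n_leaves + u                    # heap index of the leaf for vertex u
    prob = 1.0
    while node > 1:                        # climb from the leaf up to the root
        parent = node // 2
        score = sum(a * b for a, b in zip(psi[parent], phi))
        p_right = sigmoid(score)
        # Odd heap indices are right children, even indices are left children.
        prob *= p_right if node % 2 == 1 else 1.0 - p_right
        node = parent
    return prob

rng = random.Random(0)
n_leaves, dim = 8, 4                       # 8 vertices, so log2(8) = 3 decisions each
psi = {n: [rng.gauss(0, 1) for _ in range(dim)] for n in range(1, n_leaves)}
phi = [rng.gauss(0, 1) for _ in range(dim)]
probs = [leaf_probability(u, phi, psi, n_leaves) for u in range(n_leaves)]
total = sum(probs)
```

Each leaf probability costs three dot products here instead of eight, and the leaf probabilities still sum to one because the two branches at every internal node are complementary.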

Training loop

Evaluation

I Task: multi-label node classification on large social graphs.
I Evaluation: micro- and macro-averaged F1 score.

  I F1 score is the harmonic mean of precision and recall.
  I Macro-F1 is the arithmetic mean of the per-class F1 scores.
  I Micro-F1 pools true positives, false positives and false negatives over all classes and samples before computing F1.
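
The difference between the two averages is easiest to see on a small worked example. The sketch below is a plain-Python illustration with made-up labels and predictions; the helper names `f1` and `micro_macro_f1` are assumptions, and each sample carries a set of labels to match the multi-label setting.

```python
def f1(tp, fp, fn):
    """F1 = 2PR / (P + R), written directly in terms of the counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred, labels):
    counts = {c: [0, 0, 0] for c in labels}   # [tp, fp, fn] per class
    for truth, pred in zip(y_true, y_pred):
        for c in labels:
            if c in pred and c in truth:
                counts[c][0] += 1
            elif c in pred:
                counts[c][1] += 1
            elif c in truth:
                counts[c][2] += 1
    # Macro: average the per-class F1 scores.
    macro = sum(f1(*counts[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    tp, fp, fn = (sum(counts[c][k] for c in labels) for k in range(3))
    return f1(tp, fp, fn), macro

# Made-up example: four nodes, two possible labels.
y_true = [{"a"}, {"a", "b"}, {"b"}, {"a"}]
y_pred = [{"a"}, {"a"}, {"a", "b"}, {"a"}]
micro, macro = micro_macro_f1(y_true, y_pred, labels=["a", "b"])
# micro = 0.8, macro = 16/21 (about 0.762)
```

Class "a" is frequent and predicted well (F1 = 6/7) while class "b" is rare and predicted worse (F1 = 2/3), so the macro average, which weights both classes equally, comes out below the micro average.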

Parameter Sensitivity

Summary

I Modelling random walks as sentences in a language gives us:

  I Continuous & low-dimensional representations → robustness
  I Online training → walks naturally partition the graph
  I Scalable implementation → SkipGram & Hierarchical Softmax

Limitations

I The learned representations themselves are left unexplored.

I Number of parameters ∝ number of nodes in the graph.

I Similarity is only defined on a single graph.
