Graph Embeddings
Alicia Frame, PhD
October 10, 2019


TRANSCRIPT

Page 1: Graph Embeddings

Alicia Frame, PhD
October 10, 2019

Page 2: Overview

- What's an embedding?
- How do these work?
  - Motivating Example - Word2Vec
  - Motivating Example - DeepWalk
- Graph embeddings overview
- Graph embedding techniques
- Graph embeddings with Neo4j

Page 3: TL;DR - what's an embedding?

What does the internet say?
- Google: "An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors"
- Wikipedia: "In mathematics, an embedding is one instance of some mathematical structure contained within another instance, such as a group that is a subgroup."

TL;DR: a way of mapping something (a document, an image, a graph) into a fixed-length vector (or matrix) that captures key features while reducing the dimensionality.

Page 4: So what's a graph embedding?

Graph embeddings are a specific type of embedding that translate graphs, or parts of graphs, into fixed-length vectors (or tensors).

Page 5: But why bother?

An embedding translates something complex into something a machine can work with:
- It represents the important features of the input object in a compact, low-dimensional format
- The embedded representation can be used as a feature for ML, for direct comparisons, or as an input representation for a DL model

Embeddings typically learn what's important in an unsupervised, generalizable way.

Page 6: Motivating Examples

Page 7: Motivating example: Word Embeddings

How can I represent words in a way that I can use them mathematically?
- How similar are two words?
- Can I use the representation of a word in a model?

Naive approach: how similar are the strings?
- Hand-engineered rules?
- How many of each letter?

CAT = [10100000000000000001000000]
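A minimal sketch of this naive letter-count vector in Python (the helper name is mine, not from the talk):

```python
from collections import Counter
import string

def letter_count_vector(word: str) -> list[int]:
    """Count of each letter a-z in the word, ignoring case."""
    counts = Counter(word.lower())
    return [counts.get(letter, 0) for letter in string.ascii_lowercase]

# 1s in the slots for 'a', 'c', and 't' -- matching the CAT vector above
print("".join(str(n) for n in letter_count_vector("CAT")))
```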

Page 8: Motivating example: Word Embeddings

Can we use documents to encode words?
- Frequency matrix: how often each word appears in each document
- Weighted term frequency (TF-IDF) (see the example below)
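To make the frequency-matrix idea concrete, a sketch using scikit-learn's TfidfVectorizer (one common implementation; the talk doesn't prescribe a library, and the documents are toy examples):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# Rows are documents, columns are vocabulary terms; each entry is a
# term frequency down-weighted by how common the term is across docs.
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())
print(tfidf_matrix.toarray().round(2))
```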

Pages 9-10: Motivating example: Word Embeddings

Word order probably matters too: words that occur together have similar contexts.
- "Tylenol is a pain reliever" and "Paracetamol is a pain reliever" share the same context
- Co-occurrence: how often do two words appear in the same context window?
- Context window: a specific number of neighboring words in a given direction

Example sentences: "He is not lazy. He is intelligent. He is smart."
[The slides show the co-occurrence matrix computed from these sentences; a sketch of the computation follows below.]
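A minimal sketch of building that co-occurrence matrix from the example sentences (the window size of 2 is an assumption; the slide doesn't state it):

```python
from collections import defaultdict

sentences = [["he", "is", "not", "lazy"],
             ["he", "is", "intelligent"],
             ["he", "is", "smart"]]
WINDOW = 2  # assumed context window: 2 words on each side

cooc = defaultdict(int)
for sent in sentences:
    for i, word in enumerate(sent):
        # every other word within WINDOW positions counts as context
        for j in range(max(0, i - WINDOW), min(len(sent), i + WINDOW + 1)):
            if i != j:
                cooc[(word, sent[j])] += 1

print(cooc[("he", "is")])  # "he" and "is" co-occur in all three sentences
```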

Page 11: Motivating example: Word Embeddings

Why not stop here?
- You need more documents to really understand context, but the more documents you have, the bigger your matrix gets
- Giant sparse matrices or vectors are cumbersome and uninformative

We need to reduce the dimensionality of our matrix.

Page 12: Motivating Example: Word Embeddings

Count-based methods: linear algebra to the rescue? (See the sketch below.)

Pros: preserves semantic relationships, accurate, known methods
Cons: huge memory requirements, not trained for a specific task
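As an illustration, a truncated SVD of a toy co-occurrence matrix; SVD is one standard count-based factorization, though the talk doesn't name a specific method:

```python
import numpy as np

# toy symmetric co-occurrence matrix (vocabulary x vocabulary)
X = np.array([[0, 3, 1, 0],
              [3, 0, 2, 1],
              [1, 2, 0, 2],
              [0, 1, 2, 0]], dtype=float)

# keep only the top-k singular directions; each row of U_k * S_k is
# now a dense k-dimensional word vector instead of a sparse row of X
k = 2
U, S, Vt = np.linalg.svd(X)
word_vectors = U[:, :k] * S[:k]
print(word_vectors.shape)  # (4, 2): four words, two dimensions each
```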

Pages 13-14: Motivating Example: Word Embeddings

Predictive methods: learn an embedding for a specific task.

Pages 15-17: Motivating Example: Word Embeddings

The SkipGram model learns a vector representation for each word that maximizes the probability of that word given its context.
- Input word: a one-hot encoded vector
- Output prediction: a probability, for each word in the corpus, that it's the next word
- The hidden layer is a weight matrix with one row per word and one column per neuron -- this is the embedding!
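In practice you would rarely hand-roll this; a sketch with gensim's Word2Vec (sg=1 selects the skip-gram variant; the corpus and hyperparameters are placeholders):

```python
from gensim.models import Word2Vec

sentences = [["tylenol", "is", "a", "pain", "reliever"],
             ["paracetamol", "is", "a", "pain", "reliever"]]

# vector_size is the embedding dimension (number of hidden-layer neurons)
model = Word2Vec(sentences, vector_size=50, window=2, sg=1,
                 min_count=1, epochs=200, seed=42)

print(model.wv["tylenol"].shape)  # (50,): the learned embedding
print(model.wv.similarity("tylenol", "paracetamol"))
```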

Page 18: (if we really want to get into the math)

The skip-gram model calculates the probability that the next word is w_t given the context h, and the model is trained by maximizing the log-likelihood over the training set.
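The slide's formulas don't survive in this transcript; the standard softmax formulation being described, where score(w, h) compares a candidate word w against the context h and V is the vocabulary, looks like this:

```latex
% Probability of the next word w_t given context h (softmax over vocabulary V)
P(w_t \mid h) = \operatorname{softmax}\bigl(\operatorname{score}(w_t, h)\bigr)
             = \frac{\exp\{\operatorname{score}(w_t, h)\}}
                    {\sum_{w' \in V} \exp\{\operatorname{score}(w', h)\}}

% Training objective: log-likelihood over the training set
J = \sum_{t} \log P(w_t \mid h_t)
```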

Page 19: Motivating Example: Word Embeddings

Word embeddings condense representations of words while preserving context.

Page 20: Cool, but what's this got to do with graphs?

Page 21: Motivating example: DeepWalk

How do we represent a node in a graph mathematically? Can we adapt word2vec?
- Each node is like a word
- The neighborhood around the node is the context window

Page 22: Motivating example: DeepWalk

Extract the context for each node by sampling random walks from the graph: for every node in the graph, take n fixed-length random walks (equivalent to sentences).

Page 23: Motivating example: DeepWalk

Once we have our sentences, we can extract the context windows and learn weights using the same skip-gram model. (The objective is to predict neighboring nodes given the target node; see the sketch below.)
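A minimal DeepWalk-style sketch: sample fixed-length uniform random walks, then train the same skip-gram model on them (networkx, gensim, and all parameter values are my choices, not the talk's):

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(graph, walks_per_node=10, walk_length=8, seed=42):
    """For every node, take n fixed-length uniform random walks."""
    rng = random.Random(seed)
    walks = []
    for node in graph.nodes():
        for _ in range(walks_per_node):
            walk = [node]
            while len(walk) < walk_length:
                neighbors = list(graph.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append([str(n) for n in walk])  # nodes act as "words"
    return walks

G = nx.karate_club_graph()
model = Word2Vec(random_walks(G), vector_size=32, window=3, sg=1, min_count=1)
print(model.wv["0"].shape)  # 32-dimensional embedding for node 0
```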

Page 24: Motivating example: DeepWalk

The embeddings are the hidden-layer weights from the skip-gram model.

Note: there are also graph equivalents of the matrix factorization and hand-engineered approaches we discussed for words.

Page 25: Graph Embeddings Overview

Page 26: There are lots of graph embeddings...

What type of graph are you trying to create an embedding for?
- Monopartite graphs (DeepWalk is designed for these)
- Multipartite graphs (e.g. knowledge graphs)

What aspect of the graph are you trying to represent?
- Vertex embeddings: describe the connectivity of each node
- Path embeddings: traversals across the graph
- Graph embeddings: encode an entire graph into a single vector

Page 27: Node embedding overview

Most techniques consist of:
- A similarity function that measures the similarity between nodes
- An encoder function that generates the node embedding
- A decoder function that reconstructs pairwise similarity
- A loss function that measures how good the reconstruction is
(A minimal sketch of this template follows below.)
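A toy sketch of that template with an embedding-lookup encoder, an inner-product decoder, and a squared-error loss (all three choices are illustrative; many similarity, decoder, and loss functions fit the same template):

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 5, 3
Z = rng.normal(size=(n_nodes, dim))  # one free embedding vector per node

def encode(node_id):
    return Z[node_id]                # shallow encoder: an embedding lookup

def decode(z_i, z_j):
    return z_i @ z_j                 # decoder: reconstruct pairwise similarity

def loss(target_similarity, i, j):
    # how far is the decoded similarity from the target (e.g. 1.0 for an edge)?
    return (decode(encode(i), encode(j)) - target_similarity) ** 2

print(loss(1.0, 0, 1))  # training would adjust Z to drive this toward zero
```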

Pages 28-31: Shallow Graph Embedding Techniques

Shallow: the encoder function is an embedding lookup.

Matrix factorization:
- These techniques all rely on an adjacency matrix as input
- Matrix factorization is applied either directly to the input or to some transformation of the input
- Drawbacks: massive memory footprint, computationally intense

Random walk:
- Obtain node co-occurrence via random walks
- Learn weights to optimize the similarity measure
- Drawbacks: local-only perspective, assumes similar nodes are close together

Page 32: Shallow Embeddings

Why not stick with these?
- Shallow embeddings are inefficient: no parameters are shared between nodes
- They can't leverage node attributes
- They only generate embeddings for nodes present when the embedding was trained, which is problematic for large, evolving graphs

Newer methodologies compress information:
- Neighborhood autoencoder methods
- Neighborhood aggregation
- Convolutional autoencoders

Page 33: Autoencoder methods

Page 34: Using Graph Embeddings

Page 35: Why are we going to all this trouble?

Visualization and pattern discovery:
- Leverage lots of existing techniques: t-SNE plots, PCA

Clustering and community detection:
- Apply generic tabular-data approaches (e.g. k-means), which lets you capture both functional and structural roles (sketched below)
- Build KNN graphs based on embedding similarity
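For instance, with scikit-learn (here `embeddings` is random stand-in data and every parameter is a placeholder):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

embeddings = np.random.rand(100, 32)  # stand-in for real node embeddings

# visualization: project the vectors to 2-D for a t-SNE plot
coords_2d = TSNE(n_components=2, perplexity=30).fit_transform(embeddings)

# community detection: generic k-means on the embedding vectors
labels = KMeans(n_clusters=5, n_init=10).fit_predict(embeddings)
print(coords_2d.shape, labels[:10])
```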

Page 36: Why are we going to all this trouble?

- Node classification / semi-supervised learning: predict missing node attributes
- Link prediction: predict edges not present in the graph, using either similarity measures/heuristics or ML pipelines (see the sketch below)

Embeddings can make the graph algorithm library even more powerful!
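A similarity-based link-prediction sketch: score a candidate edge by the cosine similarity of its endpoint embeddings (the vectors and any cutoff are illustrative):

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

embeddings = {"a": np.array([1.0, 0.2]),
              "b": np.array([0.9, 0.3]),
              "c": np.array([-0.8, 0.5])}

# rank non-existing node pairs by similarity; high scores suggest missing edges
print(f"a-b: {cosine(embeddings['a'], embeddings['b']):.2f}")
print(f"a-c: {cosine(embeddings['a'], embeddings['c']):.2f}")
```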

Page 37: Graph Embeddings in Neo4j

Page 38: Neo4j Labs Implementations

Two prototype implementations from Labs: DeepWalk and DeepGL
- DeepGL is more similar to a "hand-crafted" embedding
- It uses graph algorithms to generate features: diffusion of values across edges, then dimensionality reduction

Neither is ready for production use, but lessons were learned:
- Lots of demand
- Memory intensive and not tuned for performance
- Deep learning is not easy in Java

Python is easy to get started with for experimentation, but doesn't perform at scale.

Page 39: ...So what's next?

We're actively exploring the best ways to implement graph embeddings at scale, so please stay tuned.

Page 40: Hunger Games!

1. A graph embedding is a fixed-length vector of
   a. Numbers
   b. Letters
   c. Nodes

2. An embedding is a ______________ representation of your data
   a. Human readable
   b. Lower dimensional
   c. Binary

3. What's the name of the graph embedding we walked through in this presentation?