graph-based approaches to nlp al akhwayn summer school 2015 horacio rodríguez talp research center...

37
Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya [email protected] http://www.lsi.upc.edu/~horacio

Upload: charles-oconnor

Post on 05-Jan-2016

219 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Graph-based Approaches to NLP

Al Akhwayn Summer School 2015

Graph-based Approaches to NLP

Al Akhwayn Summer School 2015

Horacio Rodríguez

TALP Research Center Dept. Computer Science

Universitat Politècnica de [email protected]

http://www.lsi.upc.edu/~horacio

Horacio Rodríguez

TALP Research Center Dept. Computer Science

Universitat Politècnica de [email protected]

http://www.lsi.upc.edu/~horacio

Page 2: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

OutlineOutline

• Introduction• Graph basics• Main techniques• (Probabilistic) Graphical Models• Aplications

• Introduction• Graph basics• Main techniques• (Probabilistic) Graphical Models• Aplications

Page 3: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Introduction 1Introduction 1

• Using Graphs– Consolidated theory– Efficient Algorithms

• Aplications– Automatic Summarization– IR– WSD– Entities clustering– Relevance detection– sentiment classification

• Using Graphs– Consolidated theory– Efficient Algorithms

• Aplications– Automatic Summarization– IR– WSD– Entities clustering– Relevance detection– sentiment classification

Page 4: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Introduction 2Introduction 2

• What to read– Rada Mihalcea

• http://lit.csci.unt.edu/index.php/Graph-based_NLP

– TextGraph workshop HLT-NAACL 2006– TextGraph-2 workshop HLT-NAACL 2007– Thesis

• Toutanova 2005, Murphy 2002, Zhu 2005

– Tutorials• Rada Mihalcea, Dragomir Radev, Smaranda Muresan

• What to read– Rada Mihalcea

• http://lit.csci.unt.edu/index.php/Graph-based_NLP

– TextGraph workshop HLT-NAACL 2006– TextGraph-2 workshop HLT-NAACL 2007– Thesis

• Toutanova 2005, Murphy 2002, Zhu 2005

– Tutorials• Rada Mihalcea, Dragomir Radev, Smaranda Muresan

Page 5: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Graphs Basics 1Graphs Basics 1

• Graph– G=<V,E>

• V set of vertices (nodes)• E set of arcs (edges)

– An edge is a pair of vertices

– digraphs• Edges are ordered pairs of vertices

– Source, target

– Undirected graphs– (u,v) E, u, v V, u and v are adjacent– (u,v) E is incident in u and v– Degree of a vertex

• Number of incident edges• For Digraphs

– in-degree, out-degree

• Graph– G=<V,E>

• V set of vertices (nodes)• E set of arcs (edges)

– An edge is a pair of vertices

– digraphs• Edges are ordered pairs of vertices

– Source, target

– Undirected graphs– (u,v) E, u, v V, u and v are adjacent– (u,v) E is incident in u and v– Degree of a vertex

• Number of incident edges• For Digraphs

– in-degree, out-degree

Page 6: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Graphs Basics 2Graphs Basics 2

• Paths in graphs– Simple path– cycle

• Connected (sub) graphs• Reduced graphs• Articulation nodes

• Paths in graphs– Simple path– cycle

• Connected (sub) graphs• Reduced graphs• Articulation nodes

Page 7: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Graphs Basics 3Graphs Basics 3

• Subgraph• Tree• DAG• Complete Graph

– All nodes are adjacent

• Clique• Bipartite Graph• Hipergraph

• Subgraph• Tree• DAG• Complete Graph

– All nodes are adjacent

• Clique• Bipartite Graph• Hipergraph

Page 8: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Graphs Basics 4Graphs Basics 4

• Labeled Graphs– In nodes– In edges

• Weighted Graphs• Excentricity of a node of a graph• Center of a graph

• Labeled Graphs– In nodes– In edges

• Weighted Graphs• Excentricity of a node of a graph• Center of a graph

Page 9: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Graphs Basics 5Graphs Basics 5

• Spanning Trees• Minimum Cost Spanning Trees• Representation of graph

– Adjacency matrix– Adjacency lists– pros and cons– Sparse graphs

• Spanning Trees• Minimum Cost Spanning Trees• Representation of graph

– Adjacency matrix– Adjacency lists– pros and cons– Sparse graphs

Page 10: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Graphs Basics 6Graphs Basics 6

• Algorithms over graphs– Dijkstra

• For a weighted digraph mínimum cost path from a node to the others

• Path length• Path cost• Dynamic programming• O(n2)

– All-pairs shortest path problem• iterate Dijkstra for all the nodes,• Floyd• O(n3)

– Transitive Clausure • Warshall• O(n3)

• Algorithms over graphs– Dijkstra

• For a weighted digraph mínimum cost path from a node to the others

• Path length• Path cost• Dynamic programming• O(n2)

– All-pairs shortest path problem• iterate Dijkstra for all the nodes,• Floyd• O(n3)

– Transitive Clausure • Warshall• O(n3)

Page 11: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Graphs Basics 7Graphs Basics 7

• Algorithms over graphs– Center

• using Floyd

– Graph traversing• Depth First Search (DFS)• Breadth First Search (BFS)

– Path finding• Find a path between two nodes• using DFS or BFS

– Minimum cost Spanning Trees• Prim• Kruskal• O(n2)

• Algorithms over graphs– Center

• using Floyd

– Graph traversing• Depth First Search (DFS)• Breadth First Search (BFS)

– Path finding• Find a path between two nodes• using DFS or BFS

– Minimum cost Spanning Trees• Prim• Kruskal• O(n2)

Page 12: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Graphs Basics 8Graphs Basics 8

• Algorithms over graphs– Graph matching

• For a graph G = (V,E), a matching is un subset of E not containing two arcs incident to the same node

• maximal matching problem• complete matching problem• Interesting for bipartit1 graphs

• Algorithms over graphs– Graph matching

• For a graph G = (V,E), a matching is un subset of E not containing two arcs incident to the same node

• maximal matching problem• complete matching problem• Interesting for bipartit1 graphs

Page 13: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Graphs Basics 9Graphs Basics 9

– Spectral theorem• Conditions for a matrix to be diagonalizable• Spectral Decomposition of A

– Diagonalization • if A is normal (and thus hermitic and thus if real simetric)

then a decomposition exists:– A = U U*

is diagonal, being its entries the eigenvalues of A– U is unitary being its columns the eigenvectors of A

– Spectral theorem• Conditions for a matrix to be diagonalizable• Spectral Decomposition of A

– Diagonalization • if A is normal (and thus hermitic and thus if real simetric)

then a decomposition exists:– A = U U*

is diagonal, being its entries the eigenvalues of A– U is unitary being its columns the eigenvectors of A

Page 14: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Graphs Basics 10Graphs Basics 10

– Matric decompositions• SVD

– Aplication to dimensionality reduction• Principal Components Analysis

– Matric decompositions• SVD

– Aplication to dimensionality reduction• Principal Components Analysis

Page 15: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Specific techniques 1Specific techniques 1

• Graph-based semi-supervised learning– Zhu, 2005– Goldberg, Zhu, 2006

– L instances labeled x1, ..., xl

– U unlabeled xl+1, ..., xl+u

– C classes = {c1, ..., cC}

– f(x) rating of x, f: x R– classification: mapping f(x) to the nearest discrete rating

in C

• Graph-based semi-supervised learning– Zhu, 2005– Goldberg, Zhu, 2006

– L instances labeled x1, ..., xl

– U unlabeled xl+1, ..., xl+u

– C classes = {c1, ..., cC}

– f(x) rating of x, f: x R– classification: mapping f(x) to the nearest discrete rating

in C

+ - + -

Page 16: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Specific techniques 2Specific techniques 2

• Graph-based semi-supervised learning– f(x) should be smooth with respect to the graph

– (un)smootheness of an edge (xi,xj) = w((f(xi) - f(xj))2)

– summing for all edges L(f) (un)smootheness of the graph– L(f) energy or loss– El problema de optimización es la f que minimice L(f)

• Graph-based semi-supervised learning– f(x) should be smooth with respect to the graph

– (un)smootheness of an edge (xi,xj) = w((f(xi) - f(xj))2)

– summing for all edges L(f) (un)smootheness of the graph– L(f) energy or loss– El problema de optimización es la f que minimice L(f)

Page 17: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Specific techniques 3

• Random Walk Algorithms– Decide the importance of a vertex within a graph– A link between two vertices = a vote

• Iterative voting => Ranking over all vertices

– Model a random walk on the graph• A walker takes random steps• Converges to a stationary distribution of probabilities• Guaranteed to converge if the graph is

– Aperiodic – holds for non-bipartite graphs – Irreducible – holds for strongly connected graphs

– Examples• PageRank (Brin, Page, 1998)• HITS (Kleinberg, 1999)

Page 18: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Specific techniques 4

• HITS (Hyperlinked Induced Topic Search)– Keinberg, 1999– Defines 2 types of importance:

•Authority– Can be thought of as being an expert on the subject– “You are important if important hubs point to you”

•Hub– Can be thought of as being knowledgeable about the subject– “You are important if important authorities point to you”

– Hubs and Authorities mutually reinforce each other– random walk where:

•Odd steps you walk from hubs to authorities (follow outlink)•Even steps you walk from authorities to hubs (follow inlink)

Page 19: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Specific techniques 5

• PageRank– Brin, Page, 1998– Original Google algorithm – Random walks over the Web following hyperlinks– some probability of jumping randomly out of the page– The fraction of time spent in each page is the PageRank– PageRank forms a probability distribution over the Web– PageRank is the principal eigenvector of the normalized

link matrix of the Web

Page 20: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Specific techniques 6

• Max Flow networks– G=(V,E,s,t,u)– s V, source– t V, sink– u(e) capacity of e

s

2

3

4

5

6

7

t

15

5

30

15

10

8

15

9

6 10

10

10 15 4

4

Page 21: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

(Probabilistic) Graphical Models 1(Probabilistic) Graphical Models 1

• Graphs whose nodes represent random variables and the absence of an edge conditional independence.

• GM– undirected (Markov Networks, MN, Marlov Random Fields,

MRF)– directed (Bayesian Networks, BN).

• Survey en la tesis de K.P. Murphy, 2002• Daphne Koller book• Daphne Koller Coursera

• Graphs whose nodes represent random variables and the absence of an edge conditional independence.

• GM– undirected (Markov Networks, MN, Marlov Random Fields,

MRF)– directed (Bayesian Networks, BN).

• Survey en la tesis de K.P. Murphy, 2002• Daphne Koller book• Daphne Koller Coursera

Page 22: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

(Probabilistic) Graphical Models 2(Probabilistic) Graphical Models 2

• Undirected GM– global Markov property

• XA XB | Xc

• A, B, C sets of nodes• A is conditionally independent of B given C• all paths between nodes in A and nodes in B are blocked by

some node in C

– local Markov property• Xi is independent of all other nodes in the graph given its

neighbours

• Undirected GM– global Markov property

• XA XB | Xc

• A, B, C sets of nodes• A is conditionally independent of B given C• all paths between nodes in A and nodes in B are blocked by

some node in C

– local Markov property• Xi is independent of all other nodes in the graph given its

neighbours

Page 23: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

(Probabilistic) Graphical Models 3(Probabilistic) Graphical Models 3

• Joint distribution of a MRF– C is the set of maximal cliques C(XC) potential function

– Z normalization factor

• Joint distribution of a MRF– C is the set of maximal cliques C(XC) potential function

– Z normalization factor

Page 24: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

(Probabilistic) Graphical Models 4(Probabilistic) Graphical Models 4

• Example• Example

Page 25: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

(Probabilistic) Graphical Models 5(Probabilistic) Graphical Models 5

• Directed GM (BN)– Not cycles (DAG)– edge (A,B) means that A causes B– X is conditionally independent of its

non descendents given its paprents– X is conditionally independent of all

other nodes given its neighbours

• Directed GM (BN)– Not cycles (DAG)– edge (A,B) means that A causes B– X is conditionally independent of its

non descendents given its paprents– X is conditionally independent of all

other nodes given its neighbours

Page 26: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

(Probabilistic) Graphical Models 6(Probabilistic) Graphical Models 6

• Factor Graphs– Unify directed and undirected GM– Bipartite

• Factors• variables

– A MRF or a BN can be converted to a FG and the reverse

• Factor Graphs– Unify directed and undirected GM– Bipartite

• Factors• variables

– A MRF or a BN can be converted to a FG and the reverse

Page 27: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Applications 1Applications 1

• Applications– Summarization– Name disambiguation– IE– Novelty detection– Word dependency distributions– Textual Entailment– QA– Sentiment categorization– WSD– Substructure learning

• Applications– Summarization– Name disambiguation– IE– Novelty detection– Word dependency distributions– Textual Entailment– QA– Sentiment categorization– WSD– Substructure learning

Page 28: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Applications 2Applications 2

• Summarization Approaches– Salton et al, 1999– LexRank (Radev, Erkan, 2004)– TextRank (Mihalcea, Tarau, 2004)– Zha, 2002

• Summarization Approaches– Salton et al, 1999– LexRank (Radev, Erkan, 2004)– TextRank (Mihalcea, Tarau, 2004)– Zha, 2002

Page 29: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Applications 3Applications 3

• Name disambiguation• Aplication to Email

– Minkov, Cohen, Ng, 2006– Representing a mail with graphs– Using lazy graph walk as similarity measure (similar to

PageRank)

• Name disambiguation• Aplication to Email

– Minkov, Cohen, Ng, 2006– Representing a mail with graphs– Using lazy graph walk as similarity measure (similar to

PageRank)

Page 30: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Applications 4Applications 4

• Semi-Supervised Approach for IE– Hassan et al, 2006– Representing in the graph bot labeled and unlabeled

examples.– Pattern induction– Reinforcement of weights– Based on HITS

• Semi-Supervised Approach for IE– Hassan et al, 2006– Representing in the graph bot labeled and unlabeled

examples.– Pattern induction– Reinforcement of weights– Based on HITS

Page 31: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Applications 5Applications 5

• IE– “American vice President Al Gore visited ...”.– Template

• COUNTRY NOUN_PHRASE PERSON VERB_PHRASE– Pattern

• COUNTRY(E2) NOUN_PHRASE(R) PERSON(E1) VERB_PHRASE– Tuple

• application of a pattern to a text– entity 1: Al Gore– entity 2: United States– relation: vice President

• IE– “American vice President Al Gore visited ...”.– Template

• COUNTRY NOUN_PHRASE PERSON VERB_PHRASE– Pattern

• COUNTRY(E2) NOUN_PHRASE(R) PERSON(E1) VERB_PHRASE– Tuple

• application of a pattern to a text– entity 1: Al Gore– entity 2: United States– relation: vice President

Page 32: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Applications 6Applications 6

• Novelty Detection• Gamon, 2006• TREC 2004 novelty track

– Given the relevant sentences in the complete document set (for a given topic), identify all novel sentences

• supervised classification task• Baseline

– BOW with KL divergence• d document• R document set

• Novelty Detection• Gamon, 2006• TREC 2004 novelty track

– Given the relevant sentences in the complete document set (for a given topic), identify all novel sentences

• supervised classification task• Baseline

– BOW with KL divergence• d document• R document set

Page 33: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Applications 7Applications 7

• Distributions of word/word dependencies– ej. POBJ(n|v)

• Toutanova, 2005 (tesis)• PP attachment• Learning a Markov Chain, MC, (random walk)

model– Nodes = words

• Distributions of word/word dependencies– ej. POBJ(n|v)

• Toutanova, 2005 (tesis)• PP attachment• Learning a Markov Chain, MC, (random walk)

model– Nodes = words

Page 34: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Applications 8Applications 8

• RTE• Zanzotto, Moschitti, 2006

– Textual entailment pairs as pairs of syntactic trees with co-indexed nodes

– consider both the structural similarity between syntactic tree pairs and the similarity between relations among sentences within a pair

– similarities• cross-pair

– K((T',H'), (T'',H''))– structural and lexical similarity between T', T'' and H',

H''– intra-pair word movement compatibility between

(T',H') and (T'',H'')• intra-pair

– novel kernel function

• RTE• Zanzotto, Moschitti, 2006

– Textual entailment pairs as pairs of syntactic trees with co-indexed nodes

– consider both the structural similarity between syntactic tree pairs and the similarity between relations among sentences within a pair

– similarities• cross-pair

– K((T',H'), (T'',H''))– structural and lexical similarity between T', T'' and H',

H''– intra-pair word movement compatibility between

(T',H') and (T'',H'')• intra-pair

– novel kernel function

Page 35: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Applications 9Applications 9

• QA• Diego Mollá, 2006

– graph-based representation of question and text sentences

– QA rule• pattern matching Q• pattern matching A sentence• pointer to A in A sentence• rules define an extension graph

– pattern• graph with (some) nodes assigned to variables

• QA• Diego Mollá, 2006

– graph-based representation of question and text sentences

– QA rule• pattern matching Q• pattern matching A sentence• pointer to A in A sentence• rules define an extension graph

– pattern• graph with (some) nodes assigned to variables

Page 36: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Applications 10

• Substructure Learning– Leskovec at al, 2005 – Semantic Graphs for representing documents– SVM for learning sub-structures– Approach

• Linguistic Analysis– NLPWin– Portrait– Logical Form Triples (Subj, Pred, Obj)

• Semantic Graphs building– 3 sets of linguistic features

» (ANV) adjectives, nouns, verbs» (NPV) head nouns and verbs» (LF) heads of LFT

• Two different graph representations for each of these three sets.

– NamedLink– Nodes

Page 37: Graph-based Approaches to NLP Al Akhwayn Summer School 2015 Horacio Rodríguez TALP Research Center Dept. Computer Science Universitat Politècnica de Catalunya

Applications 11