natural language processing with graph databases and neo4j

Natural Language Processing With Graph Databases DataDay Texas January 2016 William Lyon @lyonwj

Upload: william-lyon

Post on 08-Jan-2017

TRANSCRIPT

Page 1: Natural Language Processing with Graph Databases and Neo4j

Natural Language Processing With Graph Databases
DataDay Texas
January 2016

William Lyon
@lyonwj

Page 2: Natural Language Processing with Graph Databases and Neo4j

About

Software Developer @[email protected]

@lyonwj
lyonwj.com

William Lyon

Page 3: Natural Language Processing with Graph Databases and Neo4j

Agenda

• Brief intro to graph databases / Neo4j
• Representing text as a graph
• NLP tasks
• Mining word associations
• Graph based summarization and keyword extraction
• Content recommendation

Page 4: Natural Language Processing with Graph Databases and Neo4j

Agenda

• Brief intro to graph databases / Neo4j
• Representing text as a graph
• NLP tasks
• Mining word associations
• Graph based summarization and keyword extraction
• Content recommendation

Survey of NLP methods with graphs

Page 5: Natural Language Processing with Graph Databases and Neo4j

Intro to Graph Databases / Neo4j

Page 6: Natural Language Processing with Graph Databases and Neo4j

Charts

Page 7: Natural Language Processing with Graph Databases and Neo4j

Charts Graphs

Page 8: Natural Language Processing with Graph Databases and Neo4j

Neo4j

Graph Database

• Property graph data model
• Nodes and relationships
• Native graph processing
• Cypher query language

Page 9: Natural Language Processing with Graph Databases and Neo4j

The Whiteboard Model Is the Physical Model

Page 10: Natural Language Processing with Graph Databases and Neo4j

Relational Versus Graph Models

[Figure: the same data shown two ways. Relational Model: Person, Person-Friend and Friend tables. Graph Model: nodes for Andreas, Tobias, Mica and Delia connected by KNOWS relationships.]

Page 11: Natural Language Processing with Graph Databases and Neo4j

Property Graph Model Components

Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled

Relationships
• Relate nodes by type and direction
• Can have name-value properties

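[Figure: example property graph. Two PERSON nodes (name: "Dan", born: May 29, 1970, twitter: "@dan"; name: "Ann", born: Dec 5, 1975) and a CAR node (brand: "Volvo", model: "V70"), connected by LOVES, LOVES, LIVES WITH, OWNS and DRIVES relationships. One relationship carries the property since: Jan 10, 2011.]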
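The components above can be sketched in a few lines of Python. This is only an in-memory illustration of the property graph model, not how Neo4j stores data. The class names `Node` and `Relationship`, the helper `outgoing`, and the choice of who drives or owns the car and which relationship carries `since` are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                        # e.g. "Person"
    properties: dict = field(default_factory=dict)    # name-value pairs


@dataclass
class Relationship:
    start: Node                                       # direction: start -> end
    rel_type: str                                     # e.g. "LOVES"
    end: Node
    properties: dict = field(default_factory=dict)    # relationships can have properties too


dan = Node("Person", {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node("Person", {"name": "Ann", "born": "Dec 5, 1975"})
car = Node("Car", {"brand": "Volvo", "model": "V70"})

# Assumed wiring, for illustration only.
relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
    Relationship(ann, "OWNS", car),
]


def outgoing(node, rel_type):
    """Follow relationships of one type that start at the given node."""
    return [r.end for r in relationships if r.start is node and r.rel_type == rel_type]
```

Calling `outgoing(dan, "LOVES")` returns the node for Ann, which is the kind of traversal a graph query expresses declaratively.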

Page 12: Natural Language Processing with Graph Databases and Neo4j

Cypher: Graph Query Language

CREATE (:Person { name: "Dan" })-[:LOVES]->(:Person { name: "Ann" })

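[Figure: the query annotated. Each (:Person { name: ... }) pattern is a NODE with a LABEL (Person) and a PROPERTY (name); the two nodes, Dan and Ann, are connected by a LOVES relationship.]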

Page 13: Natural Language Processing with Graph Databases and Neo4j

“So what does this have to do with NLP?”

“Am I in the wrong talk?”

“I thought this was going to be about text processing….”

Page 14: Natural Language Processing with Graph Databases and Neo4j

Natural Language Processing With Graphs

Page 15: Natural Language Processing with Graph Databases and Neo4j

Natural Language Processing With Graphs

Uncovering meaning from text using a graph data model.

Page 16: Natural Language Processing with Graph Databases and Neo4j

Representing Text As A Graph

“Nearly all text processing starts by transforming text into vectors.”

- Matt Biddulph www.hackdiary.com

Page 17: Natural Language Processing with Graph Databases and Neo4j

Representing text as a graph

Text Adjacency Graph

Page 18: Natural Language Processing with Graph Databases and Neo4j

Representing text as a graph

Text Adjacency Graph

Page 19: Natural Language Processing with Graph Databases and Neo4j

My cat eats fish on Saturday.

Page 20: Natural Language Processing with Graph Databases and Neo4j
Page 21: Natural Language Processing with Graph Databases and Neo4j
Page 22: Natural Language Processing with Graph Databases and Neo4j

Convert to array of words

Page 23: Natural Language Processing with Graph Databases and Neo4j

Iterate with counter variable i, from 0 to number of words - 2

Page 24: Natural Language Processing with Graph Databases and Neo4j

Get or create node for words at index i and i+1

Page 25: Natural Language Processing with Graph Databases and Neo4j

Create :NEXT relationship
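The four steps on the preceding slides can be sketched in plain Python, without a database. The function name `text_adjacency_edges`, the lowercasing and the regex tokenizer are assumptions for illustration. The slides' own code is not in the transcript.

```python
import re


def text_adjacency_edges(text):
    """Build a text adjacency graph as (nodes, edges) from one piece of text."""
    # Step 1: convert to an array of words (lowercased, punctuation dropped).
    words = re.findall(r"\w+", text.lower())

    nodes = {}   # word -> node id; one node per distinct word
    edges = []   # (word, next_word) pairs, one per :NEXT relationship

    # Step 2: iterate with counter i from 0 to number of words - 2.
    for i in range(len(words) - 1):
        # Step 3: get or create the nodes for the words at index i and i+1.
        for word in (words[i], words[i + 1]):
            nodes.setdefault(word, len(nodes))
        # Step 4: create the :NEXT relationship between them.
        edges.append((words[i], words[i + 1]))

    return nodes, edges


nodes, edges = text_adjacency_edges("My cat eats fish on Saturday.")
```

For the example sentence this yields six word nodes and five NEXT edges, from ("my", "cat") through ("on", "saturday"). In Neo4j the get-or-create step is what Cypher's MERGE clause is typically used for.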

Page 26: Natural Language Processing with Graph Databases and Neo4j

Representing A Text Corpus As A Graph

Page 27: Natural Language Processing with Graph Databases and Neo4j
Page 28: Natural Language Processing with Graph Databases and Neo4j

Add followship frequency

Page 29: Natural Language Processing with Graph Databases and Neo4j

Add word counts
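Extending the adjacency graph to a corpus mostly means keeping two counters. A minimal sketch, assuming word counts live on the word nodes and followship frequencies live on the NEXT relationships; the function name `build_corpus_graph` and the second example sentence are hypothetical.

```python
import re
from collections import Counter


def build_corpus_graph(sentences):
    """Return (word_counts, next_counts) for a list of sentences."""
    word_counts = Counter()   # node property: how often each word occurs
    next_counts = Counter()   # relationship property: how often word b follows word a
    for sentence in sentences:
        words = re.findall(r"\w+", sentence.lower())
        word_counts.update(words)
        # Pair every word with its successor; pairs never cross sentence boundaries.
        next_counts.update(zip(words, words[1:]))
    return word_counts, next_counts


corpus = ["My cat eats fish on Saturday.", "My dog eats turkey on Tuesday."]
word_counts, next_counts = build_corpus_graph(corpus)
```

Words shared by both sentences ("my", "eats", "on") end up with a count of 2, while each NEXT pair in this tiny corpus is seen once.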

Page 30: Natural Language Processing with Graph Databases and Neo4j

Query Word frequency

Page 31: Natural Language Processing with Graph Databases and Neo4j

Query Word pair frequencies (collocation)
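Once counts are stored, both queries reduce to sorting by a count. The sketch below uses hand-written, hypothetical counts in place of a database; `top_words`, `top_pairs` and `follow_probability` are illustrative names, not part of any Neo4j API.

```python
from collections import Counter

# Hypothetical counts, as they might be stored on word nodes and NEXT relationships.
word_counts = Counter({"the": 5, "weather": 3, "cold": 2, "cat": 1})
next_counts = Counter({("cold", "weather"): 2, ("the", "weather"): 1, ("the", "cat"): 1})


def top_words(n):
    """Word frequency: the n most frequent words."""
    return word_counts.most_common(n)


def top_pairs(n):
    """Word pair frequency: the n most frequent adjacent word pairs."""
    return [(f"{a} {b}", count) for (a, b), count in next_counts.most_common(n)]


def follow_probability(a, b):
    """Share of occurrences of word a that are immediately followed by word b."""
    return next_counts[(a, b)] / word_counts[a]
```

With these counts, "cold weather" is the most frequent pair, and every occurrence of "cold" is followed by "weather".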

Page 32: Natural Language Processing with Graph Databases and Neo4j

NLP Tasks

Page 33: Natural Language Processing with Graph Databases and Neo4j

Mining Word Associations

Page 34: Natural Language Processing with Graph Databases and Neo4j

Word Associations

• Paradigmatic
  • words that can be substituted
  • “Monday” <-> “Thursday”
  • “cat” <-> “dog”

• Syntagmatic
  • words that can be combined with each other
  • “cold”, “weather”
  • collocations

Page 35: Natural Language Processing with Graph Databases and Neo4j

Computing Paradigmatic Similarity

1. Represent each word by its context
2. Compute context similarity
3. Words with high context similarity likely have a paradigmatic relation
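The three steps can be sketched as follows. This assumes a word's context is the set of words seen directly to its left (Left1) and right (Right1), and that context similarity is the Jaccard index of those sets, summed over both sides. Both choices, and all function names, are assumptions for illustration.

```python
import re
from collections import defaultdict


def contexts(sentences):
    """Step 1: represent each word by its Left1 and Right1 neighbor sets."""
    left1, right1 = defaultdict(set), defaultdict(set)
    for sentence in sentences:
        words = re.findall(r"\w+", sentence.lower())
        for a, b in zip(words, words[1:]):
            right1[a].add(b)
            left1[b].add(a)
    return left1, right1


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(w1, w2, left1, right1):
    """Step 2: context similarity as Left1 Jaccard plus Right1 Jaccard."""
    return (jaccard(left1.get(w1, set()), left1.get(w2, set()))
            + jaccard(right1.get(w1, set()), right1.get(w2, set())))


def most_similar(word, left1, right1):
    """Step 3: rank all other words by context similarity to the given word."""
    vocabulary = (set(left1) | set(right1)) - {word}
    scored = [(other, similarity(word, other, left1, right1)) for other in vocabulary]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


left1, right1 = contexts(["My cat eats fish on Saturday.", "My dog eats turkey on Tuesday."])
```

In this two-sentence corpus "cat" and "dog" share both their left neighbor ("my") and right neighbor ("eats"), so they score the maximum of 2.0, while "cat" and "fish" share nothing and score 0.0.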

Page 36: Natural Language Processing with Graph Databases and Neo4j

Paradigmatic Similarity

1. Represent each word by its context

Page 37: Natural Language Processing with Graph Databases and Neo4j

Paradigmatic Similarity

1. Represent each word by its context

Page 38: Natural Language Processing with Graph Databases and Neo4j

Paradigmatic Similarity

1. Represent each word by its context

Left1 Right1

Page 39: Natural Language Processing with Graph Databases and Neo4j

Paradigmatic Similarity

2. Compute context similarity

Page 40: Natural Language Processing with Graph Databases and Neo4j

Paradigmatic Similarity

2. Compute context similarity

Page 41: Natural Language Processing with Graph Databases and Neo4j

Paradigmatic Similarity

2. Compute context similarity

www.lyonwj.com/2015/06/16/nlp-with-neo4j/

Page 42: Natural Language Processing with Graph Databases and Neo4j

Paradigmatic Similarity

3. Find words with high context similarity

http://earthlab.uoi.gr/theste/index.php/theste/article/viewFile/55/37

CEEAUS corpus

Page 43: Natural Language Processing with Graph Databases and Neo4j

Paradigmatic Similarity

Example

http://www.lyonwj.com/2015/06/16/nlp-with-neo4j/

https://github.com/johnymontana/nlp-graph-notebooks

https://class.coursera.org/textanalytics-001

Page 44: Natural Language Processing with Graph Databases and Neo4j

Graph Based Summarization and Keyword Extraction

Page 45: Natural Language Processing with Graph Databases and Neo4j
Page 46: Natural Language Processing with Graph Databases and Neo4j

image credit: https://en.wikipedia.org/wiki/PageRank

https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf

https://github.com/summanlp/textrank

Keyword Extraction
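A rough sketch of the TextRank idea for keywords: link words that co-occur within a small window, run PageRank over that undirected graph, and keep the top-ranked words. This is a simplified illustration written from the unweighted PageRank formula, not the code of the linked textrank library. It skips the part-of-speech filtering and phrase merging that the paper describes, and the parameter defaults are assumptions.

```python
import re
from collections import defaultdict


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=3):
    """Rank words by PageRank over an undirected co-occurrence graph."""
    words = re.findall(r"\w+", text.lower())

    # Link each word to the words that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Power iteration: each word shares its score equally among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    return sorted(scores, key=scores.get, reverse=True)[:top_n]


text = "graph databases store graph data and graph queries traverse graph relationships"
keywords = textrank_keywords(text, top_n=1)
```

In the example text "graph" is adjacent to seven different words, far more than any other word, so it receives the highest score.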

Page 47: Natural Language Processing with Graph Databases and Neo4j

Summarization
Opinion mining

Page 48: Natural Language Processing with Graph Databases and Neo4j

• Opinion mining
• Summarize major opinions
• Concise and readable
• Major complaints / compliments

Page 49: Natural Language Processing with Graph Databases and Neo4j

http://kavita-ganesan.com/opinosis

1. Graph based representation of review corpus

2. Find and score candidate summaries

3. Select top scoring candidates as summary

Page 50: Natural Language Processing with Graph Databases and Neo4j

Opinion Mining - Example

• Best Buy API
• Product reviews by SKU

Page 51: Natural Language Processing with Graph Databases and Neo4j

Opinion Mining - Example

Page 52: Natural Language Processing with Graph Databases and Neo4j

Opinion Mining - Example

Page 53: Natural Language Processing with Graph Databases and Neo4j

Opinion Mining - Example

1. Graph based representation of review corpus

2. Find and score candidate summaries

3. Select top scoring candidates as summary

Page 54: Natural Language Processing with Graph Databases and Neo4j

Opinion Mining - Example

Find highest ranked paths of 2-5 words
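A small sketch of ranking paths of 2-5 words over a review word graph. The scoring here is a deliberately simple stand-in and not the Opinosis scoring function: each path is scored by its weakest NEXT count (how often its least common word pair was seen), with longer paths preferred on ties. The function name and the example reviews are hypothetical.

```python
import re
from collections import Counter, defaultdict


def rank_paths(reviews, min_len=2, max_len=5):
    """Return (weakest_count, length, phrase) tuples, best candidates first."""
    # Graph based representation: count how often each word follows another.
    next_counts = Counter()
    for review in reviews:
        words = re.findall(r"\w+", review.lower())
        next_counts.update(zip(words, words[1:]))

    successors = defaultdict(list)
    for a, b in next_counts:
        successors[a].append(b)

    candidates = []

    def extend(path):
        # Record every path within the length bounds as a candidate summary.
        if len(path) >= min_len:
            weakest = min(next_counts[pair] for pair in zip(path, path[1:]))
            candidates.append((weakest, len(path), " ".join(path)))
        if len(path) < max_len:
            for nxt in successors.get(path[-1], []):
                if nxt not in path:          # avoid cycles
                    extend(path + [nxt])

    for start in list(successors):
        extend([start])

    return sorted(candidates, reverse=True)


reviews = ["easy to read in sunlight", "screen is easy to read", "easy to read outdoors"]
ranked = rank_paths(reviews)
```

For these three reviews the top candidate is "easy to read", whose two word pairs each appear in all three reviews.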

Page 55: Natural Language Processing with Graph Databases and Neo4j

Opinion Mining - Demo

“Easy to read in sunlight”

“Comfortable great sound quality”

“I love this washer”

Page 56: Natural Language Processing with Graph Databases and Neo4j

Opinion Mining - Demo

“Bought this smart TV for the price”

“Easy to use this vacuum”

Page 57: Natural Language Processing with Graph Databases and Neo4j

Opinion Mining - Demo

• iPython notebook

https://github.com/johnymontana/nlp-graph-notebooks

Page 58: Natural Language Processing with Graph Databases and Neo4j

Content Recommendation

Page 59: Natural Language Processing with Graph Databases and Neo4j

Content recommendation

“Networks give structure to the conversation while content mining gives meaning.”

http://breakthroughanalysis.com/2015/10/08/ltapreriitsouda/

- Preriit Souda

Page 60: Natural Language Processing with Graph Databases and Neo4j

Using Data Relationships for Recommendations

Content-based filtering: Recommend items based on what users have liked in the past

Collaborative filtering: Predict what users like based on the similarity of their behaviors, activities and preferences to others

[Figure: two Person nodes and a Movie node, with a RATED relationship (rating: 7) and a SIMILARITY relationship (value: .92).]

Page 61: Natural Language Processing with Graph Databases and Neo4j

Using Data Relationships for Recommendations

Content-based filtering: Recommend items based on what users have liked in the past

[Figure: two Person nodes and a Movie node, with a RATED relationship (rating: 7) and a SIMILARITY relationship (value: .92).]

Page 62: Natural Language Processing with Graph Databases and Neo4j

The article graph - data model

Page 63: Natural Language Processing with Graph Databases and Neo4j

Building the article graph
• Articles users have shared
• Extract keywords using newspaper3k Python library
• Insert in the graph
• Scrape additional articles

https://github.com/johnymontana/nlp-graph-notebooks

Page 64: Natural Language Processing with Graph Databases and Neo4j

The article graph - example

Page 65: Natural Language Processing with Graph Databases and Neo4j

What are the keywords of the articles I liked?
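The question on this slide, and the content-based recommendation that follows from it, can be sketched with two dictionaries standing in for the article graph. The data model (users linked to liked articles, articles linked to keywords), the article ids and the keyword sets are all hypothetical; the slide's actual model is not in the transcript.

```python
from collections import Counter

# Hypothetical graph: user -> liked articles, article -> extracted keywords.
liked = {"will": ["a1", "a2"]}
keywords = {
    "a1": {"neo4j", "graph", "nlp"},
    "a2": {"graph", "python"},
    "a3": {"graph", "nlp", "summarization"},
    "a4": {"cooking", "pasta"},
}


def liked_keywords(user):
    """Count how often each keyword appears across the articles a user liked."""
    counts = Counter()
    for article in liked.get(user, []):
        counts.update(keywords[article])
    return counts


def recommend(user):
    """Rank unseen articles by how strongly their keywords overlap the user's profile."""
    profile = liked_keywords(user)
    seen = set(liked.get(user, []))
    scores = {
        article: sum(profile[k] for k in kws)
        for article, kws in keywords.items()
        if article not in seen
    }
    return sorted((a for a in scores if scores[a] > 0), key=scores.get, reverse=True)
```

Here "graph" appears in both liked articles, so the unseen article that mentions "graph" and "nlp" is recommended and the cooking article is not.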

Page 66: Natural Language Processing with Graph Databases and Neo4j
Page 67: Natural Language Processing with Graph Databases and Neo4j

Summary

• Property graph model
• Represent text as a graph
• Word associations
• Opinion mining
• Content recommendation

Page 68: Natural Language Processing with Graph Databases and Neo4j

Resources

Page 69: Natural Language Processing with Graph Databases and Neo4j

graphdatabases.com

Page 70: Natural Language Processing with Graph Databases and Neo4j

Resources

• http://kavita-ganesan.com/opinosis
• http://jexp.de/blog/2015/01/natural-language-analytics-made-simple-and-visual-with-neo4j/
• https://github.com/johnymontana/nlp-graph-notebooks

Page 71: Natural Language Processing with Graph Databases and Neo4j

Opinion Mining

• “Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions”
  - Kavita Ganesan, ChengXiang Zhai, Jiawei Han. University of Illinois at Urbana-Champaign

• “Multi-sentence compression: Finding shortest paths in word graphs”
  - Katja Filippova. Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, China, Aug 23-27, 2010