Natural Language Processing With Graph Databases and Neo4j

Posted on 08-Jan-2017


Natural Language Processing With Graph Databases

DataDay Texas, January 2016

William Lyon (@lyonwj)

About

William Lyon

Software Developer @ Neo4j

william.lyon@neo4j.com

@lyonwj

lyonwj.com

Agenda

A survey of NLP methods with graphs:

• Brief intro to graph databases / Neo4j
• Representing text as a graph
• NLP tasks
• Mining word associations
• Graph based summarization and keyword extraction
• Content recommendation

Intro to Graph Databases / Neo4j

Charts vs. Graphs

Neo4j: Graph Database

• Property graph data model
• Nodes and relationships
• Native graph processing
• Cypher query language

The Whiteboard Model Is the Physical Model

Relational Versus Graph Models

[Figure: the same friendship data in a relational model (Person, Person-Friend, and Friend tables) and in a graph model (ANDREAS, TOBIAS, MICA, and DELIA nodes connected by KNOWS relationships)]
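To make the contrast concrete, here is a minimal Python sketch that stores the same friendships both ways. The KNOWS pairs are hypothetical (the figure's exact edges are not recoverable), and `friends_relational` / `friends_graph` are illustrative names, not Neo4j APIs.

```python
# Hypothetical KNOWS pairs among the four people in the figure.
KNOWS = [("Andreas", "Tobias"), ("Andreas", "Delia"), ("Tobias", "Mica")]

# Relational model: a Person table (name -> id) plus a Person-Friend join table of id pairs.
person = {name: pid for pid, name in enumerate(["Andreas", "Tobias", "Mica", "Delia"], 1)}
person_friend = [(person[a], person[b]) for a, b in KNOWS]

def friends_relational(name):
    pid = person[name]
    friend_ids = [f for p, f in person_friend if p == pid]  # scan the join table
    names = {v: k for k, v in person.items()}               # join ids back to names
    return sorted(names[f] for f in friend_ids)

# Graph model: each node keeps its outgoing KNOWS relationships directly.
graph = {name: [] for name in person}
for a, b in KNOWS:
    graph[a].append(b)

def friends_graph(name):
    return sorted(graph[name])  # follow relationships, no join step

print(friends_relational("Andreas"), friends_graph("Andreas"))
```

Both functions answer the same question; the graph version simply follows stored references instead of matching ids through a join table.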

Property Graph Model Components

Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled

Relationships
• Relate nodes by type and direction
• Can have name-value properties

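[Figure: example property graph with two PERSON nodes (name: "Dan", born: May 29, 1970, twitter: "@dan"; name: "Ann", born: Dec 5, 1975) and a CAR node (brand: "Volvo", model: "V70"), connected by LOVES, LIVES WITH, OWNS, and DRIVES relationships; one relationship carries the property since: Jan 10, 2011]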
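As a rough illustration of these components, the sketch below models labeled nodes and typed, directed relationships, both with name-value properties, as plain Python dataclasses. It is an in-memory stand-in, not how Neo4j stores data, and the class and method names are hypothetical. Which person owns the car is an illustrative choice.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set                                     # e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # name-value pairs

@dataclass
class Relationship:
    type: str      # e.g. "LOVES"
    start: Node    # direction: start -> end
    end: Node
    properties: dict = field(default_factory=dict)

@dataclass
class PropertyGraph:
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def create_node(self, *labels, **properties):
        node = Node(set(labels), dict(properties))
        self.nodes.append(node)
        return node

    def create_relationship(self, start, rel_type, end, **properties):
        rel = Relationship(rel_type, start, end, dict(properties))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node, rel_type=None):
        """End nodes of relationships leaving `node`, optionally filtered by type."""
        return [r.end for r in self.relationships
                if r.start is node and (rel_type is None or r.type == rel_type)]

g = PropertyGraph()
dan = g.create_node("Person", name="Dan", born="May 29, 1970", twitter="@dan")
ann = g.create_node("Person", name="Ann", born="Dec 5, 1975")
car = g.create_node("Car", brand="Volvo", model="V70")
g.create_relationship(dan, "LOVES", ann)
g.create_relationship(ann, "LOVES", dan)
g.create_relationship(ann, "OWNS", car)  # illustrative: figure attachment not recoverable
```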

Cypher: Graph Query Language

CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})

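[Diagram: the statement creates two NODEs, each with the LABEL Person and a name PROPERTY ("Dan" and "Ann"), connected by a LOVES relationship from Dan to Ann]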

“So what does this have to do with NLP?”

“Am I in the wrong talk?”

“I thought this was going to be about text processing…”

Natural Language Processing With Graphs


Uncovering meaning from text using a graph data model.

Representing Text As A Graph

“Nearly all text processing starts by transforming text into vectors.”

- Matt Biddulph www.hackdiary.com

Representing text as a graph

Text Adjacency Graph


Example sentence: “My cat eats fish on Saturday.”

1. Convert the sentence to an array of words.
2. Iterate with a counter variable i from 0 to (number of words - 2).
3. Get or create the nodes for the words at index i and i+1.
4. Create a :NEXT relationship from word i to word i+1.
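The four steps can be sketched in Python with an in-memory set of word nodes and a list of :NEXT pairs standing in for Neo4j. Assumption: words are lowercased and stripped of punctuation, since the slides do not specify tokenization; `text_adjacency_graph` is a hypothetical helper name.

```python
import re

def text_adjacency_graph(text):
    """Return (word nodes, :NEXT edges) for one sentence."""
    words = re.findall(r"[a-z0-9']+", text.lower())  # step 1: array of words
    nodes = set()       # "get or create": a set never stores a word twice
    next_edges = []     # (word_i, word_i+1) pairs in sentence order
    for i in range(len(words) - 1):                    # step 2: i = 0 .. len(words) - 2
        a, b = words[i], words[i + 1]
        nodes.update((a, b))                           # step 3: get or create both nodes
        next_edges.append((a, b))                      # step 4: create :NEXT
    return nodes, next_edges

nodes, edges = text_adjacency_graph("My cat eats fish on Saturday.")
print(edges)
```

In Neo4j the get-or-create step would typically be a `MERGE` on the word node rather than a Python set.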

Representing A Text Corpus As A Graph

• Add followship frequency
• Add word counts
• Query word frequency
• Query word pair frequencies (collocation)
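A minimal sketch of the corpus-level graph, modeled with two Python `Counter`s. The assumptions are that the word count is a property on each word node and that the followship frequency is a count on each :NEXT relationship. The corpus and the helper name `build_corpus_graph` are hypothetical.

```python
import re
from collections import Counter

def build_corpus_graph(sentences):
    word_counts = Counter()   # node property: occurrences of each word
    next_counts = Counter()   # :NEXT property: how often word b follows word a
    for sentence in sentences:
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        word_counts.update(words)
        next_counts.update(zip(words, words[1:]))
    return word_counts, next_counts

corpus = [
    "My cat eats fish on Saturday.",
    "His cat eats turkey on Tuesday.",
    "My dog eats meat on Sunday.",
]
words, pairs = build_corpus_graph(corpus)
print(words.most_common(3))   # word frequency query
print(pairs.most_common(2))   # word pair (collocation) frequency query
```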

NLP Tasks

Mining Word Associations

Word Associations

• Paradigmatic: words that can be substituted for each other
  • “Monday” <-> “Thursday”
  • “cat” <-> “dog”

• Syntagmatic: words that can be combined with each other
  • “cold”, “weather”
  • collocations

Computing Paradigmatic Similarity

1. Represent each word by its context.
2. Compute context similarity.
3. Words with high context similarity likely have a paradigmatic relation.

Paradigmatic Similarity, step 1: Represent each word by its context

[Figure: a word's context is the words that appear immediately to its left (Left1) and immediately to its right (Right1)]
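Step 1 can be sketched by walking every :NEXT pair once and recording, for each word, which words appear directly before it (Left1) and directly after it (Right1). This is an in-memory Python stand-in for querying the graph; the tokenization and the helper name `left_right_contexts` are assumptions.

```python
import re
from collections import Counter, defaultdict

def left_right_contexts(sentences):
    left = defaultdict(Counter)   # left[w][x]: times x appears directly before w
    right = defaultdict(Counter)  # right[w][x]: times x appears directly after w
    for sentence in sentences:
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        for a, b in zip(words, words[1:]):  # each (a, b) is one :NEXT relationship
            right[a][b] += 1
            left[b][a] += 1
    return left, right

corpus = [
    "My cat eats fish on Saturday.",
    "His cat eats turkey on Tuesday.",
    "My dog eats meat on Sunday.",
]
left, right = left_right_contexts(corpus)
print(left["cat"], right["cat"])
```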

Paradigmatic Similarity, step 2: Compute context similarity

www.lyonwj.com/2015/06/16/nlp-with-neo4j/

Paradigmatic Similarity, step 3: Find words with high context similarity

CEEAUS corpus: http://earthlab.uoi.gr/theste/index.php/theste/article/viewFile/55/37
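Steps 2 and 3 can be sketched as follows. The slides do not name the similarity measure, so this uses cosine similarity over context-word counts and gives Left1 and Right1 equal weight; both choices are assumptions. The Left1/Right1 counts are hypothetical toy data.

```python
import math
from collections import Counter

def cosine(c1, c2):
    """Cosine similarity between two bags of context words."""
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0

def context_similarity(w1, w2, left, right):
    """Step 2 (assumption: equal weight on Left1 and Right1 similarity)."""
    empty = Counter()
    return (0.5 * cosine(left.get(w1, empty), left.get(w2, empty))
            + 0.5 * cosine(right.get(w1, empty), right.get(w2, empty)))

def most_similar(word, left, right, k=3):
    """Step 3: rank every other word by context similarity to `word`."""
    candidates = (set(left) | set(right)) - {word}
    scored = [(other, context_similarity(word, other, left, right)) for other in candidates]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # highest first, ties alphabetical
    return scored[:k]

# Hypothetical Left1 / Right1 counts for a tiny corpus.
left = {"cat": Counter({"my": 2, "his": 1}), "dog": Counter({"my": 1, "his": 1}),
        "eats": Counter({"cat": 2, "dog": 1})}
right = {"cat": Counter({"eats": 2}), "dog": Counter({"eats": 1}),
         "my": Counter({"cat": 2, "dog": 1}), "his": Counter({"cat": 1})}

print(most_similar("cat", left, right, k=2))
```

“cat” and “dog” share both left contexts (“my”, “his”) and the right context “eats”, so they come out as the most paradigmatically similar pair.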

Paradigmatic Similarity

Example

http://www.lyonwj.com/2015/06/16/nlp-with-neo4j/

https://github.com/johnymontana/nlp-graph-notebooks

https://class.coursera.org/textanalytics-001

Graph Based Summarization and Keyword Extraction

image credit: https://en.wikipedia.org/wiki/PageRank

https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf

https://github.com/summanlp/textrank
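TextRank (Mihalcea and Tarau, linked above) ranks words by running PageRank over a word co-occurrence graph. Below is a minimal sketch, not the summanlp implementation. Assumptions: there is no stop-word or part-of-speech filtering (the paper keeps only nouns and adjectives), the window is 2 (adjacent words only), and the teleport term is (1 - d)/N, so scores sum to 1.

```python
import re
from collections import defaultdict

def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=5):
    words = re.findall(r"[a-z]+", text.lower())
    # Undirected co-occurrence graph: link words that appear within `window` tokens.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    nodes = list(neighbors)
    if not nodes:
        return []
    n = len(nodes)
    score = {w: 1.0 / n for w in nodes}
    for _ in range(iterations):
        # PageRank update: each word shares its score equally among its neighbors.
        score = {
            w: (1 - damping) / n
               + damping * sum(score[u] / len(neighbors[u]) for u in neighbors[w])
            for w in nodes
        }
    ranked = sorted(nodes, key=lambda w: (-score[w], w))
    return [(w, score[w]) for w in ranked[:top_k]]

print(textrank_keywords("graph database graph query graph model graph data"))
```

Words linked to many distinct neighbors (here “graph”) accumulate the highest scores and become keyword candidates.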

Keyword Extraction

Summarization: Opinion Mining

• Opinion mining
• Summarize major opinions
• Concise and readable
• Major complaints / compliments

http://kavita-ganesan.com/opinosis

1.Graph based representation of review corpus

2.Find and score candidate summaries

3.Select top scoring candidates as summary

Opinion Mining - Example

• Best Buy API
• Product reviews by SKU


Find the highest-ranked paths of 2-5 words
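The three steps (graph of the review corpus, score candidate paths, keep the top ones) can be sketched as below. This is a simplified stand-in and not the Opinosis algorithm. Opinosis uses a richer redundancy score based on where each word occurs in the original sentences. Here each path is scored by its weakest :NEXT count, with ties going to longer paths. The helper names and reviews are hypothetical.

```python
import re
from collections import Counter, defaultdict

def build_review_graph(reviews):
    follows = Counter()              # :NEXT followship counts across all reviews
    for review in reviews:
        words = re.findall(r"[a-z0-9']+", review.lower())
        follows.update(zip(words, words[1:]))
    out = defaultdict(list)          # adjacency list: word -> following words
    for a, b in follows:
        out[a].append(b)
    return follows, out

def top_paths(reviews, min_len=2, max_len=5, top_k=3):
    follows, out = build_review_graph(reviews)
    scored = []

    def extend(path, weakest):
        if len(path) >= min_len:
            scored.append((weakest, len(path), " ".join(path)))
        if len(path) == max_len:
            return
        for nxt in out[path[-1]]:
            if nxt not in path:      # keep paths simple (no repeated words)
                extend(path + [nxt], min(weakest, follows[(path[-1], nxt)]))

    for start in list(out):
        extend([start], float("inf"))
    # Highest weakest-link count first, then longer paths, then alphabetical.
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [(phrase, count) for count, _, phrase in scored[:top_k]]

reviews = ["Easy to read in sunlight", "Very easy to read", "Easy to read screen"]
print(top_paths(reviews))
```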

Opinion Mining - Demo

“Easy to read in sunlight”

“Comfortable great sound quality”

“I love this washer”


“Bought this smart TV for the price”

“Easy to use this vacuum”

• IPython notebook

https://github.com/johnymontana/nlp-graph-notebooks

Content Recommendation


“Networks give structure to the conversation while content mining gives meaning.”

- Preriit Souda

http://breakthroughanalysis.com/2015/10/08/ltapreriitsouda/

Using Data Relationships for Recommendations

• Content-based filtering: recommend items based on what users have liked in the past
• Collaborative filtering: predict what users like based on the similarity of their behaviors, activities, and preferences to those of others

[Figure: a Person RATED a Movie (rating: 7); Person nodes are connected by a SIMILARITY relationship (value: .92)]
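A minimal collaborative-filtering sketch over RATED data follows. It computes a user-user similarity, the kind of value a SIMILARITY relationship could store, and scores each unseen movie by the similarity-weighted sum of other users' ratings. The cosine measure (unrated movies count as 0), the scoring rule, and the ratings are assumptions for illustration.

```python
import math

def cosine_similarity(r1, r2):
    """Cosine similarity of two users' rating vectors (unrated movies count as 0)."""
    dot = sum(r1[m] * r2[m] for m in r1.keys() & r2.keys())
    norm1 = math.sqrt(sum(v * v for v in r1.values()))
    norm2 = math.sqrt(sum(v * v for v in r2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

def recommend(user, ratings, top_k=3):
    """Score movies `user` has not rated by similarity-weighted ratings of others."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)  # SIMILARITY value
        for movie, rating in their_ratings.items():
            if movie not in ratings[user]:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

# Hypothetical RATED relationships: user -> {movie: rating}.
ratings = {
    "alice": {"The Matrix": 9, "Inception": 8},
    "bob": {"The Matrix": 8, "Inception": 9, "Memento": 7},
    "carol": {"The Matrix": 2, "Titanic": 9},
}
print(recommend("alice", ratings))
```

Carol rates Titanic highly, but her tastes overlap little with Alice's, so Bob's Memento ranks first.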


The article graph - data model

Building the article graph:

• Articles users have shared
• Extract keywords using the newspaper3k Python library
• Insert in the graph
• Scrape additional articles

https://github.com/johnymontana/nlp-graph-notebooks

The article graph - example

What are the keywords of the articles I liked?
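The question above leads directly to content-based filtering: collect the keywords of liked articles, then rank other articles by how many of those keywords they share. This is a minimal in-memory sketch. The article titles and keyword sets are hypothetical (standing in for keywords extracted with newspaper3k), and the overlap scoring is an assumed choice.

```python
from collections import Counter

def keyword_profile(liked, articles):
    """Answer 'What are the keywords of the articles I liked?' as keyword -> count."""
    profile = Counter()
    for title in liked:
        profile.update(articles[title])
    return profile

def recommend_articles(liked, articles, top_k=3):
    profile = keyword_profile(liked, articles)
    scores = {}
    for title, keywords in articles.items():
        if title in liked:
            continue
        score = sum(profile[k] for k in keywords)  # shared keywords, weighted by profile count
        if score:
            scores[title] = score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

# Hypothetical article -> keyword sets.
articles = {
    "Graph databases 101": {"neo4j", "graph", "cypher"},
    "Intro to NLP": {"nlp", "text", "python"},
    "Text mining with Neo4j": {"neo4j", "nlp", "text"},
    "Baking bread": {"bread", "yeast", "oven"},
}
liked = {"Graph databases 101", "Intro to NLP"}
print(recommend_articles(liked, articles))
```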

Summary

• Property graph model
• Represent text as a graph
• Word associations
• Opinion mining
• Content recommendation

Resources

• graphdatabases.com
• http://kavita-ganesan.com/opinosis
• http://jexp.de/blog/2015/01/natural-language-analytics-made-simple-and-visual-with-neo4j/
• https://github.com/johnymontana/nlp-graph-notebooks

Opinion Mining

• “Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions”, Kavita Ganesan, ChengXiang Zhai, Jiawei Han, University of Illinois at Urbana-Champaign

• “Multi-sentence compression: Finding shortest paths in word graphs”, Katja Filippova, Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, China, Aug 23-27, 2010
