Natural Language Processing with Neo4j

Kenny Bastani @kennybastani

Upload: kenny-bastani

Post on 26-Jan-2015

Category: Technology

DESCRIPTION

Recent natural language processing advancements have propelled search engine and information retrieval innovations into the public spotlight. People want to be able to interact with their devices in a natural way. In this talk I will be introducing you to natural language search using a Neo4j graph database. I will show you how to interact with an abstract graph data structure using natural language and how this approach is key to future innovations in the way we interact with our devices.

TRANSCRIPT

Page 1: Natural Language Processing with Neo4j

Natural Language Processing with Neo4j

Kenny Bastani @kennybastani

Page 2: Natural Language Processing with Neo4j

This is a hobby of mine

I’m passionate about it

It’s always a work in progress

I do it for fun

Page 3: Natural Language Processing with Neo4j

Machine Learning Focuses

• Text mining

• Natural Language Processing

• Automatic summarization

• Graph databases

• Commitment to unsupervised learning

Page 4: Natural Language Processing with Neo4j

Why NLP and Graphs?

Page 5: Natural Language Processing with Neo4j

I wanted a better way to learn with less effort

I wanted something a little more zippy.

I’m mostly self-taught, so I wanted something that made self-learning easier for others.

Page 6: Natural Language Processing with Neo4j

The Idea

[Diagram: Articles, Sentences, and Phrases as nodes, joined by “Contain” and “Found in” relationships]

Page 7: Natural Language Processing with Neo4j

Importance of NLP

• I’m inspired by the idea of machines learning from experience.

• NLP is important for finding valuable information in noisy unstructured text.

• I’m a Developer Evangelist for Neo4j, so I’m kind of a fan of graph databases.

Page 8: Natural Language Processing with Neo4j

Algorithms can learn

As long as they can store information and retrieve it in enough time for it to be of any use.

Page 9: Natural Language Processing with Neo4j

Learning requires storage

To learn, storage is required.

For NLP, storage is sometimes a second class citizen.

Much focus is on the algorithm first, then storage second.

But really, it’s storage and retrieval of big data that is the problem.

Page 10: Natural Language Processing with Neo4j

Machine learning

Machine learning isn’t magic or hard to understand. It’s real stuff.

We know how to do it.

It’s easily articulated.

ML algorithms solve big computational problems today.

It’s based on the idea of machines learning from prior experience, recorded as data.

Page 11: Natural Language Processing with Neo4j

Formulate a Hypothesis

When you analyze data, the outcome is usually a hypothesis.

A hypothesis is a conclusion based on limited data.

There are always more pieces needed to solve the puzzle.

Page 12: Natural Language Processing with Neo4j

Build on Past Experience

By experience, I mean DATA.

Machine Learning techniques are entirely based on collection and analysis of recorded data.

So storage is really important if you want to do machine learning successfully.

You cannot play baseball without your brain. Don’t try it.

Page 13: Natural Language Processing with Neo4j

The Problem with AI

The problem with AI is that it seems like magic.

Some people say strong AI is possible.

Others deny that it is possible.

It is a central theme in many fictional fantasy films and book genres.

It’s in Greek mythology.

Page 14: Natural Language Processing with Neo4j

Is AI Misunderstood?

Researchers admit to not fully understanding how intelligence works in the human brain.

We generally understand how it works, but there is no consensus on how to recreate it in machines.

AI is really just the act of perceiving an environment and maximizing chances of success.

Page 15: Natural Language Processing with Neo4j

You get the point.

• Now why is a Graph Database useful for unsupervised machine learning?

• Let’s consider the problem I stated earlier.

• I wanted to build a better way to summarize and learn from Wikipedia’s combined knowledge.

Page 16: Natural Language Processing with Neo4j

Unsupervised Learning on Wikipedia

[Diagram: Articles, Sentences, and Phrases as nodes, joined by “Contain” and “Found in” relationships]

Page 17: Natural Language Processing with Neo4j

How do you learn about learning?

I started by observing myself learning from reading Wikipedia articles.

I searched for an interesting term on Google.

I read through the article’s text word by word.

Page 18: Natural Language Processing with Neo4j

The Learning Algorithm

As I read the article’s text, I would sometimes come across a phrase or term I had not seen before.

Before continuing reading I would open up a new tab and search for the unrecognized phrase.

It was a well defined recursive algorithm.

I would drill down n-times on unrecognized article terms until returning to the original article text.
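
A minimal Python sketch of this drill-down, under stated assumptions: the `articles` dictionary is toy data standing in for Wikipedia lookups, and the names `drill_down` and `max_depth` are hypothetical, not taken from the talk.

```python
from typing import Dict, List, Optional, Set


def drill_down(term: str,
               articles: Dict[str, List[str]],
               max_depth: int,
               depth: int = 0,
               known: Optional[Set[str]] = None,
               finished: Optional[List[str]] = None) -> List[str]:
    """Read `term`, pausing to read each unfamiliar term it mentions first.

    `articles` maps a term to the terms its article mentions (toy data
    standing in for real article text). Returns the terms in the order
    their articles were finished.
    """
    known = set() if known is None else known
    finished = [] if finished is None else finished
    if term in known or depth > max_depth:
        return finished  # already seen, or drilled down far enough
    known.add(term)
    for mentioned in articles.get(term, []):
        # Open a "new tab" for the term before reading on.
        drill_down(mentioned, articles, max_depth, depth + 1, known, finished)
    finished.append(term)  # back in the original article, now complete
    return finished
```

Calling `drill_down("Sweden", articles, max_depth=2)` finishes the deepest unfamiliar terms first and the seed term last, which mirrors returning to the original article after each detour.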

Page 19: Natural Language Processing with Neo4j

A Self-Learning Algorithm

In the computer’s world, this process would result in an ontology of labeled data.

Which looks a lot like a graph.

But how would I store the results?

If only there were a database for that...

Page 20: Natural Language Processing with Neo4j

Neo4j is a graph database…and graphs are everywhere!

Page 21: Natural Language Processing with Neo4j

Simple Clustering Model

[Diagram: Article, Sentence, and Phrase nodes, joined by “Contains” and “Found in” relationships]

Page 22: Natural Language Processing with Neo4j
Page 23: Natural Language Processing with Neo4j
Page 24: Natural Language Processing with Neo4j

Summarizing Article Text

Page 25: Natural Language Processing with Neo4j

What about the NLP stuff?

This is how I did it.

Page 26: Natural Language Processing with Neo4j

The seed article

You start with a seed article: the first article text fed to the learning algorithm.

Page 27: Natural Language Processing with Neo4j

Fetch text from Wikipedia

Get the unstructured text and metadata from Wikipedia.
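
The slides do not say which interface was used to fetch the text. As one possible sketch, the public MediaWiki API can return a plain-text extract of an article. The endpoint choice (English Wikipedia), the function names, and the User-Agent string below are all assumptions, and the sketch fetches text only, not metadata.

```python
import json
from typing import Any, Dict
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia


def build_extract_url(title: str) -> str:
    """Build a MediaWiki query URL asking for a plain-text extract."""
    params = {"action": "query", "prop": "extracts", "explaintext": 1,
              "format": "json", "titles": title}
    return API_URL + "?" + urlencode(params)


def parse_extract(payload: Dict[str, Any]) -> str:
    """Pull the extract text out of the decoded JSON response."""
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        return page.get("extract", "")  # first (only) requested page
    return ""


def fetch_article_text(title: str) -> str:
    """Fetch one article's text over the network (not run in this sketch)."""
    request = Request(build_extract_url(title),
                      headers={"User-Agent": "nlp-graph-sketch/0.1"})
    with urlopen(request) as response:
        return parse_extract(json.load(response))
```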

Page 28: Natural Language Processing with Neo4j

Sliding text window

I formulated dynamic RegEx templates and treated each one as a hypothesis.

The RegEx template would slide word by word through the text, searching for unrecognized phrases (n known word matches + 1 wildcard word match).
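
One reading of this slide, sketched in Python: at each position the template is built from n literal words followed by one wildcard word. The function names and the choice of `\w+` as the wildcard are assumptions, since the slides do not show the actual expressions.

```python
import re
from typing import Iterator, List, Tuple


def build_template(known_words: List[str], wildcards: int = 1) -> re.Pattern:
    """Compile a template: the known words, then `wildcards` arbitrary words."""
    parts = [re.escape(word) for word in known_words] + [r"\w+"] * wildcards
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b")


def slide(text: str, n: int) -> Iterator[Tuple[List[str], re.Pattern]]:
    """Move an n-word window through the text one word at a time."""
    words = re.findall(r"\w+", text)
    # Stop early enough that one word is left for the wildcard to match.
    for start in range(len(words) - n):
        window = words[start:start + n]
        yield window, build_template(window)
```

For example, `build_template(["King", "of"])` matches both “King of Sweden” and “King of Norway”, so the wildcard slot is where a previously unseen word can surface.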

Page 29: Natural Language Processing with Neo4j

Looking for redundant phrases

As each unrecognized phrase is encountered, the dynamic RegEx is then matched against the entire article’s text.

The algorithm looks for more than 2 identical phrases within the article’s text.

It appends a 3rd wildcard word match to the template and then rescans the text for redundant phrases until none are found.
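
The grow-and-rescan loop can be sketched as follows. This is an illustration under assumptions, not the talk's code: the names are hypothetical, and the repeat threshold is left as the `min_count` parameter (set to 2 here to match the “King of Sweden” example) because the slides do not pin it down precisely.

```python
import re
from collections import Counter
from typing import List


def repeated_matches(text: str, known_words: List[str],
                     wildcards: int, min_count: int) -> List[str]:
    """Return phrases matching the template at least `min_count` times."""
    parts = [re.escape(w) for w in known_words] + [r"\w+"] * wildcards
    pattern = re.compile(r"\b" + r"\s+".join(parts) + r"\b")
    counts = Counter(pattern.findall(text))
    return [phrase for phrase, count in counts.items() if count >= min_count]


def grow_phrase(text: str, known_words: List[str],
                min_count: int = 2) -> List[str]:
    """Append wildcard words until no repeated phrase is found.

    Returns the longest repeated phrases seen before the scan came up empty.
    """
    best: List[str] = []
    wildcards = 1
    while True:
        repeats = repeated_matches(text, known_words, wildcards, min_count)
        if not repeats:
            return best
        best = repeats
        wildcards += 1  # widen the template by one word and rescan
```

The loop always terminates: each pass makes the template one word longer, so it eventually exceeds anything repeated in the text.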

Page 30: Natural Language Processing with Neo4j

Identify Redundancy of Text

This recursive matching process within the local article’s text finds duplicate phrases of variable length.

“The King of Sweden” has 2 appearances in an article, so that must be important to the topic of Sweden.

Better go search for an article stub on “The King of Sweden.”

Page 31: Natural Language Processing with Neo4j

Graph Storage and Retrieval

Every time a phrase that doesn’t exist as a node in Neo4j is encountered, it becomes a target of investigation, kind of like a hypothesis.

Each sentence that contains the extracted phrase is also added to Neo4j as a content node.

Relationships are added between nodes, showing semantic relationships.
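
A hedged sketch of how these writes might look, expressed as parameterized Cypher built in Python. The labels (`Article`, `Sentence`, `Phrase`), relationship types (`CONTAINS`, `FOUND_IN`), their directions, and the property names are assumptions inferred from the diagrams, not a schema given in the talk. `MERGE` is used so that a node is created only when it does not already exist.

```python
import re
from typing import Dict, List, Tuple

# Assumed schema: labels Article, Sentence, Phrase; relationship types
# CONTAINS and FOUND_IN. The property names are hypothetical as well.
STORE_PHRASE = (
    "MERGE (a:Article {title: $title}) "
    "MERGE (s:Sentence {text: $sentence}) "
    "MERGE (p:Phrase {text: $phrase}) "
    "MERGE (a)-[:CONTAINS]->(s) "
    "MERGE (p)-[:FOUND_IN]->(s)"
)


def statements_for(title: str, text: str,
                   phrase: str) -> List[Tuple[str, Dict[str, str]]]:
    """Build one (query, parameters) pair per sentence containing the phrase."""
    # Naive sentence split on end punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(STORE_PHRASE, {"title": title, "sentence": s, "phrase": phrase})
            for s in sentences if phrase in s]
```

With the official Neo4j Python driver, each pair could be passed to `session.run(query, params)`. The sketch only builds the statements, so it runs without a database.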

Page 32: Natural Language Processing with Neo4j

Phrase inheritance

Phrases can be found within other phrases, denoting a grammatical inheritance hierarchy mapped to a variety of content nodes and articles.
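
As a small illustration of this hierarchy, the sketch below derives “found in” edges between phrases by checking whether one phrase appears as a contiguous run of words inside a longer one. The word-level matching rule and the function names are assumptions made for the example.

```python
from typing import List, Tuple


def found_in(shorter: str, longer: str) -> bool:
    """True if `shorter` is a contiguous run of words inside `longer`."""
    s, l = shorter.split(), longer.split()
    if len(s) >= len(l):
        return False  # a phrase is not considered to inherit from itself
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))


def inheritance_edges(phrases: List[str]) -> List[Tuple[str, str]]:
    """List (shorter, longer) pairs where the shorter phrase is found in the longer."""
    return [(a, b) for a in phrases for b in phrases if found_in(a, b)]
```

Each resulting pair could be stored as a relationship between two phrase nodes, alongside the links from phrases to the sentences that contain them.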

Page 33: Natural Language Processing with Neo4j

Phrase Inheritance Graph Data Model

[Diagram: Phrase “X Y Z”, Phrase “X Y”, and Phrase “X” nodes, Sentence “X Y Z.” and Sentence “X MEN.” nodes, and two Article nodes, joined by “Found in” and “Contains” relationships]

Page 34: Natural Language Processing with Neo4j
Page 35: Natural Language Processing with Neo4j

Questions?

Graphs are everywhere.

Page 36: Natural Language Processing with Neo4j

Thanks for coming to my talk!

Please look me up on Twitter and LinkedIn!

Twitter: http://www.twitter.com/kennybastani

LinkedIn: http://www.linkedin.com/in/kennybastani