Connect and Search your Data Brendon Page

Upload: brendonpage

Post on 11-Aug-2015


Connect and Search your Data

Brendon Page

Me

twitter @brendonpaginate

blog http://geekswithblogs.com/brendonpage

Software Developer @ Chillisoft

Questions?

Put your hand up anytime

Story

Current project has 2 user requirements

“Google-like search”

“Mine our data”

Expectations Search Connectedness

Story

Expectations

What expectations you ask?

2004
Flickr, Gmail
0,9 billion Internet Users

2005
Reddit, Pandora, YouTube, Google Earth
1,02 billion Internet Users

2006
Twitter, Khan Academy, Facebook
1,1 billion Internet Users

2007
Street View, Kindle, Tumblr, SoundCloud
1,3 billion Internet Users

2008
GitHub, Spotify, Dropbox
1,5 billion Internet Users

2009
Google Docs, Wolfram Alpha, Kickstarter, WhatsApp
1,7 billion Internet Users

2010
Instagram, Pinterest
2 billion Internet Users

2011
Google+
2,2 billion Internet Users

2012
Coursera
2,5 billion Internet Users

2013
Mega
2,7 billion Internet Users

2014
????
2,9 billion Internet Users

Last Decade

0,9 billion users to 2,9 billion users

So What?

Situation

2 billion users don’t know what IRC is

High Expectations

0,9 billion users know what IRC is

Normal Expectations

Situation

Most internet users know an internet with

Flickr, Gmail, Reddit, Pandora, YouTube, Google Earth, Twitter, Khan Academy, Facebook, Street View, Kindle, Tumblr, SoundCloud, GitHub, Spotify, Dropbox, Google Docs, Wolfram Alpha, Kickstarter, WhatsApp, Bitcoin, Instagram, Pinterest, Google+, Coursera, Mega

Our Problem

We have to write software for those

2 billion users

Google is big

We are small

Expectations

Good look and feel

Search

Prediction

Speed

Search

How do we do “Google-like search”?

Search

When you think search

Google

Yahoo

AltaVista

Bing

Search

But consider Facebook

Search

And Twitter

Search

And IMDB

Search

And Wikipedia

Search

And Amazon

Search

And TheGuardian, SoundCloud, Foursquare, Stack Exchange, YouTube, Kalahari, GitHub, StumbleUpon

… and many more

Search

Is one of the main points of interactivity

Search

Good search is everywhere… and it’s setting your user’s expectations

Search

But what makes search good?

Fuzziness & Synonyms

“batmn” -> “batman”

“i-Pod” -> “iPod”

Did you mean?

“example did you meen” -> “example did you mean”
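The “Did you mean?” behaviour can be approximated in a few lines. This is a minimal sketch using Python's standard `difflib`, not how Google or Elasticsearch do it; the vocabulary list and the `did_you_mean` helper are made up for illustration.

```python
import difflib

# Hypothetical vocabulary; a real engine would draw this from its own index.
VOCABULARY = ["batman", "superman", "example", "did", "you", "mean"]

def did_you_mean(query, vocabulary=VOCABULARY, cutoff=0.7):
    """Replace each unknown word with its closest vocabulary entry, if any."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # get_close_matches returns the best candidates above the similarity cutoff.
        candidates = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(candidates[0] if candidates else word)
    return " ".join(corrected)

print(did_you_mean("batmn"))
print(did_you_mean("example did you meen"))
```

The cutoff is a trade-off: lower values correct more typos but also produce more wrong guesses.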

Ranking

Speed

“About 1 780 000 000 results (0.37 seconds)” 

Auto Complete

Advanced Query

“crack -ass”
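The advanced query on the slide uses a leading minus to exclude a word. A rough sketch of that idea in plain Python follows; `parse_query` and `matches` are illustrative names, and real engines support far richer syntax.

```python
def parse_query(query):
    """Split a query into required terms and excluded terms (prefixed with '-')."""
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            required.append(token)
    return required, excluded

def matches(text, query):
    """True if the text has every required term and none of the excluded ones."""
    words = set(text.lower().split())
    required, excluded = parse_query(query)
    has_required = all(term in words for term in required)
    has_excluded = any(term in words for term in excluded)
    return has_required and not has_excluded

print(parse_query("crack -ass"))
print(matches("a crack in the wall", "crack -ass"))
```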

Search

But all of that is hard

… and all of it is supported by Elasticsearch

Elasticsearch?

Your very own, distributed, highly available, search engine

F.O.S.S

Elasticsearch

Indexing demo

Indexing

{ "Title": "Batman Returns", "Year": "2005" }

REST API

Elasticsearch

Tokenise: “Batman”, “Returns”, “2005”

Analyse: “batman”, “returns”, “return”, “2005”

Store Document with id (1)

Update Index

returns: [1]
return: [1]
batman: [1]
2005: [1]

Indexing

{ "Title": "Batman", "Year": "2007" }

REST API

Elasticsearch

Tokenise: “Batman”, “2007”

Analyse: “batman”, “2007”

Store Document with id (2)

Update Index

returns: [1]
return: [1]
batman: [1, 2]
2005: [1]
2007: [2]
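The tokenise, analyse, store and update-index steps from the two indexing slides can be sketched as a toy inverted index. This mirrors the picture on the slides only; it is not Elasticsearch's actual analysis chain, and the trailing-“s” stemming rule and the `TinyIndex` class are assumptions made for the example.

```python
import re
from collections import defaultdict

def tokenise(document):
    """Split every field value into word tokens."""
    tokens = []
    for value in document.values():
        tokens.extend(re.findall(r"\w+", str(value)))
    return tokens

def analyse(tokens):
    """Lowercase each token and add a naive singular form (toy stemming)."""
    terms = set()
    for token in tokens:
        term = token.lower()
        terms.add(term)
        if term.isalpha() and term.endswith("s") and len(term) > 3:
            terms.add(term[:-1])
    return terms

class TinyIndex:
    def __init__(self):
        self.documents = {}                 # id -> stored document
        self.postings = defaultdict(list)   # term -> list of document ids
        self._next_id = 1

    def index(self, document):
        """Store the document and add its id to the postings of each term."""
        doc_id = self._next_id
        self._next_id += 1
        self.documents[doc_id] = document
        for term in analyse(tokenise(document)):
            self.postings[term].append(doc_id)
        return doc_id

idx = TinyIndex()
idx.index({"Title": "Batman Returns", "Year": "2005"})
idx.index({"Title": "Batman", "Year": "2007"})
print(dict(idx.postings))
```

Looking up a term is then a single dictionary access, which is a large part of why this kind of search is fast.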

Elasticsearch

Let’s check our demo

Elasticsearch

Search demo
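The demo itself is not in the transcript. As a rough idea of what a fuzzy search request can look like, here is a sketch that builds (but does not send) a `match` query with `fuzziness` against the `_search` endpoint. The `movies` index name, the `localhost:9200` address and the `build_search_request` helper are assumptions, not details from the talk.

```python
import json
import urllib.request

def build_search_request(text, base_url="http://localhost:9200", index="movies"):
    """Build a POST request for a fuzzy match query on the Title field."""
    body = {
        "query": {
            "match": {
                "Title": {"query": text, "fuzziness": "AUTO"}
            }
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_search_request("batmn")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running node.
```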

Search

Term frequency – inverse document frequency

term frequency: score for each matching word in the document
inverse document frequency: word weight is higher if uncommon across documents

“batman returns”

returns: [1]
return: [1]
batman: [1, 2]
2005: [1]
2007: [2]

(id 1) { "Title": "Batman Returns", "Year": "2005" }

1

(id 2) { "Title": "Batman", "Year": "2007" }

2

??
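The ranking idea can be sketched with a simplified TF-IDF score over the two example documents. The smoothed `log(1 + N / df)` weighting and the function names are choices made for this example; the scoring inside Elasticsearch is more involved.

```python
import math

# Analysed terms of the two example documents, keyed by document id.
DOCUMENTS = {
    1: ["batman", "returns", "2005"],
    2: ["batman", "2007"],
}

def tf_idf_score(query_terms, doc_terms, documents):
    """Sum term frequency times inverse document frequency over the query terms."""
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)
        df = sum(1 for terms in documents.values() if term in terms)
        # Smoothed idf: rarer terms weigh more, common terms still count a little.
        idf = math.log(1 + len(documents) / df) if df else 0.0
        score += tf * idf
    return score

def rank(query, documents=DOCUMENTS):
    """Return document ids ordered from best to worst match."""
    terms = query.lower().split()
    scores = {
        doc_id: tf_idf_score(terms, doc_terms, documents)
        for doc_id, doc_terms in documents.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

print(rank("batman returns"))
```

Document 1 comes first because it matches both words, and “returns” is the rarer of the two.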

Search Summary

With very little effort we’ve gotten

Fuzziness & Synonyms
“batmn” -> “batman”
“i-Pod” -> “iPod”

Ranking

Speed
“About 1 780 000 000 results (0.37 seconds)”

Auto Complete

Advanced Query
“crack -ass”

using Elasticsearch

Connectedness

How do we “mine our data”?

Connectedness

When you think connected / linked / networked data

Facebook

LinkedIn

Google+

Friend Recommendation

Me → Friend → Friend of Friend

I can do that in SQL!

This guy

This guy

Friend Recommendation

Me, Friend, Friend, Friend, Friend of Friend, Friend of Friend

I don’t want to do that in SQL!

This guy

This guy

Connectedness

Consider recommendations made by…

Amazon

YouTube

Pandora

They read your mind!!!

Book Recommendation

Me, Book, Genre, Book, Other User

I don’t want to do that in SQL!

Too many

This guy

This guy

Connectedness

But all of that is hard

… and all of it is supported by Neo4j

Neo4j?

Graph database

Neo4j

Friend recommendation demo
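The demo is not part of the transcript. To give a feel for the traversal it performs, here is a plain-Python stand-in for the graph: it walks from a person to their friends, then to friends of friends, and counts mutual friends. The names, the `FRIENDS` data and `recommend_friends` are made up, and none of this is the Neo4j API.

```python
from collections import Counter

# Hypothetical friendship graph as an adjacency map.
FRIENDS = {
    "me": {"ann", "ben"},
    "ann": {"me", "cat", "dan"},
    "ben": {"me", "cat"},
    "cat": {"ann", "ben"},
    "dan": {"ann"},
}

def recommend_friends(person, friends=FRIENDS):
    """Rank friends of friends by the number of mutual friends."""
    direct = friends[person]
    mutual = Counter()
    for friend in direct:
        for candidate in friends[friend]:
            # Skip the person themselves and people they already know.
            if candidate != person and candidate not in direct:
                mutual[candidate] += 1
    return mutual.most_common()

print(recommend_friends("me"))
```

In a graph database the same walk is expressed as a pattern over relationships rather than as chained self-joins, which is the point the SQL slides make.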

Other Uses?

Logistics

Permissions

Fraud Detection

Almost anything you draw as a graph on the white board

Improving Search

Session

“man of steel” → Doc 1 “steel workers are cool”

“super man” → Doc 2 “super man”


{ "Title": "Super Man", "Year": "2013" }

{ "Title": "Super Man", "Year": "2013", "OtherMatches": ["man of steel"] }
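One way to read these slides: when a session ends on a document, the earlier queries from that session are recorded on it so they match next time. The sketch below assumes the last query in a session is the one that found the right document; the slides do not say how sessions are tracked, and `learn_other_matches` is an illustrative name.

```python
def learn_other_matches(session_queries, final_doc):
    """Record every earlier query of the session on the document the user ended on."""
    earlier_queries = session_queries[:-1]   # the last query already matches the document
    known = final_doc.setdefault("OtherMatches", [])
    for query in earlier_queries:
        if query not in known:
            known.append(query)
    return final_doc

doc = {"Title": "Super Man", "Year": "2013"}
print(learn_other_matches(["man of steel", "super man"], doc))
```

Re-indexing the updated document would then let “man of steel” match it directly.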

Expectations Search Connectedness

Summary

Thank you

Questions?