Bootstrapping Recommendations with Neo4j Big Data TechCon

Upload: max-de-marzi

Post on 08-Jan-2017


Page 1: Bootstrapping Recommendations with Neo4j

Bootstrapping Recommendations with Neo4j

Big Data TechCon

Page 2: Bootstrapping Recommendations with Neo4j

About Me

• Max De Marzi - Neo4j Field Engineer
• My Blog: http://maxdemarzi.com
• Find me on Twitter: @maxdemarzi
• Email me: [email protected]
• GitHub: http://github.com/maxdemarzi

Page 3: Bootstrapping Recommendations with Neo4j

Big Data - What is it good for?

• Absolutely Nothing!

• Benchmarks: Is this performing better than that? Yes. Why? Uh.
• Recommendations: You should buy this right now.
• Predictions: You will probably buy this.

Page 4: Bootstrapping Recommendations with Neo4j

Top 10 Recommendations

• Popularity: the naive approach, one size fits most
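A popularity-based "Top 10" can be sketched in a few lines. This is only an illustration: the purchase log and the function name top_n_popular are made up for the example and do not come from the talk.

```python
from collections import Counter

# Hypothetical purchase log of (user, item) pairs; all names are invented.
purchases = [
    ("ann", "toy story"), ("bob", "toy story"), ("cat", "toy story"),
    ("ann", "titanic"), ("bob", "titanic"),
    ("cat", "terminator"),
]

def top_n_popular(events, n=10):
    """Return the n most frequently purchased items, most popular first."""
    counts = Counter(item for _user, item in events)
    return [item for item, _count in counts.most_common(n)]

top_items = top_n_popular(purchases, n=2)
print(top_items)
```

Every user gets the same list, which is exactly why this approach is "one size fits most".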

Page 5: Bootstrapping Recommendations with Neo4j

Naive Approach

I’m getting little Timmy some “Cards Against Humanity”

Page 6: Bootstrapping Recommendations with Neo4j

Content Based Recommendations

• Step 1: Collect Item Characteristics
• Step 2: Find Similar Items
• Step 3: Recommend Similar Items

• Example: Similar Movie Genres
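The three steps above can be sketched with genre overlap as the item characteristic. The catalog, the choice of Jaccard overlap as the similarity measure, and the function names are assumptions made for illustration; the talk does not prescribe a particular measure here.

```python
def jaccard(a, b):
    """Share of genres two items have in common, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Step 1: item characteristics (hypothetical catalog).
genres = {
    "Toy Story": {"animation", "comedy", "family"},
    "Shrek": {"animation", "comedy", "family", "fantasy"},
    "Titanic": {"drama", "romance"},
}

def similar_items(title, catalog, k=2):
    """Steps 2 and 3: rank the other items by genre overlap with `title`."""
    scores = [
        (other, jaccard(catalog[title], other_genres))
        for other, other_genres in catalog.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

similar = similar_items("Toy Story", genres)
print(similar)
```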

Page 7: Bootstrapping Recommendations with Neo4j

There is more to life than Romantic Zombie-coms

Page 8: Bootstrapping Recommendations with Neo4j

Collaborative Filtering Recommendations

• Step 1: Collect User Behavior
• Step 2: Find Similar Users
• Step 3: Recommend Behavior taken by similar users

• Example: People with similar musical tastes

Page 9: Bootstrapping Recommendations with Neo4j

You  are  so  original!

Page 10: Bootstrapping Recommendations with Neo4j

Using Relationships for Recommendations

Content-based filtering: Recommend items based on what users have liked in the past

Collaborative filtering: Predict what users like based on the similarity of their behaviors, activities and preferences to others

[Diagram: (Person) -[:RATED {rating: 7}]-> (Movie); (Person) -[:SIMILARITY {value: .92}]- (Person)]

Page 11: Bootstrapping Recommendations with Neo4j

Hybrid Recommendations

• Combine the two for better results

• Like Peanut Butter and Jelly

Page 12: Bootstrapping Recommendations with Neo4j

Benefits of Real-Time Recommendations

Online Retail
• Suggest related products and services
• Increase revenue and engagement

Media and Broadcasting
• Create an engaging experience
• Produce personalized content and offers

Logistics
• Recommend optimal routes
• Increase network efficiency

Page 13: Bootstrapping Recommendations with Neo4j

Challenges for Real-Time Recommendations

Make effective real-time recommendations
• Timing is everything in point-of-touch applications
• Base recommendations on current data, not last night’s batch load

Process large amounts of data and relationships for context
• Relevance is king: Make the right connections
• Drive traffic: Get users to do more with your application

Accommodate new data and relationships continuously
• Systems get richer with new data and relationships
• Recommendations become more relevant

Page 14: Bootstrapping Recommendations with Neo4j

Relational vs. Graph Models

[Diagram: the Relational Model shows Person, Ratings and Movie tables; the Graph Model shows the node MAX connected by three RATED relationships to Terminator, Toy Story and Titanic]

Page 15: Bootstrapping Recommendations with Neo4j

Cypher Query Language

MATCH (:Person { name: "Dan" }) -[:KNOWS]-> (:Person { name: "Ann" })

[Diagram: two nodes, Dan and Ann, each annotated with its label (Person) and property (name), joined by a KNOWS relationship]

Page 16: Bootstrapping Recommendations with Neo4j

Express Complex Queries Easily with Cypher

Find all direct reports and how many people they manage, up to 3 levels down

[Side-by-side panels: Cypher Query, SQL Query]

MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate, count(report) AS Total

Page 17: Bootstrapping Recommendations with Neo4j

Hello World Recommendation

Page 18: Bootstrapping Recommendations with Neo4j

Movie Data Model

Page 19: Bootstrapping Recommendations with Neo4j

Cypher Query: Movie Recommendation

MATCH (watched:Movie {title: "Toy Story"}) <-[r1:RATED]- () -[r2:RATED]-> (unseen:Movie)
WHERE r1.rating > 7 AND r2.rating > 7
  AND watched.genres = unseen.genres
  AND NOT( (:Person {username: "maxdemarzi"}) -[:RATED|WATCHED]-> (unseen) )
RETURN unseen.title, COUNT(*)
ORDER BY COUNT(*) DESC
LIMIT 25

What are the Top 25 Movies
• that I haven't seen
• with the same genres as Toy Story
• given high ratings
• by people who liked Toy Story
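To make the logic of this query easier to follow outside the database, here is a rough in-memory equivalent. It is a sketch under assumptions: the ratings and genres dictionaries, the user names, and the function recommend are all invented for illustration, and the WATCHED relationship is folded into "has a rating".

```python
from collections import Counter

# Hypothetical data: user -> {movie: rating}, and movie -> genres.
ratings = {
    "ann": {"Toy Story": 9, "Shrek": 8, "Titanic": 9},
    "bob": {"Toy Story": 8, "Shrek": 9},
    "cat": {"Toy Story": 5, "Shrek": 10},
    "max": {"Toy Story": 9},
}
genres = {
    "Toy Story": frozenset({"animation", "comedy"}),
    "Shrek": frozenset({"animation", "comedy"}),
    "Titanic": frozenset({"drama"}),
}

def recommend(me, liked, ratings, genres, threshold=7, limit=25):
    """Count how many fans of `liked` also rated each unseen, same-genre movie highly."""
    seen = set(ratings.get(me, {}))
    counts = Counter()
    for user, rated in ratings.items():
        if rated.get(liked, 0) <= threshold:
            continue  # this user did not rate `liked` highly
        for movie, score in rated.items():
            if movie == liked or movie in seen:
                continue
            if score > threshold and genres[movie] == genres[liked]:
                counts[movie] += 1
    return counts.most_common(limit)

result = recommend("max", "Toy Story", ratings, genres)
print(result)
```

Here "cat" is ignored because her Toy Story rating is not above 7, and "Titanic" is dropped because its genres differ.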

Page 20: Bootstrapping Recommendations with Neo4j

Let’s try k-nearest neighbors (k-NN)

Cosine Similarity

Page 21: Bootstrapping Recommendations with Neo4j

Cypher Query: Ratings of Two Users

MATCH (p1:Person {name: 'Michael Sherman'}) -[r1:RATED]-> (m:Movie),
      (p2:Person {name: 'Michael Hunger'}) -[r2:RATED]-> (m:Movie)
RETURN m.name AS Movie,
       r1.rating AS `M. Sherman's Rating`,
       r2.rating AS `M. Hunger's Rating`

What are the Movies these 2 users have both rated

Page 22: Bootstrapping Recommendations with Neo4j

Cypher Query: Ratings of Two Users

Calculating Cosine Similarity

Page 23: Bootstrapping Recommendations with Neo4j

Cypher Query: Cosine Similarity

MATCH (p1:Person) -[x:RATED]-> (m:Movie) <-[y:RATED]- (p2:Person)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
     SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
     SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
     p1, p2
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)

Calculate it for all Person nodes with at least one Movie between them
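The arithmetic the query performs for one pair of people can be sketched as follows. As in the query, only movies rated by both people contribute to the dot product and to the two vector lengths. The function name and sample ratings are assumptions for illustration.

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over the movies both users rated; None if undefined."""
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return None  # no shared movie, so no SIMILARITY relationship
    dot = sum(ratings_a[m] * ratings_b[m] for m in common)
    len_a = math.sqrt(sum(ratings_a[m] ** 2 for m in common))
    len_b = math.sqrt(sum(ratings_b[m] ** 2 for m in common))
    if len_a == 0 or len_b == 0:
        return None  # all-zero ratings would divide by zero
    return dot / (len_a * len_b)

# Hypothetical ratings for two users; movie "C" is ignored because only one rated it.
user_a = {"A": 4, "B": 3}
user_b = {"A": 3, "B": 4, "C": 10}
print(cosine_similarity(user_a, user_b))
```

For these sample ratings the dot product is 24 and both lengths are 5, so the similarity is 24 / 25.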

Page 24: Bootstrapping Recommendations with Neo4j

Movie Data Model

Page 25: Bootstrapping Recommendations with Neo4j

Cypher Query: Your nearest neighbors

MATCH (p1:Person {name: 'Grace Andrews'}) -[s:SIMILARITY]- (p2:Person)
WITH p2, s.similarity AS sim
ORDER BY sim DESC
LIMIT 5
RETURN p2.name AS Neighbor, sim AS Similarity

Who are the
• top 5 Persons and their similarity score
• ordered by similarity in descending order
• for Grace Andrews

Page 26: Bootstrapping Recommendations with Neo4j

Your  nearest  neighbors

Page 27: Bootstrapping Recommendations with Neo4j

Cypher Query: k-NN Recommendation

MATCH (m:Movie) <-[r:RATED]- (b:Person) -[s:SIMILARITY]- (p:Person {name: 'Zoltan Varju'})
WHERE NOT( (p) -[:RATED]-> (m) )
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS recommendation
ORDER BY recommendation DESC
RETURN movie, recommendation
LIMIT 25

What are the Top 25 Movies
• that Zoltan Varju has not seen
• using the average rating
• by his 3 most similar neighbors who rated each movie
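The same k-NN idea can be sketched in plain Python: for each movie the target has not rated, take the ratings of the k most similar neighbors who rated it and average them. The data, the precomputed similarity table, and the function name knn_recommend are hypothetical.

```python
def knn_recommend(target, ratings, similarity, k=3, limit=25):
    """Average the ratings of the k most similar neighbors for each unseen movie."""
    seen = set(ratings[target])
    candidates = {}  # movie -> list of (similarity, rating)
    for neighbor, sim in similarity[target].items():
        for movie, rating in ratings[neighbor].items():
            if movie not in seen:
                candidates.setdefault(movie, []).append((sim, rating))
    scored = []
    for movie, pairs in candidates.items():
        pairs.sort(key=lambda p: p[0], reverse=True)  # most similar first
        top = [rating for _sim, rating in pairs[:k]]
        scored.append((movie, sum(top) / len(top)))
    scored.sort(key=lambda p: p[1], reverse=True)
    return scored[:limit]

# Hypothetical ratings and precomputed similarities.
ratings = {
    "zoltan": {"A": 8},
    "n1": {"A": 9, "B": 10, "C": 6},
    "n2": {"B": 8, "C": 7},
    "n3": {"B": 6},
    "n4": {"B": 1, "C": 2},
}
similarity = {"zoltan": {"n1": 0.9, "n2": 0.8, "n3": 0.7, "n4": 0.1}}

recommendations = knn_recommend("zoltan", ratings, similarity)
print(recommendations)
```

Movie "B" has four raters, so the least similar one (n4) is left out of its average.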

Page 28: Bootstrapping Recommendations with Neo4j

Recommendations over Searching/Browsing

Page 29: Bootstrapping Recommendations with Neo4j

Recommend Jobs to Job Seekers

What connects them?
• location
• skills
• education
• experience

Page 30: Bootstrapping Recommendations with Neo4j

Cypher Query: Job Recommendation

What are the Top 10 Jobs for me
• that are in the same location I’m in
• for which I have the necessary qualifications
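The query for this slide is not in the transcript, so the following is only a guess at the matching logic, written in Python rather than Cypher. It keeps jobs in the seeker's location and scores each by the fraction of required skills the seeker has. Field names such as location, skills and requires are assumptions.

```python
def recommend_jobs(seeker, jobs, limit=10):
    """Rank same-location jobs by the share of required skills the seeker has."""
    matches = []
    for job in jobs:
        if job["location"] != seeker["location"]:
            continue
        required = job["requires"]
        # A job with no stated requirements counts as a full match.
        score = len(required & seeker["skills"]) / len(required) if required else 1.0
        matches.append((job["title"], score))
    matches.sort(key=lambda m: m[1], reverse=True)
    return matches[:limit]

# Hypothetical seeker and job postings.
seeker = {"location": "Chicago", "skills": {"java", "neo4j", "sql"}}
jobs = [
    {"title": "Data Engineer", "location": "Chicago",
     "requires": {"sql", "spark", "kafka", "scala"}},
    {"title": "Graph Developer", "location": "Chicago",
     "requires": {"java", "neo4j"}},
    {"title": "Java Developer", "location": "Boston",
     "requires": {"java"}},
]

top_jobs = recommend_jobs(seeker, jobs)
print(top_jobs)
```

A score of 1.0 corresponds to a candidate who meets every listed requirement.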

Page 31: Bootstrapping Recommendations with Neo4j

Job Recommendation Results

Perfect Candidate for 100% matches
• missing qualifications can be added quickly
• might encourage exaggerated resumes

Page 32: Bootstrapping Recommendations with Neo4j

Just one tiny itsy bitsy problem

Job Boards get paid by
• Number of Applicants to a Job
• Wholesale Resume sales
• Selling your data

Page 33: Bootstrapping Recommendations with Neo4j

Recommend Love

Find your soulmate in the graph
• Are they energetic?
• Do they like dogs?
• Have a good sense of humor?
• Neat and tidy, but not crazy about it?

What are the Top 10 Potential Mates for me
• that are in the same location
• are sexually compatible
• have traits I want
• want traits I have
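The four criteria above describe a two-way match, which can be sketched as follows. The profile fields (gender, seeks, has, wants), the scoring by number of matched traits, and all sample people are assumptions for illustration, not the query from the talk.

```python
def mutual_matches(me, people, limit=10):
    """Rank people who have traits I want and who want traits I have."""
    results = []
    for other in people:
        if other["name"] == me["name"] or other["location"] != me["location"]:
            continue
        # Compatibility has to hold in both directions.
        if other["gender"] not in me["seeks"] or me["gender"] not in other["seeks"]:
            continue
        they_offer = me["wants"] & other["has"]
        i_offer = other["wants"] & me["has"]
        if they_offer and i_offer:
            results.append((other["name"], len(they_offer) + len(i_offer)))
    results.sort(key=lambda r: r[1], reverse=True)
    return results[:limit]

# Hypothetical profiles.
me = {"name": "max", "location": "Chicago", "gender": "m", "seeks": {"f"},
      "has": {"funny", "tidy"}, "wants": {"energetic", "likes dogs"}}
people = [
    {"name": "ann", "location": "Chicago", "gender": "f", "seeks": {"m"},
     "has": {"energetic", "likes dogs"}, "wants": {"funny"}},
    {"name": "bea", "location": "Chicago", "gender": "f", "seeks": {"m"},
     "has": {"energetic"}, "wants": {"tidy"}},
    {"name": "cat", "location": "Boston", "gender": "f", "seeks": {"m"},
     "has": {"energetic", "likes dogs"}, "wants": {"funny", "tidy"}},
    {"name": "dee", "location": "Chicago", "gender": "f", "seeks": {"m"},
     "has": {"tidy"}, "wants": {"funny"}},
]

mates = mutual_matches(me, people)
print(mates)
```

Requiring a non-empty match in both directions is what separates this from a one-sided content-based recommendation.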

Page 34: Bootstrapping Recommendations with Neo4j

Cypher Query: Love Recommendation

Page 35: Bootstrapping Recommendations with Neo4j

Love Recommendation Results

Page 36: Bootstrapping Recommendations with Neo4j

Linked Data

Connect to the Semantic Web

Page 37: Bootstrapping Recommendations with Neo4j

Getting  some  Data

Page 38: Bootstrapping Recommendations with Neo4j

graphipedia

https://github.com/mirkonasato/graphipedia

Page 39: Bootstrapping Recommendations with Neo4j

neo4j-dbpedia-importer

https://github.com/kbastani/neo4j-dbpedia-importer

Page 40: Bootstrapping Recommendations with Neo4j

Named Entity Recognition

Automatically find
• names of people
• places and locations
• products
• and organizations

Page 41: Bootstrapping Recommendations with Neo4j

Hacker News for Example

• What are the kids in Silicon Valley talking about?

Page 42: Bootstrapping Recommendations with Neo4j

Let’s find out

• They have an API!
• Get some data: Stories, Users, Authors, Commenters

Page 43: Bootstrapping Recommendations with Neo4j

Data  Model

Page 44: Bootstrapping Recommendations with Neo4j

Hacker News Recommendations

• Which stories should I read?
• Which users should I follow?
• What else should I be interested in?
• Who seems to know a lot about X?
• Etc.

Page 45: Bootstrapping Recommendations with Neo4j

GraphAware Recommendation Framework

• Ability to trade off recommendation quality for speed
• Ability to pre-compute recommendations
• Built-in algorithms and functions
• Ability to measure recommendation quality
• Ability to easily run in A/B test environments

Page 46: Bootstrapping Recommendations with Neo4j

Real-Time Recommendations with Neo4j

[Diagram labels: Social Recommendations, Products and Services, Content, Routing]

Page 47: Bootstrapping Recommendations with Neo4j

Walmart BUSINESS CASE

World’s largest company by revenue
World’s largest retailer and private employer
SF-based global e-commerce division manages several websites
Founded in 1969 in Bentonville, Arkansas

• Needed online customer recommendations to keep pace with competition
• Data connections provided predictive context, but were not in a usable format
• Solution had to serve many millions of customers and products while maintaining superior scalability and performance

Page 48: Bootstrapping Recommendations with Neo4j

Walmart SOLUTION

• Brings customers, preferences, purchases, products and locations into a graph model
• Uses connections to make product recommendations
• Solution deployed across Walmart divisions and websites

Page 49: Bootstrapping Recommendations with Neo4j

Global Courier BUSINESS CASE

World’s largest courier
480,000 employees
€55 billion in revenue
Needed new B2C and B2B parcel routing system for its logistics practice
Legacy system supported neither the full network nor the shift to online demands

Needed to replace aging B2B and B2C parcel routing system whose requirements include:
• 24x7 availability
• Peak loads of 5M parcels per day, 3K per second
• Support for complex and diverse software stack
• Predictable performance with linear scalability
• Daily changes to logistics networks
• Route from any point to any point
• Single point of truth for entire network

Page 50: Bootstrapping Recommendations with Neo4j

Global Courier SOLUTION

Neo4j provides the ideal domain fit since a logistics network is a graph
• High availability and performance via Neo4j clustering
• Greatly simplified Cypher queries for routing versus relational SQL queries
• Flexible data model that reflects the real logistics world far better than relational
• Easy-to-grasp whiteboard-friendly model

Page 51: Bootstrapping Recommendations with Neo4j

eBay BUSINESS CASE

C2C and B2C retail network
Full e-commerce functionality for individuals and businesses
Integrated with logistics vendors for product deliveries

• Needed an offering to compete with Amazon Prime
• Enable customer-selected delivery inside 90 minutes
• Calculate best route option in real-time
• Scale to enable a variety of services
• Offer more predictable delivery times

Page 52: Bootstrapping Recommendations with Neo4j

eBay Now SOLUTION

• Acquired UK-based Shutl, a leader in same-day delivery
• Used Neo4j to create eBay Now
• 1000 times faster than the prior MySQL-based solution
• Faster time-to-market
• Improved code quality with 10 to 100 times less query code

Page 53: Bootstrapping Recommendations with Neo4j

Classmates BUSINESS CASE

Online yearbook connecting friends from school, work and military in US and Canada
Founded as Memory Lane in Seattle

Develop new social networking capabilities to monetize yearbook-related offerings
• Show all the people I know in a yearbook
• Show yearbooks my friends appear in most often
• Show sections of a yearbook that my friends appear most in
• Show me other schools my friends attended

Page 54: Bootstrapping Recommendations with Neo4j

Classmates SOLUTION

Neo4j provides a robust and scalable graph database solution
• 3-instance cluster with cache sharding and disaster-recovery
• 18ms response time for top 4 queries
• 100M nodes and 600M relationships in initial graph, including people, images, schools, yearbooks and pages
• Projected to grow to 1B nodes and 6B relationships

Page 55: Bootstrapping Recommendations with Neo4j

National Geographic BUSINESS CASE

Non-profit scientific and educational institution founded in 1888
Covers geography, archaeology, natural science, environment and historical conservation
Journals, online media, radio, TV, documentaries, live events and consumer content and goods

• Improve poor performance of PostgreSQL app
• Increase user engagement by linking to 100+ years of multimedia content
• Improve targeting by understanding subscribers’ interests better
• Recommend content and services to users based on their interests

Page 56: Bootstrapping Recommendations with Neo4j

National Geographic SOLUTION

• Enabled complex real-time analytics across eight million users and a century of content
• Delivered robust performance by eliminating triple-nested SQL joins
• Cross-refers users among content, live events, travel, goods and causes
• Neo4j solution much less cumbersome and easier to maintain than previous SQL system

Page 57: Bootstrapping Recommendations with Neo4j

Curaspan BUSINESS CASE

Leader in patient management for discharges and referrals
Manages patient referrals for 4600+ health care facilities
Connects providers, payers via web-based patient management platform
Founded in 1999 in Newton, Massachusetts

• Improve poor performance of Oracle solution
• Support more complexity including granular, role-based access control
• Satisfy complex Graph Search queries by discharge nurses and intake coordinators
  Example: Find a skilled nursing facility within n miles of a given location, belonging to health care group XYZ, offering speech therapy and cardiac care, and optionally Italian language services

Page 58: Bootstrapping Recommendations with Neo4j

Curaspan SOLUTION

• Met fast, real-time performance demands
• Supported queries that span multiple hierarchies including provider and employee-permissions graphs
• Improved data model to handle adding more dimensions to the data such as insurance networks, service areas and care organizations
• Greatly simplified queries, reducing multi-page SQL statements to one Neo4j function

Page 59: Bootstrapping Recommendations with Neo4j

FiftyThree BUSINESS CASE

Maker of Paper, one of the top apps in Apple’s App Store, with millions of users
Based in New York City

• Add social capabilities to digital-paper app
• Support social collaboration across millions of users in new Mix app
• Enable seamless interaction between social and content-asset networks
• Ensure new apps are robust, scalable and fast

Page 60: Bootstrapping Recommendations with Neo4j

FiftyThree SOLUTION

• Neo4j data model ideal for social network, content management and access control
• Users create, publish and share designs simply
• Easy to develop and evolve Neo4j-based app
• Integrates well with FiftyThree EC2 architecture

See the Neo4j solution in action: Betting the Company (Literally) on a Graph Database
http://aseemk.com/talks/neo4j-lessons-learned#/

App Store Editor’s Choice
2012 iPad App of the Year
Apple Best Apps of 2014

Page 61: Bootstrapping Recommendations with Neo4j

Questions

• How does Neo4j fit into my existing infrastructure?
  As a Service.

• Will Neo4j scale?
  Yes.