Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

Developing Scalable User Search for PlayStation 4. Ai Sasho, [email protected], Sr. Software Engineer, Sony Interactive Entertainment

Upload: lucidworks

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

Developing Scalable User Search for PlayStation 4

Ai Sasho, [email protected], Sr. Software Engineer, Sony Interactive Entertainment

Page 2: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

©2016 Sony Interactive Entertainment

About My Team

§ Developing social features for PS4 to improve social gaming experiences.
§ Worked on the User Search and Players You May Know recommendation features.
§ Server side: Isaias, Marlon, Pavan, Chris, Janhavi, Xifan, Venkat
§ Client side: Tomas, Nythya, Max, Yukio, Katsuya, Eric, Tong

Sony Interactive Entertainment = Sony Network Entertainment Intn’l + Sony Computer Entertainment = Greatness Awaits!

Page 3: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4


Page 4: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

Outline

§ User Search Feature Overview
§ SolrCloud Setup
§ Personalized Search: Lucene + SolrCloud
§ Challenges
§ Solr 4.8 to 5.4 Upgrade

Page 5: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4


User Search

Page 6: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4


User Search

Page 7: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

User Search: Requirements

§ Fast
  • Queries should return in < 100 ms.
§ Reliable / Fault Tolerant
§ Scalable
  • The SolrCloud cluster needs to handle:
    o Up to 1000 RPS of query requests
    o Up to 250 RPS of indexing requests
  • Approx. 300 million documents
§ Ranking search results by friendship
  • Up to n degrees of separation.
  • Friends, 2nd-degree friends (friends of friends), etc.

Page 8: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

SolrCloud: System Architecture

[Diagram: ZooKeeper; a SolrCloud cluster with four Leader/Replica pairs; ELB; Application Servers; Database]

Page 9: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

SolrCloud: Configurations

§ SolrCloud 5.4
§ Documents
  • User data (~1.5 KB per user)
  • ID, Online ID, Name (First, Middle, Last), Privacy, User Type, etc.
  • ~300 million documents
§ Shards
  • 4 shards + many replicas.
  • Number of shards determined experimentally.
  • Most of the docs on each shard fit in memory.
§ Cache
  • Query Result Cache, Document Cache, Filter Cache, etc.
§ Commit
  • autoSoftCommit: 5 secs
  • autoCommit: 15 mins (openSearcher=false)
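A rough back-of-envelope calculation shows what "most of the docs on each shard fit in memory" implies. This is only a sketch: it assumes "1.5 KB" means about 1,500 bytes of stored user data per document and ignores index overhead (n-grams, term dictionaries), which the slides do not quantify.

```python
# Back-of-envelope sizing from the figures on the slide.
# Assumption: ~1,500 bytes of stored data per document, index overhead ignored.
DOC_COUNT = 300_000_000
BYTES_PER_DOC = 1_500
SHARDS = 4

total_bytes = DOC_COUNT * BYTES_PER_DOC
per_shard_gb = total_bytes / SHARDS / 1e9

print(f"total: {total_bytes / 1e9:.1f} GB, per shard: {per_shard_gb:.1f} GB")
```

Under these assumptions each shard holds on the order of a hundred gigabytes of stored data, which is why the shard count had to be found by experiment rather than by formula.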

Page 10: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

SolrCloud: Configurations

§ Tokenizers
  • Whitespace Tokenizer
§ Filters
  • ASCII Folding Filter
    o Stored and queried with the equivalent English letters.
    o Joan Miró -> Joan Miro
  • N-Gram Filter
    o abc -> a, b, c, ab, bc, abc
    o Takes up more space, but is faster than a wildcard (*) query.
  • Lower Case Filter
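The analysis chain above can be imitated in a few lines of plain Python to see what ends up in the index. This is a sketch, not Solr's implementation: accent stripping via Unicode decomposition is a rough stand-in for ASCII folding, the gram sizes 1 to 3 are assumed (the slides do not give them), and the output order is chosen to match the slide's example.

```python
import unicodedata

def ascii_fold(text: str) -> str:
    """Drop combining marks after NFKD decomposition (rough stand-in for ASCII folding)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def ngrams(token: str, min_len: int = 1, max_len: int = 3) -> list[str]:
    """All substrings of length min_len..max_len, shortest first."""
    grams = []
    for n in range(min_len, min(max_len, len(token)) + 1):
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams

def analyze(text: str) -> list[str]:
    """Whitespace-tokenize, fold accents, lowercase, then expand each token to n-grams."""
    terms = []
    for token in text.split():
        terms.extend(ngrams(ascii_fold(token).lower()))
    return terms

print(ascii_fold("Joan Miró"))  # Joan Miro
print(analyze("abc"))           # ['a', 'b', 'c', 'ab', 'bc', 'abc']
```

Because every substring is indexed as its own term, a partial query such as "bc" becomes an exact term lookup instead of a wildcard scan, at the cost of a larger index.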

Page 11: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

Personalized Search: Overview

§ People search for users they know, or kind of know...
§ Search results should be ranked by the friendship between the searcher and the searched users.

User A <- Friend (1st degree of separation)
User B <- Friend (1st degree of separation)
User C <- Friend of Friend (2nd degree of separation)
...
User Y <- Not associated.
User Z <- Not associated.

Page 12: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

Personalized Search: Ideas

Possible Solution 1: Query SolrCloud with the list of friend IDs, giving a higher boost to users who are closer to the caller.

q=ps4king&
bf=friends:(ID1 or ID2 or ID3 or …)^500&
bf=friends2nd:(ID4 or ID5 or ID6 or …)^50&
bf=friends3rd:(ID7 or ID8 or ID9 or …)^5&
…

Problems
  • The list of friends can be very long (potentially thousands).
  • Increases the query latency.
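To make the size problem concrete, here is a sketch that builds such a request and measures it. Several things are assumptions for illustration: the helper name `build_boost_params`, the field name `id`, and the boost values. It uses `bq` (boost query), which is the parameter Solr's dismax/edismax parsers accept for query-style boosts; `bf` takes function queries.

```python
from urllib.parse import urlencode

def build_boost_params(term, friends_by_degree, boosts):
    """Build query parameters with one boost clause per degree of separation."""
    params = [("q", term), ("defType", "edismax")]
    for degree, ids in sorted(friends_by_degree.items()):
        if ids:
            # Hypothetical field name "id"; closer friends get a larger boost.
            clause = "id:(" + " OR ".join(ids) + ")^" + str(boosts[degree])
            params.append(("bq", clause))
    return params

friends = {1: ["ID1", "ID2", "ID3"], 2: ["ID4", "ID5", "ID6"]}
params = build_boost_params("ps4king", friends, {1: 500, 2: 50})
print(urlencode(params))

# With thousands of friends the query string grows quickly.
big = {1: [f"ID{i}" for i in range(5000)]}
print(len(urlencode(build_boost_params("ps4king", big, {1: 500}))))
```

Every request has to carry, parse, and evaluate the whole ID list, which is where the extra latency comes from.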

Page 13: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

Personalized Search: Ideas

Possible Solution 2: Index the friendship in SolrCloud. Add “friends” fields; if the caller is in one of the “friends” fields, boost the document.

Problems:
  o Too many requests to Solr.
  o Maintaining friendship in Solr in addition to our database might be overkill.
  o Requires a large amount of disk space.

Page 14: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

Personalized Search: Our Solution

Global Index + Personalized Index

Global Index
§ Includes all the users.

Personalized Index
§ Stores people close to the caller (friends, friends of friends, up to n degrees of separation).
§ Also used in the friend recommendation system.
§ Another team already uses a Lucene index for user-owned games.
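The set of "people close to the caller" can be computed with a breadth-first search over the friendship graph. The sketch below is an assumption about how that might be done, not the team's code; the adjacency-list data and the function name `degrees_of_separation` are hypothetical.

```python
from collections import deque

def degrees_of_separation(friends, caller, max_degree):
    """Breadth-first search from the caller; returns {user: degree} up to max_degree."""
    degree = {caller: 0}
    queue = deque([caller])
    while queue:
        user = queue.popleft()
        if degree[user] == max_degree:
            continue  # do not expand beyond the requested depth
        for friend in friends.get(user, ()):
            if friend not in degree:
                degree[friend] = degree[user] + 1
                queue.append(friend)
    del degree[caller]  # the caller is not part of their own index
    return degree

# Hypothetical friendship data (symmetric adjacency lists).
friends = {
    "caller": ["ps4Queen", "ps4King"],
    "ps4Queen": ["caller", "ps4awesome"],
    "ps4King": ["caller"],
    "ps4awesome": ["ps4Queen", "stranger"],
    "stranger": ["ps4awesome"],
}
print(degrees_of_separation(friends, "caller", max_degree=2))
# {'ps4Queen': 1, 'ps4King': 1, 'ps4awesome': 2}
```

Breadth-first order guarantees that each user is recorded with the smallest degree at which they are reachable.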

Page 15: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

Personalized Search: Lucene + SolrCloud

[Diagram: Application Server, Friendship Data, and the Lucene index shown below]

Lucene Index (simplified)

Online ID     First Name    …    Degree of Separation
ps4Queen      Marge         …    1
ps4King       Homer         …    1
ps4awesome    Bart          …    2
…             …             …    …

§ Lucene index created on demand for the caller
§ Cached temporarily
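The slides do not say how hits from the two indexes are combined. One plausible approach, shown here purely as an assumption, is to list personalized hits first (closest degree first) and then fill the remaining slots with global hits that were not already shown. The function name and sample data are hypothetical.

```python
def merge_results(personal_hits, global_hits, limit=10):
    """Rank personalized hits (closest first), then append unseen global hits."""
    ranked = [uid for uid, _ in sorted(personal_hits.items(), key=lambda kv: kv[1])]
    seen = set(ranked)
    for uid in global_hits:
        if uid not in seen:
            ranked.append(uid)
            seen.add(uid)
    return ranked[:limit]

# Hypothetical inputs: personalized hits map online ID -> degree of separation;
# global hits are in the order the global index returned them.
personal = {"ps4awesome": 2, "ps4King": 1}
global_hits = ["ps4kingdom", "ps4King", "ps4king99"]
print(merge_results(personal, global_hits))
# ['ps4King', 'ps4awesome', 'ps4kingdom', 'ps4king99']
```

Deduplicating against the personalized hits matters because friends also exist in the global index.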

Page 16: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

Challenges

§ Hard to improve performance when using two index systems (Lucene + SolrCloud).
  • Tuned SolrCloud a lot (cache size, query optimization, soft/auto commit settings, GC settings, etc.)
§ Not a problem anymore, but SolrCloud was unstable for a while.
  • The entire cluster would go down a couple of times a month.

Page 17: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

Challenges: Instability Solutions

§ Increased the number of replicas.
  • When a leader goes into recovery, there need to be enough replicas to handle all the requests.
§ Reconfigured GC settings with CMS (concurrent mark sweep).
§ Decreased the size of the document query cache.
  o Cache warm-up time was longer than the soft auto commit interval -> the cache was always warming.
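The cache observation comes down to simple arithmetic. As a simplified model (an assumption, not a measurement from the talk), suppose every soft commit opens a new searcher that starts warming immediately; the warm-up times below are made-up values.

```python
def warming_fraction(warmup_seconds: float, soft_commit_seconds: float) -> float:
    """Fraction of time a new searcher is warming, assuming a soft commit
    opens a new searcher every interval and warm-up starts immediately."""
    return min(1.0, warmup_seconds / soft_commit_seconds)

print(warming_fraction(8, 5))  # 1.0: warm-up outlasts the 5 s interval
print(warming_fraction(2, 5))  # 0.4: warm-up finishes before the next commit
```

Once warm-up takes longer than the commit interval, the next searcher starts warming before the previous one finishes, so shrinking the cache (and with it the warm-up work) breaks the cycle.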

Page 18: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

SolrCloud Upgrade

§ Motivations
  • Originally Solr 4.8 was used, but due to the instability issues, upgraded to Solr 5.4.
§ Challenges
  • Tried to stream data from a Solr 4.8 node to Solr 5.4 by joining a node to the cluster, but it did not work.
  • Some data types have been deprecated.
    o IntegerType, LongType -> TrieInteger, TrieLong
    o schema.xml needed to be updated with the new data types.
    o Decided to fully reindex the 300 million documents in the Solr 5.4 cluster.

Page 19: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

SolrCloud Upgrade: Full Indexing

§ First, query out the 300M docs, then do a full index.
§ Deep paging (specifying a start index and limit) is too slow.
  • Solr needs to cache documents up to the starting index.
§ The logical cursor cursorMark is the solution to the deep paging problem. Each response returns the next cursor.

...&rows=10&sort=id+asc&cursorMark=AoEjR0JQ

§ cursorMark is not perfect. Sometimes the cursor stops before the end of the documents. A filter query can be used instead to fetch a specific range of documents by ID.
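The cursor loop can be sketched as follows. In Solr the first request sends `cursorMark=*`, the sort must include the unique key field, and each response carries a `nextCursorMark`; the export is done when the returned cursor equals the one that was sent. To keep the sketch runnable without a cluster, the `fetch_page` callable and the in-memory `fake_fetch_page` are hypothetical stand-ins for the HTTP call.

```python
def fetch_all(fetch_page, rows=10):
    """Follow the cursor until the server returns the same cursor it was given."""
    cursor = "*"  # Solr's starting value for cursorMark
    while True:
        docs, next_cursor = fetch_page(cursor, rows)
        yield from docs
        if next_cursor == cursor:
            break  # no more results
        cursor = next_cursor

# Hypothetical in-memory stand-in for a collection sorted by id.
IDS = [f"user{i:03d}" for i in range(25)]

def fake_fetch_page(cursor, rows):
    # Here the cursor is simply the last id seen; real cursors are opaque strings.
    start = 0 if cursor == "*" else IDS.index(cursor) + 1
    page = IDS[start:start + rows]
    next_cursor = page[-1] if page else cursor
    return page, next_cursor

exported = list(fetch_all(fake_fetch_page))
print(len(exported))  # 25
```

If a cursor stops early, the remaining documents can be fetched by restricting each pass to an ID range with a filter query of the form `fq=id:[low TO high]`; the range bounds would depend on how the IDs are distributed.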

Page 20: Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStation 4

Q & A: Any Questions?