downtown sf lucene/solr meetup: developing scalable search for user generated content at playstation
TRANSCRIPT
-
Developing Scalable Search for User Generated Content at PlayStation
Alvin Peng, Sr. Software Engineer
Sony Interactive Entertainment
-
User Generated Content (UGC) in PlayStation
PlayStation users can easily share awesome media
Media types: Broadcasts, Screenshots, Videos
Media are posted to third-party networks: Facebook, Twitter, YouTube, Dailymotion, Twitch, Ustream, Niconico
-
However
There was no central place to show or search for all this awesome content
Only shown in users' Activity Feed and Profile
Only sent to friends
Basically not visible to the majority of our millions of users
-
Difficulties of UGC System
Searchable
Scalability
Performance
Dynamic content
A lot of reads
A lot of writes
Various searching requirements
-
Solution
SolrCloud-based scalable search system for public UGC by users of PlayStation
-
Why Solr?
Widely used open source search platform
Scalable
Stable
Feature rich
Not just a search platform
Great Solr community, both individuals and companies
-
Developers of UGC backend system
Alvin Peng David Herrera Rosales
-
Live From PlayStation and More
-
System Architecture
-
UGC SolrCloud System Design
Solr 5.2.1
SolrJ CloudSolrClient
Single collection
3 clusters in production environment: Broadcasts, Screenshots, Videos
5 ZooKeeper nodes
Single shard
16 Solr nodes per cluster
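A note on the 5 ZooKeeper nodes: a ZooKeeper ensemble keeps serving as long as a majority of its nodes is up. The arithmetic can be sketched as below; the function names are illustrative and not part of any ZooKeeper or Solr API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest majority of a ZooKeeper ensemble of the given size."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of nodes that can be lost while a majority still remains."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # With the 5 nodes from the slide: a quorum of 3, so 2 nodes may fail.
    print(quorum_size(5), tolerated_failures(5))
```

Under this rule a sixth node would not raise the number of tolerated failures, which is one reason odd ensemble sizes are the usual choice.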
-
Solr Schema
Field types
Class: StrField, TextField, TrieLongField, TrieDateField, etc.
Analyzer
Char filter: MappingCharFilterFactory, HTMLStripCharFilterFactory, PatternReplaceCharFilterFactory, etc.
Tokenizer: StandardTokenizerFactory, NGramTokenizerFactory, KeywordTokenizerFactory, etc.
Filter: LowerCaseFilterFactory, PorterStemFilterFactory, StopFilterFactory, etc.
Index Analyzer and Query Analyzer
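To make the analyzer chain concrete, here is a minimal pure-Python sketch of the order of operations: char filter, then tokenizer, then token filters. It is not Solr code; each function is a much simplified stand-in for the factory named in its docstring, and the stop word list is made up for the example.

```python
import re
from typing import List

# Illustrative stop words only; not the list shipped with Solr.
STOP_WORDS = {"a", "an", "the", "of", "and"}


def strip_html(text: str) -> str:
    """Char filter: crude stand-in for HTMLStripCharFilterFactory."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text: str) -> List[str]:
    """Tokenizer: splits on non-word characters (far simpler than StandardTokenizerFactory)."""
    return re.findall(r"\w+", text)


def lowercase(tokens: List[str]) -> List[str]:
    """Token filter: stand-in for LowerCaseFilterFactory."""
    return [token.lower() for token in tokens]


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Token filter: stand-in for StopFilterFactory."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text: str) -> List[str]:
    """Run the chain in order: char filter, tokenizer, then token filters."""
    tokens = tokenize(strip_html(text))
    return remove_stop_words(lowercase(tokens))


if __name__ == "__main__":
    print(analyze("<b>The</b> Last of Us"))  # ['last', 'us']
```

Index-time and query-time analysis can use different chains, but the terms they produce have to match for a document to be found.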
-
Solr Schema
Fields
Number of fields, Field type, Indexed, Stored, etc.
copyField
-
UGC Multilingual Support
Supports about 20 languages: English, Spanish, Japanese, etc.
Different field types for different languages
Different tokenizers and filters
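One common way to use per-language field types is to route each document's text into a language-specific field. The slides do not say how the UGC system does this, so the sketch below is an assumption throughout: the field names, suffixes, and fallback are hypothetical.

```python
# Hypothetical suffixes; the talk does not list the actual field names.
FIELD_SUFFIX_BY_LANGUAGE = {
    "en": "_en",
    "es": "_es",
    "ja": "_ja",
}
# Hypothetical catch-all field for languages without a dedicated field type.
DEFAULT_SUFFIX = "_general"


def localized_field(base_name: str, language_code: str) -> str:
    """Pick the field whose type has the analyzer for this language.

    Accepts codes such as "en" or "en-US"; only the primary subtag is used.
    """
    primary = language_code.lower().split("-")[0]
    return base_name + FIELD_SUFFIX_BY_LANGUAGE.get(primary, DEFAULT_SUFFIX)


if __name__ == "__main__":
    print(localized_field("title", "ja"))     # title_ja
    print(localized_field("title", "en-US"))  # title_en
```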
-
UGC Solr Configuration
Hard commit: 15 minutes
Hard commits are about durability
Soft commit: 1 minute
Soft commits are about visibility
Less expensive, but not free
Use the longest soft commit interval that's acceptable for best performance
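In solrconfig.xml these intervals are typically expressed in milliseconds through the `autoCommit` and `autoSoftCommit` elements. The sketch below converts the two intervals from the slide and renders such a fragment. The `openSearcher` setting is an assumption (it fits "hard commits are about durability" but is not stated in the talk), and the helper names are illustrative.

```python
from datetime import timedelta

# Intervals taken from the slide.
HARD_COMMIT_INTERVAL = timedelta(minutes=15)
SOFT_COMMIT_INTERVAL = timedelta(minutes=1)


def to_millis(interval: timedelta) -> int:
    """Convert an interval to the millisecond value used by <maxTime>."""
    return int(interval.total_seconds() * 1000)


def render_commit_config() -> str:
    """Render an updateHandler fragment for solrconfig.xml."""
    return (
        "<autoCommit>\n"
        f"  <maxTime>{to_millis(HARD_COMMIT_INTERVAL)}</maxTime>\n"
        # Assumption: hard commits flush to disk without opening a new searcher.
        "  <openSearcher>false</openSearcher>\n"
        "</autoCommit>\n"
        "<autoSoftCommit>\n"
        f"  <maxTime>{to_millis(SOFT_COMMIT_INTERVAL)}</maxTime>\n"
        "</autoSoftCommit>"
    )


if __name__ == "__main__":
    print(render_commit_config())
```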
-
UGC Stats
Online since last Sept.
Number of documents: Broadcasts 26K, Screenshots 5M, Videos 20M
Average request RPS
Total UGC query requests per day: > 1B
Average Solr query RPS: Broadcasts 1600, Screenshots 250, Videos 250
Average Solr update RPS: Broadcasts 500, Screenshots 250, Videos 500
Average query latency
Average Solr query latency: Broadcasts 4ms (16ms for leader), Screenshots 14ms (16ms for leader), Videos 60ms (210ms for leader)
Average Solr update latency: Broadcasts 8ms (60ms for leader), Screenshots 1ms (10ms for leader), Videos 2ms (24ms for leader)
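For a sense of scale, the daily total can be converted to an average rate, and the per-cluster Solr figures can be summed. This is plain arithmetic on the numbers above. The slides do not say whether the Solr RPS figures are per node or per cluster, so the two results are not directly comparable.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures copied from the stats slide.
SOLR_QUERY_RPS = {"broadcasts": 1600, "screenshots": 250, "videos": 250}
SOLR_UPDATE_RPS = {"broadcasts": 500, "screenshots": 250, "videos": 500}


def average_rps(requests_per_day: float) -> float:
    """Average requests per second implied by a daily request count."""
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # 1B requests per day is a lower bound on the slide ("> 1B").
    print(round(average_rps(1_000_000_000)))  # 11574
    print(sum(SOLR_QUERY_RPS.values()))       # 2100
    print(sum(SOLR_UPDATE_RPS.values()))      # 1250
```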
-
Finally
Happy searching with Solr!
-
Q/A