Downtown SF Lucene/Solr Meetup: Developing Scalable Search for User Generated Content at PlayStation

Developing Scalable Search for User Generated Content at PlayStation. Alvin Peng, Sr. Software Engineer, Sony Interactive Entertainment

Upload: lucidworks

Post on 16-Feb-2017

TRANSCRIPT

  • Developing Scalable Search for User Generated Content at PlayStation

    Alvin Peng, Sr. Software Engineer

    Sony Interactive Entertainment

  • User Generated Content (UGC) in PlayStation

    PlayStation users can easily share awesome media

    Media types: Broadcasts, Screenshots, Videos

    Media is posted to third-party networks: Facebook, Twitter, YouTube, Dailymotion, Twitch, Ustream, Niconico

  • However

    There was no central place to show or search for all this awesome content

    It only showed up in users' Activity Feed and Profile

    It was only sent to friends

    Basically not visible to the majority of our millions of users

  • Difficulties of UGC System

    Searchable

    Scalability

    Performance

    Dynamic content: a lot of reads, a lot of writes

    Various searching requirements

  • Solution

    SolrCloud-based scalable search system for public UGC by users of PlayStation

  • Why Solr?

    Widely used open source search platform

    Scalable

    Stable

    Feature rich

    Not just a search platform

    Great Solr community, both individuals and companies

  • Developers of UGC backend system

    Alvin Peng

    David Herrera Rosales

  • Live From PlayStation and More

  • System Architecture

  • UGC SolrCloud System Design

    Solr 5.2.1

    SolrJ CloudSolrClient

    Single collection

    3 clusters in production environment: Broadcasts, Screenshots, Videos

    5 ZooKeeper nodes

    Single shard

    16 Solr nodes per cluster
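
The layout figures above imply a few derived numbers, worked through in the sketch below. It assumes one replica per Solr node (the slide does not say so) and treats the 5 ZooKeeper nodes as a single ensemble; whether that ensemble is shared by the three clusters is not stated. `ClusterLayout` and `CLUSTERS` are names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLayout:
    """Layout figures quoted on the slide for one SolrCloud cluster."""

    solr_nodes: int = 16
    shards: int = 1
    zookeeper_nodes: int = 5

    def replicas_per_shard(self) -> int:
        # Assumption: one replica per Solr node, spread evenly over shards.
        return self.solr_nodes // self.shards

    def zookeeper_quorum(self) -> int:
        # A ZooKeeper ensemble needs a strict majority of its nodes to operate.
        return self.zookeeper_nodes // 2 + 1

    def zookeeper_failures_tolerated(self) -> int:
        return self.zookeeper_nodes - self.zookeeper_quorum()


# The three production clusters named on the slide.
CLUSTERS = ("broadcasts", "screenshots", "videos")

layout = ClusterLayout()
total_solr_nodes = layout.solr_nodes * len(CLUSTERS)

print(layout.replicas_per_shard())            # 16
print(layout.zookeeper_quorum())              # 3
print(layout.zookeeper_failures_tolerated())  # 2
print(total_solr_nodes)                       # 48
```

With a single shard, every node under this assumption holds a full copy of the index, so adding nodes adds query capacity rather than index capacity.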

  • Solr Schema: Field types

    Class: StrField, TextField, TrieLongField, TrieDateField, etc.

    Analyzer

    Char filter: MappingCharFilterFactory, HTMLStripCharFilterFactory, PatternReplaceCharFilterFactory, etc.

    Tokenizer: StandardTokenizerFactory, NGramTokenizerFactory, KeywordTokenizerFactory, etc.

    Filter: LowerCaseFilterFactory, PorterStemFilterFactory, StopFilterFactory, etc.

    Index Analyzer and Query Analyzer
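
The order of the analysis stages listed above (char filters, then one tokenizer, then token filters) can be illustrated with a toy pipeline. This is plain Python, not Solr code: the functions only imitate the roles of an HTML-stripping char filter, a word tokenizer, a lowercase filter and a stop filter, and the stop-word list is made up. Stemming is left out to keep the sketch short.

```python
import re
from typing import Callable, Iterable, List

STOP_WORDS = {"a", "an", "of", "the"}  # illustrative list, not Solr's stopwords file


def strip_html(text: str) -> str:
    """Char filter: rewrites the raw string before tokenization."""
    return re.sub(r"<[^>]+>", " ", text)


def word_tokenizer(text: str) -> List[str]:
    """Tokenizer: splits the filtered string into word tokens."""
    return re.findall(r"\w+", text)


def lowercase(tokens: Iterable[str]) -> List[str]:
    """Token filter: normalizes case."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens: Iterable[str]) -> List[str]:
    """Token filter: drops tokens found in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


def make_analyzer(
    char_filters: List[Callable[[str], str]],
    tokenizer: Callable[[str], List[str]],
    token_filters: List[Callable[[Iterable[str]], List[str]]],
) -> Callable[[str], List[str]]:
    """Chain the stages in the order an analyzer applies them."""

    def analyze(text: str) -> List[str]:
        for char_filter in char_filters:
            text = char_filter(text)
        tokens = tokenizer(text)
        for token_filter in token_filters:
            tokens = token_filter(tokens)
        return tokens

    return analyze


# Index-time and query-time chains may differ; here only the index side strips HTML.
index_analyzer = make_analyzer([strip_html], word_tokenizer, [lowercase, remove_stop_words])
query_analyzer = make_analyzer([], word_tokenizer, [lowercase, remove_stop_words])

print(index_analyzer("<b>The Last</b> of Us"))  # ['last', 'us']
print(query_analyzer("Last of US"))             # ['last', 'us']
```

Both chains produce the same tokens for these inputs, which is what lets a query match the indexed document.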

  • Solr Schema: Fields

    Number of fields

    Field type

    Indexed, Stored, etc.

    copyField
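
As a rough picture of what copyField does, the toy function below copies source field values into a catch-all destination when a document is indexed. The field names and rules are hypothetical; the talk does not list the real schema.

```python
from typing import Dict, List, Tuple

# Hypothetical (source, destination) rules; the talk does not list the real ones.
COPY_FIELDS: List[Tuple[str, str]] = [
    ("title", "text_all"),
    ("description", "text_all"),
]


def apply_copy_fields(
    doc: Dict[str, str], rules: List[Tuple[str, str]] = COPY_FIELDS
) -> Dict[str, List[str]]:
    """Return the document with each source value appended to its destination."""
    out: Dict[str, List[str]] = {name: [value] for name, value in doc.items()}
    for source, dest in rules:
        if source in doc:
            # The raw source value is copied; the destination applies its own analysis.
            out.setdefault(dest, []).append(doc[source])
    return out


doc = {"id": "v1", "title": "Boss fight", "description": "Final stage clear"}
indexed = apply_copy_fields(doc)
print(indexed["text_all"])  # ['Boss fight', 'Final stage clear']
```

In Solr, a destination that receives values from more than one source field has to be declared multiValued.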

  • UGC Multilingual Support

    Supports about 20 languages: English, Spanish, Japanese, etc.

    Different field types for different languages

    Different tokenizers and filters
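
One common way to use per-language field types is to route each document's text into a language-specific field. The sketch below assumes a `<base>_<language>` naming scheme and a catch-all fallback; both are hypothetical, since the talk does not describe the actual field names.

```python
from typing import Dict

# Hypothetical suffixes; the talk mentions about 20 languages but names only three.
LANGUAGE_SUFFIXES = {"en": "en", "es": "es", "ja": "ja"}
FALLBACK_SUFFIX = "general"  # hypothetical field type for unlisted languages


def localized_field(base: str, language: str) -> str:
    """Pick the field whose type carries the analyzer for this language."""
    suffix = LANGUAGE_SUFFIXES.get(language.lower(), FALLBACK_SUFFIX)
    return f"{base}_{suffix}"


def to_solr_doc(doc_id: str, title: str, language: str) -> Dict[str, str]:
    """Build an index document with the title in its language-specific field."""
    return {"id": doc_id, localized_field("title", language): title}


print(localized_field("title", "JA"))  # title_ja
print(localized_field("title", "fr"))  # title_general
print(to_solr_doc("s1", "Victoria final", "es"))
```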

  • UGC Solr Configuration

    Hard commit: 15 minutes

    Hard commits are about durability

    Soft commit: 1 minute

    Soft commits are about visibility

    Less expensive, but not free

    Use the longest soft commit interval that's acceptable for best performance
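
Solr expresses both commit intervals in milliseconds. The sketch below converts the two values from the slide and builds a `set-property` request body for Solr's Config API, using the `updateHandler.autoCommit.maxTime` and `updateHandler.autoSoftCommit.maxTime` properties. Using the Config API is an assumption made for illustration; the talk does not say how the settings were applied, and they could just as well live in solrconfig.xml. Nothing is sent over the network here.

```python
import json

HARD_COMMIT_MINUTES = 15  # from the slide
SOFT_COMMIT_MINUTES = 1   # from the slide


def minutes_to_ms(minutes: int) -> int:
    """Convert minutes to the milliseconds Solr expects for maxTime."""
    return minutes * 60 * 1000


def commit_settings_payload() -> str:
    """Build a JSON body that could be POSTed to /solr/<collection>/config."""
    body = {
        "set-property": {
            "updateHandler.autoCommit.maxTime": minutes_to_ms(HARD_COMMIT_MINUTES),
            "updateHandler.autoSoftCommit.maxTime": minutes_to_ms(SOFT_COMMIT_MINUTES),
        }
    }
    return json.dumps(body, indent=2)


print(minutes_to_ms(HARD_COMMIT_MINUTES))  # 900000
print(minutes_to_ms(SOFT_COMMIT_MINUTES))  # 60000
print(commit_settings_payload())
```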

  • UGC Stats

    Online since last Sept.

    Number of documents: Broadcasts 26K; Screenshots 5M; Videos 20M

    Average request RPS

    Total UGC query requests per day: > 1B

    Average Solr query RPS: Broadcasts 1600; Screenshots 250; Videos 250

    Average Solr update RPS: Broadcasts 500; Screenshots 250; Videos 500

    Average query latency

    Average Solr query latency: Broadcasts 4 ms (16 ms for leader); Screenshots 14 ms (16 ms for leader); Videos 60 ms (210 ms for leader)

    Average Solr update latency: Broadcasts 8 ms (60 ms for leader); Screenshots 1 ms (10 ms for leader); Videos 2 ms (24 ms for leader)
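
To put the request figures on a common scale, the sketch below converts the daily total into an average per-second rate and divides each cluster's Solr query rate by its 16 nodes. It assumes queries are spread evenly across nodes, which the slides do not claim (the separate leader latencies suggest load is not uniform). Update rates are not divided, since each replica of a shard applies every update.

```python
from typing import Dict

NODES_PER_CLUSTER = 16  # from the system design slide
SECONDS_PER_DAY = 24 * 60 * 60

# Average Solr query RPS per cluster, as quoted above.
QUERY_RPS = {"broadcasts": 1600, "screenshots": 250, "videos": 250}

# The slide says "> 1B", so this value is only a lower bound.
DAILY_UGC_REQUESTS = 1_000_000_000


def per_node(rps_by_cluster: Dict[str, int], nodes: int = NODES_PER_CLUSTER) -> Dict[str, float]:
    """Average query rate per node, assuming perfectly even load balancing."""
    return {name: rps / nodes for name, rps in rps_by_cluster.items()}


query_per_node = per_node(QUERY_RPS)
avg_request_rps = DAILY_UGC_REQUESTS / SECONDS_PER_DAY
total_solr_query_rps = sum(QUERY_RPS.values())

print(query_per_node)          # {'broadcasts': 100.0, 'screenshots': 15.625, 'videos': 15.625}
print(round(avg_request_rps))  # 11574
print(total_solr_query_rps)    # 2100
```

The daily total and the Solr query rates are different measurements; the transcript does not state how the two relate.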

  • Finally

    Happy searching with Solr!

  • Q/A