Pinterest Zen - Facebook @Scale conference

Upload: pinteresr

Post on 02-Jun-2018


  • 8/11/2019 Pinterest Zen - Facebook @Scale conference

    1/35

    Raghavendra Prabhu (RVP)

    Zen: Pinterest's Graph Storage Service

    What does it take to do this consistently?

    Many different things

    #Hire the best

    #Culture

    #Focus

    Infrastructure that doesn't get in the way

    Persistent Storage

    Even with a distributed database, the app needs to deal with

    #Schema design

    #Fault tolerance

    #Capacity management

    #Performance tuning

    Solution 1: UserMetaStore

    Storage-as-a-Service: Key-value Thrift API on top of HBase

    Features:

    #Key partitioning to balance load

    #Master-slave clusters, semi-automatic failover

    #Speculative execution

    #Multi-tenancy with traffic isolation
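Key partitioning can be pictured with a small hash-based sketch. The slide does not say how UserMetaStore assigns keys to partitions, so the hashing scheme, the `partition_for` helper, the `user:<n>` key format, and the partition count of 8 are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


# Spread 10,000 made-up keys over 8 partitions and inspect the balance.
counts = Counter(partition_for(f"user:{i}", 8) for i in range(10_000))
for partition, n in sorted(counts.items()):
    print(partition, n)
```

A stable hash keeps a given key on the same partition across processes, which Python's built-in `hash()` does not guarantee for strings.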

    Storage-as-a-service is a great step forward, but can we do better?

    Example: Messages Data Model

    [Diagram: User nodes, a Conversation node, and Message nodes, connected by Participates and Contains edges]

    Realization

    #These object models closely resemble a graph

    #Objects are nodes, edges represent relationships

    #Typical needs:

    # retrieve data for a node or edge

    # get all outgoing edges from a node

    # get all incoming edges from a node

    # count incoming or outgoing edges for a node

    Enter Zen!

    #Provides a graph data model instead of key-value

    #Automatically creates necessary indexes

    #Materializes counts for efficient querying

    #Implemented on top of HBase, but can plug in other backends
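The ideas of automatic indexes and materialized counts can be sketched with a toy in-memory store. This is not Zen's implementation: the `TinyGraph` class and its method names are hypothetical, the ids are borrowed from the Messages illustration, and treating 1234 as a user and 2345 as a conversation is an assumption.

```python
from collections import defaultdict


class TinyGraph:
    """Toy in-memory store; a sketch, not Zen's actual implementation."""

    def __init__(self):
        self.out_index = defaultdict(set)  # (edge_type, from_id) -> {to_id}
        self.in_index = defaultdict(set)   # (edge_type, to_id) -> {from_id}
        self.out_count = defaultdict(int)  # materialized outgoing counts
        self.in_count = defaultdict(int)   # materialized incoming counts

    def add_edge(self, edge_type, from_id, to_id):
        # Skip duplicates so the counts stay in step with the indexes.
        if to_id in self.out_index[(edge_type, from_id)]:
            return
        # Both indexes and both counters are updated on every write.
        self.out_index[(edge_type, from_id)].add(to_id)
        self.in_index[(edge_type, to_id)].add(from_id)
        self.out_count[(edge_type, from_id)] += 1
        self.in_count[(edge_type, to_id)] += 1

    def count_outgoing(self, edge_type, node_id):
        # Constant-time lookup: no scan over the edge set is needed.
        return self.out_count[(edge_type, node_id)]

    def count_incoming(self, edge_type, node_id):
        return self.in_count[(edge_type, node_id)]


g = TinyGraph()
g.add_edge("Participates", 1234, 2345)
g.add_edge("Participates", 1235, 2345)
g.add_edge("Participates", 1234, 2345)  # duplicate, ignored
print(g.count_incoming("Participates", 2345))  # 2
print(g.count_outgoing("Participates", 1234))  # 1
```

Paying the bookkeeping cost at write time is what makes the count queries cheap at read time.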

    Why the name Zen?

    #Data model inspired by Facebook's TAO

    #But internally a very different system

    #Zen:

    # evolution of Buddhism under Taoist conditions

    # simplified version of Taoism

    # basically Pinterest's take on the TAO idea :)

    What Zen is NOT

    #NOT a full-fledged graph database

    #NO advanced graph operations

    #Basically an object-relationship data model on top of databases to simplify app development

    Zen API

    Nodes:

    # addNode, removeNode, getNode

    # Node id: globally unique 64-bit integer

    Zen API

    Edges:

    # addEdge, removeEdge, getEdge

    # Edge Ref: (edgeType, fromId, toId)

    Zen API

    Edge Queries:

    # getEdges, countEdges, removeEdges

    struct EdgeQuery {
      1: required NodeId nodeId;            // node whose edges are queried
      2: required EdgeDirection direction;  // incoming or outgoing
      3: optional TypeId edgeType;          // restrict to one edge type if set
    }
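To make the query shape concrete, here is a minimal Python sketch of how an EdgeQuery could be evaluated against a plain list of edge refs. The enum member names, the integer type ids, the node ids 3456 and 3457, and the linear scan are assumptions for illustration; the slides do not show how Zen serves these queries.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple

Edge = Tuple[int, int, int]  # (edgeType, fromId, toId), mirroring the Edge Ref


class EdgeDirection(Enum):
    OUTGOING = 1  # hypothetical member names
    INCOMING = 2


@dataclass
class EdgeQuery:
    node_id: int
    direction: EdgeDirection
    edge_type: Optional[int] = None  # None means "any edge type"


def get_edges(edges: List[Edge], q: EdgeQuery) -> List[Edge]:
    result = []
    for edge_type, from_id, to_id in edges:
        # Pick the endpoint that must match the queried node.
        endpoint = from_id if q.direction is EdgeDirection.OUTGOING else to_id
        if endpoint != q.node_id:
            continue
        if q.edge_type is not None and edge_type != q.edge_type:
            continue
        result.append((edge_type, from_id, to_id))
    return result


def count_edges(edges: List[Edge], q: EdgeQuery) -> int:
    return len(get_edges(edges, q))


PARTICIPATES, CONTAINS = 1, 2  # hypothetical type ids
edges = [(PARTICIPATES, 1234, 2345), (CONTAINS, 2345, 3456), (CONTAINS, 2345, 3457)]
print(count_edges(edges, EdgeQuery(2345, EdgeDirection.OUTGOING)))  # 2
print(count_edges(edges, EdgeQuery(2345, EdgeDirection.INCOMING)))  # 1
print(get_edges(edges, EdgeQuery(2345, EdgeDirection.OUTGOING, PARTICIPATES)))  # []
```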

    Zen API

    Property Indexes

    #Unique index

    # Ensures a property value is unique across all nodes of a type

    #Non-unique index

    # Allows retrieval by property value

    #Works for both nodes and edges
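The difference between the two index kinds can be sketched in a few lines. The `PropertyIndexes` class and its method names are hypothetical, and the sample values are taken from the Messages illustration; how Zen stores these indexes is not shown here.

```python
from collections import defaultdict


class PropertyIndexes:
    """Toy unique and non-unique property indexes (illustrative only)."""

    def __init__(self):
        self.unique = {}                    # (type, prop, value) -> node_id
        self.non_unique = defaultdict(set)  # (type, prop, value) -> {node_id}

    def put_unique(self, node_type, prop, value, node_id):
        key = (node_type, prop, value)
        owner = self.unique.get(key)
        # Reject a value that another node of the same type already holds.
        if owner is not None and owner != node_id:
            raise ValueError(f"{prop}={value!r} already used by node {owner}")
        self.unique[key] = node_id

    def put_non_unique(self, node_type, prop, value, node_id):
        self.non_unique[(node_type, prop, value)].add(node_id)

    def find(self, node_type, prop, value):
        # Retrieval by property value; may return several node ids.
        return sorted(self.non_unique.get((node_type, prop, value), set()))


idx = PropertyIndexes()
idx.put_unique("User", "Name", "Ben Smith", 1234)
idx.put_non_unique("Conversation", "Pin Id", 10001, 2345)
idx.put_non_unique("Conversation", "Pin Id", 10001, 2346)
print(idx.find("Conversation", "Pin Id", 10001))  # [2345, 2346]
try:
    idx.put_unique("User", "Name", "Ben Smith", 9999)
except ValueError as err:
    print("rejected:", err)
```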

    Illustration: Messages on Zen

    [Figure: three nodes (Ids shown: 1234, 2345, ...) joined by edges of Type: Participates and Type: Contains]

    Node, Type: Conversation
    Started: 12 Aug 2014 08:00
    Header: Great pin!
    Pin Id: 10001 [non-unique]

    Node, Type: User
    Name: Ben Smith [unique]
    Status: Active

    Node, Type: Me...
    Sent: 12...
    Text: Gr...

    Zen: Current Usage

    Products:

    # smart feed, messages, network news, interest graph and upcoming features

    Numbers:

    # ~10 clusters

    # 100,000+ requests per second at peak

    #Over 5 million HBase operations per second

    Xun Liu

    Internals and Production Learnings

    Internals - Property

    Internals - Property Index

    Internals - Edge Score Index

    Internals - Edge Count

    Status - Soft Delete

    New Features

    Built-in Cache

    New Features

    [Diagram with Before and After panels; labels: Zen, Cache, HBaseClient, ZenClient, Cache]

    Namespace

    New Features

    [Diagram: Namespace 1 and Namespace 2, each labeled Node, Edge, Index]

    New Features

    #Online type schema change

    #Optional reverse edge

    #Optional edge count

    #Retrieval of subset of properties

    #Descending edge score

    Performance Work

    Demanding workloads need special tuning

    # Inserting 1 million edges per second

    # Excessive HLog (WAL) flushes

    Performance Work

    Batching

    #Client-side batching: bulk edge insertion

    #Zen server-side batching: buffer edits across clients & ... together

    #Reduce HLog (WAL) flushes by orders of magnitude
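The effect of batching can be sketched with a small buffer that writes edits in groups. The `BatchingWriter` helper, the batch size of 100, and the edge values are hypothetical, and each call to `flush_fn` merely stands in for one write to the backend; the actual relationship between batches and HLog flushes in Zen may differ.

```python
class BatchingWriter:
    """Buffers edits and writes them in groups (hypothetical helper)."""

    def __init__(self, flush_fn, batch_size):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.buffer = []

    def add(self, edit):
        self.buffer.append(edit)
        # Write once the buffer reaches the configured batch size.
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(list(self.buffer))
            self.buffer.clear()


flushes = []  # each entry is the size of one simulated backend write
writer = BatchingWriter(lambda batch: flushes.append(len(batch)), batch_size=100)
for i in range(1050):
    writer.add(("Contains", 2345, i))
writer.flush()  # push the final partial batch
print(len(flushes), sum(flushes))  # 11 1050
```

Without batching, the same 1050 edits would cost 1050 separate writes; here they cost 11.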

    Performance Work

    Tuning storage engine

    #Bloom filter, block size, encoding, compression, etc.

    Zen production setup

    #Dedicated Zen cluster

    #Namespace

    Distributed transaction or not?

    Data Consistency

    Data Consistency

    Stay on top of data inconsistencies

    #Manual rollback in Zen server

    #Offline jobs (Dr Zen) to scan and fix inconsistencies

    # Tools to debug and fix one-off inconsistency
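A scan-and-fix pass can be illustrated on two in-memory indexes. This is only a sketch of the general idea, not how Dr Zen works: the forward/reverse dictionary layout, the function names, and the sample ids are all assumptions.

```python
def find_missing_reverse(forward, reverse):
    """Return (edge_type, from_id, to_id) edges that are present in the
    forward index but absent from the reverse index."""
    missing = []
    for (edge_type, from_id), to_ids in forward.items():
        for to_id in sorted(to_ids):
            if from_id not in reverse.get((edge_type, to_id), set()):
                missing.append((edge_type, from_id, to_id))
    return missing


def repair(forward, reverse):
    """Add the missing reverse entries and report what was fixed."""
    fixed = find_missing_reverse(forward, reverse)
    for edge_type, from_id, to_id in fixed:
        reverse.setdefault((edge_type, to_id), set()).add(from_id)
    return fixed


forward = {("Contains", 2345): {3456, 3457}}
reverse = {("Contains", 3456): {2345}}  # the entry for 3457 was lost
print(repair(forward, reverse))                # [('Contains', 2345, 3457)]
print(find_missing_reverse(forward, reverse))  # []
```

Running the scan a second time returns an empty list, which makes the repair safe to repeat.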

    Future Work

    #Dr Zen (make it more efficient)

    #Other backends: MySQL, Redis, etc.

    #Distributed transactions

    #Open source!
