stig: social graphs & discovery at scale

Social graphs and discovery. At scale. Jason Lucas, Scalability Architect, Tagged

Upload: dataversity

Post on 20-Aug-2015

Category:

Technology


TRANSCRIPT

Page 1: Stig: Social Graphs & Discovery at Scale

Social graphs and discovery. At scale.

Jason Lucas, Scalability Architect, Tagged

Page 2: Stig: Social Graphs & Discovery at Scale

Stig is...

• A very large-scale non-SQL database...

• But it speaks and can emulate SQL

• A graph-oriented data store...

• But can look like a key value store, relational tables, file system

• A foundation for building general web applications...

• But it particularly excels at social apps

• A distributed system with a shared-nothing architecture...

• But it gives developers an easy-to-manage path to data

• A solution to the complex problem of CAP-limited systems...

Page 3: Stig: Social Graphs & Discovery at Scale

Part 1: Stig Project Goals

Part 2: Stig Concepts

Part 3: Lunch Workshop

Page 4: Stig: Social Graphs & Discovery at Scale

Part 1: Stig Project Goals

• Facilitate the developer.

• Be like a good waiter

• Easy should be easy, complex should be possible

• Untangle some existing messes

• Scale like crazy. (Without driving ops crazy.)

• Go big

• Go fast

• Go smooth

• Exceed expectations.

• Enable previously unthinkable features

Page 5: Stig: Social Graphs & Discovery at Scale

Facilitate the developer

• Decrease the burden.

• Provide a single path to data

• Create a uniform representation available to multiple application languages

• Reduce the need for “defensive programming”

• Enforce consistency.

• Re-introduce atomic transactions

• Control assumptions with assertions

• Promote correctness.

• Provide a more robust data representation

• Support unit testing

Page 6: Stig: Social Graphs & Discovery at Scale

Facilitate the developer

• Offer power in simplicity.

• Offer a robust expression language

• Describe effects rather than details of distribution

• Above all else:

• “I want to feel like I'm doing a good job.”

Page 7: Stig: Social Graphs & Discovery at Scale

Scale like crazy...

• Use a distributed architecture.

• Shard data over multiple machines

• Use commodity hardware

• Scale as linearly as possible

• Use replicas to speed average access

• Move queries to data.

• Decompose queries by separating areas of concern

• Farm sub-queries to shards which hold the relevant data

• Use comprehensions instead of realizations when possible
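As a rough illustration of moving queries to data, the sketch below shards a set of edges by hashing the source node, sends each sub-query only to the shard that owns the relevant key, and merges the answers. This is not Stig code: the shard count and the names `shard_for`, `add_edge`, and `friends_of_many` are hypothetical, and the "remote call" is just a local dictionary lookup.

```python
from collections import defaultdict
from zlib import crc32

NUM_SHARDS = 4  # assumption: a small fixed cluster, for illustration only


def shard_for(key):
    """Pick the shard that owns a key (stable hash, so runs are repeatable)."""
    return crc32(key.encode("utf-8")) % NUM_SHARDS


# Each shard holds only the edges whose source node hashes to it.
shards = [defaultdict(set) for _ in range(NUM_SHARDS)]


def add_edge(src, dst):
    """Store the edge on the shard that owns its source node."""
    shards[shard_for(src)][src].add(dst)


def friends_of_many(people):
    """Group keys by owning shard, run one sub-query per shard, merge results."""
    by_shard = defaultdict(list)
    for person in people:
        by_shard[shard_for(person)].append(person)
    result = {}
    for shard_id, keys in by_shard.items():
        local = shards[shard_id]  # stands in for a remote call to that shard
        for key in keys:
            result[key] = set(local.get(key, set()))
    return result


add_edge("alice", "bob")
add_edge("alice", "carol")
add_edge("bob", "carol")
answer = friends_of_many(["alice", "bob", "dave"])
```

Only shards that own one of the requested keys are touched, which is the property the slide's "farm sub-queries to shards which hold the relevant data" bullet is after.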

Page 8: Stig: Social Graphs & Discovery at Scale

Scale like crazy...

• Build for the web.

• Provide durable sessions

• Allow clients to disconnect and reconnect at will

• Continue running in the background

• Increase concurrency.

• Break large objects down into smaller ones

• Escrow deltas around fields which are partitioned or contentious

• Use assertions instead of locks to permit interleaving of operations
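One possible reading of "escrow deltas" and "assertions instead of locks" is sketched below: writers never lock the field; each proposes a delta together with an assertion about the resulting value, and the delta is held in escrow only if the assertion still holds. The `Field` class and its method names are hypothetical, and this single-process version ignores distribution entirely.

```python
class Field:
    """A numeric field updated by deltas guarded by assertions, not locks."""

    def __init__(self, value):
        self.value = value
        self.escrowed = []  # deltas accepted but not yet folded into value

    def propose(self, delta, assertion):
        """Accept the delta only if the assertion holds on the projected value."""
        projected = self.value + sum(self.escrowed) + delta
        if not assertion(projected):
            return False
        self.escrowed.append(delta)
        return True

    def settle(self):
        """Fold all escrowed deltas into the stored value."""
        self.value += sum(self.escrowed)
        self.escrowed.clear()
        return self.value


def non_negative(v):
    """Assertion: stock may never drop below zero."""
    return v >= 0


stock = Field(5)
accepted = [
    stock.propose(-2, non_negative),  # projected 3, accepted
    stock.propose(-2, non_negative),  # projected 1, accepted
    stock.propose(-2, non_negative),  # projected -1, rejected
]
final = stock.settle()
```

Because each delta is checked against the projected value rather than applied under a lock, independent updates can interleave freely as long as the asserted condition remains true.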

Page 9: Stig: Social Graphs & Discovery at Scale

Without driving Ops crazy

• Be highly available.

• Replicate storage across multiple machines

• Shift responsibilities between machines transparently

• Bring machines back into service transparently

• Tolerate partitioning.

• Fall back transparently to lower levels of service

• Reconcile database automatically when partitions rejoin

Page 10: Stig: Social Graphs & Discovery at Scale

Without driving Ops crazy

• Simplify maintenance.

• Tolerate unreliable hardware

• Make software upgrades easy to manage

• Be flexible with regard to physical topology

• Make system status, performance, and capacity easy to measure and comprehend

• Degrade gracefully under load

• To the greatest degree possible, make the system maintain itself

Page 11: Stig: Social Graphs & Discovery at Scale

Exceed expectations

• Enable previously unthinkable features.

• Don’t include histories in your schemas; the database keeps histories

• Design apps with real-time, multi-user communications; database sessions are “chatty”

• Feel free to compute Erdős Numbers or routes to Kevin Bacon

• Test for the existence of interesting data states in constant time, not log time

• Execute queries in time proportionate to the size of the answer, not the size of the database
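For readers unfamiliar with Erdős or Bacon numbers, they are shortest-path distances from one node to every other node in a collaboration graph. The sketch below computes them with an ordinary breadth-first search over an in-memory adjacency map; it says nothing about how Stig evaluates such walks, and the actor names are placeholders.

```python
from collections import deque


def distances_from(graph, start):
    """Breadth-first search: hop count from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Hypothetical co-star graph (undirected, so each edge is listed both ways).
costars = {
    "kevin_bacon": ["actor_a"],
    "actor_a": ["kevin_bacon", "actor_b"],
    "actor_b": ["actor_a", "actor_c"],
    "actor_c": ["actor_b"],
    "actor_d": [],  # never worked with anyone in this graph
}
hops = distances_from(costars, "kevin_bacon")
```

Nodes that cannot be reached, such as `actor_d` here, simply do not appear in the result.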

Page 12: Stig: Social Graphs & Discovery at Scale

Exceed expectations

• Decrease development cycle time.

• Build working apps on your desktop; the database can be simulated

• Evolve your schema at will; the database doesn’t make a distinction between data and metadata

• Use any language you like; the database looks the same from all clients

Page 13: Stig: Social Graphs & Discovery at Scale

Part 2: Stig Concepts

• Representing Graphs

• Deconstructing Commits

• Making Time Flow

• Finding Meaning

• Querying

Page 14: Stig: Social Graphs & Discovery at Scale

Representing Graphs...Without Stig

• Graphs in Tables.

• Walks spread outward in waves

• Self-joins proliferate

• Graphs in Key-Value Stores.

• Generally node-centric

• Edges are denormalized conjugate sets

• Non-transactional multi-set is deadly

• Graphs in XML Stores.

• Floating chunk syndrome

• Worst of both worlds

• Graphs in Doc & Graph Stores.

• Typeless, interned at nodes

Page 15: Stig: Social Graphs & Discovery at Scale

Representing Graphs...With Stig

Locations, Nodes & Edges

[Figure: two locations, /user/[email protected] and /user/[email protected]. Labels in the diagram: person, mafiaplayer, pets, player, owns.]

Page 16: Stig: Social Graphs & Discovery at Scale

Deconstructing Commits...Without Stig

• Two States.

• Uncommitted: only me

• Committed: everybody else

• One sandbox per connection

• Variable Isolation.

• High isolation limits concurrency

• Low isolation hard to cope with

• Two Guarantees.

• Written to disk

• Ephemeral

• Some NoSQL Options.

• No transactional integrity

• Post-hoc reconciliation

Page 17: Stig: Social Graphs & Discovery at Scale

Deconstructing Commits...With Stig

• Private.

• Only me, but I get as many as I want; maybe ephemeral

• Shared.

• Restricted scope, rapid communication; maybe ephemeral

• Global.

• A singleton, same as commit

• Guarantees

• Self-consistent

• Replicated in data center

• Written to disks

• Replicated to other data centers

Page 18: Stig: Social Graphs & Discovery at Scale

Deconstructing Commits...With Stig

Points of View in Diplomacy

[Figure: points of view, as labeled in the diagram]

(Global)

Diplomacy Game (Shared)

Alice/Bob Alliance (Shared)

Alice (Private)

Carol (Private)

Bob (Private)
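The private, shared, and global points of view can be pictured as nested overlays: a write stays in the point of view where it was made, and a read falls through to enclosing points of view. The sketch below is one way to model that idea. The `PointOfView` class is hypothetical, and the nesting used here (Alice and Bob inside the alliance, Carol directly inside the game) is an assumption about the diagram, not something the transcript states.

```python
class PointOfView:
    """An overlay of changes visible here and in nested points of view."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local = {}  # changes made in this point of view only

    def write(self, key, value):
        """Record a change that is visible from this point of view downward."""
        self.local[key] = value

    def read(self, key):
        """Look here first, then in each enclosing point of view."""
        pov = self
        while pov is not None:
            if key in pov.local:
                return pov.local[key]
            pov = pov.parent
        raise KeyError(key)

    def can_see(self, key):
        """True if the key is visible from this point of view."""
        try:
            self.read(key)
        except KeyError:
            return False
        return True


global_pov = PointOfView("global")
game = PointOfView("Diplomacy Game", parent=global_pov)       # shared
alliance = PointOfView("Alice/Bob Alliance", parent=game)     # shared
alice = PointOfView("Alice", parent=alliance)                 # private
bob = PointOfView("Bob", parent=alliance)                     # private
carol = PointOfView("Carol", parent=game)                     # private

game.write("season", "spring")
alliance.write("plan", "support each other")
alice.write("draft_order", "undecided")
```

With this layout, Alice and Bob both see the alliance's plan, Carol does not, and everyone in the game sees the season.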

Page 19: Stig: Social Graphs & Discovery at Scale

Making Time Flow...Without Stig

• Time Flows Naturally.

• System clock is OK

• Execution Time ≈ Query Time.

• A query made after an update will see the results of the update because time flow is linear

• The order of events is definite

• Locks Enforce Consistency.

• Updates block each other

• MVCC in Lieu of Locks.

• Reads are writes

• Collisions are rollbacks

Page 20: Stig: Social Graphs & Discovery at Scale

Making Time Flow...With Stig

• Time is Uncertain.

• Distributed machines cannot rely on their system clocks

• Declared Dependencies.

• Each query declares its predecessors, so causality is a graph

• The order of events is unknowable, but any topological sort of the graph is OK

• Assertions Enforce Consistency.

• MVCC with Paxos facilitates time travel

• Query: seek a time in the past at which assertions are true

• Update: seek a time in the future at which assertions are still true

Page 21: Stig: Social Graphs & Discovery at Scale

Making Time Flow...With Stig

Checkout Time

[Figure: graph of checkout events. Labels: Display Shopping Cart, Update Qty. of Item, Request Gift Wrap, Specify Shipping, Enter Credit Card, Confirm Order.]
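The idea that "any topological sort of the graph is OK" can be tried out with the checkout events named on this slide. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The dependency edges are assumptions made for the example, since the transcript preserves only the event names, and `respects` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Each event maps to the set of events it declares as predecessors.
# These edges are assumed for illustration; the source gives only the labels.
predecessors = {
    "Display Shopping Cart": set(),
    "Update Qty. of Item": {"Display Shopping Cart"},
    "Request Gift Wrap": {"Display Shopping Cart"},
    "Specify Shipping": {"Display Shopping Cart"},
    "Enter Credit Card": {"Display Shopping Cart"},
    "Confirm Order": {
        "Update Qty. of Item",
        "Request Gift Wrap",
        "Specify Shipping",
        "Enter Credit Card",
    },
}


def respects(order, deps):
    """True if every event appears after all of its declared predecessors."""
    position = {event: i for i, event in enumerate(order)}
    return all(
        position[before] < position[event]
        for event, befores in deps.items()
        for before in befores
    )


order = list(TopologicalSorter(predecessors).static_order())

# A different interleaving of the four middle steps is equally acceptable.
alternative = [
    "Display Shopping Cart",
    "Enter Credit Card",
    "Specify Shipping",
    "Request Gift Wrap",
    "Update Qty. of Item",
    "Confirm Order",
]
```

Both `order` and `alternative` satisfy the declared dependencies, so under this model neither is more "correct" than the other.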

Page 22: Stig: Social Graphs & Discovery at Scale

Finding Meaning...Without Stig

• Tables & Views.

• Tables store the base data

• Views collect data from tables and other views

• Views often present performance bottlenecks

• Analysis Belongs to Data Definition.

• Adding or changing a view or index is a schema change

• Programmers must work with DBAs, limiting individual initiative

• Changes have the potential to degrade the data service as a whole

Page 23: Stig: Social Graphs & Discovery at Scale

Finding Meaning...With Stig

• Asserted & Inferred Edges.

• Asserted edges store the base data

• Inferred edges collect data from asserted and inferred edges

• Inference is distributed, on-going, and subject to time-travel

• Analysis Belongs to Program Definition.

• Inference rules aren’t “special”

• Programmers can invent as they like

• Scope of risk is limited

Page 24: Stig: Social Graphs & Discovery at Scale

Finding Meaning...With Stig

Inferring Friends & Stalkers

[Figure: Alice and Bob each have a "has friendship" edge to a shared node x; an "is friend of" edge between Alice and Bob is inferred from them.]

<a, ‘is friend of’, b>
  if <a, ‘has friendship’, x>
  and <b, ‘has friendship’, x>
  and a is not b;

<a, ‘is stalking’, b>
  if <a, ‘is friend of’, b>
  and a.age >= 18
  and b.age < 18;
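To make the two inference rules concrete, the sketch below applies them to a handful of asserted edges held in plain Python sets. It is a one-shot, in-memory evaluation, not the distributed, on-going inference the slides describe; the data, the `ages` table, and the function names are all invented for the example.

```python
from collections import defaultdict

# Asserted edges: (subject, label, object). Sample data is hypothetical.
asserted = {
    ("alice", "has friendship", "f1"),
    ("bob", "has friendship", "f1"),
    ("carol", "has friendship", "f2"),
}
ages = {"alice": 34, "bob": 15, "carol": 20}


def infer_friends(edges):
    """<a, 'is friend of', b> if a and b share a friendship node and a is not b."""
    members = defaultdict(set)
    for subject, label, friendship in edges:
        if label == "has friendship":
            members[friendship].add(subject)
    return {
        (a, "is friend of", b)
        for group in members.values()
        for a in group
        for b in group
        if a != b
    }


def infer_flagged(friend_edges, age_of):
    """<a, 'is stalking', b> if a is a friend of b, a is 18+, and b is under 18."""
    return {
        (a, "is stalking", b)
        for a, _label, b in friend_edges
        if age_of[a] >= 18 and age_of[b] < 18
    }


friends = infer_friends(asserted)
flagged = infer_flagged(friends, ages)
```

The second rule consumes the output of the first, which mirrors the slide's point that inferred edges can be built from both asserted and inferred edges.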

Page 25: Stig: Social Graphs & Discovery at Scale

Querying...Without Stig

• SQL

• Easy-to-use, commonly known, and mostly harmless

• Suffers from poor composability and is useless as a general-purpose programming language

• Map-Reduce, Erlang, etc.

• Not so easy-to-use, not so commonly known, and capable of shooting you in the foot

• Often require knowledge of the underlying distributed architecture, and are still not front-runners as general-purpose programming languages

Page 26: Stig: Social Graphs & Discovery at Scale

Querying...With Stig

• Robust and General-Purpose Language.

• Purely functional, lazily evaluated, and strictly, robustly typed

• Pattern-oriented notation for describing walks across the graph

• Composability Rules.

• Comprehensions of sequences form the foundation

• Transformations of sequences (map, reduce, filter, zip, etc.) are the building blocks

• Distributed Evaluation Rocks.

• Queries are broken down and sent to the servers where they need to be

• Evaluation occurs in parallel
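Lazily evaluated comprehensions composed with sequence transformations can be approximated with Python generators. The sketch below is an analogy only, not Stig syntax: it filters an unbounded sequence and takes three results, and the `evaluated` list (a hypothetical probe added for the example) records how many elements were actually computed.

```python
from itertools import count, islice

evaluated = []  # records which inputs were actually computed


def squares():
    """An unbounded lazy sequence: 1, 4, 9, 16, ..."""
    for n in count(1):
        evaluated.append(n)
        yield n * n


# A comprehension over a lazy sequence; nothing runs until it is consumed.
even_squares = (s for s in squares() if s % 2 == 0)

# Taking three answers forces only as much of the sequence as is needed.
first_three = list(islice(even_squares, 3))
```

The work done depends on how many answers are requested rather than on the size of the underlying sequence, which here is infinite.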

Page 27: Stig: Social Graphs & Discovery at Scale

Querying...With Stig

• Compiled & Stored.

• Queries compile down to machine code and get stored in the graph itself

• Stored programs are subject to on-going analysis

• Programs can call each other

• Library-Driven.

• Language fundamentals support construction of libraries

• We can emulate other languages, such as LINQ and Python

• Clients.

• Currently Java, Perl, PHP, Python, and C/C++

• We can also serve HTTP directly

Page 28: Stig: Social Graphs & Discovery at Scale

Querying...With Stig

Mutual Friends

o /* function definition */
mutual_friends x y = solve f: [ <x, ‘is friend of’, f>; <y, ‘is friend of’, f> ];

o /* function application */
mutual_friends person@/users/alice person@/users/bob;

o /* results */
[ { f = person@/users/carol }, { f = person@/users/dave } ];
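For comparison, the same mutual-friends query can be written against a plain set of edge triples in Python. This is only an analogy for what `solve f` expresses (find every `f` that satisfies both patterns); the edge data is invented so that the result matches the slide's sample output.

```python
# Edge triples: (subject, label, object). Sample data is hypothetical.
edges = {
    ("alice", "is friend of", "carol"),
    ("alice", "is friend of", "dave"),
    ("alice", "is friend of", "erin"),
    ("bob", "is friend of", "carol"),
    ("bob", "is friend of", "dave"),
}


def friends_of(person, edge_set):
    """All f such that <person, 'is friend of', f> is in the edge set."""
    return {
        obj
        for subject, label, obj in edge_set
        if subject == person and label == "is friend of"
    }


def mutual_friends(x, y, edge_set):
    """Bindings for f that satisfy both patterns, in a stable order."""
    both = friends_of(x, edge_set) & friends_of(y, edge_set)
    return [{"f": f} for f in sorted(both)]


result = mutual_friends("alice", "bob", edges)
```

Each returned dictionary plays the role of one `{ f = ... }` binding in the slide's result list.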

Page 29: Stig: Social Graphs & Discovery at Scale

Wrapup

Page 30: Stig: Social Graphs & Discovery at Scale

Is your project...?

• Graph-shaped?

• Representing graphs as graphs (instead of as tables or key pairs) simplifies your life

• Stig graphs are fat, meaning they're really any number of simultaneous, intersecting graphs, so go nuts

• Transactional?

• Reliably atomic state transitions also simplify your life

• Asynchronous transaction management makes it more tolerable

• Real-time?

• Control the influence of updates with shared points-of-view

• Never be blocked waiting for the database to respond

Page 31: Stig: Social Graphs & Discovery at Scale

Is your project...?

• Really huge?

• The store scales very close to linearly, so more data just means more machines

• The size of the cluster generally doesn't affect the performance of individual operations

• Deeply analytic?

• Use inferences to describe relations and conditions you're interested in

• Build up arbitrarily complex libraries of inference to extract meaning from data

Page 32: Stig: Social Graphs & Discovery at Scale

Open-sourcing this year!

• About our Code.

• Written in C++0x and Haskell, with Python for tools

• Entirely unit-test driven and designed for easy adoption

• Why Open Source?

• We want to give back

• We benefit first and most

• Competitive advantage would be temporary anyway

• Knowing it’s open keeps us on our toes

• There’s more to do than we can do ourselves

• We attract the kind of people we want to work with

Page 33: Stig: Social Graphs & Discovery at Scale

Our doors are open

• About Tagged

• #3 in social networking and growing (100+ Million members)

• Located in downtown SF; listed among the Top Ten Places to Work by the San Francisco Business Journal

• Profitable since 2008. We answer only to ourselves and our users

Page 34: Stig: Social Graphs & Discovery at Scale

Our doors are open

• About the Stig Team.

• Five full-time engineers with backgrounds in compilers, databases, distributed systems, and AI

• Interns year-round with opportunities to publish

• And yes, we're hiring!

Page 35: Stig: Social Graphs & Discovery at Scale

Got ideas?

• Contact us!

• Sign up for Stig news at: www.stigdb.org

• Follow the Tagged Dev Blog at: blog.tagged.com

• Jason Lucas, Architect of Scalable [email protected]

Page 36: Stig: Social Graphs & Discovery at Scale

Part 3: Lunch Workshop

• But wait, there’s more!

• Join us as we get our hands messy with food and take a deep dive into the Stig query language and the Stig API!

• Lunch 'N Learn 01:15 PM - 02:15 PM