stig: social graphs & discovery at scale

Post on 20-Aug-2015


Social graphs and discovery. At scale.

Jason Lucas, Scalability Architect, Tagged

Stig is...

• A very large-scale non-SQL database...

• But it speaks and can emulate SQL

• A graph-oriented data store...

• But can look like a key value store, relational tables, file system

• A foundation for building general web applications...

• But it particularly excels at social apps

• A distributed system with a shared-nothing architecture...

• But it gives developers an easy-to-manage path to data

• A solution to the complex problem of CAP-limited systems...

Part 1: Stig Project Goals

Part 2: Stig Concepts

Part 3: Lunch Workshop

Part 1: Stig Project Goals

• Facilitate the developer.

• Be like a good waiter

• Easy should be easy, complex should be possible

• Untangle some existing messes

• Scale like crazy. (Without driving ops crazy.)

• Go big

• Go fast

• Go smooth

• Exceed expectations.

• Enable previously unthinkable features

Facilitate the developer

• Decrease the burden.

• Provide a single path to data

• Create a uniform representation available to multiple application languages

• Reduce the need for “defensive programming”

• Enforce consistency.

• Re-introduce atomic transactions

• Control assumptions with assertions

• Promote correctness.

• Provide a more robust data representation

• Support unit testing

Facilitate the developer

• Offer power in simplicity.

• Offer a robust expression language

• Describe effects rather than details of distribution

• Above all else:

• “I want to feel like I'm doing a good job.”

Scale like crazy...

• Use a distributed architecture.

• Shard data over multiple machines

• Use commodity hardware

• Scale as linearly as possible

• Use replicas to speed average access

• Move queries to data.

• Decompose queries by separating areas of concern

• Farm sub-queries to shards which hold the relevant data

• Use comprehensions instead of realizations when possible
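The "move queries to data" idea above can be illustrated with a small scatter/gather sketch. This is not Stig's implementation: the placement rule `shard_for`, the in-memory `shards` list, and the friend data are all assumptions made up for illustration.

```python
from collections import defaultdict

SHARD_COUNT = 2  # hypothetical cluster size


def shard_for(key):
    """Toy placement rule (an assumption): route by the key's first character."""
    return ord(key[0]) % SHARD_COUNT


# Hypothetical data set: each user maps to a list of friends.
FRIENDS = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob"],
}

# Place each record on the shard chosen by the placement rule.
shards = [dict() for _ in range(SHARD_COUNT)]
for user, friends in FRIENDS.items():
    shards[shard_for(user)][user] = friends


def run_on_shard(shard, keys):
    """Sub-query evaluated where the data lives: count friends per key."""
    return {key: len(shard[key]) for key in keys if key in shard}


def friend_counts(users):
    """Decompose the query by shard, farm out sub-queries, merge the answers."""
    plan = defaultdict(list)
    for user in users:
        plan[shard_for(user)].append(user)
    merged = {}
    for shard_id, keys in plan.items():
        merged.update(run_on_shard(shards[shard_id], keys))
    return merged
```

Only the small per-shard answers travel back to the caller; the friend lists themselves never leave the shard that holds them.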

Scale like crazy...

• Build for the web.

• Provide durable sessions

• Allow clients to disconnect and reconnect at will

• Continue running in the background

• Increase concurrency.

• Break large objects down into smaller ones

• Escrow deltas around fields which are partitioned or contentious

• Use assertions instead of locks to permit interleaving of operations
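As a rough illustration of "assertions instead of locks", the sketch below applies an update only if the values the client assumed are still current. The `Store` class and its `update` method are hypothetical names, and the sketch is single-threaded; it shows the shape of the idea, not how Stig does it.

```python
class AssertionFailed(Exception):
    """Raised when a value the client relied on has changed."""


class Store:
    """Hypothetical single-process store that checks assertions before writing."""

    def __init__(self):
        self.data = {}
        self.version = 0

    def update(self, assertions, changes):
        """Apply changes only if every asserted key still has the expected value."""
        for key, expected in assertions.items():
            if self.data.get(key) != expected:
                raise AssertionFailed(key)
        self.data.update(changes)
        self.version += 1
        return self.version


store = Store()
store.update({}, {"seats": 1})

# Two clients both read seats == 1. Neither takes a lock.
store.update({"seats": 1}, {"seats": 0, "holder": "alice"})
try:
    store.update({"seats": 1}, {"seats": 0, "holder": "bob"})
    outcome = "applied"
except AssertionFailed:
    outcome = "retry"  # the second client's assumption no longer holds
```

The second client is never blocked; it simply learns that its assumption failed and can re-read and try again.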

Without driving Ops crazy

• Be highly available.

• Replicate storage across multiple machines

• Shift responsibilities between machines transparently

• Bring machines back into service transparently

• Tolerate partitioning.

• Fall back transparently to lower levels of service

• Reconcile database automatically when partitions rejoin

Without driving Ops crazy

• Simplify maintenance.

• Tolerate unreliable hardware

• Make software upgrades easy to manage

• Be flexible with regard to physical topology

• Make system status, performance, and capacity easy to measure and comprehend

• Degrade gracefully under load

• To the greatest degree possible, make the system maintain itself

Exceed expectations

• Enable previously unthinkable features.

• Don’t include histories in your schemas; the database keeps histories

• Design apps with real-time, multi-user communications; database sessions are “chatty”

• Feel free to compute Erdős Numbers or routes to Kevin Bacon

• Test for the existence of interesting data states in constant time, not log time

• Execute queries in time proportionate to the size of the answer, not the size of the database
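For readers unfamiliar with Erdős numbers: they are shortest-path distances in a collaboration graph. A minimal breadth-first search sketch, using a made-up `COAUTHORS` graph and a hypothetical helper name, looks like this. It runs on a single in-memory dictionary and says nothing about how Stig evaluates such walks.

```python
from collections import deque

# Hypothetical collaboration graph: each person maps to their co-authors.
COAUTHORS = {
    "erdos": ["alice"],
    "alice": ["erdos", "bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob"],
    "dave": [],
}


def collaboration_distance(graph, start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None
```

Here `collaboration_distance(COAUTHORS, "carol", "erdos")` walks carol, bob, alice, erdos, so Carol's Erdős number in this toy graph is 3.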

Exceed expectations

• Decrease development cycle time.

• Build working apps on your desktop; the database can be simulated

• Evolve your schema at will; the database doesn’t make a distinction between data and metadata

• Use any language you like; the database looks the same from all clients

Part 2: Stig Concepts

• Representing Graphs

• Deconstructing Commits

• Making Time Flow

• Finding Meaning

• Querying

Representing Graphs...Without Stig

• Graphs in Tables.

• Walks spread outward in waves

• Self-joins proliferate

• Graphs in Key-Value Stores.

• Generally node-centric

• Edges are denormalized conjugate sets

• Non-transactional multi-set is deadly

• Graphs in XML Stores.

• Floating chunk syndrome

• Worst of both worlds

• Graphs in Doc & Graph Stores.

• Typeless, interned at nodes
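To make "self-joins proliferate" concrete, here is a small sketch using Python's built-in `sqlite3` module. The `friend` table and its rows are invented for illustration; the point is only that each extra hop in a walk costs one more self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friend (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO friend VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("carol", "dave")],
)

# Two hops from alice: one self-join.
two_hops = conn.execute(
    """
    SELECT f2.dst
    FROM friend f1
    JOIN friend f2 ON f1.dst = f2.src
    WHERE f1.src = ?
    """,
    ("alice",),
).fetchall()

# Three hops from alice: two self-joins, and so on for every further hop.
three_hops = conn.execute(
    """
    SELECT f3.dst
    FROM friend f1
    JOIN friend f2 ON f1.dst = f2.src
    JOIN friend f3 ON f2.dst = f3.src
    WHERE f1.src = ?
    """,
    ("alice",),
).fetchall()
```

A walk of unknown depth cannot be written this way at all without recursion or repeated round trips, which is the "waves" problem the slide describes.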

Representing Graphs...With Stig

Locations, Nodes & Edges

[Figure: two locations, /user/alice@foo.bar and /user/bob@baz.gak. Labels in the diagram: person, mafiaplayer, pets, player, owns.]

Deconstructing Commits...Without Stig

• Two States.

• Uncommitted: only me

• Committed: everybody else

• One sandbox per connection

• Variable Isolation.

• High isolation limits concurrency

• Low isolation hard to cope with

• Two Guarantees.

• Written to disk

• Ephemeral

• Some NoSQL Options.

• No transactional integrity

• Post-hoc reconciliation

Deconstructing Commits...With Stig

• Private.

• Only me, but I get as many as I want; maybe ephemeral

• Shared.

• Restricted scope, rapid communication; maybe ephemeral

• Global.

• A singleton, same as commit

• Guarantees

• Self-consistent

• Replicated in data center

• Written to disks

• Replicated to other data centers

Deconstructing Commits...With Stig

Points of View in Diplomacy

[Figure: points of view labeled (Global), Diplomacy Game (Shared), Alice/Bob Alliance (Shared), Alice (Private), Bob (Private), and Carol (Private).]
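One way to picture the Diplomacy example is as layered overlays, where each point of view sees its own writes plus everything in the points of view beneath it. The sketch below uses `collections.ChainMap` from the Python standard library. The nesting (Alice and Bob inside the alliance, Carol directly inside the game) and all keys and values are assumptions, and the sketch models visibility only, not Stig's guarantees or how updates move between points of view.

```python
from collections import ChainMap

# Global point of view: the singleton everyone can see.
global_pov = {"season": "Spring 1901"}

# Shared points of view layer on top of the global one.
game = ChainMap({}, global_pov)      # Diplomacy Game (Shared)
alliance = game.new_child()          # Alice/Bob Alliance (Shared)

# Private points of view layer on top of a shared one (assumed nesting).
alice = alliance.new_child()
bob = alliance.new_child()
carol = game.new_child()

# A private write is visible only to its owner.
alice["orders"] = "hold"

# A write to the alliance is visible to both allies, but not to Carol.
alliance["pact"] = "support each other"
```

Writes always land in the topmost layer of the point of view that made them, so nothing Alice or the alliance does leaks into the game or the global state in this sketch.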

Making Time Flow...Without Stig

• Time Flows Naturally.

• System clock is OK

• Execution Time ≈ Query Time.

• A query made after an update will see the results of the update because time flow is linear

• The order of events is definite

• Locks Enforce Consistency.

• Updates block each other

• MVCC in Lieu of Locks.

• Reads are writes

• Collisions are rollbacks

Making Time Flow...With Stig

• Time is Uncertain.

• Distributed machines cannot rely on their system clocks

• Declared Dependencies.

• Each query declares its predecessors, so causality is a graph

• The order of events is unknowable, but any topological sort of the graph is OK

• Assertions Enforce Consistency.

• MVCC with Paxos facilitates time travel

• Query: seek a time in the past at which assertions are true

• Update: seek a time in the future at which assertions are still true
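The "any topological sort is OK" point can be sketched with `graphlib.TopologicalSorter` from the Python standard library (3.9+). The step names echo the shopping-cart steps used elsewhere in this talk; the dependency edges between them are assumptions, as is the helper `respects_dependencies`.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it declares as predecessors.
# These edges are assumed for illustration.
predecessors = {
    "Display Shopping Cart": set(),
    "Update Qty. of Item": {"Display Shopping Cart"},
    "Request Gift Wrap": {"Display Shopping Cart"},
    "Specify Shipping": {"Display Shopping Cart"},
    "Enter Credit Card": {"Display Shopping Cart"},
    "Confirm Order": {
        "Update Qty. of Item",
        "Request Gift Wrap",
        "Specify Shipping",
        "Enter Credit Card",
    },
}

# One acceptable order of events; many others exist.
order = list(TopologicalSorter(predecessors).static_order())


def respects_dependencies(candidate, deps):
    """True if every step appears after all of its declared predecessors."""
    position = {step: index for index, step in enumerate(candidate)}
    return all(
        position[before] < position[step]
        for step, befores in deps.items()
        for before in befores
    )
```

The four middle steps can be interleaved in any order without violating causality, which is exactly the freedom that declared dependencies leave open.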

Making Time Flow...With Stig

[Figure: Checkout Time. Steps shown: Display Shopping Cart, Update Qty. of Item, Request Gift Wrap, Specify Shipping, Enter Credit Card, Confirm Order.]

Finding Meaning...Without Stig

• Tables & Views.

• Tables store the base data

• Views collect data from tables and other views

• Views often present performance bottlenecks

• Analysis Belongs to Data Definition.

• Adding or changing a view or index is a schema change

• Programmers must work with DBAs, limiting individual initiative

• Changes have the potential to degrade the data service as a whole

Finding Meaning...With Stig

• Asserted & Inferred Edges.

• Asserted edges store the base data

• Inferred edges collect data from asserted and inferred edges

• Inference is distributed, on-going, and subject to time-travel

• Analysis Belongs to Program Definition.

• Inference rules aren’t “special”

• Programmers can invent as they like

• Scope of risk is limited

Finding Meaning...With Stig

Inferring Friends & Stalkers

[Figure: Alice and Bob each connect to a node x by a "has friendship" edge; an "is friend of" edge links Alice and Bob.]

<a, ‘is friend of’, b>
  if <a, ‘has friendship’, x>
  and <b, ‘has friendship’, x>
  and a is not b;

<a, ‘is stalking’, b>
  if <a, ‘is friend of’, b>
  and a.age >= 18
  and b.age < 18;
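The two inference rules above can be mimicked in plain Python over a set of (subject, label, object) triples. This is only a sketch of what the rules compute, evaluated once over a small in-memory set; the function names, the node name `friendship-1`, and the ages are hypothetical, and it does not reflect how Stig evaluates inference (distributed, ongoing, subject to time travel).

```python
def infer_friends(edges):
    """Derive 'is friend of' edges from shared 'has friendship' nodes."""
    members = {}
    for subject, label, obj in edges:
        if label == "has friendship":
            members.setdefault(obj, set()).add(subject)
    return {
        (a, "is friend of", b)
        for people in members.values()
        for a in people
        for b in people
        if a != b
    }


def infer_stalkers(friend_edges, ages):
    """Derive 'is stalking' edges using the age conditions from the second rule."""
    return {
        (a, "is stalking", b)
        for a, _, b in friend_edges
        if ages[a] >= 18 and ages[b] < 18
    }


# Asserted edges (base data) and hypothetical ages.
asserted = {
    ("alice", "has friendship", "friendship-1"),
    ("bob", "has friendship", "friendship-1"),
}
ages = {"alice": 30, "bob": 16}

friends = infer_friends(asserted)
stalkers = infer_stalkers(friends, ages)
```

Note that the second rule consumes the output of the first, which is the "inferred edges collect data from asserted and inferred edges" layering described earlier.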

Querying...Without Stig

• SQL

• Easy-to-use, commonly known, and mostly harmless

• Suffers from poor composability and is useless as a general-purpose programming language

• Map-Reduce, Erlang, etc.

• Not so easy-to-use, not so commonly known, and capable of shooting you in the foot

• Often require knowledge of the underlying distributed architecture and are still not front-runners as general-purpose programming languages

Querying...With Stig

• Robust and General-Purpose Language.

• Purely functional, lazily evaluated, and strictly, robustly typed

• Pattern-oriented notation for describing walks across the graph

• Composability Rules.

• Comprehensions of sequences form the foundation

• Transformations of sequences (map, reduce, filter, zip, etc.) are the building blocks

• Distributed Evaluation Rocks.

• Queries are broken down and sent to the servers where they need to be

• Evaluation occurs in parallel

Querying...With Stig

• Compiled & Stored.

• Queries compile down to machine code and get stored in the graph itself

• Stored programs are subject to on-going analysis

• Programs can call each other

• Library-Driven.

• Language fundamentals support construction of libraries

• We can emulate other languages, such as LINQ and Python

• Clients.

• Currently Java, Perl, PHP, Python, and C/C++

• We can also serve HTTP directly

Querying...With Stig

Mutual Friends

/* function definition */
mutual_friends x y = solve f: [ <x, ‘is friend of’, f>; <y, ‘is friend of’, f> ];

/* function application */
mutual_friends person@/users/alice person@/users/bob;

/* results */
[ { f = person@/users/carol }, { f = person@/users/dave } ];
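For readers who do not know the Stig notation, the mutual-friends query can be approximated in Python as an intersection over edge triples. This is a rough analogue, not a translation of the language's semantics: the edge set below is invented so that the answer matches the results on the slide, and the real query would be evaluated across shards rather than over one in-memory set.

```python
def mutual_friends(edges, x, y):
    """Return every f such that both x and y have an 'is friend of' edge to f."""
    friends_of_x = {f for s, label, f in edges if s == x and label == "is friend of"}
    friends_of_y = {f for s, label, f in edges if s == y and label == "is friend of"}
    return sorted(friends_of_x & friends_of_y)


# Hypothetical edge set.
EDGES = {
    ("alice", "is friend of", "bob"),
    ("alice", "is friend of", "carol"),
    ("alice", "is friend of", "dave"),
    ("bob", "is friend of", "alice"),
    ("bob", "is friend of", "carol"),
    ("bob", "is friend of", "dave"),
}

result = mutual_friends(EDGES, "alice", "bob")
```

With this edge set, `result` contains carol and dave, mirroring the two bindings of `f` in the slide's results.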

Wrapup

Is your project...?

• Graph-shaped?

• Representing graphs as graphs (instead of as tables or key pairs) simplifies your life

• Stig graphs are fat, meaning they're really any number of simultaneous, intersecting graphs, so go nuts

• Transactional?

• Reliably atomic state transitions also simplify your life

• Asynchronous transaction management makes it more tolerable

• Real-time?

• Control the influence of updates with shared points-of-view

• Never be blocked waiting for the database to respond

Is your project...?

• Really huge?

• The store scales very close to linearly, so more data just means more machines

• The size of the cluster generally doesn't affect the performance of individual operations

• Deeply analytic?

• Use inferences to describe relations and conditions you're interested in

• Build up arbitrarily complex libraries of inference to extract meaning from data

Open-sourcing this year!

• About our Code.

• Written in C++0x and Haskell, with Python for tools

• Entirely unit-test driven and designed for easy adoption

• Why Open Source?

• We want to give back

• We benefit first and most

• Competitive advantage would be temporary anyway

• Knowing it’s open keeps us on our toes

• There’s more to do than we can do ourselves

• We attract the kind of people we want to work with

Our doors are open

• About Tagged

• #3 in social networking and growing (100+ million members)

• Located in downtown SF, Top Ten Places to Work by San Francisco Business Journal

• Profitable since 2008. We answer only to ourselves and our users

Our doors are open

• About the Stig Team.

• Five full-time engineers with backgrounds in compilers, databases, distributed systems, and AI

• Interns year-round with opportunities to publish

• And yes, we're hiring!

Got ideas?

• Contact us!

• Sign up for Stig news at: www.stigdb.org

• Follow the Tagged Dev Blog at: blog.tagged.com

• Jason Lucas, Architect of Scalable Infrastructure, jlucas@tagged.com

Part 3: Lunch Workshop

• But wait, there’s more!

• Join us as we get our hands messy with food and take a deep dive into the Stig query language and the Stig API!

• Lunch 'N Learn 01:15 PM - 02:15 PM
