john ousterhout stanford university ramcloud overview and update sedcl retreat june, 2014

23
John Ousterhout Stanford University RAMCloud Overview and Update SEDCL Retreat June, 2014

Upload: landon-rench

Post on 16-Dec-2015

223 views

Category:

Documents


3 download

TRANSCRIPT

John Ousterhout

Stanford University

RAMCloud Overviewand Update

SEDCL RetreatJune, 2014

General-purpose storage system for large-scale applications:

● All data is stored in DRAM at all times

● As durable and available as disk

● Simple key-value data model

● Large scale: 1000+ servers, 100+ TB

● Low latency: 5-10 µs remote access time

Project goal: enable a new class of data-intensive applications

June 6, 2013 RAMCloud Update and Overview Slide 2

What is RAMCloud?

June 6, 2013 RAMCloud Update and Overview Slide 3

RAMCloud Architecture

Master

Backup

Master

Backup

Master

Backup

Master

Backup…

Appl.

Library

Appl.

Library

Appl.

Library

Appl.

Library…

DatacenterNetwork Coordinator

1000 – 10,000 Storage Servers

1000 – 100,000 Application Servers

CommodityServers

64-256 GBper server

High-speed networking:● 5 µs round-trip● Full bisection bwidth

Data Model: Key-Value Store

● Basic operations: read(tableId, key)

=> blob, version write(tableId, key, blob)

=> version delete(tableId, key)

● Other operations: cwrite(tableId, key, blob, version)

=> version

Enumerate objects in table Efficient multi-read, multi-write Atomic increment

● Not in RAMCloud 1.0: Atomic updates of multiple objects Secondary indexes

May 13, 2014 RAMCloud/VMware RADIO Slide 4

Tables

(Only overwrite ifversion matches)

Key (≤ 64KB)

Version (64b)

Blob (≤ 1MB)

Object

● Close to 1.0 release: Core system becoming stable Coordinator not yet fault-tolerant

● Original students working on dissertations

● New students staring to think about new projects

June 6, 2013 RAMCloud Update and Overview Slide 5

Status at June 2013 Retreat

● RAMCloud 1.0, January 2014: Key-value store Low-latency RPC system (4.9 µs reads, 15.3 µs durable writes) Log-structured storage management 1-2 second recovery from storage server crashes Coordinator crash recovery

● New projects (see below)

● Application experiments/interest: Graph processing: Jonathan Ellithorpe ONOS (operating system for software-defined networks)

Open Networking Laboratory Various projects/experiments at Huawei High-energy physics(CERN): Jakob Blomer visiting for summer Port to NEC Atom cluster: Satoshi Matsushita

June 6, 2013 RAMCloud Update and Overview Slide 6

Progress Since June 2013

● PhD dissertations: Ryan Stutsman: “Durability and Crash Recovery in Distributed

In-memory Storage Systems”Now at Microsoft Research

Steve Rumble: “Memory and Object Management in RAMCloud”Now at Google Zurich

Diego Ongaro: “Consensus: Bridging Theory and Practice”ETA summer 2014

● Papers published: “Log-Structured Memory for DRAM-Based Storage”

Best Paper Award, FAST “In Search of an Understandable Consensus Algorithm”

USENIX ATC

June 6, 2013 RAMCloud Update and Overview Slide 7

Progress, cont’d

● New papers submitted to OSDI: “SLIK: Scalable Low-Latency Indexes for a Key-Value Store”

(Ankita, Arjun, Ashish, Zhihao) “Experience with Rules-Based Programming for Distributed,

Concurrent, Fault-Tolerant Code”(Ryan, Collin)

June 6, 2013 RAMCloud Update and Overview Slide 8

Progress, cont’d

Ryan Stutsman Graduated (PhD)

Steve Rumble Graduated (PhD)

Diego Ongaro Graduating soon (PhD)

Ankita Kejriwal

Arjun Gopalan Graduating soon (MS)

Behnam Montazeri

Collin Lee New

Henry Qin New

Ashish Gupta New (but leaving with MS)

Seo Jin Park New

Zhihao Jia New (rotation only)

Stephen Yang Rejoining Fall 2014June 6, 2013 RAMCloud Update and Overview Slide 9

Changing of the Guard

June 6, 2013 RAMCloud Update and Overview Slide 10

New Projects

• First-generation RPC(based on Infiniband)

• Key-value store

• Log-structured storagemanagement

• Crash recovery

RAMCloud 1.0• Secondary indexes

• Linearizability

• Multi-object transactions

• Graph support?

Higher-Level Data Model

• Analyze RPC latency

• Driver(s) for 10 GigE

• Clean-slate RPC redesign

Networking Infrastructure

Phase I: 2009 – 2013 Phase II: 2014 – ?

● SLIK: Scalable, Low-latency Indexes for a Key-value Store

● Requires new object format:

June 6, 2013 RAMCloud Update and Overview Slide 11

Secondary Indexes

Key

Version

Value

Key 0

Key 1

Value

Version

...

Key N

Old:key-value store

New:multikey-value store

Primary key:same as before

June 6, 2013 RAMCloud Update and Overview Slide 12

RAMCloud OperationsIndex

Log

Object

Hash Table

Log

Primary Key

Value

Backups

Secondary Key Range

Primary Key Hash(es)

Hash Table

Log

Hash Table

Log

Primary Key Hash(es)

Object(s)

Hash Table

Index

LogBackups

Object

Backups

Secondary Key,Primary Key Hash

Clie

nt A

pplic

atio

nC

lient

App

licat

ion

Clie

nt A

pplic

atio

nC

lient

App

licat

ion

Non-Indexed

Log

Indexed

Read

Write

Master

Master

● Status: Preliminary limitations of most mechanism Initial performance measurements

● Students involved: Ankita Kejriwal (talk later today) Arjun Gopalan Ashish Gupta Zhihao Jia

June 6, 2013 RAMCloud Update and Overview Slide 13

SLIK, cont’d

● Holy Grail of consistency for large-scale apps

June 6, 2013 RAMCloud Update and Overview Slide 14

Linearizability

time

Client sendsrequest for operation

Client receivesresponse

System behaves as if operation executes exactly once, instantaneously, sometime between when client sends request and receives response

June 6, 2013 RAMCloud Update and Overview Slide 15

Linearizability

time

x = 10x = 5

read x

OK to returneither 5 or 10

time

x = 10 x = 5

read x

Must return 5

June 6, 2013 RAMCloud Update and Overview Slide 16

Linearizability Failure

Client

Master

if x.version == 22then x.value = 20

key: xversion: 22value: 10

June 6, 2013 RAMCloud Update and Overview Slide 17

Linearizability Failure

Client

Master

if x.version == 22then x.value = 20

key: xversion: 23value: 20

success

Backups

key: xversion: 23value: 20

key: xversion: 23value: 20

June 6, 2013 RAMCloud Update and Overview Slide 18

Linearizability Failure

Client

Master

if x.version == 22then x.value = 20

key: xversion: 23value: 20

success

Backups

key: xversion: 23value: 20

key: xversion: 23value: 20

Recovery Master

key: xversion: 23value: 20

CrashRecovery

if x.version == 22then x.value = 20

retry

error: version mismatch!

CRASH

Must remember old results, avoid re-executing requests

● Create general-purpose infrastructure(use log to track RPC results)

● Use it to implement linearizable RPCs: Conditional write Multi-object transactions

● Students involved: Seo Jin Park (talk later today) Collin Lee Ankita Kejriwal

June 6, 2013 RAMCloud Update and Overview Slide 19

Linearizability Project

● After 4 years, still little understanding of RAMCloud latency! What accounts for current latency? How much can it be improved? What are the fundamental limits? What is the right system structure to minimize latency?

● Henry Qin starting to answer these questions

June 6, 2013 RAMCloud Update and Overview Slide 20

Latency Analysis

● Infiniband reliable queue pairs: Highest performance; our main workhorse Reliable, in-order delivery implemented in hardware Doesn’t support Ethernet-style networks Driver is old, thrown-together, warty (“temporary solution”)

● Kernel TCP: Easy to use Too slow for real applications (50-150µs round-trips)

● FastTransport: Custom transport for RAMCloud Works with any underlying datagram protocol (e.g. kernel UDP) Provides reliable, in-order, flow-controlled delivery Not as fast as infrc, too complex, never fully debugged

June 6, 2013 RAMCloud Update and Overview Slide 21

RAMCloud Transports Today

● Goal: clean-slate replacement for FastTransport: Better latency and scalability Replace infrc as workhorse transport Separable from RAMCloud “RPC for future datacenters”

● First steps (Behnam Montazeri): Build SolarFlare datagram driver for FastTransport

● Kernel bypass for 10 GigE

Understand FastTransport weaknesses

June 6, 2013 RAMCloud Update and Overview Slide 22

Transport Redesign

● Several new projects in early stages

● Talks this retreat: mostly work in progress

● Should have many interesting results over the next year

June 6, 2013 RAMCloud Update and Overview Slide 23

Conclusion