riak: a friendly key/value store for the web

Post on 12-Nov-2014

Category:

Technology

DESCRIPTION

Talk at DevNation Portland, July 10th, 2010. Keynote available for download at: http://dl.dropbox.com/u/458036/presentations/DevNation%20Portland.zip Sources and additional information in the presenter notes.

TRANSCRIPT

A friendly key/value store for the web.

riak

A primer by Bruce Williams

DEVNATION PORTLAND 2010

My name is Bruce Williams.

and I’m addicted to the bleeding edge.

2001 - Present Day

wayyy before it was a viable job choice.

But I use other languages, too.

especially from other paradigms.

Photo by oddsteph - http://flic.kr/p/6vWPBU

Choose the Right Weapon

Let’s assume Java is one of the baseball bats.

Based in the D.C. area.

(but I’m not.)


You may find the following conspicuously missing in this talk:

Sorry!

I will not be presenting a paper on Dynamo, the CAP theorem, vector clocks, merkle trees, etc.

These are explained elsewhere by my algorithmic betters.

I will not be dwelling on performance or redundancy.

Expect some vague statements like “very fast” and “very robust.”

I will not try to convince you that “NoSQL is the messiah.”

It’s an alternative that makes sense in some situations.

I will not be conducting a large-scale comparison of competing technologies.

but I’d love to hear about what you use, and why.

What is Riak?

NoSQL

and of the Dynamo persuasion.

Open Source

& a commercial “EnterpriseDS” version with some proprietary pieces

Key/Value Store

With some metadata.


Schema-less

Great for sparse data, but requires more discipline.

Datatype Agnostic

Content-Type is King.

Language Agnostic

REST & PBC

Erlang, Javascript, Java, PHP, Python, Ruby, ...

Distributed

It’s [mostly] Erlang, what did you expect?

Masterless

All nodes are equal


Scalable

or “easy to scale.”

Eventually Consistent

and CAP tunable.


Uses Map/Reduce

and “Link.”

Getting Up & Running

http://riak.basho.com


hg & git

A Quick Local Cluster

Start three “nodes”

$ ./riak1/bin/riak start
$ ./riak2/bin/riak start
$ ./riak3/bin/riak start

Join them into a cluster

$ ./riak2/bin/riak-admin join riak1@127.0.0.1
$ ./riak3/bin/riak-admin join riak1@127.0.0.1

Your Data

Object

Content Type

Body

+ Links

The thing you’re storing.

Key

pic1

The identifier for the object.

can be user-defined or automatically generated

Bucket

pic1

pic2 pic3

images

The type or category of object.

“pic1” is unique within “images”

Addressability

pic1

images

Refer to objects by bucket and key.

<images/pic1>
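Over the REST interface, a bucket/key pair becomes a URL path. A minimal sketch of that mapping, assuming a local node on 127.0.0.1:8098 and a "/riak" URL prefix (both assumptions about a default install, not something stated on the slide); `object_url` is a hypothetical helper, not part of riak-client.

```ruby
require 'erb'

# Hypothetical helper: build the REST URL for an object from its bucket and key.
# Host, port and the "riak" prefix are assumed defaults for a local node.
def object_url(bucket, key, host: '127.0.0.1', port: 8098, prefix: 'riak')
  # Percent-encode each path segment so unusual bucket or key names stay addressable.
  path = [prefix, bucket, key].map { |part| ERB::Util.url_encode(part) }.join('/')
  "http://#{host}:#{port}/#{path}"
end

puts object_url('images', 'pic1')
```

The point is only that bucket plus key is the whole address; there is no other lookup path.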


Example

$ gem install riak-client

require 'riak'

client = Riak::Client.new
client.bucket('images').new('pic1').tap do |pic1|
  pic1.content_type = 'image/jpeg'
  pic1.data = File.read('/path/to/jpg')
  pic1.store
end

Example

client.bucket('people').new('bruce').tap do |bruce|
  bruce.data = { name: 'Bruce Williams', email: 'bruce@codefluency.com' }
  bruce.store
end

puts client['people']['bruce'].data['name']

“application/json” is the default for riak-client

Links

pic1

images

stored here

bruce

people

Connect objects

can also be “tagged”


Example

client['people']['bruce'].tap do |bruce|
  bruce.links << client['images']['pic1'].to_link('avatar')
  bruce.store
end

client['people']['bruce'].walk(:tag => 'avatar')

Hooks

pre-commit: reject or transform an object to be committed

post-commit: notify external services, build your own indexes

Where does it go?

The Ring

A 160-bit integer space


The Ring

broken into equal sized partitions.


The Ring

It looks kinda like this

(it’s just more functional)

Photo by marchdoe - http://flickr.com/photos/marchdoe/457741149

The Ring

Each partition is managed by a vnode (virtual node).

The Ring

Each vnode runs on a [physical] node.

The Ring

Each node owns an equal share of vnodes (& partitions)

(Diagram: the ring shared among four nodes, numbered 1 to 4.)
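The ring slides above can be sketched in a few lines of plain Ruby. This is a toy model, not Riak's implementation: the partition count of 64, the node names, the round-robin ownership and hashing the string "bucket/key" are all assumptions made for illustration. The only fact it relies on is that SHA-1 yields a 160-bit number.

```ruby
require 'digest/sha1'

RING_SIZE      = 2**160                      # the 160-bit integer space
NUM_PARTITIONS = 64                          # assumed for this sketch
PARTITION_SIZE = RING_SIZE / NUM_PARTITIONS  # equal-sized partitions
NODES          = %w[node1 node2 node3 node4] # hypothetical node names

# SHA-1 produces 160 bits, so any digest is a position on the ring.
def ring_position(bucket, key)
  Digest::SHA1.hexdigest("#{bucket}/#{key}").to_i(16)
end

# Index of the partition (and so the vnode) that covers that position.
def partition_for(bucket, key)
  ring_position(bucket, key) / PARTITION_SIZE
end

# Assumed round-robin assignment, which gives each node an equal share.
def node_for(partition)
  NODES[partition % NODES.size]
end

partition = partition_for('images', 'pic1')
puts "images/pic1 -> partition #{partition} on #{node_for(partition)}"
```

With 64 partitions and four nodes, each node ends up owning 16 vnodes.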

Replication

n_val = 3

Objects are written to multiple partitions.

3 is the default
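One way to picture n_val is as a list of consecutive partitions starting at the one the key hashes to. A simplified sketch, assuming a tiny ring of 8 partitions; `preference_list` is a hypothetical helper name and the real placement logic is more involved.

```ruby
# Simplified model: the object goes to its primary partition and to the
# next n_val - 1 partitions around the ring, wrapping at the end.
def preference_list(primary, n_val: 3, num_partitions: 8)
  (0...n_val).map { |offset| (primary + offset) % num_partitions }
end

p preference_list(6)            # wraps around the end of the ring
p preference_list(2, n_val: 2)  # a bucket with a lower n_val
```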


Availability

Uses Hinted Handoff to deal with node failures.

When node “2” fails, the others pick up the slack.
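The idea behind hinted handoff can be shown with a toy in-memory cluster. Everything here is an illustrative assumption (the `ToyCluster` class, its method names, picking the first spare node as the fallback); it only models the behaviour described above: a stand-in accepts the write, remembers who it was for, and hands it back later.

```ruby
class ToyCluster
  attr_reader :data, :hints

  def initialize(names)
    @names = names
    @data  = names.map { |name| [name, {}] }.to_h
    @hints = []   # [fallback, intended_owner, key] triples
    @down  = []
  end

  def fail!(name)
    @down << name
  end

  # Write to each intended owner; a live spare node stands in for a failed one.
  def put(key, value, owners)
    owners.each do |owner|
      target = @down.include?(owner) ? fallback_for(owners) : owner
      @data[target][key] = value
      @hints << [target, owner, key] unless target == owner
    end
  end

  # When a node returns, the fallbacks hand its data back.
  def recover!(name)
    @down.delete(name)
    mine, @hints = @hints.partition { |_, owner, _| owner == name }
    mine.each do |fallback, owner, key|
      @data[owner][key] = @data[fallback].delete(key)
    end
  end

  private

  # First live node that is not already one of the owners.
  def fallback_for(owners)
    (@names - owners - @down).first
  end
end

cluster = ToyCluster.new(%w[node1 node2 node3 node4])
cluster.fail!('node2')
cluster.put('images/pic1', 'jpeg bytes', %w[node1 node2 node3])
cluster.recover!('node2')
```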

Persistence

Supports pluggable backends

fs, ets, dets, gb_trees, innostore, multi, bitcask +

CAP Tuning

GET

r: how many replicas need to agree (default: 2)

PUT

r: how many replicas need to agree when retrieving an existing object before the write (default: 2)

w: how many replicas to write to before returning a successful response (default: 2)

dw: how many replicas to commit to durable storage before returning a successful response (default: 0)
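A quick way to reason about these knobs is the usual quorum rule from Dynamo-style systems: a read is guaranteed to touch at least one replica that saw the latest successful write when r + w > n_val. The helper below is a hypothetical illustration of that arithmetic, not a riak-client call.

```ruby
# True when every read quorum must overlap every write quorum.
def quorums_overlap?(n_val:, r:, w:)
  r + w > n_val
end

puts quorums_overlap?(n_val: 3, r: 2, w: 2)  # the defaults shown above
puts quorums_overlap?(n_val: 3, r: 1, w: 1)  # faster, but reads may be stale
```

Lowering r or w trades consistency for latency and availability; raising them does the opposite.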


(Map|Link)*Reduce

Map

Map functions take one piece of data as input, and produce zero or more results as output.

obj -> your function -> [result, ...]

Data-locality is important in Riak. Map phases are run where the data is stored.

You can have multiple map phases.

The input to a map definition is a series of [bucket, key] names.

(unlike CouchDB)
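The shape of a map phase can be mimicked in plain Ruby. This is an analogy only: real map functions run inside Riak, next to the data, and the `STORE` hash, its contents and the `ages` function are made up for illustration.

```ruby
require 'json'

# Stand-in for stored objects: [bucket, key] => JSON body (hypothetical data).
STORE = {
  %w[people alice] => '{"name":"Alice","age":30}',
  %w[people bob]   => '{"name":"Bob"}'
}

# A map function takes one object and returns an array of zero or more results.
def ages(body)
  age = JSON.parse(body)['age']
  age ? [age] : []
end

# The phase is fed [bucket, key] pairs and concatenates every result array.
def run_map(inputs)
  inputs.flat_map { |bucket_key| ages(STORE.fetch(bucket_key)) }
end

p run_map([%w[people alice], %w[people bob]])
```

Returning an empty array is how a map function filters an object out.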

Link

A special kind of map phase; links matching a pattern are “walked” to find objects to be output.

obj -> link walk, using a pattern -> [linked_obj, ...]
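A link phase can be thought of as a filter over an object's links. The sketch below models that in plain Ruby; the `OBJECTS` hash, the `[bucket, key, tag]` triple layout and the `walk` helper are assumptions for illustration, not the riak-client API.

```ruby
# Toy objects: each carries a list of links as [bucket, key, tag] triples.
OBJECTS = {
  %w[addresses 123fake] => {
    links: [
      %w[people bruce resident],
      %w[people melissa resident],
      %w[images pic1 photo]
    ]
  }
}

# Follow the links that match the pattern; nil means "match anything".
def walk(source, tag: nil, bucket: nil)
  OBJECTS.fetch(source)[:links]
         .select { |b, _k, t| (tag.nil? || t == tag) && (bucket.nil? || b == bucket) }
         .map { |b, k, _t| [b, k] }
end

p walk(%w[addresses 123fake], tag: 'resident')
```

The output is again a list of [bucket, key] pairs, which is why a link phase can feed a following map phase.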

Reduce

[obj, ...] -> your function -> [result]

Reduce functions combine the output of many "map" step evaluations into one result.

The reduce phase occurs on the “coordinating node.”

Reduces may be run multiple times as more input comes in (e.g., re-reduce).
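Because of re-reduce, a reduce function has to give the same answer whether it sees all inputs at once or sees its own earlier output mixed with new inputs. A plain-Ruby sketch with a sum (the numbers are arbitrary and `reduce_sum` is a local stand-in, not Riak's built-in):

```ruby
# A reduce function takes a list and returns a list,
# so its own output can be fed back in together with new input.
def reduce_sum(values)
  [values.inject(0) { |sum, v| sum + v }]
end

first_batch = reduce_sum([30, 28])           # partial result
re_reduced  = reduce_sum(first_batch + [5])  # partial result plus late input
all_at_once = reduce_sum([30, 28, 5])

puts re_reduced == all_at_once
```

A function that, say, counts its inputs would fail this check, since the partial result would be counted as a single item.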

Example

# let's assume these have ages
bruce = client['people']['bruce']
melissa = client['people']['melissa']

addy = client['addresses'].new('123fake')
addy.data = {
  street: '123 Fake St',
  city: 'Portland',
  state: 'OR',
  zip: '97214'
}
addy.links << bruce.to_link('resident')
addy.links << melissa.to_link('resident')
addy.store

Example

Riak::MapReduce.new(client).
  add(addy).
  link(tag: 'resident').
  map("function (v) { return [Riak.mapValuesJson(v)[0]['age'] || 0] }").
  reduce(function: 'Riak.reduceSum', keep: true).
  run

We should get an array with one value

Hurdles

No range queries. (Sorry, Cassandra fans.)

Things like time series data require creative approaches, like bucket and key naming, etc.
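One such naming approach, sketched under assumptions: store one object per sensor per hour under a predictable key, so a "range query" becomes computing the keys and fetching each one. The key format and both helper names are hypothetical.

```ruby
# Hypothetical naming scheme: one key per sensor per hour, e.g. "sensor1-2010071014".
def hour_key(sensor, time)
  "#{sensor}-#{time.getutc.strftime('%Y%m%d%H')}"
end

# Compute every key that covers the span from..to, instead of asking the store.
def keys_between(sensor, from, to)
  start = from.getutc
  t = Time.utc(start.year, start.month, start.day, start.hour)
  keys = []
  while t <= to
    keys << hour_key(sensor, t)
    t += 3600 # step one hour
  end
  keys
end

p keys_between('sensor1', Time.utc(2010, 7, 10, 14, 30), Time.utc(2010, 7, 10, 16, 5))
```

Each computed key is then an ordinary get by bucket and key, with no key listing involved.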

Don’t list keys. Ever, if you can avoid it.

Processing an entire bucket is more expensive than you might think, because it lists keys.

Watch your encoding.

MapReduce Javascript phases need your data to be in valid Unicode.

Otherwise you’ll get a “bad encoding” error.
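On the Ruby side, one defensive option is to check strings before storing them. A minimal sketch, assuming the stray bytes are ISO-8859-1 (that guess is an assumption you would replace with whatever your real source encoding is); `to_valid_utf8` is a hypothetical helper.

```ruby
# Return a valid UTF-8 copy of the input. Bytes that are not valid UTF-8
# are reinterpreted using the assumed source encoding and transcoded.
def to_valid_utf8(raw, source_encoding: Encoding::ISO_8859_1)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  raw.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

latin1 = "caf\xE9".b # bytes of "café" in ISO-8859-1, not valid UTF-8
puts to_valid_utf8(latin1).valid_encoding?
```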

Questions?

Easy

Thanks!

@wbruce
