riak: a friendly key/value store for the web
DESCRIPTION
Talk at DevNation Portland, July 10th, 2010.
Keynote available for download at: http://dl.dropbox.com/u/458036/presentations/DevNation%20Portland.zip
Sources and additional information in the presenter notes.
TRANSCRIPT
A friendly key/value store for the web.
riak
A primer by Bruce Williams
DEVNATION
PORTLAND 2010
My name is Bruce Williams.
and I’m addicted to the bleeding edge.
2001 - Present Day
wayyy before it was
a viable job choice.
But I use other languages, too,
especially from other paradigms.
Photo by oddsteph - http://flic.kr/p/6vWPBU
Choose the Right Weapon
Let’s assume Java is one of the baseball bats.
Based in the D.C. area.
(but I’m not.)
You may find the following conspicuously missing in this talk:
Sorry!
I will not be presenting a paper on
Dynamo, the CAP theorem, vector
clocks, Merkle trees, etc.
These are explained
elsewhere by my
algorithmic betters.
I will not be dwelling on performance or
redundancy.
Expect some vague
statements like “very
fast” and “very robust.”
I will not try to convince you that “NoSQL is the messiah.”
It’s an alternative that makes sense in some situations.
I will not be conducting a large-scale
comparison of competing technologies.
but I’d love to hear
about what you use, and why
What is Riak?
NoSQL
and of the Dynamo
persuasion.
Open Source
& a commercial “EnterpriseDS” version with some proprietary pieces
Key/Value Store
With some metadata.
Schema-less
Great for sparse data,
but requires more discipline.
Datatype Agnostic
Content-Type is King.
Language Agnostic
REST & PBC
Erlang, JavaScript, Java, PHP, Python, Ruby, ...
Distributed
It’s [mostly] Erlang, what did you expect?
Masterless
All nodes are equal
Scalable
or “easy to scale.”
Eventually Consistent
and CAP tunable.
Uses Map/Reduce
and “Link.”
Getting Up & Running
A Quick Local Cluster
hg & git
Start three “nodes”
$ ./riak1/bin/riak start
$ ./riak2/bin/riak start
$ ./riak3/bin/riak start
Join them into a cluster
$ ./riak2/bin/riak-admin join [email protected]
$ ./riak3/bin/riak-admin join [email protected]
Your Data
Object
The thing you’re storing.
Content Type
Body
+ Links
Key
pic1
The identifier for the object.
can be user-defined or automatically generated
Bucket
pic1
pic2 pic3
images
The type or category of object.
“pic1” is unique within “images”
Addressability
pic1
images
Refer to objects by bucket and key.
<images/pic1>
Example
$ gem install riak-client
require 'riak'
client = Riak::Client.new
client.bucket('images').new('pic1').tap do |pic1|
  pic1.content_type = 'image/jpeg'
  pic1.data = File.read('/path/to/jpg')
  pic1.store
end
Example
client.bucket('people').new('bruce').tap do |bruce|
  bruce.data = { name: 'Bruce Williams', email: '[email protected]' }
  bruce.store
end
puts client['people']['bruce'].data['name']
“application/json” is the default for riak-client
Links
Connect objects
can also be “tagged”
[Diagram: <people/bruce> and <images/pic1> connected by a link, labeled “stored here”]
Example
client['people']['bruce'].tap do |bruce|
  bruce.links << client['images']['pic1'].to_link('avatar')
  bruce.store
end
client['people']['bruce'].walk(:tag => 'avatar')
Hooks
pre-commit: reject or transform an object to be committed
post-commit: notify external services, build your own indexes
Where does it go?
The Ring
A 160-bit integer space
The Ring
broken into equal sized partitions.
The Ring
It looks kinda like this
(it’s just more functional)
Photo by marchdoe - http://flickr.com/photos/marchdoe/457741149
The Ring
Each partition is managed by a vnode (virtual node).
The Ring
Each vnode runs on a [physical] node.
The Ring
Each node owns an equal share of vnodes (& partitions)
1 2
3 4
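The ring slides can be sketched in a few lines of plain Ruby. This is a simplified illustration, not Riak's actual code: the partition count of 64, the "bucket/key" string that gets hashed, the round-robin ownership rule, and the node names are all assumptions made for the example.

```ruby
require 'digest/sha1'

RING_SIZE      = 2**160                    # SHA-1 yields a 160-bit integer
NUM_PARTITIONS = 64                        # assumed partition count
PARTITION_SIZE = RING_SIZE / NUM_PARTITIONS

# Hash a bucket/key pair onto the ring (simplified: the exact bytes
# Riak hashes are not reproduced here).
def ring_position(bucket, key)
  Digest::SHA1.hexdigest("#{bucket}/#{key}").to_i(16)
end

# Index of the equal-sized partition that contains that position.
def partition_for(bucket, key)
  ring_position(bucket, key) / PARTITION_SIZE
end

# Deal partitions out round-robin so each node owns an equal share.
def owner_of(partition, nodes)
  nodes[partition % nodes.size]
end

NODES = %w[node1 node2 node3 node4]        # hypothetical node names
partition = partition_for('images', 'pic1')
puts "images/pic1 -> partition #{partition} on #{owner_of(partition, NODES)}"
```

With four nodes and 64 partitions, each node ends up with 16 vnodes in this sketch.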
Replication
n_val = 3
Objects are written to multiple partitions.
3 is the default
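A rough sketch of what “written to multiple partitions” means, assuming the replicas land on n_val consecutive partitions around the ring. The partition count of 64 and the helper name preference_list are made up for the example.

```ruby
N_VAL          = 3    # replicas per object (the default from the slide)
NUM_PARTITIONS = 64   # assumed partition count

# The n_val consecutive partitions, starting at the one the key
# hashes to, wrapping around the end of the ring.
def preference_list(first_partition, n_val = N_VAL, num_partitions = NUM_PARTITIONS)
  (0...n_val).map { |i| (first_partition + i) % num_partitions }
end

p preference_list(10)   # => [10, 11, 12]
p preference_list(62)   # => [62, 63, 0]
```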
Availability
Uses Hinted Handoff to deal with node failures.
When node “2” fails,
the others pick up
the slack.
1 2
3 4
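A toy version of the idea behind hinted handoff, assuming a fallback rule of “the next live node around the ring.” The function name route_write and the node names are hypothetical; the real mechanism works per vnode and is more involved.

```ruby
NODES = %w[node1 node2 node3 node4]   # hypothetical node names

# Returns [node_taking_the_write, hint]. The hint is nil when the
# intended owner is up; otherwise it names the node the data should
# be handed back to once that node recovers.
def route_write(intended, down)
  return [intended, nil] unless down.include?(intended)

  start    = NODES.index(intended)
  fallback = NODES.rotate(start + 1).find { |node| !down.include?(node) }
  [fallback, intended]
end

p route_write('node1', ['node2'])   # => ["node1", nil]
p route_write('node2', ['node2'])   # => ["node3", "node2"]
```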
Persistence
Supports pluggable backends
fs, ets, dets
gb_trees, innostore
multi
+ bitcask
CAP Tuning
GET
r: how many replicas need to agree (default: 2)
PUT
r: how many replicas need to agree when retrieving an existing object before the write (default: 2)
w: how many replicas to write to before returning a successful response (default: 2)
dw: how many replicas to commit to durable storage before returning a successful response (default: 0)
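The arithmetic behind these knobs can be sketched as follows. The helper names are invented for the example; the only fact relied on is the usual quorum rule that a read set of size r and a write set of size w must share a replica whenever r + w > n.

```ruby
# A request succeeds once enough replicas have answered.
def quorum_met?(responses, required)
  responses >= required
end

# With n replicas, any r readers overlap any w writers when r + w > n,
# so a read touches at least one replica that saw the latest write.
def read_overlaps_write?(n, r, w)
  r + w > n
end

n, r, w = 3, 2, 2
puts quorum_met?(2, r)               # true: one of three replicas can be down
puts read_overlaps_write?(n, r, w)   # true: 2 + 2 > 3
puts read_overlaps_write?(n, 1, 1)   # false: faster, but reads may be stale
```

Lower values trade consistency for availability and latency; higher values trade the other way.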
(Map|Link)*Reduce
Map
Map functions take one piece of data as input, and produce zero or more
results as output.
obj → your function → [result, ...]
Data-locality is important in Riak. Map phases are run where the data is
stored.
You can have multiple map phases.
The input to a map definition is a series of [bucket, key] names.
unlike CouchDB
Link
A special kind of map phase; links matching a pattern are “walked” to
find objects to be output.
obj → link walk, using a pattern → [linked_obj, ...]
Reduce
[obj, ...] → your function → [result]
Reduce functions combine the output of many “map” step evaluations into
one result.
The reduce phase occurs on the “coordinating node.”
Reduces may be run multiple times as more input comes in (e.g., re-reduce).
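Why re-reduce matters, sketched in plain Ruby with no Riak involved. The lambdas and sample data are invented for the example. The point is that a reduce function has to accept its own earlier output mixed in with fresh map results and still give the same answer.

```ruby
# Map: one object in, zero or more results out.
map_age = ->(obj) { obj.key?('age') ? [obj['age']] : [] }

# Reduce: a list in, a list out. Its output may be fed back in
# together with later input, so it must handle its own results.
reduce_sum = ->(values) { [values.sum] }

people = [
  { 'name' => 'a', 'age' => 30 },
  { 'name' => 'b' },                 # no age: map emits nothing
  { 'name' => 'c', 'age' => 12 }
]
mapped = people.flat_map(&map_age)

all_at_once = reduce_sum.call(mapped)
first_batch = reduce_sum.call(mapped.take(1))
in_batches  = reduce_sum.call(first_batch + mapped.drop(1))

p all_at_once   # => [42]
p in_batches    # => [42]
```

A sum survives re-reduce; something like an average would not, unless the reduce carried a count along with the running total.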
Example
# Let's assume these people have ages.
bruce = client['people']['bruce']
melissa = client['people']['melissa']
addy = client['addresses'].new('123fake')
addy.data = {
  street: '123 Fake St',
  city: 'Portland',
  state: 'OR',
  zip: '97214'
}
addy.links << bruce.to_link('resident')
addy.links << melissa.to_link('resident')
addy.store
Example
Riak::MapReduce.new(client).
  add(addy).
  link(tag: 'resident').
  map("function (v) { return [Riak.mapValuesJson(v)[0]['age'] || 0] }").
  reduce(function: 'Riak.reduceSum', keep: true).
  run
We should get an array with one value
Hurdles
No range queries. (Sorry, Cassandra fans.)
Things like time series data require creative approaches
(like bucket and key naming, etc.).
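One possible “creative approach,” sketched here as an assumption rather than a recommendation from the talk: name keys after the hour they cover, so a time range becomes a list of keys you can compute and fetch one by one. The key scheme and helper names are hypothetical.

```ruby
# Hypothetical key scheme: "<series>-<UTC hour>", e.g. "cpu-2010071022".
def hourly_key(series, time)
  "#{series}-#{time.getutc.strftime('%Y%m%d%H')}"
end

# Every hourly key covering [from, to], computed rather than queried.
def keys_between(series, from, to)
  start = from.getutc
  t = Time.utc(start.year, start.month, start.day, start.hour)
  keys = []
  while t <= to
    keys << hourly_key(series, t)
    t += 3600   # step one hour
  end
  keys
end

from = Time.utc(2010, 7, 10, 22, 30)
to   = Time.utc(2010, 7, 11, 0, 15)
p keys_between('cpu', from, to)
# => ["cpu-2010071022", "cpu-2010071023", "cpu-2010071100"]
```

Each computed key can then be looked up directly by bucket and key, with no key listing involved.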
Don’t list keys. Ever, if you can avoid it.
Processing an entire bucket is more expensive
than you might think, because it lists keys.
Watch your encoding.
MapReduce JavaScript phases need your data to be in valid Unicode;
otherwise you’ll get a “bad encoding” error.
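A small guard you could run before storing text, using only core Ruby String methods (String#scrub needs Ruby 2.1 or later). The helper name safe_utf8 and the choice to replace bad bytes with "?" are assumptions for the example; converting from the real source encoding is better when you know it.

```ruby
# Returns a UTF-8 copy of the string, replacing invalid bytes with "?".
def safe_utf8(str)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  utf8.valid_encoding? ? utf8 : utf8.scrub('?')
end

good = "caf\u00e9"      # valid UTF-8
bad  = "caf\xE9".b      # Latin-1 bytes, not valid UTF-8

puts safe_utf8(good).valid_encoding?   # true
puts safe_utf8(bad)                    # caf?
```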
Questions?
Easy
Thanks!
@wbruce