couchconf-bangalore-intro-to-document-databases

Post on 14-Jun-2015

TRANSCRIPT

Introduction to Document Databases

Dustin Sallings @dlsspy

HOW DO WE THINK ABOUT DATA?

a brief history

• 1850: Atlantic Cable (Cyrus W. Field)
• 1945: "As We May Think" (Vannevar Bush)
• 1957: Sputnik (USSR)
• 1958: ARPA (USA)
• 1962: IDS (Charles Bachman, GE)
• 1965: MUMPS; Pick (TRW)
• 1966: IMS (IBM, Vern Watts)
• 1968: oNLine System (NLS) (Doug Engelbart)
• 1969: IMP (UCLA-Stanford)
• 1970: “A Relational Model of Data for Large Shared Data Banks” (E.F. Codd, IBM)
• 1972: ARPANET
• 1973: Ingres (Michael Stonebraker, Berkeley)

Legend: seminal events in internet history; hypertext/hypermedia/web; beginnings of the internet

• 1850 - Atlantic cable - taking data transmission up a notch
• 1945 - "As We May Think" - "He urges that men of science should then turn to the massive task of making more accessible our bewildering store of knowledge."
• 1958 - ARPA - "prevent technological surprise like the launch of Sputnik" - "to prevent technological surprise to the US, but also to create technological surprise for its enemies"
• 1969 - IMP - interface message processor (packet network)

Legend: seminal events in internet history; hierarchical/network databases; relational databases

• 1965 - MUMPS - Massachusetts General Hospital Utility Multi-Programming System - It was largely adopted during the 1970s and early 1980s in healthcare and financial information systems/databases, and continues to be used by many of the same clients today. It is currently used in electronic health record systems as well as by multiple banking networks and online trading/investment services.

Timeline years: Pre-1960, 1974, 1976, 1977, 1982, 1983, 1984, 1985, 1989, 1990, 1991, 1994, 1997

Events:
• GemStone/S (GemStone)
• System R (IBM)
• Oracle (Larry Ellison)
• line-mode browser (Nicola Pellow)
• WWW (Tim Berners-Lee)
• Versant (Versant)
• MUMPS ANSI; DBM
• Lotus Notes (Lotus)
• GT.M; BerkeleyDB
• MySQL (Michael Widenius and David Axmark)
• Cello (Tom Bruce)
• Caché (InterSystems; MUMPS)
• Metakit
• DNS (Paul Mockapetris)
• TCP/IP (Vint Cerf and Bob Kahn)
• Mosaic (Marc Andreessen)
• ViolaWWW (Pei Wei)
• HyperCard (Bill Atkinson)
• NeXT
• many other ODBMSs

Legend: seminal events in internet history; hypertext/hypermedia/web; beginnings of the internet

Legend: seminal events in relational databases; MUMPS; open source; object databases

Timeline years: 1998, 2000, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011

Events:
• Android (Andy Rubin)
• iOS and iPhone (Steve Jobs)
• Neo4j
• db4o
• QDBM
• memcached
• CouchDB
• BigTable
• JackRabbit; Tokyo Cabinet
• Amazon Dynamo (paper)
• MongoDB
• Project Voldemort; Cassandra
• Open Source Summit (Tim O'Reilly)
• "NoSQL" (Carlo Strozzi)
• membase; Couchbase Server
• Couchbase Mobile
• Terrastore; Riak
• Dynomite; HBase; VertexDB
• "NoSQL"
• iPad; Kindle Fire; Samsung Galaxy
• CAP Theorem (Eric Brewer)
• CAP Theorem Formally Proven (Seth Gilbert, Nancy Lynch, MIT)

Legend: seminal events in distributed computing; mobile devices

Legend: seminal events in internet history; NoSQL (“Not Only SQL”)

Issues of Scale

2011

The Web

Mobile

NoSQL

off-line applications

distributed systems

dynamic user population in the millions

availability or consistency

innovative applications with changing requirements

server synchronization and sharing

Logic Scales!

This application runs all the way to the edge. A billion concurrent users on this application will have the same experience as a single user.

What About Data?

The Relational Database Solution

RDBMS Scales ... at what cost?

ACID

• Atomicity: database modifications are all or nothing
• Consistency: databases go from one consistent state to another
• Isolation: transactions never interfere with each other
• Durability: once a transaction is committed, it stays
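The "all or nothing" property can be seen in a few lines. The following is only a sketch, using Python's built-in sqlite3 module as a stand-in relational database; the `accounts` table, the `transfer` helper and the account names are made up for illustration and do not come from the talk.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two rows; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            (lowest,) = conn.execute(
                "SELECT MIN(balance) FROM accounts").fetchone()
            if lowest < 0:
                # Consistency rule for this sketch: no negative balances.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()
```

A transfer that would overdraw an account leaves both rows exactly as they were, because the first UPDATE is rolled back together with the second.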

TRANSACTIONS
DAN PRITCHETT, EBAY

PayPal uses transactions; eBay doesn’t (for non-critical data)
• two-phase commit is not pragmatic
• responsiveness and site availability would suffer

Other high-volume sites independently arrive at the same strategy
• new role for the database as a ‘data store’
• new problem: transactions per watt

CAP THEOREM

• Consistency: reads and writes happen correctly
• Availability: every operation returns a result
• Partition Tolerance: the network allows lost and undeliverable messages

PICK TWO!

BASE

• Basically Available
• Soft-State
• Eventually Consistent

The NoSQL Solution

DATABASE

• Features-First: Oracle, SQL Server, DB2, MySQL, PostgreSQL, Amazon RDS
• Scale-First: Couchbase Server, CouchDB, Project Voldemort, Riak, Scalaris, Kai, Dynomite, MemcacheDB, ThruDB, Cassandra, HBase and Hypertable
• Simple Structured Storage: Amazon SimpleDB, Berkeley DB
• Purpose-Optimized Stores: StreamBase, Vertica, Aster Data, Netezza, Greenplum, VoltDB

NOSQL TAXONOMY
STEVEN YEN, COUCHBASE

• key-value-cache
• key-value-store
• eventually-consistent key-value-store
• ordered-key-value-store
• data-structures server
• tuple-store
• object database
• document database
• wide columnar store

WHO WILL WIN?

 THE MOST APPROACHABLE API WITH ENOUGH POWER WILL WIN

WHY DOCUMENT DATABASES?

DOCUMENT DATABASE APIS ARE ‘APPROACHABLE’

• HTTP: GET, POST, PUT, DELETE
• memcached: GET, SET, DELETE, ADD, REPLACE, ...
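To show how small this verb set is, here is an in-memory sketch that mimics the usual memcached semantics: SET always stores, ADD stores only when the key is absent, and REPLACE stores only when the key is present. `TinyCache` is a hypothetical stand-in written for this illustration. It is not a memcached client, and it ignores expiry, flags and the network.

```python
class TinyCache:
    """In-memory stand-in that mimics the semantics of the memcached verbs."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        # A miss returns None, like a cache miss.
        return self._items.get(key)

    def set(self, key, value):
        # SET stores unconditionally.
        self._items[key] = value
        return True

    def add(self, key, value):
        # ADD stores only if the key does not exist yet.
        if key in self._items:
            return False
        self._items[key] = value
        return True

    def replace(self, key, value):
        # REPLACE stores only if the key already exists.
        if key not in self._items:
            return False
        self._items[key] = value
        return True

    def delete(self, key):
        # DELETE reports whether anything was removed.
        if key in self._items:
            del self._items[key]
            return True
        return False
```

Five methods cover the whole interface, which is the sense in which such an API is approachable.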

Everyone Understands

APIs in every language to work with your data.

Documents are Flexible

Focus point: Applications tend to only care about the parts they find interesting while preserving the rest.
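A minimal sketch of that focus point, assuming documents are stored as JSON text: the code below touches only the one field it cares about and writes everything else back unchanged. The `rename_user` helper and the sample field names are hypothetical.

```python
import json

def rename_user(raw_doc, new_name):
    """Update the one field this code cares about; keep all other fields."""
    doc = json.loads(raw_doc)
    doc["name"] = new_name  # the only field this code knows about
    return json.dumps(doc, sort_keys=True)

# Hypothetical stored document with fields this code never looks at.
stored = '{"name": "dustin", "prefs": {"theme": "dark"}, "type": "user"}'
updated = json.loads(rename_user(stored, "dls"))
```

Another application could add a new field to the same document later without this code needing to change.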

DOCUMENT DATABASES HAVE “ENOUGH POWER”

• fast reads from cache
• fast writes to persistent storage
• single-document atomic writes
• document conflict resolution
• replication of data
• clustering and fault-tolerance
• fail-over and rebalancing on the fly
• rolling upgrades and deployment
• high availability
• partition tolerance
• rapid-development tools

Document Reads Are Fast

document caching

Document Reads Are Fast

document persistence

Document Writes Are Fast

document write queue

write request

write to cache

asynchronous document write
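The write path on this slide (write request, write to cache, asynchronous document write) can be sketched as a write-behind queue. This is an illustration under simplifying assumptions, not how any particular server implements it: `WriteBehindStore` is a hypothetical class and the `disk` dictionary stands in for persistent storage.

```python
import queue
import threading

class WriteBehindStore:
    """Acknowledge writes once cached; persist them on a background thread."""

    def __init__(self):
        self.cache = {}
        self.disk = {}  # stand-in for persistent storage
        self._pending = queue.Queue()
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def write(self, key, doc):
        # The caller returns as soon as the cache holds the document.
        self.cache[key] = doc
        self._pending.put((key, doc))

    def read(self, key):
        # Serve from cache first, then fall back to persisted data.
        return self.cache.get(key, self.disk.get(key))

    def _drain(self):
        # Background writer: persist queued documents one at a time.
        while True:
            key, doc = self._pending.get()
            self.disk[key] = doc
            self._pending.task_done()

    def flush(self):
        # Block until every queued write has been persisted.
        self._pending.join()
```

The trade-off is that a crash between the cache write and the background write can lose data, which is why this sketch exposes `flush` for callers that need to wait for persistence.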

Document Writes Are Fast

Keys: A B C D F G H I K L N O Q R
Leaf reductions: A-C: 3, D-F: 2, G-H: 2, I-L: 3, N-R: 4
Interior reductions: A-H: 7, I-R: 7
Root reduction: A-R: 14
New document: M
New reductions: A-R: 15, I-R: 8, M-R: 5
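The tree on this slide stores a count (a "reduction") on every node, so adding document M only changes the counts on the path from the root to one leaf. The sketch below rebuilds the slide's tree and inserts M. The `Node` class, the routing rule (descend into the first child whose highest key is at or above the new key) and the helper names are assumptions made for this illustration.

```python
import bisect

class Node:
    """Tree node that caches how many keys live beneath it."""

    def __init__(self, keys=None, children=None):
        self.keys = list(keys or [])          # leaf: sorted keys
        self.children = list(children or [])  # interior: child nodes
        self.count = len(self.keys) + sum(c.count for c in self.children)

    def low(self):
        return self.keys[0] if self.keys else self.children[0].low()

    def high(self):
        return self.keys[-1] if self.keys else self.children[-1].high()

    def label(self):
        return f"{self.low()}-{self.high()}"

def insert(node, key, path):
    """Insert `key`, recording every node whose count had to change."""
    path.append(node)
    if node.children:
        # Assumed routing rule: first child whose highest key is >= key.
        target = next((c for c in node.children if key <= c.high()),
                      node.children[-1])
        insert(target, key, path)
    else:
        bisect.insort(node.keys, key)
    node.count += 1

def build_slide_tree():
    """Rebuild the 14-key tree shown on the slide."""
    a_h = Node(children=[Node("ABC"), Node("DF"), Node("GH")])
    i_r = Node(children=[Node("IKL"), Node("NOQR")])
    return Node(children=[a_h, i_r])
```

Only three nodes are updated for the new document; the A-H subtree and its leaves are never touched.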

Document Writes Are Safe

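Keys: A B C D F G H I K L N O Q R
Leaf reductions: A-C: 3, D-F: 2, G-H: 2, I-L: 3, N-R: 4
Interior reductions: A-H: 7, I-R: 7
Root reduction: A-R: 14
New document: M
New revisions: M-R: 5, I-R: 8
New root: A-R: 15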
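One way to read "new revisions, new root" is as a copy-on-write update: the write creates new copies of the nodes on the path to M and a new root, while the old root and every untouched subtree stay exactly as they were. The sketch below models that idea with immutable tuples. It is an assumption-laden illustration (hypothetical `Node`, `leaf`, `branch` and `insert` names, no on-disk format), not a description of any specific storage engine.

```python
from typing import NamedTuple, Tuple

class Node(NamedTuple):
    """Immutable tree node; a write creates new nodes, never edits old ones."""
    keys: Tuple[str, ...] = ()
    children: Tuple["Node", ...] = ()
    count: int = 0

def leaf(*keys):
    return Node(keys=tuple(sorted(keys)), count=len(keys))

def branch(*children):
    return Node(children=children, count=sum(c.count for c in children))

def high(node):
    return node.keys[-1] if node.keys else high(node.children[-1])

def insert(node, key):
    """Return a new revision of `node` containing `key`; `node` is untouched."""
    if not node.children:
        return leaf(*node.keys, key)
    # Assumed routing rule: first child whose highest key is >= key.
    idx = next((i for i, c in enumerate(node.children) if key <= high(c)),
               len(node.children) - 1)
    new_child = insert(node.children[idx], key)
    kids = node.children[:idx] + (new_child,) + node.children[idx + 1:]
    return branch(*kids)

# The 14-key tree from the slide, then a new root that also contains M.
old_root = branch(
    branch(leaf("A", "B", "C"), leaf("D", "F"), leaf("G", "H")),
    branch(leaf("I", "K", "L"), leaf("N", "O", "Q", "R")),
)
new_root = insert(old_root, "M")
```

A reader holding `old_root` still sees a complete, consistent 14-document tree, and a failed write simply never publishes a new root.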

Document Databases Scale Out

A B

Document Databases Replicate

Document Database Queries Are Fast

Keys: A B C D F G H I K L N O Q R, and M
Reductions: A-C: 3, D-F: 2, G-H: 2, I-L: 3, M-R: 5, A-H: 7, I-R: 7, A-R: 14
Query range: startkey to endkey
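Because keys are kept in sorted order, a startkey/endkey query only needs to locate the two ends of the range and read what lies between them. A minimal sketch with a sorted Python list and binary search follows. Treating `endkey` as inclusive is an assumption of this example, and `range_query` is a hypothetical helper name.

```python
import bisect

def range_query(sorted_keys, startkey, endkey):
    """Return every key k with startkey <= k <= endkey from a sorted list."""
    lo = bisect.bisect_left(sorted_keys, startkey)   # first key >= startkey
    hi = bisect.bisect_right(sorted_keys, endkey)    # one past last key <= endkey
    return sorted_keys[lo:hi]

# The keys from the slide, in sorted order.
keys = list("ABCDFGHIKLMNOQR")
```

Both lookups are binary searches, so the cost of finding the range grows with the logarithm of the number of keys rather than with the size of the data set.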

Document Databases Are Developer Friendly

“THE ROADS AND CROSSROADS OF INTERNET HISTORY”
HTTP://WWW.NETVALLEY.COM/INTVAL1.HTML

“A BRIEF HISTORY OF NOSQL”
HTTP://BLOG.KNUTHAUGEN.NO/2010/03/A-BRIEF-HISTORY-OF-NOSQL.HTML

“HISTORY OF THE ATLANTIC CABLE AND UNDERSEA COMMUNICATIONS”
HTTP://ATLANTIC-CABLE.COM/FIELD/INDEX.HTM

“A LITTLE HISTORY OF THE WORLD WIDE WEB”
HTTP://WWW.W3.ORG/HISTORY.HTML

“DAN PRITCHETT ON ARCHITECTURE AT EBAY”
HTTP://WWW.INFOQ.COM/INTERVIEWS/DAN-PRITCHETT-EBAY-ARCHITECTURE

“NOSQL IS A HORSELESS CARRIAGE” BY STEVEN YEN
HTTP://DL.DROPBOX.COM/U/2075876/NOSQL-STEVE-YEN.PDF
