Introduction to Document Databases Dustin Sallings @dlsspy

Upload: couchbase

Post on 14-Jun-2015

TRANSCRIPT

Page 1: CouchConf-Bangalore-Intro-to-document-databases

Introduction to Document Databases

Dustin Sallings @dlsspy

Page 2: CouchConf-Bangalore-Intro-to-document-databases

HOW DO WE THINK ABOUT DATA?

a brief history

Page 3: CouchConf-Bangalore-Intro-to-document-databases

Timeline, 1850 to 1973

• 1850 - Atlantic Cable (Cyrus W. Field)
• 1945 - "As We May Think" (Vannevar Bush)
• 1957 - Sputnik (USSR)
• 1958 - ARPA (USA)
• 1962 - IDS (Charles Bachman, GE)
• 1965 - MUMPS, Pick (TRW)
• 1966 - IMS (IBM, Vern Watts)
• 1968 - oNLine System (NLS) (Doug Engelbart)
• 1969 - IMP (UCLA-Stanford)
• 1970 - “A Relational Model of Data for Large Shared Data Banks” (E.F. Codd, IBM)
• 1972 - ARPANET
• 1973 - Ingres (Michael Stonebraker, Berkeley)

Timeline legend: seminal events in internet history; hypertext/hypermedia/web; beginnings of the internet

• 1850 - Atlantic cable -- taking data transmission up a notch
• 1945 - As We May Think - "He urges that men of science should then turn to the massive task of making more accessible our bewildering store of knowledge."
• 1958 - ARPA - "prevent technological surprise like the launch of Sputnik" - "to prevent technological surprise to the US, but also to create technological surprise for its enemies"
• 1969 - IMP - interface message processor (packet network)

Page 4: CouchConf-Bangalore-Intro-to-document-databases

Timeline legend: seminal events in internet history; hierarchical/network databases; relational databases

• 1965 - MUMPS - Massachusetts General Hospital Utility Multi-Programming System - It was largely adopted during the 1970s and early 1980s in healthcare and financial information systems/databases, and continues to be used by many of the same clients today. It is currently used in electronic health record systems as well as by multiple banking networks and online trading/investment services.

Page 5: CouchConf-Bangalore-Intro-to-document-databases

Pre-1960

Timeline years: 1974, 1976, 1977, 1982, 1983, 1984, 1985, 1989, 1990, 1991, 1994, 1997

Timeline entries:
• GemStone/S (GemStone)
• System R (IBM)
• Oracle (Larry Ellison)
• line-mode browser (Nicola Pellow)
• WWW (Tim Berners-Lee)
• Versant (Versant)
• MUMPS ANSI, DBM
• Lotus Notes (Lotus)
• GT.M, BerkeleyDB
• MySQL (Michael Widenius and David Axmark)
• Cello (Tom Bruce)
• Caché (InterSystems, MUMPS)
• Metakit
• DNS (Paul Mockapetris)
• TCP/IP (Vint Cerf and Bob Kahn)
• Mosaic (Marc Andreessen)
• ViolaWWW (Pei Wei)
• HyperCard (Bill Atkinson)
• NeXT
• many other ODBMSs

Timeline legend: seminal events in internet history; hypertext/hypermedia/web; beginnings of the internet

Page 6: CouchConf-Bangalore-Intro-to-document-databases

Timeline legend: relational databases; MUMPS; open source; object databases

Page 7: CouchConf-Bangalore-Intro-to-document-databases

Timeline years: 1998, 2000, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011

Timeline entries:
• Android (Andy Rubin)
• iOS and iPhone (Steve Jobs)
• Neo4j
• db4o
• QDBM
• memcached
• CouchDB
• BigTable
• JackRabbit, Tokyo Cabinet
• Amazon Dynamo (paper)
• MongoDB
• Project Voldemort, Cassandra
• Open Source Summit (Tim O'Reilly)
• "NoSQL" (Carlo Strozzi)
• membase
• Couchbase Server
• Couchbase Mobile
• Terrastore, Riak
• Dynomite, HBase, VertexDB
• "NoSQL"
• iPad
• Kindle Fire
• Samsung Galaxy
• CAP Theorem (Eric Brewer)
• CAP Theorem formally proven (Seth Gilbert, Nancy Lynch, MIT)

Timeline legend: seminal events in distributed computing; mobile devices

Page 8: CouchConf-Bangalore-Intro-to-document-databases

Timeline legend: seminal events in internet history; NoSQL (“Not Only SQL”); Issues of Scale

Page 9: CouchConf-Bangalore-Intro-to-document-databases

2011

The Web

Mobile

NoSQL

off-line applications

distributed systems

dynamic user population in the millions

availability or consistency

innovative applications with changing requirements

server synchronization and sharing

Page 10: CouchConf-Bangalore-Intro-to-document-databases

Logic Scales!

Page 11: CouchConf-Bangalore-Intro-to-document-databases

This application runs all the way to the edge. A billion concurrent users on this application will have the same experience as a single user.

Page 12: CouchConf-Bangalore-Intro-to-document-databases

What About Data?

Page 13: CouchConf-Bangalore-Intro-to-document-databases

The Relational Database Solution

Page 14: CouchConf-Bangalore-Intro-to-document-databases

RDBMS Scales ... at what cost?

Page 15: CouchConf-Bangalore-Intro-to-document-databases

ACID

• Atomicity: database modifications are all or nothing
• Consistency: databases go from one consistent state to another
• Isolation: transactions never interfere with each other
• Durability: once a transaction is committed, it stays
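
The "all or nothing" property can be seen in a few lines. This is only a minimal sketch using Python's built-in sqlite3 module; the `accounts` table, the `transfer` helper, and the sample balances are assumptions made up for illustration and do not come from the slides.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Raising inside the block undoes both updates above.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


# Hypothetical sample data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and commits
failed = transfer(conn, "alice", "bob", 500)  # overdraws, so it is rolled back
```

After the failed transfer neither account has changed: the debit and the credit are undone together, which is the atomicity guarantee in miniature.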

Page 16: CouchConf-Bangalore-Intro-to-document-databases

TRANSACTIONS
DAN PRITCHETT, EBAY

PayPal Uses Transactions
eBay Doesn’t (for non-critical data)

• two-phase commit is not pragmatic
• responsiveness and site availability would suffer

Other high-volume sites independently arrived at the same strategy

• new role for the database as a ‘data store’
• new problem: transactions per watt

Page 17: CouchConf-Bangalore-Intro-to-document-databases

CAP THEOREM

• Consistency: reads and writes happen correctly
• Availability: every operation returns a result
• Partition Tolerance: the network allows lost and undeliverable messages

PICK TWO!

Page 18: CouchConf-Bangalore-Intro-to-document-databases

BASE

• Basically Available
• Soft-State
• Eventually Consistent
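
"Eventually consistent" is easier to picture with a toy model. The sketch below is an assumption-heavy illustration, not how any particular product works: the `Replica` class is hypothetical, and it resolves conflicts with a simple highest-version-wins rule chosen only to keep the example short.

```python
class Replica:
    """A toy replica that stores key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Accept the write only if it is newer than what this replica holds.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Pull every entry from the other replica; newer versions win.
        for key, (version, value) in other.data.items():
            self.write(key, value, version)


a, b = Replica(), Replica()
a.write("greeting", "hello", version=1)
b.write("greeting", "namaste", version=2)

stale = a.read("greeting")  # replica a has not seen the newer write yet

# Once the replicas exchange state, both settle on the same value.
a.sync_from(b)
b.sync_from(a)
```

Both replicas stay available for reads and writes the whole time; the price is that `a` briefly serves an older value until the two have synchronized.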

Page 19: CouchConf-Bangalore-Intro-to-document-databases

The NoSQL Solution

Page 20: CouchConf-Bangalore-Intro-to-document-databases

The NoSQL Solution

Page 21: CouchConf-Bangalore-Intro-to-document-databases

DATABASE

• Features-First: Oracle, SQL Server, DB2, MySQL, PostgreSQL, Amazon RDS
• Scale-First: Couchbase Server, CouchDB, Project Voldemort, Riak, Scalaris, Kai, Dynomite, MemcacheDB, ThruDB, Cassandra, HBase and Hypertable
• Simple Structured Storage: Amazon SimpleDB, Berkeley DB
• Purpose-Optimized Stores: StreamBase, Vertica, Aster Data, Netezza, Greenplum, VoltDB

Page 22: CouchConf-Bangalore-Intro-to-document-databases

NOSQL TAXONOMY
STEVEN YEN, COUCHBASE

• key-value-cache
• key-value-store
• eventually-consistent key-value-store
• ordered-key-value-store
• data-structures server
• tuple-store
• object database
• document database
• wide columnar store

Page 23: CouchConf-Bangalore-Intro-to-document-databases

WHO WILL WIN?

Page 24: CouchConf-Bangalore-Intro-to-document-databases

THE MOST APPROACHABLE API WITH ENOUGH POWER WILL WIN

Page 25: CouchConf-Bangalore-Intro-to-document-databases

NOSQL TAXONOMY

• key-value-cache
• key-value-store
• eventually-consistent key-value-store
• ordered-key-value-store
• data-structures server
• tuple-store
• object database
• document database
• wide columnar store

Page 26: CouchConf-Bangalore-Intro-to-document-databases

WHY DOCUMENT DATABASES?

Page 27: CouchConf-Bangalore-Intro-to-document-databases

DOCUMENT DATABASE APIS ARE ‘APPROACHABLE’

• HTTP: GET, POST, PUT, DELETE
• memcached: GET, SET, DELETE, ADD, REPLACE, ...
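
To show how small this verb set is, here is a rough in-memory sketch of the memcached-style operations named above. `TinyStore` is a hypothetical class written for this illustration; it involves no network and no real client library, and it models only the usual meaning of each verb (ADD stores only when the key is absent, REPLACE only when it is present).

```python
class TinyStore:
    """An in-memory stand-in that mimics memcached-style verbs."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        # Returns None when the key is missing.
        return self._items.get(key)

    def set(self, key, value):
        # Unconditional store.
        self._items[key] = value
        return True

    def add(self, key, value):
        # Store only if the key does not exist yet.
        if key in self._items:
            return False
        self._items[key] = value
        return True

    def replace(self, key, value):
        # Store only if the key already exists.
        if key not in self._items:
            return False
        self._items[key] = value
        return True

    def delete(self, key):
        if key in self._items:
            del self._items[key]
            return True
        return False


store = TinyStore()
first_add = store.add("user:1", {"name": "Asha"})     # key is new, so it is stored
second_add = store.add("user:1", {"name": "Other"})   # key exists, so it is refused
replaced = store.replace("user:1", {"name": "Asha K"})
missing_replace = store.replace("user:2", {"name": "Nobody"})
deleted = store.delete("user:1")
```

The HTTP verbs on the same slide map onto a similar handful of operations on a document addressed by URL.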

Page 28: CouchConf-Bangalore-Intro-to-document-databases

Everyone Understands

APIs in every language to work with your data.

Page 29: CouchConf-Bangalore-Intro-to-document-databases

Documents are Flexible

Focus point: Applications tend to only care about the parts they find interesting while preserving the rest.
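
A short sketch of that focus point: the application below touches only the fields it cares about and writes everything else back unchanged. The `add_login` helper and the document fields are assumptions invented for this example.

```python
import json


def add_login(raw_doc, timestamp):
    """Update the login fields of a JSON document and keep all other fields."""
    doc = json.loads(raw_doc)
    doc["logins"] = doc.get("logins", 0) + 1  # a field this code understands
    doc["last_login"] = timestamp             # a field this code adds
    # Fields this code never looks at (for example "preferences") pass through.
    return json.dumps(doc, sort_keys=True)


# Hypothetical stored document with a field this code knows nothing about.
stored = json.dumps(
    {"name": "Asha", "logins": 2, "preferences": {"theme": "dark"}}
)

updated = json.loads(add_login(stored, "2011-08-01T10:00:00Z"))
```

Another application can add or change `preferences` without this code needing a schema change or a migration.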

Page 30: CouchConf-Bangalore-Intro-to-document-databases

DOCUMENT DATABASES HAVE “ENOUGH POWER”

• fast reads from cache
• fast writes to persistent storage
• single-document atomic writes
• document conflict resolution
• replication of data
• clustering and fault-tolerance
• fail-over and rebalancing on the fly
• rolling upgrades and deployment
• high availability
• partition tolerance
• rapid-development tools

Page 31: CouchConf-Bangalore-Intro-to-document-databases

Document Reads Are Fast

document caching

Page 32: CouchConf-Bangalore-Intro-to-document-databases

Document Reads Are Fast

document persistence

Page 33: CouchConf-Bangalore-Intro-to-document-databases

Document Writes Are Fast

document write queue

write request

write to cache

asynchronous document write
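
The flow on this slide (write request, write to cache, asynchronous document write) can be sketched as a write-behind cache. This is an illustration under stated assumptions, not the actual server design: `WriteBehindStore` is a hypothetical class, and a plain dictionary named `disk` stands in for persistent storage.

```python
import queue
import threading


class WriteBehindStore:
    """Acknowledge writes from memory and persist them in the background."""

    def __init__(self):
        self.cache = {}                # served to readers immediately
        self.disk = {}                 # stand-in for persistent storage
        self._pending = queue.Queue()  # the document write queue
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def set(self, key, value):
        # The caller returns as soon as the cache is updated.
        self.cache[key] = value
        self._pending.put((key, value))

    def get(self, key):
        return self.cache.get(key)

    def _drain(self):
        # Background loop: take queued writes and persist them one by one.
        while True:
            key, value = self._pending.get()
            self.disk[key] = value
            self._pending.task_done()

    def flush(self):
        # Block until every queued write has been persisted.
        self._pending.join()


store = WriteBehindStore()
store.set("doc1", {"title": "hello"})
cached = store.get("doc1")  # readable right away, before persistence finishes
store.flush()
persisted = store.disk.get("doc1")
```

The trade-off is the window between the cache update and the disk write, which is one reason real systems pair this pattern with replication.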

Page 34: CouchConf-Bangalore-Intro-to-document-databases

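Document Writes Are Fast

[Diagram: a tree index over documents A B C D F G H I K L N O Q R. Each node shows a key range and a reduction (document count): leaf nodes A-C 3, D-F 2, G-H 2, I-L 3, N-R 4; interior nodes A-H 7, I-R 7; root A-R 14. Adding new document M produces new reductions: M-R 5, I-R 8, A-R 15.]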

Page 35: CouchConf-Bangalore-Intro-to-document-databases

Document Writes Are Safe

[Diagram: the same tree index after adding new document M. New revisions of the changed nodes (M-R 5, I-R 8, A-R 15) and a new root are written; the earlier nodes (N-R 4, I-R 7, A-R 14) are still shown.]
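
The idea of writing new revisions and a new root, while the old ones stay intact, can be sketched with a small copy-on-write tree. To keep it short this uses a plain binary search tree with a per-node count rather than a B-tree, which is a simplification of this sketch; the `Node` class, the `insert` function, and the insertion order are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """An immutable tree node; `count` is the reduction for its subtree."""
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    count: int = 1


def size(node):
    return node.count if node is not None else 0


def insert(node, key):
    """Return a new root containing `key`; existing nodes are never modified."""
    if node is None:
        return Node(key)
    if key == node.key:
        return node
    if key < node.key:
        left, right = insert(node.left, key), node.right
    else:
        left, right = node.left, insert(node.right, key)
    # Only nodes on the path to the new key are rebuilt, each with a new count.
    return Node(node.key, left, right, 1 + size(left) + size(right))


# The 14 document keys from the slide, inserted in an arbitrary order.
old_root = None
for k in "HDLBFKOACGINQR":
    old_root = insert(old_root, k)

new_root = insert(old_root, "M")  # the new document from the slide
```

Here `old_root.count` is still 14 and `new_root.count` is 15, matching the A-R 14 and A-R 15 labels, and the left subtree is shared between both roots because the write never touched it. A reader holding the old root keeps seeing a complete, consistent tree.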

Page 36: CouchConf-Bangalore-Intro-to-document-databases

Document Databases Scale Out

Page 37: CouchConf-Bangalore-Intro-to-document-databases

Document Databases Replicate

[Diagram: replication between databases A and B]

Page 38: CouchConf-Bangalore-Intro-to-document-databases

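Document Database Queries Are Fast

[Diagram: the tree index over documents A B C D F G H I K L N O Q R and M, with node labels A-C 3, D-F 2, G-H 2, I-L 3, M-R 5, A-H 7, I-R 7, A-R 14. A startkey and an endkey mark the range of keys a query reads.]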
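
A range query over a sorted index only needs to locate the two boundaries and read what lies between them. The sketch below uses Python's bisect module on a sorted list as a stand-in for the tree index; the `range_query` function is hypothetical, and treating both startkey and endkey as inclusive is an assumption of this example.

```python
import bisect


def range_query(sorted_keys, startkey, endkey):
    """Return the keys between startkey and endkey, inclusive on both ends."""
    lo = bisect.bisect_left(sorted_keys, startkey)  # first key >= startkey
    hi = bisect.bisect_right(sorted_keys, endkey)   # one past last key <= endkey
    return sorted_keys[lo:hi]


# The document keys from the slide, including M, in sorted order.
keys = list("ABCDFGHIKLMNOQR")

result = range_query(keys, "G", "M")
between = range_query(keys, "E", "J")  # boundaries need not be existing keys
```

Finding each boundary takes a logarithmic number of comparisons, so the cost of a query depends mostly on how many keys fall inside the range rather than on the size of the whole index.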

Page 39: CouchConf-Bangalore-Intro-to-document-databases

Document Databases Are Developer Friendly

Page 40: CouchConf-Bangalore-Intro-to-document-databases

“THE ROADS AND CROSSROADS OF INTERNET HISTORY”
HTTP://WWW.NETVALLEY.COM/INTVAL1.HTML

“A BRIEF HISTORY OF NOSQL”
HTTP://BLOG.KNUTHAUGEN.NO/2010/03/A-BRIEF-HISTORY-OF-NOSQL.HTML

“HISTORY OF THE ATLANTIC CABLE AND UNDERSEA COMMUNICATIONS”
HTTP://ATLANTIC-CABLE.COM/FIELD/INDEX.HTM

“A LITTLE HISTORY OF THE WORLD WIDE WEB”
HTTP://WWW.W3.ORG/HISTORY.HTML

“DAN PRITCHETT ON ARCHITECTURE AT EBAY”
HTTP://WWW.INFOQ.COM/INTERVIEWS/DAN-PRITCHETT-EBAY-ARCHITECTURE

“NOSQL IS A HORSELESS CARRIAGE” BY STEVEN YEN
HTTP://DL.DROPBOX.COM/U/2075876/NOSQL-STEVE-YEN.PDF