couchconf-bangalore-intro-to-document-databases
TRANSCRIPT
Introduction to Document Databases
Dustin Sallings @dlsspy
HOW DO WE THINK ABOUT DATA?
a brief history
1850  Atlantic Cable (Cyrus W. Field)
1945  "As We May Think" (Vannevar Bush)
1957  Sputnik (USSR)
1958  ARPA (USA)
1962  IDS (Charles Bachman, GE)
1965  MUMPS; Pick (TRW)
1966  IMS (IBM, Vern Watts)
1968  oNLine System (NLS) (Doug Engelbart)
1969  IMP (UCLA-Stanford)
1970  “A Relational Model of Data for Large Shared Data Banks” (E.F. Codd, IBM)
1972  ARPANET
1973  Ingres (Michael Stonebraker, Berkeley)
Highlighted: seminal events in internet history; hypertext/hypermedia/web; beginnings of the internet
• 1850 - Atlantic cable - taking data transmission up a notch
• 1945 - As We May Think - "He urges that men of science should then turn to the massive task of making more accessible our bewildering store of knowledge."
• 1958 - ARPA - "prevent technological surprise like the launch of Sputnik" - "to prevent technological surprise to the US, but also to create technological surprise for its enemies"
• 1969 - IMP - interface message processor (packet network)
Same timeline, highlighted: seminal events in internet history; hierarchical/network databases; relational databases
• 1965 - MUMPS - Massachusetts General Hospital Utility Multi-Programming System - It was largely adopted during the 1970s and early 1980s in healthcare and financial information systems/databases, and continues to be used by many of the same clients today. It is currently used in electronic health record systems as well as by multiple banking networks and online trading/investment services.
Pre-1960
Years: 1974, 1976, 1977, 1982, 1983, 1984, 1985, 1989, 1990, 1991, 1994, 1997
GemStone/S (GemStone)
System R (IBM)
Oracle (Larry Ellison)
line-mode browser (Nicola Pellow)
WWW (Tim Berners-Lee)
Versant (Versant)
MUMPS ANSI, DBM
Lotus Notes (Lotus)
GT.M, BerkeleyDB
MySQL (Michael Widenius and David Axmark)
Cello (Tom Bruce)
Caché (InterSystems, MUMPS)
Metakit
DNS (Paul Mockapetris)
TCP/IP (Vint Cerf and Bob Kahn)
Mosaic (Marc Andreessen)
ViolaWWW (Pei Wei)
Hypercard (Bill Atkinson)
NeXT
many other ODBMSs
Highlighted: seminal events in internet history; hypertext/hypermedia/web; beginnings of the internet
Same timeline, highlighted: relational databases; MUMPS; open source; object databases
Years: 1998, 2000, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
Android (Andy Rubin)
iOS and iPhone (Steve Jobs)
Neo4j
db4o
QDBM
memcached
CouchDB
BigTable
JackRabbit, Tokyo Cabinet
Amazon Dynamo (paper)
MongoDB
Project Voldemort, Cassandra
Open Source Summit (Tim O'Reilly)
"NoSQL" (Carlo Strozzi)
membase
Couchbase Server
Couchbase Mobile
Terrastore, Riak
Dynomite, HBase, VertexDB
"NoSQL"
iPad
Kindle Fire
Samsung Galaxy
CAP Theorem (Eric Brewer)
CAP Theorem Formally Proven (Seth Gilbert, Nancy Lynch, MIT)
Highlighted: seminal events in distributed computing; mobile devices
Same timeline, highlighted: seminal events in internet history; NoSQL (“Not Only SQL”)
Issues of Scale
2011
The Web
Mobile
NoSQL
off-line applications
distributed systems
dynamic user population in the millions
availability or consistency
innovative applications with changing requirements
server synchronization and sharing
Logic Scales!
This application runs all the way to the edge. A billion concurrent users on this application will have the same experience as a single user.
What About Data?
The Relational Database Solution
RDBMS Scales ... at what cost?
ACID
Atomicity: database modifications are all or nothing
Consistency: databases go from one consistent state to another
Isolation: transactions never interfere with each other
Durability: once a transaction is committed, it stays
TRANSACTIONS
DAN PRITCHETT, EBAY
PayPal uses transactions
eBay doesn’t (for non-critical data)
two-phase commit not pragmatic
responsiveness and site availability would suffer
Other high-volume sites independently determine the same strategy
new role for database as a ‘data store’
new problem: transactions per watt
CAP THEOREM
Consistency: reads and writes happen correctly
Availability: every operation returns a result
Partition Tolerance: the network allows lost and undeliverable messages
PICK TWO!
BASE
Basically Available
Soft-State
Eventually Consistent
The NoSQL Solution
DATABASE
Features-First: Oracle, SQL Server, DB2, MySQL, PostgreSQL, Amazon RDS
Scale-First: Couchbase Server, CouchDB, Project Voldemort, Riak, Scalaris, Kai, Dynomite, MemcacheDB, ThruDB, Cassandra, HBase and Hypertable
Simple Structured Storage: Amazon SimpleDB, Berkeley DB
Purpose-Optimized Stores: StreamBase, Vertica, Aster Data, Netezza, Greenplum, VoltDB
NOSQL TAXONOMY
STEVEN YEN, COUCHBASE
key-value-cache
key-value-store
eventually-consistent key-value-store
ordered-key-value-store
data-structures server
tuple-store
object database
document database
wide columnar store
WHO WILL WIN?
THE MOST APPROACHABLE API WITH ENOUGH POWER WILL WIN
WHY DOCUMENT DATABASES?
DOCUMENT DATABASE APIS ARE ‘APPROACHABLE’
HTTP: GET, POST, PUT, DELETE
memcached: GET, SET, DELETE, ADD, REPLACE, ...
Everyone Understands
APIs in every language to work with your data.
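To make the "approachable" point concrete, here is a minimal sketch that maps the four HTTP verbs from the slide onto an in-memory dictionary. The `handle` function, the `store` dictionary, and the status codes chosen are assumptions for illustration only; they are not the API of any particular document database.

```python
import json
import uuid

store = {}  # in-memory stand-in for a document database (hypothetical)

def handle(method, doc_id=None, body=None):
    """Hypothetical dispatcher: returns a (status, payload) pair."""
    if method == "GET":
        if doc_id in store:
            return 200, store[doc_id]
        return 404, None
    if method == "PUT":
        # Create or replace the document stored under a caller-chosen id.
        created = doc_id not in store
        store[doc_id] = json.loads(body)
        return (201 if created else 200), {"id": doc_id}
    if method == "POST":
        # Create a document under a generated id.
        new_id = uuid.uuid4().hex
        store[new_id] = json.loads(body)
        return 201, {"id": new_id}
    if method == "DELETE":
        if doc_id not in store:
            return 404, None
        del store[doc_id]
        return 200, {"id": doc_id}
    return 405, None
```

A client in any language that can speak HTTP and parse JSON could drive an interface shaped like this, which is the sense in which "everyone understands" it.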
Documents are Flexible
Focus point: Applications tend to only care about the parts they find interesting while preserving the rest.
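A small sketch of that focus point: the function below touches only the one field it cares about and writes every other field back unchanged. The function name `bump_login_count` and the `logins` field are hypothetical, chosen just for this example.

```python
import json

def bump_login_count(raw_doc: str) -> str:
    """Increment 'logins' and preserve every other field as-is."""
    doc = json.loads(raw_doc)
    # Only this field is interpreted; unknown fields ride along untouched.
    doc["logins"] = doc.get("logins", 0) + 1
    return json.dumps(doc, sort_keys=True)
```

Because the application never enumerates the full set of fields, another application can add new fields to the same document without breaking this one.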
DOCUMENT DATABASES HAVE “ENOUGH POWER”
fast reads from cache
fast writes to persistent storage
single-document atomic writes
document conflict resolution
replication of data
clustering and fault-tolerance
fail-over and rebalancing on the fly
rolling upgrades and deployment
high availability
partition tolerance
rapid-development tools
Document Reads Are Fast
document caching
Document Reads Are Fast
document persistence
Document Writes Are Fast
document write queue
write request
write to cache
asynchronous document write
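The write path above (write request, write to cache, asynchronous document write) can be sketched as a write-behind store. This is an illustration under stated assumptions: the class name `WriteBehindStore` is hypothetical, `disk` is a plain dictionary standing in for persistent storage, and `flush` is called explicitly here, whereas a real server would drain the queue in the background.

```python
from collections import deque

class WriteBehindStore:
    """Hypothetical write-behind cache: acknowledge first, persist later."""

    def __init__(self):
        self.cache = {}       # serves fast reads
        self.queue = deque()  # keys waiting to be persisted
        self.disk = {}        # stand-in for persistent storage

    def set(self, key, doc):
        # The write is visible to readers as soon as it is in the cache.
        self.cache[key] = doc
        self.queue.append(key)

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        return self.disk.get(key)

    def flush(self):
        """Drain the write queue; returns the number of writes performed."""
        written = 0
        while self.queue:
            key = self.queue.popleft()
            self.disk[key] = self.cache[key]
            written += 1
        return written
```

The trade-off in this sketch is that a write acknowledged before `flush` runs lives only in memory until the queue is drained.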
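Diagram: tree of key ranges, each with its document count
Leaf keys: A B C D F G H I K L N O Q R
Leaf-level ranges: A-C 3, D-F 2, G-H 2, I-L 3, N-R 4
Inner ranges: A-H 7, I-R 7
Root: A-R 14
New document: M
New reductions: A-R 15, I-R 8, M-R 5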
Document Writes Are Fast
Document Writes Are Safe
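Diagram: the same tree after document M is written
Leaf keys: A B C D F G H I K L N O Q R, and M
Existing ranges: A-C 3, D-F 2, G-H 2, I-L 3, N-R 4, A-H 7, I-R 7, A-R 14
New revisions and new root: A-R 15, I-R 8, M-R 5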
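The "new revisions, new root" idea can be sketched with a copy-on-write tree: an insert builds new nodes only along the path to the new key and leaves the old root intact. This is a simplified illustration, assuming a plain unbalanced binary tree rather than whatever structure a real document database uses; `Node`, `size`, and `insert` are hypothetical names, and `count` plays the role of the per-range reduction shown on the slide.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    count: int = 1  # reduction: number of keys in this subtree

def size(node):
    return node.count if node else 0

def insert(node, key):
    """Return a new root; subtrees off the insertion path are shared."""
    if node is None:
        return Node(key)
    if key == node.key:
        return node
    if key < node.key:
        left, right = insert(node.left, key), node.right
    else:
        left, right = node.left, insert(node.right, key)
    return Node(node.key, left, right, 1 + size(left) + size(right))

# Build a tree from the 14 keys on the slide, then add M.
old_root = None
for k in "HDLBFKOACGINQR":
    old_root = insert(old_root, k)
new_root = insert(old_root, "M")
print(old_root.count, new_root.count)  # 14 15
```

The old root still describes a complete, consistent tree of 14 keys after the insert, so a reader holding it is unaffected by the write, and only the counts along one path had to be recomputed.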
Document Databases Scale Out
Document Databases Replicate
Document Database Queries Are Fast
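Diagram: range query over the tree of key ranges
Leaf keys: A B C D F G H I K L N O Q R, and M
Key ranges and document counts: A-C 3, D-F 2, G-H 2, I-L 3, M-R 5, A-H 7, I-R 7, A-R 14
Query bounds: startkey, endkey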
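A range query bounded by `startkey` and `endkey` can be sketched as two binary searches over a sorted key list. This is only an illustration: the function name `range_query` is hypothetical, a flat sorted list stands in for the tree on the slide, and treating `endkey` as inclusive is an assumption of this sketch.

```python
from bisect import bisect_left, bisect_right

# Sorted index keys from the slide, with M included.
keys = list("ABCDFGHIKLMNOQR")

def range_query(sorted_keys, startkey, endkey):
    """Return all keys k with startkey <= k <= endkey."""
    lo = bisect_left(sorted_keys, startkey)   # first key >= startkey
    hi = bisect_right(sorted_keys, endkey)    # first key > endkey
    return sorted_keys[lo:hi]
```

Each bound costs a logarithmic search, after which the matching keys are read as one contiguous run, which is why sorted indexes make this kind of query cheap.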
Document Databases Are Developer Friendly
“THE ROADS AND CROSSROADS OF INTERNET HISTORY” HTTP://WWW.NETVALLEY.COM/INTVAL1.HTML
“A BRIEF HISTORY OF NOSQL” HTTP://BLOG.KNUTHAUGEN.NO/2010/03/A-BRIEF-HISTORY-OF-NOSQL.HTML
“HISTORY OF THE ATLANTIC CABLE AND UNDERSEA COMMUNICATIONS” HTTP://ATLANTIC-CABLE.COM/FIELD/INDEX.HTM
“A LITTLE HISTORY OF THE WORLD WIDE WEB” HTTP://WWW.W3.ORG/HISTORY.HTML
“DAN PRITCHETT ON ARCHITECTURE AT EBAY” HTTP://WWW.INFOQ.COM/INTERVIEWS/DAN-PRITCHETT-EBAY-ARCHITECTURE
“NOSQL IS A HORSELESS CARRIAGE” BY STEVEN YEN HTTP://DL.DROPBOX.COM/U/2075876/NOSQL-STEVE-YEN.PDF