Introduction to CouchDB

John Wood, ChicagoDB User Group, Monday, August 16, 2010

DESCRIPTION

Introduction to CouchDB. Presented at the ChicagoDB User Group on August 16th, 2010.

TRANSCRIPT

Page 1: Introduction to CouchDB

John Wood

ChicagoDB User Group

August 16, 2010

Page 2: Introduction to CouchDB

Outline

• What is CouchDB?

• Documents and Document Storage

• Views

• Replication

• CouchApps

• Add-ons

• Resources

• Demo!

Page 3: Introduction to CouchDB

What is CouchDB?

Page 4: Introduction to CouchDB

Document Database

{
  "_id" : "2d7f015226a05b6940984bbe39004fde",
  "_rev" : "2-477f6ab2dec6df185de1a078d270d8",
  "first_name" : "John",
  "last_name" : "Wood",
  "interests" : ["hacking", "fishing", "running"],
  "offspring" : [
    { "name" : "Dylan", "age" : 5 },
    { "name" : "Chloe", "age" : 2 }
  ]
}

Page 5: Introduction to CouchDB

Strong Focus on Replication

- Built from day one to support bi-directional peer-to-peer replication
- This feature sets CouchDB apart from the other NoSQL databases, and makes it stand out in the database community

Page 6: Introduction to CouchDB

RESTful API

# Create
POST http://localhost:5984/employees

# Read
GET http://localhost:5984/employees/1

# Update
PUT http://localhost:5984/employees/1

# Delete
DELETE http://localhost:5984/employees/1
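
As a rough sketch of how the four calls above might be issued from code, the helper below only describes each HTTP request and sends nothing. The `buildRequest` function and the sample document are assumptions made for illustration; the base URL and the `employees` database come from the slide. It also shows one detail the slide leaves out: updates and deletes have to name the revision they act on.

```javascript
// Hypothetical helper: describes the HTTP request for each CRUD operation
// without sending it. Base URL and database name are taken from the slide.
const BASE_URL = "http://localhost:5984";
const JSON_HEADERS = { "Content-Type": "application/json" };

function buildRequest(op, db, doc) {
  const dbUrl = `${BASE_URL}/${db}`;
  switch (op) {
    case "create":
      // POST to the database itself; CouchDB assigns the _id.
      return { method: "POST", url: dbUrl, headers: JSON_HEADERS, body: JSON.stringify(doc) };
    case "read":
      return { method: "GET", url: `${dbUrl}/${doc._id}` };
    case "update":
      // The body carries the current _rev; a stale _rev is rejected with 409 Conflict.
      return { method: "PUT", url: `${dbUrl}/${doc._id}`, headers: JSON_HEADERS, body: JSON.stringify(doc) };
    case "delete":
      // The revision being deleted is passed as a query parameter.
      return { method: "DELETE", url: `${dbUrl}/${doc._id}?rev=${encodeURIComponent(doc._rev)}` };
    default:
      throw new Error(`unknown operation: ${op}`);
  }
}

// Made-up sample document.
const employee = { _id: "1", _rev: "1-abc", first_name: "John" };
console.log(buildRequest("update", "employees", employee));
```

Each returned object could be handed to any HTTP client; keeping request construction separate from sending makes the sketch easy to inspect without a running server.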

Page 7: Introduction to CouchDB

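Queried and Indexed with MapReduce

// Map: emit one row for every document whose first_name is "John"
function(doc) {
  if (doc.first_name == "John")
    emit(doc._id, 1);
}

// Reduce: add up the emitted values to count the matching documents
function(keys, values, rereduce) {
  return sum(values);
}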
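
To see what the two functions above compute, here is a minimal in-memory sketch that runs them outside CouchDB. The `emit` and `sum` definitions are local stand-ins for the helpers CouchDB provides to view functions, and `queryView` and the sample documents are made up for illustration.

```javascript
// Local stand-ins for the helpers CouchDB makes available to view functions.
let emitted = [];
function emit(key, value) { emitted.push({ key, value }); }
function sum(values) { return values.reduce((total, v) => total + v, 0); }

// The map and reduce functions from the slide, given names so they can be called.
function map(doc) {
  if (doc.first_name == "John") emit(doc._id, 1);
}
function reduce(keys, values, rereduce) {
  return sum(values);
}

// Made-up sample documents.
const docs = [
  { _id: "a", first_name: "John", last_name: "Wood" },
  { _id: "b", first_name: "Jane", last_name: "Doe" },
  { _id: "c", first_name: "John", last_name: "Smith" },
];

// Hypothetical driver: map every document, then reduce all rows in one pass.
function queryView(allDocs) {
  const rows = [];
  for (const doc of allDocs) {
    emitted = [];
    map(doc);
    for (const row of emitted) rows.push({ id: doc._id, key: row.key, value: row.value });
  }
  const keys = rows.map((row) => [row.key, row.id]);
  const values = rows.map((row) => row.value);
  return { rows, reduced: reduce(keys, values, false) };
}

const result = queryView(docs);
console.log(result.reduced); // 2, one per document named John
```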

Page 8: Introduction to CouchDB

Multiversion Concurrency Control

http://en.wikipedia.org/wiki/Multiversion_concurrency_control

Image: http://blogs.wyomingnews.com/blogs/everyonegives/files/2009/02/book-stack.jpg

- Documents are never updated in place; new revisions are always created
- Advantages
  * Don't have to manage locks for reads. Don't have to worry about a concurrent update corrupting a read that is in progress.
  * Data is "safer". Old revisions are kept around (at least for a while). If a botched update accidentally destroys data, you can always restore it from a previous revision.
  * The database can perform some optimizations when writing to disk. If creating or updating 1000 documents, those documents will all live next to each other on disk, eliminating disk seeks.
- Disadvantages
  * Requires occasional database compaction
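
The notes above can be illustrated with a small in-memory sketch. It is not CouchDB's implementation: `store`, `put`, `get` and the revision suffix are all made up. The point is only that a write creates a new revision instead of changing the old one, and that a writer holding a stale `_rev` is turned away with a 409-style conflict rather than being blocked by a lock.

```javascript
// In-memory stand-in for a database: each _id maps to its list of revisions.
const store = new Map();

function nextRev(currentRev) {
  // Real revisions look like "2-477f6a..."; the suffix here is a made-up placeholder.
  const generation = currentRev ? parseInt(currentRev.split("-")[0], 10) + 1 : 1;
  return `${generation}-${Math.random().toString(16).slice(2, 10)}`;
}

function put(doc) {
  const revisions = store.get(doc._id) || [];
  const current = revisions[revisions.length - 1];
  // Optimistic check: the writer must name the revision it read.
  if (current && current._rev !== doc._rev) {
    return { ok: false, status: 409, error: "conflict" };
  }
  const saved = { ...doc, _rev: nextRev(current && current._rev) };
  store.set(doc._id, [...revisions, saved]); // old revisions are kept, not overwritten
  return { ok: true, id: saved._id, rev: saved._rev };
}

function get(id) {
  const revisions = store.get(id);
  return revisions ? revisions[revisions.length - 1] : undefined;
}

const first = put({ _id: "john", interests: ["hacking"] });
const second = put({ _id: "john", _rev: first.rev, interests: ["hacking", "fishing"] });
const stale = put({ _id: "john", _rev: first.rev, interests: [] }); // still uses the old revision
console.log(second.ok, stale.status);
```

The rejected writer would normally re-read the document, reapply its change to the latest revision, and try again.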

Page 9: Introduction to CouchDB

Ultra Durable

- When CouchDB documents are updated, all data and associated indexes are flushed to disk, and the transactional commit always leaves the database in a completely consistent state.
- 2 step commit:
  * All document data and associated index updates are synchronously flushed to disk.
  * The database header is written in two consecutive, identical chunks, and flushed to disk.
- Crash recovery:
  * If a crash occurs during step 1 of the commit, partially flushed data are forgotten upon restart.
  * If a crash occurs during step 2, a surviving copy of the previous header remains and is used.
- Crash-only shutdown

Page 10: Introduction to CouchDB

Erlang OTP

The Erlang programming language and the OTP platform are known for their concurrency support, and OTP is known for its extreme emphasis on reliability and availability.

Page 11: Introduction to CouchDB

Page 12: Introduction to CouchDB

Documents and Document Storage

Page 13: Introduction to CouchDB

Documents

• JSON data format

• Schema-less

• Support for binary data in the form of document “attachments”

• Each document uniquely named in the database

Page 14: Introduction to CouchDB

Document Storage

• CouchDB uses append-only updates, and never overwrites committed data

• Document updates are serialized

• Update model is lockless and optimistic

• Reads are never blocked or interrupted by concurrent updates

• Databases require occasional compaction
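
The link between append-only storage and compaction can be sketched with a toy log. This is only an illustration under simplifying assumptions: `log`, `write`, `read` and `compact` are hypothetical names, and the real storage format is far more involved.

```javascript
// Hypothetical append-only log: every write appends a record; nothing is overwritten.
const log = [];

function write(id, rev, body) {
  log.push({ id, rev, body });
}

// A read returns the newest record for the id, scanning from the end of the log.
function read(id) {
  for (let i = log.length - 1; i >= 0; i--) {
    if (log[i].id === id) return log[i];
  }
  return undefined;
}

// Compaction copies only the newest record per id into a fresh log,
// which is why space used by old revisions is reclaimed only at that point.
function compact(oldLog) {
  const latest = new Map();
  for (const record of oldLog) latest.set(record.id, record); // later entries win
  return [...latest.values()];
}

write("a", 1, { first_name: "John" });
write("b", 1, { first_name: "Jane" });
write("a", 2, { first_name: "Johnny" });

const compacted = compact(log);
console.log(log.length, compacted.length); // 3 2
```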

Page 15: Introduction to CouchDB

Views

Page 16: Introduction to CouchDB

Views

• Add structure back to your unstructured data, so it can be queried

• Allow you to have many different view representations of the same data

• Created by executing map and reduce functions on your documents

• View definitions are stored in Design Documents

• Built incrementally and on demand
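
As an illustration of the bullets above, a design document holding one view might look roughly like the object below. The names `people` and `by_first_name` are made up; the overall shape (an `_id` starting with `_design/`, a `views` member, functions stored as source strings) follows CouchDB's design document layout.

```javascript
// Hypothetical design document; "people" and "by_first_name" are made-up names.
const designDoc = {
  _id: "_design/people",
  language: "javascript",
  views: {
    by_first_name: {
      // View functions are stored as source strings, so the functions are
      // converted with toString() rather than called here.
      map: function (doc) {
        if (doc.first_name) emit(doc.first_name, 1);
      }.toString(),
      reduce: function (keys, values, rereduce) {
        return sum(values);
      }.toString(),
    },
  },
};

// The JSON form is what would be saved to the database.
console.log(JSON.stringify(designDoc, null, 2));
```

Assuming this document were saved in the `employees` database used earlier in the deck, the view would be read with a GET on `/employees/_design/people/_view/by_first_name`, and the index would be built on that first request.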

Page 17: Introduction to CouchDB

MapReduce

• MapReduce functions are written primarily in JavaScript (some other languages are supported)

• The map function selects which documents to operate on, emitting zero to many key/value pairs to the reduce function

• The (optional) reduce function combines the key/value pairs and performs any necessary calculations on that data
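
A small sketch of the "zero to many" point: the map function below emits one pair per interest, so a document can contribute several rows or none, and the reduce function is then applied once per distinct key. Passing `emit` as a parameter and the `runGrouped` driver are simplifications made for this example; in CouchDB `emit` is provided to the function, and per-key results are requested with the `group=true` query parameter.

```javascript
// Map: one key/value pair per interest; documents without interests emit nothing.
function mapInterests(doc, emit) {
  (doc.interests || []).forEach((interest) => emit(interest, 1));
}

// Reduce: count the rows that share a key.
function reduceCount(keys, values, rereduce) {
  return values.reduce((total, v) => total + v, 0);
}

// Hypothetical driver: collect rows per key, then reduce each group separately.
function runGrouped(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, { keys: [], values: [] });
      groups.get(key).keys.push([key, doc._id]);
      groups.get(key).values.push(value);
    });
  }
  const result = {};
  for (const [key, group] of groups) {
    result[key] = reduce(group.keys, group.values, false);
  }
  return result;
}

// The first document echoes the deck's example; the others are made up.
const people = [
  { _id: "john", interests: ["hacking", "fishing", "running"] },
  { _id: "jane", interests: ["fishing"] },
  { _id: "sam" },
];

const counts = runGrouped(people, mapInterests, reduceCount);
console.log(counts); // { hacking: 1, fishing: 2, running: 1 }
```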

Page 18: Introduction to CouchDB

View Indexes

• View indexes are stored on disk separate from the main database, in a data structure specific to the given Design Document

• Views are updated incrementally; only new/changed documents are processed when the view is accessed

• Building views (especially from scratch) can be time consuming and resource intensive for large databases
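
The incremental behavior described above can be sketched as follows. This is a toy model, not CouchDB's index format: `createIndex`, `updateIndex` and the `changes` list are hypothetical, with `seq` standing in for the database's update sequence. The index remembers the last sequence it processed and, on the next access, maps only the documents changed after it.

```javascript
// Hypothetical index state: the last sequence processed plus the rows per document.
function createIndex() {
  return { lastSeq: 0, rowsByDoc: new Map() };
}

// changes: array of { seq, doc } ordered by seq. Returns how many documents were mapped.
function updateIndex(index, changes, map) {
  let processed = 0;
  for (const { seq, doc } of changes) {
    if (seq <= index.lastSeq) continue; // already indexed, skip
    const rows = [];
    if (!doc._deleted) map(doc, (key, value) => rows.push({ key, value }));
    index.rowsByDoc.set(doc._id, rows); // replaces rows from any older revision
    index.lastSeq = seq;
    processed += 1;
  }
  return processed;
}

// Made-up map function and change feed.
const byFirstName = (doc, emit) => emit(doc.first_name, 1);
const changes = [
  { seq: 1, doc: { _id: "a", first_name: "John" } },
  { seq: 2, doc: { _id: "b", first_name: "Jane" } },
];

const index = createIndex();
const firstPass = updateIndex(index, changes, byFirstName); // maps both documents
changes.push({ seq: 3, doc: { _id: "a", first_name: "Johnny" } });
const secondPass = updateIndex(index, changes, byFirstName); // maps only the new change
console.log(firstPass, secondPass); // 2 1
```

A first build over a large database has to map every document, which is why the initial build is the expensive case.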

Page 19: Introduction to CouchDB

Replication

Page 20: Introduction to CouchDB

Replication

• Efficient and reliable bi-directional replication

• Only documents created/updated since the last replication are replicated

• For each document, only updated fields are replicated

• Support for one-time, continuous, and filtered replication

• Fault tolerant: picks up where it left off after an interruption
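
The idea of replicating only what changed since the last run can be sketched with two in-memory databases and a checkpoint. Everything here (`createDb`, `save`, `replicate`, the checkpoint value) is a made-up simplification; it leaves out revision comparison and conflict detection, which real replication depends on.

```javascript
// Hypothetical in-memory database: documents keyed by _id, plus an update log.
function createDb() {
  return { seq: 0, docs: new Map(), changes: [] };
}

function save(db, doc) {
  db.seq += 1;
  db.docs.set(doc._id, doc);
  db.changes.push({ seq: db.seq, id: doc._id });
}

// Copies documents changed in `source` after `checkpoint` into `target`.
// An optional filter decides which documents are sent (filtered replication).
function replicate(source, target, checkpoint = 0, filter = () => true) {
  const changedIds = new Set();
  for (const change of source.changes) {
    if (change.seq > checkpoint) changedIds.add(change.id);
  }
  let copied = 0;
  for (const id of changedIds) {
    const doc = source.docs.get(id);
    if (filter(doc)) {
      save(target, doc);
      copied += 1;
    }
  }
  // The returned checkpoint lets a later run resume where this one stopped.
  return { checkpoint: source.seq, copied };
}

const source = createDb();
const target = createDb();
save(source, { _id: "a", first_name: "John" });
save(source, { _id: "b", first_name: "Jane" });

const firstRun = replicate(source, target); // copies both documents
save(source, { _id: "c", first_name: "Dylan" });
const secondRun = replicate(source, target, firstRun.checkpoint); // copies only "c"
console.log(firstRun.copied, secondRun.copied); // 2 1
```

Because the checkpoint is just a sequence number, a replication that is interrupted can be rerun from the last recorded checkpoint without starting over.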

Page 21: Introduction to CouchDB

Conflict Management

• Documents with conflicts have a property named “_conflicts”, which contains all conflicting revision ids

• CouchDB chooses a winning document, but keeps losing documents around for manual conflict resolution

• CouchDB does not attempt to merge conflicting documents

• It is the application’s responsibility to make sure data is merged successfully

• Losing documents will be removed upon compaction
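
Since merging is left to the application, a resolution step might look like the sketch below. The merge policy (taking the union of the `interests` arrays) and the `mergeConflicts` helper are assumptions chosen for illustration; a real application would pick whatever policy suits its data. The sketch assumes the losing revisions have already been fetched using the IDs listed in `_conflicts`.

```javascript
// Hypothetical merge policy: union the "interests" arrays of all conflicting revisions.
function mergeConflicts(winner, losers) {
  const interests = new Set(winner.interests || []);
  for (const loser of losers) {
    (loser.interests || []).forEach((interest) => interests.add(interest));
  }
  // The merged document is saved as a new revision on top of the winner.
  const merged = { ...winner, interests: [...interests] };
  delete merged._conflicts; // reported by the database on reads, not stored data
  // Deleting each losing revision is what clears the conflict.
  const deletions = losers.map((loser) => ({ _id: loser._id, _rev: loser._rev, _deleted: true }));
  return { merged, deletions };
}

// Made-up conflicting revisions of the same document.
const winner = { _id: "john", _rev: "2-aaa", interests: ["hacking", "fishing"], _conflicts: ["2-bbb"] };
const loser = { _id: "john", _rev: "2-bbb", interests: ["hacking", "running"] };

const resolution = mergeConflicts(winner, [loser]);
console.log(resolution.merged.interests); // [ 'hacking', 'fishing', 'running' ]
```

The merged document and the deletion stubs would then be written back to the database, ideally before compaction removes the losing revisions.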

Page 22: Introduction to CouchDB

CouchApps

Page 23: Introduction to CouchDB

What are CouchApps?

• CouchApps are HTML and JavaScript applications that can be hosted directly from CouchDB

• CouchDB can serve HTML, images, CSS, JavaScript, etc.

• Applications live in a Design Document, with static files (HTML, CSS, etc.) as attachments

• Dynamic behavior and database access are done via JavaScript

• CouchDB can be a complete, local web platform

• Support for virtual hosts and URL re-writing

Page 24: Introduction to CouchDB

Why?

• Your application and its associated data can be distributed, and replicated, together

• If you like to share, somebody can grab your application and data with a single replication command

• Not only Open Source, but Open Data as well

• Applications can be taken offline and used there; the updated data can be synchronized at a later point in time

Page 25: Introduction to CouchDB

Page 26: Introduction to CouchDB

http://pollen.nymphormation.org/afgwar/_design/afgwardiary/index.html

Page 27: Introduction to CouchDB

http://jchrisa.net/

Page 28: Introduction to CouchDB

http://jchrisa.net/cal/_design/cal/index.html

Page 29: Introduction to CouchDB

http://github.com/jchris/toast

Page 30: Introduction to CouchDB

Add-Ons

• couchdb-lucene - Enables full text searching of documents using Lucene

• GeoCouch - Adds support for geospatial queries to CouchDB

• Lounge - A proxy-based partitioning/clustering framework for CouchDB

Page 31: Introduction to CouchDB

Resources

• http://apache.couchdb.org

• http://www.couch.io

• http://wiki.apache.org/couchdb

• http://books.couchdb.org/relax/

• http://wiki.couchapp.org

Page 32: Introduction to CouchDB

Less Talk, More Rock!

Demo

http://github.com/jwood/couchdb_demo

Page 33: Introduction to CouchDB

Thanks!

[email protected]

@johnpwood
