Entity Relationships in a Document Database at CouchConf Boston


Upload: bradley-holt

Post on 17-Dec-2014


DESCRIPTION

Unlike relational databases, document databases like CouchDB and Couchbase do not directly support entity relationships. This talk will explore patterns of modeling one-to-many and many-to-many entity relationships in a document database. These patterns include using an embedded JSON array, relating documents using identifiers, using a list of keys, and using relationship documents.

TRANSCRIPT


Entity Relationships in a Document Database

MapReduce Views for SQL Users

Entity: An object defined by its identity and a thread of continuity[1]

1. "Entity" Domain-driven Design Community <http://domaindrivendesign.org/node/109>.

Entity Relationship Model


Join vs. Collation

SQL Query Joining Publishers and Books

SELECT
  `publisher`.`id`,
  `publisher`.`name`,
  `book`.`title`
FROM `publisher`
FULL OUTER JOIN `book`
  ON `publisher`.`id` = `book`.`publisher_id`
ORDER BY
  `publisher`.`id`,
  `book`.`title`;

Joined Result Set

publisher.id  publisher.name  book.title
oreilly       O'Reilly Media  Building iPhone Apps with HTML, CSS, and JavaScript
oreilly       O'Reilly Media  CouchDB: The Definitive Guide
oreilly       O'Reilly Media  DocBook: The Definitive Guide
oreilly       O'Reilly Media  RESTful Web Services

Publisher (“left”): publisher.id, publisher.name
Book (“right”): book.title

Collated Result Set

key            id         value
["oreilly",0]  "oreilly"  "O'Reilly Media"
["oreilly",1]  "oreilly"  "Building iPhone Apps with HTML, CSS, and JavaScript"
["oreilly",1]  "oreilly"  "CouchDB: The Definitive Guide"
["oreilly",1]  "oreilly"  "DocBook: The Definitive Guide"
["oreilly",1]  "oreilly"  "RESTful Web Services"

Publisher: the ["oreilly",0] row
Books: the ["oreilly",1] rows
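Because the publisher row and its book rows arrive next to each other in key order, a client can rebuild the nested structure in a single pass over the collated rows. The following is a minimal sketch of that idea, assuming rows shaped like the collated result set above; `groupCollatedRows` is a hypothetical helper name and is not part of CouchDB or Couchbase.

```javascript
// Rows copied from the collated result set above.
const rows = [
  { key: ["oreilly", 0], id: "oreilly", value: "O'Reilly Media" },
  { key: ["oreilly", 1], id: "oreilly", value: "Building iPhone Apps with HTML, CSS, and JavaScript" },
  { key: ["oreilly", 1], id: "oreilly", value: "CouchDB: The Definitive Guide" },
  { key: ["oreilly", 1], id: "oreilly", value: "DocBook: The Definitive Guide" },
  { key: ["oreilly", 1], id: "oreilly", value: "RESTful Web Services" },
];

// Hypothetical helper: fold key-ordered rows into one object per publisher.
// Assumes every [publisherId, 0] row precedes its [publisherId, 1] rows.
function groupCollatedRows(rows) {
  const publishers = [];
  let current = null;
  for (const row of rows) {
    const [publisherId, kind] = row.key;
    if (kind === 0) {
      // A publisher row starts a new group.
      current = { id: publisherId, name: row.value, books: [] };
      publishers.push(current);
    } else if (current && current.id === publisherId) {
      // A book row belongs to the publisher row that preceded it.
      current.books.push(row.value);
    }
  }
  return publishers;
}

const publishers = groupCollatedRows(rows);
console.log(JSON.stringify(publishers, null, 2));
```

The second key element (0 or 1) only controls sort position, so the same fold works for any view that uses this key layout.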

View Result Sets

Made up of columns and rows

Every row has the same three columns:
• key
• id
• value

Columns can contain a mixture of logical data types

One to Many Relationships

Embedded Entities: Nest related entities within a document

Embedded Entities

A single document represents the “one” entity

Nested entities (a JSON array) represent the “many” entities

Simplest way to create a one to many relationship

Example: Publisher with Nested Books

{
  "_id": "oreilly",
  "collection": "publisher",
  "name": "O'Reilly Media",
  "books": [
    { "title": "CouchDB: The Definitive Guide" },
    { "title": "RESTful Web Services" },
    { "title": "DocBook: The Definitive Guide" },
    { "title": "Building iPhone Apps with HTML, CSS, and JavaScript" }
  ]
}

Map Function

function(doc) {
  if ("publisher" == doc.collection) {
    // Publisher row sorts first (position 0)
    emit([doc._id, 0], doc.name);
    // One row per nested book (position 1)
    for (var i in doc.books) {
      emit([doc._id, 1], doc.books[i].title);
    }
  }
}

Result Set

key            id         value
["oreilly",0]  "oreilly"  "O'Reilly Media"
["oreilly",1]  "oreilly"  "Building iPhone Apps with HTML, CSS, and JavaScript"
["oreilly",1]  "oreilly"  "CouchDB: The Definitive Guide"
["oreilly",1]  "oreilly"  "DocBook: The Definitive Guide"
["oreilly",1]  "oreilly"  "RESTful Web Services"

Limitations

Only works if there aren’t a large number of related entities:
• Too many nested entities can result in very large documents
• Slow to transfer between client and server
• Unwieldy to modify
• Time-consuming to index

Related Documents: Reference an entity by its identifier

Related Documents

A document representing the “one” entity

Separate documents for each “many” entity

Each “many” entity references its related “one” entity by the “one” entity’s document identifier

Makes for smaller documents

Reduces the probability of document update conflicts

Example: Publisher

{
  "_id": "oreilly",
  "collection": "publisher",
  "name": "O'Reilly Media"
}

Example: Related Book

{
  "_id": "9780596155896",
  "collection": "book",
  "title": "CouchDB: The Definitive Guide",
  "publisher": "oreilly"
}

Map Function

function(doc) {
  // Publisher row sorts first (position 0)
  if ("publisher" == doc.collection) {
    emit([doc._id, 0], doc.name);
  }
  // Each book is keyed by its publisher's identifier (position 1)
  if ("book" == doc.collection) {
    emit([doc.publisher, 1], doc.title);
  }
}

Result Set

key            id               value
["oreilly",0]  "oreilly"        "O'Reilly Media"
["oreilly",1]  "9780596155896"  "CouchDB: The Definitive Guide"
["oreilly",1]  "9780596529260"  "RESTful Web Services"
["oreilly",1]  "9780596805791"  "Building iPhone Apps with HTML, CSS, and JavaScript"
["oreilly",1]  "9781565925809"  "DocBook: The Definitive Guide"
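The result set above can be reproduced without a database. The following is a rough in-memory sketch, not CouchDB's implementation: `buildView` and `compareKeys` are hypothetical helpers, `emit` is passed in as a parameter instead of being a global as it is in a real view server, and keys are compared with plain JavaScript operators rather than CouchDB's Unicode collation. The ISBNs and titles are taken from the result set, and rows with equal keys are ordered by document id.

```javascript
// Documents in arbitrary order, as they might be stored.
const docs = [
  { _id: "9781565925809", collection: "book", title: "DocBook: The Definitive Guide", publisher: "oreilly" },
  { _id: "oreilly", collection: "publisher", name: "O'Reilly Media" },
  { _id: "9780596805791", collection: "book", title: "Building iPhone Apps with HTML, CSS, and JavaScript", publisher: "oreilly" },
  { _id: "9780596155896", collection: "book", title: "CouchDB: The Definitive Guide", publisher: "oreilly" },
  { _id: "9780596529260", collection: "book", title: "RESTful Web Services", publisher: "oreilly" },
];

// The map function from the talk, with emit injected for this sketch.
function map(doc, emit) {
  if ("publisher" == doc.collection) {
    emit([doc._id, 0], doc.name);
  }
  if ("book" == doc.collection) {
    emit([doc.publisher, 1], doc.title);
  }
}

// Simplified key comparison: element by element, shorter array first.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Hypothetical helper: run the map function over every document,
// collect {key, id, value} rows, and sort them by key and then by id.
function buildView(docs, map) {
  const rows = [];
  for (const doc of docs) {
    map(doc, (key, value) => rows.push({ key, id: doc._id, value }));
  }
  return rows.sort((x, y) =>
    compareKeys(x.key, y.key) || (x.id < y.id ? -1 : x.id > y.id ? 1 : 0));
}

const rows = buildView(docs, map);
for (const row of rows) {
  console.log(JSON.stringify(row.key), JSON.stringify(row.id), JSON.stringify(row.value));
}
```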

Limitations

When retrieving the entity on the “right” side of the relationship, one cannot include any data from the entity on the “left” side of the relationship without the use of an additional query

Only works for one to many relationships

Many to Many Relationships

List of Keys: Reference entities by their identifiers

List of Keys

A document representing each “many” entity on the “left” side of the relationship

Separate documents for each “many” entity on the “right” side of the relationship

Each “many” entity on the “right” side of the relationship maintains a list of document identifiers for its related “many” entities on the “left” side of the relationship

Books and Related Authors

Example: Book

{
  "_id": "9780596805029",
  "collection": "book",
  "title": "DocBook 5: The Definitive Guide"
}

Example: Book

{
  "_id": "9781565920514",
  "collection": "book",
  "title": "Making TeX Work"
}

Example: Book

{
  "_id": "9781565925809",
  "collection": "book",
  "title": "DocBook: The Definitive Guide"
}

Example: Author

{
  "_id": "muellner",
  "collection": "author",
  "name": "Leonard Muellner",
  "books": [
    "9781565925809"
  ]
}

Example: Author

{
  "_id": "walsh",
  "collection": "author",
  "name": "Norman Walsh",
  "books": [
    "9780596805029",
    "9781565925809",
    "9781565920514"
  ]
}

Map Function

function(doc) {
  // Book row sorts first (position 0)
  if ("book" == doc.collection) {
    emit([doc._id, 0], doc.title);
  }
  // One row per book in the author's list of keys (position 1)
  if ("author" == doc.collection) {
    for (var i in doc.books) {
      emit([doc.books[i], 1], doc.name);
    }
  }
}

Result Set

key                  id               value
["9780596805029",0]  "9780596805029"  "DocBook 5: The Definitive Guide"
["9780596805029",1]  "walsh"          "Norman Walsh"
["9781565920514",0]  "9781565920514"  "Making TeX Work"
["9781565920514",1]  "walsh"          "Norman Walsh"
["9781565925809",0]  "9781565925809"  "DocBook: The Definitive Guide"
["9781565925809",1]  "muellner"       "Leonard Muellner"
["9781565925809",1]  "walsh"          "Norman Walsh"

Authors and Related Books

Map Function

function(doc) {
  if ("author" == doc.collection) {
    // Author row sorts first (position 0)
    emit([doc._id, 0], doc.name);
    // One row per related book; the value carries the book's _id
    for (var i in doc.books) {
      emit([doc._id, 1], {"_id":doc.books[i]});
    }
  }
}

Result Set

key             id          value
["muellner",0]  "muellner"  "Leonard Muellner"
["muellner",1]  "muellner"  {"_id":"9781565925809"}
["walsh",0]     "walsh"     "Norman Walsh"
["walsh",1]     "walsh"     {"_id":"9780596805029"}
["walsh",1]     "walsh"     {"_id":"9781565920514"}
["walsh",1]     "walsh"     {"_id":"9781565925809"}

Including Docs

include_docs=true

key             id          value  doc (truncated)
["muellner",0]  "muellner"  …      {"name":"Leonard Muellner"}
["muellner",1]  "muellner"  …      {"title":"DocBook: The Definitive Guide"}
["walsh",0]     "walsh"     …      {"name":"Norman Walsh"}
["walsh",1]     "walsh"     …      {"title":"DocBook 5: The Definitive Guide"}
["walsh",1]     "walsh"     …      {"title":"Making TeX Work"}
["walsh",1]     "walsh"     …      {"title":"DocBook: The Definitive Guide"}
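The table above relies on CouchDB's linked-document behavior: when a row's value is an object with an `_id` member and the view is queried with include_docs=true, the document with that `_id` is returned in place of the document that emitted the row. The following is a small in-memory sketch of that lookup, assuming the books and authors from the examples above; the `db` map and the `includeDocs` helper are hypothetical stand-ins, not CouchDB APIs.

```javascript
// Stand-in for the database: the example documents keyed by _id.
const db = new Map([
  ["9780596805029", { _id: "9780596805029", collection: "book", title: "DocBook 5: The Definitive Guide" }],
  ["9781565920514", { _id: "9781565920514", collection: "book", title: "Making TeX Work" }],
  ["9781565925809", { _id: "9781565925809", collection: "book", title: "DocBook: The Definitive Guide" }],
  ["muellner", { _id: "muellner", collection: "author", name: "Leonard Muellner", books: ["9781565925809"] }],
  ["walsh", { _id: "walsh", collection: "author", name: "Norman Walsh", books: ["9780596805029", "9781565925809", "9781565920514"] }],
]);

// Rows from the "Authors and Related Books" result set.
const rows = [
  { key: ["muellner", 0], id: "muellner", value: "Leonard Muellner" },
  { key: ["muellner", 1], id: "muellner", value: { _id: "9781565925809" } },
  { key: ["walsh", 0], id: "walsh", value: "Norman Walsh" },
  { key: ["walsh", 1], id: "walsh", value: { _id: "9780596805029" } },
  { key: ["walsh", 1], id: "walsh", value: { _id: "9781565920514" } },
  { key: ["walsh", 1], id: "walsh", value: { _id: "9781565925809" } },
];

// Hypothetical helper mimicking include_docs=true: attach the linked
// document when the value names one, otherwise the emitting document.
function includeDocs(rows, db) {
  return rows.map((row) => {
    const linkedId =
      row.value !== null && typeof row.value === "object" ? row.value._id : undefined;
    return { ...row, doc: db.get(linkedId || row.id) || null };
  });
}

const withDocs = includeDocs(rows, db);
for (const row of withDocs) {
  console.log(JSON.stringify(row.key), row.doc.name || row.doc.title);
}
```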

Or, we can reverse the references…

Example: Author

{
  "_id": "muellner",
  "collection": "author",
  "name": "Leonard Muellner"
}

Example: Author

{
  "_id": "walsh",
  "collection": "author",
  "name": "Norman Walsh"
}

Example: Book

{
  "_id": "9780596805029",
  "collection": "book",
  "title": "DocBook 5: The Definitive Guide",
  "authors": [
    "walsh"
  ]
}

Example: Book

{
  "_id": "9781565920514",
  "collection": "book",
  "title": "Making TeX Work",
  "authors": [
    "walsh"
  ]
}

Example: Book

{
  "_id": "9781565925809",
  "collection": "book",
  "title": "DocBook: The Definitive Guide",
  "authors": [
    "muellner",
    "walsh"
  ]
}

Map Function

function(doc) {
  // Author row sorts first (position 0)
  if ("author" == doc.collection) {
    emit([doc._id, 0], doc.name);
  }
  // One row per author in the book's list of keys (position 1)
  if ("book" == doc.collection) {
    for (var i in doc.authors) {
      emit([doc.authors[i], 1], doc.title);
    }
  }
}

Result Set

key             id               value
["muellner",0]  "muellner"       "Leonard Muellner"
["muellner",1]  "9781565925809"  "DocBook: The Definitive Guide"
["walsh",0]     "walsh"          "Norman Walsh"
["walsh",1]     "9780596805029"  "DocBook 5: The Definitive Guide"
["walsh",1]     "9781565920514"  "Making TeX Work"
["walsh",1]     "9781565925809"  "DocBook: The Definitive Guide"

Limitations

Queries from the “right” side of the relationship cannot include any data from entities on the “left” side of the relationship (without the use of include_docs)

A document representing an entity with lots of relationships could become quite large

Relationship Documents: Create a document to represent each individual relationship

Relationship Documents

A document representing each “many” entity on the “left” side of the relationship

Separate documents for each “many” entity on the “right” side of the relationship

Neither the “left” nor the “right” side of the relationship contains any direct reference to the other

For each distinct relationship, a separate document includes the document identifiers for both the “left” and “right” sides of the relationship

Example: Book

{
  "_id": "9780596805029",
  "collection": "book",
  "title": "DocBook 5: The Definitive Guide"
}

Example: Book

{
  "_id": "9781565920514",
  "collection": "book",
  "title": "Making TeX Work"
}

Example: Book

{
  "_id": "9781565925809",
  "collection": "book",
  "title": "DocBook: The Definitive Guide"
}

Example: Author

{
  "_id": "muellner",
  "collection": "author",
  "name": "Leonard Muellner"
}

Example: Author

{
  "_id": "walsh",
  "collection": "author",
  "name": "Norman Walsh"
}

Example: Relationship Document

{
  "_id": "44005f2c",
  "collection": "book-author",
  "book": "9780596805029",
  "author": "walsh"
}

Example: Relationship Document

{
  "_id": "44005f72",
  "collection": "book-author",
  "book": "9781565920514",
  "author": "walsh"
}

Example: Relationship Document

{
  "_id": "44006720",
  "collection": "book-author",
  "book": "9781565925809",
  "author": "muellner"
}

Example: Relationship Document

{
  "_id": "44006b0d",
  "collection": "book-author",
  "book": "9781565925809",
  "author": "walsh"
}

Books and Related Authors

Map Function

function(doc) {
  // Book row sorts first (position 0)
  if ("book" == doc.collection) {
    emit([doc._id, 0], doc.title);
  }
  // Relationship row is keyed by book; the value carries the author's _id
  if ("book-author" == doc.collection) {
    emit([doc.book, 1], {"_id":doc.author});
  }
}

Result Set

key                  id               value
["9780596805029",0]  "9780596805029"  "DocBook 5: The Definitive Guide"
["9780596805029",1]  "44005f2c"       {"_id":"walsh"}
["9781565920514",0]  "9781565920514"  "Making TeX Work"
["9781565920514",1]  "44005f72"       {"_id":"walsh"}
["9781565925809",0]  "9781565925809"  "DocBook: The Definitive Guide"
["9781565925809",1]  "44006720"       {"_id":"muellner"}
["9781565925809",1]  "44006b0d"       {"_id":"walsh"}

Including Docs

include_docs=true

key                  id  value  doc (truncated)
["9780596805029",0]  …   …      {"title":"DocBook 5: The Definitive Guide"}
["9780596805029",1]  …   …      {"name":"Norman Walsh"}
["9781565920514",0]  …   …      {"title":"Making TeX Work"}
["9781565920514",1]  …   …      {"name":"Norman Walsh"}
["9781565925809",0]  …   …      {"title":"DocBook: The Definitive Guide"}
["9781565925809",1]  …   …      {"name":"Leonard Muellner"}
["9781565925809",1]  …   …      {"name":"Norman Walsh"}

Authors and Related Books

Map Function

function(doc) {
  // Author row sorts first (position 0)
  if ("author" == doc.collection) {
    emit([doc._id, 0], doc.name);
  }
  // Relationship row is keyed by author; the value carries the book's _id
  if ("book-author" == doc.collection) {
    emit([doc.author, 1], {"_id":doc.book});
  }
}

Result Set

key             id          value
["muellner",0]  "muellner"  "Leonard Muellner"
["muellner",1]  "44006720"  {"_id":"9781565925809"}
["walsh",0]     "walsh"     "Norman Walsh"
["walsh",1]     "44005f2c"  {"_id":"9780596805029"}
["walsh",1]     "44005f72"  {"_id":"9781565920514"}
["walsh",1]     "44006b0d"  {"_id":"9781565925809"}

Including Docs

include_docs=true

key             id  value  doc (truncated)
["muellner",0]  …   …      {"name":"Leonard Muellner"}
["muellner",1]  …   …      {"title":"DocBook: The Definitive Guide"}
["walsh",0]     …   …      {"name":"Norman Walsh"}
["walsh",1]     …   …      {"title":"DocBook 5: The Definitive Guide"}
["walsh",1]     …   …      {"title":"Making TeX Work"}
["walsh",1]     …   …      {"title":"DocBook: The Definitive Guide"}

Limitations

Queries can only contain data from the “left” or “right” side of the relationship (without the use of include_docs)

Maintaining relationship documents may require more work

Final Thoughts

Document Databases Compared to Relational Databases

Document databases have no tables (and therefore no columns)

Indexes (views) are queried directly, instead of being used to optimize more generalized queries

Result set columns can contain a mix of logical data types

No built-in concept of relationships between documents

Related entities can be embedded in a document, referenced from a document, or both

Caveats

No referential integrity

No atomic transactions across document boundaries

Some patterns may involve denormalized (i.e. redundant) data

Data inconsistencies are inevitable (i.e. eventual consistency)

Consider the implications of replication: what may seem consistent with one database may not be consistent across nodes (e.g. referencing entities that don’t yet exist on the node)
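Since the database does not enforce referential integrity, an application may want to detect references that point at documents the local node does not have. The following is one possible sketch, assuming the Relationship Documents layout from this talk; `findDanglingReferences` is a hypothetical helper, and the scenario (the "muellner" author document not having replicated yet) is invented for illustration.

```javascript
// Documents present on a node. In this invented scenario the author
// "muellner" has not replicated yet, but a relationship document
// pointing at that author has.
const localDocs = [
  { _id: "9781565925809", collection: "book", title: "DocBook: The Definitive Guide" },
  { _id: "walsh", collection: "author", name: "Norman Walsh" },
  { _id: "44006720", collection: "book-author", book: "9781565925809", author: "muellner" },
  { _id: "44006b0d", collection: "book-author", book: "9781565925809", author: "walsh" },
];

// Hypothetical helper: report every relationship document whose "book"
// or "author" reference does not resolve to a document in the same set.
function findDanglingReferences(docs) {
  const ids = new Set(docs.map((doc) => doc._id));
  return docs
    .filter((doc) => doc.collection === "book-author")
    .flatMap((doc) =>
      ["book", "author"]
        .filter((field) => !ids.has(doc[field]))
        .map((field) => ({ relationship: doc._id, field, missing: doc[field] })));
}

const dangling = findDanglingReferences(localDocs);
console.log(JSON.stringify(dangling));
```

A check like this only reports the state of one node at one moment; the missing document may still arrive through replication later.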

Additional Techniques

Use the startkey and endkey parameters to retrieve one entity and its related entities:
startkey=["9781565925809"]&endkey=["9781565925809",{}]
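The range works because of view collation: a shorter array sorts before a longer array with the same prefix, and an object such as {} sorts after numbers and strings. The following sketch illustrates that with a simplified comparator, assuming the rows of the "Books and Related Authors" result set; `typeRank`, `collate`, and `queryRange` are hypothetical helpers, and unlike CouchDB this sketch compares strings by code unit and treats all objects as equal.

```javascript
// Simplified type order used by view collation:
// null < booleans < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (typeof v === "boolean") return 1;
  if (typeof v === "number") return 2;
  if (typeof v === "string") return 3;
  if (Array.isArray(v)) return 4;
  return 5; // any other object
}

function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 4) {
    // Arrays: compare element by element, shorter array first.
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  if (ra === 5) return 0; // sketch only: objects are not compared further
  return a < b ? -1 : a > b ? 1 : 0;
}

// Keep rows whose key falls between startkey and endkey, both inclusive.
function queryRange(rows, startkey, endkey) {
  return rows.filter(
    (row) => collate(row.key, startkey) >= 0 && collate(row.key, endkey) <= 0);
}

const rows = [
  { key: ["9780596805029", 0], id: "9780596805029", value: "DocBook 5: The Definitive Guide" },
  { key: ["9780596805029", 1], id: "walsh", value: "Norman Walsh" },
  { key: ["9781565920514", 0], id: "9781565920514", value: "Making TeX Work" },
  { key: ["9781565920514", 1], id: "walsh", value: "Norman Walsh" },
  { key: ["9781565925809", 0], id: "9781565925809", value: "DocBook: The Definitive Guide" },
  { key: ["9781565925809", 1], id: "muellner", value: "Leonard Muellner" },
  { key: ["9781565925809", 1], id: "walsh", value: "Norman Walsh" },
];

const selected = queryRange(rows, ["9781565925809"], ["9781565925809", {}]);
console.log(selected.map((row) => row.value));
```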

Define a reduce function and use grouping levels
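As an illustration of grouping levels, the sketch below applies a counting reduce to the "Books and Related Authors" rows. It is an in-memory approximation under stated assumptions: `groupReduce` is a hypothetical helper standing in for a view query with a group_level parameter, and `count` is written to follow the (keys, values, rereduce) shape of a CouchDB reduce function, although the rereduce path is never exercised here.

```javascript
// Keys of the "Books and Related Authors" view rows, already in key order.
const rows = [
  { key: ["9780596805029", 0], id: "9780596805029", value: "DocBook 5: The Definitive Guide" },
  { key: ["9780596805029", 1], id: "walsh", value: "Norman Walsh" },
  { key: ["9781565920514", 0], id: "9781565920514", value: "Making TeX Work" },
  { key: ["9781565920514", 1], id: "walsh", value: "Norman Walsh" },
  { key: ["9781565925809", 0], id: "9781565925809", value: "DocBook: The Definitive Guide" },
  { key: ["9781565925809", 1], id: "muellner", value: "Leonard Muellner" },
  { key: ["9781565925809", 1], id: "walsh", value: "Norman Walsh" },
];

// A counting reduce: counts rows, or sums partial counts on rereduce.
function count(keys, values, rereduce) {
  return rereduce ? values.reduce((a, b) => a + b, 0) : values.length;
}

// Hypothetical helper: group rows by the first `groupLevel` key elements
// and reduce each group once.
function groupReduce(rows, reduce, groupLevel) {
  const groups = new Map();
  for (const row of rows) {
    const groupKey = JSON.stringify(row.key.slice(0, groupLevel));
    if (!groups.has(groupKey)) groups.set(groupKey, []);
    groups.get(groupKey).push(row);
  }
  return [...groups].map(([groupKey, members]) => ({
    key: JSON.parse(groupKey),
    value: reduce(members.map((r) => [r.key, r.id]), members.map((r) => r.value), false),
  }));
}

// Level 1 groups by book; level 2 separates the book row from its author rows.
const perBook = groupReduce(rows, count, 1);
const perBookAndKind = groupReduce(rows, count, 2);
console.log(JSON.stringify(perBook));
console.log(JSON.stringify(perBookAndKind));
```

At level 2, the value for a key ending in 1 is the number of authors related to that book.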

Use UUIDs rather than natural keys for better performance

Use the bulk document API when writing Relationship Documents
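As a rough illustration, the four relationship documents from the earlier examples could be written in one request. The sketch below only builds the JSON body; it assumes CouchDB's `_bulk_docs` endpoint, which accepts an object with a `docs` array, and it omits `_id` on the assumption that the server then assigns identifiers. `buildBulkRelationships` is a hypothetical helper name.

```javascript
// Hypothetical helper: turn [bookId, authorId] pairs into a bulk request body.
function buildBulkRelationships(pairs) {
  return {
    docs: pairs.map(([book, author]) => ({
      collection: "book-author",
      book,
      author,
    })),
  };
}

const body = buildBulkRelationships([
  ["9780596805029", "walsh"],
  ["9781565920514", "walsh"],
  ["9781565925809", "muellner"],
  ["9781565925809", "walsh"],
]);

// This body would be sent as JSON in a POST to the database's _bulk_docs
// endpoint; the request itself is left out of this sketch.
console.log(JSON.stringify(body, null, 2));
```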

When using the List of Keys or Relationship Documents patterns, denormalize data so that you can have data from the “right” and “left” side of the relationship within your query results
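One way to read this advice, sketched under assumptions: copy the fields you need from both sides into the relationship document, so that a single row carries them without include_docs. The `book_title` and `author_name` fields below are hypothetical additions to the relationship document shown earlier, and `emit` is injected as a parameter so the sketch can run outside a view server.

```javascript
// Relationship document with denormalized copies of both names.
// book_title and author_name are hypothetical fields, not from the talk.
const relationship = {
  _id: "44006720",
  collection: "book-author",
  book: "9781565925809",
  book_title: "DocBook: The Definitive Guide",
  author: "muellner",
  author_name: "Leonard Muellner",
};

// Map function keyed by author; the value carries data from both sides.
function map(doc, emit) {
  if ("book-author" == doc.collection) {
    emit([doc.author, 1], { author: doc.author_name, title: doc.book_title });
  }
}

const rows = [];
map(relationship, (key, value) => rows.push({ key, id: relationship._id, value }));
console.log(JSON.stringify(rows));
```

The trade-off is the redundancy noted under Caveats: if a title or name changes, every document holding a copy has to be updated as well.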

Cheat Sheet

                        One to Many  Many to Many  <= N* Relations  > N* Relations
Embedded Entities       ✓                          ✓
Related Documents       ✓                                           ✓
List of Keys                         ✓             ✓
Relationship Documents               ✓                              ✓

* where N is a large number for your system