document your-world-couchbase sf-2013

Upload: couchbase

Post on 10-May-2015

TRANSCRIPT

Page 1: Document your-world-couchbase sf-2013
Page 2: Document your-world-couchbase sf-2013

Document Your World

Robin Johnson, Developer Advocate

Page 3: Document your-world-couchbase sf-2013

• Developer Advocate at Couchbase

• Polyglot Hacker (Primarily Ruby, Python, Go, and C)

• NoSQL & REST API Enthusiast

@RBIN [email protected]

Robin Johnson

Page 4: Document your-world-couchbase sf-2013

What to Expect:

• JSON Basics

• JSON Documents within Couchbase itself

• Mind-set Changes between Relational and Non-Relational Modeling

• Building an application around JSON

• Document Structuring / Modeling our data effectively

• Views and Indexes within Couchbase

• An introduction to Map / Reduce

Page 5: Document your-world-couchbase sf-2013

JSON Basics – what is JSON?

JavaScript Object Notation

• Created by Douglas Crockford

• Text Based Format

• Designed for human-readable data interchange

Page 6: Document your-world-couchbase sf-2013

JSON Basics – Why JSON?

JSON has a lot of advantages:

• It's compact

• It's easy for both computers and people to read and write

• It maps very easily onto the data structures used by most programming languages (numbers, strings, booleans, nulls, arrays and associative arrays)

• Nearly all programming languages contain functions or libraries that can read and write JSON structures

Page 7: Document your-world-couchbase sf-2013

Supported JSON Types:

String:
"A String"

Numbers (Integer & Floating Point):
22 and 55.2

Boolean:
{"value": false}

Object:
{"name": "Robin Johnson", "twitter": "@rbin", "age": 22, "title": "Developer Advocate"}

Page 8: Document your-world-couchbase sf-2013

Supported JSON Types - Lists:

Array:
["one", "two", "three"]

List of Objects:
"foos": [ {"bar1": "value1", "bar2": "value2"}, {"bar3": "value3", "bar4": "value4"} ]

Complex, Nested Objects:
{ tweet, tweet… }
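As a rough illustration of how the JSON types above map onto values in a programming language, here is a minimal JavaScript sketch using the built-in JSON.parse and JSON.stringify. The field names "rating", "active" and "nickname" are invented for the example and do not come from the slides.

```javascript
// Parse a JSON text and inspect how each JSON type maps onto a JavaScript value.
const text = `{
  "name": "Robin Johnson",
  "age": 22,
  "rating": 55.2,
  "active": false,
  "nickname": null,
  "tags": ["one", "two", "three"],
  "foos": [{ "bar1": "value1" }, { "bar3": "value3" }]
}`;

const doc = JSON.parse(text);

// Report the JSON type name of a parsed value.
function jsonType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "string", "number", "boolean" or "object"
}

const types = {};
for (const [field, value] of Object.entries(doc)) {
  types[field] = jsonType(value);
}
console.log(types);

// Serializing the parsed value and parsing it again yields an equal structure.
const roundTripped = JSON.parse(JSON.stringify(doc));
console.log(roundTripped.foos[1].bar3);
```

Note that JSON has a single number type, so 22 and 55.2 both arrive as a JavaScript number.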

Page 9: Document your-world-couchbase sf-2013

JSON Documents within Couchbase

• Couchbase is primarily a JSON-oriented Document Data Store.

• Each document is stored with a Unique Identifier (Key) and is made up of key-value pairs.

• Couchbase uses these JSON values to build indexes, query data and perform advanced lookups.

Couchbase stores the ‘Meta’ of each Document, and the Body (Content)…

Page 10: Document your-world-couchbase sf-2013

JSON Document Structure

meta
{"id": "[email protected]", "rev": "1-0002bce0000000000", "flags": 0, "expiration": 0, "type": "json"}

Meta Information Including Key (ID). All Keys Unique and Kept in RAM at all times.

document
{"uid": 1234, "firstname": "Robin", "lastname": "Johnson", "age": 22, "favorite_colors": ["green", "red"], "email": "[email protected]"}

Document Value. Most Recent in RAM and Persisted to Disk.

Page 11: Document your-world-couchbase sf-2013

Objects Serialized to JSON and Back

User Object
string uid
string firstname
string lastname
int age
array favorite_colors
string email

set(): the User Object is serialized to JSON and stored under its key.
get(): the JSON is read back by its key and deserialized into a User Object.

u::[email protected]
{"uid": 1234, "firstname": "Robin", "lastname": "Johnson", "age": 22, "favorite_colors": ["green", "red"], "email": "[email protected]"}
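The set() / get() round trip can be sketched without a running server. The snippet below is only an illustration: the Map named bucket is a hypothetical in-memory stand-in for a Couchbase bucket, the key "u::1234" is invented, and the email field is left out because it is obscured in the slides.

```javascript
// Hypothetical in-memory stand-in for a bucket: it stores JSON text by key.
const bucket = new Map();

// Serialize the object to JSON text and store it under the key.
function set(key, object) {
  bucket.set(key, JSON.stringify(object));
}

// Read the JSON text back and deserialize it into a fresh object.
function get(key) {
  const json = bucket.get(key);
  return json === undefined ? undefined : JSON.parse(json);
}

const user = {
  uid: 1234,
  firstname: "Robin",
  lastname: "Johnson",
  age: 22,
  favorite_colors: ["green", "red"],
};

set("u::1234", user);            // assumed key, for illustration only
const loaded = get("u::1234");
console.log(loaded.firstname, loaded.favorite_colors);
```

The object that comes back is a copy rebuilt from JSON, not the original instance, which is why only JSON-representable fields survive the trip.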

Page 12: Document your-world-couchbase sf-2013

The Mind-Set Change

Page 13: Document your-world-couchbase sf-2013

The Move from Relational Modeling

• All of our data is in tables,

• We split complex data across multiple tables,

• We have a very rigid, inflexible schema, and

• All of our data records are forced to look the same.

• We use complex JOINs, WHERE clauses and ORDER BY clauses.

Our ‘Recipe’ table uses JOINs to aggregate info from other tables.

Page 14: Document your-world-couchbase sf-2013

The Move to NoSQL

• In Couchbase, we’re going to model our Documents in JSON.

• Unlike with Relational DBs, we can hit the database as often as we like: Gets and Sets are so quick that their cost is trivial.

• We can change our data structures at any time without ALTER TABLE statements, which allows for agile model development.

• There is no implied schema, so each record in our DB could look entirely different from the last.

• Getting our heads around modeling data in JSON can be tricky. Let’s look at how we can get started with JSON modeling:

Page 15: Document your-world-couchbase sf-2013

Modeling an Application…The JSON way

Page 16: Document your-world-couchbase sf-2013

Rate My Vine…

A social application in which people can vote on other users’ Vine videos and see a Global Ranking of the Best and Worst Vine Videos!

Top Rated Vines

Cooking w/ Hugh Fearnley-Whittingstall   176
I love doing Housework                   143
What happened to Amanda Bynes            120
Random Access Memories                   112
I don’t even know                        107
Twerking gone wrong                       98
Too cold to Dance                         74
How To Scare Your Friends                 37

Page 17: Document your-world-couchbase sf-2013

Technology Used:

• This is an actual Sample App for Couchbase, fully Open Source

• Built on Ruby, Rails & Couchbase

• Using the Couchbase-Model Ruby Gem for Active-Record style (easy) data modeling

• Puma as web server for concurrent connections

Page 18: Document your-world-couchbase sf-2013

User.rb

• Users must Auth with Twitter before Submitting Vines

• We simply register their Name, Twitter Username & Avatar upon Twitter auth

Page 19: Document your-world-couchbase sf-2013

•Standard JSON structure with simple String fields

•This JSON is editable within the Couchbase Console

How that looks as JSON in Couchbase:

Key created by a hash of Twitter UID

Explicit ‘type’ of Document

Page 20: Document your-world-couchbase sf-2013

Vine.rb

• Vine has no public API, so we’ve written a cheeky script to rip the true URI of the video from the URL entered by the user

• Vines need a Name, a Video URL, a User and a Score

Page 21: Document your-world-couchbase sf-2013

•Marketing have informed us that we need to add a new field for Facebook Sharing into our Vine Videos!

•In a relational world, we would have problems!

•In the Couchbase world, IT’S TRIVIAL!

The Joys of a Flexible Schema!
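To make the flexible-schema point concrete, here is a small sketch of documents written before and after the new field was introduced living side by side. The field name facebook_share_url and the example URL are invented for illustration; the slides do not name the field.

```javascript
// Two vine documents in the same bucket: one written before the new field existed, one after.
const oldVine = { type: "vine", title: "I NEED A HORSE", score: 247 };
const newVine = {
  type: "vine",
  title: "My Epic Video",
  score: 12,
  facebook_share_url: "https://example.com/share/1", // hypothetical new field
};

// Read the new field with a fallback so older documents keep working unchanged.
function facebookShareUrl(vine) {
  return "facebook_share_url" in vine ? vine.facebook_share_url : null;
}

console.log(facebookShareUrl(oldVine)); // no value stored on the old document
console.log(facebookShareUrl(newVine));
```

No migration step is involved: the application simply starts writing the extra field and tolerates its absence when reading.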

Page 22: Document your-world-couchbase sf-2013

•User_ID included so we know who each Vine belongs to

•Score is inside each Vine document. This brings its own challenges, but Couchbase solves them!

Again, the JSON within Couchbase:

Random Hash generated Key

User_ID reference

Page 23: Document your-world-couchbase sf-2013

•We have chosen to have the Score inside each Vine doc.

•We need to be able to deal with concurrent score updates.

Optimistic Concurrency:

[Diagram: two concurrent UPDATEs target the same document, { "score": 174 }]

Page 24: Document your-world-couchbase sf-2013

•To handle the concurrent updates, we can utilise Couchbase’s built-in CAS value.

•We simply write a new Update method in our application controller to use the CAS value on update.

CAS – Compare and Swap
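The read, modify, write-with-CAS, retry pattern can be sketched as follows. This is not the sample app's controller code and not the Couchbase SDK: CasStore is a hypothetical in-memory class that only mimics compare-and-swap semantics, and the key "vine::1" is invented.

```javascript
// Hypothetical in-memory store that mimics compare-and-swap semantics.
class CasStore {
  constructor() {
    this.items = new Map();
    this.nextCas = 1;
  }

  // Returns a copy of the value plus the CAS token current at read time.
  get(key) {
    const item = this.items.get(key);
    return { value: JSON.parse(item.json), cas: item.cas };
  }

  // Writes only if the caller's CAS token still matches; returns true on success.
  set(key, value, cas) {
    const item = this.items.get(key);
    if (item && cas !== undefined && item.cas !== cas) return false;
    this.items.set(key, { json: JSON.stringify(value), cas: this.nextCas++ });
    return true;
  }
}

// Read, modify, and write back; retry from a fresh read if another writer got in first.
function incrementScore(store, key, maxRetries = 10) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const { value, cas } = store.get(key);
    value.score += 1;
    if (store.set(key, value, cas)) return value.score;
  }
  throw new Error("too many CAS conflicts");
}

const store = new CasStore();
store.set("vine::1", { score: 174 });

// Simulate two clients that both read score 174 before either writes.
const a = store.get("vine::1");
const b = store.get("vine::1");
a.value.score += 1;
b.value.score += 1;
const firstWrite = store.set("vine::1", a.value, a.cas);  // accepted
const secondWrite = store.set("vine::1", b.value, b.cas); // rejected: stale CAS token
const finalScore = incrementScore(store, "vine::1");      // retry path applies the lost vote
console.log(firstWrite, secondWrite, finalScore);
```

The second client's write is refused instead of silently overwriting the first, so no vote is lost once it retries.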

Page 25: Document your-world-couchbase sf-2013

•Just as in SQL, our JSON Documents also have various types of ‘Relationship’.

•For example, a User can own many Videos as a 1 to many relationship.

Document Relationships

video:1{ type: “vine”, title: “My Epic Video”, owner: “rbin”}

user:rbin{ type: “user”, name: “Robin Johnson”, id: “rbin”}

video:2{ type: “vine”, title: “I NEED A HORSE!”, owner: “rbin”}
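A one-to-many relationship like this is followed in application code by using the referenced key. The sketch below uses a plain Map as a hypothetical stand-in for the bucket; the helper names ownerOf and videosOwnedBy are invented, and in Couchbase itself the "all videos for a user" lookup would normally be served by a view and not a full scan.

```javascript
// The three documents from the slide, keyed as shown there.
const docs = new Map([
  ["user:rbin", { type: "user", name: "Robin Johnson", id: "rbin" }],
  ["video:1", { type: "vine", title: "My Epic Video", owner: "rbin" }],
  ["video:2", { type: "vine", title: "I NEED A HORSE!", owner: "rbin" }],
]);

// Follow the reference from a video to its owner (many-to-one direction).
function ownerOf(videoKey) {
  const video = docs.get(videoKey);
  return docs.get(`user:${video.owner}`);
}

// Collect every video that points at a user (one-to-many direction).
function videosOwnedBy(userId) {
  const titles = [];
  for (const doc of docs.values()) {
    if (doc.type === "vine" && doc.owner === userId) titles.push(doc.title);
  }
  return titles;
}

console.log(ownerOf("video:2").name);
console.log(videosOwnedBy("rbin"));
```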

Page 26: Document your-world-couchbase sf-2013

•Marketing have informed us we need to add a Comment mechanism to our Vine Videos.

•We need to decide the best way to approach this in JSON document design.

Single vs. Multiple Documents

[Diagram: a Single document containing its comments vs. a Document with Multiple separate Comment documents]

Page 27: Document your-world-couchbase sf-2013

Single vs. Multiple - Single

• Comments are nested within their respective Vine documents.

• Great when we know we have a finite number of results.

7b18b847292338bc29
{
  "type": "vine",
  "user_id": "145237874",
  "title": "I NEED A HORSE",
  "vine_url": "https://vine.co/v/b2jjzY0Wqg5",
  "video_url": "https://mtc.cdn.vine.co……",
  "score": 247,
  "comments": [
    {"format": "markdown", "body": "I LOVE this video!"},
    {"format": "markdown", "body": "BEST video I have ever seen!"}
  ]
}

Page 28: Document your-world-couchbase sf-2013

Single vs. Multiple - Multiple

• Comments are split from the parent document.

• Comments use referential IDs, incremented by 1.

7b18b847292338bc29
{
  "type": "vine",
  "user_id": "145237874",
  "title": "I NEED A HORSE",
  "score": 247
}

7b18b847292338bc29::1
{ "format": "markdown", "body": "I LOVE this video!" }

7b18b847292338bc29::2
{ "format": "markdown", "body": "BEST video ever!" }
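One way to work with comments split into their own documents is to keep a counter and derive each comment key from it. The sketch below assumes a comment_count field on the vine document to hold that counter; that field, the Map standing in for the bucket, and the helper names are all hypothetical.

```javascript
// Hypothetical in-memory stand-in for a bucket.
const bucket = new Map();
const vineKey = "7b18b847292338bc29";
bucket.set(vineKey, { type: "vine", title: "I NEED A HORSE", score: 247, comment_count: 0 });

// Store a comment under "<vineKey>::<n>", where n is the next counter value.
function addComment(key, body) {
  const vine = bucket.get(key);
  vine.comment_count += 1;
  const commentKey = `${key}::${vine.comment_count}`;
  bucket.set(commentKey, { format: "markdown", body });
  return commentKey;
}

// Rebuild the comment keys from the counter and fetch each comment by key.
function commentsFor(key) {
  const count = bucket.get(key).comment_count;
  const comments = [];
  for (let n = 1; n <= count; n++) {
    comments.push(bucket.get(`${key}::${n}`));
  }
  return comments;
}

const firstKey = addComment(vineKey, "I LOVE this video!");
const secondKey = addComment(vineKey, "BEST video ever!");
console.log(firstKey, secondKey, commentsFor(vineKey).map((c) => c.body));
```

Because every comment key is predictable, comments can be fetched with plain key lookups and the parent document stays small however many comments accumulate.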

Page 29: Document your-world-couchbase sf-2013

•Couchbase has no inbuilt mechanism for Versioning.

•There are many ways to approach document Versioning:

- Copy the versions of the document into new documents,

- Copy the versions of the document into a list of nested documents,

- Store the list of mutated / modified attributes:

  • In a nested element,

  • In separate documents.

•In this case, we’re going to look at the simplest way…

Versioning our Documents:

Page 30: Document your-world-couchbase sf-2013

•Get the current version of the document,

•Increment the version number,

•Create the version with the new key "mykey::v1",

•Save the document in its current version.

Versioning our Documents:

Current Version: mykey

Version 1: mykey::v1

Version 2: mykey::v2
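The four steps above can be sketched as a single save function. This is one possible reading of the slide and not a built-in Couchbase feature: the Map stands in for the bucket, and keeping the version counter under a separate "<key>::version" key is an assumption made for the example.

```javascript
// Hypothetical in-memory stand-in for a bucket.
const bucket = new Map();

// Save newDoc under key, first archiving the existing document as "<key>::v<n>".
function saveWithVersion(key, newDoc) {
  const current = bucket.get(key);                      // 1. get the current version
  if (current !== undefined) {
    const counterKey = `${key}::version`;               // assumed counter location
    const version = (bucket.get(counterKey) || 0) + 1;  // 2. increment the version number
    bucket.set(counterKey, version);
    bucket.set(`${key}::v${version}`, current);         // 3. create the version under the new key
  }
  bucket.set(key, newDoc);                              // 4. save the document as the current version
}

saveWithVersion("mykey", { title: "first draft" });
saveWithVersion("mykey", { title: "second draft" });
saveWithVersion("mykey", { title: "third draft" });

console.log(bucket.get("mykey").title);
console.log(bucket.get("mykey::v1").title);
console.log(bucket.get("mykey::v2").title);
```

The plain key always holds the latest document, so ordinary reads need no knowledge of the versioning scheme.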

Page 31: Document your-world-couchbase sf-2013

Questions so far?

Page 32: Document your-world-couchbase sf-2013

Views & Indexing in Couchbase

Page 33: Document your-world-couchbase sf-2013

Terminology:

• What’s a View?

- A view within Couchbase takes in Unstructured / Semi-Structured data and uses that data to build an Index…

• So what’s an Index?

- An index is just an optimised way of finding data (in list format or otherwise).

Page 34: Document your-world-couchbase sf-2013

•Ingesting Tweets from the Twitter API

•Taking in data from the LinkedIn API

•Taking Git Commit data etc.

There is little point in trying to sort the data before we store it.

We can simply store the unstructured data, and structure it at query time.

Unstructured Data…

Page 35: Document your-world-couchbase sf-2013

•Storing Data and Indexing Data are separate processes in all database systems.

•With an explicit schema, as in RDBMS systems, Indexes are generally optimized based on the data type(s): every row has an entry and everything is known.

•In flexible-schema scenarios, Map-Reduce is a technique for gathering common components of data into a collection; in Couchbase, that collection is an Index.

Couchbase Server: Views

Page 36: Document your-world-couchbase sf-2013

Map-Reduce in General

A Map function locates data items within datasets and outputs an optimized data structure that can be searched and traversed rapidly.

A Reduce function takes the output of a Map function and can calculate various aggregates from it, generally focused on numeric data.

Together they make up a technique for working with data that is semi-structured or unstructured.
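A generic Map-Reduce pass can be sketched in a few lines. This is a plain in-memory illustration of the technique, not Couchbase's view engine; the sample documents, the owner "alice" and the choice of summing scores per owner are invented for the example.

```javascript
// Semi-structured input: not every document has the same shape.
const docs = [
  { type: "vine", owner: "rbin", score: 247 },
  { type: "vine", owner: "rbin", score: 12 },
  { type: "vine", owner: "alice", score: 98 },
  { type: "user", id: "rbin" },
];

// Map: pick out the items of interest and emit (key, value) rows.
function map(doc, emit) {
  if (doc.type === "vine") emit(doc.owner, doc.score);
}

// Reduce: aggregate the values emitted for one key.
function reduce(values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Run map over every document, group emitted values by key, then reduce each group.
function mapReduce(documents) {
  const grouped = new Map();
  for (const doc of documents) {
    map(doc, (key, value) => {
      if (!grouped.has(key)) grouped.set(key, []);
      grouped.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of grouped) result[key] = reduce(values);
  return result;
}

const totals = mapReduce(docs);
console.log(totals);
```

Documents that do not match the map function's condition simply emit nothing, which is how the technique copes with mixed or unstructured data.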

Page 37: Document your-world-couchbase sf-2013

Couchbase Server 2.0: Map-Reduce

In Couchbase, Map-Reduce is specifically used to create an Index.

Map functions are applied to JSON Documents and they output or “emit” a data structure designed to be rapidly queried and traversed.

Page 38: Document your-world-couchbase sf-2013

Map() Function => Index

function(doc, meta) {
  emit(doc.username, doc.email)
}

• doc: the JSON document
• meta: the document’s metadata
• emit(): creates a row in the index
• doc.username: the indexed key
• doc.email: the output value(s)

Every Document passes through View Map() functions
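To see what "every document passes through the Map() function" produces, the slide's map function can be run against a few documents in plain JavaScript. The emit() defined here is a hypothetical stand-in for the one the view engine provides, the sample users and example.com addresses are invented, and the sort uses simple string comparison where Couchbase applies its own collation rules.

```javascript
// Stand-in for the view engine: collects emitted rows and tags them with the document id.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ key, value, id: currentId });
}

// The map function from the slide.
function map(doc, meta) {
  emit(doc.username, doc.email);
}

// Invented sample documents with their metadata.
const documents = [
  { meta: { id: "u::1" }, doc: { username: "robin", email: "robin@example.com" } },
  { meta: { id: "u::2" }, doc: { username: "adam", email: "adam@example.com" } },
  { meta: { id: "u::3" }, doc: { username: "zoe", email: "zoe@example.com" } },
];

// Pass every document through the map function.
for (const { doc, meta } of documents) {
  currentId = meta.id;
  map(doc, meta);
}

// An index keeps its rows ordered by key so they can be searched and ranged over.
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(rows);
```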

Page 39: Document your-world-couchbase sf-2013

Single Element Keys (Text Key)

function(doc, meta) {
  emit(doc.email, null)
}

Map output (text key):

doc.email          meta.id
[email protected]  u::1
[email protected]  u::2
[email protected]  u::3

Page 40: Document your-world-couchbase sf-2013

Indexing Architecture

[Diagram: an App Server writes Doc 1 to a Couchbase Server Node. Inside the node the document passes from the Managed Cache to the Replication Queue (to other node) and to the Disk Queue, then to Disk and the View Engine.]

• Doc Updated in RAM Cache First

• Indexer Updates Indexes After On Disk, in Batches

• All Documents & Updates Pass Through View Engine

Page 41: Document your-world-couchbase sf-2013

Buckets >> Design Documents >> Views

[Diagram: a Couchbase Bucket contains Design Document 1 and Design Document 2, each containing several Views]

• Indexers Are Allocated Per Design Doc

• All Updated at Same Time

• Can Only Access Data in the Bucket Namespace

Page 42: Document your-world-couchbase sf-2013

Querying Views

Page 43: Document your-world-couchbase sf-2013

Parameters used in View Querying

• key = "": used for exact match of index-key

• keys = []: used for matching set of index-keys

• startkey/endkey = "": used for range queries on index-keys

• startkey_docid/endkey_docid = "": used for range queries on meta.id

• stale = [false, update_after, ok]: used to decide indexer behavior from client

• group/group_level: used with reduces to aggregate with grouping

Page 44: Document your-world-couchbase sf-2013

Most Common Queries Are Ranges

doc.email          meta.id
[email protected]  u::1
[email protected]  u::7
[email protected]  u::2
[email protected]  u::5
[email protected]  u::6
[email protected]  u::4
[email protected]  u::3

?startkey="b1"&endkey="zz"
Pulls the Index-Keys within the UTF-8 range specified by the startkey and endkey.

?startkey="bz"&endkey="zn"
Pulls the Index-Keys within the UTF-8 range specified by the startkey and endkey.

?startkey="[email protected]"&endkey="[email protected]"
Range of a single item (can also be done with key= parameter).
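The effect of startkey and endkey on a sorted index can be sketched in memory. The email addresses below are invented because the ones on the slide are obscured, the end of the range is treated as inclusive, and plain string comparison stands in for the server's collation.

```javascript
// Index rows as a view would hold them: sorted by key, each pointing at a document id.
const index = [
  { key: "abba@example.com", id: "u::1" },
  { key: "beta@example.com", id: "u::7" },
  { key: "dan@example.com", id: "u::2" },
  { key: "math@example.com", id: "u::5" },
  { key: "pete@example.com", id: "u::6" },
  { key: "yam@example.com", id: "u::4" },
  { key: "zorro@example.com", id: "u::3" },
];

// Return every row whose key falls between startkey and endkey, inclusive.
function rangeQuery(rows, startkey, endkey) {
  return rows.filter((row) => row.key >= startkey && row.key <= endkey);
}

const wide = rangeQuery(index, "b1", "zz");   // ?startkey="b1"&endkey="zz"
const narrow = rangeQuery(index, "bz", "zn"); // ?startkey="bz"&endkey="zn"
const single = rangeQuery(index, "math@example.com", "math@example.com");

console.log(wide.map((r) => r.id));
console.log(narrow.map((r) => r.id));
console.log(single.map((r) => r.id));
```

The start and end keys do not have to exist in the index; they only mark the boundaries of the slice that is returned.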

Page 45: Document your-world-couchbase sf-2013

doc.email meta.id

[email protected] u::1

[email protected] u::7

[email protected] u::2

[email protected] u::5

[email protected] u::6

[email protected] u::4

[email protected] u::3

?key="[email protected]"

Match a Single Index-Key

Index-Key Matching

Page 46: Document your-world-couchbase sf-2013

doc.email meta.id

[email protected] u::1

[email protected] u::7

[email protected] u::2

[email protected] u::5

[email protected] u::6

[email protected] u::4

[email protected] u::3

?keys=["[email protected]","[email protected]"]

Query Multiple in the Set (Array Notation)

Index-Key Set Matches
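Exact-match lookups with key and keys can be sketched the same way as a filter over index rows. The rows and example.com addresses here are invented stand-ins for the obscured ones on the slides, and the helper names queryKey and queryKeys are hypothetical.

```javascript
// A small index: each row maps an emitted key to the id of the document that produced it.
const emailIndex = [
  { key: "abba@example.com", id: "u::1" },
  { key: "dan@example.com", id: "u::2" },
  { key: "math@example.com", id: "u::5" },
  { key: "zorro@example.com", id: "u::3" },
];

// key=: rows whose index key equals the given value exactly.
function queryKey(rows, key) {
  return rows.filter((row) => row.key === key);
}

// keys=[]: rows whose index key is a member of the given set.
function queryKeys(rows, keys) {
  const wanted = new Set(keys);
  return rows.filter((row) => wanted.has(row.key));
}

const one = queryKey(emailIndex, "math@example.com");
const several = queryKeys(emailIndex, ["zorro@example.com", "abba@example.com"]);
const none = queryKey(emailIndex, "math");

console.log(one.map((r) => r.id), several.map((r) => r.id), none.length);
```

Unlike a range query, a partial key such as "math" matches nothing, because the comparison is on the whole index key.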

Page 47: Document your-world-couchbase sf-2013

Beer Sample Views Demo

Page 48: Document your-world-couchbase sf-2013

Scoring and Leaderboard-ing

Top 10 | Top 100 | Top Users | Login

Top Rated Vines

I NEED A HORSE!                          220
I love doing Housework                   207
Cooking w/ Hugh Fearnley-Whittingstall   182
Random Access Memories                   164
I don’t even know                        143
Twerking gone wrong                      120
Too cold to Dance                        103
How To Scare Your Friends                 94
Using Couchbase for the first Time        86
What does a fox say?                      81

Page 49: Document your-world-couchbase sf-2013

The Code Behind the Board

• Although this is the main feature of our app, the code behind it is very simple.

• We need to create a View in Couchbase, and query the View to populate our Leaderboard…

• We then tell Rails to use our Specific View on the Vine Leaderboard page

List each Vine, linking the Title to its URL, and print its Score.

Page 50: Document your-world-couchbase sf-2013

The Leaderboard View

The Map Function:

The Query:
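The map function and query shown on this slide did not survive in the transcript. As a guess at their general shape, and not the sample app's actual code, a leaderboard view could emit each vine's score as the index key and then be queried in descending order with a limit. The emit() below is an in-memory stand-in for the view engine, and the sample scores are taken from the leaderboard slide.

```javascript
// Stand-in for the view engine's emit(): collects index rows.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Assumed map function: index vines by score, carrying the title as the value.
function map(doc, meta) {
  if (doc.type === "vine" && doc.title) {
    emit(doc.score, doc.title);
  }
}

const documents = [
  { meta: { id: "v1" }, doc: { type: "vine", title: "Cooking w/ Hugh Fearnley-Whittingstall", score: 182 } },
  { meta: { id: "v2" }, doc: { type: "vine", title: "I NEED A HORSE!", score: 220 } },
  { meta: { id: "u1" }, doc: { type: "user", name: "Robin Johnson" } },
  { meta: { id: "v3" }, doc: { type: "vine", title: "I love doing Housework", score: 207 } },
  { meta: { id: "v4" }, doc: { type: "vine", title: "Random Access Memories", score: 164 } },
];
for (const { doc, meta } of documents) map(doc, meta);

// Query sketch: highest scores first, capped at `limit` rows
// (comparable to querying a view with descending=true and limit=N).
function leaderboard(indexRows, limit) {
  return [...indexRows].sort((a, b) => b.key - a.key).slice(0, limit);
}

const top3 = leaderboard(rows, 3);
for (const row of top3) console.log(row.key, row.value);
```

Using the score as the key means the index is already ordered the way the leaderboard needs it, so the page only has to read the first N rows from the high end.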

Page 51: Document your-world-couchbase sf-2013


Page 52: Document your-world-couchbase sf-2013

Questions?

Page 53: Document your-world-couchbase sf-2013