The Artful Business of Data Mining
Distributed Schema-less Document-Based Databases
Wednesday 27 March 13
Relational table with columns id | name | age | address; example rows include ids 1–7… and names david, divad, foo, bar, john, jack, jill…
The same table with a phone column added: id | name | age | address | phone (address values shown as country codes such as IE, US, CA, NZ, DK…)
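In the relational table, adding phone changes the schema for every row (an ALTER TABLE). In a schema-less document store, new documents simply carry the extra field and older ones stay as they were. A minimal sketch of what that means for the reading side, assuming hypothetical documents and a hypothetical `phoneOf` helper:

```javascript
// Hypothetical documents: the second was written after "phone" was introduced,
// the first predates it and was never migrated.
const people = [
  { _id: "1", name: "david", age: 26, address: "IE" },
  { _id: "2", name: "jill", age: 31, address: "US", phone: "555-0100" }
];

// Readers tolerate the missing field instead of relying on a table schema.
function phoneOf(doc, fallback = null) {
  return "phone" in doc ? doc.phone : fallback;
}

console.log(people.map((doc) => phoneOf(doc))); // [ null, '555-0100' ]
```

The trade-off is that the "schema" moves into application code: every reader has to decide what a missing field means.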
{
  "firstName": "David",
  "lastName": "Coallier",
  "age": 26,
  "address": {
    "streetAddress": "Mansfield House",
    "city": "Crosshaven"
  },
  "phoneNumbers": [
    { "type": "mobile", "number": "0863299999" }
  ]
}
Replication
Attachments
Generated “random” ids
Dictionary
Revisions?
JSON objects
HTTP CRUD
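"HTTP CRUD" means every document operation is a plain HTTP request. The sketch below only builds request descriptors (nothing is sent), using CouchDB's documented verbs: POST to the database to create with a server-generated id, GET/PUT on /db/docid to read and update, and DELETE with the revision in ?rev=. The `couchRequest` helper, the "people" database and the document values are hypothetical.

```javascript
// Hypothetical helper: build the request CouchDB expects for each CRUD
// operation. Only descriptors are built here; nothing goes over the wire.
function couchRequest(op, db, doc = {}) {
  const dbPath = "/" + encodeURIComponent(db);
  const docPath = (id) => dbPath + "/" + encodeURIComponent(id);
  switch (op) {
    case "create": // POST to the database; CouchDB generates the _id
      return { method: "POST", path: dbPath, body: doc };
    case "read":
      return { method: "GET", path: docPath(doc._id) };
    case "update": // the body must carry the current _rev, or CouchDB replies 409 Conflict
      return { method: "PUT", path: docPath(doc._id), body: doc };
    case "delete": // the revision being deleted is passed as ?rev=
      return { method: "DELETE", path: docPath(doc._id) + "?rev=" + encodeURIComponent(doc._rev) };
    default:
      throw new Error("unknown operation: " + op);
  }
}

const del = couchRequest("delete", "people", { _id: "doc-1", _rev: "2-abc" });
console.log(del.method, del.path); // DELETE /people/doc-1?rev=2-abc
```

A PUT to /db/docid also creates a document when the client picks the id itself; the POST form is the one that relies on generated ids.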
{
  "_id": "131dafsd1vasd",
  "_rev": "12-fva32asdf",
  "firstName": "David",
  "lastName": "Coallier",
  "age": 26,
  "address": {
    "streetAddress": "Mansfield House",
    "city": "Crosshaven"
  },
  "phoneNumbers": [
    { "type": "mobile", "number": "0863299999" }
  ]
}
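The _rev field is what makes lock-free updates work: a write must name the revision it was based on, and CouchDB rejects a write based on a stale revision with 409 Conflict. The toy in-memory model below illustrates that rule only; `ToyStore` is hypothetical, and the part after the dash is a counter rather than CouchDB's real revision digest.

```javascript
// Toy model of revision-checked (optimistic) updates. Not CouchDB's storage
// engine: revisions are "<generation>-<counter>" instead of a real digest.
class ToyStore {
  constructor() {
    this.docs = new Map();
    this.counter = 0;
  }

  put(doc) {
    const current = this.docs.get(doc._id);
    // An update must be based on the latest revision, otherwise it conflicts.
    if (current && current._rev !== doc._rev) {
      return { ok: false, status: 409, error: "conflict" };
    }
    const generation = current ? Number(current._rev.split("-")[0]) + 1 : 1;
    this.counter += 1;
    const rev = generation + "-" + this.counter.toString(16);
    this.docs.set(doc._id, { ...doc, _rev: rev });
    return { ok: true, id: doc._id, rev: rev };
  }

  get(id) {
    return this.docs.get(id);
  }
}

const store = new ToyStore();
const first = store.put({ _id: "david", age: 26 });                    // rev 1-1
const second = store.put({ _id: "david", _rev: first.rev, age: 27 });  // rev 2-2
const stale = store.put({ _id: "david", _rev: first.rev, age: 28 });   // rejected: 409
```

The losing client is expected to re-read the document, reapply its change and retry with the new _rev.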
                 CouchDB                  Riak
Storage model    append-only              Bitcask
Access           HTTP                     HTTP, Protocol Buffers (PB)
Retrieval        Views (M/R)              M/R, indexes, search
Versioning       Eventual consistency     Vector clocks
Concurrency      No locking               Client resolution
Replication      master/master            master/slave replication, clustering
Scaling in/out   BigCouch                 Built-in
Management       Futon/Fauxton            Riak Control

Sources: http://downloads.basho.com/papers/bitcask-intro.pdf, http://guide.couchdb.org
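Vector clocks are what let Riak detect that two writes happened without either seeing the other, and hand both versions back for client resolution. A minimal sketch of the general technique (not Riak's internal clock format), assuming a clock is a plain object mapping a hypothetical actor id to a counter:

```javascript
// Bump this actor's counter when it writes.
function increment(clock, actor) {
  return { ...clock, [actor]: (clock[actor] || 0) + 1 };
}

// Returns "before", "after", "equal" or "concurrent".
function compare(a, b) {
  let aAhead = false;
  let bAhead = false;
  for (const actor of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[actor] || 0;
    const y = b[actor] || 0;
    if (x > y) aAhead = true;
    if (y > x) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

// After the client resolves a conflict, the new clock takes the per-actor maximum.
function merge(a, b) {
  const out = { ...a };
  for (const [actor, n] of Object.entries(b)) {
    out[actor] = Math.max(out[actor] || 0, n);
  }
  return out;
}

const base = increment({}, "clientA");     // { clientA: 1 }
const fromA = increment(base, "clientA");  // { clientA: 2 }
const fromB = increment(base, "clientB");  // { clientA: 1, clientB: 1 }
console.log(compare(base, fromA));  // before
console.log(compare(fromA, fromB)); // concurrent: two siblings the client must resolve
```

Neither fromA nor fromB dominates the other, so the store cannot pick a winner on its own; that is the "client resolution" in the table.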
{ "_id": "...", "_rev": "...", "age": "26" }
{ "_id": "...", "_rev": "...", "age": "32", "heads": "3" }
{ "_id": "...", "_rev": "...", "age": "42" }
{ "_id": "...", "_rev": "...", "age": "17" }
Map: find-ages
function find_ages(doc) {
  // typeof returns a string, so compare against "undefined", not the undefined value
  if (typeof doc.age !== "undefined") {
    emit(doc._id, doc.age);
  }
}
Emitted values: 26 32 42 17
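To see the map step run outside CouchDB, the sketch below stands in for the view server with a local `emit` that just collects rows. The document ids, revisions and the fifth (age-less) document are hypothetical; the ages are the ones from the slides.

```javascript
// Minimal stand-in for CouchDB's view server: emit() collects rows.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

function find_ages(doc) {
  if (typeof doc.age !== "undefined") {
    emit(doc._id, doc.age);
  }
}

const docs = [
  { _id: "a", _rev: "1-a", age: "26" },
  { _id: "b", _rev: "1-b", age: "32", heads: "3" },
  { _id: "c", _rev: "1-c", age: "42" },
  { _id: "d", _rev: "1-d", age: "17" },
  { _id: "e", _rev: "1-e", name: "no age here" } // skipped by the map
];

docs.forEach(function (doc) { find_ages(doc); });
console.log(rows.map(function (r) { return r.value; }).join(" ")); // 26 32 42 17
```

Real CouchDB also sorts view rows by key; the hypothetical ids here are already in order, so the output order matches the slide.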
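/**
 * Our mapper function.
 * Ages are stored as strings ("26"), so coerce them to numbers first;
 * documents without an age are skipped.
 */
map: function(doc) {
  if (typeof doc.age !== "undefined") {
    var age = Number(doc.age);
    emit(null, [age, age * age]);
  }
},
/**
 * Our reducer...
 * CouchDB may call it again with rereduce = true, passing earlier results
 * as values, so it also returns the running totals [N, sum, sum of squares].
 */
reduce: function(keys, values, rereduce) {
  var N = 0;
  var summed = 0;
  var summedSquare = 0;

  for (var i = 0; i < values.length; i++) {
    if (rereduce) {
      // values[i] is an earlier [mean, sd, N, sum, sumSquare] result
      N += values[i][2];
      summed += values[i][3];
      summedSquare += values[i][4];
    } else {
      // values[i] is an emitted [age, age * age] pair
      N += 1;
      summed += values[i][0];
      summedSquare += values[i][1];
    }
  }

  var mean = summed / N;
  var standard_deviation = Math.sqrt((summedSquare / N) - (mean * mean));

  return [mean, standard_deviation, N, summed, summedSquare];
}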
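The reducer relies on the one-pass form of the (population) standard deviation: it needs only N, the sum of the ages and the sum of their squares, which is why the mapper emits each age together with its square. Worked through for the four ages on the slides:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^{2} - \mu^{2}}

% Ages 26, 32, 42, 17:
N = 4,\quad \sum x_i = 117,\quad \sum x_i^{2} = 676 + 1024 + 1764 + 289 = 3753
\mu = 117 / 4 = 29.25
\sigma = \sqrt{3753/4 - 29.25^{2}} = \sqrt{938.25 - 855.5625} = \sqrt{82.6875} \approx 9.093
```

Because N, the sum and the sum of squares simply add up across partial results, they can be carried through CouchDB's rereduce step; the mean and standard deviation themselves cannot be averaged directly. For very large values this formula can lose precision through cancellation, which is not an issue for ages.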
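/**
 * Our mapper function.
 * Coerce string ages to numbers and skip documents without an age.
 */
map: function(doc) {
  if (typeof doc.age !== "undefined") {
    var age = Number(doc.age);
    emit(null, [age, age * age]);
  }
},
/**
 * Our reducer...
 * sum() is provided by CouchDB's JavaScript view server.
 * On rereduce, values are earlier [mean, sd, N, sum, sumSquares] results,
 * so the totals come from indexes 2-4 instead of the emitted pairs.
 */
reduce: function(keys, values, rereduce) {
  // column(i) collects the i-th element of every value
  var column = function(i) { return values.map(function(v) { return v[i]; }); };
  var N = rereduce ? sum(column(2)) : values.length;
  var summed = sum(column(rereduce ? 3 : 0));
  var summedSquares = sum(column(rereduce ? 4 : 1));

  var mean = summed / N;
  var standard_deviation = Math.sqrt((summedSquares / N) - (mean * mean));

  return [mean, standard_deviation, N, summed, summedSquares];
}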
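A quick way to check that a reducer is safe for CouchDB is to compare one pass over all rows with reducing two halves and re-reducing the partial results, since CouchDB may do either. The harness below is a sketch under assumptions: `emit` and `sum` are local stand-ins for the view-server helpers, and the `view` object mirrors the refactored map/reduce pair with the rereduce handling described above.

```javascript
// Local stand-ins for the CouchDB view-server helpers.
function sum(values) {
  return values.reduce(function (acc, v) { return acc + v; }, 0);
}
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

const view = {
  map: function (doc) {
    if (typeof doc.age !== "undefined") {
      var age = Number(doc.age);
      emit(null, [age, age * age]);
    }
  },
  reduce: function (keys, values, rereduce) {
    var column = function (i) { return values.map(function (v) { return v[i]; }); };
    var N = rereduce ? sum(column(2)) : values.length;
    var summed = sum(column(rereduce ? 3 : 0));
    var summedSquares = sum(column(rereduce ? 4 : 1));
    var mean = summed / N;
    var standard_deviation = Math.sqrt((summedSquares / N) - (mean * mean));
    return [mean, standard_deviation, N, summed, summedSquares];
  }
};

[{ age: "26" }, { age: "32", heads: "3" }, { age: "42" }, { age: "17" }]
  .forEach(function (doc) { view.map(doc); });
const values = emitted.map(function (row) { return row.value; });
const keys = emitted.map(function (row) { return [row.key, null]; }); // unused by this reducer

// One pass over all rows...
const direct = view.reduce(keys, values, false);
// ...versus two partial reductions combined by a rereduce (keys is null then).
const partials = [
  view.reduce(keys.slice(0, 2), values.slice(0, 2), false),
  view.reduce(keys.slice(2), values.slice(2), false)
];
const combined = view.reduce(null, partials, true);

console.log(direct.slice(0, 2));   // mean 29.25, standard deviation about 9.093
console.log(combined.slice(0, 2)); // the same two numbers
```

Without the rereduce branch, the second call would treat [mean, sd, …] arrays as [age, age²] pairs and return a meaningless result once a view grows beyond a single reduce batch.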