Developing node-mdb
SimpleDB emulation using Node.js and GT.M
Rob Tweed
M/Gateway Developments Ltd
http://www.mgateway.com
Twitter: @rtweed
Could you translate that title?
• SimpleDB:
– Amazon’s NoSQL cloud database
• Node.js:
– Evented server-side JavaScript (using V8)
• GT.M:
– Open source, global-storage based NoSQL database
• node-mdb:
– Open source emulation of SimpleDB
SimpleDB
• Amazon’s cloud database
– Pay as you go
• Secure HTTP interface
• Schema-free NoSQL database
• Spreadsheet-like database model
– Domains (= tables)
• Items (= rows)
– Attributes (=cells)
» Values (1+ per attribute allowed)
• SQL-like query API
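As a rough illustration of that model (a sketch only, not SimpleDB's actual storage format), a domain can be pictured as a JavaScript object that maps item names to attributes, with each attribute holding a list of values. The `books` domain, its items and the `getValues()` helper are made-up names for this example.

```javascript
// Hypothetical sample data: one domain ("books") holding two items.
// Each attribute maps to an array because 1+ values per attribute are allowed.
var books = {
  item1: { title: ['Dune'], format: ['hardback', 'ebook'] },
  item2: { title: ['Emma'] }  // schema-free: this item has no "format" attribute
};

// Return all values of one attribute of one item (empty array if absent)
function getValues(domain, itemName, attributeName) {
  var item = domain[itemName] || {};
  return item[attributeName] || [];
}

console.log(getValues(books, 'item1', 'format'));  // two values in one "cell"
console.log(getValues(books, 'item2', 'format'));  // attribute not present
```

Nothing has to be declared in advance: an item gains an attribute simply by having a value stored against it.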
Why emulate SimpleDB?
• Because I could!
• Kind of cool project
Why emulate SimpleDB?
• To provide a free, locally available database that behaves identically to SimpleDB
– Lots of off-the-shelf clients available
• Standalone
– Bolso
– Mindscape’s SimpleDB Management Tools
• Language-specific clients
– boto (Python)
– Official AWS clients for Java, .Net
– Node.js
– etc…
Why emulate SimpleDB?
• To perform local tests prior to committing to production on SimpleDB
• To provide a live, local backup database
• A SimpleDB database for private clouds
• To provide an immediately-consistent SimpleDB database
– SimpleDB is “eventually consistent”
Why the GT.M database?
• I’m familiar with it
• Free Open Source NoSQL database
• Schema-free
• “Globals”:
– Sparse persistent multi-dimensional arrays
• Hierarchical database
• Completely dynamic storage
– No pre-declaration or specification needed
• Result: trivial to model SimpleDB in globals
• node-mdb: Good way to demonstrate the capabilities of the otherwise little-known GT.M
• More info: Google
– “GT.M database”
– “universalnosql”
Why write it using Node.js?
• M/DB originally written in late 2008
– Implemented using GT.M’s native scripting language (M)
– Apache + m_apache gateway to GT.M for HTTP interface
• I’ve been working with Node.js for about a year now
– Rewriting M/DB in JavaScript would make it more widely interesting and comprehensible
• Some performance issues reported with M/DB when being pushed hard
Why Node.js?
• Conclusion:
– Re-implementing M/DB using Node.js should provide better performance and scalability
– Fewer moving parts:
• Apache + m_apache + GT.M / multi-threaded
• Node.js + GT.M as child processes / single-thread
– Cool Node.js project to attempt
– Great example of non-trivial use of Node.js + database
How does SimpleDB work?
[Diagram: an incoming SDB HTTP request passes through the HTTP server, request authentication (HMacSHA, using the Security Key Id and Secret Key), API action execution and HTTP response generation; the outgoing SDB HTTP response carries an error, or success and/or data/results. Behind this sit multiple copies (1 to n) of the SimpleDB database.]
Node.js can emulate all this
[Diagram: the SimpleDB request pipeline again: HTTP server, request authentication (HMacSHA, Security Key Id and Secret Key), API action execution and HTTP response generation, in front of SimpleDB database copies 1 to n.]
GT.M can emulate this
[Diagram: the request pipeline (HTTP server, authenticate request, execute API action, generate HTTP response) with a single database behind it, SimpleDB Database Copy 1.]
Node.js characteristics
• Single threaded process
• Event loop
• Non-blocking I/O
– Asynchronous calls to functions that handle I/O
– Event-driven call-back functions when function
completes
• Data fetched
• Data saved
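A minimal sketch of what non-blocking means in practice. `fetchData()` is a made-up stand-in for an I/O call; it uses `setImmediate()` to hand its call-back to the event loop rather than touching a real database.

```javascript
var order = [];

// Hypothetical stand-in for an I/O call such as a database fetch:
// setImmediate() queues the call-back on the event loop
function fetchData(callback) {
  setImmediate(function () {
    callback(null, 'some data');
  });
}

order.push('before fetch');
fetchData(function (error, data) {
  order.push('data fetched');
  console.log(order.join(' -> '));
});
order.push('after fetch');  // runs before the call-back fires
```

The single thread never waits for the fetch: the code after the call runs first, and the call-back fires only once the current code has finished.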
Result: deeply nested call-backs
[Diagram: HTTP server → authenticate request (Security Key Id, Secret Key) → execute API action → generate HTTP response (error, or success and/or data/results).]
Flattening the call-back nesting
[Diagram: http server → processSDBRequest() → executeAPI() → sendResponse()]
http.createServer(function(req, res) {…});
var processSDBRequest = function() {…};
var executeAPI = function() {…};
Node.js HTTP Server
var http = require("http");
var url = require("url");

http.createServer(function(request, response) {
  // Accumulate the request body as it arrives
  request.content = '';
  request.on("data", function(chunk) {
    request.content += chunk;
  });
  request.on("end", function() {
    var SDB = {startTime: new Date().getTime(), request: request, response: response};
    var urlObj = url.parse(request.url, true);
    // Name/value pairs come from the body (POST) or the query string (GET)
    if (request.method === 'POST') {
      SDB.nvps = parseContent(request.content);
    }
    else {
      SDB.nvps = urlObj.query;
    }
    var uri = urlObj.pathname;
    if ((uri.indexOf(sdbURLPattern) !== -1) || (uri.indexOf(mdbURLPattern) !== -1)) {
      processSDBRequest(SDB);
    }
    else {
      var uriString = 'http://' + request.headers.host + request.url;
      var error = {code: 'InvalidURI', message: 'The URI ' + uriString + ' is not valid', status: 400};
      returnError(SDB, error);
    }
  });
}).listen(httpPort);
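The slide does not show `parseContent()`. One plausible way to write it, assuming the POST body is ordinary form-encoded name/value pairs, is to lean on Node's built-in `querystring` module. This is a guess at its behaviour, not the node-mdb source.

```javascript
var querystring = require('querystring');

// Hypothetical stand-in for parseContent(): decode a form-encoded POST body
// into an object of name/value pairs
function parseContent(content) {
  return querystring.parse(content);
}

var nvps = parseContent('Action=ListDomains&MaxNumberOfDomains=100&Version=2009-04-15');
console.log(nvps.Action, nvps.MaxNumberOfDomains, nvps.Version);
```

All values come back as strings, so a numeric parameter such as MaxNumberOfDomains would need converting before use.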
processSDBRequest()
var processSDBRequest = function(SDB) {
  var authError = {code: 'AuthMissingFailure', message: 'AWS was not able to authenticate the request: access credentials are missing', status: 403};
  var accessKeyId = SDB.nvps.AWSAccessKeyId;
  if (!accessKeyId) {
    returnError(SDB, authError);
  }
  else {
    // Look up the secret key stored against this access key id
    MDB.getGlobal('MDBUAF', ['keys', accessKeyId], function (error, results) {
      if (!error) {
        if (results.value !== '') {
          accessKey[accessKeyId] = results.value;
          validateSDBRequest(SDB, results.value);
        }
        else {
          returnError(SDB, authError);
        }
      }
    });
  }
};
validateSDBRequest()
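var validateSDBRequest = function(SDB, secretKey) {
  // crypto.createHmac() expects digest names such as 'sha1' or 'sha256',
  // so map the request's SignatureMethod (HmacSHA1 or HmacSHA256) onto one
  var type = (SDB.nvps.SignatureMethod === 'HmacSHA1') ? 'sha1' : 'sha256';
  var stringToSign = createStringToSign(SDB, true);
  var hash = digest(stringToSign, secretKey, type);
  if (hash === SDB.nvps.Signature) {
    processSDBAction(SDB);
  }
  else {
    errorResponse('SignatureDoesNotMatch', SDB);
  }
};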
stringToSign()
POST{lf}
192.168.1.134:8081{lf}
/{lf}
AWSAccessKeyId=rob&Action=ListDomains&MaxNumberOfDomains=100&SignatureMethod=HmacSHA1&SignatureVersion=2&Timestamp=2011-06-06T22%3A39%3A30%2B00%3A00&Version=2009-04-15
i.e. reconstruct the same string that the SDB client used to sign the request,
then use rob’s secret key to sign it:
digest()
var crypto = require("crypto");
var digest = function(string, secretKey, type) {
var hmac = crypto.createHmac(type, secretKey);
hmac.update(string);
return hmac.digest('base64');
};
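Putting the two steps together, here is a sketch of rebuilding the string to sign and signing it. It assumes the Signature Version 2 layout shown on the stringToSign() slide; `rfc3986()` and `buildStringToSign()` are hypothetical helpers (not the node-mdb `createStringToSign()`), and the secret key is made up.

```javascript
var crypto = require('crypto');

// Percent-encode per RFC 3986: encodeURIComponent() leaves ! ' ( ) * alone,
// so those are encoded by hand
function rfc3986(value) {
  return encodeURIComponent(value).replace(/[!'()*]/g, function (c) {
    return '%' + c.charCodeAt(0).toString(16).toUpperCase();
  });
}

// Hypothetical helper: method, host and path on separate lines, followed by
// the name/value pairs sorted by name (the Signature itself is left out)
function buildStringToSign(method, host, path, nvps) {
  var pairs = Object.keys(nvps).filter(function (name) {
    return name !== 'Signature';
  }).sort().map(function (name) {
    return rfc3986(name) + '=' + rfc3986(nvps[name]);
  });
  return [method, host.toLowerCase(), path, pairs.join('&')].join('\n');
}

var nvps = {
  Action: 'ListDomains',
  AWSAccessKeyId: 'rob',
  MaxNumberOfDomains: '100',
  SignatureMethod: 'HmacSHA1',
  SignatureVersion: '2',
  Timestamp: '2011-06-06T22:39:30+00:00',
  Version: '2009-04-15'
};

var stringToSign = buildStringToSign('POST', '192.168.1.134:8081', '/', nvps);
// 'sha1' is Node's digest name for the HmacSHA1 signature method
var signature = crypto.createHmac('sha1', 'not-a-real-secret')
  .update(stringToSign)
  .digest('base64');

console.log(stringToSign);
console.log(signature);
```

The server accepts the request only if this value equals the Signature parameter the client sent, which can only happen when both sides hold the same secret key.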
Ready to execute an API!
[Diagram: the request pipeline (HTTP server, authenticate request, execute API action, generate HTTP response) in front of SimpleDB database copies 1 to n.]
SimpleDB APIs (Actions)
• CreateDomain
• ListDomains
• DeleteDomain
• PutAttributes (BatchPutAttributes)
• GetAttributes
• DeleteAttributes (BatchDeleteAttributes)
• Select
• DomainMetadata
Accessing the GT.M Database
• Accessed via node-mwire
– TCP-based wire protocol
– Extension of Redis protocol
– Adapted redis-node module
• APIs allow you to set/get/delete/edit Globals
GT.M Globals
• Globals = unit of persistent storage
– Schema-free
– Hierarchically structured
– Sparse
– Dynamic
– “persistent associative array”
GT.M Globals
• A Global has:
– A name
– 0, 1 or more subscripts
– String value
globalName[subscript1,subscript2,..subscriptn]=value
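To make the name/subscripts/value shape concrete, here is an in-memory sketch. It is an illustration only: real globals are persistent and hierarchically ordered, and `setNode()`/`getNode()` are invented names, not the node-mwire API. Returning an empty string for an unset node is an assumption made for this example.

```javascript
// In-memory sketch only. Each node is stored under its global name plus
// subscripts, serialised into a single JSON key.
var store = new Map();

function nodeKey(globalName, subscripts) {
  // Subscripts are normalised to strings so that 1 and '1' address the same node
  return JSON.stringify([globalName].concat(subscripts.map(String)));
}

function setNode(globalName, subscripts, value) {
  store.set(nodeKey(globalName, subscripts), String(value));  // values are strings
}

function getNode(globalName, subscripts) {
  var key = nodeKey(globalName, subscripts);
  return store.has(key) ? store.get(key) : '';  // assumption: unset nodes read as ''
}

// No declaration needed: a node exists as soon as it is set
setNode('MDB', ['rob', 'domains', 1, 'name'], 'books');
setNode('MDB', ['rob', 'domainIndex', 'books', 1], '');

console.log(getNode('MDB', ['rob', 'domains', 1, 'name']));
console.log(getNode('MDB', ['rob', 'domains', 99, 'name']));
```

Only the nodes that were set take up any space, which is what "sparse" means here.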
SDB Domain in Globals
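CreateDomain
AWSAccessKeyId = 'rob'
DomainName = 'books'
MDB['rob','domains',1,'name']='books'
MDB['rob','domains',1,'created']=1304956337618
MDB['rob','domains',1,'modified']=1304956337618
MDB['rob','domainIndex','books',1]=''
MDB['rob','domains',2,'name']='accounts'
MDB['rob','domains',2,'created']=1304956337423
MDB['rob','domains',2,'modified']=1304956337423
MDB['rob','domainIndex','accounts',2]=''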
Multiple Domains in Globals
MDB['rob','domains',1,'name']='books'
MDB['rob','domains',1,'created']=1304956337618
MDB['rob','domains',1,'modified']=1304956337618
MDB['rob','domainIndex','books',1]=''
Next domain id: 2
Creating a new domain (1)
increment()
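MDB['rob','domains',1,'name']='books'
MDB['rob','domains',1,'created']=1304956337618
MDB['rob','domains',1,'modified']=1304956337618
MDB['rob','domainIndex','books',1]=''
MDB['rob','domains',2,'name']='accounts'
MDB['rob','domains',2,'created']=1304956337423
MDB['rob','domains',2,'modified']=1304956337423
MDB['rob','domainIndex','accounts',2]=''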
Creating a new domain (2)
setGlobal()
Key Node.js async patterns for db I/O
• Dependent pattern:
– Can’t set the global nodes until the value of
the increment() is returned
• Parallel pattern:
– Global nodes can be created in parallel
– No interdependence
– BUT:
• Need to know when they’re all completed
MDB['rob','domains',1,'name']='books'
MDB['rob','domains',1,'created']=1304956337618
MDB['rob','domains',1,'modified']=1304956337618
Next domain id: 2
Dependent pattern
MDB.increment([accessKeyId, 'domains'], 1, function (error, results) {
var id = results.value;
//….now create the other global nodes inside callback
});
IncrBy
Parallel Pattern (semaphore)
var count = 0;
MDB.setGlobal([accessKeyId, 'domains', id, 'name'], domainName, function (error, results) {
count++;
if (count === 4) sendCreateDomainResponse(count, SDB);
});
MDB.setGlobal([accessKeyId, 'domains', id, 'created'], now, function (error, results) {
count++;
if (count === 4) sendCreateDomainResponse(count, SDB);
});
MDB.setGlobal([accessKeyId, 'domains', id, 'modified'], now, function (error, results) {
count++;
if (count === 4) sendCreateDomainResponse(count, SDB);
});
MDB.setGlobal([accessKeyId, 'domainIndex', nameIndex, id], '', function (error, results) {
count++;
if (count === 4) sendCreateDomainResponse(count, SDB);
});
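The repeated count-and-check lines can be folded into one small helper. The sketch below is one way to do it, not the node-mdb code: `whenAllDone()` is a made-up name, and the four `next()` calls stand in for the call-backs of the four `MDB.setGlobal()` calls.

```javascript
// Hypothetical helper: returns a call-back that fires done() once it has been
// invoked `expected` times, or straight away on the first error
function whenAllDone(expected, done) {
  var count = 0;
  var finished = false;
  return function (error) {
    if (finished) return;
    if (error) {
      finished = true;
      return done(error);
    }
    count++;
    if (count === expected) {
      finished = true;
      done(null);
    }
  };
}

// Usage with stand-in completions; each next() represents one setGlobal() finishing
var log = [];
var next = whenAllDone(4, function (error) {
  log.push(error ? 'failed' : 'all 4 nodes set');
});
next();
next();
next();
log.push('3 of 4 complete');
next();
console.log(log);
```

Passing `next` as the call-back to each of the parallel calls keeps the semaphore in one place and also stops the response being sent twice if something fails.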
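MDB['rob','domains',1,'name']='books'
MDB['rob','domains',1,'created']=1304956337618
MDB['rob','domains',1,'modified']=1304956337618
MDB['rob','domainIndex','books',1]=''
MDB['rob','domains',2,'name']='accounts'
MDB['rob','domains',2,'created']=1304956337423
MDB['rob','domains',2,'modified']=1304956337423
MDB['rob','domainIndex','accounts',2]=''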
New domain nodes created
Send CreateDomain Response
[Diagram: the request pipeline (HTTP server, authenticate request, execute API action, generate HTTP response) in front of SimpleDB database copies 1 to n; the outgoing SDB HTTP response carries an error, or success and/or data/results.]
CreateDomain Response
<?xml version="1.0"?>
<CreateDomainResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/">
<ResponseMetadata>
<RequestID>e4e9fa45-f9dc-4e5b-8f0a-777acce6505e</RequestID>
<BoxUsage>0.0020000000</BoxUsage>
</ResponseMetadata>
</CreateDomainResponse>
var okResponse = function(SDB) {
var nvps = SDB.nvps;
var xml = responseStart({action: nvps.Action, version: nvps.Version});
xml = xml + responseEnd(nvps.Action, SDB.startTime, false);
responseHeader(200, SDB.response);
SDB.response.write(xml);
SDB.response.end();
};
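`responseStart()` and `responseEnd()` are not shown on the slides. As a rough idea of what they have to produce between them, the sketch below assembles the same XML as the CreateDomain response above in a single hypothetical function, `buildEmptyResponse()`; the request id and box usage are simply the values from that sample.

```javascript
// Hypothetical helper: build the XML for an action that returns no data,
// following the layout of the sample CreateDomain response
function buildEmptyResponse(action, version, requestId, boxUsage) {
  return '<?xml version="1.0"?>\n' +
    '<' + action + 'Response xmlns="http://sdb.amazonaws.com/doc/' + version + '/">\n' +
    '<ResponseMetadata>\n' +
    '<RequestID>' + requestId + '</RequestID>\n' +
    '<BoxUsage>' + boxUsage + '</BoxUsage>\n' +
    '</ResponseMetadata>\n' +
    '</' + action + 'Response>';
}

var xml = buildEmptyResponse(
  'CreateDomain',
  '2009-04-15',
  'e4e9fa45-f9dc-4e5b-8f0a-777acce6505e',
  '0.0020000000'
);
console.log(xml);
```

Because the element name and namespace are derived from the Action and Version parameters, the same function would serve any action whose response carries only metadata.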
Node.js HTTP Server Response
http.createServer(function(request, response) {
//…numerous call-backs deep:
response.writeHead(status, {
"Server": "Amazon SimpleDB",
"Content-Type": "text/xml",
"Date": dateNow.toUTCString()});
response.write('<?xml version="1.0"?>\n');
response.write(xml);
response.end();
});
Entire request/response SDB round-trip completed
Demo using Bolso
• List Domains
• Create Domain
• Add an item (row) and some attributes (columns + cells)
Node.js Gotchas
• Async programming is not immediately intuitive!
• Loops
– Calling functions that use call-backs inside a for..in loop will go horribly wrong: the loop runs to completion before any call-back fires
• Understanding closures
– How externally-defined variables can be used inside call-back functions
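A small demonstration of the loop gotcha, using a plain `for` loop over an array rather than for..in. `later()` is a made-up stand-in for any function that takes a call-back: it just queues the call-back so that it runs after the loop has finished, as a real async call-back would.

```javascript
// Stand-in for an async call: queue the call-backs and run them after the loop
var queued = [];
function later(callback) {
  queued.push(callback);
}

var names = ['books', 'accounts', 'orders'];
var seenWithVar = [];
var seenWithClosure = [];

for (var i = 0; i < names.length; i++) {
  // Every call-back shares the same i, which is 3 by the time any of them runs
  later(function () { seenWithVar.push(names[i]); });
}

names.forEach(function (name) {
  // Each iteration is its own function call, so each call-back closes over its own name
  later(function () { seenWithClosure.push(name); });
});

queued.forEach(function (callback) { callback(); });
console.log(seenWithVar);      // three undefined entries
console.log(seenWithClosure);  // the three names, in order
```

The fix is the closure: wrapping the loop body in a function gives each call-back its own copy of the value it needs.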
Example
• BatchPutAttributes
– Intuitively a for .. in loop around PutAttributes
– Had to be serialised
• Completion of one PutAttributes calls the next
– Copy state of SDB object and use for..in?
• var SDBx = SDB;
• SDBx is a pointer to SDB, not a clone of it!
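Both points can be sketched in a few lines. `putSerially()` is a hypothetical helper, not the node-mdb implementation, and the `putOne` function passed to it is a stand-in for a real PutAttributes; `Object.assign()` is shown as one way to get a (shallow) copy instead of a second reference.

```javascript
// Hypothetical helper: call putOne(item, callback) for each item in turn,
// starting the next one only when the previous call-back has fired
function putSerially(items, putOne, done) {
  var index = 0;
  function next(error) {
    if (error || index === items.length) return done(error || null);
    putOne(items[index++], next);
  }
  next();
}

// Assignment copies the reference; Object.assign() makes a shallow copy
// (nested objects such as nvps are still shared)
var SDB = { nvps: { Action: 'BatchPutAttributes' }, itemNo: 0 };
var alias = SDB;
var copy = Object.assign({}, SDB);
alias.itemNo = 1;  // also changes SDB.itemNo
copy.itemNo = 2;   // SDB.itemNo is unaffected

var written = [];
putSerially(['item1', 'item2', 'item3'], function (item, callback) {
  written.push(item);      // stand-in for a real PutAttributes
  setImmediate(callback);  // complete asynchronously
}, function (error) {
  console.log(error || written);
});
```

Because each put starts from the previous put's call-back, the items are written strictly in order and the final call-back fires exactly once.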
Conclusions
• node-mdb is now nearly complete
• Only BatchDeleteAttributes not implemented
• Other APIs emulate SimpleDB 100%
• Free Open Source
– https://github.com/robtweed/node-mdb
– Give it a try!
– Use mdb.js for examples to build your own Node.js database applications
• Check out GT.M!
• Follow me on Twitter at @rtweed
• Slides: http://www.mgateway.com/node-mdb-pres.html