Databases and Software Engineering
Protobase: It's About Time for Backend/Database Co-Design
Marcus Pinnecke, Gabriel Durand Campero, Roman Zoun, David Broneske, Gunter Saake
18th Conference "Datenbanksysteme für Business, Technologie und Web" (BTW; Database Systems for Business, Technology and Web)
[Figure: Backend logic connected via REST to a 3rd-party tool, an in-house tool, a remote service, a database system interface, and a remote database system with its database.]
Running Example: Prototype a Scholarly Search
Assume you are a small (research) team with awesome ideas for analytic features of a new scholarly search engine.
What you don’t have:
• A prototype at hand that attracts users and investors
• A lot of money or a lot of time
• A large-scale cloud platform of more than a handful of machines
Perhaps you only have a single high-end machine at hand.
The Protobase System
[Figure: Protobase Architecture Sketch. A REST API client talks to a Protobase instance (microservice). Components shown: user-defined backend code, control panel, …, runtime engine (incl. HTTP server and Python interpreter), query engine, storage engine (NG5), string compressor framework, and parallelization framework. Storage format: Columnar Binary-Encoded JSON (CARBON) files; data model: document data model.]
• Open-source analytic NoSQL main-memory document store to prototype micro-webservices
• Instantiated with user-defined code
• Hides itself behind a user-defined REST API
• Direct access to storage engine and query engine (i.e., no query language)
  • No inter-process communication overhead
  • Less data marshaling
  • Fewer systems involved to get things done
• Extends the engine with custom logic
• Optimization possible, though
• Philosophy: an instance is a micro-webservice
• Operates on CARBON files
• Parallelization across components
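The bullets above describe backend code that runs inside the instance and calls the store directly instead of sending query strings to a separate database server. Below is a minimal Python sketch of that idea, assuming a toy in-memory store. `DocumentStore`, `find_by_title`, `ROUTES`, `dispatch`, and `serve` are hypothetical names for illustration, not Protobase's API, and the standard-library `http.server` stands in for Protobase's own HTTP server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


class DocumentStore:
    """Hypothetical in-memory document store standing in for the storage engine."""

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        self._docs.append(doc)

    def scan(self, predicate):
        # Direct access: a Python predicate instead of a query-language string.
        return [doc for doc in self._docs if predicate(doc)]


STORE = DocumentStore()
STORE.insert({"title": "Example Paper A", "year": 2018})  # hypothetical sample data
STORE.insert({"title": "Example Paper B", "year": 2019})


def find_by_title(params):
    # User-defined backend code: one REST route mapped to one store call.
    title = params.get("title", [""])[0]
    return STORE.scan(lambda doc: doc["title"] == title)


ROUTES = {"/papers": find_by_title}


def dispatch(url):
    """Return (status, JSON body) for a GET request path such as '/papers?title=...'."""
    parsed = urlparse(url)
    handler = ROUTES.get(parsed.path)
    if handler is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(handler(parse_qs(parsed.query)))


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = dispatch(self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="127.0.0.1", port=8080):
    # Blocks forever; the whole instance is one micro-webservice process.
    HTTPServer((host, port), Handler).serve_forever()
```

Calling `serve()` blocks and answers requests such as `GET /papers?title=Example%20Paper%20A`. Because the route handler is an ordinary Python function, there is no query parsing and no marshaling between processes; the trade-off is that the backend code is tied to the store's in-process interface.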
Protobase Architecture
[Figure: Multiple storage engine (NG5) components side by side.]
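One way to picture parallelization across components is a partitioned scan: each partition (standing in for one storage engine) computes a local aggregate, and the partial results are merged afterwards. The sketch assumes documents are dicts with a `year` field and uses Python's `concurrent.futures`; the partitioning and the function names are illustrative assumptions, not Protobase's scheduler.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_years(partition):
    # Work on one partition only: builds a local aggregate, no shared state.
    return Counter(doc["year"] for doc in partition)


def parallel_year_histogram(partitions, max_workers=4):
    """Count documents per year, one task per partition, then merge."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(count_years, partitions):
            total.update(partial)  # merge step runs in the calling thread
    return total


# Hypothetical partitions, e.g., one per storage engine.
partitions = [
    [{"year": 2018}, {"year": 2019}],
    [{"year": 2019}],
    [],
]
histogram = parallel_year_histogram(partitions)
```

In CPython, threads help mainly with I/O-bound work because of the global interpreter lock; for CPU-bound scans, swapping in `ProcessPoolExecutor` is the usual choice, since `count_years` is a top-level function and can be pickled.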
{ "words": [ "hello", "world" ],
  "langs": [ { "name": "C", "code": "main() { printf(\"Hello, world!\"); }" },
             { "name": "C++", "code": "int main() { std::cout << \"Hello, …" } ] }
[Figure: Layout of the JSON document above as JSON (plain text), as state-of-the-art key-value-pair storage (binary, e.g., UBJSON; operational-optimized), and as CARBON (columnar storage, binary; analytic-optimized). CARBON splits the data into a string table (cached on demand) and a record table (memory resident) holding fixed-sized string references and non-string values. Legend: key, value (string), value (non-string), string reference (fixed-sized).]
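The core idea of the CARBON layout (intern keys and string values once in a string table, then store fixed-size references column by column) can be sketched in a few lines of Python. This is a simplification assuming flat records; it ignores CARBON's binary encoding, nested objects, and type tags, and `columnify` is a hypothetical helper.

```python
def columnify(records):
    """Split flat records into a string table and per-key columns.

    Keys and string values are interned once and replaced by integer ids
    (the fixed-size string references); non-string values are stored as-is.
    """
    ids = {}  # string -> id

    def string_id(text):
        if text not in ids:
            ids[text] = len(ids)
        return ids[text]

    columns = {}  # key id -> one slot per record
    for row, record in enumerate(records):
        for key, value in record.items():
            column = columns.setdefault(string_id(key), [None] * len(records))
            column[row] = string_id(value) if isinstance(value, str) else value
    string_table = {i: text for text, i in ids.items()}
    return string_table, columns


# Hypothetical records; "stars" is an illustrative non-string field.
langs = [{"name": "C", "stars": 5}, {"name": "C++", "stars": 7}]
string_table, columns = columnify(langs)
print(string_table)  # {0: 'name', 1: 'C', 2: 'stars', 3: 'C++'}
print(columns)       # {0: [1, 3], 2: [5, 7]}
```

A record's position is its index in every column, and `None` marks a missing property. An analytic scan over one key then touches a single contiguous column instead of walking every record's key-value pairs.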
archive ::= archive-header string-table record-header carbon-object
archive-header ::= magic-word version-string record-offset
record-header ::= 'r' record-header-flags record-size
string-table ::= 'D' num-strings table-flags ( no-compressor | huffman-compressor )
no-compressor ::= ('-' string-length string-id character+)+
huffman-compressor ::= huffman-dictionary huffman-string+
huffman-dictionary ::= 'd' character prefix-length prefix-code+
huffman-string ::= '-' string-id string-length data-length byte+
carbon-object ::= '{' object-id object-flags property-offset+ next-object-offset columnified-props+ '}'
columnified-props ::= null-prop | nullable-prop | null-array-prop | nullable-array-prop | object-array-prop
null-prop ::= 'n' column-length key-column
nullable-prop ::= ( 'b' | number-type | 't' | 'o' ) column-length key-column offset-column? value-column
number-type ::= unsigned-number | signed-number | 'f'
unsigned-number ::= 'r' | 'h' | 'e' | 'g'
signed-number ::= 'c' | 's' | 'i' | 'l'
null-array-prop ::= 'N' column-length key-column length-column
nullable-array-prop ::= ( 'B' | number-array-type | 'T' ) column-length key-column length-column value-column+
number-array-type ::= unsigned-number-array | signed-number-array | 'F'
unsigned-number-array ::= 'R' | 'H' | 'E' | 'G'
signed-number-array ::= 'C' | 'S' | 'I' | 'L'
object-array-prop ::= 'O' column-length key-column offset-column column-groups+
column-groups ::= 'X' column-count object-count object-id-column offset-column column+
column ::= 'x' column-name ( null-column | nullable-column | object-column )
null-column ::= 'N' column-length offset-column value-column
nullable-column ::= ( 'B' | number-array-type | 'T' ) column-length offset-column positioning-column ( column-length value-column )+
object-column ::= 'o' column-length offset-column positioning-column ( column-length carbon-object )+
column-name ::= string-id
CARBON Files
CARBON Grammar
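The `string-table` and `no-compressor` productions of the grammar can be turned into a small encoder/decoder. The grammar does not fix field widths, so this Python sketch assumes little-endian `uint32` for `num-strings` and `string-length`, `uint8` for `table-flags`, `uint64` for `string-id`, and UTF-8 for `character+`. These widths are assumptions; the actual libcarbon layout may differ, and the Huffman variant is not covered.

```python
import struct

# Assumed field widths (the grammar leaves them open), all little-endian:
# 'D' marker, uint32 num-strings, uint8 table-flags.
HEADER = struct.Struct("<cIB")
# '-' marker, uint32 string-length, uint64 string-id, then the UTF-8 bytes.
ENTRY = struct.Struct("<cIQ")


def encode_string_table(strings, flags=0):
    """Encode {string_id: text} as an uncompressed ('no-compressor') string table."""
    out = bytearray(HEADER.pack(b"D", len(strings), flags))
    for string_id, text in strings.items():
        data = text.encode("utf-8")
        if not data:
            raise ValueError("grammar requires at least one character per string")
        out += ENTRY.pack(b"-", len(data), string_id)
        out += data
    return bytes(out)


def decode_string_table(buf, offset=0):
    """Decode a no-compressor string table; return (strings, flags, end offset)."""
    marker, count, flags = HEADER.unpack_from(buf, offset)
    if marker != b"D":
        raise ValueError("not a string table")
    pos = offset + HEADER.size
    strings = {}
    for _ in range(count):
        dash, length, string_id = ENTRY.unpack_from(buf, pos)
        if dash != b"-":
            raise ValueError(f"bad entry marker at offset {pos}")
        pos += ENTRY.size
        strings[string_id] = buf[pos:pos + length].decode("utf-8")
        pos += length
    return strings, flags, pos


blob = encode_string_table({0: "hello", 1: "world"})
print(len(blob))                  # 42 = 6-byte header + 2 * (13-byte entry + 5 bytes)
print(decode_string_table(blob))  # ({0: 'hello', 1: 'world'}, 0, 42)
```

Returning the end offset lets a reader continue with the `record-header` that follows the string table in an `archive`.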
Check it out!
• Microsoft Academic Graph Analytics
• Libcarbon
• Protobase
[Figure: Memory Requirement Comparison (on Microsoft Academic Graph, excerpt). Bar chart of space (MiB) for JSON (plain text), UBJSON (binary), BSON (binary), MessagePack (binary), and CARBON (binary); CARBON is shown as a memory-resident record table (543.8 MiB) and a cached string table (1376.0 MiB).]
PAN Queries
(1) Display publication information given a paper title.
(2) List publications within a certain time span given an author’s name.
(3) List publications via the is-cited-by relationship.
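The three PAN queries can be written as plain functions over an in-memory collection, in the spirit of direct store access without a query language. The paper schema (`id`, `title`, `year`, `authors`, `references`) and the sample data are hypothetical, not the Microsoft Academic Graph schema. Query (3) is read here as "papers that cite a given paper" and is answered with a reverse index from cited paper to citing papers.

```python
from collections import defaultdict

# Hypothetical schema and sample data (not the Microsoft Academic Graph schema):
# "references" lists the ids of the papers that a paper cites.
PAPERS = [
    {"id": 1, "title": "Paper A", "year": 2015, "authors": ["Ada"], "references": []},
    {"id": 2, "title": "Paper B", "year": 2017, "authors": ["Ada", "Bob"], "references": [1]},
    {"id": 3, "title": "Paper C", "year": 2019, "authors": ["Bob"], "references": [1, 2]},
]


def by_title(papers, title):
    """Query (1): publication information given a paper title."""
    return [p for p in papers if p["title"] == title]


def by_author_in_span(papers, author, first_year, last_year):
    """Query (2): an author's publications with first_year <= year <= last_year."""
    return [p for p in papers
            if author in p["authors"] and first_year <= p["year"] <= last_year]


def cited_by_index(papers):
    """Reverse the references: paper id -> ids of the papers citing it."""
    index = defaultdict(list)
    for p in papers:
        for ref in p["references"]:
            index[ref].append(p["id"])
    return index


def citing_papers(papers, paper_id, index=None):
    """Query (3): publications related to paper_id via is-cited-by."""
    if index is None:
        index = cited_by_index(papers)
    by_id = {p["id"]: p for p in papers}
    return [by_id[i] for i in index.get(paper_id, [])]


print([p["id"] for p in by_title(PAPERS, "Paper B")])                   # [2]
print([p["id"] for p in by_author_in_span(PAPERS, "Bob", 2016, 2018)])  # [2]
print([p["id"] for p in citing_papers(PAPERS, 1)])                      # [2, 3]
```

Building `cited_by_index` once and passing it to `citing_papers` avoids rescanning the collection for every is-cited-by lookup.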