

Databases and Software Engineering

Protobase: It's About Time for Backend/Database Co-Design

Marcus Pinnecke, Gabriel Durand Campero, Roman Zoun, David Broneske, Gunter Saake

18. Fachtagung für "Datenbanksysteme für Business, Technologie und Web"

[Figure: Backend Logic and the components it communicates with. Diagram labels: 3rd Party Tool, In-House Tool, Remote Service, Database System, Remote Database System, Database. Connections are labeled REST and Interface.]

Running Example: Prototype a Scholarly Search

Assume you are a small (research) team with awesome ideas for analytic features for a new scholarly search engine.

What you don’t have:

• A prototype at hand that attracts users and investors
• A lot of money or a lot of time
• A large-scale cloud platform of more than a handful of machines

Perhaps you only have a single high-end machine at hand.

Protobase Architecture Sketch

[Figure: The Protobase System. Diagram labels: Protobase Instance (Microservice), REST API Client, User-Def. Backend Code, Runtime Engine (incl. HTTP server and Python interpreter), Query Engine, Storage Engine (NG5), Parallelization Framework, Storage Format, Document Data Model, Columnar Binary-Encoded JSON (CARBON), String Compressor Framework, Control Panel, CARBON Files.]

Protobase Architecture

• Open source analytic NoSQL main-memory document store to prototype micro-webservices
• Instantiated with user-defined code
• Hides itself behind a user-defined REST API
• Direct access to storage engine and query engine (i.e., no query language)
• No inter-process communication overhead
• Less data marshaling
• Fewer systems involved to get things done
• Extends engine with custom logic
• Optimization possible, though

• Philosophy: an instance is a micro-webservice
• Operates on CARBON files
• Parallelization across components
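The idea that user-defined backend code sits in the same process as the store and is exposed only through a user-defined REST API can be illustrated with a minimal sketch. This is not Protobase's API: the names `DOCUMENTS`, `papers_in_year`, `ROUTES`, `dispatch`, and `make_server`, the route `/papers`, and the document fields are all hypothetical, and a plain Python list stands in for the storage engine.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Stand-in for the storage engine: documents held in main memory.
# (Hypothetical data; a real instance would operate on CARBON files.)
DOCUMENTS = [
    {"title": "Paper A", "year": 2017},
    {"title": "Paper B", "year": 2018},
    {"title": "Paper C", "year": 2018},
]


def papers_in_year(documents, year):
    """User-defined backend code: reads the documents directly, no query language."""
    wanted = int(year)  # query parameters arrive as strings
    return [doc["title"] for doc in documents if doc["year"] == wanted]


# The user-defined REST API: path -> in-process function.
ROUTES = {"/papers": papers_in_year}


def dispatch(url):
    """Map a request URL to a (status, payload) pair without leaving the process."""
    parsed = urlparse(url)
    handler = ROUTES.get(parsed.path)
    if handler is None:
        return 404, {"error": "unknown endpoint"}
    params = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    return 200, handler(DOCUMENTS, **params)


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, payload = dispatch(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8080):
    """Build (but do not start) the HTTP server that fronts the instance."""
    return HTTPServer(("127.0.0.1", port), Handler)
```

Calling `make_server().serve_forever()` would start the service; a request to `/papers?year=2018` is then answered by a direct function call on the in-memory documents, which is where the saved inter-process communication and marshaling come from in this sketch.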

Storage Engine (NG5)

JSON (Plain-Text):

{
  "words": [ "hello", "world" ],
  "langs": [
    { "name": "C", "code": "main() { printf(\"Hello, world!\"); }" },
    { "name": "C++", "code": "int main() { std::cout << \"Hello, \…" }
  ]
}

[Figure: two binary storage layouts for the same JSON document. Cell types in the figure: Key, Value (String), Value (Non-String), String Reference (Fixed-Sized).]

• State-of-the-Art (Key-Value-Pair Storage, binary, e.g., UBJSON): Operational-Optimized
• CARBON (Columnar Storage, binary): Analytic-Optimized; split into a String Table (Cached on Demand) and a Record Table (Memory Resident)
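The separation of strings from records can be sketched in a few lines. The following is only an illustration of the general idea (a string table plus records that hold fixed-size references), not the CARBON layout itself: the names `StrRef`, `StringTable`, `encode`, and `decode` are hypothetical, and the sketch keeps the nested document shape rather than building columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrRef:
    """Fixed-size reference into the string table (a plain integer id here)."""
    sid: int


class StringTable:
    """Stores each distinct string once and hands out references to it."""

    def __init__(self):
        self._id_by_string = {}
        self._strings = []

    def intern(self, text):
        sid = self._id_by_string.get(text)
        if sid is None:
            sid = len(self._strings)
            self._id_by_string[text] = sid
            self._strings.append(text)
        return StrRef(sid)

    def lookup(self, ref):
        return self._strings[ref.sid]

    def __len__(self):
        return len(self._strings)


def encode(value, table):
    """Build a record in which every key and string value is a StrRef."""
    if isinstance(value, str):
        return table.intern(value)
    if isinstance(value, list):
        return [encode(item, table) for item in value]
    if isinstance(value, dict):
        return {table.intern(key): encode(item, table) for key, item in value.items()}
    return value  # non-string values (numbers, booleans, null) stay in the record


def decode(value, table):
    """Inverse of encode: resolve every StrRef through the string table."""
    if isinstance(value, StrRef):
        return table.lookup(value)
    if isinstance(value, list):
        return [decode(item, table) for item in value]
    if isinstance(value, dict):
        return {table.lookup(key): decode(item, table) for key, item in value.items()}
    return value
```

With this split, an analytic scan over non-string values only touches the record, and strings are fetched from the table only when a result actually needs them.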

CARBON Files

CARBON Grammar:

archive ::= archive-header string-table record-header carbon-object
archive-header ::= magic-word version-string record-offset
record-header ::= 'r' record-header-flags record-size
string-table ::= 'D' num-strings table-flags ( no-compressor | huffman-compressor )
no-compressor ::= ('-' string-length string-id character+)+
huffman-compressor ::= huffman-dictionary huffman-string+
huffman-dictionary ::= 'd' character prefix-length prefix-code+
huffman-string ::= '-' string-id string-length data-length byte+
carbon-object ::= '{' object-id object-flags property-offset+ next-object-offset columnified-props+ '}'
columnified-props ::= null-prop | nullable-prop | null-array-prop | nullable-array-prop | object-array-prop
null-prop ::= 'n' column-length key-column
nullable-prop ::= ( 'b' | number-type | 't' | 'o' ) column-length key-column offset-column? value-column
number-type ::= unsigned-number | signed-number | 'f'
unsigned-number ::= 'r' | 'h' | 'e' | 'g'
signed-number ::= 'c' | 's' | 'i' | 'l'
null-array-prop ::= 'N' column-length key-column length-column
nullable-array-prop ::= ( 'B' | number-array-type | 'T' ) column-length key-column length-column value-column+
number-array-type ::= unsigned-number-array | signed-number-array | 'F'
unsigned-number-array ::= 'R' | 'H' | 'E' | 'G'
signed-number-array ::= 'C' | 'S' | 'I' | 'L'
object-array-prop ::= 'O' column-length key-column offset-column column-groups+
column-groups ::= 'X' column-count object-count object-id-column offset-column column+
column ::= 'x' column-name ( null-column | nullable-column | object-column )
null-column ::= 'N' column-length offset-column value-column
nullable-column ::= ( 'B' | number-array-type | 'T' ) column-length offset-column positioning-column ( column-length value-column )+
object-column ::= 'o' column-length offset-column positioning-column ( column-length carbon-object )+
column-name ::= string-id
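To make the `string-table` and `no-compressor` rules concrete, here is a sketch that writes and reads an uncompressed string table in the order the grammar gives. The grammar does not state field widths, byte order, or flag values, so everything below is an assumption chosen for illustration: little-endian, a 32-bit `num-strings`, an 8-bit `table-flags` (0 for "no compressor"), a 32-bit `string-length`, a 64-bit `string-id`, and UTF-8 characters. The function names are hypothetical and this is not libcarbon's implementation.

```python
import struct

# Assumed layouts (NOT specified by the grammar):
#   string-table header: 'D' | num-strings (uint32) | table-flags (uint8)
#   no-compressor entry: '-' | string-length (uint32) | string-id (uint64) | characters
HEADER = struct.Struct("<cIB")
ENTRY = struct.Struct("<cIQ")


def write_string_table(strings):
    """Serialize {string-id: text} following string-table / no-compressor."""
    out = bytearray(HEADER.pack(b"D", len(strings), 0))  # flag 0: assumed "no compressor"
    for sid, text in strings.items():
        data = text.encode("utf-8")
        if not data:
            raise ValueError("the grammar requires character+ (non-empty strings)")
        out += ENTRY.pack(b"-", len(data), sid)
        out += data
    return bytes(out)


def read_string_table(buf):
    """Parse the bytes produced by write_string_table back into a dict."""
    marker, count, _flags = HEADER.unpack_from(buf, 0)
    if marker != b"D":
        raise ValueError("not a string table")
    pos = HEADER.size
    strings = {}
    for _ in range(count):
        marker, length, sid = ENTRY.unpack_from(buf, pos)
        if marker != b"-":
            raise ValueError("corrupt string entry")
        pos += ENTRY.size
        strings[sid] = buf[pos:pos + length].decode("utf-8")
        pos += length
    return strings
```

Because every entry carries its own length and id, a reader can skip over strings it does not need, which fits a table that is cached on demand rather than kept fully resident.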

Check it out!

Microsoft Academic Graph Analytics

Libcarbon

Protobase

Memory Requirement Comparison (On Microsoft Academic Graph, Excerpt)

Format                  Space (MiB)   Relative to JSON
JSON (Plain-Text)       1752.8        (baseline)
UBJSON (Binary)         1614.8        -7.8%
BSON (Binary)           1798.5        +2.6%
MessagePack (Binary)    1560.4        -11.0%
CARBON (Binary)         2091.9        +19.3% (unoptimized)

The CARBON bar is split into segments of 543.8 MiB (Record Table, Memory Resident; -69.0% relative to JSON), 1376.0 MiB (String Table, Cached), and 172.1 MiB, which together make up the 2091.9 MiB total. The chart also carries the annotations "(size-optimized)" and "(+10,5)".

PAN Queries

(1) Display publication information given a paper title.

(2) List publications within a certain time span given a specific author’s name.

(3) List publications via the is-cited-by relationship.
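The three queries can be sketched as plain functions over in-memory documents, in the spirit of user-defined backend code that works on the data directly. The document fields (`id`, `title`, `authors`, `year`, `references`), the sample data, and the function names are assumptions made for this illustration; they are not taken from the Microsoft Academic Graph schema or from Protobase.

```python
# Hypothetical sample documents; field names are assumptions for this sketch.
PAPERS = [
    {"id": 1, "title": "A", "authors": ["Ada"], "year": 2015, "references": []},
    {"id": 2, "title": "B", "authors": ["Ada", "Bob"], "year": 2017, "references": [1]},
    {"id": 3, "title": "C", "authors": ["Bob"], "year": 2018, "references": [1, 2]},
]


def publication_by_title(papers, title):
    """Query (1): publication information for a given paper title."""
    return [p for p in papers if p["title"] == title]


def publications_by_author(papers, author, first_year, last_year):
    """Query (2): an author's publications within an inclusive year span."""
    return [
        p for p in papers
        if author in p["authors"] and first_year <= p["year"] <= last_year
    ]


def cited_by(papers, paper_id):
    """Query (3): publications that cite the given paper (is-cited-by)."""
    return [p for p in papers if paper_id in p["references"]]
```

For example, `cited_by(PAPERS, 1)` returns the documents with ids 2 and 3, since both list paper 1 among their references.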