© 2016 Toshiba Corporation Industrial ICT Solutions, Toshiba Corporation Highly Scalable Database for IoT Confidential

Post on 28-May-2020

Page 1

Industrial ICT Solutions, Toshiba Corporation

Highly Scalable Database for IoT

Confidential

Page 2

Various types of data (sensor data, log data and historical data) are increasing rapidly.

Purpose-built databases for Big Data are in high demand.

The Era of Big Data is here

Big Data Management requires flexibility and expandability

[Figure: Big Data management across volume, velocity and variety. Sensors, logs, historical data and market data feed increasing data analysis and value creation: risk reduction, high efficiency and new value creation]

Page 3

IoT system requirements are unique:
– They generate enormous volumes of data every minute, second or sub-second
– They must maintain consistency across individual sensor data
– They must handle sharp fluctuations in data volume

The Internet of Things

[Figure: processing volume over time for IoT data vs. traditional IT data]

Page 4

Existing NoSQL Databases Have Limitations in IoT Usage

Limitation 1
– Cannot maintain consistency across individual sensor data
– No function to search data within a designated time period
– Performance degrades when system memory is constrained

Limitation 2
– Cluster management requires a trade-off, as shown below

Peer to Peer (no master node, no slave node; Nodes A to D)
Pros: Easy to redistribute data
Cons: Network overhead to maintain consistency among nodes

Master-Slave (master node with an HA standby Master', plus slave nodes; Nodes A to D)
Pros: Easy to maintain consistency
Cons: Difficult to redistribute data efficiently; the master node is a single point of failure (SPOF)

Page 6

GridDB - Highly Scalable Database for IoT

GridDB is designed to meet the needs of IoT systems that must process enormous volumes of data at high speed and with reliability.

Page 7

GridDB Key Features

IoT Oriented Data Model
• GridDB's data model is Key-Container, which maintains data consistency within a container
• GridDB provides functions to manage time series data efficiently

High Performance
• GridDB is a distributed in-memory database.
• By using memory, GridDB accesses enormous volumes of data at high velocity.

Scalability
• GridDB has a scale-out architecture.
• You can add capacity seamlessly using commodity hardware for near-linear performance improvement.

High Availability
• GridDB replicates data automatically in a clustered system, eliminating the risk of a single point of failure.

Page 8

GridDB vs. other general purpose databases

[Chart: databases compared by category (1. NoSQL, 2. Time Series DB, 3. NewSQL (SQL with JOIN)): GridDB, Cassandra, MongoDB, Riak KV, Riak TS, DynamoDB, GemFire, Apache HBase, Redis, KairosDB, VoltDB, Couchbase, CouchDB, InfluxDB]

GridDB is the only purpose-built IoT database that scales. No other database comes close in performance, functionality or strength.

Page 9

Data Models of NoSQL

Data Model          Examples    Structure
Key-Value           Riak        Key → Value
Column-oriented     Cassandra   Key → Column/Value, Column/Value, ...
Document-oriented   MongoDB     Key → JSON
Key-Container       GridDB      Key → Container with a schema (columns C0 to C3, rows of values)

Page 10

Key-Container Data Model

A container is a group of data records with a schema. GridDB has two types of containers:
– Collection Container: for record management
– Time Series Container: for time series record management

The Key-Container data model provides:
– Data consistency within a container
– Faster search, because each container has a schema
– TQL, an SQL-like query language that improves application productivity

[Figure: partitions holding containers (C), each containing rows (R)]

Smart Meter with the Key-Container Model (time series data):

Timestamp         Voltage   Current   Temp.
2016/3/2 10:00    100       0.64      20.5

In this case, each sensor is mapped to its own container for scalability.
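To make the Key-Container idea concrete, here is a minimal in-memory sketch in Python. It is not GridDB's client API: the classes `Container` and `KeyContainerStore`, the key `meter_001`, and the column names are hypothetical, chosen to mirror the smart meter row above. Each key (one sensor) maps to its own container, and every row written to a container is checked against that container's schema, which is one simple way to keep the data inside a container consistent.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class Container:
    """Hypothetical container: a group of rows that all share one schema."""
    schema: Dict[str, type]                      # column name -> expected Python type
    rows: List[Tuple] = field(default_factory=list)

    def put(self, row: Tuple) -> None:
        # Reject rows that do not match the schema, so every stored row is consistent.
        if len(row) != len(self.schema):
            raise ValueError("row has the wrong number of columns")
        for value, (name, expected) in zip(row, self.schema.items()):
            if not isinstance(value, expected):
                raise TypeError(f"column {name!r} expects {expected.__name__}")
        self.rows.append(row)


class KeyContainerStore:
    """Hypothetical store: each key (here, one sensor) maps to its own container."""

    def __init__(self) -> None:
        self.containers: Dict[str, Container] = {}

    def get_or_create(self, key: str, schema: Dict[str, type]) -> Container:
        if key not in self.containers:
            self.containers[key] = Container(dict(schema))
        return self.containers[key]


# Column names mirror the smart meter example (units are not given on the slide).
METER_SCHEMA = {"timestamp": datetime, "voltage": float, "current": float, "temp": float}

store = KeyContainerStore()
meter = store.get_or_create("meter_001", METER_SCHEMA)
meter.put((datetime(2016, 3, 2, 10, 0), 100.0, 0.64, 20.5))
print(len(meter.rows))  # 1
```

Keeping one container per sensor means a write or read touches only that sensor's rows, which is the scalability argument the slide makes.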

Page 11

Time Series Data

For efficient collection and analysis of IoT sensor data, GridDB supports time series data:
– The Time Series Container has an "Expiry Release Function" that deletes expired time series data efficiently
– TQL, an SQL-like query language, has aggregation and sampling commands so that IoT applications can analyze time series data easily and efficiently

[Figure: time series data flow. Sensor data collection (SENSOR-A to SENSOR-D) and visualisation & analytics connect to containers of row data through a quick native interface and a TQL interface]
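The following Python sketch illustrates what expiry release, aggregation and sampling mean for time-ordered rows. It does not use TQL or any GridDB API, and the function names `release_expired`, `average` and `sample` are hypothetical. It assumes rows are (timestamp, value) pairs sorted by timestamp and that sampling fills gaps by linear interpolation between neighbouring rows; GridDB's own retention and sampling behavior may differ.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Row = Tuple[datetime, float]  # (timestamp, value); row lists are kept sorted by timestamp


def release_expired(rows: List[Row], now: datetime, retention: timedelta) -> List[Row]:
    """Hypothetical expiry rule: keep only rows newer than now - retention."""
    keys = [ts for ts, _ in rows]
    # Rows are time-ordered, so everything before the cutoff index has expired.
    return rows[bisect_left(keys, now - retention):]


def average(rows: List[Row], start: datetime, end: datetime) -> Optional[float]:
    """Aggregation: mean of the values with start <= timestamp <= end."""
    keys = [ts for ts, _ in rows]
    window = rows[bisect_left(keys, start):bisect_right(keys, end)]
    if not window:
        return None
    return sum(v for _, v in window) / len(window)


def sample(rows: List[Row], start: datetime, end: datetime, step: timedelta) -> List[Row]:
    """Sampling: one value every `step`, linearly interpolated between neighbours."""
    keys = [ts for ts, _ in rows]
    out: List[Row] = []
    t = start
    while t <= end:
        i = bisect_left(keys, t)
        if i < len(rows) and keys[i] == t:
            out.append((t, rows[i][1]))             # exact match
        elif 0 < i < len(rows):
            (t0, v0), (t1, v1) = rows[i - 1], rows[i]
            frac = (t - t0) / (t1 - t0)             # timedelta / timedelta -> float
            out.append((t, v0 + (v1 - v0) * frac))
        # times outside the stored range are skipped
        t += step
    return out


readings = [
    (datetime(2016, 3, 2, 10, 0), 20.0),
    (datetime(2016, 3, 2, 10, 2), 22.0),
    (datetime(2016, 3, 2, 10, 4), 21.0),
]
print(sample(readings, readings[0][0], readings[-1][0], timedelta(minutes=1)))
print(average(readings, readings[0][0], readings[-1][0]))  # 21.0
print(release_expired(readings, datetime(2016, 3, 2, 10, 5), timedelta(minutes=2)))
```

With these readings, sampling every minute yields 21.0 at 10:01 and 21.5 at 10:03, the average over the window is 21.0, and a two-minute retention evaluated at 10:05 keeps only the 10:04 row. Because the rows are time-ordered, each operation locates its range with a binary search instead of inspecting every row.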

Page 12

High Performance

In-Memory Architecture
– Optimizes read/write I/O using GB-class memory
– GridDB's Event Driven Engine can process a huge number of operations with small resources
– It eliminates the overhead of exclusive control and synchronization control for memory and disk accesses

[Figure: an RDB splits work among transaction processing, query processing, buffer processing, demand processing and I/O processing (figure label: 5 – 10 %), while a GridDB node uses an Event Driven Engine]
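The claim that an event-driven design removes exclusive-control and synchronization overhead can be illustrated with a toy single-threaded event loop in Python. This is a sketch of the general technique, not GridDB's Event Driven Engine; the class name `SingleThreadedEngine` and the key `meter_001` are hypothetical. Because a single loop owns the data and runs queued operations one at a time, operations never overlap, so the data structure needs no locks.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List

Op = Callable[[Dict[str, Any]], Any]  # an operation receives the data it may read or modify


class SingleThreadedEngine:
    """Toy event loop: operations are queued as events and run one at a time."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}   # touched only inside run_until_idle
        self.events: Deque[Op] = deque()

    def submit(self, op: Op) -> None:
        self.events.append(op)

    def run_until_idle(self) -> List[Any]:
        results = []
        while self.events:
            op = self.events.popleft()
            # No lock: no other code path reads or writes self.data concurrently.
            results.append(op(self.data))
        return results


engine = SingleThreadedEngine()
engine.submit(lambda d: d.update(meter_001=20.5))   # write event
engine.submit(lambda d: d.get("meter_001"))          # read event
results = engine.run_until_idle()
print(results)  # [None, 20.5]
```

The trade-off is that one long-running operation delays every operation queued behind it, so this style works best when each event is short.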

Page 13

Performance Comparison with RDB

[Chart: running time in seconds (axis 0 to 60) and number of processing operations per second (relative value) when managing 10,000 facilities with 1 GridDB server vs. 1 RDB server]

This measurement was done as a proof of concept (PoC) for the BEMS use case shown on slide 18.

Page 14

GridDB Performance Translates to User Benefits!

A 10,000-facility management application running on one 50-core RDB server can be replaced by 5 nodes with 2 cores running GridDB.

Or

Managing 10,000 facilities with a traditional relational DB takes 3 operators; with GridDB, 1 operator is enough.

…etc.

Real Benefits

These are examples of performance translating into real benefits. The numbers are examples only; actual results depend on the use case.

3X over generic NoSQL DB

10X over generic RDB

Page 15

Comparison with Cassandra

Yahoo! Cloud Serving Benchmark (YCSB) Workload-A
Data: 100,000,000 rows, 11 fields (96 bytes) per row
Run: Read 50%, Update 50%

[Chart: Load throughput (rows/s) for Cassandra and GridDB on 1 and 3 nodes; higher is better. Values shown: 56,832; 89,534; 181,708; 342,883]

[Chart: Run throughput (rows/s) for Cassandra and GridDB on 1 and 3 nodes; higher is better. Values shown: 55,913; 100,790; 145,766; 275,663]

[Chart: DB size (MB) for Cassandra and GridDB. Values shown, in chart order: 14,700; 10,200]
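The Run phase above is a 50% read / 50% update mix. A minimal Python sketch of how such an operation stream could be generated follows. It is a simplification, not YCSB itself: keys are drawn uniformly at random, whereas YCSB Workload A uses a Zipfian request distribution by default, and the function names `load_phase`/`run_phase` and the `user<N>` key format are hypothetical.

```python
import random
from collections import Counter
from typing import Iterator, Tuple


def load_phase(num_rows: int) -> Iterator[Tuple[str, str]]:
    """Load: insert every row once before the measured run."""
    for i in range(num_rows):
        yield "insert", f"user{i}"


def run_phase(num_ops: int, num_rows: int, seed: int = 0) -> Iterator[Tuple[str, str]]:
    """Run: 50% reads and 50% updates on existing keys (uniform key choice)."""
    rng = random.Random(seed)  # fixed seed so the stream is reproducible
    for _ in range(num_ops):
        op = "read" if rng.random() < 0.5 else "update"
        yield op, f"user{rng.randrange(num_rows)}"


mix = Counter(op for op, _ in run_phase(10_000, 1_000))
print(mix)  # roughly 5,000 reads and 5,000 updates
```

The charts then report throughput as the number of such operations completed per second.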

Page 16

Scalability

GridDB has a scale-out architecture. You can add capacity seamlessly using commodity hardware for near-linear performance improvement.

[Chart: Yahoo! Cloud Serving Benchmark (YCSB) throughput ratio vs. number of nodes (0 to 40), for Write (ratio axis 0 to 1,000) and Read (ratio axis 0 to 4,000)]

Row size (record size): 724 bytes
Number of values: 750,000,000
Number of replicas: 3

Page 17

High Availability

Hybrid Cluster Management
– A hybrid of peer-to-peer cluster management and master-slave cluster management
– When the master node goes down, a new master node is determined dynamically and autonomously, so there is no SPOF
– Data replication and failover are done automatically without losing data

[Figure: hybrid cluster management and failover. Nodes 1 to 5 each hold original and replica data (data replication); the master holds the data distribution table, and clients keep a cached copy of it]
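A toy Python model of the behavior described above follows. It is a hypothetical illustration, not GridDB's algorithm: partitions are placed round-robin, the first node listed for a partition holds the original and the rest hold replicas, and a new master is chosen by an assumed rule (the live node with the smallest name). When a node fails, the next replica of each affected partition becomes the owner, and if the failed node was the master, a new one is elected from the surviving nodes.

```python
from typing import Dict, List


class HypotheticalCluster:
    """Toy data distribution table: partition -> [owner (original), replica, ...]."""

    def __init__(self, nodes: List[str], num_partitions: int, replicas: int) -> None:
        self.live = list(nodes)
        # Round-robin placement: owner and replicas sit on consecutive nodes.
        self.table: Dict[int, List[str]] = {
            p: [nodes[(p + i) % len(nodes)] for i in range(replicas + 1)]
            for p in range(num_partitions)
        }
        self.master = self.elect_master()

    def elect_master(self) -> str:
        # Assumed rule for illustration only: smallest live node name wins.
        return min(self.live)

    def owner(self, partition: int) -> str:
        return self.table[partition][0]

    def fail(self, node: str) -> None:
        self.live.remove(node)
        for holders in self.table.values():
            if node in holders:
                # Removing the failed node promotes the next replica to owner.
                # (A partition with no surviving holder is not handled here.)
                holders.remove(node)
        if node == self.master:
            self.master = self.elect_master()


cluster = HypotheticalCluster([f"node{i}" for i in range(1, 6)], num_partitions=5, replicas=1)
cluster.fail("node1")
print(cluster.master, cluster.owner(0))  # node2 node2
```

A real cluster would also re-create lost replicas on surviving nodes to restore the replica count; this sketch omits that step.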

Page 18

Use Case: BEMS

In the Building Energy Management System (BEMS), meter data is sent to the cloud platform and stored in GridDB. The platform manages meter data from many buildings.

• Each building has about 50 sensors.
• The cloud platform manages 1,000 or more buildings (a data size of 2 TB or more).
• Each sensor's data is collected every minute, and each sensor data record is 70 bytes.
• For one building, the data amount is 5 MB per day and 2 GB per year.
• More than 1,000 records must be read and written per second.

[Figure: BEMS application; F/W units and the cloud platform running GridDB]
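The sizing figures above can be checked with a few lines of arithmetic. This Python sketch assumes decimal units (1 MB = 10^6 bytes) and counts raw 70-byte records only, with no storage overhead; the constant names are illustrative.

```python
SENSORS_PER_BUILDING = 50
RECORD_BYTES = 70
SAMPLES_PER_DAY = 24 * 60          # one record per sensor per minute
BUILDINGS = 1000

bytes_per_day = SENSORS_PER_BUILDING * RECORD_BYTES * SAMPLES_PER_DAY
bytes_per_year = bytes_per_day * 365
fleet_bytes_per_year = bytes_per_year * BUILDINGS
writes_per_second = SENSORS_PER_BUILDING * BUILDINGS / 60

print(f"{bytes_per_day / 1e6:.2f} MB/day per building")             # 5.04
print(f"{bytes_per_year / 1e9:.2f} GB/year per building")           # 1.84
print(f"{fleet_bytes_per_year / 1e12:.2f} TB/year for 1,000 buildings")  # 1.84
print(f"{writes_per_second:.0f} sensor writes per second")          # 833
```

The results round to the slide's 5 MB per day, 2 GB per year and 2 TB. Sensor writes alone come to about 833 records per second, so the stated requirement of more than 1,000 reads and writes per second leaves headroom above the write rate.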

Page 19

Use Case: Electric Power Company

GridDB achieved 2,250 times the throughput of the original RDB system.

GridDB: Server (12 cores) x 5
– Input data: 216M records (43.2 GB)
– Processing time: 40 min
– Output data: 3,072 MB (XML)
– Throughput: 18 MB/sec

RDB: Server (32 cores) x 1
– Input data: 144K records (28.8 MB)
– Processing time: 60 min
– Output data: 2 MB (XML)
– Throughput: 8 KB/sec

GridDB processed 1,500 times the volume in 2/3 of the processing time, giving 2,250 times the throughput.
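The 2,250x figure follows directly from the inputs and processing times above. This Python check assumes decimal units (1 GB = 1,000 MB, 1 MB = 1,000 KB); the dictionary layout is only for illustration.

```python
# Figures from the slide; decimal units assumed.
GRIDDB = {"records": 216_000_000, "megabytes": 43_200.0, "minutes": 40}
RDB = {"records": 144_000, "megabytes": 28.8, "minutes": 60}


def mb_per_second(run: dict) -> float:
    """Input volume divided by processing time."""
    return run["megabytes"] / (run["minutes"] * 60)


volume_ratio = GRIDDB["records"] / RDB["records"]        # 1,500x the records
time_ratio = GRIDDB["minutes"] / RDB["minutes"]          # 2/3 of the time
throughput_ratio = mb_per_second(GRIDDB) / mb_per_second(RDB)

print(f"GridDB: {mb_per_second(GRIDDB):.0f} MB/s, RDB: {mb_per_second(RDB) * 1000:.0f} KB/s")
print(f"volume x{volume_ratio:.0f}, time x{time_ratio:.2f}, throughput x{throughput_ratio:.0f}")
```

Equivalently, the throughput ratio is the volume ratio divided by the time ratio: 1,500 / (2/3) = 2,250. Both systems also average 200 bytes per input record (43.2 GB / 216M and 28.8 MB / 144K), so the record and byte ratios agree.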

Page 20

In Summary

Toshiba has developed an IoT-centric NoSQL database with these benefits:
• GridDB's Key-Container data model is designed for IoT, including specific functions for time series data management.
• The distributed in-memory database (IMDB) architecture provides high performance.
• The scale-out architecture provides immense scalability.
• Hybrid cluster management provides high availability and avoids a SPOF.

[Figure: GridDB vs. other leading databases: scalable, IoT centric, robust]

Page 21

GridDB CE / SE / AE

Page 22

Page 23

Attributes

Popular databases are classified by the three attributes below.

NoSQL - NoSQL DBs are non-relational ("not only SQL") databases that focus on high scalability and availability and provide weak ACID and SQL support. They mainly come in four flavors: KV store, document store, columnar store and graph.

NewSQL (SQL with JOIN) – Databases that not only scale out but also ensure ACID guarantees and support SQL with joins.

Time Series DB - Time Series DBs are optimized to perform data operations on time series data (each data entry is associated with a timestamp).