© 2016 Toshiba Corporation
Industrial ICT Solutions, Toshiba Corporation
Highly Scalable Database for IoT
Confidential
Various types of data (sensor data, log data, and historical data) are rapidly increasing
Purpose-built databases for Big Data are in high demand
The Era of Big Data is here
Big Data Management requires flexibility and expandability
Big Data Management: Sensors, Logs, Historical Data, Market Data (Volume, Velocity, Variety)
Increasing Data Analysis & Value Creation: Risk Reduction, High Efficiency, New Value Creation
IoT system requirements are unique
– They generate enormous volumes of data every minute, second, or sub-second
– They must maintain consistency across individual sensor data
– They must process sharp fluctuations in data volume
The Internet of Things
[Figure: processing volume over time, Traditional IT Data vs. IoT Data]
Limitation 1
– Cannot maintain consistency across individual sensor data
– No function to search data within a specified time period
– Performance degrades when system memory is constrained
Limitation 2
– A trade-off is required in cluster management, as shown below
Existing NoSQL has Limitations in IoT usage
Peer to Peer (Nodes A to D; no master node, no slave node)
  Pros: Easy to redistribute data
  Cons: Network overhead to maintain consistency among nodes
Master Slave (Master with HA standby Master'; Nodes A to D as slave nodes)
  Pros: Easy to maintain consistency
  Cons: Difficult to redistribute data efficiently; the master node is a single point of failure (SPOF)
GridDB - Highly Scalable Database for IoT
GridDB is designed to meet the needs of IoT systems to process enormous volumes of data at high speed with reliability.
GridDB Key Features
IoT Oriented Data Model
• GridDB's data model is Key-Container, which maintains data consistency within a container.
• GridDB provides functions to manage time series data efficiently.
High Performance
• GridDB is a distributed in-memory database.
• Using memory, GridDB accesses enormous volumes of data at high velocity.
Scalability
• GridDB has a scale-out architecture.
• You can add capacity seamlessly using commodity hardware for near-linear performance improvement.
High Availability
• GridDB replicates data automatically in a clustered system, eliminating the risk of a single point of failure.
GridDB vs. other general purpose databases
Classification attributes: 1. NoSQL  2. Time Series DB  3. NewSQL (SQL with JOIN)
Databases compared: GridDB, Cassandra, MongoDB, Riak KV, Riak TS, DynamoDB, GemFire, Apache HBase, Redis, KairosDB, VoltDB, Couchbase, CouchDB, InfluxDB
GridDB is the only purpose-built IoT database that scales. No other database comes close in performance, functionality, or strength.
Data Models of NoSQL
Data Model          Example     Structure
Key-Value           Riak        Key -> Value
Column-oriented     Cassandra   Key -> (Column, Value), (Column, Value), ...
Document-oriented   MongoDB     Key -> JSON
Key-Container       GridDB      Key -> Container (schema with columns C0 to C3; rows of values)
A container is a group of data sets with a schema
GridDB has 2 types of containers
– Collection Container: for record management
– Time Series Container: for time series record management
The Key-Container data model provides:
– Data consistency within a container
– Faster search, because a container has a schema
– TQL, an SQL-like query language that improves application productivity
Key-Container Data Model
[Figure: a partition holds containers (C); each container holds rows (R)]
Smart Meter w/ Key-Container Model: Time Series Data
Timestamp        Voltage  Current  Temp.
2016/3/2 10:00   100      0.64     20.5
In this case, each sensor is mapped to its own container for scalability.
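The Key-Container idea can be illustrated with a minimal Python sketch. This is not the GridDB API: the `Container` class, the `store` dictionary, the `container_for` helper, and the `meter-001` key are all hypothetical names chosen for illustration. It only shows a key mapping to a schema-bearing container, with one container per smart meter, using the sample row from the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Container:
    """A schema plus the rows that conform to it (illustrative only)."""
    schema: tuple                      # column names, in order
    rows: list = field(default_factory=list)

    def put(self, row: tuple) -> None:
        # Reject rows whose column count does not match the schema.
        if len(row) != len(self.schema):
            raise ValueError("row does not match container schema")
        self.rows.append(row)


# Hypothetical store: container key -> Container.
store: dict = {}

SCHEMA = ("timestamp", "voltage", "current", "temp")


def container_for(meter_id: str) -> Container:
    # One container per smart meter, created on first use.
    if meter_id not in store:
        store[meter_id] = Container(SCHEMA)
    return store[meter_id]


# The sample row from the slide, written to one meter's container.
container_for("meter-001").put((datetime(2016, 3, 2, 10, 0), 100, 0.64, 20.5))
```

Because every row for a given meter lives in a single container, consistency checks and schema-aware searches stay local to that container, which is the property the slide attributes to the model.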
For efficient collection and analysis of IoT sensor data, GridDB supports time series data
– The Time Series Container has an "Expiry Release Function", which deletes expired time series data efficiently
– TQL, an SQL-like query language, has aggregation and sampling commands so that IoT applications can analyze time series data easily and efficiently
Time Series Data
[Figure: SENSOR-A to SENSOR-D write row data into containers through the quick native interface (sensor data collection); visualisation and analytics read it through the TQL interface]
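The slide names three time series operations: expiry, aggregation, and sampling. The following Python sketch shows what each one means on a list of `(timestamp, value)` rows. It is not TQL syntax and not GridDB's implementation; the function names `expire`, `average`, and `sample` and the sample readings are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def expire(rows, now, retention):
    """Expiry: keep only rows no older than `retention` before `now`."""
    cutoff = now - retention
    return [(ts, v) for ts, v in rows if ts >= cutoff]


def average(rows, start, end):
    """Aggregation: mean of values with start <= timestamp < end."""
    values = [v for ts, v in rows if start <= ts < end]
    return sum(values) / len(values) if values else None


def sample(rows, start, end, step):
    """Sampling: latest value at or before each step boundary in [start, end]."""
    ordered = sorted(rows)
    out = []
    t = start
    while t <= end:
        prior = [v for ts, v in ordered if ts <= t]
        out.append((t, prior[-1] if prior else None))
        t += step
    return out


# Hypothetical readings: one per minute starting at 10:00.
base = datetime(2016, 3, 2, 10, 0)
rows = [(base + timedelta(minutes=i), 20.0 + i) for i in range(5)]

recent = expire(rows, now=base + timedelta(minutes=4), retention=timedelta(minutes=2))
mean_first_three = average(rows, base, base + timedelta(minutes=3))
every_two_minutes = sample(rows, base, base + timedelta(minutes=4), timedelta(minutes=2))
```

A database would perform these operations inside the storage engine rather than by scanning lists, so the sketch conveys the semantics only, not the efficiency the slide claims.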
In-Memory Architecture
– Optimizes read/write I/O using GB-class memory
– GridDB's Event Driven Engine can process a huge number of operations with few resources
– It eliminates the overhead of exclusive control and synchronization control for memory and disk accesses
High Performance
[Figure: RDB processing breakdown (Transaction Processing, Query Processing, Buffer Processing, Demand Processing, I/O Processing; one portion labeled 5 – 10 %) compared with the Event Driven Engine in a GridDB node]
Performance Comparison with RDB
[Figure: number of operations processed per second (relative value) plotted against running time (0 to 60 seconds), for 10,000 Facilities Management with 1 GridDB server and with 1 RDB server]
This measurement was taken as a PoC of the BEMS use case shown on slide 18.
GridDB Performance translates to User Benefits!
A 10,000 Facilities Management app on one 50-core server running an RDB
can be replaced by 5 nodes with 2 cores running GridDB
Or
A 10,000 Facilities Management app that takes 3 operators to manage
using a traditional relational DB can be managed by 1 operator
using GridDB
…etc.
Real Benefits
Example of performance translating to real benefits. The numbers are examples only; actual results depend on the use case.
3X over generic NoSQL DB
10X over generic RDB
Comparison with Cassandra
Yahoo! Cloud Serving Benchmark (YCSB) Workload-A
Data: 100,000,000 rows, 11 fields (96 bytes) per row
Run: Read 50%, Update 50%
[Figure: Load throughput (rows/s) at 1 and 3 nodes, Cassandra vs. GridDB; higher is better. Data labels in extraction order: 56,832; 89,534; 181,708; 342,883]
[Figure: Run throughput (rows/s) at 1 and 3 nodes, Cassandra vs. GridDB; higher is better. Data labels in extraction order: 55,913; 100,790; 145,766; 275,663]
[Figure: DB size (MB), Cassandra vs. GridDB. Data labels in extraction order: 14,700; 10,200]
GridDB has a scale-out architecture. You can add capacity seamlessly using commodity hardware for near-linear performance improvement.
Scalability
Yahoo! Cloud Serving Benchmark (YCSB)
[Figure: Write throughput ratio versus number of nodes (x-axis 0 to 40, y-axis 0 to 1,000)]
[Figure: Read throughput ratio versus number of nodes (x-axis 0 to 40, y-axis 0 to 4,000)]
Row size (record size): 724 bytes
# of values: 750,000,000
# of replicas: 3
Hybrid Cluster Management
– Hybrid of peer-to-peer cluster management and master-slave cluster management
– When the master node goes down, a new master node is determined dynamically and autonomously. No SPOF
– Data replication and failover are done automatically without losing data
High Availability
[Figure: Hybrid cluster management and failover. Nodes 1 to 5 each hold original and replica data (data replication). Labels: Master, Data Distribution Table, Client, Data Distribution Table (Cached)]
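The failover behaviour described above, where any node can take over as master when the current master goes down, can be sketched as follows. The slides do not state how GridDB chooses the new master, so the rule used here (the lowest live node ID wins) is purely an assumption for illustration, as are the `Cluster` class and its method names.

```python
class Cluster:
    """Toy cluster in which any live node can become master (illustrative only)."""

    def __init__(self, node_ids):
        self.alive = set(node_ids)
        self.master = self._elect()

    def _elect(self):
        # Assumed rule for this sketch only: the lowest live node ID wins.
        return min(self.alive) if self.alive else None

    def fail(self, node_id):
        """Mark a node as down; re-elect only if it was the master."""
        self.alive.discard(node_id)
        if node_id == self.master:
            self.master = self._elect()


cluster = Cluster([1, 2, 3, 4, 5])
first_master = cluster.master
cluster.fail(first_master)      # the master goes down
second_master = cluster.master  # another live node takes over
cluster.fail(4)                 # a non-master failure leaves the master unchanged
```

The point of the sketch is that no node is permanently special: the master role is reassigned among the surviving nodes, so losing the master does not stop the cluster.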
© 2016 Toshiba Corporation 18
Use Case : BEMS
[Figure: BEMS applications connect through F/W to the Cloud Platform, where GridDB runs]
In the Building Energy Management System (BEMS), meter data is sent to the cloud platform and stored in GridDB. The platform manages meter data for many buildings.
• Each building has about 50 sensors.
• The cloud platform manages 1,000 buildings or more (the data size is 2 TB or more).
• Data from each sensor is collected every minute; each sensor record is 70 bytes.
• For one building, the data volume is 5 MB per day and 2 GB per year.
• More than 1,000 record reads and writes are required per second.
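The sizing figures on this slide can be cross-checked with a few lines of Python. The sketch assumes decimal units (1 MB = 10^6 bytes), a 365-day year, and raw record payload only, with no indexes or replicas; the slide does not state these assumptions.

```python
SENSORS_PER_BUILDING = 50
BUILDINGS = 1000
RECORD_BYTES = 70
SAMPLES_PER_DAY = 24 * 60  # one record per sensor per minute

bytes_per_building_day = SENSORS_PER_BUILDING * SAMPLES_PER_DAY * RECORD_BYTES
bytes_per_building_year = bytes_per_building_day * 365
bytes_all_buildings_year = bytes_per_building_year * BUILDINGS
writes_per_second = SENSORS_PER_BUILDING * BUILDINGS / 60

print(f"per building per day : {bytes_per_building_day / 1e6:.2f} MB")
print(f"per building per year: {bytes_per_building_year / 1e9:.2f} GB")
print(f"1,000 buildings, year: {bytes_all_buildings_year / 1e12:.2f} TB")
print(f"sensor writes        : {writes_per_second:.0f} records/s")
```

Under these assumptions one building produces about 5.04 MB per day and 1.84 GB per year, and 1,000 buildings produce about 1.84 TB per year, which agrees with the rounded figures on the slide. Sensor writes alone come to roughly 833 records per second, so the "more than 1,000" figure presumably includes reads as well.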
Use Case : Electric Power Company
GridDB achieved 2,250 times the throughput of the original RDB system

                  RDB                       GridDB
Input data        144K records (28.8 MB)    216M records (43.2 GB)
Servers           Server (32 cores) x 1     Server (12 cores) x 5
Processing time   60 min.                   40 min.
Output data       2 MB (XML)                3,072 MB (XML)
Throughput        8 KB/sec                  18 MB/sec

1,500 times the volume in 2/3 of the processing time: 2,250 times the throughput
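The throughput figures follow directly from the input sizes and processing times, as this short Python check shows. It assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and measures throughput on input data only; the `throughput` helper is a name introduced here for illustration.

```python
def throughput(input_bytes, minutes):
    """Input bytes processed per second."""
    return input_bytes / (minutes * 60)


rdb = throughput(28.8e6, 60)     # RDB: 28.8 MB in 60 minutes
griddb = throughput(43.2e9, 40)  # GridDB: 43.2 GB in 40 minutes

volume_ratio = 216e6 / 144e3     # records: 216M vs. 144K
time_ratio = 40 / 60             # processing time: 40 min. vs. 60 min.

print(f"RDB    : {rdb / 1e3:.0f} KB/s")
print(f"GridDB : {griddb / 1e6:.0f} MB/s")
print(f"ratio  : {griddb / rdb:.0f}x")
```

The ratio can also be read as the volume ratio divided by the time ratio: 1,500 times the data in two thirds of the time gives 1,500 / (2/3) = 2,250.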
Toshiba has developed an IoT-centric NoSQL database with the following benefits
• GridDB's Key-Container data model is designed for IoT, including specific functions for time series data management.
• The distributed in-memory database (IMDB) architecture provides high performance.
• The scale-out architecture provides immense scalability.
• Hybrid cluster management provides high availability and avoids a SPOF.
In Summary
Scalable, IoT Centric, Robust
GridDB
GridDB vs. other leading Databases
GridDB CE / SE / AE
Attributes
Popular databases are classified by the following 3 attributes:
NoSQL - NoSQL DBs are non-relational ("not only SQL") databases that focus on high scalability and availability and provide weak ACID and SQL support. These DBs mainly come in 4 flavors: KV store, document store, columnar store, and graph.
NewSQL (SQL with JOIN) - Databases that not only scale out but also ensure ACID guarantees and support SQL with joins.
Time Series DB - Time Series DBs are optimized to perform data operations on time series data (each data entry is associated with a timestamp).