Achieving Horizontal Scalability
Alain Houf – Sales Engineer
2 | © InterSystems Corporation. All rights reserved. |
Scale Matters
InterSystems IRIS Database Platform lets you:
• Scale up and scale out
• Scale users and scale data
• Mix and match a variety of approaches to scalability, to suit your application and business needs
Scaling Up: Vertical Scalability
Vertical Scalability
Expand capacity of an individual server by adding CPU, memory, I/O & networking components to address workload requirements
Advantages
• Architectural simplicity
• Fine-grained balancing possible
Challenges
• Software complexity
• Hardware limitations
• Non-linear price / performance
• Requires careful upfront sizing
InterSystems SQL Parallel Query Execution
Leverage multiple CPU cores to serve up SQL query results
• Spawns 1 process per core = vertical scalability
• Most beneficial for aggregation queries on large datasets
The optimizer currently considers parallel execution only when a query carries the %PARALLEL hint; fully transparent automation is under development
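As a rough illustration of the idea behind parallel query execution, the sketch below splits an aggregation over a large dataset into chunks, computes one partial result per worker, and combines the partials. It is a conceptual model in Python, not InterSystems code: the function names are hypothetical, and threads stand in for the per-core processes mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split values into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def parallel_sum(values, n_workers=4):
    """Sum values by computing one partial sum per worker, then combining."""
    chunks = split_into_chunks(values, n_workers)
    # Assumption: threads stand in for the one-process-per-core model.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)
```

Aggregations such as SUM or COUNT suit this pattern because partial results can be merged cheaply, which is consistent with the slide's remark that aggregation queries on large datasets benefit most.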
Scaling Out: Horizontal Scalability
Horizontal Scalability
Expand capacity of a cluster by adding servers to address workload requirements
Advantages
• Near-linear price / performance
• Leverage commodity, virtual & cloud-based systems
• Allows elastic scaling
Challenges
• Software complexity
• Emphasis on networking
Horizontally Scaling Users
InterSystems ECP Application Servers
The InterSystems Enterprise Cache Protocol is a powerful mechanism to distribute data
and application logic across database instances. It decouples the execution of application
code from persisting the data it handles:
• ECP Application Server services user requests off a local database cache
• ECP Data Server persists updates to disk
Horizontally Scaling Cache: Allows the caches of multiple instances to each have an
independent working set in memory, kept in sync with persisted data
• Fully transparent to application code
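The split between application servers and a data server can be pictured with the toy model below. It is only a sketch under stated assumptions: the class and method names are hypothetical, and the simple invalidate-on-write step is a stand-in for whatever mechanism ECP actually uses to keep caches in sync, which the slides do not describe.

```python
class DataServer:
    """Toy stand-in for an ECP data server: the one place data is persisted."""

    def __init__(self):
        self.storage = {}
        self.app_servers = []
        self.reads = 0  # how often an app server had to fetch from here

    def read(self, key):
        self.reads += 1
        return self.storage.get(key)

    def write(self, key, value, origin=None):
        self.storage[key] = value
        # Assumed sync strategy: drop stale copies from all other caches.
        for app in self.app_servers:
            if app is not origin:
                app.cache.pop(key, None)


class AppServer:
    """Toy stand-in for an ECP application server with its own local cache."""

    def __init__(self, data_server):
        self.data_server = data_server
        self.cache = {}
        data_server.app_servers.append(self)

    def get(self, key):
        # Serve from the local cache when possible, else fetch and remember.
        if key not in self.cache:
            self.cache[key] = self.data_server.read(key)
        return self.cache[key]

    def set(self, key, value):
        # Updates are persisted by the data server; the local copy is refreshed.
        self.data_server.write(key, value, origin=self)
        self.cache[key] = value
```

Each `AppServer` keeps only the keys its own users touch, so every instance ends up with an independent working set while the `DataServer` remains the single source of persisted data.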
Horizontally Scaling Data
InterSystems SQL Sharding
SQL Sharding allows table data to be partitioned over multiple instances
• Takes parallel SQL processing one step further by distributing the work over multiple servers rather than multiple processes on the same server
• The distributed data layout can be further exploited through parallel loading and third-party frameworks such as Apache Spark
Horizontally Scaling Cache: Allows the caches of multiple instances to be combined to hold a larger overall working set in memory
• Fully transparent to application code
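To make the partitioning idea concrete, here is a minimal sketch that spreads table rows over a number of shards by hashing a shard key. The slides do not say how InterSystems IRIS assigns rows to shards, so the hash-modulo scheme and all names below are assumptions chosen for illustration.

```python
import zlib


def shard_for(key, n_shards):
    """Map a shard key to a shard index with a stable hash (assumed scheme)."""
    return zlib.crc32(str(key).encode("utf-8")) % n_shards


def partition_rows(rows, key_field, n_shards):
    """Distribute row dicts over n_shards lists according to key_field."""
    shards = [[] for _ in range(n_shards)]
    for row in rows:
        shards[shard_for(row[key_field], n_shards)].append(row)
    return shards
```

Because the mapping is deterministic, a loader can compute the target shard for each row up front and send batches to the shards in parallel.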
Independently Scaling Users and Data
InterSystems SQL Sharding
Sharded Architecture – Basics
Two main instance roles participate in a sharded cluster:
One Shard Master (DM)
• Entry point to the sharded namespace
• Stores table definitions, code, data for nonsharded tables
Any number of Shard Servers (DS)
• Provide scalable storage, cache capacity for sharded tables
• Sharded tables are partitioned across shard servers
• Nonsharded tables are mapped to shard servers via ECP
• Routine database is shared between all shard servers
• Transparent to user code, not accessed directly by users
[Diagram: shard master, two shard servers]
Sharded Architecture – Query Processing
1. Application issues query to shard master
2. Shard master analyzes query for partitioning
opportunities and sends shard-local queries to
shard servers
3. Shard-local queries are resolved by shard servers
and results sent back to master via ECP
4. Shard master aggregates shard-local query results
and sends main query results back to application
[Diagram: application, shard master, two shard servers]
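The four query processing steps above follow a scatter/gather pattern, which can be sketched for a single AVG query as follows. This is an illustrative model only: the function names and the `amount` column are hypothetical, and the shard servers are represented by plain Python lists rather than remote instances.

```python
def shard_local_query(rows):
    """Shard-local part of AVG(amount): return a partial (sum, count)."""
    amounts = [row["amount"] for row in rows]
    return sum(amounts), len(amounts)


def master_avg(shards):
    """Shard master: run the local query on every shard and merge results."""
    # Step 2 and 3: one shard-local query per shard, each returning a partial.
    partials = [shard_local_query(rows) for rows in shards]
    # Step 4: aggregate the partials into the result of the main query.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None
```

Note that the master cannot simply average the per-shard averages; it has to rewrite AVG into SUM and COUNT so the partial results combine correctly when shards hold different numbers of rows.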
Sharded Architecture – Shard Master App Servers
Shard Master Application Servers (AM) scale user application workload while Shard
Servers scale query processing
• Use ECP to read nonsharded tables from the Shard Master Data Server (DM)
• Connect directly to the Shard Servers for sharded table data
[Diagram: one DM, four DS, three AM]
Sharded Architecture – Query Shards
For demanding use cases, application servers can also be added to the shard level to
spread the shard-local query workload:
• Data Shards (DS) persist a partition’s data
• Query Shards (QS) query the data of the corresponding Data Shard via ECP
For example, large ingestion workloads can be sent straight to the data shards while query shards reserve their cache for a concurrent analytical query workload.
[Diagram: DM, three DS (ingest), three QS (analytic)]
Joining Sharded Tables
Cosharded joins
• Equijoins on the user-defined shard keys of two or more tables can be executed locally on each
shard
• Extremely efficient, scales well with number of tables and number of shards
Any set of sharded tables can be joined
• Each shard server can access data from other shards via ECP
• Efficient “shard tuple” algorithm assigns shard sets to each shard server
Sharded tables can be joined with nonsharded tables
• Shard servers access data from nonsharded tables via ECP
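The reason cosharded joins need no cross-shard traffic can be shown with a small sketch: when two tables are partitioned by the same key with the same function, matching rows always land on the same shard, so joining shard by shard gives the same result as joining the full tables. The hash scheme, table contents, and function names below are assumptions made for this example.

```python
import zlib


def shard_of(key, n_shards):
    """Stable shard assignment for a shard key (assumed hash-modulo scheme)."""
    return zlib.crc32(str(key).encode("utf-8")) % n_shards


def shard_table(rows, key_field, n_shards):
    """Partition row dicts over n_shards lists by key_field."""
    shards = [[] for _ in range(n_shards)]
    for row in rows:
        shards[shard_of(row[key_field], n_shards)].append(row)
    return shards


def local_join(left_rows, right_rows, key_field):
    """Equijoin two row lists on key_field using a hash index."""
    index = {}
    for r in right_rows:
        index.setdefault(r[key_field], []).append(r)
    return [{**l, **r} for l in left_rows for r in index.get(l[key_field], [])]


def cosharded_join(left_shards, right_shards, key_field):
    """Join shard i of one table only with shard i of the other."""
    result = []
    for left, right in zip(left_shards, right_shards):
        result.extend(local_join(left, right, key_field))
    return result
```

If the two tables were sharded on different keys, a shard-local join would miss matches held on other shards, which is the case the slide says is handled by letting shard servers read each other's data via ECP.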
Leveraging Other Features of InterSystems IRIS Data Platform
Mirroring
• Sharding leverages mirroring to provide High Availability
• Fully supported for all data-storing components of sharded clusters (DM & DS)
• Automatic completion of sharded queries upon node failover
InterSystems Connector for Apache Spark
• Leverages sharded topologies - Spark workers connect directly to shards to execute local
queries, do aggregating work in Spark itself
JDBC
• Transparently makes direct parallel connections to shards for high speed data ingestion
Use Cases
Multi-Asset Global Trading System
• One of the top global investment banks, which processes 13% of global equities trading volume, runs its global trading system on the InterSystems data platform.
• More than 2 billion transactions and more than 6 TB of data are generated every day
• Has evaluated InterSystems IRIS for real-time data access and for short-term and long-term storage, replacing ECP app servers, Sybase ASE, Sybase IQ and Rainstor. InterSystems IRIS improves query performance by 300% and reduces cost by 70%.
Benchmark Service
• Another top global investment bank is evaluating InterSystems IRIS as a replacement for the Sybase IQ behind its benchmark service
• Found InterSystems IRIS to be up to 2x faster than another in-memory database and 3x to 10x faster than Sybase IQ
Use Case 1
Multi-Asset Global Trading System
Real Time Access and Data Storage on Private Cloud
Use Case 1: Current InterSystems SQL Environment
The trading system persists intraday transactions to
hundreds of InterSystems SQL instances:
• They are divided into Data Servers (DS) and App Servers (AS)
• Interconnected by InterSystems ECP
• All of them are running on physical servers
• Each AS needs about 3x the RAM of a DS (128 GB vs 40 GB)
To keep queries unrelated to trading from adding load to the AS instances, the customer has also set up Sybase ASE instances and replicates data from the trading system/InterSystems SQL environment to them to serve those queries.
[Diagram: TSS/Hermes, TIS/Persistor, three AS, three Data Servers]
Use Case 1: Current Storage Infrastructure
In near real time, the customer replicates data from the trading system/Caché to Sybase ASE instances; typically one ASE instance holds 7 days of data from more than one trading system/Caché instance. At end of day (EOD), the customer dumps data from its trading system/Caché instances to Sybase IQ, where it is kept for up to 6 months, and to Rainstor, where it is kept forever.
Retention by storage tier:
• Caché: intraday
• Sybase ASE: 7 days
• Sybase IQ: 6 months
• Rainstor: forever
Use Case 1: InterSystems IRIS for Real Time Access
Proposed InterSystems IRIS Architecture
• The trading system components TIS and the persistors will continue to store data in the existing DS instances
• There will be no more expensive AS instances
• Cloud-based InterSystems IRIS query cluster
• One or more InterSystems IRIS shard master(s)
• For each DS, there will be one or more IRIS query shard(s)
• Each node requires only 40 GB of RAM and no expensive storage
This cloud-based InterSystems IRIS configuration will provide a real-time, horizontally scalable query facility that can replace the current AS instances and Sybase ASE for intraday queries. It will improve query performance by 300% and cut hardware cost by 70%.
[Diagram: TSS/Hermes & client apps, TIS/Persistor, three DS, three QS, DM]
Use Case 1: InterSystems IRIS for Data Storage
InterSystems IRIS native data replication will move data from InterSystems Caché data servers to InterSystems IRIS data shards in near real time. The cloud-based InterSystems IRIS data storage facility can hold 7 days, 30 days or 6 months of trading data.
[Diagram: TSS/Hermes, TIS/Persistor and ASE/IQ/Rainstor clients; two clusters, each with three DS, three QS and one DM]
Use Case 2
Benchmark Service
Succeed where Hadoop and Traditional Data Warehouse Fail to Deliver
Use Case 2: Background
The investment bank has 18,000 benchmarks (14,000 from external sources and 4,000 created internally), with a total data volume of 8 TB.
Its asset managers use the benchmark service to compare the portfolios they manage for their clients against one or more benchmarks, typically at the end of the day.
Its real-time strategy trading platform also uses the benchmark service to make trading decisions during trading hours.
Use Case 2: Challenges
The bank has a petabyte-scale data lake on Hadoop.
Complex SQL Joins
• The bank has created many curated SQL stores to serve enterprise applications/customers.
• Currently Sybase ASE cannot keep up with application and customer demand.
Low Latency Requirements
• In-memory SQL solutions are expensive and/or unstable
Use Case 2: InterSystems IRIS Succeeds Where Others Fail
The bank deployed InterSystems IRIS on VMs provisioned from its private cloud
• Each VM has 4 cores, 32GB RAM, 200GB internal disk.
• Different sharding strategies by different sharding keys (indexID, businessDate)
InterSystems IRIS is up to 2x faster than another distributed in-memory database and 3x to 10x faster than Sybase IQ in many test cases, and it is consistently fast across the board.
Q&A
Thank you.