In-Memory Data Management Trends & Techniques
Lightning Talk
Greg Luck, CTO, Hazelcast (www.hazelcast.com)
In-Memory Hardware Trends
How to Use It
Von Neumann Architecture
Hardware Trends
Commodity Multi-core Servers
(Chart: cores per CPU over time, y-axis 0-20)
UMA -> NUMA (Uniform Memory Access to Non-Uniform Memory Access)
Commodity 64-bit Servers
Addressable memory: 32-bit = 4 GB; 64-bit = 18 EB
50 Years of RAM Prices, Historical and Projected
50 Years of Disk Prices
SSD Prices
Average price: $1/GB
Cost Comparison: USD/GB, 2012 (and the cost of 100 TB)

Disk: $0.04/GB (1x)      -> $4k per 100 TB
SSD:  $1/GB  (25x disk)  -> $100k per 100 TB
DRAM: $21/GB (525x disk) -> $2.1m per 100 TB
Max RAM Per Commodity Server
(Chart: max RAM in TB per server, 2010-2013, y-axis 0-9 TB)
Latency Across the Network
(Chart: network latency in µs, axis 0-70)
Access Times & Sizes

Level                    | RR Latency | Typical Size | Technology               | Managed By
Registers                | <1 ns      | 1 KB         | Custom CMOS              | Compiler
L1 Cache                 | 1 ns       | 8-128 KB     | SRAM                     | Hardware
L2 Cache                 | 3 ns       | 0.5-8 MB     | SRAM                     | Hardware
L3 Cache (on chip)       | 10-15 ns   | 4-30 MB      | SRAM                     | Hardware
Main Memory              | 60 ns      | 16 GB-TBs    | DRAM                     | OS/App
SSD                      | 50-100 µs  | 400 GB-6 TB  | Flash Memory             | OS/App
Main Memory over Network | 2-100 µs   | Unbounded    | DRAM/Ethernet/InfiniBand | OS/App
Disk                     | 4-7 ms     | Multiple TBs | Magnetic Rotational Disk | OS/App
Disk over Network        | 6-10 ms    | Unbounded    | Disk/Ethernet/InfiniBand | OS/App
Cache is up to 30x faster than memory. Memory is ~10^6x faster than disk.
Network memory is ~10^3x faster than disk. SSD is ~10^2x faster than disk.
Techniques
Exploit Data Locality
Data is more likely to be read if:
• It was recently read (temporal locality)
• It is adjacent to other data, e.g. arrays, fields in an object (spatial locality)
• It is part of a pattern, e.g. looping, relations
• Some data is naturally accessed more frequently, e.g. a Pareto distribution
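As a concrete illustration of spatial locality (my sketch, not from the slides): in Java, summing a 2D array row by row walks memory sequentially and is cache-friendly, while summing column by column strides a full row per access and typically runs several times slower on large arrays.

```java
// Illustrative sketch: the two loops compute the same sum, but the
// row-major version reads contiguous ints (good spatial locality) while
// the column-major version jumps one full row per access.
public class Locality {
    static long sumRowMajor(int[][] m) {
        long sum = 0;
        for (int i = 0; i < m.length; i++)        // outer: rows
            for (int j = 0; j < m[i].length; j++) // inner: contiguous memory
                sum += m[i][j];
        return sum;
    }

    static long sumColumnMajor(int[][] m) {
        long sum = 0;
        for (int j = 0; j < m[0].length; j++)     // outer: columns
            for (int i = 0; i < m.length; i++)    // inner: strided access
                sum += m[i][j];
        return sum;
    }
}
```

Both return identical results; only the memory access pattern, and therefore the cache behaviour, differs.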
Working with the CPU’s Cache Hierarchy
• Memory is up to 30x slower than cache
• Alleviated somewhat by NUMA, wide-channel and multi-channel memory, and large caches
• Vector instructions
• Work with cache lines
• Work with memory pages (TLBs)
• Work with prefetching
• Exploit NUMA with CPU affinity: numactl --physcpubind=0 --localalloc java …
• Exploit natural data locality
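The cache-line and prefetching points can be shown with a minimal Java sketch (my example, not from the talk): a sequential walk lets the hardware prefetcher stream cache lines ahead of the loop, while visiting the same elements in a scrambled order defeats prefetching and pays roughly one cache miss per access.

```java
// Illustrative sketch: same work, different access order.
public class CacheLines {
    // Sequential walk: the prefetcher hides most cache misses.
    static long linearWalk(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i];
        return sum;
    }

    // Walk in an arbitrary permuted order: defeats prefetching,
    // so on large arrays nearly every access misses the cache.
    static long randomWalk(int[] a, int[] order) {
        long sum = 0;
        for (int idx : order) sum += a[idx];
        return sum;
    }
}
```

Timing the two walks over an array much larger than the L3 cache makes the slide's "memory up to 30x slower than cache" point visible in practice.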
Data Locality Effects, Intra-machine
(Chart: access costs for Linear, Random-Page and Random-Heap patterns, y-axis 0-160, on Intel U4100, i7-860 and i7-2760QM CPUs)
Tiered Storage

Speed (TPS) and size (GB) by tier:
• Heap store: 5,000,000+ TPS, ~10 GB
• Off-heap store: 1,000,000 TPS, 1,000+ GB
• Local disk, SSD and rotational (restartable local storage): 100,000 TPS, 2,000+ GB
• Network storage (network-accessible memory): 10,000s TPS, 100,000+ GB
Data Locality Effects, Inter-machine

Compared with a hybrid in-process (L1) and distributed (L2) cache:
Latency = L1 speed * proportion + L2 speed * proportion
L1 = ~0 ms (<5 µs on-heap, 50-100 µs off-heap), L2 = 1 ms

80% L1 Pareto model: latency = 0 * 0.8 + 1 * 0.2 = 0.2 ms
90% L1 Pareto model: latency = 0 * 0.9 + 1 * 0.1 = 0.1 ms
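The Pareto latency model above is just a weighted average of tier latencies by hit proportion; a small Java helper (illustrative only, names are my own) makes the arithmetic explicit:

```java
// Illustrative sketch: effective latency of a two-tier cache as a
// weighted average. l1Hit is the fraction of reads served by the
// near (L1) tier; the remainder go to the far (L2) tier.
public class TieredLatency {
    static double effectiveLatencyMs(double l1Ms, double l2Ms, double l1Hit) {
        return l1Ms * l1Hit + l2Ms * (1.0 - l1Hit);
    }
}
```

With L1 = 0 ms and L2 = 1 ms this reproduces the slide's numbers: 0.2 ms at an 80% hit rate and 0.1 ms at 90%.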
Columnar Storage
• Manipulate data locality
• Sorted dictionary compression for finite values
• Allows values to be held in cache for SSE instructions
• Better cache-line effectiveness
• Fewer CPU cache misses for aggregate calculations
• Cross-over point is around a few dozen columns
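Sorted dictionary compression can be sketched in a few lines of Java (my illustration, not any product's implementation): each cell stores a small integer code into a sorted dictionary of distinct values, so scans and aggregates run over compact codes that pack densely into cache lines.

```java
import java.util.Arrays;

// Illustrative sketch: a dictionary-compressed string column.
public class DictionaryColumn {
    final String[] dictionary; // sorted distinct values
    final int[] codes;         // one small code per row

    DictionaryColumn(String[] column) {
        dictionary = Arrays.stream(column).distinct().sorted().toArray(String[]::new);
        codes = new int[column.length];
        for (int i = 0; i < column.length; i++)
            codes[i] = Arrays.binarySearch(dictionary, column[i]); // found by construction
    }

    String get(int row) { return dictionary[codes[row]]; }

    // Aggregate over the compact codes, never touching the strings.
    long countEquals(String value) {
        int code = Arrays.binarySearch(dictionary, value);
        if (code < 0) return 0;
        long n = 0;
        for (int c : codes) if (c == code) n++;
        return n;
    }
}
```

Because the dictionary is sorted, equality and range predicates translate into integer comparisons on the codes, which is what makes column stores fast at aggregation.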
Parallelism
• Multi-threading
• Avoid synchronized; use CAS (compare-and-swap)
• Query using a scatter-gather pattern
• Map/Reduce, e.g. Hazelcast MapReduce
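The "avoid synchronized, use CAS" point looks like this in Java (a minimal sketch using the standard java.util.concurrent.atomic package): instead of blocking under a lock, the writer retries a compare-and-swap until it wins.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: a lock-free counter. compareAndSet publishes the
// new value only if no other thread changed it in between; on contention
// the loop retries instead of blocking.
public class CasCounter {
    private final AtomicLong value = new AtomicLong();

    long incrementAndGet() {
        long current, next;
        do {
            current = value.get();
            next = current + 1;
        } while (!value.compareAndSet(current, next)); // retry if another thread won
        return next;
    }

    long get() { return value.get(); }
}
```

(AtomicLong has a built-in incrementAndGet; the explicit loop is spelled out here to show the CAS retry pattern itself.)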
Java: Will It Make the Cut?
• Garbage collection limits heap usage. G1 and Balanced aim for <100 ms pauses at 10 GB.
(Figure: GC pause time grows with heap size, leaving much of a 64 GB server's memory unused; off-heap storage avoids the problem)
• Off-heap storage
• No low-level CPU access
• Java is challenged as an infrastructure language despite its newly popular usage for this
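One concrete way Java products sidestep the GC limits above is direct (off-heap) buffers. A minimal sketch using only java.nio (the class and layout here are my own illustration, not any product's design): memory allocated with allocateDirect lives outside the GC-managed heap, so it adds nothing to pause times, at the cost of manual serialization.

```java
import java.nio.ByteBuffer;

// Illustrative sketch: a fixed-size off-heap array of longs. The buffer
// is allocated outside the Java heap, so the garbage collector never
// scans or moves its contents.
public class OffHeapStore {
    private final ByteBuffer buffer;

    OffHeapStore(int capacityBytes) {
        buffer = ByteBuffer.allocateDirect(capacityBytes); // off-heap allocation
    }

    void putLong(int index, long value) { buffer.putLong(index * 8, value); }
    long getLong(int index)             { return buffer.getLong(index * 8); }
}
```

Products like Terracotta BigMemory and Hazelcast's off-heap storage apply this idea at scale, managing serialized entries in direct memory.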
CEP/Stream Processing
• Don’t let data pool up and then process it with “pull queries”
• Invert that and process it as it streams in: “push queries”
• Queries execute against “tables” that break the stream up into a current time window
• Hold the window and intermediate results in memory
• Results are in real time
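The windowing idea can be sketched in plain Java (my illustration, not a CEP engine's API): events fold into an in-memory aggregate for the current time window, and when an event's timestamp crosses the window boundary the result is pushed out and the window resets.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: a tumbling time window that sums event values
// and pushes each completed window's total: a "push query".
public class TumblingWindow {
    private final long windowMillis;
    private long windowStart = -1;
    private long sum = 0;
    final List<Long> emitted = new ArrayList<>();

    TumblingWindow(long windowMillis) { this.windowMillis = windowMillis; }

    void onEvent(long timestampMillis, long value) {
        if (windowStart < 0) windowStart = timestampMillis;
        while (timestampMillis >= windowStart + windowMillis) {
            emitted.add(sum);            // push the completed window's result
            sum = 0;
            windowStart += windowMillis; // advance to the next window
        }
        sum += value;                    // fold into in-memory window state
    }
}
```

Only the current window and its running aggregate are held in memory, which is what keeps the results real-time regardless of total stream volume.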
In-Situ Processing
Rather than moving the data to where it will be processed, process it in situ.
Examples:
- HANA Calculation Engine
- Google BigQuery
- Exadata Storage Servers
- Hazelcast EntryProcessor and Distributed Executor Service
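The entry-processor flavour of in-situ processing can be sketched with the JDK alone (this is my simplified local analogue, not the Hazelcast API): instead of reading a value, mutating it, and writing it back, you ship a function to the entry. In a real data grid the function executes on the member that owns the key's partition, so the value never crosses the network.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Illustrative sketch: apply a processor to an entry where it lives,
// atomically, instead of round-tripping the value to the caller.
public class InSituMap<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();

    void put(K key, V value) { map.put(key, value); }
    V get(K key)             { return map.get(key); }

    // The processor runs under the entry's lock; only its (small)
    // result would travel back to the caller in a distributed setting.
    V executeOnKey(K key, BiFunction<K, V, V> processor) {
        return map.compute(key, processor);
    }
}
```

This is the same motivation behind Exadata's Smart Scan and BigQuery: move the computation to the data, because the computation is almost always smaller.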
Souped-Up Von Neumann Architecture
(Diagram: the Von Neumann architecture augmented at every level: multi-processor, multi-core CPUs with vector/AES instructions; more cache, NUMA, wide/multi-channel memory and locality; 64-bit DRAM with compression; PCI flash; SSD (flash and RAM); and memory over the network)
The Data Management Landscape
The New Data Management World
Data Grid: Terracotta, Coherence, GemFire …
SAP HANA Relational | Analytical
• “Appliance”
• Aggressive IA64 optimisations
• ACID, SQL and MDX
• In-memory, SSD and disk
• Row- and column-based storage
• Fast aggregation on the column store
• Single-instance 1 TB limit
• Uses compression (est. 5x)
• Parallel DB: round-robin, hash or range partitioning of a table with shared storage
• Updates as delta inserts
• Data is fed from source systems near-real-time, real-time or batch
VoltDB Relational | NewSQL | Operational | Analytical
• An all-in-memory design
• Full SQL and full ACID
• Partitioned per core so that one thread owns its partition, avoiding locking and latching
• Redundancy provided by multiple instances with writes being replicated
• Claims to be 45x faster
Oracle Exadata Relational | Operational | Analytical | Appliance
• Combines Oracle RAC with “Storage Servers”
• Connected within the box by InfiniBand QDR
• Storage Servers use PCI flash (not SSD) for a 22 TB hardware cache
• In-situ computation on the Storage Servers with “Smart Scan”
• Uses “Hybrid Columnar Compression”, a compromise of row and column storage
(Image: PCI flash card)
Terracotta BigMemory Key-Value | Operational | Data Grid
• In-memory
• Key-value with the Ehcache and, soon, javax.cache APIs
• In-process (L1) and server (L2) storage
• Persistence via the log-forward Fast Restart Store: SSD or disk
• Tiered storage: local on-heap, local off-heap, server on-heap, server off-heap
• Partitions with consistent hashing
• Search with parallel in-situ execution
• Off-heap allows 2 TB uncompressed in each app server Java process and on each server partition
• Compression
• Speed ranging from <1 µs to a few ms
Hazelcast Key-Value | Operational | Data Grid
• In-memory
• Key-value Map API and javax.cache API
• Near cache and server data storage
• Tiered storage: local on-heap, local off-heap, server on-heap, server off-heap
• Partitions with consistent hashing
• Search with parallel in-situ execution
• In-situ processing with Entry Processors and Distributed Executors
• Speed ranging from <1 µs to a few ms
Disk is the new tape
SSD is the new disk
Memory is the new operational store