Accelerating Complex Data Transfer for Cluster Computing
Alexey Khrabrov, Eyal de Lara
University of Toronto
HotCloud 2016
Motivation
• Data processing is now CPU-bound
• Software layers can’t leverage fast datacenter networks
– the network accounts for as little as 2% of overall performance [Ousterhout, K. et al., “Making sense of performance in data analytics frameworks”, NSDI’15]
• Data [de]serialization is one of the bottlenecks
– up to 26% of total CPU time [Trivedi, A. et al., “On the [ir]relevance of network performance for data processing”, HotCloud’16]
– prevents applications from fully leveraging RDMA
Serialized data transfer …
[Figure: on the source node, object1 (header, field1, field2, pointer1, pointer2) points to object2 and object3 elsewhere in the heap. Serialization flattens the graph into a buffer holding auxiliary info, object1 data (field1, field2), object2 data and object3 data. The buffer is transferred to the destination node, where deserialization rebuilds an equivalent pointer-based object graph.]
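The serialize, transfer, deserialize path in the figure can be sketched with standard Java serialization. This is a minimal illustration, not the benchmark used in the talk: the class name, the map contents and the entry count are assumptions, and the network transfer is replaced by handing the byte array over in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.TreeMap;

public class SerializedTransferSketch {

    /** Source node: flatten the object graph into a byte buffer. */
    static byte[] serialize(Object graph) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(graph);
        }
        return buffer.toByteArray();
    }

    /** Destination node: rebuild an equivalent object graph from the bytes. */
    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    /** Builds a TreeMap with the given number of entries (arbitrary illustrative data). */
    static TreeMap<Integer, String> buildMap(int entries) {
        TreeMap<Integer, String> map = new TreeMap<>();
        for (int i = 0; i < entries; i++) {
            map.put(i, "value-" + i);
        }
        return map;
    }

    public static void main(String[] args) throws Exception {
        TreeMap<Integer, String> original = buildMap(100_000);

        long t0 = System.nanoTime();
        byte[] wire = serialize(original);   // CPU work on the source node
        long t1 = System.nanoTime();
        Object copy = deserialize(wire);     // CPU work on the destination node
        long t2 = System.nanoTime();

        System.out.printf("serialized size: %d bytes%n", wire.length);
        System.out.printf("serialize: %.1f ms, deserialize: %.1f ms%n",
                (t1 - t0) / 1e6, (t2 - t1) / 1e6);
        System.out.println("round trip preserved contents: " + original.equals(copy));
    }
}
```

Both timed steps are pure CPU work that sits on either side of the actual network transfer, which is the cost the rest of the talk aims to remove.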
Transfer time breakdown: complex data
TreeMap; size: 64 MB raw, 24 MB serialized; 10 Gbit/s
[De]serialization overhead: 80% of total transfer time (97% at 100 Gbit/s)
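The relation between the 10 Gbit/s and 100 Gbit/s figures can be checked with a back-of-the-envelope calculation. The sketch below rests on assumptions that the slide does not state explicitly: "overhead" is read as the share of total transfer time spent on [de]serialization, the 24 MB serialized form is what travels over the wire, 1 MB is taken as 10^6 bytes, and the CPU cost stays the same when the network gets faster. All names are illustrative.

```java
public class TransferOverheadSketch {

    /** Time on the wire in seconds, assuming 1 MB = 10^6 bytes and 1 Gbit = 10^9 bits. */
    static double wireSeconds(double megabytes, double gbitPerSecond) {
        return (megabytes * 1e6 * 8) / (gbitPerSecond * 1e9);
    }

    /** CPU time implied by an overhead fraction f = cpu / (cpu + wire). */
    static double cpuSeconds(double overheadFraction, double wireSeconds) {
        return overheadFraction / (1.0 - overheadFraction) * wireSeconds;
    }

    /** Overhead fraction for given CPU and wire times. */
    static double overheadFraction(double cpuSeconds, double wireSeconds) {
        return cpuSeconds / (cpuSeconds + wireSeconds);
    }

    public static void main(String[] args) {
        double wireAt10 = wireSeconds(24, 10);     // 24 MB serialized TreeMap at 10 Gbit/s
        double cpu = cpuSeconds(0.80, wireAt10);   // CPU time implied by the 80% figure
        double wireAt100 = wireSeconds(24, 100);   // same payload at 100 Gbit/s

        System.out.printf("wire time at 10 Gbit/s:  %.2f ms%n", wireAt10 * 1e3);
        System.out.printf("implied CPU time:        %.2f ms%n", cpu * 1e3);
        System.out.printf("wire time at 100 Gbit/s: %.2f ms%n", wireAt100 * 1e3);
        System.out.printf("overhead at 100 Gbit/s:  %.1f%%%n",
                overheadFraction(cpu, wireAt100) * 100);
    }
}
```

Under these assumptions the overhead at 100 Gbit/s comes out between 97% and 98%, in line with the figure on the slide: a faster network shrinks only the wire time, so the fixed CPU cost dominates even more.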
Transfer time breakdown: simple data
double[]; size: 80 MB; 10 Gbit/s
[De]serialization overhead: 65% of total transfer time
Eliminating data [de]serialization
• Why serialization is needed: pointer-based data structures become invalid when copied directly to another address space
– other reasons (e.g. different endianness) are irrelevant: assume that all nodes have the same architecture
• General idea: shared cluster-wide virtual address space
• Compact allocation of objects to be copied together
– contiguous regions copied in a single operation – RDMA-friendly
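The pointer problem and the shared-address-space remedy can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the authors' mechanism: memory is modeled as an array of words mapped at a chosen base address, a linked list stores absolute addresses in its "next" fields, and every name here is hypothetical.

```java
public class SharedAddressSpaceSketch {

    /** A contiguous region of simulated memory mapped at virtual address `base` (in words). */
    static final class Region {
        final long base;
        final long[] words;

        Region(long base, int sizeInWords) {
            this.base = base;
            this.words = new long[sizeInWords];
        }

        long load(long vaddr) {
            if (vaddr < base || vaddr >= base + words.length) {
                throw new IllegalStateException("dangling pointer: " + vaddr);
            }
            return words[(int) (vaddr - base)];
        }

        void store(long vaddr, long value) {
            words[(int) (vaddr - base)] = value;
        }
    }

    static final long NULL = 0;

    /** Lays out a linked list of (value, next) pairs with absolute addresses; returns the head. */
    static long buildList(Region region, long... values) {
        if (values.length == 0) {
            return NULL;
        }
        for (int i = 0; i < values.length; i++) {
            long addr = region.base + 2L * i;
            long next = (i + 1 < values.length) ? addr + 2 : NULL;
            region.store(addr, values[i]);
            region.store(addr + 1, next);
        }
        return region.base;
    }

    /** Follows the stored pointers and sums the values. */
    static long sumList(Region region, long head) {
        long sum = 0;
        for (long p = head; p != NULL; p = region.load(p + 1)) {
            sum += region.load(p);
        }
        return sum;
    }

    /** "Transfer": one bulk copy of the raw words, with no per-object processing. */
    static Region rawCopy(Region source, long destinationBase) {
        Region dest = new Region(destinationBase, source.words.length);
        System.arraycopy(source.words, 0, dest.words, 0, source.words.length);
        return dest;
    }

    public static void main(String[] args) {
        Region source = new Region(0x1000, 6);
        long head = buildList(source, 1, 2, 3);

        // Same virtual address on the destination: stored pointers stay valid.
        Region sameAddress = rawCopy(source, 0x1000);
        System.out.println("same base, sum = " + sumList(sameAddress, head));

        // Different virtual address: the copied pointers refer to the old location.
        Region otherAddress = rawCopy(source, 0x9000);
        try {
            sumList(otherAddress, otherAddress.base);
        } catch (IllegalStateException e) {
            System.out.println("different base: " + e.getMessage());
        }
    }
}
```

When the destination maps the region at the same address, the raw copy is immediately usable. At any other address the structure would need pointer fix-ups, which is the work serialization normally does.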
Compact object format and Direct transfer
[Figure: on the source node, object1 (header, field1, field2, pointer1, pointer2) and the object2 and object3 it references are laid out compactly as a single Global Heap Object. The whole region is transferred as-is to the destination node, where it has the same layout.]
Cluster-wide shared address space
• Virtual address space is huge -> can be shared
– 128 TB (2^47 bytes), potentially 2^63 bytes
• Limited version of DSM (distributed shared memory)
• DSM original goal: trade off performance for transparency / ease of programming
• We use DSM to improve performance (but increase programming complexity)
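The claim that the address space is large enough to share can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that a quarter of the 2^47-byte range is set aside for the shared heap at a fixed base address and split into equal per-node regions. The fraction, the base address and the node count are hypothetical and do not come from the talk.

```java
public class AddressSpacePartitionSketch {

    /** User-space virtual address range quoted on the slide: 2^47 bytes = 128 TiB. */
    static final long USER_SPACE_BYTES = 1L << 47;

    /** Hypothetical choice: dedicate a quarter of that range to the shared heap. */
    static final long SHARED_HEAP_BYTES = USER_SPACE_BYTES / 4;

    /** Hypothetical base address of the shared heap, identical on every node. */
    static final long SHARED_HEAP_BASE = 1L << 46;

    /** Size of each node's region when the shared heap is split evenly. */
    static long regionSize(int nodes) {
        return SHARED_HEAP_BYTES / nodes;
    }

    /** First address of the region owned by the given node (0-based id). */
    static long regionStart(int nodeId, int nodes) {
        return SHARED_HEAP_BASE + nodeId * regionSize(nodes);
    }

    /** Node that owns a given address, derived from the address alone. */
    static int ownerOf(long vaddr, int nodes) {
        return (int) ((vaddr - SHARED_HEAP_BASE) / regionSize(nodes));
    }

    public static void main(String[] args) {
        int nodes = 1024;
        long tib = 1L << 40;
        long gib = 1L << 30;

        System.out.println("user space:   " + USER_SPACE_BYTES / tib + " TiB");
        System.out.println("shared heap:  " + SHARED_HEAP_BYTES / tib + " TiB");
        System.out.println("per node:     " + regionSize(nodes) / gib + " GiB for " + nodes + " nodes");
        System.out.println("owner of start of node 5's region: "
                + ownerOf(regionStart(5, nodes), nodes));
    }
}
```

Even with these conservative choices each of 1024 nodes gets tens of gibibytes of address space, and the owner of any address follows from the address itself without a lookup.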
Assumptions
• Immutable shared objects
– modifications of the original are not propagated
– not very restrictive: e.g. immutable RDDs in Spark
• No need to be completely transparent to the programmer
– explicit management of global objects
– possible to hide most of the details inside the framework
Architecture
[Figure: a global heap spans Node 1 and Node 2. Each node also has a local heap and its own physical memory, and owns an exclusive region of the global heap. Node 1 holds the original object in its exclusive region; Node 2 receives a copy of it by direct copy. A coordinator holds the directory; one link in the figure is labeled "(rare)".]
Node 1:
    GObject obj = new GObject(...);
    obj.data = new MyFancyClass(...);
    //...
    obj.commit("key");
    //...
    obj.release();
Node 2:
    GObject obj = GHeap.get("key");
    MyFancyClass data = obj.data;
    //...
    obj.release();
Global heap architecture
• Huge virtual address space region; the same on all nodes
• Partitioning: nodes allocate objects in their own exclusive regions
– minimal amount of coordination required
• Mapping to physical memory on demand
• Objects identified by keys mapped to <node, vaddr>
• 3-stage object creation: (1) reserve space; (2) populate with data; (3) commit (make available to other nodes)
• Explicit release of objects
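The object lifecycle listed above (reserve, populate, commit, fetch by key, release) can be mimicked in a small single-process simulation. This is only a sketch of the bookkeeping, not the JVM-level mechanism from the talk: nodes are plain Java objects, memory is a map from virtual address to bytes, and every class and method name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalHeapSketch {

    /** Directory entry: which node owns the object and at which virtual address. */
    static final class Location {
        final int node;
        final long vaddr;

        Location(int node, long vaddr) {
            this.node = node;
            this.vaddr = vaddr;
        }
    }

    /** Coordinator-side directory mapping keys to <node, vaddr>. */
    static final class Directory {
        private final Map<String, Location> entries = new HashMap<>();

        void publish(String key, Location location) {
            if (entries.putIfAbsent(key, location) != null) {
                throw new IllegalStateException("duplicate key: " + key);
            }
        }

        Location lookup(String key) {
            return entries.get(key);
        }
    }

    /** A node with an exclusive region of the shared address range. */
    static final class Node {
        final int id;
        private final long regionEnd;
        private long nextFree; // bump pointer inside the exclusive region
        private final Map<Long, byte[]> memory = new HashMap<>(); // backed on demand

        Node(int id, long regionStart, long regionSize) {
            this.id = id;
            this.nextFree = regionStart;
            this.regionEnd = regionStart + regionSize;
        }

        /** Stage 1: reserve space locally; no coordination with other nodes. */
        long reserve(int size) {
            if (nextFree + size > regionEnd) {
                throw new IllegalStateException("exclusive region exhausted");
            }
            long vaddr = nextFree;
            nextFree += size;
            memory.put(vaddr, new byte[size]);
            return vaddr;
        }

        /** Stage 2: populate the reserved space with data. */
        void populate(long vaddr, byte[] data) {
            System.arraycopy(data, 0, memory.get(vaddr), 0, data.length);
        }

        /** Stage 3: commit, making the object available to other nodes under a key. */
        void commit(Directory directory, String key, long vaddr) {
            directory.publish(key, new Location(id, vaddr));
        }

        /** Fetch by key: copy the bytes from the owner to the same address locally. */
        long get(Directory directory, Map<Integer, Node> cluster, String key) {
            Location location = directory.lookup(key);
            if (location == null) {
                throw new IllegalArgumentException("unknown key: " + key);
            }
            if (location.node != id) {
                byte[] remote = cluster.get(location.node).memory.get(location.vaddr);
                memory.put(location.vaddr, remote.clone()); // stands in for the direct copy
            }
            return location.vaddr;
        }

        byte[] read(long vaddr) {
            return memory.get(vaddr);
        }

        /** Explicit release of the local mapping. */
        void release(long vaddr) {
            memory.remove(vaddr);
        }
    }

    public static void main(String[] args) {
        Directory directory = new Directory();
        Node node1 = new Node(1, 0x1000_0000L, 0x1000_0000L);
        Node node2 = new Node(2, 0x2000_0000L, 0x1000_0000L);
        Map<Integer, Node> cluster = new HashMap<>();
        cluster.put(node1.id, node1);
        cluster.put(node2.id, node2);

        long vaddr = node1.reserve(4);
        node1.populate(vaddr, new byte[] {1, 2, 3, 4});
        node1.commit(directory, "key", vaddr);

        long copyAddr = node2.get(directory, cluster, "key");
        System.out.println("same address on both nodes: " + (copyAddr == vaddr));
        node2.release(copyAddr);
    }
}
```

In this sketch the directory is touched only on commit and on lookup, while allocation stays local to a node's exclusive region, which matches the "minimal coordination" point above.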
JVM-based implementation
• Prototype based on JamVM
– HotSpot (“standard” JVM): in progress
• Most of the functionality is implemented in native methods
• Still need some JVM modifications
– memory allocator / garbage collector
– object header format
– bytecode interpreter / JIT compiler
• Details: in the paper
Evaluation
• Microbenchmark (performance of the mechanism alone)
• Transfer objects between 2 identical nodes
• Direct copy vs. serialized – both standard Java serialization and Kryo
• HotSpot for serialized measurements, JamVM for direct copy
• TCP transport, 10 Gbit/s; expect better results with RDMA
• Overhead of JVM modifications: within 1%
Evaluation: complex data (TreeMap)
[Chart: direct copy vs. serialized transfer; annotated speedups: 10x and 5.5x]
Evaluation: simple data (double[])
[Chart: direct copy vs. serialized transfer; annotated speedups: 3x and 3.5x]
Evaluation: small simple objects
Proposed applications
• Data processing frameworks: Spark, Hadoop, etc.
– optimize shuffle stages (data exchange between all nodes)
– possible scheduling improvements; data migration is now cheaper
• Distributed in-memory storage
– store complex data efficiently
– reduce latency of set/get operations
• Fast IPC and RPC
– zero-copy within one machine (using shared memory)
Current and future work directions
• Applications and macrobenchmarks
• RDMA
• Reliability / fault tolerance
• Storage considerations (spills to disk)
• Multiple address spaces for extremely large datasets
• Global heap space management, other implementation details…
Conclusion
• Data [de]serialization is a bottleneck; it prevents us from fully leveraging fast networks
• Designed a data transfer mechanism to avoid serialization
– main idea: shared cluster-wide virtual address space
• Use DSM to improve performance, at the cost of increased programming complexity
• Evaluation shows significant (up to 10x) speedup of data transfer
• Will explore applications that can benefit from this mechanism
Questions?