Global Trees: A Framework for Linked Data Structures on Distributed Memory Parallel Systems

D. Brian Larkins, James Dinan, Sriram Krishnamoorthy, Srinivasan Parthasarathy, Atanas Rountev, P. Sadayappan

Background

• Trees and graphs can concisely represent relationships between data

• Data sets are becoming increasingly large and can require compute-intensive processing

• Developing efficient, memory hierarchy-aware applications is hard

Sample Applications

• n-body simulation

• Fast Multipole Methods (FMM)

• multiresolution analysis

• clustering and classification

• frequent pattern mining

Key Contributions

• Efficient fine-grained data access with a global view of data

• Exploit linked structure to provide fast global pointer dereferencing

• High-level, locality-aware, parallel operations on linked data structures

• Application-driven customization

• Empirical validation of the approach

Framework Design

Global Chunk Layer (GCL)

• API and runtime library for managing chunks, built on ARMCI

• Abstracts common functionality for handling irregular, linked data

• Provides a global namespace with access and modification operations

• Extensible and highly customizable to maximize functionality and performance

Chunks

A chunk is:

• Contiguous memory segment

• Globally accessible

• Physically local to only one process

• Collection of user-defined elements

• Unit of data transfer

Programming Model

• SPMD with MIMD-style parallelism

• Global pointers permit fine-grained access

• Chunks allow coarse-grained data movement

• Uses get/compute/put model for globally shared data access

• Provides both uniform global view and chunked global view of data

Global Pointers

$$c = \&p.\mathit{child}[i] + p.\mathit{child}[i].\mathit{ci} + p.\mathit{child}[i].\mathit{no}$$

Example values: $4252 + (-4252) + 4340 = 4340$

Global Trees (GT)

• Runtime library and API for global-view programming of trees on distributed-memory (DM) clusters

• Built on the GCL chunk communication framework

• High-level tree operations that work in parallel and are locality-aware

• Each process can asynchronously access any portion of the shared tree structure

GT Concepts

• Tree Groups

  • set of global trees

  • allocations are made from the same chunk pool

• Global Node Pointers

• Tree Nodes

  • link structure managed by GT

  • body is a user-defined structure

Example: Copying a Tree

Tree Traversals

• GT provides optimized, parallel traversals for common traversal orders

• Visitor callbacks are application-defined computations on a single node

• GT currently provides top-down, bottom-up, and level-wise traversals

Sample Traversal Usage

Node Mapping

Custom Allocation

• No single mapping of data elements to chunks will be optimal

• GT/GCL supports custom allocators to improve spatial locality

• Allocators can use a hint from the call site and can keep state between calls

• Default allocation is local-open

Experimental Results

• Evaluate using:

  • Barnes-Hut from SPLASH-2

  • Compression operation from MADNESS

• GT compared with:

  • Intel’s Cluster OpenMP and TreadMarks runtime

  • UPC

Global Pointer Overhead

Figure panels: Barnes-Hut; MADNESS compress()

Chunk Size and Bandwidth

Experiments were run on the department's WCI cluster (2.33 GHz Intel Xeon, 6 GB RAM, InfiniBand).

Impact of Chunk Utilization: Barnes-Hut

Barnes-Hut Chunk Size Selection

Barnes-Hut application from the SPLASH-2 suite

Barnes-Hut Scaling

chunk size = 256, bodies = 512k

Local vs. Remote Access: MADNESS compress()

Related Approaches

• Distributed Shared Memory (DSM)

  • Cluster OpenMP, TreadMarks, Cashmere, Midway

• Distributed Shared Objects (DSO)

  • Charm++, Linda, Orca

• Partitioned Global Address Space (PGAS) Languages and Systems

  • UPC, Titanium, CAF, ARMCI, SHMEM, GASNet

• Shared pointer-based data structure support on distributed memory clusters

  • Parallel Irregular Trees, Olden

• HPCS Programming Languages

  • Chapel, X10

Future Work

• Global Graphs

• GT data reference locality tools

• More applications

Questions
