Region-based Hierarchical Operation Partitioning for Multicluster Processors Michael Chu, Kevin Fan, Scott Mahlke University of Michigan Presented by Cristian Petrescu-Prahova


Page 1

Region-based Hierarchical Operation Partitioning for Multicluster Processors

Michael Chu, Kevin Fan, Scott Mahlke

University of Michigan

Presented by Cristian Petrescu-Prahova

Page 2

Clustered Register Files: Why?

Register file cost and access time grow with the square of the number of register ports

Bypass logic grows quadratically with the number of operations issued per cycle

The distance separating the FUs from the register file increases as the number of FUs grows

=> Clustered register files

Decentralized architecture with several small register files

Each register file supplies operands to a subset of the FUs

Examples: Multiflow Trace, Alpha 21264, TI C6x, Analog Devices TigerSHARC (two clusters); reconfigurable meshes?
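As a back-of-the-envelope illustration of the first bullet, the sketch below takes the slide's quadratic rule at face value (relative cost = ports squared) and compares one unified register file against the same ports split evenly over k clusters. The 24-port machine (8 FUs with 2 read and 1 write port each) and the helper names are assumptions for illustration only; the model also ignores the extra ports that inter-cluster copies would need.

```python
def rf_cost(ports: int) -> float:
    """Relative cost of one register file, modeled as ports squared (per the slide's rule)."""
    return float(ports * ports)


def clustered_rf_cost(total_ports: int, clusters: int) -> float:
    """Total cost when `total_ports` are split evenly over `clusters` register files.

    Ignores the extra ports needed for inter-cluster copies.
    """
    if clusters < 1 or total_ports % clusters:
        raise ValueError("ports must divide evenly among a positive number of clusters")
    return clusters * rf_cost(total_ports // clusters)


# Assumed machine: 8 FUs x (2 read + 1 write) ports = 24 ports.
for k in (1, 2, 4):
    print(f"{k} cluster(s): relative cost {clustered_rf_cost(24, k)}")
```

Under this model, splitting the ports k ways divides the register file cost by k (576, 288, 144), which is the motivation for clustering; the price is paid in inter-cluster communication.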

Page 3

Goal

Partition operations across the resources available on each cluster to maximize ILP

Minimize inter-cluster communication

Rule of thumb:

A processor with 2 identical clusters loses ~20% performance

A processor with 4 identical clusters loses ~30% performance

Nonidentical clusters lead to even greater performance loss

Page 4

Well-Known Technique: Bottom-Up Greedy (BUG)

Recurse along the DFG, critical path first

Assign each operation a cluster based on estimates of when the operation and its predecessors can complete earliest (from the scheduler)

Problem 1: makes local decisions (see figure)

Problem 2: is slow; it must query accurate cluster status information for each operation considered
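A much-simplified greedy assigner in the spirit of BUG is sketched below, assuming one issue slot per cluster per cycle and a fixed inter-cluster move latency (both hypothetical). It visits operations critical path first and places each on the cluster where it would finish earliest; it is not Ellis's original bottom-up recursive formulation. Every candidate placement consults per-cluster state (`busy`), which is the per-operation query the slide flags as slow.

```python
from typing import Dict, List, Set, Tuple


def heights(preds: Dict[str, List[str]], lat: Dict[str, int]) -> Dict[str, int]:
    """Longest latency-weighted path from each op to a DFG exit (critical-path priority)."""
    succs: Dict[str, List[str]] = {op: [] for op in preds}
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)
    memo: Dict[str, int] = {}

    def h(op: str) -> int:
        if op not in memo:
            memo[op] = lat[op] + max((h(s) for s in succs[op]), default=0)
        return memo[op]

    return {op: h(op) for op in preds}


def greedy_assign(preds: Dict[str, List[str]], lat: Dict[str, int],
                  n_clusters: int, move_lat: int = 1) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Place each op on the cluster where it is estimated to finish earliest."""
    if any(v < 1 for v in lat.values()):
        raise ValueError("latencies must be >= 1 so that height order is topological")
    h = heights(preds, lat)
    busy: List[Set[int]] = [set() for _ in range(n_clusters)]  # issue cycles already used
    cluster: Dict[str, int] = {}
    finish: Dict[str, int] = {}
    for op in sorted(preds, key=lambda o: -h[o]):  # critical path first
        best = None
        for c in range(n_clusters):
            # Operands produced on another cluster pay the move latency.
            ready = max((finish[p] + (0 if cluster[p] == c else move_lat)
                         for p in preds[op]), default=0)
            start = ready
            while start in busy[c]:  # one issue slot per cluster per cycle
                start += 1
            cand = (start + lat[op], c, start)
            if best is None or cand < best:
                best = cand
        end, c, start = best
        cluster[op], finish[op] = c, end
        busy[c].add(start)
    return cluster, finish


preds = {"a": [], "b": [], "c": ["a", "b"], "d": []}
lat = {op: 1 for op in preds}
clusters, finish = greedy_assign(preds, lat, n_clusters=2)
print(clusters, finish)
```

Each decision looks only at the operation being placed and its already-placed predecessors, which is the locality the slide criticizes.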

Page 5

Region-Based Hierarchical Operation Partitioning

Works on acyclic DFGs extracted from the complete program by region decomposition. I assume region ~ loop (?!?)

Two phases:

Weight calculation: node and edge weights

Partitioning: coarsening and refinement

Page 6

Node Weight Calculation

Reflects the quantity of resources each operation consumes

Ignores dependencies

Individual weight (FUs)

Shared weight (ports, buses)
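One plausible reading of the node weight, assumed here rather than taken from the slide: an operation that can execute on any of n identical units in a cluster consumes 1/n of that cluster's per-cycle capacity for its op group, and the same fraction idea applies to shared resources. The function names and the cluster description format are hypothetical.

```python
from fractions import Fraction
from typing import Dict

# Hypothetical cluster description: number of FUs per op group (I, F, M, B).
Cluster = Dict[str, int]


def individual_weight(op_group: str, cluster: Cluster) -> Fraction:
    """Share of the cluster's per-cycle capacity for `op_group` used by one operation."""
    units = cluster.get(op_group, 0)
    if units == 0:
        raise ValueError(f"cluster has no {op_group} unit")
    return Fraction(1, units)


def shared_weight(uses: int, shared_units: int) -> Fraction:
    """Same idea for a shared resource such as register ports or buses."""
    return Fraction(uses, shared_units)


cluster = {"I": 2, "F": 1, "M": 1, "B": 1}
print(individual_weight("I", cluster))  # 1/2
print(individual_weight("M", cluster))  # 1
print(shared_weight(1, 4))              # 1/4
```

Dependencies play no role here, matching the slide: the weight only says how much of a cluster an operation occupies, not when.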

Page 7

Edge Weight Calculation

Measure of criticalness, based on the notion of slack

First-come, first-served slack distribution
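The slack behind the edge weights can be computed from ASAP/ALAP times over the acyclic DFG. The sketch below computes per-edge slack (latest start of the consumer minus earliest finish of the producer) and maps it to a weight with a hypothetical `1 / (1 + slack)` rule; it does not model the first-come, first-served distribution of slack among edges. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple


def edge_slack(preds: Dict[str, List[str]], lat: Dict[str, int]) -> Dict[Tuple[str, str], int]:
    """Slack of each DFG edge (p, s): ALAP start of s minus ASAP finish of p."""
    order = list(TopologicalSorter(preds).static_order())
    est: Dict[str, int] = {}
    for op in order:  # ASAP: earliest start times
        est[op] = max((est[p] + lat[p] for p in preds[op]), default=0)
    length = max(est[op] + lat[op] for op in order)  # critical path length
    succs: Dict[str, List[str]] = {op: [] for op in preds}
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)
    lst: Dict[str, int] = {}
    for op in reversed(order):  # ALAP: latest start times
        lst[op] = min((lst[s] - lat[op] for s in succs[op]), default=length - lat[op])
    return {(p, s): lst[s] - (est[p] + lat[p]) for s in preds for p in preds[s]}


def edge_weight(slack: int) -> float:
    """Hypothetical mapping: zero-slack (critical) edges get the largest weight."""
    return 1.0 / (1 + slack)


preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
lat = {"a": 1, "b": 3, "c": 1, "d": 1}
for edge, slack in edge_slack(preds, lat).items():
    print(edge, slack, edge_weight(slack))
```

Here the path a, b, d is critical (slack 0 on both edges), while the edges through c have 2 cycles of slack and therefore a smaller weight, so the partitioner is freer to cut them.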

Page 8

Coarsening Partitioning

Multilevel graph partitioning algorithm (as in Chaco and METIS)

Works by coarsening highly related nodes into partitions; takes into account only edge weights

Takes a snapshot of each step for use in the refinement step
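A simplified coarsening pass is sketched below: it repeatedly merges the two groups joined by the largest total edge weight, looking only at edge weights as the slide notes, and records a snapshot after each merge for the later refinement pass. Production multilevel partitioners such as METIS merge many pairs per level using heavy-edge matching; merging one pair per step and stopping at the cluster count (`target`) are simplifications assumed here.

```python
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]
Snapshot = List[FrozenSet[str]]


def link_weight(g1: FrozenSet[str], g2: FrozenSet[str], weights: Dict[Edge, float]) -> float:
    """Total weight of the edges running between two groups."""
    return sum(w for (u, v), w in weights.items()
               if (u in g1 and v in g2) or (u in g2 and v in g1))


def coarsen(nodes: List[str], weights: Dict[Edge, float], target: int) -> List[Snapshot]:
    """Merge the most strongly linked pair of groups until `target` groups remain.

    Returns one snapshot per step, from finest (singletons) to coarsest.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    groups: Snapshot = [frozenset([n]) for n in nodes]
    snapshots = [list(groups)]
    while len(groups) > target:
        pairs = [(a, b) for a in range(len(groups)) for b in range(a + 1, len(groups))]
        i, j = max(pairs, key=lambda ab: link_weight(groups[ab[0]], groups[ab[1]], weights))
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
        snapshots.append(list(groups))
    return snapshots


nodes = ["a", "b", "c", "d"]
weights = {("a", "b"): 3.0, ("c", "d"): 2.0, ("b", "c"): 1.0}
snaps = coarsen(nodes, weights, target=2)
for level, snap in enumerate(snaps):
    print(level, [sorted(g) for g in snap])
```

The heaviest edge (a, b) is merged first, then (c, d); the weak edge (b, c) ends up as the cut between the two final groups.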

Page 9

Refinement Partitioning

Traverse back through the coarsening stages, making improvements to the initial partition

At each stage, the coarsened nodes available at that point are considered for movement to another cluster

Highly related operations stay grouped together at each stage because we follow the coarsening process backwards

Metrics

Cluster weight: estimate of the load per cluster; the cluster with the highest weight is denoted 'the imbalanced cluster'

System load: estimates the load across all clusters

Gain: the gain of moving operations into other clusters
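The refinement walk can be sketched as follows, assuming snapshots ordered finest to coarsest with nested groups and at most one coarsest group per cluster. At each level, whole groups are tried on every other cluster and a move is kept only if it lowers a stand-in cost of maximum cluster load plus `lam` times the cut edge weight. That cost is a hypothetical substitute for the cluster weight, system load, and gain metrics listed above.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def cost(assign: Dict[str, int], node_w: Dict[str, float], edge_w: Dict[Edge, float],
         n_clusters: int, lam: float = 1.0) -> float:
    """Stand-in objective: most-loaded cluster plus lam * weight of cut edges."""
    load = [0.0] * n_clusters
    for n, c in assign.items():
        load[c] += node_w[n]
    cut = sum(w for (u, v), w in edge_w.items() if assign[u] != assign[v])
    return max(load) + lam * cut


def refine(snapshots: List[List[Set[str]]], node_w: Dict[str, float],
           edge_w: Dict[Edge, float], n_clusters: int, lam: float = 1.0) -> Dict[str, int]:
    """Walk snapshots from coarsest to finest, moving whole groups when the cost drops."""
    coarsest = snapshots[-1]
    if len(coarsest) > n_clusters:
        raise ValueError("need at most one coarsest group per cluster")
    assign = {n: c for c, g in enumerate(coarsest) for n in g}  # initial partition
    for level in reversed(snapshots):
        for group in level:
            best = cost(assign, node_w, edge_w, n_clusters, lam)
            home = assign[next(iter(group))]  # nested groups share one cluster
            for target in range(n_clusters):
                if target == home:
                    continue
                trial = dict(assign)
                trial.update({n: target for n in group})
                trial_cost = cost(trial, node_w, edge_w, n_clusters, lam)
                if trial_cost < best:
                    assign, best, home = trial, trial_cost, target
    return assign


snapshots = [
    [{"a"}, {"b"}, {"c"}, {"d"}],
    [{"a", "b"}, {"c"}, {"d"}],
    [{"a", "b", "c"}, {"d"}],
]
node_w = {n: 1.0 for n in "abcd"}
edge_w = {("a", "b"): 3.0, ("b", "c"): 0.5, ("c", "d"): 0.4}
result = refine(snapshots, node_w, edge_w, n_clusters=2)
print(result, cost(result, node_w, edge_w, 2))
```

In this example the initial coarse partition puts three of the four unit-weight operations on cluster 0; refinement moves `c` to cluster 1, trading a cheap cut edge (0.5) for a balanced load, while the strongly linked pair `a`, `b` is never split.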

Page 10

Cluster Weight

Individual resource constraint per cluster, per cycle (op groups)

Total node weight per cluster, per cycle (shared constraints)

Cycle weight per cluster

Cluster weight
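The formulas on this slide did not survive the transcript. The sketch below shows one plausible way to combine the listed quantities, and is an assumption rather than the paper's definition: per cycle, sum node weights per op group and for shared resources, take the cycle weight as the largest of these sums, and add the cycle weights up to get the cluster weight. `PlacedOp` and its fields are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, NamedTuple


class PlacedOp(NamedTuple):
    """An operation tentatively placed on one cluster (hypothetical input)."""
    cycle: int           # estimated issue cycle
    group: str           # op group, e.g. "I", "F", "M", "B"
    weight: float        # individual node weight on this cluster
    shared: float = 0.0  # shared-resource weight (ports, buses)


def cluster_weight(ops: List[PlacedOp]) -> float:
    """Sum over cycles of the tightest resource constraint in that cycle."""
    per_group: DefaultDict[int, DefaultDict[str, float]] = defaultdict(lambda: defaultdict(float))
    shared: DefaultDict[int, float] = defaultdict(float)
    for op in ops:
        per_group[op.cycle][op.group] += op.weight  # individual constraints (op groups)
        shared[op.cycle] += op.shared               # shared constraints
    total = 0.0
    for cycle, groups in per_group.items():
        cycle_weight = max(max(groups.values()), shared[cycle])  # > 1.0 means oversubscribed
        total += cycle_weight
    return total


ops = [
    PlacedOp(0, "I", 0.5, 0.25),
    PlacedOp(0, "I", 0.5, 0.25),
    PlacedOp(0, "M", 1.0, 0.25),
    PlacedOp(1, "I", 0.5, 0.25),
]
print(cluster_weight(ops))  # 1.5
```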

Page 11

System Load

Inter-cluster move overhead

Total load, based on a cycle-by-cycle estimation
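Similarly, a hypothetical system-load estimate can combine the two ingredients listed on the slide: a cycle-by-cycle load (here, the busiest cluster's cycle weight in each cycle, summed over cycles) plus a fixed overhead for each dependence that crosses clusters and therefore needs an inter-cluster move. Both the aggregation and `move_cost` are assumptions, not the slide's lost formula.

```python
from typing import Dict, List, Tuple


def system_load(cycle_weights: Dict[int, List[float]], assign: Dict[str, int],
                edges: List[Tuple[str, str]], move_cost: float = 1.0) -> float:
    """Hypothetical system load estimate.

    cycle_weights[t][c] is the cycle weight of cluster c in cycle t.
    """
    # Cycle by cycle, the most heavily loaded cluster limits progress.
    load = sum(max(per_cluster) for per_cluster in cycle_weights.values())
    # Every dependence that crosses clusters needs an inter-cluster move.
    moves = sum(1 for u, v in edges if assign[u] != assign[v])
    return load + move_cost * moves


cycle_weights = {0: [1.0, 0.5], 1: [0.5, 0.5]}
assign = {"a": 0, "b": 1, "c": 1}
edges = [("a", "b"), ("b", "c")]
print(system_load(cycle_weights, assign, edges, move_cost=0.5))  # 2.0
```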

Page 12

Gain

Load gain

Edge gain

Move gain
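A sketch of the three gains for moving a single operation, under assumed definitions: the load gain is the drop in the most-loaded cluster's total node weight, the edge gain is the drop in cut edge weight, and the move gain combines them with a hypothetical weighting `lam`. Positive values mean the move helps; none of these definitions is taken from the slide, whose equations were lost.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def max_load(assign: Dict[str, int], node_w: Dict[str, float], n_clusters: int) -> float:
    """Total node weight on the most heavily loaded cluster."""
    loads = [0.0] * n_clusters
    for n, c in assign.items():
        loads[c] += node_w[n]
    return max(loads)


def cut_weight(assign: Dict[str, int], edge_w: Dict[Edge, float]) -> float:
    """Total weight of edges whose endpoints sit on different clusters."""
    return sum(w for (u, v), w in edge_w.items() if assign[u] != assign[v])


def move_gain(node: str, target: int, assign: Dict[str, int], node_w: Dict[str, float],
              edge_w: Dict[Edge, float], n_clusters: int,
              lam: float = 1.0) -> Tuple[float, float, float]:
    """Return (load gain, edge gain, move gain) for moving `node` to `target`."""
    trial = dict(assign)
    trial[node] = target
    load_gain = max_load(assign, node_w, n_clusters) - max_load(trial, node_w, n_clusters)
    edge_gain = cut_weight(assign, edge_w) - cut_weight(trial, edge_w)
    return load_gain, edge_gain, load_gain + lam * edge_gain


node_w = {n: 1.0 for n in "abcd"}
edge_w = {("a", "b"): 3.0, ("b", "c"): 1.0, ("c", "d"): 2.0}
assign = {"a": 0, "b": 0, "c": 0, "d": 1}
print(move_gain("c", 1, assign, node_w, edge_w, 2))  # (1.0, 1.0, 2.0)
print(move_gain("a", 1, assign, node_w, edge_w, 2))  # (1.0, -3.0, -2.0)
```

Both moves balance the load equally well, but only moving `c` also reduces communication; moving `a` would cut the heavy (a, b) edge, so its combined gain is negative.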

Page 13

Example

Page 14

Evaluation

Implemented using the Trimaran toolset

Compared with the BUG algorithm

Benchmarks: 5 DSP benchmarks (high ILP), SPECint2000 (low ILP)

5 machine configurations; functional units: integer (I), float (F), memory (M), branch (B)

Page 15

Improvement in dynamic total cycles of RHOP over BUG

Page 16

Comparison of BUG and RHOP clustering performance versus a 1-cluster machine

2-1111 processor 4-1111 processor

Page 17

Histogram of RHOP versus BUG

Achieved schedule length versus critical path length. Numbers on top are dynamic execution percentages.

Page 18

Compile-time performance: number of calls to the resource table