

CSE Dept., (XHU), The Salishan Conference on High-Speed Computing

No Free Lunch, No Hidden Cost

X. Sharon Hu

Dept. Computer Science and Engineering

University of Notre Dame


How Can Co-Design Help?

The Salishan Conference on High-Speed Computing


Theme: Exposing Hidden Execution Costs

- Cost of execution: performance and power
  - Computation
  - Communication
  - Data motion
  - Synchronization
  - …

- How can we strike a balance between the extremes?
  - Hide as much as possible?
  - Explicitly manage “all” costs?

- My “position”: Expose widely and choose wisely
  - Focus on power


Why Take This Position?

- Expose widely
  - Better understanding of the contribution of each component
  - Allows application-specific tradeoffs
  - Provides opportunities for powerful co-design tools

- Choose wisely
  - Requires sophisticated co-design tools
  - Explores more algorithm/software options


But Easier Said Than Done!

- Heterogeneity
  - Compute nodes: (multi-core) CPU, GP-GPU, FPGA, …
  - Memory components: on-chip, on-board, disks, …
  - Communication infrastructure: bus, NoC, networks, …

- Parallelism (“non-determinism”)
  - Data access: movement, coherence, …
  - Resource contention
  - Synchronization


Outline

Why expose widely?

How to benefit from exposing widely?

How to choose wisely?

Going forward


Why Expose Widely? (1)

Different programs have different power distributions

[Figure: GPU power distribution (NVIDIA GTX 280). Legend labels recovered from the slide: Memory, ConstSM, ConstCache, TextCache, GPU Cores.]

Hong and Kim, ISCA 2010


Why Expose Widely? (2)

Energy consumption of three sorting algorithms (Pentium 4 + GeForce 570)

Data movement impacts different algorithms differently


Why Expose Widely? (3)

Application dependent

Masaaki Kondo et al., SIGARCH 2007

Performance degradation due to memory bus contention


Outline

Why expose widely?

How to benefit from exposing widely?

How to choose wisely?

Going forward


How to Benefit from “Exposing Widely”?

- Co-design is the key
- Expose all factors impacting the “execution model”
  - Computation: processing resource
  - Data motion: memory components and hierarchy
  - Communication: bus and network
  - Resource contention, synchronization, …
- Some examples
  - Software macromodeling
  - Hardware module-based modeling
- Optimize through power management
  - Keep in mind Amdahl’s law
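A minimal sketch of why Amdahl’s law matters for power management: if a component accounts for only a fraction of total power, improving that component alone gives a bounded overall gain. The function name and the numbers below are illustrative assumptions, not values from the talk.

```python
def overall_power_reduction(fraction: float, component_reduction: float) -> float:
    """Amdahl-style bound on the overall power reduction factor.

    fraction: share of total power drawn by the optimized component (0..1).
    component_reduction: factor by which that component's power is reduced.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if component_reduction <= 0.0:
        raise ValueError("component_reduction must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / component_reduction)


# Hypothetical numbers: memory draws 30% of total power and becomes 2x more efficient.
print(round(overall_power_reduction(0.30, 2.0), 3))  # 1.176
```

Halving the power of a component that draws 30% of the total reduces total power by only about 15%, which is one reason to expose the contribution of every component before choosing what to optimize.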


Macromodeling: Algorithm Complexity Based

Relate power/energy of a program with its complexity

Example: $E = C_1 S + C_2 S^2 + C_3 S^3$ (Tan et al., DAC’01), where $S$ is the size of the array for a sorting algorithm

Example: $E_{comm} = C_0 + C_1 S$ (Loghi et al., ACM TECS’07), where $S$ is the size of the exchanged messages

More sophisticated models to account for both computing and communication

How to handle resource contention?
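A small sketch of how such complexity-based macromodels might be used in practice: evaluate the cubic sorting model for a given array size, and fit the linear communication model to measured (size, energy) pairs by ordinary least squares. The function names and the sample measurements are hypothetical; the cited papers may characterize their coefficients differently.

```python
def sort_energy(size, c1, c2, c3):
    """Evaluate E = C1*S + C2*S^2 + C3*S^3 for array size S."""
    return c1 * size + c2 * size**2 + c3 * size**3


def fit_comm_model(sizes, energies):
    """Least-squares fit of E_comm = C0 + C1*S; returns (C0, C1)."""
    n = len(sizes)
    if n < 2 or n != len(energies):
        raise ValueError("need at least two (size, energy) pairs")
    mean_s = sum(sizes) / n
    mean_e = sum(energies) / n
    sxx = sum((s - mean_s) ** 2 for s in sizes)
    if sxx == 0:
        raise ValueError("message sizes must not all be equal")
    sxy = sum((s - mean_s) * (e - mean_e) for s, e in zip(sizes, energies))
    c1 = sxy / sxx
    c0 = mean_e - c1 * mean_s
    return c0, c1


# Hypothetical measurements generated from C0 = 5.0, C1 = 0.02 (arbitrary energy units).
sizes = [64, 128, 256, 512]
energies = [5.0 + 0.02 * s for s in sizes]
c0, c1 = fit_comm_model(sizes, energies)
print(round(c0, 6), round(c1, 6))
```

Neither model has a term for resource contention, which is the gap the question above points at.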


Power Modeling of Bus Contention

Penolazzi, Sander and Ahmed Hemani: DATE’11

- Characterization step
  - $C\%_{N,1}$: percentage of cycle difference between the N-processor case and the 1-processor case
  - Can be done by IP providers on chosen benchmarks

- Prediction step
  - [Equations not recoverable from the transcript. Surviving symbols include $C$, $T$, $t$, $E$, $N$, $n$ and the subscripts cycle, stall, idle and a.]
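A minimal sketch of the characterization idea: measure how many more cycles a benchmark needs with N processors sharing the bus than with one, and reuse that percentage for other workloads. The prediction rule used here (scaling the single-processor cycle count by 1 + C%) is an assumption for illustration, not the formula from the cited paper, and the cycle counts are hypothetical.

```python
def cycle_overhead(cycles_n: int, cycles_1: int) -> float:
    """C%_{N,1}: relative cycle difference between the N-processor and 1-processor runs."""
    if cycles_1 <= 0:
        raise ValueError("cycles_1 must be positive")
    return (cycles_n - cycles_1) / cycles_1


def predict_cycles(cycles_1_app: int, overhead: float) -> float:
    """ASSUMPTION: predicted N-processor cycles = 1-processor cycles * (1 + C%_{N,1})."""
    return cycles_1_app * (1.0 + overhead)


# Hypothetical characterization: a benchmark takes 1,000,000 cycles alone
# and 1,250,000 cycles when four processors contend for the bus.
overhead = cycle_overhead(1_250_000, 1_000_000)
print(overhead)                             # 0.25
print(predict_cycles(2_000_000, overhead))  # 2500000.0
```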


Hierarchical Module-Based Power Modeling

- Accumulate energy/power of modules

$$P_{total} = \sum_i \left[ \mathrm{Util}(M_i)\,P(M_i) + P_{other}(M_i) \right]$$

- CPU+GPU example

$$P_{total} = P_{CPU} + P_{GPU} + P_{mem} + \sum_i P(M_i) + P_{idle}$$

$$P(M_i) = \mathrm{AccessRate}(M_i) \cdot \mathrm{ArchScaling}(M_i) \cdot \mathrm{MaxP}(M_i) + \mathrm{NonGatedP}(M_i)$$

- Access rate: software dependent
- Data movement contributes to memory power
- Resource contention modifies access rate

Adapted from Isci and Martonosi, Micro’03
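A minimal sketch of module-based accumulation for the CPU+GPU case: each module’s power is its access rate times an architectural scaling factor times its maximum power, plus its non-gated power, and the total adds the CPU, GPU, memory and idle terms. The class, the module names and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    access_rate: float      # software dependent; contention would change this value
    arch_scaling: float     # architectural scaling factor
    max_power: float        # watts at full activity
    non_gated_power: float  # watts drawn regardless of activity

    def power(self) -> float:
        """P(M) = AccessRate * ArchScaling * MaxP + NonGatedP."""
        return self.access_rate * self.arch_scaling * self.max_power + self.non_gated_power


def total_power(p_cpu, p_gpu, p_mem, modules, p_idle):
    """P_total = P_CPU + P_GPU + P_mem + sum of module powers + P_idle."""
    return p_cpu + p_gpu + p_mem + sum(m.power() for m in modules) + p_idle


# Hypothetical modules and power figures (watts).
modules = [
    Module("l2_cache", access_rate=0.5, arch_scaling=1.0, max_power=4.0, non_gated_power=0.5),
    Module("dram_ctrl", access_rate=0.25, arch_scaling=1.0, max_power=8.0, non_gated_power=1.0),
]
print(total_power(p_cpu=20.0, p_gpu=40.0, p_mem=10.0, modules=modules, p_idle=5.0))  # 80.5
```

Because the access rate is the only software-dependent input, a model of this shape lets the same hardware characterization be reused across programs.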


Outline

Why expose widely?

How to benefit from exposing widely?

How to choose wisely?

Going forward


Managing Bus Contention to Reduce Energy

M. Kondo, H. Sasaki and H. Nakamura, 2006

- Counter for memory requests
- Register for PU identification
- Thresholds for selecting which PU uses which Vdd value
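A minimal sketch of threshold-based voltage selection driven by per-PU memory-request counters. The policy direction (a PU issuing more memory requests gets a lower Vdd, on the premise that a memory-bound PU gains little from running fast) is an assumption for illustration, as are the function name, thresholds and voltage levels; the cited work may define them differently.

```python
from bisect import bisect_right


def select_vdd(mem_requests, thresholds, vdd_levels):
    """Pick a Vdd per PU from its memory-request count.

    mem_requests: dict mapping PU id -> requests counted in the last interval.
    thresholds:   ascending request-count thresholds.
    vdd_levels:   len(thresholds) + 1 voltages; index 0 is used below the first threshold.
    """
    if len(vdd_levels) != len(thresholds) + 1:
        raise ValueError("need exactly one more Vdd level than thresholds")
    return {
        pu: vdd_levels[bisect_right(thresholds, count)]
        for pu, count in mem_requests.items()
    }


# Hypothetical counters for three PUs and two thresholds.
print(select_vdd({0: 50, 1: 500, 2: 5000}, [100, 1000], [1.2, 1.0, 0.8]))
# {0: 1.2, 1: 1.0, 2: 0.8}
```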


Application Mapping to Reduce Energy (1)

Application mapping for heterogeneous systems

[Figure: jobs J1 to J4, each annotated with ([minRi, maxRi], Di) for i = 1, …, 4; processing elements PE 1 to PE 4; Memory.]

R. Racu, R. Ernst, A. Hamann, B. Mochocki and X. Hu, “Methods for power optimization in distributed embedded systems with real-time requirements,” CASES’06.


Application Mapping to Reduce Energy (2)

- Optimization:
  - Minimize power/energy dissipation
  - Satisfying timing properties (e.g., average path latency, average lateness, etc.)
  - …

- Search space:
  - Scheduling parameters, traffic shaping, …
  - Task-level DVFS, i.e., task speed assignment
  - Resource-level DVFS, i.e., resource speed assignment
  - …
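A minimal sketch of task-level DVFS as a search problem: choose one speed per task to minimize energy while meeting a timing constraint. It assumes tasks run sequentially on one resource, a single bound on total latency, and an energy model of cycles times speed squared in arbitrary units; these assumptions and the exhaustive search are for illustration only and are not the method of the cited work.

```python
from itertools import product


def assign_task_speeds(cycles, speeds, deadline):
    """Return (energy, speed_per_task) with minimal energy, or None if infeasible.

    ASSUMPTIONS: latency of a task = cycles / speed, energy = cycles * speed**2,
    tasks execute back to back, and only the total latency is constrained.
    """
    best = None
    for choice in product(speeds, repeat=len(cycles)):
        latency = sum(c / f for c, f in zip(cycles, choice))
        if latency > deadline:
            continue  # violates the timing constraint
        energy = sum(c * f**2 for c, f in zip(cycles, choice))
        if best is None or energy < best[0]:
            best = (energy, choice)
    return best


# Hypothetical workload: two tasks, two speed levels, total latency bound of 200.
print(assign_task_speeds([100, 200], (1.0, 2.0), deadline=200))
# (900.0, (1.0, 2.0))
```

Exhaustive enumeration grows exponentially with the number of tasks, which is why heuristic searches become attractive for realistic systems.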


Application Mapping (3): Sensitivity Analysis

R. Racu, R. Ernst, A. Hamann, B. Mochocki and X. Hu, “Methods for power optimization in distributed embedded systems with real-time requirements,” CASES’06.


Application Mapping (4): GA-Based Approach

[Figure: GA-based optimization flow with a PowerAnalyzer block. Step labels recovered from the slide: 2’. Scheduling Trace, 3’. Power Dissipation.]

Power model needed
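A minimal sketch of a genetic algorithm that searches task-to-PE mappings and scores each candidate with a power model. The `power_analyzer` function is a hypothetical stand-in for a real power analysis tool, and the encoding, operators, penalty weight and parameters are all illustrative assumptions rather than the flow shown on the slide. It expects at least two tasks.

```python
import random


def power_analyzer(mapping, task_load, pe_power_per_load, pe_capacity):
    """Hypothetical power model: per-PE power plus a penalty for overloaded PEs."""
    load = [0.0] * len(pe_capacity)
    for task, pe in enumerate(mapping):
        load[pe] += task_load[task]
    power = sum(l * p for l, p in zip(load, pe_power_per_load))
    overload = sum(max(0.0, l - cap) for l, cap in zip(load, pe_capacity))
    return power + 1000.0 * overload


def ga_map(task_load, pe_power_per_load, pe_capacity,
           generations=50, pop_size=20, seed=0):
    """Evolve a mapping (list: task index -> PE index) with low modeled power."""
    rng = random.Random(seed)
    n_tasks, n_pes = len(task_load), len(pe_capacity)

    def cost(mapping):
        return power_analyzer(mapping, task_load, pe_power_per_load, pe_capacity)

    population = [[rng.randrange(n_pes) for _ in range(n_tasks)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]   # selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_tasks)     # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:              # mutation: remap one task
                child[rng.randrange(n_tasks)] = rng.randrange(n_pes)
            children.append(child)
        population = parents + children
    return min(population, key=cost)


# Hypothetical instance: four tasks, two PEs with different power per unit load.
loads = [2.0, 1.0, 3.0, 1.0]
best = ga_map(loads, pe_power_per_load=[1.0, 2.0], pe_capacity=[4.0, 4.0])
print(best, power_analyzer(best, loads, [1.0, 2.0], [4.0, 4.0]))
```

The search is only as good as the cost function it calls, which is the point of the slide’s remark that a power model is needed.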


A Sample Result


Outline

Why expose widely?

How to benefit from exposing widely?

How to choose wisely?

Going forward


Going Forward: Systematic Co-design Effort

- Expose more
  - More hardware counters / registers
  - More efficient/accurate high-level power models
  - Better models for resource contention and synchronization

- Choose better
  - Handling parallelism
    - Algorithm, OS, hardware
    - Resource contention
    - Synchronization
  - Handling non-determinism
    - Worst-case bounds
    - Statistical analysis
    - Interval-based techniques


ES Design vs. HPCS Design

- Differences (maybe)
  - Application-specific workloads vs. domain-specific workloads
  - Constraints, objectives, desirables? Latency, throughput, energy, cost, reliability, fault tolerance, IP protection/privacy, ToM, …
  - Other issues: homogeneous vs. heterogeneous, levels of complexity, user expertise, …

- Similarities
  - Ever-increasing hardware capability: multi-core, multi-thread, complex communication fabrics, memory hierarchy, …
  - Productivity gap
  - Common concerns: latency, throughput, energy, cost, reliability, fault tolerance, …


Leverage Co-Design for HPC

- Systematic performance estimation
  - Formal methods: scenario-based, statistical analysis
  - Hybrid approaches: analytical + simulation
  - Seamless migration from one abstraction level to the next

- Efficient design space exploration
  - Efficient search techniques
  - Multiple-level abstraction models
  - Multiple-attribute optimization
  - Others: memory and communication analysis and design


Thank you!