GPU Power Model

Nandhini Sudarsanan (sudar003@umn.edu), Nathan Vanderby (vande501@umn.edu), Neeraj Mishra (mish0088@umn.edu), Usha Vinodh (kuma0253@un.edu), Chi Xu

CSCI 8205: GPU Power Model

Outline

Introduction and Motivation

Analytical Model Description

Experiment Setup

Results

Conclusion and Further Work

5/4/11

Introduction

Develop a methodology for building an accurate power model for a GPU.

Validate it on an NVIDIA GTX 480 GPU.

Measure power efficiency of various NVIDIA SDK benchmarks.

An accurate power model can help:
  Explore various architectural and algorithmic trade-offs.
  Figure out the balance of workload between GPU and CPU.

Motivation

Power Consumption: Key criterion for future Hardware Devices and Embedded Software.

The effect of increased power density has not been felt until now:
  Supply voltage was scaled down as well.
  Current and power density remained constant.

Further reduction in supply voltage will be difficult in the future:
  Supply voltage is approaching the threshold voltage.
  Gate oxide thickness is almost down to 1 nm.

GPU Processing Power

Price of Power

Maximum load = a lot of power
  NVIDIA 8800 GTX: 137 W
  Intel Xeon LS5400: 50 W

Power Wall

Power density in GPUs is larger than in even high-end CPUs.

Power gating and clock gating have been successfully employed in CPUs [Brooks, HPCA 2001].

Power gating, clock gating, and other hardware-based schemes are not used in most GPUs [Kim, ISCA 2010].

An accurate power model can help:
  Explore various architectural and algorithmic trade-offs.
  Figure out the balance of workload between GPU and CPU.

Background

Power consumption can be divided into:

$$P = P_{\text{dynamic}} + P_{\text{static}} + P_{\text{short-circuit}}$$

Dynamic power is determined by run-time events:
  Fixed-function units: texture filtering and rasterization
  Programmable units: memory and floating point

Static power is determined by circuit technology, chip layout, and operating temperature:

$$P_{\text{static}} = V_{CC} \cdot N \cdot K_{\text{design}} \cdot I_{\text{leak}}$$
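
The decomposition above can be sketched numerically. This is a minimal illustration, not the model from the talk: it assumes the usual reading of the static-power symbols (N as the transistor count, Kdesign as a design-dependent factor, Ileak as the per-transistor leakage current), and every number and function name below is hypothetical.

```python
def static_power_w(vcc_v, n_transistors, k_design, i_leak_a):
    """Static power in watts: P = VCC * N * Kdesign * Ileak."""
    return vcc_v * n_transistors * k_design * i_leak_a


def total_power_w(dynamic_w, static_w, short_circuit_w):
    """Total power as the sum of the three components listed above."""
    return dynamic_w + static_w + short_circuit_w


# Hypothetical values, chosen only to make the arithmetic easy to follow.
p_static = static_power_w(vcc_v=1.0, n_transistors=3.0e9,
                          k_design=0.5, i_leak_a=2.0e-8)
p_total = total_power_w(dynamic_w=150.0, static_w=p_static,
                        short_circuit_w=5.0)

print(f"static = {p_static:.1f} W, total = {p_total:.1f} W")
```

Because the static term scales linearly with supply voltage and leakage current, and leakage current rises with temperature, temperature enters the static term through Ileak.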

Previous Power Models

Statistical power modeling approach for GPUs [Matsuoka 2010]
  Uses 13 CUDA performance counters (ld, st, branch, TLB miss) to obtain a profile.
  Finds the correlation between profiles and power by statistical model learning.
  Information not captured by the counters is lost.

Cycle-level simulation based power model [Skadron HWWS'04]
  Assumes a hypothetical architecture to explore new GPU microarchitectures and model power and leakage properties.
  Cycle-level processor simulations are time consuming [Martonosi & Isci 2003].
  Does not allow a complete view of operating system effects and I/O [Isci 2003].

Outline

Introduction and Motivation

Analytical Model Description
  Parser
  Power Model

Experiment Setup

Results

Conclusion and Further Work

Need for a Parser

GPGPU-Sim is time consuming.

GPGPU-Sim output is not tailored to our needs.

The parser is very fast.

GPGPU-Sim works only with CUDA 2.3 or earlier.

Limitations of the Parser

Trip counts of dynamic loops are not determined automatically.

Every branch is assumed to be taken.

The parser is highly tailored to our specific needs.

A change in the PTX layout might require changes to the parser.
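
To make these limitations concrete, here is a minimal sketch of an opcode-counting pass over PTX text. It is not the parser from the talk: the function name, the `loop_trips` argument, and the handling of loops (non-nested only, trip counts supplied by hand, every branch treated as taken) are assumptions made for illustration.

```python
import re
from collections import Counter

# Optional guard predicate (e.g. "@%p1" or "@!%p1"), then the base opcode.
_INSTR = re.compile(r"^(?:@!?%\w+\s+)?([a-z][a-z0-9]*)")


def count_ptx_opcodes(ptx_text, loop_trips=None):
    """Count base opcodes, scaling loop bodies by user-supplied trip counts.

    loop_trips maps a label to its trip count, because dynamic loop bounds
    cannot be read from the PTX text. Nested loops are not handled.
    """
    loop_trips = loop_trips or {}
    executed = []    # opcodes in program order
    label_pos = {}   # label -> index into `executed` where it was seen
    counts = Counter()
    for raw in ptx_text.splitlines():
        line = raw.split("//")[0].strip()
        if not line or line.startswith((".", "{", "}")):
            continue  # blank lines, directives, braces
        if line.endswith(":"):
            label_pos[line[:-1]] = len(executed)
            continue
        match = _INSTR.match(line)
        if not match:
            continue
        opcode = match.group(1)
        executed.append(opcode)
        counts[opcode] += 1
        if opcode == "bra":
            target = line.rstrip(";").split()[-1]
            if target in label_pos:  # backward branch closes a loop
                extra = loop_trips.get(target, 1) - 1
                for op in executed[label_pos[target]:]:
                    counts[op] += extra
    return counts


sample = """
    mov.u32 %r1, 0;
$L1:
    add.f32 %f1, %f1, %f2;
    add.u32 %r1, %r1, 1;
    setp.lt.u32 %p1, %r1, %r2;
    @%p1 bra $L1;
    st.global.f32 [%rd1], %f1;
    ret;
"""
counts = count_ptx_opcodes(sample, {"$L1": 100})
print(counts)
```

A static pass like this is fast because it never simulates execution, which is also why it depends on the PTX text layout and on trip counts supplied from outside.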

Fermi Architecture: sm_20

Memory Hierarchy
  PCIe & RAM
  L2 Cache
  L1 Cache
  Shared Memory
  Registers

Streaming Multiprocessor (SM)
  32 ALUs, 32 FPUs, 4 SFUs
  2 pipelines, 16-24 stages
  2 warp schedulers, 2 instructions per cycle

Factors in the Power Model

Temperature

Number of SMs

Power Model

Assembly Level

Experiment Setup - Hardware

Measure Power Consumption and Temperature
  Sample temperature @ 10 Hz, GPU sensor
  Current clamp for PCIe & GPU power cable
  Data acquisition card @ 100 Hz

GPU Performance Counter
  Profile 57 counters per kernel
  9 executions
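
As a rough sketch of how current-clamp samples might be turned into power and energy figures: the code below assumes that both the PCIe slot and the GPU power cable are measured on a 12 V rail and that samples arrive at the 100 Hz acquisition rate. The function names and the sample traces are hypothetical.

```python
def average_power_w(current_samples_a, rail_voltage_v=12.0):
    """Mean power in watts for one supply path, from current samples in amperes."""
    if not current_samples_a:
        raise ValueError("need at least one current sample")
    mean_current = sum(current_samples_a) / len(current_samples_a)
    return rail_voltage_v * mean_current


def energy_j(current_samples_a, sample_rate_hz=100.0, rail_voltage_v=12.0):
    """Energy in joules, holding each sample constant for one sample period."""
    dt = 1.0 / sample_rate_hz
    return sum(rail_voltage_v * amps * dt for amps in current_samples_a)


# Hypothetical traces: 2 seconds at 100 Hz on each supply path.
pcie_slot_a = [4.0] * 200
power_cable_a = [10.0] * 200

# Total GPU draw is the sum over both supply paths.
gpu_power_w = average_power_w(pcie_slot_a) + average_power_w(power_cable_a)
gpu_energy_j = energy_j(pcie_slot_a) + energy_j(power_cable_a)

print(f"{gpu_power_w:.1f} W, {gpu_energy_j:.1f} J")
```

At 100 Hz a kernel that runs for only a few milliseconds yields no usable samples, so the measured kernels need to run long enough to produce a stable average.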

Experiment Setup - Software

Driver API

PTX-level micro-benchmarks
  Minimize control loops.
  Stress one type of PTX instruction per kernel (over 95%).
  76 kernels.
  Wisely choose block and grid size and

CUDA 4.0
  Built-in binary-to-assembly converter (cuobjdump)

Timer interrupt to collect Temperature

Remote login
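
One way to reason about the "over 95%" target is to ask how far a loop body must be unrolled before loop control becomes negligible. The sketch below assumes the percentage refers to the share of executed instructions and that each loop iteration costs three control instructions (counter update, compare, branch). Both are assumptions for illustration, not figures from the talk.

```python
def target_fraction(unroll, overhead=3):
    """Share of instructions per iteration that are the stressed instruction."""
    return unroll / (unroll + overhead)


def min_unroll(threshold=0.95, overhead=3):
    """Smallest unroll factor whose target share strictly exceeds the threshold."""
    unroll = 1
    while target_fraction(unroll, overhead) <= threshold:
        unroll += 1
    return unroll


needed = min_unroll()
print(needed, round(target_fraction(needed), 4))
```

Under these assumptions an unroll factor of 57 gives exactly 95%, so 58 copies of the stressed instruction per iteration is the smallest body that goes over the target.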

Limitations of PTX

Higher level than assembly
  30 out of 76 PTX instructions take multiple assembly instructions.
  Divide, sqrt, etc.: one PTX line, a library routine in assembly.

Compiler optimizations are applied from PTX to assembly.

PTX does not reflect RAW (read-after-write) dependencies.

Performance counter results are based on assembly.

CUDA – Fermi Architecture

Third-generation Streaming Multiprocessor (SM)
  32 CUDA cores per SM, 4x over GT200
  1024-thread block size, 2x over GT200

Unified address space enables full C++ support

Improved memory subsystem

Fermi Memory Hierarchy

[Figure: each SM (SM-0 through SM-N) has its own registers and an L1 cache / shared memory; all SMs share the L2 cache, which connects to global memory.]

Validation Benchmarks

Small number of overhead operations (loop counters, initialization, etc.).

Computationally intensive work, so that each experiment runs long enough for accurate current measurement.

High utilization of the CUDA cores, with as few data hazards as possible.

Grid and block sizes chosen so that all SMs are used, since idle SMs still leak.

Accordingly, 7 benchmarks were selected from the CUDA SDK.
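
The grid and block size criterion can be expressed as a simple pre-launch check. This is a hedged sketch: the function name is hypothetical, the default of 15 SMs is an assumption about the GTX 480, and the 1024-thread block limit and 32-thread warp size are the Fermi values.

```python
def check_launch(grid_blocks, block_threads, num_sms=15,
                 max_block_threads=1024, warp_size=32):
    """Return a list of warnings for a launch configuration (empty if none)."""
    problems = []
    if block_threads > max_block_threads:
        problems.append("block exceeds the per-block thread limit")
    if block_threads % warp_size != 0:
        problems.append("block is not a whole number of warps")
    if grid_blocks < num_sms:
        problems.append("fewer blocks than SMs, so some SMs stay idle")
    return problems


print(check_launch(grid_blocks=30, block_threads=256))
print(check_launch(grid_blocks=8, block_threads=100))
```

A configuration that leaves SMs idle would still be charged their leakage, which would distort a per-instruction power estimate.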

Our benchmarks:
  2D convolution
  Matrix multiplication
  Vector addition
  Vector reduction
  Scalar product
  DCT 8x8
  3DFD

Results

Conclusion and Further Work

Conclusion

Further Work
  Take context switches into account.
  Consider multiple kernels running simultaneously.

The End

Thanks

Q&A
