
Dynamic Feedback: An Effective Technique for Adaptive Computing

Pedro Diniz and Martin Rinard

Department of Computer Science

University of California, Santa Barbara

http://www.cs.ucsb.edu/~{pedro,martin}

Basic Issue: Efficient Implementation of Atomic Operations in Object-Based Languages

Approach: Reduce Lock Overhead by Coarsening Lock Granularity

Problem: Coarsening Lock Granularity May Reduce Available Concurrency

Solution: Dynamic Feedback

• Multiple Lock Coarsening Policies

• Dynamic Feedback
  • Generate Multiple Versions of Code
  • Measure Dynamic Overhead of Each Policy
  • Dynamically Select Best Version

• Context
  • Parallelizing Compiler
  • Irregular Object-Based Programs
  • Pointer-Based Data Structures
  • Commutativity Analysis

Talk Outline

• Lock Coarsening

• Dynamic Feedback

• Experimental Results

• Related Work

• Conclusions

Model of Computation

• Parallel Programs
  • Serial Phases
  • Parallel Phases

• Atomic Operations on Shared Objects
  • Mutual Exclusion Locks
  • Acquire Constructs
  • Release Constructs

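[Figure: execution alternates between a Serial Phase and a Parallel Phase; in the parallel phase, each Atomic Operation executes a Mutual Exclusion Region delimited by L.acquire() and L.release().]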

Problem: Lock Overhead

[Figure: a computation that executes two consecutive L.acquire() / L.release() pairs on the same lock.]

Solution: Lock Coarsening

[Figure: Original versus After Lock Coarsening. The original code executes two L.acquire() / L.release() pairs; after lock coarsening, a single L.acquire() / L.release() pair encloses both regions.]

Reference: Diniz and Rinard, "Synchronization Transformations for Parallel Computing", POPL97
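The transformation above can be illustrated with a small sketch. This is not the compiler's generated code: `CountingLock`, `Body`, `update_original`, and `update_coarsened` are hypothetical names introduced only to show that merging two adjacent mutual exclusion regions on the same lock halves the number of executed acquire/release pairs while producing the same result.

```python
import threading


class CountingLock:
    """Hypothetical wrapper around threading.Lock that counts acquires."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquires = 0

    def acquire(self):
        self._lock.acquire()
        self.acquires += 1  # updated while holding the lock

    def release(self):
        self._lock.release()


class Body:
    """Hypothetical shared object with its own mutual exclusion lock L."""

    def __init__(self):
        self.L = CountingLock()
        self.x = 0.0
        self.y = 0.0


def update_original(b, dx, dy):
    # Original: every atomic operation acquires and releases the lock.
    b.L.acquire()
    b.x += dx
    b.L.release()
    b.L.acquire()
    b.y += dy
    b.L.release()


def update_coarsened(b, dx, dy):
    # After lock coarsening: one acquire/release pair covers both updates.
    b.L.acquire()
    b.x += dx
    b.y += dy
    b.L.release()
```

Running both versions for the same number of updates gives identical field values, but the coarsened version executes half as many acquires.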

Lock Coarsening Trade-Off

• Advantage:
  • Reduces Number of Executed Acquires and Releases
  • Reduces Acquire and Release Overhead

• Disadvantage: May Introduce False Exclusion
  • Multiple Processors Attempt to Acquire Same Lock
  • Processor Holding the Lock is Executing Code that was Originally in No Mutual Exclusion Region

False Exclusion

[Figure: Original versus After Lock Coarsening. In the coarsened version, one processor holds L across code (•••) that was originally outside any mutual exclusion region, while another processor waits at L.acquire(); the waiting period is labeled False Exclusion.]

Lock Coarsening Policy

Goal: Limit Potential Severity of False Exclusion

Mechanism: Multiple Lock Coarsening Policies

• Original: Never Coarsen Granularity

• Bounded: Coarsen Granularity Only Within Cycle-Free Subgraphs of ICFG

• Aggressive: Always Coarsen Granularity

Choosing Best Policy

• Best Lock Coarsening Policy May Depend On
  • Topology of Data Structures
  • Dynamic Schedule Of Computation

• Information Required to Choose Best Policy Unavailable at Compile Time

• Complications
  • Different Phases May Have Different Best Policy
  • In Same Phase, Best Policy May Change Over Time

Solution: Dynamic Feedback

• Generated Code Executes
  • Sampling Phases: Measure Performance of Different Policies
  • Production Phases: Use Best Policy From Sampling Phase

• Periodically Resample to Discover Best Policy Changes

[Figure: timeline of the executing Code Version (Aggressive, Original, Bounded) across a Sampling Phase, a Production Phase, and the next Sampling Phase.]
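The alternation of sampling and production phases can be sketched as a simple control loop. This is a simulation under stated assumptions, not the generated code: `dynamic_feedback` and the `overhead(policy, t)` callback are hypothetical, and the callback stands in for the measured overhead of a policy at time `t`.

```python
def dynamic_feedback(overhead, policies, total_time,
                     sampling_interval, production_interval):
    """Simulate dynamic feedback and return the phase schedule.

    overhead(policy, t) is a hypothetical measurement hook that returns
    the overhead (a fraction in [0, 1]) of running `policy` at time `t`.
    """
    t = 0.0
    schedule = []  # entries: (start_time, phase, policy)
    while t < total_time:
        # Sampling phase: run each policy for one sampling interval.
        sampled = {}
        for p in policies:
            schedule.append((t, "sampling", p))
            sampled[p] = overhead(p, t)
            t += sampling_interval
        # Production phase: use the policy with the lowest sampled overhead.
        best = min(policies, key=lambda p: sampled[p])
        schedule.append((t, "production", best))
        t += production_interval
    return schedule
```

If the overhead of one policy rises partway through the run, the next sampling phase observes the change and the following production phase switches to a different version.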

Guaranteed Performance Bounds

• Assumptions:
  • Overhead Changes Bounded by Exponential Decay Functions

• Worst Case Scenario:
  • No Useful Work During Sampling Phase
  • Sampled Overheads Are Same For All Versions
  • Overhead of Selected Version Increases at Maximum Rate
  • Overhead of Other Versions Decreases at Maximum Rate

[Figure: timeline of sampling (S) and production (P) intervals.]

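Guaranteed Performance Bound

Definition 1. Policy $p_i$ is at most $\epsilon$ worse than policy $p_j$ over a time interval $T$ if

$$\mathrm{Work}_j^T - \mathrm{Work}_i^T \le \epsilon T, \quad \text{where } \mathrm{Work}_i^T = \int_0^T (1 - o_i(t))\,dt$$

Definition 2. Dynamic feedback is at most $\epsilon$ worse than the optimal if, over an interval of length $P + SN$,

$$\mathrm{Work}_{\mathrm{opt}}^{P+SN} - \mathrm{Work}_{\mathrm{df}}^{P+SN} \le \epsilon (P + SN)$$

Result 1. To guarantee this bound:

$$(1 - \epsilon) P + \frac{1}{\lambda} e^{-\lambda P} \le (\epsilon - 1) S N + \frac{1}{\lambda}$$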

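Guaranteed Performance Bounds

[Figure: the two sides of the bound, $(1 - \epsilon) P + (1/\lambda) e^{-\lambda P}$ and $(\epsilon - 1) S N + (1/\lambda)$, plotted against the Production Interval $P$; the Feasible Region is the range of $P$ for which the first does not exceed the second.]

Production Interval Too Long: May Execute Suboptimal Policy for Long Time

Production Interval Too Short: Unable to Amortize Sampling Overhead

Basic Constraint: Decay Rate ($\lambda$) Must be Small Enough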
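The feasible region can be explored numerically with a short sketch that evaluates the bound $(1 - \epsilon) P + (1/\lambda) e^{-\lambda P} \le (\epsilon - 1) S N + (1/\lambda)$ for candidate production intervals. The reading of S as the sampling interval length and N as the number of policies is an interpretation (the slide does not define them), the function names are hypothetical, and the values of epsilon and lambda in the usage note are illustrative only.

```python
import math


def bound_holds(P, S, N, eps, lam):
    """Check (1 - eps)*P + exp(-lam*P)/lam <= (eps - 1)*S*N + 1/lam."""
    lhs = (1.0 - eps) * P + math.exp(-lam * P) / lam
    rhs = (eps - 1.0) * S * N + 1.0 / lam
    return lhs <= rhs


def feasible_production_intervals(candidates, S, N, eps, lam):
    """Return the candidate production intervals that satisfy the bound."""
    return [P for P in candidates if bound_holds(P, S, N, eps, lam)]
```

With S = 0.01 seconds, N = 3, eps = 0.1, and lam = 0.01 per second, the candidates 1 and 10 seconds satisfy the bound while 0.1 seconds (too short) and 100 seconds (too long) do not. With lam = 1.0 no candidate is feasible, which matches the basic constraint that the decay rate must be small enough.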

Dynamic Feedback: Implementation

• Code Generation

• Measuring Policy Overhead

• Interval Selection

• Interval Expiration

• Policy Switch

Code Generation

• Statically Generate Different Code Versions for Each Policy
  • Alternative: Dynamic Code Generation

• Advantages of Static Code Generation:
  • Simplicity of Implementation
  • Fast Policy Switching

• Potential Drawback of Static Code Generation
  • Code Size (In Practice Not a Problem)

Measuring Policy Overhead

• Sources of Overhead
  • Locking Overhead
  • Waiting Overhead

• Compute Locking Overhead
  • Count Number of Executed Acquire/Release Constructs

• Estimate Waiting Overhead
  • Count Number of Spins on Locks Waiting to be Released

$$\text{Sampled Overhead} = \frac{(\text{Number of Acquire/Release} \times \text{Acquire/Release Execution Time}) + (\text{Number of Spins} \times \text{Spin Time})}{\text{Sampling Time}}$$
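The sampled overhead formula translates directly into a small function. The function and parameter names are hypothetical, and the per-operation times in the usage note are made-up values for illustration, not measurements from the talk.

```python
def sampled_overhead(num_acquire_release, acquire_release_time,
                     num_spins, spin_time, sampling_time):
    """Fraction of the sampling time spent on locking and waiting.

    All times must be expressed in the same unit (for example, seconds).
    """
    locking = num_acquire_release * acquire_release_time  # locking overhead
    waiting = num_spins * spin_time                       # estimated waiting overhead
    return (locking + waiting) / sampling_time
```

For example, 1000 acquire/release constructs at 2 microseconds each plus 5000 spins at 0.1 microseconds each, measured over a 10 millisecond sampling interval, give a sampled overhead of about 0.25.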

Interval Selection and Expiration

• Fixed Interval Values
  • Sampling Interval: 10 milliseconds
  • Production Interval: 10 seconds
  • Good Results for Wide Range of Interval Values

• Polling Code for Expiration Detection
  • Location: Back Edges of Parallel Loop
  • Advantage: Low Overhead
  • Disadvantage: Potential Interaction with Iteration Size

[Figure: a parallel loop with its Atomic Operations and Polling Points marked.]
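Polling on the back edge of a loop can be sketched as follows. This is an illustrative sketch, not the generated code: `run_interval`, `do_work`, and the injectable `clock` parameter are hypothetical. It also shows the interaction with iteration size, since expiration is only detected after the current iteration completes.

```python
import time


def run_interval(work_items, do_work, interval_seconds, clock=time.monotonic):
    """Process items until the interval expires; return the number processed."""
    deadline = clock() + interval_seconds
    done = 0
    for item in work_items:
        do_work(item)
        done += 1
        # Polling point on the loop back edge: one timer read per iteration.
        if clock() >= deadline:
            break
    return done
```

Because the timer is read only between iterations, a long iteration delays detection of the expired interval by up to one iteration time.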

Policy Switch

• Synchronous
  • Processors Poll Timer to Detect Interval Expiration
  • Barrier At End of Each Interval

• Advantages:
  • Consistent Transitions
  • Clean Overhead Measurements

• Disadvantages:
  • Need to Synchronize All Processors
  • Potential Idle Time At Barrier
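A synchronous policy switch can be sketched with a barrier whose action selects the next policy once every worker has arrived. This is a simplified sketch using Python threads: `run_workers` and its arguments are hypothetical, the sequence of policies is supplied up front rather than chosen from measurements, and timer polling is omitted.

```python
import threading


def run_workers(num_workers, policy_per_interval):
    """Run all workers through the intervals; return the policies each one saw."""
    state = {"interval": 0, "policy": policy_per_interval[0]}
    seen = [[] for _ in range(num_workers)]

    def switch_policy():
        # Executed by exactly one thread after all workers reach the barrier.
        state["interval"] += 1
        if state["interval"] < len(policy_per_interval):
            state["policy"] = policy_per_interval[state["interval"]]

    barrier = threading.Barrier(num_workers, action=switch_policy)

    def worker(wid):
        for _ in policy_per_interval:
            seen[wid].append(state["policy"])  # run this interval with the current policy
            barrier.wait()                     # barrier at the end of each interval

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Since no worker can pass the barrier until all have arrived and the switch has run, every worker executes each interval with the same policy, which is what makes the transitions consistent. The cost is that faster workers sit idle at the barrier.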

Experimental Results

• Parallelizing Compiler Based on Commutativity Analysis [PLDI’96]

• Set of Complete Scientific Applications
  • Barnes-Hut N-Body Solver (1500 lines of C++)
  • Liquid Water Simulation Code (1850 lines of C++)
  • Seismic Modeling String Code (2050 lines of C++)

• Different Lock Coarsening Policies

• Dynamic Feedback

• Performance on Stanford DASH Multiprocessor

Code Sizes

[Figure: code sizes of the Serial, Original, and Dynamic versions of Barnes-Hut, Water, and String.]

Lock Overhead

Percentage of Time that the Single Processor Execution Spends Acquiring and Releasing Mutual Exclusion Locks

[Figure: lock overhead by policy. Barnes-Hut (16K Particles): Original, Bounded, Aggressive. Water (512 Molecules): Original, Bounded, Aggressive. String (Big Well Model): Original, Aggressive.]

Contention Overhead

Percentage of Time that Processors Spend Waiting to Acquire Locks Held by Other Processors

[Figure: contention overhead versus number of Processors (0 to 16) for the Original, Bounded, and Aggressive policies; panels for Barnes-Hut (16K Particles), Water (512 Molecules), and String (Big Well Model).]

Performance Results: Barnes-Hut

[Figure: Barnes-Hut on DASH (16K Particles), plotted against Number of Processors (0 to 16); curves: Ideal, Aggressive, Dynamic Feedback, Bounded, Original.]

Performance Results: Water

[Figure: Water on DASH (512 Molecules), plotted against number of processors (0 to 16); curves: Bounded, Original, Aggressive, Dynamic Feedback.]

Performance Results: String

[Figure: String on DASH (Big Well Model), plotted against number of processors (0 to 16); curves: Original, Aggressive, Dynamic Feedback.]

Summary

• Code Size Is Not An Issue

• Lock Coarsening Has Significant Performance Impact

• Best Lock Coarsening Policy Varies With Application

• Dynamic Feedback Delivers Code With Performance Comparable to The Best Static Lock Coarsening Policy

Related Work

• Adaptive Execution Techniques (Saavedra Park:PACT96)

• Dynamic Dispatch Optimizations (Hölzle Ungar:PLDI94)

• Dynamic Code Generation (Engler:PLDI96)

• Profiling (Brewer:PPoPP95)

• Synchronization Optimizations (Plevyak et al:POPL95)

Conclusions

• Dynamic Feedback
  • Generated Code Adapts to Different Execution Environments

• Integration with Parallelizing Compiler
  • Irregular Object-Based Programs
  • Pointer-Based Linked Data Structures
  • Commutativity Analysis

• Evaluation with Three Complete Applications
  • Performance Comparable to Best Hand-Tuned Optimization

BACKUP SLIDES

Performance Results: Barnes-Hut

[Figure: Barnes-Hut (16K Particles), plotted against Number of Processors (0 to 16); curves: Aggressive, Bounded, Original.]

Performance Results: Water

[Figure: Water (512 Molecules), plotted against number of processors (0 to 16); curves: Aggressive, Bounded, Original.]

Performance Results: String

[Figure: String (Big Well Model), plotted against number of processors (0 to 16); curves: Original, Aggressive.]

Policy Switch

[Figure: timeline of Policy 1 followed by Policy 2, with each interval ending when the Timer Expires.]

Motivation

Challenges:
• Match Best Implementation to Environment
• Heterogeneous and Mobile Systems

Goal:
• Develop Mechanisms to Support Code that Adapts to Environment Characteristics

Technique:
• Dynamic Feedback

Overhead for Barnes-Hut

[Figure: Barnes-Hut on DASH (8 Processors), FORCES Loop, Data Set: 16K Particles; overhead of the Original, Aggressive, and Bounded policies over Execution Time (Seconds), 0 to 25.]

Overhead for Water

[Figure: Water on DASH (8 Processors), INTERF Loop, Data Set: 512 Molecules; overhead of the Original and Bounded policies, horizontal axis 0 to 60.]

Overhead for Water

[Figure: Water on DASH (8 Processors), POTENG Loop, Data Set: 512 Molecules; overhead of the Aggressive and Original policies, horizontal axis 0 to 60.]

Overhead for String

[Figure: String on DASH (8 Processors), PROJFWD Loop, Data Set: Big Well; overhead of the Aggressive and Original policies, horizontal axis 0 to 500.]

