
Fast Paths in Concurrent Programs

Wen Xu, Princeton University
Sanjeev Kumar, Intel Labs
Kai Li, Princeton University


Concurrent Programs

Message-passing style: processes and channels, e.g. streaming languages
[Figure: a graph of processes P1-P4 connected by channels C1-C3, mapped either onto a single processor or partitioned across Processor 1 and Processor 2]

Uniprocessors: programming convenience
─ Embedded devices
─ Network software stack
─ Media processing

Multiprocessors: exploit parallelism by partitioning the processes

Problem: compile a concurrent program to run efficiently on a uniprocessor. A minimal sketch of this processes-and-channels style follows.
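Below is a hypothetical sketch, not from the talk, of the programming model: a producer process and a consumer process connected by a one-slot channel, written here with POSIX threads. The channel_send/channel_receive helpers and all names are invented for illustration.

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static int slot, full;                 /* one-slot channel C1 */

    static void channel_send(int v) {
        pthread_mutex_lock(&lock);
        while (full) pthread_cond_wait(&cond, &lock);
        slot = v; full = 1;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }

    static int channel_receive(void) {
        pthread_mutex_lock(&lock);
        while (!full) pthread_cond_wait(&cond, &lock);
        int v = slot; full = 0;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
        return v;
    }

    static void *producer(void *arg) {     /* process P1: sends 1..10 */
        (void)arg;
        for (int i = 1; i <= 10; i++) channel_send(i);
        channel_send(-1);                  /* end-of-stream marker */
        return NULL;
    }

    static void *consumer(void *arg) {     /* process P2: sums what it receives */
        (void)arg;
        long sum = 0;
        for (int v; (v = channel_receive()) != -1; ) sum += v;
        printf("sum = %ld\n", sum);
        return NULL;
    }

    int main(void) {
        pthread_t p1, p2;
        pthread_create(&p1, NULL, producer, NULL);
        pthread_create(&p2, NULL, consumer, NULL);
        pthread_join(p1, NULL);
        pthread_join(p2, NULL);
        return 0;
    }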


Compiling Concurrent Programs

Process-based approach
─ Keep the processes separate and context switch between them
─ Small executable (the sum of the processes)
─ Significant overhead

Automata-based approach
─ Treat each process as a state machine and combine the state machines
─ Small overhead
─ Large executables (potentially exponential)

One study compared the two approaches and found that, relative to the process-based approach, the automata-based approach generates code that is
─ Twice as fast
─ 2-3 orders of magnitude larger

Neither approach is satisfactory. (A sketch contrasting the two on a tiny example follows.)
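As a hypothetical illustration (not from the talk), here is the producer/consumer program from the earlier sketch compiled the automata-based way: the two state machines are combined into one sequential loop, so a send/receive pair becomes an ordinary assignment. A process-based compiler would instead keep one stack per process and context switch at every channel operation.

    /* Automata-based compilation of the earlier producer/consumer sketch
     * (hypothetical output, for illustration only). */
    #include <stdio.h>

    int main(void) {
        int channel;              /* the one-slot channel, now just a variable */
        long sum = 0;

        for (int i = 1; i <= 10; i++) {
            /* producer state: compute a value and "send" it */
            channel = i;
            /* consumer state: "receive" the value and accumulate */
            sum += channel;
        }
        /* No context switches remain; this is the overhead the slide
         * compares against the process-based version. */
        printf("sum = %ld\n", sum);
        return 0;
    }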


Our Work

Our goal: compile concurrent programs
─ Automated using a compiler
─ Low overhead
─ Small executable size

Our approach: combine the two approaches
─ Use the process-based approach to handle all cases
─ Use the automata-based approach to speed up the common cases


Outline
─ Motivation
─ Fast Paths
─ Fast Paths in Concurrent Programs
─ Experimental Evaluation
─ Conclusions


Fast Paths

Path: a dynamic execution path through the program

Fast path (or hot path): a well-known technique
─ Identify commonly executed paths (hot paths)
─ Specialize and optimize them (fast paths)

Two components
─ A predicate that specifies when the fast path applies
─ Optimized code to execute the fast path

Compilers can be used to automate this, but mostly for sequential programs. (A minimal sketch of the two components follows.)
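A hypothetical sketch of the two components (the queue, CAP, and all names are invented for illustration): a predicate that recognizes the common case (space available) and specialized code for it, with a general routine kept for everything else.

    #include <stdio.h>

    #define CAP 8

    struct queue { int buf[CAP]; int head, tail, len; };

    /* General path: handles the hard cases (growing, blocking, logging);
     * here it just reports the condition. */
    static void enqueue_general(struct queue *q, int v) {
        (void)q; (void)v;
        fprintf(stderr, "queue full, handling the hard case\n");
    }

    /* Fast path: predicate + optimized code for the common case. */
    static void enqueue(struct queue *q, int v) {
        if (q->len < CAP) {                  /* predicate: space available */
            q->buf[q->tail] = v;             /* optimized fast-path code */
            q->tail = (q->tail + 1) % CAP;
            q->len++;
        } else {
            enqueue_general(q, v);           /* uncommon case: general path */
        }
    }

    int main(void) {
        struct queue q = {0};
        for (int i = 0; i < 10; i++) enqueue(&q, i);
        printf("queued %d items\n", q.len);
        return 0;
    }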


Manually Implementing Fast Paths

To achieve good performance in concurrent programs:
─ Start: insert code that identifies the common case and transfers control to the fast-path code
─ Extract and optimize the fast-path code manually
─ Finish: patch up state and return control at the end of the fast path

Obvious drawbacks
─ Difficult to implement correctly
─ Difficult to maintain


Outline
─ Motivation
─ Fast Paths
─ Fast Paths in Concurrent Programs
─ Experimental Evaluation
─ Conclusions


Our Approach

[Figure: the baseline, process-based code contains (1) a test for the fast-path predicate; when it holds, control transfers to (2) the optimized, automata-based fast-path code; (3) an abort check on the fast path hands control back to the baseline. The figure illustrates this with small straight-line fragments such as "a = b; b = c * d; d = 0; if (c > 0) c++;".]

A sketch of this test/fast-path/abort structure follows.
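A hypothetical sketch of the structure in the figure (struct request, the size threshold, and all function names are invented): the baseline code tests the predicate, runs the fast path, and falls back to the baseline if the fast path aborts.

    #include <stdbool.h>
    #include <stdio.h>

    struct request { int size; };

    /* Baseline, process-based code: handles every case (context switches,
     * scheduling, corner cases). */
    static void baseline_handle(struct request *r) {
        printf("baseline handled request of size %d\n", r->size);
    }

    /* Optimized, automata-based code for the common case; returns false
     * when it hits an unexpected condition and must abort. */
    static bool fastpath_handle(struct request *r) {
        if (r->size == 0)
            return false;               /* (3) abort */
        printf("fast path handled request of size %d\n", r->size);
        return true;
    }

    static void handle(struct request *r) {
        if (r->size < 100) {            /* (1) test: fast-path predicate */
            if (fastpath_handle(r))     /* (2) optimized fast-path code */
                return;
        }
        baseline_handle(r);             /* predicate failed or fast path aborted */
    }

    int main(void) {
        struct request a = { 42 }, b = { 0 }, c = { 5000 };
        handle(&a); handle(&b); handle(&c);
        return 0;
    }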


Specifying Fast Paths

A fast path spans multiple processes of the concurrent program. For each involved process, the fast path is specified as a regular expression over
─ Statements
─ Conditions (optional)
─ Synchronization (optional)
with support for early abort.

Advantages: powerful, compact, and the specification is only a hint.

Example specification:

    fastpath example {
        process first {
            statement A, B, C, D, #1;
            start A ? (size < 100);
            follows B ( C D )*;
            exit #1;
        }
        process second { ... }
        process third { ... }
    }

Read roughly as: for process first, the fast path starts at statement A when size < 100, then executes B followed by zero or more repetitions of C D, and exits at point #1.


Extracting Fast Paths

The automata-based approach is used to extract the fast paths
─ A fast path involves a group of processes
─ The compiler keeps track of the execution point of each involved process
─ On exit, control is returned to the appropriate location in each of the processes

The baseline is concurrent; the fast path is sequential code.

Fairness on the fast path
─ Embed scheduling decisions in the fast path, avoiding scheduling/fairness overhead there
─ Rely on the baseline code for fairness, since it is always taken a fraction of the time

(A minimal sketch of tracking per-process execution points follows.)
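A hypothetical sketch, not the ESP compiler's actual data structures: one way to picture the bookkeeping is a per-process resume point that the fast-path exit code fills in, so the baseline, process-based code continues at the right statement in every involved process.

    #include <stddef.h>
    #include <stdio.h>

    /* Labels for points in the baseline code where a process may resume
     * (invented for illustration). */
    enum resume_point { AT_RECEIVE, AT_PROCESS, AT_SEND };

    struct proc_state {
        const char       *name;    /* process name */
        enum resume_point resume;  /* where its baseline code should continue */
    };

    int main(void) {
        /* A fast path involving three processes; the generated exit code
         * records where the fast path left each one. */
        struct proc_state group[] = {
            { "first",  AT_SEND    },
            { "second", AT_PROCESS },
            { "third",  AT_RECEIVE },
        };

        for (size_t i = 0; i < sizeof group / sizeof group[0]; i++)
            printf("process %s resumes at point %d\n",
                   group[i].name, (int)group[i].resume);
        return 0;
    }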


Optimization on the Fast Path

Enabling traditional fast-path optimizations
─ Generate and optimize the baseline code
─ Generate the fast-path code; fast paths have exit/entry points to the baseline code
─ Use data-flow information from the baseline code at the exit/entry points to seed the analysis and optimize the fast-path code

Speeding up the fast path using lazy execution
─ Delay operations that are not needed while the fast path executes to the end of the fast path
─ Such operations can be performed if the fast path is aborted

(A sketch of lazy execution follows.)
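A hypothetical sketch of lazy execution (the counter and all names are invented): bookkeeping that only the baseline code needs is skipped while the fast path runs and is caught up at the end of the fast path, or immediately if the fast path aborts.

    #include <stdbool.h>
    #include <stdio.h>

    static int queue_len;           /* state that the baseline code relies on */

    /* The operation the fast path postpones: it is not needed while the
     * fast path itself is running. */
    static void deferred_bookkeeping(int delta) {
        queue_len += delta;
    }

    static bool fast_path(int item) {
        int pending = 0;

        /* ...optimized common-case work; bookkeeping deliberately skipped... */
        pending++;

        if (item < 0) {                       /* unexpected case: abort */
            deferred_bookkeeping(pending);    /* catch up so the baseline sees
                                                 consistent state */
            return false;
        }

        deferred_bookkeeping(pending);        /* delayed to the end of the path */
        return true;
    }

    int main(void) {
        printf("fast path %s, queue_len = %d\n",
               fast_path(42) ? "completed" : "aborted", queue_len);
        return 0;
    }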


Outline
─ Motivation
─ Fast Paths
─ Fast Paths in Concurrent Programs
─ Experimental Evaluation
─ Conclusions


Experimental Evaluation

The techniques in the paper are implemented in the ESP compiler, which supports concurrent programs.

Two classes of programs
─ Filter programs
─ VMMC firmware

Answer three questions
─ Programming effort (annotation complexity) needed
─ Size of the executable
─ Performance


Filter Programs

Well-defined structure: streaming applications. We use the filter programs of Proebsting et al.
─ Good for evaluating our technique: concurrency overheads dominate

[Figure: a pipeline of processes P1-P4 connected by channels C1-C3]

Experimental setup
─ 2.66 GHz Pentium 4, 1 GB memory, Linux 2.4
─ 4 versions of the code

Annotation complexity
─ Program sizes: 153, 125, 190, 196 lines
─ Annotation sizes: 7, 7, 10, 10 lines


Filter Programs Cont'd

[Charts: normalized executable size and normalized performance for Programs 1-4, comparing four versions: process-based, automata-based, process-based with manual fast paths, and process-based with automatic fast paths. Bars that run off the 0-2.5 scale are labeled with their values (4.17, 23.52, 28.33, 9.47 on one chart; 5.15, 5.53 on the other).]

Better performance than both other approaches, with a relatively small executable.


VMMC Firmware

Firmware for a gigabit network (Myrinet)

Experimental setup
─ Measure network performance (latency and bandwidth) between two machines connected with Myrinet
─ 3 versions of the firmware
  ─ Concurrent C version with manual fast paths
  ─ Process-based code without fast paths
  ─ Process-based code with compiler-extracted fast paths

Annotation complexity (3 fast paths)
─ Fast-path specifications: 20, 14, and 18 lines
─ Manual fast paths in C: 1100 lines total


VMMC Firmware Cont'd

[Chart: latency (0-70 µs) versus message size (4 to 512 bytes) for the hand-optimized C firmware with manual fast paths, the process-based code, and the process-based code with automatic fast paths.]

[Chart: generated code size in assembly instructions (0-40,000) for the same three versions.]


Outline
─ Motivation
─ Fast Paths
─ Fast Paths in Concurrent Programs
─ Experimental Evaluation
─ Conclusions


Conclusions

Fast paths in concurrent programs
─ Evaluated using filter programs and the VMMC firmware
─ The process-based approach handles all cases, keeping the executable size reasonable
─ The automata-based approach handles only the common cases (the fast paths), avoiding the high overhead of the process-based approach and often outperforming purely automata-based code

Questions?
