TRANSCRIPT
Fast Paths in Concurrent Programs
Wen Xu, Princeton University
Sanjeev Kumar, Intel Labs
Kai Li, Princeton University
Concurrent Programs

- Message-passing style: processes and channels, e.g. streaming languages
  [Diagram: processes P1-P4 connected by channels C1-C3, partitioned across Processor 1 and Processor 2]
- Uniprocessors: programming convenience
  ─ Embedded devices
  ─ Network software stack
  ─ Media processing
- Multiprocessors: exploit parallelism by partitioning the processes

Problem: compile a concurrent program to run efficiently on a uniprocessor.
Compiling Concurrent Programs

- Process-based approach: keep the processes separate and context switch between them
  ─ Small executable (the sum of the processes)
  ─ Significant overhead
- Automata-based approach: treat each process as a state machine and combine the state machines
  ─ Small overhead
  ─ Large executables (potentially exponential)

One study compared the two approaches and found that, compared to the process-based approach, the automata-based approach generates code that is
─ Twice as fast
─ 2-3 orders of magnitude larger

Neither approach is satisfactory.
Our Work

Our goal: compile concurrent programs
- Automated using a compiler
- Low overhead
- Small executable size

Our approach: combine the two approaches
- Use the process-based approach to handle all cases
- Use the automata-based approach to speed up the common cases
Outline

- Motivation
- Fast Paths
- Fast Paths in Concurrent Programs
- Experimental Evaluation
- Conclusions
Fast Paths

- Path: a dynamic execution path in the program
- Fast path (or hot path): a well-known technique
  ─ Identify commonly executed paths (hot paths)
  ─ Specialize and optimize them (fast paths)
- Two components
  ─ A predicate that specifies the fast path
  ─ Optimized code to execute the fast path
- Compilers can be used to automate it
  ─ Mostly in sequential programs so far
Manually Implementing Fast Paths

To achieve good performance in concurrent programs:
- Start: insert code that identifies the common case and transfers control to the fast-path code
- Extract and optimize the fast-path code manually
- Finish: patch up state and return control at the end of the fast path

Obvious drawbacks
- Difficult to implement correctly
- Difficult to maintain
Outline

- Motivation
- Fast Paths
- Fast Paths in Concurrent Programs
- Experimental Evaluation
- Conclusions
Our Approach

[Diagram: the baseline (process-based) code contains a test (1) that transfers control to the fast path (automata-based) when its predicate holds; the optimized code (2) executes; an abort check (3) returns control to the baseline code if the fast path cannot complete.]
Specifying Fast Paths

A concurrent program has multiple processes. A fast path is specified with:
- Regular expressions over statements
- Conditions (optional)
- Synchronization (optional)
- Support for early abort

Advantages: powerful, compact, a hint to the compiler.

    fastpath example {
        process first {
            statement A, B, C, D, #1;
            start A ? (size < 100);
            follows B ( C D )*;
            exit #1;
        }
        process second { ... }
        process third { ... }
    }
Extracting Fast Paths

- The automata-based approach is used to extract fast paths
- A fast path involves a group of processes
  ─ The compiler keeps track of the execution point of each involved process
  ─ On exit, control is returned to the appropriate location in each process
- Baseline: concurrent code. Fast path: sequential code
- Fairness on the fast path
  ─ Embed scheduling decisions in the fast path: avoids scheduling/fairness overhead there
  ─ Rely on the baseline code for fairness: it is always taken a fraction of the time
Optimization on the Fast Path

Enabling traditional optimizations
- Generate and optimize the baseline code
- Generate the fast-path code
  ─ Fast paths have exit/entry points into the baseline code
- Use data-flow information from the baseline code at the exit/entry points to seed the analysis and optimize the fast-path code

Speeding up the fast path using lazy execution
- Delay operations that are not needed on the fast path until the end of the fast path
- Such operations are performed immediately if the fast path aborts
Outline

- Motivation
- Fast Paths
- Fast Paths in Concurrent Programs
- Experimental Evaluation
- Conclusions
Experimental Evaluation

Implemented the techniques in the paper in the ESP compiler, which supports concurrent programs.

Two classes of programs
- Filter programs
- VMMC firmware

Answer three questions
- Programming effort (annotation complexity) needed
- Size of the executable
- Performance
Filter Programs

- Streaming applications with a well-defined structure; we use the filter programs of Proebsting et al.
- Good for evaluating our technique: concurrency overheads dominate
  [Diagram: a pipeline of processes P1-P4 connected by channels C1-C3]

Experimental setup
- 2.66 GHz Pentium 4, 1 GB memory, Linux 2.4
- 4 versions of the code

Annotation complexity
- Program sizes: 153, 125, 190, 196 lines
- Annotation sizes: 7, 7, 10, 10 lines
Filter Programs (Cont'd)

[Charts: normalized executable size and performance for Programs 1-4, comparing Process-based, Automata-based, Process-based with Manual Fast Path, and Process-based with Automatic Fast Path. Off-scale bar labels include 4.17, 23.52, 28.33, 9.47 and 5.15, 5.53.]

- Better performance than both baseline approaches
- Relatively small executable
VMMC Firmware

- Firmware for a gigabit network (Myrinet)

Experimental setup
- Measure network performance between two machines connected with Myrinet: latency and bandwidth
- 3 versions of the firmware
  ─ Concurrent C version with manual fast paths
  ─ Process-based code without fast paths
  ─ Process-based code with compiler-extracted fast paths

Annotation complexity (3 fast paths)
- Fast path specifications: 20, 14, and 18 lines
- Manual fast paths in C: 1100 lines total
VMMC Firmware (Cont'd)

[Chart: latency in microseconds (0-70) vs. message size in bytes (4-512) for Hand-Optimized C with Manual Fast Paths, Process-based, and Process-based with Automatic Fast Paths.]

[Chart: generated code size in assembly instructions (0-40000) for Hand-Optimized C with Manual Fast Paths, Process-based Code, and Process-based with Automatic Fast Paths.]
Outline

- Motivation
- Fast Paths
- Fast Paths in Concurrent Programs
- Experimental Evaluation
- Conclusions
Conclusions

Fast paths in concurrent programs, evaluated using filter programs and VMMC firmware:
- The process-based approach handles all cases
  ─ Keeps the executable size reasonable
- The automata-based approach handles only the common cases (fast paths)
  ─ Avoids the high overhead of the process-based approach
  ─ Often outperforms the purely automata-based code