mama: mostly automatic management of atomicity

Post on 08-Jan-2016

MAMA: Mostly Automatic Management of Atomicity

Christian DeLozier, Joseph Devietti, Milo M. K. Martin

University of Pennsylvania

March 2nd, 2014

Start with a serial problem

Christian DeLozier - 2014

Find and express the parallelism

Coordinate the parallel execution (synchronization)

Don’t mess up!

Is there another way to do this?

• Programmer currently has to:
  1. Express the parallelism (hard)
  2. Coordinate the parallelism (hard)

• Alternative:
  1. Programmer expresses the parallelism
  2. Machine handles coordination

Coordinating Parallel Execution

• Atomicity vs. ordering
  • Types of concurrency bugs [Lu et al., ASPLOS 2008]
  • Atomicity: locks, transactions
  • Ordering: barriers, fork/join, blocking on a queue, etc.

• Atomicity constraints are more common than ordering constraints

• Ordering constraints are difficult to infer

Mostly Automatic Management of Atomicity

• Toward automatically providing atomicity for parallel programs

• Program either executes atomically – or deadlocks

• Protect every shared variable with its own lock

• Restore progress and performance when necessary (with help from the programmer)

Related Work

• Automatic Parallelization
  • [Bernstein, IEEE Transactions 1966]
  • …

• Data-Centric Synchronization
  • [Vaziri et al., POPL 2006]
  • [Ceze et al., HPCA 2007]

• Transactional Memory
  • [Herlihy and Moss, ISCA 1993]
  • …

Lock-Based Atomic Sections

• What lock do we acquire?

• When do we acquire the lock?

• When should we release the lock?

What lock do we acquire?

• Associate a lock with each variable

• Trade-off between parallelism and overhead

• Coarse-grained vs. fine-grained
  • Coarse-grained: 1 lock per object, 1 lock per array
  • Fine-grained: 1 lock per field, 1 lock per array element

• Mutex vs. Reader-writer lock

MAMA Prototype

• Uses fine-grained locking
  • More parallelism
  • Especially for arrays
  • Optimization: divide arrays into N chunks, 1 lock per chunk

• Uses reader-writer locks
  • More parallelism
  • Read sharing is common
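The chunking optimization can be illustrated with a minimal Java sketch, assuming an int array and the standard `java.util.concurrent.locks.ReentrantReadWriteLock`. The class `ChunkedLockedArray` and its methods are hypothetical names, not part of the MAMA prototype. Unlike MAMA as described on these slides, the sketch releases each lock right after the access; it only shows how array indices could map to per-chunk reader-writer locks.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only: not the MAMA prototype's data structure.
class ChunkedLockedArray {
    private final int[] data;
    private final int chunkSize;
    private final ReentrantReadWriteLock[] locks; // one reader-writer lock per chunk

    ChunkedLockedArray(int length, int numChunks) {
        if (length <= 0 || numChunks <= 0) {
            throw new IllegalArgumentException("length and numChunks must be positive");
        }
        data = new int[length];
        chunkSize = (length + numChunks - 1) / numChunks; // ceiling division
        int used = (length + chunkSize - 1) / chunkSize;  // chunks actually needed
        locks = new ReentrantReadWriteLock[used];
        for (int i = 0; i < used; i++) {
            locks[i] = new ReentrantReadWriteLock();
        }
    }

    // Index of the chunk (and therefore the lock) that guards this element.
    int chunkOf(int index) {
        return index / chunkSize;
    }

    int lockCount() {
        return locks.length;
    }

    // Readers of the same chunk may proceed concurrently.
    int read(int index) {
        ReentrantReadWriteLock lock = locks[chunkOf(index)];
        lock.readLock().lock();
        try {
            return data[index];
        } finally {
            lock.readLock().unlock();
        }
    }

    // A writer excludes readers and writers of the same chunk only.
    void write(int index, int value) {
        ReentrantReadWriteLock lock = locks[chunkOf(index)];
        lock.writeLock().lock();
        try {
            data[index] = value;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With 10 elements and 4 requested chunks, each chunk covers 3 elements, so elements 0 to 2 share one lock while element 9 uses the fourth. Fewer chunks mean less lock memory but more false contention between unrelated elements.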

Lock-Based Atomic Sections

• What lock do we acquire?
  • One reader-writer lock per variable (fine-grained)

• When do we acquire the lock?
  • Acquire before the first dynamic access

• When should we release the lock?

When should we release the lock?

• Simple case: After the owning thread has exited

[Figure: timeline for threads T1 and T2. T1 writes A and exits; T2's write to A proceeds once T1 has exited.]

When should we release the lock?

• When the owning thread is waiting for another thread to make progress (e.g., join, barrier)

[Figure: timeline for threads T1 and T2. T1 writes A, spawns T2, and joins T2; T2 writes A and exits; T1 writes A again after the join.]
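The join case can be sketched in Java under some simplifying assumptions: a plain `ReentrantLock` stands in for the per-variable reader-writer lock, and the release before `join` is written by hand, whereas the prototype would insert it through instrumentation. The class `JoinReleaseDemo` and its members are hypothetical names.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration: the calling thread plays the role of T1.
class JoinReleaseDemo {
    static final ReentrantLock lockA = new ReentrantLock(); // the lock guarding A
    static int A;

    // Returns true if T2 finished and both writes to A happened in order.
    static boolean run() {
        lockA.lock();          // T1: first access to A acquires its lock
        A = 1;

        Thread t2 = new Thread(() -> {
            lockA.lock();      // T2: blocks until T1 gives up the lock
            try {
                A = 2;
            } finally {
                lockA.unlock(); // stands in for "release at thread exit"
            }
        });
        t2.start();

        // T1 is about to wait for T2, so holding A's lock could only cause a
        // deadlock. Release it before blocking in join.
        lockA.unlock();
        try {
            t2.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }

        lockA.lock();          // T1: next access to A reacquires the lock
        try {
            A = A + 1;
        } finally {
            lockA.unlock();
        }
        return !t2.isAlive() && A == 3;
    }
}
```

If T1 kept the lock across the join, T2 could never write A and T1 could never return from the join, which is the deadlock this release rule avoids.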

When should we release the lock?

• Other deadlocks cannot be safely broken

• Need help from the programmer
  • Trusted annotations to sanction breaking a deadlock
  • MAMA_release(object)
  • Also used to improve performance when threads are over-serialized

[Figure: timeline for threads T1 and T2, each writing both A and B.]
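A hedged Java sketch of how such an annotation might break an A/B deadlock follows. The slides do not spell out the semantics of `MAMA_release(object)`; the sketch assumes it releases the calling thread's lock on the given object. The helper `mamaRelease`, the class `ReleaseAnnotationDemo`, and the use of plain `ReentrantLock` objects in place of per-variable reader-writer locks are all assumptions made for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of a programmer-sanctioned release point.
class ReleaseAnnotationDemo {
    static final ReentrantLock lockA = new ReentrantLock(); // guards A
    static final ReentrantLock lockB = new ReentrantLock(); // guards B
    static int A, B;

    // Assumed stand-in for MAMA_release(object): give up the lock if held.
    static void mamaRelease(ReentrantLock lock) {
        if (lock.isHeldByCurrentThread()) {
            lock.unlock();
        }
    }

    static void f() {
        lockA.lock();        // acquired before the first access to A
        A = 1;
        mamaRelease(lockA);  // annotation: f no longer needs A to stay atomic
        lockB.lock();
        B = 2;
        mamaRelease(lockB);  // stands in for release at thread exit
    }

    static void g() {
        lockB.lock();
        B = 1;
        lockA.lock();        // may wait, but f never holds A while waiting for B
        A = 2;
        mamaRelease(lockA);  // stands in for release at thread exit
        mamaRelease(lockB);
    }

    // Runs f and g concurrently; returns true if both threads finished.
    static boolean runBoth() {
        Thread t1 = new Thread(ReleaseAnnotationDemo::f);
        Thread t2 = new Thread(ReleaseAnnotationDemo::g);
        t1.start();
        t2.start();
        try {
            t1.join(5000);
            t2.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t1.isAlive() && !t2.isAlive();
    }
}
```

Without the release after `A = 1`, f could hold A while waiting for B and g could hold B while waiting for A. The annotation is trusted: it trades the atomicity of f's two writes for progress, so the final values of A and B depend on the interleaving.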

Lock-Based Atomic Sections

• What lock do we acquire?
  • One reader-writer lock per variable (fine-grained)

• When do we acquire the lock?
  • Acquire before the first dynamic access

• When should we release the lock?
  • At thread exit
  • When waiting for another thread to make progress
  • Or, at programmer-sanctioned program points
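The acquire and release policy summarized above can be sketched as per-thread bookkeeping in Java. This is a simplified illustration, not the prototype's implementation: it uses a mutex (`ReentrantLock`) per variable rather than a reader-writer lock, and `GuardedVar`, `HeldLocks`, `releaseAll`, and `release` are hypothetical names. In the prototype the equivalent calls would come from bytecode instrumentation rather than from the programmer.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical shared variable with its own lock.
class GuardedVar {
    final ReentrantLock lock = new ReentrantLock();
    int value;
}

// Hypothetical per-thread bookkeeping of the locks a thread currently holds.
class HeldLocks {
    private static final ThreadLocal<Set<GuardedVar>> HELD =
            ThreadLocal.withInitial(HashSet::new);

    // Acquire the variable's lock on the first dynamic access only.
    private static void acquireOnFirstAccess(GuardedVar v) {
        Set<GuardedVar> held = HELD.get();
        if (!held.contains(v)) {
            v.lock.lock();   // blocks while another thread owns the variable
            held.add(v);
        }
    }

    static int read(GuardedVar v) {
        acquireOnFirstAccess(v);
        return v.value;
    }

    static void write(GuardedVar v, int x) {
        acquireOnFirstAccess(v);
        v.value = x;
    }

    // Release point 1 and 2: thread exit, or just before waiting on another thread.
    static void releaseAll() {
        Set<GuardedVar> held = HELD.get();
        for (GuardedVar v : held) {
            v.lock.unlock();
        }
        held.clear();
    }

    // Release point 3: a programmer-sanctioned release of a single variable.
    static void release(GuardedVar v) {
        if (HELD.get().remove(v)) {
            v.lock.unlock();
        }
    }

    static int heldCount() {
        return HELD.get().size();
    }
}
```

Because a lock is taken once and then kept, everything a thread does between its first access and the next release point appears atomic to other threads, at the cost of possible deadlock or over-serialization.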

What can deadlocks tell us?

• When a thread cannot acquire a lock:
  • Perform distributed deadlock detection [Bracha and Toueg, Distributed Computing 1987]

void f() { A = 1; B = 2; }

void g() { B = 1; A = 2; }

[Figure: T1 is blocked on Write B, waiting for T2; T2 is blocked on Write A, waiting for T1.]
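The idea of detecting a deadlock when a lock cannot be acquired can be illustrated with a much simpler technique than the cited one. The Java sketch below is a centralized wait-for-graph check, not the Bracha and Toueg distributed algorithm, and it assumes exclusive locks, so each blocked thread waits for exactly one owner. `WaitForGraph` and its methods are hypothetical names, and threads are identified by plain strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, centralized wait-for graph (a simplification for illustration).
class WaitForGraph {
    // waitsFor.get(t) is the thread that owns the lock t is blocked on.
    private final Map<String, String> waitsFor = new HashMap<>();

    // Record that `waiter` failed to acquire a lock owned by `owner`.
    void block(String waiter, String owner) {
        waitsFor.put(waiter, owner);
    }

    // Record that `waiter` acquired its lock and is running again.
    void unblock(String waiter) {
        waitsFor.remove(waiter);
    }

    // True if following the wait-for chain from `start` runs into a cycle,
    // meaning `start` can never make progress without outside help.
    boolean inDeadlock(String start) {
        Set<String> seen = new HashSet<>();
        String current = start;
        while (current != null && seen.add(current)) {
            current = waitsFor.get(current);
        }
        return current != null; // stopped on a thread that was already visited
    }
}
```

For the f and g example, T1 blocking on B (owned by T2) is not yet a deadlock; once T2 also blocks on A (owned by T1), the chain T1, T2, T1 closes and both threads are reported as deadlocked.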

MAMA Prototype

• Implemented as a RoadRunner tool [Flanagan and Freund, PASTE 2010]

• Dynamic instrumentation for Java byte-code

• Evaluated on the Java Grande benchmarks and selected DaCapo benchmarks

• Running on one socket (8 cores) of a 4-socket Nehalem system with 128 GB RAM

• Removed all synchronized blocks and java.util.concurrent constructs from benchmarks
  • Ensures that MAMA is providing all of the atomicity

Evaluating MAMA

• Can we execute parallel programs correctly?

• How many annotations need to be added for progress and performance?

• How is the performance of the program affected?
  • Does MAMA permit threads to execute in parallel?

Annotation Burden

Benchmark    Lines of Code   Progress Annotations   Performance Annotations
crypt                  314                      0                         0
lufact                 461                      1                         4
lusearch            124105                      0                         4
matmult                187                      0                         0
moldyn                 487                      3                         0
montecarlo            1165                      0                        28
pmd                  60062                      0                         4
series                 180                      0                         0
sor                    186                      1                         0
sunflow              21970                      1                         3
xalan               172300                      0                         0

Performance

• MAMA incurs overhead due to locking and serial execution

• But MAMA still allows some parallel execution as compared to serialization

[Figure: performance chart, not reproduced; the only surviving label is "23x".]
Performance Breakdown

• Many benchmarks have significant portions that run in parallel

• Checking whether or not a lock is already owned incurs significant overhead on some benchmarks
Memory Usage

• Fine-grained locking incurs significant memory overheads

• Could be optimized to save space via chunking arrays or decreasing the size of the lock
Future Directions

• Does this approach apply to other languages?

• How do we test programs running with MAMA?
  • Find uncommon deadlocks
  • Gain more confidence in trusted annotations

• How can we reduce the performance overheads?

• How can we infer ordering constraints?

MAMA

• Provides atomicity for parallel programs
  • Some help via annotations from programmer

• A step toward programming without worrying about atomicity

• Programmer expresses parallelism
  • Machine provides atomicity automatically

Thank you for listening!
