mama: mostly automatic management of...

27
MAMA: Mostly Automatic Management of Atomicity Christian DeLozier, Joseph Devietti, Milo M. K. Martin University of Pennsylvania March 2 nd , 2014

Upload: others

Post on 22-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

MAMA: Mostly Automatic Management of Atomicity

Christian DeLozier, Joseph Devietti, Milo M. K. Martin University of Pennsylvania

March 2nd, 2014

Page 2: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

2

Start with a serial problem

Christian DeLozier - 2014

Page 3: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

3

Find and express the parallelism

Christian DeLozier - 2014

Page 4: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

4

Coordinate the parallel execution (synchronization)

Christian DeLozier - 2014

Page 5: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

5

Don’t mess up!

Christian DeLozier - 2014

Page 6: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

6

Is there another way to do this? • Programmer currently has to:

1.  Express the parallelism (Hard) 2.  Coordinate the parallelism (Hard)

• Alternative: 1.  Programmer expresses the parallelism 2.  Machine handles coordination

Christian DeLozier - 2014

Page 7: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

7

Coordinating Parallel Execution • Atomicity vs. Ordering

•  Types of concurrency bugs [Lu et al., ASPLOS 2008] •  Atomicity: Locks, transactions •  Ordering: Barriers, fork/join, blocking on a queue, etc.

• Atomicity constraints are more common than ordering constraints

• Difficult to infer ordering constraints

Christian DeLozier - 2014

Page 8: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

8

Mostly Automatic Management of Atomicity • Toward automatically providing atomicity for

parallel programs

• Program either executes atomically – or deadlocks

• Protect every shared variable with its own lock

• Restore progress and performance when necessary (with help from the programmer)

Christian DeLozier - 2014

Page 9: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

9

Related Work • Automatic Parallelization

•  [Bernstein, IEEE Transactions 1966] •  …

• Data Centric Synchronization •  [Vaziri et. al, POPL 2006] •  [Ceze et. al, HPCA 2007]

• Transactional Memory •  [Herlihy and Moss, ISCA 1993] •  …

Christian DeLozier - 2014

Page 10: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

10

Lock-Based Atomic Sections • What lock do we acquire?

• When do we acquire the lock?

• When should we release the lock?

Christian DeLozier - 2014

Page 11: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

11

What lock do we acquire? • Associate a lock with each variable

• Trade-off between parallelism and overhead

• Coarse-grained vs. Fine-grained •  Coarse-grained: 1 lock per object, 1 lock per array •  Fine-grained: 1 lock per field, 1 lock per array element

• Mutex vs. Reader-writer lock

Christian DeLozier - 2014

Page 12: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

12

MAMA Prototype • Uses fine-grained locking

•  More parallelism •  Especially for arrays

•  Optimization: Divide arrays into N chunks, 1 lock per chunk

• Uses reader-writer locks •  More parallelism

•  Read sharing is common

Christian DeLozier - 2014

Page 13: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

13

Lock-Based Atomic Sections • What lock do we acquire?

•  One reader-writer lock per variable (fine-grained)

• When do we acquire the lock? •  Acquire before the first dynamic access

• When should we release the lock?

Christian DeLozier - 2014

Page 14: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

14

T1

When should we release the lock? • Simple case: After the owning thread has exited

Christian DeLozier - 2014

T1 T2

Write A

Exit

Write A T2

Exited

Write A

Page 15: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

15

When should we release the lock? • When the owning thread is waiting for another

thread to make progress (e.g. join, barrier)

Christian DeLozier - 2014

T1

T1 T2

Write A

Join T2

Write A T2

Joined

Write A

Exit

Spawn T2

Page 16: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

16

When should we release the lock? • Other deadlocks cannot be safely broken

• Need help from the programmer •  Trusted annotations to sanction breaking a deadlock

•  MAMA_release(object) •  Also used to improve performance when threads are

over-serialized Christian DeLozier - 2014

T1

T1 T2

Write A

Write B

Write B T2

Write B

Write A

Write A

Page 17: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

17

Lock-Based Atomic Sections • What lock do we acquire?

•  One reader-writer lock per variable (fine-grained)

• When do we acquire the lock? •  Acquire before the first dynamic access

• When should we release the lock? •  At thread exit •  When waiting for another thread to make progress •  Or, at programmer sanctioned program points

Christian DeLozier - 2014

Page 18: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

18

What can deadlocks tell us? • When a thread cannot acquire a lock:

•  Perform distributed deadlock detection [Bracha and Toueg, Distributed Computing 1987]

Christian DeLozier - 2014

void f(){ A = 1; B = 2; }

void g(){ B = 1; A = 2; }

T1 T2

Write B

Write A T2

T1

Page 19: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

19

MAMA Prototype •  Implemented as a RoadRunner tool [Flanagan and

Freund, PASTE 2010] •  Dynamic instrumentation for Java byte-code

• Evaluated on the Java Grande benchmarks and selected DaCapo benchmarks

•  Running on one socket (8 cores) of a 4 socket Nehalem system with 128 GB RAM

• Removed all synchronized blocks and java.util.concurrent constructs from benchmarks

•  Ensure that MAMA is providing all of the atomicity

Christian DeLozier - 2014

Page 20: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

20

Evaluating MAMA • Can we execute parallel programs correctly?

• How many annotations need to be added for progress and performance?

• How is the performance of the program affected? •  Does MAMA permit thread to execute in parallel?

Christian DeLozier - 2014

Page 21: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

21

Benchmark Lines of Code Progress Annotations

Performance Annotations

crypt 314 0 0 lufact 461 1 4

lusearch 124105 0 4 matmult 187 0 0 moldyn 487 3 0

montecarlo 1165 0 28 pmd 60062 0 4

series 180 0 0 sor 186 1 0

sunflow 21970 1 3 xalan 172300 0 0

Annotation Burden

Christian DeLozier - 2014

Page 22: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

22

Performance

Christian DeLozier - 2014

0x

2x

4x

6x

8x

10x

Nor

mal

ized

Run

time

RR-ser MAMA-par MAMA-ser

•  MAMA incurs overhead due to locking and serial execution •  But, MAMA still allows some parallel execution as

compared to serialization

23x

Page 23: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

23

Performance Breakdown

Christian DeLozier - 2014

0%

20%

40%

60%

80%

100%

Perc

ent R

untim

e

Running Blocked Deadlock Acquire Check

•  Many benchmarks have significant portions that run in parallel •  Checking whether or not a lock is already owned incurs

significant overhead on some benchmarks

Page 24: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

24

Memory Usage

Christian DeLozier - 2014

•  Fine-grained locking incurs significant memory overheads •  Could be optimized to save space via chunking arrays or

decreasing the size of the lock

0x

10x

20x

30x

40x

Nor

mal

ized

Mem

ory

Usa

ge MAMA-par

Page 25: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

25

Future Directions • Does this approach apply to other languages?

• How do we test programs running with MAMA? •  Find uncommon deadlocks •  Gain more confidence in trusted annotations

• How can we reduce the performance overheads?

• How can we infer ordering constraints?

Christian DeLozier - 2014

Page 26: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

26

MAMA • Provides atomicity for parallel programs

•  Some help via annotations from programmer

• A step toward programming without worrying about atomicity

•  Programmer expresses parallelism •  Machine provides atomicity automatically

Christian DeLozier - 2014

Page 27: MAMA: Mostly Automatic Management of Atomicitywodet.cs.washington.edu/wp-content/uploads/2014/03/MAMA.pdf · • MAMA incurs overhead due to locking and serial execution • But,

27

Thank you for listening!

Christian DeLozier - 2014