WormBench A Configurable Application for Evaluating Transactional Memory Systems MEDEA Workshop 26.10.2008 Ferad Zyulkyarov 1, 2 , Sanja Cvijic 3 , Osman Unsal 1 , Adrian Cristal 1 , Eduard Ayguade 1, 2 , Tim Harris 4 , Mateo Valero 1, 2 1 Barcelona Supercomputing Center, 2 Universitat Politecnica de Catalunya, 3 Belgrade University, 4 Microsoft Research Cambridge UK


Page 1: WormBench A Configurable Application for Evaluating Transactional Memory Systems MEDEA Workshop 26.10.2008 Ferad Zyulkyarov 1, 2, Sanja Cvijic 3, Osman

WormBench
A Configurable Application for Evaluating Transactional Memory Systems

MEDEA Workshop

26.10.2008

Ferad Zyulkyarov 1, 2, Sanja Cvijic 3, Osman Unsal 1, Adrian Cristal 1, Eduard Ayguade 1, 2, Tim Harris 4, Mateo Valero 1, 2

1 Barcelona Supercomputing Center, 2 Universitat Politecnica de Catalunya, 3 Belgrade University, 4 Microsoft Research Cambridge UK


Outline

• Transactional Memory

• Idea

• Motivation

• WormBench Features

• WormBench main components

• WormBench input – run configuration

• Analysis

• Modeling STAMP’s genome

• Conclusion


Transactional Memory

atomic { <statements> }
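To make the atomic notation concrete, here is a minimal Python sketch that emulates an atomic block with one global lock. It only illustrates the guarantee (the enclosed statements appear to run as one indivisible step); it is not how a TM runtime works internally. The names atomic, _global_lock, increment and run_demo are hypothetical.

```python
import threading
from contextlib import contextmanager

# Assumption: a single process-wide lock stands in for the TM runtime.
_global_lock = threading.RLock()

@contextmanager
def atomic():
    """Run the enclosed statements as one indivisible step (lock-based emulation)."""
    with _global_lock:
        yield

counter = {"value": 0}

def increment(times):
    for _ in range(times):
        with atomic():
            # This read-modify-write cannot interleave with other threads.
            counter["value"] += 1

def run_demo(threads=4, times=10_000):
    """Start several threads that all increment the shared counter."""
    counter["value"] = 0
    workers = [threading.Thread(target=increment, args=(times,))
               for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter["value"]
```

With the atomic block in place, run_demo(4, 1000) always returns 4000, because no increment is lost.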


Idea

• Inspired by the Snake game

• Worms are active objects

• Worms move in a BenchWorld

• On every move Worms do computation


Motivation - General

• We don’t know how exactly to write TM applications

• Converting lock-based applications to TM 1:1 is not the correct approach
– For example, would it be equivalent to convert a lock-based application to message-passing synchronization 1:1?


Motivation - Existing TM Applications (1/2)

• STAMP [IISWC’2008]
– Specific to TL2 [ISCA’2007]
– Does not have a lock-based implementation
– tm_write() and tm_read() are used carefully, thus assuming a perfect compiler

• STMBench7 [EuroSys’2007]
– Suitable for STM
– Data structures too big (700.000 bytes); transactions too long (10 tx/s)


Motivation - Existing TM Applications (2/2)

• SPLASH-2 [ISCA’1995]
– Embarrassingly parallel
– Fine-grain locking
– Not suitable for the intended TM usage pattern (coarse-grain locking)

• Haskell STM Benchmark [CF’2007]
– Implemented in a declarative language
– Depends on constraints enforced by the language and type system (TVar, monads)


WormBench’s Goal

• Unify the features of existing TM applications

• A tool for instrumenting multi-threaded applications

• A set of run configurations that serves as a baseline for comparing TM systems against each other and against locks

• Specific run configuration that stresses a particular design or implementation aspect of a TM system such as the sizes of internally used buffers.


WormBench Features (1/2)

• Implemented in C#, an imperative language
– Compiled with Bartok

• Follows object-oriented programming concepts

• Critical sections are marked with atomic
– Can be used to test the compiler infrastructure

• Represents typical parallel application with shared data

• Highly configurable through run configurations


WormBench Features (2/2)

• Suitable for HTM, STM and Hybrid TM variants

• No assumptions about TM system design and implementation

• Lock based and transactional implementation for comparison purposes

• Sanity check verification for the underlying TM system


Main Objects in WormBench

• BenchWorld
– BenchWorldNode

• Worm
– Body
– Head

• Message


Example

• Worm
– Body length 8
– Head size 4

• Operations
– Sum – ahead
– Average – right
– Min – ahead
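The example can be sketched in Python as a worm that performs one operation on the nodes under its head and then moves. This is only an illustration under stated assumptions: the BenchWorld is modelled as a square grid of integers that wraps around at the edges, the head is modelled as a head_size x head_size square, and the body is just a record of recently visited positions. The class Worm and the helpers run, OPS and DIRS are hypothetical names, not taken from the WormBench sources.

```python
from collections import deque
from statistics import mean

# Directions in clockwise order: up, right, down, left (row delta, col delta).
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]

class Worm:
    def __init__(self, world, row, col, body_length, head_size):
        self.world = world                 # square 2D list of ints (the BenchWorld)
        self.row, self.col = row, col      # top-left corner of the head
        self.dir = 1                       # start facing right
        self.head_size = head_size
        self.body = deque(maxlen=body_length)  # most recently visited positions

    def head_values(self):
        """Values of all nodes currently covered by the head."""
        n = len(self.world)
        return [self.world[(self.row + r) % n][(self.col + c) % n]
                for r in range(self.head_size)
                for c in range(self.head_size)]

    def move(self, turn):
        """Turn if requested ("left"/"right"), then advance one node."""
        if turn == "right":
            self.dir = (self.dir + 1) % 4
        elif turn == "left":
            self.dir = (self.dir - 1) % 4
        self.body.append((self.row, self.col))
        n = len(self.world)
        dr, dc = DIRS[self.dir]
        self.row = (self.row + dr) % n
        self.col = (self.col + dc) % n

OPS = {"sum": sum, "average": mean, "min": min}

def run(worm, script):
    """Execute (operation, move) pairs and collect the operation results."""
    results = []
    for op, turn in script:
        results.append(OPS[op](worm.head_values()))
        worm.move(turn)
    return results

world = [[r * 8 + c for c in range(8)] for r in range(8)]
worm = Worm(world, row=0, col=0, body_length=8, head_size=4)
results = run(worm, [("sum", "ahead"), ("average", "right"), ("min", "ahead")])
```

Each (operation, move) pair is the unit of work a worm repeats; in a multi-threaded run, several worms would execute such steps concurrently on the shared grid.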


WormBench Input – Run configuration

• Size of the BenchWorld;

• Number of worms (number of threads);

• Body length of each worm;

• Head size of each worm;

• The number and type of worm operations that each worm has to perform while moving
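The parameters listed above could be grouped as follows. This is a hypothetical sketch: the slides do not show the actual run configuration format, so the class names RunConfig and WormConfig, the field names, and the assumption of a square BenchWorld are all illustrative.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class WormConfig:
    body_length: int
    head_size: int
    # Operations the worm performs while moving, in order.
    operations: List[str] = field(default_factory=list)

@dataclass
class RunConfig:
    world_size: int           # assumption: square BenchWorld, world_size x world_size
    worms: List[WormConfig]   # one worm per thread

    @property
    def num_threads(self) -> int:
        return len(self.worms)

    def validate(self) -> None:
        """Reject configurations that cannot be run."""
        if self.world_size <= 0:
            raise ValueError("world_size must be positive")
        for i, w in enumerate(self.worms):
            if w.body_length < 1 or w.head_size < 1:
                raise ValueError(f"worm {i}: body_length and head_size must be >= 1")
            if w.head_size > self.world_size:
                raise ValueError(f"worm {i}: head does not fit in the BenchWorld")

config = RunConfig(
    world_size=128,
    worms=[WormConfig(body_length=8, head_size=4,
                      operations=["sum", "average", "min"])
           for _ in range(4)],
)
config.validate()
```

Keeping the per-worm settings separate from the world size makes it easy to vary one dimension (for example head size) while holding the others fixed.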


Instantiates Common Sync Scenarios (1/2)

• Object access serializability
– Guarding a shared variable with locks

• Two-phase locking and its derivatives
– Locking protocol that attempts non-blocking fine-grain locking while avoiding deadlock

• Multiple granularity locking
– Fine-grain locking technique used to lock a region in a collection or hierarchical data structure


Instantiates Common Sync Scenarios (2/2)

• Dining Philosophers
– Deadlock scenario

• Barrier synchronization
– Worms wait until the whole group (or all worms) reaches a certain point in execution
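The barrier scenario can be illustrated with Python's standard threading.Barrier. This is a sketch of the synchronization pattern only, not of how WormBench realizes it; the function run_phases and its parameters are hypothetical.

```python
import threading

def run_phases(num_worms=4, phases=3):
    """Each worm works through the phases; nobody starts phase p+1
    before every worm has finished phase p."""
    barrier = threading.Barrier(num_worms)
    log = []                      # (phase, worm_id) in order of completion
    log_lock = threading.Lock()

    def worm(worm_id):
        for phase in range(phases):
            with log_lock:
                log.append((phase, worm_id))  # stands in for the worm's moves
            barrier.wait()                    # wait for the whole group

    threads = [threading.Thread(target=worm, args=(i,))
               for i in range(num_worms)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because every worm records its work before waiting, the phase numbers in the returned log never decrease.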


Retry or Conditional Atomic

The use of retry or conditional atomic is mostly neglected.
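For readers unfamiliar with the construct, the blocking behaviour of a conditional atomic block (wait until a condition holds, then run the body atomically) can be approximated with a condition variable. The sketch below is a lock-based emulation under that assumption, not the mechanism a TM system uses to implement retry; the class Mailbox and its methods are hypothetical.

```python
import threading

class Mailbox:
    """Single-slot message holder guarded by a condition variable."""

    def __init__(self):
        self._cond = threading.Condition()
        self._message = None

    def leave(self, message):
        """Store a message and wake up any waiting taker."""
        with self._cond:
            self._message = message
            self._cond.notify_all()

    def take(self, timeout=None):
        """Block until a message is present, then remove and return it.

        Similar in spirit to: atomic { if (message == null) retry; ... }
        Returns None if the timeout expires first.
        """
        with self._cond:
            if not self._cond.wait_for(lambda: self._message is not None,
                                       timeout):
                return None
            message, self._message = self._message, None
            return message
```

A worm that must wait for a message from another worm would call take(), while the sender calls leave().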


Currently Available Worm Operations (1/2)

• Read-only
– Sum
– Average
– Min
– Max
– Median

• I/O
– Checkpoint
– Undo


Currently Available Worm Operations (2/2)

• Read dominated
– Replace min with average
– Replace max with average
– Replace median with average
– Replace min and max

• Write dominated
– Sort
– Transpose

• Leave message – for complex synchronization scenarios
– Goto node message
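The difference between read-dominated and write-dominated operations can be seen in a small sketch. It assumes the nodes under the head form a rectangular block of numbers; the function names and the choice to replace every node tied for the minimum are assumptions made for illustration.

```python
from statistics import mean

def replace_min_with_average(block):
    """Read-dominated: reads every node, writes only the minimum node(s)."""
    flat = [v for row in block for v in row]
    avg = mean(flat)
    lowest = min(flat)
    # Assumption: if several nodes tie for the minimum, all are replaced.
    return [[avg if v == lowest else v for v in row] for row in block]

def sort_block(block):
    """Write-dominated: every node may receive a new value."""
    width = len(block[0])
    flat = sorted(v for row in block for v in row)
    return [flat[i:i + width] for i in range(0, len(flat), width)]

def transpose(block):
    """Write-dominated: swaps rows and columns of the block."""
    return [list(col) for col in zip(*block)]

block = [[4, 1],
         [3, 8]]
after_replace = replace_min_with_average(block)
after_sort = sort_block(block)
after_transpose = transpose(block)
```

For the 2x2 block shown, replacing the minimum touches one node, whereas sorting rewrites nearly all of them; this is the kind of contrast the two categories are meant to capture.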


Worm Operations – Execution Distribution

Op                 B[1.1]H[1.1]  B[4.4]H[4.4]  B[8.8]H[8.8]  B[1.8]H[1.8]
Sum                0.42          0.433         0.194         0.31
Avg                0.42          0.278         0.324         0.434
Median             0.839         3.649         9.354         5.14
Min                0.315         0.588         0.278         0.372
Max                0.42          0.588         0.33          0.537
Rep Max with Avg   1.364         0.711         0.427         0.743
Rep Min with Avg   0.735         0.742         0.537         0.702
Rep Med with Avg   2.518         4.793         11.412        6.689
Rep Max and Min    2.099         0.588         0.634         0.929
Rep Med with Min   2.728         5.009         11.186        7.06
Rep Med and Max    2.518         5.257         11.387        7.122
Sort               1.679         6.586         11.257        7.184
Transpose          1.154         3.247         2.369         3.365
Checkpoint         1.12          1.45          1.982         1.522
Undo               1.06          1.32          1.85          1.488
Total              19.389        35.239        63.521        42.597


Worm Operations – TM Characteristics

Op    1         2         4         8
      R   W     R   W     R   W     R   W
1     11  3     11  4     11  6     11  10
2     11  3     11  4     11  6     11  10
3     11  3     11  4     11  6     11  10
4     11  3     11  4     11  6     11  10
5     11  3     11  4     11  6     11  10
6     14  4     14  5     14  7     14  11
7     14  4     14  5     14  7     14  11
8     14  4     14  5     14  7     14  11
9     14  4     14  5     14  7     14  11
10    14  4     14  5     14  7     14  11
11    14  4     14  5     14  7     14  11
12    16  4     16  5     16  7     16  11
13    16  4     16  5     16  7     16  11
14    11  3     11  4     11  6     11  11
15    11  3     11  4     11  6     11  11

R = reads, W = writes. Worm head size is fixed to 1 and body length is 1, 2, 4, 8


Worm Operations – TM Characteristics

Op    1         2         4         8
      R   W     R   W     R   W     R   W
1     11  3     14  3     26  3     74  3
2     11  3     14  3     26  3     74  3
3     11  3     14  3     26  3     74  3
4     11  3     14  3     26  3     74  3
5     11  3     14  3     26  3     74  3
6     14  4     17  4     29  4     77  4
7     14  4     17  4     29  4     77  4
8     14  4     17  4     29  4     77  4
9     14  4     17  5     29  5     77  5
10    14  4     17  5     29  5     77  5
11    14  4     17  5     29  5     77  5
12    16  4     19  7     31  19    79  67
13    16  4     19  7     31  19    79  67
14    11  3     14  3     26  3     74  3
15    11  3     14  3     26  3     74  3

R = reads, W = writes. Body length is fixed to 1 and head size is 1, 2, 4, 8
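The counts in the table above follow a simple pattern when head size varies: each read count equals a constant plus the square of the head size, and for operations 12 and 13 the write count does too. One plausible reading, which is an inference from the numbers and not stated on the slide, is that a head of size h covers h x h nodes on top of a fixed per-operation overhead. The short check below verifies the arithmetic; the dictionary names and the helper fits are hypothetical.

```python
# Counts copied from the table (body length fixed to 1), keyed by head size.
reads_ops_1_5_14_15 = {1: 11, 2: 14, 4: 26, 8: 74}
reads_ops_6_to_11 = {1: 14, 2: 17, 4: 29, 8: 77}
reads_ops_12_13 = {1: 16, 2: 19, 4: 31, 8: 79}
writes_ops_12_13 = {1: 4, 2: 7, 4: 19, 8: 67}

def fits(table, base):
    """True if every entry equals base + head_size ** 2."""
    return all(count == base + h * h for h, count in table.items())

checks = {
    "reads, ops 1-5 and 14-15": fits(reads_ops_1_5_14_15, 10),
    "reads, ops 6-11": fits(reads_ops_6_to_11, 13),
    "reads, ops 12-13": fits(reads_ops_12_13, 15),
    "writes, ops 12-13": fits(writes_ops_12_13, 3),
}

for name, ok in checks.items():
    print(f"{name}: {'matches' if ok else 'does not match'} base + h^2")
```

The write counts of the remaining operations stay constant or nearly constant as the head grows, which is what one would expect from operations that read the whole head but update only a few nodes.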


Analyzing Sample Run Configurations

• Lock vs Transactions

• Change in BenchWorld size

• Change in worms’ body length and head size

• Initialization of worms for smaller BenchWorld


Lock vs Transactions


Throughput ~ Worms ~ BenchWorld

Relationship between throughput, worms’ size and BenchWorld


Initializing Worms for Smaller BenchWorld

How the conflict rate is affected when worms are initialized for a smaller BenchWorld.

Averaged results: worms initialized for 128x128 and run in 128x, 256x, 512x and 1024x size BenchWorld


Modeling Genome App. From STAMP

• To obtain the results shown in Table IV we used the following run configuration:
– Worm body length = 1
– Worm head size = 4
– BenchWorld of size 52x52
– Worm operations: a randomly generated stream of worm operations, where the ratio between the worm operations was Operations(1:2:3:4:5:6:7:8:9:10:11:12:13:14:15) = Ratio(1:1:1:0:0:2:1:1:1:1:1:1:2:0:0)
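A stream with this ratio can be produced by weighted random sampling. The sketch below uses Python's random.choices with the ratio as weights; the function generate_stream and the fixed seed are assumptions for illustration, and the slides do not say how the original stream was generated.

```python
import random
from collections import Counter

OPERATIONS = list(range(1, 16))                           # operation ids 1..15
RATIO = [1, 1, 1, 0, 0, 2, 1, 1, 1, 1, 1, 1, 2, 0, 0]     # from the slide

def generate_stream(length, seed=None):
    """Draw `length` operation ids with probability proportional to RATIO."""
    rng = random.Random(seed)
    return rng.choices(OPERATIONS, weights=RATIO, k=length)

stream = generate_stream(13_000, seed=42)
counts = Counter(stream)
```

Operations with ratio 0 (4, 5, 14 and 15) never appear, and operations 6 and 13 are drawn about twice as often as the others.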

T    Commit Rate      Read per TX        Write per TX     Speedup
     Gen.    WB       Gen.     WB        Gen.    WB       Gen.    WB
1    1       1        36.362   31.480    1.374   1.962    1       1
2    0.998   0.998    34.260   31.609    1.373   1.962    2.177   1.4
4    0.994   0.995    37.974   31.815    1.371   1.962    3.474   2.2
8    0.985   0.987    46.219   32.300    1.377   1.963    5.435   2.867

T = number of threads, Gen. = STAMP genome, WB = WormBench
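To compare the two speedup columns on an equal footing, one can divide each speedup by the thread count to get parallel efficiency. This derived metric is not on the slide; the snippet below only performs that arithmetic on the reported values, and the names threads, speedup and efficiency are hypothetical.

```python
threads = [1, 2, 4, 8]
speedup = {
    "genome": [1, 2.177, 3.474, 5.435],      # Gen. column
    "wormbench": [1, 1.4, 2.2, 2.867],       # WB column
}

def efficiency(speedups, thread_counts):
    """Parallel efficiency = speedup / number of threads."""
    return [s / t for s, t in zip(speedups, thread_counts)]

eff = {name: efficiency(values, threads) for name, values in speedup.items()}

for name, values in eff.items():
    print(name, [f"{v:.3f}" for v in values])
```

On these numbers the WormBench configuration scales less well than genome at every thread count above one, even though the commit rates of the two are almost identical.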


Future Work

• Toolset that automatically generates a run configuration representing user-defined transactional and runtime behavior, e.g.:
– Commit rate = 80%
– Reads per TX = 6
– Writes per TX = 2
– Runtime = 100 moves/ms

• Implement BenchWorld as
– Linked list
– Sparse matrix


Future Work

• Understand how the Messaging works in BenchWorld

• Prepare a baseline set of run configurations to benchmark TM systems (HTM, STM and hybrid TMs)

• Fine grain version using two-phase locking


Conclusion

• WormBench is a highly configurable workload for TM

• TM design and implementation independent

• Critical sections defined by language level atomic blocks

• Coarse lock based version

• Sanity check for the overall TM system

• But it is still small and does not exercise language extensions for TM and their semantics


The End