EazyHTM: Eager-Lazy Hardware Transactional Memory

Saša Tomić, Cristian Perfumo, Chinmay Kulkarni, Adrià Armejach, Adrián Cristal, Osman Unsal, Tim Harris, Mateo Valero

Barcelona Supercomputing Center, UPC

BITS Pilani

Microsoft Research Cambridge

Why Transactional Memory?

• Lock-based parallel programming has problems

– Deadlocks, races, complexity, performance, …

• Transactional Memory (TM) to the rescue

– Optimistic concurrency control mechanism

– Easy to use

– Deadlock free

– Supports composability

– Protects data in critical sections

• Hardware-TM (HTM), Software-TM (STM) and hybrid


HTM terminology

• Atomic section/transaction: a group of instructions that appears to take effect instantaneously

• Version management: where speculative values are stored

– in place, with the original value logged, or

– buffered in private storage and published on commit

• Conflict: one TX writes a location that another TX reads

– Detection: the action of checking for conflicts

– Resolution: the action performed to resolve a conflict

• Can be an abort, a stall of the execution, …
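
The two version-management options above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not of any HTM's hardware: the class names `UndoLogTx` and `WriteBufferTx` and the dictionary standing in for memory are assumptions made for the example.

```python
class UndoLogTx:
    """In-place updates; the original value is logged for rollback."""

    def __init__(self, memory):
        self.memory = memory      # dict: address -> value (assumed pre-initialized)
        self.undo_log = []        # (address, original value), oldest first

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory[addr]))
        self.memory[addr] = value  # the speculative value lives in place

    def commit(self):
        self.undo_log.clear()      # nothing to copy on commit

    def abort(self):
        # Walk the log backwards so the oldest logged value wins.
        for addr, old in reversed(self.undo_log):
            self.memory[addr] = old
        self.undo_log.clear()


class WriteBufferTx:
    """Writes are buffered privately and published on commit."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = {}           # address -> speculative value

    def read(self, addr):
        # A transaction must see its own buffered writes.
        return self.buffer.get(addr, self.memory[addr])

    def write(self, addr, value):
        self.buffer[addr] = value

    def commit(self):
        self.memory.update(self.buffer)  # publish all writes
        self.buffer.clear()

    def abort(self):
        self.buffer.clear()        # memory was never touched
```

In this sketch the undo-log variant does its extra work on abort and the buffered variant does it on commit.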


Eager HTM

• A.k.a. pessimistic

• Writes in-place, detects and resolves conflicts on every access

• LogTM [Moore, HPCA06], LogTM-SE [Yen, HPCA07]

• Fast commit, slow abort, limited concurrency

[Figure: timelines of TX 1, TX 2 and TX 3 with two reads (R) and a write (W); labels: stall, fast commit.]


Lazy HTM

• A.k.a. optimistic

• Writes buffered, detects and resolves conflicts on commit

• TCC [Hammond, ISCA04], Scalable-TCC [Chafi, HPCA07]

• Fast abort, complex commit, good concurrency

[Figure: timelines of TX 1, TX 2 and TX 3 with two reads (R) and a write (W); label: complex commit (validate + write).]


The Motivation

Splitting conflict management

• An eager-lazy hardware-software TM already exists (FlexTM [Shriraman, ISCA08]):

– Software begin, commit and abort

– Probabilistic (signature-based) conflict detection

• EazyHTM is the first pure-hardware eager-lazy TM

                      Conflict resolution
Conflict detection    Eager         Lazy
Eager                 LogTM         EazyHTM (fast commit, good concurrency)
Lazy                  Impossible    TCC, S-TCC


Outline

• Motivation

• Contributions

• Hardware changes

• The Protocol

• Evaluation

• Conclusions


EazyHTM Contributions

• The best of both worlds

– Eager conflict detection: simple commit, because the exact list of conflicts is known in advance

– Lazy conflict resolution: good concurrency

• Parallel commits of non-conflicting TXs

• Designed for CMPs (Chip-Multiprocessors)

– Exploits the proximity of cores

– Built as an extension of the MESI/MOESI protocol (easier verification)


Hardware changes

• CPU (one per core)

– Register file checkpoint

– Racers list – 1 bit per core

– Killers list – 1 bit per core

• Each list tracks conflicts

• Bit-vector: 32 bits for 32 cores

• Private cache(s), next to the existing cache logic

– SR – 1 bit per line

– SM – 1 bit per line

• Hold the read/write set

• Directory, next to the existing directory logic

– TD – 1 bit per line

• Read-only optimization bit (details in the paper)


Racers and killers lists

• If a line is shared between two TXs:

– Read-Read

• No conflict

– Write-Read, Read-Write, Write-Write

• Writer adds reader TX into its “racers” list

– “TXs that I have to abort if I commit first”

• Reader adds writer TX into its “killers” list

– “TXs that can abort me if they commit first”

• We illustrate only the Write-after-Read (WAR) conflict
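
The bookkeeping described above can be sketched with one integer per list used as a bit-vector, one bit per core. This is a minimal illustration, not the hardware design: `TxCore`, `record_conflict` and `cores_in` are hypothetical names, and only the writer/reader case stated on the slide is modeled.

```python
class TxCore:
    """Per-core transactional state: two bit-vectors, one bit per core."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.racers = 0    # bit i set: abort the TX on core i if I commit first
        self.killers = 0   # bit i set: the TX on core i can abort me


def record_conflict(writer, reader):
    """Record a potential conflict on a line shared by a writer and a reader."""
    writer.racers |= 1 << reader.core_id
    reader.killers |= 1 << writer.core_id


def cores_in(bitvector):
    """List the core ids whose bit is set."""
    return [i for i in range(bitvector.bit_length()) if (bitvector >> i) & 1]


# WAR example: TX 0 read line A, then TX 2 writes it.
tx0, tx2 = TxCore(0), TxCore(2)
record_conflict(writer=tx2, reader=tx0)
```

After the call, bit 0 is set in the racers list of `tx2` and bit 2 is set in the killers list of `tx0`. Nothing is aborted yet, which is the point of separating detection from resolution.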


EazyHTM Protocol

Conflict Detection (1/2)

Example: TX 0 executes BTX, RD A, CTX; TX 2 executes BTX, WR A, CTX.

[Figure: TX 0 and TX 2, each with a racers and a killers list, and the directory holding the sharers of @A.]

1. txMark @A is sent to the directory (replaces GETS/GETX)

2. The directory replies ACK @A, 0: no other sharers


EazyHTM Protocol

Conflict Detection (2/2)

Example: TX 0 executes BTX, RD A, CTX; TX 2 executes BTX, WR A, CTX.

[Figure: TX 0 and TX 2, each with a racers and a killers list, and the directory holding the sharers of @A. Messages shown, numbered 1 to 5 in the figure: txMark @A; ACK @A, 1 (1 other sharer); txAccessor #2, @A; Reader #0, @A; Writer #2, @A.]

• Potential conflict: the line has 1 other sharer

• TX 2 remembers: abort TX#0 on commit

• TX 0 remembers: TX#2 can abort me
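
The directory's part in detection can be sketched as follows. This is a rough model under stated assumptions: the `Directory` class and its `tx_mark` method are hypothetical names, the sharer set is kept as a Python set per line, and the follow-up messages between the two cores are not modeled.

```python
from collections import defaultdict


class Directory:
    """Tracks which cores have transactionally accessed each line."""

    def __init__(self):
        self.sharers = defaultdict(set)   # line address -> set of core ids

    def tx_mark(self, core_id, addr):
        """Register an access and return the other sharers of the line.

        The size of the returned set plays the role of the count
        carried by the ACK message.
        """
        others = self.sharers[addr] - {core_id}
        self.sharers[addr].add(core_id)
        return others


directory = Directory()
first = directory.tx_mark(0, "A")    # TX 0 reads A
second = directory.tx_mark(2, "A")   # TX 2 writes A
```

Here `first` is empty (no other sharers) and `second` contains core 0, so the second access is the one that reveals a potential conflict.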


EazyHTM Protocol

Conflict Resolution

Example: TX 0 executes BTX, RD A, CTX; TX 2 executes BTX, WR A, CTX.

[Figure: TX 0 and TX 2, each with a racers and a killers list, and the directory holding the sharers of @A.]

1. TX#2 came to the commit point first, so it aborts TX#0 (Abort from TX#2)

2. Abort Ack from TX#0

3. WR @A (commit)
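
A minimal sketch of lazy resolution at commit time, assuming buffered writes and using Python sets in place of the bit-vectors. The `Tx` class and its methods are hypothetical, and cleaning up stale entries in other transactions' lists is left out.

```python
class Tx:
    def __init__(self, name):
        self.name = name
        self.racers = set()      # TXs to abort if this TX commits first
        self.killers = set()     # TXs that can abort this TX
        self.write_buffer = {}   # address -> speculative value
        self.aborted = False

    def abort(self):
        """Discard speculative state; the return value stands in for Abort Ack."""
        self.write_buffer.clear()
        self.racers.clear()
        self.killers.clear()
        self.aborted = True
        return True

    def commit(self, memory):
        """Abort every racer, then publish the buffered writes."""
        if self.aborted:
            return False                  # a killer committed first
        for racer in list(self.racers):
            acked = racer.abort()         # Abort, then wait for Abort Ack
            assert acked
        memory.update(self.write_buffer)  # WR (commit)
        self.write_buffer.clear()
        self.racers.clear()
        self.killers.clear()
        return True


memory = {"A": 0}
tx0, tx2 = Tx("TX 0"), Tx("TX 2")
tx2.write_buffer["A"] = 7        # TX 2 wrote A speculatively
tx2.racers.add(tx0)              # detected earlier: TX 0 read A
tx0.killers.add(tx2)
committed = tx2.commit(memory)   # TX 2 reaches the commit point first
```

Because the racers list already names every transaction that must be aborted, the committing transaction needs no validation pass.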


EazyHTM Protocol

Disjoint data => parallel commit

Example: TX 0 executes BTX, WR A, CTX; TX 2 executes BTX, WR B, CTX. TX#0 works with line @A, TX#2 works with line @B.

[Figure: TX 0 and TX 2, each with a racers and a killers list, and the directory holding the sharers of @A and of @B.]

1. TX 0 sends txMark @A; TX 2 sends txMark @B

2. The directory replies ACK @A, 0 and ACK @B, 0: 0 other sharers for either line

3. WR @A (commit) and WR @B (commit) proceed in parallel: NO SERIALIZATION


Implementation

• Implemented in M5, a full-system simulator (Alpha)

• Private L1 (32KB, 4-way, 64B CL, 2 cycles)

• Private L2 (512KB, 8-way, 64B CL, 10 cycles)

• Memory (with directory, 100 cycles)

• ICN (2D Mesh, 10 cycles per hop)
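
As a back-of-the-envelope check, the per-core and per-line bits listed under "Hardware changes" can be combined with these cache sizes. This is only arithmetic on the numbers given in the slides; the 32-core count comes from the "32 bits for 32 cores" note, and which cache levels actually carry the SR/SM bits is not stated here, so both are computed.

```python
CORES = 32
LINE_BYTES = 64
L1_BYTES = 32 * 1024
L2_BYTES = 512 * 1024

# Racers list + killers list: 1 bit per core each.
list_bits_per_core = 2 * CORES

# Cache lines per private cache.
l1_lines = L1_BYTES // LINE_BYTES
l2_lines = L2_BYTES // LINE_BYTES

# SR + SM: 1 bit per line each, if the given level tracked them.
sr_sm_bits_l1 = 2 * l1_lines
sr_sm_bits_l2 = 2 * l2_lines

print(list_bits_per_core, l1_lines, l2_lines, sr_sm_bits_l1, sr_sm_bits_l2)
# prints: 64 512 8192 1024 16384
```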


Evaluation

• Evaluated STAMP benchmarks

• Compared with Scalable-TCC-like HTM

– Same base simulator

– Implemented specialized directory protocol

• Compared with ideal lazy HTM (MESI based)

– magical conflict detection

– instant conflict resolution

– parallel write-back commit


Kmeans Low

• Small TXs (RS 15 CL; WS 5 CL)

• Low contention (10% aborts)

• Similar profile to “replacing locks with atomic”

• Near ideal performance

• K-means: groups N-dimensional space into K clusters

• Most of the SPLASH-2 suite has a similar profile

[Figure: Kmeans-Low, speedup vs. number of processors; series: Ideal, EazyHTM, STCC.]


SSCA2

• Small TXs (RS 50 CL, WS 10 CL)

• Low contention (1.2% aborts)

• Near ideal performance

• Scalability affected by barriers, not by contention

• SSCA2: large directed graph operations

[Figure: SSCA2, speedup vs. number of processors; series: Ideal, EazyHTM, STCC.]


Yada

• Large TXs (260 CL RS, 140 CL WS)

• Moderate contention (35% aborts)

• Good performance for large TXs as well

• Yada: Delaunay mesh refinement

[Figure: Yada, speedup vs. number of processors; series: Ideal, EazyHTM, STCC.]


Intruder

• Medium TXs (53 CL RS, 20 CL WS)

• High contention (85% aborts)

• Very bad scalability for all HTMs

• Every transaction detects conflicts over and over again; the large number of conflict detection messages slows down the execution

• Intruder: signature-based network intrusion detection system

[Figure: Intruder, speedup vs. number of processors; series: Ideal, EazyHTM, STCC.]


Only high-conflict STAMP

• >50% abort rate only

• High-contention, high-core-count execution should be optimized

• Averages:

– Labyrinth

– Intruder

– Kmeans-Hi

• Results highly affected by Intruder

[Figure: High-conflict STAMP, speedup vs. number of processors; series: Ideal, EazyHTM, STCC.]


Only low-conflict STAMP

• <50% abort rate only

• Low abort rate necessary for scaling

• Excludes:

– Labyrinth 8-32

– Intruder 16-32

– Kmeans-Hi 32

[Figure: Scaling STAMP, speedup vs. number of processors; series: Ideal, EazyHTM, STCC.]


Conclusions

• Introduced EazyHTM, a new HTM implementation

– Eager conflict detection, lazy conflict resolution

– Fast: performs well for low-conflict parallel applications

– Minimal changes to directory protocols (easier verification)

– As scalable as a standard directory protocol

• EazyHTM mechanism could allow (future work):

– Simpler transaction prioritization

– Less wasted work

– Better performance optimization

– Power efficient TM mechanisms


Thank you!

Questions?
