Poster source: iacoma.cs.uiuc.edu/m3t/poster/poster_sandiego.pdf

Josep Torrellas (University of Illinois at Urbana-Champaign), Ben Abbott (Southwest Research Institute), Ted Bapty (Vanderbilt University), Bob Bassett, David Ngo (BAE SYSTEMS), Hubertus Franke, Jose Moreira (IBM Research)

Architecture

Compiler Support

Software Productivity

[Figure: M3T tile layout. Labels: M, P, Φ]

Novel Inter-Task Optimizations

[Figure: compiler flow. Stages as labeled: Front End, High Level Transformations, Task Selection, Inter-Task Optimizations, Code Generation, Intra-Task Optimizations]

Novel compiler algorithms to build tasks

[Figure: chip organization. Labels: Sync Bus, CPU+L1, Banked L2, On-Chip Network, Off-Chip Memory, TST, PTW, task]

TST: Task State Table

PTW: Pending Task Window

TaskScalar Morph Evaluation


Effect of Task Size: Speedups are high even for small-sized tasks

Effect of Number of Processors: Scalability of the speedups significantly depends on the application

Effect of Network Latency: Speedups are very tolerant to network latency

Timeline of Tasks (Matrix): Smooth execution of tasks

Timeline of Tasks (Bubble): Bursty spawn and execution of tasks

Timeline of Tasks (Pathological): High load imbalance, waste of resources

Code coarse critical section

Insert perhaps unnecessary barriers

Detect conflicts

Rollback offending threads

Use caches to store speculative state

Lock: owner

Flag: producer

Barrier: lagging tasks
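Speculating past a synchronization point needs three pieces: conflict detection, rollback of the offending thread, and a place to hold speculative state until it is safe to expose. The sketch below is a toy software model of that idea, not the poster's hardware design. The class name `SpeculativeTask`, its methods, and the choice to treat any overlap with the read set or write buffer as a conflict are all assumptions made for illustration; a plain dictionary stands in for the cache that buffers speculative state.

```python
class SpeculativeTask:
    """Toy model of a task running speculatively past a lock or barrier.

    Hypothetical illustration: the write buffer plays the role of the
    cache that holds speculative state; memory is a plain dict.
    """

    def __init__(self, memory):
        self.memory = memory
        self.read_set = set()     # addresses read while speculative
        self.write_buffer = {}    # speculative stores, not yet visible
        self.squashed = False

    def load(self, addr):
        self.read_set.add(addr)
        # Forward from our own speculative stores first.
        return self.write_buffer.get(addr, self.memory.get(addr, 0))

    def store(self, addr, value):
        self.write_buffer[addr] = value

    def observe_safe_store(self, addr):
        """Called when the safe (non-speculative) task writes addr."""
        if addr in self.read_set or addr in self.write_buffer:
            self.rollback()

    def rollback(self):
        # Discard all speculative state; the task must re-execute.
        self.read_set.clear()
        self.write_buffer.clear()
        self.squashed = True

    def commit(self):
        # The task has become safe: expose buffered stores to memory.
        self.memory.update(self.write_buffer)
        self.read_set.clear()
        self.write_buffer.clear()


memory = {"A": 0, "B": 0}

# No conflict: the safe task touches A, the speculative task touches B.
ok = SpeculativeTask(memory)
ok.store("B", ok.load("B") + 1)
ok.observe_safe_store("A")
ok.commit()
print(ok.squashed, memory["B"])   # False 1

# Conflict: both touch A, so the speculative task is rolled back.
bad = SpeculativeTask(memory)
bad.store("A", bad.load("A") + 1)
bad.observe_safe_store("A")
print(bad.squashed)               # True
```

In this model a squashed task leaves memory untouched, which is the property that makes it safe to run ahead of a lock owner, a flag producer, or lagging tasks at a barrier.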

Debugging Data Races [ISCA03]

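Task X: ... LD A, INC, ST A ...

Task Y: ... lock(L), LD A, INC, ST A, unlock(L) ...

[Figure: two CPUs, each with a cache holding a copy of A, connected to memory; the order between the two tasks is marked "?"]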
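The race arises because only one of the two tasks takes lock L, so nothing orders the two LD/INC/ST sequences on A. A minimal sketch that replays two interleavings deterministically is shown here; the `replay` function and the schedule encoding are hypothetical, and the lock operations are left out of the schedules because a lock that the other task never acquires does not constrain it.

```python
def replay(schedule):
    """Replay an interleaving of LD/INC/ST steps from tasks X and Y.

    Each task has one private register; A lives in shared memory.
    Returns the final value of A.
    """
    memory = {"A": 0}
    regs = {"X": 0, "Y": 0}
    for task, op in schedule:
        if op == "LD":
            regs[task] = memory["A"]      # load A into the register
        elif op == "INC":
            regs[task] += 1               # increment the register
        elif op == "ST":
            memory["A"] = regs[task]      # store the register to A
        else:
            raise ValueError(f"unknown op: {op}")
    return memory["A"]


# X runs to completion before Y starts: both increments survive.
ordered = [("X", "LD"), ("X", "INC"), ("X", "ST"),
           ("Y", "LD"), ("Y", "INC"), ("Y", "ST")]

# Y loads A inside its critical section, then X's whole sequence
# slips in before Y stores: X's increment is overwritten.
racy = [("Y", "LD"),
        ("X", "LD"), ("X", "INC"), ("X", "ST"),
        ("Y", "INC"), ("Y", "ST")]

print(replay(ordered))  # 2
print(replay(racy))     # 1
```

The second schedule loses an update even though Task Y is correctly locked, which is why this kind of bug is hard to find by inspecting one task at a time.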

M3T Architecture

[Figure: M3T architecture. Labels: CPU+L1, TST, PTW]

TaskScalar Morph

No explicit order between `` `` and `` ``

Task Ordering

[Figure: synchronization operations that order tasks. Labels: Unlock L, Lock L, Set F, Wait F, Barrier]

[Figure: Speculative Lock. Tasks A to E around ACQUIRE and RELEASE, each marked Safe or Speculative]

[Figure: Speculative Barrier. Tasks A to C around BARRIER, each marked Safe or Speculative]

Effectiveness

[Chart: Normalized Time Lost to Synchronization (axis 0 to 120), Base vs. Spec]

17.7% Sync Time Reduction

TaskScalar attempts to run sections in parallel and speculate past synchronization

Result: appears as if we had invested more man-hours

Reducing Parallel Programming Effort [ASPLOS02]

[Chart: axes Parallelism and Performance. Labels: Superscalar, SMT, CMP, TaskScalar, SpecInt, SpecFP, Scientific]

[Chart: Overhead (0% to 15%) vs. Rollback Distance [Instructions per CPU] (0 to 120). Annotations: Better, Chosen]