University of Illinois http://iacoma.cs.uiuc.edu/
ScalableBulk: Scalable Cache Coherence for
Atomic Blocks in a Lazy Environment
Xuehai Qian, Wonsun Ahn, Josep Torrellas
University of Illinois
Xuehai Qian Scalable Bulk Protocol
Motivation
• Architectures that continuously execute Atomic Blocks or Chunks (e.g., TCC, BulkSC)
• Chunk: group of dynamically contiguous instructions executed atomically
• Provide performance and programmability advantages [Hammond 04], [Ahn 10]
• An important operation is commit: it makes the state of a chunk visible atomically
• Designs with lazy detection of chunk conflicts
• Commit involves updating cache states and checking for conflicts
• In lazy directory-based cache coherent systems, commit is very challenging
• Requires updating the states of the distributed caches so that chunks appear to execute in a total order
• In large systems, commit becomes an execution bottleneck
Commit in Directory-Based Machine
[Figure: two message diagrams over processors (Proc) and directory modules (Dir). Left panel, Conventional Cache Coherence: wr, inv, ack, ack. Right panel, Chunk-based Cache Coherence: a commit sent to several directory modules, each followed by inv and ack messages.]
Recent Commit Protocols
• BulkSC [Ceze 07]:
• Centralized commit arbiter
• Scalable TCC [Chafi 07]:
• First distributed scheme
• Enforce total order by grabbing a commit token
• Serialization and global communication
• SEQ Protocol [Pugsley 08]:
• Extends Scalable TCC by eliminating global communication
• Still requires serialization of commits that use the same directory module
SEQ Protocol
[Figure: four-panel SEQ commit example with processors P0, P1 and directory modules D0, D1, D2 (P = Processor, D = Directory module), showing Chunk 0 and Chunk 1. Message labels: occ, ack. One panel is annotated "cannot overlap". Caption: Commits of chunk 0 and chunk 1 are serialized.]
Commits that use the same directory module are serialized even when they touch non-overlapping lines
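The difference between serializing at directory granularity and at address granularity can be seen in a toy model. This is only an illustrative sketch: the mapping of line addresses to directory modules (`addr % NUM_DIRS`) and the function names are assumptions, not taken from the slides or from the SEQ protocol itself.

```python
NUM_DIRS = 3  # assumed number of directory modules (D0, D1, D2)


def dirs_of(line_addrs):
    """Directory modules touched by a chunk, under an assumed modulo interleaving."""
    return {addr % NUM_DIRS for addr in line_addrs}


def serialized_by_directory(chunk_a, chunk_b):
    """Directory-granularity rule: sharing any directory module forces serialization."""
    return bool(dirs_of(chunk_a) & dirs_of(chunk_b))


def serialized_by_address(chunk_a, chunk_b):
    """Address-granularity rule: only overlapping line addresses force serialization."""
    return bool(set(chunk_a) & set(chunk_b))


chunk0 = {0, 1}   # lines homed at D0 and D1
chunk1 = {4, 5}   # lines homed at D1 and D2

print(serialized_by_directory(chunk0, chunk1))  # both chunks use D1
print(serialized_by_address(chunk0, chunk1))    # no common line address
```

In this example the two chunks share directory module D1 but no cache line, so only the directory-granularity rule serializes them.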
Goals for Scalable Commit
• No centralized structure
• Committing processor communicates only with the relevant directory modules
• Allow concurrent commits of chunks that use the same directory module, as long as the accessed addresses do not overlap
Contribution: ScalableBulk
• ScalableBulk: protocol for bottleneck-free commit of chunks in a directory-based system
• Key properties:
• Enable multiple concurrent chunk commits that use the same directory module
• Made possible by integrating signatures into directory design
• Better tolerance to commits that use many directories
• Eliminates all centralized structures and global communications
• Committing processor only communicates with the relevant directories (the homes of the addresses accessed by the chunk)
• More scalable than previous schemes
• Results: practically eliminates all commit stall overhead for 64 processors
ScalableBulk Protocol Primitives
• Allowing multiple non-overlapping commits to use the same directory module
• Grouping directory modules
• Initiating the commit optimistically
[Figure: Many-Core Architecture Considered. Tile labels: Proc. + L1$, L2$, Directory Module.]
Primitive 1: Allowing Concurrent Non-overlapping Commits
• The read and write footprints of a chunk are summarized in read and write signatures using Bloom filters
• On commit: signatures are sent to the relevant directory modules
• Only the addresses in the signature are locked in the directory during the commit
• Other chunks can commit using the same directory if their signatures do not conflict with the committing signatures
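The signature check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the hardware design: the signature layout (two 64-bit fields indexed by address bits), the names `Signature` and `directory_accepts`, and the choice to compare incoming read and write signatures only against the write signatures of chunks already committing are all assumptions made for the example.

```python
FIELD_BITS = 64   # assumed width of one signature field (not from the slides)
NUM_FIELDS = 2    # assumed number of fields per signature


def _bit(line_addr, field):
    """Bit position that a line address sets in the given field."""
    return (line_addr // FIELD_BITS ** field) % FIELD_BITS


class Signature:
    """Bloom-filter-style summary of a set of line addresses."""

    def __init__(self, line_addrs=()):
        self.fields = [0] * NUM_FIELDS
        for addr in line_addrs:
            self.insert(addr)

    def insert(self, line_addr):
        for f in range(NUM_FIELDS):
            self.fields[f] |= 1 << _bit(line_addr, f)

    def intersects(self, other):
        # A shared address sets a common bit in every field, so an empty
        # AND in any field proves the two sets are disjoint. The converse
        # does not hold: aliasing can produce false positives.
        return all(a & b for a, b in zip(self.fields, other.fields))


def directory_accepts(incoming_r, incoming_w, committing_w_sigs):
    """Return True to start committing, False to nack the commit."""
    for w in committing_w_sigs:
        if incoming_r.intersects(w) or incoming_w.intersects(w):
            return False
    return True


committing = [Signature([0x10, 0x11])]                                     # W of a committing chunk
print(directory_accepts(Signature([0x20]), Signature([0x21]), committing))  # disjoint lines
print(directory_accepts(Signature([0x10]), Signature([0x30]), committing))  # reads line 0x10
```

Because signatures are conservative, a nack may occasionally be caused by aliasing rather than by a true address overlap. An accepted commit is never in conflict.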
Primitive 1: Allowing Concurrent Non-overlapping Commits
• Enables more concurrent commits
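[Figure: conflict check in a directory module. The Incoming Signatures (R2, W2) are combined with the Currently Committing Signatures (W0, W1) through blocks labeled ∩ and NOR. If not ∅: Nack commit; else: Start committing.]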
Primitive 2: Grouping Directory Modules
• On a chunk commit: the relevant directory modules
• Coordinate their transitions by exchanging messages
• Form a Directory Group
• Identify a leader module that sends messages to the caches and the committing processor on behalf of the group
• Grouping Protocol:
• Completely distributed operation
• Few messages are required
• Leader: lowest-numbered directory module in the group
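Group membership and leader selection can be sketched in a few lines. The leader rule (lowest-numbered module) comes from the slide. The home mapping `line_addr % num_dirs`, the constant `NUM_DIRS`, and the function names are hypothetical and chosen only for illustration.

```python
NUM_DIRS = 8  # assumed number of directory modules


def home_directory(line_addr, num_dirs=NUM_DIRS):
    """Assumed interleaving: consecutive lines map to consecutive modules."""
    return line_addr % num_dirs


def form_group(read_set, write_set, num_dirs=NUM_DIRS):
    """Return (leader, members) for the directory group of one chunk commit."""
    touched = set(read_set) | set(write_set)
    if not touched:
        raise ValueError("chunk accessed no lines, so there is no group to form")
    # The group is the set of home modules of all accessed lines.
    members = sorted({home_directory(a, num_dirs) for a in touched})
    # Leader: lowest-numbered directory module in the group.
    return members[0], members


leader, members = form_group(read_set={9, 18}, write_set={5})
print(leader, members)
```

Every module can compute the same leader from the group membership alone, which is consistent with the slide's point that grouping needs no central coordinator.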
Primitive 2: Grouping Directory Modules
[Figure: four-panel commit example for Chunk 0 with processor P0 and directory modules D0, D1, D2 (P = Processor, D = Directory module). Panel 2: Sig{R,W} sent to the three modules, followed by grab messages; "Group is formed". Panel 3: Leader, commit_ack, succ, succ, bulk_inv. Panel 4: inv_ack, done, done; "Commit finished".]
Distributed Conflict Detection
[Figure: four-panel example with processors P0, P1 and directory modules D0, D1, D2 (P = Processor, D = Directory module), showing Chunk 0 and Chunk 1. Message labels: Sig{R,W}, grab, g_failure. Annotations: "Conflict is detected in D1" and "commit failure".]
Primitive 2: Grouping Directory Modules
• Deadlock is avoided by following a fixed directory-module traversal order
• Multiple groups can commit concurrently
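The fixed traversal order and the two outcomes (concurrent commit or commit failure) can be sketched with a single-threaded simulation. It shows only the order in which modules are grabbed and the release on failure, not real message timing. Exact address sets stand in for signatures, and the class and function names, the modulo home mapping, and the conflict rule (incoming reads and writes checked against the writes of committing chunks) are assumptions for the example.

```python
class DirectoryModule:
    def __init__(self, dir_id):
        self.dir_id = dir_id
        self.committing = {}  # chunk_id -> set of written lines homed here

    def try_grab(self, chunk_id, reads, writes):
        """Admit the chunk unless it overlaps the writes of a committing chunk."""
        touched = reads | writes
        for other_writes in self.committing.values():
            if touched & other_writes:
                return False
        self.committing[chunk_id] = writes
        return True

    def release(self, chunk_id):
        self.committing.pop(chunk_id, None)


def grab_group(dirs, chunk_id, reads, writes):
    """Grab the chunk's directory modules in ascending order.

    Returns (success, group). On a conflict, modules grabbed so far are released.
    """
    num = len(dirs)
    group = sorted({a % num for a in reads | writes})
    grabbed = []
    for d in group:  # fixed traversal order: lowest-numbered module first
        local_r = {a for a in reads if a % num == d}
        local_w = {a for a in writes if a % num == d}
        if dirs[d].try_grab(chunk_id, local_r, local_w):
            grabbed.append(d)
        else:
            for g in grabbed:
                dirs[g].release(chunk_id)
            return False, group
    return True, group


dirs = [DirectoryModule(i) for i in range(3)]
print(grab_group(dirs, 0, set(), {0, 1}))   # chunk 0 uses D0 and D1
print(grab_group(dirs, 1, set(), {4, 5}))   # chunk 1 uses D1 and D2, no overlap in D1
print(grab_group(dirs, 3, {1}, {3}))        # reads line 1, which chunk 0 is writing
```

Because every group grabs its modules in the same ascending order, two groups cannot each hold a module that the other is waiting for, which is the usual argument for why a fixed order avoids deadlock.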
Concurrent Commit
[Figure: four-panel example with processors P0, P1 and directory modules D0, D1, D2 (P = Processor, D = Directory module), showing Chunk 0 and Chunk 1. Message labels: Sig{R,W}, grab. Caption: No conflict is detected in D1. Two groups commit concurrently.]
Primitive 3: Optimistic Commit Initiation
• Idea:
• Committing processor (CP) assumes its commit transaction will succeed
• CP consumes incoming messages before receiving OK to commit
• Advantage: enables more overlap among commits
• Details in the paper
Summary: Scalable Commit
• Commit has no centralization point
• Committing processor communicates only with relevant directory modules (no message broadcasting)
• Multiple committing chunks can use the same directory modules, if the addresses that they access do not overlap
• Similar to how conventional protocols support concurrent writes
• Optimistic Commit Initiation removes operations from critical path of the commit
Commit is truly scalable
Evaluation
• Cycle-accurate execution-driven simulator based on SESC and Pin
• Number of cores: 32 and 64
• 11 SPLASH-2 and 7 PARSEC applications
• Implemented all existing protocols:
• ScalableBulk
• Scalable TCC
• SEQ
• BulkSC
Execution Time
• ScalableBulk practically eliminates all commit stall time
• Other existing protocols suffer commit stall (see paper)
[Figure: Execution Time. Stacked bars with categories Useful, Cache Miss, Commit, and Squash for Vips, Swaptions, Blackscholes, Fluidanimate, Canneal, Dedup, Facesim, and AVERAGE, each at 32 and 64 processors. Values printed on the bars, in extraction order (32 / 64): Vips 31.2 / 45.9; Swaptions 25.2 / 44.3; Blackscholes 31.7 / 50.0; Fluidanimate 29.1 / 33.7; Canneal 22.6 / 40.9; Dedup 15.8 / 19.9; Facesim 32.0 / 63.6; AVERAGE 25.3 / 37.9.]
Directories Used Per Commit
• Chunk commits use about 6 directories on average
• ScalableBulk is able to overlap the commits that use the same directories if the signatures do not conflict
Network Message Characterization
• ScalableBulk sends fewer messages than other distributed protocols
[Figure: network message characterization for Scalable TCC, ScalableBulk, SEQ, and BulkSC.]
Also in the paper
• Mechanism to handle fairness and avoid starvation
• Many details on the implementation of the ScalableBulk protocol
• Detailed results characterizing various aspects of the protocol
Conclusion
• Proposed ScalableBulk: protocol for bottleneck-free commit of chunks in a lazy directory-based system
• Key properties:
• Enables multiple concurrent chunk commits that use the same directory module
• Thanks to use of signatures
• Eliminates all centralized structures and global communications
• Results: practically eliminates all commit stall overhead for 64 processors
• Effectively enables a large-scale chunk-based machine