Optimistic Intra-Transaction Parallelism using Thread Level Speculation
Chris Colohan¹, Anastassia Ailamaki¹, J. Gregory Steffan² and Todd C. Mowry¹,³
¹ Carnegie Mellon University
² University of Toronto
³ Intel Research Pittsburgh
Chip Multiprocessors are Here!
• 2 cores now, soon will have 4, 8, 16, or 32
• Multiple threads per core
• How do we best use them?
IBM Power 5
AMD Opteron
Intel Yonah
Multi-Core Enhances Throughput
[Figure: Users send Transactions to the Database Server, which holds the DBMS and the Database]
Cores can run concurrent transactions and improve throughput
Using Multiple Cores
[Figure: Users send Transactions to the Database Server, which holds the DBMS and the Database]
Can multiple cores improve transaction latency?
Parallelizing transactions
    SELECT cust_info FROM customer;
    UPDATE district WITH order_id;
    INSERT order_id INTO new_order;
    foreach(item) {
        GET quantity FROM stock;
        quantity--;
        UPDATE stock WITH quantity;
        INSERT item INTO order_line;
    }
[Figure: the transaction above is sent to the DBMS]
Intra-query parallelism
• Used for long-running queries (decision support)
• Does not work for short queries
• Short queries dominate in commercial workloads
Intra-transaction parallelism
• Each thread spans multiple queries
• Breaks transaction into threads
Hard to add to existing systems!
• Need to change interface, add latches and locks, worry about correctness of parallel execution…
Thread Level Speculation (TLS) makes parallelization easier.
Thread Level Speculation (TLS)
[Figure: over Time, a Sequential run executes *p=, *q=, =*p, =*q in order; a Parallel run splits the same operations across Epoch 1 and Epoch 2]
Thread Level Speculation (TLS)
[Figure: Sequential vs. Parallel execution over Time. In the Parallel run, Epoch 2 loads *p before Epoch 1 stores to it, which is flagged as "Violation!", and Epoch 2 re-executes its loads.]
• Use epochs
• Detect violations
• Restart to recover
• Buffer state
• Oldest epoch:
  • Never restarts
  • No buffering
• Worst case: Sequential
• Best case: Fully parallel
Data dependences limit performance.
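The epoch rules above can be sketched in software. This is a minimal model under stated assumptions, not the hardware mechanism: the address space is tiny, `epoch_load`, `epoch_store` and `epoch_restart` are hypothetical helper names, and the model ignores the case where an epoch reads a value it wrote itself. Each epoch records the addresses it has loaded, and a store by an earlier epoch marks any later epoch that already loaded that address.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_EPOCHS 4   /* illustrative: one epoch per core */
#define NUM_ADDRS  16  /* illustrative: tiny address space */

/* Per-epoch speculative state (hypothetical model, not real hardware). */
typedef struct {
    bool read_set[NUM_ADDRS]; /* addresses this epoch has loaded */
    bool violated;            /* set when the epoch must restart */
} epoch_t;

static epoch_t epochs[NUM_EPOCHS];

static void tls_reset(void) {
    memset(epochs, 0, sizeof epochs);
}

/* A load by epoch e: remember the address so later stores can be checked. */
static void epoch_load(int e, int addr) {
    epochs[e].read_set[addr] = true;
}

/* A store by epoch e: any later epoch that already loaded addr consumed a
 * stale value, so it is marked violated. Returns the number of epochs
 * newly marked. Epoch 0 (the oldest) can never be marked. */
static int epoch_store(int e, int addr) {
    int newly_violated = 0;
    for (int later = e + 1; later < NUM_EPOCHS; later++) {
        if (epochs[later].read_set[addr] && !epochs[later].violated) {
            epochs[later].violated = true;
            newly_violated++;
        }
    }
    return newly_violated;
}

/* Restarting discards the epoch's speculative state. */
static void epoch_restart(int e) {
    memset(&epochs[e], 0, sizeof epochs[e]);
}
```

In this model the loop in `epoch_store` starts at `e + 1`, which is why the oldest epoch never restarts.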
A Coordinated Effort
• Transaction Programmer: Choose epoch boundaries
• DBMS Programmer: Remove performance bottlenecks
• Hardware Developer: Add TLS support to architecture
So what’s new?
Intra-transaction parallelism
• Without changing the transactions
• With minor changes to the DBMS
• Without having to worry about locking
• Without introducing concurrency bugs
• With good performance
Halve transaction latency on four cores
Related Work
Optimistic Concurrency Control (Kung82)
Sagas (Molina&Salem87)
Transaction chopping (Shasha95)
Outline
• Introduction
• Related work
• Dividing transactions into epochs
• Removing bottlenecks in the DBMS
• Results
• Conclusions
Case Study: New Order (TPC-C)
• Only dependence is the quantity field
• Very unlikely to occur (1/100,000)
    GET cust_info FROM customer;
    UPDATE district WITH order_id;
    INSERT order_id INTO new_order;
    foreach(item) {
        GET quantity FROM stock WHERE i_id=item;
        UPDATE stock WITH quantity-1 WHERE i_id=item;
        INSERT item INTO order_line;
    }
78% of transaction execution time
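To make the dependence claim concrete, here is a minimal sketch, assuming each loop iteration is identified only by its item id (the `i_id` in the queries above). Under that assumption, two iterations touch the same quantity field only when an order lists the same item twice. The function name and the array representation are illustrative, not part of the benchmark.

```c
#include <assert.h>
#include <stddef.h>

/* Counts pairs of loop iterations that would read and update the same
 * stock row, i.e. pairs of order lines with the same item id. */
static size_t count_dependent_pairs(const int *item_ids, size_t n) {
    size_t pairs = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (item_ids[i] == item_ids[j])
                pairs++;
    return pairs;
}
```

A result of zero means every iteration of the loop works on a different stock row, so the iterations can run as independent epochs.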
Case Study: New Order (TPC-C)
    GET cust_info FROM customer;
    UPDATE district WITH order_id;
    INSERT order_id INTO new_order;
    TLS_foreach(item) {
        GET quantity FROM stock WHERE i_id=item;
        UPDATE stock WITH quantity-1 WHERE i_id=item;
        INSERT item INTO order_line;
    }
Dependences in DBMS
[Figure: execution timeline; axis: Time]
Dependences serialize execution!
Example: statistics gathering
• pages_pinned++
• TLS maintains serial ordering of increments
• To remove, use per-CPU counters
Performance tuning:
• Profile execution
• Remove bottleneck dependence
• Repeat
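The per-CPU counter fix can be sketched as follows. This is an illustration under assumptions, not the DBMS code: the core count, the 64-byte cache line size, and the function names are all made up for the example. Each epoch increments only the slot of the CPU it runs on, so the increments no longer form a chain of dependences through one shared variable.

```c
#include <assert.h>

#define NUM_CPUS   4   /* illustrative core count */
#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* One counter per CPU. The padding places the value fields CACHE_LINE
 * bytes apart, so no two of them fall in the same cache line. */
typedef struct {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
} padded_counter_t;

static padded_counter_t pages_pinned[NUM_CPUS];

/* Each epoch increments only the slot of the CPU it runs on. */
static void count_page_pinned(int cpu) {
    pages_pinned[cpu].value++;
}

/* Readers of the statistic sum the slots. */
static long total_pages_pinned(void) {
    long total = 0;
    for (int cpu = 0; cpu < NUM_CPUS; cpu++)
        total += pages_pinned[cpu].value;
    return total;
}
```

The trade-off is that reading the statistic now costs one load per CPU, which is acceptable when the counter is written far more often than it is read.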
Buffer Pool Management
[Figure: a CPU calls get_page(5) on the Buffer Pool, which sets the page's ref count to 1; put_page(5) returns it to 0]
Buffer Pool Management
[Figure: over Time, two epochs each call get_page(5) and put_page(5) on the Buffer Pool]
TLS ensures first epoch gets page first. Who cares?
• TLS maintains original load/store order
• Sometimes this is not needed
Buffer Pool Management
[Figure: the same two-epoch timeline of get_page(5) and put_page(5) calls; legend: = Escape Speculation]
Isolated: undoing get_page will not affect other transactions
Undoable: have an operation (put_page) which returns the system to its initial state
• Escape speculation
• Invoke operation
• Store undo function
• Resume speculation
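The four steps can be sketched with a per-epoch undo log. This is a toy model under assumptions: the buffer pool is reduced to an array of pin counts, and `undo_log_t`, `escaped_get_page` and `on_violation` are hypothetical names rather than a real TLS interface. The operation takes effect immediately, and the log remembers how to reverse it if the epoch is later violated.

```c
#include <assert.h>

#define NUM_PAGES 8   /* illustrative buffer pool size */
#define MAX_UNDO  16  /* illustrative undo log capacity */

static int ref_count[NUM_PAGES];  /* toy buffer pool: pin count per page */

static void get_page(int id) { ref_count[id]++; }
static void put_page(int id) { ref_count[id]--; }

/* Undo log kept by one speculative epoch (hypothetical structure). */
typedef struct {
    void (*fn[MAX_UNDO])(int);
    int  arg[MAX_UNDO];
    int  len;
} undo_log_t;

/* Runs get_page outside speculation, so the pin takes effect immediately,
 * and records put_page as the operation that undoes it.
 * Returns 0 on success, -1 if the undo log is full. */
static int escaped_get_page(undo_log_t *ul, int id) {
    if (ul->len == MAX_UNDO)
        return -1;
    get_page(id);                  /* invoke operation */
    ul->fn[ul->len]  = put_page;   /* store undo function */
    ul->arg[ul->len] = id;
    ul->len++;
    return 0;
}

/* On a violation, undo the escaped operations in reverse order. */
static void on_violation(undo_log_t *ul) {
    while (ul->len > 0) {
        ul->len--;
        ul->fn[ul->len](ul->arg[ul->len]);
    }
}
```

Undoing in reverse order keeps the sketch correct even if later escaped operations depend on earlier ones.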
Buffer Pool Management
[Figure: the same two-epoch timeline, with both get_page(5) and put_page(5) escaping speculation; legend: = Escape Speculation]
Not undoable!
Buffer Pool Management
[Figure: the same two-epoch timeline, with put_page(5) moved to the end of each epoch; legend: = Escape Speculation]
• Delay put_page until end of epoch
• Avoid dependence
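Delaying an operation until the end of the epoch can be sketched as a small queue. This is an illustration under assumptions: the buffer pool is again an array of pin counts, and `deferred_puts_t`, `deferred_put_page` and `end_of_epoch` are hypothetical names. The call site only records the release, and the shared count is touched once the epoch finishes.

```c
#include <assert.h>

#define POOL_PAGES   8   /* illustrative buffer pool size */
#define MAX_DEFERRED 16  /* illustrative queue capacity */

static int pin_count[POOL_PAGES];  /* toy buffer pool: pin count per page */

/* Pages whose put_page has been postponed by one epoch. */
typedef struct {
    int page[MAX_DEFERRED];
    int len;
} deferred_puts_t;

/* Called where the original code called put_page: only queues the release,
 * so the shared pin count is not touched while the epoch is speculative.
 * Returns 0 on success, -1 if the queue is full. */
static int deferred_put_page(deferred_puts_t *q, int id) {
    if (q->len == MAX_DEFERRED)
        return -1;
    q->page[q->len++] = id;
    return 0;
}

/* Called at the end of the epoch: performs the queued releases. */
static void end_of_epoch(deferred_puts_t *q) {
    for (int i = 0; i < q->len; i++)
        pin_count[q->page[i]]--;
    q->len = 0;
}
```

If the epoch is violated before it ends, the queue can simply be discarded, which is why the delayed operation never needs an undo.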
Removing Bottleneck Dependences
We introduce three techniques:
• Delay operations until non-speculative
  • Mutex and lock acquire and release
  • Buffer pool, memory, and cursor release
  • Log sequence number assignment
• Escape speculation
  • Buffer pool, memory, and cursor allocation
• Traditional parallelization
  • Memory allocation, cursor pool, error checks, false sharing
Experimental Setup
Detailed simulation
• Superscalar, out-of-order, 128 entry reorder buffer
• Memory hierarchy modeled in detail
TPC-C transactions on BerkeleyDB
• In-core database
• Single user
• Single warehouse
• Measure interval of 100 transactions
• Measuring latency not throughput
[Figure: four CPUs, each with a 32KB 4-way L1 cache, above a 2MB 4-way L2 cache and the rest of the memory system]
Optimizing the DBMS: New Order
[Figure: stacked bar chart of Time (normalized), axis from 0 to 1.25, for Sequential, No Optimizations, Latches, Locks, Malloc/Free, Buffer Pool, Cursor Queue, Error Checks, False Sharing, B-Tree and Logging. Each bar is broken down into Idle CPU, Violated, Cache Miss and Busy.]
• Cache misses increase
• Other CPUs not helping
• Can’t optimize much more
• 26% improvement
This process took me 30 days and <1200 lines of code.
Other TPC-C Transactions
[Figure: stacked bar chart of Time (normalized), axis from 0 to 1, for New Order, Delivery, Stock Level, Payment and Order Status. Each bar is broken down into Idle CPU, Failed, Cache Miss and Busy.]
3/5 Transactions speed up by 46-66%
Conclusions
• TLS makes intra-transaction parallelism practical
  • Reasonable changes to transaction, DBMS, and hardware
  • Halve transaction latency
Needed backup slides (not done yet)
• 2 proc. Results
• Shared caches may change how you want to extract parallelism!
  • Just have lots of transactions: no sharing
  • TLS may have more sharing
Any questions?
For more information, see: www.colohan.com
Backup Slides Follow…
LATCHES
Latches
• Mutual exclusion between transactions
• Cause violations between epochs
  • Read-test-write cycle → RAW
• Not needed between epochs
  • TLS already provides mutual exclusion!
Latches: Aggressive Acquire
Acquire; latch_cnt++; …work…; latch_cnt--
Homefree
latch_cnt++; …work…; (enqueue release); Commit work; latch_cnt--
Homefree
latch_cnt++; …work…; (enqueue release); Commit work; latch_cnt--; Release
Large critical section
Latches: Lazy Acquire
Acquire; …work…; Release
Homefree
(enqueue acquire); …work…; (enqueue release); Acquire; Commit work; Release
Homefree
(enqueue acquire); …work…; (enqueue release); Acquire; Commit work; Release
Small critical sections
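The lazy scheme can be sketched in a single-threaded toy model. Everything here is an assumption made for illustration: the latch is a boolean, the protected data is one integer, "work" is a list of buffered deltas, and `speculative_update` and `commit_epoch` are hypothetical names. The point is only that the latch is held for the short commit step rather than for the whole epoch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PENDING 16  /* illustrative buffer capacity */

/* Toy latch and the shared value it protects (illustrative only). */
static bool latch_held;
static int  shared_value;

/* Updates an epoch has buffered instead of applying under the latch. */
typedef struct {
    int delta[MAX_PENDING];
    int len;
} pending_t;

/* During speculation: acquire and release are only "enqueued", so the
 * work is recorded without touching the latch.
 * Returns 0 on success, -1 if the buffer is full. */
static int speculative_update(pending_t *p, int delta) {
    if (p->len == MAX_PENDING)
        return -1;
    p->delta[p->len++] = delta;
    return 0;
}

/* Once the epoch is homefree: acquire, commit the buffered work, release.
 * In this single-threaded model the latch must be free at this point;
 * a real latch would block instead of asserting. */
static void commit_epoch(pending_t *p) {
    assert(!latch_held);
    latch_held = true;                  /* Acquire */
    for (int i = 0; i < p->len; i++)
        shared_value += p->delta[i];    /* Commit work */
    p->len = 0;
    latch_held = false;                 /* Release */
}
```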
HARDWARE
TLS in Database Systems
[Figure: timelines (axis: Time) comparing Non-Database TLS with TLS in Database Systems, shown alongside Concurrent transactions]
Large epochs:
• More dependences
  • Must tolerate
• More state
  • Bigger buffers
Feedback Loop
I know this is parallel!
    for() { do_work(); }
    par_for() { do_work(); }
Must…Make…Faster
think
feed back
Violations == Feedback
[Figure: the Sequential vs. Parallel example (*p=, *q=, =*p, =*q) over Time, with the "Violation!" on *p; values shown: 0x0FD8, 0xFD20, 0x0FC0, 0xFC18]
Must…Make…Faster
Eliminating Violations
[Figure: two Parallel timelines (axis: Time). The first has a "Violation!" on *p; after "Eliminate *p Dep.", the second has a "Violation!" on *q. Values shown: 0x0FD8, 0xFD20, 0x0FC0, 0xFC18.]
Optimization may make slower?
Tolerating Violations: Sub-epochs
[Figure: two timelines (axis: Time). One shows the "Violation!" on *q after "Eliminate *p Dep."; the other shows the same "Violation!" on *q with the epoch divided into Sub-epochs.]
Sub-epochs
• Started periodically by hardware
  • How many?
  • When to start?
• Hardware implementation
  • Just like epochs
  • Use more epoch contexts
  • No need to check violations between sub-epochs within an epoch
[Figure: a "Violation!" on *q in an epoch divided into Sub-epochs]
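The benefit of sub-epochs can be sketched with a little arithmetic. This is a simplified model under assumptions: sub-epochs are evenly spaced by instruction count, a violation rewinds to the start of the sub-epoch that contains the violated load, and the function names are made up for the example.

```c
#include <assert.h>

/* Start instruction of the sub-epoch containing instruction `pos`, for an
 * epoch of `epoch_len` instructions split into `n_sub` evenly spaced
 * sub-epochs. Requires n_sub >= 1 and 0 <= pos < epoch_len. */
static long rewind_point(long epoch_len, long n_sub, long pos) {
    long spacing = (epoch_len + n_sub - 1) / n_sub;  /* ceiling division */
    return (pos / spacing) * spacing;
}

/* Instructions before the violated load at `pos` that must be executed
 * again after the rewind. With one sub-epoch this is a full restart. */
static long redone_before_load(long epoch_len, long n_sub, long pos) {
    return pos - rewind_point(epoch_len, n_sub, pos);
}
```

For a 1000-instruction epoch with a violated load at instruction 749, one sub-epoch redoes 749 instructions, four sub-epochs redo 249, and eight redo 124.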
Old TLS Design
[Figure: four CPUs, each with an L1 cache, above an L2 cache and the rest of the memory system; Invalidation messages pass between the caches]
• Buffer speculative state in write back L1 cache
• Detect violations through invalidations
• Rest of system only sees committed data
• Restart by invalidating speculative lines
Problems:
• L1 cache not large enough
• Later epochs only get values on commit
New Cache Design
[Figure: four CPUs, each with an L1 cache, above L2 caches and the rest of the memory system; Invalidation messages pass between the L2 caches]
• Buffer speculative and non-speculative state for all epochs in L2
• Speculative writes immediately visible to L2 (and later epochs)
• Detect violations at lookup time
• Invalidation coherence between L2 caches
• Restart by invalidating speculative lines
New Features
[Figure: four CPUs, each with an L1 cache, above L2 caches and the rest of the memory system]
• Speculative state in L1 and L2 cache
• Cache line replication (versions)
• Data dependence tracking within cache
• Speculative victim cache
(Two items are marked "New!")
Scaling
[Figure: two stacked bar charts of Time (normalized), axis from 0 to 1, for Sequential, 2 CPUs, 4 CPUs and 8 CPUs; the second chart is labeled "Modified with 50-150 items/transaction". Bars are broken down into Idle CPU, Failed Speculation, IUO Mutex Stall, Cache Miss and Instruction Execution.]
Evaluating a 4-CPU system
[Figure: stacked bar chart of Time (normalized), axis from 0 to 1, broken down into Idle CPU, Failed Speculation, IUO Mutex Stall, Cache Miss and Instruction Execution]
• Sequential: Original benchmark run on 1 CPU
• TLS Seq: Parallelized benchmark run on 1 CPU
• No Sub-epoch: Without sub-epoch support
• Baseline: Parallel execution
• No Speculation: Ignore violations (Amdahl’s Law limit)
Sub-epochs: How many/How big?
[Figure: chart of Time (normalized), axis from 0 to 0.8, against Number of Sub-epochs/Instructions per Sub-epoch]
• Supporting more sub-epochs is better
• Spacing depends on location of violations
  • Even spacing is good enough
Query Execution
Actions taken by a query:
• Bring pages into buffer pool
• Acquire and release latches & locks
• Allocate/free memory
• Allocate/free and use cursors
• Use B-trees
• Generate log entries
These generate violations.
Applying TLS
1. Parallelize loop
2. Run benchmark
3. Remove bottleneck
4. Go to 2
[Figure: execution timeline; axis: Time]
Outline
Hardware Developer
Transaction Programmer
DBMS Programmer
Violation Prediction
[Figure: timelines (axis: Time) in which a Predictor is used to Predict Dependences, so that the load =*q waits until the store *q= is Done; shown next to the "Violation!" case (label: Eliminate R1/W2)]
Predictor problems:
• Large epochs → many predictions
• Failed prediction → violation
• Incorrect prediction → large stall
Two predictors required:
• Last store
• Dependent load
TLS Execution
[Figure: animation of the violating example (*p=, *q=, =*p, =*q) on CPU 1 to CPU 4, each with an L1 cache, above an L2 cache and the rest of the memory system. Successive frames show the cache lines for *p, *q, *s and *t with a valid bit and per-CPU SM and SL bits.]
Replication
[Figure: the violating example with an extra store *q=; the cache holds more than one version of the line containing *q, each tracked with a valid bit and per-CPU SM and SL bits (CPU 1 to CPU 4)]
Can’t invalidate line if it contains two epochs’ changes
• Makes epochs independent
• Enables sub-epochs
![Page 65: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/65.jpg)
65
Sub-epochs
Uses more epoch contexts
Detection/buffering/rewind is “free”
More replication:
  Speculative victim cache
[Figure: one epoch split into sub-epochs 1a, 1b, 1c, 1d, ..., each with its own context (CPU 1a to CPU 1d) and cache-line valid, SM, and SL bits. Operations shown: *p=, *q=, =*p, =*q.]
![Page 66: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/66.jpg)
66
get_page() wrapper
page_t *get_page_wrapper(pageid_t id) {
    static tls_mutex mut;
    page_t *ret;

    tls_escape_speculation();
    check_get_arguments(id);
    tls_acquire_mutex(&mut);

    ret = get_page(id);

    tls_release_mutex(&mut);
    tls_on_violation(put, ret);
    tls_resume_speculation();

    return ret;
}
Wraps get_page()
![Page 67: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/67.jpg)
67
get_page() wrapper
page_t *get_page_wrapper(pageid_t id) {
    static tls_mutex mut;
    page_t *ret;

    tls_escape_speculation();
    check_get_arguments(id);
    tls_acquire_mutex(&mut);

    ret = get_page(id);

    tls_release_mutex(&mut);
    tls_on_violation(put, ret);
    tls_resume_speculation();

    return ret;
}
No violations while calling get_page()
![Page 68: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/68.jpg)
68
May get bad input data from speculative thread!
get_page() wrapper
page_t *get_page_wrapper(pageid_t id) {
    static tls_mutex mut;
    page_t *ret;

    tls_escape_speculation();
    check_get_arguments(id);
    tls_acquire_mutex(&mut);

    ret = get_page(id);

    tls_release_mutex(&mut);
    tls_on_violation(put, ret);
    tls_resume_speculation();

    return ret;
}
![Page 69: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/69.jpg)
69
get_page() wrapper
page_t *get_page_wrapper(pageid_t id) {
    static tls_mutex mut;
    page_t *ret;

    tls_escape_speculation();
    check_get_arguments(id);
    tls_acquire_mutex(&mut);

    ret = get_page(id);

    tls_release_mutex(&mut);
    tls_on_violation(put, ret);
    tls_resume_speculation();

    return ret;
}
Only one epoch per transaction at a time
![Page 70: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/70.jpg)
70
How to undo get_page()
get_page() wrapper
page_t *get_page_wrapper(pageid_t id) {
    static tls_mutex mut;
    page_t *ret;

    tls_escape_speculation();
    check_get_arguments(id);
    tls_acquire_mutex(&mut);

    ret = get_page(id);

    tls_release_mutex(&mut);
    tls_on_violation(put, ret);
    tls_resume_speculation();

    return ret;
}
![Page 71: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/71.jpg)
71
get_page() wrapper
page_t *get_page_wrapper(pageid_t id) {
    static tls_mutex mut;
    page_t *ret;

    tls_escape_speculation();
    check_get_arguments(id);
    tls_acquire_mutex(&mut);

    ret = get_page(id);

    tls_release_mutex(&mut);
    tls_on_violation(put, ret);
    tls_resume_speculation();

    return ret;
}
Isolated: undoing this operation does not cause cascading aborts
Undoable: easy way to return system to initial state
Can also be used for:
  Cursor management
  malloc()
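The "undoable" property can be pictured as a list of compensating actions that is replayed if the epoch is violated and dropped if it commits. The following is a minimal sketch in C under that assumption, not the talk's implementation: `undo_register`, `undo_run_all`, `undo_discard`, and `unpin_page` are hypothetical names standing in for whatever `tls_on_violation(put, ret)` does internally.

```c
#include <assert.h>
#include <stddef.h>

#define UNDO_MAX 32  /* hypothetical capacity of the per-epoch undo list */

typedef void (*undo_fn)(void *arg);

struct undo_entry {
    undo_fn fn;   /* compensating action, e.g. a "put" for a "get" */
    void *arg;    /* argument handed back to the action */
};

static struct undo_entry undo_list[UNDO_MAX];
static size_t undo_len;

/* Record a compensating action while speculation is escaped. */
int undo_register(undo_fn fn, void *arg)
{
    assert(fn != NULL);
    if (undo_len == UNDO_MAX)
        return -1;               /* list full */
    undo_list[undo_len].fn = fn;
    undo_list[undo_len].arg = arg;
    undo_len++;
    return 0;
}

/* On violation: run the recorded actions, newest first. */
void undo_run_all(void)
{
    while (undo_len > 0) {
        undo_len--;
        undo_list[undo_len].fn(undo_list[undo_len].arg);
    }
}

/* On commit: the work stands, so the list is simply dropped. */
void undo_discard(void)
{
    undo_len = 0;
}

/* Example compensating action: drop one pin on a page (hypothetical). */
void unpin_page(void *pin_count)
{
    (*(int *)pin_count)--;
}
```

Running the actions in reverse order is a design assumption: it mirrors how nested acquisitions are normally released.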
![Page 72: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/72.jpg)
72
TPC-C Benchmark
Company
Warehouse 1 Warehouse W
District 1 District 2 District 10
Cust 1 Cust 2 Cust 3k
![Page 73: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/73.jpg)
73
TPC-C Benchmark
Table cardinalities:
  Warehouse: W
  District: W*10
  Customer: W*30k
  Order: W*30k+
  Order Line: W*300k+
  New Order: W*9k+
  History: W*30k+
  Stock: W*100k
  Item: 100k
Relationship labels on the diagram: 10, 3k, 1+, 1+, 0-1, 5-15, 3+, W, 100k
![Page 74: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/74.jpg)
74
What is TLS?
while(cond) {
x = hash[i];
...
hash[j] = y;
...
}
Time
= hash[3];
...
hash[10] =
...
= hash[19];
...
hash[21] =
...
= hash[33];
...
hash[30] =
...
= hash[10];
...
hash[25] =
...
![Page 75: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/75.jpg)
75
What is TLS?
while(cond) {
x = hash[i];
...
hash[j] = y;
...
}
Time
Processor A, Thread 1: = hash[3]; ... hash[10] = ...
Processor B, Thread 2: = hash[19]; ... hash[21] = ...
Processor C, Thread 3: = hash[33]; ... hash[30] = ...
Processor D, Thread 4: = hash[10]; ... hash[25] = ...
![Page 76: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/76.jpg)
76
What is TLS?
while(cond) {
x = hash[i];
...
hash[j] = y;
...
}
Time
Processor A, Thread 1: = hash[3]; ... hash[10] = ...
Processor B, Thread 2: = hash[19]; ... hash[21] = ...
Processor C, Thread 3: = hash[33]; ... hash[30] = ...
Processor D, Thread 4: = hash[10]; ... hash[25] = ...
Violation! (Thread 4 loads hash[10], which Thread 1 stores)
![Page 77: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/77.jpg)
77
What is TLS?
while(cond) {
x = hash[i];
...
hash[j] = y;
...
}
Time
Processor A, Thread 1: = hash[3]; ... hash[10] = ... attempt_commit()
Processor B, Thread 2: = hash[19]; ... hash[21] = ... attempt_commit()
Processor C, Thread 3: = hash[33]; ... hash[30] = ... attempt_commit()
Processor D, Thread 4: = hash[10]; ... hash[25] = ... attempt_commit()
Violation!
![Page 78: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/78.jpg)
78
What is TLS?
while(cond) {
x = hash[i];
...
hash[j] = y;
...
}
Time
Processor A, Thread 1: = hash[3]; ... hash[10] = ... attempt_commit()
Processor B, Thread 2: = hash[19]; ... hash[21] = ... attempt_commit()
Processor C, Thread 3: = hash[33]; ... hash[30] = ... attempt_commit()
Processor D, Thread 4: = hash[10]; ... hash[25] = ... attempt_commit()
Violation!
Redo, Thread 4: = hash[10]; ... hash[25] = ... attempt_commit()
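The violation check in this example can be sketched in a few lines of C. This is an illustration only, with a software model in place of the hardware: `struct epoch` and `epoch_violated` are hypothetical names, each epoch is reduced to one load and one store, and the check conservatively assumes the later epoch's load ran before the earlier epoch's store became visible.

```c
#include <assert.h>
#include <stddef.h>

/* One epoch = one loop iteration run speculatively on its own CPU. */
struct epoch {
    int load_idx;   /* hash[] slot the iteration reads  */
    int store_idx;  /* hash[] slot the iteration writes */
};

/*
 * Returns 1 if epoch k must be restarted because a logically earlier
 * epoch stores to the slot that epoch k loads (a read-after-write
 * dependence that speculation got wrong), 0 otherwise.
 */
int epoch_violated(const struct epoch *e, size_t k)
{
    assert(e != NULL);
    for (size_t j = 0; j < k; j++) {
        if (e[j].store_idx == e[k].load_idx)
            return 1;
    }
    return 0;
}
```

With the indices from the slide, only the fourth epoch is flagged: it loads `hash[10]`, which the first epoch stores.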
![Page 79: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/79.jpg)
79
TLS Hardware Design
What’s new?
  Large threads
  Epochs will communicate
  Complex control flow
  Huge legacy code base
How does hardware change?
  Store state in L2 instead of L1
  Reversible atomic operations
  Tolerate dependences
    Aggressive update propagation (implicit forwarding)
    Sub-epochs
![Page 80: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/80.jpg)
80
L1 Cache Line
SL bit: “L2 cache knows this line has been speculatively loaded”
  On violation or commit: clear
SM bit: “This line contains speculative changes”
  On commit: clear
  On violation: SM → Invalid
Otherwise, just like a normal cache
Line fields: SL, SM, Valid, LRU, Tag, Data
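The commit and violation rules for the two bits can be written down as a small state update. This is a sketch of the rules as stated on the slide, not a hardware description: `struct l1_line`, `l1_on_commit`, and `l1_on_violation` are hypothetical names, and clearing SM after invalidating the line is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only the state bits of an L1 line; tag, data, and LRU are omitted. */
struct l1_line {
    bool valid;
    bool sl;  /* speculatively loaded: L2 already knows about the load */
    bool sm;  /* speculatively modified: line holds uncommitted stores */
};

/* Commit: speculative work becomes real, so both bits are cleared. */
void l1_on_commit(struct l1_line *line)
{
    assert(line != NULL);
    line->sl = false;
    line->sm = false;
}

/* Violation: lines with speculative changes are thrown away. */
void l1_on_violation(struct l1_line *line)
{
    assert(line != NULL);
    if (line->sm)
        line->valid = false;  /* SM -> Invalid */
    line->sl = false;
    line->sm = false;         /* assumption: bit is reset with the line */
}
```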
![Page 81: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/81.jpg)
81
Escaping Speculation
Speculative epoch wants to make system-visible change!
Ignore SM lines while escaped
Stale bit: “This line may be outdated by speculative work.”
  On violation or commit: clear
Line fields: SL, SM, Valid, LRU, Tag, Data, Stale
![Page 82: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/82.jpg)
82
L1 to L2 communication
L2 sees all stores (write through)
L2 sees first load of an epoch
  NotifySL message
L2 can track data dependences!
![Page 83: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/83.jpg)
83
L1 Changes Summary
Add three bits to each line
  SL
  SM
  Stale
Modify tag match to recognize bits
Add queue of NotifySL requests
Line fields: SL, SM, Valid, LRU, Tag, Data, Stale
![Page 84: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/84.jpg)
84
L2 Cache Line
Line fields: SL and SM bits (one pair each for CPU1 and CPU2), Fine Grained SM, Valid, Dirty, Exclusive, LRU, Tag, Data
Cache line can be:
  Modified by one CPU
  Loaded by multiple CPUs
![Page 85: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/85.jpg)
85
Cache Line Conflicts
Three classes of conflict:
  Epoch 2 stores, epoch 1 loads
    Need “old” version to load
  Epoch 1 stores, epoch 2 stores
    Need to keep changes separate
  Epoch 1 loads, epoch 2 stores
    Need to be able to discard line on violation
Need a way of storing multiple conflicting versions in the cache
![Page 86: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/86.jpg)
86
Cache line replication
On conflict, replicate line
  Split line into two copies
  Divide SM and SL bits at split point
  Divide directory bits at split point
![Page 87: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/87.jpg)
87
Replication Problems
Complicates line lookup
  Need to find all replicas and select “best”
  Best == most recent replica
Change management
  On write, update all later copies
  Also need to find all more speculative replicas to check for violations
  On commit must get rid of stale lines
    Invalidation Required Buffer (IRB)
![Page 88: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/88.jpg)
88
Victim Cache
How do you deal with a full cache set?
Use a victim cache
  Holds evicted lines without losing SM & SL bits
Must be fast: every cache lookup needs to know:
  Do I have the “best” replica of this line?
    Critical path
  Do I cause a violation?
    Not on critical path
![Page 89: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/89.jpg)
89
Summary of Hardware Support
Sub-epochs
  Violations hurt less!
Shared cache TLS support
  Faster communication
  More room to store state
RAOs
  Don’t speculate on known operations
  Reduces amount of speculative state
![Page 90: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/90.jpg)
90
Summary of Hardware Changes
Sub-epochs
  Checkpoint register state
  Needs replicas in cache
Shared cache TLS support
  Speculative L1
  Replication in L1 and L2
  Speculative victim cache
  Invalidation Required Buffer
RAOs
  Suspend/resume speculation
  Mutexes
  “Undo list”
![Page 91: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/91.jpg)
91
TLS Execution
[Figure: four CPUs, each with a private L1 cache, sharing one L2 cache in front of the rest of the memory system. Operations shown: *p=, *q=, =*p, =*q. Copies of *p and *q appear in the caches. Labels: "R2", "Violation!", "Invalidation".]
![Page 92: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/92.jpg)
92
![Page 93: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/93.jpg)
93
![Page 94: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/94.jpg)
94
![Page 95: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/95.jpg)
95
Problems with Old Cache Design
Database epochs are large
  L1 cache not large enough
Sub-epochs add more state
  L1 cache not associative enough
Database epochs communicate
  L1 cache only communicates committed data
![Page 96: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/96.jpg)
96
Intro Summary
TLS makes intra-transaction parallelism easy
Divide transaction into epochs
Hardware support:
  Detect violations
  Restart to recover
    Sub-epochs mitigate penalty
  Buffer state
New process:
  Modify software
    avoid violations
    improve performance
![Page 97: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/97.jpg)
97
The Many Faces of Ogg
Must tune reorder buffer…
[Figure: the label "Money!" repeated ten times, each next to pwhile() {}, and while(too_slow) make_faster(); repeated four times.]
![Page 98: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/98.jpg)
98
The Many Faces of Ogg
Must tune reorder buffer…
[Figure: the label "Duh…" repeated nine times and "Money!" once, each next to pwhile() {}, and while(too_slow) make_faster(); repeated four times.]
![Page 99: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/99.jpg)
99
Removing Bottlenecks
Three general techniques:
  Partition data structures
    malloc
  Postpone operations until non-speculative
    Latches and locks, log entries
  Handle speculation manually
    Buffer pool
![Page 100: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/100.jpg)
100
Bottlenecks Encountered
Buffer pool
Latches & Locks
Malloc/free
Cursor queues
Error checks
False sharing
B-tree performance optimization
Log entries
![Page 101: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/101.jpg)
101
The Many Faces of Ogg
Must tune reorder buffer…
[Figure: the label "Duh…" repeated nine times and "Money!" once, each next to pwhile() {}, and while(too_slow) make_faster(); repeated four times.]
![Page 102: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/102.jpg)
102
Performance on 4 CPUs
Unmodified benchmark: Modified benchmark:
![Page 103: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/103.jpg)
103
Incremental Parallelization
Time
4 CPUs
![Page 104: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/104.jpg)
104
Scaling
Unmodified benchmark: Modified benchmark:
![Page 105: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/105.jpg)
105
Parallelization is Hard
[Chart: Performance Improvement versus Programmer Effort. Series labels: Hand Parallelization, Parallelizing Compiler, What we want. The label "Tuning" appears four times.]
![Page 106: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/106.jpg)
106
Case Study: New Order (TPC-C)
Begin transaction {
} End transaction
![Page 107: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/107.jpg)
107
Case Study: New Order (TPC-C)
Begin transaction {
  Read customer info
  Read & increment order #
  Create new order
} End transaction
![Page 108: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/108.jpg)
108
Case Study: New Order (TPC-C)
Begin transaction {
  Read customer info
  Read & increment order #
  Create new order
  For each item in order {
    Get item info
    Decrement count in stock
    Record order info
  }
} End transaction
![Page 109: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/109.jpg)
109
Case Study: New Order (TPC-C)
Begin transaction {
  Read customer info
  Read & increment order #
  Create new order
  For each item in order {
    Get item info
    Decrement count in stock
    Record order info
  }
} End transaction
80% of transaction execution time
![Page 110: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/110.jpg)
110
Case Study: New Order (TPC-C)
Begin transaction {
  Read customer info
  Read & increment order #
  Create new order
  For each item in order {
    Get item info
    Decrement count in stock
    Record order info
  }
} End transaction
80% of transaction execution time
![Page 111: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/111.jpg)
111
The Many Faces of Ogg
Must tune reorder buffer…
[Figure: the label "Duh…" repeated nine times and "Money!" once, each next to pwhile() {}, and while(too_slow) make_faster(); repeated four times.]
![Page 112: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/112.jpg)
Step 2: Changing the Software
![Page 113: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/113.jpg)
113
No problem!
Loop is easy to parallelize using TLS!
Not really
  Calls into DBMS invoke complex operations
  Ogg needs to do some work
Many operations in DBMS are parallel
  Not written with TLS in mind!
![Page 114: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/114.jpg)
114
Resource Management
Mutexes acquired and released
Locks locked and unlocked
Cursors pushed and popped from free stack
Memory allocated and freed
Buffer pool entries acquired and released
![Page 115: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/115.jpg)
115
Mutexes: Deadlock?
Problem:
  Re-ordered acquire/release operations!
  Possibly introduced deadlock?
Solutions:
  Avoidance: static acquire order
  Recovery: detect deadlock and violate
![Page 116: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/116.jpg)
116
Locks
Like mutexes, but:
  Allows multiple readers
  No memory overhead when not held
  Often held for much longer
Treat similarly to mutexes
![Page 117: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/117.jpg)
117
Cursors
Used for traversing B-trees
Pre-allocated, kept in pools
![Page 118: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/118.jpg)
118
Maintaining Cursor Pool
Get
Use
Release
head
![Page 119: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/119.jpg)
119
Maintaining Cursor Pool
Get
Use
Release
head
![Page 120: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/120.jpg)
120
Maintaining Cursor Pool
Get
Use
Release
head
![Page 121: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/121.jpg)
121
Maintaining Cursor Pool
Get
Use
Release
head
Get
Use
Release
Violation!
![Page 122: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/122.jpg)
122
Parallelizing Cursor Pool
Use per-CPU pools:
  Modify code: each CPU gets its own pool
  No sharing == no violations!
  Requires cpuid() instruction
[Figure: two independent pools, each with its own head and its own Get / Use / Release sequence.]
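A per-CPU cursor pool could look roughly like the sketch below. It is an illustration under stated assumptions, not the DBMS's code: the pool sizes, `struct cursor`, `cursor_pools_init`, `cursor_get`, and `cursor_release` are hypothetical, and the CPU number is passed in as a plain argument where real code would obtain it from something like the `cpuid()` instruction the slide mentions.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CPUS 4          /* hypothetical */
#define CURSORS_PER_CPU 8   /* hypothetical */

struct cursor {
    struct cursor *next;    /* link in the free stack */
};

struct cursor_pool {
    struct cursor *head;    /* top of this CPU's free stack */
    struct cursor slots[CURSORS_PER_CPU];
};

/* One pool per CPU: epochs on different CPUs never touch the same head. */
static struct cursor_pool pools[NUM_CPUS];

void cursor_pools_init(void)
{
    for (int cpu = 0; cpu < NUM_CPUS; cpu++) {
        pools[cpu].head = NULL;
        for (int i = 0; i < CURSORS_PER_CPU; i++) {
            pools[cpu].slots[i].next = pools[cpu].head;
            pools[cpu].head = &pools[cpu].slots[i];
        }
    }
}

/* Pop a cursor from the calling CPU's pool; NULL if the pool is empty. */
struct cursor *cursor_get(int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    struct cursor *c = pools[cpu].head;
    if (c != NULL)
        pools[cpu].head = c->next;
    return c;
}

/* Push a cursor back onto the calling CPU's pool. */
void cursor_release(int cpu, struct cursor *c)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    assert(c != NULL);
    c->next = pools[cpu].head;
    pools[cpu].head = c;
}
```

The shared `head` pointer was the source of the dependence; with one head per CPU, a get or release on one CPU writes nothing that another CPU reads.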
![Page 123: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/123.jpg)
123
Memory Allocation
Problem:
  malloc() metadata causes dependences
Solutions:
  Per-CPU memory pools
  Parallelized free list
![Page 124: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/124.jpg)
124
The Log
Append records to global log
Appending causes dependence
Can’t parallelize:
  Global log sequence number (LSN)
Generate log records in buffers
Assign LSNs when homefree
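The "buffer now, number later" idea can be sketched as follows. This is a minimal illustration, assuming each epoch keeps a private record buffer and that epochs become homefree in program order; `struct epoch_log`, `epoch_log_append`, `epoch_log_publish`, and the record layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RECS 16  /* hypothetical per-epoch buffer size */

struct log_rec {
    unsigned long lsn;  /* 0 until the epoch is homefree */
    int payload;        /* stand-in for the real record body */
};

/* Private to one epoch, so appending creates no cross-epoch dependence. */
struct epoch_log {
    struct log_rec recs[MAX_RECS];
    size_t n;
};

/* The only shared state: touched solely by the homefree epoch. */
static unsigned long next_lsn = 1;

/* Speculative phase: write into the private buffer, no LSN yet. */
int epoch_log_append(struct epoch_log *b, int payload)
{
    assert(b != NULL);
    if (b->n == MAX_RECS)
        return -1;  /* buffer full */
    b->recs[b->n].lsn = 0;
    b->recs[b->n].payload = payload;
    b->n++;
    return 0;
}

/* Homefree phase: hand out LSNs in order for the buffered records. */
void epoch_log_publish(struct epoch_log *b)
{
    assert(b != NULL);
    for (size_t i = 0; i < b->n; i++)
        b->recs[i].lsn = next_lsn++;
}
```

A violated epoch can simply reset its buffer (`n = 0`), since nothing it wrote was ever given an LSN.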
![Page 125: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/125.jpg)
125
B-Trees
Leaf pages contain free space counts
Inserts of random records: o.k.
Inserting adjacent records
  Dependence on decrementing count
Page splits
  Infrequent
![Page 126: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/126.jpg)
126
Other Dependences
Statistics gathering
Error checks
False sharing
![Page 127: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/127.jpg)
127
Related Work
Lots of work in TLS:
  Multiscalar (Wisconsin)
  Hydra (Stanford)
  IACOMA (Illinois)
  RAW (MIT)
Hand parallelizing using TLS:
  Manohar Prabhu and Kunle Olukotun (PPoPP’03)
![Page 128: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/128.jpg)
Any questions?
![Page 129: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/129.jpg)
129
Why is this a problem?
B-tree insertion into ORDERLINE table
  Key is ol_n
  DBMS does not know that keys will be sequential
  Each insert usually updates the same btree page
for(ol_n=0; ol_n<15; ol_n++) {
  INSERT into ORDERLINE (ol_n, ol_item, ol_cnt)
    VALUES (:ol_n, :ol_item, :ol_cnt);
}
![Page 130: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/130.jpg)
130
Sequential Btree Inserts
[Figure: snapshots of one B-tree leaf page with six slots and a counter. As sequential inserts arrive, the counter reads 1, 2, 3, 4 and the same number of slots change from "free" to "item".]
![Page 131: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/131.jpg)
132
Outline
Store state in L2 instead of L1
Reversible atomic operations
Tolerate dependences
  Aggressive update propagation (implicit forwarding)
  Sub-epochs
Results and analysis
![Page 132: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/132.jpg)
133
Outline
Store state in L2 instead of L1
Reversible atomic operations
Tolerate dependences
  Aggressive update propagation (implicit forwarding)
  Sub-epochs
Results and analysis
![Page 133: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/133.jpg)
134
Tolerating dependences
Aggressive update propagation
  Get for free!
Sub-epochs
  Periodically checkpoint epochs
  Every N instructions?
    Picking N may be interesting
    Perhaps checkpoints could be set before the location of previous violations?
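The effect of sub-epoch checkpoints can be sketched in a few lines of C. This is a minimal model, assuming a checkpoint is taken every n instructions starting at offset 0; the slides leave the choice of N open, and the function names here are hypothetical rather than part of any TLS interface.

```c
#include <assert.h>

/* Instruction offset of the latest checkpoint at or before a violation,
 * assuming one checkpoint every n instructions starting at offset 0. */
unsigned long rollback_point(unsigned long violation_at, unsigned long n)
{
    assert(n > 0);
    return (violation_at / n) * n;
}

/* Instructions that must be re-executed after rewinding to that checkpoint. */
unsigned long replayed_work(unsigned long violation_at, unsigned long n)
{
    return violation_at - rollback_point(violation_at, n);
}
```

With an illustrative n of 5000 and a violation at instruction 12345, the epoch rewinds to 10000 and replays 2345 instructions. Without sub-epochs it would restart from 0 and replay all 12345.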
![Page 134: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/134.jpg)
135
Outline
Store state in L2 instead of L1
Reversible atomic operations
Tolerate dependences
  Aggressive update propagation (implicit forwarding)
  Sub-epochs
Results and analysis
![Page 135: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/135.jpg)
136
Why not faster?
Possible reasons:
  Idle CPUs
  RAO mutexes
  Violations
  Cache effects
  Data dependences
![Page 136: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/136.jpg)
137
Why not faster?
Possible reasons:
  Idle CPUs
    9 epochs/region average
    Two bundles of four and one of one
    ¼ of CPU cycles wasted!
  RAO mutexes
  Violations
  Cache effects
  Data dependences
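The ¼ figure follows from simple arithmetic, sketched below in C. The sketch assumes all epochs take the same time and run in lockstep bundles of up to `cpus` epochs, which is a simplification; the function names are made up for illustration.

```c
#include <assert.h>

/* Bundles (rounds) needed to run `epochs` equal-length epochs on `cpus` CPUs. */
unsigned rounds_needed(unsigned epochs, unsigned cpus)
{
    assert(cpus > 0);
    return (epochs + cpus - 1) / cpus;   /* ceiling division */
}

/* CPU slots left idle across all of those rounds. */
unsigned idle_slots(unsigned epochs, unsigned cpus)
{
    return rounds_needed(epochs, cpus) * cpus - epochs;
}
```

For 9 epochs on 4 CPUs this gives 3 rounds, so 12 CPU slots of which 3 sit idle: 3/12 = ¼ of the cycles.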
![Page 137: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/137.jpg)
138
Why not faster?
Possible reasons:
  Idle CPUs
  RAO mutexes
    Not implemented yet
    Oops!
  Violations
  Cache effects
  Data dependences
![Page 138: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/138.jpg)
139
Why not faster?
Possible reasons:
  Idle CPUs
  RAO mutexes
  Violations
    21/969 epochs violated
    Distance 1 “magic synchronized”
    2.2 Mcycles (over 4 CPUs)
      About 1.5%
  Cache effects
  Data dependences
![Page 139: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/139.jpg)
140
Why not faster?
Possible reasons:
  Idle CPUs
  RAO mutexes
  Violations
  Cache effects
    Deserves its own slide.
  Data dependences
![Page 140: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/140.jpg)
141
Cache effects of speculation
Only 20% of references are speculative!
Speculative references have small impact on non-speculative hit rate (<1%)
Speculative refs miss a lot in L1
  9-15% for reads, 2-6% for writes
L2 saw HUGE increase in traffic
  152k refs to 3474k refs
  Spec/non-spec lines are thrashing from L1s
![Page 141: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/141.jpg)
142
Why not faster?
Possible reasons:
  Idle CPUs
  RAO mutexes
  Violations
  Cache effects
  Data dependences
    Oh yeah!
    B-tree item count
    Split up B-tree insert?
      alloc and write
      Do alloc as RAO
      Needs more thought
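One way to picture the proposed split is the toy page below, written in C. It is only a sketch under strong assumptions: a real B-tree page keeps its items sorted and is far more involved, and `struct page`, `page_alloc` and `page_write` are invented names, not BerkeleyDB code. The point is that only the allocation step touches the shared item count.

```c
#include <assert.h>

#define PAGE_SLOTS 16

struct page {
    int count;                 /* shared item count: the source of the dependence */
    int items[PAGE_SLOTS];
};

/* Step 1: reserve a slot. This is the only step that reads and writes count,
 * so it is the candidate for a reversible atomic operation (RAO). */
int page_alloc(struct page *p)
{
    assert(p->count < PAGE_SLOTS);
    return p->count++;
}

/* Step 2: fill the reserved slot. Touches only that slot. */
void page_write(struct page *p, int slot, int item)
{
    assert(slot >= 0 && slot < PAGE_SLOTS);
    p->items[slot] = item;
}
```

Once two inserts have their slots, the writes can happen in either order without touching `count` again.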
![Page 142: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/142.jpg)
143
L2 Cache Line
[Figure: L2 cache line layout. Labels: Fine Grained SM, Valid, Dirty, Exclusive, LRU, Tag, Data, with SL and SM bits shown for CPU1 and CPU2, across Set 1 and Set 2.]
![Page 143: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/143.jpg)
144
Why are you here?
Want faster database systems
Have funky new hardware – Thread Level Speculation (TLS)
How can we apply TLS to database systems?
Side question: Is this a VLDB or an ASPLOS talk?
![Page 144: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/144.jpg)
145
How?
Divide transaction into TLS-threads
Run TLS-threads in parallel; maintain sequential semantics
Profit!
![Page 145: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/145.jpg)
146
Why parallelize transactions?
Decrease transaction latency
Increase concurrency while avoiding concurrency control bottleneck
  A.k.a.: use more CPUs, same # of transactions
The obvious: Database performance matters
![Page 146: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/146.jpg)
147
Shopping List
What do we need? (research scope)
Cheap hardware
  Thread Level Speculation (TLS)
  Minor changes allowed.
Important database application
  TPC-C
  Almost no changes allowed!
Modular database system
  BerkeleyDB
  Some changes allowed.
![Page 147: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/147.jpg)
148
Outline
TLS Hardware
The Benchmark (TPC-C)
Changing the database system
Results
Conclusions
![Page 149: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/149.jpg)
150
What’s new?
Database operations are:
  Large
  Complex
Large TLS-threads
Lots of dependences
Difficult to analyze
Want: Programmer optimization effort = faster program
![Page 150: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/150.jpg)
151
Hardware changes summary
Must tolerate dependences
  Prediction?
  Implicit forwarding?
May need larger caches
May need larger associativity
![Page 151: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/151.jpg)
152
Outline
TLS Hardware
The Benchmark (TPC-C)
Changing the database system
Results
Conclusions
![Page 152: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/152.jpg)
153
Parallelization Strategy
1. Pick a benchmark
2. Parallelize a loop
3. Analyze dependences
4. Optimize away dependences
5. Evaluate performance
6. If not satisfied, goto 3
![Page 153: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/153.jpg)
154
Outline
TLS Hardware
The Benchmark (TPC-C)
Changing the database system
  Resource management
  The log
  B-trees
  False sharing
Results
Conclusions
![Page 154: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/154.jpg)
155
Outline
TLS Hardware
The Benchmark (TPC-C)
Changing the database system
Results
Conclusions
![Page 155: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/155.jpg)
156
Results
Viola simulator:
  Single CPI
  Perfect violation prediction
  No memory system
  4 CPUs
  Exhaustive dependence tracking
Currently working on an out-of-order superscalar simulation (cello)
10 transaction warm-up
Measure 100 transactions
![Page 156: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/156.jpg)
157
Outline
TLS Hardware
The Benchmark (TPC-C)
Changing the database system
Results
Conclusions
![Page 157: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/157.jpg)
158
Conclusions
TLS can improve transaction latency
Violation predictors important
  Iff dependences must be tolerated
TLS makes hand parallelizing easier
![Page 158: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/158.jpg)
159
Improving Database Performance
How to improve performance:
  Parallelize transaction
  Increase number of concurrent transactions
Both of these require independence of database operations!
![Page 159: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/159.jpg)
160
Case Study: New Order (TPC-C)
Begin transaction {
} End transaction
![Page 160: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/160.jpg)
161
Case Study: New Order (TPC-C)
Begin transaction {
  Read customer info (customer, warehouse)
  Read & increment order # (district)
  Create new order (orders, neworder)
} End transaction
![Page 161: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/161.jpg)
162
Case Study: New Order (TPC-C)
Begin transaction {
  Read customer info (customer, warehouse)
  Read & increment order # (district)
  Create new order (orders, neworder)
  For each item in order {
    Get item info (item)
    Decrement count in stock (stock)
    Record order info (orderline)
  }
} End transaction
![Page 162: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/162.jpg)
163
Case Study: New Order (TPC-C)
Begin transaction {
  Read customer info (customer, warehouse)
  Read & increment order # (district)
  Create new order (orders, neworder)
  For each item in order {
    Get item info (item)
    Decrement count in stock (stock)
    Record order info (orderline)
  }
} End transaction
Parallelize this loop
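As a rough illustration of what gets parallelized, here is a toy version of the item loop in C. It is a sketch only: the arrays stand in for the stock and orderline tables, the "get item info" step is omitted, and every name is hypothetical rather than taken from TPC-C or BerkeleyDB code.

```c
#include <assert.h>

#define NITEMS 8
#define MAX_LINES 16

struct order_line { int item; int qty; };

static int stock[NITEMS];                      /* stand-in for the stock table */
static struct order_line orderline[MAX_LINES]; /* stand-in for the orderline table */

/* Body of one loop iteration: the unit that would become one TLS thread. */
void process_item(int ol_n, int item, int qty)
{
    assert(item >= 0 && item < NITEMS);
    assert(ol_n >= 0 && ol_n < MAX_LINES);
    stock[item] -= qty;               /* decrement count in stock */
    orderline[ol_n].item = item;      /* record order info */
    orderline[ol_n].qty = qty;
}

/* Sequential form of the loop; TLS would run the iterations speculatively
 * while preserving this sequential result. */
void new_order_items(const int *items, const int *qtys, int n)
{
    for (int ol_n = 0; ol_n < n; ol_n++)
        process_item(ol_n, items[ol_n], qtys[ol_n]);
}
```

If two iterations name the same item, both update the same stock entry. That is the kind of cross-iteration dependence the hardware has to detect or tolerate.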
![Page 164: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/164.jpg)
165
Implementing on a Real DB
Using BerkeleyDB
“Table” == “Database”
Give the database an arbitrary key and it returns arbitrary data (bytes)
Use structs for keys and rows
Database provides ACID through:
  Transactions
  Locking (page level)
  Storage management
Provides indexing using B-trees
![Page 166: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/166.jpg)
167
Parallelizing a Transaction
For each item in order {
  Get item info (item)
  Decrement count in stock (stock)
  Record order info (order line)
}
• Get cursor from pool
• Use cursor to traverse B-tree
• Find row, lock page for row
• Release cursor to pool
![Page 167: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/167.jpg)
168
Maintaining Cursor Pool
[Figure: a cursor pool with a head pointer; a thread performs Get, Use, Release.]
![Page 170: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/170.jpg)
171
Maintaining Cursor Pool
[Figure: two threads each perform Get, Use, Release on the pool's shared head.]
Violation!
![Page 171: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/171.jpg)
172
Parallelizing Cursor Pool 1
Use per-CPU pools:
  Modify code: each CPU gets its own pool
  No sharing == no violations!
  Requires cpuid() instruction
[Figure: two separate pools, each with its own head; each thread performs Get, Use, Release on its own pool.]
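A per-CPU pool can be sketched in C as one free list per CPU. This is a minimal illustration, not BerkeleyDB's cursor code: `struct cursor` is reduced to a link field, `NCPUS` is an assumed constant, and the `cpu` argument stands in for whatever the cpuid() instruction would return.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS 4

struct cursor {
    struct cursor *next;   /* free-list link; real cursor fields omitted */
};

/* One free list per CPU, so a thread only ever touches its own head. */
static struct cursor *pool_head[NCPUS];

/* Pop a cursor from this CPU's pool, or return NULL if the pool is empty. */
struct cursor *cursor_get(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    struct cursor *c = pool_head[cpu];
    if (c != NULL)
        pool_head[cpu] = c->next;
    return c;
}

/* Push a cursor back onto this CPU's pool. */
void cursor_release(int cpu, struct cursor *c)
{
    assert(cpu >= 0 && cpu < NCPUS);
    c->next = pool_head[cpu];
    pool_head[cpu] = c;
}
```

Because a get or release on CPU 0 never reads or writes `pool_head[1]`, threads on different CPUs share no pool state and cannot violate each other through the pool.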
![Page 172: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/172.jpg)
173
Parallelizing Cursor Pool 2
Dequeue and enqueue: atomic and unordered
Delay enqueue until end of thread
  Forces separate pools
  Avoids modification of data struct
[Figure: two threads perform Get, Use, Release against one pool head.]
![Page 173: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/173.jpg)
174
Parallelizing Cursor Pool 3
Atomic unordered dequeue & enqueue
Cursor struct is “TLS unordered”
Struct defined as a byte range in memory
[Figure: four threads perform Get, Use, Release against one pool head.]
![Page 174: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/174.jpg)
175
Parallelizing Cursor Pool 4
Mutex protect dequeue & enqueue; declare pointer to cursor struct to be “TLS unordered”
Any access through pointer does not have TLS applied
Pointer is tainted, any copies of it keep this property
![Page 175: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/175.jpg)
176
Problems with 3 & 4
What exactly is the boundary of a structure?
How do you express the concept of object in a loosely-typed language like C?
A byte range or a pointer is only an approximation.
Dynamically allocated sub-components?
![Page 176: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/176.jpg)
177
Mutexes in a TLS world
Two types of threads:
  “real” threads
  TLS threads
Two types of mutexes:
  Inter-real-thread
  Inter-TLS-thread
![Page 177: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/177.jpg)
178
Inter-real-thread Mutexes
Acquire == get mutex for all TLS threads
Release == release for current TLS thread
  May still be held by another TLS thread!
![Page 178: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/178.jpg)
179
Inter-TLS-thread Mutexes
Should never interact between two real threads
Implies no TLS ordering between TLS threads while mutex is held
But what to do on a violation?
  Can’t just throw away changes to memory
  Must undo operations performed in critical section
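Undoing a critical section can be pictured with a small undo log, sketched here in C for a pool dequeue. This is an assumption-laden illustration: the mutex itself is elided, the log is a fixed-size array, and `pool_get` and `pool_undo` are invented names, not the mechanism used in the talk.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UNDO 8

struct node { struct node *next; };
struct pool { struct node *head; };

/* Undo record for one epoch: the nodes it has dequeued so far. */
struct undo_log {
    struct node *taken[MAX_UNDO];
    int n;
};

/* Dequeue (under the elided mutex) and remember the node for a possible undo. */
struct node *pool_get(struct pool *p, struct undo_log *log)
{
    struct node *c = p->head;
    if (c != NULL) {
        assert(log->n < MAX_UNDO);
        p->head = c->next;
        log->taken[log->n++] = c;
    }
    return c;
}

/* On a violation, return everything this epoch dequeued, newest first. */
void pool_undo(struct pool *p, struct undo_log *log)
{
    while (log->n > 0) {
        struct node *c = log->taken[--log->n];
        c->next = p->head;
        p->head = c;
    }
}
```

The dequeue escapes TLS ordering, so discarding speculative memory would not put the node back; the explicit undo does.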
![Page 179: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/179.jpg)
180
Parallelizing Databases using TLS
Split transactions into threads
Threads created are large
  60k+ instructions
  16kB of speculative state
  More dependences between threads
How do we design a machine which can handle these large threads?
![Page 180: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/180.jpg)
181
The “Old” Way
[Figure: two groups of four processors (P), each processor with its own L1 cache and each group sharing an L2 cache, above an L3 cache and the memory system. Labels: Committed state, Speculative state.]
![Page 181: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/181.jpg)
182
The “Old” Way
Advantages
  Each epoch has its own L1 cache
  Epoch state does not intermix
Disadvantages
  L1 cache is too small!
  Full cache == dead meat
  No shared speculative memory
![Page 182: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/182.jpg)
183
The “New” Way
L2 cache is huge!
  State of the art in caches, Power5:
    1.92MB 10-way L2
    32kB 4-way L1
Shared speculative memory “for free”
Keeps TLS logic off of the critical path
![Page 183: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/183.jpg)
184
TLS Shared L2 Design
L1: write-through, write-no-allocate [CullerSinghGupta99]
  Easy to understand and reason about
  Writes visible to L2 – simplifies shared speculative memory
L2 cache: shared cache architecture with replication
Rest of memory: distributed TLS coherence
![Page 184: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/184.jpg)
185
TLS Shared L2 Design
[Figure: two groups of four processors (P), each processor with its own L1 cache and each group sharing an L2 cache, above an L3 cache and the memory system. Labels: “Real” speculative state, Cached speculative state.]
Explain from the top down
![Page 186: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/186.jpg)
Part II: Dealing with dependences
![Page 187: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/187.jpg)
188
Predictor Design
How do you design a predictor that:
  Identifies violating loads
  Identifies the last store that causes them
  Only triggers when they cause a problem
  Has very very high accuracy
???
![Page 188: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/188.jpg)
189
Sub-epoch design
Like checkpointing
Leave “holes” in epoch # space
Every 5k instructions start a new epoch
Uses more cache to buffer changes
  More strain on associativity/victim cache
Uses more epoch contexts
![Page 189: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/189.jpg)
190
Summary
Supporting large epochs needs:
  Buffer state in L2 instead of L1
  Shared speculative memory
  Replication
  Victim cache
  Sub-epochs
![Page 190: Optimistic Intra-Transaction Parallelism using Thread Level Speculation Chris Colohan 1, Anastassia Ailamaki 1, J. Gregory Steffan 2 and Todd C. Mowry](https://reader036.vdocument.in/reader036/viewer/2022062421/56649e1f5503460f94b0aa84/html5/thumbnails/190.jpg)
Any questions?