transaction chains: achieving serializability with low-latency in geo-distributed storage systems
DESCRIPTION
Transaction chains: achieving serializability with low-latency in geo-distributed storage systems. Yang Zhang Russell Power Siyuan Zhou Yair Sovran *Marcos K. Aguilera Jinyang Li. New York University *Microsoft Research Silicon Valley. Why geo-distributed storage?. - PowerPoint PPT PresentationTRANSCRIPT
Transaction chains: achieving serializability with low-latency in geo-distributed storage systems
Yang Zhang Russell Power Siyuan ZhouYair Sovran *Marcos K. Aguilera Jinyang Li
New York University *Microsoft Research Silicon Valley
Large-scale Web applications
Why geo-distributed storage?
Geo-distributed storage
Replication
Geo-distribution is hard
Low latency:O(Intra-datacenter RTT)
Strong semantics:relational tables w/
transactions
?Low latency
Key/value only
Limited forms of transaction
General transaction
Prior workStrictserializable
Serializable
Eventual
Variousnon-serializable
High latency
Provably high latency according to CAP
Spanner [OSDI’12]
Dynamo [SOSP’07]
COPS [SOSP’11]
Walter [SOSP’11]
Eiger [NSDI’13]
Our work
Our contributions1. A new primitive: transaction chain– Allow for low latency, serializable transactions
2. Lynx geo-storage system: built with chains– Relational tables– Secondary indices, materialized join views
Talk Outline• Motivation• Transaction chains• Lynx• Evaluation
Why transaction chains?
Bidder Item Price Seller Item Highest bidBids Items
Alice Book $100
Bob Book $20
Alice iPhone $20
Bob
Datacenter-1 Datacenter-2
Alice
Bob Camera $100
Auction service
Why transaction chains?
Alice’s BidsAlice Book $100
Bob
Datacenter-1 Datacenter-2
AliceBob Camera $100
Bob’s Items
1. Insert bid to Alice’s Bids
2. Update highest bid on Bob’s Items
Operation: Alice bids on Bob’s camera
1. Insert bid to Alice’s Bids
Why transaction chains?
Alice’s BidsAlice Book $100
Bob
Datacenter-1 Datacenter-2
AliceBob Camera $100
Bob’s Items
2. Update highest bid on Bob’s Items
Operation: Alice bids on Bob’s camera
1. Insert bid to Alice’s Bids
Low latency with first-hop return
Alice’s BidsAlice Book $100
Bob
Datacenter-1 Datacenter-2
Alice
Bob Camera $100Bob’s Items
bid on Bob’s camera
Alice Camera $500
$500
Problem: what if chains fail?
1. What if servers fail after executing first-hop?
2. What if a chain is aborted in the middle?
Solution: provide all-or-nothing atomicity
1. Chains are durably logged at first-hop– Logs are replicated to another closest data center– Chains are re-executed upon recovery
2. Chains allow user-aborts only at first hop
• Guarantee: First hop commits all hops eventually commit
Problem: non-serializable interleaving• Concurrent chains ordered inconsistently at different hops
X=1 Y=1
X=2 Y=2
Time
T1
T2
Server-X: T1 < T2 Server-Y: T2 < T1
Not serializable!
T2 T1
• Traditional 2PL+2PC prevents non-serializable interleaving at the cost of high latency
Conflict?
Solution: detect non-serializable interleaving via static analysis
• Statically analyze all chains to be executed– Web applications invoke fixed set of operations
X=1 Y=1
X=2 Y=2
Serializable if no SC-cycle [Shasha et. al TODS’95]
A SC-cycle has both red and blue edges
T1
T2
Outline• Motivation• Transaction chains• Lynx’s design• Evaluation
How Lynx uses chains
• User chains: used by programmers to implement application logic
• System chains: used internally to maintain– Secondary indexes– Materialized join views– Geo-replicas
Example: secondary index
Bob Car $20Alice Book $20
Bob Camera $100Alice iPhone $100
Bidder Item PriceBids (base table)
Alice Camera $100
Bob iPhone $20
Bidder Item PriceBids (secondary index)
Alice Camera $100
Bob Car $20
Example user and system chain
Alice Book $100
Bob
Datacenter-1 Datacenter-2
Alice
Bob Camera $100
bid on Bob’s camera
Alice Camera $100
Insert to Bids table
Update Items table
Lynx statically analyzes all chains beforehand
Put-bid
Read-bids
Put-bidInsert to Bids table
Update Items table
Read-bids
SC-cycleOne solution: execute chain as a distributed transaction
Read Bids table
Read Bids table
Insert to Bids table
Update Items table
SC-cycle source #1: false conflicts in user chains
Put-bid
Insert to Bids table
Update Items tablePut-bid
False conflict because max(bid, current_price)
commutes
Insert to Bids table
Update Items table
Solution: users annotate commutativity
Put-bid
Insert to Bids table
Update Items tablePut-bid
com
mut
es
SC-cycle source #2: system chains
Insert to Bids table
…Put-bid
Insert to Bids table
…Put-bid
Insert to Bids-secondary
Insert to Bids-secondary
SC-cycle
Solution: chains provide origin-ordering• Observation: conflicting system chains originate at the
same first hop server.
Both write the same row of Bids table
• Origin-ordering: if chains T1 < T2 at same first hop, then T1 < T2 at all subsequent overlapping hops.– Can be implemented cheaply sequence number vectors
T1
Insert to Bids table
Insert to Bids-secondary
T2
Insert to Bids table
Insert to Bids-secondary
Limitations of Lynx/chains1. Chains are not strictly serializable, only serializable.2. Programmers can abort only at first hop
• Our application experience: limitations are managable
Outline• Motivation• Transaction chains• Lynx’s design• Evaluation
Simple Twitter Clone on Lynx
Author Tweet
Tweets
Alice New York rocks
From To
Follow-Graph
Alice Bob
Alice Eve
Bob Time to sleep
To From
Follow-Graph (secondary)
Bob Alice
Bob Clark
Geo-replicated
Geo-replicated
Author(=to)
From Tweet
Bob Alice Time to sleep
Eve Alice Hi there
Tweets JOIN Follow-Graph (Timeline)
Eve Hi there
Experimental setup
us-west
europe
us-east
82ms
153ms
102ms
Lynx protoype:• In-memory database• Local disk logging only.
Returning on first-hop allows low latency
Follow-user Post-tweet Follow-user Post-tweet Read-timeline0
50
100
150
200
250
300
174
252
3.2 3.1 3.1
Late
ncy
(ms)
First hop return
Chain completion
Applications achieve good throughput
Follow-User Post-Tweet Read-Timeline0
200000
400000
600000
800000
1000000
1200000
1400000
1600000
184000 173000
1350000
Mill
ion
ops/
sec
Related work• Transaction decomposition– SAGAS [SIGMOD’96], step-decomposed transactions
• Incremental view maintenance– Views for PNUTS [SIGMOD’09]
• Various geo-distributed/replicated storage– Spanner[OSDI’12], MDCC[Eurosys’13],
Megastore[CIDR’11], COPS [SOSP’11], Eiger[NSDI’13], RedBlue[OSDI’12].
Conclusion• Chains support serializability at low latency– With static analysis of SC-cycles
• Key techniques to reduce SC-cycles– Origin ordering– Commutative annotation
• Chains are useful – Performing application logic – Maintaining indices/join views/geo-replicas
Limitations of Lynx/chains1. Chains are not strict serializable
Time
Remedies: – Programmers can wait for chain completion– Lynx provides read-your-own-writes
2. Programmers can only abort at first hop• Our application experience shows the limitations are managable
Serializable Strict serializable
2PC and chainsThe easy way
W(A)
R(A)
W(B)
W(A) W(B)
R(A)
2PC-W(AB)
R(A)
R(A)
T1
T2
T2
T1
T2
T1
T1
2PC and chainsThe hard way
W(A)
R(A) R(B)
W(B)
W(A) W(B)
R(A) R(B)
2PC-W(AB)
R(A) R(B)
R(A) R(B)
T1
T2
T2
T1
T2
T1
T1
2PC and chainsThe hard way
Chain
DC1 DC2 DC3 DC4
A B C D
2PC retry
Parallelunlock
Lynx is scalable
1 2 4 80
500
1000
1500
2000
2500
3000
48 93 184374
42 86 173356265
586
1350
2770
FollowTweetTimeline
#Servers per DC
QPS
(K/
s)
1. Insert bid into bid history 2. Update max price on item
1. Insert bid into bid history 2. Update max price on item
T1
T2
Conflict onbid history
Conflict onitem
SC-cycle Not serializable
Challenge of static analysis: false conflict
Solution: communitivity annotations
1. Insert bid into bid history 2. Update max price on item
1. Insert bid into bid history 2. Update max price on item
T1
T2
Conflict onbid history
Commutativeoperation
No SC-cycle Serializable
Conflict onitem
No real conflict because bid ids
are unique
Updating max commutes
Commutativeoperation
ACID: all-or-nothing atomicity• Chain’s failure guarantee:– If the first hop of a chain commits, then all hops
eventually commit• Users are only allowed to abort a chain in the first hop
• Achievable with low latency:– Log chains durably at the first hop• Logs replicated to a nearby datacenter
– Re-execute stalled chains upon failure recovery
ACID: serializability• Serializability– Execution result appears as if obey a serial order
for all transactions– No restrictions on the serial order
Ordering 1 Ordering 2
Transactions
Problem #2: unsafe interleaving• Serializability– Execution result appears as if obey a serial order
for all transactions– No restrictions on the serial order
Ordering 1 Ordering 2
Transactions
Chains are not linearizable• Serializability• Linearability
Ordering 1 Ordering 2
Transactions
Time
Linearizable
a total ordering of chains a total ordering of chains
& total order obeys the issue order
Transaction chains: recap• Chains provide all-or-nothing atomicity• Chains ensure serializability via static analysis• Practical challenges:– How to use chains?– How to avoid SC-cycles?
Example user chain
Bidder Item PriceBids
Alice Camera 100
1. Insert bid into Alice’s bid history
Alice Bob
Seller Item HighestItems
Bob CameraBob Camera 100
2. Update max price on Bob’s camera
Lynx implementation
• 5000 lines C++ and 3500 lines RPC library• Uses an in-memory key/value store• Support user chains in Javascript (via V8)
Geo-distributed storage is hard• Applications demand simplicity & performance– Friendly programming model
• Relational tables• Transactions
– Fast response• Ideally, operation latency = O(intra-datacenter RTT)
• Geo-distribution leads to high latency– Coordinate data access across datacenters
• Operation latency = O(inter-datacenter RTT) = O(100ms)