7/24/2019 Dude Where is My Data
http://slidepdf.com/reader/full/dude-where-is-my-data 1/28
USENIX LISA San Diego, CA December 13, 2012
Dude, Where's My Data?
Jeff Darcy
gluster.org
The Problem
● Compute cycles are everywhere
● Your data isn't
● It's easy to move computation, then find that there's no data for it to work on
Big Data: V³
[Figure: Volume, Velocity, Variety]
Volume Velocity Variety
● Volume: total data (TB)
  ● affects initial setup (or full-resync) time
  ● bandwidth problem
● Velocity: rate of change (TB/hour, files/hour)
  ● affects ongoing bandwidth need
  ● bandwidth and latency problem
● Variety: data "shape"
  ● file- and directory-size distribution, sparseness, extended attributes, even contents
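Volume and velocity reduce to simple bandwidth arithmetic. The sketch below is a minimal illustration under stated assumptions: decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), a single link with constant throughput, and no compression. The function names are illustrative and do not come from any tool.

```python
def full_resync_hours(volume_tb: float, link_mb_per_s: float) -> float:
    """Volume: time to copy everything once (initial setup or full resync)."""
    volume_bytes = volume_tb * 1e12
    link_bytes_per_s = link_mb_per_s * 1e6
    return volume_bytes / link_bytes_per_s / 3600


def link_keeps_up(velocity_tb_per_hour: float, link_mb_per_s: float) -> bool:
    """Velocity: the link must carry changes at least as fast as they are made."""
    change_bytes_per_s = velocity_tb_per_hour * 1e12 / 3600
    return link_mb_per_s * 1e6 >= change_bytes_per_s


if __name__ == "__main__":
    # Example values: 100 TB over a 10 MB/s link, and 50 GB/hour of churn on the same link
    print(f"full resync: {full_resync_hours(100, 10):.0f} hours")
    print("keeps up with churn:", link_keeps_up(0.05, 10))
```

With these example numbers, a full resync takes about 2,778 hours (roughly 116 days). Meanwhile, 50 GB/hour of churn (about 13.9 MB/s) already exceeds the 10 MB/s link, so the replica would keep falling further behind.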
Bigger Data: D³
[Figure: Distance, Domains, Divergence]
Distance Domains Divergence
● Distance: how far?
  ● across the river vs. across the world
● Domains: how many?
  ● two sites vs. four sites vs. hundreds of sites
  ● also separate security perimeters and policies
● Divergence: how similar?
  ● sync vs. async, ordered vs. unordered, conflict resolution, ...
Example: rsync
● Variety affects scanning rate
  ● delta comparison favors large files
● Sensitive to distance
  ● still need network round trips to compare checksums
● Hard to manage with many domains
  ● set up each connection separately, including parallel connections
● High divergence: scanning order, conflicts
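Delta comparison works like this: the receiver sends per-block checksums, and the sender slides a window over its own copy looking for blocks that match. That checksum exchange is also where the network round trips come from. The sketch below implements the weak rolling checksum described in the rsync technical report (Tridgell and Mackerras), where moving the window by one byte is an O(1) update. It is simplified in two ways. Real rsync confirms each weak match with a strong hash, and it skips ahead a full block after a hit. The block size used here is an arbitrary example value.

```python
from __future__ import annotations

M = 1 << 16  # both checksum halves are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Return (a, b): a = sum of bytes, b = position-weighted sum, both mod 2**16."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one byte forward in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def candidate_matches(old: bytes, new: bytes, n: int) -> list[tuple[int, int]]:
    """(offset in new, block index in old) pairs whose weak checksums agree."""
    table: dict[tuple[int, int], int] = {}
    for j in range(0, len(old) - n + 1, n):        # receiver side: fixed, non-overlapping blocks
        table.setdefault(weak_checksum(old[j:j + n]), j // n)
    if len(new) < n:
        return []
    matches = []
    a, b = weak_checksum(new[:n])                   # sender side: every byte offset
    for off in range(len(new) - n + 1):
        if off > 0:
            a, b = roll(a, b, new[off - 1], new[off + n - 1], n)
        if (a, b) in table:
            matches.append((off, table[(a, b)]))
    return matches


if __name__ == "__main__":
    # Two bytes inserted at the front: both old blocks are still found, shifted by 2.
    print(candidate_matches(b"abcdefgh", b"XXabcdefgh", 4))
```

Small files give this scheme little to work with. A file shorter than one block has no block to match at all, which is one reason the delta comparison favors large files.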
Initial Sync
● Transfer large files instead of small ones
  ● 10 MB/s ≈ 1 TB/day
  ● ~80-90ms RTT ≈ 1M ops/day (fewer files/day)
  ● copy tarballs or disk images, pack/unpack locally
● Transfer in parallel
  ● GridFTP, PFTP, BitTorrent (Murder @ Twitter)
● Aggressively pre-deploy replicas
  – ...or let a CDN do it for you
● Don't forget compression/deduplication
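The transfer-rate bullets above come from bandwidth and latency arithmetic. The model below is a deliberately crude sketch. It assumes each file costs a fixed number of serialized round trips (3 here, an arbitrary assumption) plus its bytes at full link speed, with no pipelining or parallelism. The 80 ms RTT is only an example value.

```python
SECONDS_PER_DAY = 86_400


def bytes_per_day(link_mb_per_s: float) -> float:
    """Bandwidth bound: what a fully busy link moves in a day."""
    return link_mb_per_s * 1e6 * SECONDS_PER_DAY


def ops_per_day(rtt_ms: float) -> float:
    """Latency bound: serialized round trips that fit in a day."""
    return SECONDS_PER_DAY / (rtt_ms / 1000)


def files_per_day(file_bytes: float, link_mb_per_s: float, rtt_ms: float,
                  round_trips_per_file: int = 3) -> float:
    """Files/day when each file pays its round trips plus its transfer time, one at a time."""
    per_file_s = round_trips_per_file * rtt_ms / 1000 + file_bytes / (link_mb_per_s * 1e6)
    return SECONDS_PER_DAY / per_file_s


if __name__ == "__main__":
    print(f"{bytes_per_day(10) / 1e12:.2f} TB/day at 10 MB/s")
    print(f"{ops_per_day(80):,.0f} ops/day at 80 ms RTT")
    for size in (4_096, 1_000_000_000):           # a 4 KB file vs. a 1 GB tarball
        gb = files_per_day(size, 10, 80) * size / 1e9
        print(f"{size:>13,} B files: {gb:,.1f} GB/day")
```

Under these assumptions, 4 KB files move only about 1.5 GB/day, while 1 GB tarballs move about 860 GB/day over the same link. This is why packing small files before the transfer, and unpacking them locally, pays off.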
OK, Now What?
[Figure: "Initial Sync" set against the instance/data lifetime]
Replication Semantics
[Figure: replication semantics as spectra: sync vs. async, ordered vs. unordered, continuous logging vs. periodic scanning, latency sensitive]
Replication Topologies
[Figure: three topologies: two sites (Boston, San Diego), then two diagrams with four sites (Boston, Iowa City, Ann Arbor, San Diego)]
Other Distinctions
● Directionality
  ● static master, floating master, peer to peer
● Migration and caching are replication too
  ● expressed vs. assumed interest
  ● partial
  ● expendable (not dependable)
Replication Lite
● Consider using an overlay/union FS
  ● Unionfs, AUFS, overlay mounts
● Each client has their own overlay on top of the same read-only base
● Ship overlay back home to apply (and resolve conflicts?) at leisure
● Free version history
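Here is a toy in-memory model of the idea. It is not how Unionfs, AUFS, or overlay mounts are implemented. The filesystem is a dict mapping path to contents. Each client writes only to its own upper layer, and deletions are recorded as whiteouts. The home site applies a shipped overlay later. The conflict rule used here, skipping any path that home has changed since the shared base, is an assumption chosen for illustration.

```python
from __future__ import annotations

WHITEOUT = object()  # sentinel: "deleted in this overlay"


class OverlayView:
    """One client's view: a private writable layer over a shared read-only base."""

    def __init__(self, base: dict[str, bytes]):
        self.base = base                       # shared snapshot, never modified here
        self.upper: dict[str, object] = {}     # this client's changes only

    def read(self, path: str) -> bytes | None:
        if path in self.upper:
            value = self.upper[path]
            return None if value is WHITEOUT else value
        return self.base.get(path)

    def write(self, path: str, data: bytes) -> None:
        self.upper[path] = data

    def delete(self, path: str) -> None:
        self.upper[path] = WHITEOUT


def apply_overlay(home: dict[str, bytes], base: dict[str, bytes],
                  upper: dict[str, object]) -> list[str]:
    """Apply a shipped overlay at home; skip and report paths home changed since `base`."""
    conflicts = []
    for path, value in upper.items():
        if home.get(path) != base.get(path):
            conflicts.append(path)             # changed on both sides: needs resolution
            continue
        if value is WHITEOUT:
            home.pop(path, None)
        else:
            home[path] = value
    return conflicts
```

Each upper layer holds only one client's changes. If old overlays are kept after they are applied, every one of them is a self-contained record of a set of edits, which is roughly where the "free version history" comes from.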
Sync vs. Async
● Synchronous replication
  ● divergence very small
    – still possible with errors
  ● performance limited by latency
● Asynchronous replication
  ● divergence can be quite large
    – conflict handling becomes most of the code
  ● performance limited by bandwidth
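The two performance limits can be compared in a simplified model. A synchronous writer issuing one write at a time waits at least one round trip per write. An asynchronous replica acknowledges locally instead, and accumulates a backlog (its divergence) whenever writes outpace the link. The model ignores pipelining, batching, and link latency on the asynchronous side; the rates are example values.

```python
from __future__ import annotations


def sync_max_serial_writes_per_s(rtt_ms: float) -> float:
    """Synchronous: each write is acknowledged only after the remote copy responds."""
    return 1000 / rtt_ms


def async_backlog_mb(writes_mb_per_s: list[float], link_mb_per_s: float) -> list[float]:
    """Asynchronous: unsent data after each second, given per-second write volumes."""
    backlog = 0.0
    history = []
    for written in writes_mb_per_s:
        backlog = max(0.0, backlog + written - link_mb_per_s)  # queue grows or drains
        history.append(backlog)
    return history


if __name__ == "__main__":
    print(sync_max_serial_writes_per_s(80), "serial writes/s at 80 ms RTT")
    # A 2-second burst at 30 MB/s over a 10 MB/s link, then idle
    print(async_backlog_mb([30, 30, 0, 0, 0, 0], 10))
```

With these inputs, the burst leaves up to 40 MB unsent. The replica converges only after the link drains that backlog, four seconds after the burst ends.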
Scanning vs. Logging
● Scanning negatives
  ● naive versions are slow and resource intensive
  ● even smart versions have high divergence
    – "many siblings" problem
    – often missing info for proper conflict resolution
● Logging negatives
  ● requires local buffer space
    – one more thing to provision/manage (or have fail)
  ● network interruptions still create divergence
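Both logging drawbacks show up even in a minimal change log. In this hypothetical sketch, `ChangeLog`, `send`, and `scan_everything` are illustrative names. The log has a fixed local capacity. Once it overflows, it can no longer say what changed, so the only safe fallback is a full scan. A failed `send` leaves the unshipped entries queued, so the replicas stay diverged until the network returns. The sketch ignores races between a rescan and new writes.

```python
from __future__ import annotations

from typing import Callable


class ChangeLog:
    """Ordered log of changed paths with a fixed local buffer."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pending: list[str] = []
        self.overflowed = False

    def record(self, path: str) -> None:
        if self.overflowed:
            return                      # already lost track; a rescan is coming anyway
        if len(self.pending) >= self.capacity:
            self.overflowed = True      # buffer full: the log is no longer complete
            self.pending.clear()
            return
        self.pending.append(path)

    def drain(self, send: Callable[[str], None], scan_everything: Callable[[], None]) -> None:
        """Ship changes in order; after an overflow, fall back to a full scan."""
        if self.overflowed:
            scan_everything()
            self.overflowed = False
            return
        while self.pending:
            send(self.pending[0])       # may raise on a network error
            self.pending.pop(0)         # drop an entry only after it was sent
```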
Improving on rsync (1 of 3)
● Wrap a script around it
  ● connection setup and credential management
  ● parallel streams
  ● continuous iteration
● Optimize scanning
  ● mark changes up toward root
  ● don't scan unchanged subtrees
  ● (next slide)
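Below is a minimal sketch of such a wrapper. It assumes SSH key authentication, and it runs one rsync per top-level subtree to get parallel streams. `rsync -a`, `--delete`, and `-e` (remote shell) are real rsync options. The function names, the per-subtree split, and the example host and key path are hypothetical. The `run` parameter exists so the scheduling logic can be exercised without a network.

```python
from __future__ import annotations

import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def rsync_command(src_root: str, dest_root: str, subtree: str, ssh_key: str) -> list[str]:
    """Mirror src_root/subtree to dest_root/subtree over SSH with a fixed key."""
    return ["rsync", "-a", "--delete", "-e", f"ssh -i {ssh_key}",
            f"{src_root}/{subtree}/", f"{dest_root}/{subtree}/"]


def sync_once(src_root, dest_root, subtrees, ssh_key, streams=4, run=subprocess.run):
    """Run one rsync per subtree, `streams` at a time; return the subtrees that failed."""
    cmds = {s: rsync_command(src_root, dest_root, s, ssh_key) for s in subtrees}
    with ThreadPoolExecutor(max_workers=streams) as pool:
        codes = list(pool.map(lambda cmd: run(cmd).returncode, cmds.values()))
    return [s for s, code in zip(cmds, codes) if code != 0]


def sync_loop(src_root, dest_root, subtrees, ssh_key, interval_s=60.0, rounds=None, **kw):
    """Continuous iteration: every pass reruns all subtrees, so failures get retried."""
    done = 0
    while rounds is None or done < rounds:
        failed = sync_once(src_root, dest_root, subtrees, ssh_key, **kw)
        if failed:
            print("failed this pass, will retry:", failed)
        done += 1
        time.sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical host and key path
    sync_loop("/data", "backup@replica.example.org:/data", ["projects", "home"],
              ssh_key="/etc/sync/id_ed25519")
```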
Improving on rsync (2 of 3)
[Figure: directory tree /, /foo, /foo/w, /foo/x, /bar, /bar/y, /bar/z with each node shown as marked or not marked; callouts "Stop Scanning" and "Thousand-Sibling Problem"]
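The marking scheme can be sketched on an in-memory tree that mirrors the paths in the diagram (/foo, /bar, and their children). This is a hypothetical illustration, not GlusterFS or rsync code. A write marks its node and every ancestor, and a scan descends only into marked subtrees. The per-child loop is where the thousand-sibling problem bites: one change in a huge directory still means checking every sibling's mark.

```python
from __future__ import annotations


class Node:
    """A directory-tree node with a 'changed here' flag and a 'something below changed' mark."""

    def __init__(self, name: str, parent: Node | None = None):
        self.name = name
        self.parent = parent
        self.children: dict[str, Node] = {}
        self.changed = False
        self.dirty = False
        if parent is not None:
            parent.children[name] = self

    def path(self) -> str:
        if self.parent is None:
            return "/"
        return self.parent.path().rstrip("/") + "/" + self.name


def mark_changed(node: Node) -> None:
    """Called on every write: flag the node, then mark ancestors up toward the root."""
    node.changed = True
    while node is not None and not node.dirty:   # stop early at an already-marked ancestor
        node.dirty = True
        node = node.parent


def scan_changes(node: Node, found: list[str]) -> int:
    """Descend only into marked subtrees, clearing marks; return how many nodes were examined."""
    examined = 1
    if not node.dirty:
        return examined                    # stop scanning: nothing below here changed
    node.dirty = False
    if node.changed:
        found.append(node.path())
        node.changed = False
    for child in node.children.values():   # a marked directory still checks every child
        examined += scan_changes(child, found)
    return examined
```

After a single change to /foo/w, a scan examines only /, /foo, /foo/w, /foo/x, and /bar. It never descends below the unmarked /bar.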
Improving on rsync (3 of 3)
● So it's better than before
  ● higher scanning rate
  ● more automated
● ...but...
  ● scanning is still inherently inefficient
    – still have to find changes within files and/or transfer more than necessary
  ● divergence is still high
    – changes appear in scanning order, might conflict
By The Way...
● That's pretty much GlusterFS geo-sync, but I'm not here to talk about that.
● Current project: ordered async replication
  ● "pony" replication, as in "all that and..."
  ● full duplex, mesh, partition tolerant
  ● vector-clock conflict resolution
  ● maybe I'll be able to talk more about it next year
  ● meanwhile, see Resources (last link)
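The talk does not describe the project's design. The sketch below is therefore only the textbook vector-clock comparison that this kind of conflict resolution typically starts from. Each replica counts its own updates, and two versions conflict exactly when neither clock is less than or equal to the other. The replica names are hypothetical.

```python
from __future__ import annotations


def tick(clock: dict[str, int], replica: str) -> dict[str, int]:
    """Return a copy of `clock` after a local update at `replica`."""
    new = dict(clock)
    new[replica] = new.get(replica, 0) + 1
    return new


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Element-wise maximum: the clock of a version that has seen both histories."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def compare(a: dict[str, int], b: dict[str, int]) -> str:
    """'equal', 'before' (a happened before b), 'after', or 'concurrent' (a conflict)."""
    replicas = a.keys() | b.keys()
    a_le_b = all(a.get(r, 0) <= b.get(r, 0) for r in replicas)
    b_le_a = all(b.get(r, 0) <= a.get(r, 0) for r in replicas)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


if __name__ == "__main__":
    base = tick({}, "boston")
    edited_in_san_diego = tick(base, "san_diego")
    edited_in_boston = tick(base, "boston")
    print(compare(base, edited_in_san_diego))              # newer version can simply win
    print(compare(edited_in_san_diego, edited_in_boston))  # needs real conflict resolution
```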
What's Wrong?
● Isn't this all rather . . . manual?
  ● you manage scheduling
  ● you manage parallelism
  ● you manage credentials
  ● you manage conflicts
● Yes, it is!
● Let's look at more transparent solutions.
AFS
● The grand-daddy of wide-area distributed filesystems
● Deployed successfully at hundreds of sites, tens of thousands of users
● Only one writable replica, others read-only
● Static file-to-server assignment
● Notoriously hard to administer and debug
  ● seven types of servers, "unique" communication/locking protocols
AFS Diagram
[Figure: a Client, a VolumeServer, and several FileServers serving parts of the tree /, /users, /data, /users/staff, /users/students]
Coda
● AFS descendant
● Adds disconnected-client operation
● Adds multi-way write replication between servers
● Conflict resolution is automatic, but type-specific
● Shares other drawbacks with AFS
● Not widely deployed
XtreemFS
● European XtreemOS/Contrail projects
● Servers: one DIR, one MRC, multiple OSD
  ● dynamic placement on OSD (better than AFS)
  ● DIR/MRC replication/failover still immature?
● Historically: read-only replication (pull model)
● More recently: read/write replication
  ● floating master, leases
● Snapshots
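A floating master with leases can be sketched as a time-bounded grant. Only the replica holding an unexpired lease accepts writes. If it stops renewing (say it crashed or was partitioned away), another replica can take over once the lease expires. This is a generic, hypothetical illustration, not XtreemFS's implementation. It keeps the lease in a single object and passes time in explicitly, whereas a real system has to agree on the lease among replicas and allow for clock skew.

```python
from __future__ import annotations


class Lease:
    """Which replica may act as master, and until when."""

    def __init__(self, duration_s: float):
        self.duration_s = duration_s
        self.holder: str | None = None
        self.expires_at = 0.0

    def acquire(self, replica: str, now: float) -> bool:
        """Grant or renew if the lease is free, expired, or already held by `replica`."""
        if self.holder is None or now >= self.expires_at or self.holder == replica:
            self.holder = replica
            self.expires_at = now + self.duration_s
            return True
        return False

    def is_master(self, replica: str, now: float) -> bool:
        """A replica accepts writes only while its lease is unexpired."""
        return self.holder == replica and now < self.expires_at


if __name__ == "__main__":
    lease = Lease(duration_s=10)
    print(lease.acquire("boston", now=0))      # free: granted
    print(lease.acquire("san_diego", now=5))   # held by boston: refused
    print(lease.acquire("san_diego", now=12))  # boston never renewed: taken over
```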
XtreemFS Diagram
[Figure: three Clients, a DIR, an MRC, and three OSDs, with the data path and the control path drawn separately]
Other Solutions
● dCache, iRODS: archival orientation, online information almost unreadable
● Sector: paired with Sphere (Hadoop alternative), claims WAN distribution
● DRBD: two-way async block replication
● FS-Cache: client caching add-on to NFS, AFS
● PeerDist/BranchCache: content-addressable caching from SMB/CIFS
Conclusions
● Initial sync is easy, staying in sync is hard
● Conflict resolution is a major issue
  ● potential for failure, plus performance concern
  ● segregate data by consistency requirements
    – including read only
  ● try to choose "just enough" consistency
● Some assembly required
Resources
● Saito and Shapiro (essential!): http://www.ysaito.com/survey.pdf
● Academic Background
  ● Bayou: http://www-users.cs.umn.edu/~he/iss/iss47344.ppt
  ● Ficus: http://www.lasr.cs.ucla.edu/ficus/ficussummary.html
  ● OceanStore: http://oceanstore.cs.berkeley.edu/
● Production Code
  ● http://rsync.samba.org/
  ● http://www.openafs.org/
  ● http://www.coda.cs.cmu.edu/
  ● http://www.xtreemfs.org/
  ● http://www.gluster.org/
  ● http://hekafs.org/index.php/2011/10/all-that-and-a-pony/