Storage codes: Managing Big Data with Small Overheads

Presented by Anwitaman Datta & Frédérique E. Oggier
Nanyang Technological University, Singapore

© 2013, A. Datta & F. Oggier, NTU Singapore

Tutorial at NetCod 2013, Calgary, Canada.



Big Data Storage: Disclaimer

A note from the trenches: "You know you have a large storage system when you get paged at 1 AM because you only have a few petabytes of storage left." – from Andrew Fikes’ (Principal Engineer, Google) faculty summit talk “Storage Architecture and Challenges”, 2010.

We never get such calls!!

Big data

• June 2011 EMC2 study*
  – world’s data is more than doubling every 2 years
    • faster than Moore’s Law
  – 1.8 zettabytes of data to be created in 2011

Big data:
- big problem?
- big opportunity?

Zetta: 10^21

Zettabyte: If you stored all of this data on DVDs, the stack would reach from the Earth to the moon and back.

* http://www.emc.com/about/news/press/2011/20110628-01.htm

The data deluge: Some numbers

• Facebook “currently” (in 2010) stores over 260 billion images, which translates to over 20 petabytes of data. Users upload one billion new photos (60 terabytes) each week and Facebook serves over one million images per second at peak. [quoted from a paper on “Haystack” from Facebook]

• On “Saturday”, photo number four billion was uploaded to photo sharing site Flickr. This comes just five and a half months after the 3 billionth and nearly 18 months after photo number two billion. – Mashable (13th October 2009) [http://mashable.com/2009/10/12/flickr-4-billion/]


Scale how? Not distributing is not an option!

• Scale up: To scale vertically (or scale up) means to add resources to a single node in a system*
• Scale out: To scale horizontally (or scale out) means to add more nodes to a system, such as adding a new computer to a distributed software application*

* Definitions from Wikipedia


Failure (of parts) is Inevitable

• But, failure of the system is not an option either!
  – Failure is the pillar of rivals’ success …

Deal with it

• Data from Los Alamos National Laboratory (DSN 2006), gathered over 9 years, 4750 machines, 24101 CPUs.
• Distribution of failures:
  – Hardware 60%
  – Software 20%
  – Network/Environment/Humans 5%
• Failures occurred between once a day and once a month.

Failure happens without fail, so …

• But, failure of the system is not an option either!
  – Failure is the pillar of rivals’ success …
• Solution: Redundancy

Many Levels of Redundancy

• Physical
• Virtual resource
• Availability zone
• Region
• Cloud

From: http://broadcast.oreilly.com/2011/04/the-aws-outage-the-clouds-shining-moment.html

Redundancy Based Fault Tolerance

• Replicate data
  – e.g., 3 or more copies
  – In nodes on different racks
    • Can deal with switch failures
• Power back-up using battery between racks (Google)

But At What Cost?

• Failure is not an option, but …
  – … are the overheads acceptable?

Reducing the Overheads of Redundancy

• Erasure codes
  – Much lower storage overhead
  – High level of fault-tolerance
    • In contrast to replication or RAID based systems
• Has the potential to significantly improve the “bottom line”
  – e.g., Both Google’s new DFS Colossus, as well as Microsoft’s Azure now use ECs


Erasure Codes (ECs)

• An (n,k) erasure code = a map that takes as input k blocks and outputs n blocks, thus introducing n-k blocks of redundancy.
• 3 way replication is a (3,1) erasure code!

[Figure: k=1 block, Encoding, n=3 encoded blocks]

• An erasure code such that the k original blocks can be recreated out of any k encoded blocks is called MDS (maximum distance separable).
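To make the (n,k) and MDS definitions concrete, here is a minimal sketch of a (3,2) single-parity XOR code, a textbook example of an MDS code. It is not taken from the tutorial; the names `encode` and `decode` are illustrative assumptions.

```python
def xor_blocks(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(o1, o2):
    """(n=3, k=2): two original blocks in, three encoded blocks out."""
    return [o1, o2, xor_blocks(o1, o2)]

def decode(available):
    """Recover (o1, o2) from ANY two encoded blocks, keyed by index 0..2."""
    if 0 in available and 1 in available:
        return available[0], available[1]
    if 0 in available and 2 in available:
        return available[0], xor_blocks(available[0], available[2])
    if 1 in available and 2 in available:
        return xor_blocks(available[1], available[2]), available[1]
    raise ValueError("need at least k=2 distinct encoded blocks")

blocks = encode(b"big ", b"data")
# Lose any one block: the remaining two still give back the originals.
for lost in range(3):
    survivors = {i: b for i, b in enumerate(blocks) if i != lost}
    assert decode(survivors) == (b"big ", b"data")
```

The loop checks the MDS property directly: every choice of k=2 surviving blocks out of n=3 is enough to rebuild the data.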

Erasure Codes (ECs)

• Originally designed for communication
  – EC(n,k)

[Figure: Data = message, split into the original k blocks O1, O2, …, Ok; Encoding produces n encoded blocks B1, B2, …, Bn; some blocks are lost; the receiver gets any k’ (≥ k) blocks; Decoding reconstructs the data as the original k blocks O1, O2, …, Ok]

Erasure Codes for Networked Storage

[Figure: Data = object, split into the original k blocks O1, O2, …, Ok; Encoding produces n encoded blocks B1, B2, …, Bl, …, Bn (stored in storage devices in a network); some blocks are lost; retrieve any k’ (≥ k) blocks; Decoding reconstructs the data as the original k blocks O1, O2, …, Ok]

HDFS-RAID

• Distributed RAID File system (DRFS) client
  – provides application access to the files in the DRFS
  – transparently recovers any corrupt or missing blocks encountered when reading a file (degraded read)
    • Does not carry out repairs
• RaidNode, a daemon that creates and maintains parity files for all data files stored in the DRFS
• BlockFixer, which periodically recomputes blocks that have been lost or corrupted
  – RaidShell allows on demand repair triggered by administrator
• Two kinds of erasure codes implemented
  – XOR code and Reed-Solomon code (typically 10+4 w/ 1.4x overhead)

From http://wiki.apache.org/hadoop/HDFS-RAID
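The 1.4x figure quoted for the 10+4 Reed-Solomon configuration follows from the (n,k) definition, and can be compared with 3-way replication in a few lines. This is an illustrative calculation only; the helper names are assumptions, and the tolerance figure relies on the code being MDS (Reed-Solomon codes are).

```python
from fractions import Fraction

def storage_overhead(n, k):
    """Stored blocks per original block for an (n, k) code."""
    return Fraction(n, k)

def failures_tolerated(n, k):
    """Number of lost blocks an MDS (n, k) code can survive."""
    return n - k

schemes = {
    "3-way replication": (3, 1),    # viewed as a (3,1) erasure code
    "Reed-Solomon 10+4": (14, 10),  # 10 data blocks + 4 parity blocks
}
for name, (n, k) in schemes.items():
    print(name, float(storage_overhead(n, k)), failures_tolerated(n, k))
```

Under these assumptions the Reed-Solomon configuration stores less than half as much as 3-way replication while tolerating more block losses.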


Replenishing Lost Redundancy for ECs

• Repair needed for long term resilience.

[Figure: n encoded blocks B1, B2, …, Bl, …, Bn, some of them lost; retrieve any k’ (≥ k) blocks; Decoding yields the original k blocks O1, O2, …, Ok; Encoding recreates the lost blocks; re-insert in (new) storage devices, so that there is (again) n encoded blocks]

• Repairs are expensive!
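Why repairs are expensive can be seen even in the simplest case. The sketch below uses a (k+1, k) single-parity XOR code, chosen here purely for illustration (it is not a code discussed on the slide): recreating one lost block requires reading k other blocks, whereas a replicated system would copy a single surviving replica. The function names are assumptions.

```python
from functools import reduce

def xor_blocks(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(originals):
    """(k+1, k) single-parity code: the k originals plus their XOR."""
    return list(originals) + [reduce(xor_blocks, originals)]

def repair(stored, lost):
    """Recreate block `lost`; returns (block, number_of_blocks_read)."""
    helpers = [b for i, b in enumerate(stored) if i != lost]
    return reduce(xor_blocks, helpers), len(helpers)

originals = [bytes([i]) * 4 for i in range(1, 5)]  # k = 4 toy blocks
stored = encode(originals)
rebuilt, blocks_read = repair(stored, lost=2)
assert rebuilt == stored[2]
assert blocks_read == 4  # k blocks fetched to recreate just one
```

For this toy code, decoding and re-encoding collapse into one XOR; for a general MDS code the repair path in the figure (retrieve k blocks, decode, re-encode) has the same k-to-1 traffic ratio.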


Tailor-Made Codes for Storage

Desired code properties include:
• Low storage overhead
• Good fault tolerance
  (Traditional MDS erasure codes achieve these.)
• Better repairability
  – Smaller repair fan-in
  – Reduced I/O for repairs
  – Possibility of multiple simultaneous repairs
  – Fast repairs
  – Efficient B/W usage
  – …
• Better …
  – Better data-insertion
  – Better migration to archival
  – …


Pyramid (Local Reconstruction) Codes

– Good for degraded reads (data locality)
– Not all repairs are cheap (only partial parity locality)

• Pyramid Codes: Flexible Schemes to Trade Space for Access Efficiency in Reliable Data Storage Systems, C. Huang et al. @ NCA 2007
• Erasure Coding in Windows Azure Storage, C. Huang et al. @ USENIX ATC 2012
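The data-locality idea can be sketched with local XOR parities. This is a deliberately simplified toy, not the construction from the cited papers: the group size and function names are assumptions, and the global parities that real pyramid/local reconstruction codes add (the part without locality) are omitted.

```python
from functools import reduce

def xor_blocks(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def local_parities(data, group_size):
    """One XOR parity per consecutive group of `group_size` data blocks."""
    groups = [data[i:i + group_size] for i in range(0, len(data), group_size)]
    return [reduce(xor_blocks, g) for g in groups]

def degraded_read(data, parities, group_size, missing):
    """Rebuild data block `missing` using only blocks of its own group."""
    g = missing // group_size
    start = g * group_size
    peers = [data[i] for i in range(start, start + group_size) if i != missing]
    return reduce(xor_blocks, peers + [parities[g]]), len(peers) + 1

data = [bytes([i]) * 4 for i in range(1, 7)]  # 6 toy data blocks
parities = local_parities(data, group_size=3)
block, blocks_read = degraded_read(data, parities, 3, missing=4)
assert block == data[4]
assert blocks_read == 3  # half of the 6 blocks a single global parity would need
```

A missing data block is served from its local group alone, which is what makes degraded reads cheap in this toy model.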

Regenerating Codes

• Network information flow based arguments to determine “optimal” trade-off of storage/repair-bandwidth
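The slide does not state the trade-off itself. As a hedged illustration, the sketch below evaluates the two extreme points as they are usually quoted in the regenerating-codes literature: minimum storage regeneration (MSR) and minimum bandwidth regeneration (MBR). Here M is the object size, k the number of nodes sufficient for reconstruction, d the number of helper nodes contacted for a repair, alpha the storage per node and gamma the total repair traffic; the function names are assumptions.

```python
from fractions import Fraction

def msr_point(M, k, d):
    """Minimum-storage point: returns (alpha, gamma)."""
    alpha = Fraction(M, k)
    gamma = Fraction(M * d, k * (d - k + 1))
    return alpha, gamma

def mbr_point(M, k, d):
    """Minimum-bandwidth point: alpha equals gamma."""
    gamma = Fraction(2 * M * d, 2 * k * d - k * k + k)
    return gamma, gamma

# Toy parameters: unit-size object, k = 2, repairs contact d = 3 helpers.
print("MSR:", msr_point(1, 2, 3))
print("MBR:", mbr_point(1, 2, 3))
```

With d = k the MSR repair traffic equals M, i.e. the whole object, which matches the decode-and-re-encode repair of a traditional MDS code; contacting more helpers lowers the traffic.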

Locally Repairable Codes

• Codes satisfying: low repair fan-in, for any failure
• The name is reminiscent of “locally decodable codes”


Self-repairing Codes

• Usual disclaimer: “To the best of our knowledge”
  – First instances of locally repairable codes
    • Self-repairing Homomorphic Codes for Distributed Storage Systems
      – Infocom 2011
    • Self-repairing Codes for Distributed Storage Systems – A Projective Geometric Construction
      – ITW 2011
  – Since then, there have been many other instances from other researchers/groups
• Note
  – k encoded blocks are enough to recreate the object
    • Caveat: not any arbitrary k (i.e., SRCs are not MDS)
    • However, there are many such k combinations
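The "not any arbitrary k, but many combinations" remark can be checked on a toy model. The sketch assumes the SRC(3,7) dependency structure quoted later in this tutorial (r3=r1+r2, r5=r1+r4, r6=r2+r4, r7=r1+r6) with "+" read as XOR, so that fragment r_i behaves like the 3-bit vector i over GF(2); treating a set of 3 fragments as sufficient exactly when these vectors are linearly independent is a modelling assumption made for illustration.

```python
from itertools import combinations

def independent(triple):
    """True if the three fragment indices are linearly independent over GF(2)."""
    a, b, c = triple
    # Three distinct nonzero vectors over GF(2) are dependent
    # exactly when they XOR to zero.
    return a ^ b ^ c != 0

triples = list(combinations(range(1, 8), 3))
good = [t for t in triples if independent(t)]
print(len(good), "of", len(triples), "triples can rebuild the object")
```

In this model (1, 2, 3) fails because r3 carries no information beyond r1 and r2, while most other triples succeed.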

Self-repairing Codes: Blackbox View

[Figure: n encoded blocks B1, B2, …, Bl, …, Bn (stored in storage devices in a network), some of them lost; retrieve some k'' (< k) blocks (e.g. k''=2) to recreate a lost block; re-insert in (new) storage devices, so that there is (again) n encoded blocks]


PSRC Example

Repair using two nodes (say N1 and N3):
  (o1+o2+o4) + (o1) => o2+o4
  (o3) + (o2+o3) => o2
  (o1) + (o2) => o1+o2
Four pieces needed to regenerate two pieces

Repair using three nodes (say N2, N3 and N4):
  (o2) + (o4) => o2+o4
  (o1+o2+o4) + (o4) => o1+o2
Three pieces needed to regenerate two pieces
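The repair equations above can be checked mechanically. The sketch below assumes "+" denotes bytewise XOR and uses random toy blocks for o1..o4; it only verifies the algebra, since the layout of pieces on nodes N1..N4 comes from a figure that is not reproduced here.

```python
import os

def xor(*blocks):
    """Bytewise XOR of any number of equal-length blocks."""
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

o1, o2, o3, o4 = (os.urandom(8) for _ in range(4))

# The two pieces to regenerate.
target = (xor(o2, o4), xor(o1, o2))

# Two-node repair: reads four pieces (o1+o2+o4, o1, o3, o2+o3).
o2_rebuilt = xor(o3, xor(o2, o3))
two_node = (xor(xor(o1, o2, o4), o1), xor(o1, o2_rebuilt))

# Three-node repair: reads three pieces (o2, o4, o1+o2+o4).
three_node = (xor(o2, o4), xor(xor(o1, o2, o4), o4))

assert two_node == target
assert three_node == target
```

Both repair routes produce the same two pieces; they differ only in how many stored pieces are read and from how many nodes.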


Recap

[Figure: Data, held as Replicas and as Erasure coded data. Labelled operations: Data insertion; fault-tolerant data access (MSR’s Reconstruction code); (partial) re-encode/repair (e.g., Self-Repairing Codes)]

Next

[Figure: the recap diagram again: Data, held as Replicas and as Erasure coded data. Labelled operations: Data insertion; fault-tolerant data access (MSR’s Reconstruction code); (partial) re-encode/repair (e.g., Self-Repairing Codes)]


Inserting Redundant Data

• Data insertion
  – Replicas can be inserted in a pipelined manner
  – Traditionally, erasure coded systems used a central point of processing

Can the process of redundancy generation be distributed among the storage nodes?


In-network coding

• Ref: In-Network Redundancy Generation for Opportunistic Speedup of Backup, Future Generation Comp. Syst. 29(1), 2013
• Motivations
  – Reduce the bottleneck at a single point
    • The “source” (or first point of processing) still needs to inject “enough information” for the network to be able to carry out the rest of the redundancy generation
  – Utilize network resources opportunistically
    • Data-centers: When network/nodes are not busy doing other things
    • P2P/F2F: When nodes are online

More traffic (ephemeral resource) but higher data insertion throughput


In-network codingIn network coding

• Dependencies among self-repairing coded fragments can be p g p g gexploited for in-network coding!

• Consider a SRC(3,7) code with the following dependencies:– r1, r2, r3=r1+r2, r4, r5=r1+r4, r6=r2+r4, r7=r1+r6

• Note: r6=r1+r7, etc. also … [details in Infocom 2011 paper]

© 2013, A. Datta & F. Oggier, NTU Singapore67

Page 68: Storage codes: Managing Big Data with Small OverheadsData with …sands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/netcod13tuto… · Big dataBig data • June 2011 EMC2 study –

In-network codingIn network coding

• Dependencies among self-repairing coded fragments can be p g p g gexploited for in-network coding!

• Consider a SRC(3,7) code with the following dependencies:– r1, r2, r3=r1+r2, r4, r5=r1+r4, r6=r2+r4, r7=r1+r6

• Note: r6=r1+r7, etc. also … [details in Infocom 2011 paper]

• A naïve approach pp

© 2013, A. Datta & F. Oggier, NTU Singapore68

Page 69: Storage codes: Managing Big Data with Small OverheadsData with …sands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/netcod13tuto… · Big dataBig data • June 2011 EMC2 study –

In-network codingIn network coding

• Dependencies among self-repairing coded fragments can be p g p g gexploited for in-network coding!

• Consider a SRC(3,7) code with the following dependencies:– r1, r2, r3=r1+r2, r4, r5=r1+r4, r6=r2+r4, r7=r1+r6

• Note: r6=r1+r7, etc. also … [details in Infocom 2011 paper]

• A naïve approach pp

r1

r2

r4

© 2013, A. Datta & F. Oggier, NTU Singapore69

Page 70: Storage codes: Managing Big Data with Small OverheadsData with …sands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/netcod13tuto… · Big dataBig data • June 2011 EMC2 study –

In-network codingIn network coding

• Dependencies among self-repairing coded fragments can be p g p g gexploited for in-network coding!

• Consider a SRC(3,7) code with the following dependencies:– r1, r2, r3=r1+r2, r4, r5=r1+r4, r6=r2+r4, r7=r1+r6

• Note: r6=r1+r7, etc. also … [details in Infocom 2011 paper]

• A naïve approach pp

r1 r6

r2

r4

© 2013, A. Datta & F. Oggier, NTU Singapore70

Page 71: Storage codes: Managing Big Data with Small OverheadsData with …sands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/netcod13tuto… · Big dataBig data • June 2011 EMC2 study –

In-network codingIn network coding

• Dependencies among self-repairing coded fragments can be p g p g gexploited for in-network coding!

• Consider a SRC(3,7) code with the following dependencies:– r1, r2, r3=r1+r2, r4, r5=r1+r4, r6=r2+r4, r7=r1+r6

• Note: r6=r1+r7, etc. also … [details in Infocom 2011 paper]

• A naïve approach pp

r1 r6

r2

r4 r7

© 2013, A. Datta & F. Oggier, NTU Singapore71

Page 72: Storage codes: Managing Big Data with Small OverheadsData with …sands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/netcod13tuto… · Big dataBig data • June 2011 EMC2 study –

In-network codingIn network coding

• Dependencies among self-repairing coded fragments can be p g p g gexploited for in-network coding!

• Consider a SRC(3,7) code with the following dependencies:– r1, r2, r3=r1+r2, r4, r5=r1+r4, r6=r2+r4, r7=r1+r6

• Note: r6=r1+r7, etc. also … [details in Infocom 2011 paper]

• A naïve approach pp

r1 r6 Need a good “schedule” to insert the redundancy!

- Avoid cycles/dependenciesr2

r4 r7

Avoid cycles/dependencies

© 2013, A. Datta & F. Oggier, NTU Singapore72

Page 73: Storage codes: Managing Big Data with Small OverheadsData with …sands.sce.ntu.edu.sg/CodingForNetworkedStorage/pdf/netcod13tuto… · Big dataBig data • June 2011 EMC2 study –

In-network codingIn network coding

• Dependencies among self-repairing coded fragments can be p g p g gexploited for in-network coding!

• Consider a SRC(3,7) code with the following dependencies:– r1, r2, r3=r1+r2, r4, r5=r1+r4, r6=r2+r4, r7=r1+r6

• Note: r6=r1+r7, etc. also … [details in Infocom 2011 paper]

• A naïve approach pp

r1 r6 Need a good “schedule” to insert the redundancy!

- Avoid cycles/dependenciesr2

r4 r7

Avoid cycles/dependenciesSubject to “unpredictable” availability of resource!!

© 2013, A. Datta & F. Oggier, NTU Singapore73

In-network coding

• Dependencies among self-repairing coded fragments can be exploited for in-network coding!

• Consider a SRC(7,3) code with the following dependencies:
  – r1, r2, r3=r1+r2, r4, r5=r1+r4, r6=r2+r4, r7=r1+r6

• Note: r6=r1+r7, etc. also … [details in Infocom 2011 paper]

• A naïve approach
  [Figure: fragments r1, r2, r4, r6 and r7]

• Need a good “schedule” to insert the redundancy!
  – Avoid cycles/dependencies
  – Subject to “unpredictable” availability of resource!!

• Turns out to be O(n!) w/ Oracle
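
The dependencies listed on this slide can be checked with a small sketch. It assumes that "+" denotes bitwise XOR of equal-length fragments and uses random bytes as stand-ins for r1, r2 and r4; the names `RULES`, `generate` and `cyclic` are illustrative and not taken from the referenced papers. It shows one valid insertion order and why a naïve plan that derives r6 from r7 and r7 from r6 cannot be executed.

```python
import os


def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length fragments (assumed meaning of '+')."""
    return bytes(x ^ y for x, y in zip(a, b))


# Dependencies from the slide: target -> (input, input).
RULES = {
    "r3": ("r1", "r2"),
    "r5": ("r1", "r4"),
    "r6": ("r2", "r4"),
    "r7": ("r1", "r6"),
}


def generate(seeds: dict, schedule: list) -> dict:
    """Create fragments in schedule order; fail if an input is not yet stored."""
    store = dict(seeds)
    for target, a, b in schedule:
        if a not in store or b not in store:
            raise ValueError(f"{target} needs {a} and {b} to exist first")
        store[target] = xor(store[a], store[b])
    return store


# The source injects only r1, r2 and r4 (random stand-ins here).
seeds = {name: os.urandom(16) for name in ("r1", "r2", "r4")}

# A valid schedule: every input exists before it is used.
schedule = [(target, a, b) for target, (a, b) in RULES.items()]
frags = generate(seeds, schedule)

# The alternative dependency noted on the slide also holds: r6 = r1 + r7.
assert frags["r6"] == xor(frags["r1"], frags["r7"])

# A cyclic plan (r6 from r7, r7 from r6) can never start.
cyclic = [("r6", "r1", "r7"), ("r7", "r1", "r6")]
try:
    generate(seeds, cyclic)
    cyclic_ok = True
except ValueError:
    cyclic_ok = False
```

The check in `generate` only validates a given order; it does not search for a good schedule under unpredictable resource availability, which is the hard part the slide refers to.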

In-network coding

• Heuristics
  – Several other policies (such as max data) were also tried

In-network coding

• RndFlw was the best heuristic
  – Among those we tried

• Provided 40% (out of a possible 57%) bandwidth savings at source for a SRC(7,3) code
  – An increase in the data-insertion throughput of between 40% and 60%

• No free lunch: Increase of 20-30% in overall network traffic
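
One way to read the "possible 57%" figure, offered as an interpretation rather than something stated on the slide: with n = 7 fragments of which k = 3 must still be injected by the source, the source can avoid uploading at most (n − k)/n of the fragments. The short calculation below works this out; the variable names are illustrative.

```python
n, k = 7, 3  # SRC(7,3): 7 stored fragments, 3 of which the source must inject

# Upper bound on the fraction of uploads the source can avoid (assumed reading).
max_saving = (n - k) / n

# Reported saving with the RndFlw heuristic.
observed_saving = 0.40

# Average number of fragments the source still uploads at a 40% saving.
uploaded_fragments = (1 - observed_saving) * n

print(f"maximum saving: {max_saving:.1%}")
print(f"fragments uploaded by the source on average: {uploaded_fragments:.1f}")
```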

Recap

[Diagram labels: Data insertion; Replicas; Data access; Fault-tolerant data access (MSR’s Reconstruction code); Erasure coded data; (Partial) re-encode/repair (e.g., Self-Repairing Codes)]

Next

[Same diagram as in the recap]

RapidRAID

• Ref:
  – RapidRAID: Pipelined Erasure Codes for Fast Data Archival in Distributed Storage Systems (Infocom 2013)
    • Has some local repairability properties, but that aspect is yet to be explored
  – Another code instance @ ICDCN 2013
    • Decentralized Erasure Coding for Efficient Data Archival in Distributed Storage Systems
      – Systematic code (unlike RapidRAID)
      – Found using numerical methods; a general theory for the construction of such codes, as well as their repairability properties, are open issues

• Problem statement: Can the existing (replication based) redundancy be exploited to create an erasure coded archive?

Slight change of view

[Figure: three replicas of an object S, shown alongside three copies of each of its blocks S1, S2, S3, S4]

Two ways to look at replicated data

RapidRAID

Decentralizing the hitherto centralized encoding process

RapidRAID – Example (8,4) code

• Initial configuration

• Logical phase 1: Pipelined coding

• Logical phase 2: Further local coding

• Resulting Linear Code

[Figures: one diagram per step]

RapidRAID: Some results

[Figures: result plots]

Big pic

[Diagram labels: Data insertion; Replicas; Data access; Fault-tolerant data access (MSR’s Reconstruction code); Erasure coded data; (Partial) re-encode/repair (e.g., Self-Repairing Codes)]

Agenda: A composite system achieving all these properties …

Wrapping up: A moment of reflection

• Revisiting repairability – an engineering alternative
  – Redundantly Grouped Cross-object Coding for Repairable Storage (APSys 2012)
  – The CORE Storage Primitive: Cross-Object Redundancy for Efficient Data Repair & Access in Erasure Coded Storage (arXiv:1302.5192)
    • HDFS-RAID compatible implementation
    • http://sands.sce.ntu.edu.sg/StorageCORE/

Wrapping up: A moment of reflection

Rows: erasure coding of individual objects
Columns: RAID-4 of erasure coded pieces of different objects

  e11   e12   …   e1k   e1k+1   …   e1n
  e21   e22   …   e2k   e2k+1   …   e2n
  …     …         …     …           …
  em1   em2   …   emk   emk+1   …   emn
  p1    p2    …   pk    pk+1    …   pn

(reminiscent of product codes!)

Separation of concerns

• Two distinct design objectives for distributed storage systems
  – Fault-tolerance
  – Repairability

• An extremely simple idea
  – Introduce two different kinds of redundancy
    • Any (standard) erasure code – for fault-tolerance
    • RAID-4 like parity (across encoded pieces of different objects) – for repairability

CORE repairability

• Choosing a suitable m < k
  – Reduction in data transfer for repair
  – Repair fan-in disentangled from base code parameter “k”
    • Large “k” may be desirable for faster (parallel) data access
    • Codes typically have trade-offs between repair fan-in, code parameter “k” and code’s storage overhead (n/k)

• However: The gains from reduced fan-in are probabilistic
  – For i.i.d. failures with probability “f”

• Possible to reduce repair time
  – By pipelining data through the live nodes, and computing partial parity
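
The cross-object parity idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the CORE implementation: random bytes stand in for the erasure coded pieces e[i][j] (a real system would produce each row with an (n,k) code), parity is taken to be bitwise XOR as in RAID-4, and the values of `m`, `n` and `size` as well as all function names are hypothetical.

```python
import os
from functools import reduce


def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length pieces."""
    return bytes(x ^ y for x, y in zip(a, b))


def column_parity(pieces: list) -> bytes:
    """RAID-4 style parity: XOR of all pieces in one column."""
    return reduce(xor, pieces)


m, n, size = 3, 6, 8  # hypothetical: 3 objects, 6 pieces each, 8-byte pieces

# Stand-in for the encoded pieces of m different objects (one object per row).
e = [[os.urandom(size) for _ in range(n)] for _ in range(m)]

# One parity piece per column, computed across the m objects.
p = [column_parity([e[i][j] for i in range(m)]) for j in range(n)]


def repair(e, p, i, j):
    """Rebuild e[i][j] from the other pieces of column j plus its parity."""
    helpers = [e[r][j] for r in range(len(e)) if r != i] + [p[j]]
    return column_parity(helpers), len(helpers)


def pipelined_repair(e, p, i, j):
    """Same repair, with each live node folding its piece into a partial parity."""
    partial = p[j]
    for r in range(len(e)):
        if r != i:
            partial = xor(partial, e[r][j])  # node r updates and forwards
    return partial


rebuilt, fan_in = repair(e, p, 1, 4)
```

In this sketch a single lost piece is rebuilt from m pieces (m − 1 pieces of other objects plus the parity) rather than from k pieces of the same object, which is where the reduced repair traffic for m < k comes from. The pipelined variant moves one partial parity from node to node instead of gathering all helpers at one place.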

• Interested to
  – Follow: http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage/
    • Also, two surveys on (repairability of) storage codes
      – one short, at high level (SIGACT Distr. Comp. News, Mar. 2013)
      – one detailed (FnT, June 2013)
  – Get involved: {anwitaman,frederique}@ntu.edu.sg

Thank you (ଧନ୍ଯବାଦ)