
Redundant Data Update in Server-less Video-on-Demand Systems

Presented by Ho Tsz Kin

Agenda

Background

Redundant Data Regeneration

Sequential Redundant Data Update (SRDU)

Update Overhead Analysis

Performance Evaluation

Conclusion and Future Work

Background – SLVoD System

[Figure: SLVoD system architecture – data nodes and redundancy nodes connected to receivers over the network]

Background – Data Placement

Each data block and each redundant data block is Q bytes. Data blocks are striped across the data nodes d0 to d3; redundant data blocks are stored on the redundancy nodes r0 and r1. Each row below is one parity group:

  d0    d1    d2    d3   |  r0     r1
  v0    v1    v2    v3   |  c0,0   c0,1
  v4    v5    v6    v7   |  c1,0   c1,1
  v8    v9    v10   v11  |  c2,0   c2,1
  v12   v13   v14   v15  |  c3,0   c3,1
  v16   v17   v18   v19  |  c4,0   c4,1

For example, v0 to v3 together with c0,0 and c0,1 belong to the same parity group.

Background – New Nodes Join the System

Data blocks are reorganized to utilize the storage and streaming capacity of the new node

Data blocks in each parity group change

Redundant data blocks need to be updated / recomputed (a small placement sketch follows the figure below)

After node d4 joins, the data blocks are restriped across d0 to d4:

  d0    d1    d2    d3    d4   |  r0      r1
  v0    v1    v2    v3    v4   |  c'0,0   c'0,1
  v5    v6    v7    v8    v9   |  c'1,0   c'1,1
  v10   v11   v12   v13   v14  |  c'2,0   c'2,1
  v15   v16   v17   v18   v19  |  c'3,0   c'3,1

Data blocks in each parity group CHANGE!

Redundant data blocks CHANGE!
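To make the reorganization concrete, here is a minimal sketch assuming simple round-robin striping of the blocks v0, v1, ... across the data nodes, which matches the layouts in the figures above; the block and node counts are example values only.

```python
# Illustrative sketch: round-robin placement of data blocks across data nodes.
# 20 blocks and 4 -> 5 data nodes mirror the figures; they are example values.

def parity_groups(num_blocks, num_data_nodes):
    """Group i holds blocks v[i*(N-h)] .. v[i*(N-h) + (N-h) - 1],
    one block per data node."""
    groups = []
    for start in range(0, num_blocks, num_data_nodes):
        end = min(start + num_data_nodes, num_blocks)
        groups.append([f"v{b}" for b in range(start, end)])
    return groups

before = parity_groups(num_blocks=20, num_data_nodes=4)   # d0..d3
after  = parity_groups(num_blocks=20, num_data_nodes=5)   # d0..d4, new node joined

for i, (old, new) in enumerate(zip(before, after)):
    print(f"group {i}: before={old}  after={new}")

# Every parity group's membership changes, so every redundant block
# (c_i,j -> c'_i,j) has to be updated or recomputed.
```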

Redundant Data Regeneration

Regenerate the new redundant blocks from all the data blocks

Data blocks are distributed among the nodes

Send all the data blocks to the redundancy node

Compute the new redundant blocks in the redundancy node

The overhead is too high

[Figure: data nodes sending all their data blocks over the network to the redundancy node]

Redundant Data Regeneration

Data blocks are stored in a central archive server

Regenerate all the new redundant data in the server

Send the new redundant data to the redundancy nodes

A dedicated server may be costly, undesirable, or infeasible

[Figure: central archive server sending new redundant data to the redundancy nodes]

Sequential Redundant Data Update – Idea

Old redundant data blocks contain valuable information about the data blocks

The data blocks are different after the reorganization

But some blocks in the parity group are the SAME

Possible to reuse them for a linear code

New redundant data blocks (after node addition, nodes d0 to d4, r0, r1): parity group v0 v1 v2 v3 v4 with c'0,0 and c'0,1

Old redundant data blocks (before node addition, nodes d0 to d3, r0, r1): parity group v0 v1 v2 v3 with c0,0 and c0,1

Sequential Redundant Data Update – Linear Erasure Correction Code

A linear erasure correction code can be analyzed using linear algebra; computing the redundant data is a linear matrix multiplication

Reed-Solomon Erasure Correction Code: the redundant data is generated by a Vandermonde matrix

Number of nodes N, number of redundancy nodes h; the element at row i, column j is j^(i-1)

Writing d_{i,j} for the data block of parity group i held by data node j (so d_{i,0} comes from data node 0) and c_{i,k} for the redundant block of parity group i stored on redundancy node k:

$$\begin{pmatrix} c_{i,0} \\ c_{i,1} \\ \vdots \\ c_{i,h-1} \end{pmatrix} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & N-h \\ \vdots & & & \vdots \\ 1 & 2^{h-1} & \cdots & (N-h)^{h-1} \end{pmatrix} \begin{pmatrix} d_{i,0} \\ d_{i,1} \\ \vdots \\ d_{i,N-h-1} \end{pmatrix}$$
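As an instantiation, for the layout used in the figures (four data nodes and two redundancy nodes, i.e. N = 6 and h = 2) this reduces to the coefficients that appear in the later examples:

$$\begin{pmatrix} c_{i,0} \\ c_{i,1} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \end{pmatrix} \begin{pmatrix} v_{4i} \\ v_{4i+1} \\ v_{4i+2} \\ v_{4i+3} \end{pmatrix}, \qquad \text{e.g. } c_{0,1} = 1\,v_0 + 2\,v_1 + 3\,v_2 + 4\,v_3$$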

If the arithmetic were done over the reals, rounding or truncation would prevent a correct reconstruction of the source data (e.g. 1 / 3 = 0.3333..., so decoding fails)

Arithmetic over Galois Fields (Finite Fields)

Finitely many elements; add, subtract, multiply and divide much as in ordinary algebra, and the field is closed under these operations

The elements of GF(2^w) are the integers from 0 to 2^w - 1, so a fixed number of bits (w) represents each element

Assume a Q-byte data block can be represented by a field element
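A minimal sketch of this arithmetic, assuming w = 8 and the reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D); the slides do not fix w or the polynomial, so both are illustrative choices. Addition and subtraction are both XOR, and multiplication never leaves the field, so the rounding problem above cannot occur.

```python
# Illustrative GF(2^8) arithmetic. The reduction polynomial 0x11D is an
# assumption for this sketch; the presentation only requires some GF(2^w).

GF_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

def gf_add(a, b):
    # Addition and subtraction in GF(2^w) are both bitwise XOR.
    return a ^ b

def gf_mul(a, b):
    # Carry-less "Russian peasant" multiplication with modular reduction.
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:        # degree reached 8: reduce by the field polynomial
            a ^= GF_POLY
    return result

# Closed under multiplication: every product fits back into 0..255.
assert all(gf_mul(a, b) < 256 for a in range(256) for b in range(256))

# Encoding one redundant symbol with Vandermonde coefficients 1..4, as in
# c_{0,1} = 1*v0 + 2*v1 + 3*v2 + 4*v3 (a Q-byte block would be processed
# element by element in the same way).
v = [0x0A, 0x37, 0x5C, 0xF1]        # example byte values
c01 = 0
for coef, vj in enumerate(v, start=1):
    c01 = gf_add(c01, gf_mul(coef, vj))
print(hex(c01))
```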

Sequential Redundant Data Update

Sequential Redundant Data Update (SRDU) consists of 3 techniques:

Direct reuse of old redundant data

Parity group reshuffling

Caching of transmitted data


SRDU – Direct Reuse

Construct the 1st parity group.

Before node addition (d0 to d3, r0, r1), the parity group is {v0, v1, v2, v3}:

$$\begin{pmatrix} c_{0,0} \\ c_{0,1} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \end{pmatrix} \begin{pmatrix} v_0 \\ v_1 \\ v_2 \\ v_3 \end{pmatrix}, \qquad c_{0,1} = 1\,v_0 + 2\,v_1 + 3\,v_2 + 4\,v_3$$

After node addition (d0 to d4, r0, r1), the parity group is {v0, v1, v2, v3, v4}:

$$\begin{pmatrix} c'_{0,0} \\ c'_{0,1} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix} \begin{pmatrix} v_0 \\ v_1 \\ v_2 \\ v_3 \\ v_4 \end{pmatrix}$$

$$c'_{0,1} = (1\,v_0 + 2\,v_1 + 3\,v_2 + 4\,v_3) + 5\,v_4 = c_{0,1} + 5\,v_4$$

Only v4 needs to be sent: save 4 blocks.
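A quick numeric check of the direct-reuse identity above. For readability this sketch uses ordinary integer arithmetic with made-up block values rather than GF(2^w); only the linearity of the code matters for the identity.

```python
# Direct reuse for the 1st parity group, with plain integers for clarity
# (the real system uses GF(2^w) arithmetic; only linearity matters here).
v = {i: (i + 1) * 10 for i in range(5)}   # example values for v0..v4

# Before node addition: c_{0,1} built from v0..v3 with coefficients 1..4.
c_01 = 1 * v[0] + 2 * v[1] + 3 * v[2] + 4 * v[3]

# After node addition: c'_{0,1} = 1*v0 + 2*v1 + 3*v2 + 4*v3 + 5*v4.
c_01_new_full = 1 * v[0] + 2 * v[1] + 3 * v[2] + 4 * v[3] + 5 * v[4]

# Direct reuse: the redundancy node receives only v4 and adds 5*v4
# to the old redundant block it already stores.
c_01_new_reuse = c_01 + 5 * v[4]

assert c_01_new_full == c_01_new_reuse
print("c'_{0,1} =", c_01_new_reuse, "(only v4 was transmitted)")
```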

SRDU – Direct Reuse

Construct the 2nd parity group.

Before node addition (d0 to d3, r0, r1), the parity group is {v4, v5, v6, v7}:

$$\begin{pmatrix} c_{1,0} \\ c_{1,1} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \end{pmatrix} \begin{pmatrix} v_4 \\ v_5 \\ v_6 \\ v_7 \end{pmatrix}, \qquad c_{1,1} = 1\,v_4 + 2\,v_5 + 3\,v_6 + 4\,v_7$$

After node addition (d0 to d4, r0, r1), the parity group is {v5, v6, v7, v8, v9}:

$$\begin{pmatrix} c'_{1,0} \\ c'_{1,1} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix} \begin{pmatrix} v_5 \\ v_6 \\ v_7 \\ v_8 \\ v_9 \end{pmatrix}, \qquad c'_{1,1} = 1\,v_5 + 2\,v_6 + 3\,v_7 + 4\,v_8 + 5\,v_9$$

The terms are different, so direct reuse fails!

SRDU – Parity Group Reshuffling

The order of the data blocks during encoding is not important.

Before node addition: $c_{1,1} = 1\,v_4 + 2\,v_5 + 3\,v_6 + 4\,v_7$

After node addition, reshuffle the new parity group $(v_5, v_6, v_7, v_8, v_9)$ to $(v_8, v_5, v_6, v_7, v_9)$, so that v5, v6 and v7 keep the coefficients they had in $c_{1,1}$:

$$c'_{1,1} = 1\,v_8 + 2\,v_5 + 3\,v_6 + 4\,v_7 + 5\,v_9 = 1\,v_8 + (1\,v_4 + 2\,v_5 + 3\,v_6 + 4\,v_7 - 1\,v_4) + 5\,v_9 = 1\,v_8 + (c_{1,1} - 1\,v_4) + 5\,v_9$$

Only v4, v8 and v9 need to be sent: save 2 blocks.
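The same kind of numeric check for the reshuffling identity, again in plain integer arithmetic with example values (the real system works over GF(2^w)).

```python
# Parity group reshuffling for the 2nd parity group, plain integers for clarity.
v = {i: (i + 3) ** 2 for i in range(4, 10)}   # example values for v4..v9

# Old redundant block, built from the old parity group {v4, v5, v6, v7}.
c_11 = 1 * v[4] + 2 * v[5] + 3 * v[6] + 4 * v[7]

# New parity group {v5..v9}, encoded in the reshuffled order (v8, v5, v6, v7, v9)
# so that v5, v6, v7 keep the coefficients 2, 3, 4 they had in c_{1,1}.
c_11_new_full = 1 * v[8] + 2 * v[5] + 3 * v[6] + 4 * v[7] + 5 * v[9]

# Reuse: strip v4 out of the old block, then add the genuinely new terms.
# Only v4, v8 and v9 have to be sent to the redundancy node.
c_11_new_reuse = 1 * v[8] + (c_11 - 1 * v[4]) + 5 * v[9]

assert c_11_new_full == c_11_new_reuse
print("c'_{1,1} =", c_11_new_reuse, "(v4, v8, v9 transmitted; 2 blocks saved)")
```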

SRDU – Parity Group Reshuffling

The choice of old redundant data to reuse matters.

Suppose instead that $(v_5, v_6, v_7, v_8, v_9)$ is reshuffled to $(v_8, v_9, v_5, v_6, v_7)$ and the old block $c_{2,1}$ is reused.

Before node addition (old parity group {v8, v9, v10, v11}): $c_{2,1} = 1\,v_8 + 2\,v_9 + 3\,v_{10} + 4\,v_{11}$

After node addition (new parity group {v5, v6, v7, v8, v9}):

$$c'_{1,1} = (1\,v_8 + 2\,v_9) + 3\,v_5 + 4\,v_6 + 5\,v_7 = (c_{2,1} - 3\,v_{10} - 4\,v_{11}) + 3\,v_5 + 4\,v_6 + 5\,v_7$$

5 blocks (v5, v6, v7, v10 and v11) are needed: more overhead.

SRDU – Caching

Cache the data transmitted while constructing the previous redundant blocks, and reuse it to construct the current redundant blocks.

Construct the 1st parity group: $c'_{0,1} = c_{0,1} + 5\,v_4$, and v4 is cached at the redundancy node.

Construct the 2nd parity group: $c'_{1,1} = 1\,v_8 + (c_{1,1} - 1\,v_4) + 5\,v_9$; v4 can be reused from the locally cached data, so only v8 and v9 need to be sent.

SRDU – Caching

The buffer for caching holds at most (N-h) data blocks; data blocks transmitted earlier than the last (N-h) blocks are not useful for reuse.

Construct the 4th parity group: the new parity group is $(v_{15}, v_{16}, v_{17}, v_{18}, v_{19})$, giving the new redundant block $c'_{3,1}$; the candidate old redundant blocks are $c_{3,1}$, built from $(v_{12}, v_{13}, v_{14}, v_{15})$, and $c_{4,1}$, built from $(v_{16}, v_{17}, v_{18}, v_{19})$.

All the previously cached data blocks are useless for this update; at most 4 data blocks are cached at any time.
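A sketch of the caching bookkeeping at a redundancy node, under the assumption described above that blocks transmitted for earlier updates stay in a buffer of at most N-h blocks and can be reused; the names and values are illustrative.

```python
# Caching at the redundancy node: blocks received for earlier updates can be
# reused for the current one. Sketch for the 4-data-node -> 5-data-node example.
from collections import deque

N_MINUS_H = 4                      # the buffer holds at most N - h data blocks
cache = deque(maxlen=N_MINUS_H)    # older transmissions fall out automatically

def blocks_to_send(needed, cache):
    """Blocks the data nodes must transmit: everything this update needs
    that is not already sitting in the redundancy node's cache."""
    to_send = [b for b in needed if b not in cache]
    cache.extend(to_send)          # newly received blocks become cached
    return to_send

# 1st new redundant block: direct reuse needs only v4.
print(blocks_to_send(["v4"], cache))              # -> ['v4']

# 2nd new redundant block: reshuffling needs v4, v8 and v9,
# but v4 is still cached, so only v8 and v9 are transmitted.
print(blocks_to_send(["v4", "v8", "v9"], cache))  # -> ['v8', 'v9']
```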

SRDU – Summary

For the construction of each new redundant block, reuse an old redundant block by:

Direct reuse

Parity group reshuffling

Caching of transmitted data

Select the option with the lowest overhead to reuse (see the counting sketch below)

Send only the required blocks
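The "lowest overhead" choice can be illustrated with a simplified counting model: reusing an old redundant block requires transmitting every block of the new parity group that the old block does not already cover, plus every block that must be cancelled out of it, minus anything already cached. This sketch is written for this summary rather than taken from the paper, but it reproduces the block counts quoted in the examples (1 for direct reuse of c_{0,1}, 3 for reshuffling with c_{1,1}, 5 for c_{2,1}, and 2 once v4 is cached).

```python
# Simplified SRDU decision sketch: pick the old redundant block whose reuse
# needs the fewest transmitted data blocks. Only block counts are modelled;
# the coefficient bookkeeping (reshuffling, cancelling removed terms) is not.

def transmissions(old_group, new_group, cached):
    """Blocks that must be sent: new blocks not covered by the old block,
    plus old blocks that have to be cancelled out, minus anything cached."""
    return (set(old_group) ^ set(new_group)) - set(cached)

new_group = {"v5", "v6", "v7", "v8", "v9"}       # 2nd parity group after growth
candidates = {
    "c_{1,1}": {"v4", "v5", "v6", "v7"},         # old 2nd parity group
    "c_{2,1}": {"v8", "v9", "v10", "v11"},       # old 3rd parity group
}
cached = {"v4"}                                  # left over from the previous update

costs = {name: len(transmissions(group, new_group, cached))
         for name, group in candidates.items()}
print(costs)                                     # {'c_{1,1}': 2, 'c_{2,1}': 5}
print("reuse", min(costs, key=costs.get))        # reuse c_{1,1}
```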

Update Overhead Analysis

Total number of data blocks: B

Redundant Data Regeneration, with the data blocks distributed among the nodes:

Total overhead = B + (h-1) * B/(N-h)

All the data blocks are sent to one redundancy node: overhead = B

All the redundant data is constructed at that redundancy node, which then sends B/(N-h) redundant blocks to each of the other h-1 redundancy nodes: overhead = B/(N-h) per node

Update Overhead Analysis

Data blocks are stored in a central archive server:

Total overhead = h * B/(N-h)

All the redundant data is constructed in the server, which sends B/(N-h) redundant blocks to each redundancy node: overhead = B/(N-h) per node

Update Overhead Analysis

Sequential Redundant Data Update (SRDU):

Total overhead = block movement under SRDU + (h-1) * B/(N-h)

Data blocks are sent according to the decisions of SRDU

The partial result of each new redundant block is computed and forwarded to the corresponding redundancy node: overhead = B/(N-h) per redundancy node

Each redundancy node combines its old redundant data with the received partial result, for example:

$$c'_{1,1} = 1\,v_8 + (c_{1,1} - 1\,v_4) + 5\,v_9 = c_{1,1} + P, \qquad \text{where } P = 1\,v_8 - 1\,v_4 + 5\,v_9$$

Update Overhead Analysis

Redundant data regeneration with the data blocks distributed among the nodes: total overhead = B + (h-1) * B/(N-h)

Data blocks stored in a central archive server: total overhead = B/(N-h) + (h-1) * B/(N-h)

Sequential Redundant Data Update (SRDU): total overhead = block movement under SRDU + (h-1) * B/(N-h)

Since the term (h-1) * B/(N-h) is common to all three, it is sufficient to consider the overhead of one redundant update as the performance metric.
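For a quick sense of scale, the two regeneration formulas can be evaluated directly; B, N and h below are example values only, and the SRDU total depends on how many blocks its per-group decisions actually move.

```python
# Evaluate the total-overhead formulas for example parameters.
B, N, h = 40000, 102, 2            # data blocks, nodes, redundancy nodes (examples)
groups = B // (N - h)              # number of parity groups = B / (N - h) = 400

regen_distributed = B + (h - 1) * groups          # data fully distributed among nodes
regen_server = groups + (h - 1) * groups          # central archive server, = h * groups

def srdu_total(blocks_moved):
    # SRDU: whatever block movement its per-group decisions require, plus
    # forwarding partial results to the other h - 1 redundancy nodes.
    return blocks_moved + (h - 1) * groups

print("distributed regeneration:", regen_distributed)   # 40400
print("server-based regeneration:", regen_server)       # 800
print("SRDU (hypothetical movement of 10000 blocks):", srdu_total(10000))
```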

Performance Evaluation – Update overhead in continuous system growth

Number of data blocks = 40000

The system grows from 1 node to 400 nodes, one node added at a time

[Figure: redundancy update overhead (blocks) versus system size (nodes), for systems of up to 400 nodes. Curves: Regeneration by Fully Distributed Data, Regeneration by an Archive Server, Direct Reuse, Direct Reuse + Reshuffling, Direct Reuse + Reshuffling + Caching]

Performance Evaluation – Update overhead in batched redundancy update

Defer the update until a fixed number of nodes, W, have joined

Initial system size = 80; W ranges from 1 to 200

[Figure: per-node redundancy update overhead (blocks, log scale) versus batch size W (nodes), for W up to 200 and an initial system size of 80. Curves: Direct Reuse, Direct Reuse + Reshuffling, Direct Reuse + Reshuffling + Caching]

Conclusion

Identify the redundant data update problem

Investigate redundant data regeneration

Propose SRDU to reduce the overhead substantially (about 75%)

Further reduce the overhead by batched redundancy update (about 97% for a batch size of 10)

Future Work

Distributed redundant data update and its scheduling

Mathematical Modeling

Optimality of redundant data update

Redundant Data Addition