digital fountains, and their application to informed content delivery over adaptive overlay networks...

88
Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Upload: vincent-lambert

Post on 24-Dec-2015

222 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Digital Fountains,and Their Application to

Informed Content Delivery over Adaptive Overlay Networks

Michael Mitzenmacher

Harvard University

Page 2: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

The Talk• Survey of the area

– My work, and work of others– History, perspective– Less on theoretical details, more on big ideas

• Start with digital fountains– What they are– How they work– Simple applications

• Content delivery– Digital fountains, and other tools

Page 3: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Data in the TCP/IP World• Data is an ordered sequence of bytes

– Generally split into packets

• Typical download transaction:– “I need the file: packets 1-100,000.”– Sender sends packets in order (windows)– “Packet 75 is missing, please re-send.”

• Clean semantics– File is stored this way– Reliability is easy– Works for point-to-point downloads

Page 4: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Problem Case: Multicast

• One sender, many downloaders– Midnight madness problem – new software– Video-on-demand (not real time)

• Can download to each individual separately– Doesn’t scale

• Can “broadcast” – All users must start at the same time?– Heterogeneous packet loss– Heterogeneous download rates

Page 5: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Digital Fountain Paradigm

Stop thinking of data as an ordered stream of bytes.

• Data is like water from a fountain– Put out your cup, stop when the cup is full.– You don’t care which drops of water you get.– You don’t care what order the drops get to your

cup.

Page 6: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

What is a Digital Fountain?

• For this talk, a digital fountain is an ideal/paradigm for data transmission.– Vs. the standard (TCP) paradigm: data is an

ordered finite sequence of bytes.

• Instead, with a digital fountain, a k symbol file yields an infinite data stream; once you have received any k symbols from this stream, you can quickly reconstruct the original file.

Page 7: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Digital Fountains for Multicast

• Packets sent from a single source along a tree.• Everyone grabs what they can.

– Starting time does not matter – start whenever.

– Packet loss does not matter – avoids feedback explosion of lost packets.

– Heterogeneous download rates do not matter – drop packets at routers as needed for proper rate.

• When a user has filled their cup, they leave the multicast session.

Page 8: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Digital Fountains for Parallel Downloads

• Download from multiple sources simultaneously and seamlessly.– All sources fill the cup – since each fountain has an

“infinite” collection of packets, no duplicates.

– Relative fountain speeds unimportant; just need to get enough.

– No coordination among sources necessary.

• Combine multicast and parallel downloading.– Wireless networks, multiple stations and antennas.

Page 9: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Digital Fountains forPoint-to-Point Data Transmission• TCP has problems over long-distance connections.

– Packets must be acknowledged to increase sending window (packets in flight).

– Long round-trip time leads to slow acks, bounding transmission window.

– Any loss increases the problem.

• Using digital fountain + TCP-friendly congestion control can greatly speed up connections.

• Separates the “what you send” from “how much” you send.– Do not need to buffer for retransmission.

Page 10: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

One-to-Many TCP

• Setting: Web server with popular files, may have many open connections serving same file.– Problem: has to have a separate buffer, state for each

connection to handle retransmissions.– Limits number of connections per server.

• Instead, use a digital fountain to generate packets useful for all connections for that file.

• Separates the “what you send” from “how much” you send.– Do not need to buffer for retransmission.

• Keeps TCP semantics, congestion control.

Page 11: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Digital fountains seem great!

But do they really exist?

Page 12: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

How Do We Build a Digital Fountain?

• We can construct (approximate) digital fountains using erasure codes.– Including Reed-Solomon, Tornado, LT,

fountain codes.

• Generally, we only come close to the ideal of the paradigm.– Streams not truly infinite; encoding or

decoding times; coding overhead.

Page 13: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Digital Fountains through Erasure Codes

Message

Encoding

Received

Message

Encoding Algorithm

Decoding Algorithm

Transmission

n

cn

n

n

Page 14: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Reed-Solomon Codes

• In theory, can produce an unlimited number of encoding symbols, only need k to recover.

• In practice, limited by:– Field size (usually 256 or 65,536)– Quadratic encoding/decoding times

• These problems ameliorated by striping data.– But raises overhead; now many more than k packets required

to recover.

• Conclusion: may be suitable for some applications, but far from practical or theoretical goals of a digital fountain.

Page 15: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Tornado Codes

• Irregular low-density parity check codes.• Based on graphs: k input symbols lead to n

encoding symbols, using XORs.– Sparse set of equations derived from input symbols.– Solve received set of equations using back substitution.

• Properties:– Graph of size n agreed on by encoder, decoder, and

stored.– Need k(1+) symbols to decode, for some > 0.– Encoding/decoding time proportional to n ln (1/).

Page 16: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Tornado Codes

An Example

Page 17: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Encoding Process

a b f

a b c d g

c e g h

b d e f g h

a

b

c

d

e

f

g

h

Page 18: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Decoding Process: Substitution Recovery

g

e g h

e g h

b

?

?

?indicates right node has one edge

Page 19: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Regular Graphs

Random Permutation

of the Edges

Degree 3

Degree 6

Page 20: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Decoding Process Analysis

Recovered

Missing/not yet recovered

=

=

InducedGraph

n n

Page 21: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

3-6 Regular Graph AnalysisLeft Right Left

y

x

Pr[

( ( ) )

not recovered]

= 1 1 5 2

Pr[ ]

( )

all recovered

1 5xx Pr[not recovered]

Page 22: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

3-6 Regular Graph Equation

y x ( ( ) )1 1 5 2

Want: y < x for all 0 < x <

Works for < 0.43

Page 23: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Irregular Graphs

• 3-6 regular graphs can correct up to 0.43 fraction of erasures.

• Best possible, with n/2 constraints for n symbols, would be 0.5.

• 3-6 gives best performance of all regular graphs.

• Need irregular graphs, with varying degrees, to reach optimality.

Page 24: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Tornado Codes: Weaknesses

• Encoding size n must be fixed ahead of time. • Memory, encoding and decoding times

proportional to n, not k. • Overhead factor of (1+).

– Hard to design around. In practice = 0.05.

• Conclusion: Tornado codes a dramatic step forward, allowing good approximations to digital fountains for many applications.

• Key problem: fixed encoding size.

Page 25: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Decoding Process: Direct Recovery

b

b g

e g h

b e g h

a

?

c

d

?

f

?

?

Page 26: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Digital Fountains through Erasure Codes : Problem

Message

Encoding

Received

Message

Encoding Algorithm

Decoding Algorithm

Transmission

n

cn

n

n

Page 27: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Digital Fountains through Erasure Codes : Solution

Message

Encoding

Received

Message

Encoding Algorithm

Decoding Algorithm

Transmission

n

n

n

Page 28: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

LT Codes• Key idea: graph is implicit, rather than explicit.

– Each encoding symbol is the XOR of a random subset of neighbors, independent of other symbols.

– Each encoding symbol carries a small header, telling what message symbols it is the XOR of.

• No initial graph; graph derived from received symbols.• Properties:

– “Infinite” supply of packets possible.– Need k +o(k) symbols to decode.– Decoding time proportional to k ln k.– On average, ln k time to produce an encoding symbol.

Page 29: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

LT Codes

• Conclusion: making the graph implicit gives us an almost ideal digital fountain.

• One remaining issue: why does average degree need to be around ln k? – Standard coupon collector’s problem: for each

message symbol to be hit by some equation, need k ln k variables in the equations.

• Can remove this problem by pre-coding.

Page 30: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Rateless/Raptor Codes

• Pre-coding independently described by Shokrollahi, Maymoukov.

• Rough idea: – Expand original k message symbols to k (1+)

symbols using (for example) a Tornado code.– Now use an LT code on the expanded message.– Don’t need to recover all of the expanded

message symbols, just enough to recover original message.

Page 31: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Raptor/Rateless Codes

• Properties:– “Infinite” supply of packets possible.– Need k(1+) symbols to decode, for some > 0.– Decoding time proportional to k ln (1/).– On average, ln (1/) (constant) time to produce

an encoding symbol.– Very efficient.

Raptor codes give, in practice, a digital fountain.

Page 32: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Impact on Coding

• These codes are examples of low-density parity-check (LDPC codes).

• Subsequent work: designed LDPC codes for error-correction using these techniques.

• Recent developments: LDPC codes approaching Shannon capacity for most basic channels.

Page 33: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Putting Digital Fountains To Use• Digital fountains are out there.

– Digital Fountain, Inc. sells them.

• Limitations to their use:– Patent issues.– Perceived complexity.

• Lack of reference implementation.

– What is the killer app?

Page 34: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Patent Issues

• Several patents / patents pending on irregular LDPC codes, LT codes, Raptor codes by Digital Fountain, Inc.

• Supposition: this stifles external innovation.– Potential threat of being sued.– Potential lack of commercial outlet for research.

• Suggestion: unpatented alternatives that lead to good approximations of a digital fountain would be useful. – There is work going on in this area, but more is needed to

keep up with recent developments in rateless codes.

Page 35: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Perceived Complexity

• Digital fountains are now not that hard…• …but networking people do not want to deal with

developing codes.• A research need:

– A publicly available, easy to use, reasonably good black box digital fountain implementation that can be plugged in to research prototypes.

• Issue: patents.– Legal risk suggests such a black box would need to be

based on unpatented codes.

Page 36: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

What’s the Killer App?

• Multicast was supposed to be the killer app.– But IP multicast was/is a disaster.– Distribution now handled by contend

distributions companies, e.g. Akamai.

• Possibilities:– Overlay multicast.– Big wireless: e.g. automobiles, satellites.– Others???

Page 37: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Conclusions, Part I

Stop thinking of data as an ordered stream of bytes.

Think of data as a digital fountain.

Digital fountains are implementable in practice with erasure codes.

Page 38: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

A Short Breather

• We’ve covered digital fountains.

• Next up: – Digital fountains for overlay networks.– And other tricks!

Pause for questions, 30 second stretch.

Page 39: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Overlays for Content Delivery• A substitute for IP multicast.• Build distribution topology out of unicast

connections (tunnels).• Requires active participation of end-

systems.• Native IP multicast unnecessary.• Saves considerable bandwidth over N *

unicast solution.• Basic paradigm easy to build

and deploy.

• Bonus: Overlay topology can adapt to network conditions by

self-reconfiguration.

SOURCE

Page 40: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Limitations of Existing Schemes

• Tree-like topologies.– Rooted in history (IP Multicast).– Limitations:

• bandwidth decreases monotonically from the source.• losses increase monotonically along a path.

• Does this matter in practice?– Anecdotal and experimental evidence says yes:

• Downloads from multiple mirror sites in parallel.• Availability of better routes.• Peer-to-peer: Morpheus, Kazaa and Grokster.

Page 41: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

An Illustrative Example

1. A basic tree topology.

1

2. Harnessing the power of parallel downloads.

2

3. Incorporating collaborative transfers.

3

Page 42: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Our Philosophy

• Go beyond trees. – Use additional links and bandwidth by:

• downloading from multiple peers in parallel• taking advantage of “perpendicular” bandwidth

– Has potential to significantly speed up downloads…

• But only effective if:– collaboration is carefully orchestrated– methods are amenable to frequent adaptation of the

overlay topology

Page 43: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Suitable Applications

• Prerequisite conditions:– Available bandwidth between peers.– Differences in content received by peers.– Rich overlay topology.

• Applications– Downloads of large, popular files.– Video-on-demand or nearly real-time streams.– Shared virtual environments.

Page 44: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Use Digital Fountains!

• Intrinsic resilience to packet loss, reordering.• Better support for transient connections via stateless

migration, suspension.• Peers with full content can always generate useful

symbols.• Peers with partial content are more likely to have content

to share.

• ButBut using a digital fountain comes at a price:– Content is no longer an ordered stream.– Therefore, collaboration is more difficult.

Page 45: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Informed Content Delivery:Definitions and Problem Statement

• Peers A and B have working sets of symbols SA, SB drawn from a large universe U and want to collaborate effectively.

• Key components:1)1) Summarize: Furnish a concise and useful

sample of a working set to a peer.

2)2) Approximately Reconcile: Compute as many elements in SA - SB as possible and transmit them.

• Do so with minimal control messaging overhead.

Page 46: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Summarization

• Goal: each peer has a 1 packet calling card.– Can be used to estimate |SA – SB|.

• One possibility: random sampling.– B sends A a random sample of k elements of SB.– Each element is in SA with probability– Negative: must search SA.– Negative: hard to work with multiple summaries.

• Alternative: min-wise independent sampling.

BBA SSS

Page 47: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Min-Wise Summaries

• Let U be the set of 64 bit numbers, and be a random permutation on U. Then

• Calling card for A: keep vector of k values min j(A), j=1…k.

• To estimate , count the j for which min j(A) = min j(B), divide by k.

BABA SSSSBA ))(min)(Pr(min

BABA SSSS

Page 48: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Min-Wise Summaries : Example

Page 49: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Recoding: An Intermediate Solution

Problem: What to transmit when peers have similar content?

Solution: Allow peers to probabilistically “hedge their bets,” minimizing chance of transmission of useless content.

Example:

Suppose the resemblance between SSAA and SB is 0.9.

If A sends a symbol at random the probability of it being

useful to B is 0.1.

A better strategy is to XOR 10 random symbols together.

B can extract one useful symbol with probability:

10 x (1/10) x (9/10)9 > 1/e 0.37

Page 50: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Approximate Reconciliation

• Suppose summarization suggests collaboration is worthwhile.

• Goal: compute as many elements in SA - SB

as possible, with low communication.

• Idea: we do not need all of SA - SB , just as much as possible.– Use Bloom filters.

Page 51: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Lookup Problem

• Given a set SA = {x1,x2,x3,…xn} on a universe U, want to answer queries of the form:

• Bloom filter provides an answer in– “Constant” time (time to hash).– Small amount of space.– But with some probability of being wrong.

.ASyIs

Page 52: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Bloom FiltersStart with an m bit array, filled with 0s.

Hash each item xj in S k times. If Hi(xj) = a, set B[a] = 1.

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0B

0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0B

To check if y is in S, check B at Hi(y). All k values must be 1.

0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0B

0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0BPossible to have a false positive; all k values are 1, but y is not in S.

Page 53: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Errors• Assumption: We have good hash functions, look

random.• Given m bits for filter and n elements, choose

number k of hash functions to minimize false positives:– Let – Let

• As k increases, more chances to find a 0, but more 1’s in the array.

• Find optimal at k = (ln 2)m/n by calculus.

mknkn emp /)/11(]empty is cellPr[ kmknk epf )1()1(]pos falsePr[ /

Page 54: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Example

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1

0 1 2 3 4 5 6 7 8 9 10

Hash functions

Fal

se p

osit

ive

rate

m/n = 8

Opt k = 8 ln 2 = 5.45...

Page 55: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Bloom Filters for Reconciliation

• B transmits a Bloom filter of its set to A; A then sends packets from the set difference.– All elements will be in difference: no false

negatives.– Not all element in difference found: false pos.

• Improvements– Compressed Bloom filters– Approximate Reconciliation Trees

Page 56: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Experimental Scenarios• Three methods for collaboration

– UninformedUninformed: A transmits symbols at random to B.– SpeculativeSpeculative: B transmits a minwise summary to A;

A then sends recoded symbols to B.– ReconciledReconciled: B transmits a Bloom filter of its set to A;

A then sends packets from the set difference.

• Overhead:

– Decoding overhead: with erasure codes, fixed 2.5%.– Reception overhead: useless duplicate packets.– Recoding overhead: useless recoding packets.

symbols received - symbols needed

symbols needed

Page 57: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Pairwise Reconciliation

Containment of B in A:|SA SB|

|SB|

128MB file96K input symbols

115K distinct symbolsin system initially

Page 58: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Four peers in parallel

128MB file96K input symbols

105K distinct symbolsin system initially

Containment of B in A:|SA SB|

|SB|

Page 59: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Four peers, periodic updates

128MB file96K input symbols

105K distinct symbolsin system initially

Filters updated at every 10%.

Containment of B in A:|SA SB|

|SB|

Page 60: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Subsequent Work• Maymounkov: each source sends a stream of

consecutive encoded packets.– Possibly simplifies collaboration, with loss of flexibility.

• Bullet (SOSP ’03): – An implementation with our ideas, plus purposeful

distribution of different content.

• Network coding– Nodes inside the network can compute on the input, rather

than just the endpoints.– Potentially more powerful paradigm– Practice?

Page 61: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Conclusions

• Even with ultimate routing topology optimization, the choice of whatwhat to send is paramount to content delivery.

• Digital fountain model ideal for fluid and ephemeral network environments.

• Collaborations with coded content worthwhile.• Richly connected topologies are key to harnessing

perpendicular bandwidth.• Wanted: more algorithms for intelligent collaboration.

Page 62: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Why regular graphs are bad

d

Left node has on average neighbors of degree one.

Right degree 2d implies Pr[right degree =1] = 1 2 12 d

d d22 1

Page 63: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Irregular Graphs

Random Permutation

of the Edges

Degree 2

Degree 3

Degree 4

Degree 1Degree 4

Degree 5

Degree 6Degree 10

Page 64: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Degree Sequence Functions

• Left Side– fraction of edges of degree i on the left in the

original graph.

• Right Side– fraction of edges of degree i on the right in

the original graph.

( ): .x i xi 1

( ): .x i xi 1

i

i

Page 65: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Irregular Graph AnalysisLeft Right Left

y

x

Pr[

( ( ))

not recovered]

= 1 1

Pr[ ]

( )

all recovered

1 xx Pr[not recovered]

Page 66: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Irregular Graph Condition

y x ( ( ))1 1

Want: y < x for all 0 < x <

Page 67: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Good Left Degree Sequence:Truncated Heavy Tail

0

0.1

0.2

0.3

0.4

0.5

0.6

2 3 4 5 6 7 8 9 D+1

D = 9, N = 11

D

Fraction of nodes of degree i isN

i i( ) 1

Average node degree is N H D D ( ) ln( )

Page 68: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Good Right Degree Sequence:Poisson

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Average node degree is N H D D ( ) ln( )

Page 69: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Good Degree Sequence Functions

( ) ln( ) / ( )x x H DD 1

( ) exp ( )( ) ( )x xD H D

D

1 1

y x ( ( ))1 1

Want: y < x for all 0 < x <

Works for D

D 1

Page 70: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Tornado Code Performance

00.10.20.30.40.5

0.60.70.80.9

1

0 1 2 3 4 5 6 7 8 9 10

Time overhead(Average left degree)

ReceptionEfficiency

Page 71: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Why irregular graphs are good

D+1

Left node of max degree has on average one neighbor of degree one.

Average right degree 2ln(D) implies Pr[right degree =1] 1/(D+1)

Page 72: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Digital Fountains:A Survey and Look Forward

Michael Mitzenmacher

Page 73: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Goals of the Talk

• Explain the digital fountain paradigm for network communication.

• Examine related advances in coding.

• Summarize work on applications.

• Speculate on what comes next.

Page 74: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

How Do We Build a Digital Fountain?

• We can construct (approximate) digital fountains using erasure codes.– Including Reed-Solomon, Tornado, LT,

fountain codes.

• Generally, we only come close to the ideal of the paradigm.– Streams not truly infinite; encoding or

decoding times; coding overhead.

Page 75: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Reed-Solomon Codes

• In theory, can produce an unlimited number of encoding symbols, only need k to recover.

• In practice, limited by:– Field size (usually 256 or 65,536)– Quadratic encoding/decoding times

• These problems ameliorated by striping data.– But raises overhead; now many more than k packets required

to recover.

• Conclusion: may be suitable for some applications, but far from practical or theoretical goals of a digital fountain.

Page 76: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Tornado Codes

• Irregular low-density parity check codes.• Based on graphs: k input symbols lead to n

encoding symbols, using XORs.– Sparse set of equations derived from input symbols.– Solve received set of equations using back substitution.

• Properties:– Graph of size n agreed on by encoder, decoder, and

stored.– Need k(1+) symbols to decode, for some > 0.– Encoding/decoding time proportional to n ln (1/).

Page 77: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Tornado Codes: Weaknesses

• Encoding size n must be fixed ahead of time. • Memory, encoding and decoding times

proportional to n, not k. • Overhead factor of (1+).

– Hard to design around. In practice = 0.05.

• Conclusion: Tornado codes a dramatic step forward, allowing good approximations to digital fountains for many applications.

• Key problem: fixed encoding size.

Page 78: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

LT Codes• Key idea: graph is implicit, rather than explicit.

– Each encoding symbol is the XOR of a random subset of neighbors, independent of other symbols.

– Each encoding symbols carries a small header, telling what message symbols it is the XOR of.

• No initial graph; graph derived from received symbols.• Properties:

– “Infinite” supply of packets possible.– Need k +o(k) symbols to decode.– Decoding time proportional to k ln k.– On average, ln k time to produce an encoding symbol.

Page 79: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

LT Codes

• Conclusion: making the graph implicit gives us an almost ideal digital fountain.

• One remaining issue: why does average degree need to be around ln k? – Standard coupon collector’s problem: for each

message symbol to be hit by some equation, need k ln k variables in the equations.

• Can remove this problem by pre-coding.

Page 80: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Rateless/Raptor Codes

• Pre-coding independently described by Shokrollahi, Maymoukov.

• Rough idea: – Expand original k message symbols to k (1+)

symbols using (for example) a Tornado code.– Now use an LT code on the expanded message.– Don’t need to recover all of the expanded

message symbols, just enough to recover original message.

Page 81: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Raptor/Rateless Codes

• Properties:– “Infinite” supply of packets possible.

– Need k(1+) symbols to decode, for some > 0.

– Decoding time proportional to k ln (1/).

– On average, ln (1/) (constant) time to produce an encoding symbol.

• Conclusion: these codes can be made very efficient and deliver on the promise of the digital fountain paradigm.

Page 82: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Applications

• Long-distance transmission (avoiding TCP)

• Reliable multicast

• Parallel downloads

• One-to-many TCP

• Content distribution on overlay networks

• Streaming video

Page 83: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Point-to-Point Data Transmission

• TCP has problems over long-distance connections.– Packets must be acknowledged to increase sending window

(packets in flight).– Long round-trip time leads to slow acks, bounding

transmission window.– Any loss increases the problem.

• Using digital fountain + TCP-friendly congestion control can greatly speed up connections.

• Separates the “what you send” from “how much” you send.– Do not need to buffer for retransmission.

Page 84: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Reliable Multicast

• Many potential problems when multicasting to large audience. – Feedback explosion of lost packets.– Start time heterogeneity.– Loss/bandwidth heterogeneity.

• A digital fountain solves these problems.– Each user gets what they can, and stops when

they have enough.

Page 85: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Downloading in Parallel

• Can collect data from multiple digital fountains for the same source seamlessly.– Since each fountain has an “infinite” collection

of packets, no duplicates.– Relative fountain speeds unimportant; just need

to get enough. – Combined multicast/multigather possible.

Page 86: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

One-to-Many TCP

• Setting: Web server with popular files, may have many open connections serving same file.– Problem: has to have a separate buffer, state for each

connection to handle retransmissions.– Limits number of connections per server.

• Instead, use a digital fountain to generate packets useful for all connections for that file.

• Separates the “what you send” from “how much” you send.– Do not need to buffer for retransmission.

• Keeps TCP semantics, congestion control.

Page 87: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Distribution on Overlay Networks

• Encoded data make sense for overlay networks.– Changing, heterogeneous network conditions.– Allows multicast.– Allows downloading from multiple sources, as well as

peers.

• Problem: peers may be getting same encoded packets as you, via the multicast.– Not standard digital fountain paradigm.

• Requires reconciliation techniques to find peers with useful packets.

Page 88: Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

Video Streaming

• For “near-real-time” video:– Latency issue.

• Solution: break into smaller blocks, and encode over these blocks.– Equal-size blocks.

– Blocks increases in size geometrically, for only logarithmically many blocks.

• Engineering to get right latency, ensure blocks arrive on time for display.