the power of prediction: cloud bandwidth and cost...
TRANSCRIPT
The Power of Prediction:
Cloud Bandwidth and Cost
Reduction
Eyal Zohar Israel Cidon
Technion
Osnat (Ossi) Mokryn
Tel-Aviv College
Traffic Redundancy Elimination (TRE)
Traffic redundancy stems from downloading
same or similar information items.
We found around 70% redundancy in
end-clients traffic, compared with past traffic
and local files.
2SIGCOMM 2011
TRE Importance
Moving to the cloud => higher e2e traffic.
Cloud users pay for traffic used in practice =>
incentive to use TRE.
Cloud ProviderCloud Provider
Cloud User
Pay for Use
End-user
Application
TRE
Cloud Traffic
3SIGCOMM 2011
How TRE Works
Chunk 1 Chunk 2 Chunk 3
Byte stream
Anchor 1 Anchor 2 Anchor 3 Anchor 4
Sign. 1 Sign. 2 Sign 3
Rolling hash
SHA-1 signature
Server parses the outgoing stream to content-
based chunks and signs with SHA-1
Chunk 1 Chunk 2’ Chunk 3Insertion example
New bytes
4SIGCOMM 2011
Problems in Existing Solutions
In the cloud environment:
1. High processing costs in the cloud.
2. Scalability – remember each client.
3. Elasticity - unaware of data from other sources.
4. Do not handle long-term repeats (days/weeks).
ReceiverServer 1
Server 2
5SIGCOMM 2011
Our Solution: PACK (Predictive ACK)
Redundancy detection by the client.
Repeats appear in chains.
Tries to match incoming chunks with a
previously received chain or local file.
Sends to the server predictions of the future
data.
6SIGCOMM 2011
PACK: The Client Prediction
Each prediction:
1.TCP seq. – no server parsing
2.Hint – spare unnecessary SHA-1
3.SHA-1 signature
Chunk
SHA-1
7
Last-byte
hint
TCP seq.
SIGCOMM 2011
Chunk 1 Chunk 2 Chunk 3
Sign. 1 Sign. 2 Sign 3
SHA-1 signature
Stream chunks
Chain of chunks
Received Prediction
3
PACK: Server Operation
The server compares the hint with the last-byte to sign.
Upon a hint match it performs the expensive SHA-1.
PACK saves cloud’s computational effort in the absence
of redundancy.
First receiver-based TRE: the server does not parse. It
signs with >99% confidence.
ClientServer
Local
storage
1 21 2 3
2,3?
Chain
2,3V
8SIGCOMM 2011
PACK Benefits
Minimizes processing costs induced by TRE.
– Signs with SHA-1 in the presence of redundancy.
Receiver-based end-to-end TRE => suitable for cloud
server elasticity and client mobility.
– Does not require the server to continuously maintain
clients’ status.
9SIGCOMM 2011
Server Effort Experiment
Several data-sets in 3 modes: baseline no-TRE,
PACK and a sender-based TRE.
0%
20%
40%
60%
80%
100%
120%
140%
0% 10% 20% 30% 40% 50%
Sin
gle
Serv
er
Clo
ud O
pera
tional C
ost
(100%
=w
ithout
TR
E s
yste
m)
Redundancy Elimination Ratio
EndRE-like
PACK
25%-30% redundancy:
common to many
data-sets
10SIGCOMM 2011
Sender-based
0%
5%
10%
15%
20%
25%
30%
35%
0.0
0.5
1.0
1.5
2.0
2.5
3.0
PA
CK
TR
E (
Rem
oved R
edundancy)
All
YouT
ube T
raff
ic (
Gbps)
Time (24 hours)
YouTube Traffic
PACK TRE
YouTube Redundancy
Traces of 40k clients, captured at an ISP.
Found 30% end-to-end (personal) redundancy.
11SIGCOMM 2011
0%
10%
20%
30%
40%
50%
60%
70%
80%
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33
Avera
ge R
edundancy o
f D
aily
Tra
ffic
Days Since Start
Unlimited
1 Hour
24 Hours
Social network: eliminated 30% with one hour cache
and 75% with a long-term cache.
12
Long-Term TRE
SIGCOMM 2011
Gmail account with 1,000 Inbox messages.
Found 32% static redundancy (higher when
messages are read multiple times).
Cloud Email Redundancy
0
50
100
150
200
250
300
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Tra
ffic
Vo
lum
e P
er
Mo
nth
(M
B)
Month
Redundant
Non-redundant
13SIGCOMM 2011
Implementation
Linux with Netfilter Queue, 25k lines of C and
Java, available for download.
Receiver-sender protocol is embedded in the
TCP Options field.
Transparent use at both sides.
14SIGCOMM 2011
Processing Effort in the Client
SIGCOMM 2011 15
Laptop experiment: PACK-related CPU consumption is
~4% when playing HD video (9 Mbps with 30%
redundancy).
Smartphone experiment: PACK consumes ~3% of the
battery power when processing 1 GB video (avg.
monthly data plan).
Virtual traffic saves
the client the need
to chunk or sign.
New Chunking Algorithm
SIGCOMM 2011 16
Most existing solutions use Rabin fingerprint.
New Chunking Algorithm64 bits
nn-1
n-2n-3
n-4
n-5n-6
n-7n-8
n-40n-41
n-42n-43
n-44n-45
n-46n-47
Mask=00 00 8A 31 10 58 30 80
SIGCOMM 2011 17
Summary
Current TRE solutions may not reduce cloud cost.
PACK is the first receiver-based TRE – leverages the
power of prediction.
Minimizes processing costs induced by TRE.
Suitable for cloud server migration and client mobility.
Implementation is available for download.
18SIGCOMM 2011