TRANSCRIPT
1
Resilient and Coherence Preserving Dissemination of Dynamic Data Using Cooperating Peers
Shetal Shah, IIT Bombay; Kirthi Ramamritham, IIT Bombay; Prashant Shenoy, UMass
2
Streaming (Dynamic) Data
Rapid and unpredictable changes; used in on-line decision making/monitoring.
Data gathered by (wireless sensor) networks: sensors that monitor light, humidity, pressure, and heat.
Network traffic passing via switches.
Sports scores, stock prices.
Example continuous query:
Query bw_police:
SELECT source_domain, number_of_packets
FROM network_state
WHEN bw_used_exceeds_allocation;
3
Coherency Requirement (c )
Clients specify a bound on the tolerable imprecision of the requested data, e.g., for # packets sent, max incoherency = 100.
Coherency requirement: $|U(t) - S(t)| \le c$
Metric:
Fidelity: % of time when the coherency requirement is met
Goal: To achieve high fidelity and resiliency
[Diagram: Source S(t) → Repository P(t) → Client U(t)]
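Fidelity lends itself to a direct computation; here is a minimal sketch in Python (the traces and names are illustrative, not from the talk):

    def fidelity(source_vals, client_vals, c):
        """% of observations where the client copy is within the
        coherency bound c of the source value."""
        met = sum(1 for s, u in zip(source_vals, client_vals)
                  if abs(u - s) <= c)
        return 100.0 * met / len(source_vals)

    # Example: the client copy lags the source by one update, c = 0.5.
    source = [1.0, 1.2, 1.4, 1.5, 1.7]
    client = [1.0, 1.0, 1.2, 1.4, 1.5]
    print(fidelity(source, client, 0.5))  # 100.0: every lagged value stays within c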
4
Point to Point Data Dissemination
Pull: client pulls data from the source.
Push: source pushes data of interest to the client.
5
Generic Architecture
[Diagram: data sources (servers, sensors) connected through the network to end-hosts (wired hosts, mobile hosts)]
6
Generic Architecture
[Diagram: as above, with proxies/caches interposed between the data sources and the end-hosts]
7
Where should the queries execute?
At clients: can't optimize across clients, links.
At the source (where changes take place). Advantages: minimum number of refresh messages, high fidelity. Main challenge: scalability; multiple sources are hard to handle.
At data aggregators (DAs/proxies) placed at the edge of the network. Advantages: allows scalability through consolidation; handles multiple data sources. Main challenge: need mechanisms for maintaining data consistency at DAs.
8
The Basic Problem…
Executing CQs at the edge of the network, at data aggregators, will improve scalability and reduce overheads, but poses challenges for consistency maintenance.
Computation/data dissemination: moving query processing on streaming/changing data from sources to execution points (edge servers: proxies/caches/repositories).
9
Focus of this talk
To create a resilient and efficient content distribution network (CDN) for streaming/dynamic data.
Existing CDNs are for static data and dynamic web pages: Akamai, Dynamai.
[Diagram: Sources → Cooperating Repositories → Clients]
10
Overview
Architecture of our CDN
Techniques for data dissemination
Techniques to build the overlay network
Resilient data dissemination: suitable replication of disseminated data
[Diagram: Sources → Cooperating Repositories → Clients]
11
Cooperative Repository Architecture
Source(s), repositories (and clients).
Each repository specifies its coherency requirement.
The source pushes specific changes to selected repositories.
Repositories cooperate with each other and with the source.
12
Dissemination Graph: Example
Data set: p, q, r, s. Max # push connections: 2.
[Diagram: the source pushes to repositories A (p:0.2, q:0.2, r:0.2) and B (p:0.4, r:0.3, q:0.3); C and D are served further down; a client requests q:0.3.]
13
Overview
Architecture of our CDN Techniques for data
dissemination Techniques to build the
overlay network Resilient data dissemination -- suitable replication of
disseminated data
Sources
CooperatingRepositories
Clients
14
Data Dissemination
Different users have different coherency requirements for the same data item.
The coherency requirement at a repository should be at least as stringent as those of its dependents.
Repositories disseminate only changes of interest.
[Diagram: dissemination graph with the source, repositories A, B, D, C with requirements p:0.2, q:0.2, r:0.2; p:0.4, r:0.3; q:0.4; q:0.3; and a client.]
15
Data Dissemination
A repository P sends changes of interest to a dependent Q if $|x_P - x_Q| \ge c_Q$, where $x_P$ is P's current value of the item, $x_Q$ is the last value P sent to Q, and $c_Q$ is Q's coherency requirement.
16
Data dissemination -- must be done with care
Source, Repository P ($c_P = 0.3$), Repository Q ($c_Q = 0.5$)
[Worked example: the source takes values 1, 1.2, 1.4, 1.5, 1.7. P receives 1, 1.4, 1.7; applying the same check between P and Q delivers only 1 and 1.7 to Q, so Q never sees a value near 1.5 even though $|1.5 - 1| \ge c_Q$. Dissemination should prevent such missed updates!]
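A small simulation of this example (a sketch; the values match the slide, the names are illustrative). Each node forwards a value only when it differs from the last value it forwarded by at least the dependent's bound:

    def naive_chain(updates, c_p, c_q):
        sent_p, sent_q = [], []
        last_p = last_q = None
        for v in updates:
            if last_p is None or abs(v - last_p) >= c_p:      # source -> P
                last_p = v
                sent_p.append(v)
                if last_q is None or abs(v - last_q) >= c_q:  # P -> Q
                    last_q = v
                    sent_q.append(v)
        return sent_p, sent_q

    p, q = naive_chain([1.0, 1.2, 1.4, 1.5, 1.7], c_p=0.3, c_q=0.5)
    print(p)  # [1.0, 1.4, 1.7]
    print(q)  # [1.0, 1.7] -- Q misses 1.5 although |1.5 - 1.0| >= 0.5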
17
Dissemination Algorithms
Source based (centralized)
Repository based (distributed)
18
Source Based Dissemination Algorithm
For each data item, the source maintains: the distinct coherency requirements of its repositories, and the last update sent for each such coherency.
For every change, the source finds the maximum coherency for which the change must be disseminated, tags the change with that coherency, and disseminates (changed data, tag); a sketch follows.
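A sketch of the source-side bookkeeping under these rules (class and method names are illustrative):

    class Source:
        """Keeps the last value sent for each distinct coherency
        requirement; tags each change with the loosest requirement it
        satisfies and pushes (value, tag). A repository then forwards
        an update to dependents whose requirement is at most the tag."""
        def __init__(self, coherencies):
            self.last_sent = {c: None for c in coherencies}

        def on_change(self, value):
            due = [c for c, last in self.last_sent.items()
                   if last is None or abs(value - last) >= c]
            if not due:
                return None              # nobody needs this change
            tag = max(due)               # loosest requirement satisfied
            for c in self.last_sent:
                if c <= tag:             # all requirements <= tag receive it
                    self.last_sent[c] = value
            return value, tag

    src = Source([0.3, 0.5])
    for v in [1.0, 1.2, 1.4, 1.5, 1.7]:
        print(v, src.on_change(v))
    # 1.5 is tagged 0.5, so it reaches the repository with c_Q = 0.5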
19
Source Based Dissemination Algorithm
Source, Repository P ($c_P = 0.3$), Repository Q ($c_Q = 0.5$)
[Worked example: same trace 1, 1.2, 1.4, 1.5, 1.7. The source tags 1.4 with 0.3, and tags 1.5 with 0.5 since $|1.5 - 1| = 0.5$; the update 1.5 therefore travels through P to Q, and no update of interest is missed.]
20
Repository Based Dissemination Algorithm
A repository P sends changes of interest to a dependent Q if $|x_P - x_Q| \ge c_Q - c_P$: P tightens Q's bound by its own coherency $c_P$ to compensate for source changes that P itself does not see (sketch below).
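In code, the repository-side check becomes (a sketch; names are illustrative):

    def should_forward(x_p, x_q_last, c_p, c_q):
        # P tightens Q's bound by its own bound c_P, compensating for
        # source changes that P itself never received.
        return abs(x_p - x_q_last) >= c_q - c_p

    # With c_P = 0.3 and c_Q = 0.5 the threshold is 0.2, so P's update
    # 1.4 is now forwarded to Q (the naive check above dropped it).
    print(should_forward(1.4, 1.0, 0.3, 0.5))  # True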
21
Repository BasedDissemination Algorithm
Source, Repository P ($c_P = 0.3$), Repository Q ($c_Q = 0.5$)
[Worked example: P receives 1, 1.4, 1.7; with the threshold $c_Q - c_P = 0.2$, P forwards 1, 1.4, 1.7 to Q, so Q's copy never strays from the source by more than its bound.]
22
Dissemination Algorithms – number of checks by source.
The repository-based algorithm requires fewer checks at the source.
23
Dissemination Algorithms – number of messages
The source-based algorithm requires fewer messages.
24
Overview
Architecture of our CDN
Techniques for data dissemination
Techniques to build the overlay network
Resilient data dissemination: suitable replication of disseminated data
[Diagram: Sources → Cooperating Repositories → Clients]
25
Logical Structure of the CDN
Repositories want many data items, each with different coherency requirements.
Issue: which repository should serve what to whom?
[Diagram: the source and repositories A, B, C, D with mixed requirements (p:0.2, q:0.2, r:0.2, q:0.4; p:0.4, r:0.3, q:0.3).]
26
Constructing the Layout Network
Insert repositories one by one.
Check level by level, starting from the source.
Each level has a load controller; the load controller tries to find data providers for the new repository (Q).
27
Selecting Data Providers
Repositories with a low preference factor are considered as potential data providers.
The most preferred repository with a needed data item is made the provider of that data item.
The most preferred repository is made to provide the remaining data items.
28
Preference Factor
Resource availability factor: can repository P be the provider for one more dependent?
Data availability factor: $\#\,\{\text{data items P can serve Q}\}$, the number of data items that P can provide for the new repository Q.
Computational delay factor: the number of dependents P provides for.
Communication delay factor: the network delay $\text{delay}(P, Q)$ between the two repositories.
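A sketch of how these factors might combine into a single score. The slide only lists the factors; the combination below and all names are illustrative, and, following the slide, a lower value is treated as more preferred:

    import math
    from dataclasses import dataclass

    @dataclass
    class Repo:
        items: set               # data items this repository serves
        num_dependents: int = 0
        max_dependents: int = 4  # illustrative push-connection limit

    def preference(p: Repo, needed: set, delay: float) -> float:
        """Lower is better; the exact combination follows the paper."""
        if p.num_dependents >= p.max_dependents:  # resource availability
            return math.inf                       # cannot take a dependent
        shared = len(p.items & needed)            # data availability
        if shared == 0:
            return math.inf
        # computational delay grows with #dependents, communication
        # delay with the network distance between the two repositories
        return delay * (1 + p.num_dependents) / shared

    a = Repo(items={"p", "q", "r"}, num_dependents=1)
    b = Repo(items={"q"})
    print(preference(a, {"q", "r"}, delay=25.0))  # 25 * 2 / 2 = 25.0
    print(preference(b, {"q", "r"}, delay=20.0))  # 20 * 1 / 1 = 20.0, preferred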
29
Experimental Methodology
Physical network: 4 servers, 600 routers, 100 repositories
Communication delay: 20-30 ms
Computation delay: 3-5 ms
Real stock traces: 100-1000
Time duration of observations: 10,000 s
Tight coherency range: 0.01 to 0.05; loose coherency range: 0.5 to 0.99
30
Loss of fidelity for different coherency requirements
The less stringent the coherency requirement, the better the fidelity.
For little/no cooperation, loss in fidelity is high.
Too much cooperation?
[Graph: loss in fidelity (%) vs. degree of cooperation, where T% of the data items have stringent coherency requirements.]
31
Controlled cooperation
$$\text{actual degree of cooperation} = \min\left(\#\,\text{interested dependents},\ \frac{\text{average network delay}}{\text{average computational delay}}\right)$$
32
Controlled cooperation is essential
[Graphs: loss in fidelity without controlled cooperation (vs. degree of cooperation) and with controlled cooperation (vs. max degree of cooperation).]
33
But …
Loss in fidelity increases for a large number of data items.
Repositories with stringent coherency requirements should be closer to the source.
34
If parents are not chosen judiciously
It may result in: uneven distribution of load on the repositories, an increase in the number of messages in the system, and an increase in loss in fidelity!
[Diagram: a dissemination graph over the source and repositories A, B, C, D with requirements p:0.2, q:0.2, r:0.2; p:0.4, r:0.3, q:0.3.]
35
Data item at a Time Algorithm
A dissemination tree for each data item.
Source serving the data item is the root.
Repositories with more stringent coherency requirements are placed closer to the root.
[Diagram: dynamic data dissemination tree for q: Source → A (q:0.2) → C (q:0.3).]
When multiple data items are considered, repositories form a peer to peer network
36
DiTA
Repository N needs data item x.
If the source has available push connections, or the source is the only node in the dissemination tree for x, N is made a child of the source.
Else N is inserted in the most suitable subtree, where N's ancestors have more stringent coherency requirements and N is as close to the root as possible.
37
Most Suitable Subtree?
l: the smallest level in the subtree with a coherency requirement less stringent than N's.
d: the communication delay from the root of the subtree to N.
The subtree with the smallest l × d is the most suitable.
Essentially, minimize communication delays! A sketch follows.
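A simplified sketch of the insertion for one data item (re-parenting of less stringent descendants is omitted and the descent policy is simplified; all names are illustrative):

    class Node:
        def __init__(self, name, c):
            self.name, self.c = name, c   # c: coherency requirement for x
            self.children = []

    def loosest_level(root, c, level=1):
        """l: smallest level in this subtree whose requirement is less
        stringent (numerically larger) than c."""
        if root.c > c:
            return level
        if not root.children:
            return level + 1
        return min(loosest_level(ch, c, level + 1) for ch in root.children)

    def insert(source, n, delay, max_push=2):
        """delay(a, b): measured communication delay between nodes."""
        if len(source.children) < max_push:   # free push connection
            source.children.append(n)
            return
        node = source
        while True:
            # keep only subtrees whose root is at least as stringent,
            # so all of n's ancestors end up more stringent than n
            cands = [ch for ch in node.children if ch.c <= n.c]
            if not cands:
                node.children.append(n)       # attach here
                return
            node = min(cands,
                       key=lambda s: loosest_level(s, n.c) * delay(s, n))

    # Usage: insert(source, Node("N", 0.25), delay=lambda a, b: 25.0)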
38
Example
Initially the network consists of only the source.
A and B request service of q with coherency requirement 0.2: both are made children of the source.
C then requests service of q with the more stringent requirement 0.1.
[Diagram: C (q:0.1) is placed nearer the source, with A (q:0.2) below it.]
39
Experimental Evaluation
40
Overview
Architecture of our CDN
Techniques for data dissemination
Techniques to build the overlay network
Resilient data dissemination: suitable replication of disseminated data
[Diagram: Sources → Cooperating Repositories → Clients]
41
Handling Failures in the Network
Need to detect permanent/transient failures in the network and to recover from them.
Resiliency is obtained by adding redundancy.
Without redundancy, failures lead to loss in fidelity.
But adding redundancy can increase cost, and can itself cost fidelity! Handle failures such that the cost of adding resiliency is low.
42
Passive/Active Failure Handling
Passive failure detection: the parent sends "I'm alive" messages at the end of every time interval. What should the time interval be?
Active failure handling: always be prepared for failures. For example, two repositories can serve the same data item at the same coherency to a child. This means lots of work and greater loss in fidelity.
Need to be clever.
43
Middle Path
A backup parent B is found for each data item that the repository needs.
Let repository R want data item x with coherency c; B serves R with coherency k × c.
[Diagram: parent P serves R at coherency c; backup B serves R at k × c.]
At what coherency should B serve R, i.e., how should k be chosen?
44
If a parent fails
Detection: the child gets two consecutive updates from the backup parent with no update from the parent in between.
Recovery: the backup parent is asked to serve at coherency c until an update from the parent arrives.
[Diagram: B steps in at coherency c while P is down.]
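A child-side sketch of this detection logic (method names and the print-based recovery signal are illustrative). Since B's bound k × c is looser than the parent's c, at least one parent update should normally arrive between any two backup updates:

    class ChildMonitor:
        def __init__(self):
            self.backups_since_parent = 0
            self.parent_suspected = False

        def on_parent_update(self, value):
            self.backups_since_parent = 0
            if self.parent_suspected:
                self.parent_suspected = False
                print("parent is back; backup returns to coherency k*c")

        def on_backup_update(self, value):
            self.backups_since_parent += 1
            if self.backups_since_parent == 2 and not self.parent_suspected:
                self.parent_suspected = True
                print("parent suspected failed; ask backup to serve at c")

    m = ChildMonitor()
    m.on_backup_update(1.5)
    m.on_backup_update(2.0)   # second backup update in a row -> recovery
    m.on_parent_update(2.1)   # parent resumes -> backup relaxes again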
45
Adding Resiliency to DiTA
A sibling of P is chosen as the backup parent of R.
If P fails, the backup sibling serves R with coherency c; the change is local.
If P has no siblings, a sibling of the nearest ancestor is chosen.
Else the source is made the backup parent.
[Diagram: parent P serves R at c; P's sibling serves R at k × c.]
46
Markov Analysis for k
Assumptions:
Data changes as a random walk along a line.
The probability of an increase is the same as that of a decrease.
No assumptions are made about the unit of change or the time taken for a change.
Result: the expected number of misses for any k is at most $2k^2 - 2$; for k = 2, the expected number of misses is at most 6.
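A toy simulation consistent with this analysis, assuming unit-sized changes and c = 1 (the talk's bound needs no such assumption): every step is an update the child would have received at coherency c, while the backup pushes only when the value drifts k × c from the child's copy:

    import random

    def avg_misses(k, trials=20_000):
        total = 0
        for _ in range(trials):
            v, steps = 0, 0
            while abs(v) < k:             # backup pushes at k * c
                v += random.choice((-1, 1))
                steps += 1
            total += steps - 1            # the final update is delivered
        return total / trials

    for k in (2, 3, 4):
        print(k, avg_misses(k), "bound:", 2 * k * k - 2)
    # k = 2 gives about 3 expected misses, comfortably under the bound of 6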
47
Failure and Recovery Modelling
Failures and recovery are modeled on trends observed in practice: analysis of link failures in an IP backbone by G. Iannaccone et al., Internet Measurement Workshop 2002.
Recovery times: 10% take > 20 min; 40% take between 1 min and 20 min; 50% take < 1 min.
[Graph: trend for time between failures.]
48
In the Presence of Failures, Varying Recovery Times
Addition of resiliency does improve fidelity.
49
In the Presence of Failures, Varying Data Items
[Graphs: loss in fidelity under increasing load.]
Fidelity improves with the addition of resiliency even for a large number of data items.
50
In the Absence of Failures
[Graphs: loss in fidelity under increasing load.]
Often, fidelity improves with the addition of resiliency, even in the absence of failures!
51
Ongoing Work
Mapping clients to repositories: locality, coherency requirements, load, mobility.
Reorganization of the CDN: snapshot-based dynamic reorganization.