Efficient Gathering of Correlated Data in Sensor Networks
Himanshu Gupta, Vishnu Navda, Samir R. Das, Vishal Chowdhary
Department of CS, State University of New York Stony Brook
MobiHoc 2005
Outline
Introduction
Problem Formulation
Energy-Efficient Distributed Algorithm
Centralized Approximation Algorithm
Performance Results
Conclusion
Introduction (1)
Data gathering in sensor networks: collect periodic snapshots of distributed sensor data at a sink node.
Environmental applications: temperature, humidity, pressure data.
Sensor networks are usually redundant: they exhibit a high degree of spatial correlation in the collected data (colored sub-regions in the figure).
Introduction (2)
Data Gathering Approaches
Naïve Method: collect data from all the nodes by forming a gathering tree with the sink node at the root.
Energy-Efficient Method: given a sensor network, select a subset of sensors M, called a Connected Correlation-Dominating Set, such that
(a) each sensor not in M is correlated to a subset of sensors in the selected set M, and
(b) the selected set M forms a connected communication graph.
Example
For a given region, the data of any two sensors suffices to infer the data of all other sensors in the region.
[Figure: example region showing selected and deleted nodes]
Formal Problem Definition (1)
Definition 1. (Communication Graph) Given a sensor network consisting of a set of sensors I, the communication graph for the sensor network is the undirected graph CG with I as the set of vertices and an edge between any two sensors if they can communicate directly with each other.
[Figure: (a) communication graph over nodes t, u, v, w, x, y, z]
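Definition 1 can be sketched in code. The unit-disk rule used here (two sensors communicate directly iff they are within a transmission radius, as in the simulation setup later in the deck) is an assumption for illustration, not part of the definition itself:

```python
# Sketch: build the communication graph from node positions, assuming a
# unit-disk model (edge iff Euclidean distance <= transmission radius).
from itertools import combinations
from math import dist

def communication_graph(positions, radius):
    """positions: dict node -> (x, y); returns adjacency dict node -> set of neighbors."""
    cg = {u: set() for u in positions}
    for u, v in combinations(positions, 2):
        if dist(positions[u], positions[v]) <= radius:
            cg[u].add(v)
            cg[v].add(u)
    return cg
```

For example, with radius 2, nodes at (0, 0) and (1, 0) are neighbors while a node at (5, 0) is isolated.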
Formal Problem Definition (2)
Definition 2. (Correlation Graph; Correlation Neighbors) Given a sensor network consisting of a set of sensors I, the correlation graph over the sensor nodes is a directed hypergraph with I as the set of vertices, and a subset of (P(I) × I) as the set of directed hyperedges, where P(I) is the power set of I. In other words, the correlation graph is a hypergraph G(V = I, E ⊆ (P(I) × I)).
[Figure: (a) correlation edge ({u, v, w}, x); (b) correlation graph over nodes t, u, v, w, x, y, z]
Formal Problem Definition (3)
Definition 3. (Connected Correlation-Dominating Set) Consider a sensor network consisting of n sensors. Let C be the correlation graph over the sensor nodes in the network. A set of sensors M is called a connected correlation-dominating set if:
1. The communication subgraph induced by M is connected.
2. For each sensor node s ∉ M, there is a set of sensors S ⊆ M such that (S, s) is a correlation edge in C.
[Figure: (a) correlation graph C; (b) connected correlation-dominating set M = {t, u, v, w}]
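The two conditions of Definition 3 translate directly into a checker. This is a minimal sketch, assuming the communication graph is an adjacency dict and each correlation edge is a pair (frozenset of sources, inferred node):

```python
# Sketch of a checker for Definition 3.
def is_connected(cg, nodes):
    """DFS over the communication subgraph induced by `nodes`."""
    nodes = set(nodes)
    if not nodes:
        return True
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(cg[u] & nodes)
    return seen == nodes

def is_ccds(cg, corr_edges, M):
    """True iff M is a connected correlation-dominating set."""
    M = set(M)
    if not is_connected(cg, M):          # condition 1
        return False
    others = set(cg) - M                 # condition 2: every s not in M
    return all(any(s == x and S <= M for S, s in corr_edges) for x in others)
```

A usage sketch on a hypothetical 4-node network: with chain t-u-v, node x adjacent to v, and correlation edge ({u, v}, x), the set {t, u, v} qualifies while {t, v} does not (it is disconnected).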
Formal Problem Definition (4)
Connected Correlation-Dominating Set Problem: Given a sensor network and a correlation graph over the sensors, the connected correlation-dominating set problem is to find the smallest connected correlation-dominating set.
The problem is NP-hard, since the less general minimum dominating set problem is well known to be NP-hard.
Formal Problem Definition (5)
Computing Correlation Hyperedge Parameters
A hyperedge (S, s) exists if the data values of s can be inferred from the values of S within a certain error bound.
Linear Prediction Model (Least-Squares Approach):

  s'[k] = Σ_{l=1..L} α_l s_l[k]
  E = Σ_{k=1..K} (s'[k] − s[k])²

where
  s'[k]  : predicted value of node s at the kth time instant
  s[k]   : actual value of node s at the kth time instant
  S = {s_1, s_2, ..., s_L} : source nodes
  s_l[k] : actual value of source node s_l at the kth time instant
Formal Problem Definition (6)
In matrix form, the squared error over the K time instants is

  E(α) = ||Sα − s||²

Setting the derivative with respect to α to zero,

  dE/dα = 2SᵀSα − 2Sᵀs = 0

yields the least-squares solution

  α = [α_1, α_2, ..., α_L]ᵀ = (SᵀS)⁻¹Sᵀs

where

  S = | s_1[1] s_2[1] ... s_L[1] |        s = | s[1] |
      | s_1[2] s_2[2] ... s_L[2] |            | s[2] |
      |  ...    ...        ...   |            | ...  |
      | s_1[K] s_2[K] ... s_L[K] |            | s[K] |
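The least-squares fit above can be sketched numerically. The toy readings below are invented for illustration; `numpy.linalg.lstsq` computes the same solution as (SᵀS)⁻¹Sᵀs but is numerically more stable:

```python
# Sketch: fit the linear predictor alpha from K readings of L source
# nodes (matrix S, shape K x L) and of the target node s (length-K vector).
import numpy as np

def fit_linear_predictor(S, s):
    """Return (alpha, E) where alpha solves min ||S alpha - s||^2 and
    E = sum_k (s'[k] - s[k])^2 is the resulting prediction error."""
    alpha, *_ = np.linalg.lstsq(S, s, rcond=None)
    E = float(np.sum((S @ alpha - s) ** 2))
    return alpha, E

# Toy data: the target is exactly 2*s_1 + 0.5*s_2, so E should be ~0.
S = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])  # K=3 readings, L=2 sources
s = S @ np.array([2.0, 0.5])
alpha, E = fit_linear_predictor(S, s)
```

A hyperedge (S, s) would then be added whenever E falls within the chosen error bound.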
Energy-Efficient Distributed Algorithm (1)
Basic Distributed Algorithm
1. Initially, each node assigns itself a priority. Data-gathering nodes mark themselves selected.
2. Next, each node collects d-hop neighborhood information.
3. The remaining nodes periodically test whether they can be marked deleted; a node marks itself deleted (and instructs the correlation neighbors it uses to mark themselves selected) only if both conditions hold:
(i) It can be inferred (using a correlation edge) from a set of non-deleted nodes.
(ii) Its deletion preserves the connectivity of the communication subgraph induced over the non-deleted nodes.
The non-deleted nodes form a Connected Correlation-Dominating Set.
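The deletion test in step 3 can be sketched with a centralized helper (the real algorithm runs distributedly over d-hop neighborhoods; `alive`, the current set of non-deleted nodes, is a modeling assumption here):

```python
# Sketch of the deletion test: node x may mark itself deleted iff
# (i) some correlation edge (S, x) has all of S non-deleted, and
# (ii) the communication subgraph over alive - {x} stays connected.
def can_delete(cg, corr_edges, alive, x):
    if not any(s == x and S <= (alive - {x}) for S, s in corr_edges):
        return False                     # condition (i) fails
    rest = alive - {x}
    if not rest:
        return True
    seen, stack = set(), [next(iter(rest))]
    while stack:                         # condition (ii): DFS connectivity
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(cg[u] & rest)
    return seen == rest

# Hypothetical chain t-u-v-x with one correlation edge ({u, v}, x).
cg = {'t': {'u'}, 'u': {'t', 'v'}, 'v': {'u', 'x'}, 'x': {'v'}}
corr_edges = [(frozenset({'u', 'v'}), 'x')]
alive = {'t', 'u', 'v', 'x'}
```

Here x can be deleted (it is inferable from {u, v} and its removal leaves t-u-v connected), while u cannot (no edge infers it).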
Energy-Efficient Distributed Algorithm (2)
Conditions for Marking Deleted
[Figure: conditions C1–C4 under which a node s marks itself deleted]
Energy-Efficient Distributed Algorithm (3)
2-Round Distributed Algorithm
Based on the basic distributed algorithm; replaces C3 and C4 with C33 and C44 in the initial round.
C33: There is a correlation edge (S, s) in the correlation graph such that no node in the set S is marked deleted. In addition, each node in S is either marked selected, doesn't satisfy the C2 condition, or has a priority less than p(s).
C44: If there is a correlation edge (R, r) where s ∈ R, then r is either marked deleted, marked selected, doesn't satisfy the C2 condition, or has a priority less than p(s).
Energy-Efficient Distributed Algorithm (4)
Handshake Algorithm
Based on the basic distributed algorithm; uses C33 and C44 in all testing rounds, with additional C2-satisfied messages:
Whenever a node's C2 condition is satisfied, it transmits a C2-satisfied message to its correlation neighbors.
Before node s marks itself deleted, it performs a handshake with the source nodes it uses.
Centralized Approximation Algorithm (1)
Definition 4. (Intersection Graph of Source Sets) Let I be the set of nodes in the network, and let Î = { {s} | s ∈ I } be the corresponding singleton sets. Let S be the set of source sets in the correlation graph of the network. The intersection graph of source sets is the simple graph G(V = S ∪ Î, E = { (v1, v2) | v1 ∩ v2 ≠ φ }).
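Definition 4 is straightforward to realize in code; a minimal sketch, representing every vertex as a frozenset:

```python
# Sketch of Definition 4: vertices are the singleton sets {s} plus every
# source set; two vertices are adjacent iff the sets intersect.
from itertools import combinations

def intersection_graph(nodes, source_sets):
    verts = [frozenset({s}) for s in nodes] + [frozenset(S) for S in source_sets]
    edges = {(a, b) for a, b in combinations(verts, 2) if a & b}
    return verts, edges
```

For instance, with nodes {a, b} and one source set {a, b}, the graph has three vertices and two edges ({a}-{a, b} and {b}-{a, b}); {a} and {b} are not adjacent since they are disjoint.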
Centralized Approximation Algorithm (2)
Definition 5. (Connected Subgraph of Sources; Connected Source Set) A connected subgraph in the intersection graph of source sets is called a connected subgraph of sources. A connected source set is the set of nodes corresponding to some connected subgraph of sources, i.e., the union of the sets corresponding to the vertices of a connected subgraph of sources.
[Figure: (a) source sets S1–S4; (b) intersection graph of source sets; (c) connected subgraph of sources {S1, S2, S3}; (d) the corresponding connected source set {b1, b2, b3, b4}]
Centralized Approximation Algorithm (3)
Definition 6. (Inferred Nodes) Given a set of nodes S, the set of inferred nodes for S is denoted by I(S) and is defined as
I(S) = S ∪ { x | (Y, x) is a correlation edge and Y ⊆ S }.
Definition 7. (Benefit of a Set of Nodes) The benefit of a set S with respect to a set M of nodes is denoted by B(S, M) and is defined as B(S, M) = |I(S) − I(M)| / |S − M|, where I(S) and I(M) are the sets of inferred nodes for S and M respectively.
[Figure: Venn diagram of I(M) and I(S), with B(S, M) = 3 / 1 = 3]
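Definitions 6 and 7 map directly to two small functions; a sketch with correlation edges represented as (frozenset of sources, inferred node) pairs:

```python
# Sketch of Definitions 6 and 7.
def inferred(S, corr_edges):
    """I(S) = S union {x | (Y, x) is a correlation edge and Y subset of S}."""
    S = set(S)
    return S | {x for Y, x in corr_edges if Y <= S}

def benefit(S, M, corr_edges):
    """B(S, M) = |I(S) - I(M)| / |S - M|: newly inferred nodes per newly
    selected node."""
    S, M = set(S), set(M)
    return len(inferred(S, corr_edges) - inferred(M, corr_edges)) / len(S - M)
```

For example, with one hypothetical edge ({a, b}, c): I({a, b}) = {a, b, c}, and B({a, b}, {a}) = |{b, c}| / |{b}| = 2.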
Centralized Approximation Algorithm (4)
Centralized Approximation Algorithm
1st Phase: Constructing a near-optimal Correlation-Dominating Set. Initially, the set M contains the data-gathering node. The algorithm iteratively adds to M the connected source set that has the maximum benefit with respect to M. The phase terminates when M becomes a correlation-dominating set.
2nd Phase: Connecting the Correlation-Dominating Set. The algorithm iteratively connects the closest pair of connected components.
The time complexity of the algorithm is exponential in n (the number of nodes), since the number of connected source sets considered in the first phase can be exponential.
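The greedy loop of the 1st phase can be sketched as follows. Here `candidates` stands in for the enumeration of connected source sets (the possibly exponential step), which is assumed to be supplied:

```python
# Sketch of the 1st phase: grow M greedily by maximum benefit
# B(S, M) = |I(S) - I(M)| / |S - M| until M correlation-dominates.
def inferred(S, corr_edges):
    S = set(S)
    return S | {x for Y, x in corr_edges if Y <= S}

def greedy_phase1(all_nodes, corr_edges, candidates, sink):
    M = {sink}
    def gain(S):
        new = set(S) - M
        if not new:
            return -1.0
        return len(inferred(S, corr_edges) - inferred(M, corr_edges)) / len(new)
    while not set(all_nodes) <= inferred(M, corr_edges):
        best = max(candidates, key=gain)
        if gain(best) <= 0:
            break  # no candidate adds coverage
        M |= set(best)
    return M
```

On a hypothetical instance with nodes {a, b, c, d}, sink a, correlation edge ({b, c}, d), and the single candidate {b, c}, the phase returns M = {a, b, c}, from which d is inferable; the 2nd phase would then connect the components of M in the communication graph.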
Centralized Approximation Algorithm (5)
Polynomial-time Heuristics (l-hop Heuristic)
Based on the above algorithm. At each stage, the algorithm constructs a connected source set fl(S) for each source set S, then picks the fl(S) with the maximum benefit and adds it to the selected set M.
fl(S) is constructed greedily by merging S with the best source set that is at most l away from S in the intersection graph.
[Figure: 1-hop heuristic at the 1st stage — f1(S1) = S1 ∪ S2, while f1(S2) = S1 ∪ S2 ∪ S3 has the maximum benefit]
Performance Results (1)
Random Sensor Networks with Synthetic Correlation
1000 nodes; area: 40 x 40 units; transmission radius: 3 units.
For each node s and a set of nodes S (1 to 3 nodes within at most d = 2 hops), the hyperedge (S, s) is added with probability P/100.
Simulation Environments
Correlation computation: K = 3, L = 5.
Small network: 100 nodes, 7 x 7 area. Large network: 1000 nodes, 40 x 40 area.
Performance Results (2)
Centralized Algorithm
100 nodes, 7 x 7 area with synthetically generated correlation
Performance Results (3)
Distributed Algorithm
1000 nodes, 40 x 40 area with synthetically generated correlation
Performance Results (5)
Simulation on Real Temperature Data
Average temperatures of over 600 US cities. Source set S consists of 1 to 3 nodes within at most distance d = 2 hops. Error threshold: 5%.
Conclusion
The paper considered the connected correlation-dominating set problem, which helps minimize energy costs in data-gathering sensor networks.
The correlation structure (hypergraph) can capture general data correlations.