Efficient Gathering of Correlated Data in Sensor Networks
Himanshu Gupta, Vishnu Navda, Samir R. Das, Vishal Chowdhary
Department of CS, State University of New York Stony Brook
MobiHoc 2005
Outline
Introduction
Problem Formulation
Energy-Efficient Distributed Algorithm
Centralized Approximation Algorithm
Performance Results
Conclusion
Introduction (1)
Data gathering in sensor networks: collect periodic snapshots of distributed sensor data at a sink node.
Environmental applications: temperature, humidity, pressure data.
Sensor networks are usually redundant: they exhibit a high degree of spatial correlation in the collected data (colored sub-regions in the figure).
Introduction (2)
Data Gathering Approaches
Naïve Method: collect data from all the nodes by forming a gathering tree with the sink node at the root.
Energy-Efficient Method: given a sensor network, select a subset of sensors M, called a Connected Correlation-Dominating Set, such that
(a) each sensor not in M is correlated to a subset of sensors in the selected set M, and
(b) the selected set M forms a connected communication graph.
Example
For a given region, the data of any two sensors suffices to infer the data of all other sensors in the region.
[Figure: example region showing selected and deleted nodes]
Formal Problem Definition (1)
Definition 1. (Communication Graph) Given a sensor network consisting of a set of sensors I, the communication graph for the sensor network is the undirected graph CG with I as the set of vertices and an edge between any two sensors if they can communicate directly with each other.
[Figure: (a) communication graph over nodes t, u, v, w, x, y, z]
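Definition 1 can be sketched in code. The unit-disk rule used here (two sensors communicate directly iff they are within a transmission radius, as in the simulation setup later in the deck) is an assumption for illustration, not part of the definition itself:

```python
# Sketch: build the communication graph from node positions, assuming a
# unit-disk model (edge iff Euclidean distance <= transmission radius).
from itertools import combinations
from math import dist

def communication_graph(positions, radius):
    """positions: dict node -> (x, y); returns adjacency dict node -> set of neighbors."""
    cg = {u: set() for u in positions}
    for u, v in combinations(positions, 2):
        if dist(positions[u], positions[v]) <= radius:
            cg[u].add(v)
            cg[v].add(u)
    return cg
```

For example, with radius 2, nodes at (0, 0) and (1, 0) are neighbors while a node at (5, 0) is isolated.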
Formal Problem Definition (2)
Definition 2. (Correlation Graph; Correlation Neighbors) Given a sensor network consisting of a set of sensors I, the correlation graph over the sensor nodes is a directed hypergraph with I as the set of vertices, and a subset of (P(I) × I) as the set of directed hyperedges, where P(I) is the power set of I. In other words, the correlation graph is a hypergraph G(V = I, E ⊆ (P(I) × I)).
[Figure: (a) correlation edge ({u, v, w}, x); (b) correlation graph over nodes t, u, v, w, x, y, z]
Formal Problem Definition (3)
Definition 3. (Connected Correlation-Dominating Set) Consider a sensor network consisting of n sensors. Let C be the correlation graph over the sensor nodes in the network. A set of sensors M is called a connected correlation-dominating set if:
1. The communication subgraph induced by M is connected.
2. For each sensor node s ∉ M, there is a set of sensors S ⊆ M such that (S, s) is a correlation edge in C.
[Figure: (a) correlation graph C; (b) connected correlation-dominating set M = {t, u, v, w}]
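The two conditions of Definition 3 translate directly into a checker. This is a minimal sketch, assuming the communication graph is an adjacency dict and each correlation edge is a pair (frozenset of sources, inferred node):

```python
# Sketch of a checker for Definition 3.
def is_connected(cg, nodes):
    """DFS over the communication subgraph induced by `nodes`."""
    nodes = set(nodes)
    if not nodes:
        return True
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(cg[u] & nodes)
    return seen == nodes

def is_ccds(cg, corr_edges, M):
    """True iff M is a connected correlation-dominating set."""
    M = set(M)
    if not is_connected(cg, M):          # condition 1
        return False
    others = set(cg) - M                 # condition 2: every s not in M
    return all(any(s == x and S <= M for S, s in corr_edges) for x in others)
```

A usage sketch on a hypothetical 4-node network: with chain t-u-v, node x adjacent to v, and correlation edge ({u, v}, x), the set {t, u, v} qualifies while {t, v} does not (it is disconnected).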
Formal Problem Definition (4)
Connected Correlation-Dominating Set Problem: Given a sensor network and a correlation graph over the sensors, the connected correlation-dominating set problem is to find the smallest connected correlation-dominating set.
The problem is NP-hard, since the less general minimum dominating set problem is well known to be NP-hard.
Formal Problem Definition (5)
Computing Correlation Hyperedge Parameters
A hyperedge (S, s) exists if the data values of s can be inferred from the values of S within a certain error bound.
Linear Prediction Model (Least-Squares Approach):

  s'[k] = Σ_{l=1..L} α_l s_l[k]
  E = Σ_{k=1..K} (s'[k] − s[k])²

where
  s'[k]  : predicted value of node s at the kth time instant
  s[k]   : actual value of node s at the kth time instant
  S = {s_1, s_2, ..., s_L} : source nodes
  s_l[k] : actual value of source node s_l at the kth time instant
Formal Problem Definition (6)
In matrix form, the squared error over the K time instants is

  E(α) = ||Sα − s||²

Setting the derivative with respect to α to zero,

  dE/dα = 2SᵀSα − 2Sᵀs = 0

yields the least-squares solution

  α = [α_1, α_2, ..., α_L]ᵀ = (SᵀS)⁻¹Sᵀs

where

  S = | s_1[1] s_2[1] ... s_L[1] |        s = | s[1] |
      | s_1[2] s_2[2] ... s_L[2] |            | s[2] |
      |  ...    ...        ...   |            | ...  |
      | s_1[K] s_2[K] ... s_L[K] |            | s[K] |
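The least-squares fit above can be sketched numerically. The toy readings below are invented for illustration; `numpy.linalg.lstsq` computes the same solution as (SᵀS)⁻¹Sᵀs but is numerically more stable:

```python
# Sketch: fit the linear predictor alpha from K readings of L source
# nodes (matrix S, shape K x L) and of the target node s (length-K vector).
import numpy as np

def fit_linear_predictor(S, s):
    """Return (alpha, E) where alpha solves min ||S alpha - s||^2 and
    E = sum_k (s'[k] - s[k])^2 is the resulting prediction error."""
    alpha, *_ = np.linalg.lstsq(S, s, rcond=None)
    E = float(np.sum((S @ alpha - s) ** 2))
    return alpha, E

# Toy data: the target is exactly 2*s_1 + 0.5*s_2, so E should be ~0.
S = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])  # K=3 readings, L=2 sources
s = S @ np.array([2.0, 0.5])
alpha, E = fit_linear_predictor(S, s)
```

A hyperedge (S, s) would then be added whenever E falls within the chosen error bound.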
Energy-Efficient Distributed Algorithm (1)
Basic Distributed Algorithm
1. Initially, each node assigns itself a priority. Data-gathering nodes mark themselves selected.
2. Next, each node collects d-hop neighborhood information.
3. The remaining nodes periodically test whether they can be marked deleted; a node marks itself deleted (and instructs the correlation neighbors it uses to mark themselves selected) only if both conditions hold:
(i) It can be inferred (using a correlation edge) from a set of non-deleted nodes.
(ii) Its deletion preserves the connectivity of the communication subgraph induced over the non-deleted nodes.
The non-deleted nodes form a Connected Correlation-Dominating Set.
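The deletion test in step 3 can be sketched with a centralized helper (the real algorithm runs distributedly over d-hop neighborhoods; `alive`, the current set of non-deleted nodes, is a modeling assumption here):

```python
# Sketch of the deletion test: node x may mark itself deleted iff
# (i) some correlation edge (S, x) has all of S non-deleted, and
# (ii) the communication subgraph over alive - {x} stays connected.
def can_delete(cg, corr_edges, alive, x):
    if not any(s == x and S <= (alive - {x}) for S, s in corr_edges):
        return False                     # condition (i) fails
    rest = alive - {x}
    if not rest:
        return True
    seen, stack = set(), [next(iter(rest))]
    while stack:                         # condition (ii): DFS connectivity
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(cg[u] & rest)
    return seen == rest

# Hypothetical chain t-u-v-x with one correlation edge ({u, v}, x).
cg = {'t': {'u'}, 'u': {'t', 'v'}, 'v': {'u', 'x'}, 'x': {'v'}}
corr_edges = [(frozenset({'u', 'v'}), 'x')]
alive = {'t', 'u', 'v', 'x'}
```

Here x can be deleted (it is inferable from {u, v} and its removal leaves t-u-v connected), while u cannot (no edge infers it).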
Energy-Efficient Distributed Algorithm (2)
Conditions for Marking Deleted
[Figure: conditions C1–C4 under which a node s marks itself deleted]
Energy-Efficient Distributed Algorithm (3)
2-Round Distributed Algorithm
Based on the basic distributed algorithm; replaces C3 and C4 with C33 and C44 in the initial round.
C33: There is a correlation edge (S, s) in the correlation graph such that no node in the set S is marked deleted. In addition, each node in S is either marked selected, doesn't satisfy the C2 condition, or has a priority less than p(s).
C44: If there is a correlation edge (R, r) where s ∈ R, then r is either marked deleted, marked selected, doesn't satisfy the C2 condition, or has a priority less than p(s).
Energy-Efficient Distributed Algorithm (4)
Handshake Algorithm
Based on the basic distributed algorithm; uses C33 and C44 in all testing rounds, with additional C2-satisfied messages:
Whenever a node's C2 condition is satisfied, it transmits a C2-satisfied message to its correlation neighbors.
Before node s marks itself deleted, it performs a handshake with the source nodes it uses.
Centralized Approximation Algorithm (1)
Definition 4. (Intersection Graph of Source Sets) Let I be the set of nodes in the network, and let Î = { {s} | s ∈ I } be the corresponding singleton sets. Let S be the set of source sets in the correlation graph of the network. The intersection graph of source sets is the simple graph G(V = S ∪ Î, E = { (v1, v2) | v1 ∩ v2 ≠ φ }).
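Definition 4 is straightforward to realize in code; a minimal sketch, representing every vertex as a frozenset:

```python
# Sketch of Definition 4: vertices are the singleton sets {s} plus every
# source set; two vertices are adjacent iff the sets intersect.
from itertools import combinations

def intersection_graph(nodes, source_sets):
    verts = [frozenset({s}) for s in nodes] + [frozenset(S) for S in source_sets]
    edges = {(a, b) for a, b in combinations(verts, 2) if a & b}
    return verts, edges
```

For instance, with nodes {a, b} and one source set {a, b}, the graph has three vertices and two edges ({a}-{a, b} and {b}-{a, b}); {a} and {b} are not adjacent since they are disjoint.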
Centralized Approximation Algorithm (2)
Definition 5. (Connected Subgraph of Sources; Connected Source Set) A connected subgraph in the intersection graph of source sets is called a connected subgraph of sources. A connected source set is the set of nodes corresponding to some connected subgraph of sources, i.e., the union of the sets corresponding to the vertices of a connected subgraph of sources.
[Figure: (a) source sets S1–S4; (b) intersection graph of source sets; (c) connected subgraph of sources {S1, S2, S3}; (d) the corresponding connected source set {b1, b2, b3, b4}]
Centralized Approximation Algorithm (3)
Definition 6. (Inferred Nodes) Given a set of nodes S, the set of inferred nodes for S is denoted by I(S) and is defined as
I(S) = S ∪ { x | (Y, x) is a correlation edge and Y ⊆ S }.
Definition 7. (Benefit of a Set of Nodes) The benefit of a set S with respect to a set M of nodes is denoted by B(S, M) and is defined as B(S, M) = |I(S) − I(M)| / |S − M|, where I(S) and I(M) are the sets of inferred nodes for S and M respectively.
[Figure: Venn diagram of I(M) and I(S), with B(S, M) = 3 / 1 = 3]
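Definitions 6 and 7 map directly to two small functions; a sketch with correlation edges represented as (frozenset of sources, inferred node) pairs:

```python
# Sketch of Definitions 6 and 7.
def inferred(S, corr_edges):
    """I(S) = S union {x | (Y, x) is a correlation edge and Y subset of S}."""
    S = set(S)
    return S | {x for Y, x in corr_edges if Y <= S}

def benefit(S, M, corr_edges):
    """B(S, M) = |I(S) - I(M)| / |S - M|: newly inferred nodes per newly
    selected node."""
    S, M = set(S), set(M)
    return len(inferred(S, corr_edges) - inferred(M, corr_edges)) / len(S - M)
```

For example, with one hypothetical edge ({a, b}, c): I({a, b}) = {a, b, c}, and B({a, b}, {a}) = |{b, c}| / |{b}| = 2.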
Centralized Approximation Algorithm (4)
Centralized Approximation Algorithm
1st Phase: Constructing a near-optimal Correlation-Dominating Set. Initially, the set M contains the data-gathering node. The algorithm iteratively adds to M the connected source set that has the maximum benefit with respect to M. The phase terminates when M becomes a correlation-dominating set.
2nd Phase: Connecting the Correlation-Dominating Set. The algorithm iteratively connects the closest pair of connected components.
The time complexity of the algorithm is exponential in n (the number of nodes), since the number of connected source sets considered in the first phase can be exponential.
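The greedy loop of the 1st phase can be sketched as follows. Here `candidates` stands in for the enumeration of connected source sets (the possibly exponential step), which is assumed to be supplied:

```python
# Sketch of the 1st phase: grow M greedily by maximum benefit
# B(S, M) = |I(S) - I(M)| / |S - M| until M correlation-dominates.
def inferred(S, corr_edges):
    S = set(S)
    return S | {x for Y, x in corr_edges if Y <= S}

def greedy_phase1(all_nodes, corr_edges, candidates, sink):
    M = {sink}
    def gain(S):
        new = set(S) - M
        if not new:
            return -1.0
        return len(inferred(S, corr_edges) - inferred(M, corr_edges)) / len(new)
    while not set(all_nodes) <= inferred(M, corr_edges):
        best = max(candidates, key=gain)
        if gain(best) <= 0:
            break  # no candidate adds coverage
        M |= set(best)
    return M
```

On a hypothetical instance with nodes {a, b, c, d}, sink a, correlation edge ({b, c}, d), and the single candidate {b, c}, the phase returns M = {a, b, c}, from which d is inferable; the 2nd phase would then connect the components of M in the communication graph.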
Centralized Approximation Algorithm (5)
Polynomial-time Heuristics (l-hop Heuristic)
Based on the above algorithm. At each stage, the algorithm constructs a connected source set fl(S) for each source set S, then picks the fl(S) with the maximum benefit and adds it to the selected set M.
fl(S) is constructed greedily by merging S with the best source set that is at most l away from S in the intersection graph.
[Figure: 1-hop heuristic at the 1st stage — f1(S1) = S1 ∪ S2, while f1(S2) = S1 ∪ S2 ∪ S3 has the maximum benefit]
Performance Results (1)
Random Sensor Networks with Synthetic Correlation
1000 nodes; area: 40 x 40 units; transmission radius: 3 units.
For each node s and a set of nodes S (1 to 3 nodes within at most d = 2 hops), the hyperedge (S, s) is added with probability P/100.
Simulation Environments
Correlation computation: K = 3, L = 5.
Small network: 100 nodes, 7 x 7 area. Large network: 1000 nodes, 40 x 40 area.
Performance Results (2)
Centralized Algorithm
100 nodes, 7 x 7 area with synthetically generated correlation
Performance Results (3)
Distributed Algorithm
1000 nodes, 40 x 40 area with synthetically generated correlation
Performance Results (5)
Simulation on Real Temperature Data
Average temperatures of over 600 US cities. Source set S consists of 1 to 3 nodes within at most distance d = 2 hops. Error threshold: 5%.
Conclusion
The paper considered the connected correlation-dominating set problem, which helps minimize energy costs in data-gathering sensor networks.
The correlation structure (hypergraph) can capture general data correlations.