Efficient Gathering of Correlated Data in Sensor Networks

Himanshu Gupta, Vishnu Navda, Samir R. Das, Vishal Chowdhary

Department of CS, State University of New York Stony Brook

MobiHoc 2005


Outline

Introduction
Problem Formulation
Energy-Efficient Distributed Algorithm
Centralized Approximation Algorithm
Performance Results
Conclusion


Introduction (1)

Data gathering in sensor networks
  Collect periodic snapshots of distributed sensor data at a sink node.
  Environmental applications: temperature, humidity, pressure data.

Sensor networks are usually redundant
  They exhibit a high degree of spatial correlation in the data collected (colored sub-regions in the figure).


Introduction (2)

Data Gathering Approach

Naïve Method
  Collect data from all the nodes by forming a gathering tree with the sink node at the root.

Energy-Efficient Method
  Given a sensor network, select a subset of sensors M, called a Connected Correlation-Dominating Set, such that

  (a) Each sensor not in M is correlated to a subset of the sensors in the selected set M.

  (b) The selected set M forms a connected communication graph.


Example

For a given region, the data of any two sensors are sufficient to infer the data of all other sensors in the region.

[Figure legend: selected node, deleted node]


Formal Problem Definition (1)

Definition 1. (Communication Graph) Given a sensor network consisting of a set of sensors I, the communication graph for the sensor network is the undirected graph CG with I as the set of vertices and an edge between any two sensors that can communicate directly with each other.

[Figure (a): Communication Graph over nodes t, u, v, w, x, y, z]


Formal Problem Definition (2)

Definition 2. (Correlation Graph; Correlation Neighbors) Given a sensor network consisting of a set of sensors I, the correlation graph over the sensor nodes is a directed hypergraph with I as the set of vertices, and a subset of (P(I) × I) as the set of directed hyperedges, where P(I) is the power set of I. In other words, the correlation graph is a hypergraph G(V = I, E ⊆ (P(I) × I)).

[Figure (a): Correlation Edge ((u,v,w), x)]

[Figure (b): Correlation Graph over nodes t, u, v, w, x, y, z]


Formal Problem Definition (3)

Definition 3. (Connected Correlation-Dominating Set) Consider a sensor network consisting of n sensors. Let C be the correlation graph over the sensor nodes in the network. A set of sensors M is called a connected correlation-dominating set if:

1. The communication subgraph induced by M is connected.

2. For each sensor node s ∉ M, there is a set of sensors S ⊆ M such that (S, s) is a correlation edge in C.

[Figure (a): Correlation Graph "C" over nodes t, u, v, w, x, y, z]

[Figure (b): Connected Correlation-Dominating Set "M"]

M = {t, u, v, w}
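
The two conditions of Definition 3 can be checked mechanically. The following is a minimal sketch in Python, not code from the paper: the function names, the edge lists, and the toy network are all hypothetical, since the edges of the figure are not recoverable from the slide.

```python
from collections import deque

def is_connected(nodes, comm_edges):
    """True if the communication subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    if not nodes:
        return False  # assumption: an empty set is not a valid M
    adj = {v: set() for v in nodes}
    for a, b in comm_edges:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v] - seen:
            seen.add(w)
            queue.append(w)
    return seen == nodes

def is_ccds(M, all_nodes, comm_edges, corr_edges):
    """Check conditions 1 and 2 of Definition 3 for the candidate set M."""
    M = set(M)
    # Condition 1: M induces a connected communication subgraph.
    if not is_connected(M, comm_edges):
        return False
    # Condition 2: every node outside M has a correlation edge (S, s) with S inside M.
    for s in set(all_nodes) - M:
        if not any(tgt == s and set(src) <= M for src, tgt in corr_edges):
            return False
    return True

# Hypothetical toy network (edges invented for illustration only).
nodes = ["t", "u", "v", "w", "x", "y", "z"]
comm_edges = [("t", "u"), ("u", "v"), ("v", "w"), ("w", "x"), ("x", "y"), ("y", "z")]
corr_edges = [({"u", "v", "w"}, "x"), ({"t", "u"}, "y"), ({"v", "w"}, "z")]

print(is_ccds({"t", "u", "v", "w"}, nodes, comm_edges, corr_edges))  # True
print(is_ccds({"t", "u", "w"}, nodes, comm_edges, corr_edges))       # False
```

In the second call the set fails both conditions: w is cut off from t and u, and no correlation edge with sources inside the set points to v.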


Formal Problem Definition (4)

Connected Correlation-Dominating Set Problem: Given a sensor network and a correlation graph over the sensors, find the smallest connected correlation-dominating set.

The connected correlation-dominating set problem is NP-hard, since the less general minimum dominating set problem is well known to be NP-hard.


Formal Problem Definition (5)

Computing Correlation Hyperedge Parameters

A hyperedge (S, s) exists if the data values of s can be inferred from the values of S within a certain error bound.

Linear Prediction Model

  s'[k] = \sum_{l=1}^{L} \alpha_l \, s_l[k]

Least Square Approach

  E(\alpha) = \sum_{k=1}^{K} \left( s'[k] - s[k] \right)^2

where

  s'[k] : Predicted value of node s at kth time

  s[k] : Actual value of node s at kth time

  S = \{ s_1, s_2, \ldots, s_L \} : Source nodes

  s_l[k] : Actual value of source node l at kth time


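Formal Problem Definition (6)

  E(\alpha) = (S\alpha - s)^2

  \frac{dE(\alpha)}{d\alpha} = \frac{d}{d\alpha} \left[ (S\alpha - s)^2 \right] = 2 S^T S \alpha - 2 S^T s = 0

  [\alpha_1, \alpha_2, \ldots, \alpha_L]^T = (S^T S)^{-1} S^T s

where

  S = \begin{bmatrix} s_1[1] & s_2[1] & \cdots & s_L[1] \\ s_1[2] & s_2[2] & \cdots & s_L[2] \\ \vdots & \vdots & & \vdots \\ s_1[K] & s_2[K] & \cdots & s_L[K] \end{bmatrix}, \quad s = \begin{bmatrix} s[1] \\ s[2] \\ \vdots \\ s[K] \end{bmatrix}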
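
The closed-form solution of the normal equations can be tried on a small example. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses plain Python lists, the helper names `solve_linear_system`, `fit_alpha`, and `squared_error` are hypothetical, and the readings are invented so that the target is an exact linear combination of the two sources.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("S^T S is singular: source readings are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_alpha(S, s):
    """Return alpha solving (S^T S) alpha = S^T s.

    S is a K x L list of rows (row k holds s_1[k], ..., s_L[k]);
    s is the list of K actual readings of the target node.
    """
    K, L = len(S), len(S[0])
    StS = [[sum(S[k][i] * S[k][j] for k in range(K)) for j in range(L)] for i in range(L)]
    Sts = [sum(S[k][i] * s[k] for k in range(K)) for i in range(L)]
    return solve_linear_system(StS, Sts)

def squared_error(S, s, alpha):
    """E(alpha): sum over k of (s'[k] - s[k])^2."""
    return sum((sum(a * x for a, x in zip(alpha, row)) - t) ** 2 for row, t in zip(S, s))

# Invented readings: K = 4 time steps, L = 2 source nodes.
S = [[20.0, 22.0], [21.0, 25.0], [19.0, 20.0], [23.0, 24.0]]
s = [0.5 * a + 0.5 * b for a, b in S]  # target is exactly the mean of the sources

alpha = fit_alpha(S, s)
print(alpha)                       # approximately [0.5, 0.5]
print(squared_error(S, s, alpha))  # close to 0
```

A hyperedge (S, s) would then be accepted when the resulting error is within the chosen bound; how that bound is applied is left out of this sketch.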


Energy-Efficient Distributed Algorithm (1)

Basic Distributed Algorithm

1. Initially, each node assigns itself a priority. Data-gathering nodes mark themselves selected.

2. Next, each node collects d-hop neighborhood information.

3. Each remaining node periodically tests the following conditions. When both hold, it marks itself deleted and instructs the correlation neighbors it relies on to mark themselves selected.

(i) It can be inferred (using a correlation edge) from a set of non-deleted nodes.

(ii) Its deletion preserves the connectivity of the communication subgraph induced over the non-deleted nodes.

The non-deleted nodes form a Connected Correlation-Dominating Set.
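
The steps above can be imitated in a single process to see why the surviving nodes form a connected correlation-dominating set. This is only a sequential sketch under loud assumptions: it is not the distributed protocol, it has global rather than d-hop knowledge, it visits nodes in descending priority (an assumed ordering), and the names `basic_selection` and `connected` and the toy network are hypothetical.

```python
from collections import deque

def connected(nodes, adj):
    """True if `nodes` induce a connected subgraph of the communication graph."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in (adj[v] & nodes) - seen:
            seen.add(w)
            queue.append(w)
    return seen == nodes

def basic_selection(adj, corr_edges, priority, sink):
    """Sequential imitation of steps 1-3; returns the non-deleted nodes."""
    selected = {sink}  # step 1: the data-gathering node is selected
    deleted = set()
    changed = True
    while changed:  # repeat the test until no node can delete itself
        changed = False
        for s in sorted(adj, key=priority.get, reverse=True):
            if s in selected or s in deleted:
                continue
            # Condition (i): some correlation edge (S, s) has no deleted source.
            sources = next(
                (set(src) for src, tgt in corr_edges
                 if tgt == s and s not in src and not (set(src) & deleted)),
                None,
            )
            if sources is None:
                continue
            # Condition (ii): removing s keeps the non-deleted nodes connected.
            if not connected(set(adj) - deleted - {s}, adj):
                continue
            deleted.add(s)
            selected |= sources  # the sources used may no longer be deleted
            changed = True
    return set(adj) - deleted

# Hypothetical network: t is the sink; x and w can be inferred from u and v.
adj = {
    "t": {"u"},
    "u": {"t", "v", "x"},
    "v": {"u", "w", "x"},
    "w": {"v"},
    "x": {"u", "v"},
}
corr_edges = [({"u", "v"}, "x"), ({"v"}, "w")]
priority = {"t": 0, "u": 1, "v": 2, "w": 3, "x": 4}

print(sorted(basic_selection(adj, corr_edges, priority, "t")))  # ['t', 'u', 'v']
```

Marking the sources as selected is what keeps a deleted node inferable later: a selected node is never deleted, so the correlation edge that justified the deletion stays usable.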


Energy-Efficient Distributed Algorithm (2)

Conditions for Marking Deleted

[Figure: conditions for marking node s deleted; label C1: not selected]


Energy-Efficient Distributed Algorithm (3)

2-Round Distributed Algorithm

Based on the basic distributed algorithm, with C3 and C4 replaced by C33 and C44 in the initial round.

C33: There is a correlation edge (S, s) in the correlation graph such that no node in the set S is marked deleted. In addition, each node in S is marked selected, does not satisfy the C2 condition, or has a priority less than p(s).

C44: If there is a correlation edge (R, r) where s ∈ R, then r is marked deleted, is marked selected, does not satisfy the C2 condition, or has a priority less than p(s).


Energy-Efficient Distributed Algorithm (4)

Handshake Algorithm

Based on the basic distributed algorithm
  Uses C33 and C44 in all testing rounds.
  Adds C2-satisfied messages.

Whenever a node's C2 condition is satisfied, it transmits a C2-satisfied message to its correlation neighbors.

Before node s marks itself deleted, it performs a handshake with the source nodes it used.


Centralized Approximation Algorithm (1)

Definition 4. (Intersection Graph of Source Sets) Let I be the set of nodes in the network, and let \mathcal{I} = { {s} | s ∈ I } be the corresponding set of singleton sets. Let S be the set of source sets in the correlation graph of the network. The intersection graph of source sets is the simple graph G(V = S ∪ \mathcal{I}, E = { (v1, v2) | (v1 ∩ v2) ≠ ∅ }).

[Figure: source sets S1, S2, S3, S4]
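
Definition 4 can be made concrete with a few lines of Python. This is a minimal sketch, assuming an edge joins two vertices whenever their sets share a node; the function name `intersection_graph`, the node names b1 to b6, and the membership of S1 to S4 are hypothetical and not taken from the figure.

```python
from itertools import combinations

def intersection_graph(nodes, source_sets):
    """Build the intersection graph of source sets.

    Vertices are the source sets plus one singleton set per node;
    two distinct vertices are adjacent when their sets intersect.
    """
    vertices = {frozenset(S) for S in source_sets} | {frozenset([s]) for s in nodes}
    edges = {frozenset((a, b)) for a, b in combinations(vertices, 2) if a & b}
    return vertices, edges

# Invented example: a chain S1 - S2 - S3 and a separate S4.
nodes = ["b1", "b2", "b3", "b4", "b5", "b6"]
S1, S2, S3, S4 = {"b1", "b2"}, {"b2", "b3"}, {"b3", "b4"}, {"b5", "b6"}

vertices, edges = intersection_graph(nodes, [S1, S2, S3, S4])
print(len(vertices), len(edges))  # 10 10
```

Here S1 and S2 are adjacent because both contain b2, while S1 and S3 are not adjacent because they share no node.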


Centralized Approximation Algorithm (2)

Definition 5. (Connected Subgraph of Sources; Connected Source Set) A connected subgraph in the intersection graph of source sets is called a connected subgraph of sources. A connected source set is a set of nodes corresponding to some connected subgraph of sources, i.e., the union of the sets corresponding to the vertices of a connected subgraph of sources.

[Figure panels: (a) Source Sets; (b) Intersection Graph of Source Sets; (c) Connected Subgraph of Sources; (d) Connected Source Set. Labels shown: S1, S2, S3, S4; S1, S2, S3; { b1, b2, b3, b4 }]


Centralized Approximation Algorithm (3)

Definition 6. (Inferred Nodes) Given a set of nodes S, the set of inferred nodes for S is denoted by I(S) and is defined as

I(S) = S ∪ { x | (Y, x) is a correlation edge and Y ⊆ S }.

Definition 7. (Benefit of a Set of Nodes) The benefit of a set S with respect to a set of nodes M is denoted by B(S,M) and is defined as B(S,M) = |I(S) − I(M)| / |S − M|, where I(S) and I(M) are the sets of inferred nodes for S and M respectively.

[Figure: sets M and S with their inferred sets I(M) and I(S); in the example shown, B(S,M) = 3 / 1 = 3]
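
Definitions 6 and 7 translate directly into set operations. The sketch below is illustrative only: the function names and the three correlation edges are hypothetical, and returning 0 when S adds no new node to M is an assumption made here to avoid dividing by zero.

```python
def inferred(S, corr_edges):
    """I(S): S together with every node x that has a correlation edge (Y, x) with Y inside S."""
    S = set(S)
    return S | {x for Y, x in corr_edges if set(Y) <= S}

def benefit(S, M, corr_edges):
    """B(S, M) = |I(S) - I(M)| / |S - M|."""
    S, M = set(S), set(M)
    new_nodes = S - M
    if not new_nodes:
        return 0.0  # assumption: a set that adds nothing to M has no benefit
    gained = inferred(S, corr_edges) - inferred(M, corr_edges)
    return len(gained) / len(new_nodes)

# Invented correlation edges: a infers x and y, m infers z.
corr_edges = [({"a"}, "x"), ({"a"}, "y"), ({"m"}, "z")]

print(sorted(inferred({"m"}, corr_edges)))  # ['m', 'z']
print(benefit({"a"}, {"m"}, corr_edges))    # 3.0
```

Adding the single node a brings three nodes (a, x, y) into the inferred set, which gives a benefit of 3 / 1 = 3.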


Centralized Approximation Algorithm (4)

Centralized Approximation Algorithm

1st Phase: Constructing a near-optimal Correlation-Dominating Set
  Initially, the set M contains the data-gathering node.
  The algorithm iteratively adds to M the connected source set that has the maximum benefit with respect to M.
  The phase terminates when the set M becomes a correlation-dominating set.

2nd Phase: Connecting the Correlation-Dominating Set
  The algorithm iteratively connects the closest pair of connected components.

The time complexity of the algorithm is exponential in n (the number of nodes), since the number of connected source sets in the first phase can be exponential.
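
The first phase can be sketched as a greedy loop. The code below is a simplified illustration, not the authors' implementation: the candidate connected source sets are passed in as an explicit list rather than enumerated from the intersection graph, the second (connecting) phase is omitted, and all names and the example data are hypothetical.

```python
def inferred(S, corr_edges):
    """I(S) as in Definition 6."""
    S = set(S)
    return S | {x for Y, x in corr_edges if set(Y) <= S}

def benefit(S, M, corr_edges):
    """B(S, M) as in Definition 7; 0 when S adds no new node (assumption)."""
    new_nodes = set(S) - set(M)
    if not new_nodes:
        return 0.0
    return len(inferred(S, corr_edges) - inferred(M, corr_edges)) / len(new_nodes)

def greedy_first_phase(nodes, corr_edges, candidates, sink):
    """Grow M from the sink until every node is in M or inferable from M."""
    M = {sink}
    all_nodes = set(nodes)
    while inferred(M, corr_edges) != all_nodes:
        best = max(candidates, key=lambda S: benefit(S, M, corr_edges))
        if benefit(best, M, corr_edges) == 0:
            raise ValueError("no candidate extends the inferred set")
        M |= set(best)
    return M

# Invented instance: {b, c} infers d and e; {c} alone infers f.
nodes = ["a", "b", "c", "d", "e", "f"]
corr_edges = [({"b", "c"}, "d"), ({"b", "c"}, "e"), ({"c"}, "f")]
# Candidate connected source sets, including the singletons of Definition 4.
candidates = [{"b", "c"}, {"c"}] + [{n} for n in nodes]

print(sorted(greedy_first_phase(nodes, corr_edges, candidates, "a")))  # ['a', 'b', 'c']
```

In this instance the candidate {b, c} has benefit 5 / 2 = 2.5 against M = {a}, which beats {c} at 2 and every singleton, so one iteration suffices.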


Centralized Approximation Algorithm (5)

Polynomial-time Heuristics (l-hop Heuristic)

Based on the above algorithm
  At each stage, the algorithm constructs the connected source set f_l(S) for each source set S, picks the f_l(S) with the maximum benefit, and adds it to the selected set M.
  The set f_l(S) is constructed in a greedy manner by merging S with the best source set that is at most l hops away from S in the intersection graph.

Example: 1-hop heuristic at the 1st stage, with source sets S1, S2, S3 and selected set M

  f_1(S_1) = S_1 \cup S_2

  f_1(S_2) = S_1 \cup S_2 \cup S_3   (max benefit)


Performance Results (1)

Random Sensor Networks with Synthetic Correlation
  1000 nodes
  Area: 40 x 40 units
  Transmission radius: 3 units
  For each node s and a set of nodes S (1 to 3 nodes within at most d = 2 hops), the hyperedge (S, s) is added with probability P/100.

Simulation Environments
  Correlation computation: K=3, L=5
  Small network: 100 nodes, 7 x 7 area
  Large network: 1000 nodes, 40 x 40 area
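
The synthetic correlation rule can be sketched as follows. This is one possible reading, offered as an assumption: every subset of 1 to 3 nodes within d hops of s is tried independently with probability P/100. The function names, the fixed seed, and the four-node path network are hypothetical and are not the simulation setup used for the results.

```python
import random
from itertools import combinations

def within_hops(adj, start, d):
    """Nodes at most d hops from `start` in the communication graph, excluding `start`."""
    frontier, seen = {start}, {start}
    for _ in range(d):
        frontier = {w for v in frontier for w in adj[v]} - seen
        seen |= frontier
    return seen - {start}

def synthetic_hyperedges(adj, P, d=2, max_size=3, seed=0):
    """Add each candidate hyperedge (S, s) independently with probability P/100."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    edges = []
    for s in sorted(adj):
        nearby = sorted(within_hops(adj, s, d))
        for size in range(1, max_size + 1):
            for S in combinations(nearby, size):
                if rng.random() < P / 100:
                    edges.append((frozenset(S), s))
    return edges

# Invented path network a - b - c - d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

print(len(synthetic_hyperedges(adj, P=100)))  # 20: every candidate edge is added
print(len(synthetic_hyperedges(adj, P=0)))    # 0: no edge is added
```

With P = 100 the count is 3 + 7 + 7 + 3 = 20, since the end nodes have two nodes within 2 hops and the middle nodes have three.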


Performance Results (2)

Centralized Algorithm

100 nodes, 7 x 7 area with synthetically generated correlation


Performance Results (3)

Distributed Algorithm

1000 nodes, 40 x 40 area with synthetically generated correlation


Performance Results (5)

Simulation on Real Temperature Data
  Average temperature of over 600 US cities
  Source set S consists of 1 to 3 nodes within distance at most d = 2
  Error threshold: 5%


Conclusion

The paper considers the connected correlation-dominating set problem, whose solution helps minimize energy costs in data-gathering sensor networks.

The correlation structure (a hypergraph) can capture general data correlation.
