
Anomaly detection in wireless sensor networks: A survey

Miao Xie*,1, Song Han*, Biming Tian, Sazia Parvin

Digital Ecosystems and Business Intelligence Institute, Curtin University, DEBII, GPO Box U1987, Perth, WA 6845, Australia

Article info

Article history:

Received 19 August 2010
Received in revised form 10 February 2011
Accepted 7 March 2011
Available online 21 March 2011

Keywords:

Wireless sensor networks

Information security

Anomaly detection

Abstract

Since security threats to WSNs are increasingly diversified and deliberate, prevention-based techniques alone can no longer provide WSNs with adequate security. Detection-based techniques, however, might be effective in combination with prevention-based techniques for securing WSNs. As a significant branch of detection-based techniques, anomaly detection in wired networks and wireless ad hoc networks is already a fairly mature research field, but its solutions can rarely be applied to WSNs without modification, because WSNs are characterized by constrained resources, such as limited energy, weak computational capability, limited memory, and short communication range. The development of anomaly detection techniques suitable for WSNs is therefore regarded as an essential research area, which will make WSNs more secure and reliable. In this survey paper, several key design principles relating to the development of anomaly detection techniques in WSNs are first discussed. Then, the state-of-the-art techniques of anomaly detection in WSNs are systematically introduced, according to WSNs' architectures (hierarchical/flat) and detection technique categories (statistical techniques, rule based, data mining, computational intelligence, game theory, graph based, hybrid, etc.). The approaches that belong to a similar technique category are then analyzed and compared technically, followed by a brief discussion of potential research areas for the near future and the conclusion.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

A wireless sensor network (WSN) is made up of a large number of spatially distributed autonomous sensors that jointly monitor physical or environmental conditions, such as temperature, sound, vibration, pressure, motion and pollutants (Yick et al., 2008). To date, WSNs have been successfully applied in many industrial and civil domains, including industrial process monitoring and control, machine health monitoring, environment and habitat monitoring, healthcare applications, home automation, and traffic control. A typical WSN has little or no infrastructure. If a WSN is deployed in an ad hoc manner, it is categorized as unstructured; in contrast, a network deployed in a pre-planned manner is categorized as structured. Each sensor node may optionally provide a variety of network services, such as localization, coverage, synchronization, data compression and aggregation, and security, to enhance the network's overall performance. Sensor nodes communicate with each other using the typical five-layer communication protocol stack, which consists of the physical layer, data link layer, network layer, transport layer, and application layer.

These properties inevitably leave each sensor node severely constrained in resources, including energy, memory, computation, bandwidth, and communication. As a result, a WSN is vulnerable to both external and internal security threats. In addition, sensor nodes are physically accessible, because the network is usually deployed near the physical source of the event, yet they lack tamper resistance owing to cost constraints. Worse still, because the communication channels are publicly accessible, the information exchanged can be captured by any internal or external device. Consequently, a WSN is often threatened by multiple security threats, which can be categorized as follows (Lopez and Zhou, 2008):
- communication attack;
- denial of service attack;
- node compromise;
- impersonation attack;
- protocol-specific attack.

Han et al. (2005) also propose a useful taxonomy that surveys the security threats according to more detailed criteria.

Journal of Network and Computer Applications 34 (2011) 1302–1325
Contents lists available at ScienceDirect; journal homepage: www.elsevier.com/locate/jnca
1084-8045/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jnca.2011.03.004
* Corresponding authors. E-mail addresses: [email protected] (M. Xie), [email protected] (S. Han).
1 Tel.: +61 040 1400624.

Securing WSN is therefore both imperative and challenging. Prevention-based techniques that fundamentally build upon cryptography are the first line of defense for protecting WSN.

Encryption and authentication, built on the primitive of secret key management, are the primary measures in a prevention-based technique, as in the security framework SPINS (Perrig et al., 2001). However, if the first line of defense is broken through, compromised nodes could extract security-sensitive information (e.g. secret keys), leading to security breaches. Developing detection-based techniques as the second line of defense is therefore of great importance. Intrusion detection is a typical example of a detection-based technique. The concept was originally proposed by Anderson (1980) in the report “Computer Security Threat Monitoring and Surveillance”. Intrusion detection is defined as the process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents, which are violations or imminent threats of violation of computer policies, acceptable use policies, or standard practices (Scarfone and Mell, 2007). Within intrusion detection, anomaly detection (Hu, 2010; also referred to as outlier detection, deviation detection, etc.) is best suited to WSN because its methodology is generally flexible and resource-friendly. Anomaly detection is defined as the process of comparing definitions of activity considered normal against observed events in order to identify significant deviations. An anomaly in a dataset is defined as an observation that appears to be inconsistent with the remainder of the dataset (Hodge and Justin, 2004).

Anomalies may be caused not only by security threats, but also by faulty sensor nodes in the network or unusual phenomena in the monitored area (Rajasegarar et al., 2008). In the real world, isolated node failures can bring down the entire network, which harms the reliability of WSN. This survey focuses solely on anomaly detection techniques in WSN, irrespective of the causes of the anomalies. An overview of the content of this survey is given in Fig. 1.

1.1. Motivation

Research on anomaly detection in WSN has attracted much interest in recent years. From the ISSNIP group (Intelligent Sensors, Sensor Networks and Information Processing, The University of Melbourne, Australia), Rajasegarar et al. (2008) surveyed the related work published before 2007 using a simpler criterion: statistical parameter estimation techniques versus non-parametric techniques. However, a technique-oriented survey presenting the latest progress in anomaly detection for WSN is still lacking.

Moreover, this paper aims to serve as a guideline for selecting appropriate anomaly detection techniques. By analyzing and comparing the approaches that belong to a similar technique category, the advantages and shortcomings of each category can be identified, and key design principles for overcoming possible flaws can then be extracted.

The anomaly detection pattern, which essentially concerns which nodes are mainly responsible for the data processing of detection, significantly affects the performance of a detection scheme. The choice of detection pattern depends on the application scenario, and a clear understanding of the available patterns can facilitate the development of detection schemes. Consequently, these anomaly detection patterns are surveyed separately in this paper.

In this survey, all detection schemes are divided into two types of detection method: prior-knowledge based and prior-knowledge free. Prior-knowledge-based schemes are better suited to applications that prioritize detection speed, whereas prior-knowledge-free schemes provide applications with stronger detection generality. This distinction helps in selecting the most suitable anomaly detection technique. Attribute selection is traditionally a critical issue in a detection system, since using fewer attributes conserves resources. This paper emphasizes the importance of this issue for developing anomaly detectors in WSNs, although a detailed discussion is omitted owing to space constraints.

Finally, the research directions in this area are examined, and a number of potential research areas for the near future are proposed.

1.2. State-of-the-art techniques

Besides anomaly detection, intrusion detection also includes misuse/signature detection and stateful protocol analysis (Scarfone and Mell, 2007). Misuse/signature detection is defined as the process of comparing signatures against observed events to identify possible incidents, where each signature is a pattern corresponding to a known threat. Stateful protocol analysis is defined as the process of comparing predetermined profiles of generally accepted definitions of benign protocol activity for each protocol state against observed events to identify outliers. Misuse/signature detection and stateful protocol analysis require complex computation and/or sizeable memory, which WSNs usually cannot afford. Moreover, they are unable to defend against unknown security threats. Consequently, anomaly detection is currently the dominant technology for enhancing the security and reliability of WSN.

Fig. 1. The content of this survey paper.

Although WSNs are derived from wireless ad hoc networks, most detection schemes that function well in ad hoc networks are not suitable for WSN, probably because (Akyildiz et al., 2002):
- the number of sensor nodes in a WSN can be several orders of magnitude higher than that of an ad hoc network;
- sensor nodes are densely deployed;
- sensor nodes are prone to failure;
- the topology of a WSN changes frequently;
- sensor nodes mainly use a broadcast communication paradigm, whereas ad hoc networks are mainly based on point-to-point communication;
- each sensor node is highly constrained in energy, computational capability, memory, etc.;
- sensor nodes may not have global identification, owing to the large amount of overhead it would incur.

Accordingly, the advanced anomaly detection schemes for ad hoc networks (Qian et al., 2007; Tarique et al., 2009; Wu et al., 2007), like those developed for wired networks, cannot be applied directly to WSN.

This survey introduces recently proposed detection schemes for WSN. Because the architecture of a WSN strongly influences many aspects of designing a suitable scheme, these detection schemes are classified as hierarchical or flat (homogeneous) according to their architectures. In a hierarchical WSN, sensor nodes are grouped into clusters, and a single node (possibly with greater capability) is elected as the cluster head to conduct the organizational functions within its cluster. In a flat WSN, by contrast, all sensor nodes contribute equally to network-wide functions and participate in internal protocols (e.g. routing protocols). For each architecture, typical examples are given according to the technique category to which they belong.

Among the technique categories, statistical techniques, data mining, and computational intelligence are the most widely employed. Statistical techniques comprise statistical distribution (Palpanas et al., 2003; Subramaniam et al., 2006; Liu et al., 2007; Dallas et al., 2007; Li et al., 2008a; Tiwari et al., 2009), statistical measure (e.g. mean, variance, self-defined, etc.) (Zhang et al., 2008; Pires et al., 2004; Onat and Miri, 2005a,b; Li et al., 2008b), and statistical model (e.g. autoregression) (Curiac et al., 2007). Computational intelligence is closely linked to machine learning and more loosely linked to data mining. Conceptually, machine learning is concerned with the design and development of algorithms that enable computers to learn from large-scale datasets, whereas data mining principally focuses on discovering patterns, associations, changes, anomalies, and statistically significant structures and events in datasets. Under the data mining and computational intelligence categories, several examples are introduced, including clustering algorithms (Rajasegarar et al., 2006; Masud et al., 2009; Wang et al., 2009), support vector machine (SVM) (Rajasegarar et al., 2007), artificial neural network (ANN) (Wang et al., 2009), self-organizing map (SOM) (Wang et al., 2009), genetic algorithm (GA) (Rahul et al., 2009), and association rule learning (Yu and Tsai, 2008). Game theory is used to build smart strategies for identifying vulnerable areas in WSN (Agah et al., 2004a,b). Only one scheme concentrates on linking detection with prevention to protect a hierarchical WSN from both internal and external attacks (Su et al., 2005). Graph-based techniques model the network flow as a graph (Ngai et al., 2006, 2007), which allows graph algorithms (such as tree construction and depth-first search) to be applied to detect anomalies. Finally, rule-based techniques, which often build upon prior knowledge such as assumptions and experience, are preferred in flat WSNs (Silva et al., 2005; Yu and Xiao, 2006; Ioannis et al., 2007; Ho et al., 2009). Table 1 summarizes this taxonomy.

1.3. Key challenge

The key challenge in anomaly detection for WSN is to identify anomalies with high accuracy at minimal energy cost, so as to prolong the lifetime of the entire network. This goal can be pursued along several paths. Above all, more attention should be paid to lightweight detection techniques, which are characterized by compactness and efficiency. Second, restructuring detection schemes in a distributed manner can spread the energy overhead across the entire network and markedly reduce the communication overhead, thereby extending the network lifetime. A suitable detection pattern can also reduce energy cost without sacrificing security and reliability. In addition, smart strategies such as shrinking the attribute set, compressing the input dataset, and simplifying the analysis and decision procedure can contribute substantially to conserving energy.
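The distributed strategy described above can be illustrated with a small sketch. It assumes, purely for illustration, that each node condenses its raw readings into a mergeable summary (count, mean, and sum of squared deviations, updated with Welford's online algorithm) and that a cluster head combines the summaries with the pairwise update of Chan et al.; the `Summary` class, the node names, and the readings are hypothetical and do not come from the surveyed schemes. Each node then transmits three numbers instead of its whole reading history, while the cluster head still obtains the exact mean and variance it could use as a normal profile.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Summary:
    """Hypothetical mergeable summary: count, mean and sum of squared deviations (m2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford's online update: constant memory on the sensor node.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Summary") -> "Summary":
        # Pairwise combination (Chan et al.); exact for mean and variance.
        n = self.n + other.n
        if n == 0:
            return Summary()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    @property
    def variance(self) -> float:
        # Population variance of all readings folded into this summary.
        return self.m2 / self.n if self.n else 0.0


def summarize(readings: Iterable[float]) -> Summary:
    s = Summary()
    for x in readings:
        s.add(x)
    return s


# Hypothetical readings held by three cluster members.
node_readings = {
    "n1": [20.1, 20.4, 19.9],
    "n2": [21.0, 20.7],
    "n3": [20.2, 20.5, 20.3, 20.0],
}

# The cluster head merges one 3-number summary per node instead of 9 raw readings.
cluster = Summary()
for readings in node_readings.values():
    cluster = cluster.merge(summarize(readings))

print(cluster.n, round(cluster.mean, 3), round(cluster.variance, 4))
```

The saving grows with the number of readings per node, since the message size stays fixed; the trade-off is that the cluster head can no longer inspect individual raw readings.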

1.4. Organization

The rest of this paper is organized as follows. Section 2 discusses the key design principles of anomaly detection in WSNs in detail. Sections 3 and 4 introduce representative detection schemes for hierarchical and flat topologies, respectively. Section 5 analyzes and compares the schemes that belong to a similar technique category. Finally, the survey concludes with a discussion of potential research areas for the near future.

2. Key design principles

The key design principles of anomaly detection in WSN cover the following aspects:
- target;
- typical security threats;
- detection pattern;
- detection method;
- attribute selection.

Table 1
Summary of the taxonomy.

| Category | Techniques |
|---|---|
| Statistical | Distribution, Measure, Model |
| Data mining | Clustering, SVM, Rule learner |
| Computational intelligence | SOM, ANN, GA |
| Rule | Assumption, Experience |
| Game theory | Non-cooperative and non-zero-sum |
| Graph | Tree construction, Depth-first search |
| Hybrid | Prevention and detection |



2.1. Target

The target specifies what a detection scheme is expected to achieve. To ensure good performance, a detection scheme should achieve a target comprising the following (Ioannis et al., 2007):

Effectiveness: The effectiveness of a detection scheme is reflected by its detection accuracy and false alarm rate. The detection accuracy rate is the number of successfully detected anomalies divided by the total number of anomalies. False alarms consist of false positives and false negatives: a false positive occurs when a legitimate activity is falsely identified as an anomaly, and a false negative occurs when a real anomaly is missed. The false alarm rate is the number of false alarms divided by the number of reported anomalies. A good scheme should achieve a high detection accuracy rate while keeping the false alarm rate low. In addition, the ability to detect unknown anomalies (new types of anomaly) is also significant, as security threats to WSN are becoming increasingly diversified and deliberate. This ability is referred to as detection generality in this paper.

Minimized resource: WSNs are characterized by severely constrained resources, especially energy. As a result, minimizing energy cost is a priority. Using fewer resources partly enables faster detection but may reduce effectiveness, so trading off effectiveness against resource usage is difficult. Since most of the energy in a sensor node is consumed by radio communication rather than by computation (Roman et al., 2006), performing as much in-network computation as possible, that is, computing in a distributed manner, might be a promising way to address this issue. Resources can also be conserved by designing lightweight detection schemes and smart strategies.

Trust no node: Unlike nodes in wired networks or ad hoc networks, a sensor node can easily be compromised because of its weaknesses. Accordingly, a detection scheme has to meet the “trust-no-node” criterion at all times. Building on a security foundation (Zhang et al., 2008; Curiac et al., 2007; Su et al., 2005; Ngai et al., 2006, 2007; Yu and Xiao, 2006; Ho et al., 2009), adding a data filtering process (Liu et al., 2007), and employing a voting (or similar) mechanism (Liu et al., 2007; Li et al., 2008a,b; Tiwari et al., 2009; Pires et al., 2004; Ioannis et al., 2007) might be effective for directly ensuring the legitimate identity of a sensor node or for diluting the harmful effects of unattended malicious nodes.

Be secure: The detection schemes themselves must be secure, because the line of defense would collapse entirely if sophisticated adversaries disabled or bypassed the detection service before launching full-scale attacks. In theory, adversaries could use analytical measures to infer which detection rules or algorithms their targeted scheme employs. Furthermore, adversaries might wreck the detection scheme by brute force. Survivability against malicious activities is thus a significant criterion for assessing the security of the detection schemes themselves. Moreover, an optimal detection scheme must be able to recover its detection service immediately after being wrecked, which is referred to as tolerability.
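The two effectiveness measures defined under Effectiveness can be written out directly. The sketch below follows the survey's wording literally: the detection accuracy rate is detected anomalies over total anomalies, and the false alarm rate counts both false positives and false negatives against the number of reported anomalies (true positives plus false positives). The conventional false positive rate, FP/(FP + TN), is included only for contrast; the function names and the confusion counts are hypothetical.

```python
def detection_accuracy_rate(tp: int, fn: int) -> float:
    """Successfully detected anomalies divided by all anomalies: TP / (TP + FN)."""
    total_anomalies = tp + fn
    if total_anomalies == 0:
        raise ValueError("no anomalies in the evaluation data")
    return tp / total_anomalies


def false_alarm_rate(tp: int, fp: int, fn: int) -> float:
    """False alarms (FP + FN) divided by reported anomalies (TP + FP), as worded in the survey.

    Under this definition the value can exceed 1 when missed anomalies dominate.
    """
    reported = tp + fp
    if reported == 0:
        raise ValueError("the detector reported no anomalies")
    return (fp + fn) / reported


def false_positive_rate(fp: int, tn: int) -> float:
    """Conventional share of legitimate activity flagged as anomalous: FP / (FP + TN)."""
    negatives = fp + tn
    if negatives == 0:
        raise ValueError("no legitimate activity in the evaluation data")
    return fp / negatives


# Hypothetical confusion counts from one evaluation run.
tp, fn, fp, tn = 8, 2, 1, 89
print(detection_accuracy_rate(tp, fn))  # 0.8
print(false_alarm_rate(tp, fp, fn))
print(false_positive_rate(fp, tn))
```

Reporting both false-alarm figures side by side may make it easier to compare schemes evaluated under different conventions.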

2.2. Typical security threats

The typical security threats to WSN that a detection scheme can identify should be fully reviewed. Several surveys of these security threats have been published (Lopez and Zhou, 2008; Han et al., 2005) according to different criteria, but detection is not effective against all of the threats they mention; for example, an eavesdropping attack can only be resisted by the built-in security foundation. In addition, the relationships between these threats are sometimes hard to distinguish. For example, a selective forwarding attack is a follow-up offence based on a sinkhole attack, and a successful sinkhole attack results not only in selective forwarding but also in a series of severe security damages such as message alteration. The typical security threats and their countermeasures mentioned in the cited papers are therefore roughly summarized in Table 2. Further comparisons would be useful in practice, such as the damage scope, the damage degree, and the symptoms of each security threat (relating to attribute selection, see Section 2.5); this full work is left to a separate study owing to space limitations. Random failure is regarded here as a special case of security threat, as anomaly detection is also able to deal with it.

2.3. Detection pattern

Axelsson (1998) proposed a generic framework for intrusion detection systems (IDSs), consisting of audit collection/storage, processing, configuration/reference data, active/processing data, and alarm. Since anomaly detection is a branch of intrusion detection, a generic framework for anomaly detection systems (ADSs) can be derived from the original IDS framework, comprising input, data processing, analysis and decision, and output (Chandola et al., 2009). In general, the input for anomaly detection is a dataset consisting of a collection of data instances. A data instance consists of a set of attributes and may be univariate or multivariate. Each attribute may be binary, categorical, or continuous. In the data processing procedure, a normal profile representing the benign status of the system is produced by a training procedure or from prior knowledge. Some detection schemes may also need a special preprocessing procedure. Depending on how the input dataset is labeled, training can be supervised, semi-supervised, or unsupervised. During the analysis and decision procedure, specified algorithms use the established normal profile to determine whether a test instance is an anomaly. Usually, one or more thresholds are established for this task. An anomaly may be a point, contextual, or collective anomaly. The final result, namely the output, is produced by the anomaly detector in one of two possible forms: a score or a label. Figure 2 illustrates the generic framework of anomaly detection.
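The four stages of the generic framework (input, data processing, analysis and decision, output) can be traced in a deliberately simple sketch. It assumes a univariate continuous attribute, a normal profile made of the mean and standard deviation of training data believed to be benign (a semi-supervised setting), and a single threshold of three standard deviations; these choices and all function names are illustrative and are not taken from Chandola et al. (2009). It handles point anomalies only and returns both output forms, a score and a label.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Profile = Tuple[float, float]  # (mean, standard deviation) of benign data


def build_profile(training: List[float]) -> Profile:
    """Data processing: derive a normal profile from benign training instances."""
    return mean(training), pstdev(training)


def anomaly_score(x: float, profile: Profile) -> float:
    """Analysis: distance from the profile, in standard deviations."""
    mu, sigma = profile
    if sigma == 0:
        return 0.0 if x == mu else float("inf")
    return abs(x - mu) / sigma


def anomaly_label(x: float, profile: Profile, threshold: float = 3.0) -> str:
    """Decision: compare the score with a single (assumed) threshold."""
    return "anomaly" if anomaly_score(x, profile) > threshold else "normal"


# Input: hypothetical temperature readings assumed to be benign.
profile = build_profile([20.0, 20.5, 19.5, 20.2, 19.8])

# Output in both forms for two test instances.
for reading in (20.3, 25.0):
    print(reading, round(anomaly_score(reading, profile), 2), anomaly_label(reading, profile))
```

Contextual or collective anomalies would need a richer profile (for example, one conditioned on time or location), which this sketch does not attempt.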

The detection pattern is basically linked to which nodes carry out the data processing procedure of anomaly detection, since this determines many design details of a scheme as well as its performance. Depending on the architecture of a WSN, a range of detection patterns are in use; they are briefly described below. Table 3 lists these popular detection patterns and their corresponding references, where CH and CSN stand for cluster head and common sensor node, respectively.

Table 2
The typical security threats and preferred countermeasures.

| Security threats | Preferred countermeasures |
|---|---|
| Black-hole | Statistical measure |
| Malicious node | Statistical distribution, data mining |
| Sinkhole | Graph, rule |
| Selective forwarding | Statistical measure, data mining |
| Wormhole | Statistical measure, rule |
| Replica node | Rule |
| Random failure | Statistical distribution, data mining |




In a hierarchical WSN, there are basically three available detection patterns. First, the cluster head alone is responsible for the data processing procedure (Wang et al., 2009; Su et al., 2005). Second, the cluster head and common sensor nodes cooperate to accomplish it (Palpanas et al., 2003; Subramaniam et al., 2006; Zhang et al., 2008; Rajasegarar et al., 2006, 2007). Third, the procedure is carried out at the base station (Masud et al., 2009; Rahul et al., 2009). In the first pattern, apart from collecting the input datasets and possibly contributing in part to the analysis and decision procedure, the common sensor nodes do not participate in data processing; the cluster head alone is in charge of it. However, this clearly leads to excessive energy consumption at the cluster head, so the second and third detection patterns seem more reasonable. None of these patterns considers monitoring the cluster head itself, which may fail to meet the “trust-no-node” criterion. One possible remedy is to let the common sensor nodes monitor the cluster head in turn, for example by selecting a subset of nodes according to their remaining energy (Wang et al., 2009; Su et al., 2005). These detection patterns are illustrated in Fig. 3.
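The remedy of letting common sensor nodes watch the cluster head in turn, chosen by remaining energy, might be scheduled as in the following sketch. It is a hypothetical greedy policy, not the procedure of Wang et al. (2009) or Su et al. (2005): in each round the k nodes with the most residual energy become monitors and are charged a fixed monitoring cost, so the duty rotates naturally as their energy falls. The energy values, k, and the per-round cost are assumptions.

```python
from typing import Dict, List


def select_monitors(energy: Dict[str, float], k: int) -> List[str]:
    """Pick the k common sensor nodes with the most residual energy (ties broken by node id)."""
    ranked = sorted(energy, key=lambda node: (-energy[node], node))
    return ranked[:k]


def schedule_monitoring(energy: Dict[str, float], k: int, cost: float,
                        rounds: int) -> List[List[str]]:
    """Rotate cluster-head monitoring duty over several rounds.

    `energy` is updated in place; `cost` is an assumed energy charge per monitoring round.
    """
    schedule = []
    for _ in range(rounds):
        monitors = select_monitors(energy, k)
        for node in monitors:
            energy[node] -= cost
        schedule.append(monitors)
    return schedule


# Hypothetical residual energy (joules) of four common sensor nodes in one cluster.
residual = {"a": 1.0, "b": 0.9, "c": 0.8, "d": 0.7}
plan = schedule_monitoring(residual, k=2, cost=0.25, rounds=3)
print(plan)  # [['a', 'b'], ['c', 'a'], ['d', 'b']]
```

Because the monitors are themselves ordinary nodes, a real scheme would still need to treat their reports with suspicion, in line with the trust-no-node criterion.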

There are also three broad categories of detection pattern in flat WSNs. First, a subset of nodes is on duty, each covering its neighborhood according to a certain specification. This neighborhood can be its “one-hop” neighborhood (Onat and Miri, 2005a,b), its “radio range” (Liu et al., 2007; Pires et al., 2004; Silva et al., 2005), or “other” (Dallas et al., 2007; Yu and Tsai, 2008; Yu and Xiao, 2006; Ioannis et al., 2007; Ho et al., 2009). The active nodes take care of their specified neighborhoods by monitoring and carrying out the data processing procedure. The analysis and decision procedure may be performed by the active nodes alone or through a cooperative method. Second, the base station conducts anomaly detection across the network (Curiac et al., 2007; Ngai et al., 2006, 2007). Third, the network is partitioned into groups, and a subset of sensor nodes in each group is activated to take charge of the monitoring and data processing procedure (Li et al., 2008a,b). The common shortcoming of the first pattern is redundant protection coverage, because no mechanism can accurately measure the maximal protection coverage that the active nodes can afford. The third pattern gives flat WSNs the opportunity to employ the advanced techniques used in hierarchical WSNs; however, the grouping procedure imposes a heavy energy burden. The available detection patterns in flat WSNs are shown in Fig. 4.
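For the first flat pattern with a “radio range” neighborhood, the sketch below shows both the redundant coverage criticized above and a cooperative decision. It models radio range as an ideal disk of an assumed radius around each active node, counts how many active nodes cover every node (a count above one means redundant protection), and flags a suspect only if a strict majority of the active nodes covering it vote against it. The disk model, the majority rule, and all positions and votes are illustrative assumptions rather than the mechanisms of the cited schemes.

```python
from math import dist
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def neighborhood(active: str, pos: Dict[str, Point], radio_range: float) -> Set[str]:
    """Nodes inside an idealized, disk-shaped radio range of an active node."""
    return {n for n, p in pos.items() if n != active and dist(pos[active], p) <= radio_range}


def coverage(active: List[str], pos: Dict[str, Point], radio_range: float) -> Dict[str, int]:
    """Number of active nodes watching each node; values above 1 are redundant coverage."""
    counts = {n: 0 for n in pos}
    for a in active:
        for n in neighborhood(a, pos, radio_range):
            counts[n] += 1
    return counts


def cooperative_verdict(suspect: str, votes: Dict[str, bool], active: List[str],
                        pos: Dict[str, Point], radio_range: float) -> bool:
    """Strict-majority vote among the active nodes whose neighborhood contains the suspect."""
    voters = [a for a in active if suspect in neighborhood(a, pos, radio_range)]
    accusing = sum(1 for a in voters if votes.get(a, False))
    return bool(voters) and 2 * accusing > len(voters)


# Hypothetical layout: three active nodes on a line and a suspect node "s".
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0), "s": (1.0, 1.0)}
active_nodes = ["a", "b", "c"]
print(coverage(active_nodes, positions, radio_range=1.5))
print(cooperative_verdict("s", {"a": True, "b": True, "c": False},
                          active_nodes, positions, radio_range=1.5))
```

Here the suspect is watched by all three active nodes, which illustrates the redundancy problem; the same redundancy is what makes a majority vote possible.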

2.4. Detection method

The detection method is a key aspect of a detection scheme, as it determines the scope in which the scheme can be used. The applicable range of a scheme is restricted by its preconditions, according to which two detection methods are distinguished: prior-knowledge based and prior-knowledge free.

Fig. 2. Generic framework of anomaly detection.

Table 3
Popular detection patterns.

| Hierarchical WSNs: pattern | References | Flat WSNs: pattern | References |
|---|---|---|---|
| CH | Wang et al. (2009) and Su et al. (2005) | One-hop | Onat and Miri (2005a) and Onat and Miri (2005b) |
| CH and CSNs | Palpanas et al. (2003), Subramaniam et al. (2006), Zhang et al. (2008), and Rajasegarar et al. (2006, 2007) | Radio-range | Liu et al. (2007), Pires et al. (2004), and Silva et al. (2005) |
| Base station | Masud et al. (2009) and Rahul et al. (2009) | Other | Dallas et al. (2007), Yu and Tsai (2008), Yu and Xiao (2006), Ioannis et al. (2007), and Ho et al. (2009) |
| | | Base station | Curiac et al. (2007) and Ngai et al. (2006, 2007) |
| | | Grouping | Li et al. (2008a,b) |


https://www.researchgate.net/publication/4178614_A_real-time_node-based_traffic_anomaly_detection_algorithm_for_wireless_sensor_networks?el=1_x_8&enrichId=rgreq-1c1db93d-e251-4f8b-9cff-4df8e6a3ba0e&enrichSource=Y292ZXJQYWdlOzI1NjA5NTAxNDtBUzoxMDE1ODI4Mzg0MzU4NDlAMTQwMTIzMDY0NTU5OA==https://www.researchgate.net/publication/224676267_On_the_Intruder_Detection_for_Sinkhole_Attack_in_Wireless_Sensor_Networks?el=1_x_8&enrichId=rgreq-1c1db93d-e251-4f8b-9cff-4df8e6a3ba0e&enrichSource=Y292ZXJQYWdlOzI1NjA5NTAxNDtBUzoxMDE1ODI4Mzg0MzU4NDlAMTQwMTIzMDY0NTU5OA==https://www.researchgate.net/publication/221243920_Insider_Attacker_Detection_in_Wireless_Sensor_Networks?el=1_x_8&enrichId=rgreq-1c1db93d-e251-4f8b-9cff-4df8e6a3ba0e&enrichSource=Y292ZXJQYWdlOzI1NjA5NTAxNDtBUzoxMDE1ODI4Mzg0MzU4NDlAMTQwMTIzMDY0NTU5OA==https://www.researchgate.net/publication/223482038_Group-based_intrusion_detection_system_in_wireless_sensor_networks?el=1_x_8&enrichId=rgreq-1c1db93d-e251-4f8b-9cff-4df8e6a3ba0e&enrichSource=Y292ZXJQYWdlOzI1NjA5NTAxNDtBUzoxMDE1ODI4Mzg0MzU4NDlAMTQwMTIzMDY0NTU5OA==https://www.researchgate.net/publication/4077741_Malicious_node_detection_in_wireless_sensor_networks?el=1_x_8&enrichId=rgreq-1c1db93d-e251-4f8b-9cff-4df8e6a3ba0e&enrichSource=Y292ZXJQYWdlOzI1NjA5NTAxNDtBUzoxMDE1ODI4Mzg0MzU4NDlAMTQwMTIzMDY0NTU5OA==https://www.researchgate.net/publication/4314491_Malicious_Node_Detection_in_Wireless_Sensor_Networks_Using_an_Autoregression_Technique?el=1_x_8&enrichId=rgreq-1c1db93d-e251-4f8b-9cff-4df8e6a3ba0e&enrichSource=Y292ZXJQYWdlOzI1NjA5NTAxNDtBUzoxMDE1ODI4Mzg0MzU4NDlAMTQwMTIzMDY0NTU5OA==https://www.researchgate.net/publication/221167128_Reduced_Complexity_Intrusion_Detection_in_Sensor_Networks_Using_Genetic_Algorithm?el=1_x_8&enrichId=rgreq-1c1db93d-e251-4f8b-9cff-4df8e6a3ba0e&enrichSource=Y292ZXJQYWdlOzI1NjA5NTAxNDtBUzoxMDE1ODI4Mzg0MzU4NDlAMTQwMTIzMDY0NTU5OA==




Fig. 3. Available detection patterns (hierarchical). [Figure: a hierarchical wireless sensor network with a base station (BS), cluster heads (CH), and common sensor nodes (CSNs) grouped into clusters. Pattern 1: CH; Pattern 2: CH & CSNs; Pattern 3: BS.]

Fig. 4. Available detection patterns (flat). [Figure: a flat wireless sensor network with a base station, working nodes, and sensor nodes forming groups. Pattern 1: One-hop; Pattern 2: Radio Range; Pattern 3: Other; Pattern 4: BS; Pattern 5: Grouping.]





The knowledge used in anomaly detection typically consists of assumptions (Palpanas et al., 2003; Subramaniam et al., 2006; Liu et al., 2007; Dallas et al., 2007; Li et al., 2008a; Pires et al., 2004; Curiac et al., 2007; Ho et al., 2009) and experience (Tiwari et al., 2009; Silva et al., 2005; Ioannis et al., 2007). If a normal profile is produced from knowledge available in advance rather than by an explicit training procedure, the scheme is categorized as prior-knowledge based. For instance, one detector is built on the assumption that the squared Mahalanobis distance constructed from the networking attributes follows a chi-square distribution (Liu et al., 2007). Another, assuming that the signal propagates according to a known model (e.g. the two-ray ground model), compares the signal strength estimated from that model with the actual signal strength measured by the transceiver (Pires et al., 2004). Security experts suggest that a node is very likely to be compromised if it discards more than w percent of its packets during t time units; a detection rule is established on the basis of this experience (Tiwari et al., 2009; Ioannis et al., 2007).
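To make the idea of a prior-knowledge-based detector concrete, the following Python sketch encodes a chi-square assumption of the kind attributed to Liu et al. (2007) for two attributes. It is a minimal sketch under stated assumptions, not the authors' implementation: the two attributes and the training values are hypothetical, the mean and covariance are simply estimated from a small sample, and restricting the example to two attributes lets the chi-square critical value be written in closed form ($-2\ln\alpha$ for two degrees of freedom) without a statistics library.

```python
import math

def fit_profile(samples):
    """Mean vector and 2x2 sample covariance of training pairs [(a, b), ...]."""
    n = len(samples)
    ma = sum(s[0] for s in samples) / n
    mb = sum(s[1] for s in samples) / n
    saa = sum((s[0] - ma) ** 2 for s in samples) / (n - 1)
    sbb = sum((s[1] - mb) ** 2 for s in samples) / (n - 1)
    sab = sum((s[0] - ma) * (s[1] - mb) for s in samples) / (n - 1)
    return (ma, mb), ((saa, sab), (sab, sbb))

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance, using the closed-form inverse of a 2x2 matrix."""
    (saa, sab), (_, sbb) = cov
    det = saa * sbb - sab * sab
    da, db = x[0] - mean[0], x[1] - mean[1]
    # inv([[saa, sab], [sab, sbb]]) = [[sbb, -sab], [-sab, saa]] / det
    return (sbb * da * da - 2 * sab * da * db + saa * db * db) / det

def chi2_upper_2dof(alpha):
    """Chi-square critical value for 2 degrees of freedom: P(X > c) = exp(-c / 2) = alpha."""
    return -2.0 * math.log(alpha)

def is_anomalous(x, mean, cov, alpha=0.01):
    """Flag x when its squared distance exceeds the assumed chi-square critical value."""
    return mahalanobis_sq(x, mean, cov) > chi2_upper_2dof(alpha)

# Hypothetical training data: two networking attributes observed per time unit.
training = [(10, 5), (11, 6), (9, 4), (10, 6), (12, 5), (10, 4), (11, 5), (9, 6)]
mean, cov = fit_profile(training)
print(is_anomalous((10, 5), mean, cov))   # False
print(is_anomalous((30, 20), mean, cov))  # True
```

With more attributes the same rule applies, but the critical value would come from a chi-square quantile function for the corresponding number of degrees of freedom.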

A prior-knowledge free scheme performs detection without any related knowledge in advance; the normal profile is produced by a training procedure. All data mining, computational intelligence-based, and graph-based detection schemes are prior-knowledge free (Rajasegarar et al., 2006, 2007; Masud et al., 2009; Wang et al., 2009; Rahul et al., 2009; Yu and Tsai, 2008; Ngai et al., 2006, 2007), as are most statistical detection schemes (Zhang et al., 2008; Onat and Miri, 2005a,b; Li et al., 2008b). Classification is a typical detection technique from the data mining family, in which the classifier is built by a training procedure. In computational intelligence, GA is a good example: it is applied to measure the fitness of nodes without any prior knowledge (Rahul et al., 2009), so that a detection scheme can be deployed optimally. Using network flow information, sensor nodes are divided into sub-trees, and the root of the biggest sub-tree is regarded as a compromised node (Ngai et al., 2006, 2007). In addition, the standard deviation of packet arrival intervals during a specified time period is trained as the normal profile for identifying anomalies (Onat and Miri, 2005a), according to

$$|\mathrm{mean}(recBuf) - \mathrm{mean}(intBuf)| > K \cdot \mathrm{std}(recBuf).$$

In conclusion, the dependency on prior knowledge certainly limits the applicability of prior-knowledge-based schemes, but they are generally good at detecting anomalies that closely correlate with their known knowledge. Besides, these schemes usually detect quickly and are simple to implement. By contrast, prior-knowledge free detection schemes may be slower at detection, but they have a stronger capability to address unknown security threats or random failures. Consequently, a rough process of identifying appropriate detection techniques is shown in Fig. 5.
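The packet-arrival rule of Onat and Miri (2005a) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: we assume that recBuf holds inter-arrival intervals recorded as the reference (normal) window and that intBuf holds the intervals of the window under test; the value K = 2 and the timestamps are hypothetical.

```python
from statistics import mean, stdev

def intervals(timestamps):
    """Inter-arrival times of a sorted list of packet arrival times."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def arrival_anomaly(rec_buf, int_buf, k=2.0):
    """Flag an anomaly when |mean(recBuf) - mean(intBuf)| > K * std(recBuf).

    Assumption: rec_buf is the reference window, int_buf the window under test.
    """
    return abs(mean(rec_buf) - mean(int_buf)) > k * stdev(rec_buf)

# Hypothetical reference window with roughly one packet per second.
rec_buf = intervals([0.0, 1.0, 2.1, 3.0, 4.1, 5.0])
print(arrival_anomaly(rec_buf, [1.0, 1.05, 0.95]))  # False: similar spacing
print(arrival_anomaly(rec_buf, [0.2] * 5))          # True: arrivals much denser
```

The rule needs only a mean and a standard deviation per buffer, which illustrates why such statistical profiles are cheap enough for sensor nodes.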

2.5. Attribute selection

Notably, most malicious activities or random failures against a WSN are reflected by a single attribute or by multiple attributes of the network. In fact, this is the essential reason why anomaly detection can enhance the security and reliability of a WSN. For example, an irregular change of hop count strongly suggests a sinkhole attack (Dallas et al., 2007; Ngai et al., 2006, 2007); the signal power becomes implausible under Hello flood and wormhole attacks (Pires et al., 2004); insider attacks markedly affect the underlying distribution of the sensed data (Liu et al., 2007); and measurements of network traffic behavior, such as the packet dropping rate (Ioannis et al., 2007) and the packet arrival process (Onat and Miri, 2005a), can identify black-hole and selective forwarding attacks. This role of attributes makes attribute selection a critical research problem.

Fig. 5. Process of identifying detection techniques. DM: Data Mining; CI: Computational Intelligence; IDA: Intrusion Detection Agent; SF: Security Foundation;

VD: Verifying Dataset; Stat: Statistical Techniques and DAD: Distributed Anomaly Detection.






Furthermore, a reduced set of attributes can remarkably improve both detection speed and detection accuracy (Chebrolu et al., 2005; Kloft et al., 2008). However, this problem remains open for anomaly detection in WSNs, where only sporadic progress has been made (Silva et al., 2005; Ho et al., 2009). Owing to space limitations, this issue will be addressed separately later.

3. Anomaly detection based on hierarchical WSNs

In hierarchical WSNs, statistical techniques, data mining and computational intelligence, game theory, and hybrid detection have been employed to realize detection schemes. The input is collected at each common sensor node, possibly followed by a preprocessing procedure or by part of the computation of the data processing procedure. The original or preprocessed inputs, or local normal profiles, are then sent to the cluster head or base station, where the global normal profile is produced during the data processing procedure with a training algorithm, some prior knowledge, or a combining algorithm. The analysis and decision procedure is carried out at each common sensor node, at the cluster head, or at both. Finally, the output of anomaly detection is produced in a specified form wherever the analysis and decision procedure has been performed. Basically, these techniques tend to find a normal profile using a training procedure in order to achieve higher detection generality. Thus, most of their detection methods are prior-knowledge free.

One common feature of these detection schemes is that they use the hierarchical architecture to implement detection in a distributed manner, which spreads the energy overhead across the entire network and relieves the communication burden. Because distributed detection requires a central entity to organize and coordinate the sub-computation tasks throughout a group, the cluster head is naturally suited to this role. In a distributed detection scheme, the common sensor nodes participate in the data processing procedure, thereby taking over part of the computing cost of the cluster head and exchanging less information with it, which conserves communication cost. For example, the kernel density estimator (Palpanas et al., 2003; Subramaniam et al., 2006), clustering algorithms (Rajasegarar et al., 2006; Masud et al., 2009), and the support vector machine (SVM) (Rajasegarar et al., 2007) are typical techniques on which these distributed schemes depend.

In the following, a number of specific detection schemes are introduced according to their technique categories; for each, the principle, detection pattern, detection method, and any unique feature or additional strategy are described in detail.

3.1. Statistical techniques

3.1.1. Distributed detection using kernel density estimator

A kernel density estimator is constructed to identify anomalies by estimating the underlying distribution of the sensed data (Palpanas et al., 2003). First, each common sensor node performs local detection. The cluster head then collects all local normal profiles to carry out global detection within its group. To ensure the smooth delivery of streaming data, each discrete event occurs under the control of two timing parameters: deadline and importance.

The principle is briefly described as follows. Given that S is a random sample of the static relation T, with $n = |S|$, and $k(x)$ is the kernel function, the underlying distribution $f(x)$ is estimated over all tuples $t_i$ in S as

$$f(x) = \frac{1}{n}\sum_{t_i \in S} k(x - t_i).$$

The Epanechnikov kernel is employed in this case:

$$k(x) = \begin{cases} \dfrac{3}{4}\,\dfrac{1}{B}\left(1 - \left(\dfrac{x}{B}\right)^{2}\right), & \left|\dfrac{x}{B}\right| < 1,\\[4pt] 0, & \text{otherwise,}\end{cases}$$

where $B = \sqrt{5}\,\sigma\,|S|^{-1/5}$ is the bandwidth of the kernel function and $\sigma$ is the standard deviation of T. Once $f(x)$ is estimated, anomalies can be identified by calculating the number of sensed values within the neighborhood of $t_0$. $N(t_0, r)$ is the number of sensed values in T that fall into a sphere of radius r around $t_0$:

$$N(t_0, r) = \int_{|x - t_0| \le r} f(x)\,dx.$$

If $N(t_0, r)$ is less than a threshold p, $t_0$ is identified as an anomaly. Afterwards, the sample set S and the kernel bandwidth B at each common sensor node are sent to the cluster head. Using a combining algorithm, the cluster head works out the global normal profile, with which global detection is then launched.
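A one-dimensional Python sketch of the local test may help. It follows the Epanechnikov kernel and bandwidth given above, but two points are our own assumptions rather than details from Palpanas et al. (2003): the probability mass of $f$ in $[t_0 - r, t_0 + r]$ is scaled by $|T|$ to turn it into a count, and the readings, radius r, and threshold p are hypothetical.

```python
import math
from statistics import pstdev

def bandwidth(sample, sigma):
    """B = sqrt(5) * sigma * |S|^(-1/5)."""
    return math.sqrt(5) * sigma * len(sample) ** (-1 / 5)

def kernel_cdf(x, b):
    """Integral of the Epanechnikov kernel k from -inf to x (bandwidth b)."""
    u = max(-1.0, min(1.0, x / b))
    return 0.75 * (u - u ** 3 / 3) + 0.5

def neighbourhood_count(t0, r, sample, b, size_t):
    """N(t0, r): mass of f inside [t0 - r, t0 + r], scaled by |T| (assumption)."""
    mass = sum(kernel_cdf(t0 + r - ti, b) - kernel_cdf(t0 - r - ti, b)
               for ti in sample) / len(sample)
    return size_t * mass

def is_outlier(t0, r, p, sample, b, size_t):
    """t0 is anomalous when fewer than p values are expected near it."""
    return neighbourhood_count(t0, r, sample, b, size_t) < p

# Hypothetical temperature readings; here the sample S is the whole relation T.
T = [20.0 + 0.1 * i for i in range(11)]
B = bandwidth(T, pstdev(T))
print(is_outlier(20.5, 0.2, 1.0, T, B, len(T)))  # False
print(is_outlier(25.0, 0.2, 1.0, T, B, len(T)))  # True
```

Because the Epanechnikov kernel has a closed-form integral, the neighbourhood count needs no numerical integration, which keeps the per-node cost low.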

The kernel density estimator approximates the underlying distribution of a multidimensional dataset well at reasonable resource cost. Moreover, it is easy to operate in a distributed manner by combining the bandwidths of the local kernel functions. The choice of kernel function is critical to performance; however, parameter estimation is a hard problem in this kind of non-parametric statistical technique.

3.1.2. Online detection using kernel density estimator

The advanced kernel density estimator-based detection scheme (Subramaniam et al., 2006) makes several enhancements over its original version (Palpanas et al., 2003). Online approximation of the sensed data in a sliding window is proposed, using the ‘‘chain-sample’’ algorithm. To support online approximation, three points are improved. First, the size of the resulting set from two sensor nodes is reduced by warehousing samples. Second, a suitable technique for computing the standard deviation in a sliding window of streaming data is used to facilitate combining bandwidths:

$$V_{1,2} = V_1 + V_2 + \frac{N_1 N_2}{N_{1,2}}(\mu_1 - \mu_2)^2,$$

where $\mu$ is the mean, V is the variance expressed as the sum of squared deviations from the mean (i.e. N times the variance, the form for which this identity holds), $N_{1,2} = N_1 + N_2$, and $\mu_{1,2} = (\mu_1 N_1 + \mu_2 N_2)/N_{1,2}$. Third, each common sensor node reports an update of its kernel density estimator only with probability $f = |R_p|/(l\,|R|)$, where a parent node has l children, each with a kernel density estimator of size $|R|$, and the kernel density estimator of the parent node has size $|R_p|$. Besides the distance-based distributed deviation detection algorithm (Palpanas et al., 2003), a new local-metrics-based algorithm, multi-granular deviation detection (MGDD), is introduced. Given that $\mathrm{MDEF}(p, r, \alpha)$ is the deviation factor of an observation p and $\sigma_{\mathrm{MDEF}}(p, r, \alpha)$ is the normalized standard deviation in the sampling neighborhood of p, p is flagged as an anomaly if

$$\mathrm{MDEF}(p, r, \alpha) > k_\sigma\,\sigma_{\mathrm{MDEF}}(p, r, \alpha),$$

where $k_\sigma$ is the factor that determines a significant deviation.
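The bandwidth-combination step relies on merging summary statistics without the raw data. The Python sketch below illustrates the $V_{1,2}$ and $\mu_{1,2}$ formulas and the reporting probability $f$; it is an illustration under the assumption that V is the sum of squared deviations, and the helper names are hypothetical rather than taken from Subramaniam et al. (2006).

```python
def summarize(values):
    """Return (N, mean, V) with V the sum of squared deviations from the mean."""
    n = len(values)
    mu = sum(values) / n
    return n, mu, sum((v - mu) ** 2 for v in values)

def combine(s1, s2):
    """V12 = V1 + V2 + N1*N2/N12 * (mu1 - mu2)^2 and mu12 = (mu1*N1 + mu2*N2)/N12."""
    n1, mu1, v1 = s1
    n2, mu2, v2 = s2
    n12 = n1 + n2
    mu12 = (mu1 * n1 + mu2 * n2) / n12
    v12 = v1 + v2 + n1 * n2 / n12 * (mu1 - mu2) ** 2
    return n12, mu12, v12

def std_of(summary):
    """Population standard deviation recovered from an (N, mean, V) summary."""
    n, _, v = summary
    return (v / n) ** 0.5

def report_probability(parent_size, children, child_size):
    """f = |R_p| / (l * |R|): chance that a child reports its estimator update."""
    return parent_size / (children * child_size)

a, b = [1.0, 2.0, 3.0], [10.0, 12.0]
merged = combine(summarize(a), summarize(b))
print(merged)                         # approximately (5, 5.6, 101.2)
print(report_probability(20, 4, 10))  # 0.5
```

Only three numbers per window travel between nodes, yet the merged standard deviation (and hence the combined bandwidth) matches what would be computed from the pooled data.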

Online detection is carried out in this advanced scheme. With a probability-based strategy, the normal profile can be updated regularly to follow the dynamics of the system without incurring too much energy cost. In addition, the new local-metrics-based detection algorithm suits datasets that cannot be distinguished by distance.

3.1.3. Detection using statistical measures

Relying on spatiotemporal correlation and consistency at some spatial granularity, and on a frequency mechanism, respectively, a detection scheme is designed to deal with insider attacks such as exceptional messages and abnormal behavior (Zhang et al., 2008). Two detection mechanisms are introduced: in one, the cluster head covers its group; in the other, each common sensor node watches its one-hop neighbors. A random secret key pre-distribution mechanism cooperates with this detection scheme.

The principle of the exceptional message detection mechanism (EMDM) is to use the similarity between pairs of messages coming from the common sensor nodes to identify anomalies. The cluster head maintains a dynamic set

$$D = \{(M_i, W_i) \mid (M_1, W_1), (M_2, W_2), \ldots, (M_n, W_n)\},$$

where $M_i$ stands for a recorded message and $W_i$ is the weight (frequency) of $M_i$. When a new message $M_{new}$ arrives at the cluster head, it is compared against each entry of D. If $M_{new}$ matches any $M_i$ according to

simðM new,M iÞ ¼ V ðM newÞ V ðM iÞV ðM newÞ V ðM iÞ

,

namely, the similarity between $M_{new}$ and $M_i$ is less than a threshold, $M_{new}$ is identified as normal and the corresponding $W_i$ increases. Otherwise, $M_{new}$ enters a new observing period to determine whether it is a new type of message or a fake message. If similar messages come from other nodes during this period, $M_{new}$ is a new type of normal message; otherwise, $M_{new}$ is deemed a fake message. Its sender is immediately marked as malicious, and the other common sensor nodes and the base station are informed.
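The bookkeeping of EMDM (the dynamic set D, the weights $W_i$, and the observing period) can be sketched as follows. This is a hypothetical illustration: messages are reduced to single numeric values, and the relative-difference measure and threshold 0.1 are stand-ins chosen for the example, not the similarity measure defined by Zhang et al. (2008).

```python
from dataclasses import dataclass, field

def dissimilarity(a, b):
    """Hypothetical stand-in for sim(): relative difference of two message values."""
    return abs(a - b) / max(abs(a), abs(b), 1e-12)

@dataclass
class MessageSet:
    threshold: float = 0.1
    records: list = field(default_factory=list)  # D as [[M_i, W_i], ...]
    pending: dict = field(default_factory=dict)  # message -> senders in observing period

    def observe(self, sender, value):
        for rec in self.records:
            if dissimilarity(value, rec[0]) < self.threshold:
                rec[1] += 1  # matches a known message: increase W_i
                return "normal"
        for msg, senders in self.pending.items():
            if dissimilarity(value, msg) < self.threshold:
                senders.add(sender)  # a similar message from another node
                return "observing"
        self.pending[value] = {sender}  # start a new observing period
        return "observing"

    def close_period(self):
        """End the observing period; return senders of uncorroborated (fake) messages."""
        malicious = set()
        for msg, senders in self.pending.items():
            if len(senders) > 1:
                self.records.append([msg, len(senders)])  # new type of normal message
            else:
                malicious |= senders
        self.pending.clear()
        return malicious

d = MessageSet(records=[[20.0, 5]])
print(d.observe("n1", 20.5))  # normal
d.observe("n2", 50.0); d.observe("n3", 51.0); d.observe("n4", 99.0)
print(d.close_period())       # {'n4'}
```

The example also exposes the weakness discussed for EMDM: nodes n2 and n3 corroborate each other, so if both were malicious and sent the same fake message, it would be accepted as a new normal type.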

As for the abnormal behavior detection mechanism (ABDM), two measures are employed to identify anomalies. One examines whether a common sensor node sends too many or too few messages in a round. The other is built upon a security foundation. Each common sensor node records its one-hop neighbors' IDs and $N(ID_i)$, the value of the abnormal behavior of node $ID_i$. Given

$$\varphi(ID_x) = \{(ID_j, N(ID_j)) \mid (ID_1, N(ID_1)), \ldots, (ID_m, N(ID_m))\},$$

where m is the number of neighbors of $ID_x$, and

$$\mu_{ID_x} = \frac{1}{m}\sum_{j=1}^{m} N(ID_j), \qquad \sigma_{ID_x} = \sqrt{\frac{1}{m-1}\sum_{j=1}^{m}\left(N(ID_j) - \mu_{ID_x}\right)^{2}},$$

$$\varphi_{ID_j} = \frac{N(ID_j) - \mu_{ID_x}}{\sigma_{ID_x}},$$

where $\mu_{ID_x}$ and $\sigma_{ID_x}$ denote the mean and standard deviation of $\varphi(ID_x)$, respectively: if $\varphi_{ID_j}$ deviates from a normal value, node $ID_j$ is reported to the cluster head as a suspicious node.
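The neighbor test of ABDM is a standardized score, which the following Python sketch computes from a node's table of $N(ID_j)$ values. The cut-off of 2 standard deviations stands in for the unspecified "normal value" and, like the neighbor counts, is a hypothetical choice.

```python
from statistics import mean, stdev

def suspicious_neighbours(counts, z_limit=2.0):
    """counts maps neighbour ID_j -> N(ID_j); report IDs whose |phi| exceeds z_limit.

    z_limit is a hypothetical threshold for "deviates from a normal value".
    """
    values = list(counts.values())
    mu, sigma = mean(values), stdev(values)  # mu_IDx and sigma_IDx (m - 1 denominator)
    if sigma == 0:
        return []
    return [nid for nid, n in counts.items() if abs(n - mu) / sigma > z_limit]

counts = {"a": 2, "b": 3, "c": 2, "d": 3, "e": 2, "f": 3, "g": 2, "h": 20}
print(suspicious_neighbours(counts))  # ['h']
```

With only m neighbors the largest attainable score is bounded by $(m-1)/\sqrt{m}$, so a fixed cut-off becomes unreachable for nodes with very few neighbors.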

This detection scheme uses comparatively simple techniques, so detection is fast. Because EMDM and ABDM work together, the cluster head and the common sensor nodes perform detection at the same time, which may provide the network with stronger security. However, EMDM has an apparent flaw: if more than one malicious node sends the same fake message, EMDM cannot withstand such attacks.

3.1.4. Detection using rules based on probability

Tiwari et al. introduce a probability model (Tiwari et al., 2009) into the rule-based scheme (Ioannis et al., 2007), targeting black-hole and selective forwarding attacks. By using the probability model to measure traffic behavior more accurately, the false alarm rate of the rule-based detection scheme can be sharply reduced. Some of the common sensor nodes are selected as watchdogs to monitor the neighbors within their radio range; the cluster head is responsible for the analysis and decision procedure.

This scheme employs two detection rules: (A) during a time window w, if the probability $p_0$ of packet dropping at a sensor node is greater than a threshold t, the node is reported as suspicious; (B) if the probability p of a sensor node being reported as suspicious is greater than 50%, the cluster head marks it as definitively compromised. At each watchdog, the network traffic pattern is modeled with a Poisson distribution. If the expected number of occurrences during a given interval is $\lambda$, the probability of k occurrences (k a non-negative integer, $k = 0, 1, 2, \ldots$) is

$$f(k, \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!},$$

where $\lambda$ can be estimated through network learning. If a watchdog perceives a sudden change in the network traffic of a sensor node, it reports this node to the cluster head as suspicious. The remaining watchdogs covering the radio range where the suspicion arose are called on to participate in the analysis and decision procedure. During this procedure, if the probability $p_0$ reported by a watchdog against the suspicious node is greater than t, the cluster head records ‘‘1’’; otherwise it records ‘‘0’’. After a specified time interval, the cluster head generates a probability sequence against the suspicious node from the watchdogs' reports. This sequence is split into two-bit pairs, and all ‘‘00’’ and ‘‘11’’ pairs are eliminated to remove bias. Let the probability of outcome ‘‘0’’ be q and that of ‘‘1’’ be $1 - q$. p is then computed from the resulting sequence; if (B) is satisfied, the suspicious node is definitively marked as compromised.
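The two probabilistic ingredients of this scheme, the Poisson traffic model at the watchdog and the pair-based debiasing at the cluster head, can be sketched as below. Several details are our assumptions, not Tiwari et al.'s: the watchdog flags only an unusually low count (a lower-tail test with a hypothetical 1% cut-off), each retained ‘‘01’’ or ‘‘10’’ pair contributes its first bit, and p is taken as the share of 1s in the debiased sequence.

```python
import math

def poisson_pmf(k, lam):
    """f(k, lambda) = lambda^k * e^(-lambda) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def sudden_drop(observed, lam, tail=0.01):
    """Hypothetical watchdog test: this little traffic is improbable under Poisson(lam)."""
    return sum(poisson_pmf(k, lam) for k in range(observed + 1)) < tail

def debias(bits):
    """Split into two-bit pairs, drop '00' and '11', keep the first bit of '01' / '10'."""
    return [bits[i] for i in range(0, len(bits) - 1, 2) if bits[i] != bits[i + 1]]

def rule_b(reports):
    """Rule (B): compromised if p (assumed: share of 1s after debiasing) exceeds 0.5."""
    seq = debias(reports)
    return bool(seq) and sum(seq) / len(seq) > 0.5

print(sudden_drop(0, 10.0))               # True: P(X <= 0) = e^-10
print(debias([1, 0, 0, 0, 1, 1, 0, 1]))   # [1, 0]
print(rule_b([1, 0, 1, 0, 0, 1]))         # True
```

Discarding equal pairs is the classic von Neumann trick: for independent reports, ‘‘01’’ and ‘‘10’’ both occur with probability $q(1-q)$, so the retained bits are not skewed by q itself.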

This scheme improves a rule-based detection scheme by taking advantage of a probability-based measure, which significantly reduces the false alarm rate.

3.1.5. Research problems

Statistical techniques-based detection schemes are flexible. Single or multiple attributes of the network, such as the network traffic (Tiwari et al., 2009) and the (multidimensional) sensed data (Palpanas et al., 2003; Subramaniam et al., 2006), can be used to construct a variety of statistical distributions; alternatively, statistical measures such as similarity, mean, variance, and standard deviation (Zhang et al., 2008) can be used to characterize normal status. Taking appropriate statistical distributions and measures into account is necessary to cover a wider range of application scenarios.

The benefits of the distributed manner have already been mentioned, and detection schemes are strongly encouraged to exploit it as much as possible. Statistical techniques have great potential to be reconstructed in a distributed manner, because their core computing tasks can be divided into smaller ones and then combined easily, as with the kernel density estimator (Palpanas et al., 2003; Subramaniam et al., 2006). Along this path, detection schemes based on statistical techniques can achieve stronger detection generality while remaining resource-efficient.

Online detection, which is of great significance for many real-time application scenarios, has been achieved with the kernel density estimator technique (Subramaniam et al., 2006). However, it requires smart strategies to greatly reduce the information exchange.





Incorporating other techniques into statistical techniques could boost detection performance. An example is the rule-based detection technique (Tiwari et al., 2009), where a few detection rules are set up to avoid the difficulty of training a normal profile, while a probability model is used to measure traffic behavior accurately.

3.2. Data mining and computational intelligence-based techniques

3.2.1. Distributed detection using K-means clustering

Rajasegarar et al. (2006) design a distributed detection scheme with a K-means clustering algorithm. Each common sensor node locally collects the input dataset to work out a local normal profile. The cluster head then collects all local normal profiles to accomplish the data processing procedure, in which a global normal profile is produced. After receiving the global normal profile, each common sensor node initiates the analysis and decision procedure to perform detection. To suit distance-based clustering, the input dataset is normalized at each common sensor node in a preprocessing procedure.

Given a dataset $v_{kj}$, $k = 1, \ldots, m$, it is transformed to $u_{kj} = (v_{kj} - \mu_{v_j})/\delta_{v_j}$, where $\mu_{v_j}$ and $\delta_{v_j}$ stand for the mean and standard deviation of the jth attribute over all k, respectively. Subsequently, $u_{kj}$ is normalized to the interval [0, 1] according to

$$u_{kj} = \frac{u_{kj} - \min u_j}{\max u_j - \min u_j}.$$

A common sensor node $s_i$ collecting a dataset $X_i$ sends the local normal profile

$$\left(\sum_{k=1}^{m} x_{ik},\ \sum_{k=1}^{m} (x_{ik})^{2},\ m,\ x^{i}_{\max},\ x^{i}_{\min}\right)$$

to the cluster head, where $m = |X_i|$. After the global normal profile $(\mu_G, \delta^{2}_G, x^{G}_{\max}, x^{G}_{\min})$ is computed, the cluster head sends it back to the common sensor nodes. After receiving it, each common sensor node initiates detection locally using a fixed-width clustering algorithm. If the Euclidean distance between a data point and its closest cluster centroid is larger than a user-specified radius $\omega$, a new cluster is formed with this data point as its centroid. To reduce the number of resulting clusters, a cluster merging process is then conducted by measuring inter-cluster distances: clusters $c_1$ and $c_2$ merge if their inter-cluster distance $d(c_1, c_2)$ is less than $\omega$. Finally, the average inter-cluster distance to the K nearest neighbor (KNN) clusters is used to identify anomalous clusters. Let $ICD_i$ be the average inter-cluster distance (KNN) of cluster i, and let AVG(ICD) and SD(ICD) be the mean and standard deviation of all inter-cluster distances, respectively. If

$$ICD_i > SD(ICD) + AVG(ICD),$$

cluster i is viewed as anomalous.
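The pipeline above (local profiles, global profile, normalization, fixed-width clustering, and the KNN inter-cluster-distance rule) is sketched below in Python. It is a simplified illustration, not the code of Rajasegarar et al. (2006): normalization is shown for a single attribute, clustering runs on already-normalized two-dimensional points, each centroid stays at the point that founded its cluster, the merging step is omitted, SD(ICD) is taken as a population standard deviation, and all data values are hypothetical.

```python
import math
from statistics import mean, pstdev

def local_profile(xs):
    """(sum, sum of squares, count, max, min) of one attribute at one node."""
    return sum(xs), sum(x * x for x in xs), len(xs), max(xs), min(xs)

def global_profile(profiles):
    """Cluster head: combine local tuples into (mu_G, delta2_G, x_max_G, x_min_G)."""
    s = sum(p[0] for p in profiles)
    sq = sum(p[1] for p in profiles)
    n = sum(p[2] for p in profiles)
    mu = s / n
    return mu, sq / n - mu * mu, max(p[3] for p in profiles), min(p[4] for p in profiles)

def normalize(xs, profile):
    """Standardize with (mu_G, delta_G), then rescale the result to [0, 1]."""
    mu, var, x_max, x_min = profile
    sd = math.sqrt(var)
    lo, hi = (x_min - mu) / sd, (x_max - mu) / sd
    return [((x - mu) / sd - lo) / (hi - lo) for x in xs]

def fixed_width_clusters(points, w):
    """Join the nearest centroid within radius w, else start a new cluster at the point."""
    clusters = []  # [centroid, members]; the centroid stays at the founding point
    for p in points:
        best = min(clusters, key=lambda c: math.dist(c[0], p), default=None)
        if best is None or math.dist(best[0], p) > w:
            clusters.append([p, [p]])
        else:
            best[1].append(p)
    return clusters

def anomalous_clusters(clusters, k):
    """Flag cluster i when ICD_i > SD(ICD) + AVG(ICD), ICD over its k nearest clusters."""
    cents = [c[0] for c in clusters]
    icd = [mean(sorted(math.dist(ci, cj) for j, cj in enumerate(cents) if j != i)[:k])
           for i, ci in enumerate(cents)]
    limit = pstdev(icd) + mean(icd)
    return [i for i, v in enumerate(icd) if v > limit]

g = global_profile([local_profile([1.0, 2.0, 3.0]), local_profile([4.0, 5.0])])
print(normalize([1.0, 3.0, 5.0], g))  # [0.0, 0.5, 1.0]
pts = [(0, 0), (0.1, 0), (1, 0), (1.1, 0), (0, 1), (0, 1.1), (1, 1), (1, 1.1), (10, 10)]
cl = fixed_width_clusters(pts, 0.5)
print(anomalous_clusters(cl, 2))      # [4]
```

Note that the cluster head never sees raw readings: the five-element local tuples are enough to recover the global mean and variance exactly.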

This detection scheme operates in a distributed manner, where the common sensor nodes are responsible for part of the global normalizing procedure that serves the core K-means clustering algorithm. The global normal profile is a four-parameter tuple, which conserves energy in communications.

3.2.2. Distributed detection using SVM

One-class quarter-sphere SVM, a representative SVM algorithm, is also suited to distributed anomaly detection (Rajasegarar et al., 2007). First, the local quarter-sphere is computed at each common sensor node. Second, the cluster head collects these locally computed radii to work out a global radius. Detection is then launched at each common sensor node with this global normal profile.

The local quarter-sphere is obtained from the optimization problem

$$\min_{R \in \mathbb{R},\ \xi \in \mathbb{R}^{n}} \; R^{2} + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i \quad \text{s.t.}\quad \|\phi(x_i)\|^{2} \le R^{2} + \xi_i,\quad \xi_i \ge 0,$$

where $x_i$ is a data vector, the mapped vector $\phi(x_i)$ is called the image vector, R is the radius of the quarter-sphere, and $\{\xi_i : i = 1, \ldots, n\}$ are slack variables that allow some of the image vectors to lie outside the quarter-sphere. This problem can be solved with the method of Lagrange multipliers. Consequently, the image vectors may fall inside, on the boundary of, or outside the quarter-sphere (outliers). Subsequently, the cluster head collects the radii computed locally at each common sensor node to obtain a global radius $R_m$; several measures can be used to compute $R_m$: mean, median, maximum, and minimum. When the common sensor nodes receive $R_m$, detection is initiated. If a test instance $x_i$ satisfies

$$\tilde{k}_{\mathrm{norm}}(x_i, x_i) > R_m^{2},$$

$x_i$ is identified as an anomaly.
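The division of work (local radii at the nodes, one combined radius at the cluster head, a kernel test back at the nodes) can be sketched in Python. This is a deliberately simplified stand-in, not the solver used by Rajasegarar et al. (2007): we assume a linear kernel on already-centred data, so $k(x, x) = \|x\|^2$, and instead of solving the optimization problem we take the local radius as the norm below which roughly a fraction $1 - \nu$ of the points lie. The data and $\nu = 0.1$ are hypothetical.

```python
import math
from statistics import mean, median

def local_radius(points, nu=0.1):
    """Approximate quarter-sphere radius: about a fraction nu of points lie outside it.

    Simplification (assumption): sorting norms instead of solving the SVM problem.
    """
    norms = sorted(math.hypot(*p) for p in points)
    idx = min(len(norms) - 1, max(0, math.ceil((1 - nu) * len(norms)) - 1))
    return norms[idx]

COMBINE = {"mean": mean, "median": median, "max": max, "min": min}

def global_radius(radii, how="median"):
    """Cluster head: merge local radii into R_m with one of the four listed measures."""
    return COMBINE[how](radii)

def is_anomaly(x, r_m):
    """Assumed linear kernel on centred data: k(x, x) = ||x||^2, flagged if > R_m^2."""
    return math.hypot(*x) ** 2 > r_m ** 2

node_a = [(0.1 * i, 0.0) for i in range(1, 11)]
node_b = [(0.0, 0.08 * i) for i in range(1, 11)]
r_m = global_radius([local_radius(node_a), local_radius(node_b)], "mean")
print(is_anomaly((0.3, 0.4), r_m), is_anomaly((1.5, 0.0), r_m))  # False True
```

Whatever method produces the local radii, only one number per node crosses the network in each direction, which is the communication saving noted below.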

This scheme may require heavier data processing because of the high complexity of SVM. However, only one parameter is exchanged as the normal profile between the cluster head and the common sensor nodes, which implies much lower communication cost.

3.2.3. Distributed detection using clustering ellipsoids

Across the entire network, the data of a WSN may follow multiple underlying distributions; accordingly, Moshtaghi et al. propose a distributed detection scheme based on clustering ellipsoids (Masud et al., 2009). The base station is in charge of computing the global hyper-ellipsoid to accommodate these non-homogeneous underlying distributions, while the common sensor nodes perform detection with the global hyper-ellipsoid.

The general form of the elliptical boundary is

$$\mathrm{ell}(a, A; t) = \{x \in \mathbb{R}^{p} \mid (x - a)^{T} A (x - a) = t^{2}\},$$

where a is the center of the ellipsoid and t is its effective radius. The Mahalanobis distance of x is

$$\|x - m\|_{V^{-1}} = \sqrt{(x - m)^{T} V^{-1} (x - m)},$$

where m is the mean and V is the covariance matrix. Consequently, the hyper-ellipsoidal boundary consists of the points whose Mahalanobis distance equals t, i.e.

$$B(m, V^{-1}; t) = \{x \in \mathbb{R}^{p} \mid \|x - m\|^{2}_{V^{-1}} = t^{2}\}.$$

x is considered a local anomaly if it falls outside this boundary.

The common sensor nodes send their hyper-ellipsoids to the base station as local normal profiles, and a global ellipsoid is produced there. To accommodate as many types of underlying data distribution as possible, t is intentionally selected. In addition, the ellipsoids reported by the common sensor nodes are clustered, which reduces the redundancy between them. Given that a common sensor node $N_j$ sends the parameter tuple $(m_j, V_j, n_j)$ of its local ellipse $E_j$ to the base station B, the similarity between two ellipsoids is measured as

$$S(E_1, E_2) = e^{-\|m_1 - m_2\|}.$$

A positive root eigenvalue (PRE) plot is employed to estimate the number of clusters c. Once the similarities and c are available, ellipses merge in a pairwise manner: letting $(m_u, V_u, n_u)$ and $(m_v, V_v, n_v)$ be the parameter tuples of the ellipsoids $E_u$ and $E_v$, respectively, the parameter tuple $(m, V, n)$ of the global ellipse $E_0$ is

$$n = n_u + n_v, \qquad m = \frac{n_u}{n}m_u + \frac{n_v}{n}m_v,$$

$$V = \frac{n_u - 1}{n - 1}V_u + \frac{n_v - 1}{n - 1}V_v + \frac{n_u n_v}{n(n - 1)}(m_u - m_v)(m_u - m_v)^{T}.$$

This parameter tuple of the global ellipse is, in fact, the global normal profile. When the common sensor nodes receive it from B, detection is launched locally.
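The pairwise merge of ellipse parameter tuples can be checked numerically. The Python sketch below implements the similarity $S(E_1, E_2)$ and the $(m, V, n)$ merge formulas exactly as stated above, and confirms on hypothetical data that merging two local summaries reproduces the mean and sample covariance of the pooled points. The clustering of ellipses and the PRE plot are not shown, and the helper names are our own.

```python
import math

def summarize(points):
    """(mean, sample covariance, n) of a list of equal-length tuples."""
    n, dim = len(points), len(points[0])
    m = [sum(p[i] for p in points) / n for i in range(dim)]
    V = [[sum((p[i] - m[i]) * (p[j] - m[j]) for p in points) / (n - 1)
          for j in range(dim)] for i in range(dim)]
    return m, V, n

def similarity(e1, e2):
    """S(E1, E2) = exp(-||m1 - m2||) on the ellipsoid centres."""
    return math.exp(-math.dist(e1[0], e2[0]))

def merge(eu, ev):
    """Pairwise merge of (m, V, n) tuples into the tuple of the combined ellipse."""
    (m_u, V_u, n_u), (m_v, V_v, n_v) = eu, ev
    n = n_u + n_v
    m = [n_u / n * a + n_v / n * b for a, b in zip(m_u, m_v)]
    d = [a - b for a, b in zip(m_u, m_v)]
    V = [[(n_u - 1) / (n - 1) * V_u[i][j] + (n_v - 1) / (n - 1) * V_v[i][j]
          + n_u * n_v / (n * (n - 1)) * d[i] * d[j]
          for j in range(len(d))] for i in range(len(d))]
    return m, V, n

# Hypothetical readings held by two sensor nodes.
A = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)]
B = [(5.0, 5.0), (6.0, 7.0), (7.0, 5.0), (6.0, 6.0)]
merged = merge(summarize(A), summarize(B))
print(merged[2])  # 7
```

Since the merged tuple equals the statistics of the pooled data, the base station can build the global ellipse without receiving any raw measurements.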

Because the base station undertakes the main computing tasks, this detection scheme is energy-efficient. However, there is scope for better similarity measures for hyper-ellipsoids that take the shape and orientation of the ellipses into consideration, as well as their separation. Moreover, more robust methods are needed to merge ellipses that come from slightly different underlying distributions. In this context, a more appropriate boundary than a standard deviation is also desirable, in order to avoid excessive false positive alarms.

3.2.4. Detection using multi-agent and refined clustering

Wang et al. (2009) introduce a multi-agent-based detection scheme that takes advantage of the self-organizing map (SOM) neural network algorithm and the K-means clustering algorithm. Detection agents, including sentry, analysis, response, and management agents, are attached to each node in the network and are dedicated to detection. In this scheme, the cluster head monitors its common sensor nodes, while some of the common sensor nodes, activated according to their remaining energy, monitor the cluster head.

In fact, the cluster head and the common sensor nodes monitor each other using the same principle. The input dataset is first clustered by the SOM neural network. Afterwards, the clusters are refined with the K-means clustering algorithm: let $D_{x_i}$ be the Euclidean distance between $x_i$ and the center of its cluster $X_{j1}$; if $D_{x_i}$ is larger than the distance between $x_i$ and the center of another cluster $X_{j2}$, $x_i$ is re-clustered into $X_{j2}$. The U-matrix map of the weights generated by the neural network is used to identify anomalies. Once an anomaly is perceived, the trust degree between the two nodes is decreased, and a definitive alarm is produced only when the trust degree falls below a predefined threshold.
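Two steps of this scheme lend themselves to a short Python sketch: the K-means-style re-clustering of points after the SOM stage, and the trust degree that triggers a definitive alarm. The SOM and U-matrix stages are not shown, and the initial trust of 1.0, decrement of 0.2, and threshold of 0.5 are hypothetical values, not parameters from Wang et al. (2009).

```python
import math

def refine(points, labels, centers):
    """One refinement pass: move x_i from X_j1 to X_j2 if X_j2's center is closer."""
    new_labels = []
    for x, j1 in zip(points, labels):
        j2 = min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))
        closer = math.dist(x, centers[j2]) < math.dist(x, centers[j1])
        new_labels.append(j2 if closer else j1)
    return new_labels

class TrustLink:
    """Hypothetical trust degree between two nodes; values and step are illustrative."""

    def __init__(self, initial=1.0, step=0.2, threshold=0.5):
        self.degree, self.step, self.threshold = initial, step, threshold

    def on_anomaly(self):
        """Lower trust after an anomaly; True means a definitive alarm is raised."""
        self.degree = max(0.0, self.degree - self.step)
        return self.degree < self.threshold

print(refine([(0, 0), (9, 9)], [1, 0], [(0, 0), (10, 10)]))  # [0, 1]
link = TrustLink()
print([link.on_anomaly() for _ in range(3)])  # [False, False, True]
```

The trust layer means a single perceived anomaly does not raise an alarm on its own, which trades some detection delay for fewer false alarms.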

The participation of agents gives this scheme greater flexibility but also incurs extra costs. Having the cluster head monitored as well increases security, as it follows the ‘‘trust-no-node’’ principle. However, employing the SOM neural network algorithm and the K-means clustering algorithm at the same time imposes a heavy computational burden.

3.2.5. Optimized detection using genetic algorithm

This GA-based scheme does not focus on detection explicitly, but it can both improve detection accuracy and reduce the false alarm rate (Rahul et al., 2009). The scheme allocates the monitoring function to sensor nodes by using GA to evaluate their fitness on the basis of workload patterns, packet statistics, utilization data, battery status, and quality-of-service compliance.

Sensor nodes are classified as cluster head (CH), inactive node (powered off), inter-cluster router (ICR), and common sensor node (NS). The base station obtains a competing fitness function based on GA to optimally select a CH or ICR as the local monitoring node (LMN), where each solution is represented as a binary string (chromosome) with an associated fitness measure. From the mating pool, a solution is picked with probability

$$P_i = \frac{F_i}{\sum_{j=0}^{N} F_j},$$

where $F_i$ is the functional fitness of a possible solution and N is the total number of possible solutions. The LMN agent monitors its neighbor nodes for (a) received signal strength, (b) transmission periodicity, (c) spurious transmissions from illegitimate nodes, (d) response delay, and (e) packet dropping or modification. In addition, the base station uses the LMN as a loop-back agent to transmit special patterns through its trusted route and receive them over a pre-established route, whereby malicious nodes can be identified through the transmission of hashed data. Moreover, the base station covers the entire network with optional techniques (statistical metrics and models, Markov models, time series models, etc.) on the basis of analytical traffic data and LMN alerts. The fitness function consists of monitoring node integrity fitness (MIF), monitoring node battery fitness (MBF), monitoring node coverage fitness (MCF), and cumulative truest fitness (CTF). MIF discourages allocating the LMN role to a node suspected to be compromised; the base station estimates MIF with an integrity rank value, where a low value indicates high susceptibility to intrusion.

MIF ¼PN

ch ¼ 1 IRch K chPN ch ¼ 1 K ch

þPN

icr ¼ 1 IRicr K icr PM icr ¼ 1 K icr

,

K x ¼ 1 if x ¼ LMN ; xAðch,icr Þ,

IRicr ¼PR

r ¼ 1 IRr icr

R ,

where IRch and IRicr are the integrity ranks of CH and ICR

respectively, R is the number of routes, and IRicr r is the integrity

rank of the route r that includes icr as a router in its path. IR is

estimated by the base station according to

R x, y ¼ covð x, yÞ

varð xÞ varð yÞ ; 1oR x, yo1,

lðt Þ ¼ a lðt 1Þþð1aÞ lðt 1Þ,

IDC ¼varXnk ¼ 0

lk

! E

Xnk ¼ 0

lk

!, ,

where $\lambda(t)$ is the actual number of packet arrivals during interval $t$, $\hat{\lambda}(t)$ is the estimated number of packet arrivals during interval $t$, and $\lambda_k$ is the number of packet arrivals between times $t_k$ and $t_{k+1}$. MBF imposes a penalty on the battery used for communication between sensor nodes:

$$
\mathrm{MBF} = \frac{\sum_{i=1}^{N} BC_i K_i}{\sum_{i=1}^{N} K_i}, \qquad BC_i = f(Q, U),
$$

where $Q$ is the residual battery capacity and $BC_i$ is the projected battery capacity of node $i$ (a CH or an ICR). The battery usage rate $U$ depends on the individual load and can be estimated from traffic patterns and node-sync data. MCF rewards LMNs that can overhear the largest number of nodes with a low estimated integrity rank:

$$
\mathrm{MCF} = \frac{1}{2}\left(\beta_1 \frac{\sum_{i=1}^{N} c_i}{F_1 N} + \beta_2 \frac{\sum_{j=1}^{M} c_j}{F_2 M}\right), \qquad \beta_1 + \beta_2 = 1,
$$

where $c_i$ is the number of LMN agents that monitor malicious node $i$ (a node whose integrity rank is below the integrity rank threshold), $c_j$ is the number of LMN agents that monitor non-malicious node $j$ (a node whose integrity rank is above the threshold), and $F_1$ and $F_2$ are the expected coverage redundancies for each malicious and non-malicious node, respectively. The total fitness is given by CTF:

$$
\mathrm{CTF} = \alpha_1\,\mathrm{MIF} + \alpha_2\,\mathrm{MBF} + \alpha_3\,\mathrm{MCF}.
$$

This scheme is well suited to cooperating with any detection scheme, since it not only conserves resources but also improves detection performance. Its limitation is that the running time of the GA grows exponentially as the network scales up.
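
The three traffic statistics used above for integrity rank estimation can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the sample inputs, and the value of α are hypothetical, the IDC is estimated across repeated observation windows, and the way the base station combines these statistics into IR is not reproduced here.

```python
import math
from statistics import fmean, pvariance


def correlation(x, y):
    """Pearson coefficient R_{x,y} = cov(x, y) / sqrt(var(x) * var(y))."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / math.sqrt(pvariance(x, mx) * pvariance(y, my))


def ewma_estimates(arrivals, alpha, initial_estimate):
    """Exponentially weighted estimates of packet arrivals.

    arrivals[t] is the observed count lambda(t); the returned list holds
    lambda_hat(t + 1) = alpha * lambda(t) + (1 - alpha) * lambda_hat(t),
    starting from lambda_hat(0) = initial_estimate.
    """
    estimate = initial_estimate
    estimates = []
    for observed in arrivals:
        estimate = alpha * observed + (1 - alpha) * estimate
        estimates.append(estimate)
    return estimates


def index_of_dispersion(window_totals):
    """IDC = var(sum of lambda_k) / E(sum of lambda_k).

    Each entry is one observed value of sum_{k=0}^{n} lambda_k, so the
    variance and the mean are estimated across windows (an assumption).
    """
    return pvariance(window_totals) / fmean(window_totals)


if __name__ == "__main__":
    print(correlation([1, 2, 3], [2, 4, 6]))       # perfectly correlated series
    print(ewma_estimates([10, 10, 10], 0.5, 0.0))  # [5.0, 7.5, 8.75]
    print(index_of_dispersion([4, 6, 4, 6]))       # variance 1 over mean 5
```

For Poisson arrivals the IDC equals 1, so values far from 1 point to bursty or unusually regular traffic, which is the kind of deviation such a statistic can expose.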


Detection schemes based on data mining and computational intelligence algorithms are characterized by strong detection generality, meaning that they are effective against a wider range of security threats, even previously unknown ones. This attractive generality, of course, comes with high complexity, so these schemes strive to operate in a distributed manner (Rajasegarar et al., 2006, 2007; Masud et al., 2009).

Beyond simply benefiting from the hierarchical architecture of the network, which offers efficient control and management, little routing redundancy, and suitability for distributed operation, assigning the primary computing tasks to the base station also allows detection schemes to conserve much more energy (Masud et al., 2009; Rahul et al., 2009). Equipping each sensor node with detection agents could improve performance and ease of implementation without draining too much energy from the sensor nodes (Wang et al., 2009), but it inevitably incurs the extra expense of more advanced devices.

In fact, the GA-based scheme (Rahul et al., 2009) is an attractive paradigm for developing intelligent detection schemes for WSNs. A few significant factors relating to benign status are modeled by a fitness function evaluated for each potential solution, and an optimization process eventually finds the best solution according to this function. The final detection solution can achieve maximal detection performance with minimal resources. This scheme can cooperate with a range of detection techniques and make them more intelligent.
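
A compact sketch of GA-based LMN selection driven by the MIF, MBF, MCF, and CTF definitions given earlier is shown below. It is an illustration under explicit assumptions, not the authors' implementation: the `Node` fields, the toy topology, the fixed LMN budget `k`, the unit weights α and equal weights β, and the truncation-selection, crossover, and mutation operators are all hypothetical choices, and battery capacities are supplied directly instead of being computed from $f(Q, U)$.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    role: str   # "ch" (cluster head) or "icr"; hypothetical labels
    ir: float   # integrity rank estimated by the base station
    bc: float   # projected battery capacity BC_i, given directly here
    hears: set = field(default_factory=set)  # nodes it can overhear as an LMN


def mif(sel, nodes):
    """Mean IR of selected CHs plus mean IR of selected ICRs (K_x = 1 if selected)."""
    total = 0.0
    for role in ("ch", "icr"):
        ranks = [nodes[n].ir for n in sel if nodes[n].role == role]
        if ranks:
            total += sum(ranks) / len(ranks)
    return total


def mbf(sel, nodes):
    """Mean projected battery capacity of the selected LMNs."""
    return sum(nodes[n].bc for n in sel) / len(sel)


def mcf(sel, nodes, threshold, f1, f2, beta1=0.5, beta2=0.5):
    """Coverage of malicious and non-malicious nodes, normalised by F_1 and F_2."""
    def coverage(target):
        # c_x: number of selected LMNs (other than the target) that overhear it
        return sum(1 for m in sel if m != target and target in nodes[m].hears)

    malicious = [n for n in nodes if nodes[n].ir < threshold]
    benign = [n for n in nodes if nodes[n].ir >= threshold]
    t1 = sum(map(coverage, malicious)) / (f1 * len(malicious)) if malicious else 0.0
    t2 = sum(map(coverage, benign)) / (f2 * len(benign)) if benign else 0.0
    return 0.5 * (beta1 * t1 + beta2 * t2)


def ctf(sel, nodes, threshold=0.5, f1=1, f2=1, alphas=(1.0, 1.0, 1.0)):
    """Total fitness CTF = a1*MIF + a2*MBF + a3*MCF."""
    a1, a2, a3 = alphas
    return (a1 * mif(sel, nodes) + a2 * mbf(sel, nodes)
            + a3 * mcf(sel, nodes, threshold, f1, f2))


def genetic_search(nodes, k, fitness, pop_size=20, generations=40,
                   mutation_rate=0.3, seed=0):
    """Evolve k-node LMN sets; returns (best_set, best_fitness)."""
    rng = random.Random(seed)
    names = sorted(nodes)
    population = [tuple(sorted(rng.sample(names, k))) for _ in range(pop_size)]
    for _ in range(generations):
        # truncation selection: the fitter half survives unchanged (elitism)
        elite = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = set(rng.sample(sorted(set(p1) | set(p2)), k))  # crossover
            spare = [n for n in names if n not in child]
            if spare and rng.random() < mutation_rate:
                # mutation: swap one LMN for a node outside the set
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice(spare))
            children.append(tuple(sorted(child)))
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)


NODES = {
    "a": Node("ch", 0.9, 0.8, {"c", "d"}),
    "b": Node("icr", 0.6, 0.5, {"d"}),
    "c": Node("ch", 0.2, 0.9, {"a"}),
    "d": Node("icr", 0.8, 0.4, {"a", "b"}),
}

if __name__ == "__main__":
    best, score = genetic_search(NODES, k=2, fitness=lambda s: ctf(s, NODES))
    print(best, round(score, 3))
```

On a topology this small, exhaustive search over the six possible pairs is trivial and picks ("a", "d") under these hypothetical values; a GA only pays off once the number of candidate LMN sets becomes too large to enumerate.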

3.3. Game theory-based techniques

3.3.1. Non-cooperative game theory

A game theory-based scheme is introduced for identifying the vulnerable areas in a WSN (Agah et al., 2004a) based on many risk factors, such as the reliability of a sensor node, the different types of attack, and the past behavior of the attacker. Only the identified areas are protected by detection, in order to save energy.

Intrusion detection is modeled as a game played between the detection system and the adversary, in which each player selects a strategy from its strategy set once. Given a fixed cluster $K$ in the network, the adversary has three strategies available: attack cluster $K$, do not attack cluster $K$, or attack a different cluster. The detection system responds by either defending cluster $K$ or defending a different cluster. The strategies are numbered 1 to 3 for the adversary and 1 to 2 for the detection system, so two $2 \times 3$ payoff matrices $A$ and $B$ can be established. The problem is to find the strategy pair from which neither player can profit by deviating unilaterally, namely a Nash equilibrium.

The payoffs depend on several factors, including the attack type, the density of sensor nodes, and the number of previous attacks. The Nash equilibrium is reached when both players select their first strategy, that is, when the adversary attacks cluster $K$ and the detection system defends it. In other words, protecting the cluster with the highest value of $U(t) - C_k$ yields a reliable rate of successful detection, where $U(t)$ is the utility of the network's ongoing sessions and $C_k$ is the average cost of protecting cluster $K$.
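
The equilibrium check itself is easy to sketch: a strategy pair is a pure-strategy Nash equilibrium when each player's choice is a best response to the other's. The payoff numbers below are purely hypothetical, since the survey does not give Agah et al.'s values; they were chosen so that, as reported above, the equilibrium falls at both players' first strategy. Only pure strategies are searched.

```python
def pure_nash_equilibria(A, B):
    """Return 1-based (detector, adversary) strategy pairs that are pure Nash equilibria.

    A[r][c] is the detection system's payoff and B[r][c] the adversary's payoff
    when the detection system plays row r and the adversary plays column c.
    """
    rows, cols = len(A), len(A[0])
    equilibria = []
    for r in range(rows):
        for c in range(cols):
            detector_best = all(A[r][c] >= A[other][c] for other in range(rows))
            adversary_best = all(B[r][c] >= B[r][other] for other in range(cols))
            if detector_best and adversary_best:
                equilibria.append((r + 1, c + 1))
    return equilibria


# Rows: 1 = defend cluster K, 2 = defend another cluster.
# Columns: 1 = attack K, 2 = do not attack K, 3 = attack another cluster.
A = [[5, -1, -2],
     [-3, -1, 4]]   # hypothetical detection-system payoffs
B = [[1, 0, -1],
     [6, 0, 1]]     # hypothetical adversary payoffs

print(pure_nash_equilibria(A, B))  # [(1, 1)]
```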

3.3.2. Comparisons with game theory-based scheme

The non-cooperative game theory-based scheme (Agah et al., 2004a) is then compared with a Markov decision process (MDP) and an intuitive traffic measure (Agah et al., 2004b). Using a stochastic process known as a Markov chain, the MDP forecasts by modeling the system's past state transitions. An MDP is a tuple $(S, A, R, tr)$, where $S$ is a set of states, $A$ is a set of actions, $R$ is the reward function, and $tr$ is the state-transition function. The past system states and the transitions between them can be described by an MDP model, and the goal is to maximize the expected value of the rewards received over time. The traffic measure, on the other hand, relies on an intuitive metric: the cluster carrying the heaviest traffic volume is marked as the most vulnerable area. Because it takes many factors into account, the non-cooperative game theory-based scheme achieves the highest forecasting accuracy of the three.
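
To make the MDP component concrete, the sketch below solves a tiny $(S, A, R, tr)$ model by value iteration. The two cluster states, the actions, the rewards, the transition probabilities, and the discount factor γ = 0.9 are all hypothetical; Agah et al.'s actual model is not reproduced here.

```python
def value_iteration(states, actions, reward, tr, gamma=0.9, tol=1e-9):
    """Solve an MDP (S, A, R, tr) by value iteration.

    reward(s, a) returns the immediate reward; tr(s, a) returns a dict
    {next_state: probability}. Returns (state values, greedy policy).
    """
    def q_value(s, a, values):
        return reward(s, a) + gamma * sum(p * values[t] for t, p in tr(s, a).items())

    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q_value(s, a, values) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {s: max(actions, key=lambda a: q_value(s, a, values)) for s in states}
    return values, policy


R = {("normal", "idle"): 1.0, ("normal", "monitor"): 0.5,
     ("attacked", "idle"): -5.0, ("attacked", "monitor"): -1.0}
T = {("normal", "idle"): {"normal": 0.7, "attacked": 0.3},
     ("normal", "monitor"): {"normal": 0.9, "attacked": 0.1},
     ("attacked", "idle"): {"normal": 0.2, "attacked": 0.8},
     ("attacked", "monitor"): {"normal": 0.6, "attacked": 0.4}}

values, policy = value_iteration(["normal", "attacked"], ["idle", "monitor"],
                                 lambda s, a: R[(s, a)], lambda s, a: T[(s, a)])
print(policy)  # {'normal': 'idle', 'attacked': 'monitor'}
```

With these made-up numbers, monitoring is worth its energy cost only once a cluster is under attack, which is the kind of state-dependent decision the MDP view captures.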


Like the GA-based scheme (Rahul et al., 2009) mentioned earlier, non-cooperative game theory-based schemes are not directly concerned with detection; however, they can help detection schemes improve both their performance and their efficiency. The design of the payoff function is crucial to forecasting accuracy and merits further study. Moreover, if the GA-based scheme, which can optimize the placement of monitoring nodes, were combined with the game theory-based scheme, which identifies vulnerable areas, detection schemes could be expected to achieve better performance.

3.4. Hybrid detection

3.4.1. Detection with prevention technique

There is only one hybrid detection framework (Su et al., 2005) that genuinely calls for collaboration between an energy-saving detection technique and an authentication-based prevention technique. In its detection scheme, the cluster head is responsible for monitoring its common sensor nodes; in turn, a subset of the common sensor nodes, chosen according to their residual energy, monitors the cluster head.

A suite of secret keys is established during initialization: the base station shares an individual secret key with each common sensor node, each common sensor node shares a set of pairwise secret keys with its neighbors, the common sensor nodes within a cluster share a cluster secret key, and a group secret key is shared among all sensor nodes in the network. Packets transmitted through the network are categorized as control messages or sensed data. When the base station, a cluster head, or any intermediate node forwards a control message, it appends a message authentication code (MAC) computed with the appropriate secret key. Each intermediate node forwarding the control message verifies the appended MAC and replaces it with a new one, and this verify-and-replace process continues until the control message reaches its destination. If sender $u$ sends control message $M$ to receiver $v_i$ with current timestamp $T_c$, the MAC is generated with the appropriate secret key as follows:

$$
u \rightarrow v_i : M,\ T_c,\ \mathrm{MAC}(K_{uv_i}, M \| T_c),
$$

where $M \| T_c$ is the concatenation of $M$ and $T_c$, and $\mathrm{MAC}(K_{uv_i}, M \| T_c)$ is the MAC generated from $M \| T_c$ with the secret key $K_{uv_i}$ shared between $u$ and $v_i$. When a common sensor node $v_i$ forwards sensed data $D$ to its cluster head $u$, $u$ needs to verify $D$ to guard against fake or redundant messages sent by attackers. Because $D$ is usually large and is sent periodically from $v_i$ to $u$, generating MACs along the forwarding path is time-consuming and impractical for a WSN. Consequently, an enhanced version of the LEAP authentication scheme is put forward. The original LEAP cannot identify compromised nodes, because all the common sensor nodes within a cluster share a single cluster secret key. First, the enhanced scheme uses pairwise secret keys instead of the cluster secret key used by the original LEAP. Second, it employs a one-time key chain as session keys, which is fairly efficient for authentication.
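
The hop-by-hop verify-and-replace step and a one-way key chain can be sketched with Python's standard `hmac` and `hashlib` modules. HMAC-SHA256, the 8-byte big-endian timestamp encoding, the `relay` and `accept_chain_key` helpers, and the `tamper_at` switch are assumptions made for illustration; neither Su et al.'s scheme nor LEAP is reproduced exactly, and a real receiver would also check that $T_c$ is fresh.

```python
import hashlib
import hmac


def make_mac(key: bytes, message: bytes, timestamp: int) -> bytes:
    """MAC(K_uv, M || T_c), instantiated here as HMAC-SHA256."""
    return hmac.new(key, message + timestamp.to_bytes(8, "big"), hashlib.sha256).digest()


def relay(message: bytes, timestamp: int, path, keys, tamper_at=None) -> bool:
    """Forward a control message along path; each hop verifies, then replaces, the MAC.

    keys maps frozenset({u, v}) to the pairwise key K_uv. tamper_at (hypothetical)
    alters the message in transit just before hop number tamper_at verifies it.
    """
    mac = make_mac(keys[frozenset(path[0:2])], message, timestamp)
    for hop in range(1, len(path)):
        if hop == tamper_at:
            message += b"!"  # simulated modification by an attacker
        expected = make_mac(keys[frozenset(path[hop - 1:hop + 1])], message, timestamp)
        if not hmac.compare_digest(mac, expected):
            return False  # verification failed: drop the message
        if hop + 1 < len(path):
            mac = make_mac(keys[frozenset(path[hop:hop + 2])], message, timestamp)
    return True


def make_key_chain(seed: bytes, length: int):
    """One-way key chain: chain[i - 1] = H(chain[i]); chain[0] is the public commitment."""
    chain = [seed]
    for _ in range(length):
        chain.append(hashlib.sha256(chain[-1]).digest())
    chain.reverse()
    return chain


def accept_chain_key(last_verified: bytes, new_key: bytes, max_gap: int = 1) -> bool:
    """Accept new_key if hashing it 1..max_gap times reproduces the last verified key."""
    k = new_key
    for _ in range(max_gap):
        k = hashlib.sha256(k).digest()
        if hmac.compare_digest(k, last_verified):
            return True
    return False


KEYS = {frozenset(("bs", "ch")): b"k-bs-ch", frozenset(("ch", "v1")): b"k-ch-v1"}
print(relay(b"route-update", 1700000000, ["bs", "ch", "v1"], KEYS))               # True
print(relay(b"route-update", 1700000000, ["bs", "ch", "v1"], KEYS, tamper_at=2))  # False
chain = make_key_chain(b"secret-seed", 5)
print(accept_chain_key(chain[0], chain[1]))  # True: keys are released as chain[1], chain[2], ...
```

In this sketch each hop re-keys the MAC with a pairwise key, so a failed check points to the specific link on which the message was altered, which is the property a shared cluster key cannot provide.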

Detection targets three types of misbehavior: packet dropping, packet duplication, and packet jamming. The detection scheme has two parts: the cluster head monitors its common sensor nodes, and the common sensor nodes monitor their cluster head in turn. In particular, monitoring the cluster head involves arranging the monitoring nodes, reacting to abnormal cluster heads, determining the alarm threshold, and determining the group size. Monitoring the common sensor nodes, by contrast, simply amounts to localizing the suspicious node by means of its pairwise secret key once an anomaly is found.
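
The counting side of this monitoring can be sketched as follows, covering packet dropping and duplication only; jamming would need radio-level measurements that a packet log does not capture. The helper names, the per-monitor `local_threshold`, and the group vote against `alarm_threshold` are hypothetical stand-ins for the alarm threshold and group size that the scheme determines.

```python
from collections import Counter


def misbehaviour_counts(handed_over, forwarded):
    """Compare packet IDs handed to a monitored node with the IDs it forwarded.

    Returns (dropped, duplicated) as seen by one monitoring node.
    """
    seen = Counter(forwarded)
    dropped = sum(1 for pid in set(handed_over) if seen[pid] == 0)
    duplicated = sum(count - 1 for count in seen.values() if count > 1)
    return dropped, duplicated


def group_alarm(reports, local_threshold, alarm_threshold):
    """Raise an alarm when at least alarm_threshold monitors in the group each
    observed local_threshold or more misbehaviours."""
    votes = sum(1 for dropped, duplicated in reports
                if dropped + duplicated >= local_threshold)
    return votes >= alarm_threshold


report = misbehaviour_counts([1, 2, 3, 4], [1, 2, 2, 4])  # packet 3 dropped, 2 duplicated
print(report)  # (1, 1)
print(group_alarm([report, (2, 0), (0, 0)], local_threshold=2, alarm_threshold=2))  # True
```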

This scheme can certainly be both energy-efficient and strongly secure, because it takes many details into account, for example combining detection of internal attackers with prevention of external attackers, using a one-time key chain, monitoring the cluster head at minimal energy cost, and quickly localizing compromised nodes by means of secret keys. However, once the pairwise keys have been established, sensor nodes cannot move and new sensor nodes cannot be added. A dynamic key management and distribution mechanism could probably overcome this flaw.


Few schemes (Zhang et al., 2008) consider cooperating with a prevention-based technique in hierarchical WSNs. Moreover, the security foundation established by a prevention technique serves only to enhance the security of the network, rather than to exploit the functions made available by the secret keys. WSNs should already be protected by such a security foundation (Perrig et al., 2001). Evidently, a detection scheme would be more efficient if it could use the functions provided by this security foundation, rather than applying prevention and detection separately.

4. Anomaly detection based on flat WSNs

Rule-based and statistical techniques are more likely to be used in flat WSNs. Without a hierarchical architecture, all nodes are equally capable and participate equally in internal protocols. Consequently, lightweight detection schemes that require little communication are preferable. In this section, we survey representative works in each of the technique categories mentioned above.

A rule-based model is commonly developed from assumptions, information, or experience known in advance. As a result, it often focuses on specific security issues by examining particular attributes of network behavior. In flat WSNs, statistical techniques are relatively simpler than those for hierarchical WSNs because of the nature of the architecture. Because data mining and computational intelligence techniques often depend on a central entity to handle heavy organizational tasks, flat architectures are inherently ill-suited to them, although such techniques might be implemented with assistance, such as the installation of agents (Ho et al., 2009).

Detection methods in flat WSNs are also diverse. Minimizing energy consumption while retaining good performance is always important, and this issue is discussed alongside the detection methods used in each of the schemes below.

4.1. Rule-based detection

4.1.1. Decentralized detection using rules

A decentralized rule-based scheme is proposed by Silva et al. (2005), in which a combination of rules chosen from a set of candidate rules is applied to meet the specific demands of each application scenario. In a WSN composed of common nodes, monitor nodes, intruder nodes, and a base station, each monitor node is in charge of monitoring the neighbors within its radio range by switching on promiscuous listening mode.

Specifically, the scheme consists of three phases: data acquisition, rule application, and intrusion detection. In the first phase, each monitor node collects messages in promiscuous listening mode and extracts the important information for subsequent analysis. In the second phase, the applicable rules are selected according to the requirements. In the intrusion detection phase, each failure to satisfy a rule increments a failure counter, and an alarm is raised once this counter exceeds a predefined threshold within a detection round.
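
The three phases can be condensed into a small sketch. The two rules, the message fields (`src`, `rssi`, `payload_len`), the per-sender failure counters, and the threshold value are hypothetical; Silva et al. define their own rule set, which is not reproduced here.

```python
def detect_round(messages, rules, threshold):
    """One detection round over messages captured in promiscuous mode.

    Each rule a message fails increments its sender's failure counter; the
    senders whose counter exceeds the threshold are returned as alarms.
    """
    failures = {}
    for msg in messages:
        for rule in rules:
            if not rule(msg):
                failures[msg["src"]] = failures.get(msg["src"], 0) + 1
    return sorted(src for src, count in failures.items() if count > threshold)


# Hypothetical rules selected for the application scenario (rule application phase)
RULES = [
    lambda m: -90 <= m["rssi"] <= -40,  # signal strength plausible for a neighbor
    lambda m: m["payload_len"] <= 64,   # payload within the expected size
]

CAPTURED = [
    {"src": "n1", "rssi": -60, "payload_len": 30},
    {"src": "n2", "rssi": -20, "payload_len": 30},
    {"src": "n2", "rssi": -25, "payload_len": 90},
    {"src": "n3", "rssi": -95, "payload_len": 10},
]

print(detect_round(CAPTURED, RULES, threshold=2))  # ['n2']
```

Keeping one counter per neighbor rather than a single global counter is a design choice made here so that an alarm also names the suspect; the original description mentions only a failure counter.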

This scheme provides a good framework for rule-based detection. However, it lacks a clear description of how monitor nodes are determined, in particular how many sensor nodes, and which ones, should be on duty to ensure that the entire network is protected.

4.1.2. Detection using multi-hop ACK

Building upon a multi-hop acknowledgement mechanism, a detection scheme is proposed to defend against selective forwarding attacks (Yu and Xiao, 2006). Detection is active along the path that forwards packets from the source node to the base station, with the base station, the intermediate nodes, and the source node all taking part.

A security foundation must first be established, comprising (A) node initialization and deployment and (B) one-to-many authentication based on a one-way hash chain (OHC). During initialization, the secret key server loads every sensor node with a unique secret key and a symmetric bivariate polynomial $f(u, v)$. The unique secret key is shared between the node and the base station and can be used to encrypt messages and to generate MACs (message authentication codes). In the deployme
