Anomaly detection in wireless sensor networks: A survey
8/17/2019 Anomaly Detection in Wireless Sensor Networks- A Survey
This article appeared in a journal published by Elsevier. The attached
copy is furnished to the author for internal non-commercial research
and education use, including for instruction at the author’s institution
and sharing with colleagues.
Other uses, including reproduction and distribution, or selling or
licensing copies, or posting to personal, institutional or third party
websites are prohibited.
In most cases authors are permitted to post their version of the
article (e.g. in Word or Tex form) to their personal website or
institutional repository. Authors requiring further information
regarding Elsevier’s archiving and manuscript policies are
encouraged to visit:
http://www.elsevier.com/copyright
Author's personal copy
Anomaly detection in wireless sensor networks: A survey
Miao Xie*,1, Song Han*, Biming Tian, Sazia Parvin
Digital Ecosystems and Business Intelligence Institute, Curtin University, DEBII, GPO Box U1987, Perth, WA 6845, Australia
Article info
Article history:
Received 19 August 2010
Received in revised form 10 February 2011
Accepted 7 March 2011
Available online 21 March 2011
Keywords:
Wireless sensor networks
Information security
Anomaly detection
Abstract
As security threats to WSNs become increasingly diverse and deliberate,
prevention-based techniques alone can no longer provide WSNs with
adequate security. Detection-based techniques, however, can be effective
when they work in collaboration with prevention-based techniques. As
a significant branch of detection-based techniques, anomaly detection is
already quite mature in wired networks and wireless ad hoc networks,
but such solutions can rarely be applied to WSNs without change,
because WSNs are characterized by constrained resources, such
as limited energy, weak computation capability, small memory, and short
communication range. The development of anomaly detection techniques
suitable for WSNs is therefore regarded as an essential
research area, which will enable WSNs to be much more secure and reliable.
In this survey paper, a few of the key design principles relating to the
development of anomaly detection techniques in WSNs are discussed first.
Then, the state-of-the-art techniques of anomaly detection in WSNs are
systematically introduced, according to WSN architecture (hierarchical/flat)
and detection technique category (statistical techniques, rule based, data
mining, computational intelligence, game theory, graph based, and hybrid).
The approaches that belong to a similar technique category are analyzed
and compared, followed by a brief discussion of potential research areas
in the near future and a conclusion.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
A wireless sensor network (WSN) is made up of a large number of
spatially distributed autonomous sensors that jointly monitor
physical or environmental conditions, such as temperature,
sound, vibration, pressure, motion and pollutants (Yick et al.,
2008). To date, WSNs have been successfully applied to many
industrial and civil domains, including industrial process
monitoring and control, machine health monitoring, environment and
habitat monitoring, healthcare applications, home automation,
and traffic control. A typical WSN has little or no infrastructure.
If a WSN is deployed in an ad hoc manner, it is
categorized as unstructured. In contrast, a network deployed
in a pre-planned manner is categorized as structured. Each
sensor node may optionally be equipped with a variety of network
services such as localization, coverage, synchronization, data
compression and aggregation, and security, for the purpose of
enhancing the network’s overall performance. Sensor nodes
communicate with each other by following the typical
five-layer communication protocol stack, which consists of the
physical layer, data link layer, network layer, transport layer,
and application layer.
The properties of a WSN mean that a sensor node is
extremely restricted in resources, including energy, memory,
computing, bandwidth, and communication. Hence, a WSN is
vulnerable to both external and internal security threats. In
addition, sensor nodes are physically accessible, as the
network is usually deployed near the physical source of the event
but without tamper-resistance owing to cost constraints. What is
worse, the information exchanged can be captured by any internal
or external device, because publicly accessible communication
channels are used. In consequence, a WSN is often threatened
by multiple security threats, which can be categorized as
follows (Lopez and Zhou, 2008):
communication attack;
denial of service attack;
node compromise;
impersonation attack;
protocol-specific attack.
Han et al. (2005) also propose a good taxonomy that surveys the
security threats according to more detailed criteria.
Contents lists available at ScienceDirect
journal homepage: www.elsevier.com/locate/jnca
Journal of Network and Computer Applications
1084-8045/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jnca.2011.03.004
Corresponding authors.
E-mail addresses: [email protected] (M. Xie), [email protected] (S. Han).
1 Tel.: +61 040 1400624.
Journal of Network and Computer Applications 34 (2011) 1302–1325
Securing WSN is accordingly both imperative and challenging.
Prevention-based techniques that fundamentally build upon
cryptography are the first line of defense for protecting WSN.
Based on a primitive of secret key management, encryption and
authentication are the primary measures in a prevention-based
technique, such as those introduced in the security framework SPINS
(Perrig et al., 2001). However, once the first line of defense is
broken through, compromised nodes can extract security-sensitive
information (e.g. secret keys), leading to breaches of security.
Thus, developing detection-based techniques as the second line of
defense is of great importance. Intrusion detection is a
typical example of detection-based techniques. This concept was
originally proposed by Anderson (1980) about three decades ago in a
report ‘‘Computer Security Threat Monitoring and Surveillance’’.
Intrusion detection is defined as the process of monitoring the
events occurring in a computer system or network and analyzing
them for any signs of possible incidents, which are violations or
imminent threats of violation of computer policies, acceptable use
policies, or standard practices (Scarfone and Mell, 2007). Among
the branches of intrusion detection, anomaly detection (Hu, 2010;
also referred to as outlier detection, deviation detection, etc.)
is best suited to WSN because its methodology is flexible and
resource-friendly in general. Anomaly detection is defined as the
process of comparing definitions of activity that is considered
normal against observed events in order to identify significant
deviations. Moreover, an anomaly in a dataset is defined as an
observation that appears to be inconsistent with the remainder of
the dataset (Hodge and Justin, 2004).
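The definition above can be made concrete with a minimal sketch. It assumes, purely for illustration, that the definition of normal activity is the mean and standard deviation of readings taken as normal, and that a significant deviation means more than k standard deviations from the mean. The function names, the readings, and the choice k = 3 are hypothetical and are not taken from any of the surveyed schemes.

```python
from statistics import mean, stdev

def build_normal_profile(normal_readings):
    """Summarise readings assumed to be normal as (mean, standard deviation)."""
    return mean(normal_readings), stdev(normal_readings)

def is_anomaly(reading, profile, k=3.0):
    """Flag a reading that lies more than k standard deviations from the profile mean."""
    mu, sigma = profile
    return abs(reading - mu) > k * sigma

# Hypothetical temperature readings (degrees Celsius) treated as normal activity.
profile = build_normal_profile([21.0, 21.4, 20.8, 21.1, 20.9, 21.3])

# Observed events are compared against the normal profile one by one.
observed = [21.2, 20.7, 35.0]
flags = [is_anomaly(x, profile) for x in observed]
```

With these made-up readings only the last observation is flagged, since it is the only one that departs markedly from the profile.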
An anomaly may be caused not only by security threats, but also by
faulty sensor nodes in the network or unusual phenomena in the
monitoring zone (Rajasegarar et al., 2008). In the real world,
isolated node failures can bring down the entire network, which
harms the reliability of a WSN. This survey paper focuses solely
on anomaly detection techniques in WSN, irrespective of the cause
of the anomaly. An overview of the content of this survey
paper is given in Fig. 1.
1.1. Motivation
Research on anomaly detection in WSN has attracted
much interest in recent years. From the ISSNIP
(Intelligent Sensors, Sensor Networks and Information Processing,
The University of Melbourne, Australia) group, Rajasegarar
et al. (2008) surveyed the related works before 2007 with a
simpler criterion: statistical parameter estimation techniques or
non-parametric techniques. Nevertheless, a technology-oriented
survey that presents the latest progress in
anomaly detection in WSN is still absent.
Moreover, our paper is intended to act as a guideline for selecting
appropriate anomaly detection techniques. By analyzing
and comparing the approaches that belong to a
similar technique category, the advantages and shortcomings of
each technique category can be identified. Accordingly, the paper
further extracts the key design principles for overcoming possible flaws.
The pattern of anomaly detection significantly affects the
performance of a detection scheme; it basically concerns
who is mainly responsible for the data processing of detection.
The choice of detection pattern depends on the application
scenario. A fair understanding of the available
anomaly detection patterns can facilitate the development of
detection schemes. In consequence, these anomaly detection
patterns are surveyed separately in this paper.
In our survey paper, all detection schemes are divided into two
types of detection method: prior-knowledge based and
prior-knowledge free. Prior-knowledge-based detection schemes
are better suited to applications that favor detection
speed; prior-knowledge-free schemes, on the contrary, are
capable of providing applications with stronger detection generality.
This awareness helps in selecting anomaly
detection techniques optimally. Attribute selection is traditionally
a critical issue in a detection system, as using fewer attributes
conserves resources. Our paper emphasizes the importance
of this issue for developing anomaly detectors in WSNs, although a
detailed discussion is not given owing to space constraints.
Finally, the development directions in this area are examined,
and a number of potential research areas in the near future are
proposed.
1.2. State-of-the-art techniques
Other than anomaly detection, there are also misuse/signa-
ture detection and stateful protocol analysis in the category of
intrusion detection (Scarfone and Mell, 2007). Misuse/signature
detection is defined as a process of comparing signatures against
observed events to identify possible incidents, where each
signature is a pattern corresponding to a known threat. Stateful
protocol analysis is defined as the process of comparing pre-
determined profiles of generally accepted definitions of benign
protocol activities for each protocol state against observed
events to identify outliers. Misuse/signature detection and stateful
protocol analysis need complicated expression computing
and/or sizeable memory, which WSNs usually cannot afford.
Moreover, they are unable to defend against unknown security
threats. Consequently, anomaly detection is currently the dominant
technology for enhancing the security and reliability of WSN.
Fig. 1. The content of this survey paper.
Though WSNs are derived from wireless ad hoc networks,
most of the detection schemes that function well in ad hoc networks
are not suitable for WSN, probably because (Akyildiz et al., 2002):
the number of sensor nodes in a WSN can be several orders of magnitude higher than that of an ad hoc network;
sensor nodes are densely deployed;
a sensor node is less stable;
the topology of WSNs varies frequently;
sensor nodes mainly use a broadcast communication paradigm, whereas ad hoc networks are mainly based on point-to-point communication;
each sensor node is highly constrained in energy, computation capability, memory, etc.;
sensor nodes may have no global identifications as a result of the large amount of overhead.
Accordingly, the advanced anomaly detection schemes in ad
hoc networks (Qian et al., 2007; Tarique et al., 2009; Wu et al.,
2007), as well as those developed in wired networks, cannot be
applied to WSN.
In this survey paper, recently proposed detection schemes
in WSN are introduced. Because the architecture of a WSN is
strongly related to many aspects of designing a suitable scheme,
these detection schemes are classified as hierarchical or flat
(homogeneous) according to their architectures. In a hierarchical
WSN, all sensor nodes are grouped or clustered, and only a
single node is elected as the cluster head (possibly equipped with
stronger capacity) to conduct the organizational functions within
its group or cluster. In a flat WSN, on the contrary, all sensor
nodes contribute equally to any team functions and participate in
internal protocols (e.g. routing protocols). For each of the
architectures, a number of typical examples are given in terms of
the technique category that they belong to.
As for the technique categories, statistical techniques, data
mining, and computational intelligence are employed most
widely. Statistical techniques consist of statistical distribution
(Palpanas et al., 2003; Subramaniam et al., 2006; Liu et al., 2007;
Dallas et al., 2007; Li et al., 2008a; Tiwari et al., 2009), statistical
measure (e.g. mean, variance, self-defined, etc.) (Zhang et al.,
2008; Pires et al., 2004; Onat and Miri, 2005a,b; Li et al., 2008b),
and statistical model (e.g. auto regression) (Curiac et al., 2007).
Computational intelligence is closely linked to machine learning
and remotely linked to data mining. Conceptually, machine
learning is more concerned with the design and development of
algorithms that enable computers to learn from large-scale
datasets. Data mining, however, principally focuses on discovering
patterns, associations, changes, anomalies, and statistically
significant structures and events in datasets. Under the technique
categories of data mining and computational intelligence, several
examples are introduced, including clustering algorithms
(Rajasegarar et al., 2006; Masud et al., 2009; Wang et al., 2009),
support vector machine (SVM) (Rajasegarar et al., 2007), artificial
neural network (ANN) (Wang et al., 2009), self-organizing map
(SOM) (Wang et al., 2009), genetic algorithm (GA) (Rahul et al.,
2009), and association rule learning (Yu and Tsai, 2008). Game
theory is dedicated to building smart strategies for identifying
vulnerable areas in WSN (Agah et al., 2004a,b). Only one
scheme concentrates on linking detection with prevention
to protect a hierarchical WSN from both internal and
external attacks (Su et al., 2005). Graph-based techniques specialize
in modeling the network flow as a graph (Ngai et al., 2006,
2007), which allows graph algorithms (such as
tree construction, depth-first search, etc.) to be applied to detect
anomalies. Finally, rule-based techniques, which often build upon
prior knowledge such as assumption and experience, are preferred in
flat WSNs (Silva et al., 2005; Yu and Xiao, 2006; Ioannis et al.,
2007; Ho et al., 2009). Table 1 shows this taxonomy in brief.
1.3. Key challenge
The key challenge in evolving anomaly detection in WSN is to
identify anomalies with high accuracy but minimal energy cost,
so as to prolong the lifetime of the entire network. This target
can be approached along several paths. Above all, much
more attention should be paid to lightweight detection techniques,
which are characterized by compactness and efficiency. Second,
reconstructing detection schemes in a distributed manner can spread
the energy overhead around the entire network and markedly reduce
the communication overhead, such that the lifetime of the network
is extended. A suitable detection pattern can also reduce the
energy cost without losing security and reliability. In addition,
smart strategies such as shrinking the attribute set,
compressing the input dataset, and simplifying the
procedure of analysis and decision can make considerable progress in
conserving energy.
1.4. Organization
The rest of this paper is organized as follows. In the second
section, the key design principles with respect to anomaly
detection in WSNs are discussed in detail. The following two
sections introduce many representative detection schemes, in
terms of hierarchical and flat topologies respectively. The fifth
section presents the analysis and comparison of schemes
that belong to a similar technique category. Finally, this survey is
summarized with a presentation of potential research
areas in the near future.
2. Key design principles
The key design principles of anomaly detection in WSN concern
several aspects:
target;
typical security threats;
detection pattern;
detection method;
attribute selection.
Table 1
Summary of the taxonomy.
Category                     Techniques
Statistical                  Distribution; Measure; Model
Data mining                  Clustering; SVM; Rule learner
Computational intelligence   SOM; ANN; GA
Rule                         Assumption; Experience
Game theory                  Non-cooperative and non-zero-sum
Graph                        Tree construction; Depth-first search
Hybrid                       Prevention and detection
2.1. Target
The target states what a detection scheme is expected to be
able to do. To ensure its performance, a detection
scheme should achieve a target comprising the following (Ioannis
et al., 2007):
Effectiveness: The effectiveness of a detection scheme is reflected by
its detection accuracy and false alarm rate. The detection
accuracy rate is the number of successfully detected anomalies divided
by the total number of anomalies. False alarms consist of false
positives and false negatives, where a false positive means that a
legitimate activity is falsely identified as an anomaly, and a false
negative means that a real anomaly is missed. The false alarm
rate is the number of false alarms divided by the number of
reported anomalies. A good scheme should reach a high detection
accuracy rate while keeping the false alarm rate low. On the
other hand, the ability to detect unknown anomalies (new types of
anomaly) is also significant, as security threats to
WSN are more and more diversified and deliberate. This ability
is referred to as detection generality in this paper.
Minimized resource: WSN is characterized by tremendously constrained
resources, especially the availability of energy. As a
result, minimizing the energy cost is a priority. Using fewer
resources partly determines faster detection speed, but probably
leads to a loss of effectiveness. In consequence, it is difficult to
trade off effectiveness against resource usage. Given that
most of the energy in a sensor node is drained by radio
communication rather than by computation (Roman et al., 2006),
activating in-network computing as much as possible, namely
computing in a distributed manner, might be a promising
way to address this issue. In addition, resources
may be conserved through effort made to design lightweight detection
schemes as well as smart strategies.
Trust no node: Unlike a node in wired networks or ad hoc networks, a
sensor node can be compromised easily due to its weakness.
Accordingly, a detection scheme has to meet the criterion
‘‘trust-no-node’’ at all times. Building on a security foundation
(Zhang et al., 2008; Curiac et al., 2007; Su et al., 2005; Ngai
et al., 2006, 2007; Yu and Xiao, 2006; Ho et al., 2009), adding a
process of data filtering (Liu et al., 2007), and employing a vote
(or similar) mechanism (Liu et al., 2007; Li et al., 2008a,b;
Tiwari et al., 2009; Pires et al., 2004; Ioannis et al., 2007) might
be effective for directly ensuring the legitimate identity of a
sensor node or diluting the bad effects caused by unattended
malicious nodes.
Be secure: The detection schemes themselves must be secure,
because the line of defense would be razed to the ground if
sophisticated adversaries disabled or bypassed the detection
service before initiating thorough attacks. In theory, adversaries
could use analytical measures to infer what kind of
detection rules or algorithms are employed by their targeted
schemes. Furthermore, adversaries may wreck the detection
scheme with brute force. The survivability against malicious
activities is thus a significant point in assessing the security of
detection schemes themselves. Moreover, the optimal detection
scheme must have the capability to recover its detection service
immediately once it has been wrecked, which is referred to as tolerability.
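The two effectiveness measures defined under Effectiveness above can be sketched as follows. The sketch follows the definitions of this section literally, so the false alarm count includes both false positives and false negatives and is divided by the number of reported anomalies. The function name and the event identifiers are hypothetical and serve only as an illustration.

```python
def effectiveness(actual_anomalies, reported_anomalies):
    """Return (detection accuracy rate, false alarm rate) for two sets of event ids.

    Assumption: each event is identified by a hashable id, and both
    arguments are Python sets.
    """
    detected = len(actual_anomalies & reported_anomalies)          # real anomalies that were reported
    false_positives = len(reported_anomalies - actual_anomalies)   # legitimate events reported as anomalies
    false_negatives = len(actual_anomalies - reported_anomalies)   # real anomalies that were missed

    accuracy = detected / len(actual_anomalies) if actual_anomalies else 0.0
    false_alarms = false_positives + false_negatives
    false_alarm_rate = false_alarms / len(reported_anomalies) if reported_anomalies else 0.0
    return accuracy, false_alarm_rate

# Hypothetical run: events 1-4 are real anomalies, the detector reports 2, 3, 4, 7 and 9.
accuracy, false_alarm_rate = effectiveness({1, 2, 3, 4}, {2, 3, 4, 7, 9})
```

In this made-up run three of the four real anomalies are detected, and there are two false positives and one false negative among five reported anomalies.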
2.2. Typical security threats
The typical security threats to WSN that can be identified by
a detection scheme should be fully reviewed. Many surveys
regarding these security threats have been introduced (Lopez
and Zhou, 2008; Han et al., 2005) according to different criteria,
but detection is not effective against all of the mentioned threats;
for example, an eavesdropping attack can only be resisted by the
built-in security foundation. On the other hand, the relationship
between these threats is sometimes hard to distinguish. A selective
forwarding attack is a subsequent offence based on a sinkhole attack,
for example, whereas a successful sinkhole attack will
result in not only the following selective forwarding attack, but
also a series of severe security damages such as message alteration. As
a result, the typical security threats and their countermeasures
that have been mentioned in the cited papers are roughly
shown in Table 2. In fact, more comparisons should be made,
such as the damage scope of each security threat, the
damage degree of each security threat, the symptom of each
security threat (relating to attribute selection, see Section 2.5),
etc. This full work is expected to be finished separately, due to the
space limitation. Random failure is regarded as a special case of
security threats here, as anomaly detection is also able to deal
with it.
2.3. Detection pattern
Axelsson (1998) proposed a generic framework of intrusion
detection systems (IDSs), consisting of audit collection/storage,
processing, configuration/reference data, active/processing data,
and alarm. As anomaly detection is a branch technique of intrusion
detection, a generic framework of anomaly detection systems (ADSs)
is simply derived from the original IDS framework, and comprises
input, data processing, analysis and decision, and output (Chandola
et al., 2009). In general, a dataset that includes a collection of data
instances is the input for anomaly detection. A data instance
consists of a set of attributes, and is either univariate or multivariate.
An attribute can be binary, categorical, or
continuous. In the procedure of data processing, a normal profile
representing the benign status of the system is produced with a
training procedure, or with prior knowledge. Certain detection
schemes probably need a special preprocessing procedure.
Depending on how the input dataset is labeled, supervised,
semi-supervised, and unsupervised methodologies are popular for
training. Relying on the established normal profile, specified
algorithms determine whether a test instance is an anomaly
during the procedure of analysis and decision. Usually,
single or multiple thresholds are established for this
task. The type of anomaly can be point, contextual, or collective.
The final result, namely the output, is produced by the anomaly
detector in one of two possible forms: score or label. Fig. 2
illustrates the generic framework of anomaly detection.
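The four stages of this generic framework can be sketched in a few lines. The sketch assumes multivariate continuous attributes, an unlabeled training set presumed to be benign, a normal profile made of per-attribute means and standard deviations, and a single threshold. All names, readings, and the threshold value are hypothetical; the sketch is not the method of any surveyed scheme.

```python
from statistics import mean, stdev

def data_processing(training_instances):
    """Data processing: build a normal profile of (mean, std) per attribute.

    Assumes every attribute varies in the training set, so no std is zero.
    """
    columns = zip(*training_instances)
    return [(mean(col), stdev(col)) for col in columns]

def anomaly_score(instance, profile):
    """Score: largest per-attribute deviation, in units of standard deviation."""
    return max(abs(x - mu) / sigma for x, (mu, sigma) in zip(instance, profile))

def analysis_and_decision(instance, profile, threshold=3.0, as_label=True):
    """Analysis and decision: output either a label or the raw score."""
    score = anomaly_score(instance, profile)
    if as_label:
        return "anomaly" if score > threshold else "normal"
    return score

# Input: hypothetical (temperature, humidity) instances presumed benign.
training = [(20.0, 50.0), (21.0, 52.0), (22.0, 51.0), (21.0, 49.0), (20.5, 50.5)]
profile = data_processing(training)

# Output as labels for two test instances, and as a score for the second one.
label_a = analysis_and_decision((21.2, 50.8), profile)
label_b = analysis_and_decision((30.0, 50.0), profile)
score_b = analysis_and_decision((30.0, 50.0), profile, as_label=False)
```

Returning either a score or a label from the same decision step mirrors the two output forms named above; a score leaves the final cut-off to the receiver, whereas a label fixes it at the detector.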
As for the detection pattern, it basically concerns who takes
charge of carrying out the data processing procedure of anomaly
detection, since this determines many design details of a
scheme as well as its performance. Depending on the architecture
of a WSN, a range of detection patterns have been in use, which
will be briefly described below. Moreover, Table 3 shows a list of
these popular detection patterns and their corresponding references,
where CH and CSN stand for cluster head and
common sensor node, respectively.
Table 2
The typical security threats and preferred countermeasures.
Security threats       Preferred countermeasures
Black-hole             Statistical measure
Malicious node         Statistical distribution, data mining
Sinkhole               Graph, rule
Selective forwarding   Statistical measure, data mining
Wormhole               Statistical measure, rule
Replica node           Rule
Random failure         Statistical distribution, data mining
In a hierarchical WSN, there are basically three available
detection patterns. First, the cluster head alone is responsible for
the data processing procedure (Wang et al., 2009; Su
et al., 2005). Second, the cluster head and common sensor
nodes cooperate to accomplish this (Palpanas et al., 2003;
Subramaniam et al., 2006; Zhang et al., 2008; Rajasegarar
et al., 2006, 2007). Third, this procedure is carried out at the
base station (Masud et al., 2009; Rahul et al., 2009). In the first
pattern, apart from collecting the input datasets and possibly
contributing in part to the procedure of analysis and decision, the
common sensor nodes do not participate in the data processing
procedure; the cluster head alone is in charge of it.
However, this clearly leads to the overuse of energy in the cluster
head. As a result, the second and third detection patterns seem to
be more reasonable. None of them considers having the cluster
head monitored; this may fail to meet the criterion ‘‘trust-no-
node’’. One possible remedy is to let the common sensor nodes
monitor the cluster head in turns, for example by picking out a subset
of nodes according to their remaining energy (Wang et al., 2009;
Su et al., 2005). These detection patterns are illustrated in Fig. 3.
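The remedy of letting common sensor nodes monitor the cluster head in turns can be illustrated with a small sketch. It assumes that each node reports its remaining energy as a number, that the nodes with the most remaining energy are picked in every round, and that monitoring costs a fixed amount of energy per round. The node ids, energy values, and cost are hypothetical, and the selection rule is only one plausible reading of the idea, not the procedure of the cited schemes.

```python
def pick_monitors(remaining_energy, count):
    """Return the ids of the `count` nodes with the most remaining energy.

    Ties are broken by node id so that the choice is deterministic.
    """
    ranked = sorted(remaining_energy, key=lambda node: (-remaining_energy[node], node))
    return ranked[:count]

def run_round(remaining_energy, count, monitoring_cost):
    """Select monitors for one round and charge them the monitoring cost."""
    monitors = pick_monitors(remaining_energy, count)
    for node in monitors:
        remaining_energy[node] -= monitoring_cost
    return monitors

# Hypothetical cluster of four common sensor nodes (energy in arbitrary units).
energy = {"n1": 5.0, "n2": 4.0, "n3": 3.0, "n4": 4.5}
round1 = run_round(energy, count=2, monitoring_cost=1.5)
round2 = run_round(energy, count=2, monitoring_cost=1.5)
```

Because the monitors of one round lose energy, different nodes tend to be chosen in the next round, which spreads the monitoring burden over the cluster.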
There are also three broad categories of detection pattern in
flat WSNs. First, a subset of nodes is on duty to cover its
neighborhood according to a certain specification. In detail, this
neighborhood can be its ‘‘one-hop’’ (Onat and Miri, 2005a,b),
‘‘radio range’’ (Liu et al., 2007; Pires et al., 2004; Silva et al., 2005),
or ‘‘other’’ (Dallas et al., 2007; Yu and Tsai, 2008; Yu and Xiao, 2006;
Ioannis et al., 2007; Ho et al., 2009) neighborhood. The active nodes
take care of their specified neighborhoods by monitoring and accomplishing
the procedure of data processing. The procedure of analysis and decision may
be resolved by the active nodes alone or by a cooperative method.
Second, the base station conducts anomaly detection across the
network (Curiac et al., 2007; Ngai et al., 2006, 2007). Third, the
network is partitioned into groups, and a subset of sensor nodes in
each group is then activated to take charge of the monitoring and data
processing procedure (Li et al., 2008a,b). The common shortcoming of the first
pattern is the redundancy of protection coverage, because there is no
mechanism capable of accurately measuring the maximal protection
coverage that the active nodes can afford. As for the third pattern, it
gives flat WSNs a chance to employ techniques as advanced as those
of hierarchical WSNs. However, the grouping procedure certainly
brings a massive energy burden. The available detection patterns in flat
WSNs are shown in Fig. 4.
2.4. Detection method
The detection method is a key point of a detection scheme, as the
method affects its usable scope. The applicable range of a
scheme is restricted by its preconditions, according to
which two detection methods are introduced: prior-knowledge
based and prior-knowledge free.
Fig. 2. Generic framework of anomaly detection.
Table 3
Popular detection patterns.
Hierarchical WSNs
Patterns       References
CH             Wang et al. (2009) and Su et al. (2005)
CH and CSNs    Palpanas et al. (2003), Subramaniam et al. (2006), Zhang et al. (2008), and Rajasegarar et al. (2006, 2007)
Base station   Masud et al. (2009) and Rahul et al. (2009)
Flat WSNs
Patterns       References
One-hop        Onat and Miri (2005a) and Onat and Miri (2005b)
Radio-range    Liu et al. (2007), Pires et al. (2004), and Silva et al. (2005)
Other          Dallas et al. (2007), Yu and Tsai (2008), Yu and Xiao (2006), Ioannis et al. (2007), and Ho et al. (2009)
Base station   Curiac et al. (2007) and Ngai et al. (2006, 2007)
Grouping       Li et al. (2008a,b)
Fig. 3. Available detection patterns (hierarchical). [Figure omitted; labels: a hierarchical wireless sensor network with base station, cluster head, common sensor node, and a cluster. Pattern 1: CH; Pattern 2: CH & CSNs; Pattern 3: BS.]
Fig. 4. Available detection patterns (flat). [Figure omitted; labels: a flat wireless sensor network with base station, working node, sensor node, and a group. Pattern 1: one-hop; Pattern 2: radio range; Pattern 3: other; Pattern 4: BS; Pattern 5: grouping.]
The knowledge regarding anomaly detection often consists of
assumptions (Palpanas et al., 2003; Subramaniam et al., 2006; Liu
et al., 2007; Dallas et al., 2007; Li et al., 2008a; Pires et al., 2004;
Curiac et al., 2007; Ho et al., 2009) and experience (Tiwari et al.,
2009; Silva et al., 2005; Ioannis et al., 2007). If a normal profile is
produced from knowledge available in advance rather than by an
explicit training procedure, the scheme is categorized as
prior-knowledge based. For instance, one detector relies on the
assumption that the Mahalanobis squared distance constructed from
the networking attributes follows a chi-square distribution (Liu et al.,
2007). Based on the assumption that the signal propagates according
to a known model (e.g. the two-ray ground model), another detection
scheme compares the signal strength estimated from the model with
the real signal strength measured at the transceiver (Pires et al.,
2004). Security experts suggest that a node is highly likely to be
compromised if it discards more than w percent of its packets during
t time units; from this experience, a detection rule is established
(Tiwari et al., 2009; Ioannis et al., 2007).
A prior-knowledge free scheme performs detection without any
related knowledge in advance. The normal profile is produced by a
training procedure. All data mining and computational
intelligence-based and graph-based detection schemes are
prior-knowledge free (Rajasegarar et al., 2006, 2007; Masud et al.,
2009; Wang et al., 2009; Rahul et al., 2009; Yu and Tsai, 2008;
Ngai et al., 2006, 2007), as are most statistical detection
schemes (Zhang et al., 2008; Onat and Miri, 2005a,b; Li et al.,
2008b). Classification is a typical detection technique from the
data mining family, in which the classifier is built by the training
procedure. As for computational intelligence, GA is a good example:
it is applied to measure the fitness of nodes without any
prior-knowledge (Rahul et al., 2009), so that a detection scheme can
be optimally deployed. With the network flow information, sensor
nodes are divided into many sub-trees, where the root of the biggest
sub-tree is regarded as a compromised node (Ngai et al., 2006, 2007).
In addition, the standard deviation of packet arrival intervals during
a specified time period is trained as the normal profile for identifying
anomalies (Onat and Miri, 2005a), in accordance with
\[ |\mathrm{mean}(recBuf) - \mathrm{mean}(intBuf)| > K \cdot \mathrm{std}(recBuf). \]
In conclusion, the dependency on prior-knowledge certainly limits
the applicability of prior-knowledge-based schemes, but they are
generally good at detecting anomalies that correlate closely with
what is already known. Besides, these schemes are usually fast and
simple to realize. In contrast, prior-knowledge free detection schemes
may be slower, but they are better able to address unknown security
threats or random failures. Consequently, a rough process of
identifying appropriate detection techniques is shown in Fig. 5.
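The packet-arrival rule of Onat and Miri quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `arrival_anomaly`, the default K = 3, the use of the sample standard deviation, and the reading of recBuf as the trained (normal) intervals and intBuf as the recent intervals under test are all hypothetical, since the survey does not define the two buffers.

```python
from statistics import mean, stdev


def arrival_anomaly(rec_buf, int_buf, k=3.0):
    """Apply |mean(recBuf) - mean(intBuf)| > K * std(recBuf).

    rec_buf: packet arrival intervals assumed to form the trained profile.
    int_buf: recent packet arrival intervals under test (assumed meaning).
    k:       sensitivity factor K (the default 3.0 is an arbitrary choice).
    """
    return abs(mean(rec_buf) - mean(int_buf)) > k * stdev(rec_buf)


trained = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
print(arrival_anomaly(trained, [0.2, 0.25, 0.2]))   # intervals shrank sharply
print(arrival_anomaly(trained, [1.0, 1.02, 0.98]))  # intervals look unchanged
```

The only state a node needs under this reading is the two buffers, which is consistent with the survey's remark that such statistical rules are cheap to evaluate.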
2.5. Attribute selection
It is worth noting that most malicious activities or random failures
against a WSN are reflected by one or more attributes of the network.
In fact, this is the essential reason why anomaly detection can enhance
the security and reliability of a WSN. For example, an irregular change
of hop count implies a high likelihood of a sinkhole attack
(Dallas et al., 2007; Ngai et al., 2006, 2007); the signal power
becomes implausible under Hello flood and wormhole attacks
(Pires et al., 2004); insider attacks markedly affect the underlying
distribution of the sensed data (Liu et al., 2007); and measurements of
network traffic behavior, such as packet dropping rate
(Ioannis et al., 2007) and packet arrival process (Onat and Miri,
2005a), are capable of identifying black-hole and selective forwarding
attacks. This property makes attribute selection a critical research problem.
Fig. 5. Process of identifying detection techniques. DM: Data Mining; CI: Computational Intelligence; IDA: Intrusion Detection Agent; SF: Security Foundation;
VD: Verifying Dataset; Stat: Statistical Techniques and DAD: Distributed Anomaly Detection.
Furthermore, a reduced set of attributes can remarkably improve both
the detection speed and the detection accuracy (Chebrolu et al.,
2005; Kloft et al., 2008). However, this problem remains open in the
anomaly detection of WSNs, although some progress has been made
sporadically (Silva et al., 2005; Ho et al., 2009). Owing to space
limitations, this issue will be addressed separately.
3. Anomaly detection based on hierarchical WSNs
In hierarchical WSNs, statistical techniques, data mining and
computational intelligence, game theory, and hybrid detection
have been employed to realize detection schemes. The input is
collected at each common sensor node, possibly followed by a
preprocessing procedure or by part of the computation belonging to
the data processing procedure. The original or preprocessed
inputs, or local normal profiles, are then sent to the cluster head or
base station, where the global normal profile is produced during the
data processing procedure with a training algorithm, some
prior-knowledge, or a combining algorithm. The analysis and decision
procedure is carried out at each common sensor node, at the cluster
head, or at both. Finally, the output of anomaly detection is produced
in a specified form wherever the analysis and decision procedure has
been performed. Basically, these techniques tend to find a normal
profile using a training procedure in order to achieve higher detection
generality. Thus, most of their detection methods are prior-knowledge free.
One common feature of these detection schemes is that they use
the hierarchical architecture to implement detection in a distributed
manner, which spreads the energy overhead across the entire network
and relieves the communication burden. Because distributed detection
requires a central entity to organize and coordinate the
sub-computation tasks throughout a group, the cluster head is
naturally suited to this role. In a distributed detection scheme, the
common sensor nodes participate in the data processing procedure,
thereby taking over part of the computing cost of the cluster head and
exchanging less information with it, which conserves communication
cost. For example, kernel density estimators (Palpanas et al., 2003;
Subramaniam et al., 2006), clustering algorithms (Rajasegarar et al.,
2006; Masud et al., 2009), and support vector machines (SVM)
(Rajasegarar et al., 2007) are typical techniques on which these
distributed schemes depend.
In the following, a number of particular detection schemes are
introduced according to their technique categories, for each of
which its principle, detection pattern, detection method, and any
unique feature or additional strategy are depicted in detail.
3.1. Statistical techniques
3.1.1. Distributed detection using kernel density estimator
A kernel density estimator is built to identify anomalies by
estimating the underlying distribution of sensed data (Palpanas
et al., 2003). First, each common sensor node accomplishes the
local detection. The cluster head then collects all local normal
profiles to carry out the global detection within its group. To
ensure the smooth delivery of streaming data, each discrete event
occurs under the control of two timing parameters: deadline and
importance.
The principle is described as follows. Given that S is a random
sample of a static relation T and k(x) is the kernel function, the
underlying distribution f(x) is estimated over all tuples $t_i$ in S with
\[ f(x) = \frac{1}{n} \sum_{t_i \in S} k(x - t_i). \]
The Epanechnikov kernel is employed in this case:
\[ k(x) = \begin{cases} \frac{3}{4}\,\frac{1}{B}\left(1 - \left(\frac{x}{B}\right)^2\right), & \left|\frac{x}{B}\right| < 1, \\ 0, & \text{otherwise,} \end{cases} \]
where $B = \sqrt{5}\,\sigma\,|S|^{-1/5}$ is the bandwidth of the kernel
function and $\sigma$ is the standard deviation of T. Once f(x) is
estimated, anomalies are identified by calculating the number of sensed
values that fall within the neighborhood of $t_0$. $N(t_0, r)$ is the
number of sensed values in T that fall into a sphere of radius r
around $t_0$:
\[ N(t_0, r) = \int_{[t_0 - r,\; t_0 + r]} f(x)\,dx. \]
If $N(t_0, r)$ is less than a threshold p, $t_0$ is identified as an anomaly.
Afterwards, the sample set S and the kernel bandwidth B at each
common sensor node are sent to the cluster head. Using a combining
algorithm, the cluster head works out the global normal profile, with
which the global detection is then launched.
The kernel density estimator is good at approximating the underlying
distribution of a multidimensional dataset at reasonable resource
cost. Moreover, it is easy to operate in a distributed manner by
combining the bandwidths of the local kernel functions. The choice of
kernel function is critical to the performance; however, parameter
estimation is a hard problem in this kind of non-parametric
statistical technique.
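To make the local detection step concrete, the following Python sketch evaluates the Epanechnikov estimate and the neighborhood test for one-dimensional readings. It is a minimal sketch under assumptions rather than the method of Palpanas et al.: all function names are hypothetical, n is taken to be |S|, the standard deviation of T is approximated by the sample standard deviation of S, and the integral is approximated with a simple midpoint rule.

```python
import math
from statistics import stdev


def epanechnikov(x, b):
    """Epanechnikov kernel with bandwidth b: (3/4)(1/b)(1 - (x/b)^2) for |x/b| < 1."""
    u = x / b
    return 0.75 / b * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def bandwidth(sample):
    """B = sqrt(5) * sigma * |S|^(-1/5); sigma is estimated from the sample (assumption)."""
    return math.sqrt(5.0) * stdev(sample) * len(sample) ** (-0.2)


def density(x, sample, b):
    """f(x) = (1/n) * sum of k(x - t_i) over the sample, with n = |S| (assumption)."""
    return sum(epanechnikov(x - t, b) for t in sample) / len(sample)


def neighborhood_mass(t0, r, sample, b, steps=200):
    """Midpoint-rule approximation of the integral of f over [t0 - r, t0 + r]."""
    h = 2.0 * r / steps
    return h * sum(density(t0 - r + (i + 0.5) * h, sample, b) for i in range(steps))


def is_anomaly(t0, r, sample, b, p):
    """Flag t0 when the estimated mass around it is below the threshold p."""
    return neighborhood_mass(t0, r, sample, b) < p


readings = [20.0 + 0.1 * i for i in range(20)]   # made-up temperature sample
b = bandwidth(readings)
print(is_anomaly(21.0, 0.5, readings, b, p=0.05))  # inside the bulk of the data
print(is_anomaly(30.0, 0.5, readings, b, p=0.05))  # far from every sample point
```

Because the estimate depends only on the sample S and the bandwidth B, those two items are all a node has to ship to the cluster head, which matches the exchange described above.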
3.1.2. Online detection using kernel density estimator
In the advanced kernel density estimator-based detection
scheme (Subramaniam et al., 2006), several enhancements are made
over the original effort (Palpanas et al., 2003). The online
approximation of sensed data in a sliding window is proposed, using
the "chain-sample" algorithm. In support of the online approximation,
several points are improved. First, the size of the set resulting from
two sensor nodes is reduced by warehousing samples. Second, a
suitable technique for computing the standard deviation in a sliding
window of streaming data is used to facilitate the combination of
bandwidths:
\[ V_{1,2} = V_1 + V_2 + \frac{N_1 N_2}{N_{1,2}} (\mu_1 - \mu_2)^2, \]
where $\mu$ is the mean, V is the variance, $N_{1,2} = N_1 + N_2$, and
$\mu_{1,2} = (\mu_1 N_1 + \mu_2 N_2)/N_{1,2}$. Third, each common sensor
node reports the update of its kernel density estimator only with
probability $f = |R_p| / (l\,|R|)$, where a parent node has l children
nodes, each with a kernel density estimator of size $|R|$, and the kernel
density estimator of the parent node has size $|R_p|$. In addition to the
distance-based distributed deviation detection algorithm (Palpanas
et al., 2003), a new local metrics-based algorithm, multi-granular
deviation detection (MGDD), is introduced. Given that
$MDEF(p, r, \alpha)$ is the deviation factor of an observation p, and
$\sigma_{MDEF}(p, r, \alpha)$ is the normalized standard deviation in the
sampling neighborhood of p, p is flagged as an anomaly if
\[ MDEF(p, r, \alpha) > k_\sigma\, \sigma_{MDEF}(p, r, \alpha), \]
where $k_\sigma$ is the factor that determines a significant deviation.
Online detection is carried out in this advanced scheme. With
a probability-based strategy, the normal profile can be regularly
updated to follow the dynamics of the system without incurring too
much energy cost. In addition, a new local metrics-based
algorithm is introduced for detection, which suits datasets whose
anomalies are indistinguishable by distance.
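The merging rule for the spread of two windows can be checked numerically. The sketch below is illustrative only and its names are hypothetical. It assumes that V denotes the unnormalized variance, i.e. the sum of squared deviations from the mean, which is the interpretation under which the combination formula of this scheme is exact; dividing V by the element count (or count minus one) afterwards gives the usual variance.

```python
def summarize(values):
    """Return (N, mean, V), with V the sum of squared deviations (assumed meaning of V)."""
    n = len(values)
    mu = sum(values) / n
    v = sum((x - mu) ** 2 for x in values)
    return n, mu, v


def merge(s1, s2):
    """Combine two summaries without revisiting the raw data.

    mu_12 = (mu_1 N_1 + mu_2 N_2) / N_12
    V_12  = V_1 + V_2 + (N_1 N_2 / N_12) (mu_1 - mu_2)^2
    """
    n1, mu1, v1 = s1
    n2, mu2, v2 = s2
    n = n1 + n2
    mu = (mu1 * n1 + mu2 * n2) / n
    v = v1 + v2 + n1 * n2 / n * (mu1 - mu2) ** 2
    return n, mu, v


left, right = [1.0, 2.0, 3.0, 4.0], [10.0, 12.0]
print(merge(summarize(left), summarize(right)))
print(summarize(left + right))  # should agree with the merged summary
```

Only three numbers per window travel up the hierarchy, so a parent can rebuild the standard deviation it needs for the bandwidth without receiving the raw samples.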
3.1.3. Detection using statistical measures
Relying on spatiotemporal correlation and consistency at some
spatial granularity, together with a frequency mechanism, a
detection scheme is designed to deal with insider attacks (Zhang
et al., 2008), such as exceptional messages and abnormal behavior.
Two detection mechanisms are introduced: in one, the cluster head
covers its group; in the other, each common sensor node watches its
one-hop neighbors. A random secret key pre-distribution mechanism
cooperates with this detection scheme.
The principle of the exceptional message detection mechanism
(EMDM) is to use the similarity between a pair of messages
coming from the common sensor nodes to identify anomalies.
The cluster head maintains a dynamic set
\[ D = \{(M_i, W_i) \mid (M_1, W_1), (M_2, W_2), \ldots, (M_n, W_n)\}, \]
where $M_i$ stands for a recorded message and $W_i$ is the weight
(frequency) of $M_i$. When a new message $M_{new}$ arrives at the
cluster head, it is compared against every entry of D. If $M_{new}$
matches any $M_i$ in accordance with
\[ sim(M_{new}, M_i) = \frac{V(M_{new}) \cdot V(M_i)}{\lVert V(M_{new}) \rVert\, \lVert V(M_i) \rVert}, \]
namely the similarity between $M_{new}$ and $M_i$ is less than a
threshold, $M_{new}$ is identified as normal and the corresponding $W_i$
increases. Otherwise, $M_{new}$ is put into a new observing period to
determine whether it is a new type of message or a fake message.
If similar messages come from other nodes during this period,
$M_{new}$ is a new type of normal message; otherwise, $M_{new}$ is deemed
a fake message. The sender of $M_{new}$ is then immediately marked as
malicious, and the other common sensor nodes and the base station
are informed.
As for the abnormal behavior detection mechanism (ABDM),
two measures are employed to identify anomalies. One is to
examine whether a common sensor node sends too many or too few
messages in one turn. The other is built upon a security
foundation. Each common sensor node records its one-hop
neighbors' IDs and $N(ID_i)$, where $N(ID_i)$ is the value of the
abnormal behavior of node $ID_i$. Given
\[ \varphi(ID_x) = \{(ID_j, N(ID_j)) \mid (ID_1, N(ID_1)), \ldots, (ID_m, N(ID_m))\}, \]
where m is the number of $ID_x$'s neighbors, and
\[ \mu_{ID_x} = \frac{1}{m} \sum_{j=1}^{m} N(ID_j), \]
\[ \sigma_{ID_x} = \sqrt{\frac{1}{m-1} \sum_{j=1}^{m} \left(N(ID_j) - \mu_{ID_x}\right)^2}, \]
\[ \varphi_{ID_x} = \frac{N(ID_j) - \mu_{ID_x}}{\sigma_{ID_x}}, \]
where $\mu_{ID_x}$ and $\sigma_{ID_x}$ denote the mean and standard
deviation of $\varphi(ID_x)$ respectively, if $\varphi_{ID_x}$ deviates from
a normal value, node $ID_j$ is reported to the cluster head as a
suspicious node.
This detection scheme uses a comparatively simple technique, which
makes detection faster. Because EMDM and ABDM work together, the
cluster head and the common sensor nodes perform detection at the
same time, which may provide the network with stronger security.
However, EMDM has an apparent flaw: if more than one malicious
node sends the same fake message, EMDM cannot withstand the attack.
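The ABDM neighbor check amounts to standardizing each neighbor's abnormal-behavior value against the neighborhood mean and standard deviation. The Python sketch below illustrates this under assumptions: the function name, the dictionary input, and the cut-off `limit` are hypothetical, and "deviates from a normal value" is read here as the absolute standardized score exceeding that cut-off, which the survey does not specify.

```python
from statistics import mean, stdev


def suspicious_neighbors(abnormal_values, limit=1.5):
    """Return the neighbor IDs whose standardized N(ID_j) exceeds `limit`.

    abnormal_values: dict mapping neighbor ID -> N(ID_j), as recorded by node ID_x.
    limit:           assumed cut-off on |(N(ID_j) - mu) / sigma| (not from the survey).
    """
    values = list(abnormal_values.values())
    mu = mean(values)
    sigma = stdev(values)      # sample standard deviation, i.e. the 1/(m-1) form
    if sigma == 0:             # all neighbors behave identically: nothing stands out
        return []
    return [node for node, n in abnormal_values.items()
            if abs((n - mu) / sigma) > limit]


neighbors = {"a": 1, "b": 2, "c": 1, "d": 2, "e": 1, "f": 20}
print(suspicious_neighbors(neighbors))
```

With m neighbors the largest attainable standardized score is (m - 1)/sqrt(m), so any cut-off has to be chosen with the neighborhood size in mind; this is one reason the value 1.5 above should be treated as a placeholder.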
3.1.4. Detection using rules based on probability
Tiwari et al. introduce a probability model (Tiwari et al., 2009) into
the rule-based scheme (Ioannis et al., 2007), aiming at black-hole
and selective forwarding attacks. By using the probability model
to measure the traffic behaviors more accurately, the false alarm
rate of the rule-based detection scheme can be sharply reduced.
Some of the common sensor nodes are selected as watchdogs to
monitor the neighbors within their radio range; the cluster head
is responsible for the analysis and decision procedure.
This scheme employs two detection rules: (A) during a time
window w, if the probability $p'$ of packet dropping at a sensor
node is greater than a threshold t, the node is reported as
suspicious; (B) if the probability p of a sensor node being reported
as suspicious is greater than 50%, the cluster head marks it as
definitely compromised. At each watchdog, the network traffic
pattern is modeled with a Poisson distribution. If the expected
number of occurrences during a given interval is $\lambda$, the
probability of k occurrences (k a non-negative integer, k = 0, 1, 2, ...)
is
\[ f(k, \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}, \]
where $\lambda$ can be estimated through network learning. If a
watchdog perceives a sudden change of the network traffic at a
sensor node, it reports this node as suspicious to the cluster
head. The remaining watchdogs covering the radio range where the
suspicion appears are called on to participate in the analysis and
decision procedure. During this procedure, if the probability $p'$
reported by a watchdog against the suspicious node is greater
than t, the cluster head records it as "1", otherwise as "0". After a
specified time interval, the cluster head generates a probability
sequence against the suspicious node from the reports of the
watchdogs. This sequence is split into two-bit pairs; afterwards, all
"00" and "11" pairs are eliminated to prevent bias. Let the
probability of outcome "0" be q and that of "1" be 1 - q. p is then
computed from the resulting sequence; if (B) is satisfied, the
suspicious node is definitively marked as a compromised node.
This scheme improves a rule-based detection scheme by
taking advantage of probability-based measure, reducing the false
alarm rate significantly.
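Two building blocks of this scheme are easy to illustrate in Python: the Poisson probability used to model traffic at a watchdog, and the elimination of "00" and "11" pairs from the report sequence. The sketch is hedged accordingly: both function names are hypothetical, the report sequence is assumed to be a string of "0"/"1" characters, and the final computation of p is deliberately left out because the survey does not state how p is derived from the remaining pairs.

```python
import math


def poisson_pmf(k, lam):
    """f(k, lambda) = lambda^k * e^(-lambda) / k! for a non-negative integer k."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def drop_equal_pairs(bits):
    """Split a report sequence into consecutive two-bit pairs and drop "00" and "11".

    bits: string of "0"/"1" reports (assumed representation).
    A trailing unpaired bit is ignored.
    """
    pairs = [bits[i:i + 2] for i in range(0, len(bits) - 1, 2)]
    return [pair for pair in pairs if pair[0] != pair[1]]


# Probability of seeing exactly 9 packets when 4 are expected per interval.
print(poisson_pmf(9, 4.0))

# Pairs are "11", "01", "00", "10", "11"; only the mixed pairs survive.
print(drop_equal_pairs("1101001011"))
```

A watchdog could, for instance, treat an observation whose Poisson probability is very small as the "sudden change" mentioned above, although the exact trigger is not given in the survey.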
3.1.5. Research problems
Statistical techniques-based detection schemes are flexible.
Single or multiple attributes of the network, such as the
network traffic (Tiwari et al., 2009) and the (multidimensional)
sensed data (Palpanas et al., 2003; Subramaniam et al., 2006),
can be used to construct a variety of statistical distributions;
alternatively, statistical measurements such as similarity, mean,
variance, and standard deviation (Zhang et al., 2008) can be used to
reflect a normal status. Choosing appropriate statistical
distributions and measurements is necessary to cover a wider range
of application scenarios.
The benefits of the distributed manner have already been mentioned,
and its use is strongly encouraged wherever possible.
Statistical techniques have great potential to be reconstructed in a
distributed manner, because their core computing tasks can be
divided into smaller ones and then combined easily, as with the
kernel density estimator (Palpanas et al., 2003; Subramaniam
et al., 2006). Along this path, detection schemes based on
statistical techniques can be implemented with stronger
detection generality while remaining resource-efficient.
Online detection, which is of great significance for many real-time
application scenarios, has been achieved with the kernel density
estimator technique (Subramaniam et al., 2006). However, it needs
smart strategies to greatly reduce the information exchange.
Incorporating other techniques into statistical techniques
could boost the detection performance. An example is the rule-based
detection technique (Tiwari et al., 2009), where a couple of detection
rules are set up to avoid the difficulty of training the normal
profile, while a probability model is used to measure the
traffic behaviors accurately.
3.2. Data mining and computational intelligence-based techniques
3.2.1. Distributed detection using K-means clustering
With a K-means clustering algorithm, Rajasegarar et al. (2006)
design a distributed detection scheme. Each common sensor node
locally collects the input dataset to work out a local normal profile.
The cluster head then collects all local normal profiles to accomplish
the data processing procedure, in which a global normal
profile is produced. After receiving the global normal profile, each
common sensor node initiates the analysis and decision procedure
to perform detection. To suit distance-based clustering, the input
dataset is normalized at each common sensor node in a
preprocessing procedure.
Given a dataset $v_{kj}$, k = 1, ..., m, it is transformed to
\[ u_{kj} = (v_{kj} - \mu_{v_j}) / \delta_{v_j}, \]
where $\mu_{v_j}$ and $\delta_{v_j}$ stand for the mean and standard
deviation of the jth attribute in $v_{kj}$, $\forall k$, respectively.
Subsequently, $u_{kj}$ is normalized to the interval [0,1] according to
\[ \bar{u}_{kj} = (u_{kj} - \min u_j) / (\max u_j - \min u_j). \]
Given a common sensor node $s_i$ collecting a dataset $X_i$, $s_i$ sends
the local normal profile
\[ \left( \sum_{k=1}^{m} x_k^i,\; \sum_{k=1}^{m} (x_k^i)^2,\; m,\; x_{\max}^i,\; x_{\min}^i \right) \]
to the cluster head, where m stands for $|X_i|$. After the global
normal profile
\[ (\mu_G,\; \delta_G^2,\; x_{\max}^G,\; x_{\min}^G) \]
is computed, the cluster head sends it back to the common sensor
nodes. After receiving the global normal profile, each common
sensor node initiates detection locally, using a fixed-width
clustering algorithm. If the Euclidean distance between a data point
and its closest cluster centroid is larger than a user-specified
radius $\omega$, a new cluster is formed with this data point as its
centroid. To reduce the number of resulting clusters, a cluster
merging process is then conducted by measuring the inter-cluster
distances. The clusters $c_1$ and $c_2$ merge if their inter-cluster
distance $d(c_1, c_2)$ is less than $\omega$. Finally, the average
inter-cluster distance of the K nearest neighbor (KNN) clusters is
applied to identify anomalous clusters. Let $ICD_i$ be the average
inter-cluster distance (KNN) of cluster i, and AVG(ICD) and SD(ICD) be
the mean and standard deviation of all inter-cluster distances
respectively. If
\[ ICD_i > AVG(ICD) + SD(ICD), \]
cluster i is viewed as anomalous.
This detection scheme operates in a distributed manner: the
common sensor nodes are responsible for part of the global
normalization procedure that serves the core K-means clustering
algorithm. The global normal profile is a four-parameter tuple,
which keeps the energy cost of communication low.
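The exchange of local and global normal profiles can be sketched for a single attribute as follows. This is an illustrative reading, not the implementation of Rajasegarar et al.: the function names are hypothetical, and the global variance is assumed to be the population variance recovered from the pooled sum and sum of squares, a detail the survey leaves open.

```python
import math


def local_profile(xs):
    """Local normal profile: (sum, sum of squares, count, max, min) of one node's data."""
    return (sum(xs), sum(x * x for x in xs), len(xs), max(xs), min(xs))


def global_profile(profiles):
    """Combine local profiles into (mu_G, variance_G, x_max_G, x_min_G)."""
    total = sum(p[0] for p in profiles)
    total_sq = sum(p[1] for p in profiles)
    count = sum(p[2] for p in profiles)
    mu = total / count
    var = max(total_sq / count - mu * mu, 0.0)   # population variance (assumption)
    return mu, var, max(p[3] for p in profiles), min(p[4] for p in profiles)


def normalize(x, profile):
    """Standardize x with the global mean and deviation, then rescale to [0, 1].

    Assumes the global variance is positive and x_max_G differs from x_min_G.
    """
    mu, var, x_max, x_min = profile
    sd = math.sqrt(var)

    def standardize(v):
        return (v - mu) / sd

    return (standardize(x) - standardize(x_min)) / (standardize(x_max) - standardize(x_min))


node_a, node_b = [1.0, 2.0, 3.0], [4.0, 5.0]
profile = global_profile([local_profile(node_a), local_profile(node_b)])
print(profile)
print(normalize(3.0, profile))
```

Each node sends five numbers and receives four, regardless of how many readings it holds, which is where the communication saving of this scheme comes from.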
3.2.2. Distributed detection using SVM
The one-class quarter-sphere SVM, a representative SVM algorithm,
is also suited to distributed anomaly detection (Rajasegarar
et al., 2007). First, the local quarter-sphere is computed at each
common sensor node. Second, the cluster head collects these
locally computed radii to work out a global radius. Detection is
then launched at each common sensor node with the global
normal profile.
The scheme is based on the optimization problem
\[ \min_{R \in \mathbb{R},\; \xi \in \mathbb{R}^n} \; R^2 + \frac{1}{\nu n} \sum_{i=1}^{n} \xi_i, \quad \text{s.t.} \quad \lVert \varphi(x_i) \rVert^2 \le R^2 + \xi_i, \quad \xi_i \ge 0, \]
where $x_i$ is a data vector, the mapped vector $\varphi(x_i)$ is called
the image vector, R is the radius of the quarter-sphere, and
$\{\xi_i : i = 1, \ldots, n\}$ are the slack variables that allow some of
the image vectors to lie outside the quarter-sphere. This problem can
be solved with the Lagrangian method. The image vectors may
consequently fall inside, on the boundary of, or outside the
quarter-sphere (outliers). Subsequently, the cluster head collects
the radii locally computed at each common sensor node to obtain
a global radius $R_m$. Several measures are available for computing
$R_m$: mean, median, maximum, and minimum. When the common
sensor nodes receive $R_m$, detection is initiated. If the norm of the
image of a test instance $x_i$, expressed through the kernel value
$\tilde{k}(x_i, x_i)$, satisfies
\[ \tilde{k}(x_i, x_i) > R_m^2, \]
$x_i$ is identified as an anomaly.
This scheme may suffer from a heavier data processing procedure,
as a result of the high complexity of SVM. However, only one
parameter, the normal profile, is exchanged between the cluster head
and the common sensor nodes, which means much lower
communication cost.
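The part of this scheme that runs across nodes is small enough to sketch directly. The Python fragment below is a hedged illustration: the names `global_radius`, `is_anomaly`, and `STRATEGIES` are hypothetical, solving the local optimization problem is out of scope, and the kernel value of the test instance with itself is taken as a given input rather than computed, since the survey does not name the kernel.

```python
from statistics import mean, median

# The four combination measures listed for computing the global radius R_m.
STRATEGIES = {"mean": mean, "median": median, "max": max, "min": min}


def global_radius(local_radii, strategy="mean"):
    """Cluster-head step: combine the locally computed radii into R_m."""
    return STRATEGIES[strategy](local_radii)


def is_anomaly(kernel_self_value, r_m):
    """Node step: flag an instance whose kernel value k(x, x) exceeds R_m squared."""
    return kernel_self_value > r_m ** 2


radii = [1.0, 2.0, 6.0]
for name in STRATEGIES:
    print(name, global_radius(radii, name))
print(is_anomaly(10.0, global_radius(radii, "mean")))
```

Choosing the maximum makes the global boundary the most permissive and the minimum the strictest, so the strategy directly trades missed detections against false alarms.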
3.2.3. Distributed detection using clustering ellipsoids
A WSN probably contains multiple types of underlying data
distribution across the entire network; accordingly, Moshtaghi
et al. propose a distributed detection scheme based on clustering
ellipsoids (Masud et al., 2009). The base station takes charge of
computing the global hyper-ellipsoid, to accommodate the
non-homogeneous underlying data distributions. The common sensor
nodes, on the other hand, are in charge of performing detection
with the global hyper-ellipsoid.
The general form of the elliptical boundary is represented as
\[ ell(a, A; t) = \{ x \in \mathbb{R}^p \mid (x - a)^T A (x - a) = t^2 \}, \]
where a is the center of the ellipsoid and t is its effective radius.
The Mahalanobis distance of x is
\[ \lVert x - m \rVert_{V^{-1}} = \sqrt{(x - m)^T V^{-1} (x - m)}, \]
where m is the mean and V is the covariance matrix. Consequently,
x lies on a hyper-ellipsoidal boundary if its Mahalanobis distance
is t, i.e.:
\[ B(m, V^{-1}; t) = \{ x \in \mathbb{R}^p \mid \lVert x - m \rVert^2_{V^{-1}} = t^2 \}. \]
x is considered a local anomaly if it falls outside this boundary.
Hyper-ellipsoids are sent by the common sensor nodes to the base
station as local normal profiles, and a global ellipsoid is produced
there. In order to fit as many types of underlying data distribution
as possible, t is selected deliberately. In addition, the ellipsoids
reported by the common sensor nodes are clustered, which reduces the
redundancy between them. Given a common sensor node $N_j$ sending the
parameter tuple $(m_j, V_j, n_j)$ of its local ellipse $E_j$ to the base
station B, the similarity between two ellipsoids is measured as
\[ S(E_1, E_2) = e^{-\lVert m_1 - m_2 \rVert}. \]
A positive root eigenvalue (PRE) plot is employed to estimate the
number of clusters c. Once the similarities and c are available,
ellipses are merged pairwise: let $(m_u, V_u, n_u)$ and $(m_v, V_v, n_v)$ be
the parameter tuples of the ellipsoids $E_u$ and $E_v$ respectively; the
parameter tuple of the global ellipse $E'$ is then (m, V, n):
\[ n = n_u + n_v, \]
\[ m = \frac{n_u}{n}\, m_u + \frac{n_v}{n}\, m_v, \]
\[ V = \frac{n_u - 1}{n - 1}\, V_u + \frac{n_v - 1}{n - 1}\, V_v + \frac{n_u n_v}{n (n - 1)} \left[ (m_u - m_v)(m_u - m_v)^T \right]. \]
This parameter tuple of the global ellipse is in fact the global normal
profile. When the common sensor nodes receive it from B,
detection is launched locally.
Because the base station undertakes the main computing tasks,
this detection scheme is energy-efficient. However, there is
scope for better similarity measures for hyper-ellipsoids that take
the shape and orientation of the ellipses into consideration, as well
as their separation. Moreover, more robust methods are needed to
merge ellipses that come from slightly different underlying
distributions. In this context, a more appropriate boundary than a
standard deviation is also desirable, in order to avoid excessive
false positive alarms.
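The pairwise merge of two ellipsoid tuples can be verified with a short Python sketch. It is written in plain Python with nested lists for self-containment; the function names are hypothetical, and `mean_cov` assumes the sample covariance with the 1/(n - 1) factor, which is the convention under which the merge formula reproduces the covariance of the pooled data exactly. Clustering of ellipsoids and the choice of t are not covered.

```python
import math


def mean_cov(points):
    """Return (m, V, n): mean vector, sample covariance matrix, and count."""
    n, p = len(points), len(points[0])
    m = [sum(x[d] for x in points) / n for d in range(p)]
    v = [[sum((x[a] - m[a]) * (x[b] - m[b]) for x in points) / (n - 1)
          for b in range(p)] for a in range(p)]
    return m, v, n


def merge(eu, ev):
    """Merge two ellipsoid tuples (m_u, V_u, n_u) and (m_v, V_v, n_v)."""
    mu, vu, nu = eu
    mv, vv, nv = ev
    n = nu + nv
    p = len(mu)
    m = [(nu * mu[d] + nv * mv[d]) / n for d in range(p)]
    diff = [mu[d] - mv[d] for d in range(p)]
    v = [[(nu - 1) / (n - 1) * vu[a][b]
          + (nv - 1) / (n - 1) * vv[a][b]
          + nu * nv / (n * (n - 1)) * diff[a] * diff[b]
          for b in range(p)] for a in range(p)]
    return m, v, n


def similarity(m1, m2):
    """S(E_1, E_2) = exp(-||m_1 - m_2||), based on the centers only."""
    return math.exp(-math.dist(m1, m2))


node_u = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
node_v = [(10.0, 10.0), (11.0, 13.0), (12.0, 11.0)]
print(merge(mean_cov(node_u), mean_cov(node_v)))
print(mean_cov(node_u + node_v))  # pooled data, should agree with the merge
```

The similarity above ignores shape and orientation, which is exactly the limitation noted in the discussion of this scheme.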
3.2.4. Detection using multi-agent and refined clustering
Wang et al. (2009) introduce a multi-agent-based detection
scheme, which takes advantage of the self-organizing map (SOM)
neural network algorithm and the K-means clustering algorithm.
Detection agents, including sentry, analysis, response, and
management agents, are attached to each node of the network and
take charge of detection. In this scheme, the cluster head monitors
its common sensor nodes, whereas some of the common sensor nodes
are activated, according to their remaining energy, to monitor the
cluster head.
In fact, the cluster head and the common sensor nodes monitor
each other using the same principle. The input dataset is first
clustered by the SOM neural network. Afterwards, the
clusters are refined using the K-means clustering algorithm. Let
$Dx_i$ be the Euclidean distance between $x_i$ and the center of its
cluster $X_{j1}$. If $Dx_i$ is larger than the distance between $x_i$ and
the center of another cluster $X_{j2}$, $x_i$ is re-clustered into cluster
$X_{j2}$. The U-Matrix map of the weights generated by the neural
network makes it possible to identify anomalies. Once an anomaly is
perceived, the degree of trust between the two nodes is decreased.
A definitive alarm is raised only when the degree of trust falls below
a predefined threshold.
The participation of agents gives this scheme higher
flexibility, but also incurs extra cost. Monitoring the cluster head
as well increases security, as it satisfies the "trust-no-node"
principle. However, employing the SOM neural network algorithm and
the K-means clustering algorithm at the same time brings a heavy
computation burden.
3.2.5. Optimized detection using genetic algorithm
This GA-based scheme does not focus on detection explicitly,
but it can both improve the detection accuracy and reduce the
false alarm rate (Rahul et al., 2009). The scheme allocates the
monitoring function to sensor nodes by using GA to evaluate their
fitness on the basis of workload patterns, packet statistics,
utilization data, battery status, and quality-of-service compliance.
Sensor nodes are classified as cluster head (CH), inactive node
(powered off), inter-cluster router (ICR), and common sensor
node (NS). The base station obtains a competing fitness function
based on GA to optimally select a CH or ICR as the local monitoring
node (LMN), where each solution is represented as a binary string
(chromosome) with an associated fitness measure. From the mating
pool, a solution is picked with probability $P_i$:
\[ P_i = \frac{F_i}{\sum_{j=0}^{N} F_j}, \]
where $F_i$ is the functional fitness of a possible solution and N is
the total number of possible solutions. The LMN agent is in charge of
monitoring the following aspects of its neighbor nodes: (a) received
signal strength, (b) transmission periodicity, (c) spurious
transmissions from illegitimate nodes, (d) response delay, and
(e) packet dropping or modification. In addition, the base station
uses the LMN as a loop-back agent to transmit special patterns
through its trusted route and receive the patterns over a
pre-established route, so that malicious nodes can be identified by
the transmission of hashed data. Moreover, the base station covers
the entire network with optional techniques (statistical metrics and
models, Markov models, time series models, etc.) on the basis of
analytical traffic data and LMN alerts. The fitness function consists
of monitoring node integrity fitness (MIF), monitoring node battery
fitness (MBF), monitoring node coverage fitness (MCF), and
cumulative trust fitness (CTF). MIF resists the allocation of an
LMN that is suspected to be compromised; the base station
estimates MIF with the integrity rank value, whereby a low value
indicates high susceptibility to intrusion.
$$\mathrm{MIF} = \frac{\sum_{ch=1}^{N} IR_{ch} K_{ch}}{\sum_{ch=1}^{N} K_{ch}} + \frac{\sum_{icr=1}^{M} IR_{icr} K_{icr}}{\sum_{icr=1}^{M} K_{icr}},$$

$$K_x = 1 \ \text{if}\ x = \mathrm{LMN}; \quad x \in (ch, icr),$$

$$IR_{icr} = \frac{\sum_{r=1}^{R} IR_{icr}^{r}}{R},$$

where $IR_{ch}$ and $IR_{icr}$ are the integrity ranks of CH and ICR
respectively, $R$ is the number of routes, and $IR_{icr}^{r}$ is the integrity
rank of the route $r$ that includes $icr$ as a router in its path. IR is
estimated by the base station according to
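$$R_{x,y} = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)\,\operatorname{var}(y)}; \qquad -1 < R_{x,y} < 1,$$

$$\hat{\lambda}(t) = \alpha\,\lambda(t-1) + (1-\alpha)\,\hat{\lambda}(t-1),$$

$$\mathrm{IDC} = \frac{\operatorname{var}\left(\sum_{k=0}^{n} \lambda_k\right)}{E\left(\sum_{k=0}^{n} \lambda_k\right)},$$

where $\lambda(t)$ stands for the actual number of packet arrivals
during interval $t$, $\hat{\lambda}(t)$ stands for the estimated number of
packet arrivals during interval $t$, and $\lambda_k$ is the number of
packet arrivals between time intervals $t_k$ and $t_{k+1}$. MBF reflects a
penalty on the battery usage of the communication between
sensor nodes, as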
$$\mathrm{MBF} = \frac{\sum_{i}^{N} BC_i K_i}{\sum_{i}^{N} K_i}, \qquad BC_i = f(Q, U),$$

where $Q$ is the residual battery capacity and $BC_i$ is the projected
battery capacity of node $i$ (CH or ICR). The battery usage rate ($U$)
depends on individual load and can be estimated from traffic
patterns and node-sync data. MCF rewards LMNs that can snoop
on the maximal number of nodes with a low estimated
integrity rank:
$$\mathrm{MCF} = \frac{1}{2}\left(\frac{\beta_1 \sum_{i}^{N} c_i}{F_1 N} + \frac{\beta_2 \sum_{j}^{M} c_j}{F_2 M}\right), \qquad \beta_1 + \beta_2 = 1,$$

where $c_i$ is the number of LMN agents that monitor malicious
node $i$, which is below the integrity rank threshold, $c_j$ is the
number of LMN agents that monitor non-malicious node $j$, which
is above the integrity rank threshold, and $F_1$ and $F_2$ are the
expected coverage redundancies for each malicious and non-
malicious node respectively. The total fitness is given by CTF, as

$$\mathrm{CTF} = \alpha_1\,\mathrm{MIF} + \alpha_2\,\mathrm{MBF} + \alpha_3\,\mathrm{MCF}.$$

This scheme is well suited to cooperate with any detection scheme,
both conserving resource usage and improving detection
performance. Its limitation is that the GA suffers from an
exponential increase in running time as the network's scale grows.
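
The selection probability and the weighted total fitness described
above can be illustrated with a short Python sketch. It is a minimal
illustration under stated assumptions, not the implementation of
Rahul et al. (2009): the function names, the example fitness values,
and the weights are hypothetical, and roulette-wheel sampling is used
here as one common way to draw a solution with probability $P_i$.

```python
import random


def selection_probabilities(fitness):
    """Return P_i = F_i / sum_j F_j for every candidate solution."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("total fitness must be positive")
    return [f / total for f in fitness]


def roulette_select(fitness, rng=random):
    """Draw one index from the mating pool with probability P_i."""
    probs = selection_probabilities(fitness)
    r = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def ctf(mif, mbf, mcf, a1, a2, a3):
    """Cumulative fitness as the weighted sum a1*MIF + a2*MBF + a3*MCF."""
    return a1 * mif + a2 * mbf + a3 * mcf


# Hypothetical fitness values for three candidate chromosomes.
pool_fitness = [1.0, 2.0, 1.0]
chosen = roulette_select(pool_fitness, random.Random(42))
```

In this sketch a chromosome with twice the fitness of another is
twice as likely to be drawn, which is the behavior the expression for
$P_i$ prescribes.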
3.2.6. Research problems
Detection schemes based on data mining and computational
intelligence algorithms are characterized by strong detection
generality, meaning that they are effective in defending against a
wide range of security threats, even unknown ones. This attractive
generality, of course, comes with high complexity, so these
schemes do their best to operate in a distributed manner
(Rajasegarar et al., 2006, 2007; Masud et al., 2009).
Beyond profiting from the hierarchical architecture of the
network (proficient control and management, little routing
redundancy, and adaptability to distributed operation),
assigning the primary computing tasks to the base station also
saves the detection schemes considerable energy overhead
(Masud et al., 2009; Rahul et al., 2009).
Equipping each sensor node with detection agents could enhance
performance and ease of implementation without draining
too much energy from the sensor nodes (Wang et al., 2009), but
certainly leads to extra expense on advanced devices.
In fact, the GA-based scheme (Rahul et al., 2009) is an
attractive paradigm for developing intelligent detection schemes
over WSNs. A few significant factors relating to the benign
status are modeled with a fitness function in each potential
solution, according to which the best solution is eventually found
by an optimization process. The final detection solution could
achieve maximal detection performance with minimal resources.
This scheme is able to cooperate with a range of detection
techniques and make them more intelligent.
3.3. Game theory-based techniques
3.3.1. Non-cooperative game theory
A game theory-based scheme is introduced for finding the
vulnerable areas in a WSN (Agah et al., 2004a), based on risk
factors such as the reliability of a sensor node, the different types
of attack, and the past behavior of the attacker. Only these
identified areas are protected by detection, in order to save
energy.
Intrusion detection is modeled as a game played between the
detection system and the adversary. Each player is allowed to
select a strategy from its set of strategies once. Given a fixed
cluster in the network, say $K$, three strategies are available to the
adversary: attack cluster $K$, do not attack cluster $K$, and attack a
different cluster. The detection system responds by either
defending cluster $K$ or defending a different cluster. The strategies
are numbered 1 to 3 for the adversary and 1 to 2 for the detection
system, so that two $2 \times 3$ payoff matrices $A$ and $B$ can be
established. The problem is to find the optimal strategy that
maximizes the profit for both players, namely achieving Nash
equilibrium. The payoff depends on several factors, including
attack type, density of sensor nodes, and the number of previous
attacks. Nash equilibrium is achieved when both players select
their first strategy. In other words, protecting the cluster
that has the highest value of $U(t) - C_k$ yields a reliable
rate of successful detection, where $U(t)$ indicates the utility of
the network's on-going sessions, and $C_k$ indicates the average
cost of protecting cluster $K$.
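
To make the equilibrium condition concrete, the following Python
sketch searches a $2 \times 3$ bimatrix game for pure-strategy Nash
equilibria. It is only an illustration: the payoff values are
hypothetical (they are not taken from Agah et al., 2004a, and were
chosen so that the first strategy of each player forms the single
equilibrium), and it is assumed here that the rows belong to the
detection system and the columns to the adversary.

```python
def pure_nash_equilibria(A, B):
    """Return all (row, col) cells where no player gains by deviating alone.

    A[r][c]: payoff of the row player (assumed: the detection system).
    B[r][c]: payoff of the column player (assumed: the adversary).
    """
    rows, cols = len(A), len(A[0])
    equilibria = []
    for r in range(rows):
        for c in range(cols):
            # Row player cannot do better in this column.
            row_best = all(A[r][c] >= A[r2][c] for r2 in range(rows))
            # Column player cannot do better in this row.
            col_best = all(B[r][c] >= B[r][c2] for c2 in range(cols))
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria


# Hypothetical payoffs, for illustration only.
# Rows: defend cluster K, defend a different cluster.
# Columns: attack cluster K, do not attack, attack a different cluster.
A = [[5, 1, 0],
     [0, 2, 4]]
B = [[3, 1, 2],
     [4, 1, 0]]

equilibria = pure_nash_equilibria(A, B)
```

With these example matrices the only cell that survives both checks
is the one where each player uses its first strategy (index 0 in the
code).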
3.3.2. Comparisons with game theory-based scheme
The non-cooperative game theory-based scheme (Agah et al.,
2004a) is then compared with a Markov decision process (MDP)
and an intuitive traffic measure (Agah et al., 2004b).
Using a stochastic process known as a Markov chain, an MDP can
forecast by modeling the system's past state transitions.
An MDP is a tuple ($S$, $A$, $R$, $tr$), where $S$ is a set of states, $A$ is a
set of actions, $R$ is the reward function, and $tr$ is the state-transition
function. The past system states and the transitions between
them can be described by an MDP model. The target is to
maximize the expected value of the rewards received over time.
The traffic measure, on the other hand, is based on an intuitive
metric: the cluster that suffers the heaviest traffic
volume is marked as the most vulnerable area. Because it takes
many factors into account, the non-cooperative game theory-
based scheme achieves the highest forecasting accuracy of the
three.
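
As an illustration of how an MDP ($S$, $A$, $R$, $tr$) can be solved for
maximal expected reward, the sketch below applies value iteration to
a hypothetical two-state model. Value iteration and the discount
factor `gamma` are assumptions introduced here for the example; the
survey does not state which solution method or parameters Agah et al.
(2004b) used, and all states, rewards, and probabilities are invented.

```python
def value_iteration(states, actions, reward, transition, gamma=0.9, tol=1e-8):
    """Compute state values that maximize expected discounted reward.

    reward[s][a]      -> immediate reward R(s, a)
    transition[s][a]  -> dict {next_state: probability}, i.e. tr
    gamma             -> discount factor (an assumption of this sketch)
    """
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                reward[s][a]
                + gamma * sum(p * values[s2] for s2, p in transition[s][a].items())
                for a in actions
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:  # stop once the values have stabilized
            return values


# Hypothetical model: a cluster is either "calm" or "attacked".
states = ["calm", "attacked"]
actions = ["defend", "idle"]
reward = {
    "calm": {"defend": -1.0, "idle": 0.0},
    "attacked": {"defend": 2.0, "idle": -5.0},
}
transition = {
    "calm": {"defend": {"calm": 0.9, "attacked": 0.1},
             "idle": {"calm": 0.7, "attacked": 0.3}},
    "attacked": {"defend": {"calm": 0.8, "attacked": 0.2},
                 "idle": {"calm": 0.2, "attacked": 0.8}},
}
values = value_iteration(states, actions, reward, transition)
```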
3.3.3. Research problems
Similar to the GA-based scheme (Rahul et al., 2009) mentioned
earlier, non-cooperative game theory-based schemes are not
directly concerned with detection; however, they could help
detection schemes improve both performance and
efficiency. The design of the payoff function is crucial to
forecasting accuracy and merits further study. Moreover, if
the GA-based scheme, which can optimize the placement of the
monitoring nodes, were to cooperate with the game
theory-based scheme, which identifies the vulnerable
areas, detection schemes would be expected to achieve better
performance.
3.4. Hybrid detection
3.4.1. Detection with prevention technique
Only one hybrid detection framework (Su et al., 2005) truly
calls for collaboration between an energy-saving
detection technique and an authentication-based prevention
technique. In the detection scheme, the cluster head is
responsible for monitoring its common sensor nodes; in turn, a
subset of the common sensor nodes is selected according to
residual energy to monitor the cluster head.
A suite of secret keys is established during initialization:
the base station shares an individual secret key with each common
sensor node, each common sensor node shares a set of
pairwise secret keys with its neighbors, the common sensor nodes
within a cluster share a cluster secret key, and a group secret key
is shared among all sensor nodes in the network. Packets
transmitted through the network are categorized as control
messages and sensed data. When the base station, a cluster head, or
any intermediate node forwards a control message, a message
authentication code (MAC) is appended using the proper secret key.
The intermediate nodes forwarding this control message verify the
appended MAC and replace it with a new MAC. This verifying and
replacing of MACs continues until the control message arrives at its
destination. If a sender ($u$) sends a control message ($M$) to a
receiver ($v_i$) with the current time stamp $T_c$, a MAC is generated
with the proper secret key according to

$$u \rightarrow v_i : M,\ T_c,\ \mathrm{MAC}(K_{uv_i},\ M \,|\, T_c),$$

where $M | T_c$ is the concatenation of $M$ and $T_c$, and
$\mathrm{MAC}(K_{uv_i}, M | T_c)$ is the MAC generated from $M | T_c$ with
the secret key $K_{uv_i}$, which is shared between $u$ and $v_i$. When a
common sensor node ($v_i$) forwards sensed data ($D$) to the cluster
head ($u$), $u$ needs to verify $D$ to
prevent any fake or redundant messages sent by attackers.
Because D is usually large and sent periodically from vi to u,
generating MACs along the forwarding path is time-consuming
and impractical for a WSN. Consequently, an enhanced version of
the LEAP authentication scheme is put forward. The original LEAP
cannot identify compromised nodes, as all the common sensor nodes
within a cluster share only one cluster secret key. First, the
enhanced scheme uses pairwise secret keys instead of the cluster
secret key used by the original LEAP. Second, the enhanced
scheme employs a one-time key chain as session keys, which is fairly
efficient for authentication.
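
The hop-by-hop verifying and replacing of MACs on control messages
can be sketched in Python as follows. This is a minimal illustration
under assumptions made here, not the construction of Su et al.
(2005): HMAC-SHA256 stands in for the unspecified MAC algorithm, the
time stamp is encoded as a fixed 8-byte integer so the concatenation
is unambiguous, and all function names and keys are hypothetical.

```python
import hashlib
import hmac


def make_mac(key: bytes, message: bytes, timestamp: int) -> bytes:
    """MAC over the message concatenated with the time stamp (M | T_c)."""
    data = message + timestamp.to_bytes(8, "big")
    return hmac.new(key, data, hashlib.sha256).digest()


def verify_mac(key: bytes, message: bytes, timestamp: int, tag: bytes) -> bool:
    """Constant-time check that the tag matches the message and time stamp."""
    return hmac.compare_digest(make_mac(key, message, timestamp), tag)


def forward(message: bytes, timestamp: int, tag: bytes,
            key_in: bytes, key_out: bytes) -> bytes:
    """Intermediate node: verify the incoming MAC, then replace it.

    key_in is the key shared with the previous hop, key_out the key
    shared with the next hop (both hypothetical pairwise keys).
    """
    if not verify_mac(key_in, message, timestamp, tag):
        raise ValueError("MAC verification failed; message dropped")
    return make_mac(key_out, message, timestamp)


# Hypothetical keys for the hops u -> v1 -> v2.
k_u_v1 = b"k" * 16
k_v1_v2 = b"q" * 16
tag_1 = make_mac(k_u_v1, b"control", 1000)
tag_2 = forward(b"control", 1000, tag_1, k_u_v1, k_v1_v2)
```

Each hop recomputes one MAC per message, which helps explain why the
scheme reserves this treatment for control messages rather than for
large, periodic sensed data.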
Detection is implemented for three types of misbehavior:
packet dropping, packet duplicating, and packet
jamming. The detection scheme has two parts:
the cluster head monitors its common sensor nodes,
and the common sensor nodes monitor their
cluster head in turn. In particular, monitoring the cluster head
consists of arranging monitoring nodes, reacting to abnormal
cluster heads, determining the alarm threshold, and determining
the group size. Monitoring the common sensor nodes, by contrast,
simply localizes the suspicious node by its pairwise secret key if
an anomaly is found.
This scheme achieves both energy efficiency and strong security
by attending to many details: for example, linking detection
against internal attackers with prevention against external
attackers, using a one-time key chain, monitoring the cluster head
at minimized energy cost, and quickly localizing compromised
nodes with a secret key. However, sensor nodes cannot move
and new sensor nodes cannot be added once the pairwise keys have
been established. A dynamic key management and
distribution mechanism could probably overcome this flaw.
3.4.2. Research problems
Few schemes (Zhang et al., 2008) mention cooperating with
a prevention-based technique in hierarchical WSNs. Moreover,
the security foundation established with a prevention technique
serves only to enhance the security of the network; the functions
made available by the secret keys are not exploited. WSNs should
already be protected by a security foundation (Perrig et al., 2001).
A detection scheme would therefore likely be more efficient if it
could utilize the functions provided by this security foundation,
rather than using prevention and detection separately.
4. Anomaly detection based on flat WSNs
In flat WSNs, rule-based techniques and statistical techniques
are more likely to be used. Without a hierarchical architecture,
all nodes are equally capable of functioning and participating
in internal protocols. Consequently, detection schemes that
are lightweight and require less communication are preferable. In
this section, we survey some of the representative literature for
each technique category mentioned above.
A rule-based model is commonly developed in accordance
with assumptions, information, or experience known in advance.
As a result, it often focuses on specific security issues by
examining particular attributes of networking behavior. In flat
WSNs, statistical techniques are relatively simpler than those
for hierarchical WSNs, because of the nature of the architecture.
Because data mining and computational intelligence techniques
often depend on a central entity to cope with heavy organizational
tasks, a flat architecture is naturally ill-suited to them,
although they might be implemented with assistance such as the
installation of agents (Ho et al., 2009).
Besides, detection methods in flat WSNs are also diverse.
Minimizing energy consumption while retaining good performance
is always important, and this is discussed along with the
various detection methods used in the detection schemes
surveyed below.
4.1. Rule-based detection
4.1.1. Decentralized detection using rules
A decentralized rule-based scheme is proposed (Silva et al.,
2005), in which a rule union picked from a set of candidate rules
is applied to satisfy the specific demands of application scenarios.
Given a WSN composed of common nodes, monitor nodes,
intruder nodes, and a base station, each monitor node is in charge
of monitoring the neighbors within its radio range by turning on
promiscuous listening mode.
In particular, this scheme consists of data acquisition, rule
application, and intrusion detection. In the first phase, each
monitor node collects messages in promiscuous listening mode
and filters out the important information for subsequent analysis.
The applicable rules are selected according to requirements
during the second phase. In the intrusion detection phase,
each failure to match a rule increments a failure counter by one. An
alarm is raised once this counter exceeds a predefined threshold
within a round of detection.
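
The counting logic of the intrusion detection phase can be sketched
in a few lines of Python. The sketch is illustrative only: the message
fields, the two example rules, and the threshold are hypothetical and
are not taken from Silva et al. (2005).

```python
def detect_round(messages, rules, threshold):
    """Return True (raise an alarm) if rule failures exceed the threshold.

    messages:  information filtered from overheard traffic in one round
    rules:     callables returning True when a message satisfies the rule
    threshold: predefined number of tolerated failures per round
    """
    failures = 0
    for msg in messages:
        for rule in rules:
            if not rule(msg):
                failures += 1  # each failed rule adds one to the counter
    return failures > threshold


# Hypothetical rules, for illustration only.
def interval_rule(msg):
    """Messages must not arrive faster than once per second."""
    return msg["interval"] >= 1.0


def integrity_rule(msg):
    """The overheard payload must match what was originally sent."""
    return msg["payload_ok"]


overheard = [
    {"interval": 0.2, "payload_ok": True},   # fails interval_rule
    {"interval": 1.5, "payload_ok": False},  # fails integrity_rule
    {"interval": 0.1, "payload_ok": False},  # fails both rules
]
alarm = detect_round(overheard, [interval_rule, integrity_rule], threshold=3)
```

Resetting the counter at the start of every round, as this function
does implicitly, keeps isolated failures from accumulating into an
alarm across rounds.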
This scheme provides a good framework for rule-based detection.
However, it lacks a clear description of how monitor nodes are
determined, in particular how many and which sensor nodes
should be on duty to ensure that the entire
network is protected.
4.1.2. Detection using multi-hop ACK
Building upon a multi-hop acknowledgement mechanism, a
detection scheme is put forward to defend against selective
forwarding attacks (Yu and Xiao, 2006). Detection is active along
the path that forwards packets from the source node to the base
station, in which the base station, intermediate nodes, and source
node all take part.
A security foundation must first be established, including
(A) node initialization and deployment, and (B) OHC (one-way
hash chain) based one-to-many authentication. The secret key
server loads every sensor node with a unique secret key and a
symmetric bivariate polynomial f (u, v) in the initialization. The
unique secret key is shared between this node and the base
station, and can be used for encrypting messages and genera-
ting MACs (message authentication codes). In the deployme