The Pennsylvania State University
The Graduate School
SECURITY AND PRIVACY SUPPORT FOR
WIRELESS SENSOR NETWORKS
A Dissertation in
Computer Science and Engineering
by
Min Shao
© 2008 Min Shao
Submitted in Partial Fulfillment
of the Requirements
for the Degree of
Doctor of Philosophy
December 2008
The dissertation of Min Shao was reviewed and approved∗ by the following:
Guohong Cao
Professor of Computer Science and Engineering
Dissertation Advisor, Chair of Committee
Thomas F. La Porta
Distinguished Professor of Computer Science and Engineering
Sencun Zhu
Assistant Professor of Computer Science and Engineering
Bhuvan Urgaonkar
Assistant Professor of Computer Science and Engineering
Heng Xu
Assistant Professor of Information Science and Technology
Raj Acharya
Professor of Computer Science and Engineering
Department Head of Computer Science and Engineering
∗Signatures are on file in the Graduate School.
Abstract
Sensor networks have been envisioned to be very useful for a broad spectrum of emerging civil and military applications. However, sensor networks are also confronted with many security threats such as node compromise, routing disruption and false data injection, because they normally operate in unattended, harsh or hostile environments. Due to the unique characteristics of sensor networks, such as limited system resources and large-scale deployment, traditional security and privacy solutions cannot be applied to sensor networks.
The goal of this thesis is to provide solutions to deal with security and privacy attacks in sensor networks. First, we design and evaluate solutions for local, passive, external attackers. Traditionally, dummy messages are used to hide the event source. To reduce the message overhead, we propose a cross-layer solution utilizing beacons at the MAC layer. In this solution, the event information is first propagated several hops through a MAC-layer beacon. Then, it is propagated in the routing layer to the destination to avoid further beacon delay. Second, to defend against global, passive, external attackers, dummy messages are used. To reduce the event notification delay, we propose a FitProbRate scheme based on a statistically strong source anonymity model. Our analysis and simulation results show that this scheme, besides providing provable privacy, significantly reduces real event reporting latency compared to other schemes. To reduce the network traffic, we select some sensors as proxies that proactively filter dummy messages on their way to the base station. Since the problem of optimal proxy placement is NP-hard, we employ local search heuristics. We propose a Proxy-based Filtering Scheme and a Tree-based Filtering Scheme to accurately locate proxies. Simulation results show that our schemes not only quickly find nearly optimal proxy placement, but also significantly reduce message overhead and improve message delivery ratio. Finally, we study the internal attacker in data-centric sensor networks (DCS) and present pDCS, a privacy-enhanced DCS network which offers different levels of data privacy based on different cryptographic keys. In addition, we propose several query optimization techniques based on the Euclidean Steiner Tree and Keyed Bloom Filter to minimize the query overhead while providing a certain level of query privacy. Detailed analysis and simulations show that the Keyed Bloom Filter scheme can significantly reduce the message overhead with the same level of query delay while maintaining a very high level of query privacy.
Table of Contents
List of Figures
List of Tables
Acknowledgments

Chapter 1  Introduction
    1.1  Motivation
    1.2  Challenges
    1.3  Focus of This Thesis
        1.3.1  Preserving Source Location Privacy Under External, Local, Passive Attacks
        1.3.2  Preserving Source Location Privacy Under External, Global, Passive Attacks
        1.3.3  Securing DCS System Under Internal, Local, Active Attacks
    1.4  Outlines

Chapter 2  Related Work
    2.1  Key Management for Sensor Networks
        2.1.1  Asymmetric Cryptography
        2.1.2  Symmetric Cryptography
    2.2  Anonymous Communication
        2.2.1  Anonymous Communications in Wired Networks
        2.2.2  Anonymous Communications in Wireless Networks

Chapter 3  Preserve Source Location Privacy Under External, Local, Passive Attacks
    3.1  Introduction
    3.2  Background
        3.2.1  Beacons
        3.2.2  MAC Layer Encryption
    3.3  Problem Definition
        3.3.1  Network Model
        3.3.2  Attacker Model
        3.3.3  Design Goals
        3.3.4  Privacy Protection Simulation Model
    3.4  The Naive Solution
        3.4.1  Privacy Protection
        3.4.2  Privacy
        3.4.3  Latency
    3.5  A Cross-Layer Solution
        3.5.1  Privacy Protection
        3.5.2  Privacy
        3.5.3  Latency
    3.6  A Double Cross-Layer Solution
        3.6.1  Privacy Protection
        3.6.2  Privacy
        3.6.3  Latency
    3.7  Performance Evaluations
        3.7.1  The Impact of Source-Destination Distance
        3.7.2  The Impact of Beacon Interval
        3.7.3  The Impact of Base Station Location
        3.7.4  The Impact of the Attacker's Hearing Range
    3.8  Conclusion

Chapter 4  Preserve Source Location Privacy Under External, Global, Passive Attacks
    4.1  Introduction
    4.2  Towards Statistically Strong Source Anonymity
        4.2.1  System Model and Design Goal
            4.2.1.1  Network Model
            4.2.1.2  Adversary Model
            4.2.1.3  Design Goal
        4.2.2  Problem Definition
        4.2.3  The FitProbRate Scheme
            4.2.3.1  Policy for Dummy Traffic Generation
            4.2.3.2  Policy for Embedding Real Traffic
            4.2.3.3  A Running Example
        4.2.4  Performance Evaluations
            4.2.4.1  Comparison between FitProbRate and ConstRate
            4.2.4.2  Comparison between FitProbRate and ProbRate
        4.2.5  Security Analysis
            4.2.5.1  Security Property
            4.2.5.2  Robustness to Distribution Tests
            4.2.5.3  Robustness to Mean Test
        4.2.6  Conclusion
    4.3  Preserve Source Anonymity with Minimum Network Traffic
        4.3.1  System Model and Design Goals
            4.3.1.1  Network Model
            4.3.1.2  Attack Model
            4.3.1.3  Design Goals
        4.3.2  Proxy-based Filter (PFS) Scheme
            4.3.2.1  Scheme Overview
            4.3.2.2  Proxy Placement
            4.3.2.3  Proxy Operations
            4.3.2.4  Security Analysis
        4.3.3  Tree-based Filter Scheme (TFS)
            4.3.3.1  Hierarchical Proxy Placement
            4.3.3.2  Multi-level Buffering Delays
            4.3.3.3  Security Analysis
        4.3.4  Practical Considerations
            4.3.4.1  System Parameters
            4.3.4.2  Role Shifting among Proxy Nodes
            4.3.4.3  Insider Attacks
        4.3.5  Performance Evaluation
            4.3.5.1  Simulation Setup
            4.3.5.2  Simulation Results
            4.3.5.3  Prototype Implementation
        4.3.6  Conclusion

Chapter 5  Secure DCS System Under Internal, Local, Active Attacks
    5.1  Introduction
    5.2  Models and Design Goal
        5.2.1  Network Model
        5.2.2  Attack Model
        5.2.3  Security Assumption
        5.2.4  Design Goal
    5.3  pDCS: Privacy Enhanced Data-Centric Sensor Networks
        5.3.1  The Overview of pDCS
        5.3.2  Privacy Enhanced Data-Location Mapping
            5.3.2.1  Scheme I: Group-key-based Mapping
            5.3.2.2  Scheme II: Time-based Mapping
            5.3.2.3  Scheme III: Cell-based Mapping
            5.3.2.4  Comparison of Different Mapping Schemes
        5.3.3  Key Management
        5.3.4  Improving the Query Efficiency
            5.3.4.1  The Basic Scheme
            5.3.4.2  The Euclidean Steiner Tree (EST) Scheme
            5.3.4.3  The Keyed Bloom Filter Scheme
            5.3.4.4  Plane Partition
        5.3.5  MS Data Processing
    5.4  Performance Evaluations
        5.4.1  Choosing the Partition Method
        5.4.2  Performance Comparisons of Different Schemes
    5.5  Conclusions

Chapter 6  Conclusions and Future Work
    6.1  Summary
    6.2  Future Directions

Bibliography
List of Figures
1.1  MICA2
1.2  An application of sensor networks for animal monitoring.

3.1  802.15.4 Beacon Frame Format
3.2  Modified beacon frame format in the naive solution
3.3  Naive Solution
3.4  A Cross-Layer Solution
3.5  Modified beacon frame format in the cross-layer solution
3.6  Privacy Analysis
3.7  Hunter's trace with MAXHOP=6 in a 100*100 network when 66 event messages are sent. The number in the figure shows the hunter hop count.
3.8  A Double Cross-Layer Solution
3.9  Hunter's trace in a 100*100 network with MAXHOP=6 when 133 event messages are sent. The number in the figure shows the hunter hop count.
3.10  The Impact of Source-Destination Distance
3.11  The Impact of Beacon Interval
3.12  The Impact of Base Station Location when s-d distance is 30
3.13  The Impact of Attacker's Hearing Range

4.1  The illustration of the FitProbRate scheme. The triangular nodes are sending real event messages, with paths denoted by solid lines. The square nodes are sending dummy messages, with paths following dotted lines. The reference systems beside nodes indicate the PDF of the message transmission intervals.
4.2  A running example to illustrate the entire process.
4.3  Comparing average delay in the FitProbRate scheme (α = 0.05, ε = 0.1) with the ConstRate scheme.
4.4  The impact of window size and real event arrival pattern in the FitProbRate scheme.
4.5  Performance comparison between the FitProbRate scheme (α = 0.05, ε = 0.1) and the ProbRate scheme under different real traffic patterns. In (a)-(c), 1, 3, or 5 real event messages are generated in a burst. In (d)-(f) the solid lines are the time points when real events are ready and the dotted lines are the time points when real event messages are actually forwarded. (g)-(i) show the numerical values of real event transmission latency under three different real traffic patterns.
4.6  A tradeoff between α′ and β′ for the attacker (α = 0.05).
4.7  Illustration of PFS. Blank circles and filled circles represent sources and proxies, respectively; dashed lines and solid lines denote bogus messages and real messages, respectively.
4.8  The optimal number of proxies in PFS.
4.9  The optimal proxy placement.
4.10  The impact of network scale on the optimal proxy number in PFS.
4.11  State transitions of proxies (there are three states: waiting, bogus, real).
4.12  Delay and max queue length under λ_P^real = 1/60 per time unit.
4.13  Delay and max queue length under T_proxy = 5 time units.
4.14  Improvement of TFS over PFS.
4.15  Delay in TFS under different T_proxy and λ (tree level l = 2).
4.16  Performance under different bogus message generation rates (heavy-rate real events).
4.17  Performance under different bogus message generation rates (light-rate real events).

5.1  A DCS-based sensor network which can be used by zoologists (who are authorized to know the locations of all animals) and hunters (who should only know the locations of boars and deer, but not elephants).
5.2  The BEPL as a function of m and s, where m is the number of detection cells and s the number of compromised cells.
5.3  Overhead comparisons among different mapping schemes.
5.4  Message overhead distribution of different mapping schemes.
5.5  The mapping of the physical network into a logical key tree and the rekeying packet flows for revoking node u.
5.6  Three schemes for delivering a query to the storage cells.
5.7  A Bloom Filter with k hash functions.
5.8  17 storage cells are partitioned into three parts.
5.9  Performance comparisons between different partitioning schemes.
5.10  The message overhead of different schemes.
5.11  Comparisons among different schemes.
List of Tables
3.1  Average hop number to capture the source node when source and BS are 47 hops away.
4.1  Number of observations to draw a decision in SPRT when α′ changes (β′ = 0.05).
4.2  Number of observations to draw a decision in SPRT when β′ changes (α′ = 0.05).
Acknowledgments
I would like to thank all the people who have helped and inspired me during my doctoral study.

I especially would like to express my deep and sincere gratitude to my advisor, Dr. Guohong Cao. With his enthusiasm, his inspiration and his patience, he made research life fun for me. I am grateful for the numerous hours he has spent discussing our research problems and revising my writing. He was always accessible and willing to help his students with their research. Without his encouragement and his great efforts, I would have been lost.

It is difficult to overstate my appreciation to Dr. Sencun Zhu. His detailed and constructive comments have guided me from the beginning of my doctoral studies. Not only a great mentor, he has also been a friend in my professional development.

I wish to express my warm and sincere thanks to Dr. Thomas F. La Porta and Dr. Bhuvan Urgaonkar for their inspirational and insightful comments on my papers, which ensured I was on the right track towards finishing my dissertation. I also want to thank Dr. Heng Xu, whose thoughtful advice often gave me a sense of direction during my PhD studies.

My gratitude also goes to my co-author Yi Yang for her inspiring thoughts, insightful suggestions and hard work on the numerous research problems we have met. I would like to thank Wensheng Zhang, Hui Song, Jing Zhao, Changlei Liu and Yang Zhang for being wonderful group mates and making it a friendly place in which to work.

I wish to individually thank all of my friends who, from my childhood until graduate school, have joined me in the discovery of what life is about and how to make the best of it. However, because the list might be too long, and for fear of leaving someone out, I will simply say thank you very much to all of you. Some of you are Jeffrey Talada, Xin Yang, Hua Zhang and Yi Zhu, my best friends.

I cannot finish without saying how grateful I am for my family: grandparents, uncles, aunts and cousins have all given me a loving and supportive environment. Particular thanks to my grandpa, Fengying Chen, for his care and love throughout my childhood. Although he is no longer with me, he is forever remembered. I am sure he shares my joy and happiness. Lastly, and most importantly, I wish to thank my parents, Ronghua Chen and Jianhua Shao; this dissertation would simply be impossible without them. They have always supported and encouraged me to do my best in all matters of my life.
Dedication
To my parents and in loving memory of my grandpa.
xv
Chapter 1
Introduction
Since the first autonomous sensing and communication “motes” came out in 1998,
sensor nodes have been undergoing several improvements, becoming smaller and
cheaper. These tiny, low-cost devices can work together and self-organize into
multi-hop wireless sensor networks.
Wireless sensor networks may consist of hundreds or thousands of sensor nodes
and are able to collect and disseminate data in areas where ordinary networks cannot. As such, they are likely to be very useful for a broad spectrum of applications,
such as habitat monitoring, battlefield surveillance, and target tracking.
The most straightforward application of wireless sensor networks is to monitor
remote environments. For example, a natural habitat could be easily monitored for
fire hazards by sensors that automatically form a wireless network and immediately
report the detection of any abnormal temperature change. Unlike traditional wired
systems, sensor networks have low deployment cost. Instead of deploying long
wires around a habitat, only tiny devices with wireless communication capability
(Figure 1.1) are deployed at each sensing point. The network could be extended
by simply adding more devices.
1.1 Motivation
Recently there has been extensive research on developing new algorithms for data
aggregation [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], routing [11, 12, 13, 14, 15, 16, 17, 18, 19],
medium access control, power management, etc. Although these research results
Figure 1.1. MICA2
can make sensor networks more reliable and efficient, they cannot address the
security and privacy issues.
Sensor networks may consist of hundreds or thousands of sensor nodes, where
each node may be a potential point of security attack. For example, the attackers
can launch physical attacks on these sensor nodes, extract crypto secrets, inject malicious sensor nodes, or reprogram a sensor node. Since sensor nodes communicate with each other wirelessly, an attacker can gain access to private information by eavesdropping. The privacy problem is aggravated by the fact that sensor networks
provide remote surveillance: a large amount of information is easily available through remote access, so attackers do not even need to be physically present to conduct surveillance. Meanwhile, sensor networks in unattended, harsh or hostile environments are confronted with many other security threats such as denial-of-service attacks, traffic analysis, and routing disruption.
Although traditional security mechanisms such as encryption, authentication
and secure routing can address the above security threats, they cannot address the
following threats in sensor networks. Consider a sensor monitoring application (as
shown in Fig. 1.2). When a sensor detects an event, it sends a message including event-related information to the base station. With traditional security mechanisms, an attacker may not be able to determine the content of the message, but may still discover the traffic flow through simple traffic analysis. Furthermore, the attacker
may trace back to the location of the event source, and find sensitive information
such as whether, when and where an event of concern has happened. For example,
knowledge of the appearance of an endangered animal in a sensor field may enable
an attacker to take some action to capture the animal. Therefore it is important
to provide privacy besides security in sensor networks.
Figure 1.2. An application of sensor networks for animal monitoring.
1.2 Challenges
A core design challenge in wireless sensor networks is the strict resource constraints
on each individual device. Embedded processors with kilobytes of memory must
implement complex, distributed, ad-hoc networking protocols. Size reduction is
essential to cut the cost and create more applications. As the physical size de-
creases, so does the energy capacity. The underlying energy constraints end up
creating computational and storage limitations that lead to a new set of design
issues.
The memory and energy limitations of sensor nodes are major obstacles to implementing traditional security solutions. As a specific example, it is impractical to use asymmetric cryptosystems in a sensor network where each node has only a slow (7.8 MHz) processor and 4 KB of RAM (Mica2 [20]). The fact that wireless sensor networks rely on unreliable communication media in unattended environments makes the provision of adequate security countermeasures even more difficult.
We need to overcome these constraints and provide security and privacy support for sensor networks. However, security does not come for free. To come up with a proper defense, it is helpful to first lay out possible attack models.
1.3 Focus of This Thesis
According to the classification in [21], an adversary can be characterized along the following axes: internal vs. external, passive vs. active, and local vs. global.
• An internal adversary implies a compromised sensor node that is fully controlled by the adversary, while an external adversary does not control any sensor nodes and hence does not know sensitive information such as the cryptographic keys loaded in a node.
• A passive adversary only eavesdrops on the communication and collects overheard information; we assume he has unlimited memory to store the collected data. An active adversary may jam the transmission medium and inject, drop, or modify packets.
• A local adversary can observe and launch attacks only within a limited range, whereas a global adversary may eavesdrop over all the communication links and attack any part of the sensor network. In both cases, we assume there is no delay or communication range limitation for the adversary in collecting data from different locations.
These classifications span a wide spectrum of attack models with different levels of strength. Clearly, an external, passive, local attack presents the lowest threat, while an internal, active, global attack presents the greatest threat.
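As a concrete illustration, the three-axis classification can be encoded in a few lines. This is our own hypothetical sketch in Python; the class name and the rough threat ordering are illustrative and not part of the classification in [21]:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Adversary:
    """Attacker model along the three axes: internal/external,
    active/passive, global/local."""
    internal: bool   # controls a compromised node (knows its keys)?
    active: bool     # can inject/drop/modify, not just eavesdrop?
    is_global: bool  # can observe all communication links at once?

    def threat_level(self) -> int:
        # A rough ordering: each stronger trait raises the threat.
        return int(self.internal) + int(self.active) + int(self.is_global)

# The two extremes of the spectrum described above.
weakest = Adversary(internal=False, active=False, is_global=False)
strongest = Adversary(internal=True, active=True, is_global=True)
assert weakest.threat_level() < strongest.threat_level()
```

The eight combinations of these three booleans enumerate the attack models considered in the remaining chapters, from the weakest (external, passive, local) to the strongest (internal, active, global).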
This thesis concentrates on providing security and privacy support under different attack models. Specifically, we address the following research problems.
5
1.3.1 Preserving Source Location Privacy Under External,
Local, Passive Attacks
As discussed above, the external, local, passive attacker is the weakest attack
model. Popular solutions such as flooding or using dummy messages have been
proposed, but they all face the drawback of introducing a large amount of message
overhead. Although the researchers in [22] have made progress in protecting source
location privacy under this attack model, the privacy level becomes low when the
attacker has a larger hearing range than the sensor nodes or when it is randomly
located in a field.
To mitigate the message overhead and improve the privacy level, we utilize
beacons at the MAC layer. Beacons are sent out regularly, essentially forming a
constant-rate stream of dummy messages. Using beacons to replace dummy messages
may increase the delivery delay of event information because beacons are only sent
out at the predefined beacon interval, but this latency can be controlled. To do this,
we propose a cross-layer solution in which the event information is first propagated
several hops through a MAC-layer beacon. Then, it is propagated at the routing
layer to the destination to avoid further beacon delays. Simulation results show
that our cross-layer solutions can maintain low message overhead and high privacy,
while controlling delay.
1.3.2 Preserving Source Location Privacy Under External,
Global, Passive Attacks
Since a global attacker may eavesdrop over all the communication links and attack
any part of the sensor network, simple solutions like phantom routing in [22] will
not work. Network-wide dummy traffic is normally used to deal with this kind of
attack in the privacy community [23].
The basic idea is to let every node in the network send out dummy messages
with intervals following a certain distribution, e.g., constant or probabilistic. When
a node detects a real event, it transmits the real event messages with intervals fol-
lowing the same distribution. As such, the attacker cannot discern the occurrence
of a real event, and cannot find out the location of the real event source.
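The idea above can be sketched in a few lines of Python (function and parameter names are ours, not from the thesis): a node transmits once per exponentially distributed interval, and a pending real report simply replaces the next dummy payload, so the timing a global eavesdropper observes is unchanged.

```python
import random

def traffic_schedule(mean_interval, real_event_times, horizon, seed=0):
    """Send one message per exponentially distributed interval; a pending
    real event replaces the next dummy payload, so a global eavesdropper
    sees the same timing distribution either way (illustrative sketch)."""
    rng = random.Random(seed)
    pending = sorted(real_event_times)
    t, sends = 0.0, []
    while t < horizon:
        t += rng.expovariate(1.0 / mean_interval)
        if pending and pending[0] <= t:
            sends.append((t, "real"))     # real report rides a scheduled slot
            pending.pop(0)
        else:
            sends.append((t, "dummy"))    # padding traffic
    return sends
```

The same substitution applies to a constant-interval distribution; the difficulty, addressed later in this thesis, is keeping the post-event intervals statistically indistinguishable while still reporting promptly.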
To reduce the extra overhead caused by dummy messages, we can either use a
low message transmission rate or filter dummy messages on the way. In the first
case, the real event report latency will be high, because a source node needs to
postpone the transmission of a real event message to the next interval. To ad-
dress these issues, we propose a notion of statistically strong source anonymity for
the first time, under a challenging attack model where a global attacker is able to
monitor the traffic in the entire network. We propose a scheme called FitProbRate,
which realizes statistically strong source anonymity for sensor networks. Then, we
prove that this scheme can provide such a privacy guarantee, even if an adver-
sary conducts various statistical tests, trying to detect real events. Our analysis
and simulation results show that this scheme, besides providing provable privacy,
significantly reduces real event reporting latency compared to other schemes. In
the second case, we propose a Proxy-based Filtering Scheme (PFS) and a Tree-
based Filtering Scheme (TFS). In PFS, some sensors are selected as proxies to
collect and filter dummy messages from surrounding sensors. PFS greatly reduces
the communication cost of the system by dropping many dummy messages before
they reach the base station. In TFS, proxies are organized into a tree hierarchy.
Proxies closer to the base station filter traffic from proxies farther away, thus the
message overhead is further reduced.
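A minimal sketch of a proxy's per-round behavior in PFS (the names and the fixed-slot policy here are our own illustration, not the exact protocol): incoming dummies are dropped, real reports are queued, and the proxy's uplink rate stays fixed regardless of how many real events occurred.

```python
def proxy_round(inbox, buffer, is_dummy, slots):
    """One filtering round at a PFS proxy (sketch): drop incoming dummies,
    queue real reports, and transmit exactly `slots` messages upstream,
    padding with the proxy's own dummies when real traffic is scarce."""
    buffer.extend(m for m in inbox if not is_dummy(m))       # filter dummies
    out = [buffer.pop(0) for _ in range(min(slots, len(buffer)))]
    return out + ["dummy"] * (slots - len(out))              # constant rate
```

Because the proxy's output rate is independent of its input, an observer upstream of the proxy learns nothing about whether the surrounding sensors detected real events.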
1.3.3 Securing DCS System Under Internal, Local, Active
Attacks
In data-centric sensor networks (DCS), data are stored inside the sensor network,
and the data, rather than the sensor nodes, are named based on attributes
such as event type or geographic location. However, saving data inside a network
also creates security problems due to the lack of tamper-resistant sensor nodes and
the unattended nature of the sensor network. For example, an attacker may simply
locate and compromise the node storing the event of his interest.
To address these security problems, we present pDCS, a privacy enhanced DCS
system for unattended sensor networks. Specifically, pDCS provides the following
features. First, even if an attacker can compromise a sensor node and obtain all its
keys, he cannot decrypt the data stored in the compromised node. Second, after an
attacker has compromised a sensor node, he cannot know where this compromised
node stored its data detected in the previous time intervals. Third, pDCS includes
very efficient key management schemes for revoking a compromised node once
its compromise has been detected, thus preventing an attacker from knowing the
future storage location for particular events. Finally, pDCS provides a novel query
optimization scheme called Keyed Bloom Filter scheme to significantly reduce the
message overhead without losing any query privacy.
1.4 Outline
The remaining portion of the thesis is organized as follows. Chapter 2 reviews
previous work on different aspects of security and privacy issues in wired and
wireless networks. Chapter 3 presents our strategies to defend against a local,
external and passive attacker. Chapter 4 focuses on global attackers. Chapter
5 presents solutions against insider attacks in data-centric sensor networks. We
conclude the thesis and discuss future work in Chapter 6.
Chapter 2
Related Work
We introduce the related work in two categories: key management and anonymous
communication.
2.1 Key Management for Sensor Networks
Key establishment and management issues have been well studied in wired net-
works. Most key establishment protocols rely on public-key cryptography, such as
the Diffie-Hellman key exchange protocol. Most of the techniques used in wired networks,
however, are not suitable for low power devices such as wireless sensors. This is
because typical key exchange techniques use asymmetric cryptography, where it
is necessary to maintain two mathematically related keys, one of which is public
while the other is kept private. This allows data to be encrypted with the public
key and decrypted using the private key. The problem with asymmetric cryptography
in wireless sensor networks is that it is computationally expensive for
individual sensor nodes. This is true in the general case,
although some recent results [24, 25, 26, 27] show that it is still feasible to use
asymmetric cryptography in some cases.
Symmetric cryptography is usually the choice for applications that cannot af-
ford the high computational complexity of asymmetric cryptography. Symmetric
schemes use a single shared key known only to the two communicating parties.
This shared key is used for both encryption and decryption.
2.1.1 Asymmetric Cryptography
Two of the major techniques used to implement public-key cryptosystems are RSA
and elliptic curve cryptography (ECC) [28], both of which have traditionally been
considered too heavy for wireless sensor networks. However, several groups have
successfully implemented public-key cryptography on sensor platforms.
In [24], Gura et al. report that both RSA and elliptic curve cryptography are
possible for small devices without hardware acceleration. With 8-bit CPUs, ECC
shows a performance advantage over RSA. Another advantage is that ECC's 160-bit
keys result in shorter messages during transmission than the 1024-bit
RSA keys. In particular, Gura et al. demonstrate that ECC point multiplication
on small devices is comparable in performance to RSA public-key operations and
an order of magnitude faster than RSA private-key operations.
In [27], Watro et al. show that part of the RSA cryptosystem can be successfully
applied to actual wireless sensors. The TinyPK system described by [27] is designed
to allow authentication and key agreement between resource constrained sensors.
The protocol is used together with an existing symmetric encryption service for
mote networks, such as TinySec [29]. In particular, they implemented the RSA
public operations on the sensors and offloaded the RSA private operations to an
external party, such as a laptop.
In [26], Malan et al. demonstrate a working implementation of Diffie-Hellman
based on the Elliptic Curve Discrete Logarithm Problem. They showed that
public keys can be generated within 34 seconds, and that shared secrets can be
distributed among nodes in a sensor network within the same amount of time, using
just over 1 kilobyte of SRAM and 34 kilobytes of ROM. Thus, public-key
infrastructure is viable on the MICA2 for infrequent distribution of shared secrets.
2.1.2 Symmetric Cryptography
There are pairwise key establishment schemes using a trusted third party (BS) [30],
exploiting the initial trustworthiness of newly deployed sensors [31], and based on
the framework of probabilistic key predeployment [32, 33, 34, 35, 36, 37, 38].
Eschenauer and Gligor propose a key pre-distribution scheme [34] that relies
on probabilistic key sharing among nodes within the sensor network. Their system
works by distributing a key ring to each participating node in the sensor network
before deployment. Each key ring consists of a number of randomly chosen
keys from a larger key pool that is generated offline. An enhancement to this
technique utilizing multiple keys is described in [32]. Further enhancements are
proposed in [33, 35]. Using this technique, it is not necessary for each pair of
nodes to share a key. However, any two nodes that do share a key may use the
shared key to establish a direct link to each other. Eschenauer and Gligor show
that it is probabilistically likely that large sensor networks will have shared-key
connectivity. Further, they demonstrate that such a technique can be extended to
key revocation, re-keying, and the resiliency to sensor-node capture.
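The core of this scheme fits in a few lines of Python (pool and ring sizes below are illustrative values of ours): each node's ring is a random subset of one offline pool, and two nodes discover a shared key by comparing key identifiers rather than keys.

```python
import random

def make_key_rings(pool_size, ring_size, num_nodes, seed=1):
    """Draw each node's key ring at random from one offline key pool."""
    rng = random.Random(seed)
    pool = {kid: rng.getrandbits(64) for kid in range(pool_size)}
    return [dict(rng.sample(sorted(pool.items()), ring_size))
            for _ in range(num_nodes)]

def shared_key_ids(ring_a, ring_b):
    """Key discovery: nodes exchange key identifiers, not key material."""
    return set(ring_a) & set(ring_b)
```

For these illustrative parameters (a 1,000-key pool and 50-key rings), a direct calculation gives roughly a 0.93 probability that a random pair of nodes shares at least one key, which is the "probabilistically likely" connectivity the scheme relies on.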
The LEAP protocol described by Zhu et al. [31] takes an approach that utilizes
multiple keying mechanisms. Their observation is that no single security
requirement accurately suits all types of communication in a wireless sensor network.
Therefore, four different keys are used depending on whom the sensor node is
communicating with. Sensors are preloaded with an initial key from which further
keys can be established. As a security precaution, the initial key can be deleted
after its use in order to ensure that a compromised sensor cannot add additional
compromised nodes to the network.
In [39] geographical information is exploited to map a logical key tree [40] to
the physical tree structure so as to optimize the energy expenditure of a group
rekeying operation.
2.2 Anonymous Communication
Since Chaum's seminal work in 1981 [41], hundreds of papers [42] have
concentrated on building, analyzing, and attacking anonymous communication
systems. There is a very rich literature on anonymous communication in the
Internet [43, 44, 45, 46, 47]. However, most of them cannot be directly applied to
sensor networks due to reasons such as relying on public key techniques to achieve
sender or receiver anonymity [45], incurring high communication overhead [43, 44]
or posing special requirements on the infrastructure [47]. Research on anonymous
communications in sensor networks has received little attention so far. In
the following, we will discuss those most relevant ones in both wired networks and
wireless networks.
2.2.1 Anonymous Communications in Wired Networks
A mix [41] is a node that hides the correspondences between its input messages
and its output messages in a cryptographically strong way. To achieve this goal, a mix
changes the appearance of messages by encryption and padding. It also changes the flow of
messages by collecting multiple messages, reordering them, and then flushing them
in a batch, ensuring the timing of messages does not leak any linking information.
To tolerate mix compromises, Chaum proposed to chain mixes into a cascade where
all messages go through all the mixes in a specific order, or into a fully connected
network where a user can pick an arbitrary route for his messages. In addition, it
has been shown [48] that mix networks do not offer some properties that cascades
offer, and that they are susceptible to a number of attacks.
Many mix-based systems have since been designed, including ISDN mixes [49]
and web mixes [50], which are mix cascades, and Mixmaster [51], Tarzan [52] and
MorphMix [53] which are traditional mix networks or peer-to-peer mix networks.
Tarzan [52] is an anonymous network overlay. It collects outgoing Internet
Protocol packets for anonymization. These packets are sent into a peer-to-peer
network before being delivered to their specified destinations. When Tarzan is
installed and configured on a computer, the computer functions as a node in the
peer-to-peer network.
Onion routing [54, 55] is the equivalent of mix networks but in the context of
circuit-based routing. That is, all messages from a user follow the same path
created by his first message through the technique of message labeling, instead of
routing every message separately. All these mix systems intend to secure anonymous
communication even if an attacker controls some mixes, and they use asymmetric
cryptographic operations.
Crowds is an anonymous protocol that builds upon the ideas of Onion Routing.
It contains a group of users, known as jondos. All the jondos work together and
form a "crowd". Each jondo within the crowd acts as a proxy server and forwards
messages across the network until one jondo decides to pass a message on to
the receiver. One drawback of this design is that the network keeps
state with respect to the return-path of a message. This means that each jondo
needs to know who sent it in order to be able to send the message back to the
originator.
One solution to this problem is Hordes [56]. Hordes suggests that the network use
multicast to send the reply message back to the originator. While this does solve
the problem of being able to track which node sent out the message, it is not very
practical. Multicast is not a widely supported technology on the Internet, and it
would be very difficult to implement Hordes in today’s Internet environment.
Dummy messages [23] may also be introduced in a mix network to make it
more difficult for an attacker to mount passive and active attacks. Dummies are
normally generated by the mixes, although senders may also generate them. The
authors of [57] proposed to send pregenerated dummy messages when a user does
not have any message to send. To provide constant-rate traffic, the Pipenet
system [58] pads the links between mixes with dummy messages whenever the real
traffic is not enough to fill them. In our schemes, by contrast, both senders
and proxies may generate dummies, following a different policy.
2.2.2 Anonymous Communications in Wireless Networks
Research on anonymous communications in sensor networks has not received much
attention so far. Moreover, the existing schemes do not provide unobservability
and only work under a local adversary model.
In [59], techniques for hiding the base station (message destination) from an
external global adversary are studied. Two attack models against the base station
are used. One attack is to isolate the base station by blocking communications
between sensor nodes and the base station. Another attack is to determine the
location of the base station through traffic analysis. Two secure strategies are
proposed to defend against these attacks. First, to provide intrusion tolerance
against isolation of the base station, they introduce redundancy in the form of
multiple base stations. A one-way hash chain is used to set up multiple paths for
each sensor node to multiple base stations. To hide the traffic pattern, they propose
to randomly delay the sending time so as to obscure the parent-child relationship under
a given traffic model. Basically, every sensor node in the scheme is considered as
a mix and transmits at a constant rate, which obviously may be expensive for
sensor networks.
Ozturk et al. proposed privacy protection mechanisms in [60, 22] to defend
against an external adversary who attempts to trace back to the data source in
a sensor network where sensor nodes report sensing data to a fixed base station.
Based on flooding-based routing and single-path routing, Ozturk et al. developed
comparable schemes for related privacy problems in sensor networks. Exploiting
the fact that the messages between the single source and the base station
naturally pull the adversary toward the source, they proposed to use fake
sources, which inject fake messages into the network to distract the adversary.
Since this approach provides only limited protection, they went on to propose
phantom routing, which shares the same idea as the fake-source
schemes. Every message in phantom routing experiences two phases, the random
walk phase and a subsequent flooding/single-path routing phase. When the source
sends out a message, the message is unicast in a random walk for the first h_walk
hops. After that, the message is flooded using baseline flooding.
A more recent work [61] demonstrates an example attack against the flooding
method used in [60, 22] and proposes a new random walk algorithm. It is a
two-way random walk, i.e., from both source and sink, to reduce the chance of
releasing the location information. Both the source and sink initialize a certain
hop random walk. Once the source packet reaches an intersection of these two
paths, it is forwarded through the path created by the sink. Local broadcasting
is used to detect when the two paths intersect. Whenever a sensor forwards a
message, all its neighbors overhear this message and create a route entry for the
source pointing to the forwarding sensor. Eventually, a route is built along the
forwarding path. At each stage, the sensor picks one of its neighbors that has
not yet participated in the random walk, so as to cover an unvisited area and
avoid being backtracked. To do this, each sensor records the neighbors that have
participated in the forwarding in a Bloom filter and checks this Bloom
filter each time before it selects its next hop.
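This bookkeeping can be sketched as follows (an illustrative construction, not the exact one of [61]): visited neighbors are inserted into a small Bloom filter, and the next hop is chosen among neighbors the filter does not contain.

```python
import hashlib
import random

class BloomFilter:
    """Compact set membership with one-sided error (false positives only)."""
    def __init__(self, m=128, k=3):
        self.m, self.k, self.bits = m, k, 0
    def _positions(self, item):
        # derive k bit positions from salted SHA-256 digests
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item.encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.m
    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos
    def __contains__(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

def next_hop(neighbors, visited, rng=random):
    """Prefer a neighbor not recorded in the visited-set filter."""
    fresh = [n for n in neighbors if n not in visited]
    return rng.choice(fresh) if fresh else rng.choice(list(neighbors))
```

The filter never forgets a visited neighbor (no false negatives), at the cost of occasionally skipping an unvisited one due to a false positive; its fixed size is what makes it attractive for memory-constrained sensors.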
In [62], a path perturbation algorithm is proposed to increase source location
anonymity. It mainly focuses on applications that continuously collect location
information, such as cellular communication. In this case, simply removing the
user identifiers cannot solve the problem: an adversary can use multi-target
tracking to recreate each trajectory even when no identity
information is available. A numerical algorithm for the path perturbation problem
over multiple user paths is proposed to confuse the attacker. This algorithm makes
use of crossing paths in areas where at least two users meet and modifies location
samples according to a nonlinear optimization solution to increase the chance of
confusing the attacker. Furthermore, they quantitatively define privacy using
metrics such as mean location privacy. Simulations
verify that the path perturbation algorithm improves privacy while minimizing the
perturbation of location samples.
Chapter 3
Preserving Source Location Privacy
Under External, Local, Passive
Attacks
3.1 Introduction
The external, local, passive attacker is the weakest attacker. Researchers in [22]
have made progress in defending against this kind of attacker, but the privacy level
of their scheme becomes low when the attacker has a larger hearing range than the
sensor nodes or when it is randomly located in a field.
In this chapter, we aim to provide a better source location privacy for sensor
networks. We consider applications, such as wildlife and environmental monitoring,
that may demand privacy but can tolerate latencies on the order of a few seconds.
For these applications, network-wide dummy messages can achieve a high privacy
level, but at the cost of high network traffic and low delivery ratio. To improve
the network performance, we avoid the network-wide dummy messages by utilizing
beacons at the MAC layer. Beacons are sent out regularly, essentially forming a
constant-rate stream of dummy messages. By adding event information to the beacon
frame, we can deliver the information to the base station (BS) without adding
extra traffic. However, using beacons to replace the dummy messages may increase
the delay because beacons have to be sent out at the predefined beacon interval.
The beacon interval is usually long, to allow sensors to sleep and to reduce traffic.
Therefore, it is a challenge to control the delivery delay without changing the
beacon interval.
We propose two cross-layer solutions in which the event information is first
propagated several hops through the MAC-layer beacon. In the first solution,
after information is propagated via beacon broadcasts to a random node, it is
routed directly to the BS. In the second, we provide a second level of indirection:
after the first round of beacon broadcasting, the event information is routed to a
random node instead of the BS, from where it undergoes a second round of beacon
broadcasting. After this, it is routed to the BS.
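The first solution can be sketched as follows (the topology helpers `neighbors` and `route_to_bs` and all names are illustrative assumptions of ours): the event rides beacons for a few hops of broadcast, after which a randomly chosen node in the reached set hands it to the routing layer.

```python
import random

def cross_layer_deliver(source, neighbors, route_to_bs, beacon_hops=3,
                        rng=random):
    """Phase 1: piggyback the event on MAC beacons for `beacon_hops` hops
    of broadcast; phase 2: a random reached node routes it to the BS."""
    frontier = {source}
    for _ in range(beacon_hops):                  # beacon broadcasting phase
        frontier = {m for n in frontier for m in neighbors(n)}
    handoff = rng.choice(sorted(frontier))        # random handoff node
    return [handoff] + route_to_bs(handoff)       # routing-layer phase
```

The random handoff is what decouples the observable routing-layer path from the true event location; the beacon phase adds no extra transmissions because the beacons are sent anyway.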
Compared to existing work [22] which provides privacy against attackers sitting
near the BS, our solution can provide much higher privacy. Further, our solution
can provide much better privacy against attackers located randomly, perhaps even
close to events, within the sensor field. Simulation results demonstrate that our
cross-layer solutions can maintain low message overhead and high privacy, while
controlling the delay.
The rest of the chapter is organized as follows. We first introduce some back-
ground knowledge in Section 3.2. Then, we present the assumptions and design
goals in Section 3.3. After that, Sections 3.4, 3.5, and 3.6 present three privacy protection
schemes. The performance is evaluated in Section 3.7. Finally, we conclude this
chapter in Section 3.8.
3.2 Background
In this section, we present some background information on IEEE 802.15.4 which
is used in wireless sensor networks.
3.2.1 Beacons
In beacon-enabled IEEE 802.15.4, beacons are sent out periodically to announce
node presence and certain system parameters as shown in Figure 3.1. The MAC
payload contains the superframe specification, the pending address specification,
address list, and beacon payload fields. The MAC payload is prefixed with a MAC
header and appended with a MAC footer. The MAC header contains the MAC
frame control fields, beacon sequence number and addressing information fields.
The MAC footer contains a 16 bit frame check sequence. The MAC header, MAC
payload, and MAC footer together form the MAC beacon frame. The beacon
payload field is an optional sequence of up to MaxBeaconPayloadLength octets
specified to be transmitted in the beacon frame by a higher layer.
Figure 3.1. 802.15.4 Beacon Frame Format. The MAC header consists of the frame control field, the beacon sequence number, and source address information; the MAC payload consists of the superframe specification (beacon order, superframe order, final CAP slot, battery life extension, PAN coordinator, and association permit fields), GTS fields, pending address fields, and the beacon payload; the 2-octet MAC footer carries the frame check sequence.
Nodes sleep between beacons to extend their battery life. The beacon interval
ranges from 15.36 milliseconds to 786.432 seconds, as defined in IEEE 802.15.4.
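These endpoints follow from the standard's formula BI = aBaseSuperframeDuration × 2^BO symbols, with aBaseSuperframeDuration = 960 symbols and beacon order 0 ≤ BO ≤ 14; the shortest interval assumes the 16 µs symbols of the 2.4 GHz PHY and the longest the 50 µs symbols of the 868 MHz BPSK PHY. A quick sanity check (our own, not from the thesis):

```python
def beacon_interval_s(symbol_time_s, beacon_order):
    """Beacon interval BI = 960 * 2**BO symbols, per IEEE 802.15.4."""
    assert 0 <= beacon_order <= 14
    return 960 * symbol_time_s * 2 ** beacon_order

shortest = beacon_interval_s(16e-6, 0)    # 2.4 GHz PHY, BO = 0: 15.36 ms
longest = beacon_interval_s(50e-6, 14)    # 868 MHz BPSK PHY, BO = 14: 786.432 s
```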
3.2.2 MAC Layer Encryption
IEEE 802.15.4 provides link layer security which includes four basic security ser-
vices: access control, frame integrity, data encryption, and sequential freshness.
Data encryption uses a stream cipher to protect data from being read by parties
without the cryptographic key.
The symmetric encryption algorithm of 802.15.4 uses a key K and an initial vector
(IV) as the seed and stretches it into a large pseudo-random keystream G_K(IV).
The keystream is then XORed against the plaintext: C = (IV, G_K(IV) ⊕ P). The
security mechanisms in 802.15.4 are symmetric-key based using keys provided by
the upper layer. Given the rich literature in key management, we do not address
key management issues in this chapter. We assume two nodes can establish a
pairwise key based on existing solutions [33, 35, 31].
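The keystream construction can be mimicked with standard-library primitives; the real 802.15.4 cipher suites are AES-based, so the SHA-256 counter-mode generator below is purely illustrative.

```python
import hashlib

def keystream(key, iv, n):
    """Illustrative PRG G_K(IV): expand (key, IV) into n pseudo-random
    bytes with SHA-256 in counter mode (802.15.4 itself uses AES)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def encrypt(key, iv, plaintext):
    """C = (IV, G_K(IV) xor P): the IV travels with the ciphertext."""
    ks = keystream(key, iv, len(plaintext))
    return iv, bytes(p ^ k for p, k in zip(plaintext, ks))

def decrypt(key, ciphertext):
    iv, body = ciphertext
    ks = keystream(key, iv, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))
```

Because the cipher is a pure XOR stream, reusing an IV under the same key leaks the XOR of the two plaintexts, which is why stream-cipher modes require fresh nonces.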
3.3 Problem Definition
In this section we begin by introducing the network and attacker model, and then
discuss the simulation framework.
3.3.1 Network Model
As in other sensor networks [63], we assume that a sensor network is divided into
cells (or grids) where each pair of nodes in neighboring cells can communicate
directly with each other. A cell is the minimum unit for detecting events; for
example, a cell head coordinates all the actions inside a cell. For the network to
be connected, we assume the nodes in a cell rotate the role of cell head and at
least one node in the cell is awake. Each cell has a unique id and every sensor node
knows in which cell it is located through its GPS or an attack-resilient localization
scheme [64, 65].
We assume that a base station (BS) works as the network controller to collect
the event data. The BS is interested in the source of the event. Every event has
an event id; for example, we may assign a unique id to each type of animal. When
a cell detects an event, it will send a triplet (cell id, event id, timestamp), which
provides the BS with the source location of the event as well as the time it was
detected. We assume there is only a single event source, which is stationary.
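As an illustration of this naming (the grid size and row-major layout are assumptions of ours), a node can derive its cell id from its coordinates and build the report triplet directly:

```python
def cell_id(x, y, cell_size, cells_per_row):
    """Map node coordinates to the id of the grid cell containing them."""
    return int(y // cell_size) * cells_per_row + int(x // cell_size)

def event_report(x, y, event_id, timestamp, cell_size=10, cells_per_row=10):
    """The (cell id, event id, timestamp) triplet sent to the BS."""
    return (cell_id(x, y, cell_size, cells_per_row), event_id, timestamp)
```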
3.3.2 Attacker Model
According to the classification in [21], we assume that the attacker is external, local
and passive. By external, we assume that the attacker does not compromise or
control any sensors. The attacker may launch active attacks by channel jamming
or other denial-of-service attacks. However, since these attacks are not related to
source anonymity, we do not address them in this chapter.
A local attacker can only observe and launch attacks in a limited range. Sup-
pose a sensor network is used to monitor the appearance of pandas. Once a panda
appears in some place, the sensors in that place will send a message to the BS.
A hunter acts as the attacker, trying to trace back to the event source to
locate the panda. Similar to [22], we assume that the attacker starts from the
BS, where it is guaranteed that all packets will arrive eventually. The hunter
(attacker) is constantly listening/receiving. Once the attacker hears the first message,
it knows which node in its neighborhood sent that message, and will move
to the transmitting node. If the attacker does not hear any message for a certain
time, it goes back one step and keeps listening. The attacker repeats this process
until it reaches the source.
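The hunter's strategy can be summarized in a few lines (the `overheard` callback models what the attacker hears within its timeout at its current position; all names are ours):

```python
def trace_back(start, source, overheard, max_steps=1000):
    """Move to whichever neighbor is heard transmitting; on silence,
    retreat one step; stop on reaching the source or exhausting steps."""
    path = [start]
    for _ in range(max_steps):
        if path[-1] == source:
            return path                      # event source located
        sender = overheard(path[-1])
        if sender is not None:
            path.append(sender)              # follow the transmission
        elif len(path) > 1:
            path.pop()                       # heard nothing: back up
    return path
```

This is exactly why unprotected single-path routing fails: each overheard transmission deterministically advances the hunter one hop toward the source.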
We also consider attackers that are located close to the event or in the sensor
field between the event and the BS. These cases are important to analyze as well,
because attackers may be present in the sensor field, although this is not considered in [22].
We assume that the attacker has sufficient resources (e.g., in storage, computa-
tion and communication). The attacker can launch a rate monitoring attack and
a time correlation attack. In a rate monitoring attack, the attacker pays more
attention to the nodes with different (especially higher) transmission rates. In a
time correlation attack, the attacker may observe the correlation in transmission
time between a node and its neighbor, attempting to deduce a forwarding path.
3.3.3 Design Goals
Encryption and authentication can be used to prevent content-based analysis.
However, these techniques cannot prevent the attack model we described above. To
defend against this traffic analysis attack, we hide the event related traffic. There
are different ways to hide this traffic. For example, we can initiate network-wide
dummy traffic; this provides the best privacy but the worst performance (high
collision rate, low delivery ratio, high network traffic, low network lifetime, etc.).
On the other hand, if we reduce the amount of dummy traffic, the performance
improves but at the cost of privacy.
In this chapter, we try to achieve source anonymity, but also consider the
traffic overhead. It is always desirable for the network to have low traffic overhead;
however, the privacy level will be low if the source node sends the event notification
directly to the BS. For example, the attacker can determine the forwarding path
and trace it back to the source. Decreasing traffic overhead without losing privacy
is a challenge.
3.3.4 Privacy Protection Simulation Model
We built a simulator based on CSIM to study the privacy level of different solutions.
The network has 10,000 nodes and the BS is located at the center of the network.
An event is created at a random location and stays there until it is captured or
a certain amount of time expires. Once the attacker gets close to the event (e.g.,
within a certain number of hops of the panda), the event is considered captured. As soon as
the event appears at a location, the closest sensor node, which becomes the source,
will start sending packets to the BS reporting its observations.
The source generates a new packet every 50 clock ticks until the simulation
ends, which occurs either when the attacker catches the event source or when the
attacker cannot catch the event source within a certain amount of time. In the
trace back simulations, the attacker starts from the BS and moves toward the
transmitting node. If the attacker fails to hear any traffic within 200 clock ticks,
it goes back one step.
3.4 The Naive Solution
Observation: Beacons act as a “heart-beat” in a beacon-supported network even when no events are detected. Therefore, if we add event information to the MAC layer beacons, the event information can be spread to the BS without incurring extra routing layer traffic.
3.4.1 Privacy Protection
| Node ID (2) | Event ID (2) | Timestamp (8) |

Figure 3.2. Modified beacon frame format in the naive solution (field sizes in bytes)
Based on the above observation, we propose a naive privacy protection scheme.
After a source node detects a certain event, instead of passing the event information
to the routing layer, the node encodes its node id, event id and timestamp in a
beacon frame, constructing a modified beacon frame as shown in Figure 3.2. The
source node sends out the modified beacon frame to its neighbors in MAC layer
encryption mode. After a neighbor node receives the modified beacon frame, it
decrypts it to extract the event information and adds the event information into
its own beacon frame, which will be sent out at the next beacon interval. Every
node in the sensor network repeats this process and the event information will
eventually arrive at the BS.
Figure 3.3. Naive Solution (the event information spreads one hop per beacon interval, 1st through 4th, toward the base station)
Figure 3.3 shows the process of the naive solution. The solid square node
detects an event and broadcasts this event to its neighbors. This process repeats
and it eventually reaches the BS. In this small network, 3 hops are needed for the
event to reach the BS.
Beacon frames are flooded to all neighbor nodes. Therefore, in order to stop the event information from circulating in the network endlessly, every node maintains a record of the event information it receives from beacon frames. Each time a node receives a beacon frame, it checks whether it already has the event information. If so, it refills the beacon frame with dummy information. If not, it saves the event information and sends it out through its own beacon. Old event information entries are removed after a certain number of beacon intervals to save memory. For example, if the maximum hop count from any node to the BS is 10, old entries can be safely removed after 10 beacon intervals.
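The per-node beacon logic of the naive scheme, including the duplicate record and its expiry, can be sketched as follows (a minimal sketch; the class name, the dummy filler value and the `max_age` parameter are illustrative assumptions):

```python
class BeaconNode:
    """One node's beacon handling in the naive scheme: forward unseen
    event information once, substitute dummy data for duplicates, and
    expire old entries after max_age beacon intervals."""
    def __init__(self, max_age=10):
        self.seen = {}          # event info -> beacon interval first heard
        self.max_age = max_age  # e.g., the maximum hop count to the BS
        self.interval = 0

    def on_beacon(self, event_info):
        """Decide what this node puts in its own next beacon."""
        self.interval += 1
        # purge entries older than max_age beacon intervals
        self.seen = {e: t for e, t in self.seen.items()
                     if self.interval - t < self.max_age}
        if event_info in self.seen:
            return "DUMMY"      # already forwarded: refill with dummy data
        self.seen[event_info] = self.interval
        return event_info       # forward the event in the next beacon
```

A previously seen event is relayed again only after its entry has expired, which is safe once the information has had time to reach the BS.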
3.4.2 Privacy
The naive solution is simple and achieves perfect privacy, because all the modified
beacon frames perform exactly the same as the regular beacons. From the view
of the attacker, every node sends beacons as defined in the protocol. Therefore,
the probability of identifying the source node is 1/N, where N is the total number of
sensor nodes in the network.
3.4.3 Latency
The event information is only sent out at beacon intervals. The expected waiting time at every node is tb/2, where tb is the beacon interval. Thus, the latency for an event is T = tw + (tb/2 + (sb + se)/R) × h, where tw is the time from when an event happens to the end of the current beacon interval (tw < tb), sb is the size of a regular beacon frame, se is the size of the extra event information, R is the transmission rate and h is the number of hops from the source to the BS. Because tb/2 ≫ (sb + se)/R and tw is expected to be tb/2, T ≈ (tb/2) × (h + 1).
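Plugging representative numbers into this expression shows that the per-hop transmission time is negligible next to the beacon interval (a sketch; the regular beacon size sb = 32 bytes is an assumption, while the 40 Kb/s rate and 12-byte event field match the simulation setup used later in this chapter):

```python
def naive_latency(tb, h, sb=32, se=12, rate=5000):
    """Expected latency T = tb/2 + (tb/2 + (sb + se)/rate) * h, taking
    tw ~ tb/2. tb in seconds, h in hops, sizes in bytes, rate in
    bytes/s (40 Kb/s = 5000 bytes/s)."""
    per_hop = tb / 2 + (sb + se) / rate
    return tb / 2 + per_hop * h

def approx_latency(tb, h):
    """The approximation T ~ (tb/2) * (h + 1)."""
    return (tb / 2) * (h + 1)
```

With tb = 1 s and h = 30, the exact and approximate forms differ by well under one beacon interval, which justifies dropping the transmission-time term.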
The latency is decided by tb and h. As introduced in Section 3.2.1, the beacon interval (tb) could be very large, so the latency may be too long. Generally, this naive solution only works in a small-scale network.
In this naive solution, it is impossible for the attacker to find the source location
or the occurrence of an event, because the network performs the same before and
after an event occurs. However, the latency is long and uncontrollable. In the
following sections, we propose several solutions to reduce the latency.
3.5 A Cross-Layer Solution
Observation: Compared to sending event notification within beacon frames at
beacon intervals, sending packets through the routing layer is much faster, but at
the cost of lower privacy. Therefore, we want to combine routing and MAC layer
solutions.
3.5.1 Privacy Protection
The cross-layer solution has two phases: MAC broadcast and routing. In the first
phase (MAC broadcast), nodes perform in the same way as the naive solution.
After a sensor node detects some event, it broadcasts the event information within
MAC layer beacon frames for several hops (a system parameter H). Then, it
switches to the second phase (routing). One node, referred to as the pivot
node, passes the event information to the routing layer and sends it to the BS via
routing.
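The per-node decision rule of the two phases can be sketched as follows (an illustrative sketch; the function and return-value names are ours, not part of any standard API):

```python
def handle_event_beacon(node_id, pivot_id, hop_count, H):
    """Per-node decision in the cross-layer scheme: keep relaying inside
    beacons until the broadcast has traveled H hops, then only the
    designated pivot node hands the event to the routing layer."""
    if hop_count < H:
        return "relay_in_beacon"   # phase 1: MAC-layer beacon broadcast
    if node_id == pivot_id:
        return "send_via_routing"  # phase 2: pivot routes to the BS
    return "drop"                  # non-pivot nodes at hop H stop relaying
```

Every node runs the same rule, so only the hop count carried with the event and the pivot id in the beacon determine when the switch to routing happens.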
Figure 3.4. A Cross-Layer Solution (the event information spreads via beacons for four intervals, then one node routes it to the base station)
As shown in Figure 3.4, after the solid square node detects the event, it broadcasts the event information inside beacons for 4 hops (first phase). One node on the 4th hop is selected to send the event information to the BS directly using conventional routing (second phase).
In Figure 3.4, if the same pivot node were used for routing all event information, the attacker would be able to easily trace back to the pivot node by observing routing layer traffic. Therefore, a different pivot node is used for each event notification. This forms different traffic flows to the BS.
The source node is responsible for selecting the pivot node. The source node
knows which nodes are H hops away based on the cell information. It randomly
picks one of these nodes as the pivot node for each event occurrence and adds that
node id to the beacon frame as shown in Figure 3.5.
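A sketch of how the source might build the modified beacon of Figure 3.5 (the big-endian packing and exact byte layout are assumptions for illustration; only the field sizes follow the figure):

```python
import random
import struct

def build_beacon(node_id, event_id, timestamp, h_hop_nodes, rng=random):
    """Pack node id (2), event id (2), timestamp (8) and pivot id (2)
    into a 14-byte payload, choosing a fresh pivot uniformly among the
    nodes known (from cell information) to be H hops away."""
    pivot_id = rng.choice(sorted(h_hop_nodes))  # new pivot per event
    return struct.pack(">HHQH", node_id, event_id, timestamp, pivot_id)
```

Choosing the pivot uniformly and independently per event is what prevents the attacker from learning a stable routing-layer entry point.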
| Node ID (2) | Event ID (2) | Timestamp (8) | Pivot ID (2) |

Figure 3.5. Modified beacon frame format in the cross-layer solution (field sizes in bytes)
3.5.2 Privacy
Figure 3.6. Privacy Analysis (attacker locations A1–A6, pivot nodes p1 and p2, reception node r1, and the base station)
To analyze the privacy of the cross-layer system, consider Figure 3.6. The dark circle represents the sensor that detects an event. Ai represents a location at which an attacker may be placed. The circle around the event corresponds to the range over which the event information is carried in beacon broadcasts. We consider an attacker located within the range of the sensor detecting the event (A1), an attacker within the transmission range of the pivot node p1 (A2), and an attacker near the BS (A3).
We define F as the area of the sensor field, N as the number of nodes in the
network, D as the degree of a node (the average number of neighbors each node
has), r as the transmission range of a node, and H as the number of hops over
which the MAC layer broadcast of the event information propagates.
Consider attacker A1. This attacker only hears periodic beacons transmitted
from within its reception range, r. These beacons provide no information as to the
location of an event; in fact, they provide no information as to whether an event
has occurred at all. Therefore, this attacker, if it knows an event has occurred,
either must search the entire sensor field area, F , to find the event, or guess from
the pool of all sensors in the network, N , which sensor generated the event. In
other words, against A1, the solution provides perfect privacy. In fact, this is true
for all attackers within the beacon propagation range of the event, as long as they
are not within range of the pivot node or on the path from the pivot node to the
BS.
Consider attacker A2. This attacker can overhear that p1 has not received a
routing layer packet, but that it does perform a routing layer transmission. Because
this transmission is not a beacon, the attacker can conclude an event has occurred
within H hops of this pivot node. The attacker cannot trace back to the source
because between the pivot node and the source the event information is hidden
within the beacon messages which provide no information as described above.
Thus, in this case, the attacker can search the area of the circle, π(Hr)², or it can guess which node within the circle initiated the event. Due to the broadcasting, the number of such nodes is DH. Note that because the pivot node changes with each event, A2 will not be within range of the chosen pivot node for all events, so pivot node discovery itself does not have a high likelihood.
Now consider attacker A3. This attacker can overhear the transmission to the BS. Because this transmission is not a beacon, the attacker can conclude it is the result of an event. In this case the attacker must trace back to find the source of this message. This is difficult because the pivot node p1 changes with each transmission, causing the path of the flow to the BS to change. Therefore the attacker may take many search iterations to find this pivot node, if it finds it at all. Once the pivot node is found, the ability to find the event is the same as for attacker A2. In fact, any attacker placed between the pivot node and the BS will follow the same process for traceback as described below. The difference is that an attacker located at the BS will always hear an event transmission to the BS, whereas an attacker located on the path to the BS may miss many event transmissions because the selection of the pivot node changes.
Here we describe the process by which the attacker may trace back to the pivot node if it is in the position of A3 (or along the path from the pivot node to the
BS). Assume the attacker's listening range is the same as that of other sensor nodes. In this cross-layer solution, the pivot node changes on every event. Thus, the routing
layer traffic arriving at the BS originates from different places and follows different
routes. If the attacker starts from the BS, it will move towards the transmitting
node when it hears a message. Because the next message will come from a different
pivot node, the attacker will likely not hear the subsequent message because the
traffic might follow another path. If the attacker hasn’t heard a message for a
period of time, it will go back one step and listen again. This may repeat many
times with the attacker repeatedly moving around the BS.
τ = attacker's hearing range / sensor node's hearing range

MAXHOP   τ=1   τ=2   τ=3   τ=4   τ=5
   4      -    204    65    32    23
   6      -     -    137    97    38
   8      -     -    389   219   139
  10      -     -    401   235   150

Table 3.1. Average hop number to capture the source node when the source and the BS are 47 hops away
It is possible for the attacker to pass by the event source before reaching the pivot node, which is hard to analyze. Thus, we use simulations to show the privacy level of our solution. In Table 3.1, MAXHOP specifies the delay tolerance of the network. When MAXHOP = 4, the event information is broadcast for 4 hops before entering the second phase. As shown in Table 3.1, when the
attacker’s hearing range is the same as a normal node, the attacker cannot locate
the pivot node. When the attacker’s hearing range increases, it takes less time to
locate the event source.
Figure 3.7 shows the detailed movement of an attacker in a 100*100 network1 when MAXHOP = 6 during the time that 66 event messages are generated by a source. The attacker starts from (50,50), where the BS is located, and the source stays at (3,3) (not shown in the figure). When the attacker's hearing range is two times that of a normal node, it fails to find the pivot node after 84 hops of trials; the closest point it can reach is (27,28). However, when the attacker's hearing range increases to 5 times that of a normal node, it captures a pivot node within 54 hops.
1We use similar parameters as those in [22]. Although a real sensor network may not be at such a large scale, it is used to demonstrate the privacy level of different solutions.
(a) 84 hops' trace, hunter fails with τ = 2    (b) 54 hops' trace, hunter succeeds with τ = 5

Figure 3.7. Hunter's trace with MAXHOP=6 in a 100*100 network when 66 event messages are sent. The numbers in the figure show the hunter's hop count.
3.5.3 Latency
The latency of this solution is fully decided by phase one, which completes when the event information reaches the pivot node. The distance to the pivot node can be selected based on the application's latency limit or other conditions defined by the source node. To achieve lower latency, the pivot node is placed closer to the source node, i.e., H is reduced, and thus the attacker has a smaller area to search when it tries to find the pivot node.
In the next section, we propose a second solution to improve the privacy, especially against nodes near the BS, while controlling latency.
3.6 A Double Cross-Layer Solution
3.6.1 Privacy Protection
The cross-layer solution provides a way to control the latency, but some privacy
is sacrificed because attackers near the BS have a possibility of tracing back to the
pivot node. To address this issue, we propose a double cross-layer solution which
still controls latency while improving privacy.
In the double cross-layer solution, the MAC broadcasting phase is divided into
two parts. Similar to the cross-layer solution, after the first MAC broadcast, a
pivot node is chosen. However, the pivot node does not send the event information
to the BS directly. Instead, it sends the event information to a randomly chosen
node in the network through the routing layer. Then, this random node will enter
the second MAC broadcasting mode. At the end of the second MAC broadcast, a
pivot node is chosen which routes the event information to the BS directly.
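The end-to-end delivery of the double cross-layer scheme can be summarized as follows (a sketch; the callables abstract away the MAC broadcast and the random node choice, and the names are illustrative):

```python
def double_cross_layer_path(source, pick_h_hop_node, pick_random_node):
    """Sketch of the double cross-layer delivery path:
    MAC broadcast H hops -> pivot p1 -> route to random node r1 ->
    second MAC broadcast H hops -> pivot p2 -> route to the BS."""
    p1 = pick_h_hop_node(source)       # end of the first MAC broadcast
    r1 = pick_random_node()            # p1 routes to a random node
    p2 = pick_h_hop_node(r1)           # end of the second MAC broadcast
    return [source, p1, r1, p2, "BS"]  # p2 routes directly to the BS
```

Because r1 can be anywhere in the network, the second routing flow into the BS carries no information about the neighborhood of the source.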
Figure 3.8. A Double Cross-Layer Solution (two MAC broadcast phases of two beacon intervals each, joined by a routing hop through a random node p)
As shown in Figure 3.8, after the solid square node detects the event, it broadcasts the event information for two hops (the first MAC broadcast). One node on the second hop is selected to send the event information to a random node (p) through routing. This node enters the second MAC broadcast mode and broadcasts the event information for another two hops. One node on the second hop relative to the new node is again selected, as the pivot node. This node sends the event information to the BS directly through routing. Compared with Figure 3.4, both solutions spend the same number of beacon intervals (4).
3.6.2 Privacy
To analyze the privacy of the double cross-layer solution, we add attackers A4–A6 and a second pivot node, p2, to Figure 3.6. The privacy for attacker A1 remains
the same. Pivot node p1 now transmits to a reception node r1. The choice of p1
changes randomly so the privacy with respect to A2 also remains the same. Now
consider attacker A4. This attacker can easily trace back to p1 if it happens to
overhear the reception and transmission of node r1, so it has the same ability to
find the event as does A2.
Attacker A5 hears information similar to A1. Therefore, it does not know an
event has even occurred. If it is told an event has occurred, it needs to guess among
all nodes within the radius of its circle, DH, to find the reception node r1, from
which it could trace back to p1 and narrow the event area. Because both the pivot
node p1 and r1 are chosen randomly and independently, this is equivalent to the
attacker searching the entire network of N nodes. In other words, no information
is gained by A5.
Attacker A6 will overhear the transmission from p2 and realize an event has occurred. However, it must trace back to r1, which can be any node in the network for each transmission, and from there trace back to p1. This makes its ability to discover the event the same as that of A5.
Attacker A3 will still hear the event arriving at the BS. In the double cross-layer solution, A3 can trace back to the second pivot node, p2. At this point, it is in the
same position as A6, so its ability to find the event is the same as guessing among
the N nodes in the network. The ability of A3 to find the pivot node is not helped
by an increase in reception range.
(a) 78 hops' trace, hunter fails with τ = 2    (b) 116 hops' trace, hunter fails with τ = 5

Figure 3.9. Hunter's trace in a 100*100 network with MAXHOP=6 when 133 event messages are sent. The numbers in the figure show the hunter's hop count.
Figure 3.9 shows the detailed movement of an attacker in the role of A3 as it tries to trace back to the source. In this example, MAXHOP = 6 during the time that
133 event messages are sent out. The attacker starts from (50,50) where the BS
is located and the source stays at (3,3). When the attacker’s hearing range is two
times that of a normal node, it fails to find the pivot node after 78 hops of trials.
From Figure 3.9(a), we can see that the attacker is checking a totally unrelated
area. When the attacker’s hearing range increases to 5 times that of a normal
node, it still falls into the wrong area.
3.6.3 Latency
Since the routing delay is much shorter than the beacon interval, the latency in this solution is fully decided by the MAC broadcasts. By carefully choosing MAXHOP, the latency of the double cross-layer solution can be made similar to that of the cross-layer solution.
This analysis shows that the double cross-layer solution provides much stronger privacy against nodes near the BS while still being able to control latency.
3.7 Performance Evaluations
In this section, we evaluate the performance and privacy of the three solutions proposed in this chapter (the naive solution, the cross-layer solution and the double cross-layer solution) and compare them with the phantom routing scheme proposed in [22].
In phantom routing, the source node that detects an event propagates the event
information using a random walk. After a certain number of hops (hwalk), a node
that receives the event information broadcasts it with a certain probability (Pfwd).
The information is then broadcast under the control of Pfwd by all receiving nodes
from this point on. Eventually, the BS will receive the information.
Performance Simulation Setup: In our simulation, each beacon message adds
an extra 12 or 14 bytes for event information depending on the solution. In the
cross-layer and the double cross-layer solutions, MAC flooding takes two hops. We
follow the parameters defined in IEEE 802.15.4, with a data rate of 40 Kb/s and a preamble lasting 800 µs for each packet; the network is assumed to be synchronized.
In the simulation, we set up a network with 100*100 cells which is similar to
[22]. The base station is located at the center of the network by default. An
alternate location (5, 5) is also evaluated for comparison. Source node location is
static during each simulation. The distance between the source and the BS ranges
from 10 to 40 and is 30 by default. The impact of different beacon intervals,
ranging from 0.1 to 1 second, is evaluated and the beacon interval is set as 1
second by default. The event detection interval is 1 second. The probabilistic flooding
rates (Pfwd) of phantom routing are 0.7 and 1.0. Each simulation runs for 100,000
seconds.
Two metrics are used to evaluate the performance of the proposed schemes:
latency and traffic overhead.
• Latency: The time for an event message to travel from the source to the BS.
• Traffic Overhead: The average number of bytes transmitted in each cell each
second.
Privacy Simulation Setup: For the privacy evaluation, we mainly compare the privacy of the double cross-layer solution and the phantom routing (Pfwd = 0.7) scheme. Capture likelihood and the attacker's hop count are both used to measure the privacy level. Following the simulation model defined in Section 3.3.4, we run the simulation for a total of 200,000 clock ticks, repeat each simulation 10,000 times starting with different random seeds, and record the total number of runs (ncapture) in which the attacker succeeds in capturing the event. The capture likelihood is defined as ncapture/10,000. Moreover, over all the successful captures, we define the attacker's hop count as the average number of hops an attacker needs to capture an event.
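The two metrics can be computed from the per-run outcomes as follows (a sketch; the outcome encoding, with None marking a failed capture, is an assumption of ours):

```python
def privacy_metrics(outcomes, runs=10000):
    """Capture likelihood and attacker's hop count over all runs.
    outcomes: hop count for each successful capture, None for a failure."""
    captures = [h for h in outcomes if h is not None]
    likelihood = len(captures) / runs          # ncapture / total runs
    avg_hops = sum(captures) / len(captures) if captures else None
    return likelihood, avg_hops
```

Note that the hop count is averaged only over successful captures, which is why it complements rather than duplicates the capture likelihood.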
As mentioned in the previous sections, there are other kinds of attacker models. For example, attackers can be located randomly within the sensor field, perhaps even close to events. In such cases, our cross-layer solutions should have much better privacy than phantom routing due to the use of beacons at the MAC layer. In other words, even if the attacker hears the communication from the pivot node, it still has a hard time locating the event source. Since phantom routing has much worse privacy than ours under this attacker model, we do not compare them further.
3.7.1 The Impact of Source-Destination Distance
(a) Latency (s) vs. hop number between the source and the BS; (b) Network Traffic (byte/second/cell) vs. hop number between the source and the BS; (c) Capture Likelihood for s-d distances of 2, 20 and 47 hops (τ = 1 to 5)

Figure 3.10. The Impact of Source-Destination Distance
In this section, we investigate the impact of source-destination distance (s-d
distance) on performance.
Figure 3.10(a) shows that the latency of the naive scheme increases linearly as the s-d distance increases. For every 10-hop increase in s-d distance, the latency of the naive solution increases by 5 s, which is caused by the beacon interval. Meanwhile, we notice that the cross-layer and double cross-layer solutions have only a slight delay increase compared to phantom routing, and their latency remains almost stable when the s-d distance changes. The latency of the cross-layer and the double cross-layer solutions consists of two parts, the MAC flooding latency and the routing latency. The first part relates to the beacon interval and the second part relates to the routing delay. Since the beacon interval is much greater than the routing delay, the latency of the two schemes is mainly decided by the MAC flooding latency, which is fixed.
Figure 3.10(b) shows that the network traffic of all the schemes is independent of the s-d distance. Network traffic consists of beacon traffic and routing traffic. Beacon traffic always exists in the network and is not affected by the s-d distance. Routing traffic is decided by the number of nodes involved in the routing process; in all of the schemes, random nodes are introduced, so this number is variable. Phantom routing incurs a high traffic overhead: the network traffic of phantom routing (0.7) is about 25 times higher than ours.
The capture likelihood is compared in Figure 3.10(c) under 2, 20 and 47 hops
of s-d distance. When the s-d distance increases, the capture likelihood decreases. This is because the attacker has to travel further to discover where the source is and may be diverted on the way. Under the same s-d distance, the capture likelihood is lower in the double cross-layer solution than in the phantom routing solution. Furthermore, when the attacker has the same hearing range as the nodes (τ = 1), the capture likelihood is low except in phantom routing with s-d = 2. When the hearing range increases, the capture likelihood of phantom routing immediately increases to 1, whereas the capture likelihood of the double cross-layer solution changes from 0.5962 to 0.9540 and finally to 1.0000 with s-d = 20. Obviously, the higher the capture likelihood is, the lower the privacy is. However, the capture likelihood may change when the simulation time changes, so it cannot be used to measure privacy by itself. We discuss this in more detail in Section 3.7.4.
3.7.2 The Impact of Beacon Interval
(a) Latency (s) vs. beacon interval (s); (b) Network Traffic (byte/second/cell) vs. beacon interval (s)

Figure 3.11. The Impact of Beacon Interval
In this section, we investigate the impact of beacon interval on performance.
In Figure 3.11(a), we can see that the latency of phantom routing is independent of the beacon interval because it is determined by the routing delay. However, our three solutions all show the same pattern: latency increases as the beacon interval increases, because these schemes use beacons to transfer data and their latency is mostly decided by the beacon interval.
The network traffic of phantom routing remains unchanged when the beacon interval changes, because it does not rely on beacons. However, our solutions depend on beacons to send out messages. When the beacon interval increases, fewer beacons are sent out, which causes the network traffic to decrease. As shown in Figure 3.11(b), the network traffic of phantom routing is twice as high as the traffic of all our solutions when the beacon interval is 0.1 s, which is due to the routing layer flooding in the phantom routing scheme. This difference grows further, to about 25 times, when the beacon interval increases to 1 s.
3.7.3 The Impact of Base Station Location
(a) Latency (s); (b) Network Traffic (byte/second/cell), each comparing a BS at (5,5) with a BS at the center of the network

Figure 3.12. The Impact of Base Station Location when s-d distance is 30
In this section, an alternative BS location is considered and compared.
From Figure 3.12, we can see that the BS location has only a slight influence
on the network performance of our solutions. As discussed above, the network
traffic does not change with the s-d distance. Similarly, no matter where the
BS is, the network traffic depends on the beacon traffic and the routing traffic,
and neither of them relies on the BS location. As for the latency, when the s-d
distance is fixed, the latency of the naive solution is fixed no matter where the
BS is, and the latency of the cross-layer and double cross-layer solutions mainly
depends on MAC-layer flooding, which is not related to the BS location.
3.7.4 The Impact of the Attacker’s Hearing Range
As discussed above, capture likelihood cannot be used to measure the privacy level
by itself; that is, the same capture likelihood does not imply the same privacy level.
[Figure: bar plots of the attacker’s hop count to capture the panda for the double cross-layer solution (MAXHOP = 4, 10) and phantom routing (0.7) (hwalk = 0, 40), under hearing ranges τ = 2, 3, 4, 5.]
(a) Source is 20 hops away from the BS (b) Source is 47 hops away from the BS
Figure 3.13. The Impact of Attacker’s Hearing Range
So, in this section, we measure how many hops it actually takes the attacker to
capture the event source.
In Figure 3.13(a), we see that when τ = 2, it takes the attacker more than 2000
hops to capture an event that is only 20 hops away under the double cross-layer
solution. This indicates that if the event ends before 2000 messages have been
sent, it will not be captured. When the hearing range increases, the attacker’s
hop count decreases, but it still takes more than 100 hops, which is five times
the s-d distance.
On the other hand, with phantom routing, the attacker needs only 23 hops
to capture an event when hwalk = 0 and τ = 5, which is not safe at all. When
phantom routing adopts a higher hwalk, the attacker takes longer to find the
event, but the time is still much shorter than that of the double cross-layer
solution. Similar observations can be made from Figure 3.13(b). These results
indicate that the double cross-layer solution provides much stronger privacy than
phantom routing.
3.8 Conclusion
In this chapter, we proposed several solutions for protecting source location privacy
in sensor networks. The proposed solutions offer different levels of location privacy,
network traffic and latency by exploiting cross-layer features. Simulation results
verified that the double cross-layer solution can reduce the traffic overhead with a
reasonable latency. More importantly, it achieves these benefits without losing
source location privacy (1/N).
Chapter 4
Preserving Source Location Privacy Under External, Global, Passive Attacks
4.1 Introduction
Most of the existing research on source location privacy in sensor networks focuses
on local attack models. In phantom routing [22], the attacker has limited coverage,
comparable to that of the sensors. Therefore, only a single source is under the
attacker’s consideration at a time, and the attacker tries to trace back to the
source in a hop-by-hop fashion. When the attacker becomes more powerful, e.g.,
has a hearing range more than three times that of the sensors, the capture
likelihood is as high as 97%. In addition, a large number of anonymity techniques
[42] designed for general networks are not appropriate for sensor networks, not
only because the privacy problem is different but also because these techniques
are too expensive to employ.
In this chapter, we aim to provide source anonymity for sensor networks un-
der a powerful attacker, i.e., a global observer who can monitor and analyze the
traffic over the whole network. Clearly, if all the traffic in the network is real
event messages, it is unlikely to achieve source anonymity under such a strong
attack model. Therefore, we employ network-wide dummy messages to achieve
global privacy. The basic idea is as follows. Every node in the network sends
out dummy messages with intervals following a certain kind of distribution, e.g.,
constant or probabilistic. When a node detects a real event, it transmits the real
event messages with intervals following the same distribution. As such, neither can
an attacker discern the occurrence of a real event, nor can he find out the location
of the real event source.
Although network-wide dummy messaging provides a high privacy level, it is
prohibitively expensive for sensor networks. The huge number of bogus messages
not only consumes the constrained energy of sensor nodes for transmissions, but
also leads to heavy channel collisions and consequently a low delivery ratio for real
event messages. To mitigate the disadvantages brought by this extremely heavy
traffic, we can either adopt a low message transmission rate, or filter some bogus
messages on their way to the base station.
Adopt a low message transmission rate: In this case, the real event report
latency could be high, because a source node needs to postpone the trans-
mission of a real event message to the next interval. Therefore, the questions
we try to answer are: How can we achieve global source anonymity without
causing high real event notification latency? Is it possible to provide perfect
global privacy without losing performance benefit?
Filter bogus messages on the way: In this case, we select some sensors as prox-
ies to collect and filter dummy messages from surrounding sensors. This can
greatly reduce the communication cost of the system by dropping many
dummy messages before they reach the base station. However, the message
overhead reduced by this method usually depends on the locations of the
proxies. So, the question is: what is the optimal proxy placement that reduces
the message overhead while maintaining the privacy level?
In the rest of the chapter, we will introduce these two solutions separately.
4.2 Towards Statistically Strong Source Anonymity
We make the following contributions in this part. First, we demonstrate that it
is difficult to achieve perfect global privacy without sacrificing performance benefit.
Therefore, we have to relax the perfect source anonymity requirement and
for the first time propose a notion of statistically strong source anonymity for sen-
sor networks. Second, we devise a realization scheme, called FitProbRate (Fitted
Probabilistic Rate) scheme, in which the event notification delay is significantly
reduced while keeping statistically strong source anonymity through selecting and
controlling the probabilistic distribution of message transmission intervals.
The rest of this section is organized as follows. We first discuss the assumptions
and design goal in Section 4.2.1. Then, we propose the notion of statistically
strong source anonymity in Section 4.2.2. After that, Section 4.2.3 presents the
FitProbRate scheme. Its performance is evaluated in Section 4.2.4 and its security
property is analyzed in Section 4.2.5. Finally, we conclude in Section 4.2.6.
4.2.1 System Model and Design Goal
4.2.1.1 Network Model
As in other sensor networks [63], our system assumes that the sensor network
is divided into cells (or grids), where each pair of nodes in neighboring cells can
communicate directly with each other. A cell is the minimum unit for detecting
events; for example, a cell head coordinates all the actions inside a cell. Each cell
has a unique id, and every sensor node knows which cell it is located in through
GPS or an attack-resilient localization scheme [64, 65]. Moreover, we assume
a base station (BS) is located in the network and works as the network controller
to collect event data. Every event has an event id; for example, we may assign
a unique id to each type of animal. When a cell detects an event, it sends
a triplet (cell id, event id, timestamp), which provides the BS with the source
location of the event as well as the time it was detected.
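As a concrete illustration, the (cell id, event id, timestamp) triplet could be serialized as a fixed-length record. The field widths and helper names below are our own assumptions for this sketch, not part of the system specification; a fixed record length also helps later against content-based analysis.

```python
import struct

# Hypothetical encoding of the (cell id, event id, timestamp) report:
# two 2-byte unsigned ids plus a 4-byte unsigned timestamp, packed in
# network byte order so every event packet has the same 8-byte length.
def make_event_report(cell_id: int, event_id: int, timestamp: int) -> bytes:
    return struct.pack("!HHI", cell_id, event_id, timestamp)

def parse_event_report(msg: bytes):
    return struct.unpack("!HHI", msg)
```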
4.2.1.2 Adversary Model
According to the classification in [21], we assume that the adversary is external,
global and passive. By external, we assume that the adversary does not compromise
or control any sensors. By global, we assume that the adversary can monitor,
eavesdrop and analyze all the communications in the network. The adversary
may launch active attacks by channel jamming or other denial-of-service attacks.
However, since these attacks are not related to source anonymity, we do not address
them in this work.
Next, we discuss how an adversary may analyze the collected traffic. First, the
attacker may simply examine the content of an event message that may contain
the source location id. Second, even if the message is encrypted, it is easy for
the global adversary to trace back to the source of the message if the encrypted
message remains the same during its forwarding, because the adversary is capable
of identifying the immediate source of a message. Third, he may perform more
advanced traffic analysis including rate monitoring and time correlation. In a rate
monitoring attack, the adversary pays more attention to the nodes with different
(especially higher) transmission rates. In a time correlation attack, he may observe
the correlation in transmission time between a node and its neighbor, attempting
to deduce a forwarding path. We assume that the attacker has sufficient resources
(e.g., in storage, computation and communication) to perform these advanced
attacks.
4.2.1.3 Design Goal
To provide source anonymity under such a strong attack model where the attacker
has the capability to monitor all the network traffic is challenging. To prevent
content-based analysis, we may simply use a globally shared key for encryption
and authentication of an event packet during its forwarding, and also make all
the event packets in the network of the same length. However, these techniques
cannot prevent rate monitoring and time correlation attacks. To defend against
these traffic analysis attacks, we notice that there is a tradeoff between security and
performance (more details can be found in Section 4.2.2). Hence, when we try to
achieve source anonymity, we also consider the latency issue. It is always desirable
for the BS to receive an event message as early as possible. On the other hand, it
is not appropriate that every node forwards an event message immediately upon
receiving it. For example, if all the nodes on a routing path relay an event message
immediately, the adversary will easily determine the forwarding path and trace
back to the source. To this end, forwarding delay must be introduced to thwart
this attack. Therefore, how to reduce latency without losing privacy guarantee is
a challenging problem here.
4.2.2 Problem Definition
According to [66], a mechanism to achieve anonymity, appropriately combined with
dummy traffic, yields unobservability, which is the state in which items of interest (IOIs)
are indistinguishable from any IOI of the same type. All the subjects under con-
sideration constitute an unobservability set. In our case, the unobservability set
consists of all the N cells in the network. Specifically, we are interested in event
unobservability, which is defined as follows.
Definition 1. Event unobservability is a privacy property that can be satisfied if
after observation an attacker cannot determine whether real events have happened
or not.
Straightforward solutions exist to provide event unobservability by means of
dummy traffic. For example, in a ConstRate scheme, all the cells in the network
send out messages at a constant rate regardless of whether there are real events or
not. Since the traffic in the network always keeps the same pattern, it effectively
defeats any traffic analysis technique. Clearly, the average transmission latency in
a source cell is half of the interval. Although this deterministic solution provides
perfect event unobservability, it has an inherent difficulty in setting the constant
rate: if the rate is too low, the message delay will be too high; if the rate is too
high, this approach will introduce too much dummy traffic into the network.
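The claim that the average ConstRate latency is half the interval can be checked with a short Monte-Carlo simulation. The function name and parameters below are ours, for illustration only: an event arrives at a uniformly random point within the current interval and waits until the next fixed transmission slot.

```python
import random

def constrate_latency(interval, n_events=100_000, seed=1):
    """Monte-Carlo check of the ConstRate latency claim: an event arriving
    at a uniformly random time within a transmission interval waits until
    the next fixed slot, so the mean wait is half the interval."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_events):
        arrival = rng.uniform(0.0, interval)  # event time within current slot
        total += interval - arrival           # wait until the slot boundary
    return total / n_events
```

For a 2 s interval, the simulated mean comes out close to 1 s, i.e., half the interval.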
This motivated us to design probabilistic solutions, which have the flexibility of
reducing the waiting time. For example, we may adopt an exponential distribution
to determine the time intervals between message transmissions, which we refer
to as the ProbRate scheme. Under our attack model, a global attacker can easily
learn the distribution and its mean by a statistical test over the collected time
intervals. As long as we keep the seed for generating random numbers secret from
the attacker, he will not be able to tell whether a message transmission is due
to a real event or a dummy message, even if a cell sends out a real event
message immediately. Intuitively, however, a cell cannot always send real event
messages immediately in the presence of burst events; otherwise, an attacker may
notice the change of the underlying distribution. Therefore, it is difficult to
guarantee perfect event unobservability while providing low latency.
Hence, the question becomes: how to reduce the latency as much as possible
while still providing a satisfactory degree of event unobservability? In order to
achieve low latency, we need to relax the perfect event unobservability requirement
and accept statistically strong event unobservability.
Let the inter-message delay (imd) between messages k (k > 0) and k + 1 from cell
i (1 ≤ i ≤ N) be imd_k^i = t_{k+1}^i − t_k^i, where t_k^i is the transmission time of message k
from cell i. A global observer can see a sequence of continuous inter-message delays,
which can be represented by a distribution X^i = {imd_1^i, imd_2^i, · · · }. Ideally, in a
scheme with perfect privacy, inter-message delays from all the cells follow the same
distribution. In our case with statistically strong guarantee, distributions of inter-
message delays are actually statistically indistinguishable from each other. Next,
we first introduce the definition of statistically indistinguishable distributions.
Definition 2. Two probabilistic distributions X^i and X^j (1 ≤ i, j ≤ N, i ≠ j)
are statistically indistinguishable from each other iff they statistically follow the
same type of probabilistic distribution with the same parameter (i.e., they have
the same distribution function). They are indistinguishable from each other in
the sense that a statistical test cannot differentiate them.
Take the exponential distribution as an example. This distribution has only one
parameter, λ (= 1/µ). Hence, if two probabilistic distributions are both exponential
with very close means, they are statistically indistinguishable from each other.
Note that if a distribution is controlled by multiple parameters (e.g., two in a
normal distribution), two data sets are statistically indistinguishable only when
all the parameters of the probabilistic distributions derived from the two data
sets are the same or very close. Clearly, the more parameters a distribution has,
the harder it is to prove statistical indistinguishability. As such, in the following
we limit our discussion to one-parameter distributions.
For the one-parameter distribution, the property of statistically strong event
unobservability is related to two security parameters α and ǫ, where α controls the
goodness of fit to a specific probabilistic distribution and ǫ controls the closeness of
the parameter derived from the observations to that of the population. These two
security parameters are used together so that message transmission time intervals
from all the cells in the network, including the real sources if any, follow the “same”
distribution with the “same” parameters. Here “same” means that an attacker
cannot tell the difference through a statistical hypothesis test. More formally, we
call this statistically strong event unobservability (α, ǫ)-unobservability, because
the two parameters α and ǫ are tightly related to this privacy property.
Definition 3. (α, ǫ)-unobservability (α, ǫ > 0) is a type of statistically strong
event unobservability, in which a distribution X^i (with parameter λ^i) is statisti-
cally indistinguishable from a probabilistic distribution X (e.g., exponential with
parameter λ) under the following conditions:
1. n ∫_{−∞}^{+∞} [F(X^i) − F(X)]^2 Ψ[F(X)] dF(X) ≤ c,

2. (1 − ǫ)λ ≤ λ^i ≤ (1 + ǫ)λ,
where n is the sample size, F is a cumulative distribution function (CDF), Ψ is a
weight function, and c is a critical value determined by α.
The left side of Condition 1 measures the distance between two CDFs; details
of evaluating the distance between two distributions can be found in [67, 68]. If
the distance between the two distributions is smaller than a critical value determined
by the significance level α, and their parameters are close to each other in the way
determined by ǫ in Condition 2, then the two distributions satisfy (α, ǫ)-unobservability.
This distance evaluation of CDFs is used in the Anderson-Darling test [69]
for goodness of fit; therefore, to achieve (α, ǫ)-unobservability our schemes
directly use Anderson-Darling tests. The above definition is rather general,
leaving large room for defining α and ǫ according to different applications
or extending the notion to the multiple-parameter case.
4.2.3 The FitProbRate Scheme
In this section, we discuss the building blocks of our scheme, including the policy
for dummy traffic generation and the policy for embedding real event messages,
as shown in Fig. 4.1. Finally, a running example is used to illustrate the entire
process of our scheme.
4.2.3.1 Policy for Dummy Traffic Generation
Since dummy messages incur high message overhead, the message transmission rate
is very important. As discussed in Section 4.2.2, high rate causes high message
[Figure]
Figure 4.1. The illustration of the FitProbRate scheme. The triangular nodes are sending real event messages, with paths denoted by solid lines. The square nodes are sending dummy messages, with paths following dotted lines. The reference systems beside the nodes indicate the PDF of the message transmission intervals.
overhead whereas a low rate increases the delay of reporting real events. In addition,
the ProbRate scheme, where the message transmission rate follows a probabilistic
distribution, has the advantage of providing flexibility in reducing latency, compared
with the ConstRate scheme, where the message transmission rate is fixed. Hence,
we prefer probabilistic message transmission intervals.
The latency we consider here is the time delay between message generation and
message transmission at the source. All nodes in our scheme transmit messages at
a specific probabilistic rate, so the message generation time, mainly due to message
encryption, is negligible with respect to the waiting time for the next transmission
interval.
Now we need to decide which probabilistic distribution to use. There are many
probability distributions, e.g., exponential, uniform, Weibull, and normal. The advan-
tage of the exponential distribution is that it has only one parameter (λ = 1/µ,
where µ is the mean), which makes it relatively easy to achieve (α, ǫ)-unobservability.
Therefore, to maximize the communication randomness and to simplify the prob-
lem, we choose the exponential distribution to control the rate of dummy traffic
generation.
Specifically, Algorithm 1 implements our idea of probabilistic dummy traffic
generation. Suppose there is a series of k dummy messages; our goal is to make
the time intervals between two consecutive messages (imd_i, i = 1, 2, · · · , k − 1)
follow an exponential distribution. Given a mean µ and a global variable seed, the
algorithm returns the time interval after which to transmit the next dummy message.
The mean µ of the exponential distribution is a system parameter, and we assume
it is known to the adversary because he can calculate it from the observed message
intervals. However, the seed for generating random numbers is kept secret from
the adversary; it is hard to guess and different for each sensor node.
Algorithm 1 Probabilistic Traffic Generation
Input: mean µ;
Output: a time interval following the exponential distribution with mean µ;
Procedure PTG:
1: seed(seed); {assign seed as the seed for probabilistic generation; seed is preloaded in each sensor}
2: return exponential(µ);
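A minimal Python sketch of Algorithm 1, assuming each node holds a secret, preloaded seed; the class and method names are ours, and the per-node `random.Random` instance stands in for the preloaded seed.

```python
import random

class DummyTrafficGenerator:
    """Sketch of Algorithm 1 (PTG): returns exponentially distributed
    intervals with mean mu, driven by a per-node secret seed (modeled
    here by a private random.Random instance)."""

    def __init__(self, mu, seed):
        self.mu = mu
        self.rng = random.Random(seed)  # seed is preloaded and kept secret

    def next_interval(self):
        # random.expovariate takes the rate lambda = 1/mu
        return self.rng.expovariate(1.0 / self.mu)
```

An adversary observing enough intervals can estimate µ, as the text assumes, but without the seed he cannot predict any individual draw.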
4.2.3.2 Policy for Embedding Real Traffic
When a real event happens, if we exactly follow the ProbRate scheme, i.e.,
determine the waiting time based on Algorithm 1, then in the long run we cannot
gain much over the ConstRate scheme if the message transmission rates in the
two schemes have the same mean. On the other hand, if the real event message
is sent out right away, the distribution of time intervals could be skewed (i.e., the
mean becomes smaller and smaller), leaving evidence to the adversary that
a real event is happening. How can we make the message transmission intervals
follow the same distribution while reducing the real event report latency?
More formally, when a real event E_k happens after the dummy events E_i (1 ≤
i ≤ k − 1), the corresponding message should be sent out only when the next time
interval (imd_k) and the earlier ones (imd_i, i = 1, 2, · · · , k − 1) satisfy the following
two conditions:

• The whole series {imd_1, imd_2, · · · , imd_{k−1}, imd_k} still follows the same exponential distribution;

• imd_k is as small as possible.
Based on Kerckhoffs’ Principle [70], we assume that the attacker knows our
techniques for achieving statistically strong source anonymity. From the attacker’s
perspective, in order to detect real event messages, after monitoring the network
traffic and collecting sufficient message transmission intervals he may perform a
statistical test to determine whether the intervals always follow the same expo-
nential distribution with the same mean µ. More specifically, the statistical
test can be broken into two parts: test whether the distribution is exponential, and
test whether the mean is µ. To defend against the attacker’s strategies, we adopt
the following techniques to maintain source anonymity:

1. A statistical test called the Anderson-Darling test is adopted to keep the mes-
sage intervals of each cell following an exponential distribution, controlled by
the parameter α;

2. A method is used to ensure that the measured sample means of the distribution
do not deviate too far from the true mean µ, controlled by the parameter
ǫ.
Next, we introduce these two techniques separately.
Anderson-Darling Test
The Anderson-Darling test [71] (A-D test for short) is a goodness-of-fit test to de-
termine whether a series of data follows a certain probabilistic distribution. The basic
idea is to evaluate the distance between the distribution of the sample data and
a specified probabilistic distribution. If the distance is statistically significant, the
data do not follow this distribution. More formally, the test is defined as follows.
• H0: the data follow the specified distribution;

• Ha: the data do not follow the specified distribution;

• Test Statistic: A² = −n − S, where

S = Σ_{i=1}^{n} [(2i − 1)/n] [log F(X_i) + log(1 − F(X_{n+1−i}))].

Here F is the CDF of interest, n is the sample size, and X_i denotes the ith
datum (sorted in ascending order);
![Page 61: The Pennsylvania State University The Graduate School SECURITY …mcn.cse.psu.edu/paper/thesis/shao-min-thesis08.pdf · 2010-06-18 · and false data injection, because they normally](https://reader035.vdocument.in/reader035/viewer/2022080723/5f7c030c3982b93ec907ed61/html5/thumbnails/61.jpg)
46
• Significance Level: α (typically 0.05);
• Critical Region: the critical values for the A-D test depend on the specific
distribution being tested. Tabulated values and formulas have been published
by Stephens [71]. If the test statistic A² is greater than the corresponding
critical value c, the hypothesis that the data follow the specified distribution
is rejected.
Algorithm 2 shows the details of the A-D test for an exponential distribution.
The input is a series of x_i, i.e., the time intervals between consecutive messages
sent out from a cell, and the output is a decision on whether the series follows an
exponential distribution.
Algorithm 2 Goodness of Fit Test
Input: a sequence of data {x_i, 1 ≤ i ≤ n};
Output: TRUE if {x_i, 1 ≤ i ≤ n} follows an exponential distribution; FALSE otherwise.
Procedure Anderson-Darling:
1: sort x_i into ascending order: x_1 ≤ x_2 ≤ · · · ≤ x_n;
2: calculate the test statistic A²;
3: if A² < c then
4:   return TRUE;
5: else
6:   return FALSE;
7: end if
This algorithm mainly involves a sorting operation and a statistic calculation.
The time complexity of sorting is O(n log n) (e.g., by quicksort) and that of
calculating the test statistic is O(n), where n is the size of the input.
Therefore, the complexity of this algorithm is O(n log n).
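For illustration, the test statistic can be computed directly in Python. The helper below hard-codes the exponential CDF F(x) = 1 − e^(−x/µ); the function names are ours, and the default critical value 1.321 is an illustrative α = 0.05 entry for the exponential case with estimated mean, roughly as tabulated by Stephens (verify against [71] before relying on it).

```python
import math

def ad_statistic_expon(xs, mu):
    """Anderson-Darling statistic A^2 against the exponential CDF
    F(x) = 1 - exp(-x/mu), following the formula above (natural logs)."""
    n = len(xs)
    xs = sorted(xs)                                  # line 1 of Algorithm 2
    F = [1.0 - math.exp(-x / mu) for x in xs]
    s = sum((2 * i - 1) / n * (math.log(F[i - 1]) + math.log(1.0 - F[n - i]))
            for i in range(1, n + 1))
    return -n - s

def fits_exponential(xs, mu, critical=1.321):
    # Illustrative critical value (~alpha = 0.05, exponential with
    # estimated mean, per Stephens' tables); adjust for your setting.
    return ad_statistic_expon(xs, mu) < critical
```

Data drawn from (or placed at the quantiles of) an exponential with mean µ yields a small A², while clearly non-exponential data drives the statistic far above the critical value.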
In our problem setting, we use the A-D test to find an appropriate inter-
message delay (imd) for transmitting the real event message, such that Algo-
rithm 2 returns TRUE when given the whole series of time intervals
{imd_1, imd_2, · · · , imd_{k−1}, imd}. Because the A-D test is a statistical test, the so-
lution that passes the test is not unique. Therefore, the A-D test is repeated until
it is passed. Because a small but random delay is preferred, the search process
for imd starts from 0 and increases by a small random step whenever it fails the
A-D test.
Algorithm 3 Search for a Proper Delay to Send a Real Event Message
Input: a sequence of inter-message time intervals {imd_i (1 ≤ i ≤ n)};
Output: a proper delay imd to send a real event message;
Procedure search_delay:
1: µ = mean(imd_1, imd_2, . . . , imd_n);
2: INCREASEPIECE = rand(0, first quartile);
3: imd = −INCREASEPIECE;
4: repeat
5:   if imd > upper bound then
6:     INCREASEPIECE = rand(0, first quartile);
7:     imd = −INCREASEPIECE;
8:   end if
9:   imd += INCREASEPIECE; {the A-D test begins from 0}
10:  ret = Anderson-Darling({imd_2, imd_3, . . . , imd_n, imd});
11: until ret == TRUE
12: return imd;
Algorithm 3 shows the details of the search algorithm. It takes a series of time
intervals as input and returns the first imd that can pass the A-D test. The
selection of the granularity (INCREASEPIECE) affects the running time. We set
INCREASEPIECE to a random number between 0 and the first quartile of the
input data set (line 2). Based on our experiments, this achieves a relatively small
delay within a relatively short time. From line 4 to line 11, the test repeats until
it finds a value that passes the A-D test, or until the delay exceeds a specified
upper bound (line 5), e.g., the maximum of imd_1, imd_2, . . ., imd_n. In the latter
case, another INCREASEPIECE is selected (line 6) and the search starts over
from 0. The whole algorithm terminates when a proper delay is found. Because
many values can pass the A-D test with the same input, an appropriate value
can be found quickly; this has been verified by experiments. With sample sizes
of 20, 40, 80, 160, 320, 640, 1280, 2560, 5120 and 10240, Algorithm 3 always
terminates within 2 to 10 rounds.
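The search loop can be sketched compactly in Python. Here `passes_test` stands in for the A-D test of Algorithm 2, and the function name, argument order, and step-drawing details are our assumptions; like Algorithm 3 itself, the sketch loops until some candidate passes the test.

```python
import random
import statistics

def search_delay(imds, passes_test, upper_bound=None, rng=None):
    """Sketch of Algorithm 3: scan upward from 0 in random steps
    (INCREASEPIECE, drawn between 0 and the first quartile) and return
    the first candidate delay that passes the goodness-of-fit test."""
    rng = rng or random.Random()
    if upper_bound is None:
        upper_bound = max(imds)        # e.g., the largest observed interval
    q1 = statistics.quantiles(imds, n=4)[0]
    while True:
        step = rng.uniform(0, q1)
        imd = 0.0
        while imd <= upper_bound:
            # slide the window by one and append the candidate delay
            if passes_test(imds[1:] + [imd]):
                return imd
            imd += step
        # no candidate below the bound passed: redraw the step, restart at 0
```

For example, with a test that accepts any candidate of at least 1.0 over a history of 2.0 s intervals, the returned delay lands between 1.0 and 2.0.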
Sample Mean Recovery
If multiple real events happen in succession, Algorithm 3 will be called
repeatedly. In this case, the sample mean will gradually decrease, because smaller
delays are favored in Algorithm 3. According to the Central Limit Theorem, the
sample means follow a normal distribution. From the perspective of an attacker,
every time he observes a new time interval, he will need to make a “yes” or “no”
decision on whether a real event has occurred. If “yes”, he will take an action (e.g.,
to check the suspicious cell by himself); otherwise, he will do nothing. However,
a “yes” decision may be wrong. Thus, to balance the false positive rate against
the false negative rate, an attacker needs to determine a threshold. Once the
difference between the sample mean and the true mean exceeds this threshold, he
will conclude that a real event has occurred and take an action.
Thus, we need to deliberately recover the sample mean so that it never
deviates from the true mean beyond this threshold. Specifically, in our scheme we
set this threshold to ǫµ, because in Definition 3 the condition (1− ǫ)λ ≤ λi ≤
(1 + ǫ)λ is equivalent to (1− ǫ)µ ≤ µi ≤ (1 + ǫ)µ for the exponential distribution
with λ = 1/µ. We search for an appropriate new time interval for the next
message (real or dummy event) such that the sample mean of the entire time series,
including the new interval, is within ǫµ of the true mean. Algorithm 4 serves this
purpose. It calculates the value needed to recover the mean (line 3), and a random
number is selected between this value and a value drawn from an exponential
distribution with mean µ (line 4) until this random number passes the A-D test
(lines 5-8).
Algorithm 4 Recovery of Mean
Input: mean µ, a sequence of data {xi, 1 ≤ i ≤ n};
Output: a proper delay to send out the next message;
Procedure recovery:
1: sum = sum(x2, x3, · · · , xn);
2: dx = µ − sum/(n − 1);
3: y1 = (µ + dx) ∗ n − sum;
4: y2 = PTG(µ); {defined in Algorithm 1}
5: repeat
6:   x = rand(y1, y2);
7:   ret = Anderson-Darling({x2, x3, · · · , xn, x});
8: until ret == TRUE
9: return x;
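A minimal Python sketch of Algorithm 4 follows. The accept predicate is a placeholder for the A-D test of Algorithm 2, and ptg stands in for the exponential interval generator of Algorithm 1; both names are illustrative:

```python
import random

def ptg(mu, rng):
    # Stand-in for Algorithm 1: an exponentially distributed
    # interval with mean mu.
    return rng.expovariate(1.0 / mu)

def recover_mean(mu, xs, rng, accept=lambda s: True):
    # Sketch of Algorithm 4; `accept` stands in for the A-D test.
    n = len(xs)
    tail = xs[1:]                  # drop the oldest interval (line 1)
    s = sum(tail)
    dx = mu - s / (n - 1)          # drift of the window mean (line 2)
    y1 = (mu + dx) * n - s         # delay that over-corrects the drift (line 3)
    y2 = ptg(mu, rng)              # a fresh exponential sample (line 4)
    while True:                    # lines 5-8
        x = rng.uniform(min(y1, y2), max(y1, y2))
        if accept(tail + [x]):
            return x               # line 9
```

Note that picking x = y1 exactly would drive the new window mean to µ + dx, i.e., it over-corrects past drift; mixing it with a fresh exponential draw keeps the chosen interval plausible.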
From the above discussions, the significance level α defined in the A-D test
is used to control the acceptable distance between the observed distribution of
message transmission time intervals and the exponential distribution. ǫ reflects an
acceptable difference between the sample mean and the true mean, which will not
cause suspicion from the attacker. With these two parameters, our FitProbRate
scheme can achieve the statistically strong source anonymity defined by (α, ǫ)-
unobservability.
4.2.3.3 A Running Example
To illustrate the whole process, a running example is shown in Fig. 4.2. According
to Algorithm 1, three dummy messages are supposed to be sent out at time A,
B and E, respectively. At time C, a real event happens, so Algorithm 3 is called
and this real event is sent out at time D. After this, according to Algorithm 4,
the dummy message at time E is canceled and rescheduled at time F . From the
attacker’s point of view, he can only see the intervals between A and B, B and D,
D and F , which follow an exponential distribution and the mean is stable. Thus,
the attacker cannot tell if any real event has happened.

Figure 4.2. A running example to illustrate the entire process.
All algorithms can be easily implemented in sensor networks because they only
involve simple operations. For example, TinyOS supports all functions used in our
algorithms, such as log and exp. These algorithms can be further optimized. For
example, in Algorithm 2, the calculation of S involves a summation of n values.
Whenever Algorithm 3 calls the A-D test (Algorithm 2), n− 1 values in the time
series are the same as those in the previous call. Thus, only one additional log and
one additional exp operation are needed.
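The caching idea can be sketched as follows; ADCache is a hypothetical helper, not the dissertation's TinyOS code. It exploits the fact that for the exponential CDF F(x) = 1 − e^(−x/µ), the term ln(1 − F(x)) equals −x/µ and needs no extra logarithm, so each new interval costs exactly one exp and one log:

```python
import math

class ADCache:
    # Sketch: Algorithm 3 re-runs the A-D test on a window that shares
    # n-1 values with the previous call, so we cache the per-interval
    # exp/log terms and recompute them only for the new value.
    def __init__(self, mu):
        self.mu = mu
        self.terms = {}   # x -> (ln F(x), ln(1 - F(x)))

    def _term(self, x):
        if x not in self.terms:
            e = math.exp(-x / self.mu)                       # the one exp
            self.terms[x] = (math.log(1.0 - e), -x / self.mu)  # the one log
        return self.terms[x]

    def statistic(self, sample):
        # Standard computational form of the A-D statistic A^2:
        # A^2 = -n - (1/n) * sum (2i-1)[ln F(x_i) + ln(1 - F(x_{n+1-i}))]
        xs = sorted(sample)
        n = len(xs)
        acc = sum((2 * i - 1) * (self._term(x)[0] + self._term(xs[n - i])[1])
                  for i, x in enumerate(xs, start=1))
        return -n - acc / n
```

On a sliding window, only the newly appended interval triggers the exp/log computation; all other terms are served from the cache.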
4.2.4 Performance Evaluations
In this section, we compare the performance of the FitProbRate scheme, the Con-
stRate scheme, and the ProbRate scheme.
4.2.4.1 Comparison between FitProbRate and ConstRate
In the simulation, the mean of the dummy message generation interval is 20s.
The real event arrivals are modeled as a Poisson process with mean inter-arrival
time varying from 20s to 100s. Fig. 4.3 shows the delay to send a real event under
both schemes. As can be seen, the average latency of the FitProbRate scheme is
less than 1s, whereas that of the ConstRate scheme is 10.87s, which verifies that
FitProbRate can significantly reduce the transmission delay of real event messages.
Figure 4.3. Comparing average delay in the FitProbRate scheme (α = 0.05, ǫ = 0.1) with the ConstRate scheme.
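The 10.87s ConstRate latency agrees with a back-of-the-envelope model: a real event arriving at a uniformly random moment must wait, on average, about half of the 20s interval, i.e., roughly 10s. The toy simulation below (a hypothetical sketch, not the dissertation's simulator) illustrates this expectation:

```python
import random

def constrate_average_delay(interval, n_events, seed=1):
    # Toy model of ConstRate: transmissions happen only at fixed slot
    # boundaries, so an event arriving at a uniformly random offset
    # waits on average interval / 2 before it can be sent.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_events):
        t = rng.uniform(0.0, interval)   # arrival offset within a slot
        total += interval - t            # wait until the next boundary
    return total / n_events
```

With a 20s interval, the simulated average settles near 10s, close to the measured 10.87s.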
To see the impact of the window size (n) in the FitProbRate algorithms, we
compare the average delay under different window sizes. As shown in Fig. 4.4(a),
when the window size increases from 60 to 100, the average delay of real event
messages decreases.
We also evaluate the impact of the event arrival pattern by comparing the
average latency resulting from Poisson arrivals with that from constant-rate arrivals.
As shown in Fig. 4.4(b), the arrival pattern of the real events does not have much
impact on the average latency.
(a) Impact of window size; (b) Impact of real event arrival pattern
Figure 4.4. The impact of window size and real event arrival pattern in the FitProbRate scheme.
4.2.4.2 Comparison between FitProbRate and ProbRate
In this simulation, the mean of the dummy message generation interval is 40s,
and the simulation runs for a total of 3600 seconds. For easy illustration, we only
show part of the simulation results in Fig. 4.5. In the ProbRate scheme, real event messages
and dummy messages are treated equally; that is, their transmission time intervals
are determined by the output of Algorithm 1. To make a more comprehensive
comparison, we check three traffic patterns at different levels of burst arrivals for
real event message generation, as shown in three different columns of Fig. 4.5.
In Fig. 4.5(a), each real event message arrives at the time point according to an
exponential distribution; in Fig. 4.5(b) and (c), three and five real event messages
are generated in a burst respectively, at the same time points as in Fig. 4.5(a).
Figs. 4.5(d)-(f) visualize the time slots at which real event messages are ready,
as shown by the solid lines. The dotted lines are the time points when real event
messages are actually forwarded. From these figures, we can observe that real event
messages are forwarded more promptly in our scheme than in the ProbRate scheme.
As a result, the transmission latencies of real event messages in our scheme are
much smaller than those in ProbRate.
Figs. 4.5(g)-(i) verify these observations. As shown in the figure, the FitPro-
bRate scheme can significantly reduce the real event message forwarding latency
compared with the ProbRate scheme. If the real events happen in bursts, the
latency will be higher. For example, Fig. 4.5(g) corresponds to traffic pattern 1,
and Fig. 4.5(i) corresponds to traffic pattern 3. As traffic pattern 3 has more
arrival bursts than traffic pattern 1, the average delay in Fig. 4.5(i) is also much
higher than that in Fig. 4.5(g). This is because the average waiting time becomes
longer when more messages need to be sent out within a certain time.

Figure 4.5. Performance comparison between the FitProbRate scheme (α = 0.05,
ǫ = 0.1) and the ProbRate scheme under different real traffic patterns. In (a)-(c), 1, 3,
or 5 real event messages are generated in a burst. In (d)-(f), the solid lines are the time
points when real events are ready and the dotted lines are the time points when real
event messages are actually forwarded. (g)-(i) show the numerical values of real event
transmission latency under the three traffic patterns.
4.2.5 Security Analysis
We first prove that the FitProbRate scheme has the property of (α, ǫ)-unobservability.
Then, we show that in our scheme it is unlikely for the attacker to detect real events
even if he adopts available powerful statistical tools for traffic analysis.
4.2.5.1 Security Property
We have the following theorem on the security property of the FitProbRate scheme.
Theorem 1. The FitProbRate scheme has the property of (α, ǫ)-unobservability.
Proof. To prove that the FitProbRate scheme has the property of (α, ǫ)-unobservability,
we need to prove that the statistically strong event unobservability has been
achieved under the control of parameters α and ǫ.
Under the control of parameter α, by Algorithm 3 the distribution X^i of
message transmission intervals from any cell i (1 ≤ i ≤ N) can pass the A-D test.
This means that the difference between the empirical cumulative distribution
function (CDF) from the ordered sample data and the cumulative distribution
function of the corresponding exponential distribution X is smaller than the critical
value c decided by the predetermined significance level α, according to the nature
of the A-D test. Namely, the following formula holds:

n ∫_{−∞}^{+∞} [F(X^i) − F(X)]^2 Ψ[F(X)] dF(X) ≤ c,

where n is the sample size, F is the CDF, and Ψ is the weight function of the
goodness-of-fit test.
Moreover, under the control of parameter ǫ, once the sample mean µi from any
cell i deviates from the population mean µ of the exponential distribution in a
noticeable way, i.e., |µi − µ| ≥ ǫµ, Algorithm 4 will be automatically triggered to
recover the mean. Hence, the sample mean from any cell in the network cannot
be differentiated from the population mean under the control of ǫ. That is, at any
time, (1− ǫ)µ ≤ µi ≤ (1 + ǫ)µ.
In summary, probabilistic distributions of message transmission intervals from
real sources are statistically indistinguishable from those of other cells that send out
dummy messages. By Definition 3, under the control of α and ǫ, real events cannot
be detected by the attacker; hence, the FitProbRate scheme has the property of
(α, ǫ)-unobservability.
Assuming our scheme is employed, we now consider what the attacker can
do to detect real events. The attacker has enough resources (e.g., storage and
computation) to collect and analyze message time intervals from all the cells in the
network. Then, the attacker will try to identify sources with different distributions
of message time intervals. To do this, the attacker can first conduct some goodness
of fit tests to check whether the probabilistic distributions of message time intervals
from every cell follow the exponential distribution. If the distribution from any
cell cannot pass the goodness of fit test, the corresponding cell will be marked as
a potential real source. Two widely used distribution test tools are considered here:
the Anderson-Darling (A-D) test [67] and the Kolmogorov-Smirnov (K-S) test [72].
those distributions that can pass the goodness of fit test, the attacker then further
performs the mean test. Those cells whose sample means deviate from the true
mean beyond a certain degree will be marked as potential real sources, too. The
SPRT (Sequential Probability Ratio Test) [73] is a good choice for the mean test,
because SPRT minimizes the number of samples required to make a decision
from continuous observations.
Next, we demonstrate the robustness of our FitProbRate scheme in defending
against all these detection techniques of the attacker, including its robustness to
the distribution tests as well as to the mean test.
4.2.5.2 Robustness to Distribution Tests
To detect abnormal probabilistic distributions of message time intervals, the at-
tacker can check whether a probabilistic distribution X i is exponential. For the
attacker, the hypotheses in the test are:
H0: the distribution is exponential, i.e., F(X^i) = F(X).
H1: the distribution is not exponential, i.e., F(X^i) ≠ F(X).
When the attacker makes a decision, he risks being wrong. The decision is
called a detection if H1 is accepted when it is actually true; if in this case H0 is
accepted instead, it is called a false negative. On the other hand, if H0 is in fact
true, accepting H1 is a false positive. Note that in our scheme the false positive
rate is actually equal to the significance level α defined in the A-D test, and we
denote our false negative rate as β. The attacker’s rates are denoted as α′ and β′,
correspondingly, to differentiate them from ours.
One may argue that if the attacker selects a significance level α′ better than
that in our algorithm (α), then the attacker may detect the perturbed probabilistic
distributions from real sources. However, there is a tradeoff between false positive
rate α′ and false negative rate β ′ in attacker’s distribution test. To explain this, let
us consider two extreme cases. If the rejection region has critical values −∞ and
+∞, then the attacker always accepts H0. In this case, α′ = 0 and β ′ = 1. On the
contrary, if the rejection region has the critical values 0 and 0, then the attacker
always rejects H0. In this case, α′ = 1 and β ′ = 0. Hence, it is impossible for the
attacker to make both α′ and β ′ arbitrarily small for a fixed sample size n. If the
attacker chooses a very small α′ in the test, then he runs the risk of a high
β′, which means he has a high chance of failing to detect real events. Likewise, if
the attacker chooses a smaller β′, then he pays the cost of checking more
fake sources.
Figure 4.6. A tradeoff between α′ and β′ for the attacker (α = 0.05).
We use simulations to verify the above statement and make the following
observations (Fig. 4.6). First, the false negative rate β′ of the attacker’s
test is high because it is hard for the attacker to detect the disturbed message
transmission intervals of real events, which means the real event detection rate
of the attacker is actually low. Second, if the attacker tries to decrease the false
negative rate β ′ by selecting a higher significance level in the distribution test,
then the false positive rate α′ will increase. More specifically, as shown in Fig. 4.6,
when the significance level α in our scheme is fixed to be 0.05, if the significance
level in the attacker’s test increases from 0.010 to 0.100, the false positive rates
(α′) of the attacker increase from 0.010 to 0.100 in the A-D test or from 0.010
to 0.105 in the K-S test, respectively. At the same time, the attacker’s false negative
rate (β′) decreases from 0.920 to 0.838 in the A-D test or from 0.992 to 0.852
in the K-S test. In summary, no matter what statistical distribution test the
attacker uses, there is a tradeoff between the false negative rate and the false
positive rate for the attacker; thus he cannot make both of them arbitrarily small.
We also check the impact of the significance level in our scheme on the detection
rate of the attacker. If the significance level α in the A-D test of our scheme is
larger, e.g., increased from 0.05 to 0.10, then the distributions of message time
intervals from real sources in our scheme exhibit less abnormality, i.e., F(X^i)
is closer to F(X). Hence, it is harder for the attacker to detect the real events,
and thus the false negative rate of the attacker is slightly higher (the figure is not
shown here because it differs only slightly from Fig. 4.6).
4.2.5.3 Robustness to Mean Test
After the distribution test, the attacker may conduct the mean test (e.g., SPRT
test) to detect the disturbed means due to real event message transmissions. In
the SPRT test, after the attacker chooses a threshold ǫ′ (in contrast to the cor-
responding recovery threshold ǫ defined in our scheme), the attacker can do the
following to detect real event messages:
• Test two alternatives H0 : µi ≥ µ and H1 : µi ≤ µ1, where µ1 = (1− ǫ′)µ,
where µi is the sample mean from cell i and µ is the population mean of
the exponential distribution. Because in our scheme the sample mean tends to
be smaller than the population mean according to Algorithm 3, when µi ≥ µ the
attacker can safely decide that no real event is happening;
• Choose among three possible decisions: (i) accepting H0 means that there
are no real event messages; (ii) accepting H1 means that there are real event
messages; or (iii) continue the test due to insufficient observations.
Following [73], the above composite hypotheses can be converted to the simple
hypotheses H0 : µi = µ and H1 : µi = µ1. Accepting H0 may cause a false negative
(β′) and accepting H1 may cause a false positive (α′).
In more detail, the SPRT mean test for the attacker works as follows. Each
time a new message time interval imd^i_k (k ≥ 1) from cell i is observed, the
following statistic is calculated:

s_k = log { [f(imd^i_1, µ1) · · · f(imd^i_k, µ1)] / [f(imd^i_1, µ) · · · f(imd^i_k, µ)] },
where f is the probability density function of the exponential distribution. Two
boundaries A and B are decided according to the predetermined false positive rate
α′ and false negative rate β′: A = log((1− β′)/α′) and B = log(β′/(1− α′)). If
s_k ≤ B, the test is terminated and H0 : µi = µ is accepted. If s_k ≥ A, the test is
terminated and H1 : µi = µ1 is accepted. If B < s_k < A, more observations are
needed to make a decision.
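As an illustration, the attacker's SPRT can be sketched in Python; the function name and interface are hypothetical. Fed constant intervals equal to µ with ǫ′ = 0.05 and α′ = β′ = 0.05, the statistic drifts toward B by only about 0.0013 per observation, so on the order of 2,200 observations are needed before H0 is accepted, consistent in magnitude with the counts reported below in Table 4.1:

```python
import math

def sprt_mean_test(intervals, mu, eps, alpha, beta):
    # Sketch of the attacker's SPRT: H0: mean = mu vs H1: mean = mu1.
    mu1 = (1.0 - eps) * mu
    A = math.log((1.0 - beta) / alpha)   # accept H1 once s_k >= A
    B = math.log(beta / (1.0 - alpha))   # accept H0 once s_k <= B
    s = 0.0
    for k, x in enumerate(intervals, start=1):
        # log-likelihood ratio of exponential densities f(x, mu1) / f(x, mu)
        s += math.log(mu / mu1) - x / mu1 + x / mu
        if s >= A:
            return "H1", k
        if s <= B:
            return "H0", k
    return "undecided", len(intervals)
```

The slow per-observation drift is exactly why the attacker needs thousands of samples to reach any decision at all.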
Simulation results of the SPRT test for the attacker are presented in Table 4.1
and Table 4.2. In both tables, the significance level α = 0.05 and the recovery
threshold ǫ = 0.05 in our scheme; the sample data can pass the exponential
distribution test, and about one half of the message transmission intervals
are disturbed ones from randomly distributed real event messages. In Table 4.1,
we fix the attacker’s false negative rate β′ and check the impact of α′ and ǫ′ on the
number of observations needed for the attacker to make a decision. In Table 4.2,
we check the impact of β′ and ǫ′ while α′ is fixed.
From these two tables, we can see that there is a high chance for the attacker to
fail in real event detection. The test results always accept H0, which increases the
attacker’s false negative rate, because message time intervals from real events
cannot be detected by the attacker. In addition, there is a long delay before the
attacker can make a decision. For example, when α′ = 0.05, β′ = 0.20, and
ǫ′ = 0.05, more than 1,000 observations after the first message are needed for the
attacker to reach a decision. Even if the attacker’s conclusion is correct in the end,
this delay may render the attacker’s mean test worthless.
We also have the following observations from the tables. First, the number
of observations for the attacker to make a decision decreases with the attacker’s
false positive/negative rate. When the number of observations to make a decision
decreases, both the false negative rate β ′ and false positive rate α′ of the attacker
are higher. That is, if the attacker wants to make a faster decision, there is a
higher chance for the attacker to make a mistake. On the other hand, if the
attacker wants to reduce the false positive rate and the false negative rate, then
the attacker needs more observations to make a decision, which takes longer time.
Second, the number of observations for the attacker to make a decision decreases
with the attacker’s recovery threshold ǫ′. If the recovery threshold is higher (e.g.,
increased from 0.05 to 0.10), the sample data exhibit less abnormality according
to the attacker’s criteria. Therefore, the attacker more quickly draws the conclusion
that there are no real event messages (although it is still a wrong decision).
In conclusion, the attacker cannot effectively detect the occurrence of real events
even if he employs the SPRT mean test. We note that the SPRT test is not the
only choice for the attacker to detect a changed sample mean, but we believe that
the observations will be similar no matter what mean test the attacker adopts.
Table 4.1. # of observations to draw a decision in SPRT when α′ changes (β′ = 0.05)

α′                     0.01   0.05   0.10   0.20   Test result
# of obs. (ǫ′ = 0.05)  2198   2192   2058   2054   accept H0 (false negative)
# of obs. (ǫ′ = 0.10)   612    612    611    591   accept H0 (false negative)

Table 4.2. # of observations to draw a decision in SPRT when β′ changes (α′ = 0.05)

β′                     0.01   0.05   0.10   0.20   Test result
# of obs. (ǫ′ = 0.05)  3156   2192   1799   1316   accept H0 (false negative)
# of obs. (ǫ′ = 0.10)   921    612    472    361   accept H0 (false negative)
4.2.6 Conclusion
In this section, after analyzing the source anonymity problem under a strong
attack model in which the attacker has a global view of all the network traffic,
we identify a tradeoff between performance and privacy. For the first time, we
propose the notion of statistically strong source anonymity for sensor networks.
We also devise a realization scheme called FitProbRate, which achieves statistically
strong source anonymity under this attack model. Performance evaluations
demonstrate that with this scheme, the event report latency is largely reduced and
source location privacy is preserved even if the attacker conducts various
statistical tests.
4.3 Preserve Source Anonymity with Minimum
Network Traffic
As we mentioned before, the message overhead reduced by the scheme is usually
dependent on the locations of the proxies. For this reason, based on local search
heuristics we devise a proxy placement algorithm for each scheme to minimize
the overall message overhead. Since real event messages may be delayed at the
source due to the need to postpone their transmission, we also select suitable
parameters for the buffers at the proxies to reduce buffering delay while preserving
event source unobservability. Our simulation results indicate that our schemes not
only find nearly optimal proxy placement efficiently but also yield a high delivery
ratio and low bandwidth overhead relative to the baseline scheme. A prototype of
our schemes is implemented for TinyOS-based Mica2 motes and consumes only
about 400 bytes of RAM.
The rest of the section is organized as follows. We first describe the problem and
build our model in Section 4.3.1. Then, Section 4.3.2 presents our PFS scheme and
Section 4.3.3 presents the TFS scheme. Some issues are discussed in Section 4.3.4.
After that, simulation and implementation results are presented in Section 4.3.5.
Finally, we conclude in Section 4.3.6.
4.3.1 System Model and Design Goals
4.3.1.1 Network Model
As in [74], our system assumes that a sensor network is divided into cells (or grids)
where each pair of nodes in neighboring cells can communicate directly with each
other. A cell is the minimum unit for detecting events; a cell head coordinates
all the actions inside a cell. Each cell has a unique integer id (in the range [1, n],
where n is the total number of cells) and every sensor node knows the cell in which
it is located through its GPS or an attack-resilient localization scheme [65]. Also,
we assume that a base station (BS) is located at the center of the network and
works as the network controller to collect event data. An event report contains
such information as the id of the detecting cell, the event type, and the detection
time.
4.3.1.2 Attack Model
We assume that the adversary is external, passive and global. By external, we
mean that the adversary will not compromise or control any sensors; by passive,
we assume that the attacker does not conduct active attacks such as traffic injec-
tion, channel jamming and denial of service attack; by global, we assume that the
adversary can collect and analyze all the communications in the network. Note
that such a global attacker does not necessarily have the capability of detecting
the occurrence of real events anywhere in the network by himself, because (1) real
event detection devices are often costly, whereas message collection devices are
inexpensive and off-the-shelf; and (2) real event detection devices, such as animal-
monitoring cameras, are normally not as small as regular sensors, so they are easily
detected and destroyed. Although we do not consider sensor node compromise in
the attack model, we will discuss this problem later in Section 4.3.4.3.
To be more specific, the adversary may launch the following attacks in our
model. First, he may simply examine the content of an event message to see if it
contains the source location id. Second, even if the message is encrypted, it is easy
for him to trace back to the source of the message if the encrypted message remains
the same during its forwarding, because the adversary is capable of identifying
the immediate source of a message transmission. Third, he may perform more
advanced traffic analysis including rate monitoring and time correlation. In a rate
monitoring attack, the adversary pays more attention to the nodes with different
(especially higher) transmission rates. In a time correlation attack, he may observe
the correlation in transmission time between a node and its neighbor, attempting
to deduce a forwarding path.
4.3.1.3 Design Goals
Providing event source unobservability under the global attack model is challeng-
ing. To prevent content-based analysis, we may encrypt all the packets during their
forwarding, and also make all the packets in the network of the same length. How-
ever, these techniques cannot defend against rate monitoring and time correlation
attacks.
Figure 4.7. Illustration of PFS. Blank circles and filled circles represent sources and proxies, respectively; dashed lines and solid lines denote bogus messages and real messages, respectively.
To defend against these traffic analysis attacks, we note that there exist trade-offs
between various performance and security metrics, such as privacy, delay, and
communication cost. If all packets in the network are real event packets and every
node reports and forwards a real event message immediately, it will be easy for
a global attacker to trace back to the real source. Therefore, not only network-
wide dummy traffic [23] but also delays in event reporting and forwarding have
to be introduced at the nodes. Clearly, dummy traffic will significantly increase
the network traffic, which is undesirable for sensor networks where communication
overhead dominates the entire energy expenditure. To guarantee event source
unobservability without causing the explosion of network traffic, in this section
our goal is to minimize the network traffic. Since it is hard to minimize the event
report delay simultaneously, the proposed schemes are best suited for applications
in which a certain degree of delay can be tolerated.
4.3.2 Proxy-based Filter (PFS) Scheme
4.3.2.1 Scheme Overview
To employ dummy traffic to hide real events without incurring much message over-
head, we select some sensors (in certain cells) as proxies to filter dummy messages
before they reach the BS. The locations of these proxies are determined during
network planning with the goal of minimizing aggregate network traffic. After network deployment, each proxy broadcasts a "hello" message with a TTL (time to live) large enough to reach every cell in the network. Every cell receiving these "hello" messages records the proxy nearest to it as its default proxy. Every cell also sends a response back to its proxy so that each proxy knows which cells it serves.
We assume each cell can establish a pairwise key with a proxy on the fly based
on an appropriate keying scheme [75] and each proxy shares a key with the BS.
When the network is in operation, each cell sends encrypted messages via unicast
to its default proxy through a multi-hop routing protocol such as GPSR [76]. To
satisfy our requirement of event source unobservability, the intervals between these messages follow an exponential distribution (selecting other message generation patterns, such as a constant rate, does not affect our filtering schemes). When a cell detects an event, it postpones the transmission of the encrypted real event message to the next probabilistic interval, so that timing analysis cannot differentiate this message from dummy traffic.
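The per-cell sending discipline above can be sketched in Python as follows. This is a minimal illustration under our own assumptions; the function and parameter names are not from the thesis, and encryption is omitted:

```python
import random

def cell_send_times(rate, horizon, event_times, seed=1):
    """One cell's outgoing schedule: messages are emitted at exponentially
    distributed intervals; a real event detected between emissions is
    deferred to the next scheduled slot, so its timing is statistically
    indistinguishable from the dummy traffic."""
    rng = random.Random(seed)
    pending = sorted(event_times)   # real events waiting to be reported
    sends, t = [], 0.0
    while t < horizon:
        t += rng.expovariate(rate)  # next scheduled emission
        if pending and pending[0] <= t:
            pending.pop(0)
            sends.append((t, "real"))   # real event rides a normal slot
        else:
            sends.append((t, "bogus"))  # dummy message keeps the pattern
    return sends

schedule = cell_send_times(rate=1.0, horizon=100.0, event_times=[10.0, 40.0])
```

Note that every emission, real or dummy, occupies a slot of the same exponential process, which is exactly what denies the attacker a timing signal.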
Upon receiving a bogus message, a proxy filters it out by discarding it. Upon receiving a real event message, the proxy re-encrypts it (with the key shared with the BS) and forwards it towards the BS after an appropriate buffering time. If no real event message is available, a proxy sends an encrypted dummy one instead. Note that proxies can differentiate dummy messages from real messages because they can properly decrypt each message using the corresponding pairwise key. If a proxy receives messages from other proxies, it simply forwards them to the next hop. Figure 4.7 shows an example where PFS is employed for privately reporting elephant locations.
In summary, dummy messages are generated to hide real event messages. During this process, we minimize network traffic through optimal proxy deployment and preserve event source unobservability through appropriate filtering behavior inside the proxies. Next, we describe optimal proxy placement in Section 4.3.2.2 and present proxy operations in Section 4.3.2.3. Finally, we analyze the security property of PFS in Section 4.3.2.4.
4.3.2.2 Proxy Placement
Problem Statement
Deploying proxies in the right locations is crucial to the performance of our network. For example, if all the proxies are deployed close to each other, network traffic cannot be reduced effectively. Similarly, if all the proxies are placed far away from the BS, the number of bogus messages that proxies can filter will be very limited. We take the minimization of aggregate network traffic as the optimization criterion for our proxy placement. Aggregate traffic is defined as traffic rate × message size × number of hops (in byte·hop/second). Since all event messages have the same size, we need only consider the traffic rate and the total number of transmission hops in the optimization problem.
In more detail, our proxy placement problem can be formalized in the following way. Suppose the set of all n cells in the network is denoted as V (i.e., |V| = n). Moreover, P is the set of proxy cells with size k (k ≤ n), and P is a subset of V. Since the closest proxy is the one that filters dummy messages for the cell, for a normal cell i in the network, its distance (i.e., number of hops) to the corresponding proxy can be expressed as:

d(i) = \min_{j \in P} d(i, j), \qquad (4.1)

where d(i, j) is the distance between cell i and proxy j (1 ≤ i ≤ n, 1 ≤ j ≤ k).
Suppose that all the cells in the network send out event messages (dummy or real) at the same traffic rate r_source, and that the outgoing traffic rate from proxies is r_proxy (the relationship between r_source and r_proxy is determined by the buffering behavior inside the proxies). Our purpose is to minimize the following cost:

cost = r_{source} \cdot \sum_{i \in V} d(i) + r_{proxy} \cdot \sum_{j \in P} c(j). \qquad (4.2)

That is,

cost = r_{source} \cdot \sum_{i \in V} \min_{j \in P} d(i, j) + r_{proxy} \cdot \sum_{j \in P} c(j), \qquad (4.3)

where c(j) is the distance from proxy j to the BS.
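To make the objective concrete, here is a small sketch of the cost of Eq. (4.3) in Python. The grid topology, the Manhattan hop metric, and all identifiers are our own illustrative assumptions, not part of the thesis:

```python
def aggregate_cost(cells, proxies, bs, r_source, r_proxy, dist):
    """Cost of Eq. (4.3): each cell ships its traffic to its nearest proxy
    at rate r_source; each proxy ships its filtered output to the BS at
    rate r_proxy. dist(a, b) returns the hop count between two cells."""
    to_proxy = sum(min(dist(c, p) for p in proxies) for c in cells)  # sum of d(i)
    to_bs = sum(dist(p, bs) for p in proxies)                        # sum of c(j)
    return r_source * to_proxy + r_proxy * to_bs

# Example on a 3x3 cell grid with Manhattan hop counts and the BS at (0, 0).
manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
grid = [(x, y) for x in range(3) for y in range(3)]
c = aggregate_cost(grid, [(1, 1)], (0, 0), 1.0, 1.0, manhattan)  # 12 + 2 = 14
```

Comparing placements with this function is exactly what the placement algorithm below does repeatedly.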
A solution to the proxy placement problem consists of the number of proxies
needed and their locations that minimize the cost described above. This problem is NP-hard; the well-known facility location problem [77] reduces to it. Moreover, even a one-step position change of a single proxy may force recalculating the nearest-proxy distance for every cell, so the solution space cannot be searched exhaustively in polynomial time. We therefore adapt heuristics based on localized search [78, 79] to solve the proxy placement problem efficiently. Next, based on this idea, we devise a proxy placement algorithm that can find approximately optimal proxy locations, given any r_source and r_proxy.
Algorithm 5 Proxy Placement Algorithm in PFS
Input: a cell-based network topology with node set V; the total number of nodes n;
Output: a set of proxies P;
Procedure:
1: P ← Φ; P′ ← Φ; {cost(Φ) = ∞}
2: for k ← 1 to n − 1 do
3:   placement(k); {Update P′}
4:   if cost(P′) < cost(P) then
5:     P ← P′;
6:   end if
7: end for
8: return P;
9:
10: placement(k)
11: P′[0] ← BS;
12: for i ← 1 to k do
13:   P′[i] ← i; {Initialize P′[0] . . . P′[k]}
14: end for
15: for ∀i ∈ P′ and ∀j ∉ P′ and i, j ∈ V do
16:   P′′ ← P′ − i + j; {Swap i and j}
17:   if cost(P′′) < cost(P′) then
18:     P′ ← P′′;
19:   end if
20: end for {Loop ends after we try all the combinations of i and j}
Proxy Placement Algorithm
The details of our proxy placement algorithm are presented in Algorithm 5.
Given k, the number of proxies to be deployed, we begin with a random initial set
P ′ of size k (e.g., the first k nodes) with the BS as a default proxy. Our algorithm
proceeds in steps, in each of which we try to swap a node in set P ′ with a node
that is currently not in set P ′, if such a swap would reduce the aggregate network
traffic. If a swap succeeds, then we update the set P ′ accordingly. We repeat this
process until no more swaps are possible. At this point, the cost reaches a local
minimum. Note that for any k this process is guaranteed to converge, since each swap reduces the cost of the solution, which is bounded below by the cost of the optimal solution for that k. We vary k from 1 to n and record the set P with the minimum cost. After we obtain P, its size is the number of proxies in our deployment, and the cells in P are those where we place the proxies.
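The swap-until-local-minimum procedure can be sketched in Python as follows. This is a hedged illustration under our own assumptions (grid cells, Manhattan hop counts, and all names are ours), not the thesis implementation:

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def cost(cells, proxies, bs, r_source, r_proxy):
    # Eq. (4.3): cell-to-nearest-proxy traffic plus proxy-to-BS traffic.
    return (r_source * sum(min(manhattan(c, p) for p in proxies) for c in cells)
            + r_proxy * sum(manhattan(p, bs) for p in proxies))

def placement(cells, bs, k, r_source, r_proxy):
    """Local search for a fixed k: start from an arbitrary k-subset (the BS
    is always a default proxy) and keep swapping an inside node for an
    outside node as long as the swap lowers the cost."""
    P = set(cells[:k]) | {bs}
    improved = True
    while improved:
        improved = False
        for i in [p for p in P if p != bs]:
            if improved:
                break  # re-scan from the improved set
            for j in cells:
                if j in P:
                    continue
                Q = (P - {i}) | {j}
                if cost(cells, Q, bs, r_source, r_proxy) < cost(cells, P, bs, r_source, r_proxy):
                    P, improved = Q, True
                    break
    return P

def best_placement(cells, bs, r_source, r_proxy):
    # Outer loop of Algorithm 5: try every k and keep the cheapest set.
    candidates = [placement(cells, bs, k, r_source, r_proxy)
                  for k in range(1, len(cells))]
    return min(candidates, key=lambda P: cost(cells, P, bs, r_source, r_proxy))

grid = [(x, y) for x in range(4) for y in range(4)]
best = best_placement(grid, (0, 0), 1.0, 1.0)
```

Termination follows the argument in the text: every accepted swap strictly decreases a cost that is bounded below, so the search cannot loop forever.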
Figure 4.8. The optimal number of proxies in PFS.

Figure 4.9. The optimal proxy placement.
We have the following results on the time complexity of Algorithm 5. For the average case, it is hard to analyze the number of iterations in each placement(k). Similar to [80], we resort to experiments and find that the average-case complexity of this algorithm tends to be O(n^4).
In the worst case, the initial set P′ is shifted by a maximum distance. For example, suppose the initial set is P′ = {BS, 1, 2, · · · , k} and the algorithm returns {BS, n−k+1, · · · , n} (BS is a default proxy). Then each element in P′ is shifted by a distance n−k+1, one step at a time. Take element k in the initial set as an instance: its shifting sequence is k, k+1, · · · , n. For the first shift, from k to k+1, we need to compare k+1 with all the elements in P′, which include 1, 2, · · · , k, and the last attempt succeeds; in total, k swaps are tried. For the second shift, from k+1 to k+2, we first compare k with all the elements in P′
Figure 4.10. The impact of network scale on the optimal proxy number in PFS.
including 1, 2, · · · , k−1, k+1, and all attempts fail (the last attempt, between k and k+1, could be skipped, but we count it for simplicity). Then we compare k+2 with all the elements in P′, which include 1, 2, · · · , k−1, k+1, and the last attempt succeeds. Hence, 2k swaps in total are attempted. Similarly, this process continues until the last shift, from n−1 to n, which takes (n−k+1)k swaps. Therefore, for each element like k in set P′, there are k \cdot \sum_{i=1}^{n-k+1} i = O(k(n-k)^2) swaps attempted. To shift the initial set P′ to its final set, the total number of swaps is \sum_{i=1}^{k} i(n-k+2)(n-k+1)/2 = O(k^2(n-k)^2). Since the time to execute each swap is O(k(n−k)), the time for one placement(k) is O(k^3(n-k)^3). We try all possible k from 1 to n; hence, the worst-case time complexity of Algorithm 5 is \sum_{k=1}^{n} O(k^3(n-k)^3) = O(n^7), which is much less than that of the brute-force method (where we try all the possibilities to finally find an optimal placement), which takes O(n!) time (when n ≥ 2).
By employing local search heuristics, Algorithm 5 completes in polynomial time. Note that Algorithm 5 is executed during network planning by the network controller, which has sufficient computation and energy resources, and the results are preloaded into each sensor's memory before node deployment. Individual sensors do not need to perform this computation themselves.
Performance Evaluation
We conducted simulations to evaluate Algorithm 5. In the simulation, we set up a network with n = 81 and ran placement(k) for k = 0, · · · , 80.
As shown in Figure 4.8, the minimum cost obtained by our algorithm is 96, achieved at k = 14, matching the globally optimal value returned by the brute-force method. We also evaluate, through simulation, how well our algorithm determines the locations where the proxies should be deployed. As can be seen in Figure 4.9, the proxy positions returned by our algorithm and the optimal positions obtained by the brute-force method are almost identical. Hence, we believe that our algorithm can quickly find a placement solution that closely approximates the optimal one.
We also examine the impact of network scale on the number of proxies chosen by our placement algorithm. The results are shown in Figure 4.10. When the network size is 81 (9 × 9), 169 (13 × 13), 289 (17 × 17), 441 (21 × 21), and 625 (25 × 25), the number of proxies determined by our algorithm is 14, 22, 37, 50, and 64, respectively. These results are also used in the performance evaluation in Section 4.3.5. As expected, the optimal number of proxies grows with the network scale.
4.3.2.3 Proxy Operations
Upon receiving a message, a proxy performs the following operations to reduce the
network traffic while preserving event source unobservability:
• First, the proxy decrypts the message so that it can differentiate real event messages from bogus ones;

• Second, the proxy drops the message immediately if it is bogus. If, on the other hand, the message corresponds to a real event, the proxy re-encrypts the decrypted message;

• Third, the proxy puts the re-encrypted real event message into its message buffer. After a constant time, one message, either bogus or real, is sent out by the proxy node.
Let us discuss these internal operations of a proxy in more detail. These operations can be understood using the state transition diagram of a proxy shown in Figure 4.11. As shown, a proxy is initialized to be in the Waiting state. Upon receiving a message, the proxy first decrypts the packet. If the received message
Figure 4.11. State transitions of proxies (there are three states: waiting, bogus, real).
is bogus, the proxy drops this message immediately. If its message buffer is now
empty, the proxy changes its state to Bogus. Otherwise, if the received message
is real then the proxy saves the real message in its message buffer. Whenever
there are real messages in the message buffer to be sent out, the proxy switches its
state to Real. Additionally, if the proxy is in state Bogus, it will change to state
Real if a real message is received and remain in state Bogus if a bogus message is
received. On the other hand, if the proxy is in state Real, it will remain in state
Real regardless of the type of message received as long as there is at least one real
message in the buffer.
Recall that the goal of each proxy is to ensure that the outgoing traffic conforms to a rate of r_proxy requests per time unit. Recall also that in this work we consider outgoing traffic with requests (messages) equally spaced in time; that is, a proxy emits a message once every T_proxy = 1/r_proxy time units. To achieve this, the proxy repeats the following process over non-overlapping, successive intervals of duration T_proxy. During each such interval, the proxy adds all received real messages to its buffer. If the buffer is full, new incoming real messages are dropped (in our implementation for Mica2 sensor nodes, such messages are buffered in the large flash memory to avoid dropping). At the end of each interval, the proxy sends out a message depending on its state: if it is in state Bogus, a bogus message is sent out; if it is in state Real, a real message is sent out (FIFO ordering is used in case multiple real messages have
Figure 4.12. Delay and max queue length under λ^real_P = 1/60 per time unit. (a) Delay; (b) Max queue length.
buffered up).
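The per-interval behavior just described can be captured in a small Python sketch. The message encoding and all names below are our own; in the actual scheme, real messages would be decrypted with the pairwise key and re-encrypted with the BS key:

```python
from collections import deque

BOGUS, REAL = "bogus", "real"

def proxy_epoch(buffer, incoming, capacity=16):
    """One T_proxy interval of a PFS proxy: inspect each arrival, drop the
    bogus ones, buffer the real ones, then emit exactly one message at the
    end of the interval -- real if any is queued, bogus otherwise."""
    for kind, payload in incoming:          # stand-in for decryption
        if kind == REAL and len(buffer) < capacity:
            buffer.append(payload)          # re-encrypted and queued
        # bogus messages are silently discarded
    if buffer:
        return (REAL, buffer.popleft())     # FIFO ordering
    return (BOGUS, None)

buf = deque()
out1 = proxy_epoch(buf, [(BOGUS, None), (REAL, "e1"), (REAL, "e2")])
out2 = proxy_epoch(buf, [(BOGUS, None)])
out3 = proxy_epoch(buf, [])
# out1 == ("real", "e1"), out2 == ("real", "e2"), out3 == ("bogus", None)
```

Note that exactly one message leaves the proxy per interval regardless of what arrived, which is the constant-rate eviction that the security analysis relies on.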
While the introduction of buffering at the proxies enables the network to achieve its goal of event source unobservability (as will be shown formally in Section 4.3.2.4), it degrades performance by introducing additional delays in the delivery of real event messages. How can we estimate the delay in message delivery introduced by PFS? We use a queueing-theoretic model of a proxy to conduct a simple analysis of this delay.
Let λ_P denote the rate at which messages arrive at the proxy P. Recall that our design makes source nodes generate traffic with exponentially distributed inter-arrival times at a rate r_source that is identical across all the source nodes. Therefore, λ_P = n_P · r_source, where n_P is the number of source nodes associated with the proxy P. Since the processing time at the proxy (at the millisecond level for a Mica node, mainly due to decryption/re-encryption) is significantly smaller than T_proxy, we make the simplifying assumption that this processing time can be ignored. In other words, the delay in the delivery of a real message is essentially the sum of the delays caused by buffering at the source node where the event occurred and at the proxy that the message passes through on its way to the BS. Finally, we assume that the fraction f^real_P of the messages arriving at P is real. We can now view the proxy P as an M/G/1 [81] queueing system since: (i) the arrival process is Poisson with rate f^real_P · λ_P, and (ii) the time a real request spends in the message buffer may be considered a random variable s_P representing the "service time" of the request. Note that the notion of servicing in this queue is an
Figure 4.13. Delay and max queue length under T_proxy = 5 time units. (a) Delay; (b) Max queue length.
abstract one: the time spent waiting in the message buffer is modeled as the service time at a hypothetical server. Also, our assumption above implies that the dummy traffic arriving at P plays no role in our analysis; therefore, "request" will mean a real request in the rest of the analysis.
We can derive some useful properties of the random variable s_P using standard queueing theory. Let p_P denote the probability that a newly arriving request finds the queue representing P idle. The well-known PASTA property ("Poisson Arrivals See Time Averages") [81] states that for queueing systems with Poisson arrivals, the fraction of requests finding the queue idle upon arrival is exactly the same as the fraction of time the system is idle. This implies that p_P is equal to the probability that the queue is idle. Clearly, requests that arrive when the queue is idle experience an average service time of T_proxy/2, because in this case service times follow a uniform distribution U[0, T_proxy] (and hence the variance is T^2_proxy/12); other requests experience a service time of T_proxy, since requests emerge from P at the rate r_proxy. Therefore, we have:

E[s_P] = p_P \frac{T_{proxy}}{2} + (1 - p_P) T_{proxy}; \qquad (4.4)

E[s_P^2] = p_P \frac{T_{proxy}^2}{3} + (1 - p_P) T_{proxy}^2. \qquad (4.5)
In Formula (4.5), the term T^2_{proxy}/3 is derived from the variance and the mean of the service time when the queue is idle (E[s^2] = Var[s] + E^2[s] = T^2_{proxy}/12 + T^2_{proxy}/4 = T^2_{proxy}/3). Furthermore, queueing theory [81] gives us:
p_P = 1 - f_P^{real} \cdot \lambda_P \cdot E[s_P]. \qquad (4.6)

Combining these equations, and denoting f_P^{real} \lambda_P as \lambda_P^{real}, we have:

E[s_P] = \frac{T_{proxy}}{2 - \lambda_P^{real} T_{proxy}}; \qquad (4.7)

E[s_P^2] = (1 - \lambda_P^{real} E[s_P]) \frac{T_{proxy}^2}{3} + \lambda_P^{real} E[s_P] T_{proxy}^2. \qquad (4.8)

The average delay d_P experienced by a message at the proxy P is then given by the following result, known for the average sojourn time of a request in our M/G/1 queueing system:

d_P = E[s_P] + \frac{\lambda_P^{real} (E^2[s_P] + E[s_P^2])}{2 (1 - \lambda_P^{real} E[s_P])}. \qquad (4.9)
To verify our theoretical results, we use CSIM [82] to simulate this queueing model. In our simulation setup, T_proxy varies from 5 to 50 time units and λ^real_P varies from 1/10 to 1/100 per time unit. The resulting queueing delay is shown in Figure 4.12(a) and Figure 4.13(a); the error bars show the 95% confidence interval of the simulation results. From these figures, we can see that our theoretical values and simulation results match well. In Figure 4.12(a), if the incoming rate λ^real_P is fixed, the average delay increases as the interval T_proxy increases, because of the longer queue length and buffering time. On the other hand, in Figure 4.13(a), if the interval T_proxy is fixed, the average delay decreases as the incoming rate λ^real_P decreases, since the buffer is less occupied.
We also use simulation to check the queue length. Our simulation results in Figure 4.12(b) and Figure 4.13(b) show the maximum queue lengths under different simulation settings. These results can provide guidance for allocating a proper buffer size: if we select a larger interval T_proxy, the buffer size should be increased accordingly; and if the incoming rate λ^real_P is lower, a smaller buffer suffices for our application.
4.3.2.4 Security Analysis
According to [83], we have the following definitions on event source unobservability.
Definition 4. If for each possible observation O that an attacker A can make,
the probability of an event E is equal to the probability of E given O, that is:
∀O, P (E) = P (E|O), then E is called unobservable.
Definition 5. A system has the property of event source unobservability if
any event E happening in this system is unobservable: ∀E, ∀O, P (E) = P (E|O).
Next, we prove that our system achieves the property of event source unobservability.
THEOREM 1. PFS has the property of event source unobservability.
Proof. (Sketch) P(E) = P(E|O) in Definition 4 means that event E and observation O are independent (P(E ∩ O) = P(O) · P(E|O) = P(O) · P(E)). Therefore, according to the above definitions, to prove that PFS has the property of event source unobservability, we need to prove that any event E is independent of the observation O that the attacker A makes.
Let us first consider all the possible observations that an attacker can make of the system. The adversary can observe messages being generated at each cell with intervals following a certain probability distribution. However, as the messages are encrypted and of the same length, the attacker cannot distinguish the real ones from the dummy ones. These messages are then relayed on visible multi-hop paths to proxies. Proxies drop and delay messages, but due to message re-encryption and constant-rate eviction, the attacker cannot tell which messages are dropped and which are forwarded. The outgoing messages from proxies are finally forwarded to the BS always in the same way. Thus, even with all these observations, the attacker cannot gain any additional information about real events: the occurrence of a real event E is independent of the attacker's observation O. Therefore, every real event is unobservable to the attacker, and by Definition 5, PFS has the property of event source unobservability.
Figure 4.14. Improvement of TFS over PFS.
Figure 4.15. Delay in TFS under different T_proxy and λ (tree level l = 2). (a) λ = 1/60; (b) T^1_proxy = 5.
4.3.3 Tree-based Filter Scheme (TFS)
If the number of proxy nodes is large enough, the dummy traffic can be reduced further by allowing messages to be filtered at multiple proxies on their way from source nodes to the BS. Note that in PFS, even though a message may traverse multiple proxies, it is filtered only at the default proxy of the cell that it originates from. Building upon this core idea, we propose a tree-based filtering scheme (TFS) in which the proxies in our network are organized into a tree rooted at the BS. Proxies in TFS thus form a hierarchy, with each proxy having a parent node and possibly multiple child nodes. With the
resulting multi-level filtering, we expect lower network traffic because more dummy
messages will be dropped before they reach the BS. The reduction in traffic due to
TFS, however, will come at the expense of increased latency of real event delivery
since each message may now incur buffering-induced delays at multiple proxies
on its way to the BS. This trade-off between traffic and latency is central to the
research issues involved in the design and efficacy of TFS.
4.3.3.1 Hierarchical Proxy Placement
Randomly picking proxies might assign more proxies to some paths than to others, which may limit the overall filtering efficacy of TFS. Therefore, similar to our approach in PFS, we devise a heuristic based on localized search to derive an effective proxy placement efficiently. Through experiments, we found that placing more constraints on the proxy placement improves the result of the algorithm (i.e., brings it closer to the globally optimal solution). Hence, the algorithm takes the total number of levels of the proxy tree as a parameter. Another benefit of this mechanism is that, by assigning a small number of levels, we can also reduce the real event report latency, since real event messages need to go through fewer levels of proxies before they reach the BS.
We adapt Algorithm 5 to implement the proxy placement algorithm for TFS-l (the number of proxy levels l is specified as a parameter). We do not present its details, since the two algorithms follow the same local search heuristic and differ only slightly. Unlike in Algorithm 5 for PFS, here the set P′ has an internal hierarchical structure. Because of this, an inner-swap, which exchanges the positions of proxies within the current set, might also reduce the cost. The proxy number k is iterated from 1 to n; for each given k, we try all the combinations of proxy numbers at every level and record the set with the minimum cost as P. The algorithm keeps running until no swap or inner-swap that could reduce the cost exists, so that the total cost reaches a local minimum.
We also briefly analyze the time complexity of this new algorithm. Analyzing the average-case time complexity is difficult, since quantifying the total number of iterations in each placement(k) is unwieldy. Through repeated experiments and curve fitting, we find that the average time complexity of this algorithm is about O(n^{3+l}). In the worst case, the time complexity of this algorithm is O(n^{6+l}). When l = 1, this time complexity is exactly the same as that of Algorithm 5, which is consistent with our analysis.
We use simulations to compare the traffic generated by the two schemes. With n = 81 and the proxy number k ranging from 5 to 40, we examine the traffic trends under different numbers of tree levels (l = 1, 2, 3). The improvement of TFS over PFS in overall traffic is evident in Figure 4.14. However, event latency may increase in TFS, since each message may go through several proxies, each of which imposes a buffering period.
4.3.3.2 Multi-level Buffering Delays
In the hierarchical organization of proxies employed in TFS, the proxies at each level have an interval that determines the rate at which traffic emerges from them. For l levels in the hierarchy, we use the notation T^1_proxy, . . . , T^l_proxy for the intervals, with level 1 corresponding to the leaf nodes and level l corresponding to the children of the BS. The analysis in PFS extends directly to the leaf-level proxies in the TFS hierarchy. For intermediate proxies, however, the situation is more complicated. The incoming traffic to an intermediate proxy P_int consists of: (i) the outgoing traffic from one or more other proxies that are its children in the tree imposed on the proxies by the placement algorithm, and (ii) messages from the cells that P_int is in charge of. The outgoing traffic from a proxy described in PFS was designed to follow a constant rate r_proxy, whereas the messages received from a cell arrive according to a Poisson process with rate r_source. As a result, the incoming traffic at the intermediate proxies in TFS does
not conform to a Poisson process (as it did for the proxies in PFS and continues to
do for the “leaf” nodes in TFS). This does not compromise the privacy guarantees
we wish to provide. However, it does mean that the applicability of our M/G/1
queueing model for determining the buffering-induced delay at such intermediate
proxies becomes questionable. This can be addressed easily by re-designing the
buffering mechanism in our proxies such that they emit traffic that conforms to
a Poisson process instead of generating the deterministic traffic described in PFS.
That said, we note that the estimation of delay is not a key concern in our current research, since our focus is on providing privacy for relatively delay-tolerant applications. Therefore, we continue with a proxy design that generates output
traffic deterministically with fixed time-gaps between messages and use the M/G/1
model as an approximate representation of a proxy. We evaluate this approxima-
tion empirically below.
We conduct simulations to check the relationship of real event report delay with
the buffer intervals for each level as well as the traffic arrival rate. The real event
report delay under consideration is an aggregate delay since each message may go
through several proxies, each of which has a buffering period. In Figure 4.15(a), we
fix the traffic rate arriving at an illustrative 2-level hierarchy consisting of proxies
P1 and P2 (P1 is the only child of P2 and the only source of input traffic to P2 in
this example) to be 1/60 per time unit and observe the change of delay introduced
by the buffering at P1 and P2 with different T^1_proxy and T^2_proxy. Clearly, delay increases as T^1_proxy or T^2_proxy increases. In Figure 4.15(b), we fix T^1_proxy = 5 time
units and present the change of delay with the traffic arrival rate, with the results
showing that delay decreases as the mean arrival rate decreases.
4.3.3.3 Security Analysis
THEOREM 2. TFS has the property of event source unobservability.
The correctness of this theorem can be proved using arguments similar to those used to prove Theorem 1. We omit the details due to space limitations.
4.3.4 Practical Considerations
In this section, we address two key sets of issues that arise in the effective realization and deployment of the privacy-preserving schemes developed in this work.
4.3.4.1 System Parameters
Choosing appropriate values for the source traffic generation rate and the buffering intervals is important to balance privacy, delay, and message overhead according to the needs of our applications. We note that, in achieving event source unobservability, delay and overhead are tightly related to each other. How to choose parameters depends on the relative criticality of these two requirements in our application.
Figure 4.16. Performance under different bogus message generation rates (heavy-rate real events): (a) Overhead (pkts/s), (b) Delivery Ratio, and (c) Delay (s), each plotted against the mean of bogus messages (s) for the baseline, PFS, and TFS-2 schemes.
Figure 4.17. Performance under different bogus message generation rates (light-rate real events): (a) Overhead (pkts/s), (b) Delivery Ratio, and (c) Delay (s), each plotted against the mean of bogus messages (s) for the baseline, PFS, and TFS-2 schemes.
The first parameter to be decided is the source traffic rate (consisting of real and
dummy messages) rsource. If the dummy traffic rate is too high, it will unnecessarily
cause high message overhead; if it is too low, real event messages will experience
high transmission latency at the sources. It is desirable to have this rate as close
as possible to the average real event message rate. We believe that in practice it
is not difficult to determine an appropriate rsource based on historical information
about event generation at the sources.
The more interesting issue concerns the buffering interval Tproxy employed by
the proxies in PFS (or buffering intervals, one per level, in TFS; we conduct our
discussion in the context of PFS but the ideas extend to TFS as well). Since Tproxy
determines the rate at which messages leave a proxy, we need to ensure that it is
chosen such that this departure rate exceeds the aggregate rate at which real
messages arrive at the proxy. Otherwise, real event messages may be dropped at
proxies. That is, we require, for every proxy P, agg(λ^real_P) < 1/T_proxy. This gives us

1/T_proxy > agg(λ^real_P)
The choice of Tproxy concerns balancing the trade-off among the following op-
posing trends. Picking a small Tproxy reduces the probability of messages being
dropped due to the finite buffer at a proxy overflowing. Also, smaller values of
Tproxy result in smaller values of additional delay introduced by the buffering mech-
anism at a proxy. On the other hand, picking a large Tproxy reduces the overall
traffic by generating fewer dummy messages at the proxies. The choice of Tproxy
will depend on the relative criticality of these opposing requirements. As already
discussed, our queuing model serves as a reasonable predictor of the delay caused
by the choice of a particular Tproxy. Also, given Tproxy and other relevant parameters, we may easily calculate the communication cost defined in Formula 4.3. Thus, we may adjust Tproxy to achieve acceptable delay and communication cost.
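The feasibility constraint above can be captured in a tiny helper (an illustrative sketch; the function name and the `safety` headroom factor are our own assumptions, not part of the thesis):

```python
def max_feasible_tproxy(real_rates, safety=0.8):
    """Upper bound on T_proxy so that a proxy's deterministic departure rate
    (1/T_proxy) exceeds the aggregate real-message arrival rate agg(lambda_real).
    `safety` < 1 leaves headroom so the finite buffer rarely overflows."""
    agg = sum(real_rates)   # aggregate real arrival rate at this proxy
    if agg <= 0:
        raise ValueError("need a positive aggregate rate")
    return safety / agg     # then 1/T_proxy = agg/safety > agg
```

Any smaller Tproxy also satisfies the constraint, at the cost of more dummy messages; this is exactly the overhead/delay trade-off discussed above.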
4.3.4.2 Role Shifting among Proxy Nodes
The proxy nodes do more work than other sensor nodes: (i) reception and filtering
of dummy messages, (ii) decryption /re-encryption of real event messages, and
(iii) buffering and forwarding of real event messages. Therefore, proxy nodes drain
their energy resources faster than normal nodes. To prolong the lifetime of the
network, after a certain time period, new proxies may be selected to replace old
ones. The cell head may choose another sensor node in the same cell to act as the proxy, i.e., the proxy's cell-based position does not change. Another choice is to rotate the proxies' positions in the network, as determined by the BS. A new proxy notification message that includes the map of new proxies could be broadcast to the entire
network. In this case we need to deal with the issue of hand-off. After a shift, real
event messages may stay in the buffers of some old proxies. A reasonable solution
is to have the old proxies behave like regular nodes and forward all the real event
messages to their closest new proxies.
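The hand-off just described can be sketched as follows (positions, names, and the distance metric are illustrative assumptions):

```python
def handoff(old_buffer, new_proxies, old_pos):
    """After a role shift, the old proxy behaves like a regular node and
    forwards every buffered real event message to its closest new proxy.
    Positions are (x, y) cell coordinates."""
    def dist2(a, b):
        # squared Euclidean distance is enough for choosing the minimum
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    target = min(new_proxies, key=lambda p: dist2(p, old_pos))
    forwarded = [(msg, target) for msg in old_buffer]
    old_buffer.clear()          # the old proxy's buffer is drained
    return forwarded
```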
4.3.4.3 Insider Attacks
Although we did not consider sensor node compromises in our attack model of Section 4.3.1.2, we note that our schemes are robust to this kind of insider attack for the following reasons. Compromised source sensors do not know which packets will be dropped by proxies, and compromised forwarding sensors do not have the appropriate decryption keys. Also, false packets injected by these sensors to launch a denial-of-service attack will be dropped by the proxies. Therefore, such compromised sensors cannot do much to break our scheme. Compromised proxy sensors may be a problem because they know which packets are from real sources, but this kind of privacy-related node compromise in sensor networks has been discussed in our previous work [84].
4.3.5 Performance Evaluation
In this section, we use simulations to compare the performance of the PFS and TFS schemes with the baseline scheme (i.e., a scheme without proxies, in which every sensor sends either real or bogus messages at certain intervals, as described in the Introduction). After that, implementation results are presented.
4.3.5.1 Simulation Setup
The simulation is based on GloMoSim [85]. In the simulation, 625 sensor nodes are deployed in a 1000m×1000m area. For each sensor node, the transmission range is 50m. The BS is located at the center of the field. Five cells (sources) are randomly selected to generate real event messages, and the other cells generate bogus messages, with intervals following exponential distributions. For real event generation, we consider two cases, a heavy-traffic case and a light-traffic case, with 10s and 400s as the mean inter-message intervals respectively, to simulate various situations that may occur. For example, if an animal stays still to rest, the real event generation rate from the detection sensors may be low; if the animal moves fast, the real event message generation rate will be adjusted to be higher. In addition, the mean of the dummy traffic interval varies from 1s to 200s. The buffer interval Tproxy is set to 5s, and the buffer size is 10 packets. For TFS, we set the tree depth to two levels.
During our evaluation, we use three metrics: message overhead, packet delivery ratio, and delay. Message overhead is defined as the total number of messages multiplied by the number of hops they traverse. Packet delivery ratio is the percentage of real event messages that successfully reach the BS. Delay is the time difference between when a real event message is generated and when it reaches the BS.
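Given a per-message log, the three metrics could be computed as sketched below (the record layout is an assumption for illustration, not the actual GloMoSim trace format):

```python
def compute_metrics(log, generated):
    """Compute the three evaluation metrics from a per-message log.
    `log`: records for messages that reached the BS, each a dict with
    'hops' (hops traversed), 'real' (bool), and 'gen'/'recv' timestamps.
    `generated`: total number of real event messages generated."""
    # message overhead: every message counted once per hop it traversed
    overhead = sum(m['hops'] for m in log)
    real = [m for m in log if m.get('real', False)]
    delivery_ratio = len(real) / generated if generated else 0.0
    # average end-to-end delay of delivered real event messages
    delay = (sum(m['recv'] - m['gen'] for m in real) / len(real)) if real else 0.0
    return overhead, delivery_ratio, delay
```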
4.3.5.2 Simulation Results
Figure 4.16 and Figure 4.17 show the impact of different bogus message generation rates (in terms of the mean of the bogus message intervals) on the message overhead, the packet delivery ratio, and the delay. They correspond to the cases of heavy-rate real events and light-rate real events, respectively.
From Figure 4.16(a) and Figure 4.17(a), we can see that in all three schemes the message overhead increases as the bogus message generation rate increases (i.e., as the means decrease along the x-axis of these figures). However, the message overhead of PFS and TFS increases much more slowly than that of the baseline scheme. Among them, TFS incurs less than 1/10 the overhead of the baseline scheme, and is hence the most bandwidth-efficient.
Figure 4.16(b) and Figure 4.17(b) show that the packet delivery ratio decreases as the bogus message generation rate increases. This is because the chance of MAC-layer collision increases with the dummy traffic rate. Since we did not run any end-to-end reliable protocol, some messages, including real event messages, may be lost. (Indeed, end-to-end reliability is likely to make the situation worse.) The figures also show that the packet delivery ratio of the baseline scheme is very low (less than 20%) when the bogus message generation rate is high, but both PFS and TFS have delivery ratios close to 100%.
Figure 4.16(c) and Figure 4.17(c) indicate, without much surprise, that the delay of PFS and TFS is normally much higher than that of the baseline scheme. The delay of TFS is about twice that of PFS because the TFS tree here is two-level. We note that in some cases, as shown in Figure 4.17(c), when the network traffic becomes very heavy (the mean of bogus messages is 1s), the delay of PFS is actually lower than that of the baseline scheme due to heavy collisions in the baseline scheme.
In summary, both PFS and TFS are good choices because of high packet deliv-
ery ratio and low message overhead, whereas the baseline scheme normally incurs
low delay.
4.3.5.3 Prototype Implementation
To study the practicality of our schemes, we implement a prototype of our PFS
scheme on top of TinyOS [86] for Mica2 motes. Since a mote has only 4KB RAM
space, it is not always possible to buffer all the real messages. To avoid message
dropping in the case of burst events, in our implementation overflowed packets are
cached in the 512-KB flash memory, which is the event logging space available to
Mica2 motes. Whenever the buffer has spare space, a cached message is moved to
the end of the buffer immediately. An outside observer would not see the caching
and moving operation inside a mote.
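The RAM-buffer-with-flash-overflow behavior can be mimicked in a few lines (a host-side Python sketch of the mechanism, not the TinyOS/nesC code of the prototype; capacities are illustrative):

```python
from collections import deque

class ProxyBuffer:
    """RAM buffer with flash-backed overflow, mimicking the Mica2 prototype:
    when the small RAM queue is full, packets spill to (simulated) flash and
    are moved back as soon as RAM has spare space, preserving FIFO order."""
    def __init__(self, ram_slots=10):
        self.ram = deque()
        self.flash = deque()        # stands in for the 512-KB logging flash
        self.ram_slots = ram_slots

    def push(self, pkt):
        if len(self.ram) < self.ram_slots:
            self.ram.append(pkt)
        else:
            self.flash.append(pkt)  # overflow is cached in flash

    def pop(self):
        pkt = self.ram.popleft() if self.ram else None
        # refill RAM from flash immediately, as in the implementation
        while self.flash and len(self.ram) < self.ram_slots:
            self.ram.append(self.flash.popleft())
        return pkt
```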
Our code consumes 13.7KB (out of 128KB) in the program memory and 399B
(out of 4KB) in the data memory. We test the queuing behavior of a mote under
various message arrival rates, and find the results agree with what we may get
through queuing analysis and simulations.
4.3.6 Conclusion
In this section, we solve the optimal proxy placement problem by using local search heuristics and propose a Proxy-based Filtering Scheme (PFS) and a Tree-based Filtering Scheme (TFS), which are simple yet efficient solutions for preserving event source unobservability in sensor networks. The two methods work together to maximally reduce the network traffic while increasing the delivery ratio, without sacrificing privacy. Performance evaluation demonstrates that our schemes can largely improve system performance compared with the baseline scheme.
Chapter 5
Secure DCS System Under Internal, Local, Active Attacks
5.1 Introduction
Sensor networks are envisioned to be extremely useful for a broad spectrum of
emerging civil and military applications [87], such as remote surveillance, habitat
monitoring, and collaborative target tracking. As sensor networks scale in size over time, so does the amount of sensing data generated. The large volume of data
coupled with the fact that the data are spread across the entire network creates
a demand for efficient data dissemination/access techniques to find the relevant
data from within the network. This demand has led to the development of Data
Centric Sensor (DCS) networks [63, 88, 89].
DCS exploits the notion that the nature of the data is more important than
the identities of the nodes that collect the data. Thus, sensor data, as contrasted to sensor nodes, are “named” based on attributes such as event type (e.g., elephant-sightings) or geographic location. According to their names, the sensing data are
passed to and stored at corresponding sensor nodes determined by a mapping
function such as Geographic Hash Table (GHT) [63]. As the sensing data with the
same name are stored in the same location, queries for data of a particular name
can be sent directly to the storing nodes using geographic routing protocols such
as GPSR [90], rather than flooding the query throughout the network.
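A GHT-style publicly known mapping can be sketched as follows (simplified to a grid of cells for illustration; real GHT hashes names to geographic coordinates and routes to them with GPSR):

```python
import hashlib

def public_storage_cell(event_name, n_rows, n_cols):
    """GHT-style publicly known mapping: any node can hash an event name
    to the same storage cell, so queries go directly to that cell instead
    of flooding the network."""
    h = hashlib.sha256(event_name.encode()).digest()
    row = int.from_bytes(h[:4], 'big') % n_rows
    col = int.from_bytes(h[4:8], 'big') % n_cols
    return (row, col)
```

Because the hash is public, anyone who knows the event names can compute these locations, which is precisely the privacy weakness pDCS addresses later in this chapter.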
Figure 5.1 shows an example of using a DCS-based sensor network to monitor
the activities or presence of animals in a wild animal habitat. The sensed data can
be used by zoologists to study the animals or by an authorized hunter to locate
certain types of animals (e.g., boars and deer) for hunting. With DCS, all the
sensing data regarding one type of animals are forwarded to and stored in one
location. As a result, a zoologist only needs to send one query to the right location
to find out the information about that type of animals. Similarly, a soldier can
easily obtain enemy tank information from storage sensors through a DCS-based
sensor network in the battlefield.
Figure 5.1. A DCS-based sensor network which can be used by zoologists (who are authorized to know the locations of all animals) and hunters (who should only know the locations of boars and deer, but not elephants).
In many cases, DCS-based data dissemination offers a significant advantage over
previous external storage-based data dissemination approaches, where an external
base station (BS) is used for collecting and storing the sensing data. If many queries are issued from nodes within the network [91, 89], the external storage-based scheme is very inefficient since data must be sent back and forth between the sensors and the BS, causing the nodes close to the BS to die rapidly due to energy depletion. Further, for sensor networks deployed in hostile environments such as a battlefield, an external BS may not be available because the BS is an attractive target for physical destruction and compromise, thus becoming a single point of failure
from both security and operation perspectives. In contrast, the operation of a DCS
system does not assume the availability of a persistent BS; instead, mobile sinks
(MSs) such as mobile sensors, users, or soldiers, may be dispatched on-demand to
collect the stored data (or to perform other tasks) on appropriate occasions.
The previous DCS systems, however, were not designed with security in mind.
All data of the same event type are stored at the same node [74, 63] or several
nodes [88, 89] based on a publicly-known mapping function. As long as the map-
ping function and the types of events monitored in the system are known, one can
easily determine the locations of the sensors storing different types of data. In our
previous example, a zoologist can use the DCS system to locate any animals of
interest, whereas a hunter is only permitted to hunt certain kinds of animals (e.g.,
boars and deer) but not the protected ones (e.g., elephants). Nevertheless, a non-conforming hunter may acquire the locations of the protected animals for hunting purposes. As such, security and privacy should be provided for DCS systems.
Securing DCS systems is complicated by the network scale, the highly con-
strained system resource, the difficulty of dealing with node compromises, and
the fact that sensor networks are often deployed in unattended and hostile envi-
ronments. The low cost of sensor nodes (e.g., less than $1 as envisioned for smart
dust [92]) precludes the built-in tamper-resistance capability of sensor nodes. Thus,
the lack of tamper-resistance coupled with the unattended nature gives an adver-
sary the opportunity to break into the captured sensor nodes to read out sensor
data and cryptographic keys.
We present pDCS, a privacy enhanced DCS system for unattended sensor net-
works. To the best of our knowledge, pDCS is the first one to provide security and
privacy to data-centric sensor networks. Specifically, pDCS provides the following
features. First, even if an attacker can compromise a sensor node and obtain all its
keys, he cannot decrypt the data stored in the compromised node. Second, after an
attacker has compromised a sensor node, he cannot know where this compromised
node stored its event data generated in the previous time intervals. Third, pDCS
includes very efficient key management schemes for revoking a compromised node
once its compromise has been detected, thus preventing an attacker from knowing
the future storage location for particular events. Finally, pDCS provides a novel
query optimization scheme to significantly reduce the message overhead without
losing any query privacy.
The salient features of pDCS are due to the following techniques. Instead of
using a publicly-known mapping function, pDCS provides private data-location
mapping based on cryptographic keys. The keys are assigned and updated to
thwart outsider attackers or insider attackers from deriving the locations of the
storage cells for previous sensor data. The key management scheme for updating
compromised keys makes a seamless mapping between location keys and logical
keys. On the other hand, as private mapping may reduce the efficiency of sending
MS queries, we also propose several query optimization techniques based on Eu-
clidean Steiner Tree [93] and keyed Bloom Filter to minimize the query overhead
while providing certain query privacy.
The rest of the chapter is organized as follows. We first discuss the assump-
tions and design goal in Section 5.2. Section 5.3 presents several secure mapping
functions, followed by a key management scheme and optimization techniques for
sending queries. In Section 5.4, we compare the performance of several query
methods. Finally, we conclude this chapter in Section 5.5.
5.2 Models and Design Goal
5.2.1 Network Model
As in other DCS systems [74, 63, 88], our pDCS system also assumes that a sensor
network is divided into cells (or grids), where each pair of nodes in neighboring cells can communicate directly with each other. A cell is the minimum unit for detecting events (referred to as a detection cell) and for storing sensor data (referred to as a storage cell); a cell head coordinates all the actions inside a cell. Each cell has a unique id, and every sensor node knows in which cell it is located, through GPS when affordable. In cases where GPS services are not available or GPS devices are too expensive, attack-resilient GPS-free localization techniques [94, 95, 96, 97] may be employed instead, because pDCS does not rely on absolute coordinates. For example, in Verifiable Multilateration (VM) [94], distances are measured based on radio signal propagation time, which provides secure and reasonably accurate sensor positioning.
We assume the events of interest to the MSs are classified into multiple types.
For example, when a sensor network is deployed for monitoring the activities and
locations of the animals in a wild animal habitat, all the activities of a certain kind
of animal may be considered as belonging to one event type.
We do not assume a fixed BS in the network. Instead, a trusted MS may
enter the network at an appropriate time and work as the network controller for
collecting data or performing key management. We also assume the clocks of
sensor nodes in a network are loosely synchronized based on an attack-resilient
time synchronization protocol [98, 99].
5.2.2 Attack Model
Given the unattended nature of a sensor network, an attacker may launch various
security attacks in the network at all layers of the protocol stack [100, 101, 102].
Due to the lack of a one-for-all solution, in the literature these attacks are studied
separately and the proposed defense techniques are also attack-specific. As such,
instead of addressing all attacks, we will focus on the specific security problems
in our pDCS network. We assume that in a pDCS network the (ultimate) goal of
an attacker is to obtain the event data of his interest. To achieve this goal, an
attacker may launch the following attacks.
• Passive Attack An attacker may passively eavesdrop on the message trans-
missions in the network.
• Query Attack An attacker may simply send a query into the network to
obtain the sensor data of interest to him.
• Readout Attack An attacker may capture some sensor nodes and read out
the stored sensor data directly. It is not hard to download data from both
the RAM and ROM spaces of sensor nodes (e.g., Mica motes [20]).
• Mapping Attack In this attack, the goal of an attacker is to identify the
mapping relation between two cells. Specifically, he may either identify the
storage cell for a specific detection cell or reversely figure out the detection
cell for a storage cell of his interest. A mapping attack is normally followed by a readout attack.
The passive attack can be relatively easily addressed by message encryption
with keys of sufficient length, and the query attack can be addressed by source
authentication [30] so that a node only answers queries from an authorized en-
tity. Given that compromising nodes is much easier than breaking the underlying
encryption/authentication algorithm, we assume that the readout attack and the
mapping attack are more preferable to the attacker. Note that letting detection
cells encrypt sensor data and store the encrypted data locally cannot address the
readout attack because an attacker can read out the encryption keys from the
captured sensor nodes as well.
5.2.3 Security Assumption
We assume that an authorized mobile sink (MS) has a mechanism to authenticate
broadcast messages (e.g., based on µTESLA [30]), and every node can verify the
broadcast messages. We also assume that when an attacker compromises a node
he can obtain all the sensitive keying material possessed by the compromised node.
Note that although technically an attacker can compromise an arbitrary number of current-generation sensor nodes without much effort, we assume that only nodes in a small number (s) of cells have been compromised. For instance, sensor nodes may not be easy to capture because of their geographic locations or their tiny sizes. Also, the attacker needs to spend more time to compromise more sensor nodes, which may increase the chance of being identified. For simplicity, we say a cell is compromised when at least one node in the cell is compromised. To deal with the worst-case scenario, we allow an attacker to selectively compromise s cells.
We assume the existence of anti-traffic analysis techniques if so required. If an
attacker is capable of monitoring and collecting all the traffic in the network, he
may be able to correlate the detection cells and the storage cells without knowing
the mapping functions. Therefore, we assume one of the existing schemes [103, 59,
60, 104] may be applied to counter traffic analysis if the attacker is assumed to be
capable of analyzing traffic.
5.2.4 Design Goal
Our main objective is to prevent an attacker from obtaining the data of his interest
in a DCS network through various attacks. In more detail, our goal is to address
the types of attacks that are specific to pDCS, i.e., passive attack, query attack,
readout attack, and mapping attack. As passive attack and query attack are easy
to address, below we mainly discuss the requirements to be met for addressing the
readout attack and the mapping attack.
• Event Data Confidentiality Even if an attacker can compromise a sensor
node and obtain all its keys, he should be prevented from knowing the event
data stored in the compromised node.
• Backward Event Privacy An attacker should be prevented from obtaining
the previous sensor data for an event of his interest even if he has compro-
mised some nodes.
• Forward Event Privacy We should also thwart (if not completely prevent) an attacker from obtaining the sensor data regarding an event in the future even if he has compromised some nodes.
• Query Privacy An MS query should reveal as little location information about the sensor data as possible. For example, if multiple events are mapped and
stored in the same storage cell, a query for one of the events will also reveal
the storage cell of the other events. As such, an attacker may eavesdrop on
MS queries to minimize his efforts in launching a mapping attack.
In addition, as sensor networks are scarce in resources, especially non-renewable power, our security mechanisms should be resource-efficient. For example, we should avoid network-wide flooding and public-key operations if at all possible. In particular, as communication normally consumes much more energy than computation [35], we prefer computation over communication when they achieve the same goal.
5.3 pDCS: Privacy Enhanced Data-Centric Sensor Networks
In this section, we first give an operational overview of pDCS. Then we present
several schemes to randomize the mapping function and propose efficient protocols
to manage various keys involved in the system. Finally, we describe optimization
techniques for issuing queries.
5.3.1 The Overview of pDCS
First of all, we assume that each sensor possesses five types of keys: a master key (shared only with the MS), pairwise keys (shared with every neighbor), a cell key (shared by all sensors in the same cell), a row key (shared by all sensors in the same row), and a group key (shared by all sensors in the network). Different
keys are useful in different schemes or under different circumstances. The details
of key management will be discussed in Section 5.3.3.
Our solution involves six basic steps in handling sensed data: determine the
storage cell, encrypt, forward, store, query, and decrypt. We demonstrate the
whole process through an example in which a cell u has detected an event E.
1. Cell u first determines the location of the storage cell v through a keyed hash
function.
2. u encrypts the recorded information (Me) with its cell key. To enable MS
queries, either the event type E or the detection time interval T is kept in plaintext, subject to the requirements of the application.
3. u then forwards the message towards the destination storage cell. Here,
techniques [59] should be applied to prevent traffic analysis and to prevent
an attacker from injecting false packets.
4. On receiving the message, v stores it locally.
5. If an authorized mobile sink (MS) is interested in the event E that occurred in
cell u, it determines the storage cell v and issues a query (optimized query
schemes are discussed in Section 5.3.4).
6. After it retrieves the data of interest, the MS decrypts it with the proper cell
key (more details are discussed in Section 5.3.5).
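The six steps above hinge on two operations: keyed encryption at u and type-indexed storage at v. A minimal sketch follows; the XOR keystream is only a stand-in for a real sensor-grade cipher, and all names are illustrative assumptions:

```python
import hashlib

def _keystream_xor(cell_key, data):
    """Placeholder cipher: XOR with a SHA-256-derived keystream.  A real
    deployment would use a proper block cipher; this only illustrates that
    the storage cell v cannot read Me without u's cell key."""
    out, counter = bytearray(), 0
    while len(out) < len(data):
        out += hashlib.sha256(cell_key + counter.to_bytes(4, 'big')).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

def store_event(storage, cell_key, event_type, report):
    # Steps 2-4: u encrypts Me with its cell key, keeping the event type E
    # in plaintext so the MS can query by type; v stores the record as-is.
    storage.setdefault(event_type, []).append(_keystream_xor(cell_key, report))

def query_event(storage, cell_key, event_type):
    # Steps 5-6: the MS retrieves by plaintext type, then decrypts with the
    # proper cell key (the XOR keystream is its own inverse).
    return [_keystream_xor(cell_key, c) for c in storage.get(event_type, [])]
```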
The first step is for defending against the mapping attack. Without the map-
ping key, an attacker cannot determine the mapping from the detection cell to
the storage cell. The second step is for preventing the readout attack. Since the
storage cell v does not possess the decryption key for Me, an attacker is prevented
from deciphering Me after he has compromised a node in v. Step 3 and Step 4 deal
with forwarding and storing the sensed data, Step 5 shows the basic operation for
issuing an MS query, and Step 6 describes the local processing of retrieved data.
The following subsections focus on the performance and security issues re-
lated to Step 1, Step 2, Step 5, and Step 6. Currently we assume some existing
schemes [59, 89] for Step 3 and Step 4; we believe research in these areas bears its
own importance and deserves independent study.
5.3.2 Privacy Enhanced Data-Location Mapping
From the system overview, we can see that an attacker can launch various attacks
if he can find the correct mapping relation between a detection cell and a storage
cell. This motivated our design of secure mapping functions to randomize the
mapping relationship among cells. Below we present three representative secure
mapping schemes in the order of increasing privacy. The following notations are
used during the discussion. Let N be the number of cells in the field, Nr and
Nc be the number of rows and the number of columns, respectively. Every cell is
uniquely identified with L(i, j), 0 ≤ i ≤ Nr − 1 and 0 ≤ j ≤ Nc − 1.
To quantify and compare the privacy levels of different schemes, we assume that
an attacker is capable of compromising a total of s cells of his choice. To
simplify the analysis, we assume that there are m detection cells for the event
of interest to the attacker, and that the locations of these m cells are
independent and identically distributed (iid) over the N cells (in real
applications, the locations of these m detection cells may be correlated). We
further introduce the concept of event privacy level.
Definition 6. Event Privacy Level (EPL) is the probability that an attacker cannot
obtain both the sensor data and the encryption keys for an event of his interest.
According to this definition, the larger the EPL, the higher the privacy. This
definition can be easily extended to the concepts of backward event privacy level
(BEPL) and forward event privacy level (FEPL).
5.3.2.1 Scheme I: Group-key–based Mapping
In this scheme, all nodes store the same type of event E in the same location
(Lr, Lc) based on a group-wide shared key K. Here
Lr = H(0|K|E) mod (Nr), Lc = H(1|K|E) mod (Nc) (5.1)
To prevent the stand-alone readout attack, a cell should not store its data in
its own cell. Hence, if a cell L(x, y) finds that its storage cell is itself,
that is, Lr = x and Lc = y, it applies H to Lr and Lc repeatedly until Lr ≠ x
or Lc ≠ y. To simplify the presentation, however, we will not mention this case
again in the remaining discussion.
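Scheme I's mapping, including the re-hash rule above, can be sketched in a few lines of Python. This is a minimal illustration: the text leaves the hash H abstract, so SHA-256 and the byte encodings below are our assumptions, and the function names are ours.

```python
import hashlib

def _h(data: bytes) -> int:
    # H: a one-way hash; SHA-256 is an illustrative instantiation.
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def storage_cell(K: bytes, E: bytes, Nr: int, Nc: int, own=None):
    """Scheme I (Eq. 5.1): Lr = H(0|K|E) mod Nr, Lc = H(1|K|E) mod Nc.
    If the result equals the detection cell itself (own), re-hash until it
    differs, as required to prevent the stand-alone readout attack."""
    Lr = _h(b"0" + K + E) % Nr
    Lc = _h(b"1" + K + E) % Nc
    while (Lr, Lc) == own:   # both coordinates match the detection cell
        Lr = _h(Lr.to_bytes(4, "big")) % Nr
        Lc = _h(Lc.to_bytes(4, "big")) % Nc
    return Lr, Lc
```

Every node holding the group key K computes the same (Lr, Lc), which is exactly what allows a single query message to locate all data about event E.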
Type I Query: A MS can answer the following query with one message: what is
the information about an event E? This is because all the information about
event E is stored in one location. A MS first determines the location based on
the key K and E, then sends a query directly to it to fetch the data, e.g., via
the GPSR protocol [90] (shortly we will discuss several query methods with
optimized performance and higher query privacy).
Security and Performance Analysis: In this scheme, all m detection cells are
mapped to one storage cell. An attacker first randomly compromises a node to
read out the group key, based on which he locates the storage cell for the
event. Because the data stored in the compromised node were encrypted with
individual cell keys and the ids of the detection cells were also encrypted,
the attacker has to randomly guess the ids of these m detection cells. Assume
that an attacker can compromise up to s cells. If the first compromised cell is
the storage cell¹ (with probability 1/N), the attacker will randomly compromise
(s − 1) cells from the remaining (N − 1) cells. There are $\binom{N-1}{s-1}$
combinations in total, among which $\binom{N-1-m}{s-1-i}\binom{m}{i}$
combinations correspond to the case where i out of m detection cells are
compromised. On the other hand, when the first compromised node is not the
storage cell (with probability (N − 1)/N), the attacker first compromises the
storage cell, then randomly compromises (s − 2) cells from the remaining
(N − 2) cells. There are $\binom{N-2}{s-2}$ combinations in total, among which
$\binom{N-2-m}{s-2-i}\binom{m}{i}$ combinations correspond to the case where i
out of m detection cells are compromised. Also note that an attacker obtains
only i/m of the event data when i out of m detection cells are compromised.
Let B1 = min(s − 1, m) and B2 = min(s − 2, m); then the BEPL of this scheme is

$$p_b^1(m,s) = 1 - \frac{1}{N}\sum_{i=1}^{B_1}\frac{i}{m}\binom{N-1-m}{s-1-i}\binom{m}{i}\bigg/\binom{N-1}{s-1} - \frac{N-1}{N}\sum_{i=1}^{B_2}\frac{i}{m}\binom{N-2-m}{s-2-i}\binom{m}{i}\bigg/\binom{N-2}{s-2}$$

¹For simplicity, we ignore the case when the first compromised cell is a
detection cell. Our study shows that the error introduced by this
simplification is negligible.
Figure 5.2. The BEPL as a function of m and s, where m is the number of detection cells and s the number of compromised cells
Figure 5.2 shows the analytical result of the BEPL as a function of m and s for
a network of N = 20 × 20 = 400 cells, from which we can make two observations.
First, unsurprisingly, the BEPL decreases with s. Second, the BEPL does not
change with m: the larger expected number of compromised detection cells for
larger m is offset by the smaller fraction of the event data that each
compromised detection cell holds.
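The closed-form BEPL above can be checked numerically. The sketch below (the function name is ours) uses Python's exact integer binomials and reproduces the behavior plotted in Figure 5.2:

```python
from math import comb

def bepl_scheme1(N: int, m: int, s: int) -> float:
    """Evaluate Scheme I's BEPL: N cells, m detection cells, s compromised."""
    B1, B2 = min(s - 1, m), min(s - 2, m)
    # First compromised cell is the storage cell (probability 1/N).
    t1 = sum((i / m) * comb(N - 1 - m, s - 1 - i) * comb(m, i)
             for i in range(1, B1 + 1)) / comb(N - 1, s - 1)
    # First compromised cell is not the storage cell (probability (N-1)/N).
    t2 = sum((i / m) * comb(N - 2 - m, s - 2 - i) * comb(m, i)
             for i in range(1, B2 + 1)) / comb(N - 2, s - 2)
    return 1 - t1 / N - (N - 1) / N * t2
```

For N = 400 and s = 40 this yields a BEPL slightly over 90%, and the result is independent of m, matching both observations drawn from Figure 5.2.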
Suppose the attacker compromises s cells, including the storage cell, at time
t0. He can come back at a future time t1 to obtain the event data from the
storage cell, and then simply decrypt all the data that were detected by these
s cells between t0 and t1. Assume that m cells detect the event between t0 and
t1 and that the locations of these m cells are independent and identically
distributed over the N cells. On average, ms/N out of the s compromised nodes
are detection cells, and they provide the encryption keys. Hence, the FEPL of
this scheme is simply

$p_f^1(m, s) = 1 - (ms/N)/m = 1 - s/N$

Note that this formula holds after the attacker has compromised s cells and can
compromise no more; we do not consider the FEPL during the process of
compromising the s cells.
Because all information about one event is stored in one location, Scheme I is
subject to a single point of failure. Furthermore, neither the traffic load nor
the storage load is uniformly distributed among the nodes.
5.3.2.2 Scheme II: Time-based Mapping
In this scheme, all nodes store the event E occurring in the same time interval T
(including a start time and an end time, the duration is denoted as |T |) into the
same location (Lr, Lc) based on a group-wide shared key KT .
Lr = H(0|KT |E|T ) mod (Nr). (5.2)
Similarly, Lc = H(1|KT |E|T ) mod (Nc). In addition, every sensor node maintains
a timer which fires periodically with time period |T |. When its timer fires, a node
derives the next group key KT = H(KT ). Finally, it erases the previous key KT .
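The interval-key evolution and Eq. (5.2) can be sketched as follows, under the same SHA-256 assumption as before (the dissertation leaves H abstract; function names and byte encodings are ours):

```python
import hashlib

def evolve_key(KT: bytes) -> bytes:
    # K_T <- H(K_T): when the timer fires, derive the next interval key;
    # the node then erases the old one, so past locations become underivable.
    return hashlib.sha256(KT).digest()

def storage_cell_time(KT: bytes, E: bytes, T: bytes, Nr: int, Nc: int):
    """Scheme II (Eq. 5.2): Lr = H(0|KT|E|T) mod Nr, Lc = H(1|KT|E|T) mod Nc."""
    def h(prefix: bytes) -> int:
        return int.from_bytes(hashlib.sha256(prefix + KT + E + T).digest(), "big")
    return h(b"0") % Nr, h(b"1") % Nc
```

Because evolve_key is one-way and the old key is erased, a node captured during one interval reveals nothing about where earlier intervals' data were stored.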
Type II Query: A MS can answer the following query with one message: what is
known about event E during the time interval T? This is because the information
about E in T is stored in one location. A MS first determines the location
based on KT, E, and T, and then sends a query to fetch the data.
Security and Performance Analysis: Due to the use of the one-way hash
function, an attacker cannot derive old group keys from the current group key
of a captured node. Hence, the locations storing the events that occurred
during previous time periods are not derivable, and an attacker has to randomly
guess the previous storage cells and detection cells for the event of his
interest. The BEPL p_b^2(m, s) of the previous data is complicated to derive
because it depends on the spatial and temporal distribution of the m detection
cells and on the number of previous
storage cells for the event, which in turn depends on the number of previous key
updating periods and the probability of hash collisions. For ease of analysis, we
ignore the case where a cell serves as both a detection cell and a storage cell.
Under this assumption, on average an attacker can correctly guess s/N fraction of
detection cells and s/N fraction of storage cells. Only when these detection cells
are mapped to these storage cells can the attacker decrypt the encrypted data. As
such,
$p_b^2(m, s) = 1 - (s/N)(s/N) = 1 - (s/N)^2$
Consider the case s = 40 and N = 400, the BEPL of Scheme II is 99%. From
Fig. 5.2 we can see the BEPL of scheme I under the same condition is slightly over
90%. Thus, Scheme II provides higher BEPL (i.e., higher backward privacy) than
Scheme I.
There are two cases for the FEPL. If the attacker changes the code of the
compromised nodes such that in the future these nodes keep their detected event
data locally, the FEPL p2f (m, s) of this scheme is simply 1− s/N . However, if the
compromised nodes follow our protocol and hence do not keep a local copy of their
data, the FEPL will increase. This is because in the future the event data might
be forwarded to new storage cells that are not controlled by the attacker (who is
assumed not to be able to compromise more than s cells). Considering that every
storage cell used in the future has been compromised with probability only s/N,
in this case the FEPL equals the BEPL, i.e., $p_f^2(m, s) = p_b^2(m, s) = 1 - (s/N)^2$.
Compared to Scheme I, both the traffic load and resources for storing the
information in Scheme II are more uniformly distributed in all the cells.
5.3.2.3 Scheme III: Cell-based Mapping
In this scheme, all the nodes in the same cell L(i, j) of the gridded sensor field
store in the same location (Lr, Lc) the same type of event E occurring during a
time interval T , based on a cell key Kij shared among all the nodes in the cell
L(i, j). Here
Lr = H(0|i|j|E|Kij|T ) mod (Nr), (5.3)
and Lc is computed similarly. This scheme differs from the previous schemes in
two aspects. First, in this scheme every node in cell L(i, j) updates the cell key
Kij periodically based on H such as Kij = H(Kij), and then erases the old cell
key to achieve backward event privacy. Second, since cell keys are also used for
encryption, the updating of cell keys leads to the change of encryption key for the
same event detected by the same cell but in different time periods.
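Eq. (5.3) can be sketched similarly; the hash instantiation and the byte encoding of i and j (assumed < 256, ample for a 20 × 20 grid) are our assumptions:

```python
import hashlib

def cell_based_location(Kij: bytes, i: int, j: int, E: bytes, T: bytes,
                        Nr: int, Nc: int):
    """Scheme III (Eq. 5.3): Lr = H(0|i|j|E|Kij|T) mod Nr, Lc analogously.
    The location depends on the detection cell (i, j) and its current cell
    key Kij, so the same event type is scattered across storage cells."""
    def h(prefix: bytes) -> int:
        msg = prefix + bytes([i, j]) + E + Kij + T
        return int.from_bytes(hashlib.sha256(msg).digest(), "big")
    return h(b"0") % Nr, h(b"1") % Nc
```

Since Kij is both a mapping input and the encryption key, the periodic update Kij ← H(Kij) changes both the storage location and the ciphertext key from one period to the next.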
Type III Query: A MS can answer the following query with one message: has
event E happened in cell L(i, j) during the time interval T? A MS first determines
the location based on the key Kij , T, E, and the detection cell L(i, j) of interest,
then sends a query to the cell to fetch the data.
Security and Performance Analysis: The updating of cell keys prevents an
attacker from deriving old cell keys based on the current cell key of a compromised
cell. Hence, the event data recorded in the previous periods are indecipherable
irrespective of the number of compromised cells (the network controller however
still keeps the older keys to decrypt previous event data). In other words, the
BEPL of this scheme is

$p_b^3(m, s) = 1$

Clearly, Scheme III provides the highest BEPL.
The FEPL p3f(m, s) of this scheme is the same as that in Scheme II. It can also
be seen that this scheme is the least subject to the single point of failure problem
compared to the previous schemes. Moreover, both the traffic load and resources
for storing the information are the most uniformly distributed among all the nodes.
5.3.2.4 Comparison of Different Mapping Schemes
Above we have presented three data-to-location mapping schemes with increasing
privacy and complexity. These three mapping schemes certainly do not exhaust
the design space, because we have three dimensions (time, space, and key) to
manipulate. In Appendix A we further introduce a row-based mapping scheme. In
general, the higher the event privacy, the larger the message overhead for query.
On the other hand, these schemes may be used simultaneously based on the levels
of privacy required by different types of data.
Next we use simulations to compare the message overhead of the three mapping
schemes: Group-key-based Mapping, Time-based Mapping, and Cell-based
Mapping. Message overhead is defined as the total number of transmission hops
of all the messages sent out by the detection cells towards their storage cells. The
simulations were run for 20, 000 time units in a DCS network with 20×20 cells. In
each time unit, 10 events are generated from randomly selected cells and a random
event type id (ranging from 1 to 3) is assigned to each event. After an event is
sensed in a cell, the cell will calculate the storage cell coordinates based on the
mapping schemes and forward a message toward it.
[Plot: message overhead (pkts/time unit/cell) versus event number (/time unit) for Group-key-based, Time-based, and Cell-based Mapping]
Figure 5.3. Overhead Comparisons among different mapping schemes
Figure 5.3 shows that the amortized message overhead (message overhead per
time unit per cell) linearly increases with the number of events. We observe that
cell-based mapping incurs a slightly higher message overhead than the other two
schemes. Also, even when there are as many as 50 events happening in one time
unit, the amortized message overhead is low, e.g., 1.2 in group-key-based mapping
and 1.39 in cell-based mapping.
In Figure 5.4, we use 3D plots to show the message overhead distribution over
a plane of cells. We observe that the message overhead is the most balanced with
the cell-based mapping scheme and the least balanced with the group-key-based
mapping scheme. In general, when message overhead is more balanced among all
the cells, the network can have a longer lifetime. Note that we also varied the
time period |T|, the number of event types, and the event rate per time unit;
the resulting message overhead distributions of these mapping schemes are similar.
Finally, we briefly mention the memory usage of sensor nodes. Since sensed data
have to be stored somewhere in the network, the overall memory requirement is
[Three 3D surface plots of message overhead (pkts/time unit) over the X and Y cell coordinates]
(a) Group-key-based Mapping (b) Time-based Mapping (c) Cell-based Mapping
Figure 5.4. Message Overhead Distribution of Different Mapping Schemes
the same in all these mapping schemes. But because the cell-based scheme
involves the most storage cells, it intuitively balances the memory requirement
among sensor nodes best; we thus expect a memory usage distribution similar to
the results in Figure 5.4.
5.3.3 Key Management
So far we have seen several types of symmetric keys involved in pDCS. We are
now ready to list all the keys used in pDCS and to discuss their purposes as
well as efficient ways to manage them.
• Master Key Every node u has a master key Ku shared only with the MS.
Although the master key is not explicitly used in the data-location mapping
schemes, it is necessary for securing the communications between the MS and
individual sensors. For example, when a node wants to report the misbehavior
of another node in the same cell to the MS, it may use the master key to
compute a message authentication code over the report; or, when the MS
distributes a new cell key to a cell containing a node to be revoked, the
master keys of the remaining nodes in the cell can be used to encrypt the new
cell key for secure key distribution.
• Pairwise Key Every pair of neighboring nodes shares a pairwise key. This
key is used for (i) secure distribution of keying material, such as a new cell
key within a cell, or (ii) hop-by-hop authentication of data messages between
neighboring cells to prevent packet injection attacks.
• Cell Key A cell key can be used (i) for encrypting sensed data to be stored in
a storage cell, (ii) for private cell-to-cell mapping, or (iii) as a key encryption
key (KEK) for secure delivery of a row key.
• Row Key A row key can be used (i) for private row-to-cell mapping, or (ii)
as a KEK for secure delivery of a group key.
• Group Key A group key is used (i) for secure group-to-cell mapping or (ii)
when MS broadcasts a secure query or command to all the nodes.
Of these five types of keys, four (all except pairwise keys) can be organized
into a logical key tree (LKH) [40, 105, 106] data structure maintained by the
MS, as shown in Figure 5.5. The first-level key (i.e., the root key) is the
group key; the second-level keys are row keys; the third-level keys are cell
keys; and the fourth-level keys are master keys. The out-degrees of the key
nodes are Nr, Nc, and Nij, respectively, where Nij is the number of nodes in
cell L(i, j). As in LKH, every node only knows the keys on the path from its
leaf key to the root key. Unlike in LKH, where group members do not share
pairwise keys, in our scheme a node shares a pairwise key with every neighbor.
We will show shortly that pairwise keys help reduce the bandwidth overhead of
the group rekeying operation for revoking a node.
Initial Key Setup:
Next we show how nodes initially establish all these types of keys. Pairwise
keys can be established by an existing scheme introduced in Section 2.1. The
group key and master keys are easy to establish by loading them onto every node
before network deployment. However, it might not be feasible to set up row keys
and cell keys by
pre-loading every node with the corresponding keys for large-scale sensor networks.
For massive deployment of sensor nodes (e.g., through aerial scattering), it is hard
to guarantee the precise locations of sensor nodes. If a node does not have the
cell key for the actual cell it falls in, it will not be able to communicate with the
other nodes in the same cell. To address this key setup issue, we need to establish
row/cell keys after deployment.
[Figure content: (a) a sensor network divided into cells; (b) a logical key tree with the group key Kg at the root, row keys Ki, cell keys Kij, and master keys Kv at the leaves (each dot denotes a key); (c) demonstration of rekeying packet flows, e.g., Enc(K0, K'g), Enc(K23, K'3), Enc(K'v0, K'22)]
Figure 5.5. The mapping of the physical network onto a logical key tree, and the rekeying packet flows for revoking node u
Based on real experiments, Deng et al. [107] showed that an experienced
attacker can obtain copies of all the memory and data of a Mica2 mote within
minutes of capturing the node. Zhu et al. [108] showed through experiments that
it takes several seconds for a node with a reasonable node density (∼20
neighbors) to communicate with each neighbor and establish a secret key with
each of them. As the number of messages exchanged in a localization
protocol [94] is no more than that in [108], in pDCS we assume that during the
initial network deployment phase a node will not be compromised before it
discovers its location through a secure localization scheme [94, 109]. This
assumption also holds if the initial deployment is monitored.
With this assumption, our scheme works by preloading every node with the same
initial network key KI. A node located in cell (i, j) can derive its cell key
as follows:

Kij = H(KI, i|j) (5.4)

After this, it erases KI from its memory completely. A row key can be
established similarly as Ki = H(KI, i).
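This post-deployment derivation (Eq. 5.4) can be sketched as below, again instantiating the abstract H with SHA-256 (an assumption, as are the byte encodings and function name):

```python
import hashlib

def derive_row_and_cell_keys(KI: bytes, i: int, j: int):
    """Derive Ki = H(KI, i) and Kij = H(KI, i|j) after the node learns its
    location (i, j); the node must then erase KI entirely."""
    Ki = hashlib.sha256(KI + bytes([i])).digest()         # row key
    Kij = hashlib.sha256(KI + bytes([i, j])).digest()     # cell key
    return Ki, Kij
```

Because every node starts from the same KI, nodes that land in the same row (or cell) derive identical row (or cell) keys without any message exchange.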
Key Updating upon Node Revocations
Although the key updating operation introduced below is triggered by the
detection of node compromise, pDCS does not itself include a mechanism for
detecting compromised nodes; instead, it assumes the employment of existing
schemes [101, 100, 110, 111, 112].
Suppose node u in cell L(2, 2) is compromised and its cell reports the
compromise to the MS; for example, a majority of the other nodes in the cell
each computes a MAC over the report using its master key. Since node u knows
the keys K22, K2, and Kg, these keys need to be updated to new versions, say
K'22, K'2, and K'g. Based on LKH, the MS needs to encrypt each updated key with
each of its child keys (the new version if updated) and then broadcast all the
encryptions. For example, the new group key K'g is encrypted by K0, K1, K'2,
and K3, respectively; K'2 is encrypted by K20, K21, K'22, and K23,
respectively; and K'22 is encrypted by Kv0, Kv1, Kv2, and Kv3, respectively.
In general, Nr + Nc + Nij − 1 encrypted keys will be broadcast and flooded in
the network.
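The set of encryptions above can be enumerated programmatically. The sketch below is illustrative only: string labels stand in for actual keys, and the encryption operation itself is abstracted away.

```python
def rekey_encryptions(Nr, Nc, i, j, members, revoked):
    """List the (KEK, new-key) pairs broadcast to revoke one node in
    cell (i, j); `members` are the node ids in that cell."""
    pairs = []
    for r in range(Nr):   # new group key K'g under every row key
        pairs.append((f"K{r}'" if r == i else f"K{r}", "Kg'"))
    for c in range(Nc):   # new row key K'i under every cell key of row i
        pairs.append((f"K{i}{c}'" if c == j else f"K{i}{c}", f"K{i}'"))
    for v in members:     # new cell key K'ij under remaining master keys
        if v != revoked:
            pairs.append((f"K_{v}", f"K{i}{j}'"))
    return pairs
```

For the running example (node u revoked in cell L(2, 2) of a 4 × 4 grid with five nodes in the cell), the list has exactly Nr + Nc + Nij − 1 = 12 entries.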
Next we present a variant of the above scheme, which incorporates two
techniques to further improve the rekeying efficiency. The first technique is
based on the network topology. Instead of flooding all the keys in the network,
the MS sends them separately to different sets of nodes, based on the
observation that nodes in different locations should receive different sets of
encrypted keys. Suppose the node to be revoked is in cell L(i, j). Nodes in a
row m (m ≠ i) only need to receive the new group key K'g encrypted with their
row key Km. Hence, the MS only needs to send one encrypted key to the cell
(m, 0), and the key is then propagated to the other cells in row m. For nodes
in row i, there are two scenarios. If the nodes are in a column n (n ≠ j), they
only need to receive K'g encrypted with K'i and K'i encrypted with the cell key
Kin. Otherwise, if they are located in the same cell as node u, each of them
needs to receive K'ij encrypted with its own master key. In
these scenarios, MS sends Nc +Nij−1 keys to the cell (i, 0), and the keys are then
propagated in row i. Note that a cell can remove from the keying message the
encrypted keys that are of interest only to itself before forwarding the
message to the next cell. As such, the size of a keying message decreases as it
is forwarded.
Our second technique trades computation for communication, because
communication consumes more energy than computation in sensor networks. It has
been shown in [35, 109] that the energy consumed to encrypt or compute a MAC
over an 8-byte packet with RC5 is equivalent to that for transmitting one byte.
As such, instead of sending the Nij − 1 encryptions of K'ij to the cell (i, j)
across multiple hops, the MS may send only one of the encryptions to a specific
node (e.g., v0 in Figure 5.5) and then request that node to securely propagate
K'ij to all the nodes except u, using their pairwise keys for encryption.
Key Management Performance Analysis
Now we analyze the performance of our rekeying scheme upon a node revocation.
For simplicity, we define the performance overhead C as the average number of
keys that traverse each cell during a rekeying event. That is,
$$C = \sum_{i=0}^{N_r-1}\sum_{j=0}^{N_c-1} s_{ij} \Big/ (N_r N_c) \qquad (5.5)$$
where sij is the number of keys that have traversed cell L(i, j). Here we do not
count the Nij−1 unicast transmission cost inside the cell L(i, j) because this cost is
relatively small when amortized over N cells. Without loss of generality, we assume
MS is in cell L(0, 0) when distributing rekeying messages. From Figure 5.5(c) we
can derive C as follows.
$$C = 1.5 + \left(N_c^2 + N_r^2 + 2N_c + 2\right)\Big/(2 N_r N_c) \qquad (5.6)$$
For a sensor network deployed in a square field, i.e., Nc = Nr, C ≈ 2.5 keys when
Nr > 2. Compared to the intuitive scheme that broadcasts all the LKH keys and
thus has the per cell overhead of Nr + Nc + Nij − 1 keys, our rekeying scheme is
far more efficient.
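Eq. (5.6) can be evaluated directly to check the C ≈ 2.5 claim (the helper name is ours):

```python
def per_cell_rekey_overhead(Nr: int, Nc: int) -> float:
    # Eq. (5.6): average number of keys crossing each cell per revocation.
    return 1.5 + (Nc**2 + Nr**2 + 2 * Nc + 2) / (2 * Nr * Nc)
```

For a 20 × 20 grid this gives C = 2.5525 keys per cell, versus Nr + Nc + Nij − 1 keys per cell for the intuitive flooding scheme.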
5.3.4 Improving the Query Efficiency
We have shown that the proposed mapping schemes are capable of answering
queries of different granularity and can achieve different levels of privacy. Better
privacy is normally achieved at the cost of larger query message overhead. For
example, to answer a query like “Where were the elephants in the last three days”,
one query message is enough in the group-key–based mapping; however, this may
take multiple query messages in the cell-based mapping as the data are stored at
multiple places. Next we propose several techniques to decrease the query message
overhead.

(a) Basic Scheme (b) EST Scheme (c) BF Scheme
Figure 5.6. Three schemes for delivering a query to the storage cells
5.3.4.1 The Basic Scheme
Suppose a mobile sink (MS) needs to send multiple query messages to multiple
storage cells to serve a query. Due to the randomness of the mapping function,
these storage cells may be separated by other cells. In the basic scheme, as shown in
Figure 5.6(a), the MS sends one query message to each cell using a routing protocol
such as GPSR [90]. Since each query message contains the query information and
the id of the destination storage cell, these query messages are different and have
to be sent out separately. It is easy to see that this scheme has very high message
overhead.
Another weakness of the basic scheme is its lack of query privacy. Query privacy
is measured by the probability that an attacker cannot find the ids of the storage
cells from eavesdropped MS query messages. In the basic scheme, since the MS
has to specify the ids of the destination storage cells, the query privacy of
this scheme, denoted by P1, is P1 = 0.
5.3.4.2 The Euclidean Steiner Tree (EST) Scheme
A natural way to reduce the message overhead of the basic scheme is to organize
the storage cells into a minimum spanning tree. The MS first generates the
minimum spanning tree that includes all the storage cells, and then sends the
query message to these cells along the tree. Although this solution increases
the message size, it greatly reduces the number of query messages. Because each
message carries redundant header information, combining multiple messages
significantly reduces the overall message overhead. As in the basic scheme,
however, the MS has to include the ids of the destination storage cells in its
query messages; thus, the query privacy of this solution is still 0.
To further reduce the message overhead, we can use a Euclidean Steiner Tree
(EST) [93, 113], which has been shown to outperform the minimum spanning tree
and is widely used in network multicasting. Figure 5.6(b) shows an EST, which
includes some cells other than the storage cells, called Steiner cells. Note
that these Steiner cells also help improve the query privacy because they add
noise to the set of storage cells.
With EST, the cell in which the MS resides is the root cell. The MS constructs
a query message containing the ids of the cells in the EST and sends it to its
child cells using a routing protocol such as GPSR. When a cell head receives a
query message, it reconstructs an EST subtree by removing information such as
its own id and the ids of its sibling nodes, keeping only the information about
the subtree rooted at itself. It then forwards the query message with the EST
subtree to its child cells. This recursive process continues until every
storage cell in the EST has received the query message.
To construct an EST, we use a technique proposed by Winter and
Zachariasen [93]. Since their solution may return a non-integer Steiner cell,
we replace it with the nearest integer Steiner cell. Let n denote the number of
storage cells. With this solution, an EST spanning k (2 ≤ k ≤ n) cells has at
most k − 2 integer Steiner cells, which means that at most 2k − 2 cells are
included in the Steiner tree. The use of Steiner cells can improve the query
privacy up to $1 - \frac{n}{2n-2} = \frac{n-2}{2n-2}$. That is,

$$P_1 = 0 \le P_2 \le \frac{n-2}{2n-2} \qquad (5.7)$$
5.3.4.3 The Keyed Bloom Filter Scheme
Bloom Filter: A Bloom Filter [114] is a popular data structure for membership
queries. It represents a set S = {s1, s2, ..., sn} using k independent hash
functions h1, h2, ..., hk and a string of m bits, each initially set to 0. For
each s ∈ S, we hash it with all k hash functions to obtain the values hi(s)
(1 ≤ i ≤ k); the bits corresponding to these values are then set to 1 in the
string. Note that multiple values may map to the same bit (see Figure 5.7 for
an example). To determine whether an item s′ is in S, the bits hi(s′) are
checked; if all of them are 1s, s′ is considered to be in S.
Figure 5.7. A Bloom Filter with $k$ hash functions: an element $s$ is mapped by $H_1, \ldots, H_k$ to $k$ positions in an $m$-bit string.
Since multiple hash values may map to the same bit, a Bloom Filter may yield
false positives. That is, an element $s'$ is not in $S$, but its bits $h_i(s')$ have been
collectively marked by elements in $S$. If the hash is uniformly random over the $m$
positions, the probability that a bit is 0 after all $n$ elements have been hashed and
their bits marked is $(1 - \frac{1}{m})^{kn} \approx e^{-kn/m}$. Therefore, the probability of a false
positive is $(1 - (1 - \frac{1}{m})^{kn})^k \approx (1 - e^{-kn/m})^k$. The right-hand side is minimized
when

$$k = \ln 2 \times m/n, \qquad (5.8)$$

in which case it becomes $(\frac{1}{2})^k = (0.6185)^{m/n}$.
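A minimal Bloom Filter along the lines just described can be sketched as follows. The salted-SHA-256 construction of the $k$ hash functions is our own choice for illustration, not the thesis's; the parameters follow Eq. (5.8), where for $n = 20$ and $m = 160$ the optimal $k = \ln 2 \times 160/20 \approx 5.5$, so we round to $k = 5$.

```python
# Illustrative Bloom Filter sketch (hash construction is an assumption).
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k           # m bits, k hash functions
        self.bits = [0] * m

    def _positions(self, item):
        # derive k independent positions by salting a single hash function
        return [int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % self.m
                for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # all k bits set => "probably in the set" (false positives possible)
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter(m=160, k=5)            # k chosen per Eq. (5.8) for n = 20
for cell_id in range(20):
    bf.add(cell_id)
assert all(c in bf for c in range(20))  # no false negatives, by construction
```

With these parameters the expected false positive rate is roughly $(1 - e^{-100/160})^5 \approx 0.02$, i.e., about 2% of non-member lookups will spuriously succeed.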
A Bloom Filter can be used to construct query messages. A basic approach
is as follows: After an MS determines the location information of all the storage
cells, it builds a Euclidean Steiner tree (EST) and gathers the ids of all the cells
covered by the tree. The MS then inserts the ids into a Bloom Filter, which is
sent with other query information to the root cell of the EST using the GPSR
algorithm (as shown in Figure 5.6 (c)). When a query message arrives at a cell,
the cell checks the embedded Bloom Filter to determine which of its neighbors
belong to the Bloom Filter, and then forwards the message to them. Recursively,
every storage cell receives one query message.
Using a Bloom Filter for directed forwarding provides higher query privacy than
EST. This is because the Bloom Filter introduces additional noise cells: the
non-storage cells connecting the Steiner cells in the EST, and a small number
of noise cells caused by false positives.
Keyed Bloom Filter: In the Bloom Filter-based scheme, an attacker can freely
check whether a cell is one of the storage cells, although the false positive rate
may be high. To further improve query privacy, we should deprive the attacker
of the capability to perform membership verification over a Bloom Filter.
This motivated our design of the keyed Bloom Filter (KBF) scheme, which uses cell
keys to “encrypt” the cell ids before they are inserted. In this way, an attacker
can derive none, or only a small number, of the cell ids from a query message. As
such, the attacker has only negligible probability of identifying the storage cells
beyond random guessing.
In the KBF scheme, each cell id is concatenated with the cell key of its parent
node in the EST before it is inserted into the Bloom Filter. Specifically, to insert
cell id $x$, the bits corresponding to $H_i(x|k_p)$ $(i = 1, \cdots, k)$ are set to 1, where $k_p$
is the cell key of the parent of cell $x$. When a query message arrives at a cell, the
cell concatenates its own cell key with the id of each neighboring cell that is not a
neighbor of its own parent node (to avoid redundant computation and forwarding),
and checks whether the neighbor is in the Bloom Filter. If it is, the message
is forwarded to that neighbor. Algorithm 6 and Algorithm 7 formally describe
how to create a Bloom Filter and how to forward a query message, respectively.
Query Privacy: In this scheme, cell ids are “encrypted” with cell keys before
Algorithm 6 Create a Bloom Filter
Input: an array of storage-cell Cartesian coordinates c[]
Output: Bloom Filter BF
Procedure:
1: initialize a Bloom Filter BF
2: build the Steiner tree based on c[]
3: for each cell u in the Steiner tree do
4:    p = parent of u; kp = cell key of p
5:    map (u|kp) into BF
6: end for
7: return BF

Algorithm 7 Forward a Query Message
Input: a query message received by cell u, which includes a Bloom Filter BF
Procedure:
1: ku = cell key of u
2: for each neighboring cell u′ of u do
3:    if u′ ≠ parent of u ∧ u′ ≠ neighbor of the parent of u ∧ BF contains (u′|ku) then
4:       forward the query message to u′
5:    end if
6: end for
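For concreteness, Algorithms 6 and 7 can be rendered in Python roughly as follows. The toy topology, the cell keys, and the simplified Bloom Filter are invented for illustration, and this sketch only skips the parent cell when testing neighbors (the full Algorithm 7 also skips the parent's other neighbors).

```python
# Hedged sketch of the keyed Bloom Filter (KBF) create/forward steps.
import hashlib

M, K = 128, 4                                   # filter size / hash count

def positions(item):
    return [int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % M
            for i in range(K)]

def create_kbf(tree, keys):
    """Algorithm 6 (sketch): insert each cell id concatenated with its
    parent's cell key, so only a cell holding that key can test membership."""
    bf = [0] * M
    for parent, children in tree.items():
        for child in children:
            for p in positions(f"{child}|{keys[parent]}"):
                bf[p] = 1
    return bf

def forward(bf, cell, keys, neighbors, delivered, parent=None):
    """Algorithm 7 (sketch): a cell salts each neighbor id with its OWN
    key and forwards on a hit; the parent is skipped to avoid backtracking."""
    delivered.add(cell)
    for n in neighbors[cell]:
        if n != parent and all(bf[p] for p in positions(f"{n}|{keys[cell]}")):
            forward(bf, n, keys, neighbors, delivered, parent=cell)

# toy topology: cells 0 (MS) - 1 - 2 form the EST; 3 is a neighbor outside it
tree = {0: [1], 1: [2]}
keys = {c: f"key{c}" for c in range(4)}
neighbors = {0: [1, 3], 1: [0, 2, 3], 2: [1], 3: [0, 1]}
got = set()
forward(create_kbf(tree, keys), 0, keys, neighbors, got)
```

Note how cell 3, which is not in the EST, never matches the filter: its id was never inserted under any key, so (up to the small false positive probability) the query does not reach it.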
being inserted into the Bloom Filter. If an attacker has not compromised any cell
in the EST, he knows no cell keys. In this case, he cannot obtain any
information about the storage cells from an eavesdropped query message. Next we
consider the case in which the attacker has compromised some cells in the EST. If a
compromised cell is contained in the EST, from the received query message it can
find out which of its neighboring cells also belong to the EST. However, it cannot
verify the membership of any other cells. In fact, this is one prominent advantage
of the KBF scheme over the EST scheme. To make the EST scheme more secure,
a straightforward extension would be to encrypt the EST. To enable every
cell in the tree to access the information needed for correct forwarding of a query
message, a group key would have to be used for this encryption. Thus, an attacker
could decrypt the entire EST by compromising a single cell. Clearly, the KBF
scheme offers much better query privacy than the EST scheme. The query privacy
of the KBF scheme and the other schemes is compared in Section 5.4, and the results
show that the KBF scheme achieves the highest privacy.
5.3.4.4 Plane Partition
The EST scheme reduces the number of query messages at the price of larger
messages. The limited packet size (e.g., 29 bytes in TinyOS [115]) may prevent the
MS from piggybacking all the storage cell ids together with the query information in a
single packet. A Bloom Filter may be designed to fit in a packet, but to maintain
a low false positive rate, only a limited number of cell ids can be included in
one packet. To address this problem, we use multiple Steiner trees, each of which
is encoded into a single packet. Because partitioning a Steiner tree into multiple
Steiner trees, known as the minimum forest partition problem, is NP-hard [116],
we propose heuristics to perform the partition.
(a) Intuitive partition    (b) Fanlike partition
Figure 5.8. 17 storage cells are partitioned into three parts. (Legend: path cells, Steiner cells, storage cells.)
In Figure 5.8 (a), the solid lines represent the EST, and the shaded areas along
these lines are the cells encoded by the Bloom Filters. An intuitive partition
method is to first cluster the storage cells in a top-down, left-to-right fashion, and
then build a sub-EST within each partition. We let the EST scheme and the KBF
scheme use the same partitions and build the same sub-ESTs. After the partition,
the MS sends a query to each partition at the same time. In this way, the message
size is reduced. Further, since multiple queries are sent out simultaneously, the
average query delay is also reduced.
Fanlike Partition Method: With the intuitive partition, the query message
from the MS has to travel through some redundant cells. For example, in Figure
5.8 (a), the query message of the MS has to pass through many cells before reaching
the top partition. To address this problem, we convert the Cartesian coordinates
into polar coordinates. In this coordinate system, the angles of the storage cells lie
within [−π, π]. The partition algorithm scans the plane from −π to π and collects
enough storage cells into each partition. Figure 5.8 (b) shows an example of
dividing the plane into three partitions using the Fanlike partition method. The
detailed description is given in Algorithm 8.
Algorithm 8 Fanlike Partition Method
Input: an array of Cartesian coordinates c[], where s is the size of the array and c[0] is the cell in which the MS resides
Output: partition sets
Procedure:
1: initialize an array degree[] to store the angle of each cell
2: for i = 1 to s do
3:    degree[i] = arctan((c[i].y − c[0].y) / (c[i].x − c[0].x))
4:    if c[i] is in the 2nd quadrant then
5:       degree[i] += π
6:    end if
7:    if c[i] is in the 3rd quadrant then
8:       degree[i] −= π
9:    end if
10: end for
11: sort all the cells according to their angles, then uniformly divide the cells into the specified number of partitions and store them in a set array A[]
12: return A
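In Python, essentially the same procedure can be sketched with `math.atan2`, which folds the per-quadrant corrections into a single call returning an angle in [−π, π]. The coordinates and partition count below are invented for illustration.

```python
# Sketch of the Fanlike partition (our rendering of Algorithm 8).
import math

def fanlike_partition(cells, ms, num_parts):
    """Sort storage cells by the polar angle (in [-pi, pi]) they make
    around the MS cell, then cut the sorted order into contiguous fans."""
    ordered = sorted(cells, key=lambda c: math.atan2(c[1] - ms[1], c[0] - ms[0]))
    size = math.ceil(len(ordered) / num_parts)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

ms = (0, 0)   # the cell where the MS resides
cells = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
parts = fanlike_partition(cells, ms, 3)
# each part is an angularly contiguous "fan" of storage cells around the MS
```

Because each partition is angularly contiguous around the MS, the sub-EST built for it stays within a narrow wedge, which is what keeps the query paths short.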
5.3.5 MS Data Processing
Through the above query process, an MS can retrieve the messages of interest,
which are encrypted with the cell key of the detection cell. To process an event, the
MS first needs to decrypt the message. However, to prevent selective compromise
attacks, in our design the id of a detection cell is also encrypted. As such, the MS
tries all the cell keys until the decrypted message is meaningful (e.g., it includes a
source cell id and follows a certain format). The average number of decryptions is
N/2. Though this may not be a big issue for a laptop-class MS, which can perform
about 4 million en/decryptions per second [117], we plan to design more
efficient approaches in future work.
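The trial-decryption loop can be illustrated with a toy cipher. Both the SHA-256-keystream XOR and the `CELL:` header used as the "meaningful format" check are our own inventions; the thesis does not specify a cipher or message format.

```python
# Toy illustration of the MS decryption loop (cipher and format are assumptions).
import hashlib

def xor_cipher(key, data):
    # toy symmetric cipher: XOR with a SHA-256-derived keystream;
    # encryption and decryption are the same operation
    stream = hashlib.sha256(key).digest()
    return bytes(b ^ stream[i % 32] for i, b in enumerate(data))

def looks_meaningful(plaintext):
    # hypothetical format check: a recognizable message header
    return plaintext.startswith(b"CELL:")

N = 100
cell_keys = [f"cell-key-{i}".encode() for i in range(N)]
ciphertext = xor_cipher(cell_keys[42], b"CELL:(4,2):event")

# the MS tries keys one by one until the plaintext is meaningful,
# which takes N/2 attempts on average over a random key position
recovered = next(p for k in cell_keys
                 if looks_meaningful(p := xor_cipher(k, ciphertext)))
```

A wrong key yields pseudorandom bytes, so the chance of a spurious `CELL:` match is negligible (about $2^{-40}$ per wrong key here).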
Another concern in pDCS is the number of keys an MS must possess when it
needs to decrypt data from many cells. If we assume that the MS cannot be
compromised, we can simply load it with a single key, the initial group key KI.
From this initial key the MS can derive the cell key Kij of each cell (i, j) as
Kij = H(KI, i|j). This is, however, dangerous if the MS could be compromised,
because all the cell keys would then be exposed. The problem can be mitigated in
the following way. Instead of applying its cell key for encryption directly, every
node first derives variants of its cell key for specific events or time intervals using
a hash function. These variant keys are then used to encrypt event messages, and
the MS is loaded only with the variant keys for the events of interest. If the MS is
compromised, the other variant keys remain secure.
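The key derivation above can be sketched as follows. The hash choice (SHA-256) and the exact concatenation format are our own assumptions; the thesis only specifies Kij = H(KI, i|j) and per-event variants of it.

```python
# Sketch of cell-key and variant-key derivation (formats are assumptions).
import hashlib

def H(key, msg):
    return hashlib.sha256(key + b"|" + msg).digest()

K_I = b"initial-group-key"                    # loaded only into a trusted MS

def cell_key(i, j):
    return H(K_I, f"{i}|{j}".encode())        # K_ij = H(K_I, i|j)

def variant_key(i, j, event):
    # per-event variant: by the one-wayness of H, compromising one variant
    # exposes neither K_ij nor the variants for other events
    return H(cell_key(i, j), event.encode())

k_fire = variant_key(3, 7, "fire")
k_tank = variant_key(3, 7, "tank")
assert k_fire != k_tank                       # event-specific keys differ
```

An MS tasked only with "fire" events would be loaded with `k_fire` for the relevant cells, so its compromise leaks nothing about other events or the underlying cell keys.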
5.4 Performance Evaluations
In this section, we evaluate and compare the performance of three query schemes:
the Basic scheme, the Euclidean Steiner Tree (EST) scheme, and the Keyed Bloom
Filter (KBF) scheme. In our simulation setup, each query message contains the
query information and the encoded query path. The query information occupies
4 bytes, which represent time and event², and 25 bytes are used to
represent the query path. For evaluation purposes, we do not consider the overhead
of source authentication.
In the EST scheme, the query path is encoded as a Steiner tree. Each cell
id is represented by two bytes, so only 12 cell ids can be encoded in each packet. In
the KBF scheme, 25 bytes are used to encode the query path with a Bloom Filter,
and it is expected to achieve an acceptable false positive rate, say 0.1. Considering
these limitations, we choose (n, k) = (20, 5).
These schemes are evaluated under various storage cell densities, ranging from 1/40
to 1/2.5. The storage cell density is defined as the ratio of the number of storage
cells to the total number of cells in the plane. For example, with our setting of
20 × 20 cells, a density of 1/10 means that there are about 400 × 1/10 = 40 storage cells.

²Some applications may require more bytes; nevertheless, since we are interested in the comparative results of multiple schemes, the payload size normally does not affect the comparison much. Further, the time only needs hour/minute granularity instead of microsecond granularity, and hence requires fewer bits.
Four metrics are used to evaluate the performance of the proposed schemes:
the number of query messages, the average query delay, the maximum query delay,
and the message overhead. The number of query messages is the total number of
messages sent out by the MS for a query. The average query delay is the average
of the query delays over the different storage cells. The maximum query delay is the
maximum among all the query delays. The message overhead is defined as the total
number of hops traversed by all the messages sent out by the MS to serve a query.
In the KBF scheme, the message overhead also includes the extra messages due to
false positives. As query messages are forwarded through the network hop by hop,
the number of query messages and the message overhead also proportionally
reflect the communication costs incurred by the sensor nodes.
5.4.1 Choosing the Partition Method
Figure 5.9. Performance comparisons between different partitioning schemes (EST with intuitive partition vs. EST with Fanlike partition), versus storage-cell density from 1/40 to 1/2.5: (a) average query delay, (b) maximum query delay, (c) message overhead.
In this subsection, we evaluate the performance of EST with the intuitive partition
and EST with the Fanlike partition. As shown in Figure 5.9, the Fanlike partition
method outperforms the intuitive method in terms of average query delay, maximum
query delay, and message overhead. We do not show the number of messages,
since both schemes send the same number of messages, determined by the packet
size.
As discussed earlier, in the intuitive partition method, each query message
sent from the MS to its partition may pass through many redundant cells
and hence increase the message overhead. In the Fanlike partition, fewer
redundant cells are involved, so the message overhead is lower. This also
explains why the Fanlike partition has lower average and maximum query delays
than the intuitive partition.
In Figure 5.9 (a), with the Fanlike partition, the average query delay drops as the
storage cell density increases. This can be explained as follows. When the storage
cell density is high, each partition is small. Therefore, the Steiner tree is confined
to a small range, and the zig-zag paths from the MS to the storage cells tend to be
shorter. This results in smaller average query delays.
The same reason explains why the maximum query delay decreases as the storage
cell density increases for the Fanlike partition in Figure 5.9 (b). However, when the
density is very low (1/40), the intuitive partition has a slightly lower maximum
query delay than the Fanlike partition. We checked the simulation trace and found
the following reason. When the density is 1/40, there are about 10 storage cells.
Because Steiner cells are used and each packet is limited to 12 cell ids, a very small
number (one or two) of cells are left for the second packet. These leftover cells tend
to be far away in the intuitive partition method but not in the Fanlike partition.
As a result, the intuitive partition can achieve a slightly shorter maximum delay
than the Fanlike partition method when the storage cell density is very low.
We also evaluated the performance of the KBF scheme under both partition
methods. The results are similar to those for EST, with the Fanlike partition
performing better. Thus, we use the Fanlike partition method in the following
comparisons.
5.4.2 Performance Comparisons of Different Schemes
This subsection compares the performance of three schemes: the Basic scheme, the
EST scheme and the KBF scheme.
Figure 5.10 compares the number of messages and the message overhead of
the three schemes. As can be seen, both optimization schemes (EST and KBF)
outperform the Basic scheme, since they combine several messages into one. We
can also see that the message overhead of the KBF scheme is higher than that of
the EST scheme, although both schemes send a similar number of messages. This
is because the query messages in the KBF scheme may go
Figure 5.10. The message overhead of different schemes versus storage-cell density: (a) number of messages, (b) message overhead (Basic, EST, KBF).

Figure 5.11. Comparisons among different schemes versus storage-cell density: (a) average query delay, (b) maximum query delay, (c) query privacy (Basic, EST, and KBF with s = 5, 10, 20 compromised cells).
through some redundant cells due to false positives.
Figure 5.11 (a) and (b) compare the average delay and the maximum delay of the
three schemes. As can be seen, the Basic scheme outperforms the other two. This
is because in the Basic scheme the query messages are sent directly to the storage
cells in parallel along shortest paths, resulting in lower query delay. Although
EST and KBF reduce the message overhead, the query delay increases because
each message has to pass through many intermediate cells sequentially.
As shown in Figure 5.11 (a) and (b), when the storage cell density is low, KBF
outperforms EST in terms of query delay. To explain this, we need to understand
the effect of the number of partitions. When the number of partitions is small,
and hence each partition is large, the path to each storage cell is more zig-zag-like,
which may result in long delays. As shown in Figure 5.10 (a), when the density is
low, EST sends fewer messages and hence has fewer partitions, which means that
EST has large partitions and long delays. Similarly, when the
density is high, EST has more partitions and shorter delays.
In addition, as shown in Figure 5.11(c), the KBF scheme has the highest query
privacy. Even after s = 20 cells have been compromised, the query privacy level is
still above 83%.
In summary, there is a tradeoff among query delay, message overhead, and
query privacy. The Basic scheme has the lowest delay but the highest message
overhead and the lowest query privacy. The EST scheme and the KBF scheme
significantly reduce the number of messages and the message overhead with a
comparable level of query delay. In particular, the query privacy level of KBF is
far higher than that of the other schemes.
5.5 Conclusions
In this chapter, we proposed solutions for privacy support in data-centric sensor
networks (pDCS). The proposed schemes offer different levels of location privacy
and allow a tradeoff between privacy and query efficiency. pDCS also includes
an efficient key management scheme that provides a seamless mapping between
location keys and logical keys, as well as several query optimization techniques
based on the Euclidean Steiner Tree and the Bloom Filter that minimize the query
message overhead and increase query privacy. Simulation results verified that the
KBF scheme can significantly reduce the message overhead with the same level of
query delay. More importantly, the KBF scheme achieves these benefits without
losing any query privacy.
Chapter 6
Conclusions and Future Work
6.1 Summary
In this thesis, we have proposed a comprehensive solution to support source loca-
tion privacy in sensor networks. The solution provides several techniques that are
optimized for sensor networks under different attack models. We summarize these
techniques as follows.
In Chapter 3, we adopted a cross-layer design. Observing that beacons always
exist in the network and form constant-rate dummy traffic, we use beacons to
replace application-layer dummy messages and thereby reduce the network traffic.
Based on this observation, we first propose a naive solution. Since relying purely
on beacons may increase the communication delay, we then propose the cross-layer
and double cross-layer solutions to limit this delay. In the cross-layer solution, the
event information is first propagated several hops through MAC-layer beacons; it
is then propagated at the routing layer to the base station to avoid further beacon
delays. The double cross-layer solution repeats this process one more time to
further increase the privacy level. Simulation results show that the double cross-
layer solution can maintain low message overhead and high privacy while
controlling delay.
In Chapter 4, we deal with a much harder problem: the global attacker. Against
a global attacker, dummy traffic is considered the only solution; however, dummy
traffic greatly degrades network performance. To address this issue, we propose
two major solutions. First, under the assumption of a low message transmission
rate, we proposed the notion of statistically strong source anonymity and the
FitProbRate scheme for generating dummy messages. FitProbRate is based on
dummy traffic and aims to reduce the real-event reporting latency. By using the
Anderson-Darling test to control the type of the probabilistic distribution and a
mean test to control its mean, FitProbRate is able to reduce the real-event reporting
latency without disturbing the probabilistic distribution. Our analysis and
simulation results show that this scheme provides the desired privacy and
significantly reduces the real-event reporting latency. Second, under a normal
message transmission rate, we aim to reduce the amount of dummy traffic being
transmitted. We proposed the proxy-based filtering scheme (PFS), in which some
sensor nodes are selected as proxy nodes that filter dummy messages, and the
tree-based filtering scheme (TFS), which builds a hierarchical proxy structure to
filter even more dummy messages. Since the proxy placement affects the filtering
capability, we designed algorithms for optimal proxy placement in both PFS and
TFS to minimize network traffic while maintaining source location privacy.
In Chapter 5, we proposed pDCS, a privacy-enhanced DCS system for unattended
sensor networks. pDCS mainly includes two parts: mapping and query. For
mapping, we proposed four mapping schemes targeting four different query
methods, with different levels of privacy, query granularity, and message overhead.
After the data is mapped to a storage cell and saved there, a mobile sink can issue
queries for the desired data. Different query schemes are proposed, with different
query delays, message overhead, and query privacy. Among them, the Keyed
Bloom Filter scheme can significantly reduce the message overhead without losing
any query privacy.
6.2 Future Directions
Our work to date has provided a series of solutions for protecting source location
privacy in sensor networks. However, few large-scale sensor networks have been
deployed, and many protocols have not been fully tested. Further, many sensor
network applications and research problems remain to be explored. Next, I outline
several interesting directions for future work that one could pursue.
Explore more attack models: We addressed three attack models in this thesis;
there are obviously more to be addressed. Even though a solution such as
dummy traffic might work against most attack models, it may not be the
best choice given the high network traffic it induces. Better solutions must
be studied for each specific attack model.
Cross-layer security issues: Most existing work on security focuses on a single
layer, such as routing-layer disruption or application-layer dropping. However,
when routing-layer disruption and application-layer information are considered
together, does the problem become different?
How to realize a global attacker: In Chapter 4, we assumed a powerful attack
model, i.e., a global observer who can monitor and analyze the traffic over the
whole network. In order to do so, the attacker needs to be in active mode to
receive all the messages passing by. After a node receives a message, it must
analyze the message according to certain algorithms. For example, in order
to find the source location, an attacker node u first communicates with all
the other attacker nodes; all of them compare the message with the messages
in their memories and report their results to u; finally, u gathers all the
information and outputs an area containing the message source. It is not
hard to see that all these operations place high demands on the attacking
nodes' energy, computation, communication, and storage abilities. Although
such a strong attacker exists in theory, questions arise in real applications:
What can an attacker really do? How can a global attacker be realized?
Bibliography
[1] Papadopouli, M. and H. Schulzrinne (2001) “Effects of power conserva-tion, wireless coverage and cooperation on data dissemination among mobiledevices,” in MobiHoc ’01: Proceedings of the 2nd ACM international sym-posium on Mobile ad hoc networking & computing, ACM, New York, NY,USA, pp. 117–127.
[2] Heidemann, J., F. Silva, and D. Estrin (2003) “Matching data dissem-ination algorithms to application requirements,” in SenSys ’03: Proceedingsof the 1st international conference on Embedded networked sensor systems,ACM, New York, NY, USA, pp. 218–229.
[3] Nath, S., P. B. Gibbons, S. Seshan, and Z. R. Anderson (2004)“Synopsis diffusion for robust aggregation in sensor networks,” in SenSys’04: Proceedings of the 2nd international conference on Embedded networkedsensor systems, ACM, New York, NY, USA, pp. 250–262.
[4] Meliou, A., D. Chu, J. Hellerstein, C. Guestrin, and W. Hong
(2006) “Data gathering tours in sensor networks,” in IPSN ’06: Proceed-ings of the fifth international conference on Information processing in sensornetworks, ACM, New York, NY, USA, pp. 43–50.
[5] Fan, K.-W., S. Liu, and P. Sinha (2006) “Scalable data aggregation fordynamic events in sensor networks,” in SenSys ’06: Proceedings of the 4thinternational conference on Embedded networked sensor systems, ACM, NewYork, NY, USA, pp. 181–194.
[6] Gao, J., L. Guibas, N. Milosavljevic, and J. Hershberger (2007)“Sparse data aggregation in sensor networks,” in IPSN ’07: Proceedings ofthe 6th international conference on Information processing in sensor net-works, ACM, New York, NY, USA, pp. 430–439.
[7] Zheng, R. and R. J. Barton (2007) “Toward Optimal Data Aggregationin Random Wireless Sensor Networks,” in INFOCOM, pp. 249–257.
[8] Starobinski, D., W. Xiao, X. Qin, and A. Trachtenberg (2007) "Near-Optimal Data Dissemination Policies for Multi-Channel, Single Radio Wireless Sensor Networks," in INFOCOM, pp. 955–963.
[9] Ye, Z., A. A. Abouzeid, and J. Ai (2007) "Optimal Policies for Distributed Data Aggregation in Wireless Sensor Networks," in INFOCOM, pp. 1676–1684.
[10] He, W., X. Liu, H. Nguyen, K. Nahrstedt, and T. F. Abdelzaher (2007) "PDA: Privacy-Preserving Data Aggregation in Wireless Sensor Networks," in INFOCOM, pp. 2045–2053.
[11] Karp, B. and H. T. Kung (2000) "GPSR: greedy perimeter stateless routing for wireless networks," in MobiCom '00: Proceedings of the 6th annual international conference on Mobile computing and networking, ACM, New York, NY, USA, pp. 243–254.
[12] Rao, A., S. Ratnasamy, C. Papadimitriou, S. Shenker, and I. Stoica (2003) "Geographic routing without location information," in MobiCom '03: Proceedings of the 9th annual international conference on Mobile computing and networking, ACM, New York, NY, USA, pp. 96–108.
[13] Niculescu, D. and B. Nath (2003) "Trajectory based forwarding and its applications," in MobiCom '03: Proceedings of the 9th annual international conference on Mobile computing and networking, ACM, New York, NY, USA, pp. 260–272.
[14] Lee, S., B. Bhattacharjee, and S. Banerjee (2005) "Efficient geographic routing in multihop wireless networks," in MobiHoc '05: Proceedings of the 6th ACM international symposium on Mobile ad hoc networking and computing, ACM, New York, NY, USA, pp. 230–241.
[15] Nguyen, A., N. Milosavljevic, Q. Fang, J. Gao, and L. J. Guibas (2007) "Landmark Selection and Greedy Landmark-Descent Routing for Sensor Networks," in INFOCOM, pp. 661–669.
[16] Tsai, M.-J., H.-Y. Yang, and W.-Q. Huang (2007) "Axis-Based Virtual Coordinate Assignment Protocol and Delivery-Guaranteed Routing Protocol in Wireless Sensor Networks," in INFOCOM, pp. 2234–2242.
[17] Kannan, R., S. Sarangi, and S. S. Iyengar (2004) "Sensor-centric energy-constrained reliable query routing for wireless sensor networks," J. Parallel Distrib. Comput., 64(7), pp. 839–852.
[18] Fang, Q., J. Gao, and L. J. Guibas (2006) "Locating and bypassing holes in sensor networks," Mob. Netw. Appl., 11(2), pp. 187–200.
[19] Dong, Q., S. Banerjee, M. Adler, and A. Misra (2005) "Minimum energy reliable paths using unreliable wireless links," in MobiHoc '05: Proceedings of the 6th ACM international symposium on Mobile ad hoc networking and computing, ACM, New York, NY, USA, pp. 449–459.
[20] (2004), "Crossbow Technology Inc."
[21] Back, A., U. Möller, and A. Stiglic (2001) "Traffic Analysis Attacks and Trade-Offs in Anonymity Providing Systems," in IHW '01: Proceedings of the 4th International Workshop on Information Hiding, Springer-Verlag, London, UK, pp. 245–257.
[22] Kamat, P., Y. Zhang, W. Trappe, and C. Ozturk (2005) "Enhancing Source-Location Privacy in Sensor Network Routing," in ICDCS '05: Proceedings of the 25th IEEE International Conference on Distributed Computing Systems, IEEE Computer Society, Washington, DC, USA, pp. 599–608.
[23] Díaz, C. and B. Preneel (2004) "Taxonomy of Mixes and Dummy Traffic," in Proceedings of I-NetSec04: 3rd Working Conference on Privacy and Anonymity in Networked and Distributed Systems, Toulouse, France.
[24] Gura, N., A. Patel, A. Wander, H. Eberle, and S. C. Shantz (2004) "Comparing Elliptic Curve Cryptography and RSA on 8-bit CPUs," in CHES, pp. 119–132.
[25] Gaubatz, G., J.-P. Kaps, and B. Sunar (2004) "Public Key Cryptography in Sensor Networks - Revisited," in ESAS, pp. 2–18.
[26] Malan, D., M. Welsh, and M. Smith (2004), "A public-key infrastructure for key distribution in TinyOS based on elliptic curve cryptography."
URL citeseer.ist.psu.edu/malan04publickey.html
[27] Watro, R., D. Kong, S. fen Cuti, C. Gardiner, C. Lynn, and P. Kruus (2004) "TinyPK: securing sensor networks with public key technology," in SASN '04: Proceedings of the 2nd ACM workshop on Security of ad hoc and sensor networks, ACM Press, New York, NY, USA, pp. 59–64.
[28] Schneier, B. (1996) Applied Cryptography, Second Edition, John Wiley & Sons, Inc.
[29] Karlof, C., N. Sastry, and D. Wagner (2004) "TinySec: A Link Layer Security Architecture for Wireless Sensor Networks," in SenSys '04: Proceedings of the 2nd international conference on Embedded networked sensor systems, Baltimore, pp. 162–175.
URL citeseer.ist.psu.edu/757036.html
[30] Perrig, A., R. Szewczyk, V. Wen, D. Culler, and J. Tygar (2001) "SPINS: security protocols for sensor networks," in ACM Mobicom.
[31] Zhu, S., S. Setia, and S. Jajodia (2003) "LEAP: Efficient Security Mechanisms for Large-Scale Distributed Sensor Networks," in ACM Conference on Computer and Communications Security (CCS).
[32] Chan, H., A. Perrig, and D. Song (2003) "Random Key Predistribution Schemes for Sensor Networks," in Proceedings of IEEE Security and Privacy Symposium.
[33] Du, W., J. Deng, Y. Han, and P. Varshney (2003) "A Pairwise Key Pre-distribution Scheme for Wireless Sensor Networks," in Proceedings of the 10th ACM Conference on Computer and Communications Security (CCS'03), pp. 42–51.
[34] Eschenauer, L. and V. Gligor (2002) "A Key-Management Scheme for Distributed Sensor Networks," in Proceedings of ACM CCS'02.
[35] Liu, D. and P. Ning (2003) "Establishing Pairwise Keys in Distributed Sensor Networks," in ACM Conference on Computer and Communications Security (CCS).
[36] Chan, H. and A. Perrig (2005) "PIKE: Peer Intermediaries for Key Establishment in Sensor Networks," in Proceedings of IEEE Infocom.
[37] Zhang, Y., W. Liu, W. Lou, and Y. Fang (2006) "Location-based compromise-tolerant security mechanisms for wireless sensor networks," IEEE Journal on Selected Areas in Communications.
[38] Zhang, W. and G. Cao (2005) "Group Rekeying for Filtering False Data in Sensor Networks: A Predistribution and Local Collaboration-Based Approach," IEEE INFOCOM.
[39] Lazos, L. and R. Poovendran (2003) "Energy-Aware Secure Multicast Communication in Ad-hoc Networks Using Geographic Location Information," in Proceedings of IEEE ICASSP'03.
[40] Wong, C. K., M. Gouda, and S. Lam (1998) "Secure Group Communication Using Key Graphs," in Proceedings of ACM SIGCOMM 1998.
[41] Chaum, D. (1981) "Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms," Communications of the ACM, 24(2), pp. 84–88.
[42] (2005), "Anonymity bibliography."
[43] Chaum, D. (1988) "The Dining Cryptographers Problem: Unconditional Sender and Recipient Untraceability," Journal of Cryptology, 1(1), pp. 65–75.
[44] Waidner, M. (1989) "Unconditional Sender and Recipient Untraceability In Spite of Active Attacks," Advances in Cryptology: EUROCRYPT'89, pp. 302–319.
[45] Goldschlag, D., M. Reed, and P. Syverson (1999) "Onion Routing for Anonymous and Private Internet Connections," Communications of the ACM (USA), 42(2), pp. 39–41.
[46] Reiter, M. and A. Rubin (1998) "Crowds: Anonymity For Web Transactions," ACM Transactions on Information and System Security, 1(1), pp. 66–92.
[47] Al-Muhtadi, J., R. Campbell, A. Kapadia, M. Mickunas, and S. Yi (2002) "Routing Through the Mist: Privacy Preserving Communication in Ubiquitous Computing Environments," ICDCS'02.
[48] Berthold, O., A. Pfitzmann, and R. Standtke (2000) "The disadvantages of free MIX routes and how to overcome them," in Proceedings of Designing Privacy Enhancing Technologies: Workshop on Design Issues in Anonymity and Unobservability (H. Federrath, ed.), Springer-Verlag, LNCS 2009, pp. 30–45.
[49] Pfitzmann, A., B. Pfitzmann, and M. Waidner (1991) "ISDN-mixes: Untraceable communication with very small bandwidth overhead," in Proceedings of the GI/ITG Conference on Communication in Distributed Systems, pp. 451–463.
[50] Berthold, O., H. Federrath, and S. Köpsell (2000) "Web MIXes: A system for anonymous and unobservable Internet access," in Proceedings of Designing Privacy Enhancing Technologies: Workshop on Design Issues in Anonymity and Unobservability (H. Federrath, ed.), Springer-Verlag, LNCS 2009, pp. 115–129.
[51] Möller, U., L. Cottrell, P. Palfrader, and L. Sassaman (2003), "Mixmaster Protocol — Version 2," Draft.
[52] Freedman, M. J. and R. Morris (2002) "Tarzan: A Peer-to-Peer Anonymizing Network Layer," in Proceedings of the 9th ACM Conference on Computer and Communications Security (CCS 2002), Washington, DC.
[53] Rennhard, M. and B. Plattner (2002) "Introducing MorphMix: Peer-to-Peer based Anonymous Internet Usage with Collusion Detection," in Proceedings of the Workshop on Privacy in the Electronic Society (WPES 2002), Washington, DC, USA.
[54] Goldschlag, D. M., M. G. Reed, and P. F. Syverson (1996) "Hiding Routing Information," in Proceedings of Information Hiding: First International Workshop (R. Anderson, ed.), Springer-Verlag, LNCS 1174, pp. 137–150.
[55] Syverson, P., G. Tsudik, M. Reed, and C. Landwehr (2000) "Towards an Analysis of Onion Routing Security," in Proceedings of Designing Privacy Enhancing Technologies: Workshop on Design Issues in Anonymity and Unobservability (H. Federrath, ed.), Springer-Verlag, LNCS 2009, pp. 96–114.
[56] Shields, C. and B. N. Levine (2000) "A protocol for anonymous communication over the Internet," in CCS '00: Proceedings of the 7th ACM conference on Computer and communications security, ACM Press, New York, NY, USA, pp. 33–42.
[57] Berthold, O. and H. Langos (2002) "Dummy Traffic Against Long Term Intersection Attacks," in Proceedings of Privacy Enhancing Technologies workshop (PET 2002) (R. Dingledine and P. Syverson, eds.), Springer-Verlag, LNCS 2482.
[58] Dai, W. (1996), "PipeNet 1.1," Usenet post.
[59] Deng, J., R. Han, and S. Mishra (2004) "Intrusion Tolerance and Anti-Traffic Analysis Strategies for Wireless Sensor Networks," International Conference on Dependable Systems and Networks (DSN'04).
[60] Ozturk, C., Y. Zhang, and W. Trappe (2004) "Source-Location Privacy in Energy-Constrained Sensor Networks Routing," ACM Workshop on Security of Ad Hoc and Sensor Networks (SASN'04).
[61] Xi, Y., L. Schwiebert, and W. Shi "Preserving Source Location Privacy in Monitoring-Based Wireless Sensor Networks," in Proceedings of the 2nd International Workshop on Security in Systems and Networks (SSN '06).
[62] Hoh, B. and M. Gruteser (2005) "Protecting Location Privacy Through Path Confusion," SecureComm '05, pp. 194–205.
[63] Ratnasamy, S., B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan, and S. Shenker (2002) "GHT: A Geographic Hash Table for Data-Centric Storage," ACM International Workshop on Wireless Sensor Networks and Applications.
[64] Lazos, L. and R. Poovendran (2004) "SeRLoc: Secure Range-Independent Localization for Wireless Sensor Networks," in Proceedings of ACM Workshop WiSe'04.
[65] Liu, D., P. Ning, and W. Du (2005) "Attack-Resistant Location Estimation in Sensor Networks," in Proceedings of the 4th International Conference on Information Processing in Sensor Networks (IPSN).
[66] Pfitzmann, A. and M. Hansen (2000), "Anonymity, Unobservability, and Pseudonymity: A Consolidated Proposal for Terminology," Draft.
[67] Anderson, T. W. and D. A. Darling (1952) "Asymptotic Theory of Certain 'Goodness of Fit' Criteria Based on Stochastic Processes," The Annals of Mathematical Statistics, 23(2), pp. 193–212.
[68] Marsaglia, G. and J. C. W. Marsaglia (2004) "Evaluating the Anderson-Darling Distribution," Journal of Statistical Software, 9(2).
[69] Anderson, T. W. and D. A. Darling (1954) "A Test of Goodness of Fit," Journal of the American Statistical Association, 49(268), pp. 765–769.
[70] Trappe, W. and L. Washington (2002) Introduction to Cryptography with Coding Theory, Prentice Hall.
[71] Stephens, M. A. (1974) "EDF Statistics for Goodness of Fit and Some Comparisons," Journal of the American Statistical Association, 69, pp. 730–737.
[72] Romeu, J. L. (2003) "Kolmogorov-Smirnov: A Goodness of Fit Test for Small Samples," START: Selected Topics in Assurance Related Technologies, 10(6).
[73] Wald, A. (1947) Sequential Analysis, J. Wiley & Sons, New York.
[74] Ratnasamy, S., D. Estrin, R. Govindan, B. Karp, L. Yin, S. Shenker, and F. Yu (2001) "Data-centric storage in sensornets," in Proceedings of ACM First Workshop on Hot Topics in Networks.
[75] Zhang, W., S. Zhu, M. Tran, and G. Cao "A Compromise-Resilient Scheme for Pairwise Key Establishment in Dynamic Sensor Networks," in MobiCom '07.
[76] Karp, B. and H. T. Kung (2000) "GPSR: greedy perimeter stateless routing for wireless networks," in MobiCom '00: Proceedings of the 6th annual international conference on Mobile computing and networking, pp. 243–254.
[77] Shmoys, D. B., E. Tardos, and K. Aardal (1997) "Approximation algorithms for facility location problems," in STOC '97: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pp. 265–274.
[78] Arya, V., N. Garg, R. Khandekar, K. Munagala, and V. Pandit (2001) "Local search heuristic for k-median and facility location problems," in STOC '01: Proceedings of the thirty-third annual ACM symposium on Theory of computing, pp. 21–29.
[79] Korupolu, M. R., C. G. Plaxton, and R. Rajaraman (1998) "Analysis of a local search heuristic for facility location problems," in SODA '98: Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms, pp. 1–10.
[80] "Proximity-Based Adjacency Determination for Facility Layout," preprint.
URL http://www.coe.montana.edu/ie/faculty/emooney/pubs/cie96/cie.html
[81] Kleinrock, L. (1975) Queueing Systems, Volume 1: Theory, John Wiley and Sons, Inc.
[82] "The CSIM Simulator."
[83] Fischer-Hübner, S. (2001) Privacy Enhancing Technologies, Springer Berlin Heidelberg.
[84] Shao, M., S. Zhu, W. Zhang, and G. Cao (2007) "pDCS: Security and Privacy Support for Data-Centric Sensor Networks," in Proceedings of 26th Annual IEEE Conference on Computer Communications (Infocom'07).
[85] "GloMoSim."
[86] Hill, J., R. Szewczyk, A. Woo, S. Hollar, D. E. Culler, and K. S. J. Pister (2000) "System Architecture Directions for Networked Sensors," in Architectural Support for Programming Languages and Operating Systems, pp. 93–104.
[87] Akyildiz, I. F., W. Su, Y. Sankarasubramaniam, and E. Cayirci (2002) "Wireless sensor networks: a survey," Computer Networks, 38(4), pp. 393–422.
[88] Ghose, A., J. Grossklags, and J. Chuang (2003) "Resilient Data-Centric Storage in Wireless Ad-Hoc Sensor Networks," Proceedings of the 4th International Conference on Mobile Data Management (MDM'03), pp. 45–62.
[89] Zhang, W., G. Cao, and T. La Porta (2003) "Data Dissemination with Ring-Based Index for Wireless Sensor Networks," IEEE International Conference on Network Protocols (ICNP), pp. 305–314.
[90] Karp, B. and H. Kung (2000) "GPSR: Greedy Perimeter Stateless Routing for Wireless Networks," ACM Mobicom.
[91] Ye, F., H. Luo, J. Cheng, S. Lu, and L. Zhang (2002) "A Two-Tier Data Dissemination Model for Large-scale Wireless Sensor Networks," ACM International Conference on Mobile Computing and Networking (MOBICOM'02), pp. 148–159.
[92] "The smartdust project."
[93] Winter, P. and M. Zachariasen (1997) "Euclidean Steiner minimum trees: An improved exact algorithm," Networks, 30(3), pp. 149–166.
[94] Capkun, S. and J. Hubaux (2005) "Secure positioning of wireless devices with application to sensor networks," IEEE Infocom.
[95] Akcan, H., V. Kriakov, H. Bronnimann, and A. Delis (2006) "GPS-Free Node Localization in Mobile Wireless Sensor Networks," in MobiDE'06.
[96] Iyengar, R. and B. Sikdar (2003) "Scalable and Distributed GPS free Positioning for Sensor Networks," in ICC'03.
[97] Bulusu, N., J. Heidemann, and D. Estrin (2000) "GPS-less Low Cost Outdoor Localization For Very Small Devices," IEEE Personal Communications.
[98] Sun, K., P. Ning, and C. Wang (2006) "Secure and Resilient Clock Synchronization in Wireless Sensor Networks," IEEE Journal on Selected Areas in Communications, 24(2), pp. 395–408.
[99] Song, H., S. Zhu, and G. Cao (2005) "Attack-Resilient Time Synchronization for Wireless Sensor Networks," IEEE International Conference on Mobile Ad-hoc and Sensor Systems (MASS'05).
[100] Karlof, C. and D. Wagner (2003) "Secure Routing in Sensor Networks: Attacks and Countermeasures," in Proceedings of First IEEE Workshop on Sensor Network Protocols and Applications.
[101] Cardenas, A., S. Radosavac, and J. Baras (2004) "Detection and Prevention of MAC Layer Misbehavior for Ad Hoc Networks," in Proceedings of ACM Workshop on Security of Ad hoc and Sensor Networks (SASN'04).
[102] Xu, W., T. Wood, W. Trappe, and Y. Zhang (2004) "Channel surfing and spatial retreats: defenses against wireless denial of service," in Proceedings of ACM Workshop on Wireless Security (WiSe).
[103] Shao, M., Y. Yang, S. Zhu, and G. Cao (2008) "Towards Statistically Strong Source Anonymity for Sensor Networks," in Proceedings of IEEE Infocom.
[104] Yang, Y., M. Shao, S. Zhu, B. Urgaonkar, and G. Cao (2008) "Towards event source unobservability with minimum network traffic in sensor networks," in WiSec '08: Proceedings of the first ACM conference on Wireless network security, ACM, New York, NY, USA, pp. 77–88.
[105] Chen, W.-T., H.-L. Hsu, and J.-L. Chiang (2005) "Logical Key Tree Based Secure Multicast Protocol with Copyright Protection," in AINA'05.
[106] Hao, G., N. V. Vinodchandran, and B. Ramamurthy (2005) "A balanced key tree approach for dynamic secure group communication," in ICCCN'05.
[107] Deng, J., R. Han, and S. Mishra (2005) "A Practical Study of Transitory Master Key Establishment For Wireless Sensor Networks," in First IEEE/CreateNet Conference on Security and Privacy in Communication Networks (SecureComm 2005), pp. 289–299.
[108] Zhu, S., S. Setia, and S. Jajodia (2007) "LEAP+: Efficient security mechanisms for large-scale distributed sensor networks," ACM Transactions on Sensor Networks (TOSN), vol. 2.
[109] Ye, F., H. Luo, S. Lu, and L. Zhang (2004) "Statistical En-route Detection and Filtering of Injected False Data in Sensor Networks," in Proceedings of IEEE Infocom'04.
[110] Park, T. and K. Shin (2005) "Soft Tamper-Proofing via Program Integrity Verification in Wireless Sensor Networks," IEEE Transactions on Mobile Computing, 4(3).
[111] Seshadri, A., A. Perrig, L. van Doorn, and P. Khosla (2004) "SWATT: SoftWare-based ATTestation for Embedded Devices," in Proceedings of the IEEE Symposium on Security and Privacy.
[112] Yang, Y., X. Wang, S. Zhu, and G. Cao (2007) "Distributed Software-based Attestation for Node Compromise Detection in Sensor Networks," in Proceedings of 26th IEEE International Symposium on Reliable Distributed Systems (SRDS).
[113] Cagalj, M., J. Hubaux, and C. Enz (2002) "Minimum-Energy Broadcast in All-wireless Networks: NP-Completeness and Distribution," ACM MOBICOM'02.
[114] Bloom, B. (1970) "Space/Time Trade-offs in Hash Coding with Allowable Errors," Communications of the ACM.
[115] "The TinyDB project."
[116] Cordone, R. and F. Maffioli (2004) "On the complexity of graph tree partition problems," Discrete Appl. Math., 134(1-3), pp. 51–65.
[117] "Weidai's Crypto++ (visited in Jul. 2005)."
Vita
Min Shao
Min Shao was born in China. She received her B.S. degree in Computer Science from Tsinghua University, Beijing, China, in July 2002. She enrolled in the Ph.D. program in Computer Science and Engineering at The Pennsylvania State University in August 2003. She is a student member of IEEE.
PUBLICATIONS
• Min Shao, Yi Yang, Sencun Zhu, and Guohong Cao, "Towards Statistically Strong Source Anonymity for Sensor Networks," IEEE Infocom, 2008.
• Min Shao, Sencun Zhu, Guohong Cao, Tom La Porta, and Prasant Mohapatra, "A Cross-layer Dropping Attack in Video Streaming over Ad Hoc Networks," International Conference on Security and Privacy in Communication Networks (Securecomm), 2008.
• Yi Yang, Min Shao, Sencun Zhu, Bhuvan Urgaonkar, and Guohong Cao, "Towards Event Source Unobservability with Minimum Network Traffic in Sensor Networks," ACM Conference on Wireless Network Security (WiSec), 2008.
• Min Shao, Sencun Zhu, Wensheng Zhang, and Guohong Cao, "pDCS: Security and Privacy Support for Data-Centric Sensor Networks," IEEE INFOCOM, 2007.