Towards Efficient and Lightweight Security Architecture for Big

Sensing Data Streams

by

Deepak Puthal

M. Tech. (National Institute of Technology Rourkela)

A thesis submitted to

Faculty of Engineering and Information Technology

University of Technology, Sydney

for the degree of

Doctor of Philosophy

April 2017

To my family and friends

CERTIFICATE OF ORIGINAL AUTHORSHIP

I certify that the work in this thesis has not previously been submitted for a degree, nor has it been submitted as part of the requirements for a degree, except as fully acknowledged within the text.

I also certify that the thesis has been written by me. Any help that I have

received in my research work and the preparation of the thesis itself has

been acknowledged. In addition, I certify that all information sources

and literature used are indicated in the thesis.

Signature of Student:

Date:

Acknowledgement

I sincerely express my deep gratitude to my principal coordinating supervisor, Prof. Jinjun Chen, for his experienced supervision and continuous encouragement throughout my PhD study. I also want to express my sincere appreciation to my co-supervisors, Dr. Surya Nepal and Dr. Rajiv Ranjan from CSIRO, for their supervision and encouragement. Without their consistent support and supervision, I would not have been able to complete this thesis. I express my heartfelt gratitude to Dr. Ranjan for his financial support, without which it would have been difficult for me to travel to Australia for PhD study.

I thank the Commonwealth Scientific and Industrial Research Organisation (CSIRO) for offering me a full scholarship throughout my doctoral program. I also thank the University of Technology Sydney (UTS) and the Faculty of Engineering and IT (FEIT) for providing me with an IRS scholarship throughout my doctoral program.

My thanks also go to the staff members, research assistants, previous and current colleagues, and friends at UTS and CSIRO for their help, suggestions, friendship and encouragement; in particular, Dr. Priyadarsi Nanda, Prof. Sean He, Eryani Tjondrowalujo, Chang Liu, Xuyun Zhang, Chi Yang, Adrian Johannes, Nazanin Borhan, Ashish Nanda, Jongkil Kim, Nan Li, Danan Thilakanathan, Mian Ahmed Jan, and Usman Khan.

Last but not least, I am deeply grateful to my parents, Kartik Ch. Puthal and Shakuntala Puthal, and to my brother, sisters and brothers-in-law, for their understanding, encouragement and help, and for supporting me to study abroad. Most importantly, I would like to sincerely express my deepest gratitude to the almighty God.

Abstract

A large number of mission-critical applications, from disaster management to health monitoring, are contributing to the Internet of Things (IoT) by deploying numerous smart sensing devices in heterogeneous environments. Resource-constrained sensing devices are widely used to build and deploy self-organising wireless sensor networks for a variety of critical applications. Many such devices sense the deployed environment, generate a variety of data and send them to a server for analysis as data streams. The key requirement of such applications is near real-time stream data processing in large scale sensing networks. This trend has given birth to an area called big sensing data streams. One of the key problems in big data is to ensure end-to-end security: because the medium of communication is untrusted, a Data Stream Manager (DSM) must always verify the security of the data (i.e., confidentiality, integrity, authenticity, availability and freshness) before executing a query. A malicious adversary may access or tamper with the data in transit. One of the challenging tasks in such applications is to ensure the trustworthiness of the collected data, so that any decisions are made on correct data, followed by protecting the data streams from information leakage and unauthorised access. In this thesis, end-to-end means from the source sensors to the cloud data centre. Although some of these security issues are not new, the situation is aggravated by the five Vs of big sensing data streams: Volume, Velocity, Variety, Veracity and Value. Achieving data security in big sensing data streams, particularly in the context of near real-time analytics, therefore remains a significant challenge.

This thesis investigates the problems and security issues of big sensing data streams from the perspective of efficient and lightweight processing. The advantages of big data stream computing, including real-time processing in an efficient and lightweight fashion, are exploited to address the problem, aiming at high scalability and effectiveness. Specifically, the thesis examines three major properties in the lifecycle of security in big data stream environments: authenticity, integrity and confidentiality, also known as the AIC triad, which differs from the CIA triad used in general data security. Accordingly, a lightweight security framework is proposed to maintain data integrity, and a selective encryption technique to maintain data confidentiality, over big sensing data streams. These solutions provide data security from the source sensing devices to the processing layer of the cloud data centre. The thesis also proposes a lattice based information flow control model to protect data against information leakage and unauthorised access after security verification at the DSM. By integrating this access control model, the thesis provides end-to-end security for big sensing data streams, i.e., from the source sensing device to the cloud data centre processing layer. This thesis demonstrates that our solutions not only strengthen data security but also significantly improve the performance and efficiency of big sensing data stream processing compared with existing approaches.

The Author’s Publications

So far, I have published nine refereed papers, including one book chapter, one IEEE magazine article, one ERA ranked A*1 journal paper, one ERA ranked A journal paper, three ERA ranked A conference papers, one ERA ranked B conference paper, and other papers. These publications, together with one paper that is under review, are listed in detail below. The impact factor (IF)2 of each journal paper is also stated.

Book Chapter:

1. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "End-to-End

Security Framework for Big Sensing Data Streams." in Big Data Management,

Architecture, and Processing, CRC Press, to be published 2017.

Journal Articles:

2. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "A Dynamic

Prime Number Based Efficient Security Mechanism for Big Sensing Data

Streams." Journal of Computer and System Sciences (JCSS), Vol. 83(1), pp. 22-42, 2017. (A*, IF: 1.583)

3. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "DLSeF: A

Dynamic Key Length based Efficient Real-Time Security Verification Model for

Big Data Streams." ACM Transactions on Embedded Computing Systems (TECS), Vol. 16(2), pp. 51:1-51:24, 2016. (A*, IF: 1.19)

1 ERA ranking is a ranking framework for publications in Australia. Refer to http://www.arc.gov.au/era/era_2010/archive/era_journal_list.htm for detailed ranking tiers. The 2010 version is used herein. For journal papers: A* (top 5%); A (next 15%). For conference papers (no A* rank): A (top 20%).
2 IF: Impact Factor. Refer to http://wokinfo.com/essays/impact-factor/ for details and query.

4. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "Threats to

Networking Cloud and Edge Datacenters in the Internet of Things." IEEE Cloud

Computing. Vol. 3(3), pp. 64-71, 2016.

5. Deepak Puthal, Surya Nepal, Rajiv Ranjan, Xindong Wu, and Jinjun Chen.

"SEEN: A Selective Encryption Method to Ensure Confidentiality for Big

Sensing Data Streams." IEEE Transactions on Big Data (TBD), Minor revision,

February 2017.

Conference Papers:

6. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "A Synchronized

Shared Key Generation Method for Maintaining End-to-End Security of Big

Data Streams." in 50th Hawaii International Conference on System Sciences

(HICSS-50), Hawaii, USA. 2017. (A)

7. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "IoT and Big Data:

An Architecture with Data Flow and Security Issues." in 2nd International Conference on Cloud, Networking for IoT Systems (CN4IoT), Brindisi, Italy, 2017.

8. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "A Secure Big

Data Streams Analytics Framework for Disaster Management on Cloud." in 18th

IEEE International Conferences on High Performance Computing and

Communications (HPCC 2016), Sydney, Australia. 2016 (B)

9. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "A Dynamic Key

Length based Approach for Real-Time Security Verification of Big Sensing Data

Streams." in 16th International Conference on Web Information System

Engineering (WISE 2015), Miami, Florida, USA. 2015. (A)

10. Deepak Puthal, Surya Nepal, Rajiv Ranjan, and Jinjun Chen. "DPBSV – An

Efficient and Secure Scheme for Big Sensing Data Streams." in 14th IEEE

International Conference on Trust, Security and Privacy in Computing and

Communications (IEEE TrustCom-15), Helsinki, Finland. 2015. (A)

11. Deepak Puthal, Surya Nepal, Cecile Paris, Rajiv Ranjan, and Jinjun Chen.

"Efficient Algorithms for Social Networks Coverage and Reach." in IEEE

BigData Congress, New York, USA, 2015.

Table of Contents

Figures xiii

Tables xv

Algorithms xvi

Chapter 1 Introduction 1
1.1 Background ············································································ 1

1.1.1 Big Data with Security Issues ············································· 3

1.1.2 Cloud Computing ··························································· 5

1.2 Motivation: Securing Big Sensing Data Streams ································ 6

1.3 Overview of the Work ······························································· 9

1.3.1 Methodology ································································· 9

1.3.2 Contributions ······························································ 11

1.4 Thesis Organisation ································································ 13

Chapter 2 Background Studies and Related Work 15
2.1 General Research Trend ··························································· 15

2.2 Review of Reviews ································································· 17

2.2.1 Data Centre Security ······················································ 17

2.2.2 Network Security ·························································· 19

2.2.3 IoT Security ································································ 21

2.3 IoT Generated Data Stream Architecture ······································· 23

2.3.1 IoT Architecture ··························································· 23

2.3.2 Security Threats of Each Layer ········································· 28

2.4 Big Data Stream Security ························································· 38

2.4.1 Security Requirements ··················································· 40

2.4.2 CIA Triad Properties ····················································· 41

2.4.3 Confidentiality of Big Data Streams ··································· 42

2.4.4 Integrity of Big Data Streams ··········································· 45

2.4.5 Availability of Big Data Streams ······································· 51

2.5 Comparison ········································································· 56

2.6 Summary ············································································ 59

Chapter 3 Security Verification Framework for Big Sensing Data Streams 61
3.1 Introduction ········································································· 62

3.2 Preliminaries to the Chapter ······················································ 64

3.3 Research Challenges and Research Motivation ································ 65

3.3.1 Research Challenges ······················································ 66

3.3.2 Research Motivation ······················································ 67

3.4 Dynamic Prime-Number Based Security Verification ························ 70

3.4.1 DPBSV System Setup ···················································· 70

3.4.2 DPBSV Handshaking ···················································· 72

3.4.3 DPBSV Rekeying ························································· 72

3.4.4 DPBSV Security Verification ··········································· 74

3.5 Security Analysis ··································································· 76

3.5.1 Security Proof ····························································· 76

3.5.2 Forward Secrecy ·························································· 81

3.6 Experiment and Evaluation ······················································· 81

3.6.1 Sensor Node Performance ··············································· 82

3.6.2 Security Verification······················································ 83

3.6.3 Performance Comparison ··············································· 86

3.6.4 Required Buffer Size ····················································· 87

3.7 Summary ············································································· 88

Chapter 4 Lightweight Security Protocol for Big Sensing Data Streams 89
4.1 Introduction ········································································· 89

4.2 Preliminaries to the Chapter ······················································ 92

4.3 Research Challenges and Research Motivation ································ 94

4.3.1 Research Challenges ······················································ 94

4.3.2 Research Motivation ······················································ 96

4.4 DLSeF Lightweight Security Protocol ·········································· 96

4.4.1 DLSeF System Setup ····················································· 97

4.4.2 DLSeF Handshaking ···················································· 100

4.4.3 DLSeF Rekeying ························································ 101

4.4.4 DLSeF Key Synchronisation ··········································· 104

4.4.5 DLSeF Security Verification ·········································· 108

4.5 Security Analysis ·································································· 110

4.5.1 Security Proof ···························································· 111

4.6 Experiment and Evaluation ······················································ 115

4.6.1 Sensor Node Performance ·············································· 115

4.6.2 Security Verification····················································· 117

4.6.3 Performance Comparison ·············································· 120

4.6.4 Required Buffer Size ···················································· 121

4.7 Summary ············································································ 123

Chapter 5 Selective Encryption Method to Ensure Confidentiality of Big Sensing Data Streams 124

5.1 Introduction ········································································ 125

5.2 Design Consideration ····························································· 127

5.2.1 System Architecture ····················································· 128

5.2.2 Adversary Model ························································· 130

5.2.3 Attack Model ····························································· 131

5.3 Research Challenges and Research Motivation ······························· 132

5.3.1 Research Challenges ····················································· 132

5.3.2 Research Motivation ····················································· 134

5.4 Selective Encryption Method for Big Data Streams ························· 135

5.4.1 Initial System Setup ····················································· 136

5.4.2 Rekeying ·································································· 138

5.4.3 New Node Authentication ·············································· 139

5.4.4 Reconfiguration ·························································· 141

5.4.5 Encryption/Decryption ·················································· 142

5.4.6 Tradeoffs ·································································· 143

5.4.7 Required Resources for SEEN ·········································· 144

5.5 Theoretical Analysis ······························································· 147

5.5.1 Security Proof ···························································· 147

5.5.2 Forward Secrecy ························································· 150

5.6 Experiment and Evaluation ······················································· 150

5.6.1 Security Verification····················································· 151

5.6.2 Performance Comparison ·············································· 153

5.6.3 Required Buffer Size ···················································· 154

5.6.4 Network Performance ··················································· 155

5.7 Summary ············································································ 157

Chapter 6 Access Control Framework for Big Sensing Data Streams 158
6.1 Introduction ········································································ 158

6.2 Background Studies ······························································· 161

6.2.1 Stream Processing························································ 161

6.2.2 Stream Security ··························································· 162

6.2.3 Chinese Wall Policy ····················································· 163

6.3 Design Consideration ····························································· 163

6.3.1 System Architecture ····················································· 163

6.3.2 Definition ································································· 166

6.3.3 QoS Requirements ······················································· 167

6.3.4 Adversary Model ························································· 169

6.4 Access Control Model ···························································· 170

6.5 Experimental Evaluation ························································· 173

6.5.1 System Setup ····························································· 173

6.5.2 Results Discussion ······················································· 175

6.6 Summary ············································································ 176

Chapter 7 Conclusion and Future Work 177
7.1 Conclusion ·········································································· 177

7.2 Future Work ········································································ 181

Bibliography 183

Figures

Figure 1-1 Typical Lifecycle of Security Framework for Big Sensing Data Streams 6

Figure 2-1 Cloud computing security architecture ······································ 19

Figure 2-2 Layer wise IoT Security architecture ········································· 22

Figure 2-3 Layer wise IoT architecture from IoT device to cloud data centre ······· 26

Figure 2-4 Communication protocol in IoT ·············································· 28

Figure 2-5 Cloud computing security threats, attacks and vulnerabilities············ 38

Figure 2-6 CIA triad of data security for data either in transit or at rest ·············· 41

Figure 3-1 A simplified view of a DSMS to process and analyse input data stream 62

Figure 3-2 Overlay of our architecture from sensing device to data centre ·········· 65

Figure 3-3 Pair of dynamic relative prime number generation ························ 68

Figure 3-4 The sensors used for experiment ·············································· 81

Figure 3-5 Estimated power consumption during the key generation process ······ 83

Figure 3-6 Scyther simulation environment result page ································ 84

Figure 3-7 Performance comparison of the security scheme ··························· 85

Figure 3-8 Performance comparison of minimum buffer size required ·············· 87

Figure 4-1 High level architecture from source sensing device to big data processing centre ······························································· 93

Figure 4-2 Secure authentication of Sensor and DSM ································· 100

Figure 4-3 Neighbour node discovered to get the key generation properties ······· 105

Figure 4-4 Neighbour discovery with all possible conditions ························ 107

Figure 4-5 Performance computation of two different sensors ······················· 116

Figure 4-6 Energy consumption by using COOJA in Contiki OS ···················· 116

Figure 4-7 Scyther simulation environment result page ······························· 118

Figure 4-8 Security verification results of Scyther during neighbour authentication ··················································································· 119

Figure 4-9 Performance comparison ······················································ 121

Figure 4-11 Efficiency comparison of minimum buffer size required to process · 121

Figure 5-1 High level architectural diagram for SEEN protocol ····················· 130

Figure 5-2 Initial authentication method with a 4-step process ······················· 138

Figure 5-3 Key Selection ··································································· 139

Figure 5-4 Shared key management for robust clock skew ··························· 140

Figure 5-5 Method to determine the data sensitivity level ···························· 141

Figure 5-6 Scyther simulation result page of security verification ··················· 152

Figure 5-7 Performance comparison of the SEEN method ···························· 153

Figure 5-8 Efficiency comparison by comparing required buffer size ·············· 154

Figure 5-9 Energy consumption ··························································· 155

Figure 6-1 Overview of access control of big data streams using lattice model ··· 166

Figure 6-2 Lattice model for data access ················································· 171

Figure 6-3 Experiment Setups ····························································· 172

Figure 6-4 Mapping time for HT Sensor Dataset ······································· 173

Figure 6-5 Mapping time for Twin Gas Sensor Dataset ······························· 174

Tables

Table 2-1 Network layer security threats ················································· 31

Table 2-2 Possible threats of IoT generated big data streams in CIA triad representation ···································································· 57

Table 2-3 Comparison of IoT generated big data stream security threats and solutions according to the CIA triad method ······························· 58

Table 3-1 DPBSV Notations ································································ 69

Table 3-2 Time required by the symmetric key (AES) algorithm to obtain all possible keys using the most advanced Intel i7 processor ·················· 77

Table 4-1 Notations used in this DLSeF model ·········································· 98

Table 5-1 SEEN Notations ································································· 135

Table 5-2 Performance and Properties of Security Solutions ························· 156

Table 5-3 Communication overhead of SEEN protocol ······························· 156

Table 6-1 Machine specification ·························································· 174

Table 6-2 Dataset information ····························································· 174

Algorithms

Algorithm 3-1 Security Framework for Big Sensing Data Stream ···················· 74

Algorithm 3-2 Dynamic Prime Number Generation ···································· 78

Algorithm 4-1 Synchronisation of Dynamic Key Length Generation ··············· 102

Algorithm 4-2 Key Generation (Rekeying) Process ···································· 107

Algorithm 4-3 Lightweight Security Protocol for Big Sensing Data Stream ······· 109

Algorithm 5-1 Rekeying process ·························································· 140

Algorithm 5-2 Selective encryption method for big sensor data streams ··········· 145

Chapter 1

Introduction

This chapter introduces the research background and motivation, together with a brief summary of the work. Specifically, Section 1.1 briefly introduces the notions of cloud computing and big data as research background. Section 1.2 motivates the research on securing big sensing data streams. Section 1.3 summarises the work and outlines its contributions. Finally, Section 1.4 presents the organisation of the thesis.

1.1 Background

Nowadays, we have entered a big data era of petabytes. Big data is widespread in both industry and scientific research applications, where data is generated with high Volume, Velocity, Variety, Veracity and Value and is difficult to process using existing database management tools or traditional data processing applications. Big data sets come from many areas, including meteorology, connectomics, complex physics simulations, genomics, biological study, gene analysis and environmental research [1-2]. According to the literature [1-2], the data generated worldwide has doubled in size every 40 months since the 1980s. For example, in 2012, 2.5 quintillion (2.5×10¹⁸) bytes of data were generated every day. Data sizes are now measured in exabytes: in 2015, around 10,000 exabytes of digital data were generated, and following this digital data explosion, the size of big data is expected to surpass 40,000 exabytes by 2020 [1-3]. Hence, how to process big data has become a fundamental and critical challenge for modern society, and more and more research interest and effort can be observed under the theme of big data and its related issues. This thesis concentrates on data security and access control [4-9] technologies for big datasets from modern sensing systems. Big data streams are gaining even more research attention of late: many applications, including healthcare, military and natural disaster applications, require stream data analysis to detect events. Existing security solutions for protecting data can be classified into two classes: communication security [10-12] and server-side data security [13-16]. Communication security solutions protect data in motion, whereas server-side security solutions protect data at rest. Communication security is primarily intended to protect against potential network and communication related attacks, which are broadly divided into external attacks and internal attacks; to avoid these potential attacks, security solutions have been proposed for each TCP/IP layer. Server-side data security is mainly designed for physical data centres, where data is at rest and accessed through applications. Several solutions have been proposed to secure both data in communication and data stored in a server, but these are not necessarily applicable to a big data stream environment.
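The growth figures quoted above follow a simple exponential doubling model. As a rough illustration only (not part of the thesis; the 10,000 exabyte start value and the 40-month doubling period are taken from the text, while the 40,000 exabyte figure comes from the cited sources under their own assumptions), the projection can be sketched in Python:

```python
def projected_volume(v0_exabytes: float, months: float,
                     doubling_period: float = 40.0) -> float:
    """Project total data volume, assuming it doubles every
    `doubling_period` months (the rate cited in the literature)."""
    return v0_exabytes * 2 ** (months / doubling_period)

# Starting from roughly 10,000 exabytes in 2015, project 60 months
# ahead, i.e. to 2020.
volume_2020 = projected_volume(10_000, 60)
print(f"Projected 2020 volume: {volume_2020:,.0f} exabytes")
```

Under these assumptions the model yields roughly 28,000 exabytes by 2020; the higher 40,000 exabyte estimate in the cited literature reflects different baselines and growth assumptions, which shows how sensitive such projections are to the chosen doubling period.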

In addition to the above two approaches, there is also a need to address security

aspects of big data streams. Stream data processing is generally used to make

quick decisions or save lives in several critical applications, as stated above. In such

situations, it is an important task to protect big data streams before they are evaluated

and control the access to authorised users and query processors only. Another major

motivation is to perform the security verification in near real time in order to

synchronise with the processing speed of Stream Processing Engines (SPEs) [43].

Stream data analysis performance should not degrade because of the security

processing time, and there are several applications where we need to perform data

analysis in real time. Given the features of a big data stream (i.e. the 4Vs), existing

security solutions need a huge buffer size to perform security verification.

Cloud computing and big data, two disruptive trends at present, offer a large

number of business and research opportunities, and likewise pose heavy challenges

for the current information technology (IT) industry and research communities [17-


18]. The next section briefly introduces the notions of security issues in big data and

cloud computing.

1.1.1 Big Data with Security Issues

Big Data refers to the inability of traditional data architectures to efficiently handle

the new datasets [20]. Big data collection in applications has been growing

tremendously and getting increasingly complicated so that traditional data

processing tools are incapable of handling the data processing pipeline including

collection, storage, processing, mining, sharing, etc. within a tolerable elapsed time.

Big data is characterised by the broadly recognised 3Vs proposed by Douglas

Laney1 (Volume, Velocity and Variety), plus a fourth V, Veracity2. The 4Vs are detailed as

follows.

Volume (i.e. the size of the dataset): It refers to the huge amount of data

generated every second. About 90% of the world’s data has been generated

in the last two years. The high volume of data being generated and collected

daily creates an immediate challenge for real time processing in several

applications. The typical examples include emails, social media messages,

photos, video clips and sensing data that we produce and share every second.

This data explosion makes datasets too large to store and analyse using

traditional database technology.

Security: Retargeting traditional relational database security to non-

relational databases has been a challenge. An emergent phenomenon

introduced by Big Data variety that has gained considerable

importance is the ability to infer identity from anonymized datasets

by correlating with apparently innocuous public databases.

Velocity (i.e. rate of flow): It refers to the speed at which new data is

generated and the speed at which data moves to the cloud data centre. The New

York Stock Exchange captures about 1 terabyte of trade information daily.

Reacting fast enough and analysing the streaming data is critical to

1 Douglas Laney, “3D Data Management: Controlling Data Volume, Velocity and Variety”, Application Delivery Strategies, Gartner, February 2001.

2 http://www.villanovau.com/university-online-programs/what-is-big-data/, accessed April 2015.


businesses, with speeds and peak periods often inconsistent. Big data

filtering technology should be able to analyse the data without accessing

traditional databases.

Security: As with non-relational databases, distributed programming

frameworks such as Hadoop were not developed with security as a

primary objective.

Variety (i.e. data from multiple repositories, domains, or types): It refers

to the different types of data that could be encountered. In the past we

focused on structured data that neatly fits into tables or relational databases.

In fact, about 80% of the world’s data is unstructured and often heterogeneous.

Therefore it cannot simply be put into tables or relational databases. For

example, big data sets can have data from images, graphs, video sequences

or social media updates at the same time. With big data technology we

should harness different types of data including messages, social media

conversations, photos, sensor data, video/voice data together.

Security: The volume of Big Data has necessitated storage in multi-

tiered storage media. The movement of data between tiers has created a

requirement to systematically analyse the threat models and to

research and develop novel protection techniques.

Veracity: It refers to the data uncertainty and impreciseness. With many

types of big data, quality and accuracy are less controllable; consider, for example,

Twitter posts with hashtags, abbreviations, typos and colloquial speech. Big

data and analytics technology should allow people to work with all these

types of data. The volumes often make up for the lack of quality or accuracy.

Data stream: Uninterrupted flow of a long sequence of data, such as in

audio and video data files.

Big data is not only about the data characteristics themselves, but also about a whole

new big data technology architecture including new storage, computation models

and analytic tools, applied to appropriate problems in big data applications.

Advances in big data storage, processing and analysis include new parallel and

distributed computing paradigms such as the Apache Hadoop ecosystem. These

technologies are evolving rapidly.


1.1.2 Cloud Computing

Nowadays, cloud computing is one of the most hyped IT innovations, providing a

new way of delivering computing resources and services and having sparked plenty

of interest in both the IT industry and academic research communities. Recently, IT

giants such as Amazon, Google, IBM and Microsoft have invested huge sums of

money in building up their public cloud products, and indeed they have developed

their own cloud products, e.g., Amazon’s Web Services3, Google Compute4, IBM

Cloud5 and Microsoft’s Azure6. Several corresponding open source cloud computing

solutions have also been developed, like Eucalyptus7, OpenStack8 and Apache

Hadoop9. The core technologies that cloud computing is principally built on include

web service technologies and standards, virtualization, novel distributed

programming models like MapReduce [21], and cryptography.

The cloud computing definition published by the U.S. National Institute of

Standards and Technology (NIST) comprehensively covers the commonly agreed

aspects of cloud computing [22]. Accordingly, cloud computing is defined as a

model for enabling convenient, on-demand network access to a shared pool of

configurable computing resources (e.g. networks, servers, storage, applications,

services) that can be rapidly provisioned and released with minimal management

effort or interactions with service providers. In terms of the definition, the cloud

model consists of five essential characteristics, three service delivery models and

four deployment models. Specifically, the five key features encompass on-demand

self-service, broad network access, resource pooling (multi-tenancy), rapid elasticity

and measured services. The three service delivery models are Cloud Software as a

Service (SaaS), e.g. Google Docs10, Cloud Platform as a Service (PaaS), e.g. Google

App Engine11, and Cloud Infrastructure as a Service (IaaS), e.g. Amazon EC2 and

S3 cloud services. The four deployment models include private cloud, community

3 http://aws.amazon.com/, accessed April 2015.
4 https://cloud.google.com/, accessed April 2015.
5 http://www.ibm.com/cloud-computing/au/en/, accessed April 2015.
6 http://www.azure.microsoft.com/en-us/, accessed April 2015.
7 https://www.eucalyptus.com/, accessed April 2015.
8 https://www.openstack.org/, accessed April 2015.
9 http://hadoop.apache.org/, accessed April 2015.
10 https://docs.google.com/, accessed April 2015.
11 https://appengine.google.com/, accessed April 2015.


cloud, public cloud and hybrid cloud, where hybrid cloud can contain the other three

types of cloud.

1.2 Motivation: Securing Big Sensing Data Streams

Data Stream Management Systems have been increasingly used to support a

wide range of real-time applications such as military and battlefield operations, network

monitoring, sensor networks, health monitoring, and financial monitoring

[23]. These applications need real-time processing of data streams, where the

application of the traditional “store-and-process” method is limited [24]. Most of the

above applications need to protect sensitive data from unauthorised accesses. For

example, in battlefield monitoring, the position of soldiers should only be accessible

to the battleground commanders. Even if the data is not sensitive, it may still have

commercial value that justifies restricting access. Another example is real-time health

monitoring applications. Here privacy protection of personal health data is crucial. A

patient may be living at home with a monitoring device attached to him, which can

detect early health abnormalities and transmit alert signals to relevant personnel.

Figure 1-1: Typical Lifecycle of Security Framework for Big Sensing Data Streams

However, the patient may prefer only certain users, such as his doctor or a nurse, to

have access to his streaming data and prevent access by any third parties (e.g.

insurance companies or other hospitals). Only if his vital signs go far above the

norm and he is in imminent danger, needing urgent care, would the closest hospital

gain access to his streaming data. As a result, a new security verification module

needs to be developed to clean and drop modified/unwanted data before data streams

are evaluated in SPEs.

SPEs deal with these specific types of challenges and are intended to process data

streams with a minimal delay [23, 25-27]. In SPEs, data streams are processed in real

time (i.e. on-the-fly) rather than batch processing after storing the data in the cloud as

shown in Figure 1-1. The above specified applications require real-time processing of

very high-volume and high-velocity data streams (also known as big data streams).

The complexity of big data streams is defined through 4Vs (i.e. volume, variety,

velocity, and veracity). These features present huge opportunities and enormous

difficulties for big data stream computing. A big data stream is continuous in nature

and it is important to perform real-time analysis as the lifetime of the data is often

very short (data is accessed only once) [28-29]. As the volume and velocity of the

data is so high, there is not enough space to store and process; hence, the traditional

batch computing model is not suitable. Cloud computing has become a platform of

choice due to its extremely low-latency and massively parallel processing

architecture [30]. It supports the most efficient way to obtain actionable information

from big data streams [28, 31-33].

Big data stream processing has become an important research topic in the current

era, whereas data stream security has received little attention from researchers.

Some of these data streams are analysed and used in very critical applications (e.g.

surveillance data, health monitoring, military applications), where data streams need

to be secured in every aspect to detect malicious activity. The problem is exacerbated

when thousands to millions of small sensors in self-organising wireless networks

become the sources of the data stream. How can we provide security for big data

streams? In addition, compared to conventional store-and-process, these sensors will

have limited processing power, storage, bandwidth, and energy. Furthermore, data

streams ought to be processed on-the-fly in a prescribed sequence. This thesis

addresses these issues by designing an efficient architecture for real-time processing

of big sensing data streams, and the corresponding security scheme.


Streaming data security can be broadly divided into two types of security

punctuations: (i) the “data security punctuations” (dsps) describing the data-side

security, and (ii) the “query security punctuations” (qsps) representing the query-side

security [82]. We introduce a new module called the DSM (Data Stream Manager),

which performs security verification of data streams against dsps before data analysis,

followed by qsps for secure query processing.
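To make the ordering concrete, a DSM admission check can be sketched as verifying the data-side punctuations before the query-side ones. This is a minimal illustration with hypothetical predicate names; the actual dsp/qsp semantics are those defined in [82]:

```python
# Minimal sketch of DSM admission control: data-side security punctuations
# (dsps) are checked before query-side ones (qsps). The predicate functions
# are hypothetical placeholders, not the actual punctuation semantics.
def dsm_admit(stream_tuple, dsp_ok, qsp_ok):
    """Pass a tuple to the SPE only if both security checks succeed."""
    if not dsp_ok(stream_tuple):   # data-side verification first
        return None                # drop modified/unwanted data
    if not qsp_ok(stream_tuple):   # then query-side authorisation
        return None
    return stream_tuple

# A verified, authorised reading is admitted; a failed data-side check drops it.
ok = dsm_admit({"sensor": 7, "temp": 21.5},
               dsp_ok=lambda t: True, qsp_ok=lambda t: True)
dropped = dsm_admit({"sensor": 7, "temp": 21.5},
                    dsp_ok=lambda t: False, qsp_ok=lambda t: True)
assert ok is not None and dropped is None
```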

One of the security threats is the man-in-the-middle attack, in which a malicious

attacker can access or modify the data stream from sensors. This situation arises as it

is not possible to monitor a large number of sensors deployed in the untrusted

environment. We need to maintain end-to-end security. The common approach is

to apply a cryptographic model. Keeping data encrypted is the most common and

safe choice to secure data in transmission, if encryption keys are managed properly.

There are two common types of cryptographic encryption methods: asymmetric and

symmetric. Asymmetric-key encryption algorithms (e.g. RSA, ElGamal, DSS, YAK,

Rabin) perform a number of modular exponentiation operations over a large finite field.

Therefore, they are on the order of 1000 times slower than symmetric-key cryptography [34-35].

Efficiency becomes an issue if asymmetric-key cryptography based infrastructure such

as the Public-Key Infrastructure (PKI) [36-37] is applied to big data streams. Thus,

symmetric-key encryption is the more efficient cryptographic solution for such

applications. However, standard symmetric-key algorithms (e.g. DES, AES, IDEA, RC4) still fail

to meet the requirements of real-time, on-the-fly processing of big data streams and

cannot synchronise with the speed of recent advanced stream processing engines. Hence, there is

a need for an efficient scheme for securing big data streams. The possible types of

attacks in big data streams are attacks on authenticity, confidentiality, integrity and

availability.
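The performance gap can be illustrated with a rough micro-benchmark that compares the core operation of each family: a full-size modular exponentiation (the dominant step in RSA-style private-key operations) against a keyed hash over a stream block (a symmetric-style primitive). The modulus, exponent and message below are arbitrary stand-ins, not a rigorous benchmark:

```python
import hashlib
import hmac
import time

MESSAGE = b"sensor-reading|" * 64          # a small stream block

# Asymmetric-style cost: one modular exponentiation with a full-size
# (2048-bit) exponent, as in RSA-like decryption/signing.
# Modulus and exponent are arbitrary stand-in values.
n = 2**2048 - 159
d = 2**2047 + 2**1024 + 1
start = time.perf_counter()
pow(0xDEADBEEF, d, n)
asym_t = time.perf_counter() - start

# Symmetric-style cost: one keyed hash (HMAC-SHA256) over the same block.
start = time.perf_counter()
hmac.new(b"shared-key", MESSAGE, hashlib.sha256).digest()
sym_t = time.perf_counter() - start

print(f"modular exponentiation: {asym_t:.6f}s  HMAC-SHA256: {sym_t:.6f}s")
```

On a typical machine the exponentiation takes milliseconds while the keyed hash takes microseconds, which is the kind of order-of-magnitude gap that motivates symmetric-key designs for high-velocity streams.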

Another important issue is to maintain data privacy or data confidentiality over big

sensing data streams. When protecting data, we need strong encryption and

more processing power to protect big data streams for a longer time. However, not all data

carry a sensitivity level that requires strong encryption, and

strong encryption takes longer to process during security verification. So it is really

important to find the data sensitivity level prior to applying encryption or decryption

techniques. In this way, we can provide strong encryption for high sensitivity data

and weak encryption for low sensitivity data. So it is always a challenging task to

identify the data sensitivity level in big data streams. Controlling access by

unauthorised users or query processors is also a challenging task in big data stream

environments. There is a high chance of information leakage while giving access for

data processing or evaluation. Access control over big data streams is gaining lots of

interest from a global perspective.

This thesis addresses solutions for these different types of threats and attacks

in Chapters 3 to 6.

1.3 Overview of the Work

This section presents an overview of our architecture and research

methodology, followed by the research contributions.

1.3.1 Methodology

In order to address the challenges mentioned above in an organised and

comprehensive manner, we propose to investigate the whole lifecycle of a security

framework for big sensing data streams, i.e. ensuring end-to-end security of big

sensing data streams from source sensors to the cloud data centre. The research

problems in each phase of the lifecycle are identified and corresponding solutions

are put forth. A typical lifecycle is shown in Figure 1-1. Brief descriptions about the

lifecycle of big sensing data streams and its security aspects follow.

We have divided the complete architecture into five lifecycle phases:

collection, evaluation, collation, analysis, and dissemination. We describe the

complete architecture with these five different standard steps from Figure 1-1. Figure

1-1 shows the complete architecture of a big data stream starting from source sensors

up to DSM for a security framework and followed by access control for query

processor and end user. Data-to-stream transfer, clustering and Bayesian networks are

standard data processing steps, but we do not address these in our architecture. The

five steps are defined as follows.

Collection: Data are collected from different sources, such as sensors, for

analysis and event detection at the cloud data centre.


Evaluation: Stream data are verified for security to maintain the

originality of the data before online stream query processing.

Collation: Data from different DSMs are evaluated and aggregated for access by the

query processor or end user. Data also move to the cloud for batch

processing after this step.

Analysis: The data are analysed to find their sensitivity level, and access is

granted to the query processor accordingly. The query processor then analyses the data

streams for event detection.

Dissemination: This step outputs the results of the data analysis and distributes

emergency alert messages if necessary. We have not highlighted

dissemination in our architecture because it falls outside the scope of our

work.

We have described the complete architecture by considering the above five steps

as follows. The description starts with data collection and ends with alert

dissemination. The proposed architecture may be applicable for different applications

though our description is based on a disaster management application.

In the collection step, data are collected from various sources for analysis and

event detection. In our architecture we consider sensors as the source of our data

streams. These collected data streams move to the STREAM collection system [23]

after security verification at the DSM. In the evaluation step, there are two types

of evaluation process in big data: batch processing and stream processing. In this

thesis, we focus on stream processing to detect emergency events in real-time. In the

evaluation step, we address the security evaluation before data analysis. Generally

sources use an untrusted medium to transfer sensed data to the cloud for

evaluation/analysis. So security verification is one of the important features that need

to be addressed in big data streams to filter out unwanted and modified data. The DSM

processes data streams on-the-fly and is designed to handle high-volume and

bursty data streams with a large number of complex continuous queries. In the

collation step, evaluated data from DSM are further processed for access control. In

the analysis step, data are structured based on sensitivity level and mapped to the

respective query processor and end user. The access control mechanism protects data

from unauthorised access and information leakage. Nowadays data sources generate


terabytes to petabytes of data on a daily basis [41]. Given the volume of data being

generated, real-time computation has become a major challenge. A scalable real-time

computation system that we have used effectively is the open-source Apache Storm

tool, which was developed at Twitter and is sometimes referred to as “real-time

Hadoop”. In the dissemination phase, alert messages are disseminated after data are

evaluated from stream data processing.
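The five lifecycle phases described above can be sketched as a simple function pipeline. The stage bodies below are toy placeholder rules chosen only to make the data flow concrete, not the architecture's actual processing:

```python
# Toy pipeline mirroring the five lifecycle phases; each stage body is a
# placeholder rule, not the real verification/analysis logic.
def collection(raw):
    return raw                                        # gather sensor readings

def evaluation(data):
    return [d for d in data if d.get("verified")]     # drop unverified tuples

def collation(data):
    return sorted(data, key=lambda d: d["ts"])        # aggregate/order by time

def analysis(data):
    return [d for d in data if d["value"] > 100]      # toy event-detection rule

def dissemination(events):
    return [f"ALERT: value {e['value']} at t={e['ts']}" for e in events]

stream = [{"ts": 1, "value": 120, "verified": True},
          {"ts": 0, "value": 50,  "verified": True},
          {"ts": 2, "value": 130, "verified": False}]  # fails verification

alerts = dissemination(analysis(collation(evaluation(collection(stream)))))
assert alerts == ["ALERT: value 120 at t=1"]
```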

1.3.2 Contributions

The main contributions of this thesis are presented in four contribution chapters. Firstly, a

security solution is proposed for big sensing data streams, which will be

synchronised with the speed and performance of the stream processing engine at the cloud

data centre. Secondly, a more efficient security solution is proposed by introducing two-

dimensional security, making it more difficult for an attacker or intruder to

guess the secret shared key, followed by a synchronisation technique that lets a node obtain key

generation properties from its neighbours without communicating with the DSM.

Thirdly, a selective encryption technique is proposed to protect data streams based

on data sensitivity level. Finally, an access control technique is proposed to give

access to big data streams to only authorized and authenticated query processor or

end user. The framework and its implementation have been reported in the author’s

publications (see the section The Author’s Publications for details).

A large number of mission critical applications ranging from disaster

management to smart cities are built on the Internet of Things (IoT) platform

by deploying a number of smart sensors in a heterogeneous environment. The

key requirement of such applications is the need for near real-time stream data

processing in large scale sensing networks. This trend gives birth to an area

called big data stream. Securing big data streams is a very

challenging task because of the 4Vs properties, which also

prevent the application of existing security solutions. So we propose a new

security solution that updates a dynamic prime number at both the source and the DSM.

This scheme is based on a common shared key that is updated dynamically

without further communication after the handshaking process. Moreover, the

proposed security mechanism not only reduces the verification time or buffer


size in DSM, but also strengthens the security of the data by constantly

changing the shared keys. The results of this chapter have been reported in the

author’s publications 1, 2 and 10.
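The core idea, namely that both endpoints independently derive the same sequence of primes (and hence keys) from state agreed during handshaking, can be sketched as follows. This is a toy illustration with assumed parameters, not the thesis's actual DPBSV construction:

```python
import hashlib
import random

def next_prime(n):
    """Smallest prime >= n (trial division; adequate for these toy magnitudes)."""
    def is_prime(k):
        if k < 2:
            return False
        for i in range(2, int(k**0.5) + 1):
            if k % i == 0:
                return False
        return True
    while not is_prime(n):
        n += 1
    return n

class Endpoint:
    """Sensor source or DSM: both derive identical key material from a seed
    fixed at handshake time, so no further key-exchange messages are needed."""
    def __init__(self, handshake_seed):
        self.rng = random.Random(handshake_seed)   # deterministic on both sides

    def rekey(self):
        prime = next_prime(self.rng.randrange(10**6, 10**7))  # synchronised prime
        return hashlib.sha256(str(prime).encode()).digest()   # derived key material

source, dsm = Endpoint(42), Endpoint(42)   # 42: toy handshake secret

# At every update interval the two sides compute matching keys independently.
for _ in range(3):
    assert source.rekey() == dsm.rekey()
```

Because the update is deterministic given the handshake state, the shared key can change at every interval without any rekeying traffic, which is what keeps the verification lightweight.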

One of the key problems in big data streams is to ensure end-to-end security.

We refer to this as an online security verification problem. To address this

problem, we propose a Dynamic Key Length Based Security solution based

on a shared key derived from synchronised prime numbers; the key is

dynamically updated at short intervals to thwart potential attacks to ensure

end-to-end security. One of the major shortcomings of these methods is that

they assume synchronisation of the shared key. Later, we also solve the

synchronisation problem and integrate the solution with the main security framework.

The results of this chapter have been reported in the author’s publications 3,

6 and 9.

Many sensing devices are deployed in the environment to generate a variety

of data and send them to the server for analysis as data streams. A DSM at

the server collects the data streams (often called big data) to perform real

time analysis and decision-making for these critical applications. To ensure

the confidentiality of collected data, we need to prevent sensitive information

from reaching the wrong people by ensuring that the right people are getting

it. So we propose a Selective Encryption method to secure big sensing data

streams that satisfies the desired multiple levels of confidentiality and

integrity. This method protects data against several attacks based on their

sensitivity level. The results of this chapter have been reported in the author’s

publications 4 and 5.
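The principle of matching cryptographic strength to data sensitivity can be sketched as a key-derivation dispatch. The level names, key sizes and derivation below are illustrative assumptions, not SEEN's actual parameters:

```python
import hashlib

# Hypothetical mapping from data sensitivity to key length in bits; longer
# keys cost more processing time, so they are reserved for more sensitive data.
KEY_BITS = {"low": 64, "medium": 128, "high": 256}

def select_key(master_secret: bytes, sensitivity: str) -> bytes:
    """Derive key material sized to the data's sensitivity level."""
    digest = hashlib.sha256(master_secret + b"|" + sensitivity.encode()).digest()
    return digest[: KEY_BITS[sensitivity] // 8]

assert len(select_key(b"shared-secret", "low")) == 8    # 64-bit key
assert len(select_key(b"shared-secret", "high")) == 32  # 256-bit key
```

A cipher of matching strength would then be applied per tuple, so low-sensitivity data pass through the DSM with minimal overhead while high-sensitivity data get the stronger protection.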

Another important step is to control the information leakage of big data

streams. We refer to this as an access control or information flow control

problem over big sensing data streams. To address this, we propose a lattice

based information flow control over big sensing data streams. We consider

static lattices to process the information flow model faster, because we are

dealing with big data streams, i.e. high volume and velocity of data streams.

The results of this chapter have been reported in the author’s publications 7 and 8;

another paper on lattice-based secure information flow control in

big sensing data streams is in preparation for submission.
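With a static lattice, the per-tuple check reduces to a constant-time label comparison, as the following sketch shows; the levels and their ordering are illustrative assumptions, not the thesis's actual lattice:

```python
# A small totally ordered security lattice; real lattices may be partial
# orders, but a static ordering keeps the per-tuple check O(1), which
# matters at stream volumes and velocities.
LEVELS = {"public": 0, "internal": 1, "confidential": 2, "secret": 3}

def can_flow(src_label: str, dst_label: str) -> bool:
    """Permit information flow only upwards in the lattice (no leakage down)."""
    return LEVELS[src_label] <= LEVELS[dst_label]

assert can_flow("public", "secret")       # flowing up is allowed
assert not can_flow("secret", "public")   # leaking down is denied
```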

1.4 Thesis Organisation

The rest of this thesis is organised as follows:

Chapter 2 provides the basic background knowledge relevant to our research

to facilitate the discussion, including IoT basics, big data stream models,

security models and stream data processing basics. This is followed by an in-

depth literature review of the state-of-the-art techniques in security issues in

the big data context. We divided the security reviews according to the basic CIA triad for

proper classification. We also surveyed recent developments in the security

processing techniques with their performance speed. Finally, we compared

existing security solutions to define the need for a security protocol for big

sensing data streams.

Chapter 3 investigates the problem of how to achieve an efficient security

scheme for big sensing data streams that synchronises with the speed of a stream

processing engine. We propose a Dynamic Prime Number Based Security

Verification (DPBSV) scheme for big data streams. Our scheme is based on

a common shared key that is updated dynamically by generating

synchronised prime numbers.

Chapter 4 explores the problem of achieving a scalable and secure solution

by proposing a multidimensional security solution. We propose a Dynamic

Key Length Based Security Framework (DLSeF) based on a shared key

derived from synchronised prime numbers; the key is dynamically updated at

short intervals to thwart potential attacks to ensure end-to-end security.

We then propose a synchronisation technique to obtain the

synchronisation properties from a source node’s neighbours without further

contact with the DSM.

Chapter 5 studies the problem of how to achieve data confidentiality and

integrity based on the data sensitivity level in big sensing data streams. We


propose a Selective Encryption (SEEN) method to secure big sensing data

streams that satisfies the desired multiple levels of confidentiality and

integrity. Our method is based on two key concepts: common shared keys

that are initialised and updated by DSM without requiring retransmission,

and a seamless key refreshment process without interrupting the data stream

encryption/decryption.

Chapter 6 presents the problem of controlling unauthorised access and

information flow in big sensing data streams. We propose a lattice based

information flow control over big sensing data streams. We consider static

lattices to process the information flow model in a faster way, because we are

dealing with big data streams, i.e. high volume and rapid arrival rate.

Chapter 7 concludes the thesis and points out future work.


Chapter 2

Background Studies and Related Work

The security concerns in big sensing data streams have drawn considerable attention

from research communities, but relatively little work has been done in this area.

This chapter presents an in-depth literature review on existing work related to our

research. Section 2.1 reviews general research trends in terms of IoT and IoT

generated big sensing data streams. Furthermore, we present a review of reviews

in Section 2.2, where we review data centre security, network-related data

security and IoT security. Then, in Section 2.3, we define IoT-generated big sensing

data streams; here the complete architecture is divided into different layers in terms

of IoT, and the properties of each layer are highlighted, followed by security issues and

solutions. Moreover, the security issues and existing solutions reviewed in Section

2.4 can be applied in big data stream environments. The existing solutions with their

associated properties are classified in Section 2.5. Finally, Section 2.6 summarises

this chapter.

2.1 General Research Trend

The Internet of Things (IoT) is a widely used expression, although still a fuzzy one,

mostly due to the large number of concepts it encompasses. The IoT materializes a

vision of a future source of data where any sensing device possessing computing and

sensorial capabilities is able to communicate with other devices using Internet


communication protocols, in the context of sensing applications. Many such

applications are expected to employ a large number of sensing and actuating devices,

and consequently their cost will be an important factor. On the other hand, cost

restrictions dictate constraints in terms of the resources available in sensing

platforms, such as memory and computational power, while the unattended

deployment of many devices will also require the use of batteries for energy

storage. Overall, such factors motivate the design and adoption of communications

and security mechanisms optimized for constrained sensing platforms, capable of

providing their functionalities efficiently and reliably.

Several of these applications are approaching the bottleneck of current data

streaming infrastructures and require real-time processing of very high-volume and

high-velocity data streams (also known as big data streams). The complexity of

big data is defined through the 5Vs: 1) volume– referring to terabytes, petabytes,

or even exabytes (1000^6 bytes) of stored data, 2) variety– referring to unstructured,

semi-structured and structured data from different sources like social media

(Twitter, Facebook etc.), sensors, surveillance, image or video, medical records

etc., 3) velocity– referring to the high speed at which the data is handled

in/out for stream processing, 4) variability– referring to the different

characteristics and data value where the data stream is handled, 5) veracity–

referring to the quality of data. These features introduce huge open doors and

enormous difficulties for big data stream computing. A big data stream is continuous

in nature and it is important to perform real-time analysis as the lifetime of the data

is often very short (data is accessed only once) [4-5, 42-43]. As the volume and

velocity of the data is so high, there is not enough space to store and

process; hence, the traditional batch computing model is not suitable.
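Because each element of the stream can be read only once, stream algorithms maintain small running summaries instead of storing the raw data. The following is a minimal, illustrative one-pass sketch (not from the thesis) that computes a count, mean, and peak over a sensor stream in constant memory:

```python
def summarize_stream(readings):
    """One-pass summary of a sensor stream: constant memory,
    each reading is seen exactly once (never stored)."""
    count, total, peak = 0, 0.0, float("-inf")
    for value in readings:
        count += 1
        total += value
        if value > peak:
            peak = value
    mean = total / count if count else None
    return {"count": count, "mean": mean, "peak": peak}

# Example: process readings as they arrive, without storing them.
stats = summarize_stream(iter([21.5, 22.0, 23.4, 22.1]))
```

A batch system would buffer all readings before computing such statistics; the one-pass formulation is what makes the computation feasible at stream velocity.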

Even though big data stream processing has become an important research topic in the current era, data stream security has received little attention from researchers [4-5]. Some of these data streams are analysed and used in very critical applications (e.g. surveillance data, military applications, Supervisory Control and Data Acquisition (SCADA), etc.), where the streams must be secured so that malicious activities can be detected. The problem is exacerbated when thousands to millions of small sensors in self-organising wireless networks become the sources of the data stream. How can we provide security for big data streams?


In addition, compared to conventional store-and-process systems, these sensors have limited processing power, storage, bandwidth, and energy. Furthermore, data streams must be processed on the fly, in a prescribed sequence. This thesis addresses these issues by designing an efficient architecture for real-time processing of big sensing data streams, together with the corresponding security scheme.

Throughout this survey we focus on the security issues of big data streams, analysing the solutions available for the various security threats, starting from IoT device communication technologies, as well as those proposed in the literature. We also identify and discuss the open challenges and possible strategies for future research in the area. As our focus is on standardized communication protocols for the IoT, our discussion is guided by the protocol stack enabled by the various IoT communication protocols available or currently being designed. We also discuss the security aspects of big data streams by following the CIA triad, dividing big data stream security into its three dimensions (confidentiality, integrity, availability). Our discussion includes work available both as published research proposals and as currently active Internet-Draft (I-D) documents submitted for discussion in the relevant working groups.

2.2 Review of Reviews

This section reviews existing security reviews spanning different research aspects, from IoT-generated data sources up to the cloud data centre. Following the architecture in Figure 1-1, we divide the security-related reviews into cloud security (data centre security), network security, IoT security, and other security reviews in the subsections below.

2.2.1 Data Centre Security (Cloud Security)

Recent advances have given rise to the popularity and success of cloud computing. However, outsourcing data and business applications to a third party makes security and privacy critical concerns. In reality, the trustworthiness of the cloud service provider is identified as the core scientific problem that separates cloud computing


security from other topics in computing security. In the last few years, the research community has focused on the non-functional aspects of the cloud paradigm, among which cloud security stands out. Several approaches to security have been described and summarised in general surveys on cloud security techniques; Ardagna et al., for instance, focus on the interface between cloud security and cloud security assurance [44]. The authors classified vulnerabilities, threats, and attacks by service layer, considering the five most common security properties, i.e. confidentiality, integrity, availability, authenticity, and privacy. In cloud terms, these five properties are as follows.

Confidentiality: The capability of limiting information access and disclosure to authorised clients only.

Integrity: The capability of preserving the structure and content of information resources.

Availability: The capability of guaranteeing continuous access to data and resources by authorised clients.

Authenticity: The capability of ensuring that clients or objects are genuine.

Privacy: The capability of protecting all information pertaining to the personal domain.
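As a concrete illustration of the integrity and authenticity properties above, a keyed message authentication code lets a consumer verify that a data block came from a holder of a shared key and was not modified in transit. A minimal sketch in Python follows; the key and message format are made up for illustration, not taken from any surveyed system:

```python
import hashlib
import hmac

# Hypothetical shared key between a data source and the stream processor.
KEY = b"shared-secret-key"

def tag(message: bytes) -> bytes:
    """Compute an HMAC-SHA256 authentication tag for a data block."""
    return hmac.new(KEY, message, hashlib.sha256).digest()

def verify(message: bytes, received_tag: bytes) -> bool:
    """Constant-time check that the block is authentic and unmodified."""
    return hmac.compare_digest(tag(message), received_tag)

block = b"sensor-42:23.4C"     # illustrative sensor reading
t = tag(block)
assert verify(block, t)                    # genuine block passes
assert not verify(b"sensor-42:99.9C", t)   # tampered block fails
```

Note that an HMAC provides integrity and authenticity but not confidentiality; hiding the content additionally requires encryption.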

A basic review of cloud security identifies unique security requirements and then presents viable solutions that eliminate the potential threats at individual service layers [13]. A layer-by-layer security survey, together with solutions and solution directives, is given in [47]. Xiao et al. identified the five most representative security and privacy attributes of cloud computing (i.e. confidentiality, integrity, availability, accountability, and privacy-preservability) and discussed the vulnerabilities that adversaries may exploit to perform various attacks [45]. Service delivery is the most important feature of cloud computing over distributed computing. Subashini et al. [46] surveyed the security issues by identifying several potential security elements and threats, focusing only on the software service level. In the same way, Huang et al. [49] surveyed the security mechanisms of the infrastructure service layer of the cloud system; they examined academic and industrial IaaS security mechanisms separately and then related the two. Rong et al. [48] emphasized three areas of particular cloud service, namely SLAs, trusted data sharing, and accountability in the cloud. In this


review, they outlined ongoing work on security SLAs for cloud computing and briefly presented a scheme to address security and privacy issues in the cloud. In [50], the authors provide a comprehensive review of intrusion detection techniques in the cloud, classifying the various intrusion detection methods in cloud computing and the possible attacks. Focusing on applications, the authors of [51] reviewed mobile cloud systems. The possible security threats at each cloud service layer are shown in Figure 2-1.

2.2.2 Network Security

Sensor nodes are an important IoT data source for many types of applications. Sensor nodes and sensor networks always have unique security requirements because of a node's limited energy and processing power. As sensor networks become widespread, security issues become a central concern, especially in emergency applications. Chen et al. [52] surveyed the threats and vulnerabilities of Wireless Sensor Networks (WSNs) and summarised the defence methods based on the networking protocol layer. The authors divided the issues into seven categories and analysed each: cryptography, key management, attack detection and prevention, secure routing, secure localisation, secure data fusion, and other security issues. A comprehensive survey of WSN security listed the security requirements, security challenges, and attacks together with existing key management solutions [53]. The authors focused on individual security threats and solutions rather than on layer-wise security, and later compared and evaluated security

protocols based on each of these categories.

Figure 2-1: Cloud computing security architecture

WSNs become Visual Sensor Networks (VSNs) when the source devices are visual sensors with adequate processing power and memory. In [54], the authors presented an overview of the characteristics of VSN applications, the security threats and attack scenarios involved, and the major security challenges. Their central contribution is the classification of VSN security aspects into data-centric, node-centric, network-centric, and user-centric security. They identified and discussed the individual security requirements and presented a profound overview of related work for each class.

Mobile devices are another major data source for the IoT. Security in mobile ad hoc networks is difficult to achieve, notably because of the vulnerability of wireless links, the limited physical protection of nodes, the dynamically changing topology, the absence of a certification authority, and the lack of a centralised monitoring or management point. Earlier studies on mobile ad hoc networks (MANETs) aimed at proposing protocols for some fundamental problems, such as routing, and tried to cope with the challenges imposed by the new environment [87, 89]. These protocols, however, fully trust all nodes and do not consider the security aspect; they are consequently vulnerable to attacks and misbehaviour. A complete review of MANET security problems at the different network layers, along with the proposed solutions (as of July 2005), is given in [55]. The authors consider security issues including routing and data forwarding, medium access, key management, and intrusion detection systems (IDSs). Abusalah et al. [56] reviewed the different routing protocols with a particular focus on security aspects, choosing four representative protocols for analysis and evaluation: Ad hoc On-demand Distance Vector routing (AODV), Dynamic Source Routing (DSR), Optimized Link State Routing (OLSR), and the Temporally Ordered Routing Algorithm (TORA). Secure ad hoc networks have to meet five security requirements: confidentiality, integrity, authentication, non-repudiation and availability [89].

Chopra et al. [57] reviewed state-of-the-art peer-to-peer security and solutions, noting the suitability and drawbacks of the different schemes. The authors classified the security requirements of file-sharing applications and real-time communication applications. Information-centric networking (ICN) is a new communication paradigm that focuses on content retrieval from a network regardless


of the storage location or physical representation of that content. AbdAllah et al. [58] provide a survey of attacks on ICN architectures and of other generic attacks that have an impact on ICN. They also provide a taxonomy of these attacks, classified into four main categories: naming, routing, caching, and other miscellaneous attacks. The authors then present the severity levels of ICN attacks and discuss existing ICN security solutions. The security aspects of Long Term Evolution (LTE) networks are reviewed in [59]: the authors present an overview of LTE security functionality, followed by security vulnerabilities and then the existing solutions to these problems.

2.2.3 IoT Security

The IoT is enabled by the latest developments in RFID, smart sensors, communication technologies, and Internet protocols. The basic premise is to have smart sensors collaborate directly, without human involvement, to deliver a new class of applications [20]. As security will be a fundamental enabling factor of most IoT applications, mechanisms must also be designed to protect communications enabled by such technologies. Granjal et al. [20] provided the first review of IoT communication security. Other surveys exist that, rather than analysing the technologies currently being designed to enable Internet communications with sensing and actuating devices, focus on identifying security requirements and discussing approaches to the design of new security mechanisms [60], or, at the other end, discuss the legal aspects of the IoT's impact on the security and privacy of its users [61].

Granjal et al. [20] performed their survey by analysing existing protocols and mechanisms for securing communications in the IoT, and examined how existing approaches ensure fundamental security requirements and protect IoT communications. The authors classified the security requirements, security threats, and security solutions by communication layer and compared them.

IoT architecture is generally divided into three layers: the perception layer, the network layer, and the application layer. Some systems treat the network support technology (such as network processing, computing technology, middleware


technology, etc.) as a processing layer [62]. Figure 2-2 shows the layer-wise IoT security architecture, an idea extended from [63]. We highlight the security measures as follows:

Security problems of perception-layer data collection and transmission: Sensor nodes are highly varied and heterogeneous, with generally simple structures and processors, and therefore cannot support complex security protection capabilities.

Traditional security issues of the network layer: Although the Internet security architecture is very mature, many means of attack remain. For example, a large number of malicious nodes sending data at the same time leads to a DoS attack. Networks should therefore be built specifically to fit IoT information transmission.

Application-layer security problems: Different application fields face many complex and varied security issues.

Figure 2-2: Layer wise IoT Security architecture
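The network-layer DoS scenario above, where many malicious nodes transmit simultaneously, is commonly mitigated by per-source rate limiting. The following token-bucket sketch is illustrative only; the rates and the idea of placing it at a gateway are assumptions for the example, not a mechanism from the thesis:

```python
import time

class TokenBucket:
    """Per-source rate limiter: a source may send `rate` packets/s on
    average, with bursts of up to `capacity` packets (assumed figures)."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens in proportion to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over budget: drop the packet

# A flooding source firing 100 back-to-back packets is mostly dropped.
bucket = TokenBucket(rate=10.0, capacity=5.0)
accepted = sum(1 for _ in range(100) if bucket.allow())
```

A gateway would keep one bucket per source identifier, so a flood from compromised nodes is throttled without affecting well-behaved ones.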


2.3 IoT Generated Data Stream Architecture

2.3.1 IoT Architecture

The connection of physical things to the Internet makes it possible to access remote sensor data and to control the physical world from a distance. The Internet of Things is based on this vision. A smart object, the building block of the Internet of Things, is simply an embedded system connected to the Internet [64]. Al-Fuqaha et al. [65] clearly defined the individual elements of the IoT: identification, sensing, communication, computation, services, and semantics. Several other technologies point in the same direction as RFID technology. The novelty of the Internet of Things (IoT) lies not in any new disruptive technology, but in the pervasive deployment of smart objects. A critical requirement of the IoT is that the things in the network must be interconnected. The IoT system architecture must guarantee the operation of the IoT, bridging the gap between the physical and virtual worlds, and the IoT is decentralised and heterogeneous by nature. Because things may move geographically and need to interact with each other in real time, the IoT architecture should be adaptive, allowing devices to interact with other things dynamically and supporting unambiguous communication of events [66]. We broadly divide the complete IoT architecture into three layers, namely source smart sensing devices, the communication (network) layer, and the cloud data centre, as shown in Figure 2-3. These layers can be related to the service-level architecture of the IoT, in which the service layer and interface layer are integrated into the data centre in our architecture. The service-level architecture of the IoT consists of four layers: the sensing layer, network layer, service layer, and interface layer [66-67].

Sensing layer: This layer integrates the available hardware objects (sensors, RFID, etc.) to sense and control the status of things.

Network layer: This layer supports the infrastructure for networking over wireless or wired connections.

Service layer: This layer creates and manages services according to users' needs.


Interface layer: This layer provides interaction methods to users and applications.

2.3.1.1 Sensing Layer

The IoT is expected to be a world-wide physically interconnected network in which things are connected seamlessly and can be controlled remotely. In this layer, as more and more devices are equipped with RFID tags or intelligent sensors, connecting things becomes much easier [68]. The smart systems on tags or sensors are able to automatically sense the environment and exchange data among devices. Each individual object in the IoT holds a digital identity, which makes it easy to track within its domain. The technique of assigning a unique identity to an object is called a universally unique identifier (UUID). A UUID is critical to successful service deployment in a huge network like the IoT. The identifiers might refer to names and addresses.
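For illustration, standard libraries can already generate such identifiers. The sketch below uses Python's uuid module; the device name and namespace choice are made-up examples, not identifiers from any deployed system:

```python
import uuid

# Random 128-bit identity, e.g. assigned to a newly deployed device.
device_id = uuid.uuid4()

# Deterministic, name-based identity: the same (namespace, name) pair
# always maps to the same UUID, so an identifier can be derived from a
# hypothetical device name rather than stored.
named_id = uuid.uuid5(uuid.NAMESPACE_DNS, "sensor-17.example.org")

# Re-deriving from the same name reproduces the identifier.
assert named_id == uuid.uuid5(uuid.NAMESPACE_DNS, "sensor-17.example.org")
```

The name-based variant matches the remark above that identifiers might refer to names and addresses: the identity can be recomputed from the name wherever it is needed.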

A few aspects need to be considered in the sensing layer: deployment (devices may need to be deployed randomly or incrementally), heterogeneity (devices have different properties), communication (devices need to communicate with each other in order to get access), network (devices maintain different topologies for the data transmission process), cost, size, resources, and energy consumption. As the use of the IoT grows day by day, a large number of hardware and software components are involved. Two properties are particularly important: energy efficiency and protocol support [66].

Energy efficiency: Sensors should be active all the time to acquire real-time data. This raises the challenge of supplying power to sensors; high energy efficiency allows sensors to work for longer periods without discontinuity of service.

Protocols: The different things in the IoT provide multiple system functions, so the IoT must support the coexistence of different communication technologies such as ZigBee, 6LoWPAN, etc.

2.3.1.2 Networking Layer

The role of the networking layer is to connect all things together and allow them to share information with other connected things. In addition, the networking layer is capable of aggregating information from existing IT infrastructures [33]; data can


then be transmitted to the cloud data centre for high-level complex services. Communication in the network may involve Quality of Service (QoS) mechanisms to guarantee reliable services for different users or applications [36]. Automatic assignment of devices in an IoT environment is one of the major tasks; it enables devices to perform tasks collaboratively. Some issues related to the networking layer are listed below [66]:

Network management technologies for fixed, wireless, and mobile networks

Network energy efficiency

Requirements of QoS

Technologies for mining and searching

Data and signal processing

Security and privacy

Among these issues, information confidentiality and privacy are critical because of IoT device deployment, mobility, and complexity. For information confidentiality, the existing encryption technology used in WSNs can be extended and deployed in the IoT, although this may increase the IoT's complexity. Existing network security technologies can provide a basis for privacy and security in the IoT, but more work still needs to be done. Granjal et al. [20] divided the communication stack for IoT applications into five parts: the physical, MAC, adaptation, network/routing, and application layers. They also identified the associated energy-efficient protocols, as shown in Figure 2-4.


2.3.1.3 Service Layer

A main activity in the service layer involves the service specifications for middleware, which are being developed by various organisations. A well-designed service layer will be able to identify common application requirements.

The service layer relies on middleware technology, which provides the functionality to integrate services and applications in the IoT. Middleware provides a cost-effective platform on which hardware and software platforms can be reused. The services in the service layer run directly on the network to effectively locate new services for an application and to retrieve metadata about services dynamically. Most specifications are developed as standards by different organisations; however, a universally accepted service layer is important for the IoT. A practical service layer consists of a minimum set of common application requirements, application programming interfaces (APIs), and protocols supporting the required applications and services.

All service-oriented activities, such as information exchange and storage, data management, ontology databases, search engines, and communication, are performed at the service layer. These activities are conducted by the following components:

Figure 2-3: Layer wise IoT architecture from IoT device to cloud data centre


Service discovery finds objects that can provide the required service and information in an effective way.

Service composition enables interaction among connected things. Discovery exploits the relationships of things to find the desired service, and service composition schedules or re-creates a more suitable service to obtain the most reliable services.

Trustworthiness management aims at understanding how the information provided by other services has to be processed.

Service APIs provide the interactions between services required by users.

2.3.1.4 Interface Layer

The IoT involves a large number of devices, which may be provided by different vendors and hence do not always comply with the same standards. The compatibility issue among heterogeneous things must be addressed to enable interactions among them. Compatibility involves information exchange, communication, and event processing. There is a strong need for an effective interface mechanism to simplify the management and interconnection of things. An interface profile (IFP) can be seen as a subset of service standards that allows minimal interaction with the applications running on the application layer. Interface profiles describe the specifications between applications and services. An illustration of the interface layer is the implementation of Universal Plug and Play (UPnP), which specifies a protocol for seamless interactions among heterogeneous things.


2.3.2 Security Threats of Each Layer

This subsection lists the security threats and security issues in each individual layer, following the division in the subsections above.

2.3.2.1 Sensing Layer

The sensing layer is responsible for frequency selection, carrier frequency generation, signal detection, modulation, and data encryption [20, 69]. An adversary may possess a broad range of attack capabilities. A physically damaged or manipulated node used for attack may be less powerful than a normally functioning node, but compromised nodes that interact with the network only through software are as powerful as other nodes. Nodes in a sensor network use wireless communication

because the network's ad hoc, large-scale deployment makes anything else impractical. Base stations or uplink nodes can use wired or satellite communication, but limitations on their mobility and energy make them scarcer.

Figure 2-4: Communication protocol in IoT.

As with any radio-based medium, there exists the possibility of jamming in a sensor network. In addition,

nodes in sensor networks may be deployed in hostile or insecure environments where an attacker has easy physical access. Network jamming and source-device tampering are the major types of possible attack in the sensing layer; the relevant features of the sensing layer follow from Figure 2-4.

Jamming: interference with the radio frequencies a network's nodes are using.

Tampering: physical compromise of nodes.

Solutions: spread-spectrum communication, jamming reports, and accurate and complete design of the node's physical package.

2.3.2.2 Network Layer

The security mechanisms designed to protect communications with the previously discussed protocols must provide appropriate assurances in terms of confidentiality, integrity, authentication, and non-repudiation of the information flows. Other relevant security requirements are privacy, anonymity, liability, and trust, which will be fundamental for the social acceptance of most future IoT applications employing Internet-integrated sensing devices. Following the IoT communication protocols, we divide the stack into five layers, as shown in Figure 2-4, and Table 2-1 classifies the security threats in each individual communication layer. We list four layers here, namely the MAC, adaptation, network/routing, and application layers [20]; the physical layer is treated as the sensing layer discussed in the previous subsection.

MAC Layer

Besides the data service, the MAC layer manages other operations, namely access to the physical channel, network beaconing, validation of frames, guaranteed time slots, node association, and security. The standard distinguishes sensing devices by their capabilities and roles in the network: a full-function device (FFD) is able to coordinate a network of devices, while a reduced-function device (RFD) is only able to communicate with other devices (of RFD or FFD types). By using RFD and FFD devices, IEEE 802.15.4 can support network topologies such as peer-to-peer, star, and cluster networks. The mechanisms defined in IEEE 802.15.4e will be part of the next revision of the IEEE 802.15.4 standard, and as such open the


door for the usage of Internet communication technologies in the context of time-critical (e.g. industrial) applications [70].

Network Layer

One fundamental characteristic of the Internet architecture is that it enables packets

to traverse interconnected networks using heterogeneous link-layer technologies,

and the mechanisms and adaptations required to transport IP packets over particular

link-layer technologies are defined in appropriate specifications. With a similar goal,

the IETF IPv6 over Low-power Wireless Personal Area Networks (6LoWPAN)

working group was formed in 2007 to produce a specification enabling the

transportation of IPv6 packets over low-energy IEEE 802.15.4 and similar wireless

communication environments. 6LoWPAN is currently a key technology to support

Internet communications in the IoT, and one that has changed a previous perception

of IPv6 as being impractical for constrained low energy wireless communication

environments. No security mechanisms are currently defined in the context of the

6LoWPAN adaptation layer, but the relevant documents include discussions on the

security vulnerabilities, requirements and approaches to consider for the usage of

network layer security.

Routing Layer

The Routing Over Low-power and Lossy Networks (ROLL) working group of the

IETF was formed with the goal of designing routing solutions for IoT applications.

The current approach to routing in 6LoWPAN environments is materialized in the Routing Protocol for Low-power and Lossy Networks (RPL) [71]. Rather

than providing a generic approach to routing, RPL provides in reality a framework

that is adaptable to the requirements of particular classes of applications. In the

following discussion we analyse the internal operation of RPL, and later the security

mechanisms designed to protect communications in the context of routing operations.

The information in the Security field indicates the level of security and the

cryptographic algorithms employed to process security for the message. What this

field does not include is the security-related data required to process security for the

message, for example a Message Integrity Code (MIC) or a signature. Instead, the


security transformation itself states how the cryptographic fields should be

employed in the context of the protected message.

Application Layer

As previously discussed, application-layer communications are supported by the

CoAP [72] protocol, currently being designed by the Constrained RESTful

Environments (CoRE) working group of the IETF. We next discuss the operation of

the protocol as well as the mechanisms available to apply security to CoAP

communications. The CoAP Protocol [72] defines bindings to DTLS (Datagram

Transport-Layer Security) [73] to secure CoAP messages, along with a few

mandatory minimal configurations appropriate for constrained environments.

Table 2-1: Network layer security threats

Communication layer      Security threats
MAC                      Confidentiality; data integrity; data authenticity; message replay attacks; access control mechanisms; time-synchronised communications
Adaptation layer         Security vulnerabilities
Network/routing layer    Selective forwarding; integrity and data authenticity; replay attacks; sinkhole; Sybil attack
Application              Confidentiality; authentication; integrity; non-repudiation; replay attacks
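Several rows of Table 2-1 list replay attacks, in which an adversary resends previously captured messages. A standard defence is a per-source freshness check; the sketch below uses strictly increasing sequence numbers and is illustrative only (real protocols such as DTLS additionally use authenticated sliding windows):

```python
class ReplayFilter:
    """Reject messages whose sequence number is not strictly greater
    than the last one accepted from the same source."""
    def __init__(self):
        self.last_seq = {}  # source id -> highest sequence number accepted

    def accept(self, source: str, seq: int) -> bool:
        if seq <= self.last_seq.get(source, -1):
            return False  # replayed (or stale) message: drop it
        self.last_seq[source] = seq
        return True

f = ReplayFilter()
assert f.accept("node-a", 1)
assert f.accept("node-a", 2)
assert not f.accept("node-a", 2)  # replayed message rejected
assert f.accept("node-b", 1)      # independent counter per source
```

Sequence numbers must themselves be covered by the message's integrity protection; otherwise an attacker can simply rewrite them.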

2.3.2.3 Service Layer (Middleware Security)

Due to the very large number of technologies normally in place within the IoT paradigm, a middleware layer is employed to enforce seamless integration of devices and data within the same information network. Within such middleware,


data must be exchanged respecting strict protection constraints. IoT applications are

vulnerable to security attacks for several reasons: first, devices are physically

vulnerable and are often left unattended; second, it is difficult to implement any

security countermeasure due to the large scale and the decentralised paradigm;

finally, most IoT components are devices with limited resources that cannot
support complex security schemes [74]. The major security challenge in IoT
middleware is to protect data against attacks on its integrity, authenticity, and
confidentiality [75]; access control is a further open issue.

Both the networking and security issues have driven the design and the development

of the VIRTUS Middleware, an IoT middleware relying on the open XMPP protocol

to provide secure event driven communications within an IoT scenario [74].

Leveraging the standard security features provided by XMPP, the middleware offers
a reliable and secure communication channel for distributed applications, protected
with both encryption (through the TLS protocol) and authentication (through SASL)
mechanisms.

Security and privacy are responsible for confidentiality, authenticity, and

nonrepudiation. Security can be implemented in two ways – (i) secure high-level

peer communication which enables higher layers to communicate among peers in a

secure and abstract way and (ii) secure topology management which deals with the

authentication of new peers, permissions to access the network and protection of

routing information exchanged in the network [76]. Other approaches to implementing
security and privacy in IoT middleware include trust management, device
authentication, integrity services, and access control. The major IoT security
requirements are data authentication, access control, and client privacy [61].

Otsopack, an Ambient Intelligence (AmI) framework, provides two core features: (i) it is

designed to be simple, modular and extensible and (ii) it runs in different

computational platforms, including Java SE and Android [77]. As regards security,

given the data-centric nature of the framework, there are mainly two core

requirements: (i) a data provider may only grant access to certain data to a certain set

of users and (ii) a data consumer may trust only a set of providers for a certain set of

acquired data. A derived issue is how to authenticate each other in such a dynamic

scenario. In order to support the first requirement, an OpenID-based solution has


been built. An Identity Provider securely identifies data consumers to the data

providers. Data providers can establish which graphs can be accessed by which users.

Therefore, the provider will return a restricted graph only if the valid user is

requesting it. In other words, the same application can get different amounts of

information depending on whether it provides credentials or not.
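The credential-dependent view described above can be sketched as follows; the token names, access levels, and in-memory graph are hypothetical stand-ins for an OpenID Identity Provider and a real triple store:

```python
# Toy sketch of credential-dependent views in the spirit of the Otsopack/OpenID
# scheme: the same query yields different amounts of data depending on whether
# the consumer presents valid credentials. All names here are illustrative.

# Graph: triples tagged with the minimum access level needed to see them.
GRAPH = [
    ("sensor1", "hasReading", "21.5C", "public"),
    ("sensor1", "locatedIn", "ward3", "restricted"),
    ("patient9", "assignedTo", "ward3", "restricted"),
]

AUTHORISED = {"alice-token"}  # an identity provider would normally vouch for this


def query(credential=None):
    """Return the subset of the graph the caller is allowed to see."""
    allowed = {"public"}
    if credential in AUTHORISED:  # provider trusts the identity provider's assertion
        allowed.add("restricted")
    return [t for t in GRAPH if t[3] in allowed]
```

An unauthenticated caller sees only the public triples, while a caller with a valid token sees the restricted graph as well.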

The authors in [78] suggest the use of lightweight symmetric encryption (for data)

and asymmetric encryption protocols (for key exchange) in Trivial File Transfer

Protocol (TFTP). The target implementations of TFTP are embedded devices such
as Wi-Fi Access Points (APs) and remote Base Stations (BSs), which could be
attacked by malicious users or malware through the installation of malicious code (e.g.,

backdoors). The authors emphasize finding a solution for strengthening the

communication protocol between the AP and BS [78]. To verify this proposal, the authors
decided to use U-Boot (the Universal Boot Loader). Two schemes are employed: AES,

used to protect personal and sensitive data, and DHKE (Diffie-Hellman Key

Exchange), for exchanging cryptographic keys between two entities that do not

know each other.
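A minimal sketch of this DHKE-plus-symmetric-cipher combination follows, under loudly stated assumptions: the Mersenne-prime group and the hash-based XOR stream cipher are toy demonstration stand-ins for a vetted Diffie-Hellman group and AES, which a real deployment would use.

```python
import hashlib
import secrets

# Demonstration parameters only: a Mersenne prime group, NOT a vetted DH group.
P = 2**127 - 1
G = 3


def keypair():
    """Generate a private exponent and the corresponding public value."""
    priv = secrets.randbelow(P - 2) + 2
    return priv, pow(G, priv, P)


def shared_key(priv, other_pub):
    """Both parties derive the same key: g^(ab) mod p, hashed to 32 bytes."""
    return hashlib.sha256(str(pow(other_pub, priv, P)).encode()).digest()


def xor_cipher(key, data):
    """Toy hash-based stream cipher standing in for AES (encrypt == decrypt)."""
    blocks = (hashlib.sha256(key + bytes([i])).digest()
              for i in range(len(data) // 32 + 1))
    keystream = b"".join(blocks)[: len(data)]
    return bytes(a ^ b for a, b in zip(data, keystream))
```

Each side computes the shared key from its own private value and the peer's public value; the two results agree, and the derived key then protects the payload.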

In [79] a Naming, Addressing and Profile Server (NAPS) is presented as a

middleware to bridge different platforms in IoT environments. Given the massive
number of heterogeneous devices deployed across different platforms, NAPS serves
as a key module at the back-end data centre to aid upstream data collection, content-based
data filtering and matching, and downstream delivery to applications. The system deals

with Authentication, Authorization and Accounting (AAA). Although it is not the

focus of this work, the design can largely leverage the Network SEcurity Capability
(NSEC) Service Capability (SC) in the ETSI M2M service architecture. Note that the
device domain is organised in a tree structure. It uses a key hierarchy composed of a
root key, service keys, and application keys. The root key is used to derive service
keys through authentication and key agreement between the device or gateway and
the M2M SCs at the M2M core. The application key, derived from a service key, is
unique to each M2M application.
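The key hierarchy can be sketched with an HMAC-based derivation; the labels and the use of HMAC-SHA256 as the key derivation function are illustrative assumptions, not the exact ETSI M2M construction:

```python
import hashlib
import hmac

# Illustrative sketch of a root -> service -> application key hierarchy.
# HMAC-SHA256 as a KDF and the label strings are assumptions for this sketch.


def derive(parent_key: bytes, label: str) -> bytes:
    """Derive a child key from a parent key and a context label."""
    return hmac.new(parent_key, label.encode(), hashlib.sha256).digest()


root_key = b"\x00" * 32  # placeholder; a real root key is provisioned out of band
service_key = derive(root_key, "service:metering")
app_key = derive(service_key, "app:billing")
```

Distinct labels yield independent keys, and because HMAC is one-way, compromising an application key does not expose its parent service key or the root key.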

Several recent works tried to address the presented issues. For example [80] deals

with the problem of task allocation in IoT. In more detail, nodes have to cooperate
and interoperate towards a collaborative deployment of


applications, able to take into account the available resources, such as energy,

memory, processing, and object capability to perform a given task. In order to

address such an issue, a resource allocation middleware for the deployment of

distributed applications in IoT is proposed. Starting from this component, a

consensus protocol for the cooperation among network objects in performing the

target application is added, which aims to distribute the burden of the application

execution, so that resources are adequately shared. Such a work exploits a

distributed mechanism and demonstrates better performance than its centralised

counterpart.

2.3.2.4 Cloud Security

Cloud computing is a merger of several known technologies, including grid and

distributed computing, utilising the Internet as a service delivery network. The

public Cloud environment is extremely complex when compared to a traditional data

centre environment [22]. Under the paradigm of Cloud computing, an organisation

surrenders direct control over major aspects of security, conferring a substantial

level of trust onto the Cloud provider. A recent survey regarding the use of Cloud

services made by IDC highlights that security is the greatest challenge for the

adoption of Cloud [13, 46, 47]. Figure 2-5 is extended from an idea outlined in [47],

which shows the vulnerabilities, threats and possible attacks in cloud environments.

The following subsections describe these.

2.3.2.4.1 Vulnerabilities in the Cloud Environment

This section discusses major Cloud-specific vulnerabilities, which pose serious

threats to Cloud computing.

Vulnerabilities in virtualization/multi tenancy

Virtualization/multi-tenancy serves as the basis for Cloud computing architecture.

There are mainly three types of virtualization used: OS-level virtualization,
application-based virtualization, and hypervisor-based virtualization.

Vulnerabilities in Internet protocol

Vulnerabilities in Internet protocols may provide an implicit way of attacking the
Cloud system; common attacks of this type include the man-in-the-middle attack,


IP spoofing, ARP spoofing, DNS poisoning, RIP attacks, and flooding. ARP

poisoning is one of the well-known vulnerabilities in Internet protocols.

Unauthorised access to management interface

In Cloud, users have to manage their subscription including Cloud instance, data

upload or data computation through a management interface. Unauthorised access to

such a management interface may become very critical for a Cloud system.

Injection vulnerabilities

Vulnerabilities like SQL injection flaw, OS injection flaw, and Lightweight

Directory Access Protocol (LDAP) injection flaw are used to disclose application

components. Such vulnerabilities are the outcomes of defects in design and

architecture of applications.

Vulnerabilities in browsers and APIs

Cloud providers publish a set of software interfaces (or APIs) that customers can use

to manage and interact with Cloud services. Service provisioning, management,

orchestration, and monitoring are performed using these interfaces via clients (e.g.

Web browser).

2.3.2.4.2 Attacks on Cloud Computing

By exploiting vulnerabilities in the Cloud, an adversary can launch the following attacks.

Zombie attack

Through the Internet, an attacker tries to flood the victim with requests sent from
compromised, otherwise innocent hosts in the network, called zombies. In the Cloud,
requests for Virtual Machines (VMs) are accepted from any user through the
Internet, so an attacker can flood the service with a large number of requests via zombies.

Service injection attack

The Cloud system is responsible for determining and eventually instantiating a
free-to-use instance of the requested service; the address for accessing that new
instance is then communicated back to the requesting user. An adversary may inject
a malicious service or a new virtual machine into the Cloud system and thereby
provide malicious services to users. Cloud malware affects Cloud services by
changing (or blocking) Cloud functionalities.

Virtualization attack


There are mainly two types of attack on virtualization: VM escape and rootkits in
the hypervisor. These attacks are made possible by the Cloud's reliance on
virtualization.

Man-in-the-Middle attack

In Cloud, an attacker is able to access the data communication among data centres.

Proper SSL configuration and data communication tests between authorised parties

can be useful to reduce the risk of Man-in-the-Middle attack.

Metadata spoofing attack

In this type of attack, an adversary modifies or changes the service’s Web Services

Description Language (WSDL) file where descriptions about service instances are

stored.

Phishing attack

Phishing attacks are well known for manipulating a web link and redirecting a user

to a false link to obtain sensitive data. In the Cloud, an attacker may use
the Cloud service to host a phishing site that hijacks the accounts and services of
other users in the Cloud.

Backdoor channel attack

It is a passive attack which allows hackers to gain remote access to the
compromised system. Using backdoor channels, hackers can control the victim's
resources and turn the system into a zombie for a DDoS attack.

2.3.2.4.3 Threats to Cloud Computing

The Cloud Security Alliance has presented an initial draft of threats relevant to the security

architecture of Cloud services. We discuss here some potential threats relevant to

Cloud and relevant mitigation directives.

Changes to business model

Cloud computing changes the way in which IT services are delivered. As servers,

storage and applications are provided by off-site external service providers,

organisations need to evaluate the risks associated with the loss of control over the

infrastructure. This is one of the major threats which hinder the usage of Cloud

computing services.

Abusive use of Cloud computing


Cloud computing provides several utilities including bandwidth and storage

capacities. Some vendors also give a predefined trial period to use their services.

However, vendors do not have sufficient control over attackers, malicious users, or
spammers who can take advantage of the trials, which can give an intruder a
platform from which to plant serious attacks.

Insecure interfaces and API

Cloud providers often publish a set of APIs to allow their customers to design an

interface for interacting with Cloud services. These interfaces often add a layer on

top of the framework, which in turn would increase the complexity of Cloud. Such

interfaces allow vulnerabilities (in the existing API) to move to the Cloud

environment. Improper use of such interfaces often poses threats such as clear-text
authentication, clear-text transmission of content, improper authorisation, etc.

Malicious insiders

Most organisations do not disclose their policies on employees' levels of access or
their recruitment procedures. However, with a high level of access, an employee can
gain access to confidential data and services. Due to the lack of transparency in a
Cloud provider's processes and procedures, insiders often hold such privileges.
Insider activity is often passed by a firewall or Intrusion Detection System (IDS)
on the assumption that it is legal activity.

Shared technology issues/multi-tenancy nature

In multi-tenant architecture, virtualization is used to offer shared on-demand

services. The same application is shared among different users having access to the

virtual machine. However, as highlighted earlier, vulnerabilities in a hypervisor

allow a malicious user to gain access and control of the legitimate users’ virtual

machine.

Data loss and leakage

Data may be compromised in many ways. This may include data compromise,

deletion, or modification. Due to the dynamic and shared nature of the Cloud, such a

threat could prove to be a major issue leading to data theft.

Service hijacking

Service hijacking may redirect the client to an illegitimate website. User accounts

and service instances could in turn make a new base for attackers. Phishing attack,


fraud, exploitation of software vulnerabilities, and reused credentials and passwords
can all lead to service or account hijacking.

Risk profiling

Cloud offerings make organisations less involved with ownership and maintenance

of hardware and software. This offers significant advantages. However, this makes

them unaware of internal security procedures, security compliance, hardening,

patching, auditing, and logging process and exposes the organisation to greater risk.

Identity theft

Identity theft is a form of fraud in which someone pretends to be someone else, to

access resources or obtain credit and other benefits. The victim (of identity theft) can

suffer adverse consequences and losses and be held accountable for the perpetrator’s

actions. Relevant security risks include weak password recovery workflows,

phishing attacks, key loggers, etc.

2.4 Big Data stream Security

Applications dealing with large data sets obtained via simulation or from real-time
sensor networks and social networks are increasingly abundant [81]. The data obtained

from real-time sources may contain certain discrepancies which arise from the

dynamic nature of the source. Furthermore, certain computations may not require all

the data and hence this data must be filtered before it can be processed. By installing

adaptive filters that can be controlled in real-time, we can filter out only the relevant

parts of the data thereby improving the overall computation speed.

Figure 2-5: Cloud computing security threats, attacks and vulnerabilities.


Nehme et al. [82] proposed a system, StreamShield, designed to address the problem

of security and privacy in data streams. They clearly highlight the need for
two types of security in data streams: (1) "data security punctuations" (dsps),
which describe the data-side security policies, and (2) "query security punctuations"
(qsps), which describe the query-side policies. The advantages of such a stream-centric security model include

flexibility, dynamicity and speed of enforcement. A stream processor can adapt to

not only data-related but also to security-related selectivity, which helps reduce

waste of resources, when few subjects have access to streaming data.

- Security verification is very important in data streams in order to avoid
unwanted and corrupted data.

- Another important problem to address is performing the security
verification in near real-time.

- Security verification should not degrade the performance of the stream
processing engine; i.e., security verification should be considerably faster
than the stream processing itself.

There are several applications where sensor nodes work as the source of the data

stream. Here we list several applications such as real-time health monitoring

applications (Health care), industrial monitoring, geo-social networking, home

automation, war front monitoring, smart city monitoring, SCADA, event detection,

disaster management and emergency management.

In all the above applications, data needs to be protected from malicious
attacks so that its originality is maintained before it reaches a data processing centre [83].
As the data sources are sensor nodes, it is always important to propose lightweight
security solutions for data streams [83].

These applications require real-time processing of very high-volume data streams

(also known as a big data stream). The complexity of big data is defined through the
4Vs: volume, variety, velocity, and veracity. These features present significant
opportunities and challenges for big data stream processing. A big data stream is
continuous in nature, and it is important to perform real-time analysis because the
lifetime of the data is often very short (applications can access the data only once) [4, 5].


2.4.1 Security Requirements

The goal of security services in a big data stream is to protect the information and

resources from malicious attacks and misbehaviour. The security requirements in

big data stream include:

- Availability: ensures that the data stream is accessible to authenticated
users and specific applications according to predefined features.

- Authorization: ensures that only authorised users can be involved in data
analysis and modification.

- Authentication: ensures that the data come from legitimate sources by
providing end-to-end security services; data should not come from any
malicious source.

- Confidentiality: ensures that a given message cannot be understood by
anyone other than the desired recipients.

- Integrity: ensures that a message sent from a source to the data centre
(cloud) is not modified by a malicious intermediary.

- Nonrepudiation: denotes that a source cannot deny having sent a message
it previously sent.

- Freshness: implies that the data is recent and ensures that no adversary
can replay old messages.

Moreover, forward and backward secrecy should also be considered when we get
data from a new set of sources.

- Forward secrecy: a source device should not be able to read any future
messages after it leaves the network.

- Backward secrecy: a joining source device should not be able to read any
previously transmitted message.
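One common way to realise these properties, sketched below under the assumption of a hash-based key ratchet (not a scheme prescribed by the works cited here): advancing the epoch key with a one-way hash denies a newly joined device the earlier keys (backward secrecy), while rekeying with fresh randomness when a device leaves denies the leaver future keys (forward secrecy).

```python
import hashlib
import secrets

# Illustrative one-way key ratchet for group rekeying. A joining node given the
# current epoch key cannot invert the hash to recover earlier keys (backward
# secrecy). The ratchet alone would let a leaving node follow all future keys,
# so the group must switch to a fresh random key the leaver never saw
# (forward secrecy).


def next_epoch(key: bytes) -> bytes:
    """Advance the group key by one epoch (one-way)."""
    return hashlib.sha256(b"ratchet" + key).digest()


def rekey_on_leave() -> bytes:
    """Fresh random key, distributed only to the remaining members."""
    return secrets.token_bytes(32)
```

The ratchet is deterministic for members who hold the current key, but cannot be run backwards by a new joiner.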

By considering the above security requirements, we divided the security issues,

threats, and solutions of IoT generated big data stream based on the CIA

(Confidentiality, Integrity, and Availability) triad. The following sections give the

details about this.


2.4.2 CIA Triad Properties

Confidentiality, integrity and availability, also known as the CIA triad, is a

model designed to guide policies for information security within big data streams.

The CIA triad is shown in Figure 2-6. The model is also sometimes referred to as

the AIC triad (availability, integrity and confidentiality) to avoid confusion with the

Central Intelligence Agency. The elements of the triad are considered the three most

crucial components of security.

Confidentiality – secrecy of the data either in transit or at rest; our data stream
setting deals with data in transit. Confidentiality is a set of rules or a promise that
limits access to, or places restrictions on, certain types of information in the data
stream. Partial confidentiality arises when there is considerable informational
disclosure in some situations [84]. Weak confidentiality arises when some parts of
the original data blocks can be reconstructed explicitly from fewer than m pieces
[85]; Information Dispersal Algorithms (IDAs) exhibit weak confidentiality, as an
eavesdropper can reconstruct some segments of the original file F explicitly from
fewer than m pieces.
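A toy byte-striping dispersal makes this weakness concrete; plain striping is a deliberately degenerate stand-in for a real IDA, chosen only to show that unencrypted pieces expose plaintext fragments.

```python
# Toy byte-striping "dispersal" illustrating weak confidentiality: each of the
# m pieces directly exposes a subset of the original bytes, so an eavesdropper
# holding fewer than m pieces still reconstructs fragments of the file.


def disperse(data: bytes, m: int):
    """Split data round-robin into m pieces (no encryption applied)."""
    return [data[i::m] for i in range(m)]


def reassemble(pieces):
    """Interleave the pieces back into the original byte string."""
    out = bytearray(sum(len(p) for p in pieces))
    for i, p in enumerate(pieces):
        out[i::len(pieces)] = p
    return bytes(out)
```

All m pieces reassemble the file exactly, but even a single piece is a verbatim slice of the plaintext: piece 0 is every m-th byte of the original.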

The system measures the impact on confidentiality of a successful exploit of a
vulnerability on the target system:

Partial: there is considerable informational disclosure.

Complete: a total compromise of critical system information.

Figure 2-6: CIA triad of data security either data in transit or at rest.


Integrity – in terms of big data streams, integrity is the assurance that information
can only be accessed or modified by authorised users or applications.
Measures taken to ensure integrity include controlling the physical environment of
networked terminals and servers, restricting access to data, and maintaining rigorous
authentication practices. The authentication process is part of the integrity
process.

The system measures the impact on integrity of a successful exploit of a vulnerability
on the target system:

Partial: a considerable breach of integrity.

Complete: a total compromise of system integrity.

Authentication: measures whether or not an attacker needs to be authenticated in a
big data stream in order to exploit a vulnerability, i.e., whether authentication is
required to access and exploit the vulnerability.

Availability – data availability, in terms of big data streams, ensures that data
continues to be available at a required level of performance in situations ranging
from normal through disastrous. In general, data availability is achieved by
providing access to authenticated users and/or applications.

The system measures the impact on availability of a successful exploit of a
vulnerability on the target system:

Partial: considerable lag in, or interruptions to, resource availability.

Complete: total shutdown of the affected resource.

2.4.3 Confidentiality of Big Data Streams

Confidentiality is the ability to conceal messages from a passive attacker so that
information in big data streams remains confidential. This is one of the most

important issues for several applications such as health care, military etc.

Confidentiality must be maintained throughout the entire lifetime of the data, from

source smart IoT device to long-term archiving in a cloud data centre.


Confidentiality is typically achieved via data encryption. Eavesdropping, in which
unauthorised attackers monitor and listen to the communication medium, is the
corresponding threat; attacks against data privacy are always passive in nature.

Access Authorization [54] — Access to confidential data must be limited to a group

of legitimate users to analyse data. An access authorization scheme must ensure that

only persons with adequate security clearance get access to stream data. For access

to especially sensitive data, involvement of more than one operator should be

required to prevent misuse. If a video stream contains different levels of information

(e.g. full video, annotations), access should be managed separately for each level.

Additionally, all attempts to access confidential data should be securely logged.

Attack Against Privacy — In our classification, privacy is a subproperty of

confidentiality. Whereas confidentiality denotes protection of all data against

external parties, privacy means protection of sensitive data against misuse by

legitimate users (i.e., insiders). In fact, much information from sensor networks

could probably be collected through direct site surveillance [54]. Rather, sensor

networks intensify the privacy problem because they make large volumes of

information easily available through remote access. Hence, adversaries need not be

physically present to maintain surveillance; they can gather information at low risk
in an anonymous manner [86]. For system operators who perform monitoring tasks,

behavioural information is usually sufficient and identity information is not required.

This can be achieved by automatic detection and removal of sensitive data from data

streams.

Monitoring and Eavesdropping — This is the most common attack on confidentiality.
By snooping on the data, the adversary can easily discover the communication
contents. When the traffic conveys control information about the sensor network
configuration, which is potentially more detailed than what is accessible through the
location server, eavesdropping can act effectively against privacy
protection [86].

Traffic Analysis — Even when the transferred messages are encrypted, there remains
a high possibility of analysing the communication patterns. Sensor activities can

potentially reveal enough information to enable an adversary to cause malicious

harm to the sensor network [86].


Camouflage Adversaries — An adversary can insert its own node, or compromise
existing nodes, to hide in the sensor network. These nodes can then act as normal
nodes to attract packets, misroute them, and conduct privacy analysis.

Related works on data confidentiality

Lou et al. [87] proposed a novel scheme, Security Protocol for REliable dAta

Delivery (SPREAD), focusing on confidentiality as a service. The proposed

SPREAD scheme aims to provide further protection to secret messages from being

compromised (or eavesdropped) when data traverse an insecure network. The authors
present the overall system architecture and investigate the design issues; as a
result, SPREAD is more secure and also provides a certain degree of reliability
without sacrificing security. Their simulation results show that SPREAD can

provide more secure data transmission when messages are transmitted across an

insecure wireless medium. Location Privacy Routing Protocol (LPR) is able to

minimize the traffic direction information that an adversary can retrieve from

eavesdropping [88]. A novel anonymous on demand routing protocol, named as

MASK [89], can accomplish both MAC-layer and network-layer communications

without disclosing real IDs of the participating nodes in any adversary environment.

MASK offers the anonymity of senders, receivers, and sender-receiver relationships,
in addition to node unlocatability and untraceability and end-to-end flow
untraceability. Saini et al. [90] proposed a privacy protection method that adopts

adaptive data transformation involving the use of selective obfuscation and global

operations to provide robust privacy even with unreliable detectors in surveillance

environments. Experimental results of the proposed method show that the proposed

method incurs 38% less distortion of the information needed for surveillance in

comparison to earlier methods of global transformation, while still providing close

to zero privacy loss. SPINS, a suite of security protocols optimised for sensor
networks, provides two secure building blocks: SNEP and μTESLA [91]. SNEP provides
data confidentiality, two-party data authentication, and evidence of data freshness.
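The share-and-multipath idea behind SPREAD can be illustrated with a toy XOR-based n-of-n split; SPREAD itself uses threshold secret sharing, so this all-or-nothing variant is a simplification. Each share travels over a different path, and fewer than all n shares are statistically independent of the message.

```python
import secrets

# Toy XOR-based n-of-n message splitting: all n shares are needed to recover
# the message; any subset of n-1 shares reveals nothing about it.


def split(message: bytes, n: int):
    """Split the message into n shares whose XOR equals the message."""
    shares = [secrets.token_bytes(len(message)) for _ in range(n - 1)]
    last = message
    for s in shares:
        last = bytes(a ^ b for a, b in zip(last, s))
    return shares + [last]


def combine(shares):
    """XOR all shares together to recover the message."""
    out = bytes(len(shares[0]))
    for s in shares:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out
```

An eavesdropper who compromises all but one path holds only uniformly random bytes; recovery requires intercepting every path.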

Luo et al. [92] proposed a new approach to protect confidentiality against a parasitic
adversary that obtains measurements in an unauthorised way. The low-complexity

solution, GossiCrypt, leverages the large scale of sensor networks to protect


confidentiality efficiently and effectively. GossiCrypt protects data by symmetric-
key encryption at the source nodes and re-encryption at a randomly chosen subset
of nodes en route to the sink. The authors validate GossiCrypt analytically and with

simulations, showing it protects data confidentiality with a probability of almost one.

Chan et al. [93] proposed key management schemes for data confidentiality, when

data are disseminated to multiple destinations. The authors categorized these

schemes into four groups: key tree-based approaches, contributory key agreement

schemes supported by the Diffie-Hellman algorithm, computational number

theoretic approaches, and secure multicast framework approaches. Through

examples, authors describe the operation of the schemes and compare their

performances. Traffic analysis poses a serious threat while data are transmitted
through a wireless medium [91]. Strong encryption and traffic padding are often used
to hide message contents and maintain the confidentiality of the data. Jiang et al. [94]
discussed different methods of constructing traffic cover modes, formulated an
optimality problem, and presented a solution.

2.4.4 Integrity of Big Data Streams

In the following, we list the possible integrity attacks on big data streams.

Spoofed, Altered, or Replayed Data Stream Information — This is the most direct
attack against data in transit after it leaves the IoT devices. An attacker may spoof,
alter, or replay

information in order to interrupt traffic in the network [92]. These interruptions

include the creation of routing loops, attracting or repelling network traffic,

extending and shortening source routes, generating fake error messages, partitioning

the network, and increasing end-to-end latency [69].

A countermeasure against spoofing and alteration is to append a message

authentication code (MAC) after the message. By adding a MAC to the message, the

receivers can verify whether the messages have been spoofed or altered. To defend

against replayed information, counters or timestamps can be included in the

messages [94].
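These two countermeasures can be sketched together; the shared key and the message format below are illustrative assumptions.

```python
import hashlib
import hmac

# Sketch of the countermeasures above: an HMAC over (counter || payload) lets
# the receiver detect spoofed or altered messages, and a strictly increasing
# counter lets it reject replays. Key distribution is assumed to have happened.

KEY = b"shared-secret-key"  # placeholder; provisioned out of band in practice


def protect(counter: int, payload: bytes):
    """Sender side: attach a MAC binding the counter to the payload."""
    msg = counter.to_bytes(8, "big") + payload
    return counter, payload, hmac.new(KEY, msg, hashlib.sha256).digest()


def verify(state: dict, counter: int, payload: bytes, tag: bytes) -> bool:
    """Receiver side: check the MAC, then check freshness of the counter."""
    msg = counter.to_bytes(8, "big") + payload
    expected = hmac.new(KEY, msg, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return False  # spoofed or altered
    if counter <= state.get("last", -1):
        return False  # replayed
    state["last"] = counter
    return True
```

A tampered payload fails the MAC check, and a resent message fails the counter check even though its MAC is valid.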


Selective Forwarding — A significant assumption made in multihop networks is

that all intermediate nodes in the network will accurately forward received messages.

An attacker may create malicious nodes which selectively forward only certain

messages and simply drop others [63]. A specific form of this attack is the black

hole attack in which a node drops all messages it receives. One defence against

selective forwarding attacks is using multiple paths to send data [95]. A second

defence is to detect the malicious node or assume it has failed and seek an

alternative route [69].

Sinkhole — In a sinkhole attack, an attacker makes a compromised node look more

attractive to surrounding nodes by forging routing information [95-96]. Since the big

data stream is transferred through intermediate nodes, the end result is that surrounding

nodes will choose the compromised node as the next node to route the data stream.

This type of attack makes selective forwarding very simple, as all traffic from a

large area in the network will flow through the adversary’s node [69].

Sybil — The Sybil attack is a case where one node presents more than one identity

in the source network [95, 97]. Protocols and algorithms which are easily affected

include fault-tolerant schemes, distributed storage, and network-topology

maintenance. For example, a distributed storage scheme may rely on there being

three replicas of the same data to achieve a given level of redundancy [69]. If a

compromised node pretends to be two of the three nodes, the algorithms used may

conclude that redundancy has been achieved while in reality it has not.

Wormholes — A wormhole is a low-latency link between two portions of the

network over which an attacker replays network information [95]. This link may be

established either by a single node forwarding messages between two adjacent but

otherwise non-neighbouring nodes or by a pair of nodes in different parts of the

network communicating with each other. The latter case is closely related to the

sinkhole attack, as an attacking node near to the data centre can provide a one-hop

link to that base station via the other attacking node in a distant part of the network

[69]. Hu et al. presented a novel and general mechanism called packet leashes for

detecting and defending against wormhole attacks [98]. Two types of leashes were

introduced: geographic leashes and temporal leashes. The proposed mechanisms can

also be used in sensor networks.
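The temporal leash of [98] can be sketched as follows (a simplified illustration assuming synchronised clocks; the range and clock-error constants are hypothetical): a packet carries its send time, and the receiver rejects it when the implied propagation distance exceeds what a legitimate one-hop neighbour could cover.

```python
SPEED_OF_LIGHT = 3.0e8   # metres per second
MAX_RANGE = 100.0        # assumed maximum legitimate radio range (metres)
CLOCK_ERROR = 1e-8       # assumed clock synchronisation error bound (seconds)

def leash_ok(send_time: float, recv_time: float) -> bool:
    # A wormhole relays the packet over a long distance, so the apparent
    # travel time exceeds what the radio range allows. Conservatively
    # include the clock-error bound in the travel time.
    travel = (recv_time - send_time) + CLOCK_ERROR
    return travel * SPEED_OF_LIGHT <= MAX_RANGE
```

Geographic leashes replace the time bound with a location bound: the packet carries the sender's position, and the receiver checks the claimed distance against the radio range.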

Hello Flood Attacks — Many protocols which use HELLO packets make the naive

assumption that receiving such a packet means the sender is within radio range and

is therefore a neighbour. An attacker may use a high-powered transmitter to trick a

large area of nodes into believing they are neighbours of that transmitting node [69,

95]. If the attacker falsely broadcasts a superior route to the base station, all of these

nodes will attempt transmission to the attacking node, despite many being out of

radio range in reality.

Acknowledgment Spoofing — Routing algorithms used in sensor networks

sometimes require Acknowledgments to be used. An attacking node can spoof the

Acknowledgments of overheard packets destined for neighbouring nodes in order to

provide false information to those neighbouring nodes [69, 95]. An example of such

false information is claiming that a node is alive when in fact it is dead.

Desynchronisation — Desynchronisation refers to the disruption of an existing

connection [69]. An attacker may, for example, repeatedly spoof messages to a data

centre, causing missed frames. If timed correctly, an attacker may degrade or even

prevent the ability of the end hosts to successfully exchange data, thus causing them

to instead waste energy by attempting to recover from errors which never really

existed.

A possible solution to this type of attack is to require authentication of all packets

communicated between hosts [96]. Provided that the authentication method is itself

secure, an attacker will be unable to send the spoofed messages to the end hosts.

Time Synchronisation — Every source IoT device has its own local clock. For

example, to correlate events detected by multiple nodes, a common time base among

the participants of the data transmission is required. Since the clocks of the sensor

nodes operate independently, the time readings of the nodes will differ. These time

differences are increased further by the individual drifts of the nodes’ oscillators

[54]. Consequently, clock and time synchronisation is required to enable meaningful

comparison of observed events and to jointly solve distributed tasks. From a security

perspective, it is apparent that time synchronisation protocols are an attractive target

for attackers who want to disrupt the services of a big data stream. Boukerche et al.

[99] defined three different groups of attackers on time synchronisation. The first

group is malicious outsiders who can eavesdrop on the communication and who can

inject messages. The second group is able to jam the communication channel and

can delay and replay captured packets. Finally, the third group includes insiders who

have managed to capture an IoT source device and therefore also have access to the

cryptographic keys of the node. Protection against malicious outsiders is based on

cryptographic techniques and is not different from protecting any other protocol in

data streams.

Eavesdropping (Passive Attacks) — Data communication in IoT-generated big data

streams happens over a wireless medium, and the wireless channel is easily

accessible. Moreover, the promiscuous mode, which means capturing packets by a

node that is not the appropriate destination, is allowed and employed by protocols to

operate or to ensure more efficiency, e.g. a routing protocol may use this mode to

learn routes [55]. These features can be exploited by malicious nodes to eavesdrop

on packets in transit, then analyse them to obtain confidential and sensitive

information. The obvious preventive solution to protect information is to encrypt

packets, but data encryption does not prevent malicious nodes from eavesdropping

and trying to break decryption keys [55]. Since breaking keys is always possible and

key revocation is problematic, as we will see later, eavesdropping remains a serious

attack against data forwarding.

Dropping Data Packets Attack — Since packets follow multi-hop routes, a

malicious node can participate in routing, include itself in routes, and drop all

packets it gets to forward. To do this, the malicious node first attacks the routing

protocol to gain participation in the routing, using one or more of the attacks

presented previously [55]. This attack has the same effects as the selfish

misbehaviour presented hereafter and the same solutions may be applied.

Selfish Behaviour on Data Forwarding — In many civilian applications, such as

networks of cars and the provision of communication facilities in remote areas in

IoT, the source devices typically do not belong to a single authority and do not

pursue a common goal. In such networks, forwarding packets for others is not in the

direct interest of nodes, so there is no good reason to trust nodes and assume that

they always cooperate. Indeed, a selfish node may try to preserve its resources,

particularly battery power, which is a precious resource, by dropping packets it is

asked to forward while using other nodes’ services and consuming their resources to

transmit its own packets toward remote nodes. This is not an intentional attack but a

selfish misbehaviour.

Related works on data integrity

Many works have already been done on protecting against integrity attacks.

Perrig et al. [91] presented a suite of security protocols optimized for sensor

networks: SPINS. SPINS has two secure building blocks: SNEP and μTESLA.

SNEP includes: data confidentiality, two-party data authentication, and evidence of

data freshness. μTESLA provides authenticated broadcast for severely resource-

constrained environments. Their implementation of these protocols shows that they

are practical even on minimal hardware: the performance of the protocol suite easily

matches the network data rate. Security is important for many sensor network

applications and a particularly harmful attack against sensor and ad hoc networks is

known as the Sybil attack [97], where a node illegitimately claims multiple identities.

Authors systematically analyse the threat posed by the Sybil attack to wireless

sensor networks and demonstrate that the attack can be exceedingly detrimental to

many important functions of the sensor network such as routing, resource allocation,

misbehaviour detection, etc. Then they propose several novel techniques to defend

against the Sybil attack, and analyse their effectiveness quantitatively. Many new

time synchronisation algorithms have been proposed, and a few of them provide

security measures against various degrees of attacks. In [99] the authors reviewed

the most commonly used time synchronisation algorithms and evaluate these

algorithms based on factors such as their countermeasures against various attacks

and the types of techniques used. Deng et al. [83] introduced an INtrusion-tolerant

routing protocol for wireless SEnsor NetworkS (INSENS). INSENS is secure and

efficient, and constructs tree-structured routing for wireless sensor networks (WSNs).

The key objective of an INSENS network is to tolerate damage caused by an

intruder who has compromised deployed sensor nodes and is intent on injecting,

modifying, or blocking packets. To limit or localize the damage caused by such an

intruder, INSENS incorporates distributed lightweight security mechanisms,

including efficient one way hash chains and nested keyed message authentication

codes that defend against wormhole attacks. An enhanced single-phase version of

INSENS scales to large networks, integrates bidirectional verification to defend

against rushing attacks. Considering the importance of security issues, Du et al.

[100] summarise typical attacks on sensor networks and survey the literature on

several important security issues relevant to the sensor networks, including key

management, secure time synchronisation, secure location discovery, and secure

routing, where sensors are the data source of IoT.
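The one-way hash chains used by INSENS can be sketched generically (an illustration of the technique itself, not the INSENS implementation): the base station commits to the last value of a chain built by repeated hashing, then discloses earlier values in reverse order, so any node can verify a newly disclosed value with a single hash.

```python
import hashlib

def hash_chain(seed: bytes, length: int) -> list:
    # chain[i+1] = H(chain[i]); the last element serves as the public
    # commitment, and earlier elements are disclosed newest-first.
    chain = [seed]
    for _ in range(length - 1):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain

def verify_next(disclosed: bytes, last_verified: bytes) -> bool:
    # A newly disclosed value is authentic if hashing it once yields
    # the previously verified (or committed) value.
    return hashlib.sha256(disclosed).digest() == last_verified
```

Because the hash is one-way, an attacker who has seen the disclosed values cannot compute the next one in the disclosure order.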

Tubaishat et al. [101] designed a Secure Routing Protocol for Sensor Networks

(SRPSN) to safeguard the data packet passing on the sensor networks under

different types of attacks (integrity attack). The authors also proposed a group key

management scheme, which contains group communication policies, group

membership requirements and an algorithm for generating a distributed group key

for secure communication. The authors used a highly efficient symmetric key and a

hierarchical architecture, which greatly lowers the computation and

communication overhead. To counter adversaries who eavesdrop on and trace packet

movement in the network, Jian et al. [86] proposed a location privacy routing protocol (LPR) that is

easy to implement and provides path diversity. Combining with fake packet

injection, LPR is able to minimize the traffic direction information that an adversary

can retrieve from eavesdropping. The authors evaluate the system based on three

criteria: delivery time, privacy protection strength, and energy cost. The

performance of this protocol can be tuned through a couple of parameters that

determine the tradeoff between energy cost and the strength of location-privacy

protection.

Routing plays an important role in the security of the entire network. In general,

routing security in wireless MANETs appears to be a problem that is not trivial to

solve. Deng et al. [102] studied the routing security issues of MANETs, and analyse

in detail the black hole attacks for MANETs. Their purposed solution follows the

black hole problem for ad hoc on-demand distance vector (AODV) routing protocol.

The limitation of this work is that malicious nodes do not work as a group, although

this may happen in a real situation.

2.4.5 Availability of Big Data Streams

Data availability in big data streams concerns controlling access by end users or

specific applications. This subsection classifies the different access control models

with respect to data stream properties. Two questions drive the access control

process: who is requesting access, and what are they allowed to do?

Several access control models exist. Their corresponding access control

mechanisms—the concrete implementations of those access control models—can

take several forms, make use of different technologies and underlying infrastructure

components, and involve varying degrees of complexity. In some cases, the more

complicated models expand upon and enhance earlier models, while in other cases

they represent a rethinking of the fundamental manner in which access control

should be done. In many cases, the newer, more complicated models arose not from

deficiencies in the security that earlier models provide, but from the need for new

models to address changes in organisational structures, technologies, organisational

needs, technical capabilities, and/or organisational relationships [78].

Mandatory Access Control System (MAC) — Mandatory Access Control, or MAC,

relies on labels that correspond to the sensitivity levels of information for clients and

objects [103]. MAC policy compares the sensitivity label at which the user is

working to the sensitivity label of the object being accessed and refuses access

unless certain MAC checks are passed. MAC is mandatory because the labelling of

information happens automatically, and ordinary users cannot change labels unless

an administrator authorises them [79]. In big data streams, the authorised user or

administrator can access the data to read and analyse it, even in near real time.
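The MAC check described above reduces to comparing ordered sensitivity labels (a minimal sketch; the levels and the "no read up" rule shown are illustrative, not taken from a particular system):

```python
# Ordered sensitivity levels (illustrative).
LEVELS = {"public": 0, "confidential": 1, "secret": 2, "top-secret": 3}

def mac_read_allowed(subject_label: str, object_label: str) -> bool:
    # "No read up": a subject may read an object only when the subject's
    # clearance dominates the object's sensitivity label. Labels are
    # assigned by the system, so ordinary users cannot change them.
    return LEVELS[subject_label] >= LEVELS[object_label]
```

Writes are typically constrained by the dual "no write down" rule, comparing the labels in the opposite direction.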

Role-Based Access Control (RBAC) — Role-based access control (RBAC) models

are receiving increasing attention as a generalized approach to access control [104].

In an RBAC model, roles represent functions within a given organisation. For big

data streams, roles can be assigned to the specific application and the access control

mechanism can be granted to roles, rather than single users. The authorizations

granted to a role are strictly related to the data objects and resources that are needed

for exercising the functions associated with the role. Users are thus simply

authorised to “play” the appropriate roles, thereby acquiring the roles’ authorizations

[105]. When users log in, they can activate a subset of the roles they are authorised

to play. We call a role that a user can activate during a session an enabled role.
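The role/permission indirection of RBAC, including the distinction between assigned and enabled roles, can be sketched as follows (roles, users, and permissions are hypothetical examples for a stream-processing application):

```python
# Permissions are granted to roles, never directly to users.
ROLE_PERMS = {
    "stream_analyst": {"read_stream", "run_query"},
    "stream_admin":   {"read_stream", "run_query", "manage_keys"},
}
# Roles each user is authorised to play.
USER_ROLES = {"alice": {"stream_analyst"}}

def authorised(user: str, enabled_roles: set, perm: str) -> bool:
    # A user holds a permission only through a role that is both
    # assigned to the user and enabled in the current session.
    active = enabled_roles & USER_ROLES.get(user, set())
    return any(perm in ROLE_PERMS[r] for r in active)
```

Note that enabling a role the user was never assigned grants nothing, which captures the assignment/activation separation described above.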

View-Based Access Control (VBAC) — VBAC [106] extends RBAC by introducing

a view as a static, typed language construct for the description of fine grained access

rights, which are permissions or denials for operations of distributed objects. In

essence, VBAC is based on the classical access matrix model with roles as subjects

and views as matrix entries. Views can be granted to individual subjects or roles,

and a principal has access to an operation of an object only if (s)he has a view on the

object with a permission for that operation.

Activity-based access control (AcBAC) — Activity-based access control is a term

seldom used and seems to be an earlier version of attribute-based access control.

An activity-based access control (AcBAC) model has been introduced recently,

which was designed for collaborative work environments [107]. Workflow is

defined as a set of activities (tasks) that are connected to achieve a common goal.

AcBAC separates access right assignment from access right activation. Even if a

user was allocated access rights on the workflow template, he/she can exercise those

rights only during the activation of the task in the specific workflow instance [107].

Attribute based access control — The attribute based access control (AtBAC)

model has the following characteristics [108].

• Users have a set of identity attributes that describe properties of users. For example,

organisational role(s), seniority, applications and so on.

• Data is associated with AtBAC policies that specify conditions over identity

attributes.

• A user whose identity attributes satisfy the AtBAC policy associated with a data

item is allowed to access the data item.
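These three characteristics can be sketched directly (the attribute names and policy shown are illustrative, not drawn from [108]): a data item carries a policy over identity attributes, and a user is granted access only when every condition is satisfied.

```python
def atbac_allowed(user_attrs: dict, policy: dict) -> bool:
    # Each policy entry maps an identity attribute to its set of
    # acceptable values; access requires all entries to be satisfied.
    return all(user_attrs.get(attr) in allowed
               for attr, allowed in policy.items())

# Illustrative policy attached to a data item in the stream.
stream_policy = {"role": {"analyst", "admin"}, "seniority": {"senior"}}
```

Unlike RBAC, the decision here depends on attribute values supplied with the request rather than on a pre-assigned role membership alone.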

Proximity based access control — The model relies on an intuitive notion that

“proximity” means the users are present (or not) within the same physical space.

This lack of a rigorous understanding of proximity can lead to surprising

interpretations [109]. Users can access stream data based on location or application;

this method can be applied to application-oriented access to data streams. In

PBAC, administrators can write policies that specify either the presence or absence

of other users within a protected area. PBAC also makes a distinction between

policies that require authorization only a single time prior to access and policies that

specify conditions that must continue to hold for as long as the permission is used.

Encryption based access control — Big Data technologies are increasingly used to

stream data analysis and other sensitive data. In order to comply with various

regulations and policies, such data need to be stored encrypted and the access to

them needs to be controlled based on the identity attributes of users. Nabeel et al.

[110] proposed an efficient symmetric key encryption scheme for access control for

big data. Unlike the direct application of symmetric key encryption, keys are not

stored in the system; they are dynamically derived when data is to be decrypted.

This approach is an order of magnitude more efficient than the ABE-based approach,

as it is based on symmetric key encryption and broadcast group key management.

The main bottleneck in the approach is the key generation operation.
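The idea of deriving keys on demand rather than storing them can be sketched generically (an illustration of the principle, not the scheme of [110]; the master secret and block identifiers are hypothetical): each block's key is derived from a master secret and the block identifier, so only the master secret is stored and per-block keys are recomputed at decryption time.

```python
import hmac
import hashlib

MASTER = b"master-secret"  # held only by the key server (hypothetical)

def block_key(block_id: str) -> bytes:
    # Derive the per-block symmetric key on demand via a keyed hash;
    # no per-block key material is ever stored in the system.
    return hmac.new(MASTER, block_id.encode(), hashlib.sha256).digest()
```

Derivation is deterministic, so the same block identifier always yields the same key, while distinct identifiers yield independent-looking keys; the derivation step itself is where the key-generation bottleneck noted above arises.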

Privilege State based access control (PSAC) — To support fine-grained intrusion

response, manually moving a privilege to the suspend state provides the basis for an

event based continuous authentication mechanism. Similar arguments can be made

for attaching the taint state to a privilege that triggers auditing of the request in

progress. The decision semantics of an access control system using privilege states

is called a privilege state based access control (PSAC) system [111]. For the

completeness of the access control decisions, a privilege, assigned to a user or role,

in PSAC can exist in the following five states: unassign, grant, taint, suspend, and

deny.

Risk based access control [112] — Fuzzy inference is a promising approach to

implement risk-based access control systems. However, its application to access

control raises some novel problems that have not yet been investigated. Risk-based

access control, though it improves information flow and better addresses

requirements from critical organisations, may result in damage by malicious users

before mitigating steps are taken. The time required by a fuzzy inference engine to

estimate risks may be quite high, especially when there are tens of parameters and

hundreds of fuzzy rules. However, an access control system may need to serve

hundreds or thousands of users. Ni et al. [112] investigated these issues and

presented new solutions to them.

Discretionary access control (DAC) [113] — Discretionary access control, based on

checking access requests against users' authorizations, does not provide any way of

restricting the usage of information once it has been "legally" accessed [80]. This

makes discretionary systems vulnerable to Trojan Horses maliciously leaking

information. Therefore the need arises for providing additional controls limiting the

indiscriminate flow of information in the system. Conventional authorization models

enforcing discretionary policies are based on authorizations which specify, for each

user or group of users in the system, the accesses he/she is allowed to execute on

objects. Bertino et al. [113] proposed a model that allowed the specification of

temporal dependencies among authorizations. Temporal dependencies allow the

derivation of new authorizations on the basis of the presence or absence of other

authorizations in given time intervals. This model allows a temporal constraint to be

associated with each authorization, restricting the authorization's validity.

Related works on data availability

This section gives some standardised related work on data availability. This related

work contains the proposed solutions against the above specified access control

mechanism.

Bertino et al. [113] defined discretionary access control, based on checking access

requests against users' authorizations. It does not provide any way of restricting the

usage of information once it has been "legally" accessed. Conventional authorization

models enforcing discretionary policies are based on authorizations which specify,

for each user or group of users in the system, the accesses he is allowed to execute

on objects. In [113], the authors proposed an authorization model in association with

authorization temporal constraints which restrict the authorization validity. This

model allows temporal dependencies among authorizations and temporal

dependencies allow the derivation of new authorizations on the basis of the presence

or absence of other authorizations in given time intervals.

Ni et al. [112] (Risk-based Access Control) proposed risk-based access control

systems, which are built on fuzzy inference. The authors show that fuzzy inference

is a good approach for estimating access risks. Specific problems concerning the

application of fuzzy inference to access control are investigated and solved in their

paper. The time required by a fuzzy inference engine to estimate risks may be quite

high especially when there are tens of parameters and hundreds of fuzzy rules.

Kamra et al. [111] proposed a privilege states based access control model more

specifically developed to support fine-grained response actions, such as request

suspension and request tainting, in the context of an anomaly detection system for

databases. In their model, privileges assigned to a user or role have a state

attached to them, thereby resulting in a privilege states based access control (PSAC)

system. PSAC has been designed to also take into account role hierarchies that are

often present in the access control models of current DBMSs. They implemented

PSAC in the PostgreSQL DBMS and discussed relevant implementation issues.

Nabeel et al. [110] proposed a novel approach using attribute based group key

management. This approach is an order of magnitude more efficient than the ABE

based approach, as it is based on symmetric key encryption and broadcast group

key management. They utilise a MapReduce framework to improve the performance

of the key generation by generating intermediate keys during the Map phase and

generating the final key during the Reduce phase. The encryption is performed at the

granularity of HDFS (Hadoop Distributed File System) blocks.

Prox-RBAC (Role-based Access Control) was proposed as an extension to consider

the relative proximity of other users with the help of a pervasive monitoring

infrastructure [109]. In this work, the authors presented a more rigorous definition of

proximity based on formal topological relations. In addition, this concept can

be applied to several additional domains, such as social networks, communication

channels, attributes, and time. Thus, this policy model and language is more flexible

and powerful than the previous work. However, that work offered only an informal

view of proximity, and unnecessarily restricted the domain to spatial concerns.

DBMask is a novel solution that supports fine-grained cryptographically enforced

access control, including column, row and cell level access control, when evaluating

SQL queries on encrypted data [108]. In their solutions, the authors do not require

modifications to the database engine, and thus maximise the reuse of the existing

DBMS infrastructures.

Oh et al. [107] proposed an integration model of RABC and ABAC. For this the

authors described the basic concept and limitations of RBAC and ABAC models and

introduced the concept of classifications for tasks. They use tasks as a means of

connecting the RBAC and ABAC models, and discuss the effect of the new

integration model. An improved access control model for enterprise

environment is examined and a task–role-based access control (T–RBAC) model is

founded on the concept of classification of tasks [107]. T–RBAC deals with each

task differently according to its class, and supports task level access control and

supervision role hierarchy. Demurjian et al. [103] presented a new constraint-based

security model, implemented over a distributed database, to enhance the sensitive

data security of a Mandatory Access Control (MAC) system, increase communication

among distributed sites in current government classified information systems, and

achieve data replication.

Another important way to control access is lattice-based information flow control

[114]. The first work on access control over data streams supports a very expressive

access control model and, at the same time, is as independent as possible from the

target DSMS [115]. The lattice structure is based on a typical

lightweight big data stream. The lattice is designed to control the information flow

and map data from a source lattice to the destination lattice.

2.5 Comparison

Table 2-2 presents the possible threats and attacks of IoT generated big data streams

and the classification is based on the CIA (Confidentiality, Integrity, and

Availability) triad from the previous section. As we have classified the complete

security threats and solutions over big data streams in the CIA triad in Section 2.4,

here in Table 2-2 we give a tabular classification. As seen from Table 2-2, security

threats in big data streams always fall under one of the properties of the CIA triad.

Protocols in [54, 86, 87, 90, 92, 93] focus solely on data confidentiality. Works in

[55, 63, 69, 91, 95-99] define security threats that fall under integrity attacks on big

data streams, whereas works in [103-115] address threats against stream data

availability.

Table 2-2: Possible threats of IoT generated big data streams in CIA triad representation

Confidentiality (references [54, 86, 87, 90, 92, 93]): Access Authorization; Attack

Against Privacy; Monitor and Eavesdropping; Traffic Analysis; Camouflage

Adversaries.

Integrity (references [55, 63, 69, 91, 95-99]): Spoofed, Altered, or Replayed Data

Stream Information; Selective Forwarding; Sinkhole; Sybil; Wormholes; Hello

Flood Attacks; Acknowledgment Spoofing; Desynchronisation; Time

Synchronisation; Eavesdropping (Passive Attacks); Dropping Data Packets Attack;

Selfish Behaviour on Data Forwarding.

Availability (references [103-115]): Mandatory Access Control System; Role Based

Access Control; View Based Access Control; Activity based access control;

Attribute based access control; Proximity based access control; Encryption based

access control; Privilege State based access control; Risk based access control;

Discretionary access control; Information flow control model.

Table 2-3 shows a comparative evaluation of the existing security literature based on

the classification criteria defined in Section 2.4. Most security models with potential

solutions for big data streams are classified in Table 2-3. The table shows that

several security properties, such as confidentiality, authentication, integrity, and

availability, have each received only limited attention in the literature. We classify

the security threats and solutions for big data streams accordingly.

Table 2-3: Comparison of IoT generated big data stream security threats and solutions according to the CIA triad.

Protocol name (Confidentiality / Authentication / Integrity / Availability):

SPINS [91] × ×
SPREAD [87] × × ×
Robust Privacy Protection [90] × × ×
GossiCrypt [92] × × ×
Key Management for Data Confidentiality [93] ×
Traffic Analysis [94] × × ×
INSENS [11] × ×
Secure Time Synchronisation [99] × × ×
Routing Security [102] × × ×
Security in WSN [100] × ×
SRPSN [85] × ×
LPR [88] × × ×
Sybil Attack in Sensor Networks [97] × × ×
MASK [89] × ×
Lightweight and Secure TFTP [78] × ×
NAPS [71] × ×
MAC [103] × ×
RBAC [104] × × ×
TRBAC [105] × × ×
T–RBAC [107] × × ×
Integration Model of RBAC and ABAC [107] × × ×
DBMask [108] ×
Prox-RBAC [109] ×
Encryption-based access control [110] × ×
PSAC [111] × ×
Risk-based Access Control [112] × × ×
Discretionary Access Control [13] × ×
DPBSV [4] Partial ×
DLSeF [5] Partial ×
SEEN ×

2.6 Summary

A glimpse of the IoT may be already visible in current deployments where networks

of smart sensing devices are being interconnected with a wireless medium, and IP-

based standard technologies will be fundamental in providing a common and well

accepted ground for the development and deployment of new IoT applications.

According to the 4Vs features of big data, current data streams are heading towards

the new notion of big data streams, whose sources are IoT smart sensing devices.

Considering that security may be an enabling factor for many IoT applications,

mechanisms to secure data streams in flow will be fundamental for the IoT.

With such aspects in mind, in this survey we perform an exhaustive analysis of the

security protocols and mechanisms available to protect big data streams on IoT

applications. We also address existing research proposals and challenges providing

opportunities for future research work in the area.

In Table 2-2 we summarise the security threats to big data streams following the CIA triad. In Table 2-3 we summarise the main characteristics of the mechanisms and proposals analysed throughout the survey, together with their security properties and the existing solutions supporting the CIA triad. In conclusion, we believe this survey provides an important contribution to the research community by documenting the current status of this important and very dynamic area of research, helping readers interested in developing new solutions to address security in the context of IoT-generated big data streams.


Chapter 3

Security Verification Framework for

Big Sensing Data Streams

From this chapter on, we explore research problems and solutions for big sensing data stream security. While dealing with big sensing data streams in sensor networks, a DSM must always perform security verification (i.e. of authenticity, integrity, and confidentiality) of the data to ensure end-to-end security, as the communication medium is untrusted and malicious attackers could access and modify the data. Existing technologies for data security verification

are not suitable for data streaming applications, as the verification in real time

introduces a delay in the data stream. This chapter proposes a Dynamic Prime

Number Based Security Verification (DPBSV) scheme for big data streams. This

scheme is based on a common shared key that is updated dynamically by generating

synchronised prime numbers. The common shared key updates at both ends, i.e.

source sensor and DSM, without further communication after handshaking.

Moreover, the proposed security mechanism not only reduces the verification time

or buffer size in DSM, but also strengthens the security of the data by constantly

changing the shared keys.


3.1 Introduction

A large number of application scenarios (e.g. telecommunications, network

security, large-scale sensor networks, SCADA) require real-time processing of data

streams, where the application of the traditional “store-and-process” method is

limited [24]. There is an extensive variety of applications for data stream processing

in the cloud (e.g. data from large scale sensors, information monitoring, web

exploring, data from social networks like Twitter and Facebook, surveillance data

analysis, financial data analysis). These applications are described at this very

moment, ongoing, and have a large volume of data input, and consequently require

an alternate ideal model of data processing. As a result, a new computing paradigm

based on Stream Processing Engines (SPEs) has appeared. SPEs deal with the

specific types of challenges and are intended to process data streams with a minimal

delay [23, 25 - 27]. In SPEs, data streams are processed in real time (i.e. on-the-fly)

rather than batch processing after storing the data in the cloud as shown in Figure 3-1.

Several applications such as network monitoring and fraud detection by

surveillance camera are approaching the bottleneck of current data streaming

infrastructures [167]. These applications require real-time processing of very high-

volume and high-velocity data streams (also known as big data streams). A big data

stream is continuous in nature and it is important to perform real-time analysis as the lifetime of the data is often very short (data is accessed only once) [28 - 29].

Figure 3-1: A simplified view of a DSMS to process and analyse an input data stream [23].

As the

volume and velocity of the data is so high, there is not enough space to store and

process; hence, the traditional batch computing model is not suitable. Cloud

computing has become a platform of choice due to its extremely low-latency and

massively parallel processing architecture [30]. It supports the most efficient way to

obtain actionable information from big data streams [28, 31 - 33].
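The contrast with store-and-process can be sketched in a few lines. The function names and the toy verify/analyse callbacks below are ours, not from the thesis; the sketch only illustrates the on-the-fly constraints stated above (each block read once, in order, with a bounded buffer):

```python
from collections import deque

def process_stream(blocks, verify, analyse, buffer_size=64):
    """Verify and analyse data blocks on-the-fly: each block is read once,
    in order, and never held beyond a small bounded buffer."""
    buffer = deque(maxlen=buffer_size)  # bounded, unlike store-and-process
    results = []
    for block in blocks:                # data is accessed only once
        if verify(block):               # security verification with minimal delay
            buffer.append(block)
            results.append(analyse(block))
        # blocks that fail verification are dropped; nothing is ever re-read
    return results

# Toy run: "verification" accepts even numbers, "analysis" doubles them.
out = process_stream(range(6), verify=lambda b: b % 2 == 0,
                     analyse=lambda b: b * 2)
```

The bounded `deque` stands in for the DSM buffer whose size the thesis seeks to minimise.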

Big data stream processing has become an important research topic in the current

era, whereas data stream security has received little attention from researchers.

Some of these data streams are analysed and used in very critical applications (e.g.

surveillance data, military applications, etc), where data streams need to be secured in

every aspect to detect malicious activity. The problem is exacerbated when thousands

to millions of small sensors in self-organising wireless networks become the sources

of the data stream. How can we provide security for big data streams? In addition,

compared to conventional store-and-process, these sensors will have limited

processing power, storage, bandwidth, and energy. Furthermore, data streams ought

to be processed on-the-fly in a prescribed sequence. This chapter addresses these

issues by designing an efficient architecture for real-time processing of big sensing

data streams, and the corresponding security scheme.

In order to address the challenge, we have designed and developed a Dynamic

Prime-Number Based Security Verification (DPBSV) scheme. This scheme uses a common shared key that is updated dynamically by generating synchronised

prime numbers. The synchronised prime number generation at both source sensing

device and DSM enables reduction of the communication overhead without

compromising security. Due to the reduced communication overhead, this scheme is

suitable for big data streams as it verifies the security on-the-fly (near real time). The

proposed scheme uses a smaller key length (64 bits), which enables faster security processing at the DSM without compromising security. The same level of security is accomplished by changing the key dynamically at fixed time intervals.

Dynamic key generation is based on random prime numbers, which initialise and

synchronise at source sensors and DSM without further communications between

them after handshaking. Due to the reduced key length, the scheme is suitable for

processing high volumes of data without any delay. This makes DPBSV highly

efficient at DSM for processing secured big data streams.


The remainder of this chapter is organised as follows: Section 3.2 reviews the preliminaries, Section 3.3 presents the research challenges and

research motivations, Section 3.4 describes the DPBSV key exchange scheme,

Section 3.5 presents the security analysis of the scheme formally, Section 3.6

evaluates the performance and efficiency of the scheme through experimental results

and Section 3.7 summarises the contributions in this chapter.

3.2 Preliminaries to the Chapter

One of the security threats is the man-in-the-middle attack, in which a malicious

attacker can access or modify the data stream from sensors. As described in the Introduction, even symmetric-key schemes fail to meet the requirements of real-time processing of big sensing data streams, so there is a need for an efficient scheme for securing big data streams. The possible attacks on big data streams target authenticity, confidentiality and integrity. This chapter addresses these attacks and proposes a solution for efficient security verification of data streams in real time.

The Data Encryption Standard (DES) had been the standard symmetric-key algorithm since 1977; however, it can now be cracked quickly and inexpensively. In 2000, the Advanced Encryption Standard (AES) [31] replaced DES to meet the ever-increasing requirements of data security. AES, also known as the Rijndael algorithm, is a symmetric block cipher that encrypts data blocks of 128 bits using symmetric keys of 128, 192 or 256 bits [38 - 40]. AES also replaced the Triple DES (3DES) algorithm that had been in universal use for a significant length of time. Hence, we compare the proposed solution against AES.

We also assume that deployed source nodes operate in two modes: trusted and

untrusted. In the trusted mode, the nodes operate in a cryptographically secure space

and adversaries cannot penetrate this space. Nodes can incorporate a Trusted Platform Module (TPM) to implement the trusted mode of operation. The TPM is a dedicated security

chip following the Trust Computing standard specification for cryptographic


microcontroller systems [134]. The TPM provides a cost-effective way of “hardening” many recently deployed applications that were previously based on software encryption algorithms with keys kept on a host’s disk [134]. It provides hardware-based trust, with cryptographic functionality such as key generation, storage, and management implemented in hardware; the detailed architecture is described in [134]. We assume that the

proposed prime number generation procedure Prime (Pi) and secret key calculation

operate in the trusted mode.

The proposed scheme is efficient in comparison to AES, as it reduces the

computational load and execution time significantly compared to the original AES;

furthermore, it also strengthens the security of the data, which is the main research

contribution of this chapter.

3.3 Research Challenges and Research Motivation

This section presents the research challenges and motivations in detail. Here we

have highlighted the challenges addressed by the proposed approach, followed by the motivations for the research problem, with reference to our architectural diagram shown in Figure 3-2.

Figure 3-2: Overlay of our architecture from sensing device to cloud data

processing centre.


3.3.1 Research challenges

As discussed earlier, a symmetric cryptographic solution is the best way to protect data with fast processing times. Existing symmetric-key security solutions use either a static shared key or centralised dynamic keys. With a static shared key, we need a long key to defend against a potential attacker, and the length of the key is proportional to the security verification time; from the required features of big data streams (listed in Section 3.3.2), it is clear that security verification should happen in real time. For dynamic keys, having a central processor rekey and distribute keys to all the sources is a time-consuming process. A big data stream is continuous in nature and huge in size, which makes it impossible to halt the data for rekeying, distribution to the sources and synchronisation with the DSM. To address this problem, we propose a scheme for big data stream security verification that needs no key exchange for rekeying. The additional benefit is that this reduces the communication overhead and increases the efficiency of the security verification process at the DSM.

The common problem in the data flow between sensors and the DSM is that attackers may read the data while it is still in transit. Existing solutions to this problem are based on symmetric-key algorithms. The periodic key update message in symmetric-key algorithms may disclose secret information, which may let an intruder learn about the encryption process. Even when a nonce is used in the periodic packet, an intruder still learns when the server is going to change the key, which increases the chance of future attacks. In the proposed scheme, key exchange happens only once, as described before, but the shared key is updated periodically at equal time intervals. Synchronisation between a source and the DSM is important in dynamic symmetric key update; otherwise security verification will fail.

Buffer size for security verification is another major issue because of the volume and velocity of big data streams. Given the features of big data streams (i.e. the 4Vs), we cannot hold the data for long before security verification. This would require a bigger buffer and may reduce the performance of SPEs, so reducing the buffer size is one of the major challenges for big data streams. A security solution for big data streams should therefore work with a smaller buffer.

The proposed scheme is as follows: we use a common shared key for both sensors

and DSM. The key is updated dynamically by generating synchronised prime

numbers without having further communication between them. This reduces the

communication overhead, required by rekeying in existing methods, without

compromising security. Due to the reduced communication overhead, this scheme

performs the security verification with minimum delay and reduced buffer usage. The

communication is required only at the beginning, for the initial key establishment and synchronisation, because the DSM sends all the keys and key generation properties to the sources in this step. There is no further communication between the source sensor and the DSM after handshaking, which increases the efficiency of the solution.

Based on the shared key properties, individual source sensors update their dynamic

key independently.
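The independent, communication-free key update can be sketched as follows. The deterministic next-prime stepping and the SHA-256 key derivation are our illustration, not the thesis's exact Prime (Pi) and KeyGen definitions (which are given later); the point is only that two endpoints seeded identically during handshaking stay synchronised with zero further messages:

```python
import hashlib

def next_prime(n):
    """Smallest prime strictly greater than n (trial division; fine for a sketch)."""
    def is_prime(m):
        if m < 2:
            return False
        return all(m % d for d in range(2, int(m ** 0.5) + 1))
    n += 1
    while not is_prime(n):
        n += 1
    return n

class Endpoint:
    """One side of the scheme (sensor or DSM). After handshaking both sides
    hold the same seed prime and DSM secret, so they rekey in lockstep."""
    def __init__(self, seed_prime, dsm_secret):
        self.prime = seed_prime
        self.dsm_secret = dsm_secret

    def rekey(self):
        # Called once per interval t on BOTH sides; no communication needed.
        self.prime = next_prime(self.prime)
        digest = hashlib.sha256(f"{self.prime}|{self.dsm_secret}".encode()).digest()
        return digest[:8]  # truncated to a 64-bit shared key, as in DPBSV

sensor = Endpoint(seed_prime=104729, dsm_secret="dsm-secret")
dsm = Endpoint(seed_prime=104729, dsm_secret="dsm-secret")
assert sensor.rekey() == dsm.rekey()  # keys match with zero message exchange
```

Because the key schedule is a pure function of the shared seed state, a desynchronised sensor only needs the current prime and generation time to rejoin, matching the recovery procedure described in Section 3.4.3.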

3.3.2 Research motivation

The four most important features of big data streams from the point of view of security verification are:

1. Security verification needs to be performed in near real time (on-the-fly).

2. Verification framework has to deal with high volume and high velocity data.

3. Data items can be read once in the prescribed sequence.

4. Unlike the store-and-process paradigm, original data is not available for

comparisons in the context of the stream processing paradigm.


In light of the above features and properties of big data streams, we classified

existing security systems into two classes: Communication Security [10 - 12] and

Server side data security [13 – 16]. Communication security deals with data security

when it is in motion and server side security deals with data security when it is at rest.

The security threats and solutions proposed in the literature outlined in the following

section are either dealing with the data stored at the server/cloud or the data flow.

They are not suitable to use in big data streams for the following reasons.

Communication security is primarily proposed for network communication and

communication related attacks are broadly divided into two types i.e. external and

internal. To avoid such attacks, security solutions have been proposed for every

individual TCP/IP layer. Several security solutions exist to avoid these

communication threats but are not suitable according to the properties of big data

streams stated above. Server-side data security is primarily designed for physical data

centres, where data is at rest and accessed through applications. Several security solutions have been proposed for server-side data security; they are suitable for store-and-process, but not feasible for big data streams.

Another major motivation is to perform the security verification in near real time in order to keep up with the processing speed of SPEs [82]. Stream data analysis performance should not degrade because of security processing time, as several applications need to perform data analysis in real time. Given the features of big data streams, existing security solutions need a huge buffer to process security verification. It is simply impossible to maintain such a big buffer for continuous data streams, so a lightweight security mechanism is essential for security verification.

Figure 3-3: A pair of dynamic relative prime numbers, one at the DSM and another at each distributed sensor node, is maintained with a standard time interval. Information is communicated from the sensors to the DSM only if encrypted with the Pi-based secret key.

Table 3-1. Notations

Acronym Description

ith Sensor’s ID.

ith Sensor’s Secret key.

ith Sensor’s Session Key.

Generated key for the authentication.

Secret key calculated by the sensor and DSM.

Encrypted with sensor’s secret key for user authentication.

Calculated hash value.

Pseudorandom number generated by the sensors.

Interval time to generate the prime number.

Random prime number.

Secret key of the DSM.

k Initial shared key for sensor and DSM for authentication.

j Integrity checking interval.

Encrypted data for integrity check.

Secret key for authenticity check.

Encryption function.

One-way hash function.

Random prime number generation function.

KeyGen Key generation procedure.

Bitwise X-OR operation.

Concatenation operation.

Fresh data at sensor before encryption.

Retrieve key from DSM database by knowing specific source.

Randomly generate the key.


3.4 Dynamic Prime-Number Based Security Verification

This section describes the DPBSV scheme for big sensing data streams. Similar to

any secret key based symmetric key cryptography, the proposed DPBSV scheme

consists of four independent components: system setup, handshaking, rekeying, and

security verification. Table 3-1 provides the notations used in describing the scheme.

We next describe the security scheme in detail.

3.4.1 DPBSV System Setup

We have made various sensible and practical assumptions while characterising the security scheme. First, we assume that the DSM has all deployed sensors' identities (IDs) and secret keys at the time of deployment, because the network is fully untrusted. We increase the number of key exchanges between the sensors and the DSM for the initial session-key establishment process to achieve better security; the aim is to make this session especially secure because we transmit all the secret information of KeyGen to the individual source sensors. Second, we assume that each sensor node Si knows the identity of its DSM and that both maintain the same secret key k for the initial authentication process.

Step 1:

A sensor (Si) generates a pseudorandom number r and sends it to the DSM together with its own identity, as {Si, r}. There are n sensors deployed in the area, S1, S2, S3, ..., Sn, where Si denotes the ID of the ith sensor. In this security scheme, sensors never communicate with each other, to reduce the communication overhead. The scheme also updates the dynamic shared key at both ends to prevent potential attacks or key compromise via traffic behaviour analysis. Initially both the sensors and the DSM maintain a secret key k for the authentication process.

1) Si → DSM: {Si, r}.

Step 2:

When the DSM receives the request from a sensor, it retrieves the corresponding sensor's secret key and selects a random session key. In order to share this with the corresponding sensor (Si), the DSM generates a key based on the selected session key and the corresponding sensor's private key. The DSM then encrypts the generated key with the session key and applies the hash function to generate C. Finally, the DSM sends the value of C and the encrypted key to Si. The complete computational steps are listed as follows.

,

, by using random selected session key

(3-1)

2) Si ← DSM: { , }

Step 3:

The corresponding sensor gets the authentication packet from DSM and starts

calculating its session key from based on its own secret key i.e.

. The sensor finds out the value of based on the value of and , i.e.

by using the initial secret key k. Then it gets the hash

from Equation 3-1 and checks whether or not it is equal to C. If the hashes are equal, Si authenticates the DSM; otherwise Si terminates the protocol. Following the authentication, the sensor transmits its response to the DSM as follows; failure to do so indicates that the sensor could not authenticate the DSM.

, to extract its own session key.

, to authenticate the DSM

(3-2)

3) Si → DSM: { }.

Step 4:

When the DSM receives the response, it compares it with the value computed from Equation 3-2 to see whether or not they are equal. If so, the DSM authenticates Si; otherwise, the protocol is terminated. After both parties are authenticated, the DSM and the sensor share the session key.


(3-3)

4) Si ← DSM: { }
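Since the precise expressions of Equations 3-1 to 3-3 are not legible in this copy, the four-message setup can be illustrated with one plausible instantiation. The XOR masking under k, the SHA-256 commitment, and all variable names below are our assumptions, not the thesis's exact equations; only the message flow (Steps 1-4 above) is taken from the text:

```python
import hashlib
import os

def h(*parts):
    """Stand-in one-way hash over concatenated byte strings."""
    return hashlib.sha256(b"|".join(parts)).digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Setup (assumption): both sides pre-share k and the sensor's secret key.
k = os.urandom(32)
sensor_secret = os.urandom(32)

# 1) Si -> DSM: {Si, r}
sensor_id, r = b"S1", os.urandom(16)

# 2) DSM picks a session key, masks it under k, and commits to it with a hash
#    bound to the sensor secret and the nonce r.
session_key = os.urandom(32)
masked = xor(session_key, k)          # the sensor can unmask this with k
c = h(session_key, sensor_secret, r)

# 3) The sensor unmasks the session key and checks the commitment,
#    thereby authenticating the DSM, then sends back a proof of knowledge.
recovered = xor(masked, k)
assert h(recovered, sensor_secret, r) == c
proof = h(recovered, sensor_secret)

# 4) The DSM verifies the proof, authenticating the sensor; both now share
#    the session key without it ever crossing the channel in the clear.
assert proof == h(session_key, sensor_secret)
```

The essential property mirrored here is mutual authentication in exactly four messages, with the session key derivable only by a party holding both k and the sensor secret.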

3.4.2 DPBSV Handshaking

The DSM sends all its properties to the sensors (S1, S2, S3, ..., Sn), protected with their individual session keys. Generally, the larger the prime number used for the secret shares in the pairwise key establishment process, the better the security the pairwise key will achieve. However, using a larger prime number for the secret shares requires a greater computation time. In order to make the security verification lighter and faster, we reduce the prime number size.

The dynamic prime number generation function is defined in Theorem 2 later. We calculate the prime number on both the source and DSM sides to reduce communication overhead and to minimise the chance of disclosing the shared key.

Step 5:

Prime (Pi) computes the relative prime number on both sides with a time interval t. In the handshaking process, the DSM transmits all of its key and prime number generation procedures to the individual sensors, encrypted with the initial shared key (k).

5) Si ← DSM: { }

In this step, DSM sends all the parameters and properties of KeyGen to source

sensors. All of this transferred information is stored in trusted parts of sensors (e.g.

TPM).

3.4.3 DPBSV Rekeying

We propose a novel rekeying concept by calculating prime numbers dynamically

on both source sensors and DSM. Figure 3-3 shows the synchronisation of the shared

key. In this security scheme, a smaller key size makes the security verification faster, but we change the key very frequently in the DPBSV rekeying process to ensure that the protocol remains secure. If any damage occurs at the source, the corresponding sensor becomes desynchronised with the DSM. The source sensor follows


Step 3 to reinitialise and synchronise with DSM. According to our assumption, we

store all the secret information at a trusted part of the sensor. So the sensor can

reinitialise the synchronisation by sending its own identity to DSM. Once DSM

authenticates the source sensor, it sends the current key and time of key generation.

Authenticated sensors can update the next key by using the key generation process

from a secure module of the sensor (TPM).

Rekeying is often accomplished by running initial exchanges all over again. The

following presents an alternative approach to rekeying and the corresponding

analysis in terms of efficiency.

Step 6:

The DPBSV handshaking process defined above makes sensors aware of Prime (Pi) and KeyGen. We now describe the complete secure data transmission and verification process using those functions and keys. As mentioned above, this security scheme uses synchronised dynamic prime number generation Prime (Pi) on both sides, i.e. at the sensors and the DSM, as shown in Figure 3-3. At the end of the handshaking process, the sensors have their own secret keys, the initial prime number and the initial shared key generated by the DSM. The next prime generation is based on the current prime number and the given time interval. Sensors generate the shared key using the prime number and the DSM secret key. Each data block is associated with an authentication tag and contains two parts: one is the encrypted DATA, based on the sensor's secret key and the shared key, for integrity checking, and the other part is for the authenticity check. The resulting data block is the concatenation of the two parts. The key generation and individual block encryption processes are listed as follows.

(3-4)

6) Si → DSM: { }.


3.4.4 DPBSV Security Verification

Security verification should be performed in real time (with minimal delay) based

on the features of big data streams stated above. In the following step, we perform the

security verification of the proposed scheme. In this step, DSM verifies for

authenticity in each individual data block and for integrity in specific selected data

blocks. The aim is to maintain the end-to-end security of the proposed scheme.

Step 7:

The DSM verifies whether the data has been modified and whether it comes from an authenticated node. As the DSM has the common shared key, it decrypts the complete block to obtain the individual parts for the integrity and authenticity checks. The DSM first checks authenticity in each data block, and checks integrity in data blocks selected at a random interval j. This random value is calculated from the corresponding prime number; it varies from 0 to 6 (i.e. a maximum interval of 6 blocks), and if j is 0 no data block is skipped. For the authenticity check, the DSM decrypts the authentication part with the shared key. Once Si is obtained, the DSM checks its source database and extracts the corresponding secret key for the integrity check according to the value of j. Given that key, the DSM decrypts the data and checks the MAC for integrity. The whole security verification process is based on the shared key from Equation 3-4.
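The per-block verification logic can be sketched as below. Since the exact mapping from the prime to j is not legible in this copy, the modulo-7 derivation (which yields the stated 0-6 range) is our assumption, and HMAC-SHA256 stands in for the thesis's encryption-based tags:

```python
import hashlib
import hmac

def make_block(sensor_id, data, shared_key, secret_key):
    """Sensor side: tag every block for authenticity, and MAC the payload
    so the DSM can spot-check integrity."""
    auth_tag = hmac.new(shared_key, sensor_id, hashlib.sha256).digest()
    data_mac = hmac.new(secret_key, data, hashlib.sha256).digest()
    return (sensor_id, data, data_mac, auth_tag)

def verify_stream(blocks, shared_key, secret_key, prime):
    """DSM side: authenticity on every block; integrity every (j+1)th block,
    where j = prime mod 7 (assumption), so j = 0 skips no block."""
    j = prime % 7
    for i, (sensor_id, data, data_mac, auth_tag) in enumerate(blocks):
        # Authenticity is checked on every single block.
        expected = hmac.new(shared_key, sensor_id, hashlib.sha256).digest()
        if not hmac.compare_digest(auth_tag, expected):
            return False
        # Integrity is only fully checked at the j-determined interval.
        if i % (j + 1) == 0:
            mac = hmac.new(secret_key, data, hashlib.sha256).digest()
            if not hmac.compare_digest(data_mac, mac):
                return False
    return True

shared, secret = b"s" * 16, b"k" * 16
blocks = [make_block(b"S1", str(i).encode(), shared, secret) for i in range(10)]
assert verify_stream(blocks, shared, secret, prime=104729)
```

Skipping most integrity checks while authenticating every block is what keeps the DSM-side cost low enough for on-the-fly processing.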

The complete mechanism, from source sensing device and DSM authentication through handshaking to security verification, is presented in algorithmic form in Algorithm 3-1, which describes the proposed mechanism step by step.

Algorithm 3-1. Security Framework for Big Sensing Data Stream

Description: Based on dynamic prime number generation at both the source sensor and the DSM end, the proposed security framework for big sensing data streams works more efficiently without compromising security strength.

Input: the prime generation process, the key generation process, the sensor and DSM secret keys, and the session key for handshaking.

Output: successful security verification free of malicious attack, and comparatively faster security verification than a standard symmetric-key solution (AES).

Step 1 DPBSV System Setup

1.1 Si → DSM: {Si, r}: the ith sensor sends its random number with its identity.
1.2 Si ← DSM: { , }: the DSM identifies the sensor and generates a session key for it, then encrypts it and sends it back to the ith sensor.
1.3 Si → DSM: { }: the ith sensor identifies the DSM based on its own secret key. If the sender is not authenticated, the authentication transaction is restarted.
1.4 Si ← DSM: { }: the DSM authenticates the last transaction and replies to the ith sensor in this format; otherwise the protocol terminates and the process starts anew.

Step 2 DPBSV Handshaking

The DSM sends its properties to the individual sensors based on their individual session keys, including the prime number generation procedure and the generation time interval.

2.1 Si ← DSM: { ( )}; for details refer to Table 3-1.

Step 3 DPBSV Rekeying

Keys are updated on both the source sensor and the DSM, which are both aware of Prime (Pi) and KeyGen. Sensors generate the shared key, and each data block is associated with two parts: one encrypted for integrity checking, and another for the authenticity check.

3.1 Si → DSM: { }; these blocks carry the authenticity, integrity, and confidentiality checks.

Step 4 DPBSV Security Verification

The DSM checks authenticity in each data block and checks integrity in data blocks at a random interval, where the random value is calculated from the corresponding prime number.

4.1 For the authenticity check, the DSM obtains the source ID. Once Si is obtained, the DSM checks its source database and extracts the corresponding secret key for the integrity check according to the value of j.
4.2 Given that key, the DSM decrypts the data and checks the MAC for the integrity check.


3.5 Security Analysis

This section provides theoretical analysis of the security scheme to show that it is

safe against attacks on authenticity, confidentiality and integrity.

3.5.1 Security Proof

Assumption 1: Any participant in the scheme cannot decrypt data that was

encrypted by a symmetric-key algorithm, unless it has the session/shared key which

was used to encrypt the data at the source side.

Assumption 2: As DSM is located at the cloud server side, we assume that DSM is

fully trusted and no one can attack it.

Assumption 3: Sensor’s secret key, Prime (Pi) and secret key calculation procedures

reside inside trusted parts of the sensor (like TPM) so that they are not available to

intruders.

Similar to most cryptographic analyses of public-key communication protocols, we now define the attack models used to verify authenticity, confidentiality and integrity.

Definition 1 (attack on authentication): A malicious attacker Ma is an adversary who

is capable of monitoring, intercepting, and introducing itself as an authenticated

source node to send data in the data stream.

Definition 2 (attack on confidentiality): A malicious attacker Mc is an unauthorised party who has the ability to access or view the data stream, without authorisation, before it reaches the DSM.

Definition 3 (attack on integrity): A malicious attacker Mi attacks integrity, and is an

adversary capable of monitoring the data stream regularly and trying to access and

modify the data blocks before they reach DSM.

Theorem 1: Security is not compromised by reducing the size of the shared key (KSH).


Proof: We reduce the size of the prime number to make the key generation process faster and more efficient. The ECRYPT II recommendations on key length state that a 128-bit symmetric key provides the same strength of protection as a 3,248-bit asymmetric key. A shorter key can also be used safely in a symmetric-key algorithm because the key is never shared publicly. An advanced processor (Intel i7) takes about 1.7 nanoseconds to try one key on one block; at this speed it would take about 1.3 × 10^12 times the age of the universe to check every key in a 128-bit key space [35]. By reducing the size of the prime number, we fixed the key length at 64 bits to make the security verification at the DSM faster, using the data from Table 3-2. From Table 3-2, exhausting a 64-bit symmetric key space takes 3.136e+19 nanoseconds (more than a month), so we fixed the interval for generating a new prime number at one week (i.e. t = 168 hours). The dynamic shared key is calculated from the generated prime number. Based on this calculation, we conclude that an attacker cannot compute the shared key within the interval t, and we change the shared key without exchanging any information between the sensors and the DSM. A brute-force attack might recover the shared key once an intruder knows the key length, but the same possibility exists for a 128-bit cryptographic solution. Frequent rekeying confuses malicious nodes that listen to the data flow continuously: since recovering a 64-bit key takes more than a month while the key changes every week, the key has already been changed about four times before an attacker could learn it, and the attacker is unaware of these changes. This leads to the conclusion that even though we reduced the key size to 64 bits, we obtain the same security strength by changing the key at every interval t.

Table 3-2: Symmetric-key (AES) algorithm: time taken to try all possible keys using an advanced Intel i7 processor.

Key length (bits)  | 8      | 16    | 32        | 64        | 128
Key domain size    | 256    | 65536 | 4.295e+09 | 1.845e+19 | 3.4028e+38
Time (nanoseconds) | 1435.2 | 1e+05 | 7.301e+09 | 3.136e+19 | 5.7848e+35
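The rekeying argument above can be checked with back-of-envelope arithmetic, using the quoted figure of roughly 1.7 ns per key trial (an approximation taken from the text, not a new measurement):

```python
# Assumed per-trial cost quoted in the text for an Intel i7 (~1.7 ns per key)
NS_PER_TRIAL = 1.7

def exhaustive_search_ns(key_bits: int) -> float:
    """Worst-case time (ns) to try every key of the given length."""
    return (2 ** key_bits) * NS_PER_TRIAL

# The shared key is replaced every t = 168 hours (one week)
REKEY_INTERVAL_NS = 168 * 3600 * 1e9

# A 64-bit key space holds 2**64 ≈ 1.845e19 keys, giving ≈ 3.14e19 ns of
# search time: the key changes long before an exhaustive search can finish.
```

At this rate the 64-bit search takes several orders of magnitude longer than one rekey interval, which is the quantitative core of Theorem 1.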

Theorem 2: The relative prime number Pi calculated in Algorithm 3-2 is synchronised between the source sensors (Si) and the DSM.


Proof: A standard way to test for primality uses the form 6k + 1, k ∈ N+ (an integer). We initialise the value of k based on this primality-test formula. Our prime generation method builds on this concept and extends the idea of [117]. In this security scheme, the input Pi is the currently used prime number (initialised by the DSM) and the returned Pi is the newly calculated prime number. Initially, Pi is set by the DSM during the DPBSV handshaking process, and the interval time is t seconds.

Algorithm 3-2. Dynamic Prime Number Generation Prime ( )

1. …
2. Set …
3. Set …
4. If … then
5. …
6. GO TO: 3
7. If S(…) then
8. GO TO: 14
9. Set …
10. If S(…) then
11. GO TO: 14
12. …
13. GO TO: 3
14. …
15. Return ( )   // calculated new prime number

From Algorithm 3-2, we calculate the new prime number based on the previous one. The complete prime-number calculation is driven by the value of m, and m is initialised from the value of k. The value of k is constant at the source because it is calculated from the current prime number, and this process is initialised during DPBSV handshaking. Since the value of k is the same on both sides, the procedure Prime (Pi) returns identical values. In Algorithm 3-2, the value of S(m) is computed as follows.


(3-5)

If the condition in Equation 3-5 holds, then x is prime; otherwise x is not prime.

The following procedure validates the above property: if x is prime, then substituting x as the prime number, with k within the specified range, we obtain

(3-6)

The same quantity is also 1, as shown in Equation 3-6, and then

(3-7)

Hence, the property is proved.
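The synchronisation argument can be illustrated with a deterministic next-prime procedure. This is a sketch of the idea only, not a reproduction of Algorithm 3-2 (whose exact steps use the S(m) test above): because the procedure is a pure function of the current prime, a sensor and the DSM that start from the same Pi always derive the same next prime without exchanging any messages.

```python
def _is_prime(x: int) -> bool:
    """Trial-division primality test (a stand-in for the S(m) test)."""
    if x < 2:
        return False
    if x % 2 == 0:
        return x == 2
    f = 3
    while f * f <= x:
        if x % f == 0:
            return False
        f += 2
    return True

def next_prime(p_i: int) -> int:
    """Deterministically derive the next prime after p_i (for primes >= 3),
    scanning candidates of the form 6k - 1 and 6k + 1, the form taken by
    every prime greater than 3."""
    k = p_i // 6 + 1   # seed k from the current prime, as during handshaking
    while True:
        for cand in (6 * k - 1, 6 * k + 1):
            if cand > p_i and _is_prime(cand):
                return cand
        k += 1
```

Running `next_prime` independently at both ends with the same input yields identical values, which is exactly the property Theorem 2 establishes.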

Theorem 3: An attacker Ma cannot read the secret information from sensor node (Si)

or introduce itself as an authenticated node in DPBSV.

Proof: Following Definition 1, an attacker Ma can gain access to the shared key by monitoring the network thoroughly, but Ma cannot obtain secret information such as the Prime (Pi) and KeyGen procedures. Given the computational hardness of secure modules (such as the TPM), Ma cannot obtain the secret information for Pi generation, Ki or KeyGen. There is therefore no way for a malicious node to trap a sensor and act accordingly, but Ma can introduce itself as an authenticated node to send its own information. In this security scheme, the sensor (Si) sends data blocks in which the second part is used for the authentication check. The DSM decrypts this part of the data block, retrieves Si, and matches it against its database. If the calculated Si matches an entry in the DSM database, the node is accepted; otherwise it is rejected as a source, since it is not an authenticated sensor node. All the secret information required for the prime number and key generation procedures is stored in the trusted part of the sensor node (i.e. the TPM). Given the properties of the TPM, an attacker cannot extract this information, as discussed before. Hence we conclude that attacker Ma cannot attack big data streams.

The proposed scheme drops data blocks that originate from malicious sources with minimal computation time, since only the authentication part is processed. The proposed mechanism is therefore also able to mitigate DDoS attacks.

Theorem 4: An attacker Mc cannot access or view the unauthorised data stream in

the proposed DPBSV.

Proof: Following Algorithm 3-2, the prime number is generated dynamically at the sensors and the DSM without any further communication, and the shared secret key is calculated from the generated prime number. Given the computational hardness of secure modules (such as the TPM), Mc cannot obtain the secret information for Pi generation, Ki or KeyGen within the time frame. Following Definition 2, an attacker Mc can gain access to the shared key but to no other information. In this security scheme, the source sensor (Si) sends data blocks in a format whose first part contains the original data. Recovering the original data from this part is impossible, because Mc lacks the other information and the shared key is updated dynamically at equal intervals of time (t). As the data is protected and cannot be read within the time frame (i.e. before the shared key is updated), Mc cannot access or view the data stream.

Theorem 5: An attacker Mi cannot read the shared key within the time interval t in the DPBSV scheme.

Proof: Following Definition 3, an attacker Mi has full access to the network to read the shared key, but Mi cannot obtain the correct secret information, such as KSH. Considering the method described in Theorem 1, Mi cannot obtain the currently used KSH within the time interval t, because the proposed scheme calculates Pi randomly after time t and then uses Pi to generate KSH. For more details on the computation analysis, refer to Theorem 1.

3.5.2 Forward Secrecy

As with other symmetric-key procedures, the shared keys used for encrypting communications are only used for a certain period of time (t), until the new prime number is generated. Thus, previously used shared keys or secret keying material are worthless to a malicious opponent, even when a previously used secret key becomes known to the attackers. This is one of the major advantages of changing the shared key frequently, and one of the reasons we did not choose a static symmetric-key or an asymmetric-key encryption algorithm.
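The forward-secrecy property can be sketched as follows. Here `keygen` is a hypothetical stand-in for the TPM-resident KeyGen procedure (whose internals are not published in this thesis); it only illustrates that each interval's key is derived afresh from the synchronised prime, so an old key reveals nothing useful about later ones.

```python
import hashlib

def keygen(prime: int, interval: int) -> bytes:
    """Hypothetical KeyGen stand-in: derive the interval's 64-bit shared key
    from the synchronised prime and the interval counter. A leaked old key
    does not expose the prime, so it does not help in computing later keys."""
    material = f"{prime}:{interval}".encode()
    return hashlib.sha256(material).digest()[:8]   # truncate to 64 bits
```

Both ends compute the same key from the same synchronised inputs, and each new interval yields an unrelated key.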

3.6 Experiment and Evaluation

The proposed DPBSV security scheme is generic, although in this chapter it is deployed on big sensing data streams. In order to evaluate the efficiency and effectiveness of the proposed architecture and protocol, even under adverse conditions, we observe each individual data block for authentication checks and selected data blocks for integrity attacks. The integrity-attack verification interval is dynamic in nature, and data verification is done at the DSM only.

Figure 3-4: The sensors used for the experiment: (a) Z1 low-power sensor; (b) TmoteSky ultra-low-power sensor.

To validate the proposed security scheme, we experimented in multiple simulation environments to confirm that the security mechanism works correctly on big sensing data streams. We first measured the performance of sensor nodes using

COOJA in Contiki OS [118], then verified the security scheme using Scyther [119],

and finally measured the efficiency of the scheme using JCE (Java Cryptographic

Environment) [120]. We also checked the minimum buffer size required to process

this proposed scheme and compared with the standard AES algorithm using Matlab

[121].

3.6.1 Sensor Node Performance

We evaluated sensor performance in the COOJA simulator under Contiki OS. We took two of the most common types of sensor, the Z1 and TmoteSky sensors shown in Figure 3-4, for our experiment and performance checking. In this experiment, we measured the performance of the sensors while computing and updating the shared key.

Z1 sensor nodes are produced by Zolertia; the Z1 is a low-power WSN module designed as a general-purpose development platform for WSN researchers. It is designed for maximum backwards compatibility with the successful Tmote-like family of motes, while improving performance and offering maximum flexibility and expandability with regard to any combination of power supplies, sensors and connectors. It supports the open-source operating systems currently most employed by the WSN community, such as Contiki [118]. COOJA is a network simulator for Contiki that provides realistic sensor-node behaviour for simulation.

A Z1 sensor node is equipped with the low power microcontroller MSP430F2617,

which features a powerful 16-bit RISC CPU @16MHz clock speed, built-in clock

factory calibration, 8KB RAM and a 92KB Flash memory. Z1 hardware selection

guarantees maximum efficiency and robustness with low energy cost. As TmoteSky

is an ultra-low power sensor, it is equipped with the low power microcontroller

MSP430F1611, which has built-in clock factory calibration, 10KB RAM and a

48KB Flash memory.

We successfully demonstrated in the COOJA simulator that our key generation process works in both types of sensor, i.e. the Z1 and the TmoteSky; both support the security mechanism. The energy consumption during the key generation process is shown in Figure 3-5, which shows the normal power-consumption behaviour of the key generation process. From this experiment we conclude that the proposed DPBSV security verification mechanism is supported by the most common types of sensor and is feasible for big sensing data streams.

3.6.2 Security Verification

The scheme is written in the Scyther simulation environment using the Security Protocol Description Language (.spdl). Following Scyther's conventions, we define the roles D and S, where S is the sender (i.e. the sensor nodes) and D is the recipient (i.e. the DSM). In our scenario, D and S have all the required information that

is exchanged during the handshake process. This enables D and S to update their

own shared key. S sends the data packets to D and D performs the security

verification. In the simulation, we introduce two types of attacks. The first type of

attack is defined for the transmission between S and D (integrity) and the second

attack is defined where an adversary acquires the property of S and sends the attack

data packets to D (authentication). In this experiment, we evaluated all packets at D

(DSM) for security verification. We experimented with 100 runs for each claim (also known as bounds) and recorded the number of attacks at D, as shown in Figure 3-6. Apart from this, we followed the default settings of Scyther.

Figure 3-5: Estimated power consumption during the key generation process.

Attack model: Many types of cryptographic attacks can be considered. In this case, we focus on integrity attacks, confidentiality attacks and authentication attacks as discussed above. In integrity attacks, an attacker can only observe encrypted data

blocks/packets travelling on the network that contain information about sensed data

as shown in Figure 3-2. The attacker can perform a brute force attack on captured

packets by systematically testing every possible key, and we assumed that he/she is

able to determine when the attack is successful. In confidentiality attacks, the attacker

continuously observes the data flow and tries to read the data. In authentication

attacks, an attacker can observe a source node, and try to get the behaviour of the

source node. We assume that he/she is able to determine the source node’s

behaviour. In such cases, the attacker can introduce an authenticated node and act as

the original source node. In our design, we use trusted modules in the sensors (such as the TPM) to store the secret information and the procedures for key generation and encryption.

Experiment model: In practice, attacks may be more sophisticated and efficient than brute-force attacks. However, this does not affect the validity of the proposed DPBSV scheme, as we are interested in efficient security verification without periodic key exchanges and without successful attacks. Here, we model the process as described in the previous section and fix the key size at 64 bits (see Table 3-2). We used Scyther, an automatic security protocol verification tool, to verify the proposed mechanism.

Figure 3-6: Scyther simulation environment with parameters and result page

of successful security verification at DSM.


Results: We ran the simulation with a variable number of data blocks in each run, ranging from 100 to 1000 instances in steps of 100. We check authentication for each data block, whereas the integrity check is performed on selected data blocks. As the secure information is stored within the trusted module of the sensor, no one except the corresponding sensor can access it. Without this information, attackers cannot authenticate encrypted data blocks; hence, we did not find any attacks in the authentication checks. For integrity attacks, it is hard to obtain the shared key, as we frequently change it based on the dynamic prime number at both the source sensor (Si) and the DSM; in the experiment, we did not encounter any attack in the integrity check either. As the shared key changes at every time interval t, an attacker cannot read a data stream within that interval, which leads to the conclusion that the proposed mechanism provides weak confidentiality.

Figure 3-6 shows the result of the security verification experiments in the Scyther environment. It shows that the scheme is secure against integrity and authentication attacks even after the key size is reduced. As the rekeying process runs at equal intervals of time, we found that this security scheme remains secure with a 64-bit key length. From the observations above, we conclude that the proposed scheme is secure.

Figure 3-7: Performance of the security scheme compared in efficiency with 128-bit AES and 256-bit AES.

3.6.3 Performance Comparison

Experiment model: The actual efficiency improvement brought by this scheme depends strongly on the key size and on rekeying without further communication between the sensors and the DSM. We performed experiments with different sizes of data blocks; the results are given below.

We compare the performance of the proposed DPBSV scheme with the Advanced Encryption Standard (AES), the standard symmetric-key encryption algorithm [38, 39]. The scheme's efficiency is compared with two standard symmetric-key configurations, 128-bit AES and 256-bit AES. This performance comparison was carried out in the JCE (Java Cryptographic Environment), where we compared the processing time for different data block sizes. The comparison is based on the features of JCE in the 64-bit Java virtual machine, version 1.6. JCE is the standard extension to the Java platform that provides a framework implementation for cryptographic methods. We experimented with many-to-one communication: all sensor nodes communicate with a single node (the DSM). All sensors have similar properties, whereas the destination node has the properties of the DSM (it is more powerful, to initialise the process). The rekeying process is executed at all nodes without any intercommunication, and the processing time of data verification is measured at the DSM node. The experimental results are shown in Figure 3-7; they validate the theoretical analysis presented in Section 3.5.

Results: The scheme performs better than the standard AES algorithm for all the data block sizes considered. Figure 3-7 shows the processing time of the proposed DPBSV scheme in comparison with the baseline 128-bit AES and 256-bit AES for different data block sizes; the comparison shows that the proposed scheme is more efficient and faster than the baseline AES protocols.

We also calculated the time taken for DPBSV encryption and decryption on an AMD K7 700 MHz processor and compared it with the standard 128-bit AES algorithm [122]. Based on this calculation, DPBSV takes 3.2 microseconds for encryption against 35.8 microseconds for 128-bit AES, and 3.3 microseconds for decryption against 36 microseconds for 128-bit AES.

3.6.4 Required Buffer Size

Experiment model: We examined the buffer requirements of the DSM using MATLAB as the simulation tool [121]. This evaluation builds on the processing-time results shown in Figure 3-7; as before, we compared the security scheme with standard 128-bit AES and 256-bit AES. We measured the minimum buffer size required to process security verification at the DSM at data rates from 50 to 200 MB/s, in steps of 50 MB/s, and used this comparison to measure the efficiency of the proposed DPBSV scheme.
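The buffer requirement follows from simple queueing arithmetic: data arriving at rate R must be held while a batch spends time T in security verification, so the DSM needs at least R × T bytes of buffer. A minimal sketch follows; the 10 ms verification latency is a placeholder for illustration, not a value measured in this thesis.

```python
def min_buffer_bytes(rate_mb_per_s: float, verify_time_s: float) -> float:
    """Minimum buffer: the data that arrives while one batch is being verified."""
    return rate_mb_per_s * 1e6 * verify_time_s

# At the data rates used in the experiment, a hypothetical 10 ms verification
# latency implies buffers ranging from 0.5 MB up to 2 MB.
buffers = {rate: min_buffer_bytes(rate, 0.010) for rate in (50, 100, 150, 200)}
```

A faster verification scheme shrinks T, which is why a lighter scheme needs less buffer than 128-bit or 256-bit AES at the same data rate.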

Results: The scheme performs better than the standard AES algorithm at all data rates considered. Figure 3-8 shows the minimum buffer size required to process security verification at the DSM for the proposed DPBSV scheme compared with the baseline 128-bit AES and 256-bit AES. The performance comparison shows that the proposed scheme is efficient and requires less buffer to process security verification than the baseline AES protocols.

Figure 3-8: Performance comparison of the minimum buffer size required to process the security verification at various data rates to the DSM.

From the above experiments, we conclude that the proposed DPBSV scheme is secure (against attacks on authenticity, confidentiality and integrity) and efficient (compared with standard symmetric algorithms such as 128-bit AES and 256-bit AES). The proposed scheme also needs less buffer to process the security verification.

3.7 Summary

This chapter presented a novel authenticated key exchange scheme, Dynamic Prime-Number Based Security Verification (DPBSV), which aims to provide an efficient and fast (on-the-fly) security verification scheme for big data streams. The scheme is designed around symmetric-key cryptography and random prime number generation. Through theoretical analyses and experimental evaluations, we showed that the DPBSV scheme significantly improves processing time, requires less buffer for processing, and prevents malicious attacks on authenticity, confidentiality and integrity. In this security method, we decrease the communication and computation overhead through dynamic key initialisation at both the sensor and the DSM end, which in effect eliminates the need for rekeying communication. The DSM applies the scheme before stream data processing, as shown in the main architecture diagram. Several applications (e.g. emergency management and event detection) need to discard unwanted data and obtain the original data for stream data analysis. The proposed security verification scheme (i.e. DPBSV) performs in near real time, so as to keep pace with the stream processing engine; the main aim is not to degrade the performance of stream processing engines such as Hadoop, S4 and Spark.


Chapter 4

Lightweight Security Protocol for Big

Sensing Data Streams

Chapter 3 addressed the first important step of security verification in big sensing data streams. The next important step is to make the security verification model more lightweight, to satisfy the properties of big data streams; we refer to this as an online security verification problem. To address this problem, we propose a Dynamic Key Length Based Security Framework (DLSeF) based on a shared key derived from synchronised prime numbers; the key is dynamically updated at short intervals to thwart potential attacks and ensure end-to-end security. Theoretical analyses and experimental results for the DLSeF framework show that it can significantly improve the efficiency of processing stream data by reducing the security verification time and buffer usage without compromising security.

4.1 Introduction

A variety of applications, such as emergency management, SCADA (Supervisory

Control and Data Acquisition), remote health monitoring, telecommunication fraud

detection and large scale sensing networks, require real-time processing of data


streams, where the traditional store-and-process method falls short of the challenge

[24]. These applications have been characterized as producing high speed, real-time,

sensitive and large volume data input, and therefore require a new paradigm of data

processing. The data in these applications falls in the big data category, as its size is

beyond the ability of typical database software tools and applications to capture,

store, manage and analyse in real time [123]. More formally, the characteristics of big

data are defined by “4Vs” [124 - 125]: Volume, Velocity, Variety, and Veracity; the

streaming data from a sensing source meets these characteristics. This chapter

focuses on providing end-to-end security for real-time high volume, high velocity

data streams.

A big data stream is continuous in nature, and real-time analysis is critical because: (i) the lifetime of the data is often very short (i.e. the data can be accessed only once) [28, 29]; and (ii) the data is used for detecting events (e.g. flooding of highways, collapse of a railway bridge) in real time in many risk-critical applications (e.g. emergency management). Since a big data stream in risk-critical applications has high volume and velocity and the processing has to be done in real time, it is neither economically viable nor practically feasible to store the data and then process it (as in the traditional batch computing model).

traditional batch computing model). Hence, stream processing engines (e.g. Spark,

Storm, S4) have emerged in the recent past that have the capability to undertake real-

time big data processing. Stream processing engines offer two significant advantages.

Firstly, they circumvent the need to store large volumes of data and secondly, they

enable real-time computation over data as needed by emerging applications such as

emergency management and industrial control systems. Moreover, the integration of stream processing engines with elastic cloud computing resources has revolutionised big data stream computation, as the engines can now be easily scaled [28, 31, 33] in response to changing volume and velocity.

Although stream data processing has been studied in recent years within the

database research community, the focus has been on query processing [126],

distribution [127] and data integration. Data security related issues, however, have

been largely ignored. Many emerging risk-critical applications, as discussed above,

need to process big streaming data while ensuring end-to-end security. For example,

consider emergency management applications that collect soil, weather, and water

data through field sensing devices. Data from these sensing devices are processed in


real-time to detect emergency events such as sudden flooding, and landslides on

railways and highways. In these applications, compromised data can lead to wrong

decisions and in some cases even loss of lives and critical public infrastructure.

Hence, the problem is how to ensure end-to-end security (i.e. confidentiality,

integrity, and authenticity) of such data streams in near real-time processing. We

refer to this as an online security verification problem.

The problem in processing big data becomes extremely challenging when millions

of small sensors in self-organising wireless networks are streaming data through

intermediaries to the data stream manager. In these cases, intermediaries as well as

the sensors are prone to different kinds of security attack, such as man-in-the-middle attacks. In addition, these sensors have limited processing power, storage, and

energy; hence, there is a requirement to develop lightweight security verification

schemes. Furthermore, data streams need to be processed on-the-fly in the correct

sequence. This chapter addresses these issues by designing an efficient model for

online security verification of big data streams.

The most common approach for ensuring data security is to apply cryptographic

methods. In the literature, the two most common types of cryptographic encryption

methods are asymmetric and symmetric key encryption. Asymmetric key encryption

(e.g. RSA, ElGamal, DSS, YAK, Rabin) performs a number of exponential

operations over a large finite field and is therefore 1000 times slower than symmetric

key cryptography [34 - 35]. Hence, efficiency becomes an issue if an asymmetric key

such as Public Key Infrastructure (PKI) [37] is applied to securing big data streams.

Thus, symmetric key encryption is the most efficient cryptographic solution for such

applications. However, existing symmetric key methods (e.g. DES, AES, IDEA,

RC4) fail to meet the requirements of real-time security verification of big data

streams because the volume and velocity of a big data stream is very high (refer to

the performance evaluation section for the performance values). Hence, there is a

need to develop an efficient and scalable model for performing security verification

of big data streams. The main contributions of this chapter can be summarised as

follows:

We have designed and developed a Dynamic Key Length Based Security Framework (DLSeF) to provide end-to-end security for big data stream

processing. This model is based on a common shared key that is generated by


exploiting synchronised prime numbers. The proposed method avoids

excessive communication between data sources and Data Stream Manager

(DSM) for the rekey process. Hence, this leads to reduction in the overall

communication overhead. Due to this reduced communication overhead, this

model is able to do security verification on-the-fly (with minimum delay)

with minimal computational overhead.

The proposed model adopts a moving-target approach, using a dynamic key length chosen from the set {128, 64, 32} bits. This enables faster security verification at the DSM without compromising security; hence, the model is suitable for processing high volumes of data without any delay.
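A minimal sketch of the moving-target idea follows. The selection rule shown here, hashing the synchronised state to pick one of the three lengths, is an assumption made for illustration; Chapter 4 details the actual policy.

```python
import hashlib

KEY_LENGTHS = (128, 64, 32)   # candidate key lengths in bits, as listed above

def key_length_for_interval(prime: int, interval: int) -> int:
    """Pick the interval's key length deterministically from synchronised
    state, so the sensor and the DSM agree without extra communication."""
    digest = hashlib.sha256(f"{prime}:{interval}".encode()).digest()
    return KEY_LENGTHS[digest[0] % len(KEY_LENGTHS)]
```

Because the choice varies unpredictably from the attacker's point of view but deterministically for the two synchronised parties, an adversary cannot even assume a fixed key length when mounting a brute-force attack.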

We compare the proposed model with the standard symmetric key solution

(AES) in order to evaluate the relative computational efficiency. The results

show that this security model performs better than the standard AES method.

The remainder of this chapter is organised as follows. The next section reviews the preliminaries to the chapter; Section 4.3 presents the research challenges and motivations; Section 4.4 describes the DLSeF key exchange scheme; Section 4.5 presents a formal security analysis of the model; Section 4.6 evaluates the model's performance and efficiency through experimental results; and Section 4.7 summarises the contributions of this chapter.

4.2 Preliminaries to the Chapter

Figure 4-1 shows the overall architecture for big data stream processing from

source sensing devices to the data processing centre including the proposed security

framework. Refer to [128] for further information on stream data processing in

datacentre clouds. In sensor networks, data packets from the sources are transmitted

to the sink (data collector) through multiple intermediary hops (e.g. routers and

gateways). Collected data at sink nodes are then forwarded to the DSM as data streams, and may also pass through many untrusted intermediaries. The number of hops

and intermediaries depends on the network architecture designed for a particular

application. The intermediaries in the network may behave as a malicious attacker by


modifying and/or dropping data packets. Hence, traditional communication security

techniques [52, 91, 129] are not sufficient to provide end-to-end security. In this

framework, both queries and data security related techniques are handled by DSM in

coordination with the on-field deployed sensing devices. It is important to note that

the security verification of streaming data has to be performed before the query

processing phase and in near real-time (with minimal delay) with a fixed (small)

buffer size. The processed data are stored in the big data storage system supported by

cloud infrastructure [30]. Queries used in DSM are defined as “continuous” since

they are continuously applied to the streaming data. Results (e.g. significant events)

are pushed to the application/user each time the streaming data satisfies a predefined

query predicate.
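The continuous-query behaviour described above can be sketched as a predicate applied to each arriving item, with matches pushed to the application. This is a minimal illustration only; the temperature threshold and field names are hypothetical, not taken from the thesis.

```python
def continuous_query(stream, predicate, push):
    """Apply a predicate to every item of an unbounded stream and
    push matches (significant events) to the application."""
    for item in stream:      # items are read once, in arrival order
        if predicate(item):
            push(item)

# Hypothetical example: flag high-temperature readings.
events = []
readings = [{"sensor": "s1", "temp": 21}, {"sensor": "s2", "temp": 48}]
continuous_query(iter(readings), lambda r: r["temp"] > 40, events.append)
# events now holds only the reading from s2
```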

The discussion of the architecture above clearly identifies the following most

important requirements for security verification for big data stream processing. In

summary, they include: (a) the security verification needs to be performed in real

time (on-the-fly), (b) the framework has to deal with a high volume of data at high

velocity, (c) the data items should be read once in the prescribed sequence, and (d)

the original data is not available for comparison, unlike in a store-and-process batch processing paradigm. These requirements need to be met by a big data stream processing framework in addition to end-to-end data security, as stated in the last chapter.

Figure 4-1: High level of architecture from source sensing device to big data processing centre.


Based on the above requirements of big data stream processing, we categorise existing data security methods into two classes: communication security [131 - 132] and server-side data security [13 - 14]. As neither class fully addresses these requirements, we propose a distributed and scalable model for big data stream security verification.

The Data Encryption Standard (DES) has been a standard symmetric key

algorithm since 1977. However, it can be cracked quickly and inexpensively. In

2000, the Advanced Encryption Standard (AES) [38] replaced the DES to meet the

ever increasing requirements of data security. The Rijndael algorithm, i.e. Advanced

Encryption Standard (AES), is a symmetric block cipher that encrypts data blocks of

128 bits using different sizes of symmetric keys such as 128, 192 or 256 bits [38 –

39, 132]. AES was introduced to replace the Triple DES (3DES) algorithm, which had been in widespread use for a significant time. Hence, we compare the proposed solution against AES.

4.3 Research Challenges and Research Motivation

This section presents the research challenges and motivations in detail. Here we

have highlighted the challenges for the proposed approach followed by motivations

to the research problem.

4.3.1 Research Challenges

As discussed earlier, symmetric key cryptography is one of the best ways to protect big data streams in a lightweight manner. Current symmetric key cryptographic security solutions use a static shared key that is controlled in a centralised manner, and the key length contributes directly to the security processing time, i.e. both the encryption and decryption time. As the volume and velocity of big data streams are very high, the security verification should run in near real-time and keep pace with the stream processing engine.

Another major concern is the communication overhead of shared key initialisation between the DSM and the source sensors. A big data stream is continuous in nature and huge in size, so initialising and distributing the shared key in real time through a centralised process is difficult. The efficiency of the security verification process at the DSM can instead be increased by initialising key generation at the source end.

The common problem in the data flow between sensors and the DSM is that attackers may read the data while it is in transit. Existing solutions to this problem are based on symmetric key algorithms. The periodic key update message in such algorithms may disclose secret information, giving an intruder insight into the encryption process. Even when a nonce is used in the periodic packet, an intruder still learns when the server is going to change the key, which increases the chances of future attacks. In the proposed model, key exchanges happen only once as described before, but the shared key is updated periodically at regular time intervals. Synchronisation between a source and the DSM is important in dynamic symmetric key updates; otherwise security verification will fail.

Buffer size for the security verification is another major issue because of the volume and velocity of big data streams. Given the features of big data streams (i.e. the 4Vs), we cannot hold the data for long before security verification. Holding data longer requires a larger buffer, which may reduce the performance of Stream Processing Engines (SPEs). Reducing the buffer size is therefore one of the major challenges for big data streams, and any proposed security solution should operate with a small buffer.

The proposed model is as follows: we use a common shared key for both the sensors and the DSM. The shared key is updated dynamically by using dynamic prime numbers, without any further communication between them. The shared key is changed not only over time but also in length. Our synchronisation method also adopts a unique neighbour authentication technique to recover lost synchronisation properties from neighbours. Communication is required only at the beginning, for the initial key establishment; there is no further communication between the source sensor and the DSM after handshaking, which increases the efficiency of the proposed solution. Based on the shared key properties, individual source sensors update their dynamic keys independently. This method performs security verification faster, with minimum delay, and also reduces the use of buffers.
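The idea of both ends updating a shared key without further communication can be illustrated with a toy sketch. Here the next-prime step and the key derivation (SHA-256 truncated to the current key length) are stand-ins for the thesis's Algorithm 3-2 and KeyGen, chosen only to show that two parties evolving the same state independently produce identical keys; the starting prime and secret are hypothetical.

```python
import hashlib

def next_prime(p):
    """Toy stand-in for the synchronised prime generator (Algorithm 3-2):
    returns the smallest prime greater than p."""
    candidate = p + 1
    while any(candidate % d == 0 for d in range(2, int(candidate ** 0.5) + 1)):
        candidate += 1
    return candidate

def derive_key(prime, dsm_secret, key_len_bits):
    """Toy KeyGen stand-in: hash the prime with the DSM secret and
    truncate to the current key length."""
    digest = hashlib.sha256(f"{prime}|{dsm_secret}".encode()).digest()
    return digest[: key_len_bits // 8]

# Sensor and DSM start from the same handshaken state and never talk again.
sensor_p = dsm_p = 104729
secret = "dsm-secret"          # hypothetical shared DSM secret
for _ in range(3):             # three rekey intervals elapse on both sides
    sensor_p, dsm_p = next_prime(sensor_p), next_prime(dsm_p)
assert derive_key(sensor_p, secret, 64) == derive_key(dsm_p, secret, 64)
```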


4.3.2 Research Motivation

The four most important features of big data streams from the point of view of security verification were stated in the last chapter. In light of those features and properties, we classified existing security systems into two classes, communication security and server-side data security, and proposed the DPBSV model to deal with big data streams. However, we still need a faster, more efficient and lightweight security solution for big data streams.

Communication security protects data while it is in transit. There are two types of communication attacks: external and internal. To avoid such attacks, security solutions have to exist for every individual TCP/IP layer. Several security solutions exist to counter these communication threats, but they are not suitable given the properties of big data streams stated above.

4.4 DLSeF Lightweight Security Protocol

This security model is motivated by the concept of moving target defence. The basic idea is that the keys are the targets of attacks by adversaries. If we keep moving the keys in the spatial (dynamic key size) and temporal (same key size, but different key) dimensions, we can achieve the required efficiency without compromising security. The proposed model, the Dynamic Key Length Based Security Framework (DLSeF), provides a robust security solution by dynamically changing both the key and the key length. Even if an intruder eventually cracks a key, the key lifetime is selected in such a way that he/she cannot predict the key or its length for the next session. We argue that it is very difficult for an intruder to guess the appropriate key and its length, as the model dynamically changes both across sessions. Though the proposed model has weak confidentiality (eventually an intruder may be able to detect the keys if he/she has sufficient processing and storage capabilities), it provides sufficient confidentiality for the duration of online real-time processing. Hence, such a weak confidentiality model is sufficient for a disaster management application scenario. It is important to


note that no compromise is made on the authenticity and integrity of the data, which

are important for making decisions from the data.

Similar to any secret key-based symmetric cryptography, the DLSeF model consists of four independent components and related processes: system setup, handshaking, rekeying, and security verification. Stream processing is expected to be performed in near real-time, and end-to-end delay is an important QoS parameter for measuring the performance of sensor networks [133]. Since data are collected from sensor nodes to respond to emergency situations, they need to reach the DSM in real time; we therefore assume there is not much delay in data arrival at the DSM. Table 4-1 provides the notations used in the model. We next describe the model.

4.4.1 DLSeF System setup

We have made several sensible and practical assumptions while characterising this model. First, we assume that the DSM holds all deployed sensors' identities (IDs) and secret keys at the time of deployment, because the network is fully untrusted. We use a number of key exchanges between the sensors and the DSM during the initial session key establishment process to achieve better security; our aim is to make this session especially secure because all the secret information of KeyGen is transmitted to the individual source sensors. Second, we assume that each sensor node Si knows the identity of its DSM and that both maintain the same secret key, i.e. K1, for the initial authentication process.

Sensing devices and DSM implement some common primitives such as hash

function (H( )), and common key (K1), which are executed during the initial

identification and system setup steps.

The proposed authentication process includes five different steps. The first three

steps are for the sensing device and DSM authentication process and the final two

steps are for the session key generation process as shown in Figure 4-2. The shared

key is utilised during the handshaking process.


Table 4-1: Notations used in this model

Acronym Description

ith source sensing device’s ID

ith source sensing device’s secret key

ith source sensing device’s session key

Key length

Initial keys for authentication

Secret shared key calculated by the sensing device and DSM

Previous secret shared key maintained at DSM

Communicated format during authentication

Random number generated by the sensing devices

Interval time to generate the prime number

j Integrity checking interval

Timestamp added with data blocks

Random prime number

Secret key of the DSM

Encrypted data for integrity check

Secret key for authenticity check

Encryption function

One-way hash function

Random prime number generation function

KeyGen Key generation procedure

Key-Length ( ) Key length selection procedure

X-OR operation

Concatenation operation

Fresh data at sensing device before encryption

T′ Current time

T′′ Time to start the process

RQA Authentication request message

RPA Authentication response message

Step 1:


A sensing device (Si) generates a pseudorandom number (r) and encrypts it along with its own secret key Ki. The encryption process uses the common shared key (K1), which is initialised during deployment. The output of the encryption, P1 ← EK1(r ‖ Ki), is then sent to the DSM: Si → DSM: P1

Step 2:

Upon receiving the message, the DSM decrypts P1 (i.e. DK1(P1)) and retrieves the corresponding source ID from the secret key Ki. If the source sensor's ID is found in the database, it accepts; otherwise it rejects. The DSM computes the hash of the key to generate another key for encryption, K2 ← H(K1). The DSM then encrypts the pseudorandom number (r) with the newly generated key as P2 ← EK2(r) and sends it to the source sensing device for DSM authentication: DSM → Si: P2

Step 3:

The corresponding sensing device receives the encrypted pseudorandom number and decrypts it to authenticate the DSM, i.e. r′ ← DK2(P2). It calculates the current secret shared key using the hash of the existing shared key, i.e. K2 ← H(K1). If the received random number is the same as the one the sensor generated (i.e. r = r′), the sensing device sends an acknowledgement (ACK) to the DSM. The ACK is encrypted with the new key, which is computed as the hash of the current key (K3 ← H(K2)). The encrypted ACK, P3 ← EK3(ACK), is sent to the DSM: Si → DSM: P3

Step 4:

The DSM decrypts the ACK (i.e. ACK ← DK3(P3)) to confirm that the sensor is now ready to establish the session. The current secret key is updated using the hash of the existing secret key, i.e. K3 ← H(K2). After confirmation of the ACK, the DSM generates a random session key for handshaking, Ksi ← randomKey(). The generated session key (Ksi) is encrypted with the hash of the current key (K4 ← H(K3)) and then sent to the individual sensor: DSM → Si: {P4}, where P4 ← EK4(Ksi).

Step 5:

The sensor decrypts P4 and extracts the session key for handshaking (Ksi ← DK4(P4)). It follows the same procedure as before, i.e. the current shared key is updated with the hash of the existing shared key (K4 ← H(K3)). We update the


shared key in every transaction to ensure the strength of security for handshaking.

The complete authentication process works as shown in Figure 4-2.
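The hash-chained key evolution across the five steps above can be sketched as follows. The thesis does not prescribe a particular cipher for E/D, so a toy XOR cipher stands in here purely to keep the sketch self-contained (Step 1 is abbreviated to the random number only); the point is that both parties advance K1 → K2 → K3 → K4 by hashing in lockstep.

```python
import hashlib, secrets

def h(key: bytes) -> bytes:
    """One-way hash used to advance the shared key (K2 = H(K1), ...)."""
    return hashlib.sha256(key).digest()

def toy_encrypt(key: bytes, msg: bytes) -> bytes:
    """Toy XOR cipher standing in for E/D; NOT a real block cipher."""
    return bytes(m ^ key[i % len(key)] for i, m in enumerate(msg))

toy_decrypt = toy_encrypt  # XOR is its own inverse

k1 = b"preloaded-common-key"              # installed at deployment
r = secrets.token_bytes(8)                # sensor's pseudorandom number

p1 = toy_encrypt(k1, r)                   # Step 1: sensor -> DSM
assert toy_decrypt(k1, p1) == r           # Step 2: DSM authenticates sensor
k2 = h(k1)
p2 = toy_encrypt(k2, r)                   # Step 2: DSM -> sensor
assert toy_decrypt(h(k1), p2) == r        # Step 3: sensor authenticates DSM
k3 = h(k2)
p3 = toy_encrypt(k3, b"ACK")              # Step 3: sensor -> DSM
assert toy_decrypt(h(k2), p3) == b"ACK"   # Step 4: DSM confirms
k4 = h(k3)
session_key = secrets.token_bytes(16)     # Step 4: DSM generates Ksi
p4 = toy_encrypt(k4, session_key)
assert toy_decrypt(h(k3), p4) == session_key  # Step 5: sensor extracts Ksi
```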

4.4.2 DLSeF Handshaking

In the handshaking process, the DSM sends the key generation and

synchronisation properties to sensors based on their individual session key (Ksi)

established earlier. Generally, a larger prime number is used to strengthen the

security process. However, a larger prime number requires greater computation time.

In order to make the rekeying process efficient (lighter and faster), we recommend

reducing the prime number size. The challenge is how to maintain security while

avoiding large prime number sizes. We achieve this by dynamically changing the key

size as described next.

The dynamic prime number generation function is defined in Algorithm 3-2. This

algorithm computes the relative prime number, which always depends on the

previous prime number. This relation between the current and previous prime numbers helps to synchronise the newly generated prime number. We have given mathematical proofs for Algorithm 3-2 showing that the generated number will always be a prime number and will remain synchronised between the source device and the DSM.

Figure 4-2: Secure authentication of Sensor and DSM.

We calculate the prime number and shared key on both the sensing source and DSM ends to reduce

communication overhead and minimise the chances of disclosing the shared key. The computed shared keys have multiple lengths (32-bit, 64-bit, and 128-bit), which are varied across sessions. The initial key length is set to 64 bits and is dynamically updated following the logic depicted in Algorithm 4-1. This algorithm selects the key length and the associated time interval for generating the shared key. The key and key-length selection is based on the time taken to enumerate all possible keys in the key domain, following Table 3-2. In Table 3-2, we compute the key domain size and the time required to enumerate all possible keys for different key lengths (i.e. 8, 16, 32, 64, and 128 bits) using a state-of-the-art Intel i7 processor. Algorithm 4-1 therefore uses the properties from Table 3-2 to initialise the rekeying time interval according to the key length. After the time interval, the next shared key is generated by applying Algorithm 3-2, where the key size is determined by Algorithm 4-1 as follows:

Prime(Pi) periodically computes the relative prime number at both the sensor and DSM ends after a time interval t, which is updated based on the Key-Length( ) function. The shared secret key (KSH) generation process needs the prime number Pi and the DSM's secret key Kd. In the handshaking process, the DSM transmits all properties required to generate the shared key to the sensors as follows: DSM → Si: {Prime(Pi), Key-Length( ), KeyGen}

All of the transferred information outlined above is stored in the trusted part of the source (e.g. a TPM) for future rekeying processes [134].

4.4.3 DLSeF Rekeying

We propose a novel rekeying concept in which prime numbers are calculated dynamically at both the source sensors and the DSM. Figure 4-2 shows the synchronisation of the shared key. In this model, a smaller key makes the security verification faster, but we change the key very frequently in the DLSeF rekeying process to ensure that the protocol remains secure. If any damage happens at the source, the corresponding sensor becomes desynchronised from the DSM. The source sensor then follows Step 3 to reinitialise and synchronise with the DSM. According to our assumptions, all the secret information is stored in a trusted part of the sensor, so the sensor can reinitialise the synchronisation by sending its own identity to the DSM. Once the DSM authenticates the


source sensor, it sends the current key and time of key generation. Authenticated

sensors can update the next key by using the key generation process from a secure

module of the sensor (TPM).

ALGORITHM 4-1. Synchronisation of Dynamic Key Length Generation Key-Length ( )

1:  kl ← 64 (for first iteration)
2:
3:  i ←
4:  If i = 0 then
5:      Set kl ← 128
6:      t ← 720 hours (1 month)
7:      j ← no checking
8:  Else If i = 1 then
9:      Set kl ← 64
10:     t ← 168 hours (1 week)
11:     j ← Pi % 9
12: Else
13:     Set kl ← 32
14:     t ← 20 hours (1 day)
15:     j ← Pi % 5
16: End If
17: End If
Return (kl, t, j) // used to initialise the next iteration.
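The selection logic of Algorithm 4-1 can be sketched in a few lines. This is a minimal sketch in which the selector i is assumed to be supplied by the caller (its derivation is elided in the extracted listing), intervals are expressed in hours, and None marks the "no checking" case.

```python
def key_length(p_i: int, i: int):
    """Select (key length in bits, rekeying interval in hours,
    integrity-check interval j) per Algorithm 4-1. The selector i
    is an assumption: it is taken as given rather than derived."""
    if i == 0:
        kl, t, j = 128, 720, None    # 1 month, no integrity checking
    elif i == 1:
        kl, t, j = 64, 168, p_i % 9  # 1 week
    else:
        kl, t, j = 32, 20, p_i % 5   # 1 day
    return kl, t, j
```

For example, with current prime 11 and selector 1 the model runs a 64-bit key for one week and checks integrity every (11 % 9) = 2 blocks.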

The proposed model not only calculates the dynamic prime number to update the shared key without further communication after handshaking, but also proposes a novel way of dynamically changing the key length at the source and the DSM according to the steps described in Algorithm 4-1. We change the key periodically in the DLSeF rekeying process to ensure that the protocol remains secure. If any kind of key or data compromise occurs at a source, the corresponding sensor is instantly desynchronised from the DSM; the source sensor then needs to reinitialise and synchronise with the DSM as described above. We assume that the secret information is stored in the trusted part of the sensor (e.g. the TPM) and is sent by the sensor to the DSM for synchronisation. According to the properties of the TPM, no one but the sensor has access to the information stored inside it; even if the sensor is destroyed, an adversary cannot extract the information from the trusted module. In some cases, a data packet can arrive at the DSM after

the shared key is updated. Such data packets are encrypted using the previous shared


key. We add a time stamp field to each data packet to identify the shared key used for encryption. If the data was encrypted using the previous key, the DSM uses the previous shared key for the security verification; otherwise, it follows the normal process.
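The handling of late packets can be sketched as a simple key lookup based on the packet's timestamp relative to the last rekey time. The function and field names below are hypothetical illustrations, not the thesis's notation.

```python
def select_key(packet_ts, last_rekey_ts, current_key, previous_key):
    """Pick the decryption key for an arriving data block: packets
    stamped before the last rekey were encrypted under the previous
    shared key; all others use the current one."""
    return previous_key if packet_ts < last_rekey_ts else current_key

# A packet sent at t=99 but verified after the rekey at t=100
# is decrypted with the previous key.
assert select_key(99, 100, "K_old_or_new?", "K_old") == "K_old"
```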

The above defined DLSeF Handshaking process makes sensors aware of the

Prime (Pi), KeyLength, and KeyGen. We now describe the complete secure data

transmission and verification process using those functions and keys. As mentioned

above, our model uses the synchronised dynamic prime number generation Prime

(Pi) on both sides, i.e. sensors and DSM, as shown in Figure 3-3. At the end of the

handshaking process, sensors have their own secret keys, initial prime number and

initial shared key generated by the DSM. The next prime generation process is based

on the current prime number and the time interval as described in Algorithm 3-2. The

prime number generation process (Algorithm 3-2) always calls Algorithm 4-1 to

fetch the shared key length information and associated time interval. Sensors

generate the shared key =( ( , )) using the prime number , and the DSM’s

secret key (P , ). We use the secret key of DSM to improve the robustness of the

security verification process. We fixed the initial key length at 64 bits and 168 hours

as the initial time interval for rekeying. Each data block is associated with the

authentication and integration tag and contains two different parts. One is encrypted

DATA based on shared key for integrity checking (i.e. = ), and

the other is for authenticity checking (i.e. = ). The resulting data block

((DAT ) ( )) is sent to DSM as follows: Si DSM:

{( ( T))}. The time stamp which indicates the encrypted shared keys is always

associated with the authentication part. We prefer to add the time stamp with the

authentication part because the DSM can easily identify the data block if it is

encrypted with the previous shared key. More details about the time stamp are

described in the following subsection and the complete procedure of the key

generation (rekeying) process is shown in Algorithm 4-2. This algorithm takes information from Algorithm 3-2 and Algorithm 4-1 in order to perform the rekeying process: from Algorithm 3-2 it takes the dynamic prime number (Pi) to compute the shared key, and from Algorithm 4-1 it takes the key size and time interval for the rekeying process.


4.4.4 DLSeF Key Synchronisation

Synchronisation is one of the major issues during the rekeying process between sensors and the DSM, as they do not interact after the handshaking process. The shared key synchronisation is based on the initial key generation process, followed by the rekeying; the initial key synchronisation therefore establishes a common time at which to start the key generation process. In this model, the DSM acts as a centralised controller and hence initiates the key generation process. As defined before, during the handshaking process the DSM sends the source (Si) a time stamp T′′ to initialise the key generation process.

There are potentially two cases: (i) the sensor starts the process on time and maintains synchronisation; (ii) the sensor misses the time stamp or receives the key generation properties only after the time stamp has passed. In the second case, the source sensor sends a request for the next time stamp of the key generation process.

There are several reasons for a sensor to be out of sync, such as the inability of the source node to generate the shared key due to computational overhead, a natural disaster, or malicious activity. Even if a sensor misses the synchronisation, it does not lose the key generation properties, thanks to the TPM features [134]. In such cases, the source sensor (Si) obtains the synchronisation properties from its neighbours. Given the source network structure, sensors do not hold neighbour information, so it is a challenging task to identify the neighbours and obtain the key synchronisation properties. The procedure to obtain shared key properties from unknown neighbours is given below.

4.4.4.1 Initial Setup

Let us assume that sensor Si has missed the synchronisation. Si computes a pseudorandom number, i.e. PRN(r), and uses the current prime number (Pi) and the shared key (KSH) to generate the authentication request message (RQA), i.e. RQA ← H(EKSH(r ‖ Pi ‖ Kd)). The resultant RQA, DSM ID (Di) and time stamp (T) are then encrypted with the mutual key K4 from the system setup steps (EK4(RQA ‖ T ‖ Di)) (refer to Figure 4-2). We use this key for encryption because all authenticated nodes received it from the DSM during the system setup phase.


4.4.4.2. Synchronisation Phase

The out-of-sync sensor (Si) broadcasts this message to its one-hop neighbours. When a neighbour sensor receives it, it decrypts it with its mutual key K4 (DK4(RQA ‖ T ‖ Di)). It compares the received time stamp (T) with its current time (T′) to check the data freshness and avoid replay attacks (T′ − T ≤ ΔT). If the time difference is no more than ΔT, it accepts the data packet; otherwise the packet is discarded. Here ΔT is the average time required to transmit data packets between source and DSM.

The neighbour node (denoted Sj) compares the received DSM ID with its own DSM ID to validate the source as authenticated. To strengthen the authentication process, we perform two-layer encryption of the request (RQA). Sensor Sj performs the hash and decrypts the second layer with the shared key (KSH), i.e. H(DKSH(r ‖ Pi ‖ Kd)). It uses the previous shared key if the shared key has been updated in the meantime, and compares the DSM ID by retrieving it via the DSM secret key (Di ← retrieveKey(Kd)).

Figure 4-3: Neighbour node discovery to get the current state of key generation properties.


After the authentication process, Sj prepares an authentication response message (RPA) including its own ID, the DSM ID and the pseudorandom number r (RPA ← EKSH(Sj ‖ Di ‖ r)). It then encrypts the RPA along with the DSM key and time stamp using the same key K4 (EK4(RPA ‖ Kd ‖ T)).

Once Si receives the RPA, it is processed in the same way to authenticate the node Sj (DK4(RPA ‖ Kd ‖ T)). Si first compares the time to avoid replay attacks (T′ − T ≤ ΔT), then compares the DSM ID (Di ← retrieveKey(Kd)) and the value of r to perform authentication. The desynchronised source node (Si) may encounter three different types of neighbours: malicious nodes, desynchronised authenticated nodes and synchronised authenticated nodes, as shown in Figure 4-4. Malicious neighbours cannot decrypt Si's request because it is encrypted with the secret key, but a desynchronised authenticated node can read the request; once it learns that the source (Si) is seeking the key synchronisation properties, it sends a response with its desynchronisation indication, and the source discards the RPA received from such nodes. If the source node receives an RPA from an authenticated synchronised neighbour, Si chooses that node by sending an ACK in order to obtain the key synchronisation properties (EKSH(ACK ‖ Si ‖ T)).

This acknowledgement message (ACK) confirms the mutual authentication between the source and the synchronised neighbour, which obtains it as DKSH(ACK ‖ Si ‖ T). After receiving the acknowledgement message, the authenticated neighbour extracts the source node ID and sends the shared key properties (Pi, KSH, t) to the source node as EKSH(Pi, KSH, t, T). When the desynchronised source receives the shared key synchronisation properties (DKSH(Pi, KSH, t, T)), it can generate the shared key by itself, because it now has the prime number (Pi), the shared key (KSH), and the time of the next key change (t). At every step we check the time interval in order to avoid replay and DoS attacks. The stepwise representation of the neighbour authentication to obtain the shared key properties is shown in Figure 4-3.
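The freshness check repeated throughout this neighbour exchange can be sketched as below; ΔT is the average source-to-DSM transmission time from the text, and the numeric values in the example are hypothetical.

```python
def is_fresh(received_ts: float, current_ts: float, delta_t: float) -> bool:
    """Accept a message only if it is at most delta_t old (replay defence):
    the receiver's current time minus the sender's timestamp must not
    exceed delta_t."""
    return 0 <= current_ts - received_ts <= delta_t

# A request stamped 0.3 s ago passes with delta_t = 0.5 s; a 2 s old
# (possibly replayed) request is discarded.
assert is_fresh(10.0, 10.3, 0.5)
assert not is_fresh(10.0, 12.0, 0.5)
```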


4.4.4.3 New Node Synchronisation

If a new source node joins the network, it starts the authentication process with the DSM to get the key generation properties. After receiving the key generation properties from the DSM, the node (n) either starts the process or authenticates with its neighbour nodes to compare the synchronisation properties.

ALGORITHM 4-2. Key Generation (Rekeying) Process at Sensor (Si) and DSM (D)

1. Session key (Ksi) from Figure 4-3.
2. Dynamic prime number (Pi) computed from Algorithm 3-2.
3. Time interval (T) computed.
   3.1 T = {t1, t2, t3, …}, where t1, t2, t3, … are the time intervals of key generation.
   3.2 Sensor (Si) and DSM (D) update the key after the time interval.
4. As stated before, sensor and DSM have properties such as H( ) and E. The new key generation: KSH = (H(Pi, Kd)).
5. The encryption process at the sensor happens in two steps:
   5.1 (integrity part: DATA encrypted with the shared key)
   5.2 (authentication part)
6. Si → DSM: {(integrity part ‖ (authentication part ‖ T))}
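The two-part block construction of steps 5–6 in Algorithm 4-2 can be illustrated with a sketch in which HMAC-SHA256 stands in for the unspecified encryption/MAC primitives; the exact field layout is the thesis's elided notation, so the dictionary structure below is an assumption made for illustration only.

```python
import hmac, hashlib, time

def make_block(data: bytes, k_sh: bytes, sensor_id: str) -> dict:
    """Build a two-part data block: an integrity tag keyed by the
    shared key K_SH, and an authentication part carrying the sensor
    ID and timestamp (layout is illustrative, not the thesis format)."""
    integrity = hmac.new(k_sh, data, hashlib.sha256).hexdigest()
    auth = {"sid": sensor_id, "ts": time.time()}
    return {"data": data.decode(), "integrity": integrity, "auth": auth}

def verify_block(block: dict, k_sh: bytes) -> bool:
    """DSM-side integrity check: recompute the tag under K_SH and
    compare in constant time."""
    expected = hmac.new(k_sh, block["data"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, block["integrity"])

k_sh = b"shared-key"                  # hypothetical current K_SH
blk = make_block(b"temp=21", k_sh, "s1")
assert verify_block(blk, k_sh)
blk["data"] = "temp=99"               # tampering in transit is detected
assert not verify_block(blk, k_sh)
```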

Figure 4-4: Neighbour discovery to get the key synchronisation properties under all possible conditions. (a) node Si sends the RQA message to all its one-hop neighbours; (b) the sender receives the RPA for each RQA; (c) Si sends an ACK only to authenticated synchronised neighbours; (d) node Si receives the synchronisation properties.


4.4.5 DLSeF Security Verification

In this step, the DSM first checks the authenticity of each individual data block and then the integrity of randomly selected data blocks. The random value j is calculated from the corresponding prime number, i.e. j = Pi mod 5 when the key length is 32 bits, j = Pi mod 9 when the key length is 64 bits, and there is no integrity verification when the key length is 128 bits. We vary the integrity verification interval for the individual key lengths, changing it more frequently when the key length is shorter, because a shorter key gives an attacker a better chance to read or modify the data. As a 128-bit key is computationally hard to break and can last a long time, we perform no integrity verification in that case; the shared key is updated before an attack becomes feasible. The DSM also checks the timestamp of each individual data block to find the shared key used for encryption. For the authenticity check, the DSM decrypts the authentication part with the shared key (KSH) to recover the source identity. Once Si is obtained, the DSM checks its source database and extracts the corresponding secret key (Ki). In the integrity check process, the DSM decrypts the selected data with the shared key to recover the original data and checks the MAC for data integrity.
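The interval selection just described can be sketched as below; the `or 1` fallback is my own guard (the thesis does not say what happens when Pi mod 5 or Pi mod 9 is zero), so treat it as an assumption.

```python
def integrity_check_interval(prime: int, key_len_bits: int):
    """Return j, the spacing of integrity-checked blocks, derived from the
    current prime as in Section 4.4.5: j = Pi mod 5 for 32-bit keys,
    j = Pi mod 9 for 64-bit keys, and None for 128-bit keys (no check)."""
    if key_len_bits == 32:
        return prime % 5 or 1   # assumed guard against a zero interval
    if key_len_bits == 64:
        return prime % 9 or 1
    return None                  # 128-bit epoch: integrity check skipped
```

For the hypothetical prime 104729, for example, the DSM would run the integrity check on every 4th block in a 32-bit epoch and every 5th block in a 64-bit epoch.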

The data block is divided into two parts, one for the authenticity check and one for the integrity check. Along with the authenticity check, we add a timestamp (T) in order to establish data freshness and avoid replay attacks. The data block received at the DSM for security verification thus carries an encrypted data part and an authentication part that includes T. The DSM first processes the authentication part to obtain the timestamp and compares it with its own, i.e. it checks T − T′ ≤ ΔT. If the interval is less than or equal to the predefined bound ΔT, the data block is accepted; otherwise it is rejected. This maintains data freshness and defeats replay attacks, and the initial time check combined with authenticated-source checking also mitigates DoS (denial of service) attacks. Another important advantage of the timestamp (T) is that it identifies the shared key used for encryption: if the shared key was updated after the data block was encrypted, the DSM decrypts with the previous shared key (K′SH) instead of the current one (KSH).
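The freshness rule and key-epoch lookup above amount to two small checks, sketched here with illustrative names (the thesis gives only the inequality T − T′ ≤ ΔT and the previous-key rule):

```python
def is_fresh(sent_t: float, recv_t: float, delta_t: float) -> bool:
    """Accept a block only if the gap between send and receive time is
    within the predefined bound, dropping replayed or delayed blocks."""
    return recv_t - sent_t <= delta_t

def decryption_key(sent_t: float, rekey_time: float,
                   k_curr: bytes, k_prev: bytes) -> bytes:
    """Blocks encrypted before the last rekeying are decrypted with the
    previous shared key rather than the current one."""
    return k_prev if sent_t < rekey_time else k_curr
```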


The complete mechanism, from source and DSM authentication through handshaking to security verification, is presented stepwise in Algorithm 4-3.

Algorithm 4-3. Lightweight Security Protocol for Big Sensing Data Streams

Description Based on prime number generation at both the sensor and DSM ends, the proposed dynamic-key-length-based security framework for big data streams works more efficiently without compromising security.

Input the prime generation process (Algorithm 3-2), the key length generation process Key-Length(Pi), the key generation process KeyGen, and the session key Ksi.

Output Successful security verification without detecting any

malicious attacks.

Step 1 DLSeF System setup

1.5 Si → DSM: {EK1(r, Ki)}: the ith sensor sends a random number together with its identity, encrypted with the common shared key K1.

1.6 DSM → Si: the DSM identifies the sensor and generates a new encryption key as the hash of the current key, K2 ← H(K1). The DSM then encrypts the random number and sends it back to the ith sensor.

1.7 Si → DSM: the ith sensor identifies the DSM by decrypting the packet. If the sender is authenticated, the sensor hashes the current key (K3 ← H(K2)) to obtain a new encryption key and sends back the acknowledgement.

1.8 DSM → Si: the DSM authenticates the last transaction and replies to the ith sensor in this format. The DSM generates a session key Ksi ← randomKey() and encrypts it with the newly generated key K4 ← H(K3), the hash of the current key.

1.9 The sensor authenticates the packet and obtains the session key for handshaking (Ksi ← DK4(P4)).
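The key evolution behind steps 1.5–1.9 is a short hash chain; the sketch below shows only that chain and the final session key, with SHA-256 assumed for H and the encryption of the exchanged nonces omitted.

```python
import hashlib
import secrets

def H(k: bytes) -> bytes:
    """Hash used to advance the handshake key chain."""
    return hashlib.sha256(k).digest()

# Each handshake message advances the chain by one hash, so sensor and
# DSM never transmit a key itself, only ciphertexts under it.
K1 = b"pre-shared-initial-key"   # common shared key, assumed pre-deployed
K2 = H(K1)                       # DSM reply key,   K2 <- H(K1)
K3 = H(K2)                       # sensor ack key,  K3 <- H(K2)
K4 = H(K3)                       # final key,       K4 <- H(K3)

K_si = secrets.token_bytes(16)   # session key the DSM wraps under K4;
                                 # the sensor, holding the same K4, unwraps it
```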

Step 2 DLSeF Handshaking

The DSM sends its properties to the individual sensors under their individual session keys. These include the prime number generation properties, the time interval for key generation, etc.

2.2 DSM → Si: {EKsi(properties)}

Step 3 DLSeF Rekeying

Keys are updated at both the source sensor and the DSM, and both are aware of


the prime (Pi) and KeyGen. Sensors generate the shared key (KSH), and each data block comprises two different parts: one is the encrypted data, i.e. EKSH(Data), and the other serves the authenticity check.

3.2 Si → DSM: the data block parts for the authentication, integrity and confidentiality checks, with a timestamp for synchronisation.

Step 4 DLSeF Synchronisation

A desynchronised node obtains the synchronisation properties from its neighbours.

4.3 Sensor (Si) obtains the synchronisation properties from its neighbours (see Figures 4-3 and 4-4).

Step 5 DLSeF Security Verification

The DSM checks authenticity for each data block and checks integrity for data blocks selected at a random interval, where the random value is calculated from the corresponding prime number.

5.1. The DSM checks the timestamp (T) of every packet to determine the key for decryption. If the timestamp does not fall in the current key epoch, it decrypts with the previous shared key (K′SH).

5.2. For the authenticity check, the DSM gets the source ID. Once Si is obtained, the DSM checks the source database and extracts the corresponding secret key for the integrity check according to the value of j.

5.3. The DSM decrypts the selected data and checks the MAC for integrity.

4.5 Security Analysis

This section provides a theoretical analysis of the security model. We make the following assumptions: (a) no participant in our security model can decrypt data encrypted by the DLSeF algorithm unless it holds the shared key used for the encryption; (b) as the DSM is located on the big data processing system side, we assume the DSM is fully trusted and cannot be attacked; and (c) a sensor's


secret key, Prime (Pi) and secret key calculation procedures reside inside the trusted

part of the sensor (such as the TPM) so that they are not accessible to intruders.

Similar to most security analyses of communication protocols, we now define the

attack models for the purpose of verifying confidentiality, authenticity and integrity.

4.5.1 Security Proof

Definition 1 (attack on authentication). A malicious attacker Ma can attack the

authenticity if it is capable of monitoring, intercepting, and introducing itself as an

authenticated source node to send data in the data stream.

Definition 2 (attack on integrity). A malicious attacker Mi can attack the integrity if

it is an adversary capable of monitoring the data stream regularly and trying to

access and modify a data block before it reaches the DSM.

Definition 3 (attack on partial confidentiality): A malicious attacker Mc is an unauthorised party with the ability to access or view the data stream, without authorisation, before it reaches the DSM (within the time bound).

As in most cryptologic analyses, we define the threat model with respect to the shared key properties as follows:

Theorem 1: The security is not compromised by changing the size of shared key

(KSH).

Proof: Refer to Chapter 3 Theorem number 1.

Theorem 2. According to the proposed synchronisation method, the shared key (KSH)

is always synchronised between Source sensor (Si) and DSM.

Proof: According to DLSeF properties, the dynamic shared key length varies

between 32 bit, 64 bit, and 128 bit; these keys are updated at both source and DSM

ends. The shared key is updated without further communications between Si and

DSM after handshaking. Varying the key length makes it more complex for attackers to predict the next shared key. The ECRYPT II recommendations on key length state that a 128-bit symmetric key provides the same strength of protection as a

3,248-bit asymmetric key. An advanced processor (an Intel i7) takes about 1.7 nanoseconds to try one key on one block. At this speed, it would take about 1.3 × 10¹² times the age of the universe to check every key in the possible key set [35]. All the relevant key domains, and the time required to exhaust them on an Intel i7 processor, are listed in Table 3-2.
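The 1.3 × 10¹² figure follows directly from the stated 1.7 ns per key trial; assuming an age of the universe of about 13.8 billion years, the arithmetic checks out:

```python
# Brute-force time for a 128-bit key at 1.7 ns per trial, expressed as
# multiples of the age of the universe (~13.8 billion years).
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_S = 13.8e9 * SECONDS_PER_YEAR

total_seconds = (2 ** 128) * 1.7e-9          # ~5.8e29 seconds of work
multiples = total_seconds / AGE_OF_UNIVERSE_S
print(f"{multiples:.1e}")                     # prints 1.3e+12
```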

Here we highlight synchronisation in two places: (i) between the source sensor and the DSM during the initial key generation process, and (ii) when obtaining the synchronisation properties from a neighbour. In the first case (during the handshaking process), the DSM sends the key generation properties to Si along with a timestamp (T′′) that sets the key generation time. Both the DSM and Si then generate the shared key with dynamic length and interval as in the DLSeF method, so the shared key stays synchronised at both ends. In the second case (obtaining the synchronisation properties from neighbours), if any source is desynchronised, it initiates the neighbour authentication process to discover authenticated, synchronised neighbours

(see Figure 4-3). After authentication, the neighbour sends the key generation

properties EKSH(Pi, KSH, t, T), where T establishes data freshness and t marks the start of the key generation process. Source Si can then use the current key and apply these properties to update to the next key (i.e. KSH ← H(Pi, KSH)) after time t. Source Si is then synchronised with the other sources and the DSM.
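A desynchronised node's use of the received properties (Pi, KSH, t, T) can be sketched as follows; the field names, SHA-256 for H, and the error handling are illustrative assumptions, not the thesis implementation.

```python
import hashlib

def resync_from_neighbour(props: dict, delta_t: float, now: float):
    """Apply synchronisation properties obtained from an authenticated
    neighbour: verify freshness of T, adopt the current shared key, and
    precompute the next key to switch to after time t."""
    if now - props["T"] > delta_t:
        raise ValueError("stale synchronisation properties (possible replay)")
    k_sh = props["K_SH"]                  # usable immediately as current key
    next_key = hashlib.sha256(
        props["P_i"].to_bytes(16, "big") + k_sh).digest()
    return k_sh, props["t"], next_key     # rekey to next_key after time t
```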

Theorem 3: An attacker Ma cannot read the secret information from a sensor node

(Si) or introduce itself as an authenticated node in DLSeF.

Proof: Following Definition 1 and considering the computational hardness of a

secure module (such as TPM), we know that Ma cannot get the secret information

for Pi generation, Ki and KeyGen. There is therefore no way for a malicious node to trap the sensor, though Ma can still present itself as an authenticated node and send its own information. In this model, a sensor (Si) sends a two-part data block in which the second part is used for the authentication check. The DSM decrypts this part of the data block, retrieves Si, and matches it against its source database. If the recovered Si matches an entry in the DSM database, the data block is accepted; otherwise the node is rejected as a source, since it is not an authenticated sensor node. Hence, we conclude that an attacker Ma cannot attack the big data stream.


Theorem 4: An attacker Mc cannot access or view the unauthorised data stream in

the proposed DLSeF within the time bound.

Proof: Following Algorithm 3-2, it is clear that the prime numbers are generated dynamically at the sensors and the DSM without any further communication, and the shared secret key (KSH) is calculated from the generated prime number. Considering the computational hardness of secure modules (such as the TPM), we know

that Mc cannot obtain secret information such as the Pi generation, Ki and KeyGen within the time frame. Following Definition 3, we know that an attacker Mc may eventually gain access to the shared key but to no other information. In this model, the source sensor (Si) sends data blocks whose first part contains the original data encrypted under the shared key. Recovering the original data from this part is infeasible, because Mc has no other information and the shared key is updated dynamically at every interval t. If Mc has sufficient processing and storage capabilities, it may eventually derive a shared key, but by then the key will have changed, so Mc can only read messages whose keys have already expired. This does not affect the applications we focus on (e.g. disaster management) that rely on stream data processing. Our DLSeF model therefore provides weak confidentiality, in the sense that confidentiality is not broken in real time.

Theorem 5: An attacker Mi cannot read the shared key (KSH) within the time interval t in the DLSeF model.

Proof: Following Definition 2, we know that an attacker Mi has full access to the network and may attempt to read the shared key (KSH), but Mi cannot obtain the correct secret information used to derive it. Considering the method described in Theorem 1, we know that Mi cannot recover the currently used KSH within the time interval t (see Table 3-2), because the proposed model computes Pi afresh at random after time t and then uses Pi to generate KSH, as described in Theorems 1 and 2.

Theorem 6: The proposed DLSeF requires a comparatively smaller buffer size than

standard symmetric key solutions for security verification.

Proof: Following Algorithm 4-2, it is clear that the proposed DLSeF is a lightweight security model for security verification. The identity of the sensing device is decrypted for an authentication check on every data block, whereas only selected data blocks are decrypted for integrity checks. Another important factor is the key length used for encryption and decryption: using shorter keys to encrypt the data blocks also makes security verification faster. Together, these two mechanisms make security verification much faster than in other security mechanisms. Since the buffer required at the DSM shrinks as verification speeds up, we conclude that the proposed DLSeF model needs a comparatively smaller buffer for security verification. The experimental evidence appears in the following section.

Theorem 7. Neighbour synchronisation is also protected against attacks on authentication, integrity and partial confidentiality.

Proof: By the TPM properties, an attacker cannot obtain the secret information (Pi, Ki, KSH) or the key generation properties (KeyGen). During the neighbour authentication process, a sensor (Si) shares the synchronisation properties only after authentication, using the DSM ID and the secret key (see Figure 4-3). There is therefore no opportunity for malicious nodes to trap authenticated sensors into revealing the shared key generation properties. Malicious nodes cannot interfere with neighbour synchronisation, because neighbours identify each other through the DSM ID (Kd) and the encryption process uses the secret key (EK4), neither of which is known to malicious nodes. An intruder cannot recover the currently used KSH within the time interval t (see Table 3-2), because the proposed method computes Pi afresh at random after each interval t and then uses Pi to generate KSH. An attacker may still present itself as an authenticated node and send packets, but from the above we conclude that it cannot obtain the shared key information during neighbour synchronisation.

Theorem 8. With synchronisation in place, the security verification model also prevents replay attacks.

Proof: There are potentially two places for replay attacks: (i) during neighbour authentication, and (ii) during security verification at the DSM. In both cases a timestamp T is added to the packets. During security verification, the DSM checks data freshness by comparing the send and receive times of each data block, i.e. T − T′ ≤ ΔT. If the interval is within ΔT, the data block is accepted; otherwise it is rejected. This rule rejects delayed data packets, maintains data freshness, and defeats replay attacks. The time interval (ΔT) also lets the DSM identify the shared key used for encryption (K′SH or KSH). We follow the same method to prevent replay attacks during neighbour authentication, and it also makes the model more effective against DoS attacks.

4.6 Experiment and Evaluation

The proposed DLSeF security model, though deployed in a big sensor data

stream in this chapter, is a generic approach and can be used in other application

domains. In order to evaluate the efficiency and effectiveness of the proposed

architecture and protocol, even under adverse conditions, we experimented with

different approaches in multiple simulation environments. We first measure the

performance of sensor nodes by using a COOJA simulator in Contiki OS [118];

second, we verify the proposed security approach using Scyther [119]; third, we

measure the performance of the approach using JCE (Java Cryptographic

Environment) [120]; finally, we compute the minimum buffer size required to

process the proposed approach by using MATLAB [121] in order to measure the

efficiency of the proposed model.

4.6.1 Sensor Node Performance

We tested sensors in the COOJA simulator under Contiki OS to measure their performance while running the proposed security verification model. We took the two most common types of sensor, i.e. the Z1 and the TmoteSky, for the experiment and performance checking, as shown in Figure 3-4. In

this experiment, we checked the performance of the sensors while computing or updating the shared key, and the maximum possible number of shared key generations for a specified energy level. Initially, all sensor nodes have the same energy level of 1.6 joules [135].


Z1 sensor nodes, produced by Zolertia, are low-power WSN modules designed as a general-purpose development platform for sensor network researchers. Much of the WSN community prefers them because they support the most widely employed open-source operating systems, such as Contiki. COOJA is a network simulator for Contiki that provides realistic sensor node behaviour for simulation. The Z1 sensor is equipped with the low-power microcontroller MSP430F2617,

Figure 4-5: Performance computation of two different sensors. (a) Estimated power consumption during the key generation process. (b) Possible number of key generations with an initial 1.6 J of sensor power.

Figure 4-6: Energy consumption measured with COOJA in Contiki OS. (a) Energy for neighbour authentication; (b) energy for security verification.


which features a powerful 16-bit RISC CPU at a 16 MHz clock speed, 8 KB of RAM, built-in clock factory calibration, and 92 KB of flash memory. The Z1 hardware selection guarantees robustness and maximum efficiency at a low energy cost. Similarly, the TmoteSky is an ultra-low-power sensor equipped with the low-power microcontroller MSP430F1611, which has built-in clock factory calibration, 10 KB of RAM and 48 KB of flash memory.

Using these two sensor types, we established in the COOJA simulator that the key generation process runs successfully on both the Z1 and the TmoteSky, so these sensors can easily support the security model. The energy consumption during the key generation process is shown in Figure 4-5 (a), and the maximum number of possible key generations in Figure 4-5 (b). On average, these sensors can generate the shared key around 280 times, which lets a sensor support the security mechanism for over a year. From this experiment, we conclude that the proposed DLSeF security verification approach is supported by the most common types of sensors (tested with the Z1 and TmoteSky) and is feasible for big sensing data streams over long periods.
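A back-of-envelope check on the Figure 4-5 numbers: an initial budget of 1.6 J supporting roughly 280 key generations implies a per-rekeying cost of only a few millijoules, which is what makes the scheme viable on these motes.

```python
# Energy per key generation implied by the reported figures:
# 1.6 J initial budget, ~280 shared-key generations.
initial_energy_j = 1.6
key_generations = 280
energy_per_keygen_mj = initial_energy_j / key_generations * 1000
print(f"{energy_per_keygen_mj:.1f} mJ")   # prints 5.7 mJ
```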

In the experiment on neighbour node authentication for obtaining synchronisation properties, we measured the performance of sensors while they transmit or receive information from neighbours and while they dynamically update the shared key for the security verification process. Figure 4-6 (a) shows the energy required by sensors while transmitting and receiving synchronisation properties from neighbours, and Figure 4-6 (b) shows the power consumption of the key generation process. From these experiments, we conclude that the proposed model is lightweight, as both applying the synchronisation properties and running the security verification model consume very little sensor battery power.

4.6.2 Security Verification

The protocols of the proposed model are written for the Scyther simulation environment in the Security Protocol Description Language (.spdl). Following the conventions of Scyther, we define the roles S and D, where S is the sender (i.e. the sensor


nodes) and D is the recipient (i.e. DSM). In this scenario, S and D have all the

required information that is exchanged during the handshake process. This enables S

and D to update their own shared key. S sends the data packets to D and D performs

the security verification. In the simulation, we introduce three types of attack by

adversaries. In the first type of attack, a malicious attacker changes the data while it

is being transmitted from S to D through intermediaries (integrity attack). In the

second type of attack (authentication attack), an adversary acquires the properties of S and sends data packets to D pretending they are from S. In the third type of attack (attack on confidentiality), an adversary captures a data block for analysis and tries to read the data within the time bound. We experimented with 100 runs for each claim and found no attacks at D, as shown in Figure 4-7.

Experiment model: In practice, attacks may be more sophisticated and efficient

than brute force attacks. However, this does not affect the validity of the proposed DLSeF model, as our interest is in efficient security verification without periodic key exchanges and without successful attacks. Here, we model the process as described in

the previous section and vary the key size between 32 bits, 64 bits, and 128 bits (see

Table 3-2). We used Scyther, an automatic security protocol verification tool, to

verify our proposed model.

Figure 4-7: Scyther simulation environment with parameters and result page

of successful security verification of DLSeF protocol.


Results: We ran the simulation with a different number of data blocks in each run, ranging from 10 to 100 instances in steps of 10. Authentication was checked for every data block, whereas the integrity check was performed on selected data blocks. As the key generation process is stored in the trusted part of the sensors, no one but the corresponding sensor can access that information; hence, we found no authentication attacks. For integrity attacks, it is hard to obtain the shared key (KSH), as we frequently change both the shared key and its length based on the dynamic prime number at both the source sensor (Si) and the DSM. We encountered no integrity attacks in the experiment. Figure 4-7 shows the result of the security verification experiments in the Scyther environment. This shows

(a) Secure authentication results. (b) Security verification results at DSM.

Figure 4-8: Scyther security verification results during neighbour authentication and synchronisation.


that the security model is secure against integrity and authentication attacks.

During neighbour authentication to obtain the synchronisation properties, sensors Si and Sj authenticate each other while keeping the DSM ID and secret key hidden. In the experiment, we encountered no attacks that could compromise the security properties of the big data streams. The results in Figure 4-8 (a) validate this hypothesis and show the neighbour authentication in the Scyther environment. We also performed security verification at the DSM, following the same approach with the new key synchronisation process added; Figure 4-8 (b) shows the results of security verification at the DSM after combining the synchronisation method with DLSeF.

4.6.3 Performance Comparison

Experiment model: It is clear that the actual efficiency improvement brought by

the security model is highly dependent on the size of the key and rekeying without

further communication between sensor and DSM. We have performed experiments

with different sizes of data block. The results of the experiments are given below.

We compare the performance of the proposed DLSeF model with the Advanced Encryption Standard (AES), the standard symmetric key encryption algorithm, and with our previously proposed model for big sensing data streams (DPBSV) [38, 39]. The security model is efficient compared with DPBSV and with two standard symmetric key configurations, 128-bit AES and 256-bit AES. This performance comparison was carried out in the JCE (Java Cryptographic Environment) by comparing processing times for different data block sizes. The comparison is based on the features of JCE in a 64-bit Java virtual machine, version 1.6. JCE is the

standard extension to the Java platform which provides a framework implementation

for cryptographic methods. We experimented with many-to-one communication: all sensor nodes communicate with a single node, the DSM. All sensors have similar properties, whereas the destination node (the DSM) has more power and initialises the process. The rekeying process is executed at every node without any intercommunication, and the processing time of data verification is measured at the DSM node. The experimental results are shown in Figure 4-9.


Results: The security model outperforms the standard AES algorithm across the different data block sizes considered. Figure 4-9 shows the processing time of the DLSeF model compared with baseline 128-bit AES and 256-bit AES for different data block sizes. The comparison shows that the proposed model is more efficient and faster than the baseline AES protocols. From the above two experiments, we conclude that the proposed DLSeF model is secure (against both authenticity and integrity attacks) and efficient (compared to DPBSV and to standard symmetric algorithms such as 128-bit AES and 256-bit AES).

4.6.4 Required Buffer Size

Experiment model: We examined the DSM buffer requirements using MATLAB as the simulation tool [121]. This evaluation builds on the processing time results reported above. As in the processing time comparison, we compared our scheme with DPBSV and with standard 128-bit AES and 256-bit AES. We measured the minimum buffer size required to perform security verification at the DSM at data rates ranging from 50 to 250

Figure 4-9: Performance comparison of the scheme with DPBSV and the standard AES algorithms, i.e. 128-bit AES and 256-bit AES.


MB/s with a 50 MB/s interval, comparing the efficiency of the proposed DLSeF model.

Results: The security model outperforms the standard AES algorithm at all tested data rates. Figure 4-10 shows the minimum buffer size required for security processing at the DSM, comparing the proposed DLSeF scheme with DPBSV and with baseline symmetric key solutions (128-bit AES and 256-bit AES). The comparison shows that the proposed model is efficient and requires less buffer space for security processing than the earlier protocols. From all the above experiments, we conclude that the proposed DLSeF model is secure (against authenticity, confidentiality and integrity attacks) and efficient (compared to standard symmetric algorithms such as 128-bit AES and 256-bit AES, and to DPBSV). We also showed that the proposed model needs less buffer space during security verification.

Figure 4-10: Comparison of the minimum buffer size required to perform security verification at the DSM at various data rates.


4.7 Summary

This chapter proposed a novel authenticated key exchange protocol, the Dynamic Key Length Based Security Framework (DLSeF), which aims to provide a real-time security verification model for big sensing data streams. The security model is designed around symmetric key cryptography with a dynamic key length to make security verification of big sensing data streams more efficient.

The proposed model is built on two-dimensional security, i.e. not only a dynamic key but also a dynamic key length. Through theoretical analyses and experimental evaluations, we showed that the DLSeF model provides a significant improvement in security processing time and prevents malicious attacks on authenticity and integrity while offering weak confidentiality. In this model, we decrease the communication and computation overhead by performing dynamic key initialisation, with a dynamic key size, at both the source sensing devices and the DSM, which in effect eliminates the need for rekeying transmissions.

The proposed security verification model is implemented before stream data

processing (i.e. DSM) as shown in the architecture diagram. Several applications

such as disaster management, event detection etc. need to filter the modified and

corrupted data before stream data processing. These types of applications need only

original and unmodified data for analysis to detect the event. The proposed DLSeF

model performs security verification in near real time to synchronise with the

performance speed of the stream processing engine. The major concern is not to

degrade the performance of stream processing by performing security verification

in near real time. Although the efficiency of big data stream security verification benefits greatly from efficient schemes such as AES, DPBSV and DLSeF, these are still not fast enough when verifying data blocks while maintaining as much data security and privacy as possible.


Chapter 5

Selective Encryption Method to Ensure

Confidentiality for Big Sensing Data

Streams

Chapter 4 addressed big data stream security with a lightweight solution that protects data against attacks on authenticity, integrity and partial confidentiality. Another major concern is to maintain the privacy of sensitive data and to protect against attacks on data confidentiality. To ensure the confidentiality of collected data, sensed data are always associated with different sensitivity levels, depending on the emerging application, the sensed data type or the sensing device. Providing multilevel

data confidentiality along with data integrity for big sensing data streams in the

context of near real time analytics is a challenging problem. This chapter proposes a

Selective Encryption (SEEN) method to secure big sensing data streams that

satisfies the desired multiple levels of confidentiality and integrity. This method is

based on two key concepts: common shared keys that are initialised and updated by

DSM without requiring retransmission, and a seamless key refreshment process

without interrupting the data stream encryption/decryption. Theoretical analyses and

experimental results of the SEEN method show that it can significantly improve the

efficiency and buffer usage at DSM without compromising confidentiality and

integrity of the data streams.


5.1 Introduction

A large number of mission-critical applications such as disaster management, cyber physical infrastructure systems and SCADA are building IoT applications by deploying many smart sensing devices in heterogeneous environments. Data produced from a large variety of sources using sensing devices are streamed towards the DSM for processing and decision making. This trend has given birth to an area called big data streams [5, 42]. The variety of applications and data sources creates the need for data dependability, so that only trustworthy and dependable information is considered in decision-making processes. Data security (more specifically, ensuring data integrity and confidentiality) is an efficient and effective procedure for assuring data trustworthiness and dependability: the DSM processes the data streams in near real time, performs the data analytics, and the appropriate actions are taken based on the analytics results. It is thus important that data trustworthiness is assured throughout the lifecycle of big data stream processing. Recent research [136-137] has highlighted key contributions on lightweight security and provenance for data both in transit and at rest.

The lifetime of a big data stream is very short because it is continuous in nature (i.e. the data can be accessed only once) [5, 29]. Such data streams in critical applications have high volume and velocity, and the stream processing has to be done in near real time; it cannot follow the traditional store-and-process batch computing model [24]. To address this challenge, stream processing engines such as Spark, Storm and S4 have emerged to provide the capability to process big data in real time [29, 128]. Stream processing engines (SPEs) offer two important advantages: (i) there is no need to store large volumes of data, and (ii) they support the real-time computation needed by emerging applications. As important decisions in critical applications are made by analysing data streams in near real time, it is important that such data are not accessed or tampered with by malicious adversaries. This raises one of the key open research problems in big data streams: how to ensure end-to-end security for stream data processing, including guaranteeing the data security properties of integrity, confidentiality, authenticity and freshness [4 – 5, 137].


There are different security requirements for different emerging critical applications. Consider applications such as disaster management, terrestrial monitoring, military monitoring, healthcare, cyber physical infrastructure systems and SCADA, which are the sources of big data streams [4 – 5, 138 – 139]. Some applications, including terrestrial monitoring and disaster management, need data integrity so that the system has high confidence in the events detected from stream data processing; confidentiality is not that important in such applications [138, 140 - 141]. Other applications, such as military applications, healthcare and SCADA, need data confidentiality along with data integrity. The confidentiality of data depends not only on the application, but also on the data type. For example, some applications need data confidentiality forever (i.e. strong confidentiality), whereas others need to maintain data confidentiality only in real time (i.e. partial confidentiality). In healthcare applications, personal health data need to be protected from outsiders, requiring strong confidentiality [138], whereas in SCADA applications data need to be protected in real time, until the DSM detects the event [136]. There are also applications, including military monitoring, that need different levels of data confidentiality within the same system [137, 140]: there is no need for confidentiality for normal sensed data, but it is needed for highly sensitive data such as movement in the battlefield or detection of enemy activities. We classify the security threats and adversary models in the following sections. This chapter addresses the issues specified above by designing a novel security method for big sensing data streams.

The common approach to data security is to apply a cryptographic model. If the encryption keys are managed properly, data encryption using a cryptographic method is the most widely recognised and secure way to transmit data. There are two basic types of cryptographic encryption techniques: asymmetric and symmetric. It has been shown that symmetric key cryptography is about 1000 times faster than asymmetric key cryptography [34 - 35]. We therefore focus on symmetric key cryptography to design a new security method for big data streams that ensures data confidentiality and integrity.

In order to address the aforementioned challenge, we have designed and

developed a selective encryption method (SEEN) to secure and maintain

confidentiality of big data streams according to sensitivity levels of the data. This


method is based on common shared keys that are initialised and updated by the DSM without requiring retransmission. Furthermore, the proposed security method is able to detect and recover lost keys and performs seamless key refreshment without interrupting ongoing data stream encryption/decryption. SEEN maintains different levels of data confidentiality along with data integrity. The main contributions of the

chapter can be summarised as follows:

• We have designed and developed a novel selective encryption method (SEEN) to secure and maintain the confidentiality of big sensing data streams according to different data sensitivity levels. This method is based on common shared keys that are initialised and updated by the DSM without requiring retransmission. The security method performs seamless refreshing of the shared keys without disrupting ongoing data encryption or decryption.

• The proposed model adopts different keys for the three levels of data confidentiality (i.e. no confidentiality, partial confidentiality and strong confidentiality) based on the data sensitivity levels.

• We validate the proposed method by theoretical analyses and experimental results.

• We compare the SEEN method with a standard symmetric key solution (AES-128), DPBSV and DLSeF in order to evaluate its efficiency.

The rest of this chapter is organised as follows. Section 5.2 presents the design considerations, including the system architecture, adversary model and attack model. Section 5.3 discusses the research challenges and motivation. Section 5.4 provides a detailed description of the proposed security method, followed by its security analysis and performance evaluation in Sections 5.5 and 5.6, respectively. Finally, Section 5.7 summarises the contributions of this chapter.

5.2 Design Consideration

This section gives the system architecture, possible threats and attack models in

different stages of the data stream flow.


5.2.1 System Architecture

The overall architecture of a big sensing data stream, including the security model, is shown in Figure 5-1. The architecture comprises source sensing devices that transmit data to the DSM through wireless networks, together with the security model (SEEN). We follow [4 - 5] to design a DSM that is capable of handling high-volume, heterogeneous data streams from multiple sources. In addition, the DSM is responsible for performing the security verification of the incoming data streams in near real time, to synchronise with the processing speed of the SPE (Spark Streaming, Apache Storm, Apache S4, etc.). For further information on stream data processing in the data centre, refer to [128].

Along with this, we consider both the source sensors and the cloud data centre to be deployed with Intrusion Detection Systems (IDS). A sensor-based IDS monitors a sensor's behaviour and generates alerts on potentially malicious activities on board and in network traffic [142]. An IDS can be set inline, attached to a spanning port of a sensor; the idea is to give the IDS access to all the packets we wish it to monitor. LEoNIDS (a low-latency and energy-efficient network IDS) is a system that resolves the trade-off between energy consumption and detection latency by providing both lower power utilisation and lower detection latency at the same time [141]. For cloud-based IDS, Lee et al. [143] proposed an intrusion detection system in which learning agents continuously compute and deliver updated detection models to the detection agents for efficient learning and real-time detection. It computes inter- and intra-audit record patterns, which can guide the data gathering process and simplify feature extraction from audit data. Xie et al. [144] proposed a novel technique to analyse system (sensor) vulnerabilities and attack sources quickly and accurately.

In this architecture, the data streams are always in encrypted form when they arrive at the DSM. The idea is that, while encrypting the data packets at the source, we attach the sensitivity level of the data to each individual data packet. In the SEEN method, we apply different keys to encrypt the data packets for different data sensitivity levels. The aim is to provide different confidentiality levels based on the application as well as the sensitivity level of the data. In a generic representation, if we need n levels of data security then n-1 keys (k1, …, kn-1) are required for encryption/decryption. In this chapter, we are considering three


levels of data confidentiality: strong confidentiality, partial confidentiality, and no confidentiality; and two keys (i.e. k1, k2) for the encryption methods. The strong encryption method uses k1 to provide strong confidentiality, and the weak encryption method uses k2 to support partial confidentiality. Note that data packets requiring no confidentiality are not encrypted at all.
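As an illustration, the per-packet key selection described above can be sketched as follows (a minimal sketch; the function and constant names are assumptions for illustration, not the thesis implementation):

```python
# Sketch of sensitivity-based key selection for the SEEN method.
# Sensitivity levels: 2 = strong confidentiality, 1 = partial, 0 = none.
# k1 is the 128-bit strong-encryption key, k2 the 64-bit weak-encryption key.

STRONG, PARTIAL, NONE = 2, 1, 0

def select_key(sensitivity_level, k1, k2):
    """Return the key to use for a packet, or None when no encryption is needed."""
    if sensitivity_level == STRONG:
        return k1          # 128-bit shared key, strong encryption
    if sensitivity_level == PARTIAL:
        return k2          # 64-bit shared key, weak encryption
    return None            # open-access data is sent in the clear
```

For n confidentiality levels the same dispatch generalises to n-1 keys, with the lowest level always left unencrypted.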

Data packets can be transmitted to the DSM using two different ways of encrypting data: (i) encrypting the entire data stream, or (ii) encrypting individual data packets in the stream. In both cases, we apply the encryption method (strong or weak encryption) based on the data sensitivity or confidentiality level. Stream-level encryption applies to sensors that are deployed with a fixed sensitivity level, whereas packet-level encryption applies to sensors that produce data of different sensitivity levels for different data types.

Here, we follow a three-step process (data collection, security verification, and stream query processing) at the DSM, as highlighted in Figure 5-1. The focus is to perform the security verification at the DSM, providing end-to-end security for big sensing data streams. It is also important to perform security verification of a data stream before stream query processing in order to preserve the originality of the data for the SPE. The security verification needs to be done on the fly (i.e. in near real time) with a small buffer size. The queries, including security verification, can be defined as a directed acyclic graph in which each node is an operator and the edges are the data flows between the nodes.

The above system architecture and security requirements of big data streams [4 - 5] lead to the following two important features:


• Data packets need to maintain confidentiality based on their sensitivity level.

• An optimised buffer size is needed at the DSM prior to stream query processing.

Motivated by this problem, this chapter aims to address the challenge of data integrity and multilevel confidentiality for real-time massive data streams.

5.2.2 Adversary Model

We assume that a large number of sensor nodes act as sources of big sensing data streams, are fully connected, and can communicate with the DSM through wireless networks. We assume that the DSM is aware of the network topology and of the initially deployed nodes. We also assume that an IDS is positioned at each source device and at the DSM, so that the source sensors and the DSM are capable of detecting packet-loss attacks and data modifications [137]. The DSM is treated as fully secured and protected in the model, as it resides at the cloud data centre.

Figure 5-1: High level architectural diagram of big sensing data streams, DSM and stream data processing system for the SEEN security model.

An attacker has several ways of attacking big sensing data streams:


• After deployment, nodes may be captured by the attacker, who will then be able to access the data stored in these nodes, as well as reprogram them and control their actions. The attacker could therefore make nodes refuse to forward some of the packets (Selective Forwarding attack) or even all of them (Blackhole attack).

• The attacker may capture data packets in transit to extract information from them and modify their content. The attacker can therefore cause the loss of confidentiality of sensitive information (confidentiality attack) and of data integrity (integrity attack).

• A replay attack (also known as a playback attack) is a network-based attack in which a data stream is maliciously delayed or fraudulently repeated.

Compromising a node to drop packets and introducing interference in the network to access or tamper with the data are, from a high-level perspective, the two ways in which an attacker can disrupt data transmission through a packet-loss attack. For this reason, the adversarial model covers many different attacks that aim at causing packet losses. The other type of attack is to capture sensitive data packets and analyse them to break data confidentiality.

Each node whose IDS detects a packet-loss attack will investigate the loss; we assume the investigating source device to be trustworthy and not to report any false response. This assumption is particularly important for the Majority Voting algorithm adopted as part of this approach. However, we will also present a variant of this algorithm that relaxes this constraint and can thus tolerate up to a bounded number of colluding investigating source nodes.
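The core of such a relaxed majority vote can be sketched as follows (illustrative only; the function name, the True/False report encoding and the margin rule are assumptions, not the thesis algorithm):

```python
from collections import Counter

def majority_vote(reports, max_colluding=0):
    """Decide whether a packet-loss attack occurred from investigator reports
    (True/False votes). The basic variant (max_colluding=0) assumes trustworthy
    investigators; the relaxed variant tolerates up to `max_colluding` false
    votes by requiring the winning margin to exceed that bound."""
    counts = Counter(reports)
    yes, no = counts[True], counts[False]
    if abs(yes - no) <= max_colluding:
        return None                    # margin too small: inconclusive
    return yes > no
```

With a margin larger than the number of possible colluders, flipping every colluding vote can never change the outcome, which is the sense in which the variant tolerates collusion.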

5.2.3 Attack Model

There are three main approaches to threat modelling, i.e. attack-centric, software-centric and asset-centric. An attack-centric threat model always starts from an attacker, whereas a software-centric threat model starts from the system design. An asset-centric threat model starts from the information collected and the assets entrusted to the system; the proposed method therefore follows an asset-centric threat model.


We assume that multiple attacks can be carried out simultaneously at various parts of the network. In fact, a strength of the approach is that multiple simultaneous investigations can be carried out.

The integrity of a big data stream ensures that a message sent from sources to the

data centre (DSM) is not modified by malicious intermediates. Authentication of big

data streams ensures that the data are from legitimate sources to maintain end-to-end

security services.

Data confidentiality (privacy) is a set of guidelines that restricts access to, or puts limitations on, specific data streams. It guarantees that the data cannot be comprehended by anybody other than the intended receivers, whether the data is in transit or at rest.

Data confidentiality can be characterised by the impact of a successful exploitation of a vulnerability on the target system, as follows:

• Strong confidentiality: only the intended recipients can read the information.

• Partial confidentiality: there is considerable informational disclosure in some situations.

• No confidentiality: a total compromise of critical system information.

5.3 Research Challenges and Research Motivation

This section presents the research challenges of the proposed approach in detail, followed by the motivation for the research problem.

5.3.1 Research challenges

As shown by the two solutions proposed in the last two chapters, a symmetric cryptographic solution is the fastest way to protect data while saving buffer space and


computational overhead. However, multiple sensitivity levels of data streams need different keys to perform the encryption and decryption.

In the security frameworks proposed in the last two chapters, the source sensors initialise the shared key by themselves. This is simply not feasible when using multiple shared keys for multiple sensitivity levels of data in big data streams, and it is even more complex to reinitialise lost or desynchronised keys. Existing security solutions are therefore not suitable for multilevel security of big data streams, and a new solution is needed for the problems specified above.

Existing symmetric-cryptography-based security solutions use either a static shared key or a centralised dynamic key. With a static shared key, a long key is needed to defend against a potential attacker. The length of the key is proportional both to the security verification time and to the strength of the security, and the confidentiality level of big data streams depends on the strength of the encryption. It follows that the length of the shared key used for encryption is directly proportional to the confidentiality level. This chapter divides big sensing data streams into three levels based on data sensitivity: high sensitivity, low sensitivity and open access data, requiring strong confidentiality, partial confidentiality and no confidentiality respectively.

From the required features of big data streams specified in the last subsection, it is clear that security verification should be performed in real time. We need a solution that generates and initialises the multiple shared keys seamlessly. A big data stream is continuous in nature and huge in size, which makes it impossible to halt the data for rekeying, key distribution to the sources and synchronisation with the DSM. To address this problem, we propose a scheme for big data stream security verification with seamless shared key initialisation at the source sensors. The key generation needs to be done at the DSM, because the DSM is located at the cloud data centre and, according to our assumptions, is fully secured.

The common problem in the data flow between sensors and the DSM is that attackers may read the data while it is in transit. The DSM also needs to identify the sensitivity level of data packets before security verification, so that it can apply a separate key to decrypt each level of data packet, since the sources applied separate keys to encrypt the data packets based on their sensitivity level. In the proposed model, key exchange happens once during the


handshaking process, but the DSM pushes updated shared keys to individual sources before the current shared keys expire. Synchronisation between a source and the DSM is important during key update; otherwise the security verification will fail.

The buffer size required for security verification is another major issue, because of the volume and velocity of big data streams. Given the features of big data streams (i.e. the 4Vs), we cannot hold the data for long before security verification; doing so would require a bigger buffer and may reduce the performance of the SPEs. Buffer size reduction is therefore one of the major challenges for big data streams, and any proposed security solution should operate with a small buffer.

5.3.2 Research Motivation

As stated in previous chapters, we cannot always apply strong encryption techniques to maintain the data confidentiality of big data streams. Security processing time (encryption, decryption and security verification) is directly proportional to the length of the key, and the length of the key used for encryption is directly proportional to the strength of the security. We therefore cannot always apply longer, stronger keys to protect data confidentiality and integrity. Instead, we divide the complete data stream into three classes, i.e. high sensitivity data, low sensitivity data and open access data, and provide strong confidentiality for high sensitivity data, partial confidentiality for low sensitivity data and no confidentiality for open access data. Strong confidentiality prevents an adversary from reading the data stream during its entire lifetime, whereas partial confidentiality protects the data stream from disclosure in real time. Accordingly, we apply strong encryption for strong confidentiality, weak encryption for partial confidentiality, and no encryption for open access data.

Another major motivation is to perform the security verification in near real time in order to synchronise with the processing speed of the SPEs [82]. Stream data analysis performance should not degrade because of security processing time, as several applications need to perform data analysis in real time. The DSM also needs to identify individual data packets by their data sensitivity level in order to apply the appropriate shared key(s) for decryption or security verification. Hence, a


lightweight multilevel security mechanism is very important to perform security

verification in near real time and reduce buffer size.

5.4 Selective Encryption Method for Big Data Streams

This chapter proposes a selective encryption method for big data streams (SEEN) which is equipped with key renewability and balances the trade-off among security, performance and resource utilisation. The SEEN security method's salient features are as follows:

• efficient key broadcasting without retransmission;

• the ability to detect and recover lost keys;

• seamless key refreshment without interrupting data streams; and

• maintenance of data confidentiality based on the data sensitivity level.

Table 5-1: Notations

Acronym      Description
Si           ith source sensing device's ID
Ki           ith source sensing device's secret key
KDSM         DSM secret key
k            Initial secret key
KSH          Initial shared key generated by DSM
KSH(1)       Shared key for strong encryption
KSH(0)       Shared key for weak encryption
T            Time of packet generation
T′           Time packet is received at DSM
PRN          Pseudorandom number
CAC          Centralised authentication code
SL           Data sensitivity level
MAC          Message authentication code
E( ) / D( )  Encryption/Decryption function
H( )         One-way hash function
⊕            X-OR operation
‖            Concatenation operation


We describe the proposed security method for big sensing data streams through four independent components: system setup, rekeying, new node authentication, and encryption/decryption. We refer readers to Table 5-1 for all the notation used in describing the security scheme. We have made a number of sensible and practical assumptions to characterise the proposed security method, and we describe these assumptions where necessary. We next describe the independent components in detail.

5.4.1 Initial System setup

We follow the symmetric key method for the initial system setup because of the limited resource availability at the source sensors [145]. In symmetric key cryptography, a hash function needs 5.9 μJ and an encryption operation 1.62 μJ, whereas in asymmetric key cryptography RSA-1024 needs 304 mJ to sign and 11.9 mJ to verify, and ECDSA-160 needs 22.82 mJ to sign and 45 mJ to verify [145]. We therefore choose symmetric key methods for the initial setup. In the system setup process, the DSM always initiates the process of identifying authenticated sources. After successful authentication, the DSM shares the secret shared keys with the source sensors for encryption. The initial shared key setup phase is as follows:

The DSM generates a pseudorandom number (PRN) and applies a hash function to it, combined with its own secret key, to generate a unique secret shared key. It then encrypts the shared key using the pre-deployed secret key (k); the result is the CAC (Centralised Authentication Code). The DSM broadcasts the CAC to all the source sensors, i.e. (1, …, n).

Once the sensors receive the broadcast CAC from the DSM, each sensor decrypts it using the pre-deployed secret key (k). Here we show the operation for a single sensor (i.e. the ith sensor). The sensor then performs the following procedure and sends an encrypted CAC back to the DSM; this CAC contains the source ID, a random number used as a nonce, and a timestamp to avoid replay attacks.
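The DSM-side derivation and broadcast of the shared key can be sketched as follows. This is a minimal sketch: the function names are assumptions, and the SHA-256-keyed XOR stream stands in for the real symmetric cipher purely to show the message flow (it is not secure and is not the cipher used in the thesis):

```python
import hashlib
import os

def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher keyed by SHA-256 -- a placeholder for the
    real symmetric cipher; NOT secure, used only to show the message flow."""
    out, counter = bytearray(), 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, out))

def dsm_generate_cac(k_dsm: bytes, k_predeployed: bytes):
    """DSM side: derive a shared key from a pseudorandom number and the DSM
    secret, then wrap it with the pre-deployed key k to form the CAC."""
    prn = os.urandom(16)                              # pseudorandom number (PRN)
    k_sh = hashlib.sha256(prn + k_dsm).digest()[:16]  # 128-bit shared key
    cac = _keystream_xor(k_predeployed, k_sh)         # broadcast payload
    return k_sh, cac

def sensor_recover_key(k_predeployed: bytes, cac: bytes) -> bytes:
    """Sensor side: unwrap the broadcast CAC with the pre-deployed key k."""
    return _keystream_xor(k_predeployed, cac)
```

Because the stand-in cipher is symmetric, decrypting the CAC with the same pre-deployed key recovers exactly the shared key the DSM derived.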


Once the CAC is received at the DSM, it decrypts it and checks the source ID (Si) for authentication, retrieving the corresponding sensor secret key from its database (Ki ← retrievekey(Si)). It also checks the timestamp to avoid replay attacks. The complete procedure for authentication and replay attack avoidance is shown below.

Ki ← retrievekey(Si) // for source authentication

(T - packet generation time; T′ - packet reception time)

The DSM compares the packet generation time (T) with its current time (T′) to check the data freshness in order to avoid a replay attack (T′ - T ≤ ΔT). If the time difference is at most ΔT, the DSM accepts the data packet; otherwise the packet is discarded.
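This freshness check amounts to a single comparison (a minimal sketch; the function name is an assumption for illustration):

```python
def is_fresh(t_generated: float, t_received: float, delta_t: float) -> bool:
    """Replay-attack guard: accept a packet only if the gap between its
    generation time T and its arrival time T' at the DSM is within delta T.
    Negative gaps (a timestamp from the future) are also rejected."""
    return 0.0 <= (t_received - t_generated) <= delta_t
```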

The DSM then generates a new key by performing an X-OR of the existing shared key and the sensor's secret key. The DSM uses this shared key to encrypt the nonce and sends it back to the corresponding sensor for handshaking, along with the weak encryption shared key.

After a sensor (Si) receives the data packet, it performs the same operation as the DSM to derive the new shared keys used to encrypt the data packets. It compares the decrypted nonce with the nonce it holds; if both are the same, it accepts, otherwise it rejects and starts a new authentication process. The received KSH(0) is a 64-bit key used for weak encryption and KSH(1) is a 128-bit key used for strong encryption.


If the nonces match, the sensor accepts; otherwise the process starts from the beginning. The complete authentication process is shown in Figure 5-2, which presents the stepwise process with the information flow.
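The two symmetric steps of this handshake, deriving the new shared key as an X-OR of the existing shared key and the sensor secret, and confirming the nonce, can be sketched as follows (function names are illustrative assumptions):

```python
def derive_new_shared_key(current_shared: bytes, sensor_secret: bytes) -> bytes:
    """Both the DSM and the sensor derive the next shared key by X-ORing
    the existing shared key with the sensor's secret key, so no new key
    material has to travel over the network."""
    assert len(current_shared) == len(sensor_secret)
    return bytes(a ^ b for a, b in zip(current_shared, sensor_secret))

def confirm_handshake(decrypted_nonce: bytes, local_nonce: bytes) -> bool:
    """The sensor accepts the new key only when the nonce decrypted from the
    DSM's reply matches the nonce it originally sent."""
    return decrypted_nonce == local_nonce
```

Because X-OR is its own inverse and both sides hold the same inputs, the DSM and the sensor arrive at an identical new shared key independently.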

5.4.2 Re-keying

After this initial key setup phase, the DSM shares the secret shared keys with the sensors for encryption. For the rekeying process, we follow the LiSP protocol [146] and modify it to make SEEN more data-centric instead of communication-centric. SEEN uses a key server (KS) at the DSM, which manages the security keys for both strong and weak encryption. We use a 128-bit symmetric shared key for strong encryption and a 64-bit symmetric key for weak encryption. Shared keys from the KS are always used for the rekeying operation. Along with the shared key, individual sensors are able to compute the hash function.

Figure 5-2: Initial authentication method with four-step process.

In order to make the system more secure, the shared key distribution for rekeying must be secure and fault tolerant, where "secure" means maintaining confidentiality and authenticity, and "fault tolerant" implies the capacity to restore a lost shared key. In the SEEN method, we always use two kinds of control packets, i.e. UpdateKey and RequestKey. UpdateKey is for periodically updating the


shared key used by DSM, whereas RequestKey is used by sensors when they missed

the shared key during the rekeying process.

We follow PRESENT [147] to generate the shared key at DSM and distribute the

key before it is used for encryption at source sensors. The sensors have two buffer

places for each key; that means four buffer places are required to save the keys as

shown in Figure 5-3. The front two shared keys are always used for encryption, and the

back buffers contain the next shared key before the current shared key expires.
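The double-buffered key slots described above can be sketched as follows. This is an illustrative sketch, not the thesis implementation; the class and method names (KeyStore, stage, promote) are assumptions.

```python
# Sketch (assumed names, not from the thesis): a sensor-side key store
# with two buffer places per key type, i.e. four slots in total,
# mirroring Figure 5-3.

class KeyStore:
    def __init__(self, strong, weak):
        # Front slots hold the keys currently used for encryption.
        self.front = {"strong": strong, "weak": weak}
        # Back slots hold the next shared keys received via UpdateKey.
        self.back = {"strong": None, "weak": None}

    def stage(self, next_strong, next_weak):
        """Store the next shared keys before the current ones expire."""
        self.back["strong"], self.back["weak"] = next_strong, next_weak

    def promote(self):
        """At the key-switch time stamp, move the back slots to the front."""
        if self.back["strong"] is None or self.back["weak"] is None:
            raise RuntimeError("next key missing: send RequestKey to the DSM")
        self.front, self.back = dict(self.back), {"strong": None, "weak": None}

store = KeyStore(strong=b"K1" * 8, weak=b"k1" * 4)
store.stage(b"K2" * 8, b"k2" * 4)
store.promote()
assert store.front["strong"] == b"K2" * 8
```

A sensor whose back slots are still empty at the switch time raises an error, which corresponds to issuing RequestKey in the protocol.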

To ensure secure shared key distribution, the DSM initiates the shared key

distributions by encrypting the control packet (UpdateKey) using the current shared

key (KSH(i-1)) to distribute the next shared key (KSH(i)). The UpdateKey is always in

the format of EKSH(i-1)(KSHi(1) ‖ KSHi(0)), where KSH(i-1) is the current shared key and all

authenticated sensors have this key to perform the encryption. Let us assume the time

to change the shared key is t; this means the DSM needs to initialise the shared key

before the time t′. If the sensor did not get the shared key at time δt (t-t′= δt), then it

calls the RequestKey. The RequestKey always has the format of EKSH(Si ‖ ti), where

the control packet encrypts with the current shared key. The control packet contains

sensor ID (Si) for authentication and time slot (ti) where the sensor needs the shared

key for encryption. In such situations, the DSM sends an UpdateKey message to the

corresponding sensors. Algorithm 5-1 shows the stepwise procedure for rekeying.

5.4.3 New Node Authentication

Joining new nodes to the network is a common property of sensor networks. We

assume that the source node is initialised by the DSM during initial deployment

[148]. In such cases, source sensors always start the process to authenticate with

DSM to get the current shared key. Sensors use a control packet (i.e. InitKey) to start

the process. InitKey contains the source ID encrypted with the initially deployed secret key, i.e. Eki(Si).

Figure 5-3: Key Selection.

Once the DSM receives the control packet, it checks its

authenticity. If the DSM succeeds in the authentication process, then it follows the

Initial key setup (from Figure 5-1) phase to share the current shared key. The DSM

uses the current shared key (KSH) instead of generating a new key. At the final stage of key sharing, the DSM sends the shared keys along with a time stamp (ti) to the source sensors. For

the robust clock skew and shared information details, the source sensor can get the

information from its neighbours [146].

Algorithm 5-1. Rekeying process

t – time to rekeying

t′ – time at which the DSM starts the shared key distribution

δt – small time before t expires

1. At time t′: the DSM broadcasts (UpdateKey)

UpdateKey ← EKSH(KSH(1) ‖ KSH(0))

2. Sensors use the current shared key (KSH) to get the next shared key

DKSH(KSH(1) ‖ KSH(0))

3. At time δt: If any sensor does not have the next shared key

Sensors unicast to DSM (RequestKey)

RequestKey ← EKSH(Si ‖ ti)

4. After authentication, the DSM unicasts (UpdateKey)

UpdateKey ← EKSH(KSH(1) ‖ KSH(0))
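The Algorithm 5-1 message flow can be sketched as follows. A repeating-XOR toy cipher stands in for the real symmetric cipher (SEEN derives keys with PRESENT), and the helper names and packet layout are assumptions for illustration only.

```python
# Sketch of the Algorithm 5-1 message flow. The repeating-XOR "cipher"
# below is a stand-in for the actual symmetric cipher; packet layouts
# and function names are assumptions.

def xor_crypt(key, data):
    # Toy symmetric operation: identical for encrypt and decrypt.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def update_key(current_shared, next_strong, next_weak):
    # Step 1: DSM broadcasts UpdateKey = E_KSH(KSH(1) || KSH(0)).
    return xor_crypt(current_shared, next_strong + next_weak)

def receive_update(current_shared, packet, strong_len=16):
    # Step 2: sensors decrypt with the current shared key and split
    # the plaintext into the strong and weak next keys.
    plain = xor_crypt(current_shared, packet)
    return plain[:strong_len], plain[strong_len:]

def request_key(current_shared, sensor_id, slot):
    # Step 3: RequestKey = E_KSH(S_i || t_i) when the update was missed.
    return xor_crypt(current_shared, sensor_id + slot)

ksh = b"current-shared-k"
pkt = update_key(ksh, b"S" * 16, b"w" * 8)
strong, weak = receive_update(ksh, pkt)
assert strong == b"S" * 16 and weak == b"w" * 8
```

Step 4 reuses `update_key` as a unicast reply to the requesting sensor.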

Figure 5-4: Shared key management for robust clock skew.


5.4.4 Reconfiguration

The DSM will configure the shared key at the time of the next rekeying process, if

(1) any of the source sensors have been compromised; (2) any of the shared keys

have been revealed; (3) a source node has overtly requested the shared key; or (4) a

source has joined to participate in the data stream. The first condition forces all

source devices to be reconfigured, whereas the final two issues focus on requesting

that the source be configured. The actions required for the issues highlighted above

are summarised as follows:

(I) The DSM withdraws the compromised nodes from the set of authenticated sources; if KSH(i) has been disclosed, all earlier shared keys may also be exposed.

(II) DSM computes new shared keys for both strong and weak encryption and

unicasts with control packets.

(III) DSM replies to the requesting source with current configuration.

Figure 5-5: Method to select the encryption method based on the data sensitivity level.


(IV) DSM follows the authentication process, and if successful, DSM responds to

the source by initialising an InitKey control packet.

5.4.5 Encryption/Decryption

The above defined process makes both shared keys (KSH(1) and KSH(0)) available at

sensors. Note that KSH(1) is always used for strong encryption, whereas KSH(0) is

always used for weak encryption. Each data block generated at sensors is a

combination of two different parts. The first part is for integrity checking and

maintaining the confidentiality level, whereas the other part is for source authentication, i.e. EKSH(1)(Si ‖ T ‖ flag). The authentication part is always

encrypted using the strong encryption key; it contains the source ID for

authentication, time stamp (T) to avoid replay attack, and a flag value 1/0; where 1 is

for strong encryption of body part (highly sensitive data) or 0 for weak encryption of

body part (low sensitivity data). In order to encrypt the data part of the packet, every

sensor performs the XOR operation i.e. current shared keys KSH(1/0) with its own

secret key (Ki), i.e. KSH(1)′ = KSH(1) ⊕ Ki and KSH(0)′ = KSH(0) ⊕ Ki, then it uses the

newly generated key to encrypt the data packets. Shared key EKSH(1)′ is always used

for strong encryption, whereas shared key EKSH(0)′ is always used for weak

encryption (DATA ‖ MAC). The above specified data block encryption is always

based on the data sensitivity level and for data integrity and confidentiality.
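The sensor-side steps above can be sketched as follows. The truncated HMAC as the MAC and the XOR keystream are illustrative stand-ins, not SEEN's actual primitives, and all names are assumptions.

```python
# Sketch of the sensor-side step in Section 5.4.5: derive the packet key
# K_SH' = K_SH xor K_i and pick the strong or weak key from the flag.
# The MAC and cipher choices here are illustrative assumptions.

import hashlib
import hmac

def derive_key(shared_key, sensor_secret):
    # KSH(1)' = KSH(1) xor Ki (likewise for the weak key); zip truncates
    # to the shorter of the two inputs.
    return bytes(a ^ b for a, b in zip(shared_key, sensor_secret))

def build_body(data, shared_keys, sensor_secret, sensitive):
    flag = 1 if sensitive else 0          # 1: strong key, 0: weak key
    key = derive_key(shared_keys[flag], sensor_secret)
    mac = hmac.new(key, data, hashlib.sha256).digest()[:8]
    # E_KSH'(DATA || MAC); a XOR keystream stands in for the cipher.
    blob = data + mac
    return flag, bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))

keys = {1: b"strong-shared-16", 0: b"weak-shd"}
flag, body = build_body(b"reading=21C", keys, b"sensor-secret-xx",
                        sensitive=True)
assert flag == 1 and body != b"reading=21C"
```

The flag travels in the strongly encrypted header, so the DSM can later re-derive the same key and select the matching decryption path.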

The Capture layer of an IoT system (i.e. physical layer of sensor networks) is

responsible for obtaining the context of data from the deployed environment using

source sensors. This layer is also accompanied by classification methods, which

mostly follow the unsupervised neural network methods, such as KSOM (Kohonen

Self-Organising Map) used to categorize the real time sensed data [149]. The word

“sensor” not only signifies a sensing device but also applies to each data source that

may deliver functional context information [150]. Ganesan et al. [151] proposed a

similar kind of system, named DIMENSIONS, where authors extended sensors for

computation, storage and system performance. We follow the KSOM technique to

classify the sensed data at sensors to define the sensitivity level. KSOM uses data

mining techniques and classification to extract the data sensitivity level. A few sensors


are also pre-deployed with a high sensitivity level, where all generated data packets

are sent to the DSM with a high sensitivity level. The steps to select the encryption

method and shared key are shown in Figure 5-5. The strong encryption method

always uses a shared key EKSH(1) for highly sensitive data, whereas the weak

encryption method always uses the shared key EKSH(0) for low sensitivity data.

After data are received at the DSM, it always checks the authentication block and

applies the strong encryption shared key to get the authentication information, i.e. DKSH(1)(Si ‖ T ‖ flag). Once it gets the source sensor ID, it checks its own database

to find the match and confirms that data packets are from authenticated sources. After

successful authentication, the DSM compares the received time frame (T) with its

current time (T′) to check the data freshness in order to avoid a replay attack (T′ − T ≤ ΔT). After successfully checking for a replay attack, the DSM retrieves the

corresponding secret key of the sensor, i.e. Ki ← retrieveKey(Si), and checks the data

sensitivity level to find out the shared key used for encryption. If the data sensitivity

level is 1, then it performs the XOR operation KSH(1)′ = KSH(1) ⊕ Ki, else KSH(0)′ = KSH(0) ⊕ Ki. The computed new key is used for data decryption, i.e. DKSH(1)′(DATA ‖ MAC) for high sensitivity data and DKSH(0)′(DATA ‖ MAC) for low

sensitivity data. After data decryption, the DSM compares the MAC as an integrity

check. The DSM always keeps the last shared key KSH(i-1) during use of KSH(i).

There is always the possibility of late arrival of data packets at a DSM because of

the untrusted wireless communication medium.
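The DSM-side decryption and integrity check described above can be sketched as follows, under the same illustrative stand-ins (a toy XOR cipher and a truncated HMAC as the MAC); names are assumptions.

```python
# DSM-side sketch: recompute K' = K_SH xor K_i from the flag, decrypt
# DATA || MAC, and verify the MAC as the integrity check. The XOR
# cipher and HMAC-based MAC are illustrative assumptions.

import hashlib
import hmac

def derive_key(shared_key, sensor_secret):
    return bytes(a ^ b for a, b in zip(shared_key, sensor_secret))

def dsm_decrypt(flag, body, shared_keys, sensor_secret, mac_len=8):
    key = derive_key(shared_keys[flag], sensor_secret)
    plain = bytes(b ^ key[i % len(key)] for i, b in enumerate(body))
    data, mac = plain[:-mac_len], plain[-mac_len:]
    expected = hmac.new(key, data, hashlib.sha256).digest()[:mac_len]
    if not hmac.compare_digest(mac, expected):
        raise ValueError("integrity check failed")
    return data

# Round trip: encrypt as the sensor would, then decrypt at the DSM.
keys = {1: b"strong-shared-16", 0: b"weak-shd"}
secret = b"sensor-secret-xx"
key = derive_key(keys[1], secret)
data = b"reading=21C"
mac = hmac.new(key, data, hashlib.sha256).digest()[:8]
body = bytes(b ^ key[i % len(key)] for i, b in enumerate(data + mac))
assert dsm_decrypt(1, body, keys, secret) == data
```

A single flipped ciphertext bit changes the recovered data, the recomputed MAC no longer matches, and the packet is rejected.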

5.4.6 Tradeoffs

The communication overhead of the proposed security method depends on the

source network size. Each group of keys will have more chances of being

compromised as the network size increases [146]; this in turn increases the chances

of reconfigurations. The SEEN method accomplishes significant improvement over

a traditional rekeying approach by broadcasting shared keys without retransmissions.

Less frequent reconfigurations mean better performance. In summary, a larger network can improve performance through more efficient broadcasting. Therefore, we


can tradeoff between energy consumption and security to maximise the overall

performance.

The proposed shared key management is made robust by synchronizing clocks

among neighbours. As an example, Figure 5-4 describes the time domain and key-

slots of three sensors, i.e. A, B and C, where there is a clock skew among the three nodes.

Every UpdateKey control packet contains the time stamp to switch the shared key

(ti).

5.4.7 Required Resources for SEEN

5.4.7.1 Resources at Sensors for SEEN

We follow [140] to define the communication overhead and power consumption

theoretically, using the following equations:

CO = (Nc / ∑ Ni) × 100 (5-1)

PC1= 3(CSE + CSD) (5-2)

PC2= (TNSP + CSE) + (TNRP + CSD) (5-3)

PC3= (CSE + 2 × CSD) (5-4)

CO – Communication overhead

PC1 – power consumption during node authentication (initial phase)

PC2 – power consumption by a node during data transmission

PC3 – power consumption during the rekeying

Nc – total number of connections

Ni – number of packets transferred by node Si

CSE – computational power required by symmetric key encryption

CSD – computational power required by symmetric key decryption

TNSP – total number of sent packets

TNRP – total number of received packets
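Transcribed as code, the resource equations read as follows; equation 5-1 is reconstructed from its variable definitions (a percentage of connections over packets transferred), so its exact form is an assumption.

```python
# Equations 5-2 to 5-4 transcribed directly; equation 5-1 is a
# reconstruction from its variable definitions and is an assumption.

def comm_overhead(n_connections, packets_per_node):   # eq. 5-1 (assumed)
    # CO as a percentage of connections over total packets transferred.
    return 100.0 * n_connections / sum(packets_per_node)

def pc_auth(cse, csd):                                # eq. 5-2
    # Initial node authentication: three encrypt/decrypt rounds.
    return 3 * (cse + csd)

def pc_transmission(tnsp, tnrp, cse, csd):            # eq. 5-3
    # Power during data transmission: sent and received packet costs.
    return (tnsp + cse) + (tnrp + csd)

def pc_rekeying(cse, csd):                            # eq. 5-4
    # Rekeying: decrypt UpdateKey twice plus one RequestKey encryption.
    return cse + 2 * csd

assert pc_auth(2.0, 1.0) == 9.0
assert pc_rekeying(2.0, 1.0) == 4.0
```

Here `cse`/`csd` are the symmetric encryption/decryption costs (CSE, CSD above) in whatever energy unit the measurements use.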


The communication overhead is always computed as a percentage and considers

the total number of communications and the number of packets transferred by each

sensor (see equation 5-1). The total packet size is 74.125 bytes, whereas

the data packet size is 30 bytes. Power consumption for initial node authentication is

three times that required by both encryption and decryption processes (see equation

5-2). Each node needs a certain amount of power to participate in data transmission

and data packet encryption. The normalized form of power consumption is shown in

equation 5-3. Sensors need power to decrypt the UpdateKey and also to initiate

RequestKey during the rekeying process. Sensors always need more power to

perform the encryption and decryption process. Equation 5-4 shows the

formulations for power consumption during rekeying.

5.4.7.2 Resources at DSM for SEEN

The buffer utilisation needs to be optimised at the DSM, as the security

mechanism of a big data stream needs to be performed in near real time because of

the big data stream features [4 - 5]. Here we present a procedure to compute the

halting time of a data block in a buffer before the stream data analysis is done. Let

there be n sensors, each sending m data packets. We assume that security verification at the DSM succeeds with probability p, or is delayed with probability (1 − p). From p we can compute the Acquisition Probability A as in [152]. Based on the value of A, we can measure the resting time of each individual data block; the resting time, represented as w, is inversely proportional to the value of A and to the security verification time of the DSM.
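A sketch under explicit assumptions, since the exact closed forms of A and w are not given above: assume A = 1 − (1 − p)^(n·m) and w inversely proportional to A times the DSM's verification rate v. Both formulas are illustrative assumptions, not the thesis's definitions.

```python
# Illustrative sketch only: the closed forms below are assumptions,
# chosen to match the stated relationships (A grows with p, and the
# resting time w is inversely proportional to A and to the
# verification speed).

def acquisition_probability(p, n, m):
    # p: probability a data block passes security verification at once;
    # n sensors each send m packets.
    return 1.0 - (1.0 - p) ** (n * m)

def resting_time(p, n, m, verification_rate):
    a = acquisition_probability(p, n, m)
    return 1.0 / (a * verification_rate)  # w shrinks as A or v grows

assert acquisition_probability(1.0, 10, 5) == 1.0
assert resting_time(1.0, 10, 5, 2.0) == 0.5
```

The monotonic behaviour is what matters here: faster verification or a higher success probability both shorten how long a block sits in the buffer.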

Algorithm 5-2. Selective encryption method for big sensor data streams

Description Applying different encryption (Strong/weak) to different

sensitivity levels of data.

Input Select the encryption methods for data packets based on

different sensitivity level.

Output Protect big data stream and maintain different levels of


confidentiality based on data sensitivity.

Step 1 SEEN System setup

1.10 DSM → Si: {Ek(KSH)}, the DSM performs the centralised authentication.

1.11 Si → DSM: the ith sensor authenticates the DSM and sends this packet to the DSM for registration.

1.12 DSM → Si: the DSM generates the shared keys for both strong and weak encryption and shares them with the corresponding source sensors.

Step 2 SEEN Rekeying

SEEN uses two control packets, i.e. UpdateKey and RequestKey. UpdateKey always uses EKSH(i-1)(KSHi(1) ‖ KSHi(0)), where KSH(i-1) is the current shared key.

DSM → Si: { EKSH(i-1)(KSHi(1) ‖ KSHi(0)) }

If some sensors did not get the next shared key before δt, then the sensor uses RequestKey to get the shared key from the DSM.

Si → DSM: { EKSH(Si ‖ ti) }

Step 3 SEEN Node authentication

A new node in the sensing network uses a control packet named InitKey, i.e. Eki(Si), and sends this request packet to the DSM. The DSM then follows Step 1 using the current shared key.

Si → DSM: { Eki(Si) }, and Step 1 follows after this initial step.

Step 4 SEEN Encryption/Decryption

4.4 Encryption: Every sensor has the strong/weak encryption keys KSH(1) and KSH(0). A sensor applies KSH(1) for the authentication header and generates a new key, i.e. KSH(1)′ = KSH(1) ⊕ Ki, to encrypt the data packets.

EKSH(1)′ / EKSH(0)′ (DATA ‖ MAC)

4.5 Decryption: The DSM uses the strong-encryption shared key (KSH(1)) to decrypt the header. Based on the flag value (1/0), the DSM selects the shared key for decryption, then performs the decryption process.

KSH(1/0)′ = KSH(1/0) ⊕ Ki

DKSH(1/0)′ (DATA ‖ MAC)

5.5 Theoretical Analysis

This section provides theoretical analysis of the security scheme to show that it is

safe against attacks on authenticity, confidentiality and integrity.

5.5.1 Security Proof

We follow [4 - 5, 42 - 43] to define the following attack definitions and their

properties. Based on these attack definitions, we prove the following theorems through theoretical analysis.

Definition 1 (attack on authentication): An attacker Ma can attack on authenticity

if it is an adversary capable of monitoring, intercepting, and introducing him/herself

as an authenticated node to participate in the data stream.

Definition 2 (attack on integrity): An attacker Mi can attack on integrity if it is

capable of monitoring the data stream and trying to access and/or modify the data

block before DSM.

Definition 3 (attack on confidentiality): A malicious attacker Mc is an unauthorised party which has the ability to access or view big data streams without authorisation.

Definition 4 (replay attack): A malicious attacker Mr is an unauthorised party which

has the ability to intercept data packets and forward them later. This may cause the

loss of event detection during stream data analysis.

Theorem 1: Strong encryption (128-bit) is always safer and takes more

computational power and time than weak encryption (64-bit) in the SEEN security

model.

Proof: ECRYPT II proved that the key length of a 128-bit symmetric key provides

the same strength of protection as a 3,248-bit asymmetric key [34 - 35]. Symmetric

key cryptography is therefore a natural choice for this purpose. It has been shown that symmetric key cryptography is approximately 1000 times faster than


strong public key ciphers [34]. From [4 - 5, 34], it is comparatively easy for an

attacker to read/modify packets which are encrypted with a smaller key length.

Crypto++ Benchmarks [153] also confirm that a smaller key length always takes less

time to break or find the shared key. From the above, we conclude that the key

length determines the key domain size (which grows exponentially with key length) and hence the time required to try all possible keys (see Table 3-2). This means an attacker

needs more computational time and resources to break 128-bit compared to 64-bit.
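A quick back-of-envelope check of this argument; the 10^12 guesses-per-second rate is an assumed figure for illustration, not a measured one.

```python
# Back-of-envelope check of Theorem 1: exhausting a 128-bit key space
# takes 2**64 times as long as a 64-bit one at any fixed guess rate.

weak_space, strong_space = 2 ** 64, 2 ** 128
ratio = strong_space // weak_space
assert ratio == 2 ** 64

# At an assumed 10**12 guesses per second, the weak key space alone
# needs over 200 days; the strong one is astronomically out of reach.
weak_seconds = weak_space / 10 ** 12
assert weak_seconds > 200 * 24 * 3600
```

This is why the 64-bit key is acceptable only for low sensitivity data whose shared key is rotated long before exhaustive search completes.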

Theorem 2: DSM can easily identify delayed data packets which have been

intercepted by a replay attacker (Mr) using the SEEN security method.

Proof: A replay attack is also broadly known as playback attack, where an attacker

(Mr) intercepts the data packet(s) of data streams and forwards them later. The attacker also

repeatedly sends the data packets to block the DSM. This is carried out either by the

source sensor or by a man-in-the-middle attack.

In the SEEN security method, during encryption the source sensor always adds a

time stamp i.e. T (sending time/packet generation time) at the header part of the data

packets. The header is in the format EKSH(1)(Si ‖ T ‖ flag) and is always used for

authentication and data freshness. For every data packet, the DSM compares the

received time frame (T) with its current time (T′) to check the data freshness and to

avoid a replay attack (T′ − T ≤ ΔT). If the time difference is less than ΔT, then the data packet is accepted; otherwise it is discarded. Here ΔT is the maximum time taken to transmit data between the source sensors and the DSM. After successfully checking for

replay attack, DSM follows the data decryption process.
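The freshness test from this proof can be written as a one-line predicate; the helper name and time units are assumptions.

```python
# The freshness test from the proof above: accept a packet only if its
# time stamp T is within ΔT of the DSM's clock T'. Names are assumed.

def is_fresh(t_sent, t_now, delta_max):
    # Accept when 0 <= T' - T <= ΔT, where ΔT bounds the
    # sensor-to-DSM transit time; replayed packets arrive too late.
    return 0 <= t_now - t_sent <= delta_max

assert is_fresh(t_sent=100.0, t_now=100.4, delta_max=0.5)
assert not is_fresh(t_sent=100.0, t_now=101.0, delta_max=0.5)  # replayed
```

Packets stamped in the future (T > T′) are also rejected, since a legitimate sensor clock should not run ahead of the synchronised DSM clock by more than the skew the protocol already manages.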

Theorem 3: In the SEEN security method, an attacker Mc cannot access or view the highly sensitive data stream and cannot read the low sensitivity data stream in real time.

Proof: By following Figure 5-5 and Algorithm 5-2, it is clear that every data stream

or data packet within a stream is transmitted with a sensitivity level. Here in the

SEEN method, we consider two sensitivity levels i.e. ‘1’ for high and ‘0’ for low. For

highly sensitive data, sensors use 128-bit shared key i.e. KSH(1). The computational

hardness of the shared key is shown in Table 3-2. It shows that the most advanced

processor (Intel i7) takes decades to find all possible shared keys to perform the

decryption operation. Attacker Mc cannot read the high sensitivity data (strong

confidentiality).


For lower sensitivity data in the SEEN model, sensors use 64-bit shared key i.e.

KSH(0). Attacker Mc also needs years to get all the possible keys to decrypt the data

packets (see Table 3-2). The maximum time to update the shared key (i.e. t) is always

less than the time required to break a 64-bit key (see Table 3-2). So attacker Mc can read the low sensitivity data, but not in real time.

By following the above, we can confirm that the SEEN model maintains

confidentiality for sensitive data and partial confidentiality for low sensitivity data.

Theorem 4: An attacker Ma cannot forge the source to introduce itself as an

authenticated source and attacker Mi cannot get the shared key KSH to break data

integrity in the proposed security method SEEN.

Proof: The IDS at a sensor always monitors the sensor behaviour [141 - 142] and

reports to the DSM if it is captured by an attacker Ma. In this situation, the DSM

ignores the specific source sensor and does not consider the data packets from that

sensor for data analytics. DSM checks the authentication for each individual data

packet, where data packets arrive in the format defined in Section 5.4.5. The DSM

always applies the strong encryption shared key (i.e. KSH(1)) to decrypt and check for

authentication. After decryption, DSM compares the Si with its own database to

authenticate the source.

After a source device is authenticated, the DSM retrieves the corresponding secret

key of the sensor, i.e. Ki ← retrieveKey(Si), and checks the data sensitivity level to

select the shared key used for encryption.

Based on the data sensitivity level, the DSM performs XOR operation i.e.

KSH(1/0)′ = KSH(1/0) ⊕ Ki. The newly computed shared key will be used for data decryption, i.e. DKSH(1/0)′(DATA ‖ MAC). After data decryption, DSM compares the

MAC as an integrity check. Through the MAC check, the DSM confirms that the

integrity of the data is intact.

The major drawback (this is also applicable to all other security models) is that the

confidentiality and integrity checks can be broken with a brute force attack.

Theorem 5: The proposed SEEN requires a smaller buffer size

compared to standard symmetric key solutions (i.e. AES-128) at DSM before stream

query processing.

Proof: Following Algorithm 5-2, it is clear that the proposed SEEN security method

provides a high level of data confidentiality for sensitive data, while it provides


partial confidentiality for low sensitivity data. We decrypt the header part for

authentication (see Theorem 4) and data freshness (see Theorem 2); after successful

authentication, we decrypt the data block for integrity checks. Another important

mechanism is the different keys with the key length used for encryption/decryption.

From Theorem 1, it is clear that key length is directly proportional to security verification time, and security verification speed is inversely proportional to the buffer

required for security verification. By combining the above, we conclude that the

proposed SEEN security method needs a comparatively small buffer size. The

evaluation proof is in the following section.

5.5.2 Forward Secrecy

By following a standard symmetric key cryptography procedure, shared keys

used for encrypting data packets are used only once, until they expire (i.e. for the time period t). Thus, previously used shared keys are worthless to an intruder

even when a previously-used shared key is known to the attackers. This is one of the

major advantages of frequently changing the shared key. This is the reason we use

symmetric key cryptography over asymmetric-key cryptography. However, if an

intruder continuously monitors the data stream for a long period of time, he/she can

break the confidentiality of the low sensitivity data but not the high sensitivity data

(see Table 3-2). In order to maintain the different level of confidentiality, the SEEN

security method uses two different keys for different levels of data sensitivity. At the

same time data integrity is always maintained.

5.6 Experiment and Evaluation

In order to evaluate the security strength and efficiency of the SEEN security

method under the above specified adverse situations, we experimented in multiple

simulation environments. The experiment was conducted using the in-house

simulators on an Intel(R) Core(TM) i5-6300 CPU @ 2.40 GHz (2.50 GHz) with

8 GB RAM running on Microsoft Windows 7 Enterprise. We first verified the

proposed security approach using Scyther [119]; second, we measured the


performance of the approach using JCE (Java Cryptographic Environment) [120];

third, we computed the required buffer size to process the proposed approach using

MatLab [121] to measure the efficiency of the security method; finally, we used

COOJA simulator in Contiki OS [118] to get the network performance of SEEN.

5.6.1 Security Verification

The SEEN security protocol is simulated in the Scyther simulation environment

by using the underlying Security Protocol Description Language (.spdl). Scyther is

an automatic security protocol verification tool that can be used to check the

correctness of the security protocols. As per the Scyther model, we defined the roles

of S and D, where S is a sensing device and D is the receiver (i.e. DSM). In this

scenario, S and D have all information for encryption/decryption that is initialised in

the system setup and rekeying phase. In this simulation environment, S sends the

encrypted data packets to D for security verification. We introduced three types of

attacks. First, an attacker changes the data packet while it is in the network. In the second,

an adversary steals the identity of the source (S) and forwards the data packets to D

pretending to be S. In the third, an adversary gets the data block to analyse and tries

to read the data and replay the data packets. We experimented with 100 runs with 10

run intervals for individual claims with results as shown in Figure 5-6. Here, we

model the security method by following the previous section and used different key

sizes (i.e. 64 bits, and 128 bits) in random data packets. Here we follow the SEEN

method to update the different keys (see Table 3-2).


Results: This experiment ranges from 0 to 100 instances in intervals of 10, using

different numbers of data blocks. We checked the data integrity and confidentiality

after data packet authentication. As the key generation and distribution process is

handled by the DSM, we assumed that none of the intruders have the shared secret

key. We are using two different keys for the encryption process, i.e. KSH(0) for weak encryption and KSH(1) for strong encryption. This also confuses the intruder in

attempting to guess the key. During this experiment, we did not come across any

potential attacks at the DSM to compromise the shared key, so it is secured in terms

of confidentiality and integrity. Figure 5-6 shows the result of the security

verification experimented with in the Scyther simulation environment. Finally, we

conclude that the proposed model is secure against confidentiality and integrity

attacks.

(a) Scyther simulation result page of successful security

verification.

(b) Scyther simulation result page of successful security at DSM.

Figure 5-6: Scyther simulation result page of security verification.


In practice, attacks may be more sophisticated and efficient than brute force

attacks. Here, we model the process as described in the previous section and used

different key size (i.e. 64 bits, and 128 bits) in random data packets. The efficacy of

the proposed security method shows in two instances: (i) during the security

verification at DSM and (ii) during the neighbour authentication process. We used

Scyther, an automatic security protocol verification tool, to verify our model.

5.6.2 Performance Comparison

We used JCE (Java Cryptographic Environment) to experiment on and evaluate

the performance of the SEEN method. JCE is the standard extension to the Java

platform that provides an implementation context for cryptographic methods. The

experiment is based on the features of the JCE in 64 bit Java virtual machine version

1.6. The security verification time of the experiment is computed at the DSM. The

experiment outcomes for security verification are shown in Figure 5-7. We

performed experiments and compared security verification time with different sizes

of data packets. We compared the performance of SEEN security with the advanced

encryption standard (AES-128, AES-192), LSec and previously proposed models for

big sensing data streams i.e. DPBSV and DLSeF [4 – 5, 140].

Figure 5-7: Performance comparison SEEN method with AES-128, AES-192,

LSec, DPBSV, and DLSeF.


Results: The experiment results of the SEEN security method are better than

AES-128, AES-192 and LSec algorithms with different data packets as shown in

Figure 5-7. SEEN does not use the trusted part of sensor (i.e. TPM) and avoids

confidentiality attacks in comparison to DPBSV and DLSeF (see Table 5-2). So

even though the performance of SEEN is not as good as DPBSV and DLSeF, it is

acceptable for typical sensor network applications. The performance of

SEEN shows that it is more efficient and faster than AES-128, AES-192 protocols

while providing the same level of security and removing some of the unrealistic

assumptions of DPBSV and DLSeF.

5.6.3 Required Buffer Size

This experiment for the required buffer size at DSM was carried out using a

MATLAB Simulation tool. The buffer size is based on the security verification time

at DSM (from Figure 5-7) with respect to different velocities of big data streams; the resulting buffer requirements are shown in Figure 5-8.

Here we compared the SEEN security method with standard AES-128, AES-192,

LSec, DPBSV and DLSeF (see Figure 5-8). The velocity of big data streams starts

from 50 to 300 MB/s with a 50 MB/s interval. The required buffer size for SEEN is always smaller than that of the AES-128 algorithm at different rates of incoming data.

Figure 5-8: Efficiency comparison of the required buffer size at the DSM for security processing.

Figure 5-8 shows the minimum buffer size required at the DSM for the SEEN

method in comparison with AES-128, AES-192, LSec, DPBSV and DLSeF. The

performance comparison proves that the SEEN method requires less buffer and is

efficient in performing security verification without compromising any security

properties.

5.6.4 Network Performance

We tested the SEEN protocol using a COOJA simulator in Contiki OS to get the

network performance (i.e. communication overhead and power consumption) [118].

We took the two most common types of sensor (i.e. Z1 and TmoteSky sensors) for

network simulation. In this experiment, we checked the performance while

computing and distributing the shared key.

For network simulation, we took a random area to deploy 51 nodes (i.e. 50

sensors and 1 DSM) in a COOJA simulation environment. We took initial battery

power of an individual sensor node as 1×10^6 J, power consumption for transmission as 1.6 W and power consumption for reception as 1.2 W. Apart from these, we follow

the default properties of Z1 and TmoteSky sensors. We assume that the size of each

Figure 5-9: Energy Consumption.

Page 173: Towards Efficient and Lightweight Security Architecture ... · Towards Efficient and Lightweight Security Architecture for Big Sensing Data Streams by Deepak Puthal M. Tech. (National

156

data packet is 30 bytes, nonce 23 bits, secret key of 64/128 bits and token 4 bytes for

the simulation [140].

In order to compute the communication overhead, the simulation used data packets of 30 bytes sent at continuous intervals. We follow Equation 5-1 to obtain the communication overhead, computed as a percentage (%) with respect to the number of data packets, as shown in Table 5-3. According to the network properties, the communication overhead is inversely proportional to the number of packets in the network, and the simulation results in Table 5-3 show this behaviour.
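The inverse relationship can be sketched as a fixed-cost amortisation: the control packets exchanged per connection (authentication and key distribution) are a fixed cost spread over more data packets. This mirrors the trend of Table 5-3 under assumed packet sizes; it is not Equation 5-1 itself, and the 90-byte control cost is a hypothetical value.

```python
# Sketch of why communication overhead falls as the number of data
# packets grows: the per-connection control packets are a fixed cost,
# amortised over more 30-byte data packets. Mirrors Table 5-3's trend.
def overhead_percent(num_data_packets: int,
                     control_bytes: int = 90,     # hypothetical fixed cost
                     data_packet_bytes: int = 30) -> float:
    total = control_bytes + num_data_packets * data_packet_bytes
    return 100.0 * control_bytes / total

values = [overhead_percent(n) for n in (10, 20, 40, 80)]
# Overhead decreases (roughly inversely) with the packet count.
assert all(a > b for a, b in zip(values, values[1:]))
```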

For every connection, SEEN exchanges control packets for source/DSM authentication and shared key distribution based on the packet sizes specified above. This is an acceptable tradeoff between energy and security for the sensor node. The simulation results for energy consumption are shown in Figure 5-9. The SEEN protocol requires extra battery power for the network authentication, but the difference is very small. The energy consumption of the SEEN protocol remains almost the same even as the network size increases. We simulated the scenario with up to 50 nodes, in intervals of 10 nodes, as shown in Figure 5-9.
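A simple energy accounting, using the simulation's stated powers (1.6 W transmit, 1.2 W receive) and 30-byte packets, illustrates why the authentication overhead barely dents the 1×10⁶ J budget. The 250 kbit/s radio rate below is an assumption typical of such motes, not a figure from the thesis.

```python
# Sketch: energy drawn from a node's 1e6 J budget when sending/receiving
# 30-byte packets at the simulation's stated powers (1.6 W tx, 1.2 W rx).
# The 250 kbit/s radio rate is an assumption, not a thesis parameter.
BIT_RATE = 250_000          # bits per second (assumed)
P_TX, P_RX = 1.6, 1.2       # watts, from the simulation settings

def packet_energy_j(packet_bytes: int, power_w: float) -> float:
    airtime_s = packet_bytes * 8 / BIT_RATE
    return power_w * airtime_s

tx = packet_energy_j(30, P_TX)   # roughly 1.5e-3 J per transmission
rx = packet_energy_j(30, P_RX)
# Even a million packet exchanges consume under 1% of the 1e6 J budget.
assert (tx + rx) * 1_000_000 < 0.01 * 1e6
```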

Table 5-2 Performance and Properties of Security Solutions

                             AES    DPBSV  DLSeF  SEEN
Authenticity
Integrity
Confidentiality              ✓                    ✓
Trust on Sensor Node (TPM)
Computation                  HIGH   LOW    LOW    LOW

Table 5-3 Communication overhead of SEEN protocol

Number of packets (NP)         10   20   30    40   50   60   70   80
Communication overhead CO (%)  25   23   12.8  11   8    6.8  6    6


From all the above security analyses and experiments, we conclude that the proposed security method (i.e. SEEN) is secure against multi-level confidentiality and integrity attacks, and efficient in terms of security verification speed and required buffer size at the DSM (compared to AES-128, AES-192 and LSec). Table 5-2 compares the security properties of SEEN with AES-128 and the existing DPBSV and DLSeF. It clearly shows that the proposed method provides the same level of security as AES-128 while reducing computational overhead.

5.7 Summary

This chapter proposed a Selective Encryption (SEEN) method to maintain confidentiality levels of big sensing data streams together with data integrity. In SEEN, the DSM independently maintains intrusion detection and shared key management as the two major components. The method is based on a symmetric key block cipher and uses multiple shared keys for encryption. By employing the cryptographic function with selective encryption, the DSM rekeys efficiently without retransmissions. The rekeying process never disrupts ongoing data streams or encryption/decryption. SEEN supports source node authentication and shared key recovery without incurring additional overhead. We evaluated the performance of SEEN through security analyses and experimental evaluations. We found that the SEEN method provides significant improvements in processing time and buffer requirements, and protects data confidentiality and integrity from malicious attackers.


Chapter 6

Access Control Framework for Big

Sensing Data Streams

Chapter 5 addressed the important step of security verification by providing multilevel security based on data sensitivity levels in big data streams. Another important step is to control information leakage after the security verification of big data streams. We refer to this as an access control or information flow control problem over big sensing data streams. To address this problem, we propose lattice-based information flow control over big sensing data streams. We initialise two static lattices, i.e. a sensor lattice for the source sensors and a user lattice for the users or query processors. We use static lattices so that the information flow model can be processed faster, because we are dealing with big data streams, i.e. data streams of high volume and velocity. The experimental results of the proposed information flow model show that it can handle incoming big data streams with low latency and a small buffer requirement.

6.1 Introduction


Data Stream Management Systems have been increasingly used to support a wide range of real-time applications (e.g. military applications, network monitoring, battlefield monitoring, sensor networks, health monitoring and financial monitoring) [115]. Most of the above applications need to protect sensitive data from unauthorised access. For example, in battlefield monitoring, the position of soldiers should only be accessible to the battleground commanders. Even if data are not sensitive, restricting access to them may still be of commercial value, so there is a need to classify the types of data/sources accessible to end users. In another example, a financial monitoring service, stock prices are delivered to paying clients based on the stocks they have subscribed to. Hence, there is a need to integrate access control mechanisms into a stream manager. All the above applications also deal with stream data. As a first step in this direction, [154] presented a role-based access control model specifically tailored to the protection of data streams. The objects to be protected are essentially views (or rather queries) over data streams. The model supports two types of privileges: a read privilege for operations such as selection, projection and join, and aggregate privileges for operations such as min, max, count, avg and sum. Another important issue to be addressed is access control enforcement. This issue is further complicated by the fact that access control mechanisms must operate in real time on data of high volume and velocity. Nonetheless, one of our goals is to develop a framework which is as lightweight as possible and independent of the target stream engine.

One of the key decisions when developing an access control mechanism is the strategy adopted to enforce access control. In this respect, three main solutions can be adopted: preprocessing, post processing, and query rewriting. Preprocessing is a naïve way to enforce access control in which streams are pruned of the unauthorised tuples before entering the user query. The main drawback of this simple strategy is that it works well only for very simple access control models, which, unlike ours, do not support policies that apply to views. We believe that this is an essential feature to be supported, because it allows the specification of very useful access control policies. For instance, if preprocessing is adopted, it is not possible to enforce a policy authorising a captain to access the average heartbeats of his/her soldiers only during the time of a certain action and/or only of those soldiers positioned in a given region. In contrast, post processing first executes the original


user query, and then it prunes the unauthorised tuples from the result before delivering the resulting stream to the user. Like preprocessing, this strategy has the drawback that it does not support access control policies defined over portions of combined streams. Building on the lattice structure, we design novel secure operators (namely, Secure Read, Secure View, Secure Join, and Secure Aggregate) that filter out from the results of the corresponding (non-secure) operators those data instances that are not accessible according to the specified access control policies.

The first work on access control over data streams [115] supports a very expressive access control model and, at the same time, is as independent as possible of the target DSMS. In order to address the aforementioned challenge, we have proposed an information flow control model using static lattices for the source sensors and the users/query processors. Our method is designed to be lightweight in order to handle big data streams. The main contributions of this chapter can be summarised as follows:

• We have designed and developed a novel information flow model to control access to big data streams using a lattice structure.

• Our proposed model uses two static lattice structures to map the data quickly. Both lattice structures are partitioned into the three data sensitivity levels of the SEEN method (from the last chapter).

• We validate our proposed method by theoretical analyses and experimental results.

• We evaluated the performance of the proposed model on a real-time Kafka cluster.

The remainder of this chapter is organised as follows: background studies are reviewed in the next section; Section 6.3 presents the system design considerations, including definitions and QoS requirements; Section 6.4 describes the access control mechanism over big data streams; Section 6.5 evaluates the performance and efficiency of the model through experimental results; and Section 6.6 summarises the contributions of this chapter.


6.2 Background Studies

In 2005, Stonebraker et al. [24] first highlighted the eight requirements of real-time stream processing, which make stream processing research more challenging than, and different from, batch processing. In 2009, Nehme et al. [82] proposed a spotlight architecture to highlight the need for security in data streams, differentiating the security requirements on the data side (called data security punctuations) from the query-side security policies (called query security punctuations).

6.2.1 Stream Processing

The Data Stream Management System known as the STanford stREam data Manager (STREAM) was initially developed by Arasu et al. in 2003 [23]. STREAM is designed to deal with high-velocity data rates and substantial numbers of continuous queries through thoughtful resource allocation. Most of the work carried out on Data Stream Management Systems addresses issues ranging from theoretical modelling and analysis to building comprehensive systems that deal with high-speed data streams and respond in real time (or near real time). Representative systems include STREAM [23], Aurora [155], and Borealis [26]. In data stream management systems like STREAM [23], Aurora [155], and Borealis [26], queries issued by the same client at the same time can share Seq-window operators.

According to the STREAM framework, Seq-window operators are reused by queries on indistinguishable streams. Rather than developing the sharing of parts between query plans, Aurora research focuses on providing better execution over vast numbers of queries. Aurora achieves this by clustering operators as a basic performance unit. In Borealis, the input information criteria from query processing can be shared and changed by newly arriving queries. StreamCloud is a large-scale, reliable streaming system designed to handle large-scale data streams on clouds [139]. StreamCloud utilises a new parallelisation strategy that separates input queries into subqueries allocated to independent sets of nodes to reduce the distribution overhead. Even though numerous methodologies focus on scheduling and revising for QoS, distributing execution and computation by the same user at various times, or by various users at the same time, is not supported in stream processing engines. Other than common source Seq-windows as in DSMS, sharing intermediate computation results is a superior approach to improving performance. The focus of this research was on the performance of query processing, but not much on the security issues in data streams. Nehme et al. [82] highlighted the security aspects of data streams; the following subsection describes the security issues in detail.

6.2.2 Stream Security

There have been several recent works on securing data streams [82][156][157][158][154][159][160][161], focusing on query security punctuations, i.e. access control over data streams. Although these frameworks support secure processing, they are unable to prevent illegitimate data streams or provide data security. Punctuation-based enforcement of access control on streaming data is proposed in [161]. Access control policies are retransmitted each time, using one or more security punctuations, before the real data are transmitted. Both kinds of punctuations are processed by a special filter (StreamShield) in the query plan. Secure query processing in a shared manner is proposed in [156]. Building on the StreamShield concept, the authors present a three-phase system to enforce access control without introducing any special operators, rewriting queries, or affecting QoS. Supporting role-based access control through query rewriting techniques is proposed in [154][158]. Query plans are reorganised, and policies are mapped to a set of map and filter operations to enforce the access control policies. The architecture in [159] utilises a post-query filter to enforce access control policies at the stream level. The filter applies the security policies after query processing but before a client receives the results from the SPE. Designing SPEs that check multilevel security constraints has been addressed by the authors of [157]. Xie et al. [160] adopt a Chinese Wall policy to protect against sensitive data disclosure at the DSMS.

Page 180: Towards Efficient and Lightweight Security Architecture ... · Towards Efficient and Lightweight Security Architecture for Big Sensing Data Streams by Deepak Puthal M. Tech. (National

163

The focus of this research was on query security punctuations; however, data security punctuation, i.e. end-to-end security between the source and the SPE, is our mission. The following subsection describes the Chinese Wall policy, on which our information flow model is based.

6.2.3 Chinese Wall Policy

Brewer and Nash [162] first demonstrated how the Chinese Wall policy can be used to prevent consultants from accessing information belonging to multiple companies in the same conflict of interest class. However, the authors did not distinguish between human users and subjects, i.e. processes running on behalf of users. Consequently, the proposed model is very restrictive, as it allows a consultant to work for one company only. Sandhu [114] improves upon this model by making a clear distinction between users, principals, and subjects, defines a lattice-based security structure, and shows how the Chinese Wall policy complies with the Bell-LaPadula model [163].

6.3 Design Consideration

In this section we present the system architecture, access control definitions, QoS parameters and adversary model for information flow control over big data streams.

6.3.1 System architecture

This section presents an example application that motivates the need for secure stream processing in adaptive computing environments. We have a security verification model that aims to prevent and detect attacks in real time at the DSM before the data reach the cloud. Such a service (i.e. security verification) provides warnings about various types of attacks, often involving multiple sources or data streams in transit.

Figure 6-1 shows a multi-tier architecture for big data stream access control using a lattice model. The architecture includes source sensing devices that transmit data to the DSM through wireless networks, together with a lattice model that controls data stream access using the information flow control model. Several applications, such as military monitoring and healthcare, need to protect data against unauthorised disclosure [138][140]. Various types of auditing may take place in the data centre. The first level is the information flow control after security verification at the DSM, represented by the access control over data streams for specific users or query processors. In this phase, the access control activities on the data streams are analysed in isolation. The next level is the data processing in the SPE and stream query processing. The stream data processing is shown with connecting dark arrows, which depict the internal communication within the cloud. For further information on stream data processing in the data centre, refer to [128].

Along with this, we follow the SEEN method (from the last chapter) to deploy Intrusion Detection Systems (IDS) at the source and at the cloud data centre. A sensor-based IDS monitors a sensor's behaviour and generates alerts on potentially malicious onboard activities and network traffic [142]. An IDS on the source side can be set inline, attached to a spanning port of a sensor, so that it sees all the packets we wish it to monitor [142]. An IDS in a cloud data centre generally computes inter- and intra-audit record patterns; this can guide the data gathering process and simplify feature extraction from audit data [143]. This technique analyses system vulnerabilities quickly and accurately.

In our architecture, the data streams are always in encrypted form when they arrive at the DSM. The DSM performs the security verification using the shared keys (initiated and distributed by the DSM in the SEEN method). The shared key selection is based on the Flag Value (FV) associated with the incoming data packets. After security verification, the DSM sends the data to the stream query processor and end users. As data packets come with different sensitivity levels or flag values, access to the data streams must be restricted.

Due to the characteristics of streaming data, there are a number of inherent

challenges that make continuous access control enforcement difficult. First,

a common characteristic of data streams is their high volume and high velocity of

data. It is not feasible to store all streaming data tuples with all their security

restrictions (which may be numerous and of fine granularity, as a result of large


number of users and various preferences for security) and take random accesses as

done in traditional databases. One scan of data and its security restrictions with

compact memory usage is required. Second, due to large data volume and velocity,

the speed of access control enforcement algorithms must be extremely fast, to be

synchronised with the incoming data. Third, given that continuous queries are

typically long running, and due to users’ mobility, wireless connectivity and changes

in preferences, data and its “sensitivity” are likely to change during the query

execution lifetime. Thus, the access control mechanism must be adaptive to possibly

very frequently changing and quite complex security policies. The foremost

challenge is the speed of enforcement. The security policies must take effect

immediately and with the correct precision to prevent any information leaks that

may occur, when access is no longer authorised, and to ensure that the access to data

is not denied, when an access privilege has in fact been granted, especially when it is

crucial to view the data (e.g. in case of an emergency). Finally, since the results in

streaming environments are expected to be produced in near real-time and security-

related processing is nothing but an added “overhead” compared to traditional

continuous query processing, the cost of the access control enforcement mechanism

must be as low as possible, not to decrease the utility of DSM.

Considering the above features and limitations of big data streams, we defined our architecture. Figure 6-1 gives an overview of the access control architecture using a lattice model. We divide the architecture into three parts when describing the access control. First, a set of sensors is deployed in the source area to sense data and send them back to the cloud data centre for analysis and decision making. Every sensor is associated with a static lattice (we call it the sensor lattice (A1, B1, C1)) with a predefined sensitivity level. Every sensed data packet, associated with an FV, is sent to the cloud data centre. Second, the DSM performs the security verification before the data reach the data centre/query processor. The DSM performs security verification over big data streams to ensure data confidentiality, integrity and authenticity. Once the data are verified for their originality, the DSM pushes the data streams to the query processor. Finally, user access to the big data streams is controlled using the lattice model. At this step the DSM pushes data to a set of users and query processors. We also maintain a static lattice (we call it the user lattice (A2, B2, C2)) with predefined sensitivity levels for users and query processors. This user and query processor classification is based on access to the sensitivity levels of the data streams. The lattice mapping always satisfies a partial order relation, whereby a class of users/query processors can access data of the same sensitivity level or less sensitive data. They are not allowed to access more sensitive data.
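The lattice access rule just described can be sketched as a partial-order check: a user class may read data at its own sensitivity level or below, never above. This is a minimal illustration under assumed semantics; the numeric ranks and the `can_access` helper are hypothetical, with the class names borrowed from the sensor lattice (A1, B1, C1) and user lattice (A2, B2, C2) above.

```python
# Sketch of the lattice access rule: a user/query processor may access
# data of the same or lower sensitivity, never higher. The rank numbers
# are illustrative, not taken from the thesis.
SENSOR_LATTICE = {"A1": 3, "B1": 2, "C1": 1}   # data sensitivity (high..open)
USER_LATTICE   = {"A2": 3, "B2": 2, "C2": 1}   # user/query-processor clearance

def can_access(user_class: str, data_class: str) -> bool:
    """Partial-order check: clearance must dominate data sensitivity."""
    return USER_LATTICE[user_class] >= SENSOR_LATTICE[data_class]

assert can_access("A2", "C1")        # high-clearance user reads open data
assert not can_access("C2", "A1")    # open-level user blocked from sensitive data
```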

6.3.2 Definitions

In the following, we present an information flow model for big data streams to protect against improper leakage and disclosure. The model is adapted from the lattice structure for the Chinese Wall proposed by Sandhu [114], and we follow [160] to state the definitions related to access control over big data streams using a lattice model.

We have a set of data classes that provide access identification. These classes are partitioned into conflict of interest classes based on the data access level. Classes provide access to the same or lower-level classes. Consequently, it is important to protect against disclosure of sensitive information to unauthorised users/query processors. We begin by defining how the conflict of interest classes are represented.

Definition 1. [Conflict of Interest Class Representation:]

The set of companies providing services to the cloud is partitioned into a set of n conflict of interest classes, which we denote by COI1, COI2, . . . , COIn (here n = 3). The conflict of interest classes are ordered as COI1 ≥ COI2 ≥ COI3.

Figure 6-1: Overview of access control of big data streams using a lattice model.


We next define the security structure of our model. Each data stream, as well as the

individual tuples constituting it, is associated with a security level that captures its

sensitivity. The security level associated with a data stream dictates which entities

can access it. An input data stream generated by sensors offering some service has a

security level that captures the organisational information. Input streams may be

processed by the stream processor to generate derived streams. Before describing

how to assign security levels to derived data streams, we show how security levels

are represented.

Definition 2. [Security Level Representation:]

A security level is represented as an n-element vector [i1, i2, . . . , in], where ij ∈ COIj ∪ {⊥} ∪ {T} and 1 ≤ j ≤ n. ij = ⊥ signifies that the data stream does not contain information from any sensors in COIj; ij ∈ COIj denotes that the data stream contains information from the corresponding source sensor in COIj.

Consider the case where we have three COI classes, namely COI1, COI2, and COI3. The stream generated by a sensor in COI1 has a security level of {1}. Similarly, the stream generated by source sensors in COI2 has a security level of {2}. Finally, the stream from sensors in COI3 has a security level of {3}. We next define the dominance relation between security levels.

Definition 3. [Dominance Relation:]

Let L be the set of security levels and L1, L2 and L3 be three security levels, where L1, L2, L3 ∈ L. We say that security level L3 is dominated by L2, and similarly that L2 is dominated by L1; all these security levels satisfy a partial order relation.
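Definitions 2 and 3 can be sketched with an element-wise dominance test over security level vectors. This is an assumed reading of the definitions, not the thesis's code: `BOT`/`TOP` stand for ⊥ and T, and the rule that T dominates everything while ⊥ is dominated by everything is an interpretation of the vector representation above.

```python
# Sketch of Definitions 2 and 3: a security level is an n-element vector
# over COI classes, with BOT (⊥) meaning "no information from this COI"
# and TOP (T) dominating everything. The element-wise dominance test is
# an assumed reading of the definitions, not the thesis's exact rule.
BOT, TOP = "bot", "top"

def elem_dominates(a, b) -> bool:
    """Does element a dominate element b within one COI position?"""
    if a == TOP or b == BOT:
        return True
    return a == b

def dominates(l1, l2) -> bool:
    """L1 dominates L2 iff it dominates in every COI position."""
    return all(elem_dominates(a, b) for a, b in zip(l1, l2))

# A stream carrying COI1 information dominates an empty stream, but a
# COI2-only stream does not dominate a COI1-only stream:
assert dominates([1, BOT, BOT], [BOT, BOT, BOT])
assert not dominates([BOT, 2, BOT], [1, BOT, BOT])
```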

6.3.3 QoS Requirements

Constraint 1: Latency

Continuous big data streams, and their mapping to end users or query processors, have to satisfy certain QoS requirements with regard to performance, memory usage, and accuracy. In this work we consider only one QoS performance metric, namely latency. Latency is the amount of time it takes for a data instance to be processed through the lattice structure, including any wait time incurred. Thus, it is the duration from the time a data instance arrives at the leaf node of the operator tree to


reach the output buffer of the root node. Note that data instance latencies are applicable only to data that are used in the output computation. We follow [164] in computing the latencies in our work, as follows.

Let pi = <vi1, vi2, . . . , vin> be one such path in an operator tree, where vij, 1 ≤ j ≤ n, denotes a vertex in the operator tree. The latency of vij, denoted by latency(vij), depends on the operator type, the specific algorithm computing the operator, the waiting time encountered in the queues, and the window size for a blocking operator. The latency of (vik, vi(k+1)), denoted by latency(vik, vi(k+1)), depends on the size of the results sent from vertex vik to vertex vi(k+1) and on the bandwidth of the channel connecting vik to vi(k+1). The tuple latency along path pi, denoted by latency(pi), depends on the processing latency at each vertex and the communication latency at each link in this path.

Let m be the number of operator paths in the set of operator trees. The total tuple latency of the system, denoted by system_latency, adapted from [165], is computed as follows:

system_latency = Σᵢ wᵢ · latency(pᵢ), for i = 1, . . . , m

where wi is the ratio of the number of output tuples along path pi to the number of output tuples from the set of operator trees. A QoS requirement may be that the system latency be less than some given threshold value, denoted by threshold. Thus, the constraint is system_latency ≤ threshold.
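The weighted system latency can be sketched directly from its definition: each operator path contributes its tuple latency, weighted by its share of the output tuples. The path latencies, weights and threshold below are hypothetical values for illustration only.

```python
# Sketch of the weighted system latency: each operator path contributes
# its tuple latency, weighted by its share of the output tuples.
def system_latency(path_latencies, weights):
    """system_latency = sum_i w_i * latency(p_i)."""
    assert abs(sum(weights) - 1.0) < 1e-9   # weights are output-tuple ratios
    return sum(w * l for w, l in zip(weights, path_latencies))

# Three hypothetical paths (latencies in ms) and their output shares:
lat = system_latency([5.0, 12.0, 8.0], [0.5, 0.3, 0.2])
assert abs(lat - 7.7) < 1e-9

THRESHOLD_MS = 10.0
assert lat <= THRESHOLD_MS   # QoS constraint: system_latency <= threshold
```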

Constraint 2: Processing/Storage Capacity of Nodes

Since stream processing nodes are resource constrained, it is important to calculate the resource utilisation of the nodes during information flow through the lattice. We use the notation nij to signify that operator tree vertex vj is executing at node i. Recall that cpu(i) and memory(i) indicate the available CPU and memory of node i. Let proc(i) and mem(i) denote, respectively, the processing cost and storage cost incurred at node i due to the execution of the various operators. Following [164], proc(i) is the sum of the processing costs of the operators executing at node i, and mem(i) is the sum of their storage costs. Note that proc(i) ≤ cpu(i) and mem(i) ≤ memory(i) at any given point of time. In the context of our example, node N8 may not have the capacity to perform multiple select operations.
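Constraint 2 can be sketched as a feasibility check on an operator placement: the summed processing and storage costs of a node's assigned operators must stay within its capacities. The cost and capacity figures below are hypothetical, and `placement_ok` is an illustrative helper, not a function from the thesis.

```python
# Sketch of Constraint 2: the load placed on a node by its assigned
# operators must not exceed its available CPU and memory, i.e.
# proc(i) <= cpu(i) and mem(i) <= memory(i). Costs are hypothetical.
def placement_ok(assigned_ops, cpu_capacity, mem_capacity) -> bool:
    proc = sum(op["cpu"] for op in assigned_ops)   # proc(i)
    mem = sum(op["mem"] for op in assigned_ops)    # mem(i)
    return proc <= cpu_capacity and mem <= mem_capacity

ops = [{"cpu": 0.3, "mem": 64}, {"cpu": 0.4, "mem": 128}]  # two operators
assert placement_ok(ops, cpu_capacity=1.0, mem_capacity=256)
# A constrained node (like N8 in the example) cannot host them all:
assert not placement_ok(ops, cpu_capacity=0.5, mem_capacity=256)
```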

6.3.4 Adversary Model

In our architecture (Figure 6-1), we assume that a large number of sensor nodes are the sources of the big data streams. These sensors are fully connected and communicate with the DSM through a wireless medium. We assume that the DSM is aware of the network topology and of the initially deployed nodes from the SEEN method (from the previous chapter). We also assume that an IDS is positioned at each source device and at the cloud data centre, so that the source sensors and the cloud data centre are capable of detecting packet-loss attacks and data modifications [137]. The DSM is treated as fully secured and protected in our model, as it resides at the cloud data centre, where we perform the security verification over big data streams. After successful security verification, we implement the access control mechanism over big data streams to protect against information leakage and unauthorised access.

There are several ways of attacking big sensing data stream access:

• In applications such as health monitoring, a patient may not want any unauthorised user or query processor to access his/her health data. Here, privacy protection of personal health data is crucial.

• We have created three sensitivity levels for data as well as for users (i.e. high sensitivity, low sensitivity, open access). Data should not be accessible to lower-level users; rather, they should be accessible only at the same and higher levels.

• The data level in data streams should not be modified in transit between the source sensor and the DSM.

Each node whose IDS detects a packet-loss attack will investigate the loss; we assume the investigating source device to be trustworthy and not to report any false response. This assumption is particularly important for the Majority Voting algorithm adopted as part of our approach. However, we will also present a variant of this algorithm that relaxes this constraint, and is thus able to tolerate up to a certain number of colluding investigating source nodes. This is solved using the SEEN method before the access control mechanism is implemented.

6.4 Access Control Model

According to Sandhu’s definition on lattice based access control, users are

defined as humans, subjects are processes and objects are files [114]. We follow the

same way to define our system, where users are humans and subjects are query

processors (QP) and objects are data blocks after security verification at DSM. We

use a standard five steps/stages process for the information flow control model. The

five stages are as follows:

Stage 0: structure module;

Stage 1: information flow between the levels;

Stage 2: recursive lattice construction;

Stage 3: conflict of interest;

Stage 4: decision over data access.

Following these stages, the information flow control policies specify the conditions under which information may be exchanged with, or accessed by, users and query processors.
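As a minimal sketch of Stage 4 (decision over data access), the check below grants a user or query processor access only to data at its own sensitivity level or below. The level names are illustrative assumptions standing in for the high sensitivity, low sensitivity and open access levels used throughout this chapter.

```python
# Hypothetical sketch of the Stage 4 access decision; the level names
# are illustrative, mirroring the chapter's three sensitivity levels.
LEVELS = {"open": 0, "low": 1, "high": 2}

def may_access(user_level: str, data_level: str) -> bool:
    """Grant access when the user's clearance dominates the data's level."""
    return LEVELS[user_level] >= LEVELS[data_level]
```

For example, a high-sensitivity user may read low-sensitivity data, but an open-access user may not read high-sensitivity data.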

From the previous chapter’s description of security verification at the DSM, sensors always generate data packets in the format {DATA; 1/0; Si; Si/DSM}, where DATA is the encrypted payload, 1/0 is the flag value (FV) defining the data sensitivity level, Si identifies the source of the data and, finally, Si/DSM indicates who is authorised to modify the data packet. As the shared keys are generated and distributed by the DSM, the DSM always has the authority to access and modify data packets. The source sensor (Si) can also access and modify the data packets, as it generates them and encrypts them using authenticated shared keys. After security verification at the DSM, we check the flow model (FM) to determine access control. This flow model controls access and information flow, and exposes data packets only to authenticated users and query processors. We keep the flow model simple and define static lattices for lightweight processing over big data streams. There are three ways of managing flow, namely no management, centralised management and distributed management. We adopt centralised management at the DSM after security verification. We define our flow model as follows:

FM = <S, O, SC, →>

where: S = Subjects

O = Objects

SC = Security Classes

→ = Can-flow relation on SC
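The packet format and the FM tuple described above can be sketched as plain data structures. The field names, container names and class ranking below are illustrative assumptions, not taken from the thesis implementation.

```python
from dataclasses import dataclass

# Sketch of the packet format {DATA; 1/0; Si; Si/DSM} described above.
@dataclass
class StreamPacket:
    data: bytes   # DATA: the encrypted payload
    fv: int       # flag value: 1 = high sensitivity, 0 = low sensitivity
    source: str   # Si: the source sensor of the packet
    owner: str    # Si/DSM: who is authorised to modify the packet

# The flow model FM = <S, O, SC, ->> as plain containers.
subjects = {"QP1", "QP2"}            # S: query processors
objects: list = []                   # O: verified data blocks at the DSM
security_classes = ["C", "B", "A"]   # SC: open access < low < high

def can_flow(src: str, dst: str) -> bool:
    """The -> relation: information may flow to the same or a higher class."""
    order = {c: i for i, c in enumerate(security_classes)}
    return order[src] <= order[dst]
```

Since the model is read-only, no operations component is carried alongside the tuple.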

We did not include an operations component in our FM, because our focus is solely on reading (accessing) the data stream, not writing to it. We define a static lattice for sensors, which labels incoming data streams, and a static lattice for users, which defines the access class for both users and query processors. The lattice structure with its access policy is shown in Figure 6-2. The lattice is a Directed Acyclic Graph (DAG) with a single source, and information is permitted to flow from a lower class to an upper class. We divide each lattice into three classes, i.e. {A, B, C}, where subscript 1 denotes the user lattice, i.e. {A1, B1, C1}, and subscript 2 denotes the sensor lattice, i.e. {A2, B2, C2}. We define A as the highest class (i.e. for high sensitivity information), B as the middle class (i.e. for low sensitivity data) and C as the lowest class, for open access information.

Figure 6-2: Lattice model for data access

From the previous chapter, the source sensor (Si) uses two different shared keys, i.e. KSH(1) and KSH(0), for data packet encryption. Sensors use KSH(1) for strong encryption to protect high sensitivity data and append FV = 1 to the data packet to indicate the data sensitivity level and the shared key used for encryption, whereas sensors use KSH(0) for weak encryption and append FV = 0 for low sensitivity data.

Finally, sensors do not encrypt data packets carrying open access data. The DSM inspects the FV associated with each data packet to determine which shared key to use for decryption and which sensitivity class to apply for access control. The FV may be modified while data is in transit between the source sensor and the DSM. In such cases, the DSM cannot decrypt the data packets using the current shared key and drops them, as they have been modified before reaching the DSM; these data therefore never enter the sensor lattice structure and are never mapped into the user lattice.
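The drop-on-tamper behaviour described above can be illustrated with a toy sketch. The XOR "cipher", the key bytes and the leading marker byte used as an integrity check are all assumptions for illustration; the actual scheme uses the symmetric block cipher and verification machinery from Chapter 5.

```python
# Toy sketch only: XOR stands in for the real block cipher, and a known
# leading marker byte stands in for the real integrity verification.
KEYS = {1: 0x51, 0: 0x0F}   # KSH(1): strong key, KSH(0): weak key (illustrative)

def sensor_send(payload: bytes, fv: int) -> bytes:
    """Encrypt with the key selected by FV and append FV in the clear."""
    body = bytes(b ^ KEYS[fv] for b in payload)
    return body + bytes([fv])

def dsm_receive(packet: bytes):
    """Decrypt using the key named by FV; return None (drop) on mismatch."""
    body, fv = packet[:-1], packet[-1]
    plain = bytes(b ^ KEYS[fv] for b in body)
    # stand-in integrity check: valid payloads start with the marker b"S"
    return plain if plain.startswith(b"S") else None
```

A packet whose FV is flipped in transit is decrypted under the wrong key, fails the check and is dropped, so it never reaches the sensor lattice.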

Figure 6-2 shows the access policy, where a class in the user lattice has access to the same and lower-level classes in the sensor lattice. We follow Sandhu’s modified Chinese Wall model for information flow control [114] to define the conflict of interest between the classes. This access policy always satisfies the reflexive, antisymmetric and transitive properties (i.e. it is a partial order). A partial ordering ≤ on a set L is a relation where:

∀a ∈ L, a ≤ a holds (reflexive)

∀a, b ∈ L, if a ≤ b and b ≤ a, then a = b (antisymmetric)

∀a, b, c ∈ L, if a ≤ b and b ≤ c, then a ≤ c (transitive)
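The three properties can be checked mechanically for the three-class can-flow relation. The sketch below assumes the ranking C ≤ B ≤ A from the lattice description and is illustrative only.

```python
from itertools import product

# Rank the classes: C (open access) <= B (low) <= A (high).
rank = {"C": 0, "B": 1, "A": 2}
classes = list(rank)

def leq(a: str, b: str) -> bool:
    """a <= b: information in class a may flow to class b."""
    return rank[a] <= rank[b]

# reflexive: every class can flow to itself
reflexive = all(leq(a, a) for a in classes)
# antisymmetric: mutual flow implies equality
antisymmetric = all(a == b or not (leq(a, b) and leq(b, a))
                    for a, b in product(classes, repeat=2))
# transitive: flows compose
transitive = all(leq(a, c) or not (leq(a, b) and leq(b, c))
                 for a, b, c in product(classes, repeat=3))
```

All three checks succeed for this ranking, confirming that the can-flow relation is a partial order.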

Figure 6-3: Experiment Setups


We follow this partial order relation between the lattice classes to define access control over big data streams; this corresponds to the query security punctuations (qsps) of data streams.

6.5 Experimental Evaluation

6.5.1 System setup

This section presents the experimental evaluation of our access control mechanism. The specification of the machine used for benchmarking is given in Table 6-1. Since Kafka currently runs as a single-machine cluster, the benchmarking utilised only a single CPU core of the machine. The experiments were conducted using a Java application, i.e. a producer in the Kafka cluster, which we term the Dataset Reader. The Dataset Reader reads and parses text files and turns them into a data stream using the producer API. We use two datasets, i.e. (1) the HT sensor dataset (from home activity sensors) and (2) the twin gas sensor dataset [166]. The HT sensor dataset is approximately 190 MB in size and contains 1 million data instances from 100 sensors. The twin gas sensor dataset includes the recordings of five replicates of an 8-sensor array; each unit holds 8 MOX sensors and integrates custom-designed electronics for sensor operating temperature control and signal acquisition. The twin gas sensor dataset is approximately 2500 MB in size and contains around 2.8 million data instances. The datasets are described in Table 6-2. We conducted a set of performance experiments on the content-based broker, but did not assess the runtime information for the actual filtering engine.

Figure 6-4: Mapping time for HT Sensor Dataset
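The Dataset Reader's role can be illustrated with a small stand-in. The real reader is a Java Kafka producer; the generator below (whose name is an assumption) only models reading records and emitting them one by one as a stream.

```python
# Stand-in for the Dataset Reader: yields one data instance per
# non-empty input line, modelling the record-at-a-time stream that the
# real Java producer publishes to a Kafka topic.
def dataset_reader(lines):
    for line in lines:
        record = line.strip()
        if record:          # skip blank lines between records
            yield record
```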

Table 6-1 Machine Specifications

Hardware    Description
CPU         Intel(R) Core(TM) i5-6300U CPU @ 2.60 GHz
CPU Cores   8
RAM         8 GB
OS          Ubuntu 15.10, Linux kernel 4.2.0-16
Kafka       kafka_2.11-0.10.1.0
Java VM     OpenJDK 64-Bit Server Java SE 8.0 65
JDK         1.8.0_111

Table 6-2 Dataset Information

Dataset                 Count        Size
HT Sensor Data          1,000,000    190 MB
Twin Gas Sensor Data    2,800,000    2500 MB

Figure 6-5: Mapping time for twin gas sensor dataset


This complete architecture is implemented on a Kafka cluster, which provides a distributed real-time streaming platform, as shown in Figure 6-3. We feed the datasets (from Table 6-2) into the Producer API. The architecture uses four APIs, i.e. the Producer API, Consumer API, Streams API and Connector API. The Producer API publishes the datasets as a stream to a Kafka topic, and the Connector API builds and runs reusable producers and consumers that connect to predefined Kafka topics of data streams. We kept the Consumer and Streams APIs unmodified in our implementation environment. We modified the broker to map data instances to three levels (groups) of consumers: the broker maps each incoming data instance to the appropriate consumer group level. The data mapping uses a queue and follows a first-in-first-out (FIFO) model. In our implementation architecture, ZooKeeper works as a controller without any modification.
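The broker modification described above can be sketched as a FIFO mapping from flag values to consumer-group levels. The group names and the (fv, payload) instance shape are illustrative assumptions, not the actual broker code.

```python
from collections import deque

# Three consumer-group levels keyed by flag value; None marks open access.
GROUPS = {1: "high-sensitivity", 0: "low-sensitivity", None: "open-access"}

def map_stream(instances):
    """Map queued (fv, payload) instances to consumer groups, FIFO order."""
    queue = deque(instances)          # first-in-first-out, as in the broker
    assignments = []
    while queue:
        fv, payload = queue.popleft()
        assignments.append((GROUPS[fv], payload))
    return assignments
```

The FIFO queue preserves arrival order, so instances reach their consumer groups in the order the broker received them.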

6.5.2 Results Discussion

We tested the proposed access control mechanism over big data streams in the Kafka cluster system described in the last subsection. The experiment was conducted using the two datasets from Table 6-2, i.e. the HT sensor dataset and the twin gas sensor dataset. These datasets are mapped onto the static lattices from our model description to measure performance. First, we tested the performance of the proposed information flow control mechanism using the HT sensor dataset. Performance is measured as mapping time, i.e. the time taken to assign data instances to a specific level of user and/or query processor based on their FV; this is also termed latency. The mapping time for the HT sensor dataset is shown in Figure 6-4. This figure shows the data mapping time for one million data instances, where our static information flow control model (three levels of data) takes around 70 seconds. In the same figure, we also measured the time taken for a single level, where around 55 seconds are needed to map one million data instances. In the same way, we evaluated the twin gas sensor dataset, which is around three times larger than the HT sensor dataset. The results for the twin gas sensor dataset are shown in Figure 6-5: around 200 seconds are required to process around three million data instances with three-level data sensitivity, and around 150 seconds are needed for a single level of data. From these two experiments, we conclude that three levels of data do not take significantly more time than a single level, so the proposed method is applicable to big sensing data streams.

6.6 Summary

This chapter proposed an information flow control method to control access to big sensing data streams, aiming to provide a real-time processing infrastructure. The model is based on a static lattice to enable faster processing and to deal with the high volume and velocity of big sensing data streams. A static lattice is initialised for source sensors with three levels of data sensitivity and, in the same way, a static lattice is initialised for users with three levels of data access. This lattice comparison takes place immediately after security verification of data packets

at DSM. This access control mechanism protects against information leakage and

unauthorised access. Several applications, such as battlefield monitoring and health monitoring, require stream processing and access control over big sensing data streams.

The proposed information flow control model performs in near real time to control

unauthorised access over big sensing data streams. The performance of the proposed

information flow control model using a lattice structure for big sensing data streams

was evaluated in a real-time Kafka cluster. The results show that the proposed model can be applied to big sensing data streams.


Chapter 7

Conclusion and Future Work

So far, the main modules of the proposed security solutions for big sensing data

streams have been framed in the last four chapters (from Chapter 3 to Chapter 6).

This chapter concludes the thesis by recalling the research contributions of each

chapter individually. Following that, we point out major and promising directions for future work that build on these contributions and deserve further exploration. The

conclusions are drawn in Section 7.1, and the future work is presented in Section

7.2.

7.1 Conclusions

Big data stream analytics and batch processing in cloud infrastructure have significantly influenced current IT industry and computer science research. Data security and access control have become among the most significant emerging issues in supporting decision-making systems within these two trends. One of the problems is that traditional and existing data security techniques are neither scalable nor efficient, due to the 4Vs properties of big sensing data streams. These issues prompt us to propose a new security framework for big sensing data streams. In this thesis, we have proposed an end-to-end and efficient security framework that performs real-time security verification and employs a smaller buffer size without compromising security.

context of big data, the framework has provided a holistic conceptual foundation for

a security solution over big data streams and enables authenticated users and query

processors to access data without any kind of interruption. We have framed the big

sensing data streams format and highlighted its security issues in detail in Chapter 2.

Modules of the framework have been elaborated with proposed security solutions

from Chapter 3 to Chapter 6. We conclude these chapters as follows:

In Chapter 2, we have covered the background studies of big data streams and

cloud data centre and related security issues. As we are moving towards the

IoT, the number of smart sensing devices deployed around the world is

growing at a rapid pace. The communication medium between the source sensing device and the cloud data centre is wireless, which is widely regarded as an untrusted transmission medium. As

security will be a fundamental enabling factor of most IoT applications and

big data streams, mechanisms must also be designed to protect

communications enabled by such technologies. This chapter analyses existing

protocols and mechanisms to secure the IoT generated big data stream, as

well as open research issues. Along with the big data stream security survey,

we also highlighted layer-wise IoT security threats, because our survey took IoT as the source of big data streams. We analysed the existing

approaches to ensure fundamental security requirements and protect IoT

generated big data streams, together with the open challenges and strategies

for future research work in the area. We classified the big data stream security

based on the CIA triad features.

In Chapter 3, we proposed a novel authenticated key exchange scheme,

namely Dynamic Prime-Number Based Security Verification (DPBSV),

which aims to provide an efficient and faster (on-the-fly) security

verification scheme for big sensing data streams. The proposed scheme has

been designed based on a symmetric key block cipher. The shared key is updated independently at both the source sensor and the DSM using a random prime number generation method. In the DPBSV scheme, we decrease the


communication and computation overhead through dynamic key initialisation at both the sensor and DSM ends, which in effect eliminates the need for rekeying. We evaluated the proposed

security scheme in both theoretical analyses and experimental evaluations,

and showed that our DPBSV scheme has provided significant improvement

in processing time, required less buffer for processing and prevented

malicious attacks on authenticity, integrity, and partial confidentiality. DSM

implementation appears just before stream data processing as shown in our

main architecture diagram from Chapter 1. The proposed security

verification scheme (i.e. DPBSV) performs in near real time to synchronise

with the performance of the stream processing engine. Our main aim is to avoid degrading the performance of stream processing engines such as Hadoop, S4 and Spark.

In Chapter 4, we investigated a novel authenticated key exchange protocol,

namely Dynamic Key Length Based Security Framework (DLSeF), which

aims to provide a real-time security verification model for big sensing data

streams. DLSeF protocol is designed based on symmetric key cryptography

and dynamic key length to provide more efficient security verification of big

sensing data streams. This security model provides two-dimensional security, i.e. not only a dynamic key but also a dynamic key length. We further proposed a synchronisation technique that obtains the key generation properties from neighbouring sources, so that source sensors need not communicate with the DSM when shared key generation becomes desynchronised. In our model, we decrease the communication and computation

overhead by performing dynamic key initialisation, along with a dynamic key size, at both the source sensing devices and the DSM, which in effect eliminates the need for rekeying. The proposed

DLSeF model performs security verification in near real time to synchronise

with the performance speed of the stream processing engine. Our major

concern is not to degrade the performance of stream processing by

performing security verification in near real time. We demonstrated the

proposed DLSeF security model in both theoretical and experimental

evaluations. We showed that our DLSeF model has provided significant


improvement in the security processing time, and prevented malicious

attacks on authenticity, integrity and partial confidentiality.

In Chapter 5, we proposed a Selective Encryption (SEEN) method to

maintain confidentiality levels of big sensing data streams with data integrity.

In SEEN, a DSM independently maintains intrusion detection and shared key

management as the two major components. Our method has been designed based on a symmetric key block cipher with multiple shared keys for encryption. By employing the cryptographic function with selective

encryption, the DSM efficiently rekeys without retransmissions. SEEN uses

two different keys for encryption/decryption (i.e. strong encryption for

strong confidentiality, weak encryption for partial confidentiality and no

encryption for open access data). The rekeying process never disrupts

ongoing data streams and encryption/decryption processes. SEEN supports

source node authentication and shared key recovery without incurring

additional overhead. We evaluated the performance of SEEN by security

analyses and experimental evaluations. We found that SEEN provides significant improvements in processing time and buffer requirements, and protects data confidentiality and integrity from malicious attackers.

In Chapter 6, we have investigated an information flow control method to

control the access over big sensing data streams after security verification at

DSM. Our aim is to provide a real-time processing infrastructure that

synchronises with the stream processing engine. The model has been

designed based on a static lattice structure to enable faster processing and to

deal with the 4Vs properties of big sensing data streams. A static lattice is

initialised for a source sensor with three levels of data sensitivity and in the

same way a static lattice is initialised for users with three levels of data

access. This lattice comparison works just after security verification of data

packets at DSM. This access control mechanism protects against information

leakage and unauthorised access. Several applications such as battlefield

monitoring and health monitoring need stream processing and access control

over big sensing data streams. The proposed information flow control model

performs in near real time to control unauthorised access over big sensing


data streams. The performance of the proposed information flow control

model using lattice structure for big sensing data streams was evaluated in a

real-time Kafka cluster. We conclude from the experiments that the proposed model can be applied to big sensing data streams without degrading the performance of stream processing engines.

7.2 Future Work

Based on the roadmap of Figure 1-1 in Chapter 1, our research mainly focused on

security aspects of big sensing data streams. Building on these contributions, this section points out several issues still worth investigating in the future.

One direction is to further investigate security verification over big sensing

data streams from the perspectives of efficient and lightweight security

solution. We investigated this by proposing a symmetric key block cipher for security verification at the DSM; our solution increases performance by decreasing communication and computational overhead, and was compared with the AES technique. The foremost next step is a comparative study of our work against other symmetric key cryptographic techniques such as RC4 and RC6. We

will further investigate new strategies to improve the efficiency of

symmetric-key encryption towards more efficient security-aware big data

streams. We are also planning to investigate using the technique to develop

a moving target defence strategy for the Internet of Things.

It is promising to extend our ideas and methods to complex and hybrid

data streams. In this thesis, we mainly investigated security solutions for data streams from one type of source, i.e. sensors. In the big data era, however, more and more types of sources are involved, e.g. sensors, mobile devices, cameras and social network data.

Privacy and security concerns exist in such data streams as well. It is a

challenging task to apply a simple security solution for different types of

data streams. It will also be a challenging task to get the synchronisation


properties from neighbours as stated in our previous chapters in the hybrid

sensing area. In such cases we need to apply different security solutions for

different types of big data streams, or a hybrid security structure to support hybrid big sensing data streams.

We are also planning to further investigate the proposed selective

encryption technique to improve the efficiency of symmetric-key

encryption towards more efficient security-aware big sensing data streams.

We are also planning to introduce the access control model over big

sensing data streams, which will give access to the end user or query

processor based on the data level. Our proposed solution for information

flow control using a static lattice is to make the flow model lightweight to

deal with big sensing data streams. However, there are several situations and applications in which the data sensitivity level changes over time or with the situation. For example, in battlefield sensing, some sensors may send very low sensitivity data throughout the day but suddenly switch to high sensitivity mode once they detect abnormal activity in the area. In such situations, we need to implement a dynamic lattice structure that handles the data stream sensitivity level according to the situation, while keeping the solution lightweight for big sensing data streams.

With the contributions of this thesis, we are planning to further investigate

the security architecture of big sensing data streams in a real-time cloud data centre. We will integrate the security solution with recent advances in

stream processing engines such as Apache Storm while collecting data

from source sensors. We have already developed a testbed for data

collection and end-to-end security and information flow control to control

the access to big data streams. Finally, we need to integrate all these setups in a cloud to evaluate the overall performance.


Bibliography

[1] S. Tsuchiya, Y. Sakamoto, Y. Tsuchimoto and V. Lee, "Big Data Processing

in Cloud Environments," FUJITSU Science and Technology Journal, vol. 48,

no. 2, pp. 159-168, 2012.

[2] “Big data: science in the petabyte era: Community cleverness Required,”

Nature, vol. 455 no. 7209, pp. 1, 2008.

[3] The Big Data Big Bang, https://en.wikipedia.org/wiki/Exabyte, accessed on

December 28, 2015.

[4] D. Puthal, S. Nepal, R. Ranjan and J. Chen, "A Dynamic Prime Number

Based Efficient Security Mechanism for Big Sensing Data Streams." Journal

of Computer and System Sciences, vol. 83, no. 1, pp. 22- 42, 2017.

[5] D. Puthal, S. Nepal, R. Ranjan and J. Chen, "DLSeF: A Dynamic Key

Length based Efficient Real-Time Security Verification Model for Big Data

Stream." ACM Transactions on Embedded Computing Systems, vol. 16, no. 2,

pp. 51:1-51:24, 2016.

[6] Big Data Definitions: http://dx.doi.org/10.6028/NIST.SP.1500-1

[7] Big Data Use Cases and

Requirements: http://dx.doi.org/10.6028/NIST.SP.1500-3

[8] Big Data Security and Privacy: http://dx.doi.org/10.6028/NIST.SP.1500-4

[9] Big Data Reference Architecture: http://dx.doi.org/10.6028/NIST.SP.1500-6

[10] D. Puthal and B. Sahoo, "Secure Data Collection & Critical Data

Transmission in Mobile Sink WSN: Secure and Energy efficient data


collection technique." LAP Lambert Academic Publishing: Germany, 2012.

ISBN: 978-3-659-16846-8.

[11] J. Deng, R. Han and S. Mishra. "INSENS: Intrusion-tolerant routing for

wireless sensor networks." Computer Communications, vol. 29, no. 2, pp.

216-230, 2006.

[12] M. A. Jan, P. Nanda, X. He, Z. Tan and R. P. Liu, "A robust authentication

scheme for observing resources in the internet of things environment. " In

13th International Conference on Trust, Security and Privacy in Computing

and Communications (TrustCom), pp. 205-211, 2014.

[13] D. Zissis and D. Lekkas, "Addressing cloud computing security issues,"

Future Generation Computer Systems, vol. 28, no. 3, pp. 583-592, 2012.

[14] C. Liu, X. Zhang, C. Yang and J. Chen, "CCBKE-Session key negotiation

for fast and secure scheduling of scientific applications in cloud computing."

Future Generation Computer Systems, vol. 29, no 5, pp. 1300-1308, 2013.

[15] M. Benantar, R. Jr and M. Rathi, "Method and system for maintaining client

server security associations in a distributed computing system." U.S. Patent

6,141,758, issued October 31, 2000.

[16] B. Kandukuri, V. Paturi and A. Rakshit, "Cloud security issues." IEEE

International Conference on Services Computing, (SCC'09), pp. 517-520,

2009.

[17] V. Borkar, M.J. Carey and C. Li, “Inside "Big Data Management": Ogres,

Onions, or Parfaits?,” In 15th International Conference on Extending

Database Technology (EDBT'12), pp. 3-14, 2012.

[18] S. Chaudhuri, "What Next?: A Half-Dozen Data Management Research

Goals for Big Data and the Cloud." In 31st Symposium on Principles of

Database Systems (PODS'12), pp. 1-4, 2012.

[19] A. Labrinidis and H. Jagadish, "Challenges and Opportunities with Big

Data." Proceedings of the VLDB Endowment, vol. 5, no. 12, pp. 2032-2033,

2012.

[20] J. Granjal, E. Monteiro and J. Sá Silva, "Security for the Internet of Things:

A Survey of Existing Protocols and Open Research Issues." IEEE

Communications Surveys & Tutorials, vol. 17, no. 3, pp. 1294-1312, 2015.


[21] J. Dean and S. Ghemawat, "MapReduce: A Flexible Data Processing Tool,"

Communications of the ACM, vol. 53, no. 1, pp. 72-77, 2010.

[22] P. Mell and T. Grance, The Nist Definition of Cloud Computing (Version 15),

U.S. National Institute of Standards and Technology, Information

Technology Laboratory, 2009. URL: http://www.nist.gov/itl/cloud/upload/

cloud-def-v15.pdf, accessed on: 01 April, 2014.

[23] A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito, I. Nishizawa, J. Rosenstein

and J. Widom, "STREAM: the Stanford stream data manager (demonstration

description)." In ACM SIGMOD international conference on Management

of data, pp. 665-665, 2003.

[24] M. Stonebraker, U. Çetintemel and S. B. Zdonik, "The 8 Requirements of

Real-Time Stream Processing." SIGMOD Record, vol. 34, no. 4, pp. 42-47,

2005.

[25] D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M.

Stonebraker, N. Tatbul and S. Zdonik, "Monitoring streams: a new class of

data management applications." 28th international conference on Very Large

Data Bases, pp. 215-226, 2002.

[26] D. J. Abadi, D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, M.

Stonebraker, N. Tatbul and S. Zdonik, "Aurora: a new model and

architecture for data stream management." The VLDB Journal—The

International Journal on Very Large Data Bases, vol. 12, no. 2, pp. 120-139,

2003.

[27] S. Chandrasekaran, O. Cooper, A. Deshpande, M. Franklin, J. M.

Hellerstein, W. Hong, S. Krishnamurthy, S. R. Madden, F. Reiss and M. A.

Shah, "TelegraphCQ: continuous dataflow processing." In ACM SIGMOD

international conference on Management of data, pp. 668-668, 2003

[28] A. Bifet, "Mining big data in real time." Informatica (Slovenia), vol. 37, no.

1, pp. 15-20, 2013.

[29] M. Dayarathna and T. Suzumura, "Automatic optimization of stream

programs via source program operator graph transformations." Distributed

and Parallel Databases, vol. 31, no. 4, pp. 543-599, 2013.


[30] D. Puthal, B. Sahoo, S. Mishra and S. Swain, "Cloud Computing Features,

Issues and Challenges: A Big Picture." In International Conference on

Computational Intelligence & Networks (CINE), pp. 116-123, 2015.

[31] H. Demirkan and D. Delen, "Leveraging the capabilities of service-oriented

decision support systems: Putting analytics and big data in cloud." Decision

Support Systems,vol. 55, no. 1, pp. 412-421, 2013.

[32] J. Lu and D. Li, "Bias correction in a small sample from big data." IEEE

Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp.

2658-2663, 2013.

[33] J. M. Tien, "Big data: unleashing information." Journal of Systems Science

and Systems Engineering, vol. 22, no.2, pp. 127-151, 2013.

[34] J. Burke, , J. McDonald and T. Austin, "Architectural support for fast

symmetric-key cryptography." ACM SIGOPS Operating Systems Review,

vol. 34, no. 5, pp. 178-189, 2000.

[35] www.cloudflare.com (accessed on: 04.08.2014)

[36] A. Boldyreva, M. Fischlin, A. Palacio and B. Warinschi, "A closer look at

PKI: Security and efficiency." In 10th international conference on Practice

and theory in public-key cryptography (PKC '07), pp. 458-475, 2007.

[37] K. Park, S. Lim and K. Park, "Computationally efficient PKI-based single

sign-on protocol, PKASSO for mobile devices." IEEE Transactions

on Computers, vol. 57, no. 6, pp. 821-834, 2008.

[38] NIST, "Advanced Encryption Standard (AES)." Federal Information

Processing Standards Publication FIPS PUB 197, 2001.

[39] S. Heron, "Advanced Encryption Standard (AES)." Network Security, vol.

2009, no. 12, pp. 8-12, 2009.

[40] J. Daemen and V. Rijmen, "The design of Rijndael: AES-the advanced

encryption standard." Springer Science & Business Media, 2013.

[41] H. Hu, Y. Wen, T. Chua and X. Li,

"Towards Scalable Systems for Big Data Analytics: A Technology Tutorial."

IEEE Access, vol. 2, pp. 652-687, 2014.

[42] D. Puthal, S. Nepal, R. Ranjan and J. Chen, "A Dynamic Key Length based

Approach for Real-Time Security Verification of Big Sensing Data Stream."


in 16th International Conference on Web Information System Engineering ,

pp. 93-108, 2015.

[43] D. Puthal, S. Nepal, R. Ranjan and J. Chen, "DPBSV- An Efficient and

Secure Scheme for Big Sensing Data Stream." in 14th IEEE International

Conference on Trust, Security and Privacy in Computing and

Communications (IEEE TrustCom-15), pp. 246-253, 2015.

[44] C. A. Ardagna, R. Asal, E. Damiani and Q. Hieu Vu, "From security to

assurance in the cloud: a survey." ACM Computing Surveys (CSUR), vol. 48,

no. 1, 2015.

[45] Z. Xiao and Y. Xiao, "Security and privacy in cloud computing." IEEE

Communications Surveys & Tutorials, vol. 15, no. 2, pp. 843-859, 2013.

[46] S. Subashini and V. Kavitha, "A survey on security issues in service delivery

models of cloud computing." Journal of network and computer

applications, vol. 34, no. 1, pp. 1-11, 2011.

[47] C. Modi, D. Patel, B. Borisaniya, A. Patel and M. Rajarajan, "A survey on

security issues and solutions at different layers of Cloud computing." The

Journal of Supercomputing, vol. 63, no. 2, pp. 561-592, 2013.

[48] C. Rong, S. T. Nguyen and M. G. Jaatun, "Beyond lightning: A survey on

security challenges in cloud computing." Computers & Electrical

Engineering, vol. 39, no. 1, pp.47-54, 2013.

[49] W. Huang, A. Ganjali, B. H. Kim, S. Oh and D. Lie, "The State of Public

Infrastructure-as-a-Service Cloud Security." ACM Computing Surveys

(CSUR), vol. 47, no. 4, 2015.

[50] C. Modi, D. Patel, B. Borisaniya, H. Patel, A. Patel and M. Rajarajan, "A

survey of intrusion detection techniques in cloud." Journal of Network and

Computer Applications, vol. 36, no. 1, pp. 42-57, 2013.

[51] N. Fernando, S. W. Loke and W. Rahayu, "Mobile cloud

computing: A survey." Future Generation Computer Systems, vol. 29, no. 1,

pp. 84-106, 2013.

[52] X. Chen, K. Makki, K. Yen and N. Pissinou, "Sensor network security: a

survey." IEEE Communications Surveys & Tutorials, vol. 11, no. 2, pp. 52-

73, 2009.


[53] Y. Zhou, Y. Fang and Y. Zhang, "Securing wireless sensor networks: a

survey." IEEE Communications Surveys & Tutorials, vol. 10, no. 3, pp. 6-28,

2008.

[54] T. Winkler and B. Rinner, "Security and privacy protection in visual sensor

networks: A survey." ACM Computing Surveys (CSUR), vol. 47, no. 1, 2014.

[55] D. Djenouri, L. Khelladi and N. Badache, "A survey of security issues in

mobile ad hoc networks." IEEE Communications Surveys, vol. 7, no. 4, pp. 2-

28, 2005.

[56] L. Abusalah, A. Khokhar and M. Guizani, "A survey of secure mobile ad hoc

routing protocols." IEEE Communications Surveys & Tutorials, vol. 10, no.

4, pp. 78-93, 2008.

[57] D. Chopra, H. Schulzrinne, E. Marocco and E. Ivov, "Peer-to-peer overlays

for real-time communication: security issues and solutions." IEEE

Communications Surveys & Tutorials, vol. 11, no. 1, pp. 4-12, 2009.

[58] E. G. AbdAllah, H. S. Hassanein and M. Zulkernine, "A Survey of Security

Attacks in Information-Centric Networking." IEEE Communications Surveys

& Tutorials, vol. 17, no. 3, pp. 1441-1454, 2015.

[59] J. Cao, M. Ma, H. Li, Y. Zhang and Z. Luo, "A survey on security aspects

for LTE and LTE-A networks." IEEE Communications Surveys & Tutorials,

vol. 16, no. 1, pp. 283-302, 2014.

[60] C. M. Medaglia and A. Serbanati, "An overview of privacy and security

issues in the internet of things." In The Internet of Things, pp. 389-395.

Springer New York, 2010.

[61] R. Weber, "Internet of Things–New security and privacy challenges."

Computer Law & Security Review, vol. 26, no. 1, pp. 23-30, 2010.

[62] K. Zhao and L. Ge, "A survey on the internet of things security." In 9th

International Conference on Computational Intelligence and Security (CIS),

pp. 663-667. IEEE, 2013.

[63] J. Sun and C. Chen, "Initial Study on IOT Security." Communications

Technology, vol. 7, 2012.

[64] H. Kopetz, "Internet of things." In Real-time systems, pp. 307-323. Springer

US, 2011.


[65] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari and M. Ayyash,

"Internet of things: A survey on enabling technologies, protocols, and

applications." IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp.

2347-2376, 2015.

[66] S. Li, L. Xu and S. Zhao, "The internet of things: a survey." Information

Systems Frontiers, vol. 17, no. 2, pp. 243-259, 2015.

[67] L. Xu, W. He and S. Li, "Internet of things in industries: a survey." IEEE

Transactions on Industrial Informatics, vol. 10, no. 4, pp. 2233-2243, 2014.

[68] E. Ilie-Zudor, Z. Kemény, F. Blommestein, L. Monostori and A. Meulen, "A

survey of applications and requirements of unique identification systems and

RFID techniques." Computers in Industry, vol. 62, no. 3, pp. 227-252, 2011.

[69] Y. Wang, G. Attebury and B. Ramamurthy, "A survey of security issues in

wireless sensor networks." IEEE Communications Surveys & Tutorials, vol.

8, no. 2, pp. 2-23, 2006.

[70] IEEE Standard for Local and Metropolitan Area Networks—Part 15.4: Low-

Rate Wireless Personal Area Networks (LR-WPANs) Amendment 1: MAC

Sublayer, IEEE Std. 802.15.4e-2012 (Amendment to IEEE Std. 802.15.4-

2011), pp. 1-225, 2012.

[71] P. Thubert, "Objective function zero for the routing protocol for low-power

and lossy networks (RPL)." RFC 6552, 2012.

[72] C. Bormann, A. P. Castellani and Z. Shelby, "Coap: An application protocol

for billions of tiny internet nodes." IEEE Internet Computing, vol. 16, no. 2,

2012.

[73] T. Zheng, A. Ayadi and X. Jiang, "TCP over 6LoWPAN for industrial

applications: An experimental study." In 4th IFIP International Conference

on New Technologies, Mobility and Security (NTMS), pp. 1-4. IEEE, 2011.

[74] D. Conzon, T. Bolognesi, P. Brizzi, A. Lotito, R. Tomasi and M. A. Spirito,

"The virtus middleware: An xmpp based architecture for secure iot

communications." In 21st International Conference on Computer

Communications and Networks (ICCCN), pp. 1-6. IEEE, 2012.

[75] S. Sicari, A. Rizzardi, L. Grieco and A. Coen-Porisini, "Security, privacy and

trust in Internet of Things: The road ahead." Computer Networks, vol. 76, pp.

146-164, 2015.


[76] S. Bandyopadhyay, M. Sengupta, S. Maiti and S. Dutta, "A survey of

middleware for internet of things." In Recent Trends in Wireless and Mobile

Networks, pp. 288-296. Springer Berlin Heidelberg, 2011.

[77] A. Gómez-Goiri, P. Orduña, J. Diego and D. López-de-Ipiña, "Otsopack:

lightweight semantic framework for interoperable ambient intelligence

applications." Computers in Human Behavior, vol. 30, pp. 460-467, 2014.

[78] M. Isa, N. Mohamed, H. Hashim, S. Adnan, J. A. Manan and R. Mahmod,

"A lightweight and secure TFTP protocol for smart environment." In IEEE

Symposium on Computer Applications and Industrial Electronics (ISCAIE),

pp. 302-306. IEEE, 2012.

[79] C. Liu, B. Yang and T. Liu, "Efficient naming, addressing and profile

services in Internet-of-Things sensory environments." Ad Hoc Networks, vol.

18, pp. 85-101, 2014.

[80] G. Colistra, V. Pilloni and L. Atzori, "The problem of task allocation in the

internet of things and the consensus-based approach." Computer Networks,

vol. 73, pp. 98-111, 2014.

[81] G. Fox, H. Gadgil, S. Pallickara, M. Pierce, R. L. Grossman, Y. Gu, D.

Hanley and X. Hong, "High performance data streaming in service

architecture." Technical Report, Indiana University and University of Illinois

at Chicago, 2004.

[82] R. V. Nehme, H. Lim, E. Bertino and E. Rundensteiner, "StreamShield: a

stream-centric approach towards security and privacy in data stream

environments." In ACM SIGMOD International Conference on Management

of data, pp. 1027-1030. ACM, 2009.

[83] P. Chen, X. Wang, Y. Wu, J. Su and H. Zhou, "POSTER: iPKI: Identity-

based Private Key Infrastructure for Securing BGP Protocol." In 22nd ACM

SIGSAC Conference on Computer and Communications Security, pp. 1632-

1634. ACM, 2015.

[84] S. Laury and S. Wallace, "Confidentiality and taxpayer compliance."

National Tax Journal, pp. 427-438, 2005.

[85] G. Bella and L. Paulson, "Kerberos version IV: Inductive analysis of the

secrecy goals." In Computer Security—ESORICS 98, pp. 361-375. Springer

Berlin Heidelberg, 1998.


[86] G. Padmavathi and D. Shanmugapriya, "A survey of attacks, security

mechanisms and challenges in wireless sensor networks." International

Journal of Computer Science and Information Security, vol. 4, no. 1&2, pp.

1-9, 2009.

[87] W. Lou, W. Liu and Y. Fang, "SPREAD: Enhancing data confidentiality in

mobile ad hoc networks." In INFOCOM 2004. Twenty-third Annual Joint

Conference of the IEEE Computer and Communications Societies, vol. 4, pp.

2404-2413. IEEE, 2004.

[88] Y. Jian, S. Chen, Z. Zhang and L. Zhang, "Protecting receiver-location

privacy in wireless sensor networks." In INFOCOM 2007. 26th IEEE

International Conference on Computer Communications, pp. 1955-1963.

IEEE, 2007.

[89] Y. Zhang, W. Liu, W. Lou and Y. Fang, "MASK: anonymous on-demand

routing in mobile ad hoc networks." IEEE Transactions on Wireless

Communications, vol. 5, no. 9, pp. 2376-2385, 2006.

[90] M. Saini, P. Atrey, S. Mehrotra and M. Kankanhalli, "Adaptive

transformation for robust privacy protection in video surveillance." Advances

in Multimedia, 2012.

[91] A. Perrig, R. Szewczyk, J. Tygar, V. Wen and D. E. Culler, "SPINS:

Security protocols for sensor networks." Wireless networks, vol. 8, no. 5, pp.

521-534, 2002.

[92] J. Luo, P. Papadimitratos and J. Hubaux, "GossiCrypt: wireless sensor

network data confidentiality against parasitic adversaries." In 5th Annual

IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc

Communications and Networks, 2008. SECON'08., pp. 441-450. IEEE, 2008.

[93] K. Chan and S. Chan, "Key management approaches to offer data

confidentiality for secure multicast." IEEE Network, vol. 17, no. 5, pp. 30-

39, 2003.

[94] S. Jiang, N. Vaidya and W. Zhao, "Preventing traffic analysis in packet radio

networks." In DARPA Information Survivability Conference &

Exposition II, 2001. DISCEX'01. Proceedings, vol. 2, pp. 153-158. IEEE,

2001.


[95] C. Karlof and D. Wagner, "Secure routing in wireless sensor networks:

Attacks and countermeasures." Ad hoc networks, vol. 1, no. 2, pp. 293-315,

2003.

[96] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam and E. Cayirci, "A survey on

sensor networks." IEEE Communications Magazine, vol. 40, no. 8, pp. 102-

114, 2002.

[97] J. Newsome, E. Shi, D. Song and A. Perrig, "The sybil attack in sensor

networks: analysis & defenses." In 3rd international symposium on

Information processing in sensor networks, pp. 259-268. ACM, 2004.

[98] A. Laszka, M. Felegyhazi and L. Buttyan, "A Survey of Interdependent

Information Security Games." ACM Computing Surveys (CSUR), vol. 47, no.

2, 2014.

[99] A. Boukerche and D. Turgut, "Secure time synchronization protocols for

wireless sensor networks." IEEE Wireless Communications, vol. 14, no. 5,

pp. 64-69, 2007.

[100] X. Du and H. Chen, "Security in wireless sensor networks." IEEE Wireless

Communications, vol. 15, no. 4, pp. 60-66, 2008.

[101] M. Tubaishat, J. Yin, B. Panja and S. Madria, "A secure hierarchical model

for sensor network." ACM Sigmod Record , vol. 33, no. 1, pp. 7-13, 2004.

[102] H. Deng, W. Li and D. Agrawal, "Routing security in wireless ad hoc

networks." IEEE Communications Magazine, vol. 40, no. 10, pp. 70-75,

2002.

[103] S. Demurjian, H. Wang and L. Yan, "Implementation of Mandatory Access

Control in Role-based Security System with Oracle Snapshot Skill." 2001.

[104] S. Osborn, R. Sandhu and Q. Munawer, "Configuring role-based access

control to enforce mandatory and discretionary access control policies."

ACM Transactions on Information and System Security (TISSEC), vol. 3, no.

2, pp. 85–106, 2000.

[105] E. Bertino, P. Bonatti and E. Ferrari, "TRBAC: A temporal role-based access

control model." ACM Transactions on Information and System Security

(TISSEC), vol. 4, no. 3, pp. 191-233, 2001.

[106] G. Brose, "Access control management in distributed object systems." PhD

diss., Freie Universität Berlin, 2001.


[107] S. Oh and S. Park, "Task–role-based access control model." Information

systems, vol. 28, no. 6, pp. 533-562, 2003.

[108] M. I. Sarfraz, M. Nabeel, J. Cao and E. Bertino, "DBMask: fine-grained

access control on encrypted relational databases." In 5th ACM Conference on

Data and Application Security and Privacy, pp. 1-11. ACM, 2015.

[109] A. Gupta, M. Kirkpatrick and E. Bertino, "A formal proximity model for

RBAC systems." Computers & Security, vol. 41, pp. 52-67, 2014.

[110] M. Nabeel and E. Bertino, "Fine-grained encryption-based access control for

big data." In 13th Annual Information Security Symposium, p. 3. CERIAS-

Purdue University, 2012.

[111] A. Kamra and E. Bertino, "Privilege states based access control for fine-

grained intrusion response." In Recent Advances in Intrusion Detection, pp.

402-421. Springer Berlin Heidelberg, 2010.

[112] Q. Ni, E. Bertino and J. Lobo, "Risk-based access control systems built on

fuzzy inferences." In 5th ACM Symposium on Information, Computer and

Communications Security, pp. 250-260. ACM, 2010.

[113] E. Bertino, C. Bettini and P. Samarati, "A discretionary access control model

with temporal authorizations." In workshop on New security paradigms, pp.

102-107. IEEE Computer Society Press, 1994.

[114] R. Sandhu, "Lattice-based enforcement of Chinese walls." Computers &

Security, vol. 11, no. 8, pp. 753-763, 1992.

[115] B. Carminati, E. Ferrari, J. Cao and K. Tan, "A framework to enforce access

control over data streams." ACM Transactions on Information and System

Security (TISSEC), vol. 13, no. 3, 2010.

[116] J. Deng, R. Han and S. Mishra, "INSENS: Intrusion-tolerant routing for

wireless sensor networks." Computer Communications, vol. 29, no. 2, pp.

216-230, 2006.

[117] I. Kaddoura and S. Abdul-Nabi, "On formula to compute primes and the nth

prime." Applied Mathematical Sciences, vol. 6, no. 76, pp. 3751-3757, 2012.

[118] Contiki operating system official website, http://www.contiki-os.org/

[119] Scyther, [Online] http://www.cs.ox.ac.uk/people/cas.cremers/scyther/


[120] M. Pistoia, N. Nagaratnam, L. Koved and A. Nadalin, "Enterprise Java 2

Security: Building Secure and Robust J2EE Applications." Addison Wesley

Longman Publishing Co., Inc., 2004.

[121] Matlab, [Online] http://au.mathworks.com/products/matlab/

[122] N. Penchalaiah and R. Seshadri, "Effective Comparison and evaluation of

DES and Rijndael Algorithm (AES)." International Journal of Computer

Science and Engineering, vol. 2, no. 5, pp. 1641-1645, 2010.

[123] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh and A.

Byers. Big data: The next frontier for innovation, competition, and

productivity, 2011.

[124] M. Bahrami and M. Singhal, "The Role of Cloud Computing Architecture in

Big Data." In Information Granularity, Big Data, and Computational

Intelligence, Springer International Publishing, pp. 275-295, 2015.

[125] A. McAfee, E. Brynjolfsson, T. Davenport, D. Patil and D. Barton, "Big

data: the management revolution." Harvard Business Review, vol. 90, no. 10,

pp. 61-67, 2012.

[126] A. Deshpande, Z. Ives and V. Raman, "Adaptive query processing."

Foundations and Trends in Databases, vol. 1, no. 1, pp. 1-140, 2007.

[127] T. Sutherland, B. Liu, M. Jbantova and E. Rundensteiner, "D-cape:

distributed and self-tuned continuous query processing." In 14th ACM

international conference on Information and knowledge management, ACM,

pp. 217-218, 2005.

[128] R. Ranjan, "Streaming Big Data Processing in Datacenter Clouds." IEEE

Cloud Computing. Vol. 1, no. 1, pp. 78-83, 2014.

[129] J. Walters, Z. Liang, W. Shi and V. Chaudhary, "Wireless sensor network

security: A survey." Security in distributed, grid, mobile, and pervasive

computing, vol. 2007, no. 1, 2007.

[130] D. Carman, P. Kruus and B. Matt, "Constraints and approaches for

distributed sensor network security." Technical Report 00-010, NAI Labs,

Network Associates, Inc., Glenwood, MD, 2000.

[131] L. Eschenauer and V. Gligor, "A key-management scheme for distributed

sensor networks." In 9th ACM conference on Computer and communications

security, ACM, pp. 41-47, 2002.


[132] J. Daemen and V. Rijmen, "AES the advanced encryption standard." In The

Design of Rijndael, 2002.

[133] K. Akkaya and M. Younis, "An energy-aware QoS routing protocol for

wireless sensor networks." In 23rd International Conference on

Distributed Computing Systems Workshops, pp. 710-715, 2003.

[134] S. Nepal, J. Zic, D. Liu and J. Jang, "A mobile and portable trusted

computing platform." In EURASIP Journal on Wireless Communications

and Networking, vol. 1, pp. 1-19, 2011.

[135] J. Kulik, W. Heinzelman and H. Balakrishnan, "Negotiation-based protocols

for disseminating information in wireless sensor networks." In Wireless

networks, vol. 8, no. 2/3, pp. 169-185, 2002.

[136] H-S. Lim, Y-S. Moon and E. Bertino, "Provenance-based trustworthiness

assessment in sensor networks." In Seventh International Workshop on Data

Management for Sensor Networks, pp. 2-7, 2010.

[137] S. Sultana, G. Ghinita, E. Bertino and M. Shehab, "A lightweight secure

provenance scheme for wireless sensor networks." In 18th International

Conference on Parallel and Distributed Systems (ICPADS), pp. 101-108,

2012.

[138] G. Selimis, L. Huang, F. Massé, I. Tsekoura, M. Ashouei, F. Catthoor, J.

Huisken, J. Stuyt, G. Dolmans, J. Penders, and H. De Groot. "A lightweight

security scheme for wireless body area networks: design, energy evaluation

and proposed microprocessor design." Journal of medical systems, vol. 35,

no. 5, pp. 1289-1298, 2011.

[139] V. Gulisano, R. Jimenez-Peris, M. Patino-Martinez, C. Soriente and P.

Valduriez, "Streamcloud: An elastic and scalable data streaming

system." IEEE Transactions on Parallel and Distributed Systems, vol. 23, no.

12, pp. 2351-2365, 2012.

[140] R. A. Shaikh, S. Lee, M. A. U. Khan and Y. J. Song, "LSec: lightweight

security protocol for distributed wireless sensor network." In IFIP

International Conference on Personal Wireless Communications, pp. 367-

377. Springer Berlin Heidelberg, 2006.

[141] N. Tsikoudis, A. Papadogiannakis and E. P. Markatos, "LEoNIDS: a Low-

latency and Energy-efficient Network-level Intrusion Detection


System." IEEE Transactions on Emerging Topics in Computing, vol. 4, no. 1,

pp. 142-155, 2016.

[142] M. Roesch, "Snort: Lightweight Intrusion Detection for Networks." In LISA,

vol. 99, no. 1, pp. 229-238, 1999.

[143] W. Lee and S. J. Stolfo, "Data Mining Approaches for Intrusion Detection."

In USENIX Security Symposium, 1998.

[144] Y. Xie, D. Feng, Z. Tan and J. Zhou, "Unifying intrusion detection and

forensic analysis via provenance awareness." Future Generation Computer

Systems, vol. 61, pp. 26-36, 2016.

[145] A. S. Wander, N. Gura, H. Eberle, V. Gupta and S. C. Shantz, "Energy

analysis of public-key cryptography for wireless sensor networks." In Third

IEEE international conference on pervasive computing and communications,

pp. 324-328, 2005.

[146] T. Park and K. G. Shin, "LiSP: A lightweight security protocol for wireless

sensor networks." ACM Transactions on Embedded Computing Systems

(TECS), vol. 3, no. 3, pp. 634-660, 2004.

[147] A. Bogdanov, L. Knudsen, G. Leander, C. Paar, A. Poschmann, M. Robshaw,

Y. Seurin, and C. Vikkelsoe, "PRESENT: An ultra-lightweight block

cipher." In International Workshop on Cryptographic Hardware and

Embedded Systems, pp. 450-466, 2007.

[148] T. A. Zia and A. Y. Zomaya, "A Lightweight Security Framework for

Wireless Sensor Networks." JoWUA, vol. 2, no. 3, pp. 53-73, 2011.

[149] K. Van Laerhoven, "Combining the self-organizing map and k-means

clustering for on-line classification of sensor data." In International

Conference on Artificial Neural Networks, pp. 464-469. Springer Berlin

Heidelberg, 2001.

[150] P. Ferreira and P. Alves, Distributed context-aware systems. Springer, 2014.

DOI 10.1007/978-3-319-04882-6

[151] D. Ganesan, D. Estrin and J. Heidemann, "DIMENSIONS: Why do we need

a new data handling architecture for sensor networks?." ACM SIGCOMM

Computer Communication Review, vol. 33, no. 1. pp. 143-148, 2003.


[152] R. M. Metcalfe and D. R. Boggs, "Ethernet: distributed packet switching for

local computer networks." Communications of the ACM, vol. 19, no. 7, pp.

395-404, 1976.

[153] Crypto++ Benchmarks. Available:

http://www.cryptopp.com/benchmarks.html (accessed on: 30.07.2016)

[154] B. Carminati, E. Ferrari and K. Tan, "Specifying access control policies on

data streams." In 12th International Conference on Database Systems for

Advanced Applications (DASFAA ’07). Springer, Berlin, pp. 410–421, 2007.

[155] H. Balakrishnan, M. Balazinska, D. Carney, U. Çetintemel, M. Cherniack, C.

Convey, E. Galvez, J. Salz, M. Stonebraker, N. Tatbul and R. Tibbetts,

"Retrospective on aurora." The VLDB Journal, vol. 13, no. 4, pp. 370-383,

2004.

[156] R. Adaikkalavan and T. Perez, "Secure shared continuous query processing."

In ACM Symposium on Applied Computing, pp. 1000-1005, 2011.

[157] R. Adaikkalavan, X. Xie and I. Ray, "Multilevel secure data stream

processing: Architecture and implementation." Journal of Computer

Security, vol. 20, no. 5, pp. 547-581, 2012.

[158] J. Cao, B. Carminati, E. Ferrari and K.-L. Tan, "ACStream: Enforcing access

control over data streams." In IEEE 25th International Conference on Data

Engineering, pp. 1495-1498, 2009.

[159] W. Lindner and J. Meier, "Securing the borealis data stream engine." In 10th

International Database Engineering and Applications Symposium

(IDEAS'06), pp. 137-147, 2006.

[160] X. Xie, I. Ray, R. Adaikkalavan and R. Gamble, "Information flow control

for stream processing in clouds." In 18th ACM symposium on Access control

models and technologies, pp. 89-100. ACM, 2013.

[161] R. V. Nehme, E. A. Rundensteiner and E. Bertino, "A security punctuation

framework for enforcing access control on streaming data." In IEEE 24th

International Conference on Data Engineering (ICDE), pp. 406-415, 2008.

[162] D. F. C. Brewer and M. J. Nash, "The Chinese Wall Security Policy." In

IEEE Symposium on Security and Privacy (S & P), pp. 206–214, 1989.

[163] D. E. Bell and L. J. LaPadula, "Secure Computer System: Unified Exposition

and MULTICS Interpretation." Technical Report MTR-2997 Rev. 1 and


ESD-TR-75-306, rev. 1, The MITRE Corporation, Bedford, MA 01730,

1976.

[164] I. Ray, S. Madria, and M. Linderman, "Query Plan Execution in a

Heterogeneous Stream Management System for Situational Awareness."

In IEEE 31st Symposium on Reliable Distributed Systems (SRDS), pp.

424-429, 2012.

[165] S. Chakravarthy and Q. Jiang, "Stream data processing: a quality of service

perspective: modeling, scheduling, load shedding, and complex event

processing." Springer Science & Business Media, vol. 36, 2009.

[166] UCI Machine Learning Repository, [Online]. Available:

http://archive.ics.uci.edu/ml/datasets/

[167] D. Puthal, S. Nepal, R. Ranjan and J. Chen, "A Secure Big Data Stream

Analytics Framework for Disaster Management on the Cloud." in 18th IEEE

International Conferences on High Performance Computing and

Communications (HPCC 2016), pp. 1218-1225, 2016.