siti hanisah binti kamaruzamangreenskill.net/suhailan/fyp/report/038086.pdf · siti hanisah binti...

“DATA AT REST” ENCRYPTION FOR HADOOP SITI HANISAH BINTI KAMARUZAMAN BACHELOR OF COMPUTER SCIENCE (NETWORK SECURITY) UNIVERSITI SULTAN ZAINAL ABIDIN 2017


“DATA AT REST” ENCRYPTION FOR HADOOP

SITI HANISAH BINTI KAMARUZAMAN

BACHELOR OF COMPUTER SCIENCE

(NETWORK SECURITY)

UNIVERSITI SULTAN ZAINAL ABIDIN

2017


“DATA AT REST” ENCRYPTION FOR HADOOP

SITI HANISAH BINTI KAMARUZAMAN

Bachelor of Computer Science (Network Security)

Faculty of Informatics and Computing

Universiti Sultan Zainal Abidin, Terengganu, Malaysia

MAY 2017


DECLARATION

I hereby declare that this report is based on my original work except for quotations and citations, which have been duly acknowledged. I also declare that it has not been previously or concurrently submitted for any other degree at Universiti Sultan Zainal Abidin or other institutions.

________________________________

Name : ..................................................

Date : ..................................................


CONFIRMATION

This is to confirm that:

The research conducted and the writing of this report were under my supervision.

________________________________

Name : ..................................................

Date : ..................................................


DEDICATION

First and foremost, praise be to Allah, the Most Merciful, for blessing me and giving me the opportunity to undertake and complete this final year project, “Data at Rest” Encryption for Hadoop. I would like to express my gratitude to my supervisor, Dr. Wan Nor Shuhadah Binti Wan Nik, for her guidance and for her ideas and suggestions about the project. I am proud to have been supervised by her and grateful for her kindness.

I would also like to thank all my family members and my friends, who gave me a great deal of moral support to finish the project. Finally, I would like to thank the Faculty of Informatics and Computing for giving students the chance to explore projects like this one, and all the lecturers in the faculty for their support in completing this final year project.


ABSTRACT

Trusted computing and utility security are among the most challenging topics today, and they are core concerns of cloud computing, which is currently a focus of the international IT community. At the same time, the amount of data grows drastically every second. Hadoop is an open source software framework that supports the storage and processing of large data sets in a distributed computing environment, and it is a well-known implementation of MapReduce. Hadoop is used for big data analysis. MapReduce is a common programming model for processing and handling large amounts of data. The Hadoop Distributed File System (HDFS) is a distributed, scalable and portable file system written in Java for the Hadoop framework. However, the main problem is that data at rest is not secure: intruders can steal or alter the data. HDFS stores the data analysed by Hadoop, so encryption can be applied to the data in HDFS to secure it.


ABSTRAK

Trusted computing and utility security are among the most challenging topics today, and a core technology of cloud computing, which is currently the focus of the international IT community. The amount of data is increasing drastically every second. Hadoop is an open source software framework that supports the storage and processing of large data sets in a distributed computing environment, and it is a well-known implementation of MapReduce. Hadoop is used for big data analysis. MapReduce is a common programming model for processing and handling large amounts of data. The Hadoop Distributed File System (HDFS) is a distributed, scalable and portable file system written in Java for the Hadoop framework. However, the main problem is that data at rest is not secure: intruders can steal or alter our data or information. HDFS stores the data analysed by Hadoop, and encryption can be applied to the data in HDFS to protect it.


CONTENTS

DECLARATION
CONFIRMATION
DEDICATION
ABSTRACT
ABSTRAK
CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER 1 INTRODUCTION
1.1 Introduction
1.2 Background
1.3 Problem Statement
1.4 Objective
1.5 Scope
1.6 Activities and Milestones

CHAPTER 2 LITERATURE REVIEW
2.1 Introduction
2.2 Related Project and Article
2.3 Cryptography
2.4 Hadoop-Based on Cloud Data
2.5 Summary

CHAPTER 3 METHODOLOGY
3.1 Introduction
3.2 Analysis Study
3.3 Methodology Review
3.4 Method/Techniques
3.5 Framework of Project
3.6 System Requirement
3.6.1 Software Requirement
3.6.2 Hardware Requirement
3.7 Summary

REFERENCES


LIST OF TABLES

TABLE TITLE

1.1 First table in chapter 1


LIST OF FIGURES

FIGURE TITLE

3.3.1 First figure in chapter 3

3.3.2 Architecture of Hadoop

3.4.1 Encryption and Decryption

3.5.1 Fourth figure in chapter 3


LIST OF ABBREVIATIONS / TERMS / SYMBOLS

HDFS Hadoop Distributed File System

AES Advanced Encryption Standard

DEA Data Encryption Algorithm


CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION

This chapter covers the background, problem statement, objectives, scope, and activities and milestones of the project. The background describes big data, Hadoop, HDFS and the importance of encrypting data. The problem statement sets out the problems the project addresses, and the objectives state its purposes. The chapter also discusses the scope of the project, which is to encrypt data at rest in Hadoop, and the activities and milestones involved in completing it.

1.2 BACKGROUND

Trusted computing and utility security are currently among the most challenging topics in computing. They are also core concerns of cloud computing, which is currently a focus of the international IT community. At the same time, the amount of data grows drastically every second. In recent years, the rapid development of the Internet, the Internet of Things and cloud computing has led to drastic growth of data in almost every industry and business area [6]. The development of big data has attracted attention from a variety of fields around the world. Big data comes in three forms: structured, unstructured and semi-structured [6].

Hadoop is an open source software framework used for big data analysis. It allows the distributed processing of big data sets across clusters of computers using simple programming models [6]. It supports the storage and processing of large data sets in a distributed computing environment and is a well-known implementation of MapReduce. MapReduce is a common programming model for processing and handling large amounts of data.
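To make the programming model concrete, the following is a minimal single-process sketch of the map, shuffle and reduce phases for a word count, written in plain Java. It does not use the Hadoop API and does not distribute any work; the class and method names (`WordCountSketch`, `map`, `reduce`, `run`) are assumptions chosen for illustration only.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process illustration of the MapReduce model (hypothetical names, not the Hadoop API).
final class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word found in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle and reduce phase: group the pairs by key and sum the values of each key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Apply the map phase to every line, then reduce all emitted pairs together.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("big data on Hadoop", "Hadoop stores big data")));
    }
}
```

In a real cluster the map calls would run in parallel on the nodes that hold each input split, and the framework would perform the grouping between the two phases; here both happen sequentially in one process.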

The Hadoop Distributed File System (HDFS) is a distributed, scalable and portable file system written in Java for the Hadoop framework. HDFS stores the data that has been analysed by Hadoop. However, this data is not secure, whether at rest or in motion: intruders can steal or alter it. Encryption can therefore be applied to the data in HDFS to secure it. Encrypting data at rest protects its confidentiality by keeping the data unreadable to intruders.

1.3 PROBLEM STATEMENT

Data at rest that is stored in the file system is not secure. Intruders can steal or alter the data. Moreover, an attacker who can enter the data centre, either physically or electronically, can steal any data they want, since the data is unencrypted and no authentication is enforced for access [3].

1.4 OBJECTIVE

This project has three main objectives:

I. To study the architecture of Hadoop.

II. To implement encryption for data at rest in HDFS using the AES algorithm.

III. To test and evaluate the effectiveness of the AES algorithm for data at rest in HDFS.

1.5 SCOPE

Data analysed by Hadoop is stored in the Hadoop Distributed File System (HDFS). This project encrypts data at rest, that is, the data stored in HDFS. Encryption of data in transit is out of scope. Furthermore, the encryption covers only data in text form. Making data encryption possible in Hadoop requires adjustments to the Hadoop architecture, and the project will be run on Linux.


1.6 ACTIVITIES AND MILESTONES

TASK (scheduled across weeks 1 to 15)

1. Topic discussion and determination with supervisor

2. Project title proposal: abstract submission

3. Proposal writing: introduction, background, problem statement, objective, scope

4. Proposal writing: literature review (research on related projects)

5. Proposal writing: literature review (continued)

6. Proposal progress presentation and evaluation (Presentation 1)

7. Discussion and correction of the proposal

8. Proposed solution: methodology (flow chart and AES technique for encryption)

9. Proposed solution: methodology (continued), understanding the AES technique

10. Proof of concept using Hadoop and the AES technique

11. Drafting the proposal report: Chapters 1, 2 and 3

12. Submission of the proposal report: Chapters 1, 2 and 3


CHAPTER 2

LITERATURE REVIEW

2.1 INTRODUCTION

This chapter discusses concepts and ideas from previous research and articles related to this project. Reviewing them is important in order to understand the problem and to suggest an appropriate solution.

2.2 RELATED PROJECT AND ARTICLE

In [3], the authors describe Hadoop as a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Hadoop allows applications to run on systems with thousands of nodes and thousands of terabytes of data [3]. The Hadoop ecosystem consists of the Hadoop kernel, MapReduce, the Hadoop Distributed File System (HDFS) and a number of related components such as Apache Hive, HBase, Oozie, Pig and ZooKeeper [3]. Encryption ensures the confidentiality and privacy of user information and secures sensitive data in Hadoop [3]. Hadoop did not include basic controls for data protection, and most third-party tools could not scale along with NoSQL and so were of little use to developers [4].

According to [3], data at rest can be protected in two ways. In the first, the complete file is encrypted before it is stored in Hadoop. In the second, encryption is applied to the data blocks once they are loaded into the Hadoop system [3]. The paper reports that HDFS supports AES and OS-level encryption for data at rest. ZooKeeper, Oozie, Hive, HBase and Pig do not offer a data-at-rest encryption solution of their own, but for these components encryption can be implemented via custom encryption techniques or third-party tools [3].
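The difference between the two approaches can be sketched in plain Java using the JDK's built-in AES support. This is an illustration under stated assumptions, not how HDFS itself encrypts data: the class `AtRestEncryptionSketch`, the choice of AES in GCM mode, the in-memory byte arrays standing in for files, and the block size passed by the caller are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only: contrasts whole-file and per-block AES encryption (hypothetical names).
final class AtRestEncryptionSketch {
    private static final int NONCE_LEN = 12; // GCM nonce size in bytes
    private static final int TAG_BITS = 128; // GCM authentication tag size in bits

    // Approach 1 (and building block for approach 2): encrypt the given bytes in one
    // operation and return nonce || ciphertext || tag.
    static byte[] encrypt(byte[] key, byte[] data) {
        try {
            byte[] nonce = new byte[NONCE_LEN];
            new SecureRandom().nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, nonce));
            byte[] body = cipher.doFinal(data);
            byte[] out = Arrays.copyOf(nonce, NONCE_LEN + body.length);
            System.arraycopy(body, 0, out, NONCE_LEN, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Reverses encrypt(); throws if the key is wrong or the data was modified.
    static byte[] decrypt(byte[] key, byte[] sealed) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, sealed, 0, NONCE_LEN));
            return cipher.doFinal(sealed, NONCE_LEN, sealed.length - NONCE_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    // Approach 2: split the file into fixed-size blocks and encrypt each block separately.
    static List<byte[]> encryptBlocks(byte[] key, byte[] file, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < file.length; off += blockSize) {
            int end = Math.min(file.length, off + blockSize);
            blocks.add(encrypt(key, Arrays.copyOfRange(file, off, end)));
        }
        return blocks;
    }

    // Decrypts the blocks in order and concatenates them back into the original file.
    static byte[] decryptBlocks(byte[] key, List<byte[]> blocks) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (byte[] block : blocks) {
            byte[] plain = decrypt(key, block);
            file.write(plain, 0, plain.length);
        }
        return file.toByteArray();
    }
}
```

Calling `encrypt(key, file)` corresponds to the first approach, and `encryptBlocks(key, file, blockSize)` to the second. Per-block encryption lets a single block be decrypted without reading the rest of the file, but this sketch does nothing to detect blocks that have been reordered or dropped.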

In [6], the author compares Apache Spark and Apache Hadoop. The paper concludes that Spark helps to simplify the challenging and compute-intensive task of processing high volumes of real-time or archived data, both structured and unstructured, while seamlessly integrating relevant complex capabilities such as machine learning and graph algorithms. Spark also brings big data processing to the masses: running over hundreds, thousands or even tens of thousands of machines in a cluster is merely a configuration change. Hence, Apache Spark is not a replacement for Hadoop but one of the alternatives to it.


2.3 CRYPTOGRAPHY

Cryptography is one of the principal means of protecting information. Encryption is the process of encoding data in such a way that only authorized users can decode and use it, which makes the data self-defending and enhances data security [1]. According to NIST’s definition, information security is the practice of maintaining the integrity, confidentiality and availability of data against malicious access and system failure [1].

Modern cryptosystems can be classified into symmetric cryptosystems, asymmetric cryptosystems and digital signatures [1]. In a symmetric cryptosystem, the sender and receiver share the encryption and decryption keys [1]. The two keys are the same, or each is easy to deduce from the other [1]. Examples of symmetric cryptosystems are DES (Data Encryption Standard) and AES (Advanced Encryption Standard).

In an asymmetric cryptosystem, the receiver possesses a public key and a private key [1]. The public key can be published, but the private key must be kept secret [1]. Examples of asymmetric cryptosystems are RSA (Rivest Shamir Adleman) and ECC (elliptic curve cryptography). For digital signatures, [1] gives MD5 and SHA-1 as examples; strictly, these are hash functions that signature schemes use to digest a message before signing it.


2.4 HADOOP-BASED ON CLOUD DATA

Cloud computing is an emerging and increasingly popular computing paradigm that provides users with massive computing, storage and software resources on demand [2]. As more and more cloud applications become available, data security becomes an important issue in cloud computing. A security enhancement for Hadoop that provides strong mutual authentication using Kerberos has been presented [2].

To ensure data security in Hadoop-based cloud data storage, [2] proposes and implements a novel triple encryption scheme. HDFS files are encrypted with a data key using DEA (Data Encryption Algorithm), the data key is encrypted with RSA, and the user's RSA private key is then encrypted using IDEA (International Data Encryption Algorithm) [2].
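The layering of the three keys can be sketched with the JDK's standard cryptography classes. This is not the implementation from [2]: the JDK does not ship IDEA, so AES in GCM mode is swapped in for both symmetric layers (DEA and IDEA), and deriving the outermost key from a user password with PBKDF2 is an assumption made only so that the example is self-contained. All class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.SecureRandom;
import java.security.spec.PKCS8EncodedKeySpec;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Illustrative three-layer scheme; AES stands in for both DEA and IDEA (assumption).
final class TripleEncryptionSketch {
    private static final SecureRandom RNG = new SecureRandom();

    // AES-GCM encryption; returns nonce || ciphertext || tag.
    static byte[] seal(byte[] key, byte[] data) throws GeneralSecurityException {
        byte[] nonce = new byte[12];
        RNG.nextBytes(nonce);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, nonce));
        byte[] body = c.doFinal(data);
        byte[] out = Arrays.copyOf(nonce, 12 + body.length);
        System.arraycopy(body, 0, out, 12, body.length);
        return out;
    }

    // Reverses seal(); fails if the key is wrong or the data was modified.
    static byte[] open(byte[] key, byte[] sealed) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, sealed, 0, 12));
        return c.doFinal(sealed, 12, sealed.length - 12);
    }

    // Derives a 128-bit key from the user's password (hypothetical KDF parameters).
    static byte[] passwordKey(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 100_000, 128);
        return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
    }

    // Applies all three layers and then unwinds them; returns the recovered file bytes.
    static byte[] roundTrip(byte[] file, char[] password) {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            KeyPair user = gen.generateKeyPair();
            byte[] dataKey = new byte[16];
            RNG.nextBytes(dataKey);
            byte[] salt = new byte[16];
            RNG.nextBytes(salt);

            // Layer 1: the file is encrypted under a random data key.
            byte[] encFile = seal(dataKey, file);
            // Layer 2: the data key is encrypted under the user's RSA public key.
            Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
            rsa.init(Cipher.ENCRYPT_MODE, user.getPublic());
            byte[] encDataKey = rsa.doFinal(dataKey);
            // Layer 3: the RSA private key is encrypted under a password-derived key.
            byte[] encPrivate = seal(passwordKey(password, salt), user.getPrivate().getEncoded());

            // Unwind: password -> RSA private key -> data key -> file.
            byte[] pkcs8 = open(passwordKey(password, salt), encPrivate);
            PrivateKey priv = KeyFactory.getInstance("RSA").generatePrivate(new PKCS8EncodedKeySpec(pkcs8));
            rsa.init(Cipher.DECRYPT_MODE, priv);
            return open(rsa.doFinal(encDataKey), encFile);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("triple encryption sketch failed", e);
        }
    }
}
```

The point of the layering is that only the small keys need asymmetric or password-based protection, while the bulk file data is handled by a fast symmetric cipher.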

2.5 SUMMARY

This chapter has provided an overview of the concepts behind the project. The literature review is an important part of any research or study, because it helps to determine whether an idea or technique has been studied before. Every journal article has a main point that can be related to this project. The technique used in this project is chosen on the basis of previous research articles and journals, which are compared in order to decide which technique is most suitable.


CHAPTER 3

METHODOLOGY

3.1 INTRODUCTION

This chapter explains in detail the methodology used for this project. The methodology is used to ensure that the implementation of the project fulfils its objectives and that the system or tool can be developed successfully. After considering the pros and cons of several system models and tools, the iterative and incremental model, presented as a flow chart, was chosen. The details of this iterative model are explained in this chapter.

3.2 ANALYSIS STUDY

In the evolutionary (iterative) methodology, development and testing are completed at each stage. This model is less costly when the scope and requirements change, and testing and debugging are easier during small iterations.


3.3 METHODOLOGY REVIEW

FIGURE 3.3.1

Start

1. Choose a title for the project

2. Review the literature

3. Identify the problem statement

4. Identify the objective of the project

5. Find the scope and limitations of the work in the project

6. Install Hadoop and study the architecture of Hadoop

7. Study the AES encryption algorithm and find the location to implement the algorithm in the Hadoop architecture

8. Test the success of the encryption

End


The diagram above shows the steps taken to complete the project. Step 1 starts with brainstorming the idea and title of the project, which were approved by my supervisor and the Head of Department; the title chosen was “Data at Rest” Encryption for Hadoop. In this project, the open source Hadoop framework runs on a single server. After the project title was chosen, articles and research related to it were sought out, and several articles were selected for the literature review.

In step 3, the problem statement was identified from the articles read. The objectives of the project were then defined in step 4. In step 5, the scope and limitations of the project were identified. Data analysed by Hadoop is stored in the Hadoop Distributed File System (HDFS). This project encrypts data at rest, that is, the data stored in HDFS, and the encryption covers only data in text form. Making data encryption possible in Hadoop requires adjustments to the Hadoop architecture, and the project will be run on Linux.

The next step is the installation of Hadoop in Oracle VM VirtualBox and the study of the Hadoop architecture. The Hadoop architecture consists of the Hadoop kernel, MapReduce, the Hadoop Distributed File System (HDFS) and a number of related components such as Apache Hive, HBase, Oozie, Pig and ZooKeeper.


[Diagram labels: Hive, HBase, Pig, Other Project (Avro, ZooKeeper), MapReduce, YARN MapReduce, HDFS, Hadoop Framework]

FIGURE 3.3.2 Architecture of Hadoop

HDFS is a highly fault-tolerant distributed file system that is responsible for storing data on the cluster, while MapReduce is a powerful parallel programming technique for the distributed processing of vast amounts of data on clusters. HBase is a column-oriented, distributed NoSQL database for random read/write access. Pig is a high-level data programming language for analysing data in Hadoop computations. Hive is a data warehousing application that provides SQL-like access and a relational model, while Sqoop is a project for transferring data between relational databases and Hadoop. Oozie is an orchestration and workflow management tool for dependent Hadoop jobs.

In step 7, the data at rest in HDFS will be encrypted with the AES encryption algorithm. The AES algorithm must be studied in order to find where to implement it in the Hadoop architecture. The encrypted data can be decrypted into plaintext only with the correct key, so an intruder would need to obtain that key to read the data.


The last step is to test whether the encryption is successful. The framework must be tested in order to detect any defects that can only be found in the operational environment. If everything functions without bugs or errors, the framework becomes the final product; if bugs remain, the process returns to step 7.

3.4 METHOD/TECHNIQUES

FIGURE 3.4.1 Encryption and Decryption

In this project, an encryption technique will be applied. Encryption is the process of encoding data in such a way that only authorized users can decode and use it, which makes the data self-defending and enhances data security [1]. In other words, it converts plaintext into ciphertext. Decryption is the process of converting ciphertext back to plaintext [7].


Symmetric encryption is used to encrypt more than a small amount of data. A symmetric key is used during both the encryption and decryption processes. To decrypt a particular piece of ciphertext, the key that was used to encrypt the data must be used [7].

The goal of every encryption algorithm is to make it as difficult as possible to decrypt the generated ciphertext without using the key [7].
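The symmetric property described above can be illustrated with a short Java sketch that encrypts a text string with AES and decrypts it with the same key. The class name `SymmetricTextCipher`, the choice of GCM mode and the Base64 text encoding of the output are assumptions made for the example; this is not the project's implementation inside HDFS.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Illustrative symmetric encryption of text with AES (hypothetical names).
final class SymmetricTextCipher {

    // Generates a fresh random AES key of the given size in bits (128, 192 or 256).
    static SecretKey newKey(int bits) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(bits);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES is not available", e);
        }
    }

    // Plaintext -> Base64 text that holds nonce || ciphertext || tag.
    static String encrypt(SecretKey key, String plaintext) {
        try {
            byte[] nonce = new byte[12];
            new SecureRandom().nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, nonce));
            byte[] body = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[nonce.length + body.length];
            System.arraycopy(nonce, 0, out, 0, nonce.length);
            System.arraycopy(body, 0, out, nonce.length, body.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Base64 ciphertext -> plaintext; throws if the key does not match or the data was altered.
    static String decrypt(SecretKey key, String encoded) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, in, 0, 12));
            byte[] plain = cipher.doFinal(in, 12, in.length - 12);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed: wrong key or corrupted data", e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey(128);
        String cipherText = encrypt(key, "data at rest in HDFS");
        System.out.println(cipherText);               // unreadable without the key
        System.out.println(decrypt(key, cipherText)); // original text restored
    }
}
```

Decrypting with any key other than the one used for encryption raises an exception in this sketch, which is one simple way to check that stored ciphertext cannot be read without the key.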

3.5 FRAMEWORK OF PROJECT

FIGURE 3.5.1

[Diagram labels: Admin (access); Hadoop Linux/Ubuntu (SERVER); Component of Hadoop (install); Encryption; Hadoop Framework: Hive, Pig, HBase, Other Project (Avro, ZooKeeper), MapReduce, YARN MapReduce, HDFS]


In the project framework above, the admin accesses and controls the server. The open source software Apache Hadoop is installed on the server. Encryption is implemented in one component of Hadoop, namely HDFS (Hadoop Distributed File System): the data at rest in HDFS is encrypted with the AES encryption algorithm. The AES algorithm must be studied in order to find where to implement it in the Hadoop architecture. The encrypted data can be decrypted into plaintext only with the correct key, so an intruder would need to obtain that key to read the data.

3.6 SYSTEM REQUIREMENT

Hardware and software requirements must be met in order to complete the system. They are among the most important parts of the project, because they influence its success. Incomplete requirements may cause the project to run into problems.

3.6.1 Software Requirement

Microsoft Office PowerPoint 2016

Microsoft Word 2016

Windows 8.1 Single Language

Oracle VM VirtualBox


3.6.2 Hardware Requirement

Laptop HP

Mouse

Printer

3.7 SUMMARY

In conclusion, selecting a suitable methodology is necessary to produce a complete project within the time given and to ensure that the project is deployed successfully. A good methodology provides systematic steps for developing the project and keeps errors to a minimum.


REFERENCES

[1] https://www.researchgate.net/publication/301887194_Efficient_Hybrid_MAES_Encryption_Algorithm_for_Mobile_Device_Data_Security_at_Rest_in_Cloud_Environment

[2] Yang, C., Lin, W., & Liu, M. (2013, September). A novel triple encryption scheme for Hadoop-based cloud data security. In Emerging Intelligent Data and Web Technologies (EIDWT), 2013 Fourth International Conference on (pp. 437-442). IEEE.

[3] Sharma, P. P., & Navdeti, C. P. (2014). Securing big data Hadoop: a review of security issues, threats, and solution. Int. J. Comput. Sci. Inf. Technol, 5.

[4] https://securosis.com/assets/library/reports/Securing_Hadoop_Final_V2.pdf

[5] Padmavathi, B., & Kumari, S. R. (2013). A survey on performance analysis of DES, AES and RSA algorithm along with LSB substitution. Int. J. Sci. Res, 2(4), 170-174.

[6] Pol, U. R. (2016). Big Data Analysis: Comparision of Hadoop MapReduce and Apache Spark. International Journal of Engineering Science and Computing, 6(6), 6389–6391. https://doi.org/10.4010/2016.1535

[7] Microsoft [Online] Available: https://msdn.microsoft.com/enus/library/windows/desktop/aa381939(v=vs.85).asp