

Big Data Security Challenges: Hadoop Perspective

Gayatri Kapil1, Alka Agrawal2, Raees Ahmad Khan3

1,2,3Department of Information Technology, Babasaheb Bhimrao Ambedkar University (A Central University), Lucknow, India.


July 29, 2018

Abstract

With its exponential growth, big data has become increasingly vulnerable and exposed to malicious attacks. These attacks can damage the essential qualities of privacy, integrity and availability of information systems. To deal with these malicious intentions, it is necessary to develop effective security mechanisms. This paper first describes Hadoop, its components and its current security mechanisms, and then analyzes its security problems and risks. In addition, some important aspects of big data Hadoop security and privacy are proposed to increase trust and safety. Finally, based on the preceding details, the paper concludes with Hadoop security challenges.

Key Words: Hadoop, MapReduce, HDFS, Hadoop Components, Hadoop Security, Data Encryption, HDFS Encryption.


International Journal of Pure and Applied Mathematics, Volume 120 No. 6 2018, 11767-11784, ISSN: 1314-3395 (on-line version), url: http://www.acadpubl.eu/hub/, Special Issue


1 Introduction

A challenge that is gradually emerging with the development of big data technology is how to open new business opportunities for all major industries. Almost everything is becoming digital, and a major concern of the IT industry is how to securely store and process the data produced by many different sources. Many IT companies also still struggle to convert data generated from different sources, or unstructured data, into a usable format so that it can be processed and used in other applications. Hadoop has emerged as a solution to almost all big data problems. Because big data differs from other data in terms of volume, velocity, variety and value [1], processing it has become difficult for most government and business applications. Because of its huge volume, traditional methods for managing, extracting and analysing big data are not very useful, as they may not provide accurate results for decision making. Its management in real time has therefore become a major research concern, and researchers and practitioners have explored various tools and techniques for using big data in a managed way. Big data is thus a moving target that requires more attention to capture, curate, handle and process.

Initially, data volumes were expected to be small enough to be handled easily by an RDBMS, but RDBMS tools can no longer manage big data. To overcome this, the Apache Software Foundation developed a system called Apache Hadoop. It is one of the most widely used technologies; it can handle large volumes of data and provide high-speed access to the data within an application. It is used for distributing, processing and running applications over large datasets. It is a Java-based tool that uses a master-slave technique to handle large volumes of continuous data arriving at high speed from different sources such as events, emails, social media and external feeds [1]. It is also an easily available tool for storing and processing large volumes of data, and it is used by large companies such as Google, Yahoo and Facebook [2]. About 63% of various communities and organizations use Hadoop to manage huge amounts of structured, semi-structured and unstructured data [2].

Several enterprises and organizations increasingly rely on Hadoop, with trust and confidence, for storing and processing their data. However, the security and protection of the stored data is somewhat lacking in Hadoop, and this is its major limitation. To understand this, consider the security of storage and processing systems, which are now very popular. Because the nodes of storage and processing systems often exchange data, the risk of privacy and security breaches arises, and this data requires a strong security mechanism. For these reasons, this paper presents details about Hadoop and its components, since it remains an essential framework for processing and managing large data, and then about some existing mechanisms to increase security and privacy.

The rest of the paper is organized as follows. Section 2 describes the architecture of Hadoop, including its components and how it stores, processes and manages big data. Big data security challenges and existing Hadoop security mechanisms are discussed in Section 3 and Section 4 respectively. Important aspects to consider when using big data with Hadoop, including security and privacy measures, are discussed in Section 5. Finally, the authors conclude their work in Section 6.

2 Hadoop: Big Processing Solution

Apache Hadoop is an open-source platform that introduced a new, easy way of storing and processing data. It was influenced by Google's published papers, which described its approach to handling the barrage of data. Consequently, it has become the basic standard for storing, processing and analysing enormous amounts of data, in the range of terabytes and petabytes [3]. Hadoop provides the processing services of expensive hardware on affordable, industry-standard servers that can store and process data without limits. Using straightforward programming models, it processes data ranging from gigabytes to petabytes produced by a series of computers. The scenario has now changed: data is growing rapidly, and an RDBMS cannot perform efficiently because of the large volume of data. Vikram S. et al. have defined big data in terms of five dimensions, namely volume, velocity, variety, value and complexity, and have suggested the basic idea of handling big data with the help of Hadoop architecture components such as the name node, data node, edge node and HDFS [17]. The authors have also introduced the issues faced by different users of big data, i.e. data privacy and search analysis, which require urgent research attention. In [16], the authors discussed the relation between the key components of Hadoop, i.e. MapReduce and the Hadoop Distributed File System. MapReduce is used for large-scale distributed processing, whereas the Hadoop Distributed File System is used to store all input data and the generated data for further applications. HDFS is further discussed under three categories: software architecture, bottleneck and portability limitations, and portability assumptions. A typical Hadoop architecture is shown in Figure-1.

Figure-1: Hadoop Architecture

2.1 Hadoop Distributed File System (HDFS)- Storage of Data

HDFS has been developed using a distributed file system design. It is highly fault tolerant, holds large amounts of data and provides easy access to that data. HDFS is a core component of the Hadoop architecture and is used to store the input and output data of applications. HDFS is a block-structured file system [2-3]. Currently, the default block size is 128 MB (previously 64 MB) and the default replication factor is 3. Block size and replication factor are configurable parameters. An individual file is divided into fixed-size blocks with the following characteristics.
1) Blocks are stored in a cluster of one or many machines with enough data storage capacity, and each data node manages the data storage of its own system.
2) HDFS is responsible for recovery from data node failures and distributes data across the data nodes in groups.
3) HDFS plays an administrative role in adding or removing nodes from a cluster, as shown in Figure-2.

Figure-2: HDFS Storage
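The effect of the default block size and replication factor can be illustrated with a small calculation. The following is a minimal sketch, not part of Hadoop itself: the function name block_layout and the 500 MB example file are hypothetical, and only the defaults of 128 MB and a replication factor of 3 come from the text above.

```python
import math

BLOCK_SIZE_MB = 128      # default block size stated in the text
REPLICATION_FACTOR = 3   # default replication factor stated in the text


def block_layout(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                 replication=REPLICATION_FACTOR):
    """Return (number of blocks, size of last block in MB, raw storage in MB)."""
    if file_size_mb <= 0:
        return 0, 0, 0
    # A file is split into fixed-size blocks; only the last one may be smaller.
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    last_block_mb = file_size_mb - (num_blocks - 1) * block_size_mb
    # Every block is stored `replication` times across the data nodes.
    raw_storage_mb = file_size_mb * replication
    return num_blocks, last_block_mb, raw_storage_mb


# Hypothetical 500 MB file: three full 128 MB blocks plus one 116 MB block.
print(block_layout(500))  # (4, 116, 1500)
```

With the defaults, a 500 MB file therefore occupies four blocks and about 1500 MB of raw cluster storage; lowering the replication factor reduces storage at the cost of fault tolerance.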

HDFS Terminologies

Name Node

The name node is the core part of the Hadoop system. If the name node crashes, the entire Hadoop system goes down. The name node manages the file system namespace and stores the metadata information along with the location of the data blocks.

Secondary Name Node

The secondary name node is responsible for copying and merging the namespace image and the edit log. If the name node crashes, the namespace image stored in the secondary name node can be used to restart the name node. In this sense, the secondary name node acts as a backup for the name node.

Data Node

A data node stores blocks of data and retrieves them. The data nodes also report their block information to the name node periodically.
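The merging of the namespace image with the edit log, described above for the secondary name node, can be pictured as replaying logged operations on a snapshot. The sketch below is a deliberately simplified model under stated assumptions: the dictionary-based image, the tuple format of the log entries and the function apply_edit_log are hypothetical and do not reflect Hadoop's actual on-disk formats.

```python
def apply_edit_log(namespace_image, edit_log):
    """Replay logged operations on a copy of the namespace image."""
    merged = dict(namespace_image)  # leave the original snapshot untouched
    for op, path, blocks in edit_log:
        if op == "create":
            merged[path] = list(blocks)   # file path -> list of block ids
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


# Hypothetical snapshot and log of changes made since the snapshot was taken.
image = {"/logs/a.txt": ["blk_1", "blk_2"]}
edits = [
    ("create", "/logs/b.txt", ["blk_3"]),
    ("delete", "/logs/a.txt", []),
]

checkpoint = apply_edit_log(image, edits)
print(checkpoint)  # {'/logs/b.txt': ['blk_3']}
```

The merged result plays the role of a fresh namespace image: a restarted name node could load it instead of replaying a long edit log.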

2.2 MapReduce- Distributed Data Processing Framework

Hadoop MapReduce is a Java-based system, built on the MapReduce programming paradigm introduced by Google, in which data from the HDFS store is processed [2-3]. In the MapReduce paradigm, each job has user-defined map and reduce phases: the input data set is split into independent chunks that are processed in a completely parallel manner, and the results are used for user consumption or further processing. HDFS is the storage system for both the input and the output of a MapReduce job. The main components of MapReduce are described as follows:
1) The Job Tracker is the master of the system; it manages the jobs and the resources in the cluster, known as Task Trackers.
2) Task Trackers are the slaves, deployed on each machine. They are responsible for running the map and reduce tasks as instructed by the Job Tracker.
3) The Job History Server is a daemon that provides historical information about completed applications, as shown in Figure-3.

Map Reduce Process

Figure-3: MapReduce Process

Step 1: Input data in the form of image, video or text files is converted into <Key1 (K1), Value1 (V1)> pairs by the input record reader.

Step 2: The output (K1, V1) is converted into (K2, V2) by the Mapper.

Step 3: The second-stage output (K2, V2) is converted into (K2, List(V2)) with the help of shuffle and sort techniques.

Step 4: The Reducer takes (K2, List(V2)) and generates the output (K3, V3).

Step 5: The final output is generated by the Output Record Writer, which takes the output of the Reducer (K3, V3) as its input.
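The five steps above can be sketched as a single-process word count. This is only an illustration of the key-value flow, assuming plain Python functions in place of the Hadoop Java API; the names record_reader, mapper, shuffle_and_sort, reducer and run_job are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def record_reader(lines):
    """Step 1: turn raw input into (K1, V1) = (character offset, line text)."""
    offset = 0
    for line in lines:
        yield offset, line
        offset += len(line) + 1


def mapper(_offset, line):
    """Step 2: emit (K2, V2) = (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Step 3: sort by key and group the values, giving (K2, List(V2))."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reducer(word, counts):
    """Step 4: emit (K3, V3) = (word, total count)."""
    return word, sum(counts)


def run_job(lines):
    mapped = [pair for k1, v1 in record_reader(lines) for pair in mapper(k1, v1)]
    # Step 5: collect the reducer output, standing in for the output record writer.
    return dict(reducer(k2, values) for k2, values in shuffle_and_sort(mapped))


result = run_job(["big data needs security", "hadoop stores big data"])
print(result)
```

In a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data over the network; here everything runs sequentially so that the intermediate pairs are easy to inspect.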

2.3 Other Hadoop Components

Hadoop is neither a single tool nor merely a programming language. It is a software library written in Java and used for processing large amounts of data in a distributed environment. Hadoop on its own cannot provide all the services or facilities required to process big data. Its ecosystem is a set of tools that help process large data, ranging in size from gigabytes to petabytes, simultaneously. Hadoop is an Apache project that provides many facilities, such as MapReduce for parallel computing. However, there is much more to do if one wants to build a recommendation engine over big data, run clustering algorithms over big data, or obtain near real-time access to big data. To meet these requirements, one has to add more components from the Hadoop ecosystem. Apache Pig, Hive, HBase, HDFS, MapReduce, Mahout, Oozie, ZooKeeper and Sqoop are among the components that, when combined with core Hadoop, make the ecosystem much more scalable and robust.

Table-1: Hadoop Components


3 Big Data: Hadoop Security Challenge

To achieve high-quality performance in terms of availability and scalability, IT organizations depend on Hadoop and its components. Amazon uses them to build its product search indices and to process millions of sessions. Facebook uses them for data warehousing, log processing and recommendation systems [8]. Hadoop and its components are used by cloud space for their customer projects. Twitter uses them to manage the data generated on its website daily. The New York Times uses them for video and image analysis. Besides these, other users include IBM, Firm, LinkedIn and the University of Freiburg [15].

The Hadoop ecosystem is evolving to satisfy the needs of many organizations, researchers and governments. At present, some organizations and enterprises analyze the information and location data collected from customers in different areas. They then organize the collected data for marketing activities, so personal data can be disclosed when customer data is analyzed. This has created a new target for hackers and other cyber criminals. This data, which was previously used by organizations, is extremely valuable and subject to privacy laws and regulations. Consequently, these companies need security to protect the data and its privacy. This means that the demand for data scientists, and for stronger security and privacy, will continue to rise in order to protect users' personal information.

4 Hadoop Security

Initially, when Hadoop was created, security issues were not a top priority [23]. The developers' only aim was to build a system for the distributed and parallel processing of huge data. To solve the resulting problems, Hadoop needs strong security for protecting sensitive information [23]. Later on, several mechanisms were proposed for securing Hadoop clusters. Authentication, authorization, encryption and key management are the available and feasible pillars for securing a Hadoop cluster. Firstly, Hadoop distributions performed much of the integration and setup work with central security services such as Active Directory or LDAP through Apache Knox Gateway [23]. Knox is a system that provides a single point of authentication and access for Hadoop services. It provides access to the Hadoop cluster over HTTP/HTTPS and eliminates the risks of SSH access to edge nodes. Mechanisms for securing communication between the various nodes include Kerberos and the Simple Authentication and Security Layer (SASL). Authentication based on hashing techniques has also been implemented. One such system uses the SHA-256 hashing technique [23]: the user authenticates to the name node by sending a hash value, and the name node then compares the hash value sent by the user with the one it generates itself.
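The hash comparison step described above can be sketched with Python's standard hashlib and hmac modules. This is a minimal illustration only, not the scheme of [23] and not Hadoop code: the shared secret, the function names sha256_hex and name_node_verify, and the way the secret is stored are all assumptions made for the example.

```python
import hashlib
import hmac


def sha256_hex(secret: str) -> str:
    """Return the SHA-256 digest of the secret as a hex string."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def name_node_verify(stored_secret: str, received_hash: str) -> bool:
    """Compare the hash sent by the user with the one generated locally."""
    expected = sha256_hex(stored_secret)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the two hashes match.
    return hmac.compare_digest(expected, received_hash)


# Hypothetical exchange: the client hashes its secret and sends the digest.
client_hash = sha256_hex("s3cret-passphrase")
print(name_node_verify("s3cret-passphrase", client_hash))             # True
print(name_node_verify("s3cret-passphrase", sha256_hex("wrong-one")))  # False
```

A bare digest of a fixed secret can be captured and replayed, so a practical design would normally mix in a per-session challenge or rely on Kerberos tickets as mentioned above; the sketch only mirrors the comparison itself.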

Secondly, HDFS encryption: HDFS offers 'transparent' encryption embedded within the Hadoop file system. This means data is encrypted as it is stored in the file system, transparently, without modification to the applications that use the cluster. This is an important feature for supporting tenant data privacy in multi-tenant clusters. HDFS encryption can be used with Hadoop's Key Management Server (KMS) or integrated with third-party key management services. Researchers and practitioners have proposed various encryption schemes with HDFS for securing data at rest and in transit.

Lei Xu et al. [27] presented CL-PRE, a certificateless proxy re-encryption scheme for secure data sharing with the public cloud. CL-PRE uniquely integrates identity-based public keys into proxy re-encryption. It eliminates the key escrow problem of traditional identity-based encryption and does not require certificates to guarantee the authenticity of public keys. M. Li et al. [28] have proposed a new cloud architecture, MyCloud, which supports user-configured privacy protection in a cloud environment instead of relying on cryptographic solutions. MyCloud first de-privileges the cloud provider and then enables user-configured privacy protection. It also reduces the TCB size to minimize the attack surface of the cloud platform. S. Park and Y. Lee [26] have proposed a secure Hadoop architecture that adds encryption and decryption functions to HDFS. Secure HDFS was implemented by adding an AES encrypt/decrypt class to CompressionCodec in Hadoop.

Yuan Tian [25] has presented an overview of big data and discussed its security issues. In addition, he has summarized several ways of improving the security of big data, including a security hardening methodology with an attribute relation graph, an attribute selection methodology, a content-based access control model and a scalable multidimensional anonymization approach. He has also proposed an intelligent security model for enhancing big data security that is capable of real-time data collection and threat analysis. The model detects a threat before a security intrusion occurs in the system. A new security model has been developed for GHadoop, an extension of the Hadoop MapReduce framework. Several security methods are provided to protect GHadoop from traditional attacks; for instance, public key cryptography and SSL (Secure Socket Layer) have been used [23]. A cloud-oriented, storage-efficient dynamic access control scheme has also been developed. It uses ciphertext based on CP-ABE together with a symmetric encryption algorithm (such as AES) [53]. An encryption method using the AES and OTP algorithms has also been proposed and integrated into Hadoop to improve file encryption and decryption performance.

4.1 Comparison of Existing Approaches/Methods

Thus, security technology and other methods are always essential. Some potential methods and techniques, together with their advantages and limitations, are shown in Table-2.

Table-2: Comparison of Existing Approaches/Methods

On the basis of the above discussion, these pillars and the proposed approaches/methods have some limitations. For example, the first approach, which is easier and less expensive to implement, is also less effective, because only simple file permission and access control mechanisms are employed. If the system is breached, the attacker will have access to all data.

5 Important Aspects of Big Data Security and Privacy: Hadoop Perspective

Some important aspects of security and privacy in Hadoop are mentioned below. Hadoop is gaining popularity at the enterprise level. It is a reliable and cost-effective big data storage and processing platform compared with competing software. However, there are some risks associated with it, for example the risk of data leakage while data is transferred over the network from a Hadoop client to a data node. Some Hadoop distributors, such as IBM, Cloudera and Hortonworks [29-30], claim to provide security for their clients' data. Even if their claims are true, not everyone can afford to use a specialized distribution. Information security is now a fundamental right. For a highly secure Hadoop environment, there should be open frameworks that are available to everyone. The sensitive data of enterprises is stored in the cloud and all services are accessed through the Internet, which means organizations face many problems related to data leakage and security.
• To build an infrastructure that is cost effective and efficiently scalable, cloud providers have to understand customers' requirements at all levels. To do so, they need to share storage devices and physical resources between multiple users. This is known as multi-tenancy. However, sharing resources means that the resources are exposed to attackers. If a customer and an attacker are using the same physical devices, then the attacker can easily gain access to the customer's data if proper security measures are not implemented.
• Companies do not have direct control over their data [29-30]; they can never know whether their data is being used by someone else. Since there are no transparent mechanisms for monitoring the resources directly, many security issues arise.


• Since customers have to share physical resources with other customers and do not have direct control over their data, they rely on cloud providers through trust mechanisms, as an alternative to having transparent control over their data and cloud resources. By assuring customers that their operations are certified as compliant with organizational safeguards and standards, cloud providers can build confidence among their customers.
• Privacy and security have always been two distinct domains of concern, yet they are usually discussed together, since security is required in order to provide privacy. Enterprises need to be sure that their sensitive data is not being accessed by cloud providers and is not being shared with a third party in return for money, which is a serious threat to customers' privacy. Security and privacy standards, such as those of the International Organization for Standardization (ISO) [29-30], have evolved and require service providers to comply with them in order to fully safeguard their clients' data assets. This has resulted in very protective data security enforcement within enterprises, including service providers as well as clients.
Earlier, data was safely confined in isolated clusters or data silos where security was not an issue. However, after being surrounded by an ever-growing ecosystem of tools and applications, Hadoop evolved into Big Data as-a-Service (BDaaS) and moved to the cloud [29-30]. While these innovations have served to democratize data and bring Hadoop into the mainstream, they have also created new security concerns for organizations that now struggle to scale security in step with Hadoop's rapid technological advances. For this reason, new security approaches need to be explored for securing sensitive information in Hadoop clusters and big data in the cloud.

6 Conclusion

It can be inferred that, in Hadoop security research, the techniques explored so far are not sufficient, as big data is now gradually becoming involved in every field. More privacy and security approaches are therefore needed, and they should be explored further to establish the importance of security wherever big data is used. Furthermore, more secure and faster methods of keeping data secure need to be found. There is also a need to focus on application security rather than device security, so as to provide both reactive and proactive protection.

References

[1] Oguntimilehin A., Ademola E.O., A Review of Big Data Management, Benefits and Challenges, Journal of Emerging Trends in Computing and Information Sciences, vol. 5, pp. 433-437, June 2014.

[2] T. White, MapReduce and the Hadoop distributed file system, in Hadoop: The Definitive Guide, 1st edition, O'Reilly Media, Inc., Yahoo Press, 2012.

[3] D. Borthakur, The Hadoop distributed file system: architecture and design, Hadoop Project Website [online]. Available: http://hadoop.apache.org/core/docs/current/hdfs_design.pdf

[4] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, R. Murthy, Hive: A Warehousing Solution Over a MapReduce Framework, In Proc. of Very Large Data Bases, vol. 2, pp. 1626-1629, 2009.

[5] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, The Hadoop Distributed File System, Yahoo!, Sunnyvale, California, USA, IEEE, 2010.

[6] Harshawardhan S. Bhosale, Devendra P. Gadekar, A Review Paper on Big Data and Hadoop, International Journal of Scientific and Research Publication, vol. 4, 2014.

[7] Deepika P, Anantha Raman G R, A Study of Hadoop-Related Tools and Techniques, International Journal of Advanced Research in Computer Science and Software Engineering, vol. 5, pp. 160-164, 2015.

[8] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, Angela Hung Byers, Big data: The Next Frontier for Innovation, Competition, and Productivity, McKinsey Global Institute, 2012.


[9] Gang Zhao, A Query Processing Framework based on Hadoop, International Journal of Database Theory and Application, vol. 7, pp. 261-272, 2014.

[10] ZooKeeper, Apache Software Foundation project home page, https://zookeeper.apache.org

[11] Apache Mahout, http://mahout.apache.org

[12] Apache Sqoop, https://sqoop.apache.org/

[13] Apache Oozie, https://oozie.apache.org/

[14] Apache Ambari, https://ambari.apache.org

[15] C.L. Philip Chen, Chun-Yang Zhang, Data-intensive applications, challenges, techniques and technologies: A survey on Big Data, Information Sciences, vol. 275, pp. 314-347, 2014.

[16] Jeffrey Shafer, Scott Rixner, and Alan L. Cox, The Hadoop Distributed Filesystem: Balancing Portability and Performance, Performance Analysis of Systems Software (ISPASS), IEEE International Symposium, pp. 122-133, 2010.

[17] S. Vikram Phaneendra, E. Madhusudhan Reddy, Big Data - solutions for RDBMS problems - A survey, In 12th IEEE/IFIP Network Operations Management Symposium (NOMS 2010) (Osaka, Japan, Apr 2013).

[18] http://www.bmcsoftware.in/guides/hadoop-ecosystem.html

[19] http://www.dezyre.com/article/recap-of-hadoop-nes-for-january-2018/373

[20] Stephen Kaisler, Frank Armour, J. Alberto Espinosa and William Money, Big Data: Issues and Challenges Moving Forward, 46th Hawaii International Conference on System Sciences, pp. 995-1003, 2013.

[21] Mark Troester (2013), Big Data Meets Big Data Analytics, www.sas.com/resources/.../ WR46345.pdf, retrieved 10/02/14.


[22] Hadeer Mahmoud, Abdelfatah Hegazy, Mohamed H. Khafagy, An approach for big data security based on Hadoop distributed file system, International Conference on Innovative Trends in Computer Engineering (ITCE), 2018, DOI: 10.1109/ITCE.2018.8316608.

[23] Masoumeh Rezaei Jam, Leili Mohammad Khahli, Mohammad Kazem Akbari, A Survey on Security of Hadoop, International Conference on Computer and Knowledge Engineering (ICCKE), 2014, DOI: 10.1109/ICCKE.2014.6993455.

[24] Youngho Song, Young-Sung Shin, Miyoung Jang, Jae-Woo Chang, Design and Implementation of HDFS Data Encryption Scheme using ARIA Algorithm on Hadoop, IEEE International Conference on Big Data and Smart Computing (BigComp), 2017, DOI: 10.1109/BIGCOMP.2017.7881720.

[25] Yuan Tian, Towards the Development of Best Data Security for Big Data, Communication and Network, Scientific Research Publishing Inc., vol. 9, pp. 291-301, 2017.

[26] Seonyoung Park and Youngseok Lee, Secure Hadoop with Encrypted HDFS, J.J. Park et al. (Eds.): GPC 2013, LNCS 7861, pp. 134-141, Springer, Berlin, Heidelberg.

[27] Lei Xu, X. Wu and X. Zhang, CL-PRE: a certificateless proxy re-encryption scheme for secure data sharing with public cloud, Proc. of 2012 ACM Symposium on Information, Computer and Communications Security (ASIACCS'12), pp. 87-88, 2012.

[28] Min Li, Wang Zang, Kun Bai, Men Yu, and Peng Liu, MyCloud: Supporting User-Configured Privacy Protection in Cloud Computing, In Proceedings of ACM ACSAC, pp. 59-68, 2013.

[29] Securing Hadoop: Security Recommendation for Hadoop Environments, https://securosis.com/assets/library/reports/Securing_Hadoop_Final_V2.pdf


[30] Raj R. Parmar, Sudipta Roy, Debnath Bhattacharyya, Samir Kumar Bandyopadhyay, and Tai-Hoon Kim, Large-Scale Encryption in the Hadoop Environment: Challenges and Solutions, https://ieeexplore.ieee.org/document/7922533/
