Knowledge Digest for IT Community www.csi-india.org ISSN 0970-647X Big Data Analytics SECURITY CORNER Enhanced Protection for Big Data using Intrusion Kill Chain and Data Science 24 ARTICLE MiDeSH: Missile Decision Support System 28 Volume No. 41 | Issue No. 1 | April 2017 ₹ 50/- 52 pages including cover COVER STORY Role of Hadoop in Big Data Analytics 14 TECHNICAL TRENDS Data Lake: A Next Generation Data Storage System in Big Data Analytics 19 RESEARCH FRONT Sentiment and Emotion Analysis of Tweets Regarding Demonetisation 21


Page 1: ISSN 0970-647X Big Data Analyticscsi-india.org.in/Communications/CSIC_April_2017.pdfRole of Hadoop in Big Data Analytics 14 TECHNICAl TRENDS Data Lake: A Next Generation Data Storage

Knowledge Digest for IT Community

www.csi-india.org

ISSN 0970-647X

Big Data Analytics

SECURITY CORNER Enhanced Protection for Big Data using Intrusion Kill Chain and Data Science 24

ARTICLE MiDeSH: Missile Decision Support System 28

Volume No. 41 | Issue No. 1 | April 2017 ₹ 50/-

52 pages including cover

COVER STORY Role of Hadoop in Big Data Analytics 14

TECHNICAL TRENDS Data Lake: A Next Generation Data Storage System in Big Data Analytics 19

RESEARCH FRONT Sentiment and Emotion Analysis of Tweets Regarding Demonetisation 21


• Cover Story • Technical Trends • Research Front • Articles • Innovations in IT • Security Corner • Practitioner Workbench • Brain Teaser • Chapter Reports • Student Branch Reports

www.csi-india.org 2

CSI Communications | April 2017

CSI CALENDAR 2016-17

Gautam Mahapatra, Vice President, CSI, Email: [email protected]

Date: Event Details & Contact Information

MARCH 24-25, 2017: First International Conference on “Computational Intelligence, Communications, and Business Analytics (CICBA - 2017)” at Calcutta Business School, Kolkata, India. Contact: [email protected]; (M) 94754 13463 / (O) 033 24205209; www.cicba-2017.in

APRIL 15-16, 2017: 1st International Conference on Smart Systems, Innovations & Computing (SSIC-2017) at Manipal University Jaipur, Jaipur, Rajasthan. http://www.ssic2017.com Contact: Mr. Ankit Mundra, Mob.: 9667604115, [email protected]

MAY 08-10, 2017: ICSE 2017 - International Conference on Soft Computing in Engineering, organized by JECRC, Jaipur, www.icsc2017.com Contact: Prof. K. S. Raghuwanshi, [email protected], Mobile: 9166016670

MAY 25-27, 2017: Indian Engineering Educators and Administrators Conference (IEEAC-2017), organized by Terna Engineering College

JUNE 05-30, 2017: Workshop on LAMP (Linux, Apache, MySQL, Perl/Python), Jaypee University of Engineering and Technology, Raghogarh, Guna - MP, www.juet.ac.in Dr. Shishir Kumar ([email protected]) 9479772915

JULY 20-22, 2017: IEEE International Conference on Networks & Advances in Computational Technologies (NetACT 2017), organized by CSI Trivandrum chapter, http://netact17.in/ Contact: [email protected]

OCTOBER 28-29, 2017: International Conference on Data Engineering and Applications-2017 (IDEA-17) at Bhopal (M.P.), http://www.ideaconference.in Contact: [email protected]

DECEMBER 21-23, 2017: Fourth International Conference on Image Information Processing (ICIIP-2017) at Jaypee University of Information Technology (JUIT), Solan, India (http://www.juit.ac.in/iciip_2017/). Contact: Dr. P. K. Gupta ([email protected]) (O) +91-1792-239341; Prof. Vipin Tyagi ([email protected])

Kind Attention: Prospective Contributors of CSI Communications

Please note that the Cover Theme for the May 2017 issue is Nano Computing. Articles may be submitted in categories such as Cover Story, Research Front, Technical Trends, Security Corner and Article. Please send your contributions by 20th April, 2017.

The articles should be authored as original text. Plagiarism is strictly prohibited.

Please note that CSI Communications is a magazine for members at large and not a research journal for publishing full-fledged research papers. Therefore, we expect articles written at the level of a general audience of varied member categories. Equations and mathematical expressions within articles are not recommended and, if absolutely necessary, should be kept to a minimum. Include a brief biography of four to six lines, indicating CSI Membership no., for each author, along with a high-resolution author photograph.

Please send your article in MS-Word format to the Associate Editor, Prof. Prashant R. Nair, at the email id [email protected] with cc to [email protected]

(Issued on behalf of the Editorial Board, CSI Communications)

Prof. A. K. Nayak
Chief Editor


CSI Communications

Please note: CSI Communications is published by Computer Society of India, a non-profit organization. Views and opinions expressed in the CSI Communications are those of individual authors, contributors and advertisers and they may differ from policies and official statements of CSI. These should not be construed as legal or professional advice. The CSI, the publisher, the editors and the contributors are not responsible for any decisions taken by readers on the basis of these views and opinions.

Although every care is being taken to ensure genuineness of the writings in this publication, CSI Communications does not attest to the originality of the respective authors’ content. © 2012 CSI. All rights reserved.

Instructors are permitted to photocopy isolated articles for non-commercial classroom use without fee. For any other copying, reprint or republication, permission must be obtained in writing from the Society. Copying for other than personal use or internal reference, or of articles or columns not owned by the Society without explicit permission of the Society or the copyright owner is strictly prohibited.

PLUS

CSI Executive Committee 06
Report to the Members of Computer Society of India on CSI Transactions on ICT – The Premier Journal of CSI 08
Life Time Achievement Award 32
Foundation Day Seminar-2017 36
Report on CSI Student Conventions 37
National Seminar on Innovation in Digital Learning 39
Brain Teaser 41
CSI Reports 42
Student Branches News 46

Contents

Cover Story
Role of Hadoop in Big Data Analytics
Deepali Bajaj, Urmil Bharti, Rupali Ahuja & Anita Goel 14

Technical Trends
Data Lake: A Next Generation Data Storage System in Big Data Analytics
Remya Sasidharan Panicker 19

Research Front
Sentiment and Emotion Analysis of Tweets Regarding Demonetisation
Pushkal Agarwal, Nirmal Kumar S., Lokesh Todwal & Sakthi Balan M. 21

Security Corner
Enhanced Protection for Big Data using Intrusion Kill Chain and Data Science
Abdul Khadar A., Dr. Shrishail Math & H. Srinivas Murthy 24

Articles
MiDeSH: Missile Decision Support System
C.R. Suthikshn Kumar 28

Printed and Published by Mr. Sanjay Mohapatra on Behalf of Computer Society of India, Printed at G.P. Offset Pvt. Ltd. Unit-81, Plot-14, Marol Co-Op. Industrial Estate, off Andheri Kurla Road, Andheri (East), Mumbai 400059 and Published from Computer Society of India, Samruddhi Venture Park, Unit-3, 4th Floor, Marol Industrial Area, Andheri (East), Mumbai 400 093. Tel. : 022-2926 1700 • Fax : 022-2830 2133 • Email : [email protected] Chief Editor: Prof. A. K. Nayak

Chief Editor: PROF. A. K. NAYAK

Editor: DR. DURGESH MISHRA

Associate Editor: PROF. PRASHANT NAIR

Published by: MR. SANJAY MOHAPATRA, for Computer Society of India

Design, Print and Dispatch by: GP OFFSET PVT. LTD.

Volume No. 41 • Issue No. 1 • APRIL 2017


Dear Fellow CSI Members,

“Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway.”

– Geoffrey Moore

“Data is the new science. Big Data holds the answers.”

– Pat Gelsinger

The theme for the Computer Society of India (CSI) Communications (The Knowledge Digest for IT Community) April, 2017 issue is Big Data Analytics, a game-changing technology that translates data to information and information to insights.

In this issue, the Cover Story article is “Role of Hadoop in Big Data Analytics” by Deepali Bajaj, Urmil Bharti, Rupali Ahuja and Anita Goel. The authors have provided an overview of big data analytics, its applications, types and features, and have also highlighted the application of Hadoop for analytics.

The Research Front article is titled “Sentiment and Emotion Analysis of Tweets Regarding Demonetisation” by Pushkal Agarwal, Lokesh Todwal, Nirmal Kumar S. and Sakthi Balan M. Here, in the wake of the demonetization surgical strike by the Honorable Prime Minister of India, the authors have analyzed real-time data from online social networks like Twitter.

Remya Sasidharan Panicker has contributed to Technical Trends through the article “Data Lake: A Next Generation Data Storage System in Big Data Analytics”, which focuses on a cutting-edge storage system for the big data wave.

The Security Corner has Abdul Khadar, Shrishail Math and H Srinivas Murthy giving us new insights on Enhanced Protection for Big Data using Intrusion Kill Chain and Data Science.

Another article by C.R. Suthikshn Kumar, “MiDeSH: Missile Decision Support System”, showcases an initiative towards national security.

The newly elected CSI National Executive Committee (Execom) and inspirational citations of CSI lifetime achievement awardees are published in this issue.

This issue also contains a Crossword, CSI activity reports from chapters and student branches, and the Calendar of events. Major CSI event reports of the International Summit on Trends & Innovations on Net gen ICT and of regional and state student conventions also find a place.

We are thankful to the entire ExecCom for their continuous support in bringing out this issue successfully.

We wish to express our sincere gratitude to all authors and reviewers for their contributions and support to this issue.

The next issue of CSI Communications will be on the theme “Nano Computing”. We invite contributions from all CSI members and researchers on this theme. We also look forward to receiving constructive feedback and suggestions from our esteemed members and readers at [email protected].

With kind regards,
Editorial Team, CSI Communications

Editorial


President’s Message

Sanjay Mohapatra, Bhubaneswar, [email protected]

01 April, 2017

Dear Members,

Warm Greetings!!

I am humbled, honoured and privileged to assume the role of President of the premier professional society, the ‘Computer Society of India’, after serving as Vice President during 2016-2017. I express my sincere thanks and appreciation to the members who supported me throughout my journey as a member of CSI ExecCom for about two decades prior to becoming Vice President. It will be my honour and privilege to serve CSI as President, and I assure all our members that I will put in my time and effort to bring more transparency to the working of the society and make our members proud.

To start with, I would like to thank everyone who participated in this year’s election and took their time to vote for the candidates of their choice.

I would like to welcome new ExecCom members who have been elected and I am sure they will be an asset and contribute to better governance and growth of CSI.

I would like to thank our outgoing President, Dr. Anirban Basu, for his dedication to the growth of CSI in all aspects and for bringing transparency to its management. The electoral reforms carried out under his guidance are an important milestone in the history of CSI. Dr. Basu always advises on involving our members in our activities and serving their professional interests. Dr. Anirban Basu will be an active ExecCom member as IPP and will help me transition into the position of President. I look forward to his guidance in the days to come.

An important activity of CSI is to organize conferences for disseminating new knowledge and providing a platform for networking and exchange of ideas. In addition to the conferences, CSI has strength in technical publications.

In the last few years, we as a team have relentlessly striven for the growth and sustainability of the society. We are able to see substantial growth in Student, Institutional and Individual membership. We hope the same will continue.

My vision as President is sustainable growth of the society. Sustainability is achieved by increasing the membership of different categories at various levels. Growth comes in parallel with sustainability and depends on the kind of work, connectivity and approach.

In the coming days, we will focus on the regions with less Academic membership/chapters to strengthen the society. Those regions have already been identified, and discussions are going on at various levels regarding the approach to be followed. We are also trying to increase the corporate membership, along with Institutional and Student membership, which in turn will make the society stronger. A step-by-step procedure will be followed to improve the various publications of CSI.

One of my goals for this year is to try to address the needs of our young student members. As Walt Disney said, “Our greatest natural resource is the minds of our children”. I repeatedly stress involving student members and increasing student-centric activities, which are the key areas through which student membership can be strengthened along with the visibility of the society. In the coming days we are planning a large number of Faculty Development Programs in various chapters. As we are a Registered Education Provider of PMI, we are in the process of offering PMP training through chapters across the regions.

By working together, we can make CSI better. I request you to share your valuable comments and suggestions by email ([email protected]) on how we can improve, and what can be done to serve CSI and you better.

Sincerely

Sanjay Mohapatra
President, CSI


President (2017-18): Mr. Sanjay Mohapatra, Plot No. 5, CM 839/11, Sector 9 CDA, Market Nagar, Cuttack - 753 014, Odisha. (M) 91-9861010656 (E) [email protected]

EXECUTIVE COMMITTEE

Vice President Cum President Elect (2017-18): Mr. Gautam Mahapatra, Vailla No: 8, Maithri Enclave, Near Tulsi Gardens, Yapral Kapra, Hyderabad-500 062. (M) 9490995206 (E) [email protected]/ [email protected]

Hon. Treasurer (2017-19): Mr. Manas Ranjan Pattnaik, Plot No. N-24,25 Chandaka Industrial Estate, Patia, KIIT, Bhubaneswar. (M) 07873099999 (E) [email protected]

Region-I (2017-19): Mr. Arvind Sharma, 3/294, Vishwas Khand, Gomati Nagar, Lucknow-226010, UP. (T) 522-4075496 (M) 9918653442 / 9415063442 (E) [email protected] [email protected]

Region-III (2017-19): Dr. Vipin Tyagi, Dept of CSE, Jaypee University of Engg. and Tech., Raghogarh, Guna - MP 473226. (T) 07544 - 267310-14 ext.134 (M) 09826268087 (E) [email protected]

Hon. Secretary (2016-18): Prof. A. K. Nayak, Indian Institute of Business Management, Budh Marg, Patna - 800 001. (T) 0612-2538809 (M) 09431018581, 09386598581 (E) [email protected]

Immd Past President (2017-18): Dr. Anirban Basu, Flat #309, Ansal Forte, 16/2A Rupena Agrahara, Hosur Road, Bangalore 560068. (T) 080 25731706 (M) 9448121434 (E) [email protected] [email protected]

Region-II (2016-18): Mr. Devaprasanna Sinha, 73B Ekdalia Road, Kolkata - 700 019. (T) (033)24408849 (M) 91 9830129551 (E) [email protected]

Region-IV (2016-18): Mr. Hari Shankar Mishra, Command Care, Opp. Loreto Convent School, A.G. Office Road, Doranda, Ranchi – 834002, Jharkhand. (T) 0651-2411318 (R) (M) 9431361450 (E) [email protected]


CSI Executive Committee


Region-V (2017-19): Mr. Vishwas Bondade, No. 774, 2nd Stage, Indiranagar, Bangalore 560038. (M) 09844058799 (E) [email protected]

Region-VI (2016-18): Dr. Shirish S. Sane, Dattaprasad, Plot No. 19, Kulkarni Colony, Sadhu Waswani Road, Nashik 422 002. (T) 0253-2313607 (R) (M) 09890014942 (E) [email protected]

Region-VII (2017-19): Dr. M. Sundaresan, Professor and Head, Department of Information Technology, Bharathiar University, Coimbatore - 641046, Tamil Nadu. (M) 09443042340 (E) [email protected]

DIVISION CHAIRPERSONS

Division-I (2017-19): Mr. Apoorva Agha, 8, Katra Road, Allahabad, UP - 211002. (M) 09415316183/08004905012 (E) [email protected] [email protected]

Division-II (2016-18): Prof. P. Kalyanaraman, Plot No. 139, Vaibhav Nagar, Phase I, Opp VIT Gate 3, Vellore – 632014. (M) 7708785555 (E) [email protected]

Division-III (2017-19): Mr. Raju L. Kanchibhotla, Aashirvad, 42/260/1/2, Shramik nagar, Moulali, Hyderabad-500046, India. (M) 09000555202 / 94 40 32914192 (E) [email protected]

Division-IV (2016-18): Dr. Durgesh Kumar Mishra, H-123-B, Vigyan Nagar, Annapurna Road, Indore. (M) 09826047547 (E) [email protected]

Division-V (2017-19): Dr. P. Kumar, Professor and Head, Department of Computer Science and Engineering, Rajalakshmi Engineering College, Chennai – 602 105. (M) 098405 73702 (E) [email protected]

NOMINATIONS COMMITTEE (2017-2018)

Prof. K. Subramanian, B 28, Tarang Apmts, Plot 19, IP Extn, Patparganj, Delhi - 110092. (M) 09818065948 (E) [email protected]

Mr. Subimal Kundu, Flat No. 1A, Block - 7, Space Town Housing Complex, P.O. Airport, Kolkata – 700052. (M) 8100592673; (M) 98301-92673 (E) [email protected] [email protected]

Dr. Brojo Kishore Mishra, Associate Professor, Department of IT, C. V. Raman College Engineering, Bhubaneshwar - 752054, India. (M) 09437875808 (E) [email protected] [email protected]


A REPORT

Report to the Members of Computer Society of India on CSI Transactions on ICT – The Premier Journal of CSI

Prof. S. V. Raghavan, Chief Editor and Director, CSI Publications Ltd.
Shri S. Mahalingam, Chairman, CSI Publications Ltd.

The Beginning:

We deem it a great privilege to bring to you this report on the genesis and progress of the premier journal from our prestigious Computer Society of India. The journal is appropriately called the CSI Transactions on ICT and is expected to grow over time into dedicated transactions covering topics such as Systems and Architecture, Software Design and Performance, Cyber and Information Security, Education Health and Agriculture, Economics, Practice and Management, and Computing and Computational Science. The inaugural issue of CSI Transactions on ICT was brought out in March 2013. We would like to share with the members of the Computer Society of India the genesis of this series of transactions planned by the Computer Society of India.

India has made great strides in the areas of export of services and the IT professionals from India are highly regarded globally. We now see a number of start-ups in the Information Technology field. Use of Information and Communication Technology has advanced considerably in India in the last decade among Business Organisations, Government and Individuals. Indian ICT companies as well as global leaders have set up research organizations in India. The Government is keen on ushering in a Digital India. In this environment, it is essential that a high quality research journal focusing on research efforts in India is brought out. CSI Publications is dedicated to creating the necessary structure to bring out this kind of journal. This is a critical need at this time of great transformation in the ICT scene in India.

The Ambience:

Information and Communication Technology (or ICT in short) is pervasive and ubiquitous. It is touching the life of every one of us in several ways. Civil society governance is dominated by the presence of ICT in education, health and agriculture. In order to increase the quality of life of the common man and to provide inclusive growth, one recognizes that all-round progress based on ICT is absolutely essential in Science and Technology, in such a manner that the Research and Development efforts result in useable, simple and affordable devices or services. During such a whole-hearted and dedicated pursuit, generation of knowledge and a deeper understanding of issues are the natural outcome.

Obviously, the knowledge so generated should be institutionalized on an ongoing basis for sharing among peers and to leave cultural legacies for posterity. Homegrown publications of high quality are the only answer. To synergize the Indian presence in the ICT space, Computer Society of India has been working on the concept of high-quality publications with globally comparable academic content and quality, for the last several years. The deliberations in the society, in the publications committee, and the CSI Executive Committee have resulted in the definition of a series called CSI Transactions on ICT.

As envisaged, the series has SIX topics that are considered relevant to cover the ICT space from concept to realization, primarily focused on what is happening in India. In fact, the creative ability, design capability, development potential, innovation in deployment across the Civil, Military, and Intelligence space, and optimal resource mobilization in the form of people, ideas, and funds have carved out a huge market for ICT in India in recent times. Phenomenal knowledge and experience get created during such a transformation process. The CSI Transactions series will strive to institutionalize this knowledge being created in India, with accuracy, authenticity, and agility.

For ICT in India, the emergence of the National Knowledge Network (NKN) as the integrator of Science, Technology, and Higher Education is an important factor. All scientific laboratories and institutions of Higher Learning (including IITs, IIMs, and Universities) were brought into one massive multi-gigabit network by the end of 2011. Indian contributions to science, technology, and higher education have been seamlessly integrated to receive global acclaim. Efforts akin to NKN are afoot to extend such linkages and integration to schools, polytechnics, colleges, literacy missions, and skills development organizations. India is on the threshold of making history, as the country perceives NKN as a game changer.

The Structure:

Besides, it is widely (and strongly) believed that India is rediscovering herself, especially in the ICT space. Accordingly, the following are identified as potential transactions through which Computer Society of India can respond in a professional manner to the enormous opportunity that beckons.

• CSI Transactions on ICT: Systems and Architecture
• CSI Transactions on ICT: Software Design and Performance
• CSI Transactions on ICT: Cyber and Information Security
• CSI Transactions on ICT: Education, Health, and Agriculture
• CSI Transactions on ICT: Economics, Practice, and Management
• CSI Transactions on ICT: Computing and Computational Science

As the Founder Chief Editor of the CSI Transactions series, I would like to recognize the Editors-in-Chief and all members of the Editorial Board for their exemplary work and contribution to the journal. For the initial three years, we had a strong team of 45 Editors associated with the Editorial Board, who are strong academic and technical leaders in their own right and who participate in the mammoth effort towards knowledge institutionalization.

World-renowned publishing house Springer is our publisher to ensure that CSI Transactions on ICT has global circulation. CSI Transactions is published quarterly, in March, June, September, and December, starting from March 2013. Both the online and printed versions are part of the Springer CS package distribution globally. The Editorial Board has planned SIX sections as a single book until we pick up speed and participation.

This publication comes from CSI Publications, a Section 8 Company promoted and fully owned by Computer Society of India. Shri. S. Mahalingam is the Chairman of the Board. CSI Publications is a non-profit organization. It will manage its own finances. So far, a few companies have financially supported the activities. Due to this, the issues of the first three years (2013, 2014 and 2015) were freely accessible on the SpringerLink website. Access to issues of 2016 and beyond will be restricted to subscribers. There are four categories of subscribers: Academic Institutions, Companies (both ICT and User Organisations), Professionals and Students. Subscription to the publication by the members of the Computer Society of India has been kept at a highly subsidized rate. This is an online publication and print copies, if requested, will be charged extra.

The Rationale:

CSI Transactions on ICT was launched at a time when the world is gripped with the larger problem of securing the Cyber Space. Extensive discussions in various policy forums and professional conferences in the last few years seem to point out that securing Cyber Space is not only a non-trivial task, but also an extremely challenging one. There is general acknowledgement of the fact that developing one's own hardware and software ecosystem, along with a robust and dependable supply chain management, is important to creating a secure Cyber Space. We hope that our Transactions carry scholarly articles focusing on different aspects of a robust Cyber Space, which is fast morphing into a Critical Infrastructure of a nation.

The six-part organization was based on the premise that ICT based systems are increasingly becoming part of all critical infrastructure. In fact, citizens of this planet consider the entire cyber space as a critical infrastructure, as everyone’s life is dependent on it in one way or the other. We are living in an information society and the “quality” of information sought by users is on the rise. Obviously, research and development will have to reflect the aspirations of the contemporary information society. The researchers have to constantly strive to generate new knowledge, progressively convert it into affordable technologies, and engineer them into products or services that are natural to all of us in daily life. This, I believe, is an ongoing and ever-expanding process. It will also keep changing in tune with our increased and deeper understanding of basic science.

There is a continuum from the advances made in physics to the sophistication that is getting entrenched in the end-use of ICT based systems. Rapid advances in Physics are enabling new device technologies to gain ground; phrases such as “integrated photonics and semi-conductors” for chip design are becoming pervasive in design to integrate sensor functions, optical switching functions, and optical transmission functions with computing. The impact of such development is tremendous. We will soon witness dramatic changes in systems for communication that perform at Terabit speeds, systems to measure and monitor for healthcare diagnosis and personalized therapy, systems that predict land-slides in hilly terrain, and so on. Each one of these systems will face challenges such as limiting power consumption, limiting cost and maximizing distance covered during transmission, while maintaining the highest possible security. Moreover, miniaturization is moving towards Nano levels. Quantum Information Systems are waiting in the wings. We need to worry about the architecture of these systems at all levels, keeping functions, reliability, availability, performance, and cost in mind.

All these developments mean that we will have very powerful compute chips that contain 1000 cores or more as CPU power. Harnessing such power will be a Grand Challenge for the Computer Science community. Perhaps the entire gamut of software development of the past may have to be re-invented to optimally use the opportunity presented by the modern developments in devices. Besides, data generated from the field, be it an experiment using Physics, Chemistry, or Biology, sensors collecting parameters about galaxies, or observations from human genomes, is reaching scales of the order of a Petabyte. Sheer movement of these data in and out of processors, coupled with a few meaningful calculations as a part of a model, is a herculean task. On top of it all, providing programming convenience is an insurmountable task. Revisiting paradigms of programming to exploit modern architecture is a tremendous effort in itself. Software design for functional completeness and faults is taken for granted. Performance guarantee, Reusability, Reliability, Dependability, and Security are the main challenges for software designers today.

In the backdrop of such “change”, a focus on Architecture and Systems makes fantastic sense. When coupled with Software Design and Performance, it reflects the enormity of the problem. When the whole effort is viewed in the context of Cyber Security, it turns out to be “manna from heaven” for designers, engineers, and technologists. They can now conquer the computing ecosystem holistically and effectively with the help of “domain experts” and theoreticians. We see the demonstration of such “coming together” off and on in various works reported on big science and big data. A lot more needs to be done. Use of ICT in education is increasingly getting personalized, demanding an array of “atomic” learning modules that can intelligently, automatically and seamlessly regroup themselves based on the user requirement. Of course, the presence of metadata will enable the process. Health research and health care add to this list; health research poses the big data challenge and healthcare delivery poses the immersive interaction challenge. Agriculture raises issues related to sensor deployment and models that are used in operations as well as prediction. All these technology triggered opportunities and application driven challenges have to be Economically realized and optimally managed when deployed in practice as technology environments in an enterprise. During such a process, demands are placed on Computational Sciences. CSI Transactions on ICT covers it all. It is time that we start focusing on each section as a special dedicated issue.

The Editorial Board:

The CSI Transactions editorial board transacts its business through electronic deliberations. The board also meets, as and when convenient, at an Editorial Board Conference (EBC). One such conference was held in 2015, after the Transactions had been brought out for two consecutive years. The deliberations led to major experiments in “defining the source of research papers”. On the whole, the Transactions is gaining strength and is set to scale greater heights in the coming months.

The Board felt that invited columns, invited papers, and keynote speeches could be a valuable addition. The areas that the Board suggested for Special Issues include Human Computer Interaction, Cyber Security including Forensic Data Science, Internet of Things (IoT), Cloud++, and Big Data Science; Virtualization (Cloud) and OS are also topics of relevance.

More specifically, the Board identified the following as areas of interest: Big Data; Data Science; Visualization, including Cognitive Computing, Machine Learning, AI, Deep Learning, Computing and Algorithms, Computer Architecture and VLSI Design, Autonomic Computing, Self Repair, Self-healing, Large Scale System Design, Computer Networks, Future Languages, Cyber Security, IoT, Bio-Informatics, Cyber Physical Systems, Cloud computing, soft computing, wired and wireless systems, smart cities, etc.

The Editorial Board, after considerable deliberation, decided to conduct three experiments to popularize CSI Transactions on ICT and to increase the number of channels that can act as potential sources of content for each issue. The first was to bring out special issues based on conferences held in India. The second was to align with an advanced program of the Government of India and bring out special issues based on the research outcomes reported in that program. The third was to have special issues dedicated to a subject of current interest. The rationale for this decision to experiment was to institutionalize the wonderful work done in India that normally goes unnoticed. Besides, as CSI Transactions on ICT comes from a professional society, the Editorial Board felt that such an approach by a journal would significantly foster industry-academia collaboration.

Going forward, this decision of the Editorial Board of CSIT opens a new experiment in sourcing and publishing high quality articles, in resonance and dynamic equilibrium with the ambient academic atmosphere. Of course, the three-pronged approach will align with and conform to the original six broad classifications identified, with their scope expanded to be in tune with the times.

Translating Board’s vision:

Operationalizing the three ideas and actually conducting those experiments became a challenge! India, as the readers are aware, has quite a few educational institutions that carry out high end and high quality research, ranging from basic to applied sciences, engineering, and technology. Computer Science and ICT being an all-encompassing discipline, selecting conferences and maintaining focus in each issue and across issues were seen as operational challenges. We decided to look at representative conferences in 2016 and selected one each from the south, north and east. We also insisted that the organizers of these conferences present to us their best papers, as evaluated by them. We identified three conferences: ICAARS from the south, REDSET from the north and ICAC from the east. While ICAARS had a modest number of papers, ICAC had an impressive number, and REDSET had the most. We scheduled ICAARS as the March 2016 issue, along with one paper selected from the general pool of submissions. As REDSET had a large number of papers to be carried, we scheduled it across three issues, June, September, and December of 2016, and decided to make these a combined issue to keep the REDSET papers together. Along the same lines, we scheduled ICAC as the March 2017 issue.

The second part of the experiment was to align with a Government program that is directly related to our journal, CSI Transactions on ICT. The Sir Visvesvaraya PhD program, in which the Government of India gives out over 1000 PhD Fellowships, turned out to be an ideal fit. The Ministry of Electronics and Information Technology (MeitY) was also keen to collaborate. Since the program has a self-designed three-stage filter for selecting publishable material, it naturally created a win-win situation for the program as well as the journal. The first batch of papers is scheduled as the June 2017 issue.

Having explained to the reader the rationale and modus operandi of selecting papers for our journal based on the three-pronged approach suggested by the Editorial Board, I would like to write a few paragraphs about the work reported in the current issue, viz., the March 2016 issue of CSI Transactions on ICT, which is the Special Issue on ICAARS.

ICAARS stands for International Conference on Advanced Automation, Robotics, and Sensors, held at PSG College of Technology, Coimbatore, India in 2016. Robotics and Automation is a key technology of the future, with application potential across a wide spectrum, from strategic areas such as aerospace, defense and atomic energy to services such as hospitality and healthcare. It holds the potential to transform the future of manufacturing.

The development of robots into intelligent machines touches upon issues such as the self-understanding of humans and socio-economic, legal and ethical issues. Robotics is an area where several sciences meet. Many developments in sensors, vision systems, virtual systems, adaptive control systems, precision machining and material handling have contributed to the advancements in the field of Robotics and Industrial Automation. The papers selected for publication in this issue reflect the spirit of Robotics, Automation, and Sensors.

The REDSET conference is unique in the sense that it represents a type of event in India that encourages reviews, summaries, developments, and usage experiences in addition to applied research. The conference has also attracted a larger cross-section of the academic community, beyond the traditionally known institutions. In terms of the domains of knowledge covered, such conferences span a wide spectrum covering education, health, agriculture, finance, cyber security, and management, with emphasis on the use of ICT in those domains.

ICAC, the International Conference on Advanced Computing, was organized by Maulana Abul Kalam Azad University of Technology (formerly West Bengal University of Technology) in October 2016. ICAC is an example of a Government sponsored multi-institutional program and of collaborative work among institutions, besides being an international conference of repute. All the papers exhibit depth and insight into the problems tackled.

The Linkage with GoI Program:

The June 2017 issue is a special one honoring India’s greatest ever engineer/architect/planner, Dr. Mokshagundam Visvesvaraya, in whose name the Government of India has launched a Doctoral Programme to identify the best talent across the country in areas such as Electronics System Design and Manufacturing (ESDM), Information Technology (IT), and Information Technology Enabled Services (ITES). The salient features of the scheme are reproduced here from the web site of Media Lab Asia, which manages the Visvesvaraya PhD Scheme and implements it on behalf of the Ministry of Electronics and Information Technology (MeitY), Government of India:

1. Give thrust to R&D, create an innovative ecosystem and enhance India’s competitiveness in the knowledge intensive sectors such as Electronics Systems Design and Manufacturing (ESDM), Information Technology (IT), and Information Technology Enabled Services (ITES).

2. To help in the fulfillment of the commitments made in National Policy on Electronics (NPE) 2012 and National Policy on Information Technology (NPIT) 2012.

3. Support to 1500 PhD students in each of ESDM and IT/ITES sectors (Total: 3000 PhDs).

4. Out of the above, 500 PhDs in each of ESDM and IT/ITES sectors would be from full-time PhD candidates. The other 1000 PhDs in each of ESDM and IT/ITES sectors would be from part-time PhD candidates.

5. The scheme will also support 200 Young Faculty Research Fellowships in the area of ESDM and IT/ITES with the objective to attract and retain young faculty members. This is expected to help in the recognition and encouragement of young Faculty members involved in research and technology development in these areas.

6. Infrastructural grant to Academic Institutions for creation and/or upgradation of laboratories. A grant of up to Rs. 5 lakh for every Full-Time PhD Candidate supported under the scheme may be provided to an Academic Institution.

7. One of the key goals of the Visvesvaraya PhD Scheme is to encourage working professionals and non-PhD faculty members to pursue PhD in the ESDM & IT/ITES sectors, as part-time candidates. It is envisioned that having part-time PhD students is likely to encourage the Industry-Academia interaction, help in the alignment of the R&D efforts between industry and academia, and bring value to the country.

An apex committee at national level, called Academic Committee for ‘The Visvesvaraya PhD Scheme’ of MeitY, manages the program academically for quality assurance. I have the privilege of chairing the Academic Committee. I have distinguished members from premier institutions, viz., Professor Navkant Bhat from Indian Institute of Science, Professor Sanjiva Prasad from Indian Institute of Technology Delhi, and Professor Abhay Karandikar from Indian Institute of Technology Bombay. I retired as Professor of Computer Science from Indian Institute of Technology Madras.

As selection for the Visvesvaraya PhD program is seen as a unique honor for a student, the academic committee concentrates, on an ongoing basis, on the quality of work carried out by the scholars. One of the efforts is to bring the scholars together every quarter for a knowledge-sharing workshop. The workshop is designed as a three-level filter to get the best from a batch of students. A batch consists of students admitted in a certain academic year. For example, the current issue carries the work reported by the first batch of Visvesvaraya PhD Fellowship awardees, 2014-2015. The three levels of the quality filter are: selecting about 10% of the abstracts presented by a specific batch; reviewing the presentation material, listening to the presentations by the selected scholars, and interacting with the scholars along with their research guides; and reviewing the full paper (based on the abstract and the presentation) for publication in an academic journal of India, the CSI Transactions on ICT. The journal is owned by the Computer Society of India and published by Springer, which carries it as part of its Computer Science package, thereby ensuring worldwide circulation and exposure for the selected work of our Visvesvaraya PhD Fellowship scholars.

Dedicating a Special Issue to a specific program is a unique gesture and a unique experiment by CSI Publications in supporting Indian academics through fast track publication of their work in a reviewed journal of repute with worldwide circulation. The idea is to ensure that our Visvesvaraya PhD Fellowship holders, who come up with high quality work, are recognized and rewarded on an ongoing basis. This was also part of the deliberations of the Academic Committee for ‘The Visvesvaraya PhD Scheme’ of MeitY, to carry out a bold experiment in synergy across a Government Program, an independent Journal of the Computer Society of India, fast track recognition of Visvesvaraya scholars, and industry participation for value identification and enhancement from such pursuits. Above all, it is our way of saluting the young scholars using the paradigms “Make in India” and “Made in India” for local relevance and global quality, in line with the Digital India pursuits of our Government.

In the current issue of the CSI Transactions on ICT dated June 2017 (being released at the Second Workshop of the Visvesvaraya PhD Scheme at IISc., Bangalore), we have 11 papers from the First Visvesvaraya PhD Program Workshop held during October 2016 at Indian Institute of Technology Bombay. Shri Sanjeev Mittal, Joint Secretary to Government of India in MeitY, Dr. M. R. Anand, MD and CEO of Media Lab Asia and Senior Economic Advisor in MeitY, Government of India, and Prof. B. G. Fernandes, Head of the Department of Electrical Engineering in IIT Bombay, shared their views and vision about the program. The four National Academic Committee members formed the review panel. I would like to take this opportunity once again to record my appreciation of the excellent work done by each one of them in our effort towards quality assurance for the Visvesvaraya PhD program. For the benefit of the readers, I would like to present a quick preview of the work presented and its importance in today’s digital world. Needless to add, each paper makes a significant contribution and is of global quality.

1. Research in 3D integration has attracted researchers from industry as well as academia because of its benefits over 2D architecture: better performance, lower power consumption, small form factor and co-existence of heterogeneous technology. However, due to higher power density and reduced heat dissipation properties, thermal challenges cause significant concerns in the otherwise promising 3D integration technology. Lokesh Siddu and Preeti Ranjan Panda of Indian Institute of Technology Delhi investigate system-level thermal aware data/task mapping policies for 3D memory architectures.

2. On-chip Wavelength Division Multiplexing (WDM) application of ring resonators on a Silicon-on-Insulator (SOI) platform poses design challenges such as large Free Spectral Range (FSR) and narrow line width requirements. In fact, these metrics place opposing structural requirements on a ring resonator: a larger FSR demands a smaller resonator length, whereas a smaller line width requires a larger resonator length. Awinash Pandey and Shankar Kumar Selvaraja of Indian Institute of Science Bangalore suggest the use of an embedded type ring resonator configuration, a structure made of a racetrack type ring resonator with another ring embedded inside it, because such structures are capable of showing coupled resonator induced transparency (CRIT), which can result in a very narrow line width. The authors present an (initial) experimental demonstration of CRIT in an internally loaded micro ring resonator fabricated on an SOI material platform. They optimized the structures using 3D-FDTD analysis and the coupling-matrix method.

3. Remote sensing through the atmosphere requires high power laser sources in the band around 1.6 μm for Laser Imaging Detection and Ranging (LIDAR) applications and free space optical communication applications. Improving the pumping efficiency and reliability of the laser without increasing nonlinearity, and doing so in a cost effective manner, is a significant challenge. S. Arun and V. R. Supradeepa of Indian Institute of Science Bangalore argue that because of high atmospheric transparency in this band, one has to build a high power laser that generates >25 W of output power at 1570 nm. They do so using sixth-order cascaded Raman amplification of a low power Erbium–Ytterbium seed. The authors discuss the challenges in building a high power Ytterbium doped fiber laser operating at 1117 nm and generating >100 W CW output power for use as the primary laser source for Raman laser experiments. In the process, the authors demonstrate a novel drive scheme for standard laser diode modules (without wavelength locking) that they use for pumping rare-earth doped lasers and amplifiers.

4. Organic semiconductor based photo detectors are attractive, innovative and well suited for light detection applications. Organic semiconductors are used in photonic devices because of their lower cost, light weight, mechanical flexibility, chemical modification, tunability of absorption range with co-evaporation and co-mixing of molecules, and ease of integration. Debarati Nath, Puja Dey, Debajit Deb, Jayantha Kumar Rakshit, and Jithendra Nath Roy of National Institute of Technology Agartala discuss the fabrication and characterization of an organic semiconductor based photo detector with a choice of organic semiconductors as donor and acceptor. Besides, they discuss optimization by employing diversified organic semiconductors for fast response time, high photosensitivity, high quantum efficiency, low dark current, large dynamic range and long lifetime. The authors discuss the design of an equivalent circuit model for the organic photo detector (OPD) structure and present results obtained through simulation using MATLAB Simulink, in which Rubrene and BPPC are used as the active layer of the OPD. Using their OPD proposal, the authors argue that an operating frequency of 500 MHz, much higher than the speed of the red-light-illuminated bi-layer OPDs reported to date, is feasible.

5. Non-contact measurements of the human body, which are non-ionizing and non-invasive, have been attracting the attention of medical science researchers. Remote measurement of ‘heartbeat’ has several strategic applications besides medical diagnosis. Harikesh Dala, Ananjan Basu, and Mahesh P Abegoankar of Center for Applied Research in Electronics, Indian Institute of Technology Delhi, discuss a method for non-contact measurement of respiration and heartbeat using microwave Doppler radar phase modulation in the X-band (8-12 GHz), as the signal reflected from the body depends on the radar cross section (RCS) of the body.

6. In automatic speech recognition systems, the information in the speech signal is traditionally retrieved in the form of feature vectors representing sub-word units, and the features are then converted into human readable text. However, these systems perform poorly due to degradations of speech under varying environmental conditions. To improve the performance, the main issues to be considered are: (a) determination of speech regions in the speech data collected in degraded environments, and (b) recognition of speech sounds from the degraded speech in the detected speech regions. Although there exists a wide variety of techniques that address these issues, most of them are applicable to clean speech synthetically degraded by stationary noise conditions, due to the need for a large amount of training data for statistical modeling. Vishala Pannala of International Institute of Information Technology, Hyderabad, focuses on methods of processing the signals so as to determine the desired speech regions in degraded conditions. For this, the author explores signal-processing methods to extract speech-specific characteristics independent of the characteristics of the degradations.

7. Next generation networks (NGN) have a higher network density to increase the capacity of the overall network, and consequently a higher energy consumption. Yogitha Ramamoorthy and Abhinav Kumar of Indian Institute of Technology, Hyderabad, argue that Base Station Switching (BSS) combined with appropriate coverage extension techniques, such as coordinated multi-point (CoMP) transmission, is the way forward in achieving higher energy efficiency while maintaining the QoS. The authors discuss the performance evaluation of CoMP with BSS, utilizing suitable resource allocation techniques.

8. Ramesh K Gupta and Bijoy K Das of Indian Institute of Technology Madras propose and demonstrate a method for the fabrication of a Silicon on Insulator (SOI) platform with a custom-designed device layer thickness (<1 μm), which can be accessed by any desired number of adiabatically tapered single-mode input/output waveguides (multi-input multi-output waveguides) of widths and heights >1 μm, operating at λ ~ 1550 nm. The input/output waveguides can be pigtailed with standard single-mode fiber with a lensed tip, ensuring a modal overlap of >70% (coupling loss <1.5 dB). Such a multi-input multi-output SOI platform will facilitate CMOS silicon photonics based on-chip applications, with the additional freedom to choose the device layer thickness. Moreover, it can potentially be used to design SOI based stand-alone devices, which can be useful at transmitters/repeaters for short-haul/long-haul optical communication.

9. In uncontrolled environments, recognition of faces from imagery presents multiple challenges. The primary challenges are occlusion, pose and illumination variation. A Convolutional Neural Network (CNN) is a bio-inspired network, loosely modeled on the way the human brain learns. A CNN offers deep observation of the features present in an input image. Dattatray D. Sawat and Ravindra S. Hegadi of Solapur University, Solapur, present a combined approach to detect faces using deep features extracted by a deep CNN, with classification by a Cubic Support Vector Machine. The authors use an area based approach for removal of distant faces and background pixels, in order to reduce the processing time required per frame at the detection stage.

10. Machine learning, where a machine independently learns from a user's previous data and provides solutions and better suggestions to the user, is considered to be the next generation human machine interaction technology. The emerging trend is to institutionalize learning as and when it happens, resulting in ‘Reference based self-learning’. Avinash Keskar of Visvesvaraya National Institute of Technology, Nagpur and N C Shivaprakash of Indian Institute of Science, Bangalore, propose a reference based self-learning model, which can learn classification on new data from its previously trained models. The authors take recourse to simulation on three feature vectors as a process of events and achieve an accuracy of around 90% using reference-based learning. According to the authors, this method of live training and classification reduces the time required for preparing the database and training the model separately for each set of event based features.

11. Speech is a natural means of communication for interaction between humans and machines. Telephone-speech technology has been receiving more attention in recent times. Spectral-temporal features offer a significant performance improvement for telephone speech recognition when compared with the conventional ‘feature based speech/speaker identification’. The commonly used measure of the performance of a speech recognition system is the recognition accuracy. To obtain good accuracy, it is necessary to design an efficient classifier that leads to correct recognition results. Mridusmita Sharma and Kandarpa Kumar Sarma of Guwahati University, Guwahati, discuss soft computation based spectral and temporal models of linguistically motivated Assamese telephonic conversation recognition.

I am sure that readers will find the June 2017 issue extremely interesting. The contents of this issue not only present the state of the art, but also showcase the research work being carried out in various institutions. While all the papers span the areas of ESDM, IT and ITES, in line with the objectives of the Sir Visvesvaraya PhD program, the issue also presents the potential that exists across the length and breadth of India.

The Way Ahead:

The high quality of the papers printed, the prominence of the Editors, the regularity of issues, the stability of the platform and the reach that CSI Publications has established to attract good quality papers show that the CSI Transactions on ICT is in the process of establishing itself as a high quality Journal to come out of India. Given the track record hitherto, the focus will be the following:

1. Get far wider participation for Papers. This will mean greater inroads into Research Departments of IT Companies, Educational Institutions, Management Researchers and Companies focusing on practical applications of ICT. We are in the process of establishing CSI Transactions on ICT as the publication for the authors to get the benefit from.

2. Obtain international recognition for this publication in

Contd. on pg. 27


Role of Hadoop in Big Data Analytics

Deepali Bajaj, Asst. Prof., Shaheed Rajguru College of Applied Sciences, University of Delhi
Urmil Bharti, Asst. Prof., Shaheed Rajguru College of Applied Sciences, University of Delhi
Rupali Ahuja, Asst. Prof., Matreyi College, University of Delhi
Anita Goel, Associate Prof., Dyal Singh College, Univ. of Delhi

1. Introduction

The advent of technologies like mobile computing, cloud computing, internet of things, sensor based networks and the availability of internet in handheld devices has resulted in the generation of large amounts of data, both structured and unstructured, which is also known as Big Data.

The opportunity of organizing this large data into meaningful and valuable information is being realised by industries, organizations and companies. But the challenge with big data is that such a large amount of data is difficult to handle effectively using traditional methods. New tools, technologies, models and methodologies are used to handle big data. Hadoop, an open source framework, is widely used for processing big data. It is a prominent distributed storage and compute environment for storing and processing big data.

Big Data

Big Data is a massive collection of data which is generated at an exponential rate in a wide variety of formats and has become hard to handle using traditional data management tools.

The theory of big data is based on five V’s:

• Volume: Large volume of data generated every second by individuals, organizations, machines etc.
• Velocity: Speed at which data is being generated.
• Variety: Various formats in which the data is available (text, blogs, tweets, video, barcode, databases etc.).
• Veracity: Correctness and accuracy of data.
• Value: Insights or information that may be generated by applying analytics on big data.

The interest of organizations in big data has risen due to the value it may generate for their businesses and research. Organizations want to expand, make better business decisions and create new products and services; big data plays a major role in this. With large amounts of data spanning from user buying trends to Twitter tweets, the data holds valuable information. Proper extraction and analysis of this data may reveal insights in the future and help organizations take profitable business decisions or create actionable intelligence.

Big Data Analytics

Big Data Analytics (BDA) is the process of applying advanced analytic techniques to large varied data sets in order to gather insights and discover hidden patterns that may help analysts, businesses and researchers in making faster and better decisions.

Traditional analytics deals with structured, transactional data collected over a period of time in data warehouses for performing Business Intelligence (BI). A BI analyst focuses on finding trends, generating reports and visual analysis of data.

In BDA, data scientists, predictive modellers and other analytics professionals analyse large volumes of transactional data as well as other forms of data, collected from different types of sources, that may remain untapped by conventional business intelligence programs. These data forms include web server logs, internet clickstream data, social media content, social network activity reports, patients’ health records, text from customer emails, survey responses, mobile-phone call detail records, and machine data captured by sensors connected to the internet of things.

Table 1 illustrates the differences between Traditional Analytics and Big Data Analytics.

BDA can be performed on different types of data, like text, images, clicks, logs and blogs, to reveal insights about the behavioural patterns of customers/users/clients. These insights help in optimizing performance, taking smart business decisions, predicting future values, preventing diseases, combating crime, reducing frauds, and mitigating risks.

Fig 1 shows the different types of big data analytics.

Table 1: Traditional Analytics vs Big Data Analytics

Aspect | Traditional Analytics | Big Data Analytics
Type of Analysis | Diagnostic and Descriptive analysis | Predictive and Prescriptive Analysis
Data Source | Limited data sets, cleaned data, simple models | Large scale data sets, variety of data like structured/unstructured/semi-structured data, unprocessed data, complex data model
Analytical Domain | What happened and why? | Gain new insights, find trends, hidden patterns, correlations


COvER STORY


Fig 1: Types of Big Data Analytics (Social, Log, Sentiment, Content, Visual, Web, Performance, Behaviour, Predictive and Text Analytics)

Today, various commercial as well as open source tools, like IBM BigInsights, SAP HANA, Oracle Big Data Appliance, Pivotal Big Data Suite, Lumify, Apache Storm, RapidMiner etc., are available to perform different types of analytics on Big Data. Hadoop is a popular open source framework used for BDA. Many companies like Cloudera, Hortonworks, and IBM have built their big data solutions on top of Hadoop.

Introduction to Hadoop

Apache Hadoop is an open-source framework for distributed processing of large data sets. It is currently used by Google, Facebook, LinkedIn, Yahoo, Twitter and many more organizations to improve their user experience, get feedback, and build new services and products. Hadoop is popular for its flexible and scalable architecture that stores and processes big data on commodity hardware machines. It allows distributed processing of large data sets on a cluster of nodes.

In general, a cluster is a group of servers and other resources that act like a single system and enable high availability, load balancing and parallel processing. A Hadoop cluster is a special type of computational cluster designed specifically for storing and analysing huge amounts of unstructured data in a distributed computing environment. It is designed to scale up from a single server to thousands of machines, where each machine provides local computation and storage. The largest publicly known Hadoop cluster is Yahoo!’s 4000 node cluster, followed by Facebook’s 2300 node cluster.

Hadoop does not depend on high hardware availability. The Hadoop library is designed to detect and handle failures at the application layer. This provides a highly available service even when each node itself is prone to failures.

The main benefits of using a Hadoop cluster for big data analytics are its scalability, cost-effectiveness and reliability. To analyse large data, Hadoop breaks it into smaller chunks and assigns each chunk to an individual node in the cluster. Rather than depending on the performance of a single node, Hadoop focuses on parallelism. Hadoop handles growing data by scaling horizontally, that is, by adding nodes to the cluster. Apache Hadoop is free, open source software. A Hadoop cluster can be installed on commodity hardware rather than on powerful, high performance and expensive servers. A Hadoop cluster is highly fault tolerant and resilient to node/rack failure. Data in Hadoop is replicated to other cluster nodes, so in case of a node failure, additional copies of the data remain available. Thus, data stored in Hadoop stays available for analysis even when individual nodes fail.
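The chunking and replication described above can be illustrated with a small sketch. It is a toy model, not HDFS code: the function names, node names and round-robin placement are assumptions made for illustration (real HDFS placement is rack-aware), and the replication factor of 3 simply mirrors the HDFS default.

```python
def split_into_blocks(data, block_size):
    """Split a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign every block to `replication` distinct nodes (simple round-robin, hypothetical policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Return the ids of blocks that still have at least one replica on a live node."""
    return [
        block_id
        for block_id, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    ]


# Hypothetical five-node cluster and a 1000-byte "file" cut into 256-byte blocks.
nodes = ["node1", "node2", "node3", "node4", "node5"]
blocks = split_into_blocks(b"x" * 1000, 256)
placement = place_replicas(len(blocks), nodes)

# Losing two nodes leaves every block readable; losing all three holders of a block does not.
after_two_failures = readable_blocks(placement, {"node1", "node2"})
after_three_failures = readable_blocks(placement, {"node1", "node2", "node3"})
```

With three replicas per block, any two node failures leave every block readable in this toy placement, which is the intuition behind the fault tolerance claim.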

The initial version, Hadoop 1.0, supported only the MapReduce (MR) processing model. This model was not able to handle streaming and real-time data. Also, Hadoop 1.0 was limited to 4000 nodes per cluster. Moreover, the entire file system was subject to a Single Point of Failure (SPOF), as it was managed by a single Name Node. Hadoop 2.0 overcomes these limitations. It supports MapReduce as well as other tools, like Spark, Hama and Giraph. Cluster resource management is done by YARN (Yet Another Resource Negotiator). Hadoop 2.0 is scalable up to 10,000 nodes per cluster. It also supports multiple Name Node servers, thus eliminating the risk of a single point of failure. Hadoop 2.0 is capable of running event processing, data streaming and real-time operations.

Basic Architecture of Hadoop 2.0

A Hadoop cluster is a set of host machines (nodes) that are partitioned into racks. The cluster follows a master-slave architecture. A few nodes, called Name Nodes, act as master nodes. All other nodes of the cluster work as slave nodes and are called Data Nodes. The Name Nodes are the controller nodes of the Hadoop architecture. They manage the namespace of the entire file system. Data Nodes hold the blocks of data that make up the files.

The Hadoop 2.0 project comprises four main modules, namely, Hadoop Common, Hadoop Distributed File System 2 (HDFS2), Hadoop YARN and Hadoop MapReduce (MR). Various projects like Pig and Hive are built on top of these modules. Fig 2 shows the architecture of Hadoop 2.0.

Fig. 2: Hadoop 2.0 Architecture (components shown: Hadoop Common; HDFS2 for redundant, reliable storage; YARN for cluster resource management; TEZ as execution engine; MR for batch processing; Pig for data flow; Hive for SQL; Storm and Giraph for real-time stream and graph processing; HBase for services; others such as Cascading)

The four main modules of Hadoop 2.0 are as follows:

1. Hadoop Common: Java libraries and utilities required by other Hadoop components.

2. HDFS2 (Hadoop Distributed File System): Java-based file system that provides scalable and reliable data storage. HDFS2 can store up to 200 PB of data. To store a data file in HDFS2, the file is divided into data blocks, and these blocks are saved on different Data Nodes of the Hadoop cluster. Name Nodes store the directory and metadata related to the file system. They also store the mappings of files to blocks and their physical locations. Data Nodes store and retrieve data blocks as directed by the Name Node. The Name Node is also responsible for replicating blocks across multiple Data Nodes. Every Data Node sends a Heartbeat and a Block Report to the Name Node at regular intervals to signal that it is functioning correctly. A Block Report is a list of all blocks that a Data Node holds. Fault tolerance in HDFS2 is achieved by block replication with a default replication factor of three. The block size and replication factor are configurable parameters and can be set as per application requirements. HDFS2 applications must follow a WORM (write-once-read-many) access model for files. Once created, a file cannot be updated except by appends and truncates.

3. Hadoop YARN: Responsible for job scheduling/monitoring and cluster resource management. It acts as a central platform for delivering consistent operations, security, and data governance across a Hadoop 2.0 cluster. YARN is designed to provide a generic processing platform for data stored across a cluster and a robust resource management framework. It supports batch processing using MapReduce, graph processing using tools like Apache Giraph and real-time processing using tools like Apache Storm.

4. Hadoop MapReduce: Programming model for parallel processing of large data sets. In this model, users specify a map function and a reduce function according to their requirements. The map function processes a key/value pair and generates a set of intermediate key/value pairs as output. The reduce function processes the output of the map function and merges all intermediate values associated with the same key to generate the final result, as shown in Fig 3. Hadoop 2.0 takes care of partitioning the input data, scheduling the program’s execution across a set of machines, handling node failures, and managing the required inter-machine communication. Programmers with no experience of parallel and distributed processing can therefore utilize the resources of a Hadoop cluster with ease.
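The map and reduce functions described in item 4 can be sketched with the classic word count example. This is a single-process illustration of the programming model only, not the Hadoop API; the names `map_words`, `reduce_counts` and `run_mapreduce` are hypothetical, and the grouping step stands in for the shuffle that Hadoop performs between the two phases.

```python
from collections import defaultdict


def map_words(_key, line):
    """Map: turn one input record into intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, group intermediate pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for key, value in records:                      # map phase
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)      # stand-in for the shuffle
    return dict(reduce_fn(k, v) for k, v in sorted(grouped.items()))  # reduce phase


# Input records are (line number, line text) pairs; the text is made up.
records = [
    (0, "big data needs big clusters"),
    (1, "data lakes store raw data"),
]
word_counts = run_mapreduce(records, map_words, reduce_counts)
```

In a real cluster the map calls run in parallel on different partitions of the input and the reduce calls run in parallel on different keys; the user-supplied functions stay the same.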

Several projects have been developed on top of Hadoop 2.0 basic architecture to handle different needs of big data. Some of the key Apache Hadoop 2.0 related projects are-

Ambari is a web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters. It supports HDFS2, MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop. Ambari provides a dashboard for viewing cluster health factors, featuring heat maps and the ability to view MapReduce, Pig and Hive applications visually.

Avro is a remote procedure call and data serialization framework.

Cassandra is a scalable multi-master database with no single point of failure.

Chukwa is a data collection system used to manage large distributed systems.

HBase is a scalable, distributed database. HBase supports structured data storage for large tables.

Hive is a data warehouse infrastructure that provides data summaries and ad-hoc querying.

Mahout is a scalable machine learning and data mining library.

Pig is a high-level data flow language and execution framework for parallel computation. It allows the user to write map-reduce functions as Pig Latin scripts.

Spark is a fast and general compute engine for Hadoop data. It provides a simple and expressive programming model that supports a wide range of applications, including ETL (Extract, Transform and Load), machine learning, stream processing, and graph computation.

Tez is a generalized data flow programming framework built on Hadoop YARN. It provides a powerful and flexible engine to execute an arbitrary Directed Acyclic Graph of tasks to process data for both batch and interactive use-cases. Hive, Pig, and other frameworks in the Hadoop ecosystem can run on top of it.

ZooKeeper is a high-performance coordination service for various Hadoop components in distributed applications.

Hadoop in BDA

Businesses that rely on Hadoop need a variety of analytical infrastructures and processes to find the answers to their critical business queries. And often the data on which analysis is to be performed is live and streaming data.

MapReduce is best suited if the data operations and reporting requirements are static in nature and the user can wait for batch-mode processing. If, however, the requirement is to do analytics on streaming data, like sensor data from a factory floor, or to run applications that require multiple operations on the same data set, then Spark is recommended.

Spark is an open source cluster computing framework designed for fast computation. It runs on top of Hadoop cluster and can access data through HDFS2 and HBase. It can also process structured data in Hive and streaming data from Flume, Kafka, and Twitter as shown in Fig. 4.

Spark provides an interactive mode for users to receive immediate results for their queries. MapReduce does not have an interactive mode, but add-ons such as Hive and Pig make working with it easier for adopters. While MapReduce operates in steps, Spark operates on the complete data set in one leap. Spark is easier to use than Hadoop MapReduce, as it allows its users to write applications in Scala or Python in addition to Java. Spark has a built-in set of over 80 high-level operators which can be used interactively to query data.

Fig. 3: MapReduce working model (input files are split into Partitions 1 to 5; Map Tasks 1 to 3 in the map phase produce Intermediate results 1 to 3; Reduce Tasks 1 and 2 in the reduce phase produce Results 1 and 2)

Fig. 4: Spark Integration with various Data Stores (language APIs: Java, Scala, Python; components: Spark, Spark SQL, Spark Streaming; data sources: HDFS, HBase, Hive, Flume, Kafka, Twitter and custom sources; running on Hadoop)

Spark is designed to cover a wide range of applications such as batch applications, iterative algorithms, interactive queries, real-time streaming, machine learning and graph algorithms. What really gives Spark an edge over MapReduce is its speed. Spark handles most of its operations “in-memory” and so returns results faster than approaches that require disk access. Spark is reported to enable applications in Hadoop clusters to run up to 100 times faster in memory and 10 times faster even when running on disk.

The basic data structure of Spark is the Resilient Distributed Dataset (RDD). It is an immutable and fault-tolerant distributed collection of objects in the cluster. An RDD can be created from data storage or from other RDDs by applying operations like map, filter and combineByKey to them. Spark makes use of RDDs to achieve faster and more efficient processing. Fig. 5 shows the real-time processing of big data using Apache Spark. Data is imported from various data sources and processed by the Spark unified processing engine using RDDs. The final results of the computation are stored in any of the cluster data stores, like HDFS2 or HBase, and are used by data scientists or analytics professionals for making strategic business decisions.
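The idea that an RDD is immutable and is derived from other RDDs by operations such as map and filter can be sketched with a toy class. This is not the Spark API and runs on a single machine; `ToyRDD` and its methods are hypothetical names, and the point is only that each transformation returns a new dataset and records its lineage, so results can be recomputed from the source if a partition is lost.

```python
class ToyRDD:
    """Immutable dataset defined by a source and a lineage of transformations."""

    def __init__(self, source, lineage=()):
        self._source = tuple(source)      # never modified after creation
        self._lineage = tuple(lineage)    # recorded steps, applied lazily

    def map(self, fn):
        """Return a new ToyRDD; the original is left untouched."""
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, predicate):
        """Return a new ToyRDD that keeps only matching elements."""
        return ToyRDD(self._source, self._lineage + (("filter", predicate),))

    def collect(self):
        """Replay the lineage over the source and return the result as a list."""
        items = iter(self._source)
        for kind, fn in self._lineage:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


# Made-up sensor readings, in the spirit of the factory-floor example.
readings = ToyRDD([3, 8, 15, 4, 23])
high = readings.filter(lambda value: value > 5)
scaled = high.map(lambda value: value * 10)

result = scaled.collect()
original = readings.collect()
```

Because nothing is computed until `collect` is called and the source is never changed, the same lineage can be replayed at any time, which is the property real RDDs rely on for fault tolerance.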

Fig 5: Real time data processing in Spark (data ingest tools such as Kafka and Flume feed Spark, which processes the data as RDDs; the output is written to data stores such as HDFS2 and HBase)

According to a survey conducted by developers of Apache Spark:

• 91% of users prefer Spark for performance gains.
• 77% of users find it easy to use.
• 71% of users find its deployment easy and simple.
• 64% of users depend on Spark for advanced analytics.
• 52% of users utilize it for real-time streaming applications.

Adoption of Hadoop for BDA

Hadoop has been adopted across many industries for performing Big Data Analytics. Apart from its general advantages of increasing profits, reducing costs, increasing the customer base, and improving business efficiency and performance, it has provided some specific advantages in different industries. Some domains that have benefitted from the use of Hadoop in BDA are as follows:

Healthcare: BDA is heavily used on patients’ health records and DNA analysis for the prevention and cure of diseases, predicting epidemics and enhancing the quality of human life. Children’s Healthcare of Atlanta, Explorys and Dignity Health are analysing patients’ data and tracking their vitals to decide the best treatment plan for them.

Learning: BDA using Hadoop is used in higher education for enhancing curricula, increasing student engagement, improving results, career counselling, choosing subjects, measuring teachers’ effectiveness etc.

Telecom: Telecom companies are leveraging Hadoop to analyse call detail records to improve call quality and enhance customer satisfaction. They regularly analyse network logs and data collected through sensors to proactively detect failures in network infrastructure. Verizon, China Mobile and Telefonica perform analytics on their subscribers’ data using Hadoop.

Sports: BDA is being performed on previous match recordings to strategize for the current game and also to predict game results. It is also being used to analyse the performance of players to quantify their abilities and help in team selection. For example, NFL teams utilize Big Data to get real-time analytics regarding player and strategy performance.

Retail: The retail industry is utilizing trend analytics, social media analysis, market basket analysis and brand sentiment analysis on Big Data for enhancing consumer experience, retaining customers, personalizing shopping, recommending products, optimizing store layouts, predicting demand, promoting products effectively etc. Retail giants like Walmart and Tesco and e-commerce companies like eBay and Groupon are using Hadoop for BDA.

Banking: BDA is being used to detect, prevent and eliminate frauds related to misuse of debit/credit cards. JPMorgan and Bank of America use Hadoop to process massive amounts of data.

Entertainment: On-demand video and audio websites are using BDA to provide content-based recommendations and personalize content for their users to enhance their experience. Spotify uses Hadoop to provide music recommendations to its users. Netflix uses MapReduce, Pig, Hive and Spark for performing various types of analytics.

Transportation: Analytics on location-based data, GPS data and sensor data is being utilized for traffic management, route planning, reducing traffic congestion and predicting traffic conditions. Many cities and regions, like the City of Dublin, Stockholm, New Jersey and Zhejiang, have been able to reduce traffic congestion, improve transportation services and optimize their energy consumption.

An Example: Walmart, an American multinational retail giant, collects 2.5 petabytes of unstructured data from more than 1 million customers every hour. Its analysis covers millions of products and hundreds of millions of customers from different sources. The analytics systems at Walmart analyse close to 100 million keywords on a daily basis to optimize the bidding for each keyword. As a large volume of unstructured data is generated every hour, Walmart continues to improve its operational efficiency and to customize the shopping experience by leveraging big data analytics. Walmart has extracted significant value from big data. It initially started with a 10 node Hadoop cluster as an experiment and subsequently migrated to a 250 node cluster in 2012 to combine 10 different websites into a single website. Walmart relies on BDA to provide e-commerce and m-commerce solutions that optimize the shopping experience of customers when they visit a Walmart store or access the Walmart website.

Hadoop and NoSQL technologies provide access to real-time data collected from different sources. Predictive analytics using machine learning algorithms is employed to achieve a 10% to 15% increase in online sales. Walmart applies data mining techniques to discover hidden patterns in sales data. This helps the organization to identify associations between products and to give recommendations to users based on the mining results.

Challenges of Hadoop in BDA

Despite the benefits, Hadoop cluster is not always the best solution for every organization’s data analysis requirements. For example, organizations with relatively small data might not gain enormously from a Hadoop cluster even if intense and complex data analysis is required.

Another drawback of a Hadoop cluster is that all its mining algorithms are based on parallel processes running on separate cluster nodes. If a data analysis task does not lend itself to a parallel processing environment, then Hadoop may not be the right choice.

The most significant inhibitor to using a Hadoop cluster is the learning curve associated with installing, operating and supporting the cluster. Until organizations have Hadoop experts in their IT departments, it is very difficult to perform the required data analysis.

Conclusion

Hadoop is a popular framework used for big data storage and analytics. It provides scalable data storage and distributed processing. Hadoop is being applied to extremely large datasets which are otherwise complex to work with using traditional database management systems. The size of these data sets is beyond the capability of commonly used software tools and storage systems to capture, store, manage and process the data within a tolerable elapsed time. Apache Hadoop can be used to analyze data for gathering insights which can lead to better decisions and strategic business moves. It provides a paradigm where IT companies can apply data science and machine learning algorithms to provide product recommendation, web analysis, social analysis, and sentiment analysis.

Big data analytics using Hadoop can help an organization to operate more efficiently, find new opportunities and derive next-level competitive advantage. It provides an opportunity to innovate with minimal investment.


About the Authors

Ms. Deepali Bajaj has over 10 years of teaching experience as Asst. Prof. in Department of Computer Science, Shaheed Rajguru College of Applied Sciences for Women (University of Delhi). She is currently doing her research in the area of Cloud Computing.

Ms. Urmil Bharti is working as an Assistant Professor in Department of Computer Science, Shaheed Rajguru College of Applied Sciences for Women, University of Delhi. She has around 10 years of teaching experience and is presently doing her research in the area of Cloud Computing. She has also worked in Industry for more than 10 years and her last designation was Sr. Quality Analyst.

Ms. Rupali Ahuja is an M.C.A from Kurukshetra University. She is currently working as an Assistant Professor in Maitreyi College, University of Delhi. She has 14 years of teaching experience. Her research interest includes Cloud Computing and Big data tools and technologies.

Dr. Anita Goel is an Associate Professor in Computer Science, Dyal Singh College, University of Delhi, India. She has an experience of more than 28 years. Dr. Goel has guided several students for their doctoral studies in the area of web applications, cloud computing and education management. She has authored several books in Computer Science with a leading International publisher.


Data Lake: A Next Generation Data Storage System in Big Data Analytics

Remya Sasidharan Panicker Asst. Professor, MET’s Institute of Engineering Nasik Maharashtra

Traditional big data analytics requires data to be stored in some structured format before analysis is done. Even storing data in a warehouse requires numerous preprocessing activities. This is a limitation in environments where new data is added frequently, such as social networking analytics, where big data analytics needs to be performed on frequently changing data. Storing such data in a warehouse, or fitting it to any particular schema, becomes a tedious job. Data Lake provides a solution to this problem.

What is Data Lake?

A data lake is a storage repository that stores vast amounts of data in a flat architecture. It stores data in its native, raw format, with no preprocessing before the data is stored. Each data element is assigned a unique identifier and tagged with a set of related metadata tags. Queries can be run on the data lake; each query yields a smaller data set, which is then analyzed and structured as needed. This makes it possible to store data in various formats in one single repository. The content of the Data Lake need not be converted to a particular schema in advance; the conversion can be done when the data is queried. The data lake uses the extract, load and transform (ELT) method to accumulate and integrate data instead of the traditional ETL approach. It follows a “Schema on Read” approach, which means that data is structured or transformed at the time it is fetched. The data lake allows us to store structured data from relational databases (tables, rows, columns), semi-structured data (CSV, XML, JSON, logs), unstructured data (PDF, documents, emails) and images, audio, video etc., hence creating a centralized data store including all forms of data.

Following are the key aspects of Data Lake:

• Harness raw data at low cost.
• Store multiple types of data (text, audio, video, feeds, doc, xml etc.) under one infrastructure.
• No need for data transformation at load time.
• Handle single-subject analytics.
• Perform real-time big data analytics efficiently.
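The “Schema on Read” idea described above can be sketched in a few lines. This is an in-memory toy, not a real data lake product: the class `ToyDataLake`, its methods and the sample records are hypothetical. Raw content is stored untouched with a unique identifier and tags, and a schema (field names and types) is applied only when a query is run.

```python
import csv
import io
import json
import uuid


def parse_raw(raw, fmt):
    """Turn raw content into records at read time, depending on its native format."""
    if fmt == "json":   # one JSON object per line
        return [json.loads(line) for line in raw.splitlines() if line.strip()]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    return [{"text": raw}]  # unstructured content is kept as a single text record


class ToyDataLake:
    def __init__(self):
        self._objects = {}  # unique id -> (raw content, format, tags)

    def ingest(self, raw, fmt, tags):
        """Store content as-is; no cleaning or transformation at load time."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (raw, fmt, frozenset(tags))
        return object_id

    def query(self, tag, schema):
        """Select objects by tag, then apply the schema (field -> type) while reading."""
        rows = []
        for raw, fmt, tags in self._objects.values():
            if tag not in tags:
                continue
            for record in parse_raw(raw, fmt):
                rows.append({f: cast(record[f]) for f, cast in schema.items() if f in record})
        return rows


lake = ToyDataLake()
lake.ingest('{"user": "a", "amount": "10"}\n{"user": "b", "amount": "25"}', "json", {"sales"})
lake.ingest("user,amount\nc,7\n", "csv", {"sales"})
lake.ingest("Meeting moved to Friday.", "text", {"email"})

sales = lake.query("sales", {"user": str, "amount": int})
total_amount = sum(row["amount"] for row in sales)
```

The same stored objects could later be read with a different schema, which is what makes the approach flexible compared with fixing the schema before loading.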

Comparison of Data Lake with Data Warehouse:

In today’s ever-evolving operational environment, where advanced analytics is employed, data warehousing faces challenges in dealing with the velocity of incoming data. Data Lake offers a solution to this. Following are the differences between Data Warehouse and Data Lake.

1. Schema: In a Data Warehouse the schema is defined before data is stored, whereas in a Data Lake the schema is defined after data is stored. Hence a Data Lake provides agility and can work properly even if some data is unavailable.

2. Scale: Scaling a Data Warehouse increases cost (for preprocessing activities like cleaning, transforming etc.), whereas a Data Lake provides a scalable data repository at low cost.

3. Access Methods: A Data Warehouse can be accessed with standard BI tools or standard SQL, while a Data Lake can be accessed by user-defined programs.

4. Form of Data: In a Data Warehouse, data is cleaned before it is stored, i.e. activities like cleaning, smoothing, clustering etc. are performed first. In a Data Lake, data is stored raw.

5. Cost and Efficiency: Data Warehouses are costlier to implement, while a Data Lake is low-cost storage.

Building a Data Lake Infrastructure:

Companies and organizations can build a Data Lake as per their requirements. The following stages show how a Data Lake infrastructure is built:

• Stage 1: Creating a place where data is gathered on a large scale. Hadoop is a common choice for this.
• Stage 2: Designing a tool that can retrieve and transform the required data, i.e. building the analytic environment.
• Stage 3: Delivering the data and analytics to as many people as possible.
• Stage 4: Providing enterprise features like governance, auditing and security.

Factors to be considered for designing a Data Lake:

Creating a data lake does not merely involve loading data into a repository; many factors have to be considered. While designing or creating a data lake we have to consider the following:

1. Indexing: All data loaded into the Data Lake has to be given a unique identifier. These identifiers have to be maintained in a centralized index.

2. Authorization: While designing a Data Lake, access grants on the data have to be maintained so as to prevent unauthorized access.

3. Data Protection: The Data Lake designer should provide for data security and data availability under circumstances like system failure or any other disaster.

4. Agile Analytics: The Data Lake should allow multiple analytical models and approaches.
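The indexing and authorization factors above can be sketched together as a small catalog. This is a hypothetical illustration, not a reference design: the class `LakeCatalog`, the storage paths and the user names are all made up. Every object gets a unique identifier in one central index, and a location is only handed out to users who have been granted read access.

```python
import uuid


class LakeCatalog:
    """Central index of data lake objects with a simple per-object reader list."""

    def __init__(self):
        self._index = {}  # unique id -> location, tags and authorized readers

    def register(self, location, tags, readers):
        """Add an object to the central index and return its unique identifier."""
        object_id = uuid.uuid4().hex
        self._index[object_id] = {
            "location": location,
            "tags": set(tags),
            "readers": set(readers),
        }
        return object_id

    def find(self, tag):
        """Look up object identifiers by tag."""
        return [oid for oid, entry in self._index.items() if tag in entry["tags"]]

    def locate(self, object_id, user):
        """Return the storage location only if the user is authorized to read it."""
        entry = self._index[object_id]
        if user not in entry["readers"]:
            raise PermissionError(f"{user} is not allowed to read {object_id}")
        return entry["location"]


catalog = LakeCatalog()
logs_id = catalog.register("/lake/raw/weblogs/2017-04-01.log", {"logs", "web"}, {"analyst"})
mail_id = catalog.register("/lake/raw/mail/archive.mbox", {"email"}, {"auditor"})

web_objects = catalog.find("web")
log_path = catalog.locate(logs_id, "analyst")

try:
    catalog.locate(mail_id, "analyst")
    denied = False
except PermissionError:
    denied = True
```

A production system would also need the data protection measures from item 3 (replication, backups, encryption), which this sketch leaves out.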

Advantages of Data Lake:

The Data Lake allows us to integrate and store all useful data under a single infrastructure. It is useful in environments where dynamic data has to be analyzed, including real-time analysis of streaming data. It gives quick insight from analytics and can be used widely and efficiently in social networking big data analytics. Data Lakes allow us to perform big data analysis on many different types of data. Following are the advantages of Data Lake:

1. Low Cost
2. Fidelity (Unchanged Data)
3. Accurate Results, since updated data is available
4. Easily Accessible
5. Runtime Binding

Conclusion:

Data Lake gives a new way to manage new types of data and to use the data as and when required. It is most beneficial when fast analytics results are expected, which is increasingly the case today. More insight into data can be gained using Data Lake.

References:

1. Hassan Alrehamy and Coral Walker, “Data Lake With Data Gravity Pull”, 2015 IEEE Fifth International Conference on Big Data and Cloud Computing.
2. Article: “Forget data warehousing, it’s ‘data lakes’ now”, Digital News Asia, Mar 31, 2015.
3. White Paper: “How to Design a Successful Data Lake”, www.knowledgent.com.


Fig. 1: How Data Lakes Work (Source: Reference no. 3)

The figure compares the concept to a water body, a lake, where water flows in, fills up a reservoir and flows out:

• The incoming flow represents multiple raw data archives ranging from emails, spreadsheets, social media content, etc.
• The reservoir of water is a dataset, where you run analytics on all the data.
• The outflow of water is the analyzed data. Through this process, you are able to “sift” through all the data quickly to gain key business insights.

The figure also lists two kinds of data: structured data (information in rows and columns, easily ordered and processed with data mining tools) and unstructured data (raw, unorganised data such as emails, PDF files, images, video and audio, and social media tools).

About the Author

Prof. Remya Sasidharan Panicker [CSI Membership - N1106676] is currently working as an Asst. Professor in MET’s Institute of Engineering Nasik Maharashtra. She has been an active CSI Member since 2012. Her areas of work include Big Data Analytics and Data Mining. She can be reached at [email protected]


Sentiment and Emotion Analysis of Tweets Regarding Demonetisation

Pushkal Agarwal (Final Year, B.Tech.) and Lokesh Todwal (Pre-final Year, B.Tech.), Dept. of CS and Engg., The LNM Inst. of IT, Jaipur, India
Nirmal Kumar S. (Assistant Professor) and Sakthi Balan M. (Associate Professor), Dept. of CS and Engg., The LNM Inst. of IT, Jaipur, India

Lokesh Todwal Sakthi Balan M. B.Tech in Computer Science Engineering at LNMIIT, Jaipur Associate Prof. at Dept. of CS and Engg. at LNMIIT, Jaipur

Introduction

Analyzing sentiments (positive, negative or neutral) and emotions (anger, sadness, joy, etc.) in the text posted by a group of people about a certain event can provide useful insights in text analytics, especially when the text data is large. Sentiments show the agreement, disagreement or neutrality among the masses, whereas the emotions expressed in the text cluster the reactions of the group.

Retrieving and analyzing real-time data from online social networks like Twitter is gaining popularity because dynamic trends and opinions are immediately updated on such platforms. Unpredictable events like Demonetisation and Jallikattu, and predictable events like Diwali or a new movie release, can easily be observed on Twitter. There can be 0.1 to 1 million tweets with a variety of opinions about any of the aforementioned events. Retrieving (in real time) and processing such a huge amount of data does not scale easily. This makes big data analysis relevant to this problem. In the following sections we discuss the sentiments and emotions on Demonetisation, which happened in India recently (8th Nov’16 - 30th Dec’16). Observing Twitter trends, we selected #BlackMoney and #Demonetisation for the November and January periods respectively. The central idea is the same for both hashtags: #BlackMoney was used first, and #Demonetisation was coined later and became the hashtag most widely used for this phenomenon. We will also explore possible ways to process large data for these algorithms using R programming libraries.

Dataset

We collected data for #BlackMoney from 8th November 2016 to 20th November 2016 and #Demonetisation from 1st January 2017 to 18th January 2017.   We used “twitterR” library defined for using twitter API in R programming. Due to twitter API limitations we fetched the data in slices and combined it hence after. We collected around 3.5 lakh tweets with #BlackMoney posted by around 135000 users and around 2.7 lakh tweets with #Demonetisation by around 46000 users. Daily trends of these two hashtags can be seen in Fig. 1 and Fig. 2. In these “Unique Users” means the number of users who tweeted in that time slice and “Unique Tweets” is non retweet posts (by adding filter as: Tweets[IsRetweet==F] in R programming data frame). Data Preprocessing

Along with the text of each tweet, the Twitter API provides metadata such as RetweetCount, FavoriteCount, IsRetweet, TimeStamp, UserName, etc. We selected the text and refined it using text mining pre-processing techniques.


[Fig. 1 plot: x-axis Date (04-11-2016 to 06-12-2016); y-axis ticks 0 to 150000; series: Total Tweets, Unique Tweets, Unique Users]

Fig. 1 : Daily Trends of #BlackMoney Data

[Fig. 2 plot: x-axis Date (01-01-2017 to 18-01-2017); y-axis Count (0 to 15000); series: Total Tweets, Unique Tweets, Unique Users]

Fig. 2 : Daily Trends of #Demonetisation Data
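The three daily series plotted in Fig. 1 and Fig. 2 can be derived from the tweet metadata with a single pass over the records. The following is a minimal sketch in Python rather than the R used for the study; the record layout and the sample values are hypothetical and serve only to illustrate the counting.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (posting date, user name, is_retweet flag).
tweets = [
    (date(2016, 11, 8), "user_a", False),
    (date(2016, 11, 8), "user_b", True),
    (date(2016, 11, 8), "user_a", False),
    (date(2016, 11, 9), "user_c", False),
    (date(2016, 11, 9), "user_b", True),
]


def daily_trends(records):
    """Return {day: (total tweets, unique tweets, unique users)}."""
    total = defaultdict(int)
    unique_tweets = defaultdict(int)
    users = defaultdict(set)
    for day, user, is_retweet in records:
        total[day] += 1
        if not is_retweet:  # same idea as the Tweets[IsRetweet==F] filter
            unique_tweets[day] += 1
        users[day].add(user)
    return {d: (total[d], unique_tweets[d], len(users[d])) for d in sorted(total)}


trends = daily_trends(tweets)
for day, (n_total, n_unique, n_users) in trends.items():
    print(day.isoformat(), n_total, n_unique, n_users)
```

Keeping a set of user names per day is enough at this scale; for much larger volumes the same counts would normally be computed with a distributed group-by.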


For a preliminary analysis we generated word clouds from 10,000 randomly selected tweets in each data sample. The preprocessing algorithm and the word clouds, based on the number of times a dominant word occurs in the respective texts (#BlackMoney and #Demonetisation are omitted for a better view of the other words), are given in Table 1.

Sentiment and Emotion Analysis

After cleaning the text and performing the preliminary analysis, we carried out a qualitative analysis using the “sentiment” library. Its “classify_polarity” and “classify_emotion” functions return the sentiment and the emotion corresponding to each tweet passed to them as a vector. Since the #BlackMoney data for even a single day (for example, 8th November 2016) is large, we took a random sample of 10% of the tweets as representative. Aggregating the day-wise counts from the sampled data, we generated Fig. 3 and Fig. 4 to visualize the sentiment and emotion categories in the earlier and later periods of the Demonetisation phenomenon on Twitter.
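The sampling and aggregation steps described above can be sketched as follows. This is an illustrative Python sketch, not the authors' R code: the labels are hypothetical stand-ins for the output of a polarity classifier, and the helper names `sample_fraction` and `share_by_label` are introduced here only for the example.

```python
import random
from collections import Counter


def sample_fraction(items, fraction, seed=0):
    """Draw a simple random sample holding `fraction` of the items (at least one)."""
    k = max(1, round(len(items) * fraction))
    return random.Random(seed).sample(items, k)


def share_by_label(labels):
    """Return the percentage share of each label, rounded to one decimal."""
    counts = Counter(labels)
    n = sum(counts.values())
    return {label: round(100.0 * c / n, 1) for label, c in counts.items()}


# Hypothetical per-tweet polarity labels for one day (not the authors' data).
day_labels = ["positive"] * 60 + ["negative"] * 30 + ["neutral"] * 10
sampled = sample_fraction(day_labels, 0.10)
shares = share_by_label(day_labels)
print(len(sampled), shares)
```

In the study the sample is drawn first and only the sampled tweets are classified; the shares are then aggregated over all days of a period to obtain the pie charts.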

One can infer from Fig. 3 that positive is the dominant polarity in both periods (59% and 65%). This suggests that the majority of people tweeted in favor of Demonetisation. The second most dominant polarity is negative, but the number of tweets with negative polarity (28% in both periods) is less than half the number with positive polarity. Interestingly, an increase in positivity is observed in the later period, Fig. 3(b).

Fig. 4 shows the emotions proposed by Paul Ekman, which are widely accepted as the basic human emotions. The dominant emotion is joy in both periods. The second most dominant emotion is anger in (a), but sadness largely overshoots anger in (b) and takes the second slot. The decrease in the share of joy is compensated by an increase in the share of sadness in the later period, with anger remaining the same as before.
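The shift between the two periods can be checked with simple arithmetic on the percentages reported in Fig. 4. The short Python sketch below uses only those published shares; the variable names are chosen here for illustration.

```python
# Shares (%) read from Fig. 4 of this article.
black_money = {"anger": 11, "disgust": 2, "fear": 4, "joy": 71, "sadness": 9, "surprise": 3}
demonetisation = {"anger": 11, "disgust": 1, "fear": 3, "joy": 52, "sadness": 27, "surprise": 6}

# Change in percentage points from the November period to the January period.
change = {e: demonetisation[e] - black_money[e] for e in black_money}
for emotion, delta in sorted(change.items(), key=lambda kv: kv[1]):
    print(f"{emotion:9s} {delta:+d}")
```

Joy falls by 19 percentage points while sadness rises by 18, and anger is unchanged, which is the compensation effect described in the text.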

Slicing the data further, we performed a cluster-level analysis and found that, along with a decrease in positive, negative and neutral joy, a considerably large increase is observed in negative sadness. Positive and negative surprise also increased slightly. All these inferences can be drawn from Fig. 5.

Conclusion

Data analysis carves out a niche in problems related to business, recommendations, political issues and many more. Real-time text data is increasing rapidly with the growth of social media platforms. Twitter is now commonly used by people to express their opinions. Influencers such as media channels, politicians or superstars also drive group opinion using social media. In our analysis of the trends of


Table 1: Preprocessing algorithm and visualization of tweets words

(a) Preprocessing Steps
1. Get 10% of the total tweets
2. Remove digits, special characters and stopwords
3. Convert text to lower case
4. Strip white spaces and stem the text
5. Convert corpora into TermDocumentMatrix
6. Plot wordcloud

(b) WordCloud on #BlackMoney [image]
(c) WordCloud on #Demonetisation [image]
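Steps 2 to 5 of the preprocessing algorithm in Table 1 can be sketched in a few lines. The sketch below is in Python rather than R and makes several simplifying assumptions that are not from the article: the stopword list is a tiny illustrative one, `naive_stem` is a crude suffix stripper standing in for a real stemmer, and the term-document matrix is a plain dictionary. Sampling (step 1) and plotting (step 6) are left out.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would use a full list.
STOPWORDS = {"the", "is", "are", "a", "an", "of", "to", "and", "in", "on", "at", "for"}


def naive_stem(word):
    """Very rough suffix stripping; a stand-in for a proper stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, drop digits/special characters/stopwords, then stem."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # removes digits and special characters
    tokens = text.split()  # split() also discards extra white space
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def term_document_matrix(docs):
    """Return {term: [count in doc 0, count in doc 1, ...]}."""
    counts = [Counter(preprocess(d)) for d in docs]
    terms = sorted(set().union(*counts)) if counts else []
    return {t: [c[t] for c in counts] for t in terms}


docs = [
    "Banks are exchanging 500 notes!!",
    "The queues at banks are growing",
]
tdm = term_document_matrix(docs)
# Row sums give the word frequencies that drive a word cloud.
frequency = {term: sum(row) for term, row in tdm.items()}
print(frequency)
```

A word cloud then simply scales each term by its total frequency, so the most frequent stems appear largest.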

Fig. 3 : (a) and (b) Showing Sentiments in two different periods.
(a) #BlackMoney (8th-20th Nov'16): Positive 59%, Negative 28%, Neutral 13%
(b) #Demonetisation (1st-18th Jan'17): Positive 65%, Negative 28%, Neutral 7%

Fig. 4 : (a) and (b) Showing Emotions in two different periods.
(a) #BlackMoney (8th-20th Nov'16): joy 71%, anger 11%, sadness 9%, fear 4%, surprise 3%, disgust 2%
(b) #Demonetisation (1st-18th Jan'17): joy 52%, sadness 27%, anger 11%, surprise 6%, fear 3%, disgust 1%


About the Authors

Mr. Pushkal Agarwal (CSI Membership - 01377314) is pursuing a B.Tech in Computer Science Engineering at LNMIIT, Jaipur. He is currently pursuing a semester-long internship at Nielsen. He presented a paper at the 50th Annual Convention of CSI under the “Big Data Analytics” track. He was named “Best Student Activist” by CSI for the year 2016-17. He can be reached at [email protected]

Mr. Lokesh Todwal is pursuing a B.Tech in Computer Science Engineering at LNMIIT, Jaipur. He is in his pre-final year. His interests lie in the areas of Data Science and Machine Learning. He can be reached at [email protected]

Mr. Nirmal Kumar S. is currently working as an Assistant Professor in the Department of Computer Science and Engg. at LNMIIT, Jaipur. His interests lie in the areas of Social Network Analysis and Cognitive Modelling. He can be reached at [email protected]

Dr. Sakthi Balan M. (CSI Membership-F8001751) is currently working as an Associate Professor in the Department of Computer Science and Engineering at LNMIIT, Jaipur. His interests lie in the areas of Data Analytics, Text Analytics, Cognition and Emotion Modelling, Biocomputing, and Formal Language and Automata Theory. He has 8 journal/special volume publications and more than 30 conference publications. He can be reached at [email protected]


opinions expressed on Twitter during and after Demonetisation in India, we analyzed the change in emotions observed among the masses. Although the top sentiment was positive in both periods, part of the joy of the earlier period gave way to sadness later on. The negative sadness cluster became dominant in the January period. This may indicate that people's initial hope in the policy ended in disappointment as more people suffered hardship. There are numerous ways to visualize data; a word cloud is one of them. Research in the big data field is the need of the hour.

References

[1] https://cran.r-project.org/web/packages/twitteR/twitteR.pdf
[2] https://cran.r-project.org/src/contrib/Archive/sentiment/
[3] Ekman, Paul. “An argument for basic emotions.” Cognition & Emotion 6.3-4 (1992): 169-200.
[4] Shantanu Biswas, Nirmal Kumar Sivaraman, Sakthi Balan M, Pushkal Agarwal, Qualitative Analysis of Social Synchrony, Book of Abstracts and Papers OR and Ethics, 28th EURO Conference Operational Research, 2016.
[5] Buettner, Ricardo. “Predicting user behavior in electronic markets based on personality-mining in large online social networks.” Electronic Markets (2016): 1-19.
[6] Hennig-Thurau, Thorsten, et al. “The impact of new media on customer relationships.” Journal of Service Research 13.3 (2010): 311-330.
[7] Munmun De Choudhury, Hari Sundaram, Ajita John, and Dorée Duncan Seligmann. 2009. Social Synchrony: Predicting Mimicry of User Actions in Online Social Media. In Proceedings of the 2009 International Conference on Computational Science and Engineering - Volume 04 (CSE ’09). IEEE Computer Society, Washington, DC, USA, 151–158. DOI:http://dx.doi.org/10.1109/CSE.2009.439


[Fig. 5 plot: two panels with axes “% size in #Demonetisation” (0 to 35) and “% size in #BlackMoney” (0 to 40)]

Fig. 5 : Cluster Level analysis on Emotions and Sentiments in two different periods.


Enhanced Protection for Big Data using Intrusion Kill Chain and Data Science

Abdul Khadar A (a)*, Dr. Shrishail Math (b), H Srinivas Murthy (c)

(a) Asst. Professor, Dept. of Information & Engg., SJCIT, Chikkaballapur.
(b) Prof. & HOD, Dept. of Information & Engg., Dayanand Sagar Academy of Technology & Mgmt., Kanakapur Road, Bangalore.
(c) Associate Professor, Department of Information & Engg., SJCIT, Chikkaballapur.

The intrusion kill chain has been an important tool for identifying and tackling advanced persistent threats. The Intrusion Kill Chain (IKC) tracks the activities of Advanced Persistent Threats (APTs) from day zero. These activities are better analyzed under the control of data science. Data science and big data are used to harness the ability of data to yield new insights. A data-science-controlled intrusion kill chain can better protect big data from being attacked by APTs.

Keywords: Intrusion Kill Chain, Advanced Persistent Threats, Data Science, Hadoop.

1. Introduction

Data is increasingly cheap to produce and ubiquitous. Organizations are now digitizing analog content that was created over centuries and collecting numerous new types of data from web logs, mobile devices, sensors, instruments, and transactions. IBM estimates that 90 percent of the data in the world today has been created in the past two years.

At the same time, new technologies are emerging to organize and make sense of this avalanche of data. We can now identify patterns and regularities in data of all sorts that allow us to advance scholarship, improve the human condition, and create commercial and social value. The rise of “big data” has the potential to deepen our understanding of phenomena ranging from physical and biological systems to human social and economic behavior.

Organizations face critical threats to the data they produce. Data is especially vulnerable when it is new and is used for forensic investigation. Such data can be made more secure through a data science process. This paper proposes a framework that combines the functionality of the intrusion kill chain with the theories of data science to combat advanced persistent threats against the forensic data of an organization.

2. Advanced Persistent Threats

APTs are attacks by technically advanced persons who use advanced technologies and tools to gain access to the main processing system of a specific target organization, through which they can maintain a long-term presence within the organization in order to attack its forensic data. They follow a specific attack path, described below.

Phase 1: Introduction & the “Recon” phase

The attackers invest heavily in researching their target organization so that they can appear to be part of the organization and earn the trust of victims. Attackers even hire linguists to analyze the lingo specific to the organization so that they can craft attack emails that sound as if they come from internal sources.

Phase 2: The break-in phase

Attackers are able to trick end users into downloading tainted PDFs or GIFs, or into launching other malicious activity. Organizations would use application whitelisting and thin-client virtualization to counter this. Poisoned watering hole attacks work better for the attackers here. IP address reputation services play a critical role in helping to detect malicious activity during the break-in phase.

Phase 3: The “command and control” phase

In this phase the attacker gains control and is able to command various attacks. At this point the attacker partially holds the authority of the organization and can make the network admin perceive his activity not as an attack but as some internal process.

Phase 4: The lateral move phase

Here the attacker leverages his initial break-in to compromise more important machines on the network. The attacker checks in to a machine on the organization's network from which he can gain an important link to the entire network. When the attacker checks out, the network admin is left without a clue. The attacker will dedicate an entire system to making sure that the victim does not get any hint of the attack, leading him to believe that the attacker's system is an internal system or part of the organization.

3. Intrusion Kill Chain

The intrusion kill chain model was proposed by Eric M. Hutchins, Michael J. Cloppert, and Rohan M. Amin of Lockheed Martin Corporation. It is a series of phases that an attacker necessarily follows to plan and carry out an intrusion. The intrusion kill chain phases are as follows:


Information Gathering involves selecting targets and collecting information about them, for example searching for email addresses, the technologies the target uses, and the people the target trusts.

Weaponization is the coupling of malicious code with innocuous-looking deliverable files such as PDFs, DOCs and PPTs.

Delivery is the third phase, in which the attacker delivers the weaponized file to the target environment. The most common delivery vectors are email, drive-by download through a website link, and USB removable devices.

Exploitation follows once the weaponized file has been successfully delivered to the target environment: a vulnerability of the target system is used to execute the malicious code.

Installation is the most important and crucial phase of the kill chain. Here the malicious code gets installed inside the target environment.

Command and Control (C2) is the phase in which the installed malicious code opens a communication channel through which its execution is controlled and its actions continue towards the target.

Action is the last phase of the kill chain, in which the adversary achieves its objectives by performing activities such as data exfiltration. Defenders can be confident that the adversary achieves its goal only after passing through all these phases.
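Because the adversary must pass through the phases in order, a defender can treat them as an ordered scale and ask how far an intrusion has progressed. The following Python sketch illustrates that idea; the phase names follow Fig. 1 (Intrusion Kill Chain Pattern), while the helper functions and the sample alerts are hypothetical and not part of the article.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Kill chain phases in the order shown in Fig. 1."""
    INFORMATION_GATHERING = 1
    WEAPONIZATION = 2
    DELIVERY = 3
    EXPLOITATION = 4
    INSTALLATION = 5
    COMMAND_AND_CONTROL = 6
    ACTION = 7


def furthest_phase(observed):
    """Return the latest phase seen among the observed events, or None."""
    return max(observed, default=None)


def remaining_phases(observed):
    """Phases the adversary still has to complete after the furthest one seen."""
    reached = furthest_phase(observed)
    start = 0 if reached is None else int(reached)
    return [p for p in Phase if p > start]


# Hypothetical alerts already mapped to phases by an analyst or a rule set.
alerts = [Phase.INFORMATION_GATHERING, Phase.DELIVERY, Phase.WEAPONIZATION]
print(furthest_phase(alerts).name)
print([p.name for p in remaining_phases(alerts)])
```

Each remaining phase is an opportunity to break the chain, which is why detecting an intrusion at an early phase leaves the defender with more options.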

[Fig. 1 diagram: the kill chain pattern runs through Information Gathering, Weaponization, Delivery, Exploitation, Installation, Command & Control and Action; side labels: Detection, Intrusion Construction]

Fig. 1 : Intrusion Kill Chain Pattern

4. Data Science

The field of data science is emerging at the intersection of social science and statistics, information and computer science, and design. The idea is to bring these disciplines together and provide organizations with a platform that can better safeguard their data. Data science draws on several theories: database theory, the Agile Manifesto and spiral dynamics.

Database Theory: Database theory is about organizing data in a way that makes storing and retrieving it efficient. Data can be categorized into objects, objects can be put into collections, and objects and collections can have relationships with each other and with themselves.

Agile Manifesto: The Agile Manifesto is a set of principles intended to ensure high-quality outputs in environments subject to high levels of change and ambiguity. Agile methods cope with rapid change and ambiguity by adopting an iterative development process. The Agile Manifesto seeks to remove cultural barriers between developer, client and end user and focuses on using the latest technology to make things as simple as possible, but not simpler. Everything changes, and the longer it takes to test a solution in the live environment, the higher the risk of failure.

Spiral Dynamics: Spiral Dynamics is a theory of human development and behavior and explains why humans do what we do. It explains the psychology behind why we get out of bed in the morning, why we feel compelled to create things and why we seek to better ourselves and better serve our loved ones. The theory talks about two mental states, one of “facts” and one of “values”. Facts are what we believe. Our beliefs are based on the knowledge we currently have and the environment we are currently in.

Values are what we desire. Our desires are driven by our intentions and/or concerns which are also based on the knowledge we currently have and the environment we are currently in.

There are three components involved in data science: organizing, packaging and delivering data (the OPD of data). Organizing is where the physical location and structure of the data are planned and executed. Organizing data involves the physical storage and format of data and incorporates best practices in data management.

Packaging is where the prototypes are built, the statistics are performed and the visualization is created. Packaging data involves logically manipulating and joining the underlying raw data into a new representation and package.

[Fig. 2 diagram: Big Data with the three components Organizing, Packaging and Delivery]

Fig. 2 : Components of Data Science

Delivering is where the story gets told and the value is obtained. Delivering data involves ensuring that the message the data carries reaches those who need to hear it. However, what separates data scientists from all other existing roles is that they also need a continual awareness of What, How, Who and Why.

5. Related Work

Jisang Kim [1] proposes and verifies an algorithm to detect advanced persistent threats early through real-time network monitoring and combinatorial analysis of big data logs, and provides results from testing the derived algorithm on actual networks. Nurul Nuha Abdul Molok [2] presents a case study that explores social media as the most challenging information leakage channel and its link to APT attacks. It explains how this phenomenon happens by examining the underlying factors of information leakage via online social networking (OSN), which leads to suggestions for combating the problem. Guangmingzi Yang [3] says that traditional security systems do not work well against APT attacks, while the new technologies are not yet complete. By analyzing the life cycle of an APT, locating the key points in the attack process and discovering efficient defense techniques, one can minimize the loss and prevent a part of


the APT. The next step is to combine the new technologies and strategies and build a sound framework for a security system that can protect data from APT attacks. William Hurst [4] discusses the challenges of big data analytics in the growing digital world. As the amount of data created every day increases, uncovering information in very large datasets is becoming more of a challenge. Factors such as information security, digital threats and information sharing require the use of big data analytics to uncover hidden information and enhance the services provided. In terms of security, improved support can be provided, as well as cost efficiency. Processing large datasets with big data evaluation techniques to uncover anomalous behaviors in a system can enhance existing security methods. In the Internet of Things, big data analytics benefits the well-being of people and helps with the evolution of integrated digital devices. This is particularly the case in healthcare, where it plays an important role in the early detection of degenerative illnesses. Bhawna Gupta [5] proposes the use of big data analytics for analyzing enterprise data and discusses a framework based on Hadoop for dealing with targeted attacks using big data security analytics. One can manage the big data characteristics of large volumes of enterprise data. If an enterprise has an unmet business need for strategic decision making with a high degree of processing, a combination of Revolution Analytics and Hadoop offers a significant opportunity to gain an advantage.

6. Proposed Framework

This paper proposes a framework in which the sensitive data of the organization is managed under data science, with the added provision of intrusion kill chain security, to tackle advanced persistent threats against that data. Fig. 3 depicts the proposed framework.

The framework consists of four main modules. The logging module collects the log data from the global platform, which comprises the host intrusion detection systems (HIDS), the network intrusion detection systems (NIDS), the network server, the mail servers, etc., that the organization manages for the protection of its big data. This data is stored as it originates on the Hadoop cluster, which is intended to store terabytes of big data. Once the big data is available, data science applies its theories to it to safeguard it from being accessed and threatened by adversaries. Data science conceals the sensitive big data using its three OPD components: organizing, packaging and delivery. Data science thus provides a secure and safe method of storing and maintaining the sensitive big data, which makes it hard for advanced persistent threats to breach the organization's internal data security. Moreover, the intrusion kill chain stays in monitor mode for all data being updated or added to the Hadoop cluster. It checks the sensitive big data at every stage for any unusual activity or unexpected behavior of a user, process or file within the organization's network. If it finds any such malicious activity in the vicinity of the sensitive big data, it follows the activity through the well-known phases of information gathering, weaponization, delivery, exploitation, installation, command & control and action that an APT generally goes through to achieve its target. The intrusion kill chain (IKC), monitoring sensitive big data that is already guarded by the data science components and theories, provides double protection for data under the menace of APTs. The IKC and data science communicate with the Hadoop cluster using Hadoop Pig, Hive and Flume. The IKC is modeled entirely in the MapReduce parallel and distributed programming environment for efficient and cost-effective processing. The proposed framework is designed to enhance the efficiency of the IKC using data science to combat APTs targeting the organization's sensitive big data.

Conclusion

As data originates in any organization, it lives under the menace of advanced persistent threats that are constantly working to invade the organization's security bounds. The sensitive big data is first protected under data science and then safeguarded by the intrusion kill chain's round-the-clock monitoring, so the data becomes more secure. The proposed framework provides double-layer protection for the data so that an APT cannot easily breach the organization. There is good scope for enhancing the framework, as APTs are bound to keep attacking. As the framework uses data


[Fig. 3 diagram labels: HIDS, NIDS, Server, Mail; Logging Module; Log Management Module; Hadoop Cluster; Pig; Hive; Data Science Module (Packaging, Organising, Delivery); Big Data; Kill Chain Pattern (IG, W, D, E, I, C2, A); Intrusion Construction; Detection; Synthesis; System Administrator]

Fig. 3 : Proposed Framework


science at its core, the analysis, together with the suggested cost-effective methods of storing and delivering big data, is an added advantage.

References

[1] Jisang Kim, “Detection of Advanced Persistent Threat by Analyzing the Big Data Log”, Advanced Science and Technology Letters, Vol. 29 (SecTech 2013), pp. 30-36. http://dx.doi.org/10.14257/astl.2013.29.06
[2] Nurul Nuha Abdul Molok, “Information Leakage through Online Social Networking: Opening the Doorway for Advanced Persistence Threats”, Proceedings of the 8th Australian Information Security Management Conference, Edith Cowan University, Perth, Western Australia, 30th November 2010.
[3] Guangmingzi Yang, “The prevent of advanced persistent threat”, Journal of Chemical and Pharmaceutical Research, 2014, 6(7):572-576.
[4] William Hurst, “Guest Editorial Special Issue on: Big Data Analytics in Intelligent Systems”, Journal of Computer Sciences and Applications, 2015, Vol. 3, No. 3A, 1-9. http://pubs.sciepub.com/jcsa/3/3A/1 DOI:10.12691/jcsa-3-3A-1
[5] Bhawna Gupta et al., “Big Data Analytics with Hadoop to analyze Targeted Attacks on Enterprise Data”, (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (3), 2014, 3867-3870.
[6] Hardik A Gohel (Assistant Professor, AITS, Rajkot), “Data Science – Data, Tools & Technologies”, CSI Communications, June 2015. www.csi-india.org


order to attract papers from overseas. This will put CSI Transactions on ICT on the international map of reputed publications. Springer, with its extensive reach, will provide us with assistance in this regard.

3. Start and enlarge the subscription base. This should get to the desk of all researchers based on their involvement in the six topics we have taken up, Research Directors who are keen to identify the trends, Students who are getting into the profession and policy makers who are charged with providing the drivers of growth for this area of education and specialization. This expanding base will attract more people to join the professional activities of Computer Society of India. We will focus on getting Institutional Subscribers from amongst IT companies, User organizations and Research and Academic Institutions.

4. Promote this journal among companies - both IT (large, medium and small) and User Organizations- for financially supporting this through large scale subscriptions for their employees. This will be a great contribution from the companies in India to help in the establishment and sustenance of a high quality journal on Computing from India.

5. Enlarge the subscription base internationally. Springer will provide help in this through their reach.

6. Work on bringing theme based Transactions each quarter.

We expect to organize Round Tables with participation of leaders from Industry, Academia, Research Institutions and Government to identify themes relating to emerging technologies and relevant applications.

7. Get far wider participation for papers. This will mean greater inroads into the research departments of IT companies, educational institutions, management researchers and companies focusing on practical applications of ICT. We are in the process of establishing CSI Transactions on ICT as the publication from which authors can benefit.

In our journey so far, we have benefitted immensely from the inputs of our eminent body of Advisors, who make up our Advisory Council. Springer has provided its platform and been responsive to our needs. Our editors have sifted through papers and reviewed them diligently. To all these organizations and people, we owe our heartfelt thanks.

In Conclusion

The authors would like to thank the members of the Computer Society of India for their continued involvement in the development of CSI Transactions on ICT. CSI Transactions on ICT is sending a strong signal in India that the Computer Society of India is dedicated to the goal of developing the journal and enhancing its contents, quality and reputation. CSI believes that it is our contribution to Make in India that is Made in India with local relevance and global quality.


Contd. from pg.13


MiDeSH: Missile Decision Support System

C. R. Suthikshn Kumar

Department of Computer Science and Engineering, Defence Institute of Advanced Technology (DIAT), Pune 411025, India. Email: [email protected]

DRDO has successfully developed missiles such as Agni, Prithvi and Akash. These are huge projects that have provided significant assurance about our defence capabilities. However, in a particular conflict during war, choosing the appropriate missile requires a very important decision-making capability. Accurate decisions can improve the usefulness of missile inventories. The Missile Decision Support System (MiDeSH) aims to provide concise and correct information on missiles to military commanders during war. MiDeSH helps to overcome information overload and to arrive at quick decisions. MiDeSH can be queried for information on missiles and their deployment, target information, the appropriate choice of missile for specific situations, etc. MiDeSH will have a graphical user interface and a client-server architecture. It will have algorithms to implement decision making in real time.

I. Introduction

Missiles are becoming part and parcel of defence forces. They play a vital role in wartime and also serve as a deterrent in peacetime. The role played by Patriot missiles during the Gulf War can be cited here [1]. India is at the forefront of developing missiles [2,4]. DRDO has a grand plan for guided missile development in India [5]. There is a variety of missiles, each with a specific purpose and range. Also, each missile is very expensive to manufacture.

For effective use of missiles during any war, missiles must be properly selected and matched to targets. Military commanders need all the data on missile deployments, range and capabilities, and target matching. A decision to launch a missile attack needs to be backed by a justification for the selection of a particular missile. The military commander may have information on the target, but an arbitrary choice of missile can result in missing the target. This is where decision support systems come in: they provide the supporting information that helps human decision making.

Decision Support Systems (DSS) have been developed and deployed for various domains [7] such as medicine and health, production management, human resource management, defence, operations and education. The use of a DSS for missile selection is proposed in this paper. MiDeSH is a Missile Decision Support System envisaged to encapsulate knowledge about missiles and their capabilities, target information, missile deployments, etc. MiDeSH supports decisions through exploration of alternatives and ranking of the decisions using an intelligent decision-making algorithm. During wartime, the military commander can use MiDeSH in various ways, such as information gathering and analysis, and choice and selection of missiles. MiDeSH provides quick and accurate decisions based on the scenario presented to it in the form of target information. MiDeSH holds specific details of Indian missiles, their deployments and their typical targets.

This paper is structured as follows: In the next section (Section II), the current scenario in the Indian Missile program is discussed. In Section III, we briefly review the Decision Support System (DSS). The proposed architecture and design of MiDeSH are presented in Section IV. Summary and Conclusions are presented in Section V.


Table 1: Indian Missiles

Name        Type        Range
Prithvi-I   Ballistic   150 km
Prithvi-II  Ballistic   350 km
Dhanush     Ballistic   350 km
Agni-I      Ballistic   700 km
Agni-II     Ballistic   2,000 km
Agni-III    Ballistic   3,500 km
Agni-IV     Ballistic   4,000 km
Agni-V      Ballistic   +5,000 km
Prahaar     Ballistic   150 km
Pragati     Ballistic*  60 km - 170 km*
K-15        SLBM(c)     750 km*
K-4         SLBM*       3,500 km*
BrahMos     Cruise      290 km
Nirbhay     Cruise      1,000 km


II. Indian Missile Scenario

DRDO is the central R&D organization that develops India’s missile program. A plethora of missiles is being developed. The technologies of these missiles are transferred to Public and Private sector companies for further production. Table 1 shows the various missiles being developed in India.

The Integrated Guided Missile Development Program (IGMDP) is the flagship program for development of Indian missiles. It has four sub-programs:
• Surface to Surface:
  - Short Range (Prithvi)
  - Long Range (Agni)
• Surface to Air:
  - Medium Range (Akash)
  - Short Range (Trishul)
• Anti-tank:
  - Nag
• Cruise:
  - Nirbhay

Agni Missiles are medium-range to intercontinental ballistic missiles capable of carrying nuclear warheads. These missiles have a target range of 700 km to 10,000 km. The Strategic Forces Command (SFC) of Indian Army is responsible for command and control of Indian missiles such as Agni and Prithvi. The BrahMos missile is a supersonic cruise missile with a range of 290 km[15]. BrahMos missiles are developed jointly by India and Russia.

III. Decision Support System

A Decision Support System (DSS) is an interactive, flexible and adaptable computer-based information system for improved decision making[7]. It analyses the input data and stored data, provides an intuitive visualization, and also allows for the decision maker’s own insights. The components of a DSS are data and information, a computer system, a DBMS and model, and a graphical user interface (GUI).

A DSS is an interactive computer system which is able to adapt to new situations for improved decisions. There are several types of DSS, such as Model based DSS and Knowledge driven DSS. Some of the benefits of DSS are increased speed, increased flexibility and accurate decision making. DSS have many applications and have been extensively applied to defence related problems[9, 10, 11].

IV. MiDeSH: Missile Decision Support System

The Missile Decision Support System (MiDeSH) is an advanced DSS with the specific application of selecting missiles and assigning them to targets. Military commanders are overloaded with information during war. MiDeSH aids the commanders by providing accurate information on missiles and by providing alternatives based on the information supplied, such as the co-ordinates and details of targets to be destroyed. The MiDeSH block diagram is shown as follows:

Fig. 1 : MiDeSH block diagram (blocks shown: MiDeSH Engine, Database, GUI, Real-time Update)

The main blocks of MiDeSH with brief explanation are as follows:
• MiDeSH Engine: This is the core intelligent DSS. The MiDeSH engine uses an expert rule base to analyse the data from the database, GIS and real-time updates to generate decision alternatives in real time.
• Database: The database houses all the information required, i.e., missiles, their deployments, their destruction capabilities, target information, warheads etc.
• Real-time updates: MiDeSH is connected to real-time updates on missile deployments, targets destroyed, etc.
• GUI: The graphical user interface for interacting with MiDeSH. The user can enter data, view the visualization of data, review the progress of the decision making, review the decisions taken etc.
• GIS: This block provides data on geographical locations, terrains, target co-ordinates etc.

A typical decision chain in a war can be illustrated with data in the following table.

Sl. No.  Attribute                Details
1        Target                   500 Km North West
2        Co-ordinates             (latitude, longitude)
3        Area                     100 Sq M
4        Terrain                  Runway/ Radar Facility/ Ammunition Storage/ Ship etc
5        Required Destruction     85%, 90%, 95%
6        Time frame               Immediate, Urgent, Deferred
7        Weather                  Cloudy, clear, rainy
8        Detonation               Air, Ground, Sea
9        Preferred Launch sites   Area1, Area2, Ship1, Ship2, Submarine1, Submarine2, Island1, Island 2 etc

The Military commander feeds this data into the MiDeSH. MiDeSH analyzes this data and also information from its database about the missiles and targets to generate the decisions. MiDeSH provides decision alternatives which can be explored by the Military commander to take the final decision.

The MiDeSH engine uses Multi-criteria Decision Making (MCDM), which involves six components[12]:
• Goals
• Decision makers
• Evaluation criteria (objectives and attributes)
• Decision Alternatives
• Decision Environment
• Outcomes

The generalized statement of the Missile Decision Support System is as follows: Given a set of missiles M = {m1, m2, m3, …, mk} and a set of targets T = {t1, t2, t3, …, tl}, the missiles need to be assigned to targets such that the maximum number of targets are attacked while the overall cost of the attack is kept to a minimum. The cost matrix, where Cij is the cost of assigning missile Mi to target Tj, is as follows:

      T1    T2    T3
M1    C11   C12   C13
M2    C21   C22   C23
M3    C31   C32   C33

Further, the details of costs of missiles and priority of targets can be utilized in making a better decision. There may be additional complexity involved, as certain targets can only be brought down by specific types of missiles.
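The cost-matrix formulation above can be sketched as a small assignment problem. The following is a minimal illustration only, not the MiDeSH algorithm: it assumes a square matrix with a one-to-one assignment, uses an exhaustive search that is practical only for very small matrices, and marks disallowed pairings with a hypothetical `INFEASIBLE` marker. The function name and all numbers are invented for the example.

```python
from itertools import permutations

INFEASIBLE = None  # hypothetical marker: this row/column pairing is not allowed


def best_assignment(cost):
    """Return (total_cost, assignment) for a square cost matrix.

    cost[i][j] plays the role of Cij, the cost of assigning row item i
    to column item j. assignment[i] is the column given to row i.
    Returns None when no complete feasible assignment exists.
    """
    n = len(cost)
    best = None
    for perm in permutations(range(n)):
        entries = [cost[i][perm[i]] for i in range(n)]
        if any(entry is INFEASIBLE for entry in entries):
            continue  # skip assignments that use a disallowed pairing
        total = sum(entries)
        if best is None or total < best[0]:
            best = (total, perm)
    return best


# Illustrative 3 x 3 matrix; the values are invented.
example = [
    [5, 2, 8],
    [4, 3, 7],
    [3, 1, INFEASIBLE],
]
total, assignment = best_assignment(example)
print(total, assignment)
```

For larger matrices an exhaustive search becomes impractical, and a polynomial-time method such as the Hungarian algorithm would be the usual choice.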

Decision rules provide the basis on which the decision alternatives can be prioritized and the best alternative chosen. The MiDeSH output may list the decision alternatives with the most preferred decision at the top. The following is an example output of MiDeSH:

Sl No   Missile   Launch Center
1       Agni I    Unit 1
2       Agni II   Unit 2

The Military commander may use this output in finalizing the decision on missile launch.
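One simple way to produce such a prioritized list is a weighted-sum rule, a common MCDM technique. The article does not state which decision rule MiDeSH uses, so the sketch below is an assumption made for illustration; the criteria names, weights and scores are all hypothetical.

```python
def rank_alternatives(alternatives, weights):
    """Sort alternative labels from most to least preferred.

    alternatives: dict mapping a label to a dict of criterion -> score in [0, 1],
                  where a higher score is better for every criterion.
    weights:      dict mapping criterion -> relative importance.
    """
    def score(criteria):
        # Weighted sum over all criteria named in `weights`.
        return sum(weights[name] * criteria[name] for name in weights)

    return sorted(alternatives, key=lambda label: score(alternatives[label]), reverse=True)


# Hypothetical criteria, weights and scores, invented for illustration only.
weights = {"cost": 0.5, "speed": 0.3, "availability": 0.2}
alternatives = {
    "Alternative A": {"cost": 0.9, "speed": 0.4, "availability": 0.8},
    "Alternative B": {"cost": 0.5, "speed": 0.9, "availability": 0.6},
    "Alternative C": {"cost": 0.2, "speed": 0.7, "availability": 0.9},
}
ranking = rank_alternatives(alternatives, weights)
print(ranking)
```

The weights express the decision maker's priorities, so changing them can reorder the list without touching the underlying data.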

Triple-version MiDeSH: N-version programming is a technique adopted in Fault Tolerant Computing Systems[13]. In order to improve the accuracy and reliability of MiDeSH, we propose a Triple-version MiDeSH. This is triple-version software which has three different DSS algorithms running on three different platforms. The three versions are developed by three separate teams and handle the same input and output formats. The following block diagram shows the architecture of Triple-version MiDeSH.

Fig. 2 : Triple version MiDeSH (blocks shown: Input, Version 1, Version 2, Version 3, Voter, O/P)
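The voter stage of an N-version arrangement can be sketched as a strict majority vote over the outputs of the versions. This is a generic illustration of the technique under the assumption that the versions return directly comparable (hashable) results; the function name is hypothetical and the article does not describe the voting rule actually intended.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the output produced by a strict majority of versions, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # Require more than half of the versions to agree.
    return value if count > len(outputs) / 2 else None


# Two of three versions agree, so their result is accepted.
print(majority_vote(["option-1", "option-1", "option-2"]))
# No majority: the disagreement is reported as None for human review.
print(majority_vote(["option-1", "option-2", "option-3"]))
```

Returning `None` when no majority exists leaves the disagreement visible instead of silently picking one version's answer.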

GIS Software: MiDeSH relies on GIS software for Multi-criteria decision making. The geographical information required for MiDeSH comprises remote sensing images, geo-referenced statistics, facts and observation data. These are obtained from the GIS software. A GIS is a special-purpose database which uses a spatial co-ordinate system for storing and retrieving data and information. A GIS decision support system involves integration of spatially referenced data in a problem solving environment[12]. The GIS provides data output in the form of maps, tables, diagrams etc suited for graphic display and for multi-criteria decision making. The MiDeSH decisions can be visualized using GIS. Also, the terrains for missile trajectories for final attacks, the inventories of the missiles, the targets etc can be displayed with graphical maps. This aids the final decision making by the Military Commander while also assessing the current scenario of missiles.

The decision alternatives overlaid on a geographic map will be as shown in Fig. 3. The top four alternatives have been presented for a target which is a ship in the Indian Ocean. The darkened region shows the range of Agni II obtained by the Missile Range Tool[14]. The selection of the launch site for the missile, the type of the missile, the distance to the target, the time for destruction of the target, the payload etc are the decision parameters which are ranked and displayed by MiDeSH. The display can be projected on a giant screen for the Military Commanders to analyze the decisions. The trade-off parameters are the distance and the size/cost of the missiles. If the target is attacked from a nearer launch site, a smaller missile may suffice. This saves significantly on the operation costs.

Fig. 3 : MiDeSH output visualization

V. Summary and Conclusions

MiDeSH is a Multi-criteria Decision Support System useful for missile launch decision making. India’s Missile Development program has resulted in several successful missiles in different categories. The ranges and target destruction capabilities of these missiles need to be matched with the particular targets during an attack scenario. The Military commanders are assisted by MiDeSH in accurate and quick decision making. Also, human/machine error can be overcome with the triple-version software.

This paper has discussed the architecture of MiDeSH. The triple-version MiDeSH is a dependable and reliable system proposed for accurate and fast decisions. The GIS software is integrated into MiDeSH in order to enable Multi-criteria decision making. The outcomes are displayed graphically with color-coded maps, charts etc.

VI. References

[1] Wikipedia Entry on Missiles: https://en.wikipedia.org/wiki/Missile.

[2] “Integrated Guided Missile Development Program (IGMDP)”, DRDO special Publication Series, 2008.

[3] D. Ghose, NPTEL Course on Guidance of Missiles, 2012.

[4] M. Chinsoria, “India’s Missile Program: Building blocks for effective Deterrence”, CLAWS, Brief No. 22, Sept. 2011

[5] DRDO, “Integrated Guided Missile Development program”, DRDO Special Publication Series, DESIDOC, 2008.

[6] M. Swaine and L. H. Runyon, “Ballistic Missiles and Missile Defence in Asia”, NBR Analysis, June 2002.

[7] A. Kanojiiya and V. Nagori, “Fundamentals of Decision Support System and Exploring Research Application in Education”, CSI Communications, June 2016, pp. 30-33.

[8] U. Averweg, “Decision Making Support Systems: Theory and Practice”, Bookboon.com publishers, 2012.

[9] Randleff, L. R., & Clausen, J. “Decision Support System for Fighter Pilots”. (IMM-PHD-2007-172), PhD Thesis, Technical University Denmark, 2007.

[10] P. K. Davis, et al., “Implications of Modern Decision Science for Military Decision Support Systems”, RAND Corp Report, 2005.

[11] S. Patil and L. MadanBhavi, “Web Based Decision Support System for Management of Defence Activities”, ICCCCT’10, pp. 731-742.

[12] J. Malczewski, “GIS and Multicriteria Decision Analysis”, John Wiley and Sons, 1999.

[13] I. Koren and M. Krishna, “Fault Tolerant Systems”, Elsevier India, 2013.

[14] CarlosLab, “Missile Range Tool”, www.carloslabs.com/node/164/.

[15] Brahmos Supersonic Cruise Missile, http://www.bramhos.com/

[16] N. Panigrahi, “Computing in Geographic Information Systems”, CRC Press, July 2014.

[17] N. Panigrahi, “Geographic Information Science”, CRC Press, August 2009.


About the Authors

Dr. CRS Kumar is currently Head of the Department of Computer Engineering and also Chairman of the Data Center in Defence Institute of Advanced Technology (DIAT), DRDO, Ministry of Defence. He has received PhD, M.Tech., MBA and B.E. degrees from reputed Universities/Institutes. His areas of interest are Cyber Security, Network Security, Fault Tolerant Computing, Game Theory and Wireless Networking. He is a Fellow of IETE, Fellow of Institution of Engineers, Senior Member of IEEE and Distinguished Speaker of Computer Society of India. Dr. Kumar has worked in leading MNCs such as Philips, Infineon and L&T Infotech in senior positions. He has visited several countries such as Australia, Germany, France, Netherlands, USA, UK and HK for work/conference participation. Dr. Kumar is a member of the DIAT Academic Council and the AICTE-INAE Steering Committee. He is currently supervising 8 PhD students and 4 Master’s students. He is the recipient of several awards including the “Distinguished HOD award” at TechNext India 2017, held at IIT Mumbai.


International Summit on Trends & Innovations on Net Gen ICT

On the occasion of CSI Foundation Day, CSI Hyderabad Chapter in collaboration with RCI DRDO organized an International Summit on Trends & Innovations on Net Gen ICT (TINICT) on 4th March, 2017 at Hotel Novotel, Hyderabad.


Life Time Achievement Award

Dr. M. L. Goyal

Dr. M. L. Goyal did his B.E. (Hons.) in Electrical Engg. from MREC, Jaipur; M.E. (Distinction) in Electrical Engg. from BITS, Pilani; M.A.Sc. in Computer Science from University of Toronto, Canada; and Ph.D. in Computer Science from Jawaharlal Nehru University, New Delhi.

He worked in CMC Limited in different Regions, SBUs and functions for more than 31 years (1977-2008). He was associated with the management of software development & implementation, systems support, consultancy, quality & excellence processes, marketing, education & training and General Management. During 1991-93, on deputation from CMC Limited, he worked as an Adviser to the Govt. of Mauritius and Head, Central Informatics Bureau at Port Louis. He superannuated from CMC Limited after serving as General Manager at Chennai and New Delhi. While working in CMC Limited, he received several appreciation and special contribution awards. In October 1998, he was given the Outstanding Recognition Award for his significant contribution to the growth of IT Education and Applications and for achieving professional excellence. After superannuation from CMC Limited, in September 2008, he joined Maharaja Agrasen Institute of Technology, Delhi as its Director and continued up to May 2016. Many innovations were introduced in the working of the institute and the institute grew at a rapid pace during this period. Since May 2016, he has been working as Director General at this institute.

He served the Computer Society of India as its Divisional Chairman, Honorary Secretary, Vice President and President. He was a Member of the Executive Council of the South East Asia Regional Computer Confederation (SEARCC) during 1994-96 and the Indian Representative to the International Federation of Information Processing (IFIP) during 1996-98. His contribution to CSI has been very significant. During his CSI Presidentship, a Committee was set up to prepare a draft national IT Policy. The Committee brought out a document “INTENT – Information Technology for National Transformation”, which was released to the press in October 1995 and presented to the Planning Commission and various Govt. Departments. For the first time in CSI, 2 National IT Application Awards of Rs. 50,000/- each were instituted in the year 1996. The original CSI logo was expanded by adding to it the Society’s name, the year of its registration and CSI’s motto “Sarve Bhavantu Sukinah”. His association with CSI started in 1973 when he presented a technical paper based on his M.E. thesis at the CSI-73 Annual Convention at Delhi. In September 1998, CSI conferred on him its Fellowship Award. He was the President, Computer Science Section of the Indian Science Congress Association during 1999-2000 and Hon. Treasurer; Chairman, Board of Examination and Council Member of the Institution of Electronics and Telecommunication Engineers (IETE) during 2006-09. He was the Chairman, Institution of Engineers, Delhi State Centre, during 2013-14.

He served as a member of various Committees formed by the Department of Information Technology, Ministry of Communication and Information Technology, Govt. of India; All India Council of Technical Education; Confederation of Indian Industry; Technology Information and Assessment Council of Department of Science & Technology; National Board of Accreditation; and Bureau of Indian Standards. He was also a member of the Governing Council of DOEACC Society during 1994-96. He is a Fellow of the Institution of Engineers (India) and the Institution of Electronics & Telecommunication Engineers. On the occasion of the 46th Engineers’ Day on September 15, 2013, The Institution of Engineers (India) Delhi State Centre conferred on him the Eminent Engineer Award for his significant contribution to the advancement and application of practice of Engineering in India.

In grateful recognition of his services to the Computer Society of India and his outstanding contribution as an IT professional to IT Industry and Education, CSI has decided to confer on him the Life Time Achievement Award. The Society takes pride and pleasure in presenting him this citation on the occasion of its 51st Annual Convention held at Coimbatore on 23rd January 2017.


Dr. R. Srinivasan

Dr. R. Srinivasan has contributed extensively to the promotion of high-quality research, computer education, the IT Industry and the Computer Society of India.

Dr. Srinivasan is one of the co-founders of the CSI Bangalore Chapter, inaugurated in 1973/74. He served CSI-BC as Vice Chairman and Chairman. He has been the Regional Vice President for the South, Vice President and President of CSI. His flagship initiative has been the CSI Karnataka Student Convention, started in 1987, which has continued every year for the last 29 years.

His lectures on “Success story of Indian Software Industry and the Lesson for Developing Nations”, Beijing, China, in the year 2000 and “Computer Society of India, its Structure and Activities”, Milan, Italy, in the year 1999 helped CSI to reach new destinations.

Dr. Srinivasan has been a member of the Committee on IT Task Force constituted by the then Prime Minister of India, Shri Atal Bihari Vajpayee, and contributed to developing new strategies. He introduced video lecture programs in CSI from eminent personalities including Sir Arthur C. Clarke.

Dr. Srinivasan served as a Scientist in National Aerospace Laboratories for 35 years. He played a major role in the accession and establishment of the NAL Computer Centre housing a mainframe UNIVAC 1100-H1 Computer.

Dr. Srinivasan has been in the Committee chaired by Dr. Abdul Kalam for the design and development of a parallel computer in DRDO. He has been a member of the Expert Committee to procure computers for ADA, Bangalore and SERC, Ghaziabad.

Dr. Srinivasan has worked in the IT Industry for about 8 years: as CTO in Tata Elxsi, in BFL Software, and as CTO in iCMG, Bangalore.

Dr. Srinivasan has been a very good teacher and a researcher. Now, at the age of 78, he is working as Emeritus Professor in M. S. Ramiah Institute of Technology, Bangalore. He has produced 8 Ph.D.s, published 27 papers in the last four years and guided more than 100 BE and M.Tech projects.

In grateful recognition of his services to the Computer Society of India and his outstanding contribution as an IT professional to IT Industry and Education, CSI has decided to confer on him the Life Time Achievement Award. The Society takes pride and pleasure in presenting him this citation on the occasion of its 51st Annual Convention held at Coimbatore on 23rd January 2017.

Dr. D. D. Sarma

Dr. Dhavala Dattatreya Sarma, born to Venkataratmma and Jagannadha Sastry, earned his Graduation, Masters level Degrees in Arts and Sciences and Ph.D. from Andhra University.

Dr. Sarma was Chief Scientist (Scientist G) at National Geophysical Research Institute (Council of Scientific and Industrial Research, India) and worked extensively on Stochastic and Computer Modeling. Dr. Sarma was a Post Doctoral Research Associate at the University of Georgia (USA). He received intensive training in Computer Methods and signal processing at IIT-K and University of Roorkee, Roorkee (U.P., India). He received intensive training in Computer Methods and Operations Research at the Imperial College of Science & Technology (London) and the University of Leeds, Leeds, U.K. He was a visiting Scientist at the world famous Centre de Geostatique, Fontainebleau, France. Presently, he is working as Professor and Director, Guru Nanak Institutions Technical Campus, Hyderabad.

Dr. Sarma has promoted research and produced several Ph.D. holders. He has published over seventy-five research papers and three books. He has organized a number of national and international conferences on various aspects of computers, e-learning and entrepreneurship education. Over the years, he has held leadership positions in various high profile scientific/educational institutions. Among others, Dr. Sarma is a Fellow, Computer Society of India, Fellow, A. P. Akademi of Sciences, Fellow, Telangana Academy of Sciences, Indian Society for Probability and Statistics, Fellow, Geological Society of India. He was Regional Representative for Asia of the International Geostatistics Association (France) from 1992 to 2000. He is presently the Chairman, IT & CSE Section of A. P. Akademi of Sciences.

Dr. Sarma became a member of CSI in 1968 and was associated with the Regd. Office of CSI, Hyderabad since its formation. He was Regional Representative of CSI during 1979-83 and organized four regional conferences. He was member, Publication Committee of CSI from 1996-1998 and during 2004. Dr. Sarma served as Chairman, CSI Hyderabad Chapter from 1986-88, Chairman, Finance Committee of CSI-95 held at Hyderabad (1995), and Chairman, Div. VIII (Micro Computers), during 1994-1998. He was member, awards committee of CSI during 1998 and 2004. As Divisional Chairman he organized a number of workshops and conferences on various aspects of computer methods and modeling. He was Regional Academic Auditor for Aptech for their NCC-Aptech Educational Programme for a number of years. Dr. Sarma is the Editor, International Journal of Computer Science and Engineering being brought out by Guru Nanak Institutions.

In grateful recognition of his services to the Computer Society of India and his outstanding contribution as an IT professional to IT Industry and Education, CSI has decided to confer on him the Life Time Achievement Award. The Society takes pride and pleasure in presenting him this citation on the occasion of its 51st Annual Convention held at Coimbatore on 23rd January 2017.

Mr. G. Ramachandran

Mr. G. Ramachandran obtained an M.Sc. degree in Mathematics from Madras University and an M.Stat. from Indian Statistical Institute, Kolkata. He has been a member of the Computer Society of India since 1965 and is currently a Fellow life member.

Mr. G. Ramachandran has made an outstanding contribution in the field of Information Technology for Indian Industry for more than five decades. He has developed and implemented more than 150 Information Technology projects, covering many domains. He has developed Strategic Plans for computerisation for many enterprises. He was a member of the group constituted to appraise the EDP facilities available in Public Sector Undertakings under the Ministry of Heavy Engineering Industries. He was a pioneer in introducing bar codes for retail stores billing. He has worked with both Public and Private enterprises. As an entrepreneur he has set up two companies, one on software development and the other on software training. He has represented our country and presented the country paper at the Asian Productivity Council, Tokyo and at Computer Conferences in Singapore, Hong Kong and Tokyo. He has trained more than 400 Information Technology professionals, who are now contributing to the advancement of Indian Information Technology Industries. He was a member of the first MCA Syllabus committee of the Madras University.

Mr. G. Ramachandran was committed to the Computer Society of India and his contribution to it is outstanding. He played a major role in acquiring own premises for many chapters. He has organised many national and international conferences on behalf of the Computer Society of India. He was the convener of the first National Students Convention of the Computer Society of India and the first DOEACC Chairman. He was a member of the group constituted to start the Education Directorate at Chennai. He started the Visakhapatnam Chapter and held various positions in the CSI Executive Committee as Regional Vice President (South), Honorary Secretary, Past Secretary and Vice President.

Mr. G. Ramachandran excelled in the sports field also. He was a member of the Madras University, Madras State and West Bengal State Basket Ball teams. He captained the West Bengal State Basket Ball team and was selected to represent the Indian Basket Ball team.

In grateful recognition of his services to the Computer Society of India, Information Technology Industry and Society, the Computer Society of India is pleased to confer on Mr. G. Ramachandran the Life Time Achievement Award. The Society takes pride and pleasure in recognising him with this citation on the occasion of the 51st Annual Convention held at Coimbatore on 23rd January 2017.

Prof. U. K. Singh

Prof. Uttam Kumar Singh, Founder Director General of Indian Institute of Business Management & Dr. Zakir Husain Institute, Patna, completed B.Sc. and MBA (MIS) from Bihar University, Muzaffarpur in 1972 and 1974, and further obtained PhD, Master of Public Administration (MPA) and BNYS degrees. After completion of his academic pursuits, Prof. Singh entered into institution building and established several technical and vocational institutes of national repute at Patna, Ranchi, New Delhi, Kolkata, Pune and Bhubaneswar, including two universities in Nagaland and Arunachal Pradesh. In the year 1979, Prof. U. K. Singh initiated Computers & IT Education in the States of Bihar and Jharkhand. He is the first academician to start P. G. Diploma in Computer Applications, BIT, MIT, BCA and MCA in undivided Bihar and Jharkhand. As a pioneer in the areas of Computers & IT Education in India, Prof. Singh was instrumental in initiating computer science for Women, School Teachers and Govt. officials in 1984 with financial support from the Department of Electronics, Govt. of India under the IT Awareness Programme (ITAP). Under his guidance, Govt. of India established the National Centre for IT Instructional Materials Development and the National Centre for Research and Training for Professionals and Administrators with funding from Govt. of India, Department of Electronics. Prof. Singh introduced Computer Aided Education in Non-formal Education in the year 1985.

Prof. U. K. Singh is a Fellow of the Computer Society of India and was Founder Vice Chairman of CSI, Patna Chapter. Later, he served CSI as Chairman, CSI Patna Chapter, Divisional Chairman (Data Communication), Regional Vice President (East), and twice as member of the Nominations Committee at National Level. He was nominated as TC Member (Education) to the International Federation of Information Processing (IFIP), Vienna, Austria. He actively organized various Regional, Divisional and National Conferences at various locations in India. CSI conferred its Fellowship on Prof. U. K. Singh during 2011 for his contributions to the objectives of CSI.

Prof. Singh was also elected President of the Computer & IT section of the 100th Indian Science Congress. A prolific writer on Computers & IT, Prof. Singh has published several books and articles. Prof. Singh is also associated with the Institution of Electronics & Tele-Communication Engineers, All India Management Association, Indian Society for Technical Education and Indian Commerce Association, and was an Executive Member of the All India Council for Technical Education (AICTE) for five years. Presently, Prof. Singh has been nominated TC Member (Education) to IFIP, Austria by CSI.

In grateful recognition of his services to the Computer Society of India and his outstanding contribution as an IT professional to IT Industry and Education, CSI has decided to confer on him the Life Time Achievement Award. The Society takes pride and pleasure in presenting him this citation on the occasion of its 51st Annual Convention held at Coimbatore on 23rd January 2017.


A R E P O R T

Foundation Day Seminar-2017 (Silicon Institute of Technology, Bhubaneswar)

1. Student Branch Name: Silicon Institute of Technology, Bhubaneswar

2. Region-IV

3. Event Title and Date: Seminar on “The Latest Trends in IT”, on the occasion of CSI Foundation Day, 6th March 2017

4. Speakers:

• Mr. Amit C. Kesh, Senior Technology Architect, Infosys

• Mr. Amartya Roy, Project Manager, Infosys

5. Gist of the Event

CSI Foundation Day was observed at Silicon Institute of Technology, Bhubaneswar on 6th March 2017. On this occasion we organised a seminar on “The Latest Trends in IT” by inviting experts from Infosys, Bhubaneswar.

The seminar was inaugurated by Dr. Jaideep Talukdar, Principal, Silicon Institute of Technology, Bhubaneswar in the presence of Mr. Nitai G. Dhall, Founder and mentor of CSI-Silicon Student branch. He welcomed the guest speakers on the auspicious occasion of CSI Foundation Day and delivered his inaugural address on ever-changing technology trends and the need for more innovation and research to keep pace with them.

Mr. Amit C. Kesh, Senior Technology Architect, Infosys delivered his talk on “Internet of Things (IoT)”. He started his talk by explaining the importance of IoT in fields such as water management, power management, crash management and smart cities. He presented a clear picture of how IoT is going to be one of the prominent areas of research in the coming days. He also focussed on the developments happening in countries like USA, Germany and Japan, and on the role of Infosys in handling projects based on this technology. He presented some videos on IoT which were very entertaining and informative. Finally, he invited the students to be a part of it by contributing ideas which may lead to a better social life for the poor people of our country.

Mr. Amartya Roy, Project Manager, Infosys delivered the second talk on another prominent area of research and applications, “Machine Learning”. He presented his talk in a very informal way, introducing machine learning and its role in e-commerce, and explaining how companies like Flipkart and Amazon use such technology to understand customers’ behaviour for their business growth. He also pointed out the security threats involved in e-commerce transactions, on which more research is required to make Digital India a success.

Both the talks were very much appreciated by the audience, mostly students of the institute. Around 70 students of B.Tech and MCA were present. The sessions were very much interactive and informative.

Finally, Dr. Talukdar and Mr. Dhall presented the mementos to the guest speakers and the event ended with a photo session.

(L-R) Debasish Jena (Student Convener), Dr. Bimal K. Meher (SBC), Mr. Priyabrat Nayak (Head, II Cell), Mr. Amartya Roy (Infosys), Dr. Jaideep Talukdar (Principal), Mr. BiswaBhusan (Infosys), Mr. Amit C. Kesh (Infosys), Mr. Nitai G. Dhall (Trustee)


Report on CSI Student Conventions

Regional Student Conventions

Region-I

Regional Student Convention (Region I) was held at SRM University, Delhi – NCR Campus on 13th - 14th February 2017. The event was inaugurated by chief guest Prof. S K Kak, Founder Vice Chancellor of Mahamaya Technical University, Noida, Dr. John B. Haynes, CEO, American Institute of Physics, Mr. Saurabh Agarwal, RSC (Region-I) and Mr. Piyush Goyel, Chapter Patron, CSI Ghaziabad Chapter. The event was presided over by Dr. Prof. Manoj Kumar Pandey, Director, SRM University, Delhi – NCR Campus. Prof. Kak said “computers are not magic but they are more than magic, so we should use it smartly.” Mr. Saurabh Agarwal asked students to learn how the fusion of technology is changing our day-to-day life. The event saw participation from over 300 students across seven states of the region. The main aim of the convention was to build a foundation and inspire the students of Computer Science Engineering and allied branches of engineering to understand and apply the new trends in technology. Contests included Start-up Idea Presentation, Model Presentation, Android App development, Code in less, Treasure Hunt, website design, More from Waste etc.

Region-V

Regional Student Convention for Region-V was held at JSS Academy of Technical Education, Bengaluru on 10th & 11th March 2017. The theme for the convention was “Data Science for Digital India Initiatives”. Over 315 students from different institutions participated in the convention. The event was inaugurated by Dr. Anirban Basu, President, CSI, who was the chief guest for the event. In his inaugural speech he highlighted CSI and its goals. The guests of honour for the event were Mr. Balachander Agoramurthy, CTO and Co-founder of 4sight technologies, Mr. K Devraj, Deputy Director of MSME, Dr. S Prakash, Chairman, CSI Bangalore Chapter and Dr. Mrityunjaya V Latte, Principal, JSSATE. Dr. Prabhudev Jagadish, HOD, CSE, JSSATE welcomed the gathering. Dr. S Prakash in his address mentioned that CSI was very happy to award this convention to JSSATE considering the infrastructure and contribution of JSSATE to technical education. Dr. Mrityunjaya V Latte, Principal delivered the presidential address. Mr. Balachander Agoramurthy in his keynote address elaborated on various new technologies. He covered topics like Artificial Intelligence, Cloud Computing, Big Data Analytics, Internet of Things and many more. Mr. K Devraj, Deputy Director, MSME highlighted the various initiatives taken by the government under the Digital India campaign and also the contribution that can be made through data science. He also mentioned the various opportunities that the government is opening up for students to do projects for Digital India. He also stated that if the project ideas are up to the mark then they can be funded by the government.

State Student Conventions

Punjab

A two-day CSI Punjab State Student Convention was hosted on 3 & 4 March 2017 at Chitkara University, Patiala. The primary theme of the convention was Digital Connectivity & Social Impact. The event was inaugurated by Mr. Shiv Kumar, RVP-I. Dr. Madhu Chitkara, Vice Chancellor, Chitkara University, presided over the event and Mr. Maninder Singh, CSI State Student Coordinator, Punjab delivered the keynote address.


The convention received an overwhelming response from students of various colleges of Punjab. All workshops were well received by the participants, who gave positive feedback. The participants said that such workshops were not being organized by other institutions in the state, and that they were glad to have attended the workshops at the Chitkara University campus. Other events also received a huge response, with students competing with each other for the winning position. Mr. Saurabh Agrawal, Regional Student Coordinator, Region-I, CSI was the Guest of Honour at the valedictory ceremony and gave away the prizes to all the winners. He interacted with the students of various colleges. In his address to the students he applauded the efforts put in by the Department of Computer Applications, Chitkara University, Punjab in hosting the student convention.

Haryana

CSI Student Chapter of Dronacharya College of Engineering, Gurgaon organized the CSI State Level Student Convention for Haryana State on 17th – 18th February’17. The theme for the convention was “Inventory Revolution in IT”. A total of 204 teams from all over Haryana registered for the event. Mr. Satish K Khosla attended as Chief Guest, Mr. Shiv Kumar, Regional Vice President, CSI as Guest of Honour, and Mr. Saurabh Agarwal, Regional Student Coordinator, Region-I, CSI as Expert Speaker. Speaking on the occasion, Mr. Shiv Kumar, RVP-1 spoke about the Computer Society of India and noted that the activities conducted for the students associated with the Society include lecture meetings, seminars, conferences, training programmes, programming contests and practical visits to installations. He also discussed the various achievements and role of students in CSI activities. Mr. Satish Khosla discussed the inventory revolution in information technology. He discussed his real-life experiences and the growth of IT during the period of his service. Mr. Saurabh Agrawal, RSC-I talked about the theme of the Student Convention, “Inventory Revolution in IT”. He stressed the latest developments in technology and changes in societal behaviour. He focused on individual meanings and real-life experiences which emphasized how the youth of today is becoming empowered by the proper usage of IT.

Tamil Nadu

The two-day CSI State Student Convention for Tamil Nadu was held at Knowledge Institute of Technology, Salem on 17-2-2017 and 18-2-2017. A total of 150 students from various institutions participated in the convention. The inaugural function commenced in the E-Block seminar hall. Dr. V Kumar, HOD/CSE and convenor of the CSI convention welcomed the gathering and presented an overview of the convention activities. Dr. PSS Srinivasan, Principal and Dr. K Visagavel, Vice Principal of KIOT rendered the presidential and the felicitation address respectively. Dr. K Govinda, Regional Vice President, Region-VII, CSI inaugurated the convention and presented the chief guest address. Mr. S Venkatesh, CEO, Consensus Technologies Pvt. Ltd., Coimbatore delivered the keynote address and shared his views on the importance of programming. After the inauguration, the workshop on Mysteries of Python was handled by Mr. S Venkatesh, CEO, Consensus Technologies Pvt. Ltd. Simultaneously a speech on Internet of Things was rendered by Dr. K Govinda. The seminar on Data Science & Big Data Analytics was taken by Mr. S S Aravinth, AP/CSE. The workshop on Mobile Application Development was handled by Ms A Priyadharshini, AP/CSE. The workshop on Digital Marketing was handled by Mr. A Sekar, AP/CSE.


A R E P O R T

National Seminar on Innovation in Digital Learning

11-12 February 2017

D.K. Dwivedi, General Chair, NSIDL

Inaugural Address by Prof. Rajendra Prasad, Vice Chancellor, Allahabad State University-Chief Guest

Release of Souvenir of National Seminar

Valedictory Address by Prof. K.P. Mishra, Former Vice Chancellor, Nehru Gram Bharti University- Chief Guest

Keynote Address by Prof. Rajeev Tripathi, Director, MNNIT- Guest of Honour

Welcome Address by Dr. K.K. Tewari, Secretary, Utthan

A view of the participants of the National Seminar


National Seminar on Innovation in Digital Learning was organized by Computer Society of India-SIET Student Branch in association with Computer Society of India Allahabad Chapter on February 11-12, 2017 at Shambhunath Institute of Engineering & Technology (SIET), Allahabad. The National Seminar was inaugurated by Chief Guest Prof. Rajendra Prasad, Vice Chancellor, Allahabad State University and Guest of Honour Prof. Rajeev Tripathi, Director, MNNIT, Allahabad by lighting the lamp. The programme started with the Saraswati Vandana. Dr. K. K. Tewari, Secretary, Utthan & Shambhunath Group of Institutions, Allahabad, welcomed the guests and participants. Prof. M. M. Gore, Dean, International Resource Generation, MNNIT, Allahabad & Chairman, CSI Allahabad Chapter briefly introduced CSI and the National Seminar to the audience.

In Technical Session-I, Dr. Brijendra Singh, Professor, Lucknow University delivered an invited talk on Importance of Digital Learning in Digital India. Dr. G. P. Sahu, Associate Professor, MNNIT, Allahabad, delivered an invited talk on Green Computing. In Technical Session-II, Er. Mithilesh Mishra, Vice Chairman, CSI Allahabad Chapter delivered a talk on Computer, Content and Connectivity. In Technical Session-III, Dr. Pawan Chakravorthy, Associate Professor, IIIT, Allahabad delivered a very elaborate and interactive talk on Virtual Reality & Cloud Computing. In Technical Session-IV, Dr. Lokendra Mishra, Associate Professor, Ewing Christian College, (Deemed University), Allahabad delivered an invited talk on Securing Digital Information using Cyber Forensic. Er. Ranjeet Kumar, IIIT, Allahabad delivered a talk on Copyright Issues: Digital Learning in Digital Era. Contributed papers were also presented in the aforesaid sessions. In Technical Session-V, contributed papers were presented by students.

Online Software Quiz–EUPHEUS was also organized by the CSI Student member volunteers of SIET during the event with a view to ensuring a strong connect among the maximum number of CSI Student members. 200+ students participated in the competition. Mr. Nikhil Kumar Pankaj, B.Tech., CSE 2nd year student of SIET was declared winner, while Ms. Shipra Agarwal, B.Tech., CSE 3rd year student, SIET and Mr. Mohd. Fahim Anwar, B.Tech., CSE 2nd year student of SIET were 1st & 2nd runner-up respectively.

Earlier, on 22.01.2017, the CSI Young Talent Search Software awareness Competition was organized by CSI Allahabad Chapter at SIET, Allahabad, in which 90+ students participated. Mr. Gaurav Shukla, B.Tech., CSE 3rd year student of SIET, was declared winner, while Mr. Rishank Kesarwani & Mr. Jayendra Narayan Singh were jointly declared 1st runners-up and Mr. Umashanker Maurya was declared 2nd runner-up.

In the Valedictory Session, a lively panel discussion was arranged on the contemporary topic Bridging the gap between the Curriculum and IT Industry Requirements. The panelists were Dr. K. K. Tewari, Secretary, Utthan & Shambhunath Group of Institutions, Allahabad, Dr. Shirshu Verma, Associate Professor, IIIT, Allahabad, who is also Incharge Professor of Placement Cell, Dr. G.P. Sahu, MNNIT, Allahabad, Mr. D. K. Dwivedi, Patron, CSI Allahabad Chapter & General Chair, National Seminar and the Chief Guest Prof. K.P. Mishra, Former Vice Chancellor, Nehru Gram Bharti University, Allahabad & Former Director, Bhabha Atomic Research Centre. At the end, Prof. K. P. Mishra, Chief Guest summarized the panel discussion and delivered the Valedictory Address. After that, certificates and trophies were distributed by the Chief Guest to the winners and runners-up of the two competitions, viz. the CSI Young Talent Search Software awareness Competition organized by CSI Allahabad Chapter on 22.01.2017 and Online Software Quiz-EUPHEUS organized by CSI-SIET Student Branch Volunteers on 12 February 2017.

More than 300 delegates participated in the event and more than 25 authors presented contributed papers in the five Technical Sessions of the two-day National Seminar.


On the occasion of CSI Foundation Day, a competition, QUIZBUG, was organized by the CSI student branch at Manipal University Jaipur on 6th March 2017. The students were briefed about the CSI Foundation Day by Dr. Prakash Ramani, SBC MUJ. The competition was open to B.Tech students of Manipal University Jaipur. The questions in the quiz were on programming concepts of the ‘C language’. The quiz lasted for 30 minutes and a total of 25 students participated from all years. It was a great opportunity for students to test their C language skills. Answering those questions was no cake walk, and all of the participants did their best. However, as it was a competition, not all could be declared winners; the top 5 students were awarded prizes and certificates.

QUIZBUG


CrossWord

Durgesh Kumar Mishra, Chairman, CSI Division IV Communications; Professor (CSE) and Director, Microsoft Innovation Center, Sri Aurobindo Institute of Technology, Indore. Email – [email protected]

Test your knowledge on BIG DATA

The solution to the crossword, with the name(s) of the first provider(s) of an all-correct solution, will appear in the next issue. Send your answer to CSI Communications at email address [email protected] and cc to [email protected] with subject: Crossword Solution – CSIC March 2017 Issue.

Clues

ACROSS
3. The Hadoop jobs scheduler
6. The memory where Namenode is located
7. The data is to be stored in this form in Hadoop
9. The framework for job scheduling and cluster resource management
11. Stores actual data in the form of blocks
12. It processes structured data into Hadoop
14. Splits the data into independent chunks

DOWN
1. Achieving coordination between Hadoop node
2. It moves Structured data into Hadoop
4. It uses 50070 as default port number
5. Data intelligence component in hadoop
8. It processes Unstructured data into Hadoop
10. Enhancing the efficiency of MapReduce
13. The default input format in MapReduce
15. Supports Multiline commands

We are overwhelmed by the response and solutions received from our enthusiastic readers

Congratulations!

All nearby correct answers to the October 2016 crossword were received from the following readers:

• Prof. Kirti Patil, Assistant Professor, MET’s BKC Institute of Engineering, Adgaon, Nashik

• Mr. Deepu Benson, Amal Jyothi College of Engineering, Kerala

• Rashid Sheikh, Associate Professor, Sri Aurobindo Institute of Technology Indore

[Crossword grid, with numbered squares 1 to 15, not reproduced]

Solution for February 2017 Crossword

[Solution grid not reproduced. Across entries legible in the grid: 5. FLOSS, 7. SOURCEFORGE, 8. ANDROID, 9. MOODLE, 10. CPANEL, 12. COMMITTER, 13. EXIM, 14. CYGWIN]

BRAIN TEASER

Like CSI on facebook at : https://www.facebook.com/CSIHQ


F R O M C H A P T E R S & D I V I S I O N S

BANGALORE CHAPTER

CSI Bangalore Chapter organized a Technical Talk on Formal Verification of Cyber-Physical Systems on 13th January, 2017 at the CSIR-NAL campus. The event was hosted by CSIR-NAL, Bangalore. Ms. K S Bhanumathi, Immd. Past Chairperson, CSI-BC and Convener SIG-FM welcomed the gathering. Dr. Yogananda Jeppu from HSTL introduced the speaker, Ms. Pavithra Prabhakar. She explained that cyber-physical systems (CPS) combine computation, control and communication in novel ways to achieve sophisticated functionalities, as in autonomous driving and automated load balancing in smart grids. CPS have immense societal and economic impact. A grand challenge towards exploiting this potential is the development and deployment of “reliable” CPS. Formal verification has emerged as a powerful methodology that provides rigorous and automated analysis of systems. In contrast to simulation and testing, verification aims at constructing correctness proofs. She provided a brief overview of Cyber-Physical Systems and their verification techniques. Ms. Bhanumathi K S, Convener SIG-FM delivered the vote of thanks.

BHILAI CHAPTER

Bhilai Chapter organized a technical session on “Data Management & Next Generation Data Center”.

The talk was delivered on 22-3-2017 by Mr. Satinder Pal Singh, Head, Systems Engineering of M/s. NetApp Marketing & Services India Private Limited.

The speaker deliberated on storage solutions for today’s modern data centers, which demand more of a “Software Defined Storage” approach, and on how storage can be leveraged to access and protect data either on premises or on cloud.

The product portfolio of the company, which can meet the needs of a futuristic, scalable and affordable storage solution, was presented. The program was attended by CSI members of Bhilai Chapter.

The two-day International Symposium on Cloud Computing and Data Analytics, organized by the Computer Science Division of National Institute of Engineering in association with CSI Bengaluru and Mysuru chapters, was kicked off today at its premises by Dr. Anirban Basu, President, CSI.

Addressing the gathering, Dr. Basu highlighted that Cloud Computing is an extremely important and emerging field because of its cost effectiveness, performance and high availability, and that the Govt. of India is encouraging the application of Cloud Computing in day-to-day life and projecting Cyber Physical Systems. He stressed that the ideas that come out of this gathering must become a reality.

Dr. B G Sangameshwara, the Vice Chancellor of JSS S&T University, who was the guest of honour, noted that nowadays educational institutes are under pressure to deliver and are not able to do so to their full potential. The typical problems faced by them include insufficient infrastructure, lack of teachers and small classrooms, which can be solved by the application of Cloud Computing, and it will play a prominent role in the classrooms of tomorrow.

Dr. Rajkumar Buyya, Director of the CLOUDS Laboratory, University of Melbourne, Australia, delivered the keynote speech. Speaking on the occasion, Buyya said Cloud Computing is the next revolution in IT. The domestic market is growing at a faster pace and provides enormous opportunities to software professionals. Cloud Computing should become the utility. An enormous amount of data is being generated and the world is totally data driven. It is advantageous to move from classical computing to Cloud computing. Several computing paradigms have promised to deliver the “Computing Utilities” vision, and market-oriented clouds have started to become a reality.

The proceedings of the symposium and the Computer Science department newsletter were released on the occasion. We received 200-plus papers, of which about 120 were selected for presentation (oral and poster). Earlier, Dr. H D Phaneendra, Chairman of the organising committee and Head of the Computer Science and Engineering Department, welcomed the dignitaries. Rampur Srinath, Chairman of CSI Chapter and convener of ISCCDA 17, Mysuru introduced the dignitaries to the audience. Dr. Prakash S, Chairman, CSI Bengaluru Chapter, Dr. Raghavendra Rao, Professor, Department of Computer Science and Engineering, Dr. K. Raghuveer, Head of Information Science and Engineering, Girish, Head of the Master of Computer Applications, Dr. T N Sridhar, Controller of Examination, Dr. Suresh B, Dean R&D, Lokesh S, Treasurer ISCCDA 17, Dr. Yuvaraju, Secretary ISCCDA 17, teaching and non-teaching staff, participants and students were present.

BHOPAL CHAPTER

CSI Bhopal Chapter organized an International Conference on “Make in India : An opportunity to sustainable Entrepreneurship Development” on 16th and 17th February 2017. The objective of this event was to identify the opportunity and scope of sustainable entrepreneurship development in economic, social, environmental and technical scenarios, and to bring corporate experts, academicians and researchers under one roof to share views and experiences on a common platform. The conference was organized by the Department of Computer Science, Commerce, Management, Green cluster and Training & Placement cell, in co-organization with MPCON, CII and ISCA Bhopal Chapter. The conference was inaugurated by Prof Mohan Lal Chhipa, Vice Chancellor, Atal Bihari Vajpayee Hindi University, Bhopal and Guest of Honour Dr. U. N. Shukla, Registrar, Barkatullah University, Bhopal. Shri Vishnu Rajoria, Founder Chairman, Career Society presided over the function. The conference was multi-disciplinary, comprising four technical sessions in diverse areas to incorporate the different sectors of the Make in India campaign.

Delegates from UK, America and Saudi Arabia participated in the conference. The prizes for Poster competitions were sponsored by ISCA, Bhopal chapter.

COIMBATORE CHAPTER

The inauguration of the Short Term Training Program on Pattern Recognition and Applications (PR&A-2017) was scheduled on 27th February 2017. Mr. Vishnu Potty, VP, CTS and Chairman, CSI Coimbatore Chapter was the Chief Guest and Dr. Basabi Chakraborty and Dr. Goutam Chakraborty from Iwate Prefectural University, Japan were the Guests of Honour. Dr. Radhamani also represented Chapter Secretary.

After the invocation, Dr. Malarvizhi welcomed the guests and Dr. Ramalatha Marimuthu explained about the programme. Mr. Vishnu Potty spoke about the importance of attending skill development programmes. Dr. Radhamani introduced the Guests of Honour. The STTP was conducted for four days and there were about 45 participants from in and around Coimbatore. Dr. Sudha Sadhasivam from PSG College of Technology, Coimbatore and Dr. Ganesh Kumar from Anna University, Coimbatore were the other trainers for the programme. The overall focus of the program was to aid in finding and solving research problems in Pattern Recognition Applications.

HARIDWAR CHAPTER

In the array of knowledge events for students and faculty, Haridwar Chapter organized a Personality Development session on 3rd February 2017 to improve the skills of students, in association with the Department of CS and Engg., Faculty of Engg. & Tech., Gurukula Kangri University, Haridwar. The resource persons for this event were Mr. Amit and Ms Divya from TIME Institute, Roorkee. The speakers addressed the students and discussed the benefits of a good personality and tips for improving the first impression during an interview. Mr. Amit shared his own interview experiences and how to drive an interview towards the important questions. The session was interactive and had many activities like a Mock Interview, a Group Discussion with an explanation of points to remember, an aptitude test etc. A total of 60 students attended the session. Dr. Sunil Panwar, Dean, Faculty of Engineering & Technology, presented a memento to the invited guests. Dr. Mayank Aggarwal, Vice Chapter Chairman thanked all the guests and invited speakers for holding such a nice session. The event was organized by Mr. Suyash Bhardwaj, member and Mr. Nishant Kumar, Chapter Secretary.

MUMBAI CHAPTER

CSI Mumbai Chapter and IFPUG (International Function Point Users Group) organised the 13th International Conference ISMA-13 on Software Measurement Analysis and CSI Foundation Day Celebration in association with IIT Bombay, at Victor Menezes Convention Centre, IIT Bombay, Powai on 6th March 2017. The International Conference ISMA-13 was jointly held with IFPUG. 225 members from Mumbai, Hyderabad, Bengaluru, Pune, Nashik, Ahmedabad, Ranchi, Coimbatore and Chennai, and 15 international members, participated. Mr. Murali Chemuturi inaugurated the Conference. Dr. D B Phatak delivered the keynote address, followed by Mr. Thomas Cagley, the President of IFPUG, who spoke on CEP Presentation. Three parallel specialised sessions with eighteen presentations on International Standards satisfied the participants’ appetite for knowledge about function point, story point, SNAP etc.

NASHIK CHAPTER

CSI Nashik Chapter celebrated Information Technology Day on 10th March 2017. On the occasion Dr. Baisa Gunjal from Amrutvahini College of Engineering, Sangamner received the Yashokirti Puraskar at the hands of Padma Shri Dr. D B Phatak. The award is instituted by Shri Avinash Shirode, patron and past chairman of CSI, in memory of his mother, Late Sou Shevantabai Shirode. Dr. Baisa Gunjal is a specially abled (Divyang) lady from Gunjalwadi in Sangamner. She has fought all odds to contribute to the computers and information technology field and has become successful in life. Teenager Nilay Kulkarni received a special award for developing various mobile apps right from his childhood. In addition to academic excellence awards and appreciation of college principals and student branch counselors, Mr. Vivek Gogate received the patron award and Dr. Mahesh Sanghavi the special achiever award at National level. On the occasion Chairman Mr. Diwakar Yawalkar shared Nashik region activities, and Mr. Shirish Sane identified Nashik as a fast growing IT hub and appealed to the IT professionals to be part of nation building and put Nashik on the global IT map. Chief guest Padma Shri Dr. Deepak B Phatak guided the audience about creating integrated IT systems and developing the system with the theme ‘Make in India for India’. A special edition of the ACCESS newsletter was published on the occasion. The program was attended by eminent IT personalities of Nashik, academia and IT professionals.

VADODARA CHAPTER

CSI Vadodara Chapter organized a One Day National Seminar on Cyber Security on Saturday the 4th March, 2017 at ONGC Officer’s Club, Vadodara. This National Seminar was organized as part of the CSI Foundation Day Celebration. The seminar was jointly organized by IETE, Vadodara Centre, CSI, Vadodara Chapter and IE(I), Vadodara Local Centre. The Chief Guest for the seminar, Dr. Smriti Dagur, Immediate Past President, IETE, delivered a Keynote Address on Cyber Security. Guest of Honour, Shri Arun Kumar, Group General Manager and Basin Manager, ONGC, Vadodara addressed the audience. Dr. Mamta C Padole, Hon Secretary, IETE Vadodara Centre and CSI Hon. Chapter Secretary welcomed all the guests and anchored the Inaugural Ceremony. Mr. Tushar Kher, Chairman, Vadodara Centre introduced the theme. Mr. Chetan Shah, Chapter Chairman and Mr. Ashit Shah, Chairman IE(I), Vadodara Local Centre felicitated the guests.

During the Seminar, various talks from eminent experts were presented. The talks comprised Cyber Security Awareness by Mr. Dipak M Rai, Former Vice President (IT), Reliance Industry Limited; ISO 27001:2013 Requirement and implementation by Mr. Tushar B Kher, In-Charge of Disaster Recovery Data Center of Project ICE at ONGC; Various Threats and Prevention measures by Mr. Chetan Shah, DGM (IT) at L&T Technology Ltd.; Cyber Laws/IT Act by Dr. Mahesh Thakar, Advocate Gujarat High Court, Doctorate in Law, Winner of five Gold Medal in LLM; and Demonstration of Cyber Security Tools by Prof Kshitij Gupte, Asst Professor, Dept of Computer Sci & Engg, The MSU of Baroda and Prof Rushi Trivedi, Asst Professor, Dept of Computer Sci & Engg, The MSU of Baroda.

Three professional organizations, IETE, Vadodara Centre, CSI, Vadodara Chapter and IE(I), Vadodara Local Centre, came together to organize this National Seminar on Cyber Security, which was attended by more than 350 registered participants from industry, academia and students. The seminar was well appreciated and was very interactive, raising a variety of questions from all participants. Dr. VK Shah, Vice Chairman, IETE, Vadodara Centre proposed the vote of thanks.


VELLORE CHAPTER

CSI Vellore Chapter, in association with Apple Developer Group (ADG), organized App-a-Thon 2k17, a 36-hour competition at VIT University, Vellore, on 8th, 9th and 10th March 2017. The event saw a plethora of brilliant, out-of-the-box ideas. Mr. Sreenath, Mr. Mohammed Rafy and Ms. Saranya were the judges for the event. Their ingenious judgment and encouragement made the hack a memorable learning experience that will continue to inspire the students to create and innovate. Around 200 participants took part. The event was organized by Prof. R Rajkumar, Chairman, and Prof. K Govinda, RVP-VII.

The Chapter, in association with SCOPE and IEEE Computer Society, organized a Symposium and Hackathon at VIT University from 3-3-2017 to 10-3-2017. The hackathon was supported by Venturesity, Bangalore. The second session of the event was on cyber security, conducted by i3Indya Technologies, followed by a session on the Internet of Things by Blue Banyan Technologies. Around 300 participants attended. The event was organized by Prof. P Boomiathan, Prof. R Rajkumar, Chapter Chairman, and Prof. K Govinda, RVP-VII.

VISAKHAPATNAM CHAPTER

The Chapter, in association with Visakhapatnam Steel Plant, organized an International Conference on DIGITS - Digital India in Global IT Spectrum on 24th and 25th February 2017 at The Gateway Hotel, Visakhapatnam. As India is marching ahead digitally and Visakhapatnam has been identified for development as a smart city and the IT hub of Andhra Pradesh, CSI, Vizag organized this conference to bring together researchers, engineers, developers and practitioners from academia, industry, government establishments, NGOs and MNCs to disseminate their knowledge, share their experiences and exchange ideas on the latest developments in Information and Communication Technology, to “Innovate, Integrate and Transform” into a Digital India with a global IT perspective. Mr. P Madhusudan, CMD, RINL and Chief Patron, CSI-Vizag, graced the event as the Chief Guest. Mr. J A Chowdary, Advisor for IT & Special Chief Secretary to CM, Govt. of A P, was the Guest of Honour. Prof. S V Raghavan, Former Scientific Secretary to Govt. of India, delivered the keynote address on Digital Life and expectations. Mr. Raju Kanchibotla, RVP-V, CSI, appreciated the initiatives taken by the CSI Visakhapatnam chapter. Mr. D N Rao, Director (Operations) and Chairman, CSI-Vizag, explained the need for such conferences. Mr. K V S S Rajeswara Rao, GM (IT&ERP), RINL and Vice-Chairman, CSI-Vizag, welcomed the audience. Mr. Anindya Paul, AGM (IT), Vizag Steel, Secretary, CSI-Vizag and Convener, DIGITS, proposed the vote of thanks. The two-day conference had six sessions packed with lectures from eminent speakers of national and international repute, manufacturers’ presentations and selected paper presentations. Speakers from the USA, Europe, Asia and India, with varied interests and expertise, gave lectures to enlighten delegates from equally varied fields.

FROM CHAPTERS & DIVISIONS

Book Title : The class of JAVA
Author : Pravin M. Jain
ISBN : 978-81-317-5544-0
Price : Not Available
Publisher : Pearson

As its title implies, this book teaches classes in Java programming. By now, nearly everyone in the computing field knows what Java is: an object-oriented, Internet-aware language with the potential to revolutionize programming.

The book is divided into 23 chapters, starting with an OOPS introduction to classes and moving on to exceptions, multithreading, networking, GUI (Swing and MVC), applets, JDBC, interaction with databases, annotations and many more. Its treatment of Indic characters in Unicode is one of the unique qualities of the book. It places good emphasis on object-oriented design; class diagrams are used extensively throughout the book to make it easy to understand how the examples work. Working through the book will teach you how to program, not just how to write simple applets.

Examples in the book cover a wide range of topics, from simple concepts to advanced ones. The book introduces topics gradually, which makes it easy to pick up the skills needed to program in Java.

The book is easy for the student community to read and understand. It will serve as a useful textbook for students in computer science, information technology and computer applications, and for students who wish to learn object-oriented programming using Java.

Review by: Dr. Kanhaiya Lal
HOD
Department of Computer Science & Engineering
Birla Institute of Technology Mesra, Patna Campus


REGION-I

Manav Rachna International University, Aravali Hills

31-1-2017 – Prof. M N Hoda, Chairman, Division-I & Dr. N C Wadhwa, VC Inaugurating CSI Student Branch

28-2-2017 - Technical Fest on Technoholic 2017

Jaypee University of Engineering & Technology, Guna

15-1-2017 – Coding Competition on KODEATHON 17.1

Dronacharya Group of Institutions, Greater Noida

16-2-2017 - Industrial Visit to National Small Industries Corporation Limited, Okhla

REGION-III

Sagar Institute of Science and Technology, Bhopal

4-3-2017 - Interactive session with Mr. Ajay Nema, Chief Architect, ADVA optical Networking

13-3-2017 & 14-3-2017 - Prof. Puneet Himtani addressing during Hands on Workshop on Java Technology

Manipal University, Jaipur

14 & 15-2-2017 - Two days workshop on Web App Development

The LNM Institute of Information Technology, Jaipur

27-2-2017 - Winter Internship Projects

FROM STUDENT BRANCHES


REGION-IV

Gandhi Institute for Education and Tech., Bhubaneswar

14-2-2017 – Prof. M Mutyalu, VC inaugurated the Student Branch

Silicon Institute of Technology, Bhubaneswar

11 & 12-2-2017 - Second National Conference on Recent Advances in Computer Science & Engineering

REGION-V

Sasi Institute of Technology & Engineering, Tadepalligudem

11-2-2017 - Prof. Raghavendra Rao addressing during Guest Lecture on Recent Trends in Data Mining

24-2-2017 to 26-2-2017 - Three Day Workshop on MEAN Stack Technologies

Gokaraju Rangaraju Inst. of Engg. & Tech., Hyderabad

19-1-2017 - AAVISHKAR 2K17

Scient Institute of Technology, Hyderabad

17-2-2017 & 18-2-2017 - Workshop on Python Programming

Bharat Institute of Engineering and Technology, Hyderabad

16-2-2017 & 17-2-2017 – Two Days Guest Lecture on Develop the logic one insect move outside of the circle at clockwise and anti clockwise

27-2-2017 - One Day Seminar on Networking and Security Technologies


REGION-V

Vasireddy Venkatadri Institute of Technology, Nambur

11-2-2017 – Mr. T Rajesh delivering Guest Lecture on Hadoop & Big Data

Rao Bahadur Y Mahabaleswarappa Engg. College, Ballari

6-2-2017 & 7-2-2017 – Two days workshop on Socket Programming in Linux and Maneuver in Computer Science Research

Stanley College of Engineering & Technology for Women, Hyderabad

27-2-2017 to 3-3-2017 - Certification Program on OOPS using Java

4-3-2017 & 5-3-2017 - Two day workshop on IOT and its Applications

Anurag Group of Institutions, Hyderabad

21-2-2017 - One day Seminar on Brain Finger Printing

Vasavi College of Engineering (Autonomous), Hyderabad

7-3-2017 - Online Technical Quiz Contest

LENDI Institute of Engineering & Technology, Visakhapatnam

16-2-17 to 17-2-17 – Dr. Aruna Mallapati delivering lecture during two Day Workshop on Data Mining & Big Data

6-3-2017 - Winners of TECH WHIZ QUIZ-2017


REGION-V

Anurag College of Engineering, Aushapur

16-2-2017 & 17-2-2017 - Two Day Workshop on Adobe Photoshop with hands-on practice

18-2-2017 - Expert Guest Lecture on Information Security

NBKR Institute of Science and Technology, Nellore

3-3-2017 & 4-3-2017 - Two Day National Level Student Technical Convention

7-3-2017 - Mr. M V Dinesh delivering lecture during one day Workshop on FOSS & Mozilla

REGION-VI

Bharati Vidyapeeth’s College of Engg. for Women, Pune

25-2-2017 - Workshop on Advance Ethical Hacking

K K Wagh Institute of Engg. Education & Research, Nashik

19-3-2017 to 21-3-2017 – Mr. Swaraj Joshi, Prof. Birla, Prof. Sane, Mr. Milind Ghyar, Prof. Nandurkar, Prof. Kamlapur and Mr. Vishal Pattar during inaugural of Equinox 2017

Mukesh Patel School of Technology Management & Engineering, Shirpur

7-1-2017 - Expert Talk on Latex and Its Application

14-1-2017 - One Day Workshop on NS-3


REGION-VI

Guru Gobind Singh Polytechnic, Nashik

6-3-2017 - Motivational Speech by Dr Vijay Mhaske during CSI Day celebrations

Pune Institute of Computer Technology, Pune

28-2-2017 – Event on Career in Management and Interview Outsmarting Skills

REGION-VI

Pankaj Laddhad Inst. of Tech. and Mgmt. Studies, Buldana

18-3-2017 & 19-3-2017 - Mr. D M Kharat addressing the session on WORDPRESS

REGION-VII

Priyadarshini Engineering College, Vaniyambadi

18-3-2017 - One day Workshop on Software Testing

REGION-VII

S A Engineering College, Chennai

6-3-2017 - Dr. Viji Rajesh, Prof. Sujatha, Prof. Geetha & Dr. Nagarajan during CSI day Celebrations

St. Peter's College of Engg. and Technology, Chennai

21-2-2017 – Dr. Shanthini, Dr. Selvan, Dr. Srinivasan & Dr. Sikamani during Special Lecture on Big Data Analytics

Valliammai Engineering College, Kattankulathur

11-1-2017 – Mr Gowthaman delivering the Guest Lecture on Robotics and Security Development

2-2-2017 - Guest Lecture on Analysis of Algorithms


REGION-VII

Agni College of Technology, Chennai

15-2-2017 to 25-2-2017 – Event on AGNIHACKATHON

PSGR Krishnammal College for Women, Coimbatore

7-2-2017 - Expert Lecture on An Overview of Joomla

National Engineering College, Kovilpatti

16-2-2017 & 17-2-2017 - Dr. Bala Murugan, SSC-TN, CSI releasing Souvenir during Two days National Level Technical Symposium (NECSI’17)

Jeppiaar Institute of Technology, Sriperumpudur

10-2-2017 – Dr. Marie Wilson & Mr. Annesly Carvalho releasing the Techisetz’17 magazine during National Level Technical Symposium

Kongu Engineering College, Erode

4-3-2017 - National Conference (NCNIC’17)

Er Perumal Manimekalai College of Engineering, Hosur

13-3-2017 - Mr. Aravind Kumar explaining the kit usages during Hands On Training Internet Of Things (IOT)

Syed Ammal Engineering College, Ramanathapuram

18-2-2017 – Dr. Periyasamy, Vice Principal distributing the prize to the winner during App Development Contest

VIT University, Vellore

13-3-2017 – during One day workshop on Internet of Things


REGION-VII

Knowledge Institute of Technology, Salem

6-3-2017 – CSI Day Celebrations Digital Thinking Contest

Kalaignar Karunanidhi Institute of Technology, Coimbatore

3-3-2017 & 4-3-2017 - International Conference on Science, Technology, Engineering and Management (ICSTEM 2017)

Mar Baselios College of Engineering and Technology (MBCET), Trivandrum

27-1-2017 - One day Seminar on Higher Studies in Foreign Universities

15-2-2017 & 16-2-2017 - Two day Workshop on Campus to Corporate Life

Toc H Institute of Science and Technology, Arakkunnam

17-2-2017 & 18-2-2017 - Two day intercollegiate technical event TECH FOSS 2K17

Student branches are requested to send their report to [email protected] with a copy to [email protected].

Chapters are requested to send their activity report to [email protected].

Kindly send a high-resolution photograph with the report.

Registered with Registrar of News Papers for India - RNI 31668/1978
Regd. No. MCN/222/2015-2017
Posting Date: 10 & 11 every month. Posted at Patrika Channel Mumbai-I
Date of Publication: 10th of every month

If undelivered return to: Samruddhi Venture Park, Unit No.3, 4th floor, MIDC, Andheri (E). Mumbai-400 093