
Page 1: CSIC 2015( June )

CSI Communications | June 2015 | 1

ISSN 0970-647X

Cover Story: Data Science – Data, Tools & Technologies 8

Cover Story: Leveraging Bigdata Towards Enabling Analytics Based Intrusion Detection Systems in Wireless Sensor Networks 12

Article: The Cardinal Sin of Data Mining and Data Science: Overfitting 32

Security Corner: Area Prone to Cyber Attacks 40

Research Front: A Novel Approach to Secure Data Transmission using Logic Gates 17

Volume No. 39 | Issue No. 3 | June 2015

Page 2: CSIC 2015( June )


Know Your CSI

Executive Committee (2015-16/17) »

President: Prof. Bipin V. Mehta ([email protected])
Vice-President: Dr. Anirban Basu ([email protected])
Hon. Secretary: Mr. Sanjay Mohapatra ([email protected])
Hon. Treasurer: Mr. R. K. [email protected]

Immd. Past President: Mr. H. R. Mohan ([email protected])

Nomination Committee (2015-2016)

Dr. Anil K. Saini Mr. Rajeev Kumar Singh Prof. (Dr.) U.K. Singh

Regional Vice-Presidents

Region - I (Delhi, Punjab, Haryana, Himachal Pradesh, Jammu & Kashmir, Uttar Pradesh, Uttaranchal and other areas in Northern India): Mr. Shiv Kumar ([email protected])
Region - II (Assam, Bihar, West Bengal, North Eastern States and other areas in East & North East India): Mr. Devaprasanna Sinha ([email protected])
Region - III (Gujarat, Madhya Pradesh, Rajasthan and other areas in Western India): Dr. Vipin Tyagi ([email protected])
Region - IV (Jharkhand, Chattisgarh, Orissa and other areas in Central & South Eastern India): Mr. Hari Shankar Mishra ([email protected])
Region - V (Karnataka and Andhra Pradesh): Mr. Raju L. Kanchibhotla ([email protected])
Region - VI (Maharashtra and Goa): Dr. Shirish S. Sane ([email protected])
Region - VII (Tamil Nadu, Pondicherry, Andaman and Nicobar, Kerala, Lakshadweep): Mr. K. Govinda ([email protected])

Division Chairpersons

Division-I: Hardware (2015-17): Prof. M. N. Hoda ([email protected])
Division-II: Software (2014-16): Dr. R. Nadarajan ([email protected])
Division-III: Applications (2015-17): Mr. Ravikiran Mankikar ([email protected])
Division-IV: Communications (2014-16): Dr. Durgesh Kumar Mishra ([email protected])
Division-V: Education and Research (2015-17): Dr. Suresh Chandra Satapathy ([email protected])

Important links on CSI website »

Publication Committee (2015-16)

Dr. A.K. Nayak Chairman

Prof. M.N. Hoda Member

Dr. R. Nadarajan Member

Mr. Ravikiran Mankikar Member

Dr. Durgesh Kumar Mishra Member

Dr. Suresh Chandra Satapathy Member

Dr. Vipin Tyagi Member

Dr. R.N. Satapathy Member

Important Contact Details » For queries and correspondence regarding membership, contact [email protected]

About CSI http://www.csi-india.org/about-csi
Structure and Organisation http://www.csi-india.org/web/guest/structureandorganisation
Executive Committee http://www.csi-india.org/executive-committee
Nomination Committee http://www.csi-india.org/web/guest/nominations-committee
Statutory Committees http://www.csi-india.org/web/guest/statutory-committees
Who's Who http://www.csi-india.org/web/guest/who-s-who
CSI Fellows http://www.csi-india.org/web/guest/csi-fellows
National, Regional & State Student Coordinators http://www.csi-india.org/web/guest/104
Collaborations http://www.csi-india.org/web/guest/collaborations
Distinguished Speakers http://www.csi-india.org/distinguished-speakers
Divisions http://www.csi-india.org/web/guest/divisions
Regions http://www.csi-india.org/web/guest/regions1
Chapters http://www.csi-india.org/web/guest/chapters
Policy Guidelines http://www.csi-india.org/web/guest/policy-guidelines
Student Branches http://www.csi-india.org/web/guest/student-branches
Membership Services http://www.csi-india.org/web/guest/membership-service
Upcoming Events http://www.csi-india.org/web/guest/upcoming-events
Publications http://www.csi-india.org/web/guest/publications
Student's Corner http://www.csi-india.org/web/education-directorate/student-s-corner
CSI Awards http://www.csi-india.org/web/guest/csi-awards
CSI Certification http://www.csi-india.org/web/guest/csi-certification
Upcoming Webinars http://www.csi-india.org/web/guest/upcoming-webinars
About Membership http://www.csi-india.org/web/guest/about-membership
Why Join CSI http://www.csi-india.org/why-join-csi
Membership Benefits http://www.csi-india.org/membership-benefits
BABA Scheme http://www.csi-india.org/membership-schemes-baba-scheme
Special Interest Groups http://www.csi-india.org/special-interest-groups
Membership Subscription Fees http://www.csi-india.org/fee-structure
Membership and Grades http://www.csi-india.org/web/guest/174
Institutional Membership http://www.csi-india.org/web/guest/institiutional-membership
Become a member http://www.csi-india.org/web/guest/become-a-member
Upgrading and Renewing Membership http://www.csi-india.org/web/guest/183
Download Forms http://www.csi-india.org/web/guest/downloadforms
Membership Eligibility http://www.csi-india.org/web/guest/membership-eligibility
Code of Ethics http://www.csi-india.org/web/guest/code-of-ethics
From the President's Desk http://www.csi-india.org/web/guest/president-s-desk
CSI Communications (PDF Version) http://www.csi-india.org/web/guest/csi-communications
CSI Communications (HTML Version) http://www.csi-india.org/web/guest/csi-communications-html-version
CSI Journal of Computing http://www.csi-india.org/web/guest/journal
CSI eNewsletter http://www.csi-india.org/web/guest/enewsletter
CSIC Chapters SBs News http://www.csi-india.org/csic-chapters-sbs-news
Education Directorate http://www.csi-india.org/web/education-directorate/home
National Students Coordinator http://www.csi-india.org/web/national-students-coordinators/home
Awards and Honors http://www.csi-india.org/web/guest/251
eGovernance Awards http://www.csi-india.org/web/guest/e-governanceawards
IT Excellence Awards http://www.csi-india.org/web/guest/csiitexcellenceawards
YITP Awards http://www.csi-india.org/web/guest/csiyitp-awards
CSI Service Awards http://www.csi-india.org/web/guest/csi-service-awards
Academic Excellence Awards http://www.csi-india.org/web/guest/academic-excellence-awards
Contact us http://www.csi-india.org/web/guest/contact-us

Page 3: CSIC 2015( June )


Contents
Volume No. 39 • Issue No. 3 • June 2015

CSI Communications

Please note:

CSI Communications is published by the Computer Society of India, a non-profit organization. Views and opinions expressed in CSI Communications are those of individual authors, contributors and advertisers, and they may differ from the policies and official statements of CSI. These should not be construed as legal or professional advice. CSI, the publisher, the editors and the contributors are not responsible for any decisions taken by readers on the basis of these views and opinions.

Although every care is taken to ensure the genuineness of the writings in this publication, CSI Communications does not attest to the originality of the respective authors' content.

© 2012 CSI. All rights reserved.

Instructors are permitted to photocopy isolated articles for non-commercial classroom use without fee. For any other copying, reprint or republication, permission must be obtained in writing from the Society. Copying for other than personal use or internal reference, or of articles or columns not owned by the Society, without explicit permission of the Society or the copyright owner, is strictly prohibited.

Printed and published by Suchit Shrikrishna Gogwekar on behalf of the Computer Society of India. Printed at G.P. Offset Pvt. Ltd., Unit No. 81, Plot No. 14, Marol Co-Op. Industrial Estate, off Andheri Kurla Road, Andheri (East), Mumbai 400059, and published from Computer Society of India, Samruddhi Venture Park, Unit No. 3, 4th Floor, Marol Industrial Area, Andheri (East), Mumbai 400093. Editor: A K Nayak. Tel.: 022-2926 1700 • Fax: 022-2830 2133 • Email: [email protected]

Chief Editor: Dr. A K Nayak

Guest Editor: Dr. Vipin Tyagi

Published by: Mr. Suchit Gogwekar, Executive Secretary, for the Computer Society of India

Design, Print and Dispatch by: CyberMedia Services Limited

PLUS

Brain Teaser Dr. Durgesh Kumar Mishra 43

Reports 45

Student Branches News 49

Cover Story

8 Data Science – Data, Tools & Technologies
Hardik A Gohel

12 Leveraging Bigdata Towards Enabling Analytics Based Intrusion Detection Systems in Wireless Sensor Networks
Pritee Parwekar and Suresh Chandra Satapathy

Research Front

17 A Novel Approach to Secure Data Transmission using Logic Gates
Rohit Rastogi, Rishabh Mishra, Sanyukta Sharma, Pratyush Arya and Anshika Nigam

20 An Efficient Cluster-based Multi-Keyword Search on Encrypted Cloud Data
Rohit Handa and Rama Krishna Challa

28 A Collaborative Approach for Malicious Node Detection in Ad hoc Wireless Networks
Shrikant V Sonekar and Manali Kshirsagar

Article

32 The Cardinal Sin of Data Mining and Data Science: Overfitting
Gregory Piatetsky-Shapiro and Anmol Rajpurohit

Practitioner Workbench

34 Programming.Tips() » Salting Passwords
Rahul Bhati

35 Programming.Learn("R") » Cluster Analysis in R Language
Ghanshaym Raghuwanshi

Case Study

36 Data Quality Perspective on Retail ERP Implementation: A Case Study
Dinesh Mohata

Security Corner

40 Area Prone to Cyber Attacks
Abha Thakral, Nitin Rakesh and Abhinav Gupta

Complaints of non-receipt of CSIC may be communicated to Mr. Ashish Pawar, 022-29261724, [email protected], indicating name, membership no., validity of membership (other than life members), complete postal address with pin code, and contact no.

Page 4: CSIC 2015( June )


Editorial
Prof. A.K. Nayak, Chief Editor

Dear Fellow CSI Members,

In the last few years, data has been growing at a very high rate. Data that has been available for centuries in various forms is being digitized, and there has been an explosion in the amount of data that is available. The problem now is not getting the data; the problem is deciding what to use and how to use it effectively. The data to be processed is not only an organization's own data but all of the data that is available and relevant.

Using this huge amount of data effectively requires something different from traditional statistics. Processing this data requires distinctive new skills and tools: high-performance computing, data processing, development and management of databases, data mining and warehousing, mathematical representations, statistical modelling and analysis, and data visualization, with the goal of extracting information from the data collected for various applications. Data Science has emerged as a new area that combines all of this expertise, intersecting the fields of social science and statistics, information and computer science, and design.

Our ability to process this voluminous data is limited by a lack of expertise. The databases are difficult to process using traditional tools and to represent using standard graphics software. The data is also more heterogeneous than before. Digitized text, audio, and visual content, like sensor and weblog data, is typically messy, incomplete, and unstructured, and frequently must be processed together with other data to be useful.

Recognizing the importance of Data Science in processing voluminous data, and to discuss its various aspects, the publication committee of the Computer Society of India selected "DATA SCIENCE" as the theme of the June issue of CSI Communications (The Knowledge Digest for IT Community).

The cover story of this issue, "Data Science – Data, Tools and Technology" by H. A. Gohel, gives an overview of Data Science. We have also given an overview of the National Data Sharing and Accessibility Policy (NDSAP) and the Big Data initiative of the Govt. of India. P. Parwekar and S. C. Satapathy have proposed a hybrid solution to utilize the capabilities of Bigdata across networks, with an ability to detect and fight against intrusions. In the research front, we have included three articles. In "A Novel Approach to Secure Data Transmission using Logic Gates", R. Rastogi and his students propose a technique to transmit data in encrypted form. Another article, by R. Handa and R. K. Challa, gives an efficient multi-keyword search technique for data in the cloud. S. Sonekar and M. Kshirsagar have contributed "A Collaborative Approach for Malicious Node Detection in Ad-hoc Wireless Networks".

An article by A. Thakral, N. Rakesh and A. Gupta examines the reasons for cyber-security vulnerabilities in the Indian context and suggests measures to tackle them.

Finally, a case study, "Data Quality Perspective on Retail ERP Implementation" by D. Mohata, presents the issues and challenges faced in processing data while implementing a retail ERP solution.

This issue contains an exclusive interview with Mr. Raj Saraf, Chairman of Zenith Computers and Zenith Infotech, on the Indian IT scenario and the role of CSI in the present context.

This issue also contains the Practitioner's Workbench, crosswords, CSI reports and news from divisions, chapters and student branches, and the calendar of events.

We are thankful to Gregory Piatetsky-Shapiro and A. Rajpurohit for permitting us to share their views on overfitting in data science.

The publication committee expresses its deep condolences on the sad demise of Late Hemant Sonawala, Past President, Fellow and Lifetime Achievement awardee, who was regarded as one of the father figures of the Indian IT industry. We request the fellows and senior members who knew him personally to express their condolences at [email protected].

I take this opportunity to thank the Guest Editor, Dr. Vipin Tyagi, who agreed to bring out this issue. On behalf of the publication committee, I express my sincere gratitude to all authors and reviewers for their support and significant contributions to this issue.

I hope this issue will succeed in introducing various aspects of Data Science to the IT community.

Finally, I look forward to receiving feedback, contributions, criticism and suggestions from our esteemed members and readers at [email protected].

Prof. A.K. Nayak
Chief Editor

Page 5: CSIC 2015( June )


President's Message
Prof. Bipin Mehta

From: President's Desk :: [email protected]
Subject: President's Message
Date: 1st June 2015

Dear Members,

The CSI Communications May 2015 issue, with the theme "Cyber Security", was appreciated by members at large. The Guest Editor, Dr. Vipin Tyagi, RVP-3, put in sincere efforts compiling informative articles on cloud security, cyber security, and security, privacy and trust in social networking sites. Today there is a pressing need to educate professionals and citizens about the use and abuse of the cyber world.

Recently a meeting of the Executive Committee of CSI was held at Kolkata, in which many decisions regarding the functioning of CSI were taken. The website for the CSI 2015 Convention, which is being hosted by the Delhi Chapter, is up. The Regional Vice-Presidents and Divisional Chairpersons gave an overview of the activities conducted by the chapters and also deliberated on activities planned in their regions and divisions. The conveners for the IT Excellence Award and YITP Awards for 2015 are Shri Raj Saraf and Dr. Nilesh Modi respectively. Both awards are very popular, as a large number of nominations are received every year. Mr. H. R. Mohan, Chairman, Awards Committee, will send the call for nominations for the CSI Service Awards in due course. Dr. Suresh Chandra Satapathy, Chairman, Division V (Education & Research), will initiate CSI research initiatives. Dr. A. K. Nayak, Chairman, Publications Committee, briefed the committee on various initiatives in publications.

ExecCom nominated Regional Student Coordinators (RSC) and State Student Coordinators (SSC) from the nominations received for these positions.

I am happy to note that the School of Computer Science at VIT University has started offering a 2-credit course under CBCS for the student members of CSI for their co-curricular and extracurricular activities. The activities are supervised by the faculty members. This is a very good initiative by the university to encourage students to join CSI as student members.

I had an opportunity to meet Dr. Dilip Kumar Sharma, Chairman, and the Managing Committee members of the Mathura Chapter, as well as Prof. D. S. Chauhan, Vice Chancellor, GLA University, Mathura. The activities conducted by the chapter with the support of Dr. D. S. Chauhan are impressive. I also met faculty members of the Hindustan Institute of Management & Computer Studies, Mathura. There is an active student branch on the campus, which conducts many technical activities.

CSI SIG-eGovernance has announced the thirteenth anniversary of the prestigious CSI Nihilent eGovernance Awards. The nomination process is paperless; nominations can be filed through the portal. The awards will be given to the winners during CSI 2015 at Delhi.

The CSI Young Talent Search in Computer Programming, for the selection of teams to represent India at the SEARCC (South East Asia Regional Computer Confederation) International Schools' Software Competition 2015, has been announced. The top two teams at the national level will represent India at the SEARCC International Schools' Software Competition 2015 (ISSC 2015), to be held at Colombo, Sri Lanka, between 9th and 11th October 2015. This is a good opportunity for schools to nominate their teams for this competition.

The advanced application of IT in agriculture is becoming more popular due to its usefulness to farmers. Agriculture is one of the key sectors which affect life. Cloud computing, social media, image processing etc. will help improve GDP as well as the happiness of citizens. The technology will benefit farmers and food processing units. It will bring parity in prices and quality products to consumers. The role of IT education in agriculture is important, and CSI can take the lead in this area.

It is a sad moment for all of us to learn of the demise of Shri Hemant Sonawala (78), Past President, Fellow and recipient of the Life Time Achievement Award of the Computer Society of India, on 30th May 2015 at Mumbai. He was a very lively person who worked tirelessly all his life for the profession, CSI, Digital and Hinditron. His contribution towards the Computer Society of India will be remembered for a long time. His demise is a huge loss to society and the IT fraternity. Let us pray to the almighty for his soul to rest in eternal peace.

With best wishes,
Bipin V. Mehta

Page 6: CSIC 2015( June )


Vice President's Column
Prof. Dr. Anirban Basu, Vice President

Our journey towards making CSI a professionally run society continues unabated, in spite of insinuations and aspersions cast by some senior members. We know our goal and we are following Swami Vivekananda's words: "Arise, awake and stop not till your goal is reached".

We have already taken some bold decisions to bring in systems and processes in CSI to help our members.

1. When the CSI website was down and the vendor Leo Technosoft started demanding more payment, we investigated the reasons for such demands and the payments made so far on the development of the CSI website. We found (as detailed by the Hony. Secretary) that since 2010, CSI has paid Rs. 68,46,595.00 for development of the CSI Knowledge Portal. This includes payments to various agencies: Rs. 15,17,728.00 to Mindcraft Software Pvt. Ltd., Rs. 36,12,193.00 to Leo Technosoft Pvt. Ltd., Rs. 16,16,674.00 to consultant Mr. Mohan Datar and Rs. 1,00,000.00 to Ms. Shailaja Adurthi.

Unfortunately, details of terms of payment, agreements, deliverables etc. are not available in the CSI office or in the minutes of ExecCom meetings. Further investigation is needed to ascertain the circumstances which prompted the Presidents of the relevant periods to approve such payments.

I was wondering why, even after payments of such staggering amounts, there have been so many complaints about the website. Why did so many of our members who accessed the website complain about different aspects of its working? How many prospective members did we lose due to the non-functioning of the CSI website? How many members could not edit their personal data due to the erroneous behaviour of the CSI website? The numbers are countless.

The vendor Leo Technosoft demanded more, as they were not happy with the amount of Rs. 36,12,193.00 paid from July 2011 till September 2014. As the Vice President, Hony. Secretary and Treasurer decided not to yield to their unjust demands, the services were stopped.

Alternative plans have been made to develop a new portal from scratch, and the process has started with the minimum possible expense.

2. We have made significant changes to our mouthpiece, CSI Communications. A new set of editors will be announced soon, and we are streamlining the process of publishing reports on CSI activities. The guidelines are as follows:

Reports on student branch activities should be sent to [email protected]. The report should be brief, within 50 words, highlighting the achievements, and should include a photograph with a resolution higher than 300 DPI.

Reports on chapter activities should be sent to [email protected]. The report should be within 100 words, highlighting the objective and clearly discussing the benefits to CSI members. It should be accompanied by a photograph with a resolution higher than 300 DPI.

Conference/seminar reports should be sent by Division Chairs and RVPs to [email protected]. Again, the report should be brief, within 150 words, highlighting the objective and clearly discussing the benefits to CSI members. It should be accompanied by a photograph with a resolution higher than 300 DPI.

Members may note that we are trying to accommodate as many reports as possible within the available space and are requested to keep the guidelines in mind. It is necessary to print good-quality photographs, and care needs to be taken on this.

I am glad that Dr. Vipin Tyagi, RVP, Region III, has agreed to coordinate the publishing of reports of these activities. He can be contacted at [email protected] for any issues.

3. We have received a very good response to our call for Regional and State Student Coordinators. The list of coordinators is being finalized and will be announced in June 2015.

4. Our resolve to offer various training programs to our members has gathered momentum. A two-day training program on Embedded System Design using MSP430 is being organized at the CSI Education Directorate, Chennai, on June 13 and 14 in association with NIELIT, Govt. of India. The CSI Education Directorate is close to finalizing the agreement for PMI Certification for our members.

5. Dr. Suresh Satapathy, Division Chair, Education and Research, has been requested to prepare a list of conferences happening worldwide whose calls for papers will be of interest to our members. The list will be included in CSIC soon.

Overall, things are improving in CSI. The new ExecCom believes in transparency, efficiency and prudence, and has zero tolerance for financial irregularities. We will continue our journey in this direction and are determined to improve things with the cooperation of all our members.

Best wishes,
Dr. Anirban Basu

Page 7: CSIC 2015( June )


Meeting with Mr. Raj Saraf, Chairman of Zenith Computers and Zenith Infotech

CSI Vice President Dr. Anirban Basu and Hony. Secretary Mr. Sanjay Mohapatra, along with Mr. Ravikiran Mankikar, Chairman, Division III, met Mr. Raj Saraf, Chairman of Zenith Computers and Zenith Infotech, in his office in Mumbai on May 4, 2015. Mr. Saraf has been closely associated with the Computer Society of India and has long been a well-wisher of CSI. As Mr. Saraf has been a doyen of the IT industry, Dr. Anirban Basu and Mr. Sanjay Mohapatra felt that his views on the Indian IT scenario and the role of CSI in the present scenario would be very relevant and interesting to CSI members. The following is a summary of the discussions:

What is the present IT scene in the country?
The present scenario of hardware manufacturing is very bad, as a lot was expected from the budget; however, nothing came that would have encouraged domestic manufacturing to take it up. The import of hardware will continue as before, since importing is cheaper than manufacturing in India. The only items likely to see local manufacture may be mobile phones, but certainly not desktops, laptops, thin clients etc. In terms of software, exports will continue with a normal 10 to 15% growth.

What do you see in the near future in terms of technology, growth of the IT industry and employment of Indian IT professionals?
In terms of technology we see a lot of growth happening, as many companies have shifted their R&D to India, and the demand within India for high-calibre IT professionals from MNCs and large Indian companies is growing; the highest demand comes from the e-commerce companies. I do not see any growth for IT professionals at the low end of software companies' requirements, due to more and more automation in the software sector.

What are the plans of the Zenith Group in terms of technology development and creating more job opportunities?
The Zenith Group is looking into areas of cloud technology, as it has been the tradition of the group to go into the latest fields of IT. We feel that the entire IT infrastructure requirement of users, private or public, will move to the cloud, and that more than 75% of infrastructure will move to an Infrastructure-as-a-Service model. The company will be creating more jobs, as with the latest cloud technology employees will learn more and will have better scope for developing their own skill sets.

What is your opinion of the role of CSI in the Indian IT scene?
CSI, being the oldest body for IT professionals, should try to recapture its earlier position by not allowing people to go away to other organisations like NASSCOM. Currently, the profile of CSI is very good in education and R&D but very weak among commercial organisations and commercial IT professionals. Probably the best course would be for CSI to create a parallel organisation for the commercial market within CSI itself.

How can CSI be more effective?
To be more effective, CSI should be present in all forums, private or government, irrespective of the venue or location. Also, CSI is spread all over the country and should consolidate itself to not more than 8 to 10 locations.

How can Zenith and CSI work together in the PM's mission of Digital India?
Zenith has been supporting CSI and can definitely work with CSI in the PM's mission of Digital India, the biggest expertise Zenith could offer being cloud infrastructure, private or public. In fact, the whole emphasis of Digital India is based on the internet, and Zenith with infrastructure and CSI with different applications can definitely partner in selected fields of the Digital India program.


Left to right: Mr. Ravikiran Mankikar, Chairman Division III, Dr. Anirban Basu, Vice President, Mr. Raj Saraf, Chairman, Zenith Computers and Mr. Sanjay Mohapatra,

Hony. Secretary on May 4, 2015

Word of Condolence

Office Bearers, Executive Committee Members, Fellows and Members of the Computer Society of India express deep condolences on the sad demise of Shri Hemantbhai Sonawala, Past President, Fellow and Life Time Achievement Awardee of CSI. Shri Hemantbhai, a technology entrepreneur driven by his mission of "Better life through Technology" for the last four decades, was one of the founding fathers of the IT industry in India.

Shri Hemantbhai pursued the Indian dream at a time when few Indians were returning to India. His strong belief in India and its potential brought him back from US shores to set up business in India and lay the foundation for India's growth story as an IT superpower. For him, IT did not just mean Information Technology, but Indian Talent. His focus for the last four decades was to leverage India's engineering talent to make India a self-reliant economy and to position it as a leader in the global scenario.

His contribution towards the Computer Society of India, education, his philanthropic services and empowering young members in CSI will be remembered for a long time.

His demise is not only a loss to his family, but a huge loss to society and the IT fraternity.

May his soul rest in eternal peace.

Page 8: CSIC 2015( June )


Introduction

If there were a real time machine, I would like to take the readers 20 years back before explaining the title of the article. There is a simple reason for it: to demonstrate how technology today has worked wonders for almost all domains, including finance, marketing, government agencies, forensics, education and so on. Yes, it is technology that has taken on this role, but I won't be talking about machines and circuits; rather, about something that has transformed today's decision-making process. Let me help you peep into the office of a CEO, in a typical corporate scene of the 90s. A company has been seriously hit by its competitors, who have captured the market. The boardroom is filled with all the top management executives of the company, struggling to analyze the situation. The directors have never-ending questions for the statisticians, marketing executives, sales managers and other top-notch professionals. What were our sales this year? How have we performed over the last 10 years? What are the market shares of the product and the deviation of the financial figures? What amount of revenue are we getting from our best products in the metro regions? A high-profile meeting leaves the CEO completely stressed, and the managers with stringent deadlines for answers to never-ending business questions that are going to be really tough to answer. The next 48 hours are nothing less than a nightmare for the managers, who spend stressful days and sleepless nights finding answers to the questions. They keep scouring the files and sales reports with their sleeves rolled up under the lamplight, amid piles of printed sales figures and a computer running business software that seems to be just an extension of a business calculator. They wished there was something to make their task easy.

Yes, today something has been invented that helps the business pundits answer these tricky questions better, backed by strong evidence. This is where technology has taken center stage, boasting the capabilities of Business Intelligence and Data Science. Professionals today are well equipped with such tools, with the support of BI (Business Intelligence) experts. They make the data speak for literally everything that has happened over a period of years. They are able to analyze the growth of the company over a timeline, along with the performance of products, employees and divisions across distributed geographical areas, and much more. This seems valuable for market research people, and also for budding start-ups that wish to analyze the market before diving into it. And that's not all: Data Science has helped professionals walk the extra mile to impress the CEO. After answering the business questions, they show predictions of how the market will move in the coming months. What's the best point of investment? How will the existing product perform in the next financial year? Answers to such important business questions can help companies fetch billions of dollars from the market. Such statistics inform the decision-making of the strategists and directors of the corporation regarding their investments, product launches, organization restructuring and much more.

Data Science - Data vs Information vs Knowledge

There are three things to consider in a technical scenario: data, information and knowledge. Real facts stored in some physical medium may be termed data. So perhaps the traffic signal information and bus GPS data is getting logged somewhere in a database. If you run a simple query on that database, you would get a list of columns with unique identifiers, numbers, timestamps and some ids. This could barely make sense to the viewer. The bus crossing a signal is a fact, and the log getting created is data. So what's the point in logging those alphanumeric characters in a database that grows at a rapid rate? This is a very valid question, which will be answered soon. The database administrator has a complete idea of the database schema, column mappings, id mappings and other technical details. He designs a complex query that renders a dataset which is far easier to read, because he has simplified the data into a readable format. The report now has columns like date, bus number, stop name, journeys and arrival time. This makes more sense than rows of data that seemed Greek and Latin at the start. This is a transformed version of the captured data that provides you the correct facts. This is the information that helps you understand the running of the buses and at what time the next bus will arrive at a given stop. But wait: is it just to inform the passenger that we are running a cluster of servers with a team of technical brains monitoring it? Every Monday morning, the project director of the city's bus services receives an automated mail with a report of the complete bus system and how it performed in the last week. It includes complete information right from bus frequencies, stop occupancies and bus accidents to traffic information and much more. Now when he comes to know that there are 3 buses running on a route where only 2 passengers turn up every 3 hours, he can decide to change the route of some buses, so that stops overloaded with passengers can benefit from the buses running empty. This decision could be taken because he knew about the performance of the complete bus service system. Now that's smart, isn't it? That is because he had knowledge about the system through the reports he checked in his Monday mails.
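The data-to-information step above can be sketched in a few lines. This is a minimal illustration, assuming a purely hypothetical log schema of (bus_id, stop_id, timestamp) and toy lookup tables; a real DBA would express the same join in SQL over the production schema.

```python
# A toy sketch of the data -> information ladder described above.
# The schema and all figures are invented for illustration.

# "Data": raw log rows as they might land in the database -- ids and
# timestamps that mean little on their own.
raw_log = [
    (101, 7, "2015-06-01T08:02:11"),
    (101, 9, "2015-06-01T08:14:40"),
    (204, 7, "2015-06-01T08:05:03"),
]

# Lookup tables the DBA would join against.
bus_numbers = {101: "Route 12A", 204: "Route 7C"}
stop_names = {7: "Central Station", 9: "Market Road"}

def to_information(rows):
    """Transform raw log rows into a readable arrivals report."""
    report = []
    for bus_id, stop_id, ts in rows:
        date, time = ts.split("T")
        report.append({
            "date": date,
            "bus": bus_numbers[bus_id],
            "stop": stop_names[stop_id],
            "arrival": time,
        })
    return report

report = to_information(raw_log)
print(report[0]["bus"], "arrived at", report[0]["stop"], "at", report[0]["arrival"])
```

The raw tuples are the "data"; the joined, labelled report is the "information"; the director's Monday summary, built on top of many such reports, is where knowledge begins.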

Now there may be some non-technical people, or technical people who do not work with data, who pop up with a question about extracting knowledge from data. It is easy to understand logging data and querying the database to get information. But imagine a bus system with 200 buses running across the city, with each bus sending a signal every 5 seconds and each stop signalling the arrival and departure of every bus. Wouldn't that cumulate to crores of records in a week? Moreover, how would one read all those records and come up with a condensed report? This is where something called business


intelligence comes into the picture, which has some intersection with data science. A lot of people say that business intelligence and data science are two completely different things, but in fact, apart from the reporting of historical data, the other segments of BI collaborate closely with data science to render useful analytics. So what is this term BI? It has been the buzz of the IT market for quite a while. If I go by the definition provided by the world's top consultancy giant (Gartner Inc.), "Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance." The definition seems tough for novices to digest, so I will take you through it. Digging deeper into the bus service system will help us understand the jargon easily.

Say, for example, the bus service system has numerous bus operators, who design the bus schedules, routes and journeys. Generally, they define these in an Excel spreadsheet, which is easier to maintain and review, and hand them to the bus services technical team by uploading the files to FTP servers. This is the initial crunch of data that the team receives, on the first day of the week. Moreover, the buses generate signals every 5 seconds to keep the system updated on their presence on the streets, which also helps in monitoring the vehicles. A signal every 5 seconds amounts to some 3,45,000 signals per day from the buses. There might be some 200 stops in the city that report the entry and exit of each bus on each journey, which may amount to some 3,00,000 signals per day. Considering the complete system with other logging mechanisms for tickets and passenger counts, the database could expect some 12,00,000 records per day, which adds up to almost a crore per week. Remember the data given by the bus service operators in the Excel files? How would you include that as a part of your analytics? There might also be daily data from the bus stops arriving as flat files or CSVs. Even if the DBA (Database Administrator) could deal with such a huge amount of data, how would he deal with data coming from different sources like CSVs, Excel files and satellite data from the buses? Business Intelligence comes to the rescue with its impressive ETL technology. ETL, or data Extract, Transform and Load, helps integrate the data from various data sources (which are generally not structured) into a single place. This gives us the data in one place to start performing analysis on. But still we have a problem. Billions

of records are getting logged every week into a database. How could you store all of them in a single database? This would lead to a situation that can be termed a data explosion, where it becomes difficult to handle the data. Moreover, a complex query over several years of historical data would take forever to execute. Data Warehousing, a component of Business Intelligence, helps us get this done. The load on the live database is reduced by archiving historical data in a data warehouse and letting the live data flow into the production database. The data warehouse is a copy of the transactional database that is restructured for analysis purposes (again using ETL). Still, the data warehouse is an OLTP database, which is not suitable for analysis. So, heard of OLAP cubes? The second and most important component of BI comes into focus with analytical cubes. OLAP (Online Analytical Processing) cubes are BI components that store data in a compressed and pre-aggregated form that is helpful for running analytical queries. These cubes are structured to store data in an optimized way and have the capacity to hold historical data of several years. Water in the well never helped

to quench the thirst of the thirsty. What we were concerned with was the knowledge to gain insights and make decisions. So the third front of BI offers reporting services, which help represent the data through interactive reports. These reports give us a bird's-eye view of the happenings of the business. This is the point where BI may bid goodbye and let core data science take the center stage. Going by the Wikipedia definition, "Data Science is the extraction of knowledge from data." We are now well versed with knowledge and data, but it is the various techniques for getting knowledge out of the stored information or data that make this a subject of interest. Professionals working in the field of data science are termed data scientists. Applying data science techniques to data varies from case to case and needs a well-planned approach, though a general plan can be followed for performing data science over a dataset. Moreover, the data professional must be certain about the type of output he wants from the required analytics. The field of data science is very interesting, as it borrows from a myriad of disciplines. There are techniques, algorithms and patterns derived from areas like information theory, information technology, mathematics, statistics, programming, probability models, data engineering, data modeling, pattern learning, predictive modeling and analytics, business intelligence, data compression and high-performance computing. The predictive modeling theories and models of data mining have added a lot to data science, as they have enhanced the predictive capabilities of the field.
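The pre-aggregation idea behind OLAP cubes described above can be sketched in miniature. This is only a toy, assuming invented route and ridership figures; real cubes pre-aggregate along many dimensions and store the results in a dedicated analytical engine.

```python
# A rough sketch of the pre-aggregation idea behind OLAP cubes: instead of
# scanning raw journey rows for every query, totals are computed once along
# the dimensions analysts slice by (route and month here; both hypothetical).
from collections import defaultdict

journeys = [
    ("Route 12A", "2015-05", 4210),   # (route, month, passengers)
    ("Route 12A", "2015-06", 3980),
    ("Route 7C",  "2015-05", 150),
    ("Route 7C",  "2015-06", 130),
]

# Build the "cube": one pre-aggregated cell per (route, month) pair.
cube = defaultdict(int)
for route, month, passengers in journeys:
    cube[(route, month)] += passengers

# Analytical queries now read tiny aggregates instead of raw logs.
def passengers_for(route):
    return sum(v for (r, _), v in cube.items() if r == route)

print(passengers_for("Route 7C"))  # low ridership stands out immediately
```

Queries like the director's "empty route" question become lookups over a handful of cells rather than scans over crores of raw records.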

The Data Explosion

With the rise of technology and data storage systems, we have been able to log data into servers. Over the years, the cost of storage hardware has gone down, which has allowed IT companies to buy numerous commodity servers and storage systems to store data and also to extend data storage as a service to their clients. Content generated from analog systems in the form of sensors, mobile devices, instruments, web logs and transactions has been digitized and stored. It is worth highlighting that 90% of the data in the world today has been generated in the past two years. Data scientists have applied numerous techniques to this massive data to identify patterns that add commercial and social value. This avalanche of data has led to the inception of new technologies like Big Data, which help us perform our experiments better and quicker on the incoming data. High-performance computing systems like Hadoop and cluster computing have helped data scientists explore petabytes of data much quicker than ever before. It is an added advantage if a data scientist is well versed with big data technologies. Since a single person cannot be a jack of all trades, especially in such complex projects, the data analytics team generally has several big data developers, administrators and architects on board to assist the core data scientists and expedite the analytics process.


Tools and Technologies

As far as data science is concerned, no single technology will get you through. Along with strong domain expertise and analytical capabilities, you need strong knowledge of a bunch of technologies. Since a lot of data will be coming in, the data is generally in spreadsheets or in an RDBMS. If the incoming data is in some other format, we have data transformation tools, as explained earlier, to get it converted. When the datasets are in Excel, you need strong Excel skills to transform and restructure data as part of the data preparation process. Similarly, if you are working on an RDBMS, SQL is something you should be handy with, as a lot of complex queries need to be prepared to get the datasets ready. The core part of data science comes with the data modeling, predictive analytics and algorithms that form the spine of the trade. There are numerous existing libraries that help the exploration process, so sound knowledge of R and Python would be good, as you will be spending a lot of time with their consoles. Some statistical models need to be custom coded to get the model working, and integrating these predictive engines into real-time or existing applications might ask for some experience with programming languages like Java and Ruby. Reporting holds enormous importance, as visualizations are seen as the results of any data science project. There are technologies like SAS and SPSS that serve a complete data science stack, right from data integration to report rendering. If you have done custom coding in open-source technologies, then D3.js and other JavaScript frameworks can help you build stunning visualizations. The emerging career is that of a data scientist who can handle big data and get results with minimum latency. For a person analyzing terabytes of data, big data is the recommended solution, which makes Hadoop, in any distribution, necessary for the developer. To perform predictive analysis and data mining on the Hadoop stack, Mahout is one of the most popular technologies, so know-how of these could work wonders for you. Also, for algorithm design and modeling purposes, a good understanding of statistics is a must, even if you do not come from a statistical background or schooling. The list never ends, and practically it is not possible to learn everything at once, but a programming language, a machine learning library, an RDBMS and a big data technology are a must for a data scientist these days.
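To give a flavour of the predictive-modeling side mentioned above, here is a minimal sketch: an ordinary least-squares trend line fitted by hand to invented monthly sales figures, then extrapolated one period ahead. In practice one would reach for R or Python libraries (e.g. statsmodels or scikit-learn) and far richer models.

```python
# Fit y = slope * x + intercept to past sales and predict the next month.
# The sales figures are made up for illustration.
sales = [1200, 1350, 1420, 1600, 1710]  # hypothetical monthly sales
xs = list(range(len(sales)))

# Ordinary least squares for slope and intercept, by hand.
n = len(sales)
mean_x = sum(xs) / n
mean_y = sum(sales) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sales))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

next_month = slope * n + intercept
print(round(next_month))  # -> 1837
```

Even this naive trend line answers a "how will next quarter look?" question with a number instead of a guess; the real craft lies in choosing and validating a model that deserves to be trusted.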

Conclusion

The future of data science is certainly bright, as it is one of the most in-demand jobs in the market. Companies today want to know more about markets and products before investing. Departments are hungry for analytics over the tons of data stored in their data servers. The demand-supply model is completely imbalanced now, due to the high demand for data scientists and their scarcity. Every company today wants to employ these trained professionals to help it grow better and faster. For the same reason, corporates are ready to shell out hefty amounts for them. Data is growing, and deriving commercial and business value out of it is the need of the hour.


About the Author

Hardik A Gohel, an academician and researcher, is an Assistant Professor at AITS, Rajkot and a life member of CSI. His research spans Artificial Intelligence and Intelligent Web Applications and Services. He has 35 publications in journals and proceedings of national and international conferences. He is also working as a Research Consultant. He can be reached at [email protected]

The term "data science” has existed for over thirty years and was used initially as a substitute for computer science by "Peter Naur" in 1960. In 1974, Naur published Concise Survey of Computer Methods, which freely used the term data science in its survey of the contemporary data processing methods that are used in a wide range of applications. In 1996, members of the International Federation of Classifi cation Societies (IFCS) met in Kobe for their biennial conference. Here, for the fi rst time, the term data science was included in the title of the conference ("Data Science, classifi cation, and related methods").

In Nov. 1997, C.F. Jeff Wu gave the inaugural lecture entitled "Statistics = Data Science?" for his appointment to the H. C. Carver Professorship at the University of Michigan. In this lecture, he characterized statistical work as a trilogy of data collection, data modeling and analysis, and decision making. In conclusion, he coined the term "data science" and advocated that statistics be renamed data science and statisticians data scientists. Later, he presented his lecture entitled "Statistics = Data Science?" as the fi rst of his 1998 P.C. Mahalanobis Memorial Lectures. These lectures honor Prasanta Chandra Mahalanobis, an Indian scientist and statistician and founder of the Indian Statistical Institute.

In 2001, William S. Cleveland introduced data science as an independent discipline, extending the fi eld of statistics to incorporate "advances in computing with data" in his article "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics," which was published in Vol. 69, No. 1, of the April 2001 edition of the International Statistical Review / Revue Internationale de Statistique. In his report, Cleveland establishes six technical areas which he believed to encompass the fi eld of data science: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.

In 2008 D.J. Patil and Jeff Hammerbacher coined the term "data scientist" to defi ne their jobs at LinkedIn and Facebook, respectively.

................. Wikipedia


National Data Sharing and Accessibility Policy (NDSAP) and Big Data initiative of Govt. of India

https://data.gov.in/sites/default/files/NDSAP.pdf
https://data.gov.in/

National Data Sharing and Accessibility Policy (NDSAP)

Aim: to provide an enabling provision and platform for proactive and open access to the data generated by various Government of India entities.

Objectives: to facilitate access to Government of India owned shareable data (along with its usage information) in machine-readable form through a wide area network all over the country, in a periodically updatable manner, within the framework of various related policies, acts and rules of the Government of India, thereby permitting wider accessibility and usage by the public.

• The principles on which data sharing and accessibility need to be based include: Openness, Flexibility, Transparency, Quality, Security and Machine-readability.

• The Department of Science and Technology serves the nodal functions of coordination and monitoring of the policy, in close collaboration with all Central Ministries and the Department of Electronics and Information Technology, by creating data.gov.in through the National Informatics Centre.

• As per NDSAP, every Department has to identify datasets under the following categories:

❖ Negative List: Datasets which are confidential in nature and would compromise the country's security if made public are put into this list. Datasets which contain personal information are also included in this list.

❖ Open List: This list comprises datasets which do not fall under the negative list. These datasets shall be prioritized into high-value datasets and non-high-value datasets.

• NDSAP recommends that datasets be published in an open, machine-readable format. Considering the current analysis of data formats prevalent in Government, it is proposed that data should be published in any of the following formats:

❖ CSV (Comma Separated Values)
❖ XLS (Spreadsheet - Excel)
❖ ODS/OTS (Open Document Formats for Spreadsheets)
❖ XML (Extensible Markup Language)
❖ RDF (Resource Description Framework)
❖ KML (Keyhole Markup Language, used for maps)
❖ GML (Geography Markup Language)
❖ RSS/ATOM (fast-changing data, e.g. hourly/daily)

• Different types of datasets generated in both geospatial and non-spatial form by Ministries/Departments shall be classified as shareable data and non-shareable data. Derived statistics like national accounts statistics, indicators like the price index, and databases from censuses and surveys are the types of data produced by a statistical mechanism. The geospatial data consists primarily of satellite data, maps, etc.

Open Government Data (OGD) Platform, India - https://data.gov.in - is a portal intended to be used by Government of India Ministries/Departments and their organizations to publish datasets, documents, services, tools and applications collected by them for public use. It intends to increase transparency in the functioning of Government and also open avenues for many more innovative uses of Government Data to give different perspectives.

This portal contains: 15,221 Resources, 3,596 Catalogs, 88 Departments, 46 APIs, 499 Visualizations. The data on this portal has been viewed 3.23 M times and downloaded 1.3 M times by 51,416 registered users.

Dr. Vipin Tyagi
Jaypee University of Engineering and Technology, Raghogarh, Guna - MP
[email protected]

CSI Institutional Membership Fee (+ Service Tax extra, as applicable)

Category | 01 Yr | 02 Yrs | 03 Yrs | 04 Yrs | 05 Yrs | 10 Yrs | 15 Yrs | 20 Yrs
Institutional Members (Academic), with 03 free nominees | 6,000 | 11,000 | 16,000 | 21,000 | 25,000 | 48,000 | 70,000 | 90,000
Institutional Members (Non-Academic), with 04 free nominees | 10,000 | 19,000 | 28,000 | 36,000 | 45,000 | 85,000 | 1,25,000 | 1,50,000

CSI Life Membership Fee (+ Service Tax extra, as applicable)

Life Membership Fee (after 30% Golden Jubilee Discount, valid up to 31.12.2015), irrespective of age group, is Rs. 7,000.00. From 1st January, 2016, the Life Membership Fee shall be Rs. 10,000.00.
Note: Service Taxes, as applicable, shall be extra in all the categories.


Cover Story

Pritee Parwekar* and Suresh Chandra Satapathy**
*Dept. of CSE, ANITS, Visakhapatnam
**Prof. and Head, Dept. of CSE, ANITS, Visakhapatnam

Introduction

Wireless Sensor Networks, and predominantly the Internet of Things (IoT), have numerous devices capable of sensing and actuating based either on rule sets available locally or on rules sourced from higher computational platforms[1]. These devices feed data streams which will soon overwhelm traditional approaches to data management and require a paradigm shift in data management, such as big data. This paper discusses the issues with respect to network attacks and the employment of Big Data analytics as a backbone for intrusion detection systems in emerging architectures of Wireless Sensor Networks and the Internet of Things (IoT).

Big data with a backbone of cloud computing is the state-of-the-art method to offload considerable computation requirements from both data centers and terminal sensing devices. It is all the more lucrative due to its inherent flexibility and scalability[2]. However, cloud computing may not be directly suitable for all applications, such as a WSN (Wireless Sensor Network), because of the WSN's stringent requirements on latency and immediate response, which may be associated with geographic mobility[3]. A WSN operates in the physical world, while cloud computing sits towards the edge of the network.

However, for semi-real-time issues like data mining to generate anomaly patterns for intrusion detection systems[4], a strong system using technologies like Big Data is considered promising. This paper studies the advances of big data in ubiquitous Wireless Sensor Networks and focuses on computation and storage, and on data analysis and mining, towards evolving a collaborative intrusion detection system.

Challenges in Wireless Sensor Networks towards developing Intrusion Detection Systems

IDS schemes have been implemented in wired and semi-wired networks. These systems look for certain misbehavior patterns in the network which would give a whiff of a malicious act and thereby trigger an attack-mitigating mechanism. WSNs have an inherent drawback of limited resource availability in the form of energy as well as computing capabilities. IDSs thus make a significant contribution towards protecting WSNs from both internal and external attacks. An IDS looks for an anomaly in node behavior and, once one is found, reconfigures the network to bypass the malicious node and thus prevent a network attack.

Lately, researchers have proposed a variety of IDSs, and a few of them have been made specifically applicable to WSN structures (flat, cluster, hierarchical). [5] has shown how an IDS can be used to detect misbehavior of nodes and inform the neighbor nodes in the network to invoke necessary countermeasures. A few of the IDSs created for wired domains and ad hoc networks have not been found applicable in the same form in wireless sensor networks. The network characteristics of WSNs impose conflicting requirements[6] which get in the way and complicate the design of security mechanisms. Also, compared to ad hoc networks, the computing and energy resources of sensor nodes are constrained[7].

An IDS can approach an attack under three classifications, namely misuse detection, anomaly detection and specification-based detection, as brought out by [8]. Misuse detection involves comparing the actions or behavior of nodes against a data bank of attack patterns. These patterns have to be pre-defined and recorded into the system. The limitation of this technique is that building attack patterns is knowledge dependent, so it fails to detect novel or modified attacks. The attack pattern database also needs to be regularly updated to include freshly detected patterns. Here, the efficiency of system management is significantly reduced, as the network administrator is required to constantly equip the IDS agents with a current database. A rule-based or misuse detection technique for a WSN is a complex proposition. Practically, replicating the attacker's psyche is difficult: the administrator of the network is required to pre-empt and model attack patterns futuristically. Moreover, WSNs are severely memory constrained, which makes misuse-detection-based IDSs in WSNs difficult to implement, as they need to store attack signatures[9].
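Misuse detection, at its core, reduces to matching observed behavior against a stored signature bank. The deliberately tiny sketch below illustrates both the mechanism and its key limitation; the signature format and events are invented for illustration.

```python
# Misuse detection in miniature: observed behaviour is matched against a
# bank of known attack signatures. Real systems match far richer patterns.
attack_signatures = {
    ("hello_flood", "rate>100/s"),
    ("sinkhole", "route_advert=self_always"),
}

def is_known_attack(event):
    """Flag an event only if it matches a stored signature exactly."""
    return event in attack_signatures

print(is_known_attack(("hello_flood", "rate>100/s")))  # known pattern: True
# A novel attack with no stored signature slips through -- the key
# limitation noted above.
print(is_known_attack(("wormhole", "tunnelled_link")))  # False
```

The signature set also has to live in node memory, which is exactly where the WSN constraint bites.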

The anomaly detection technique concentrates on the behaviors of the nodes to decide whether they are normal or anomalous. This method first establishes the features of behavior which are to be considered normal, using self-learning training mechanisms. Subsequently, any activity which does not comply with these pre-established behaviors is treated as an intrusion. If a certain node does not behave in accordance with the predefined specification, the IDS will infer that the said node is malicious. Any wrong inference by the IDS triggers a false alarm, which in turn affects the accuracy of detection. Hence, this method has a substantial false alarm rate. Also, an intrusion which behaves analogously to pre-established valid behaviors would not be identified as anomalous and may not be detected. Several IDS techniques have been formulated for anomaly detection in WSNs. Certain assumptions or metrics are used to determine the behavior of sensor nodes as normal or abnormal. This approach is considered easier to apply than misuse- or specification-based detection, and most researchers use it as the main method to detect intrusions. However, anomaly detection techniques share a few strategies with misuse detection, e.g. the watchdog approach[10].
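The anomaly detection idea above can be sketched with a simple statistical baseline: learn what "normal" looks like from a training window, then flag readings that stray too far from it. The packet-rate figures and the 3-sigma threshold are illustrative assumptions, not values from any cited scheme.

```python
# Anomaly detection in miniature: a baseline learned from training data,
# then a deviation test against it.
import statistics

training = [52, 49, 51, 50, 48, 53, 50, 47]  # packets/min during training
mu = statistics.mean(training)       # 50.0
sigma = statistics.stdev(training)   # 2.0

def is_anomalous(reading, k=3.0):
    """Flag readings more than k standard deviations from the baseline."""
    return abs(reading - mu) > k * sigma

print(is_anomalous(50))   # normal traffic: False
print(is_anomalous(400))  # flood-like traffic: True
print(is_anomalous(43))   # valid but unusual -> flagged: a false alarm
```

The last line shows the false-alarm problem from the text: a legitimate quiet reading lands outside the learned band and gets flagged anyway.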

Misuse and anomaly detection mechanisms are based on machine learning techniques. With a similar goal, the specification-based detection technique instead depends on manually described specifications in which normal behavior is defined. These specifications become the datum for monitoring all actions. The manual, labour-intensive process of defining specifications is the main drawback, and a new malicious activity that has not previously been defined goes undetected. In certain cases, misuse- and anomaly-based detection techniques can be blended together as hybrid detection mechanisms.

The IDS to be selected depends on its capability of outsourcing the computation requirements to an external agency outside the network. Such capability can be sourced from the following IDSs applicable to WSNs:

(a) A Partially Distributed Intrusion Detection System for Wireless Sensor Networks has been proposed by Eung Jun Cho et al.[2], which requires low memory and power. The IDS employs multiple Bloom filter arrays to distribute attack signatures. It is capable of detecting fragmented attack signatures at the application layer and unfragmented attack signatures at the network layer. As per the authors, the mechanism can handle denial of service attacks.
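A Bloom filter of the kind mentioned in (a) can be sketched in a few lines. The sizes and hashing scheme here are toy choices, not those of the cited system; the point is the memory profile: a fixed bit array instead of stored signatures, with no false negatives and a small false-positive probability.

```python
# A minimal Bloom filter: K bit positions per item, derived from one hash.
import hashlib

M = 256  # bits in the filter
K = 3    # positions per item

def _positions(item):
    digest = hashlib.sha256(item.encode()).digest()
    # Derive K bit positions from slices of a single digest.
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % M for i in range(K)]

class BloomFilter:
    def __init__(self):
        self.bits = 0  # M-bit array packed into an int

    def add(self, item):
        for p in _positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # Never misses an added item; may rarely claim an absent one.
        return all(self.bits >> p & 1 for p in _positions(item))

bf = BloomFilter()
bf.add("sig:hello_flood")
print(bf.might_contain("sig:hello_flood"))  # True
```

A whole signature bank collapses into M bits, which is why the structure suits memory-starved sensor nodes.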

(b) In the PCADID approach[12], the WSN is partitioned into groups of sensor nodes. Some of the nodes in each group are identified as monitor nodes, which cooperate with each other to create a global normal profile. Every monitor node creates a sub-profile for its own normal network traffic using principal component analysis (PCA), which it shares with the other monitor nodes. The shared sub-profiles of the monitor nodes are used to create the global normal profile, which is then used to detect anomalies in the network traffic. As the normal network behavior changes progressively, the global normal profile also gets updated. The authors have shown that PCADID achieves a high detection rate with a low false alarm rate.
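The PCADID workflow in (b) can be drastically simplified into a sketch: each monitor node summarizes its own normal traffic as a sub-profile, the shared sub-profiles are merged into a global profile, and traffic far from that profile is flagged. A plain count-weighted mean stands in here for the PCA projection of the real scheme, and all numbers are invented.

```python
# Sub-profiles per monitor node, merged into one global normal profile.

def sub_profile(samples):
    """Per-monitor summary of normal traffic: (mean packets/min, count)."""
    return (sum(samples) / len(samples), len(samples))

def global_profile(sub_profiles):
    """Count-weighted merge of the shared sub-profiles."""
    total = sum(n for _, n in sub_profiles)
    return sum(m * n for m, n in sub_profiles) / total

monitors = [sub_profile(s) for s in ([48, 52, 50], [49, 51], [50, 50, 50])]
g = global_profile(monitors)

def is_anomalous(reading, threshold=10.0):
    return abs(reading - g) > threshold

print(g)                  # the merged baseline, ~50.0
print(is_anomalous(300))  # flood-like traffic: True
```

The division of labour is the interesting part: no single node needs a network-wide view, yet detection runs against a network-wide baseline.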

(c) The author in paper[13] has described an intruder tracking system for cluster-based wireless sensor networks using MAC addresses. The base station is responsible for the detection, therefore the system is more energy-efficient and facilitates early detection and prevention of security threats and attacks. Timely detection and prevention of the intruder can avoid slowing down of the network, sending of fake data, etc. Thus, a Base Station (BS) centric security system in wireless networks can provide a considerable degree of security without significantly consuming the energy of nodes and cluster heads.

(d) The Integrated Intrusion Detection System (IIDS)[14] is a combination of three individual IDSs: an Intelligent Hybrid Intrusion Detection System (IHIDS), a Hybrid Intrusion Detection System (HIDS) and a misuse Intrusion Detection System. These are tailored for the sink, cluster head and sensor node respectively, depending on the likely types and frequency of attacks these suffer from. The IIDS consists of an anomaly and a misuse detection module to increase the detection rate and lower the false positive rate. A decision-making module integrates the results and presents a report of the attacks.

It may be noted that in all these IDSs the aim is to utilize the existing setup of WSNs by optimizing the IDS algorithms to facilitate early detection of the attack. But with ever increasing ingenuity in attacks, a limited signature databank in a resource-constrained WSN will always be a bottleneck. It is therefore proposed that the cluster head in a WSN will only identify anomalies and outsource the same to a cloud infrastructure, where a real-time analytics solution using Big Data will be employed and directives for handling the attack will be sourced. The advanced data mining techniques which traditional IDSs use, but which could not be extended to WSN environments, will now find use. In other words, Big Data technology will be leveraged to overcome the resource-constrained nature of classical WSNs and make the best of IDS technologies from other wired/wireless networks.

Challenges in Big Data Analytics
Intrusions are of many natures, with intruders developing ever more ingenious ways of intruding into networks. Intrusions take place on all sorts of networks and are not limited

Fig. 1: Flow Chart for a typical data analytics based solution


to a particular type. A certain intrusion methodology can easily be extended to a different network. Making sense of the data, identifying non-obvious patterns, and on that basis predicting possible future intrusion behavior are studies which have long been favorites of researchers.

Knowledge Discovery in Databases (KDD) is about extracting non-obvious information from a pool of data. Data mining is used to discover interrelations amongst datasets using machine learning and statistics. Analytics, as a superset, comprises techniques of KDD, data mining, text mining, statistical analysis, rule-based and predictive models, and advanced and interactive visualization to assist decisions and actions.

Data from various sources is used to build models. The voluminous data must first be pre-processed. The prepared data is then used to train a model and to estimate its parameters. Once the model is estimated, it should be validated before use. Normally, this phase requires the original input data and specific methods to validate the created model. Finally, the model is applied to data as it arrives. This phase, called model scoring, is used to generate predictions, prescriptions, and recommendations. The results are interpreted and evaluated, used to generate new models or calibrate existing ones, and are integrated with the pre-processed data.
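The train, validate and score phases described above can be sketched with a toy anomaly model. The z-score approach, the 3-sigma threshold and the function names here are illustrative assumptions, not the authors' method:

```python
# Minimal sketch of the lifecycle: train a model on pre-processed data,
# then score arriving data against the learned profile.
# The z-score model and 3-sigma threshold are illustrative assumptions.

def train(samples):
    """Estimate model parameters (mean, std) from pre-processed training data."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, var ** 0.5

def score(model, x, threshold=3.0):
    """Model scoring: flag an arriving observation far from the learned profile."""
    mean, std = model
    return abs(x - mean) > threshold * std

# In practice the model would also be validated on held-out data before use.
model = train([10, 11, 9, 10, 10])
print(score(model, 100))   # an extreme observation is flagged as anomalous
```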

Analytics solutions can be classified as descriptive, predictive, or prescriptive. Descriptive analytics uses previously recorded data to create guidance reports for management; it is concerned with modeling previously encountered behavior. Predictive analytics analyzes current and historical data to predict the future. Analysts use prescriptive solutions by determining actions, assessing their effect on project objectives, specifications and constraints, and then arriving at a consolidated decision.

Using analytics may sound like a one-stop solution; however, analytics is tedious and expensive, requiring many consulting hours to develop and tailor a solution for a particular project[3]. Such solutions are complex, have considerable execution time, and are hosted on the project premises. Cloud computing offers a platform for analytics, where solutions are hosted in the Cloud to be shared by multiple projects on a scalable cost and resource model. To make this happen, several technical issues must be addressed, such as data management, tailoring of models, data privacy, security, data quality and currency.

The most tedious part of analytics is getting the data ready for analysis. Analyzing large volumes of data requires efficient methods for the storage, filtration and retrieval of data. The challenges of deploying data in Cloud environments, and of its subsequent management, have been understood and researched for some time now[22,23,24]. Multiple Cloud deployment models, viz. private, public or hybrid, are to be considered when arriving at a Cloud analytics solution:

Private: A cloud deployed on the organizational network, or by a third party but exclusively for the organization. A private Cloud is used by organizations aiming for the highest levels of security and data privacy. Such organizations aim to use Cloud infrastructure to share services and resources among the various arms of the organization, which may be co-located or spread across the globe.

Public: A cloud deployed over the Internet and publicly available. Public Clouds are usually highly efficient in terms of cost as well as performance. In the public environment, the analytics services and data management are handled by the cloud service provider, and organizations also benefit from the insights of public analytics results.

Hybrid: This type combines both Clouds, where scalable resources from the public Cloud can be extended to the private Cloud. This is a midpath where organizations can deploy analytics applications in a secure private environment which is scalable, at a lower cost, and with a higher degree of security compared to using a public Cloud alone.

Big Data is characterized by variety, velocity, and volume, where variety represents the data types, velocity the rate of data production and processing, and volume the amount of data. A fourth characteristic, veracity, means how much of the data can be trusted based on the reliability of its source.

There are some open challenges, and researchers are tackling ever more difficult issues. An increasing proportion of data is unstructured, and the challenge is how to extract meaningful information from it. Also, with a steady stream of data arriving from multiple sources, aggregation and correlation of the data require a paradigm change in methodology. Subsequent to filtering useful data, the challenge is to efficiently recognize and store the important information extracted from

Fig. 2: Collaborative IDS with Big Data backbone


unstructured data. The volumes of information are overwhelming, and a mechanism for timely retrieval needs to be worked out. A new file system needs to be designed which can easily migrate different types and sizes of data between data centers or cloud providers.

Data integration in light of new protocols and interfaces is another challenge, owing to the variety of data sources, viz. structured, unstructured and semi-structured.

Integrating Intrusion Detection Systems for Wireless Sensor Networks on Big Data Systems
We have proposed the concept of alert correlation in a distributed environment for developing the cloud and Big Data based IDS for WSNs. We intend to use a Big Data based fuzzy logic algorithm, which would help in identifying intrusions through pattern matching and thereby reduce false alarms. Fuzzy logic, which deals with vagueness and imprecision, has the capability to represent approximate forms of reasoning in areas where firm decisions have to be made. This is found to be appropriate for intrusion detection.

Architecture
We have the cluster head perform the preliminary anomaly detection. The cluster head uses the following rules[15] to identify an anomaly:

Interval Rule: This rule analyses the time period between two consecutive message receptions and verifies whether it complies with the allocated time.
Retransmission Rule: This rule aims at pinpointing a node that is not forwarding messages. It is used to detect black hole and selective forwarding attacks.
Integrity Rule: If an attacker changes the message payload, this rule identifies an anomaly.
Delay Rule: If a message is not delivered on time, this rule alerts the system.
Repetition Rule: This rule detects whether a particular node sends a message multiple times, thereby detecting a possible denial of service attack.
Radio Transmission Range: In wireless networks, neighboring nodes participate in the transmission of a message. If a network message is received but the neighbor appears to be silent, there is an anomaly.
Jamming Rule: This rule analyses the count of collisions per message and ensures that it is lower than a predetermined value.
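Two of these rules can be sketched in software to show the shape of the cluster-head check; the thresholds and field names below are illustrative assumptions, not values from [15]:

```python
# Sketch of the Interval and Repetition rules; thresholds are illustrative.
MAX_INTERVAL = 5.0   # allocated time between consecutive receptions (seconds)
MAX_REPEATS = 3      # identical messages tolerated before flagging a possible DoS

def interval_rule(prev_ts, ts):
    """Interval Rule: anomaly if the gap between receptions exceeds the allocation."""
    return (ts - prev_ts) > MAX_INTERVAL

def repetition_rule(history, node, msg):
    """Repetition Rule: anomaly if a node sends the same message too many times."""
    history[(node, msg)] = history.get((node, msg), 0) + 1
    return history[(node, msg)] > MAX_REPEATS
```

A flagged anomaly would then be forwarded to the Big Data back end rather than resolved at the cluster head, in keeping with the outsourcing idea above.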

Once an anomaly is detected, it is passed to the Big Data based back end to identify the possible sources of such an attack. The analytics are expected to provide a shortlisted result, which would be used for system learning and for enabling a watchdog on these sources as potential future advanced persistent threats.

The Big Data Back End will Work on the Following Principle
Normalization: The cluster head supplies the data, either online or offline, to the cloud-based receiving component. The data from the network with regard to threats and anomalies is normalized. The data from the cluster head comprises dynamic fields like date and time stamp, username, port used, IP addresses of the source and destination, etc.

Pre-processing: The normalized alerts are allocated standard names in a certain format which are recognizable by the other components involved in the correlation process. Other pre-processing components may be required, since the cluster heads free this memory once the data is delivered. The main task of the pre-processing component is to provide alerts with the missing fields which are necessary for other correlating components[16].

Categorization: In categorization, similar events are grouped together and the nature of occurrence of attacks in a certain time interval is studied.

Correlation: The performance of correlation depends on combining the three tasks of normalization, pre-processing and categorization. The key step in selecting a correlation method is to consider the nature of the environment, followed by the ability to receive alerts, trace tracks, prepare logs with simple entities, and trace events involving those entities. The quality of the correlation step depends on the low latency of the tools. The correlation component discovers relations between alerts in order to reconstruct attack scenarios. This is the key component of intrusion detection, and it cannot work in isolation. Co-operation would be the key word for futuristic intrusion detection systems, and Big Data is the key technology to facilitate it.

False Alert Weeding: This component is tasked with distinguishing between false positive and true positive alerts. Different sensors have their own advantages and disadvantages in detecting various attacks, and a well-known bottleneck is that low-level sensors generate a large number of false positive alerts.

Attack strategy analysis: The attack strategy analysis tries to comprehend the real intentions of the invaders. The

Fig. 3: Integrated IDS Model


requirement for such an analysis is to identify the correlation amongst low-level alerts, which helps establish the complete strategy of the attack planned by the invaders. Predicting an attack's next steps, reacting suitably against them, and responding spontaneously to prevent further damage are extremely important and useful[17].

Prioritization: Prioritization aims at rating the alerts according to severity and fitting an operation to each type of attack. The prioritization component would have to be provided with an intelligent backbone database, in the form of a fuzzy logic / genetic algorithm based intelligence engine, so as to consider the types of alerts as well as other information. Prioritization of alerts will also depend on the security policies and the network topologies.

Finally, once the solution is delivered, the event, its response and the success rate get mapped into a data-aggregating component for future use by the present IDS, or shared on cloud networks as collaborative knowledge for other IDSs for wireless or other networks.
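The stages above, from normalization through prioritization, can be sketched as a minimal pipeline. The field names, attack kinds and severity values are illustrative assumptions, not part of the proposal:

```python
# Hedged sketch of the back-end flow: normalize raw records, group (correlate)
# alerts by source, then prioritize sources by total severity.
SEVERITY = {"dos": 3, "blackhole": 2, "replay": 1}   # illustrative values

def normalize(raw):
    """Normalization: map a raw cluster-head record to standard field names."""
    return {"ts": raw["time"], "src": raw["source_ip"], "kind": raw["attack"]}

def correlate(alerts):
    """Categorization/correlation: group alerts originating from the same source."""
    groups = {}
    for a in alerts:
        groups.setdefault(a["src"], []).append(a)
    return groups

def prioritize(groups):
    """Prioritization: rank sources by the summed severity of their alerts."""
    totals = {src: sum(SEVERITY.get(a["kind"], 0) for a in v)
              for src, v in groups.items()}
    return sorted(totals, key=totals.get, reverse=True)
```

In the proposed system, steps like false-alert weeding and attack strategy analysis would sit between correlation and prioritization, backed by the fuzzy logic engine rather than a fixed severity table.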

Conclusion
The paper has analyzed the available technology in cloud, Big Data and IDS, and their applicability to WSNs. Though an implementation has not been shown in the paper, a clear road map has been chalked out for an Intrusion Detection System in Wireless Sensor Networks working on the principle of collaboration over a Big Data backbone. A full-fledged implementation using Hadoop is being conceptualized and is next on our agenda.

References
[1] Pritee Parwekar, "From Internet of Things towards cloud of things," 2nd International Conference on Computer and Communication Technology (ICCCT), 2011, pp. 329-333. DOI: 10.1109/ICCCT.2011.6075156.
[2] Fu Xiao, "Big Data in Ubiquitous Wireless Sensor Networks," International Journal of Distributed Sensor Networks, Vol. 2014 (2014), Article ID 781729.
[3] Adel A. Ahmed, "A real-time routing protocol with mobility support and load distribution for mobile wireless sensor networks," International Journal of Sensor Networks, Vol. 15, No. 2, 2014.
[4] Pritee Parwekar, "Application of Data Mining in Network Intrusion Detection," technical paper selected for presentation at the Indian Science Congress, 2008.
[5] C. E. Loo, M. Y. Ng, C. Leckie, and M. Palaniswami, "Intrusion Detection for Routing Attacks in Sensor Networks," International Journal of Distributed Sensor Networks, Vol. 2, pp. 313-332, 2006.
[6] J. Lopez, R. Roman, and C. Alcaraz, "Analysis of Security Threats, Requirements, Technologies and Standards in Wireless Sensor Networks," Foundations of Security Analysis and Design 2009, LNCS 5705, August 2009, pp. 289-338.
[7] R. Roman, J. Zhou, and J. Lopez, "Applying Intrusion Detection Systems to Wireless Sensor Networks," Consumer Communications and Networking Conference, 2006, pp. 640-644.
[8] Abror Abduvaliyev, Al-Sakib Khan Pathan, Jianying Zhou, Rodrigo Roman, and Wai-Choong Wong, "On the Vital Areas of Intrusion Detection Systems in Wireless Sensor Networks," IEEE Communications Surveys and Tutorials, Vol. 15, No. 3, Third Quarter 2013.
[9] I. Krontiris, T. Dimitriou, and F. C. Freiling, "Towards Intrusion Detection in Wireless Sensor Networks," 13th European Wireless Conference, Paris, France, 2007.
[10] S. Marti, T. J. Giuli, K. Lai, and M. Baker, "Mitigating Routing Misbehavior in Mobile Ad hoc Networks," MobiCom '00, 2000, pp. 255-265.
[11] Eung Jun Cho, Choong Seon Hong, Sungwon Lee, and Seokhee Jeon, "A Partially Distributed Intrusion Detection System for Wireless Sensor Networks," Sensors, 2013.
[12] M. Ahmadi Livani, "A PCA-based distributed approach for intrusion detection in wireless sensor networks," International Symposium on Computer Networks and Distributed Systems (CNDS), 2011.
[13] Shio Kumar Singh, M. P. Singh, and D. K. Singh, "Intrusion Detection Based Security Solution for Cluster-Based Wireless Sensor Networks," International Journal of Advanced Science and Technology, Vol. 30, May 2011.
[14] Shun-Sheng Wang, "An Integrated Intrusion Detection System for Cluster-based Wireless Sensor Networks," Elsevier.
[15] Ali Ahmadian Ramaki et al., "Enhancement Intrusion Detection using Alert Correlation in Co-operative Intrusion Detection Systems," Journal of Basic and Applied Scientific Research, 2013.
[16] F. Valeur, G. Vigna, C. Kruegel, and R. A. Kemmerer, "A Comprehensive Approach to Intrusion Detection Alert Correlation," IEEE Transactions on Dependable and Secure Computing, Vol. 1, No. 3, pp. 146-169, July 2004.
[17] T. Pietraszek, "Using Adaptive Alert Classification to Reduce False Positives in Intrusion Detection," Proceedings of the 7th International Symposium on Recent Advances in Intrusion Detection (RAID 2004), pp. 102-124, Sophia Antipolis, France, 2004.

About the Authors

Pritee Parwekar is pursuing her PhD in Computer Science and Engg. from GITAM University, Vishakapatnam. Currently she is working with the Dept. of CSE at ANITS, Vishakapatnam. She has more than 15 years of teaching experience. Her research areas are Sensor Networks, Cloud Computing and IoT. She is a Life Member of CSI. She has reviewed many papers for Springer and IEEE and has published more than 15 papers with reputed publishers such as IEEE and Springer.

Dr. Suresh Chandra Satapathy holds a PhD in CSE from JNTU, Hyderabad. He is currently working as Prof. and Head, Dept. of CSE, ANITS, Vishakapatnam. He has more than 100 publications in international journals and conferences. He is an editorial board member of several proceedings with Springer. Currently he is guiding 8 scholars for PhD. He holds the Chairman, Div-V (Education & Research) position in CSI and is also a senior member of IEEE. His research interests are Data Mining, Machine Intelligence, Swarm Intelligence and Soft Computing.


Introduction
Cryptography
Cryptography means converting data into a secret message (encryption) and then reverting the encrypted message back to the original data (decryption). The secret message thus generated is called a cipher, and it is very important for the secrecy and confidentiality of communication between the sender and the receiver.

Cryptography Features
Since cryptography is basically a security system, we want it to provide a variety of features or functions which ensure the secrecy and confidentiality of the data.

Authentication: Authentication means that the identities of both the receiver and the sender should be verified in order to maintain the authenticity of the data.

Secrecy or Confidentiality: By secrecy we mean that only authenticated people should be able to encrypt or decrypt the data. This maintains the confidentiality of our data, thereby making it secure.

Integrity: During encryption or decryption, we want our data to be free from any form of modification; the data should be received exactly as it was sent. This feature is called integrity. A basic form of integrity is the packet checksum in IPv4 packets.

Non-Repudiation: This means that neither the sender nor the receiver can deny having sent or received the message, thereby keeping the process free of any false claims.

Service Reliability and Availability: Even secure systems get attacked by hackers, which hampers the availability of the security services offered to customers. We should ensure that the service provided is exactly what users expect of a security system.

Encryption
Encryption is the process of making information hidden or secret; it is considered a subset of cryptography. Specifically, encryption is the process of converting plaintext into ciphertext. Ciphertext is a coded form of the data which appears meaningless and useless, but which on decryption yields the original plaintext. So, in a nutshell, encryption is a process of converting meaningful data into apparently meaningless data.

Decryption
The encrypted data is of no use to the user until we convert it back to a meaningful form. The conversion of encrypted data (ciphertext) into useful, meaningful data is termed decryption. Decryption is the opposite process of encryption.

Different encryption methods
There are three basic encryption methods:

• Hashing:
Hashing produces a unique, fixed-length code, called a hash, for each data text. Since the hash is different for each text message, it is very easy to detect small changes. Once hashing is applied to data, it cannot be decrypted back. So hashing is not technically an encryption operation, but it can be used as a method to verify that data has not been tampered with.
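A short illustration of this tamper-check use of hashing, using Python's standard hashlib module (SHA-256 is chosen here only as an example algorithm; the messages are made up):

```python
import hashlib

def digest(data: bytes) -> str:
    """Return a fixed-length SHA-256 hash (hex) of the data."""
    return hashlib.sha256(data).hexdigest()

# The hash cannot be reversed to recover the message, but comparing
# digests reveals even a one-character modification.
sent = digest(b"transfer 100 to alice")
received = digest(b"transfer 900 to alice")   # tampered in transit
print(sent == received)                       # False: tampering detected
```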

• Symmetric methods:
Symmetric encryption, or private-key cryptography, is an encryption process where the same key is known to both sender and receiver. The key used to encrypt and decrypt the message must therefore remain secure, otherwise anyone with knowledge of the key would be able to access the data. Encryption takes place with one key, the encrypted data is sent, and decryption takes place with the same key.

a. Block Cipher
• It works on fixed-size groups of bits. A block cipher applies an unvarying transformation that is specified by a symmetric key.
• Block ciphers are very useful in designing and creating many other cryptographic rules or protocols.
• Whenever we need to encrypt data in bulk, a block cipher is used.[3]

b. Stream Cipher
• A stream cipher is a symmetric ciphering method in which we combine the plaintext digits with a pseudorandom key stream.
• The concept followed in a stream cipher is encryption of the data bit by bit by the key stream to form the ciphertext.
• The encryption of each digit depends upon the current state of the cipher.
• In practice, a digit is typically a bit and the combining operation an exclusive-or (XOR).[2]

• Asymmetric methods:
Asymmetric encryption is also called public-key cryptography. It is unlike the previous methods in that it uses two keys, one for the sender and

A Novel Approach to Secure Data Transmission using Logic Gates

Research Front

Rohit Rastogi*, Rishabh Mishra**, Sanyukta Sharma**, Pratyush Arya** and Anshika Nigam***
*Sr. Asst. Professor, CSE Dept., ABES Engg. College, Ghaziabad (U.P.)
**B.Tech. (CSE), Second Year, CSE Dept., ABES Engineering College, Ghaziabad (U.P.)
***B.Tech. (IT), Second Year, IT Dept., ABES Engineering College, Ghaziabad (U.P.)

Abstract: Encryption and decryption processes are carried out here using logic gates. We first generate a key using the concept of cellular automation, which converts our readable data into unreadable ciphertext. The very same key is then used to decrypt the data. For encryption, we follow cellular automation by passing the key through a series of multiplexers (8x1 and 2x1 MUXs) in order to create randomness. We have also used a feedback network to create even more randomness, feeding the once-used key back into the combination in order to avoid exhaustion of keys. This process is relatively cheap and easy to implement, and more complex algorithms can be built on this basic method.


another for the receiver. Therefore, it has the potential to be more secure. In this method, a public key is made readily available to everyone and can be used to encrypt messages, while a private key, kept secret by the receiver, is used to decrypt the message.

Aim of this Paper
We are focusing on data encryption and decryption using 74xx logic gates.
Analysis of Cryptography Using Logic Gates:

• Nowadays, for secure communication, we need a system that can transfer data from sender to receiver safely without any manipulation. Thus, encoding and decoding come into the picture.
• In this paper, we encrypt and decrypt data using logic gates.
• For encryption, an initial key is needed. That key is generated using the concept of cellular automata.
• Cellular automata, or more specifically rule 30 of cellular automata, will help us generate a random initial key.
• A cellular automaton is basically a one-dimensional collection of states (0 or 1), and the value of the next state depends on that of the previous state, calculated using a fixed rule (Rule 30).

Rule 30 of Cellular Automata
• It deals with finding the state of the ith cell in the next generation by making use of the states of the ith, (i-1)th and (i+1)th cells in the current generation.
• Consider this: we have 3 cells (the ith cell and its two neighbours), and each cell can be in state 0 or 1. So in total we have 8 possible combinations.
• Thus, rule 30 of cellular automata gives us a way to design the truth table for the encryption system.
• Example: consider the three inputs 0 1 0. Let 0 be the (i-1)th element, 1 be the ith element and 0 be the (i+1)th. The output of these three inputs is found as (where + denotes OR):
Output = [(i-1)th XOR ((ith) + (i+1)th)]
Hence, for the above example,
Output = (0 XOR (1 OR 0)) = (0 XOR 1) = 1
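The rule can be sketched in a few lines of Python. The wrap-around (ring) neighborhood and the function names here are our illustrative assumptions; the paper's circuit realizes this with an 8-bit register and MUXes:

```python
# One Rule 30 step: each cell becomes (left XOR (center OR right)).
def rule30_step(cells):
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n]) for i in range(n)]

def generate_key(seed, rounds):
    """Iterate Rule 30 to derive a pseudorandom key from an initial 8-bit state."""
    state = list(seed)
    for _ in range(rounds):
        state = rule30_step(state)
    return state

# Worked example from the text: neighborhood (0, 1, 0) -> 0 XOR (1 OR 0) = 1.
```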

Truth Table
Generation of Key
Internal Circuitry:
Working of Each Component During Key Generation

• 2:1 Multiplexer:
1. The key that is generated by rule 30 is fed to the 2:1 MUX bit by bit.
2. The 2:1 MUX has one selection line (sel). For sel=1, we get an output bit; for sel=0, the previous output is fed back as input.
3. We are using eight 2:1 MUXes, so we get 8 outputs, one corresponding to each bit.

• Shift registers:
1. The role of the shift registers is basically to provide a delay time.
2. They also shift, or pass on, the digits bit by bit.
3. We use two 4-bit shift registers, because this gives lower complexity compared to one 8-bit shift register.

• 8:1 Multiplexer:
1. We have used eight 8:1 MUXes in the circuitry because each MUX gives 1 output, and in total we need 8 outputs.
2. The output received from the 2:1 MUXes via the shift registers acts as the selection lines for the 8:1 MUXes. The input that is fed is the original key that was fed to the 2:1 MUXes.
3. If the outputs obtained from the 2:1 MUXes are A, B, C, D, E, F, G and H, then the selection lines are taken from A to H, three at a time in circular order.
4. Thus, there are in total 8 selection-line triples (ABC, BCD, CDE, DEF, EFG, FGH, GHA, HAB), one for each 8:1 MUX.
5. The output is fed to two 4-bit shift registers that shift these bits out as the output.
6. The final output after the shift registers acts as the new key that is used in the encryption process.

Working after Key Generation
After the new key is stored in the shift registers, three processes take place:
• Encryption Process
• Feedback Process
• Decryption Process

Encryption Process:
In the encryption process, the new key is XORed with the data entered by the user through the shift registers. This
Fig. 1: Internal Circuitry for Key Generation[1]
Fig. 2: Main Working of the Process[1]
Table 1: Key formation using cellular automation[1]


XOR operation creates a secret text known as the CIPHERTEXT. The ciphertext is stored and shifted using shift registers. Suppose our initial key was 00011110, the key obtained from the next-state register was 00110001, and the data is 11010010. Then the ciphertext is:
    00110001 (final key)
XOR 11010010 (data)
  = 11100011 (ciphertext)

Feedback Process:
In the feedback process, we feed the key used for encryption back to the 2:1 MUX via the two 4-bit shift registers for further use. This feedback strengthens our encryption because a key, once used, is recycled into new keys, making the key stream more random and more difficult for an attacker to guess.

Decryption Process
In the decryption process, the ciphertext received by the receiver is XORed again with the key to obtain the originally sent data.
• Working Example:
Suppose the data is 10010011 and the key is 11100011. The ciphertext created at the sender's end is 01110000. After decryption, in which we XOR the ciphertext bit by bit with the key, 10010011, the original data, is obtained again.[1]
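Both worked examples reduce to a single XOR over equal-length bit strings, which can be sketched as follows (the helper name is ours, not the paper's):

```python
def xor_bits(a, b):
    """XOR two equal-length bit strings; used for both encryption and decryption."""
    assert len(a) == len(b), "key and data must have the same length"
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

cipher = xor_bits("11100011", "10010011")   # sender: key XOR data -> "01110000"
plain = xor_bits("11100011", cipher)        # receiver: key XOR cipher -> "10010011"
```

Because XOR is its own inverse, the same routine serves both ends of the link, which is why the hardware reuses the same XOR stage for encryption and decryption.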

Time Complexity
The overall complexity is calculated as follows:
• The total time complexity is the sum of the complexities of all components: C = C1 + C2 + C3, where C1 is the time complexity of key generation, C2 that of the encryption process, and C3 that of the decryption process.
• C1 = (time complexity of shifting the data + data selection by the 8:1 MUXes); C2 = (time complexity of the shift register + XORing the bits); C3 = (time complexity of the shift register + XORing the bits).
• If there are n bits in the data, an n-bit key is required.
• C = [O(n) + 2*8 O(1)] + [O(n) + 8 O(1)] + [O(n) + 8 O(1)] = 3 O(n) + 32 O(1) = O(n).
• So the whole process has linear time complexity and is a P-time algorithm.

Time Delay
• The total time delay is the sum of the time delays of all components: T = T1 + T2 + T3, where T1 is the time delay of key generation, T2 that of the encryption process, and T3 that of the decryption process.
• T1 = (time delay of the shift registers + 8:1 MUXes); T2 = time delay of (shift register + XORing the bits); T3 = time delay of (shift register + XORing the bits).

Limitations
There are some limitations in encrypting and decrypting data using logic gates:
• The length of the key and that of the entered data must be the same.
• The key must be known to both receiver and sender, making it prone to theft.
• The fact that the registers are only 8 bits wide is a big limitation.
• For real-life data, we need to enhance the capacity of the registers.

Future Scope
• Other, more complex and more secure cipher algorithms can be implemented through logic gates and their combinational and sequential circuits.
• All the components can be embedded in a 20-pin chip as a unit.
• Only the external original text/binary 8-bit input needs to be designed; as a result, we get 8 output values of text/binary as the cipher message.
• Hence, the cost of the hardware circuitry may be reduced.
• Multiple chips can also be used to scale the process depending on requirements.
• More secure logics can be implemented and circuits can be designed.

Conclusion
With this paper, we conclude that although this method is primitive, it is still advantageous because of the basic circuitry needed, which reduces the cost of building the system and makes it easily understandable and comprehensible.

Recommendation
• The whole process can be taken as an alternate method for secure data transmission.
• It is user friendly, easily understood, easy to calculate, and easy to program.
• The linear time complexity shows that its performance is good.
• The hardware resources are cheap and can be efficiently implemented.
• It is also scalable to bigger data sizes with multiple units, treating the 8-bit data as a block.

Acknowledgement
We would like to sincerely thank Ms. Upasana Sharma (Faculty, ECE) and Prof. A. K. Arora (Head of Department, Department of Electronics and Communication Engineering, ABES Engineering College, Ghaziabad) for showing us the righteous path and helping us whenever we needed it. We would also like to thank the Almighty God; it is because of Him that we are what we are today.

References
[1] http://electronicsmail.wordpress.com/2012/10/14/data-encryption-and-decryption-system-using-74xx-logic-gates/
[2] http://en.wikipedia.org/wiki/Stream_cipher
[3] http://en.wikipedia.org/wiki/Block_cipher
[4] http://natureofcode.com/book/chapter-7-cellular-automata/
[5] http://en.wikipedia.org/wiki/XOR_cipher
[6] http://stackoverflow.com/questions/1379952/why-is-xor-used-on-cryptography
[7] http://upload.wikimedia.org/wikipedia/commons/f/f8/Crypto.png

About the Author
Mr. Rohit Rastogi received his B.E. degree in Computer Science and Engineering from C.C.S. Univ., Meerut, in 2003 and the M.E. degree in Computer Science from NITTTR-Chandigarh, Punjab Univ., Chandigarh, in 2010. He is a Sr. Asst. Professor in the CSE Dept. of ABES Engineering College, Ghaziabad (U.P., India), affiliated to Gautam Buddha Tech. University and Mahamaya Tech. University (earlier Uttar Pradesh Tech. University), and is at present engaged in clustering of mixed varieties of data and attributes with real-life applications of Genetic Algorithms, Pattern Recognition and Artificial Intelligence.

An Efficient Cluster-based Multi-Keyword Search on Encrypted Cloud Data

Research Front

Rohit Handa* and Rama Krishna Challa**
*Assistant Professor, CSE Department, BUEST, Baddi, India
**Professor, CSE Department, NITTTR, Chandigarh, India

Abstract: Cloud computing involves the delivery of computing infrastructure resources as a service to end users over the internet. As an illusion of infinite resource availability is provided, organizations outsource their data to the cloud. This migration of confidential data to the cloud, however, leads to various security issues. To maintain confidentiality, cryptography is employed, which reduces the ease of searching data on the cloud. So an efficient approach for searching data on the cloud is desired. In this paper, a cluster-based multi-keyword search scheme is proposed. The privacy and security requirements proposed in the literature are also implemented. To the best of our knowledge, previous works are inefficient in declaring a search unsuccessful without performing the search over the entire dataset. Performance analysis of the proposed search scheme over synthetic data reveals that the number of comparisons required to perform a search is reduced by 80% and the time required by 70%. So the proposed search scheme outperforms other search schemes in the literature in terms of the number of comparisons and the time required to search for the desired document on the cloud.

Introduction
Computing involves the use of computer hardware and/or software to perform a desired task. Cloud computing is defined as a computing paradigm shift where computing is moved away from personal computers or an individual application server to a "cloud" of computers[1]. The benefits of cloud include rapid provisioning, low investment cost and easy access. Cloud is based on a pay-per-use model where users are charged only for the duration when the services are used. Other characteristics of cloud are broad network access, on-demand self-service, resource pooling and rapid elasticity[16].

According to the US National Institute of Standards and Technology (NIST)[2], cloud computing can be summarized as: 'A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction'.

Cloud provides an illusion of infinite storage to users at limited setup and usage cost. It permits the user to perform computationally intensive operations on the cloud, and that too at multiple disparate locations[1]. The prime requirement for cloud usage is internet availability. Due to the low cost and high speed of available internet, organizations are motivated to outsource their data to the cloud. There are a large number of cloud service providers, namely VMware, Microsoft, Google, Salesforce.com, Rackspace and Amazon[3]. Organizations can deploy a private cloud or may use a public cloud to store their data, based on the sensitivity of the data and the time and budget available to deploy the cloud[4]. As a public cloud involves less cost and time, many organizations prefer using a public cloud to setting up a private cloud.

This use of a public cloud introduces security breaches such as data leakage, data theft and reduced control over the data, as the cloud service provider can easily access the data[15]. In order to provide security, the data is encrypted before outsourcing it to the cloud. So confidentiality of the data is retained using cryptography. The use of cryptography to convert this confidential data into human-unreadable form introduces the challenge of effective searching over this data.

A naive approach to search the data is to download the entire encrypted dataset from the remote cloud server to the local machine. The entire dataset is decrypted and then the desired documents are retrieved. As end users use mobile devices or thin clients to connect to the cloud, and these devices are limited by the memory available, this approach is inefficient. So an efficient method to perform searching on this encrypted data is desired.

In this paper, the aim is to provide a cluster-based search scheme using which the desired documents can be retrieved with fewer comparisons. In the cluster-based search scheme, the entire document collection is partitioned into multiple clusters to provide efficient searching. As the number of comparisons is reduced, the average search time is also reduced. The proposed search scheme should be coherent with the other existing approaches, i.e., only authorized users are provided with the ability to search on this encrypted data, the user is able to retrieve the results without revealing the search terms to the cloud server, and neither the documents retrieved from the cloud server nor the search pattern should be revealed to the semi-trusted server.

The contributions of this paper can be summarized as follows. Firstly, we propose a cluster-based approach for multi-keyword search over encrypted cloud data. Secondly, we propose an efficient method which requires fewer comparisons and less time to declare a search unsuccessful.

The rest of this paper is organized as follows. In Section II, we discuss the related work. Section III gives the system model, security requirements and the problem formulation. The detailed description of the proposed search scheme is presented in Section IV. The need for query randomization is presented in Section V. The security analysis of the proposed search scheme is presented in Section VI, whereas the performance analysis is done in Section VII. Finally, Section VIII gives the concluding remarks of the paper.

Related Work
Dawn Xiaoding Song[5] introduced the concept of a searchable encryption scheme without loss of confidentiality. Under the proposed approach, symmetric key encryption is used to encrypt the available data. It proposed the use of non-index-based searching on the encrypted cloud data due to the lower overhead involved in searching as compared to the keyword-based approach. This method is inefficient for data of large size as it involves the use of symmetric key cryptography for security. Also, this work is linear in document size.

Mehmet Ucal[10] proposed improvements to Song's approach. It is a hybrid approach in which the keywords are encrypted using a stream cipher and the non-keywords are encrypted using a block cipher. To generate non-keywords of the desired length, padding is done. The keyword-based search scheme is integrated into the existing approach to perform a faster search operation with less overhead. The encrypted file is of small size, which provides reduced encryption time and memory overhead. But due to the small size of the file, security can be compromised.

Boneh et al.[13] modified[5] and introduced the use of Public Key Encryption with Keyword Search (PEKS), but this approach is computationally expensive due to the use of public key cryptography. Also, keyword privacy cannot be provided, as the server can easily encrypt a keyword with the public key and use the received trapdoor to evaluate the ciphertext. Goh[14] introduced the concept of searchable indexes, but the use of Bloom filters introduces false positives, which leads to the mobile user downloading more files than required.

Cao et al.[8] introduced the concept of single-keyword search over encrypted cloud data. This approach provides data security but is applicable only to single-keyword search and requires the secret parameters to be shared among the end users for trapdoor generation; hence it provides weak security.

Ning Cao et al.[6] modified[8] to support multi-keyword search over encrypted data, but this approach generates less accurate results due to randomization, and it involves large computation overhead. Also, the security provided is weak due to the distribution of a symmetric key among all end users.

Ayad Ibrahim et al.[11] proposed performing multi-keyword ranked search over encrypted cloud data using Privacy Preserving Mapping. This approach provides index security, data security, access privacy and trapdoor security. As the approach is based on Bloom filters, the number of false positives is high. Also, the storage and time overhead in constructing the index is high.

Orencik et al.[7] proposed multi-keyword search using a forward index. The proposed method is efficient compared to existing methods in the literature but still requires a large number of comparisons to retrieve the documents, and the time required to declare an unsuccessful search is high[17]. We have adopted the basic scheme from[7] and modified it to reduce the time and the number of comparisons required by using clustering.

Problem Formulation and Security Requirements
System Model
In order to provide cluster-based multi-keyword search on encrypted cloud data, there are three different entities, coherent with the previous works[6-12]:

Data owner: The data owner is the entity responsible for the data. The data owner holds the collection of encrypted documents, along with the indices, to be outsourced to the cloud. The keys used during encryption of the documents are under the control of the data owner.

Users: They are the end users interested in searching for documents stored on the cloud.

Server: It is assumed that the server is semi-trusted. The role of the server is to store the documents along with the indices generated by the data owner and to provide the search capability to the users. It is desired that the server should not learn any information from the encrypted documents and/or indices.

Privacy Requirements
The encrypted documents are stored on the server along with the cluster and document indices. The server is semi-trusted and may try to extract information from the search query and/or the retrieved results. It is desired that the server should not be able to learn any information; even the cluster and document indices should not reveal any information to the cloud server. So, in this paper, the privacy requirements of the proposed search scheme are as follows:
1. Data Privacy: Only the authorized user is able to learn the actual data retrieved from the server.
2. Index Privacy: The cluster index, document index and query index generated should not provide any relevant information about the clusters, documents and search terms, respectively, to the cloud server.
3. Trapdoor Privacy: It should not be possible for the cloud server to generate a valid trapdoor using previously generated trapdoors for some set of keywords.
4. Non-Impersonation: Only authorized users are able to perform the desired search. Under the current authentication system, it should not be possible to impersonate an authorized user.

Design Goals
In this paper, we propose a cluster-based approach for multi-keyword search on encrypted cloud data. The goals of the proposed search scheme are: (i) to retrieve the relevant documents corresponding to the search query efficiently, by reducing the number of comparisons and the time required; (ii) to declare a search unsuccessful efficiently, by performing fewer comparisons in the minimum possible time; (iii) to validate the security of the proposed search scheme; and (iv) to evaluate the performance of the proposed search scheme by conducting experiments on synthetic data.

Stages of the Proposed Search Scheme
Figure 1 depicts the architecture of the proposed search scheme[18]. The overall search process is performed in two stages:
• Offline Stage: In the offline stage, the data owner is responsible for the generation of secure indices. The data owner extracts the keywords from the documents and generates the searchable index for each document. Based upon the similarity of the keywords, clusters are generated, and for each cluster a cluster index is also generated. The data owner uploads the cluster indices, the document indices and the encrypted documents to the cloud server. The secrecy of the keys used during the offline stage is the responsibility of the data owner.
• Online Stage: During this stage, any authorized user can perform multi-keyword search on the encrypted cloud data. As shown in step 1 of Fig. 1, the authorized user requests the data owner to provide the security parameters required to generate the desired search query. In step 2, the user sends the search query, generated using the security parameters received in step 1, to the cloud server, which performs the desired search. The metadata corresponding to the retrieved documents is returned to the user in step 3. During step 4, the user analyzes the retrieved metadata corresponding to the relevant documents and requests the data owner for the symmetric key corresponding to the selected document. Using the symmetric key received from the data owner, the user can recover the plain text corresponding to the encrypted document.

Proposed Cluster-Based Multi-Keyword Search on Encrypted Cloud Data
The proposed search scheme includes seven steps that can be classified into three phases, namely, the cluster generation, indexing and retrieval phases. The indexing phase includes document index generation, cluster index generation and document encryption. The retrieval phase includes query generation, document searching and decryption.

Cluster Generation Phase
Initially, keywords are extracted from each document. Based on the similarity of the keywords extracted from each document, the documents are partitioned into multiple clusters. So, the overall purpose of this phase is to generate the desired number of clusters. As an example, consider an organization willing to outsource its confidential data to the cloud: the documents can be clustered based on categories such as finance, inventory and personnel.

Indexing Phase
i. Document Index Generation: The keywords extracted from each document in the previous phase are used to generate the document index. For each keyword wi appearing within the document, the secret key for HMAC (hash-based message authentication code) is generated using a hash function (for example MD-5, SHA-1 or SHA-2). The hash value calculated on the given keyword is sent to the data owner. The data owner retrieves the secret key corresponding to the hash value. The secret key is shared with the end user using a public key encryption scheme. For keywords generating the same hash value, the secret key is retrieved only once from the data owner.

Upon receiving the secret key, the HMAC of the keyword is calculated to generate the hexadecimal index. This hexadecimal value is converted to its binary equivalent, which is then reduced in length. This reduction of the index involves dividing the binary string into smaller substrings of equal length. If all the bits in a substring are zero, the output value for that substring is 0; if any one of the bits is 1, the output is 1. The reduction step is shown in Fig. 2.

The final index is obtained by taking the bitwise product of the indices obtained for each keyword, as shown in Fig. 3.

The document index generation steps can be summarized using Algorithm-1. Here, the hash() function accepts a search term as input and generates the hash of the given keyword. The HMAC() function calculates the HMAC using the keyword and secret key as input. The Reduce() function converts the hexadecimal output to a binary string of the required length. The Bitwise product() function calculates the bitwise product of all the indices generated.

Algorithm-1: Document Index Generation
Input: F: the document collection
for each document Fi ∈ F do
    for each keyword wi ∈ Fi do
        secret_index ← hash(wi)
        retrieve the secret_key corresponding to the secret_index from the data owner
        index ← HMAC(wi, secret_key)
        Ii ← Reduce(index)
    end for
    Document Index I ← Bitwise product(Ii)
end for
return Document Index I
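As a concrete illustration, the indexing steps above can be sketched in Python. This is a sketch only: the paper does not fully specify the digest concatenation or the key-retrieval protocol, and the four real SHA-2 digests concatenate to 1376 bits rather than the 2688 bits quoted later in the paper, so the index length here is illustrative. The reduction rule and the "bitwise product" are taken from the text.

```python
import hmac

REDUCTION_FACTOR = 6  # one output bit per 6 digest bits, as in the paper


def keyword_index(keyword, secret_key):
    # HMAC the keyword with the SHA-2 family and concatenate the digests
    # (illustrative; the paper's exact 2688-bit construction is unspecified).
    digest = b"".join(
        hmac.new(secret_key, keyword.encode(), algo).digest()
        for algo in ("sha224", "sha256", "sha384", "sha512"))
    bits = "".join(format(byte, "08b") for byte in digest)
    # Reduce(): a substring collapses to 0 only if all of its bits are 0.
    return [0 if set(bits[i:i + REDUCTION_FACTOR]) == {"0"} else 1
            for i in range(0, len(bits), REDUCTION_FACTOR)]


def bitwise_product(indices):
    # The paper's "bitwise product" is a bitwise AND across indices.
    result = list(indices[0])
    for index in indices[1:]:
        result = [a & b for a, b in zip(result, index)]
    return result


def document_index(keywords, secret_keys):
    # Algorithm-1: HMAC -> Reduce per keyword, then bitwise product.
    return bitwise_product(
        [keyword_index(w, secret_keys[w]) for w in keywords])
```

Because the document index is the AND of its keyword indices, every 0 position of a contained keyword's index is also 0 in the document index, which is exactly the matching condition used later in the retrieval phase.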

ii. Cluster Index Generation: During the cluster generation phase, the documents are partitioned into multiple clusters. These documents are used to generate the cluster index, which is computed as the bitwise product of the indices of all the documents appearing within a cluster.

iii. Document Encryption: To provide confidentiality, the data owner uses symmetric key cryptography to encrypt the documents. Depending on the choice of the data owner, any symmetric key encryption algorithm can be used. The secret keys used during encryption are kept confidential by the data owner, and for enhanced security, different keys are used for different documents. Symmetric key cryptography is preferred as it can handle large data and is fast.

Retrieval Phase
i. Query Generation: The working of the query generation method, as given in Algorithm-2, is as follows. The authorized user willing to perform a search on the encrypted cloud data calculates the hash value for each search term. Based on the hash value generated, the secret key corresponding to each search term is obtained from the data owner. Using the received secret keys, the HMAC is calculated, and a process similar to document index generation is used to generate the search query.

Algorithm-2: Query Index Generation
Input: {k1, k2, ..., kn}: set of keywords
for each keyword ki do
    secret_index ← hash(ki)
    if (secret_key corresponding to the secret_index not previously received)
        retrieve the secret_key corresponding to the secret_index from the data owner
    end if
    index ← HMAC(ki, secret_key)
    Ii ← Reduce(index)
end for
Query Index Q ← Bitwise product(Ii)
return Query Index Q

Fig. 1: Architecture of the proposed search scheme
Fig. 2: Reduction of hash output
Fig. 3: Final index calculation using bitwise product:
    Index (Keyword1) = 111……10
    Index (Keyword2) = 101……11
    Final Index      = 101……10

ii. Document Searching on the Cloud Server: Upon receiving the query string, the cloud server selects the appropriate cluster by comparing the query string with each cluster index. The comparison is made by comparing the bit positions with value 0 in the query index with the corresponding bit positions in the cluster index. If both values are zero, the matching process continues; otherwise it is treated as a mismatch. As the search is conjunctive, only a cluster containing all the search keywords is selected; a cluster with only a partial match of the search terms is not selected. Only a cluster with a 100% match of zeros with the query string is selected. The cluster selection process is represented in Algorithm-3.

After a cluster is selected, only the documents within that cluster are searched for the desired keywords. Similar to the cluster selection process, the zero values in the query string are compared with the document index. If all the bits with value 0 match the corresponding bit positions in the document index, the document is selected as a relevant response to the search query.

Algorithm-3: Cluster Selection
Input: Query String Q
for each cluster index Ii do
    if for all the bits j with Qj = 0, the value of Ii at position j is also 0
        return cluster i
    end if
end for
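The zero-matching rule of Algorithm-3 and the subsequent document comparison can be sketched as follows, using toy 4-bit indices. Note one deliberate difference: Algorithm-3 returns the first matching cluster, whereas this sketch checks every cluster index, which also covers the soft-clustering variant mentioned in the performance analysis.

```python
def matches(query, index):
    # Conjunctive match: every 0 bit in the query must also be 0 in the index.
    return all(bit == 0 for q, bit in zip(query, index) if q == 0)


def search(query, clusters):
    # clusters: list of (cluster_index, [(doc_id, doc_index), ...]) pairs.
    # Documents are compared only inside clusters whose index matches,
    # which is what cuts the comparison count from n to about k + n/k.
    results = []
    for cluster_index, documents in clusters:
        if matches(query, cluster_index):
            results.extend(doc_id for doc_id, doc_index in documents
                           if matches(query, doc_index))
    return results


clusters = [
    ([1, 0, 0, 0], [("d1", [1, 0, 1, 0]), ("d2", [1, 0, 0, 0]),
                    ("d3", [1, 1, 1, 0])]),
    ([1, 1, 1, 1], [("d4", [1, 1, 1, 1])]),
]
# Query [1, 0, 1, 0]: cluster 2 is skipped outright; d3 fails at position 1.
print(search([1, 0, 1, 0], clusters))
```

An unsuccessful search is declared as soon as no cluster index matches, without touching any document index — the efficiency claim made in the abstract.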

iii. Document Decryption: The metadata corresponding to the documents retrieved as relevant is presented to the user. The end user analyzes the metadata and requests the cloud server for a particular encrypted document. In order to decrypt the document, the user requires the secret key, so a request is made to the data owner to provide it. Upon receiving the secret key, the document is decrypted using the same algorithm as employed by the data owner during the document encryption process.

Query Randomization
The proposed search scheme permits the user to search for the documents containing the desired keywords on the server but lacks search pattern privacy: for identical search terms, the search query generated is also identical, so the server can extract valuable information about the user's search patterns from the search query. To avoid this, random keywords are used[7]. These random keywords are added to the keyword list of each document during the document index generation step, so they are present in every document.

During the query generation phase, some keywords are selected randomly from this set and added to the search terms. As random keywords are used during query generation, the search queries generated for identical search terms are not identical. And as all the random keywords are already added to the keyword list of each document, the retrieved search results are the same as those obtained without query randomization.

Security Analysis
The privacy requirements described in Section III(B) must be achieved by the proposed cluster-based approach for multi-keyword search on encrypted cloud data. In this section, we analyze to what extent our search scheme fulfills these security requirements.

Theorem 1: Cluster-based multi-keyword search on encrypted cloud data provides data privacy, i.e., only the authenticated end user is able to learn the actual data retrieved from the server.

Proof: After performing the desired search operation on the server, metadata about the relevant documents is provided to the end user, who selects the desired documents based on it. In order to extract the contents of a document, the end user must also learn the secret key used for its encryption. The secret key is shared with the end user using public key cryptography: the data owner encrypts the secret key using the end user's public key. As the private key is kept secret by the end user, the secret key can be retrieved only by the end user. Even if an adversary learns the encrypted secret key and the encrypted documents, he is not able to extract the secret key, as the private key is secret. It is also computationally infeasible to apply a brute force technique to generate the genuine private key from the known public key.

Theorem 2: Cluster-based multi-keyword search on encrypted cloud data provides index privacy, i.e., no information about the search terms is leaked from the query index.

Proof: An adversary cannot learn the trapdoor corresponding to the search keywords, as the trapdoor in transit is encrypted using the end user's public key; it can be retrieved only by the authorized end user. Also, random keywords are added to the search terms during search query generation, so even if an adversary is able to learn the trapdoor for the search keywords and the search query, the adversary still needs to generate all the dummy keywords used in the search query. As the random keywords used during the query generation phase are kept confidential, it is infeasible for the adversary to learn the search terms even if a brute force technique is applied.

Theorem 3: Cluster-based multi-keyword search on encrypted cloud data provides trapdoor privacy, i.e., it is not possible for the cloud server to generate a valid trapdoor from a given trapdoor for a set of keywords.

Proof: Let K1 and K2 be two keywords for which the query Q is known to an adversary. To generate the search query, random keywords are also inserted. In order to perform the search, the occurrences of 0's are matched with the corresponding occurrences in the cluster and document indices. For an adversary to successfully generate a query for any keyword K1, the locations of the 0's must be known, or else the search is not possible. The probability of successfully selecting these bit positions is negligible even with a brute force technique. So, the scheme provides trapdoor privacy.
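The negligible-probability claim can be made concrete with a back-of-the-envelope calculation. The numbers below are our illustrative assumptions, not the paper's: a 448-bit index (the length used in the performance analysis) and about 7 zero positions per keyword, the expected count at reduction factor 6.

```python
from math import comb

INDEX_LENGTH = 448   # index length used in the performance analysis
ZERO_POSITIONS = 7   # assumed zeros per keyword index (~448 / 2**6)

# An adversary forging a trapdoor must guess exactly which bit positions
# hold the 0s that drive the matching process.
guess_probability = 1 / comb(INDEX_LENGTH, ZERO_POSITIONS)
print(f"probability of guessing the zero positions: {guess_probability:.2e}")
```

Even under these modest assumptions the success probability is far below what exhaustive guessing could overcome in practice.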

Theorem 4: Cluster-based multi-keyword search on encrypted cloud data provides non-impersonation, i.e., only authorized users are able to perform the desired search; no one can impersonate an authorized user.

Proof: As the entire search process is performed using public key cryptography, only authorized users with a secure private key can retrieve the trapdoor for the search terms and generate the search query. As the probability of generating the valid private key for a known public key is negligible, this method provides non-impersonation.

Performance Analysis
In this section, the performance of the proposed cluster-based approach for multi-keyword search on encrypted cloud data is presented through experiments on synthetic data. For the performance analysis, the dataset size ranges from 50 to 6000 documents, and 5 clusters are assumed. From the implementation point of view, a synthetic dataset is created and random keywords are assigned random frequencies of occurrence. The entire simulation is done using Java and MySQL on a Core 2 Duo processor with 2 GB RAM. The results for both the existing search scheme[7] and the proposed search scheme are obtained on this system and thus used for the comparison. The performance of the approach can be improved by using a machine of higher configuration and by code optimization.

In order to generate the cluster index, document index and query index, a secret key is required from the data owner. To retrieve the secret key from the data owner, MD-5 (Message Digest-5) is used to calculate the secret index, and the two least significant values are used to extract the secret key from the database. To generate the hexadecimal output for the keywords, SHA-2 (Secure Hash Algorithm) family based HMAC functions are used. The outputs of SHA-224, SHA-256, SHA-384 and SHA-512 are calculated for the keywords and concatenated to generate a binary string of length 2688 bits. This binary string is reduced to a length of 448 bits, coherent with the previous work[7]; the reduction factor for the binary output is 6, i.e., the binary string is reduced to one-sixth of its original length. The selection of the reduction factor affects the memory required to store each index, the bandwidth required for transmission of the query string over the network, the number of comparisons required to select the cluster or a document within a cluster, and the retrieved documents. If the reduction factor is small, the storage space required for the cluster and document indices and the bandwidth required for transferring the query string are high. If the reduction factor is too large, the number of false positives will be high. So an optimal value of the reduction factor is desired. The reduction factor is taken to be 6, coherent with the existing search scheme[7], so the final output generated is of length 448 bits.
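The trade-off described above can be quantified with a simple model of ours (a sketch, assuming uniformly random digest bits; the 2688-bit total is the figure given in the text):

```python
TOTAL_BITS = 2688  # concatenated digest length before reduction

# A reduced bit is 0 only when all r source bits are 0, so P(0) = 2**-r.
# Small r: long indices (storage and bandwidth cost). Large r: few zeros
# to match on, so more chance matches, i.e. more false positives.
for r in (2, 4, 6, 8):
    index_length = TOTAL_BITS // r
    expected_zeros = index_length * 2.0 ** -r
    print(f"r={r}: {index_length:4d}-bit index, "
          f"~{expected_zeros:.1f} zeros per keyword index")
```

At r = 6 this reproduces the paper's 448-bit index, with about 7 zero positions per keyword available for matching.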

The computation cost of the proposed search scheme is presented in Section VII(A). Initially, the document collection is assumed to be equally divided among the clusters, so the performance analysis of the proposed search scheme under uniform document distribution is done in Section VII(B). As the document collection can be non-uniform, the performance analysis of the proposed search scheme under non-uniform distribution of documents is presented in Section VII(C). The comparison of the proposed search scheme and the existing search scheme in declaring a search unsuccessful is presented in Section VII(D). Section VII(E) compares the proposed search scheme using hard clustering, the proposed search scheme using soft clustering and the existing search scheme[7].

Computation Cost of the Proposed Search Scheme
The computation cost of the proposed search scheme involves an additional step of cluster index generation as compared to the existing efficient search scheme[7]. The cluster index generation is performed only once, during the initial stages, for a small number of clusters, and on powerful machines during the offline stage. So, the overall additional cost incurred by this step is small.

Performance analysis assuming the documents are equally divided among the clusters
For the purpose of the initial performance analysis, it is assumed that the documents are equally divided among the clusters, so each cluster includes an equal number of documents.

Number of comparisons required to perform a search
The proposed search scheme performs the desired search by comparing the generated query string with the cluster indices. After selecting the desired cluster, the query index is compared with the documents within the selected cluster. Compared to the existing search scheme[7], which compares the query string with the indices of all the documents, the proposed search scheme reduces the number of comparisons required. Figure 4 shows the number of comparisons required by the existing search scheme and the proposed search scheme. It can be inferred that the number of comparisons required is reduced by 80% for 6000 documents.

Theorem 5: Cluster-based multi-keyword search on encrypted cloud data reduces the number of comparisons required to perform a search by an order of k, where k is the number of clusters.

Proof: The documents are divided into k clusters, and each cluster index is generated as the bitwise product of the indices of all the documents within the cluster. The search reduces to initially comparing the search query with the cluster indices; once the appropriate cluster is selected, only the documents within that cluster are searched to retrieve the relevant documents. Let n be the number of documents and k be the number of clusters generated. If the distribution of documents within the clusters is uniform, then the number of comparisons required by the proposed search scheme is k + n/k, which is significantly less than n for a large document collection.

Average time required to perform a search

As the query string is compared first with the cluster index and then with the document indices within the selected cluster, the proposed search scheme requires fewer comparisons than the existing approach. The search time depends on the number of zeros in the generated search query: as the number of keywords increases, the number of zeros in the search query also increases. As soon as a mismatch is encountered between the search index and a document or cluster index, further string matching with that index terminates. Figure 5 (a-e) represents the average search time required to search the desired document while varying the number of keywords in the search query.

Fig. 4: No. of Comparisons required

CSI Communications | June 2015 | 25

Fig. 5 (c): Average search time for three keyword search
Fig. 5 (d): Average search time for four keyword search
Fig. 5 (e): Average search time for five keyword search

In Fig. 6, the average time required to search any keyword is shown. It can be inferred that the time required by the proposed search scheme to search a document is 70% less than that of the existing search scheme[7] for a dataset with 6000 documents.

Fig. 6: Average search time

Performance analysis assuming the documents are unequally divided between the clusters

As the document collection is dynamic, the documents may be unequally distributed among the clusters. The performance of the proposed search scheme, as opposed to the existing search scheme[7], is evaluated in terms of the average search time required to find the desired document.

Average time required to perform a search

Figure 7 (a-e) depicts the search time required to search keywords on the cloud assuming the documents are unequally distributed among multiple clusters. Because the distribution is unequal, the search time varies depending on the number of documents within the selected cluster. The comparison reveals that the cluster-based approach requires less time to retrieve relevant documents than the existing search scheme[7].

Fig. 7 (a): Average search time for single keyword search
Fig. 7 (b): Average search time for two keyword search
Fig. 7 (c): Average search time for three keyword search
Fig. 7 (d): Average search time for four keyword search
Fig. 7 (e): Average search time for five keyword search
Fig. 5 (a): Average search time for single keyword search
Fig. 5 (b): Average search time for two keyword search


Figure 8 depicts the average time required to search relevant documents on the cloud assuming a non-uniform distribution of documents.

Fig. 8: Average search time

Performance analysis for unsuccessful search

Number of comparisons to declare an unsuccessful search

As the search is conjunctive, a search is declared unsuccessful if any one of the search terms is not present in the entire document collection. Since the cluster index is generated using all the keywords present within the documents of the cluster, the absence of a keyword can be discovered by comparing the query index with the cluster indices alone. As inferred from Fig. 9, only a few comparisons are required to check whether the search is unsuccessful. In the proposed search scheme only 5 comparisons (assuming 5 clusters) are required to declare the search unsuccessful, whereas the existing search scheme requires a number of comparisons equal to the number of documents.

Fig. 9: No. of comparisons required for declaring an unsuccessful search

Theorem 6: Cluster-based multi-keyword search on encrypted cloud data reduces the number of comparisons required to declare an unsuccessful search.

Proof: The documents are divided into k clusters, where the cluster index is generated using the bitwise product of the indices of all the documents within the cluster. So, a search is initially restricted to comparing the search query with the cluster indices. If a match is found with a cluster index, the documents within that cluster are searched. If there is no possible match between the query string and any cluster index, the search is declared unsuccessful. Let n be the number of documents and k be the number of clusters generated. Declaring a search unsuccessful requires only the comparisons between the search index and the cluster indices. Hence, it requires only k comparisons, where k << n.

Average time required for declaring a search as unsuccessful

To declare a search unsuccessful, only the query string is compared with the cluster indices. As only a few clusters are generated, the number of comparisons required, and hence the time required, is significantly reduced. Figure 10 depicts the time required to declare a search unsuccessful. In the proposed search scheme the time required is 0.07 ms for the entire document collection, which is the time needed to compare the search query with the cluster indices. In the existing search scheme[7] the number of comparisons is proportional to the number of documents, so the time required is high.

Fig. 10: Average time required for declaring a search as unsuccessful

Performance analysis assuming soft clustering

The documents are initially clustered into multiple clusters depending on the similarity of the keywords. A document may belong to a single cluster or to multiple clusters. In hard clustering, a document appears in a single cluster, whereas in soft clustering a document may appear in multiple clusters. If a document appears in multiple clusters, the search is performed by selecting each such cluster and then searching within it. As multiple clusters are selected, the time required is greater than with the hard clustering approach but still significantly lower than with the existing search scheme, as shown in Fig. 11.

Fig. 11: Average search time required using different clustering methods

Conclusions

In this paper, we have proposed a cluster-based approach for multi-keyword search on encrypted cloud data. The proposed scheme permits the user to efficiently perform search over the encrypted cloud data. To do so, the data owner generates the cluster index and document index. The documents are encrypted and outsourced to the cloud. After performing experiments using synthetic data, the performance of the proposed scheme is analyzed as follows: (i) the proposed search scheme reduces the time and number of comparisons required to retrieve the desired documents considering both equal and unequal distribution of documents within the clusters; (ii) the proposed search scheme addresses the issue of unsuccessful search with a significant reduction in the time and comparisons required; (iii) the proposed search scheme requires less time than the existing efficient search scheme even if documents appear in multiple clusters; (iv) through security analysis, we show that our proposed search scheme is secure and preserves privacy.


Data Science - Venn diagram

The primary colors of data: Hacking Skills, Math and Stats Knowledge, and Substantive Expertise.

[Taken from http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram with permission from the owner, Dr. Drew Conway. Dr. Drew Conway, Head of Data at Project Florida, is a leading expert in the application of computational methods to social and behavioral problems at large scale.]

Following this line of research, we suggest as future work (i) testing the performance of the proposed scheme on a real data set, and (ii) finding a disjunctive keyword search scheme.

References
[1] “Cloud Computing,” http://en.wikipedia.org/wiki/Cloud_computing.
[2] “The NIST Definition of Cloud Computing,” http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf.
[3] “Top-10 cloud service providers,” http://searchcloudcomputing.techtarget.com/photostory/2240149038/Top-10-cloud-providers-of-2012/1/Introduction.
[4] Morgan et al., “Factors affecting the adoption of cloud computing: an exploratory study,” available at http://www.staff.science.uu.nl/~vlaan107/ecis/files/ECIS2013-0710-paper.pdf.

[5] D. Song et al., “Practical techniques for searches on encrypted data,” in Proc. of IEEE Symp. on Security and Privacy ’00, Berkeley, CA, pp. 44–55, 2000.
[6] Ning Cao et al., “Privacy-preserving multi-keyword ranked search over encrypted cloud data,” in IEEE Transactions on Parallel and Distributed Systems, pp. 222–233, 2014.
[7] Cengiz Orencik and Erkay Savas, “An efficient privacy-preserving multi-keyword search over encrypted cloud data with ranking,” in Springer Distributed and Parallel Databases, pp. 119–160, 2014.
[8] Ning Cao et al., “Secure ranked keyword search over encrypted cloud data,” in IEEE Proc. of Int. Conf. on Distributed Computing Systems (ICDCS), Genoa, Italy, pp. 253–262, 2010.
[9] Y.-C. Chang and M. Mitzenmacher, “Privacy preserving keyword searches on remote encrypted data,” in Proc. of Third Springer Int. Conf. on Applied Cryptography and Network Security (ACNS), New York, USA, pp. 442–455, 2005.

[10] Mehmet Ucal, “Searching on encrypted data,” http://www.researchgate.net/publication/228757457_Searching_on_Encrypted_Data.
[11] Ayad Ibrahim et al., “Secure rank-ordered search of multi-keyword trapdoor over encrypted cloud data,” in IEEE Asia-Pacific Services Computing Conference (APSCC), Guilin, pp. 263–270, 2012.
[12] Ning Cao et al., “Enabling efficient fuzzy keyword search over encrypted data in cloud computing,” in IEEE Transactions on Parallel and Distributed Systems, pp. 1467–1479, 2010.
[13] D. Boneh et al., “Public key encryption with keyword search,” in Proc. of Eurocrypt ’04, volume 3027 of Springer LNCS, pp. 506–522, 2004.
[14] E. Goh, “Secure indexes,” in Cryptology ePrint Archive, Report 2003/216, http://eprint.iacr.org/.

[15] Anu Khurana, C. Rama Krishna and Navdeep Kaur, “Searching over encrypted cloud data,” in Int. Conf. on Communications & Electronics, 2013.
[16] Neelam S. Khan, C. Rama Krishna and Anu Khurana, “Secure ranked fuzzy multi-keyword search over outsourced encrypted cloud data,” in 5th Int. Conf. on Computer and Communication Technology (ICCCT 2014), Allahabad, India.
[17] Rohit Handa and Rama Krishna Challa, “A survey on searching techniques over outsourced encrypted cloud data,” in 8th Int. Conf. on Advanced Computing and Communication Technologies, Panipat, India, pp. 128–137, 2014.
[18] Rohit Handa and Rama Krishna Challa, “A cluster-based multi-keyword search on outsourced encrypted cloud data,” in 2nd Int. Conf. on Computing for Sustainable Global Development, India, pp. 3.87–3.92, 2015.

About the Authors

Rohit Handa is an Assistant Professor at Baddi University of Emerging Sciences & Technology, Baddi, H.P. (India). He received his B.Tech. degree in CSE from M.M.E.C. Mullana (Kurukshetra University) and M.E. degree in CSE from National Institute of Technical Teachers Training & Research, Chandigarh (Panjab University). His areas of interest include Cryptography, Cloud Computing and Programming.

Rama Krishna Challa is a Professor in the CSE Department at National Institute of Technical Teachers Training & Research, Chandigarh (India). He received his B.Tech. from JNTU Govt. College of Engg., Anantapur and M.Tech. from CUSAT, Cochin. He received his Ph.D. from IIT Kharagpur. His research areas include Wireless Communications & Networks, Computer Networks, Distributed Computing, Cryptography & Cyber Security.


Introduction

A mobile ad hoc network (MANET) is formed by nodes in the absence of a rigid infrastructure, where all nodes move randomly and organize themselves. In a MANET, every node acts not only as a host but also as a router. In an infrastructure mobile network, nodes have base stations within their transmission range[2]. In contrast, mobile ad hoc networks are self-organizing and devoid of infrastructure support. Low-cost and powerful wireless transceivers are popularly used in mobile applications owing to the progress of wireless communication technology. Due to the absence of fixed infrastructure, the network topology in a MANET changes when nodes move in or out of the network[1]. As a result, routing protocols need to adaptively adjust routes based on the available nodes. The resources owned and controlled by a node are said to be local to it, while the resources owned and controlled by other nodes, and those that can only be accessed through the network, are said to be remote. External attackers can inject false routing information or advertise incorrect routing table information to break down the network[4]. Compromised-node attackers are able to generate valid signatures using their private keys[5]; they are difficult to detect and can cause severe damage in the network. Because such attackers hold valid private keys, intrusion-preventive measures such as authentication and encryption cannot reduce their effect. Moreover, the wireless channel is accessible to malicious attackers as well as legitimate users; hence it is more vulnerable to all kinds of network

attacks. One conspicuous characteristic of MANETs, from the security point of view, is the lack of protection. MANET is a promising technology, but it has certain features that are considered vulnerable and that lead to security weaknesses, such as weak centralized management, limited resource availability, scalability concerns, dynamic topology, limited power supply, etc. In a MANET, all networking operations, such as routing messages and forwarding packets, are performed by the nodes themselves in a decentralized manner. For these reasons, providing security to a mobile ad hoc network is a very difficult task[10]. To prevent and detect attacks like the black hole, wormhole and rushing attacks, and to secure the communication among the nodes of a wireless ad hoc network, many intrusion detection techniques have been introduced. They can be classified into three main categories: signature-based, anomaly-based and specification-based intrusion detection[3].

In practice, attacks can be broadly grouped into application-level attacks and network-level attacks. An application-level attacker tries to steal, alter or deny access to the information of a particular application, whereas a network-level attacker attempts to restrict the capabilities of the network, reduce its speed or stop it completely. A network-level attack in turn leads to application-level attacks[6].

The rest of the paper is structured as follows: Section 2 discusses the issues of maintaining a stable neighborhood topology and route maintenance in AODV. The proposed system for malicious node detection is discussed in Section 3. Section 4 presents and discusses the results obtained. Finally, Section 5 draws conclusions and outlines future work.

Issues in Maintaining a Stable Topology and Route Maintenance in AODV

The issue in topology management is to control the movement of an individual node so as to maintain a stable neighborhood topology[8]. Consider the nodes n0, n1, n2, ..., nN. Let Rmax be the maximum range of the nodes and D(0,1) be the relative distance between nodes n0 and n1. Two nodes are called neighbor nodes if they can communicate with each other without the help of any routing, and the network topology is maintained if D(0,1) ≤ Rmax for all nodes. Consider two generic nodes n0 and n1, located at positions (Xn0(t), Yn0(t)) and (Xn1(t), Yn1(t)) at time t. The distance between the two nodes at time t is

D{n0,n1}(t) = √((Xn0(t) - Xn1(t))² + (Yn0(t) - Yn1(t))²)   (1)
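Equation 1 is the standard Euclidean distance between the node positions; a minimal sketch of the neighborhood test follows (the coordinates and radio range are made-up values):

```python
import math

def distance(p0, p1):
    # Eq. 1: Euclidean distance between two (x, y) positions at time t
    return math.hypot(p0[0] - p1[0], p0[1] - p1[1])

def are_neighbors(p0, p1, r_max):
    # two nodes can communicate directly iff their distance is within range
    return distance(p0, p1) <= r_max

print(are_neighbors((0, 0), (60, 80), 100))  # distance is 100 -> True
```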

A communication link between n0 and n1 exists at time t if D{n0,n1}(t) < R, where R is the common radio range of all nodes in a network of homogeneous nodes and D{n0,n1} is the distance between the two nodes. While transmitting packets, a feasible hop-by-hop path satisfying the bandwidth constraints is sought. Energy plays a vital role in maintaining a stable neighborhood topology, so a mechanism is required to calculate the energy values at different times[11]. A node's energy consumption after time t is calculated using the equation[15]

Ec(t) = Pt*α + Pr*β   (2)


A Collaborative Approach for Malicious Node Detection in Ad hoc Wireless Networks

Research Front

Shrikant V. Sonekar* and Manali Kshirsagar**
*Research Scholar, Department of CSE, G.H. Raisoni College of Engineering, Nagpur, M.S., India
**Research Guide, Department of CSE, G.H. Raisoni College of Engineering, Nagpur, M.S., India

Abstract—Security is at stake when communication takes place between mobile nodes in a hostile environment. In contrast to wired networks, the unique characteristics of mobile ad hoc networks pose a number of major challenges to security design, such as the shared wireless medium, open peer-to-peer network architecture, stringent resource constraints and highly dynamic topology. These unfavorable conditions clearly make a case for multidimensional security remedies that achieve not only wide-ranging protection but also acceptable network performance. Popularly used existing routing protocols designed to meet the needs of such self-organizing networks do not address possible threats aimed at the disruption of the protocol itself. A major challenge in ad hoc wireless networks is energy inefficiency; under certain circumstances it is almost impossible to replace or recharge the batteries, so it is desirable to keep energy dissipation low. Other problems are the limited energy reserve and the lack of centralized coordination. In this paper, we identify the security issues, discuss the challenges and propose a collaborative approach for malicious node detection.


In Eq. 2, Ec(t) is the energy consumed by the node after time t; Pt is the maximum number of packets transmitted by the node after time t; Pr is the maximum number of packets received by the node after time t; and α and β are constants lying between 0 and 1. If the initial energy level of a node is E, the remaining energy ERem of the node at time t can be calculated as

ERem = E - Ec(t)   (3)
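Equations 2 and 3 can be wrapped in a small helper. The packet counts and the constants α and β used below are illustrative values, not ones taken from the paper:

```python
def energy_consumed(pt, pr, alpha, beta):
    # Eq. 2: Ec(t) = Pt*alpha + Pr*beta
    return pt * alpha + pr * beta

def energy_remaining(e_initial, pt, pr, alpha, beta):
    # Eq. 3: ERem = E - Ec(t)
    return e_initial - energy_consumed(pt, pr, alpha, beta)

# Illustrative values: E = 100 units, 80 packets sent, 120 received
print(energy_remaining(100, 80, 120, alpha=0.5, beta=0.25))  # 100 - (40 + 30) = 30.0
```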

Whenever a node identifies a link break through HELLO messages or link layer acknowledgements, it broadcasts a Route Error (RERR) packet (as in the DSR protocol) to notify the source and end nodes[9].

In Fig. 1, if the link between nodes N and O on the path L-N-O-R breaks, then both nodes O and N send a Route Error (RERR) packet to notify the source and destination nodes. The main advantage of AODV is that it avoids source routing and reduces routing overhead in a large network. AODV also benefits from an expanding-ring search to restrict the excessive flooding of RREQ packets, and it searches for routes to previously unknown destinations[7]. In addition, AODV provides destination sequence numbers (DSeq), which allow nodes to acquire more up-to-date routes. On the other hand, it requires bidirectional links and periodic link-layer acknowledgements to detect broken links[14]. AODV also has to keep routing tables for route maintenance, contrary to DSR[12].

Proposed System and Algorithm

Mobility is a crucial characteristic of a cluster in a MANET, especially at the time of cluster formation and cluster head election. The cluster head is responsible for controlling and managing the network[13]. Each cluster head is identified by its own ID. The election of a cluster head is very important for constructing the network, and different algorithms have been used for electing it. We use a simple concept for cluster head election. In Fig. 2, the distance of Node A and Node B from both coordinates (x and y) is calculated. Initially, take Node A as i and Node B as j; then compare the xref of i with the xref of j and the yref of i with the yref of j, and set the values of minxdist and minydist, i.e. the threshold range Rth. Apply the same process for all the nodes, comparing the xref and yref of node i with those of nodes (k, l, ..., n).

Steps for the Cluster Head Election Algorithm
Step 1: Begin.
Step 2: For every member in the cluster:
Step 3: Calculate the distance to the other clusters using the x and y variables.
Step 4: Check whether the node's ID is the lowest and the node is close to the maximum number of nodes.
Step 5: Repeat steps 3 and 4.
Step 6: Elect as cluster head the node with the minimum ID and maximum connectivity.
Step 7: Stop.

CH=

(4)

Equation 4 shows the two major parameters for electing a cluster head, i.e. the lowest ID and the highest connectivity, represented by the variables 'a' and 'b'. Variable 'a' represents the single node and 'b' represents the remaining nodes in the cluster; from the equation we find that the node with the lowest ID is elected as the cluster head. The proposed algorithm thus combines the highest connectivity with the lowest ID. The static window snapshot shows the steps of cluster head election; the distance to its neighbor nodes is calculated by every node, and the snapshot shows the values Cx = 3423.99 and Cy = 427.387.
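The election rule above (highest connectivity, ties broken by the lowest ID) can be sketched as follows; the node coordinates, IDs and threshold range are invented for illustration:

```python
import math

# Illustrative nodes: id -> (x, y) position
nodes = {1: (0, 0), 2: (1, 0), 3: (0, 1), 4: (5, 5)}
R_TH = 1.5  # threshold range Rth (assumed value)

def connectivity(nid):
    # number of nodes within the threshold range of node nid
    x, y = nodes[nid]
    return sum(1 for other, (ox, oy) in nodes.items()
               if other != nid and math.hypot(x - ox, y - oy) <= R_TH)

def elect_cluster_head():
    # highest connectivity wins; ties are broken by the lowest ID
    return min(nodes, key=lambda nid: (-connectivity(nid), nid))

print(elect_cluster_head())  # nodes 1-3 tie on connectivity; lowest ID is 1
```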

Table I shows the number of nodes in each cluster; based on the distance parameter we obtain the cluster heads. The cluster head is also counted as a node.

Table I. Cluster, Nodes and Cluster Head

• Requesting all members of the cluster:
Before sending packets to all the nodes in the cluster, the cluster head (CH) sends a REQUEST(tsp, p) message to all the nodes in its request set Rp (radio range) and places the request on its request_queuep, where (tsp, p) is the timestamp of the request. When a cluster node (CN) receives the REQUEST(tsp, p) message from the cluster head, it returns a timestamped REPLY message to the cluster head and places the CH's request on its request_queuep.

• Releasing the position of the head:
The cluster head, upon exiting due to a low energy level, deletes its request from the top of its request queue and sends a timestamped RELEASE message to the

Fig. 1: Route maintenance in AODV

Fig. 2: Distance comparison between two nodes

Cluster Nodes Cluster Head

Cluster 1 4 2

Cluster 1 8 4

Cluster 1 12 6

Cluster 2 4 1

Cluster 2 8 7

Cluster 2 12 9

Cluster 3 4 2

Cluster 3 8 6

Cluster 3 12 8

Cluster 4 4 3

Cluster 4 8 5

Cluster 4 12 4

Fig. 3: Simulation result of distance comparison between two nodes


entire set of cluster nodes in its request set Rp. When a cluster node receives a RELEASE message from the cluster head, it removes the cluster head's request from its request queue. This helps in detecting the malicious node. The performance of the algorithm depends on the number of messages required: the proposed algorithm requires 2(N-1) messages, and the synchronization delay is 'T'[9].
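The REQUEST/REPLY/RELEASE exchange can be sketched with plain queues. This is a simplified single-process sketch; the message tuples and field layout are assumptions for illustration, not the authors' implementation:

```python
from collections import deque

class ClusterNode:
    def __init__(self, name):
        self.name = name
        self.request_queue = deque()

def request_phase(head, members, ts):
    # CH queues its own request, sends REQUEST(ts, p) to every member,
    # and each member answers with a timestamped REPLY.
    head.request_queue.append(("REQUEST", ts, head.name))
    replies = []
    for m in members:
        m.request_queue.append(("REQUEST", ts, head.name))
        replies.append(("REPLY", ts, m.name))
    return replies

def release_phase(head, members):
    # On exit (e.g. low energy) the CH deletes its request from the top of
    # its queue and broadcasts RELEASE; members drop the CH's request.
    head.request_queue.popleft()
    for m in members:
        m.request_queue = deque(r for r in m.request_queue
                                if r[2] != head.name)

ch = ClusterNode("CH")
cluster = [ClusterNode("n1"), ClusterNode("n2"), ClusterNode("n3")]
replies = request_phase(ch, cluster, ts=1)
release_phase(ch, cluster)
# N-1 REQUESTs out plus N-1 REPLYs back = 2(N-1) messages
print(len(replies) + len(cluster))  # 6 messages for N = 4 nodes
```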

Simulation, Results and Discussion

The simulation parameters shown in Table II consider both the accuracy and the effectiveness of the simulation. The experiment is carried out using the OMNeT++ simulator.

Table II. Simulation Parameters

Figure 4 shows the communication of the cluster head with all other cluster heads and, in Figure 5, the cluster head communicates the MAC address of the malicious node to all other cluster heads. Based on parameters such as energy dissipation, end-to-end delay, packet delivery ratio, throughput and wrong replay, we declare a node as malicious.

Here, TP denotes the threshold parameters: tp1 = msg_id (message id); tp2 = tm_st (timestamp); tp3 = pack_del (packet delivery ratio); tp4 = data_sent (forwarded packets); tp5 = mobility; tp6 = ack_msg (acknowledgement message); tp7 = w_replay (wrong replay); tp8 = end-to-end delay; tp9 = number of packets dropped; tp10 = repetition of packets.

Table III shows the parameters and simulation values. Based on these values, we declare node 'n9' a malicious node. Once the cluster head knows this, it sends the MAC address of node 'n9' to all other cluster heads within radio range. The simulation is carried out on 14 nodes; for each node we record metrics such as timestamp, packet delivery ratio, throughput, acknowledgment and wrong replay. The packet delivery ratio and packet drop routing ratio metrics are chosen to evaluate the impact of the sequence number attack, the resource consumption attack and attacks that drop routing packets. Malicious node detection accuracy for different numbers of nodes in an area of 837 × 837 m is obtained using the simulation shown in Fig. 6.

Assumptions:
• We assume a maximum threshold value of 4.
• We consider 10 threshold values.

The proposed system declares a node malicious if

(tp1 + tp2 + ... + tp10)/10 > 4   (5)
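As far as it can be reconstructed from Table III and the stated assumptions, Eq. 5 flags a node when the average of its ten threshold-parameter scores exceeds the assumed maximum threshold of 4. A sketch using node n9's row from Table III:

```python
MAX_THRESHOLD = 4  # assumed maximum threshold value

def is_malicious(tp_scores):
    # Eq. 5 (as reconstructed): flag a node when the average of its
    # ten threshold-parameter points exceeds the maximum threshold.
    return sum(tp_scores) / len(tp_scores) > MAX_THRESHOLD

n9 = [4.1, 4.5, 4.1, 5, 7, 3, 3.6, 4.1, 2.1, 2.6]  # Table III, row n9
print(is_malicious(n9))  # average 40.1/10 = 4.01 > 4 -> True
```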

Preliminary results show that the proposed algorithm detects the malicious node with good accuracy and effectiveness. We measured the packet delivery ratio, i.e. the fraction of the total packets generated that are successfully delivered. This metric reflects the network throughput. The general observation is that the proposed algorithm reduces the attack by around 60%. When AODV is attacked, the potential of the network

Parameters Values
Number of Nodes 25
Network Size 1000 × 900
Speed of Nodes 0-10 m/sec
Transmission Range 100 m
Battery Power of Node 100 units
Pause Time 0-20 sec
Data Payload 512 bytes
Host pause time 5 seconds
Traffic type CBR (UDP)
Movement Model Random waypoint

Fig. 4: Communication of cluster head with all other cluster heads
Fig. 5: Cluster head communicates MAC address of malicious node

Nodes tp1 tp2 tp3 tp4 tp5 tp6 tp7 tp8 tp9 tp10 Total Avg. Points = Total/10
(Each threshold parameter TP has 10 points.)

n1 4 2.4 3.3 5.5 5.1 4 2 4.1 2 2.2 34.6 3.46

n2 3 2.1 4.2 4.3 5.1 2.8 4.6 2.9 4 7 40 4

n3 4 4.2 5 4.3 5 1.2 1.5 1.6 1.8 5.1 33.7 3.37

n4 5.2 0.8 2.3 2.9 7.1 4.8 1.5 4.9 5.8 0.9 36.2 3.62

n5 5 2 3.5 6.1 0.8 0.9 1 1.5 1.2 1.6 23.6 2.36

n6 8 2.9 5 4.2 4.1 4.3 3.2 3.1 1.3 1.2 37.3 3.73

n7 1.3 2.5 0.9 0.8 0.5 1.9 5 4.6 1.7 4.1 23.3 2.33

n8 8 9 1.6 1.8 1.9 2.1 2 1 3 4 34.4 3.44

n9 4.1 4.5 4.1 5 7 3 3.6 4.1 2.1 2.6 40.1 4.01

n10 1 5.2 5.5 2.3 2.6 3.6 2.8 5.2 1.9 5.2 35.3 3.53

n11 1.6 1.5 2.5 2.6 2.1 2.1 2.3 2.5 4.1 4.4 25.7 2.57

n12 1.8 2 2 2.3 3.5 1.9 4.7 7 5.7 4 34.9 3.49

n13 1.8 1.9 2.5 6 6 6.3 5.6 6.5 1.8 1.6 40 4

n14 1.4 1.5 1.6 1.6 1.8 5.2 5.3 3.9 6.3 4.6 33.2 3.32

Table III. Comparative Chart of Threshold Parameters and Simulation Values



decreases significantly. The dropping of packets disrupts the network connectivity, and the delivery of packets is reduced when AODV is under the resource consumption attack. In Fig. 8, we can observe that the proposed algorithm delivers the maximum number of packets. Table IV shows the comparison of the proposed work with existing algorithms; it is observed that the proposed work supports all the required parameters.

Conclusion and Future Scope

MANET is a potential research area with applied utility, and securing it is a challenging task. Many issues still need to be solved. All intrusion detection systems face the problem of false alarms, which occur whenever the system raises an alarm although no harmful behavior is occurring in the network. The challenge here is to utilize the available power in an efficient manner rather than to provide each node with higher battery power. There is a possibility that some key nodes will overuse the network and have their energy consumed quickly; loose clustering could be one solution for preserving energy at the cluster head level. The comparative chart proposed in this paper gives an efficient way of detecting the malicious node, and the distance-based parameter is used to select the cluster head. We have made two assumptions in our paper for detecting the malicious node in the cluster. Research in MANET security is still open. Further work is needed to enhance the performance of secure routing protocols. Moreover, there should be some mechanism that restricts a malicious node from moving into other parts of the network.

References
[1] C. E. Perkins, “Ad hoc Networking”, Addison-Wesley, New York, pp. 198–264, 2001.
[2] Rajeswary Malladi and Dharma P. Agrawal, University of Cincinnati, OH, “Current and Future Applications of Mobile and Wireless Networks”, Communications of the ACM, Vol. 45, Issue 10, pp. 144–146, 2002.
[3] Xia Wang, Iowa State University, “Intrusion Detection Techniques in Wireless Ad hoc Networks”, in Proceedings of the 30th COMPSAC ’06, IEEE Computer Society.
[4] Neal Krawetz, “Introduction to Network Security”, Thomson Learning, pp. 5–13, 2011.
[5] Chunfu Jia and Deqiang Chen, “Performance Evaluation of a Collaborative Intrusion Detection System”, IEEE Computer Society, 5th International Conference on Natural Computation, 2009.

[6] Vivek Richariya and Pravin Kaushik, “A Survey on Network Attack in Mobile Adhoc Network”, International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 4, Issue 5, May 2014.
[7] Aditi Kumar and Praveen Thakur, “Routing Attacks and their Counter Strategies in MANET”, International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 4, Issue 5, May 2014.
[8] Sourav Sen Gupta, S. S. Ray, O. Mistry and M. K. Naskar, Jadavpur University, Kolkata, “A Stochastic Approach for Topology Management of Mobile Ad hoc Networks”, Asian International Mobile Computing Conference, pp. 90–99, 2007.
[9] O. S. F. Carvalho and G. Roucairol, “On Mutual Exclusion in Computer Networks, Technical Correspondence”, Communications of the ACM, Feb. 1983.
[10] Priyanka and Mukesh Dalal, “Security in MANET: Effective value based Malicious node detection and removal scheme”, International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 4, Issue 5, May 2014.
[11] M. A. Rizvi, “Issues and challenges in Energy Aware Algorithms using clusters in MANET”, International Journal of Computing, Communication and Networking, Vol. 2, April–June 2013.

[12] Amitabh Mishra, Ketan Nadkarni and Animesh Patcha, Virginia Tech, “Intrusion Detection in Wireless Ad hoc Networks”, IEEE Wireless Communications, 2004.
[13] M. Abolhasan, T. Wysocki and E. Dutkiewicz, “A review of routing protocols for mobile ad hoc networks”, Elsevier Journal of Ad hoc Networks, pp. 1–22, 2004.
[14] C. Liu and J. Kaiser, “A survey of mobile ad hoc network routing protocols”, University of Ulm Technical Report Series, No. 2003-08, University of Ulm, Germany, 2005.
[15] A. Ephremides, “Energy concerns in wireless networks”, IEEE Wireless Communications, 9(4):48–59, 2002.

Fig. 6: Malicious node detection accuracy for different nodes
Fig. 7: Maximum speed of node movement vs delivery ratio (%)
Fig. 8: Simulation result for sequence number attack

Parameters | K-Hop Connectivity | Lowest ID (LID) | Weighted Cluster Algorithm | Election of CH (ECH) | Proposed Algorithm (the first four columns are the existing algorithms)
Broadcast Yes No No Yes
Throughput No Yes Yes Yes
Location Yes No Yes Yes
Energy Yes No Yes Yes

Table IV. Comparison Between Existing and Proposed Algorithms


Overfitting leads to the public losing trust in research findings, many of which turn out to be false. We examine some famous examples, including “the decline effect” and the age of Miss America, and suggest approaches for avoiding overfitting.

Many people were surprised by a recent study which overturned the conventional wisdom and said there was no link between eating saturated fat and heart disease. It seems that every week there are some new results, especially in medicine and the social sciences, which invalidate old results.

The phenomenon of old results no longer holding has been so widespread that some journalists started to call it “cosmic habituation” or “the decline effect” - the bizarre theory that the laws of the universe seem to change when you try to repeat an experiment.

The explanation is much simpler.

Researchers too Frequently Commit the Cardinal Sin of Data Mining - Overfitting the Data

Researchers test too many hypotheses without proper statistical control until they happen to find something interesting, and report it. Not surprisingly, the next time around, the effect, which was (at least partly) due to chance, will be much smaller or absent.

We note that overfitting is not the same as another major data science mistake, “confusing correlation and causation”. The difference is that overfitting finds something where there is nothing. In the case of “correlation and causation”, researchers can find a genuine novel correlation and only discover a cause much later (see a great example from astronomy in the Kirk D. Borne interview on Big Data in Astrophysics and Correlation vs. Causality) [http://www.kdnuggets.com/2014/05/interview-kirk-borne-big-data-astrophysics-correlation-causality.html].

Every day we learn about new research through various sources, and very often we use these research findings to improve our understanding of the world and make better decisions. How would you feel if you were told that most published (and heavily marketed) research is biased, improperly planned, hastily executed, insufficiently tested and incompletely reported? That the results were interesting by design and not by nature?

The inherent flaws of prevalent research practices were very nicely identified and reported by John P. A. Ioannidis in his famous paper Why Most Published Research Findings Are False (PLoS Medicine, 2005) [http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124]. Deeply examining some of the most highly regarded research findings in medicine, Ioannidis concluded that very often either the results were exaggerated or the findings could not be replicated. In his paper, he presented statistical evidence that indeed most claimed research findings are false. Dr. Ioannidis now heads the new METRICS center at Stanford, where he continues to work on making sure that research is reproducible.

So, “bad” research is not new, but the amount of it has increased with time. One of the most basic tests of how “scientific” a piece of research is would be to observe its results when the same research is performed in multiple, randomly chosen, applicable environments. Ioannidis noted that in order for a research finding to be reliable, it should have:
• Large sample size and large effects
• Greater number of, and lesser selection of, tested relationships
• Lesser flexibility in designs, definitions, outcomes, and analytical modes
• Minimal bias due to financial and other factors (including the popularity of that scientific field)
Unfortunately, too often these rules were violated, producing irreproducible results.

To illustrate this, here are some of the more entertaining “discoveries” that were reported using the “overfit the data” approach:

The S&P 500 index is strongly related to the production of butter in Bangladesh [http://nerdsonwallstreet.typepad.com/my_weblog/files/dataminejune_2000.pdf]

The age of Miss America is strongly related to murders by steam, hot vapours and hot objects [http://www.tylervigen.com/view_correlation.php?id=2948]

… and many more interesting (and totally spurious) findings which you can discover yourself using tools such as Google Correlate or the one by Tyler Vigen.

The Cardinal Sin of Data Mining and Data Science: Overfitting

Article
Gregory Piatetsky-Shapiro* and Anmol Rajpurohit**
*President of KDnuggets
**Graduate student (MS, Computer Science), UC Irvine


The human tendency for “magic thinking” tends to give such unusual findings much higher notoriety (Octopus Paul was world-famous for “predicting” World Cup results in 2010), and this does not increase the general public's trust in science.

Several methods can be used to avoid “overfitting” the data:
• Try to find the simplest possible hypothesis
• Regularization (adding a penalty for complexity)
• Randomization testing (randomize the class variable and try your method on this data - if it finds the same strong results, something is wrong)
• Nested cross-validation (do feature selection on one level, then run the entire method in cross-validation on an outer level)
• Adjusting the False Discovery Rate
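The randomization test above is easy to demonstrate on synthetic data. The following self-contained sketch (pure Python standard library; every series here is random noise) shows how searching through many hypotheses manufactures a “strong” correlation, and how shuffling the target exposes the procedure:

```python
import random
import statistics

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def best_correlation(predictors, target):
    """The 'cardinal sin': test every predictor and keep the strongest."""
    return max(abs(correlation(p, target)) for p in predictors)

rng = random.Random(0)
n = 30
# 200 candidate "predictors" and a target -- all pure noise.
predictors = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(200)]
target = [rng.gauss(0, 1) for _ in range(n)]

found = best_correlation(predictors, target)

# Randomization test: shuffle the target and rerun the whole search.
# If the search still "finds" a strong result, something is wrong.
shuffled = target[:]
rng.shuffle(shuffled)
found_after_shuffle = best_correlation(predictors, shuffled)

print('best |r| on real target:     %.2f' % found)
print('best |r| on shuffled target: %.2f' % found_after_shuffle)
```

Both values typically come out similarly “impressive” even though there is no signal anywhere, which is exactly the warning sign the randomization test is designed to raise.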

Conclusion

Good data science is on the leading edge of scientific understanding of the world, and it is data scientists' responsibility to avoid overfitting data and to educate the public and the media on the dangers of bad data analysis.

[Taken from http://www.kdnuggets.com/2014/06/cardinal-sin-data-mining-data-science.html with permission from Dr. Gregory Piatetsky.]


About the Authors

Gregory Piatetsky-Shapiro, Ph.D. is the President of KDnuggets, which provides analytics and data mining consulting. Gregory is a founder of the KDD (Knowledge Discovery and Data mining) conferences and is one of the leading experts in the field. Gregory was the first recipient of the ACM SIGKDD Service Award (2000). He also received the IEEE ICDM Outstanding Service Award (2007) for contributions to the data mining field and community.

Anmol Rajpurohit is a graduate student (MS, Computer Science) at UC Irvine. His areas of interest are data science, machine learning and information retrieval. His novel analytics solution for online education was the runner-up at the UCLA Developer's Contest 2014.

Computer Society of India
Unit No. 3, 4th Floor, Samruddhi Venture Park, MIDC, Andheri (E), Mumbai-400 093
Tel. 91-22-2926 1700 • Fax: 91-22-2830 2133 • Email: [email protected]

CSI Communications - Advertising Tariff (rates effective from April, 2014)

COLOUR (colour artwork in soft copy format, or positives, are required for colour advertisements)
Back Cover: Rs. 50,000/-
Inside Covers: Rs. 40,000/-
Full Page: Rs. 35,000/-
Double Spread: Rs. 65,000/-
Centre Spread: Rs. 70,000/-
(Additional 10% for bleed advertisements)

MECHANICAL DATA
Full Page with Bleed: 28.6 cms x 22.1 cms
Full Page: 24.5 cms x 18.5 cms
Double Spread with Bleed: 28.6 cms x 43.6 cms
Double Spread: 24.5 cms x 40 cms

Special incentive to any individual/organisation for getting sponsorship: 15% of the advertisement value
Special discount for any confirmed advertisement for 6 months: 10%
Special discount for any confirmed advertisement for 12 months: 15%

All incentive payments will be made by cheque within 30 days of receipt of payment for the advertisement. All advertisements are subject to acceptance by the editorial team. Material in the form of artwork or positives should reach us by the 20th of the month for insertion in the following month.

All bookings should be addressed to:
Executive Secretary, Computer Society of India
Unit No. 3, 4th Floor, Samruddhi Venture Park, MIDC, Andheri (E), Mumbai-400 093
Tel. 91-22-2926 1700 • Fax: 91-22-2830 2133 • Email: [email protected]


About the Author

Rahul Bhati is currently pursuing a B. Tech. in Computer Engineering at Charotar University of Science and Technology, Changa, Anand, Gujarat, and is interested in competitive programming, machine learning, cyber security and FOSS.

Programming.Tips() » Salting Passwords

Typically, system designers choose one of two ways to store users' passwords:
1. In original format, as plain text.
2. As the digest (output) of a one-way hash function.

It probably goes without saying that the first option is a bad idea, considering that any compromise of the user/password database immediately exposes login credentials that clients may be using on many other sites; and just hashing the passwords, storing hash = sha256(password), is barely more secure. What we can do instead is salt the passwords: rather than hashing the password alone, we hash salt + password. Salting does not by itself stop dictionary or brute-force attacks against a single password, but it defeats precomputed rainbow tables and makes large-scale attacks expensive (they take time).

Here is a simple yet effective implementation in Python using PBKDF2 for salting the password, from https://github.com/SimonSapin/snippets/blob/master/hashing_passwords.py

import hashlib
from os import urandom
from base64 import b64encode, b64decode
from itertools import izip

# From https://github.com/mitsuhiko/python-pbkdf2
from pbkdf2 import pbkdf2_bin

# Parameters to PBKDF2. Only affect new passwords.
SALT_LENGTH = 12
KEY_LENGTH = 24
HASH_FUNCTION = 'sha256'  # Must be in hashlib.
# Linear to the hashing time. Adjust to be high but take a reasonable
# amount of time on your server. Measure with:
# python -m timeit -s 'import passwords as p' 'p.make_hash("something")'
COST_FACTOR = 10000

def make_hash(password):
    """Generate a random salt and return a new hash for the password."""
    if isinstance(password, unicode):
        password = password.encode('utf-8')
    salt = b64encode(urandom(SALT_LENGTH))
    return 'PBKDF2${}${}${}${}'.format(
        HASH_FUNCTION,
        COST_FACTOR,
        salt,
        b64encode(pbkdf2_bin(password, salt, COST_FACTOR, KEY_LENGTH,
                             getattr(hashlib, HASH_FUNCTION))))

def check_hash(password, hash_):
    """Check a password against an existing hash."""
    if isinstance(password, unicode):
        password = password.encode('utf-8')
    algorithm, hash_function, cost_factor, salt, hash_a = hash_.split('$')
    assert algorithm == 'PBKDF2'
    hash_a = b64decode(hash_a)
    hash_b = pbkdf2_bin(password, salt, int(cost_factor), len(hash_a),
                        getattr(hashlib, hash_function))
    assert len(hash_a) == len(hash_b)  # we requested this from pbkdf2_bin()
    # Same as "return hash_a == hash_b" but takes a constant time.
    # See http://carlos.bueno.org/2011/10/timing.html
    diff = 0
    for char_a, char_b in izip(hash_a, hash_b):
        diff |= ord(char_a) ^ ord(char_b)
    return diff == 0
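The snippet above targets Python 2 and a third-party pbkdf2 module. On Python 3.4+ the same scheme can be written with only the standard library: hashlib.pbkdf2_hmac for the key derivation and hmac.compare_digest for the constant-time check. The following is a minimal sketch of that variant with the same illustrative parameters, not a drop-in replacement for the code above:

```python
import hashlib
import hmac
import os
from base64 import b64encode, b64decode

SALT_LENGTH = 12
KEY_LENGTH = 24
COST_FACTOR = 10000  # PBKDF2 iteration count

def make_hash(password):
    """Generate a random salt and return 'salt$key', both base64-encoded."""
    salt = os.urandom(SALT_LENGTH)
    key = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                              salt, COST_FACTOR, dklen=KEY_LENGTH)
    return '{}${}'.format(b64encode(salt).decode(), b64encode(key).decode())

def check_hash(password, stored):
    """Re-derive the key from the stored salt and compare in constant time."""
    salt_b64, key_b64 = stored.split('$')
    expected = b64decode(key_b64)
    candidate = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                                    b64decode(salt_b64), COST_FACTOR,
                                    dklen=len(expected))
    return hmac.compare_digest(candidate, expected)
```

Note that hmac.compare_digest replaces the manual XOR loop: it performs the same constant-time comparison, but as a vetted library primitive.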

Practitioner Workbench


Programming.Learn("R") » Cluster Analysis in R Language

Data science requires good statistical analysis for solving complex problems, and in such cases the R programming language is very popular among statisticians and data scientists. It is a platform for statistical computation and graphics visualization, used in various applications involving huge amounts of data. R offers around 5800 additional packages and around 120000 functions, available from the Comprehensive R Archive Network (CRAN). Here, we explain the 'cluster' package available in R.

Basically there are two types of clustering approaches: partitioning and hierarchical. K-means is one of the most popular partitioning approaches. It requires pre-declaration of the number of clusters to extract. In R's partitioning approach, observations are divided into K groups and reordered to form the most interrelated clusters possible according to a given condition.

Before doing cluster analysis, records without any value should be removed to achieve better cluster extraction, and variables should be rescaled for comparability. This is called pre-processing of the data.

# Prepare data
mydata <- na.omit(mydata)  # listwise deletion of missing values
mydata <- scale(mydata)    # standardize variables

Here na.omit() drops observations containing missing (NA) values.

K-means Clustering

The K-means algorithm is executed by the function kmeans(data, n) available in R, where data is a numeric dataset or matrix and n is the number of clusters to extract. The NbClust package can be used as a guide in selecting the number of clusters. Pairing the set.seed() function with kmeans() guarantees that the results are reproducible. The kmeans() function has an nstart option that attempts several initial configurations and selects the best of the resulting solutions; it returns the cluster memberships, centroids, sums of squares (within, between, total) and cluster sizes. This approach is often recommended.
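The partitioning idea behind kmeans() is not specific to R. As an illustration of what such a function does internally, here is a minimal sketch of Lloyd's algorithm in Python; this is illustrative only, not R's actual implementation (which defaults to the more refined Hartigan-Wong algorithm):

```python
import random

def kmeans(points, k, iters=20, seed=1):
    """Minimal Lloyd's-algorithm k-means over tuples of numbers."""
    rng = random.Random(seed)        # fixed seed: reproducible, like set.seed() in R
    centers = rng.sample(points, k)  # initial centers: k random observations
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center (squared Euclidean distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # recompute each center as the mean of its assigned points
                centers[i] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centers, clusters

# Two well-separated blobs split cleanly into two clusters of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(pts, 2)
print(sorted(len(c) for c in clusters))
```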

Hierarchical Clustering

This kind of clustering builds a hierarchy of clusters; the hclust() function from the stats package is used for it. There are basically two approaches to building the hierarchy:

Agglomerative: a "bottom-up" approach: each observation begins in its own cluster, and pairs of clusters are merged into a single cluster as one moves up the hierarchy. The agnes() function from the cluster package is used for this purpose.

Divisive: in this "top-down" approach, one large cluster is split into separate clusters to build the hierarchy. Merging and splitting are performed in a greedy manner. The function diana() can be used for divisive hierarchical clustering.

Practitioner Workbench

Ghanshaym Raghuwanshi
Research Scholar, Jaypee University of Engineering and Technology, Guna - MP

Fig. 1: K-means clustering algorithm execution in R
Fig. 2: Visualization of K-means clustering
Fig. 3: Hierarchical clustering algorithm execution in R
Fig. 4: Visualization of hierarchical clustering


Introduction

Data forms an integral part of any organisation; it gets captured in various transactions within the organization, and erroneous data has a major impact on Information Technology (IT). Like the foundation of any edifice, data is the key integral element of IT initiatives. It is imperative to maintain high standards of data quality, since this will be a key differentiator in the future. In today's competitive world, data is the most important asset of any company, and it is unique to each company.

No matter what the data is meant to be used for, it is crucial to maintain accurate and complete data in any enterprise or system. If the data present in the system does not adhere to the principles of data quality, it will lead to various issues in the organisation. It is thus very important to adhere to data quality standards while implementing an ERP solution, to be able to get the desired benefits in the long run.

This paper is organized as follows. Section 2 briefs the four common attributes of data quality, Section 3 describes a case study with a retailing client, Section 4 focuses on potential methods to overcome data quality challenges with the case in hand, Section 5 outlines the benefits reaped on successful implementation of this method, and finally Section 6 concludes the paper.

Data Quality Attributes

Maintaining data quality can seem like a scary activity, but all it takes is having the right people, processes and technology in place.

Data quality assessment establishes specific criteria against which an organization's data can be assessed. These can be framed as a small set of questions about the data:
• What
• Who
• How
• When
• Why

As enterprises grow, data sharing grows across business lines and different entities, and it becomes all the more necessary to maintain a uniform unit of data quality across the various lines.

As per one definition, data quality can be classified into four attributes:
• Accuracy
• Timeliness
• Completeness, and
• Consistency

Data accuracy as an attribute involves measuring the difference between the actual and the correct value. Timeliness captures the importance of data reaching the downstream system within the defined SLA: a window can be defined within which data must reach systems and sub-systems, and this SLA becomes the benchmark against which data timeliness is measured. Data completeness is the state in which all elements/attributes deemed necessary are present. Consistency concerns the comparison of data between systems and sub-systems.

All types of data, such as customer data, product data, financial data and employee data, are at equal risk, and bad data affects all departments, such as Operations, Sales, Marketing, and Finance. Scott Ambler's surveys at www.ambysoft.com/surveys/ clearly indicate that there is a problem with the quality of data for around 46% [38 + 8] of respondents. About 52% of respondents are satisfied with the overall data quality at present, but have a few apprehensions about the data. This indicates how critical it is to address the data quality issue.

The following sections describe the issues identified with the data migrated into the new ERP, the steps taken to rectify the data, and the benefits perceived while implementing the improvement measures.

Case Study: A Large Retailing Client

Background

A large fashion retailer streamlined its bespoke applications by implementing an ERP from Oracle. This packaged solution comprised multiple modules catering to various business processes, and the suite of products implemented laid a solid foundation for the company.

Prior to implementing this ERP, the data was present in disparate systems and its management was duplicated. The business attributes of any flow were defined differently across functions, and there was a lot of redundancy, since the same data was present on different servers.

Data cutover posed a big threat for this organization. Since the same item (aka SKU) was present on different servers with varied information, it was challenging to identify the correct parameters for migration. The ERP demanded data in a particular format that was not present in the existing system. At times this resulted in data being provided to the best of one's knowledge rather than with an understanding of the implications of the data provided. Data quality was amiss.

The company in question is a retailer with a wide range of apparel, footwear, home centre products and various other lines of business, and had around three million Stock Keeping Units (SKUs).

Description and Extraction of Data

The source data was stored in an Oracle database. This database was very complex in nature: it contained 1704 tables, had a size to the tune of 430 Gigabytes, and stored both the master and transaction data used by this retailer.

Data Quality Perspective on Retail ERP Implementation: A Case Study

Case Study
Dinesh Mohata, Consultant, Oracle Retail Domain, TCS, Bangalore

Abstract: To improve adaptability and increase their chances of survival in this age of cut-throat competition, enterprises constantly deploy new applications with better technologies; this is done in alignment with the rapid changes in the business environment. This paper discusses the issues and challenges faced related to data quality parameters while implementing a Retail ERP solution, with examples and scenarios from the real world. It also analyses ways to overcome these data-related challenges and reap the benefits of a new solution.

As part of the data migration team, the author, along with a team of professionals, had access to the data


provided by the retailer, in order to migrate it from their legacy systems to the newly configured ERP. Data nuances were observed post-migration as well, and the data was further analysed during steady-state support.

Data Migration Approach

Data was supplied by the retailer in Excel files. A staging area was built to hold the data from the Excel files for further validation; data from the files was loaded as-is into the staging tables. The data validation comprised checking the data for sanity in terms of conformance with the data types of the target table (for example, a number should not be entered in a character field) and conformance with pre-defined business rules (for example, one style should have all its items belonging to the same tax category). Once all validations pass, the data is accepted and loaded into the target system(s); if data is rejected, the rejection is communicated back to the data owner.
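The staged validate-then-load flow described above can be sketched as follows. The field names and rules here are illustrative assumptions for the kinds of checks discussed in this paper (type conformance, cost vs. MRP, plausible dates), not the retailer's actual schema:

```python
# Illustrative staging-table validation: type checks plus business rules.
# Field names and rules are hypothetical examples, not the actual ERP schema.
from datetime import date

def validate_row(row):
    """Return a list of rejection reasons; an empty list means 'accept'."""
    errors = []
    # Type/sanity check: cost must be numeric (e.g. not entered as text).
    if not isinstance(row.get('cost'), (int, float)):
        errors.append('cost is not numeric')
    # Business rule: cost should not exceed the Maximum Retail Price (MRP).
    elif isinstance(row.get('mrp'), (int, float)) and row['cost'] > row['mrp']:
        errors.append('cost greater than MRP')
    # Context check: a "valid" RDBMS date such as year 0123 is still wrong here.
    created = row.get('created')
    if not (isinstance(created, date) and 1990 <= created.year <= 2100):
        errors.append('date outside plausible range')
    return errors

def load(rows):
    """Split staged rows into accepted rows and rejected (row, reasons) pairs."""
    accepted, rejected = [], []
    for row in rows:
        errs = validate_row(row)
        if errs:
            rejected.append((row, errs))   # communicate back to the data owner
        else:
            accepted.append(row)           # load into the target system
    return accepted, rejected
```

In a real migration the rejected list, with its reasons, would be the report sent back to the data owner for correction.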

Data Quality Dimension

The dimensions used for the study were the following:
• Data Accuracy
• Data Timeliness
• Data Completeness
• Data Consistency

The data analysis was done on the data migrated during the Retail ERP implementation. The process involved getting the data from the legacy system and migrating it into the ERP; the migrated data in turn impacted the business's day-to-day operations with respect to data quality.

The following table details the findings for each data quality issue along the four dimensions: accuracy, timeliness, completeness, and consistency. It qualifies the various data-related issues encountered in this analysis and provides a matrix of the issues against the data quality parameters each of them impacted. The following sections detail the analysis.

Data Accuracy

Accuracy refers to a mismatch between the expected and the actual result with respect to data. In the table above, there are issues where the data as expected by the retailer was not migrated to the new ERP; additionally, there were instances where the data entered in the new system was not as expected by the retailer's lines of business. The majority of the issues belonged to the Data Accuracy category. The currency of a supplier in the supplier master was defined as USD, whereas the currency for a SKU in its relationship with that supplier was defined in INR; the resulting currency conversion of the cost produced erroneous data. All items defined in a style should have the same tax category (a semantic rule defined at the retailer's end), but this was not the case when new items got created in the system. The cost of an item should not exceed its Maximum Retail Price (MRP), yet such cases occurred due to master data incongruence across the various systems.

The systems also allowed an input date with the year 0123 instead of 2013. From the RDBMS perspective, the date entered in the system is valid; in the context of the data, however, the date field is not correct.

Data Timeliness

Data should be present in systems and sub-systems whenever it is required for an action.

ERP messages between the source and target are architected to flow using a Message Oriented Middleware (MOM) framework. All messages are classified as master or transactional data, and further classified within groups called families; for example, all messages related to an item are classified under the item family.

It was observed that at times some messages got dropped while flowing through the integration bus, resulting in data mismatches between modules. The primary ERP system has multiple message families for one functional area, and sometimes the timing of transmission affects data integrity between applications. For example, a new SKU message gets stuck in integration because of an error, but related messages flow through and get rejected because the SKU is not yet available in the downstream system.

Fig. 1: Data Flow Diagram for Data Migration (external files go through the data load process into staging tables; data validation then either accepts the data, loading it into the ERP system, or rejects it and communicates the issue back)

Table: Data Quality Issues with respect to defined parameters

Issue                               | Accuracy | Timeliness | Completeness | Consistency
MRP Based Cost Indicator            |    √     |            |              |
Multiple Tax Category in a Style    |    √     |            |              |     √
Currency Mismatch                   |    √     |            |              |     √
Supplier data coherency             |          |            |              |     √
Cost greater than MRP               |    √     |            |              |
Data Stuck in Interface             |          |     √      |              |
Invalid Date in the current context |    √     |            |              |
Missing attributes                  |          |            |      √       |


This resulted in data not reaching the target system on time; for example, the transactional data of an order reached the downstream system without the item.

Data Completeness

On analysing all the SKUs post-migration, it was observed that certain elements of the data were not completely migrated. For example, item data in the ERP resides in approximately 15 entities. In certain cases data was migrated into all entities yet was still not complete: in a few entities certain attributes were missing, because the source did not hold the data required by the target for those attributes. Overall, data was incomplete in around 1% of the cases.

Data Consistency

Inconsistent data has huge repercussions on the overall governance of processes. At this organisation, the following challenges were faced in terms of data consistency.

The master system for supplier data and the ERP system were not in sync: the currency provided for a supplier was wrong, which resulted in a wrong cost being attributed to the supplier. Transactions created on top of this wrong master data computed incorrect values, which led to wrong margins being reported for the company and, in turn, to wrong business decisions.

The supplier was created as a silo element in Oracle Retail. This entailed that all the attributes of a supplier were supposed to be correct based on the financial system; however, the data provided in Oracle Retail was not complete. For example, in Oracle Retail supplier creation is tightly coupled with delivery timelines, associations with the SKUs supplied by the supplier at its primary location, multiple location associations with their currencies, and so on. These data elements needed to complete a supplier creation in Oracle Retail were not present in the source system. When these fields were then interfaced with Oracle Financials, they created a gaping hole and resulted in data coherency issues.

Overcoming Data Challenges

The retailer faced challenges with respect to the data quality parameters, and the implementation surfaced major findings. The data quality parameters highlighted in the earlier sections went wrong due to the following major reasons:

• Buggy code - Incorrect data entered the system due to faulty code in the programs generating data in the current system. The logical interpretations made by a program went wrong at times both technically and functionally: technically, for example, through wrong initialisation of variables in the code; functionally, through wrong computation of a derived value because the logic applied was not appropriate. When such issues were unearthed, a data fix was done in the production environment, and the code was subsequently corrected as the long-term fix.

• Interface issues - Data between systems was not synchronised within the stipulated time; there were delays in data being posted from the source system to the target systems. At times, due to inherent issues such as memory leakage, an interface did not behave as intended: for example, if a real-time interface designed to handle 1000 records per minute is loaded with 100,000 records, the system might behave abnormally. Additionally, data sometimes did not reach the target system on time; for example, the transactional data of an order reached the downstream system without the item.

• Data entry - This relates to wrong data being entered into the systems, whether manually or uploaded via spreadsheets. Data entered was at times incorrect due to a lack of understanding of the context of the data, with subsequent repercussions within the system.

• Data representation - The reporting system had multiple versions of the data in the reports created for end users, with different user communities interpreting the data differently. For example, the business logic for stock aging differed between the buyer community and the inventory community, which resulted in erroneous representation.

This paper's analysis offers further insight for getting started with accurate data management. Data quality can be checked, but it requires a commitment from top management to ensure there is no compromise on data accuracy, and it has to be an ongoing journey. As an outcome of this analysis, the approach to the entire data paradigm can be structured around the following pointers.

1. Reconciliation Process
Reconciliation processes were adopted in various systems/sub-systems to ensure there was minimal mismatch. The reconciliation results were published to all stakeholders, resulting either in correction of the code in the systems/sub-systems or in fixing of the incorrect data.

2. Introduction of Alerts
Alerts are an effective mechanism to monitor anomalies within a system or between systems. Alerts were introduced to warn users about incoherency of data between the systems.

3. Organizational Boot Camps
User boot camps were organised to educate end users about data entry nuances along with the importance of data. This helped ensure that ongoing data entry was appropriate.

4. Data Entry
Stringent measures for data upload were adopted across all touch points. Optimal loads for the systems were identified for data processing, and data entry was made more systematic; for example, the date attribute now had a date picker rather than a manual entry option. Batch execution was staggered for data timeliness.

Benefits

The following points list the benefits derived by an organisation that implements data quality processes:
• Higher Customer Satisfaction
• Higher Operational Efficiency
• Enhanced Decision Support System
• Correct Conclusions
• Bolstered Organisational Confidence
• Higher ROI on IT Investment

Data completeness and accuracy resulted in high trust in the system by the stakeholders, and the user community showed high confidence in it. Data being present in the system in a timely fashion further delighted the customer when taking decisions, and the improved decision process yielded a higher Return on Investment from the systems. For example, once appropriate data quality was in place, the retailer used the data to pass penalties on to vendors for delays in shipments.

These points clearly demonstrate that the processes of data cutover should be followed rigorously to give the enterprise the right kind of direction.

Conclusion

Data's journey is quite fascinating. This paper has provided us an opportunity to analyse some real data and draw multiple insights. Data has been studied in both the pre- and post-ERP implementation eras at the retailer, identifying the data quality issues. Data quality was measured in terms of accuracy, consistency, timeliness and completeness. The issues identified during the process were primarily due to faulty code, interfacing of data between systems and incorrect entry of data in the systems.

Data had problems in terms of definition across systems, data transportation within systems and sub-systems, incorrect data entry and, last but not least, consistency between systems. Problems in the data at source were arrested by introducing stringent data entry mechanisms. Reconciliation processes helped resolve data coherency and consistency issues. The actions taken resulted in huge benefits in terms of customer satisfaction with the system and further strengthened the confidence of the end user in the system.

The journey of data from preparation and cleansing to migration, adhering to data quality, ensures that correct processes are applied for a successful ERP implementation. Thus, we can summarise that the data-related challenges articulated in this paper can be addressed through a robust data quality program.


About the Author

Dinesh Mohata is a Consultant in the Oracle Retail domain at TCS. Dinesh has over 15 years of IT & consulting

experience in software design, development, deployment and testing. He has retail industry experience of over 12

years. His areas of interest include Agile Development Methodology, Data Quality & Implementation Data Cutover.

Dinesh can be reached at [email protected] or [email protected].

Guidelines for Sending CSI Activity Reports

• Student Branch activity report: send to [email protected] with a copy to admn.offi [email protected] and director.[email protected].
The report should be brief, within 50 words, highlighting the achievements, with a photograph at a resolution higher than 300 DPI.

• Chapter activity report: send to [email protected].
The report should be within 100 words, highlighting the objective and clearly discussing the benefits to CSI members. It should be accompanied by a photograph at a resolution higher than 300 DPI.

• Conference/Seminar report: should be sent by Div Chairs and RVPs to [email protected].
The report should be brief, within 150 words, highlighting the objective and clearly discussing the benefits to CSI members. It should be accompanied by a photograph at a resolution higher than 300 DPI.

Dr. Vipin Tyagi, VP, Region III ([email protected]) will be coordinating publishing of reports of these activities.

(Prof. Anirban Basu, Vice President, CSI)


Introduction

When Edward Snowden made revelations in June 2013 about the records of millions of users being accessed by the NSA without their consent or even knowledge, the whole world was in for a shock. The idea of their data not being safe on the network might have crossed users' ears, but not their minds. This was an eye-opener for all internet users: any data that is online is open to unauthorized scrutiny.

The Internet was designed with the basic goal of providing functionality, not security, so its architecture is vulnerable. By vulnerability, we mean inherent weaknesses which can be exploited, thereby leading to security threats and cyber attacks. A cyber attack is defined as "deliberate actions to alter, disrupt, deceive, degrade, or destroy computer systems or networks or the information and/or programs resident in or transiting these systems or networks."

The Changing Scenario

Threats in cyber space have always been a matter of concern. The computer worm created by Robert Morris is recognized as one of the first worms to affect the world's cyber infrastructure. This self-propagating worm succeeded in closing down much of the internet in 1988, the year it was created. Due to the internet's infancy at that time, the impact was not devastating. However, it raised concerns and laid the foundation of the robust security systems we see today.

The late 1990s and early 2000s saw various viruses and worms going viral, Melissa and ILOVEYOU among them. These viruses travelled through the network via e-mail and then maliciously propagated themselves, leading to higher network traffic. Their threat led to the development of antivirus software. Antivirus software stores signatures of already known viruses and checks all incoming traffic for their presence; it is regularly updated with recently found virus signatures. If incoming traffic matches a signature, it is barred from entering the internal system.
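The signature check described above can be sketched as a simple substring scan; real engines use far more efficient multi-pattern matching, and the signatures below are made up for illustration:

```python
# Hypothetical byte-pattern signatures of known malware
SIGNATURES = {
    "demo_worm": b"\xde\xad\xbe\xef",
    "demo_macro_virus": b"AutoOpen-payload",
}

def scan(payload: bytes) -> list:
    """Return names of known signatures found in the incoming payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

incoming = b"MAIL BODY ... \xde\xad\xbe\xef ..."
matches = scan(incoming)
if matches:
    print(f"Blocked: matched signatures {matches}")
```

The weakness the article implies also falls out of this sketch: a payload whose bytes match no stored signature passes through, which is why signature databases must be updated constantly.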

The new millennium has witnessed how cyber space and cyber attacks have radically changed as the internet grew exponentially and permeated the fabric of everyday things. Individuals, organizations and governments all depend on the Internet for a plethora of tasks. As things stand today, our data resides in the cloud, mobile phones have been replaced by smart phones, social networking is the way of expression, the cyber economy is on the rise, startups largely mean e-commerce, and Wi-Fi is offered free at hotels, restaurants, cafés, airports and more. These changes have happened at an astronomical pace and have had a tremendous effect on how risks and threats are understood and perceived. Cyber attacks have transformed in the wildest possible ways, becoming more organized, sophisticated and mean.

Most Prevalent Attacks

Some of the common cyber attacks include denial-of-service attacks, phishing, defacement, SQL injection, IP spoofing etc. A brief introduction to each follows.

Denial-of-Service (DoS) Attack - A Denial-of-Service (DoS) attack is a malicious attempt to make a server or network resource unavailable to users by temporarily interrupting or suspending the services of a host connected to the internet. When this disruption is caused by many computers distributed globally, it is known as a distributed DoS, or DDoS. It is a primitive attack, yet very common due to its efficiency and the simplicity of arranging an offensive. For DDoS, vulnerabilities need not be known and exploited.

In March 2013, Spamhaus, a non-profit organization that helps e-mail providers filter out spam and other unwanted content, was hit by DDoS attacks at 300 Gbps, strong enough to take down any government's internet infrastructure. This attack affected internet services globally. In November 2014, Hong Kong independent news sites were inflicted by DDoS attacks unprecedented in scale; the sites were pounded with junk traffic at a remarkable rate of 500 Gbps.

SQL Injection - SQL injection is a code-injection attack on the application layer that maliciously reads, retrieves, manipulates and/or executes data in the database using structured query language. Its severity can range from simple reading of data to completely destroying it. Modern websites have dynamic pages consisting of login pages, shopping carts, various forms, search options etc., which prompt the user to submit data as input and, based upon the input, retrieve output. All syntactically correct queries are executed by the SQL server, whether or not they are semantically or logically correct. So a skillfully crafted query can yield outputs desirable to the attacker, such as access to sensitive data and its manipulation.
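The standard defence, not described in the article, is to keep user input out of the query text entirely by using parameterized queries. A sketch using Python's built-in sqlite3 module (the table and columns are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def login(name: str, password: str) -> bool:
    # Parameterized query: input is bound as data, never spliced into SQL text
    cur = conn.execute(
        "SELECT 1 FROM users WHERE name = ? AND password = ?", (name, password)
    )
    return cur.fetchone() is not None

print(login("alice", "s3cret"))        # True
print(login("alice", "' OR '1'='1"))   # False: the injection text is just a literal
```

Had the query been built by string concatenation, the second call's classic `' OR '1'='1` payload would have rewritten the WHERE clause and bypassed the password check.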

Phishing - By masquerading as a reputable, trustworthy entity, phishers send e-mails inducing users to visit websites by following the links provided in the e-mail. The unsuspecting user is lured into revealing his sensitive and confidential information on the fake website, which may further contain links to various malware. The fake websites are created by phishers to look exactly like the original websites of legitimate enterprises such as the user's bank, employer, favorite social networking site, or ISP.

Area Prone to Cyber Attacks
Abha Thakral*, Nitin Rakesh** and Abhinav Gupta***
*Assistant Professor, Department of Computer Science Engineering, Amity University, Noida
**Deputy Head Corporate Resource Center & Associate Professor, Dept. of CSE, Amity University, Noida
***Senior Chief Engineer - Advanced R&D, Samsung R & D Institute India - Delhi

Security Corner

Fig. 1: Indian websites defaced according to domain name in 2013

A common scenario is that the user receives an e-mail from his bank or another trusted entity stating:
• that the account's security needs to be enhanced,
• or that fraudulent activity is suspected on the account,
• or that the user will lose important information, etc.
Such statements are crafted smartly to look convincing and draw the user's attention. The innocent user is then requested to click on the link embedded in the e-mail, which leads him to the bogus website.
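One simple heuristic against such links, offered here as a toy sketch rather than a production phishing filter, is to flag anchors whose visible text names one domain while the underlying target points at another (the domains below are hypothetical):

```python
from urllib.parse import urlparse

def looks_suspicious(display_text: str, href: str) -> bool:
    """Flag links whose visible text names one domain but whose target is another."""
    shown_url = display_text if "//" in display_text else "https://" + display_text
    shown = urlparse(shown_url).hostname
    actual = urlparse(href).hostname
    return shown is not None and actual is not None and shown != actual

print(looks_suspicious("www.mybank.com", "http://evil.example.net/login"))  # True
```

Mail clients apply much richer checks (reputation lists, lookalike-character detection), but this mismatch test alone catches the most common phishing trick.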

Defacement - The dictionary meaning of defacement is the act of damaging or spoiling the surface of something. In the context of cyber attacks it refers to changing the appearance of a website. The attacker defaces a website by maliciously breaking into the web server that hosts it and replacing its content with their own.

Websites are the face of any organization, and defacing them may lead to loss of brand image and of customers' faith. Defacements may carry little or no financial incentive, but they have immense impact as they are visible to one and all. Defacement may also be coupled with malware which can then affect the computers on which the website is opened. Religious and government websites are major victims of defacement; such defacement may be done to make a political or religious statement.

The Indian government's portal india.gov.in was defaced on 19 February 2014, and a message regarding the issue of Kashmir was posted. As per a report, 24,216 Indian websites were defaced in the year 2013. A detailed study of domain-wise Indian website defacements establishes that .in websites were attacked the most (figure 1).

Sectors Prone to Cyber Attacks

The cyber economy has become a mirror image of the real economy, with similar kinds of business processes. Technology-inspired, technology-enabled and technology-run systems encompass all types of business processes. No sector or domain has been left untouched by its Midas touch, ranging from finance to administration, entertainment to education, and manufacturing to healthcare. As these sectors evolve technologically, the cyber attack threats they face are also evolving. Some sectors are more prone to attacks than others.

Financial Sector - The financial sector is the prime target of cyber criminals. The reason is obvious: an indelible link between money and crime. Banks and financial institutions deal with money which is now stored and transacted online in digital form, making them vulnerable. In addition to money, they are also vulnerable on account of the sensitive data they possess. According to a Ponemon Institute survey, losses in US financial services companies due to cyber crime exceeded $23 million.

The benefit for the attacker is two-way: not only can he gain money, but also peripheral information, including contact details and IDs, that can later be sold on the black market. A report suggests that the global black market for e-mail IDs and ID numbers is worth $5 billion and growing.

More than 360,000 credit card accounts were affected by the cyber attack on the third largest US bank, Citibank, in May 2011. Around $2.7 million was stolen from breached accounts.

In the attack against Lockheed Martin, SecurID tokens were used by hackers. These are the tokens used by office workers to access their systems, and were made by EMC Corporation.

Health Care Sector - Healthcare-related information is considered a high-priced commodity, as healthcare records contain personally identifiable information. This individually unique information, if stolen, can be sold on the black market or used for a multitude of attacks. Until recently this sector was not frequently targeted, but it is gaining popularity amongst attackers because of the abundance of personal information and the unpreparedness of the healthcare industry to tackle such attacks. The health care sector suffered the highest share of data breach attacks in 2013 and 2014, with 7.4 million personal records being exposed in the US.

Just a month back, i.e. in March 2015, Premera Blue Cross, a health insurance company, announced that it faced a cyber attack that may have affected the records of 11 million customers. The records include histories of medical problems, credit card numbers, social security numbers etc.

Another health insurance company, Anthem (formerly WellPoint), has also admitted this year that its 80 million customers may have had their personal data exposed to cybercriminals.

Energy Sector - The energy sector consists of oil, gas, coal, nuclear energy and electricity. Since these are part of critical national infrastructure, they are usually high on the target list for cyber attacks. They are vulnerable because networked corporate systems are established for their distribution and servicing, and any attack in this domain has significant consequences.

The world's largest state-owned oil company, Saudi Aramco, was infected by the Shamoon virus, which erased data from its computers. As a result, the largest oil-producing company had two weeks of downtime. The organization lost its data, its productivity and its profits, and had to replace a huge number of infected machines.

Telecommunications Sector - Telecom has become part of critical infrastructure as our dependence on it continues to grow. In parallel, the risks it faces also continue to grow, cyber attacks amongst them. An attack on communication channels has a very deep impact, as the sending and receiving of critical information gets disrupted. By controlling the flow of information, cyber attackers can control the pulse of a nation or state.

In 2014, the Germany-based telecom giant Deutsche Telekom registered close to one million hacker attacks daily on its grids. Furthermore, as per a study, threats due to cyber crime caused economic damage worth $575 billion to German companies in 2013.

Internet of Things - Target Sector of the Future - The attack surface has increased, with almost every device in business and at home getting connected to the internet. Hacks against refrigerators and cars have already occurred. With IoT in its evolving phase, new protocols are being introduced which may come with new vulnerabilities, leading to new threats. Manufacturing and industrial environments stand at more risk than individuals, as they deploy control systems for activating, monitoring and operating mechanical controls. These control devices are integrated with computer systems which then control doors, windows, valves, equipment arms etc. The very high level of diversity in industrial control system technologies also makes them more vulnerable.

Motivation for Attacks

Financial Gains - Money and crime are traditionally linked, and the biggest motivation for attacks is financial gain.


The attacker makes money by selling stolen data or intellectual property, blackmailing users with secret data, and misusing personal information, photographs etc.

Political Reasons - Cyberspace can be used to support propaganda, make a political statement or sustain an issue by attacking websites. The websites are defaced, temporarily brought down or shut down permanently. In such cases the attacker can be considered highly skilled, with the latest financial and technical sabotage capabilities at his disposal.

Hackers - Hackers may be benign explorers who, for fun or out of curiosity, probe various weaknesses of the internet. They may not be skilled, using existing knowledge from the internet to break into websites and cause damage. But as they mature and develop skills, spurred by challenges from peers and the wish for applause amongst them, hackers may undertake malicious activities. These may include espionage, where secrets are obtained without the knowledge or permission of the user.

Anonymous, a group of hacktivists very popular in cyber space, has initiated and executed many clamorous attacks against governments and organizations. The Anonymous collective announced itself in 2008 by uploading a video on YouTube through which it waged a war on the Church of Scientology. Such was the impact of this video that the protest moved from cyberspace to the streets, where people assembled and marched in opposition to the religious group.

Indian Scenario

Cyber fraud in 2013 cost the world a whopping US$113 billion, and India US$4 billion, amid rising incidents of cybercrime. CERT-In, the functional organization of the Government of India with the objective of securing Indian cyber space, handled more than 71,000 incidents ranging from spam to website intrusion, phishing etc. This amounts to a 225.38% growth rate from 2012 to 2013 and a 3519.87% growth rate from 2005 to 2013 (figure 2). The detailed bar chart gives the number of malicious incidents handled by it yearly, from 2005 onwards.
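The growth rates quoted above follow the usual percentage-change formula; a quick sketch of the arithmetic, using hypothetical yearly incident counts rather than the actual CERT-In figures:

```python
def growth_rate(old: float, new: float) -> float:
    """Percentage growth from an old count to a new count."""
    return (new - old) / old * 100

# Hypothetical yearly incident counts, for illustration only
incidents_2012, incidents_2013 = 22_000, 71_000
print(f"{growth_rate(incidents_2012, incidents_2013):.2f}%")  # 222.73%
```

Note how a modest absolute base (the 2005 count) makes the long-run percentage look enormous, which is worth keeping in mind when reading figures like 3519.87%.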

Reasons which fuel the vulnerability in the Indian context include:
• Growing economy
• Advancements in IT infrastructure
• Political movements
• A population increasingly embracing the online platform
• Unpreparedness of organizations to tackle attacks, i.e. usage of old legacy systems unequipped to tackle sophisticated attacks

Future

Cyber capabilities have grown exponentially and will play a crucial role in future conflicts, so conventional weapons like bullets, bombs and missiles may be replaced by cyber attacks. The sophistication level of attacks has increased to the point that attackers can remove all evidence of their attacks, i.e. their attack footprints, within a few minutes of execution. And the attacker cannot be identified merely from the methodology of the attack, its origin point, the target chosen, the language used, the servers deployed etc. Precautions to be taken include:

• Solutions which have the capacity to analyze network traffic in real time and take action accordingly need to be deployed.
• State-of-the-art self-driven, self-learning, self-upgrading tools and techniques need to be developed.
• Extensive audits of every access point into and out of the network must be applied to ensure security. This should also cover employees and third parties such as contractors, agents, vendors, suppliers and partners.
• New compliance regulations and stringent controls must be deployed by the government, keeping the current security threats in mind.

About the Authors

Ms. Abha Thakral is currently working as Assistant Professor Grade II with the Department of Computer Science and Engineering at Amity University Uttar Pradesh, Noida. She is also a research scholar working in the field of cyber forensics.

Dr. Nitin Rakesh is Deputy Head, Corporate Resource Centre & Associate Professor in the Department of Computer Science and Engineering at Amity University Uttar Pradesh, Noida. His rich academic and research experience includes Network Coding, Interconnection Networks & Architecture, Network Resiliency, Networks-on-Chip, Network Algorithms, Parallel Algorithms and Fraud Detection in Online Phantom Transactions. He is a member of IEEE, ACM, SIAM and IAENG, and a life member of CSI. He is a recipient of the Drona Award for TGMC-2009 by IBM, and is responsible for corporate interface, training and placements.

Dr. Abhinav Gupta is Senior Chief Engineer, Advanced R&D, at Samsung R&D Institute India - Delhi. He is responsible for product innovation, research and innovation, and collaborative research with premier research organizations for futuristic product development. He holds a PhD in Computer & Systems Sciences from Jawaharlal Nehru University, New Delhi, an M.Tech in Signal Processing from the Indian Institute of Technology, Guwahati, and a B.Tech in Electrical Engineering from the Institute of Engineering and Technology, Bareilly.

Fig. 2: Security Events handled by CERT-In yearly


Brain Teaser

Dr. Durgesh Kumar Mishra
Chairman, Division IV Communications, Professor (CSE) and Director, Microsoft Innovation Center, Sri Aurobindo Institute of Technology, Indore

Crossword »
Test your knowledge on Data Sciences. The solution to the crossword, with the name(s) of the first all-correct solution provider(s), will appear in the next issue. Send your answer to CSI Communications at email address [email protected] with subject: Crossword Solution - CSIC June Issue.

Solution to May 2015 crossword

CLUES

ACROSS
2. Approximately 1000 Petabytes of data.
4. A workflow processing system.
6. Graphical representation of analyses.
9. A connectivity tool.
13. Making an intuition-based decision.
15. Framework for populating Hadoop with data.
16. Deviation of an object from the average object.
18. Correctness of data.
19. Any delay in response.
20. A cloud computing platform by Microsoft.
21. A messaging system developed by LinkedIn.
23. Ability to maintain performance under different loads.
24. Software framework for big data processing.

DOWN
1. Knowledge as a set of concepts.
3. Process of removing all data points that could lead to identity disclosure.
5. An open source search engine built on Apache Lucene.
7. A distributed and open source database.
8. A visual abstraction of machines and databases.
10. Programming language suited for parallel data.
11. An open-source software framework for big data.
12. The task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
14. The process of representing abstract data as images for better understanding.
15. A backup operational mode.
17. Data about data.
22. An Apache data serialization system.

Did you know?
• 90% of the total data in the world has been generated in the last two years.
• The US National Security Agency built a data centre in Bluffdale with a capacity of 1 yottabyte, which equals one trillion terabytes.
• 4 million search queries per minute are received by Google.
• 2.5 million pieces of content per minute are shared by Facebook users.
• 300,000 tweets per minute are posted by Twitter users.
• 220,000 new photos per minute are posted by Instagram users.
• 72 hours of new video content per minute are uploaded by YouTube users.
• 50,000 apps per minute are downloaded by Apple users.
• 571 websites are created per minute.

Rashid Sheikh
Associate Professor, Sri Aurobindo Institute of Technology, Indore

We are overwhelmed by the response and solutions received from our enthusiastic readers.

Congratulations!
All correct answers to the May 2015 crossword were received from the following readers:
Er. Aruna Devi (Surabhi Softwares, Mysore)
Ajit Kumar (Pondicherry University)
Akshay G. Joshi (PES Institute of Technology, Bangalore)


Call for Papers

CCIS 2015: 2015 International Conference on Communication Control & Intelligent Systems
(Technically sponsored by IEEE Uttar Pradesh Section in association with CSI Mathura Chapter)
(Sat-Sun) November 07-08, 2015 (Conference id: 36597)
www.gla.ac.in/ccis2015
Organized by: Department of Electronics & Communication Engineering

Introduction:
The first international conference and 10th conference in the sequence, Communication Control and Intelligent Systems (CCIS 2015), will be held on November 07-08, 2015. CCIS 2015 is an international conference where the theory, practice and applications of communication systems, control systems, intelligent systems and related topics are presented and discussed.

About GLA University:
GLA University runs courses such as B.Tech (CE, CS, EE, EN, EC, ME), Diploma in Engineering, B.Pharm, D.Pharm, BBA, BBA (Family Business), BCA, B.Sc. (Hons.), B.Com (Hons.), B.Ed., M.Tech (CE, CS, EC, ME, EE), M.Pharm (Pharmacology, Pharmaceutical Chemistry), MBA, MCA, M.Sc. (Bio-Technology, Microbiology & Immunology) and PhD. The university campus is spread over more than 120 acres of lush green pollution-free grounds and is located on the Delhi-Mathura National Highway No. 2.

Conference Theme:
Technical paper submissions are invited under the following topics, but are not limited to:

Track 1: Wireless and Wired Networks, Multimedia Communications, Computer Networks, Optical Networks, Networking & Applications, Next Generation Services
Track 2: Control Systems, Nonlinear Signals and Systems, Embedded Systems and Software, Intelligent Systems, Neural Networks and Fuzzy Logic, Robotics and Applications, Machine Learning and Soft Computing, System Identification and Control, Algorithms and Computing
Track 3: VLSI Technology, Design & Testing, Signal Processing, Bio-Medical Processing, Speech, Image and Video Processing, Analog and Mixed Signal Processing, Hardware Implementation for Signal Processing, Text Processing, Database and Data Mining
Track 4: Monolithic and Hybrid Integrated (Active and Passive) Components and Circuits, Antennas and Phased Arrays, RF Packaging and Package Modeling, RF MEMS and Microsystems, EMI/EMC
Track 5: Adhoc Networks, Ubiquitous and Cloud Computing, Distributed and Parallel Systems, Security and Information Systems, Network Security

Submission
Prospective authors are encouraged to submit their papers through EasyChair; the link is available on the conference website. Submissions must be plagiarism-free and not more than 5 pages in IEEE format. Use the following link to submit your papers:
https://www.easychair.org/conferences/?conf=ccis2015

Proceedings Publication
All accepted and presented papers of the conference by duly registered author(s) will be submitted to the IEEE Xplore digital library for possible publication.

Important Dates/Deadlines
June 11, 2015 - Submission of regular paper
August 22, 2015 - Paper acceptance notification to authors
September 22, 2015 - Last date of registration
September 29, 2015 - Last date of camera-ready copy submission
September 29, 2015 - Last date of copyright form submission

Registration Details
All delegates are required to register for the conference as per the following details:
Corporate executives and professionals: Rs 12,000/-
Academicians, IEEE/ICEIT/CSI/IETE members: Rs 8,000/-
Academicians, non-members: Rs 10,000/-
Students, IEEE/ICEIT/CSI/IETE members: Rs 5,000/-
Students, non-members: Rs 6,000/-
Academicians from abroad: US$300

For any inquiry please contact: [email protected]
GLA University, Mathura, 17 km stone, NH-2, Mathura-Delhi Road, P.O. Chaumuha, Mathura-281406, UP, India
Tel: (05662) 250909, 250900, 9927064017; Fax: (05662) 241687; Website: www.gla.ac.in

Mr. Vishal Goyal (Technical Program Committee Chair): +91-7500446622
Mr. Atul Bansal (Technical Program Committee Chair): +91-9760001881
Mr. Aasheesh Shukla (Publication Committee Chair): +91-8126130707
Dr. T. R. Lenka (Publication Committee Chair): +91-9435387419

Why Join CSI:
1) To be a part of the distinguished fraternity of famous IT industry leaders, brilliant scientists and dedicated academicians through networking.
2) Professional development at the individual level.
3) Training and certification in futuristic areas.
4) International competitions and association with international bodies like IFIP and SEARCC.
5) Career support.
6) CSI awards.
7) Various publications.


Report from Kolkata Chapter
The National Conference on Computing, Communication and Information Processing (NCCCIP-2015), sponsored by the All India Council for Technical Education (AICTE), New Delhi under the North East Quality Improvement Program (NEQIP) and technically sponsored by the Computer Society of India (CSI) Kolkata Chapter, was held successfully during 2-3 May 2015 at the North Eastern Regional Institute of Science & Technology (NERIST), a deemed university under MHRD, Govt. of India, Nirjuli, Arunachal Pradesh. The conference was organised by the Department of Computer Science & Engineering, NERIST.

The inaugural function was attended by Prof. P. K. Tripathy, Dean (Academic), NERIST, as Chief Guest, and by Prof. J. K. Mandal, Department of Computer Science & Engineering, University of Kalyani, and Prof. D. K. Lobiyal, School of Computer and System Sciences, JNU, New Delhi, as guests of honour. The Chief Guest released the proceedings of the conference. Shri Moirangthem Marjit Singh, Conference Chair, NCCCIP-2015, presented a detailed report on the conference.

Prof. J. K. Mandal and Prof. D. K. Lobiyal delivered keynote addresses on 2nd May 2015, followed by paper presentations. On 3rd May, keynote addresses were delivered by Prof. S. K. Khatri, Director, AIIT, Noida, and Prof. P. Dutta, Department of Computer and System Sciences, Visva-Bharati University. An invited talk by Ani Taggu, RGU Doimukh, was followed by paper presentations. The conference was attended by faculty and students of NERIST, including outstation participants. The closing function was attended by Prof. M. F. Hussain, Dean (Administration), NERIST, as Chief Guest, who presented certificates to the paper presenters.

From Chapters and Divisions »

Report of Regional Student Convention 2015, Region-II
Computer Society of India Region-II and the CSI Kolkata Chapter organized the Regional Student Convention 2015, Region-II (East/North-East States), on 14th March 2015 in collaboration with the Narula Institute of Technology at Agarpara. The convention aimed to bring students together onto a common platform with the intention of achieving some demanding objectives: first, to expose students to the concepts of academic writing, research presentation and critical thinking. The convention provided a formal environment for students to meet each other, share their ideas and get feedback, so that they can form a network of young researchers. Student paper presentations and a quiz contest were the main focus of the convention. Prof. (Dr.) A. K. Bagchi (Retired Professor, ISI Kolkata) delivered the keynote speech. Dignitaries who attended the convention include Dr. S. Raza (Chairman, Patna Chapter); Dr. A. K. Nayek (Director, IIBM Patna), Chief Guest for the convention; Dr. J. K. Mandal (Regional Student Convener), former Dean & Professor, Kalyani University; Mr. D. P. Sinha (RVP-II); Dr. D. D. Sinha (Fellow of CSI), Professor, CU; Dr. P. Paul (Vice-Chairman, CSI Kolkata), Professor, ISI Kolkata; Ms. Somdutta Chakraborty, State Student Coordinator, West Bengal; Mr. Subir Lahiri (Secretary, CSI Kolkata); Mr. Aniruddha Nag; and Mr. Sumantra Bhattacharyya, JIS College of Engineering. A total of 17 papers were selected, and 20 paper presenters from 5 different colleges presented their papers on that day. More than sixty participants from various parts of the East and North-East of the country participated in the convention. The regional meet also took place on the same day, at a different location on the same campus.

Like Computer Society of India on Facebook: https://www.facebook.com/CSIHQ for updates.

RVPs, Divisional Chairpersons, Chapter OBs and Student Branch Coordinators may send activity reports, photographs, or any other information to be updated on the page to [email protected].

Congratulations!!!

Dr. G. Satheesh Reddy, Honorary Fellow of Computer Society of India, has been appointed as Scientific Adviser to the Raksha Mantri.


Report from Division – I and Region – I By Prof. M. N. Hoda, Chairman, Division – I, Computer Society of India

IEEE Delhi Section, Computer Society of India Division – I and Region – I, ISTE Delhi Section and IETE Delhi Centre collaborated on an evening session on "Technological Needs for Future Human Space Missions", held at Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi on 13th May, 2015, on the occasion of the 40th Anniversary celebration of IEEE Delhi Section. Dr. Kumar Krishen, Fellow, SDPS and Fellow, IETE, NASA Johnson Space Centre, 2101 NASA Parkway, USA, was invited to deliver the talk.

The welcome address was delivered by Prof. M. N. Hoda, Director, BVICAM, New Delhi and Chairman, Division – I, CSI. Prof. Mini S. Thomas, Chairman, IEEE Delhi Section, briefed the audience about the genesis of the event and introduced the speaker. During his knowledgeable session, Dr. Kumar Krishen explored various facets of the Milky Way Galaxy along with the overarching constraints of space systems. He discussed how the earth has been changing dramatically since its beginning and how the survival of life on earth is affected by natural disasters such as volcanoes, earthquakes, tsunamis, tornadoes and plate motions. He also sensitized the audience to the missions of various nations along with their objectives, such as the Russian lunar mission, China's manned moon mission and the moon exploration mission, and briefed them on Japan's plan to establish a base station on the moon by sending humanoid robots there by 2020. The informative session concluded with a question-and-answer session, followed by the inauguration of Collabratec, the research collaboration and networking platform of IEEE, by Mr. Daman Dev Sood, IEEE Delhi Section. Dr. N. K. Gupta, Chairman, ISTE Delhi Section, during his talk, mentioned that the occasion was historic in nature, in that all the fellow professional societies had come together to celebrate the 40th anniversary of IEEE Delhi Section with such a knowledgeable evening session. The event ended with a vote of thanks by Mr. Shiv Kumar, Regional Vice President, Region – I, Computer Society of India. The entire event was anchored by Mrs. Ritika Wason, Assistant Professor, BVICAM, New Delhi and co-ordinated by Dr. Anupam Baliyan, Associate Professor, BVICAM, New Delhi. It was well attended by over 80 corporate members of CSI, IETE, IEEE and ISTE, who also got ample opportunity for networking at the dinner.

Report from Patna Chapter
A one-day National Seminar was organized by the Indian Institute of Business Management, Patna, in technical collaboration with Computer Society of India, Patna Chapter, on the theme "Role of Science Education in National Development" on 11th April 2015 at the IIBM Auditorium, Patna.

The seminar was inaugurated by the General President of the Indian Science Congress Association, Dr. A. K. Saxena, in the presence of Dr. Arun Kumar, General Secretary, Dr. Vijay Laxmi Saxena, Former General Secretary, and Dr. Dhyanendra Kumar, Treasurer, of the Indian Science Congress Association; Dr. Ranjit Kumar Verma, Pro V.C., Patna University; Prof. U. K. Singh, Fellow, CSI & Director General, IIBM & Dr. Zakir Husain Institute; Prof. A. K. Nayak, Former National Chairman, Div-III (Application), CSI; and Mr. Rohit Singh, Chapter Patron, CSI Patna Chapter.

One technical session was organized on the theme "IT Education in National Development", in which technical papers were presented by Mr. Shams Raza, Immediate Past Chairman, CSI Patna Chapter; Prof. Alok Kumar, Dean, IIBM, Patna; Mr. Shailesh Kr. Shrivastava, Director, NIC, Patna; Prof. Ganesh Pandey, Dy. Director, Dr. Zakir Husain Institute; and Mr. Manoj Kumar Mishra, Amity Business School, Patna. Prof. A. K. Nayak delivered the welcome address, while Prof. U. K. Singh, Member of the Nomination Committee, CSI, presided over the function. Mr. Purnendu Narayan, Secretary, CSI Patna Chapter, proposed the vote of thanks.

Dr. Dhyanendra Kumar, Prof. A. K. Nayak, Prof. U. K. Singh, Dr. Vijaylaxmi Saxena, Dr. A. K. Saxena, Dr. Arun Kumar, Dr. Ranjit Kumar Verma, Mr. Rohit Singh


Workshop Organized by Computer Society of India, Noida Chapter and IMS-Noida on 5th May on 'Net Neutrality: Social and Economic Perspective'

In his inaugural address, Shri Anuj Agarwal, Chairman, CSI Noida Chapter, mentioned that the internet is the only non-discriminatory medium and platform in the world, where nobody is discriminated against on the basis of nationality, colour of skin, caste, creed, birth origin, social or economic status, sex or anything else. The internet is not governed by any particular government or company; it is guided by we the people, 'the global citizens'.

Dr. Arvind Gupta, keynote speaker and National Head of the IT Cell of BJP, described net neutrality and the fine difference between 'freedom on the internet' and 'free internet'.

Mr. Rajan Mathews, Director General of Cellular Operators Association of India and a

renowned telecom expert, maintained that all telecom companies support Net Neutrality.

Mr. Gopal Agarwal, BJP Economic Cell and a senior active civil society member, mentioned in his address that one has to see the entire debate from the consumer's perspective: the consumer wants reliable services at an affordable price. He also made the case for calculating the actual cost of telecom networks and operations, because telecom is a public good. Presiding over the session, he said the debate should remain focused on substance and should not get politicized, as in the current political scenario many people who may not know the nitty-gritty of the subject may try to jeopardize a healthy debate. He also spoke about the social and economic impact of net neutrality. Shri Deepak Sahu, Editor-in-Chief, VarIndia.com, supported net neutrality and put forward the social perspective on the need for it. He was categorical and clear that net neutrality cannot be diluted. Dr. Kamaljeet Singh, Director, IMS, summed up the discussion and presented a vote of thanks.

Report on Information Technology Day by Nashik Chapter
The Nashik Chapter celebrated its annual event, Information Technology Day, on 16th March 2015. On this occasion, a programme full of activities such as lectures, seminars, felicitations and awards for academic achievements and competition winners was arranged. The programme was conducted at the Shankaracharya Kurtakoti auditorium. Industrialists, representatives of professional organisations, IT professionals, principals of colleges and students participated in the programme with a lot of enthusiasm.

The release was followed by the felicitation of IT professionals, namely Shri Piyush Somani, MD and CEO, ESDS Softwares; Suchit Tiwari, Chairman, Cognifront; Joy Aloor, CEO, Fox Controls; Rohit Kulkarni of Neumann Systems; Rajiv Papneja of ESDS; Gunwant Battase of Nebula Studios; Pramod Gaikwad of Silicon Valley; Mrs. Bhagyashree Kenge of Cyberedge Systems; and Ruturaj Kohok of Nethority. Shri Chintawar credited everyone in CSI for its wonderful journey of fifty years and for achieving great success. He was amazed that the society is managed by volunteers while making a substantial impact for IT professionals and on government initiatives like e-governance.

On the occasion of the Golden Jubilee, the chapter brought out a special edition of its newsletter, ACCESS.

Report from Udaipur Chapter
Computer Society of India, Udaipur Chapter celebrated World Telecommunications and Information Society Day on 17 May 2015 in association with The Institution of Engineers (India), Udaipur Local Chapter. Prof. S. S. Sarangdevot, VC, Rajasthan Vidyapeeth University, Udaipur, was the Chief Guest, and Prof. Vipin Tyagi, Jaypee University of Engineering and Technology, Guna, MP, Regional Vice President, Region 3, Computer Society of India, was the guest speaker on the occasion. Er. A. S. Choondwat, Chairman, IEI, Udaipur Local Chapter; Er. M. K. Mathur, Hon. Secy., IEI, Udaipur Local Chapter; Dr. Y. C. Bhatt, Chairman, CSI, Udaipur Chapter; and Er. Amit Joshi, Hony. Secy., CSI Udaipur Chapter were present on the occasion.


SEARCC Executive Council Meeting, 27th April 2015, Singapore
Prof. Bipin V. Mehta, President, Computer Society of India, attended the SEARCC Executive Council Meeting on 27 April 2015 at Singapore, as CSI is a member of the South East Asia Regional Computer Confederation (SEARCC). In his presentation, Prof. Bipin Mehta gave an overview of the Computer Society of India.

In the APEC Telecommunications and Information Working Group Strategic Action Plan 2016-2020, the following priority areas are identified:

1. Develop and support ICT innovation
2. Promote a secure, resilient and trusted ICT environment
3. Promote regional economic integration
4. Enhance the Digital Economy and the Internet Economy
5. Strengthen cooperation

Mr. Yasas Abeywickrama expressed interest in collaborating with CSI for the YITP Awards. He also briefed members about the SEARCC School Competition, which is being hosted by the Sri Lanka Computer Society, and requested members to send teams.

CSI is the largest society in SEARCC; Dr. F. C. Kohli took the initiative to form the group in South East Asia and contributed to the activities of SEARCC. CSI can play a major role in SEARCC and its various initiatives.

(L to R) Mr. Yasas V. Abeywickrama, Vice President, Computer Society of Sri Lanka; Mr. Kunaseelan Rajaretnam, Council Member, Malaysian National Computer Confederation; Mr. Mick Nades, President, Papua New Guinea Computer Society; Prof. Bipin V. Mehta, President, Computer Society of India; Dr. Dayan Rajapakse, President, Computer Society of Sri Lanka & President, SEARCC

Report from CSI Vadodara Chapter (Region III)
The Department of CSE, Babaria Institute of Technology organized a one-day workshop on "Advanced C using Qt", arranged exclusively for first-year CSI student members, on 29th April, 2015, in which a total of 45 students actively participated. By attending the workshop, students were exposed to new open-source software for developing applications such as a notepad and a calculator during a live hands-on session.

Participants with speakers Prof. Atul Saurabh & Prof. Ketan B. Rathod

Report from Vellore Chapter
CSI Vellore Chapter and Student Branch organized a 48-hour media and development hackfest called "Code Play" from 23rd to 26th April 2015. CEOs from the start-ups Zophop, CarWale and Muto Technologies attended the event, in which 300 CSI volunteers participated; around 25 students got internships in the above companies. The event was organized by Prof. Shalini L., Prof. Govinda K. and Prof. Jagadeesh G.

Interaction between CSI student volunteers and CEOs


From Student Branches »
(REGION-I) DRONACHARYA COLLEGE OF ENGINEERING, GURGAON: 26 & 27-3-2015 – Chief Guest & speakers during the two-day technical event Drontech 2K15
(REGION-III) SRI AUROBINDO INSTITUTE OF TECHNOLOGY, INDORE: 23-4-2015 – During the programming contest Code Scratch
(REGION-III) G H PATEL COLLEGE OF ENGINEERING & TECHNOLOGY, VALLABH VIDYANAGAR: 27-3-2015 – During the expert talk "Detecting disease spread in a geographic location – a big data approach"
(REGION-III) SAGAR INSTITUTE OF SCIENCE TECHNOLOGY & RESEARCH, BHOPAL: 16 to 18-4-2015 – During the workshop on Web and E-commerce Site Development
(REGION-III) TRUBA COLLEGE OF ENGINEERING & TECHNOLOGY, INDORE: 17 & 18-3-2015 – During the two-day National Workshop on the Impact of Cloud Technology in Education
(REGION-IV) SHRI SHANKARACHARYA INSTITUTE OF PROFESSIONAL MANAGEMENT & TECHNOLOGY, RAIPUR: 30-3-2015 – Winners and organizers during the State Level Student Convention
(REGION-V) GSSS INSTITUTE OF ENGINEERING & TECHNOLOGY FOR WOMEN, MYSURU: 8-5-2015 – Students during the seminar on awareness about the benefits of GATE, PSUs & IES
(REGION-V) KLE DR M S SHESHGIRI COLLEGE OF ENGINEERING & TECH, BELGAUM: 9-3-2015 – During HACKATHON – Overnight Coding
(REGION-V) NMAM INSTITUTE OF TECHNOLOGY, NITTE: 17-3-2015 – During the one-day workshop on Ethical Hacking
(REGION-V) SRINIVAS INSTITUTE OF TECHNOLOGY, MANGALORE: 4-4-2015 – During the one-day workshop on SDN and Data Centre Networking
(REGION-VI) MARATHWADA INSTITUTE OF TECHNOLOGY, AURANGABAD: 17-4-2015 – During the expert talk on Career Guidance and Job Opportunities in .NET
(REGION-VI) MARATHWADA INSTITUTE OF TECHNOLOGY, AURANGABAD: 18-04-2015 – One-day workshop on the open-source testing tool Selenium, by Mr. Anurang Dorle, IGATE, Pune
(REGION-VII) S A ENGINEERING COLLEGE, CHENNAI: 29-4-2015 – During the International Conference on Futuristic Trends in Computing & Communication
(REGION-VII) ADHIYAMAAN COLLEGE OF ENGINEERING, HOSUR: 19 & 20-3-2015 – During the Second National Conference on Trends in Advanced Computing and Applications
(REGION-VII) EINSTEIN COLLEGE OF ENGINEERING, TIRUNELVELI: 6-4-2015 – Dr. Velautham, Prof. Ezhilvanan, Mr. Mohan (Past President, CSI), Dr. Ramar & Prof. Suresh Thangakrishnan during the seminar on Focusing Research and Documentation
(REGION-VII) SRM VALLIAMMAI ENGINEERING COLLEGE, KATTANKULATHUR: 25-4-2015 – Mr. Saravanan, Dr. Abdul Rasheed, Dr. Murugan, Mr. Sitaraman, Mrs. Meenakshi & Mrs. Revathi during the National Conference on Recent Trends in Computational Intelligence


CSI Calendar 2015
Anirban Basu, Vice President, CSI & Chairman, Conf. Committee. Email: [email protected]

June 2015 event
19-20 June 2015: National Conference on Advance Trends in "Computer Science & Mathematical Techniques", organised by CSI Udaipur Chapter, Division IV, ACM Udaipur Chapter and Career Point University, Kota, at Kota, Rajasthan. http://www.cpur.in/conference/ATCSMT15/index.php
Contact: Mr. Amit Joshi, [email protected]

July 2015 events
3-4 July 2015: ICT4SD 2015, International Conference on ICT for Sustainable Development, organized by ASSOCHAM Gujarat Chapter and Sabar Institute of Technology for Girls, Gujarat (Knowledge Partner: Computer Society of India), at The Pride Hotel, Ahmedabad. http://www.ict4sd.in
Contacts: Mr. Amit Joshi, [email protected]; Mr. Nisarg, [email protected]

24-25 July 2015: International Conference on ICT in Health Care and E-Governance, at Sri Aurobindo Institute of Technology, Indore, in association with Computer Society of India Division III, Division IV, Indore Chapter and ACM Udaipur Chapter, at Indore, India. www.csi-udaipur.org/icthc-2015/
Contacts: Dr. Durgesh Kumar Mishra, [email protected]; Dr. A. K. Nayak, [email protected]; Mr. Amit Joshi, [email protected]

Aug 2015 event
7-8 Aug 2015: ICICSE-2015, 3rd International Conference on Innovations in Computer Science & Engineering.
Contacts: Dr. H. S. Saini, [email protected]; Dr. D. D. Sarma, [email protected]

Sept 2015 events
9-11 Sep 2015: Twelfth International Conference on Wireless and Optical Communications Networks (WOCN2015), "Next Generation Internet", at M.S. Ramaiah Institute of Technology and Bangalore University, Bangalore (in association with CSI Division IV).
Contacts: Dr. Srinivasa K. G., [email protected]; Dr. Guy Omidyar, [email protected]; Dr. Durgesh Mishra, [email protected]

10-12 Sep 2015: International Conference on Computer Communication and Control (IC4-2015) at Medicaps Group of Institutions, Indore (in association with CSI Division IV, Indore Chapter and IEEE MP Subsection).
Contacts: Dr. Pramod S., [email protected]; Prof. Pankaj, [email protected]

Oct 2015 events
9-10 Oct 2015: International Congress on Information and Communication Technology (ICICT-2015) at Udaipur (in association with CSI Udaipur Chapter, Div-IV, SIG-WNs, SIG-e-Agriculture and ACM Udaipur Chapter). www.csi-udaipur.org/icict-2015/
Contacts: Dr. Y. C. Bhatt, [email protected]; Mr. Amit Joshi, [email protected]

16-17 Oct 2015: 6th International Conference on Transforming Healthcare with IT, at Hotel Lalit Ashok, Bangalore.
Contact: Mr. Suresh Kotchatill, Conference Coordinator, [email protected]

Kind Attention: Prospective Contributors of CSI Communications
Please note that the cover theme for the forthcoming issue of July 2015 is planned as follows:

• July 2015 – Emerging Trends in IT

Articles may be submitted in categories such as Cover Story, Research Front, Technical Trends and Article. Please send your contributions before 20th June 2015. Articles may be long (2500-3000 words maximum) or short (1000-1500 words) and must be authored as original text. Plagiarism is strictly prohibited.

Please note that CSI Communications is a magazine for members at large and not a research journal for publishing full-fledged research papers. Therefore, we expect articles written at the level of a general audience of varied member categories. Equations and mathematical expressions within articles are not recommended and, if absolutely necessary, should be kept to a minimum. Include a brief biography of four to six lines for each author, with a high-resolution author photograph.

Please send your articles in MS-Word and/or PDF format to Dr. Vipin Tyagi, Guest Editor, via email id [email protected], with a copy to [email protected].

(Issued on behalf of the Editorial Board, CSI Communications)


Registered with the Registrar of News Papers for India – RNI 31668/1978. Regd. No. MCN/222/2015-2017. Posting Date: 10 & 11 every month. Posted at Patrika Channel, Mumbai-I. Date of Publication: 10th of every month. If undelivered, return to: Samruddhi Venture Park, Unit No. 3, 4th floor, MIDC, Marol, Andheri (E), Mumbai-400 093.

CSI-2015: 50th Golden Jubilee Annual Convention

on

Digital Life
(02nd – 05th December, 2015)

Hosted by: Computer Society of India (CSI), Delhi Chapter
Paper Submission Deadline: 17th August, 2015 [No Further Extension]

Paper Submission Link: http://www.csi-2015.org/PaperSubmission.php
Convention Website: http://www.csi-2015.org/

Announcement and Call for Papers
CSI-2015 invites full-length, original and unpublished research papers, based on theoretical or experimental contributions, primarily in the area of Computer Science and Information Technology and, generally, in all interdisciplinary streams of Engineering Sciences, for presentation and publication in the convention. CSI-2015 will be an amalgamation of the following ten different tracks, organized parallel to each other, in addition to a few theme-based Special Sessions:

Track # 1: ICT Based Innovation
Track # 2: Next Generation Networks
Track # 3: Nature Inspired Computing
Track # 4: Real Time Language Translations
Track # 5: Sensors
Track # 6: Big Data Analytics
Track # 7: System and Architecture
Track # 8: Cyber Security
Track # 9: Software Engineering
Track # 10: 3-D Silicon Photonics & High Performance Computing

CSI-2015 will be held at the India International Centre (IIC), Lodhi Road, New Delhi (INDIA). The convention will provide a platform for technical exchanges amongst scientists, teachers, scholars, engineers and research students from all around the world, and will encompass regular paper presentation sessions, invited talks, keynote addresses, panel discussions and poster exhibitions.

Instructions for Authors
Authors from across different parts of the world are invited to submit their papers. Authors should upload their papers online at http://www.csi-2015.org/PaperSubmission.php. Unregistered authors should first create an account at http://www.bvicam.ac.in/csi-2015/addMember.asp to log on and upload their paper. Only electronic submissions will be considered; submissions through e-mail will not be considered.

Accepted papers shall be published by Springer in the form of Pre-Convention Proceedings, in both soft copy and hard copy, and will be indexed in the world's leading indexing / abstracting / bibliographic databases.

Senior experts / researchers are also invited to submit their proposals online for organizing Special Sessions at http://www.bvicam.ac.in/csi-2015/specialSessions.asp.

Important Dates

Submission of Full Length Paper: 17th August, 2015
Paper Acceptance Notification: 06th October, 2015
Submission of Camera Ready Copy (CRC) of the Paper: 20th October, 2015
Registration Deadline (for inclusion of Paper in Proceedings): 26th October, 2015

The detailed Call for Papers is available at http://www.csi-2015.org/CallForPapers.php. For any other query, please visit our web portal at http://www.csi-2015.org/home.php or write back to us at [email protected]; [email protected]

Chief Patron: Padmashree Dr. R. Chidambaram, Principal Scientific Advisor (PSA), Govt. of India
Patron: Prof. S. V. Raghavan, Scientific Secretary, Office of the PSA, Govt. of India

Chair, Programme Committee: Prof. K. K. Aggarwal, Chancellor, KRM University, Gurgaon and Former Founder Vice Chancellor, GGSIP University, New Delhi
Chair, Organizing Committee: Dr. Gulshan Rai, National Cyber Security Co-ordinator, Govt. of India
Chair, Finance Committee: Mr. Satish Khosla, Managing Director, Cognilytics Software and Consulting Pvt. Ltd.

All correspondence related to CSI-2015 must be addressed to:

Prof. M. N. Hoda, Secretary, Programme Committee (PC), CSI-2015
Director, Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM)
A-4, Paschim Vihar, Rohtak Road, New Delhi – 110063 (INDIA)
Tel.: +91-11-25275055 Fax: +91-11-25255056 Mobile: +91-9212022066
E-Mail: [email protected]; [email protected]; Visit us at http://www.csi-2015.org/