CSI Communications | June 2015 | 1
ISSN 0970-647X
Cover Story: Data Science – Data, Tools & Technologies 8
Cover Story: Leveraging Bigdata Towards Enabling Analytics Based Intrusion Detection Systems in Wireless Sensor Networks 12
Article: The Cardinal Sin of Data Mining and Data Science: Overfitting 32
Security Corner: Area Prone to Cyber Attacks 40
Research Front: A Novel Approach to Secure Data Transmission using Logic Gates 17
Volume No. 39 | Issue No. 3 | June 2015
Know Your CSI
Executive Committee (2015-16/17) »
President: Prof. Bipin V. Mehta, [email protected]
Vice-President: Dr. Anirban Basu, [email protected]
Hon. Secretary: Mr. Sanjay Mohapatra, [email protected]
Hon. Treasurer: Mr. R. K. [email protected]
Immd. Past President: Mr. H. R. [email protected]
Nomination Committee (2015-2016)
Dr. Anil K. Saini Mr. Rajeev Kumar Singh Prof. (Dr.) U.K. Singh
Regional Vice-Presidents »
Region I: Mr. Shiv Kumar (Delhi, Punjab, Haryana, Himachal Pradesh, Jammu & Kashmir, Uttar Pradesh, Uttaranchal and other areas in Northern India), [email protected]
Region II: Mr. Devaprasanna Sinha (Assam, Bihar, West Bengal, North Eastern States and other areas in East & North East India), [email protected]
Region III: Dr. Vipin Tyagi (Gujarat, Madhya Pradesh, Rajasthan and other areas in Western India), [email protected]
Region IV: Mr. Hari Shankar Mishra (Jharkhand, Chattisgarh, Orissa and other areas in Central & South Eastern India), [email protected]
Region V: Mr. Raju L. Kanchibhotla (Karnataka and Andhra Pradesh), [email protected]
Region VI: Dr. Shirish S. Sane (Maharashtra and Goa), [email protected]
Region VII: Mr. K. Govinda (Tamil Nadu, Pondicherry, Andaman and Nicobar, Kerala, Lakshadweep)
Division Chairpersons »
Division-I: Hardware (2015-17), Prof. M. N. Hoda, [email protected]
Division-II: Software (2014-16), Dr. R. Nadarajan, [email protected]
Division-III: Applications (2015-17), Mr. Ravikiran Mankikar, [email protected]
Division-IV: Communications (2014-16), Dr. Durgesh Kumar Mishra, [email protected]
Division-V: Education and Research (2015-17), Dr. Suresh Chandra Satapathy, [email protected]
Important links on CSI website »
Publication Committee (2015-16)
Dr. A.K. Nayak Chairman
Prof. M.N. Hoda Member
Dr. R. Nadarajan Member
Mr. Ravikiran Mankikar Member
Dr. Durgesh Kumar Mishra Member
Dr. Suresh Chandra Satapathy Member
Dr. Vipin Tyagi Member
Dr. R.N. Satapathy Member
Important Contact Details »
For queries and correspondence regarding Membership, contact [email protected]
About CSI http://www.csi-india.org/about-csi
Structure and Organisation http://www.csi-india.org/web/guest/structureandorganisation
Executive Committee http://www.csi-india.org/executive-committee
Nomination Committee http://www.csi-india.org/web/guest/nominations-committee
Statutory Committees http://www.csi-india.org/web/guest/statutory-committees
Who's Who http://www.csi-india.org/web/guest/who-s-who
CSI Fellows http://www.csi-india.org/web/guest/csi-fellows
National, Regional & State Student Coordinators http://www.csi-india.org/web/guest/104
Collaborations http://www.csi-india.org/web/guest/collaborations
Distinguished Speakers http://www.csi-india.org/distinguished-speakers
Divisions http://www.csi-india.org/web/guest/divisions
Regions http://www.csi-india.org/web/guest/regions1
Chapters http://www.csi-india.org/web/guest/chapters
Policy Guidelines http://www.csi-india.org/web/guest/policy-guidelines
Student Branches http://www.csi-india.org/web/guest/student-branches
Membership Services http://www.csi-india.org/web/guest/membership-service
Upcoming Events http://www.csi-india.org/web/guest/upcoming-events
Publications http://www.csi-india.org/web/guest/publications
Student's Corner http://www.csi-india.org/web/education-directorate/student-s-corner
CSI Awards http://www.csi-india.org/web/guest/csi-awards
CSI Certification http://www.csi-india.org/web/guest/csi-certification
Upcoming Webinars http://www.csi-india.org/web/guest/upcoming-webinars
About Membership http://www.csi-india.org/web/guest/about-membership
Why Join CSI http://www.csi-india.org/why-join-csi
Membership Benefits http://www.csi-india.org/membership-benefits
BABA Scheme http://www.csi-india.org/membership-schemes-baba-scheme
Special Interest Groups http://www.csi-india.org/special-interest-groups
Membership Subscription Fees http://www.csi-india.org/fee-structure
Membership and Grades http://www.csi-india.org/web/guest/174
Institutional Membership http://www.csi-india.org/web/guest/institiutional-membership
Become a member http://www.csi-india.org/web/guest/become-a-member
Upgrading and Renewing Membership http://www.csi-india.org/web/guest/183
Download Forms http://www.csi-india.org/web/guest/downloadforms
Membership Eligibility http://www.csi-india.org/web/guest/membership-eligibility
Code of Ethics http://www.csi-india.org/web/guest/code-of-ethics
From the President's Desk http://www.csi-india.org/web/guest/president-s-desk
CSI Communications (PDF Version) http://www.csi-india.org/web/guest/csi-communications
CSI Communications (HTML Version) http://www.csi-india.org/web/guest/csi-communications-html-version
CSI Journal of Computing http://www.csi-india.org/web/guest/journal
CSI eNewsletter http://www.csi-india.org/web/guest/enewsletter
CSIC Chapters SBs News http://www.csi-india.org/csic-chapters-sbs-news
Education Directorate http://www.csi-india.org/web/education-directorate/home
National Students Coordinator http://www.csi-india.org/web/national-students-coordinators/home
Awards and Honors http://www.csi-india.org/web/guest/251
eGovernance Awards http://www.csi-india.org/web/guest/e-governanceawards
IT Excellence Awards http://www.csi-india.org/web/guest/csiitexcellenceawards
YITP Awards http://www.csi-india.org/web/guest/csiyitp-awards
CSI Service Awards http://www.csi-india.org/web/guest/csi-service-awards
Academic Excellence Awards http://www.csi-india.org/web/guest/academic-excellence-awards
Contact us http://www.csi-india.org/web/guest/contact-us
Contents
Volume No. 39 • Issue No. 3 • June 2015
CSI Communications
Please note:
CSI Communications is published by Computer
Society of India, a non-profit organization.
Views and opinions expressed in the CSI
Communications are those of individual authors,
contributors and advertisers and they may differ
from policies and official statements of CSI. These
should not be construed as legal or professional
advice. The CSI, the publisher, the editors and the
contributors are not responsible for any decisions
taken by readers on the basis of these views and
opinions.
Although every care is being taken to ensure
genuineness of the writings in this publication,
CSI Communications does not attest to the
originality of the respective authors’ content.
© 2012 CSI. All rights reserved.
Instructors are permitted to photocopy isolated
articles for non-commercial classroom use
without fee. For any other copying, reprint or
republication, permission must be obtained in
writing from the Society. Copying for other than
personal use or internal reference, or of articles
or columns not owned by the Society without
explicit permission of the Society or the copyright
owner is strictly prohibited.
Printed and Published by Suchit Shrikrishna Gogwekar on behalf of Computer Society of India, printed at G.P. Offset Pvt. Ltd., Unit No. 81, Plot No. 14, Marol Co-Op. Industrial Estate, off
Andheri Kurla Road, Andheri (East), Mumbai 400059, and published from Computer Society of India, Samruddhi Venture Park, Unit No. 3, 4th Floor, Marol Industrial Area, Andheri
(East), Mumbai 400093. Editor: A K Nayak. Tel.: 022-2926 1700 • Fax: 022-2830 2133 • Email: [email protected]. Printed at GP Offset Pvt. Ltd., Mumbai 400 059.
Chief Editor: Dr. A K Nayak
Guest Editor: Dr. Vipin Tyagi
Published by: Mr. Suchit Gogwekar, Executive Secretary, for Computer Society of India
Design, Print and Dispatch by: CyberMedia Services Limited
PLUS
Brain Teaser (Dr. Durgesh Kumar Mishra) 43
Reports 45
Student Branches News 49
Cover Story
8 Data Science – Data, Tools &
Technologies
Hardik A Gohel
12 Leveraging Bigdata Towards Enabling
Analytics Based Intrusion Detection
Systems in Wireless Sensor Networks
Pritee Parwekar and Suresh Chandra Satapathy
Research Front
17 A Novel Approach to Secure Data
Transmission using Logic Gates
Rohit Rastogi, Rishabh Mishra, Sanyukta
Sharma, Pratyush Arya and Anshika Nigam
20 An Efficient Cluster-based Multi-Keyword
Search on Encrypted Cloud Data
Rohit Handa and Rama Krishna Challa
28 A Collaborative Approach for Malicious
Node Detection in Ad hoc Wireless
Networks
Shrikant V Sonekar and Manali Kshirsagar
Article
32 The Cardinal Sin of Data Mining and
Data Science: Overfitting
Gregory Piatetsky-Shapiro
and Anmol Rajpurohit
Practitioner Workbench
34 Programming.Tips() »
Salting Passwords
Rahul Bhati
35 Programming.Learn("R") » Cluster Analysis in R Language
Ghanshaym Raghuwanshi
Case Study
36 Data Quality Perspective on Retail
ERP Implementation : A Case Study
Dinesh Mohata
Security Corner
40 Area Prone to Cyber Attacks
Abha Thakral, Nitin Rakesh and
Abhinav Gupta
Complaints of non-receipt of CSIC may be communicated to Mr. Ashish Pawar, 022-29261724, [email protected], indicating
name, membership no, validity of membership (other than life members), complete postal address with pin code and contact no.
Editorial
Prof. A.K. Nayak, Chief Editor
Dear Fellow CSI Members,
In the last few years, data has been increasing at a very high rate. Data that has been available for centuries in various forms is being digitized, and there
has been an explosion in the amount of data available. The problem now is not getting the data, but deciding what to use and how
to use it effectively. The data to be processed is not only an organization's own data, but all of the data that is available and relevant.
Using this huge amount of data effectively requires something different from traditional statistics. Processing this data requires
distinctive new skills and tools: high-performance computing, data processing, development and management of databases,
data mining and warehousing, mathematical representations, statistical modelling and analysis, and data visualization, with the goal of
extracting information from the data collected for various applications. Data Science has emerged as a new area that combines all these
areas of expertise, intersecting the fields of social science and statistics, information and computer science, and design.
Our ability to process this voluminous data is limited by a lack of expertise. The databases are difficult to process using traditional
tools and to represent using standard graphics software. The data is also more heterogeneous than before. Digitized
text, audio, and visual content, like sensor and weblog data, is typically messy, incomplete, and unstructured, and frequently must be
processed together with other data to be useful.
Recognizing the importance of Data Science in processing of voluminous data and to discuss various aspects of Data Science, the
publication committee of Computer Society of India selected the theme of CSI Communications (The Knowledge Digest for IT
Community) June issue as "DATA SCIENCE".
In the cover story of this issue, "Data Science – Data, Tools and Technology" by H. A. Gohel, an overview of Data Science is given. We have
also given an overview of the National Data Sharing and Accessibility Policy (NDSAP) and the Big Data initiative of the Govt. of India. P. Parwekar and
S. C. Satapathy have proposed a hybrid solution to utilize the capabilities of Bigdata across networks with an ability to detect and fight
against intrusions. In the Research Front section, we have included three articles. In "A Novel Approach to Secure Data Transmission using Logic
Gates", R. Rastogi and his students have proposed a technique to transmit data in encrypted form. Another article, by R. Handa and
R. K. Challa, gives an efficient multi-keyword search technique for data in the cloud. S. Sonekar and M. Kshirsagar
have given "A Collaborative Approach for Malicious Node Detection in Ad hoc Wireless Networks".
An article by A. Thakral, N. Rakesh and A. Gupta provides reasons for cyber-security vulnerabilities in the Indian context and
suggests certain measures to tackle them.
Finally, a case study "Data Quality Perspective on Retail ERP Implementation" by D. Mohata gives issues and challenges faced in
processing of data in implementing a retail ERP solution.
This issue contains an exclusive interview with Mr. Raj Saraf, Chairman of Zenith Computers and Zenith Infotech, to get his views on the
Indian IT scenario and the role of CSI in the present context.
This issue also contains Practitioner's Workbench, Crosswords, CSI reports and news from divisions, chapters, student branches, and
Calendar of events.
We are thankful to Gregory Piatetsky-Shapiro and A. Rajpurohit for permitting us to share their views on overfitting in data science.
The publication committee expresses its deep condolences on the sad demise of Late Hemant Sonawala, Past President, Fellow and Life
Time Achievement awardee, who was regarded as one of the father figures of the Indian IT industry. We request the fellows and senior members
who knew him personally to express their condolences and communicate the same to [email protected].
I take this opportunity to express my thanks to the Guest Editor Dr. Vipin Tyagi, who agreed to bring out this issue. On behalf of publication
committee, I wish to express my sincere gratitude to all authors and reviewers for their support and significant contribution to this issue.
I hope this issue will be successful in introducing various aspects of Data Science to IT community.
Finally, I look forward to receiving feedback, contributions, criticism and suggestions from our esteemed members and readers.
Prof. A.K. Nayak
Chief Editor
The CSI Communications May 2015 issue, with the theme "Cyber
Security", was appreciated by members at large. The Guest Editor,
Dr. Vipin Tyagi, RVP3, put in sincere efforts in compiling
informative articles on Cloud Security; Cyber Security; Security,
Privacy and Trust in Social Networking sites, etc. Today there
is a pressing need to educate professionals and citizens about
the use and abuse of the cyber world.
Recently the meeting of the Executive Committee of CSI
was held at Kolkata, in which many decisions regarding the
functioning of CSI were taken. The website for the CSI 2015
Convention, which is being hosted by the Delhi Chapter, is up.
The Regional Vice-Presidents and Divisional Chairpersons gave
an overview of the activities conducted by the chapters and also
deliberated on activities planned in the regions and divisions.
The conveners for the IT Excellence Award and YITP Awards for
2015 are Shri Raj Saraf and Dr. Nilesh Modi respectively. Both
awards are very popular, as a large number of nominations is
received every year.
Mr. H. R. Mohan, Chairman, Awards Committee will send call
for Nominations for CSI Service Awards in due course of time.
Dr. Suresh Chandra Satapathy, Chairman, Division V (Education
& Research) will initiate CSI research initiatives. Dr. A.K. Nayak,
Chairman, Publications Committee, briefed the committee on the
various initiatives in publications.
ExecCom nominated Regional Student Co-ordinators (RSC)
and State Student Co-ordinators (SSC) from the nominations
received for these positions.
I am happy to note that the School of Computer Science at VIT
University has started offering a 2-credit course under CBCS for
student members of CSI for co- and extracurricular activities.
The activities are supervised by the faculty members. This is a
very good initiative by the university to encourage students to
join CSI as student members.
I had an opportunity to meet Dr. Dilip Kumar Sharma,
Chairman; Managing Committee Members of Mathura Chapter
and Prof. D. S. Chauhan, Vice Chancellor, GLA University,
Mathura. The activities conducted by the chapter with the
support of Dr. D.S. Chauhan are impressive. I also met faculty
members of Hindustan Institute of Management & Computer
Studies, Mathura. There is an active student branch in the
campus, which is conducting many technical activities.
CSI SIG -eGovernance has announced thirteenth
anniversary of the prestigious CSI Nihilent eGovernance Awards.
The process adopted to file the nomination is paperless. The
nominations can be filed through the portal. The awards will be
given away to the winners during CSI 2015 at Delhi.
CSI Young Talent Search in Computer Programming for
the selection of teams to represent India at SEARCC (South
East Asia Regional Computer Confederation) International
Schools’ Software Competition - 2015 is announced. The top two
teams at the National level will represent India at the SEARCC
International Schools’ Software Competition 2015 (ISSC 2015)
to be held at Colombo, Sri Lanka between 9th and 11th October
2015. This is a good opportunity for schools to nominate their
teams in this competition.
The advanced application of IT in agriculture is becoming
more popular due to its usefulness to farmers. Agriculture is
one of the key sectors affecting everyday life. Cloud computing,
social media, image processing etc. will help to improve GDP as
well as the happiness of citizens. The technology will benefit
farmers and food-processing units. It will bring parity in prices
and quality products to consumers. The role of IT education in
agriculture is important, and CSI can take the lead in this area.
It is a sad moment for all of us to learn about the
demise of Shri Hemant Sonawala (78), Past President, Fellow
and recipient of Life Time Achievement Award of Computer
Society of India, on 30th May 2015 at Mumbai. He was a very
lively person, worked tirelessly all his life for the profession,
CSI, Digital and Hinditron. His contribution towards Computer
Society of India will be remembered for a long time. His demise is
a huge loss to the society and the IT fraternity. Let us pray to
the Almighty that his soul rests in eternal peace.
With best wishes,
Bipin V. Mehta
President's Message
From: President's Desk :: [email protected]
Subject: President's Message
Date: 1st June 2015
Dear Members,
Our journey for making CSI, a professionally run society is
continuing unabated in spite of insinuations and casting of
aspersions by some senior members. We know our goal and we
are following Swami Vivekananda’s words “Arise, awake and stop
not till your goal is reached”.
We have already taken some bold decisions to bring in systems
and processes in CSI to help our Members.
1. When the CSI web site was down and the vendor Leo
Technosoft started demanding more payment, we conducted
our investigations on the reasons for such demands and on
payments made so far on development of CSI web site.
We found out (as detailed by the Hony. Secretary) that since
2010, CSI has paid Rs. 68,46,595.00 for development of the
CSI Knowledge Portal. This includes payments to various
agencies: Rs. 15,17,728.00 to Mindcraft Software Pvt. Ltd.,
Rs. 36,12,193.00 to Leo Technosoft Pvt. Ltd., Rs. 16,16,674.00
to consultant Mr. Mohan Datar and Rs. 1,00,000.00 to
Ms. Shailaja Adurthi.
Unfortunately details of terms of payment, agreements,
deliverables etc. are not available in the CSI office or in Minutes
of ExecCom meetings. Further investigation is needed to
ascertain the circumstances which prompted the Presidents
of the relevant periods to approve such payments.
I was wondering why, even after payments of such staggering
amounts, there have been so many complaints about the
website. Why did so many of our members who accessed the
website complain about different aspects of its working?
How many prospective members did we lose due to the non-
functioning of the CSI website? How many members could not
edit their personal data due to erroneous behavior of the CSI
website? The numbers are countless.
The vendor Leo Technosoft demanded more as they were
not happy with the amount of Rs. 36,12,193.00 paid from
July 2011 till September 2014. As the Vice President, Hony.
Secretary and Treasurer decided not to yield to their unjust
demands, the services were stopped.
Alternative plans have been made to develop a new portal
from scratch and the process has started with minimum
possible expense.
2. We have made significant changes to our mouthpiece CSI
Communications. A new set of editors will be announced
soon and we are streamlining the process of publishing
reports on CSI activities. The guidelines are as follows:
Reports on Student Branch activities should be sent to: [email protected]
The report should be brief (within 50 words), highlighting the
achievements, and include a photograph with a resolution
higher than 300 DPI.
Reports on Chapter Activities should be sent to: [email protected]
The report should be within 100 words, highlighting the
objective and clearly discussing the benefits to CSI members.
It should be accompanied by a photograph with a resolution
higher than 300 DPI.
Conference/ Seminar reports should be sent by Div Chairs and RVPs to [email protected]
Again, the report should be brief (within 150 words), highlighting
the objective and clearly discussing the benefits to CSI
members. It should be accompanied by a photograph with a
resolution higher than 300 DPI.
Members may note that we are trying to accommodate as
many reports as possible within the available space and are
requested to keep the guidelines in mind. Good-quality
photographs are necessary for printing, and care needs to be
taken on this.
I am glad that Dr. Vipin Tyagi, VP, Region III, has agreed to coordinate the publishing of reports of these activities. He can
be contacted at [email protected] for any issues.
3. We have got a very good response to our Call for Regional and
State Student Coordinators. The list of Coordinators is being
finalized and will be announced in June 2015.
4. Our resolve to offer various training programs to our members
has gathered momentum. A two-day training program on
Embedded System Design using the MSP430 is being organized
at CSI Education Directorate, Chennai, on June 13 and 14 in
association with NIELIT, Govt. of India.
CSI Education Directorate is close to finalizing the agreement
for PMI Certification for our members.
5. Dr. Suresh Satapathy, Division Chair, Education and Research,
has been requested to prepare a list of conferences happening
worldwide whose Calls for Papers will be of interest to our
members. The list will be included in CSIC soon.
Overall, things are improving in CSI. The new ExecCom
believes in transparency, efficiency and prudence, and has zero
tolerance for financial irregularities. We will continue our
journey in this direction and are determined to improve
things with the cooperation of all our members.
Best wishes,
Dr Anirban Basu
Vice President's Column
Prof. Dr. Anirban Basu, Vice President
Meeting with Mr. Raj Saraf, Chairman of Zenith Computers and Zenith Infotech
CSI Vice President Dr. Anirban Basu and Hony. Secretary Mr. Sanjay Mohapatra, along with Mr. Ravikiran Mankikar, Chairman, Division III, met Mr. Raj Saraf, Chairman of Zenith Computers and Zenith Infotech, in his office in Mumbai on May 4, 2015. Mr. Saraf has been closely associated with the Computer Society of India and has long been a well-wisher of CSI. As Mr. Saraf has been a doyen of the IT industry, Dr. Anirban Basu and Mr. Sanjay Mohapatra felt that his views on the Indian IT scenario and the role of CSI in the present scenario would be very relevant and interesting to CSI members. The following is a summary of the discussions:
What is the present IT scene in the country?
The present scenario of hardware manufacturing is very bad, as a lot was expected from the budget; however, nothing came which would have encouraged domestic manufacturing to take off. The import of hardware will continue as before, as importing is cheaper than manufacturing in India. The only possible item which may see local manufacturing is mobile phones, but certainly not desktops, laptops, thin clients etc. In terms of software, exports will continue with a normal 10 to 15% growth.
What do you see in the near future in terms of technology, growth of the IT industry and employment of Indian IT professionals?
In terms of technology we see a lot of growth happening, as many companies have shifted their R&D to India, and the demand within India for high-calibre IT professionals from MNCs and large Indian companies is growing; the highest demand comes from e-commerce companies. I do not see any growth for IT professionals at the low end of software companies' requirements, due to more and more automation in the software sector.
What are the plans of the Zenith Group in terms of technology development and creating more job opportunities?
The Zenith Group is looking into areas of cloud technology, as it has been the tradition of the group to go into the latest fields of IT. We feel that the entire IT infrastructure requirement of users, private or public, will move to the cloud, and that more than 75% of infrastructure will move to the Infrastructure-as-a-Service model. The company will be creating more jobs, as with the latest cloud technology employees will learn more and will have better scope for developing their own skill sets.
What is your opinion of the role of CSI in the Indian IT scene?
CSI, being the oldest body for IT professionals, should try to recapture its earlier position by not allowing people to go away to other organisations like NASSCOM. Currently, the profile of CSI is very good in education and R&D but very weak among commercial organisations and commercial IT professionals. Probably the best course would be for CSI to create a parallel organisation for the commercial market within CSI itself.
How can CSI be more effective?
To be more effective, CSI should be present in all forums, private or government, irrespective of the venue or location. Also, CSI is spread all over the country and should consolidate itself to not more than 8 to 10 locations.
How can Zenith and CSI work together in the PM's mission of Digital India?
Zenith has been supporting CSI, and together they can definitely work in the PM's mission of Digital India, with the biggest expertise Zenith could offer being cloud infrastructure, private or public. In fact, the whole emphasis of Digital India is based on the internet, and Zenith with infrastructure and CSI with different applications can definitely partner in selective fields of the Digital India program.
Left to right: Mr. Ravikiran Mankikar, Chairman Division III, Dr. Anirban Basu, Vice President, Mr. Raj Saraf, Chairman, Zenith Computers and Mr. Sanjay Mohapatra,
Hony. Secretary on May 4, 2015
Word of Condolence
Office Bearers, Executive Committee Members, Fellows and Members of Computer Society of India express deep condolences on the sad demise of Shri Hemantbhai Sonawala, Past President, Fellow and Life Time Achievement Awardee of CSI. Shri Hemantbhai, a technology entrepreneur driven by his mission of "Better life through Technology" for the last four decades, was one of the founding fathers of the IT industry in India.
Shri Hemantbhai pursued the Indian dream at a time when few Indians were returning to India. His strong belief in India and its potential brought him back from US shores to set up business in India and lay the foundation for India's growth story as an IT superpower. For him, IT did not just mean Information Technology, but Indian Talent. His focus for the last four decades was to leverage India's engineering talent to make India a self-reliant economy and to position it as a leader in the global scenario.
His contribution towards Computer Society of India, education, his philanthropic services and empowering young members in CSI, will be remembered for a long time.
His demise is not only a loss to his family, but a huge loss to society and the IT fraternity.
May his soul rest in eternal peace.
Introduction
If there were a time machine for real, I
would like to take the readers 20 years
back before explaining the title of the
article. There's a simple reason for it: to
demonstrate how technology today has
worked wonders for almost all the domains
including finance, marketing, government
agencies, forensics, education and so on.
Yes, it is technology which has taken on
the role, but I won't be talking about the
machines and circuits, but something that
has transformed today’s decision making
process. Let me help you peep through the
office of a CEO, which depicts a typical
corporate scene of the 90s. A company
is seriously hit by its competitors, who
have captured the market. The boardroom
is filled with all the top management
executives of the company, struggling
to analyze the situation. The directors
have never-ending questions for the
statisticians, marketing executives,
sales managers and the other top notch
professionals. What were our sales this
year? How have we performed over the
last 10 years? What are the market shares
of the product and the deviation of the
financial figures? What amount of revenue
are we getting from our best products in
the metro regions? A high-profile meeting
leaves the CEO completely stressed, and
the managers with stringent deadlines
for answers to never-ending business
questions that are going to be really tough
to answer. The next 48 hours are no less
than a nightmare for the managers, who
spend their stressful days and sleepless
nights to find answers to the questions.
They keep scouring the files and sales
reports with their sleeves rolled up under
the lamplight. Piles of printed data of sales
figures, and a computer running business
software, which seems to be just an
extension of a business calculator. They
wished there were something to make
their tasks easy.
Yes, there is something today
that has helped the business
pundits to answer these tricky questions
better, backed by strong evidence. This is
where technology has taken the center
stage and it boasts about the capabilities
of Business Intelligence and Data Science.
The professionals today have been well
equipped with such tools with the support
of the BI (Business Intelligence) experts.
They make the data speak for literally
everything that has happened over a
period of years. They are able to analyze
the growth of company over the timeline,
performance of products, employees,
divisions in distributed geographical areas
and much more. Well, this seems cool for
those who are market-research people,
and also for those budding start-ups who
wish to analyze the market before taking
a dive into it. Well, that's not all. Data
science has helped the professionals walk
the extra mile to impress the CEO. After
answering the business questions, they
show predictions of how the market
would move in the coming months.
What's the best point of investment?
How would the existing product perform
in the next financial year? Answers to such
important business questions can help
companies fetch billions of dollars from
the market. Such important statistics help
the decision making of the strategists
and the directors of the corporation with
their investments, product launches,
organization restructuring and much
more.
Data Science - Data vs Information vs Knowledge
There could be three things possible when we consider a technical scenario: data, information and knowledge. Real facts that are stored in some physical medium may be termed data. So maybe the traffic-signal information and bus GPS data are getting logged somewhere into a database. If you run a simple query on that database, you would get a list of columns with unique identifiers, numbers, timestamps and some ids. This could barely make sense to the viewer. The bus crossing a signal is a fact, and the log getting created is data. So what's the point in logging those alphanumeric characters in a database which grows at a rapid rate? This is a very valid question, which is going to be answered soon. The database administrator has a complete idea of the database schema, column mappings, id mappings and other technical details. He designs a complex query, which helps render a dataset that seems more
nominal to be read, because he has
simplified the data in a readable format.
The report now has some columns like
date, bus number, stop name, journeys
and arrival time. This is something that
makes more sense than rows of data
that seemed Greek and Latin at the
start. This is a transformed version of
captured data that provides you the
correct facts. This is the information
that helps you understand the running
of buses and at what time the next bus
would come at the mentioned stop. But
just a second: is it only to inform
the passenger that we are running
a cluster of servers with a team of
technical brains monitoring it? Every
Monday morning, the project director of
the bus services of the city, receives an
automated mail that gives him a report
of the complete bus system and how
it performed the last week. It includes
complete information right from the
bus frequencies, stops occupancies, bus
accidents, traffic information and much
more. Now when he comes to know that
there are 3 buses running on a route
where only 2 passengers come every 3
hours, he can make the decision to
change the route of some buses, so that
the stops overloaded with passengers
could benefit from the otherwise empty
buses. This decision could be taken
because he knew about the complete
bus service system performance. Now
that’s smart, isn’t it? That is because
he had the knowledge about the things
through the reports he checked in his
Monday mails.
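The data-to-information step described above can be sketched with a toy query. The schema, table names and sample rows below are invented for illustration; the point is that a join over the id-mapped tables turns opaque log rows into a readable report:

```python
import sqlite3

# Illustrative (hypothetical) schema: raw logs reference buses and stops by id.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bus  (bus_id INTEGER PRIMARY KEY, bus_no TEXT);
CREATE TABLE stop (stop_id INTEGER PRIMARY KEY, stop_name TEXT);
CREATE TABLE log  (log_id INTEGER PRIMARY KEY, bus_id INT, stop_id INT, ts TEXT);
INSERT INTO bus  VALUES (1, '42A'), (2, '17C');
INSERT INTO stop VALUES (1, 'Station Road'), (2, 'City Mall');
INSERT INTO log (bus_id, stop_id, ts) VALUES
  (1, 1, '2015-06-01 08:00'), (1, 2, '2015-06-01 08:12'),
  (2, 1, '2015-06-01 08:05');
""")

# The DBA's "complex query": join the id-mapped tables into a readable report
# with columns like date, bus number, stop name and arrival time.
rows = con.execute("""
    SELECT substr(l.ts, 1, 10) AS date, b.bus_no, s.stop_name,
           substr(l.ts, 12)    AS arrival_time
    FROM log l JOIN bus b ON b.bus_id = l.bus_id
               JOIN stop s ON s.stop_id = l.stop_id
    ORDER BY l.ts
""").fetchall()
for r in rows:
    print(r)
```

The raw `log` table is the data; the joined, human-readable result set is the information.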
Now there may be some non-tech
people, or some technical folks who do
not work with data, who pop up
with the question of how knowledge is extracted
from data. It's easy to understand the
logging of data and querying the database
to get information. But imagine a bus
system with 200 buses running across
the cities, with each bus sending a signal
every 5 seconds and each stop sending
signals of arrival of each bus and also of
its departure. Wouldn’t that accumulate to
crores of records in a week? Moreover,
how would one read all those records
and come up with a condensed report?
That is where something called business
intelligence comes into the picture, which has
some intersection with data science. A lot
of people say that business intelligence
and data science are two completely
different things, but factually, apart from
the reporting of historical data, the other
segments of BI collaborate closely with
data science to render useful analytics. So
what is this word BI? It has been the buzz
of the IT market for quite a while. To go by
the definition provided by the
consultancy giant Gartner Inc., “Business
intelligence (BI) is an umbrella term that
includes the applications, infrastructure
and tools, and best practices that enable
access to and analysis of information
to improve and optimize decisions and
performance.” The definition may seem
tough for novices to digest, so let me
walk you through it. Digging deeper
into the bus service system will help us
understand the jargon easily.
Say for example, the bus service system
has numerous bus operators, which design
the bus schedules, routes and the journeys
of the same. Generally, they define this in
an Excel spreadsheet, which is easier to
maintain and review. They give this to the
bus services technical team by uploading
the files to FTP servers. This is the initial
crunch of data that the team receives for
the first day of the week. Moreover, the
buses generate signals every 5 seconds, to
keep the system updated of their presence
on the streets, which also helps them in
monitoring the vehicles. A signal every
5 seconds amounts to some 3,45,000
signals per day from the buses. There might
be some 200 stops in the city that report
to the system the entry and exit of each bus
on the stop in each journey, which may
amount to some 3,00,000 signals per
day. Considering the complete system
with other logging mechanisms for tickets
and passenger counts, the database could
expect some 12,00,000 records per
day, which amounts to almost a crore per
week. Remember the data given by the bus
service operators in the Excel files? How
would you include that as a part of your
analytics? There might also be daily data
given by bus stops in flat
files or CSVs. Even if the DBA
(Database Administrator) dealt with such
a huge amount of data, how would he deal
with data coming from different
sources like CSVs, Excel sheets and satellite
data from the buses? Business Intelligence
comes to the rescue with its impressive ETL
technology. ETL, or Extract,
Transform and Load, helps integrate the
data from various data sources (which
are not generally structured) into a single
place. This gets the data together so
that we can start performing analysis on the
data. But still we have a problem. Billions
of records are getting logged every week
into a database. How could you store all of
them into a single database? This would
lead to a situation that can be termed
data explosion, where it becomes
difficult to handle the data. Moreover, a
complex query would take ages to execute
if run over several years
of historical data. Data Warehousing,
a component of Business Intelligence
helps us get this done. The load of the live
database is reduced by archiving historical
data in a data warehouse and letting
the live data come into the production
database. The data warehouse is a copy
of the transactional database,
restructured for analysis purposes (again
using ETL). Still, the data warehouse at this
stage is just another relational database like the
OLTP systems it draws from, and not ideal for fast
analytical queries. So, heard of OLAP cubes? The
second and most important component
of BI comes into focus with the
analytical cubes. OLAP (Online Analytical
Processing) cubes are BI components
that store data in a compressed and
pre-aggregated form that is helpful for
running analytical queries. These cubes
are structured to store data
in an optimized way and have the
capacity to hold historical data of several
years. But water sitting in the well never
quenched anyone's thirst: what we
were concerned with was the knowledge
to gain insights and make decisions. So the
third front of BI offers reporting services,
which help represent the data through
interactive reports. These reports give
us a bird's-eye view of the happenings
of the business. This is the point where
BI may bid goodbye and let core data
science take the center stage. Going by
the Wikipedia definition, “Data Science
is the extraction of knowledge from data.”
We now are well versed with knowledge
and data. But the various techniques to get
knowledge out of the stored information
or data make this a subject of interest.
Professionals working in the field of data
science are termed data scientists.
Applying data science techniques on data
varies from case to case, and it needs to
have a well-planned approach. There can
be a general plan for performing data
science over some datasets. Moreover,
the data professional must be certain
with the type of output that he wants
after performing the required analytics.
The field of data science can be very
interesting as it borrows a lot of things
from a myriad of disciplines. There are
techniques, algorithms and patterns
derived from areas like Information theory,
Information technology, mathematics,
statistics, programming, probability
models, data engineering, data modeling,
pattern learning, predictive modeling
and analytics, business intelligence,
data compression and high performance
computing. The predictive modeling
theories and models of data mining
have added a lot to data science, as they
have enhanced the predictive capabilities
of the field.
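The ETL step described above can be sketched in miniature. The two feeds below are hypothetical (an operator schedule exported as CSV, and a semicolon-delimited stop log): extract each with its own dialect, transform to a common record shape, and load everything into one structure:

```python
import csv
import io

# Hypothetical feeds: differently-shaped delimited files from two sources.
operator_csv = "bus_no,route,first_trip\n42A,Route-7,06:00\n"
stop_csv = "stop;bus;event;time\nCity Mall;42A;arrival;08:12\n"

def extract(text, delimiter=","):
    """Extract: parse a delimited feed into dict records."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def transform(row, source):
    """Transform: map each record to one common shape, tagged with its source."""
    return {"bus": row.get("bus_no") or row.get("bus"), "source": source, **row}

# Load: everything lands in a single integrated structure.
warehouse = (
    [transform(r, "operator") for r in extract(operator_csv)]
    + [transform(r, "stops") for r in extract(stop_csv, delimiter=";")]
)
print(warehouse)
```

A real ETL tool adds scheduling, validation and incremental loads, but the extract/transform/load shape is the same.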
The Data Explosion
With the rise of technology and data
storage systems, we have been able to log
data into servers. Over the years,
the cost of storage hardware has gone
down, which has allowed IT companies
to buy numerous commodity servers and
storage systems to store data and also
to extend data storage as a service to its
clients. Content generated from analog
systems in the form of sensors, mobile devices,
instruments, web logs and transactions
has been digitized and stored. It's worth
highlighting the fact that 90% of the data
in the world today has been generated in
the past two years. Data scientists have
applied numerous techniques to this
massive data to identify patterns, adding
commercial and social
value. This avalanche of data has
led to inception of new technologies like
that of Big Data, which help us perform
our experiments better and quicker on the
incoming data. Several high performance
computing systems like that of Hadoop
and Cluster computing have helped data
scientists explore petabytes of data in
a much quicker way than ever before. It
is an added advantage if a data scientist is
well-versed with big data technologies.
Since a single person cannot be a jack
of all trades especially in such complex
projects, generally the data analytics
team has several big data developers,
administrators and architects on board to
assist the core data scientists to expedite
the analytics process.
CSI Communications | June 2015 | 10 www.csi-india.org
Tools and Technologies
As far as data science is concerned, it is
not one technology that would help you
get through. Along with strong domain
expertise and analytical capabilities,
you need to have strong knowledge of
a bunch of technologies. Since we are
going to have a lot of data coming in,
the data is generally in spreadsheet or
in a RDBMS. If at all the incoming data
is in some other format, we have data
transformation tools, as explained earlier
to get it converted. When the datasets are
in Excel, you need strong
Excel skills to transform and restructure
data as part of the data preparation
process. Similarly, if you are working on an
RDBMS, SQL is something you should be
handy with. A lot of complex queries need
to be prepared to get the datasets ready.
The core part of data science comes with
the data modeling, predictive analytics
and algorithms that form the spine of
the trade. Generally there are numerous
existing libraries that help the exploration
process, so sound knowledge of R
and Python would serve you well, as
you would be spending a lot of time in
their consoles. There are some statistical
models that need to be custom coded to
get the model working. Integrating these
algorithmic predictive engines into real
time or existing applications might ask
for some experience with programming
languages like Java and Ruby. The
reporting holds enormous importance
as visualizations are seen as the results
of any data science project. There are
technologies like SAS and SPSS that serve
a complete data science stack right from
data integration to report rendering. If
you have done custom coding in open
source technologies, then D3.js and
other JavaScript frameworks could help
you build stunning visualizations. The
emerging career path is that of a data scientist
who can handle big data and
get results with minimum latency. For a
person analyzing terabytes of data, big
data platforms are the recommended solution,
which makes Hadoop, in some distribution,
necessary for the developer. To perform
predictive analysis and data mining on the
Hadoop stack, Mahout is one of the most
popular technologies. So know-how of
these could work wonders for you. Also for
algorithm design and modeling purposes,
good understanding of statistics is a must,
even if you don’t come from a statistical
background or schooling. The list never
ends, and practically it's not possible
to learn everything at once, but a programming
language, a machine learning toolkit, an
RDBMS and a big data technology are a must
for a data scientist these days.
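As a taste of the predictive-modeling work these tools support, here is a hand-rolled ordinary least-squares fit over made-up weekly ridership figures, extrapolated one week ahead. In practice you would reach for R or a Python library such as scikit-learn rather than coding the math yourself:

```python
# Made-up weekly ridership figures for illustration only.
weeks = [1, 2, 3, 4, 5]
riders = [10_500, 11_200, 11_900, 12_400, 13_100]

# Ordinary least squares: slope = Sxy / Sxx, intercept from the means.
n = len(weeks)
mean_x = sum(weeks) / n
mean_y = sum(riders) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, riders)) / \
        sum((x - mean_x) ** 2 for x in weeks)
intercept = mean_y - slope * mean_x

# Extrapolate to week 6: the "prediction" a strategist would act on.
forecast = intercept + slope * 6
print(round(slope), round(forecast))  # → 640 13740
```

The fitted line says ridership grows by about 640 riders a week, which a library model would confirm in one call.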
Conclusion
The future of data science is certainly
green as it is one of the most in-demand
jobs in the market. The companies today
want to know more about the markets and
products before investing. Departments
today are hungry for analytics over the
tons of data stored in the data servers.
The demand-supply model is completely
imbalanced now, due to the high demand
for data scientists and their scarcity.
Every company today wants to
employ these trained professionals, who
could help them grow better and
faster. For the same reason, corporates
are ready to shell out hefty amounts
to them. Data is growing and deriving
commercial and business value out of it is
the need of the hour.
About the Author
Hardik A Gohel, an academician and researcher, is an Assistant Professor at AITS, Rajkot and a life member of CSI.
His research spans Artificial Intelligence and Intelligent Web Applications and Services. He has 35 publications in
journals and proceedings of national and international conferences. He is also working as a Research Consultant. He
can be reached at [email protected]
The term "data science" has existed for over thirty years and was used initially as a substitute for computer science by Peter Naur in 1960. In 1974, Naur published Concise Survey of Computer Methods, which freely used the term data science in its survey of the contemporary data processing methods used in a wide range of applications. In 1996, members of the International Federation of Classification Societies (IFCS) met in Kobe for their biennial conference. Here, for the first time, the term data science was included in the title of the conference ("Data Science, Classification, and Related Methods").
In Nov. 1997, C.F. Jeff Wu gave the inaugural lecture entitled "Statistics = Data Science?" for his appointment to the H. C. Carver Professorship at the University of Michigan. In this lecture, he characterized statistical work as a trilogy of data collection, data modeling and analysis, and decision making. In conclusion, he coined the term "data science" and advocated that statistics be renamed data science and statisticians data scientists. Later, he presented his lecture entitled "Statistics = Data Science?" as the first of his 1998 P.C. Mahalanobis Memorial Lectures. These lectures honor Prasanta Chandra Mahalanobis, an Indian scientist and statistician and founder of the Indian Statistical Institute.
In 2001, William S. Cleveland introduced data science as an independent discipline, extending the field of statistics to incorporate "advances in computing with data" in his article "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics," which was published in Vol. 69, No. 1, of the April 2001 edition of the International Statistical Review / Revue Internationale de Statistique. In his report, Cleveland establishes six technical areas which he believed to encompass the field of data science: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.
In 2008, D.J. Patil and Jeff Hammerbacher coined the term "data scientist" to define their jobs at LinkedIn and Facebook, respectively.
................. Wikipedia
National Data Sharing and Accessibility Policy (NDSAP) and Big Data initiative of Govt. of India
https://data.gov.in/sites/default/files/NDSAP.pdf https://data.gov.in/
National Data Sharing and Accessibility Policy (NDSAP)
Aim : to provide an enabling provision and platform for proactive and open access to the data generated by various Government of India
entities.
Objectives: to facilitate access to Government of India owned shareable data (along with its usage information) in machine readable form
through a wide area network all over the country in a periodically updatable manner, within the framework of various related policies, acts and
rules of Government of India, thereby permitting a wider accessibility and usage by public.
• The principles on which data sharing and accessibility need to be based include: Openness, Flexibility, Transparency, Quality,
Security and Machine-readability.
• The Department of Science and Technology serves the nodal functions of coordinating and monitoring the policy through close
collaboration with all Central Ministries and with the Department of Electronics and Information Technology, which created data.gov.in
through the National Informatics Centre.
• As per NDSAP, every Department has to identify datasets by the following categories:
❖ Negative List: The datasets which are confidential in nature and would compromise the country's security if made public
are put into this list. The datasets which contain personal information are also included in this list.
❖ Open List: This list comprises datasets which don't fall under the negative list. These datasets shall be prioritized into high
value datasets and non-high value datasets.
• NDSAP recommends that datasets have to be published in an open, machine readable format. Considering the current
analysis of data formats prevalent in Government, it is proposed that data should be published in any of the following formats:
❖ CSV (Comma Separated Values)
❖ XLS (Spreadsheet - Excel)
❖ ODS/OTS (Open Document Formats for Spreadsheets)
❖ XML (Extensible Markup Language)
❖ RDF (Resource Description Framework)
❖ KML (Keyhole Markup Language, used for maps)
❖ GML (Geography Markup Language)
❖ RSS/ATOM (fast-changing data, e.g. hourly/daily)
• Different types of datasets generated both in geospatial and non-spatial form by Ministries/Departments shall be classified as
shareable data and non-shareable data. The derived statistics like national accounts statistics, indicators like price indices, and databases
from censuses and surveys are the types of data produced by a statistical mechanism. The geospatial data consists primarily
of satellite data, maps, etc.
Open Government Data (OGD) Platform, India - https://data.gov.in - is a portal intended to be used by Government of India Ministries/
Departments and their organizations to publish datasets, documents, services, tools and applications collected by them for public use. It intends
to increase transparency in the functioning of Government and also to open avenues for many more innovative uses of Government data to give
different perspectives.
This portal contains:
15,221 Resources, 3,596 Catalogs, 88 Departments, 46 APIs, 499 Visualizations. The data on this portal has been viewed 3.23 M times and downloaded 1.3 M times by 51,416 registered users.
Dr. Vipin Tyagi, Jaypee University of Engineering and Technology, Raghogarh, Guna - MP, [email protected]

CSI Institutional Membership Fee + Service Taxes Extra, as applicable
Details of the Categories | Period-wise Membership Fee (Rs.) + Service Tax Extra, as applicable
 | 01 Year | 02 Years | 03 Years | 04 Years | 05 Years | 10 Years | 15 Years | 20 Years
Institutional Members (Academic) with 03 free Nominees | 6,000 | 11,000 | 16,000 | 21,000 | 25,000 | 48,000 | 70,000 | 90,000
Institutional Members (Non-Academic) with 04 free nominees | 10,000 | 19,000 | 28,000 | 36,000 | 45,000 | 85,000 | 1,25,000 | 1,50,000
CSI Life Membership Fee + Service Taxes Extra, as applicable
Life Membership Fee (after 30% Golden Jubilee Discount, valid upto 31.12.2015), irrespective of any age group, is Rs. 7,000.00. From 1st January, 2016, the Life Membership Fee shall be Rs. 10,000.00.
Note: Service Taxes, as applicable, shall be extra in all the categories.
Cover Story
Pritee Parwekar* and Suresh Chandra Satapathy**
*Dept. of CSE, ANITS, Visakhapatnam
**Prof. and Head, Dept. of CSE, ANITS, Visakhapatnam
Introduction
Wireless Sensor Networks, and
predominantly the Internet of Things (IoT),
have numerous devices with capabilities
of sensing and actuating based either on
rule sets available locally or sourced
through higher computational platforms[1].
These devices feed data streams which will
soon overwhelm the traditional approaches
to data management and require a paradigm
shift in data management, such as big data.
This paper discusses the issues with respect
to network attacks and the employment of Big
Data analytics as a backbone for intrusion
detection systems in emerging architectures
of Wireless Sensor Networks and the Internet of
Things (IoT).
Big data with a backbone of cloud
computing is the state-of-the-art method
to offload considerable computation
requirements from both data centers
and terminal sensing devices. These are
all the more lucrative due to the inherent
qualities of flexibility and scalability[2].
However, cloud computing may not
be directly suitable for all applications
such as WSNs (Wireless Sensor
Networks), with their high requirements
on real-time latency and immediate
response, which may be associated
with geographic mobility[3]. For a WSN,
the area of operation is the physical world,
while the cloud computes towards the edge
of the network.
However, for semi-real-time
issues like data mining to generate
anomaly patterns for intrusion detection
systems[4], a strong system using
technologies like Big Data is considered
promising. This paper studies
the advances for big data in ubiquitous
Wireless Sensor Networks and focuses
on the computation and storage, data
analysis and mining towards evolving a
collaborative intrusion detection system.
Challenges in Wireless Sensor Networks towards developing Intrusion Detection Systems
IDS schemes have been implemented in
wired and semi-wired networks. These
systems look for certain misbehavior
patterns in the network which would give a
whiff of a malicious act and thereby trigger
an attack-mitigating mechanism. WSNs
have an inherent drawback of limited
resource availability in the form of energy as
well as computing capabilities. IDSs thus
make a significant contribution towards
protecting WSNs from both internal and
external attacks. An IDS would look for an
anomaly in node behavior and, once found,
would re-configure the network to bypass
the malicious node and thus prevent
a network attack.
Lately, researchers have proposed
a variety of IDSs, and a few of them have
been made specifically applicable to WSN
structures (flat, cluster, hierarchical).
[5] has shown how an IDS can be used to
detect misbehavior of nodes and inform
the neighbor nodes in the network to
invoke necessary countermeasures. A few
of the IDSs which were created for
the wired domains and ad hoc networks
have not been found applicable
in the same form in wireless sensor
networks. The network characteristics of
WSNs have conflicting requirements[6]
which come in the way and complicate
the design of the security mechanisms.
Also, as compared to ad hoc networks, the
computing and energy resources of sensor
nodes are constrained[7].
An IDS can approach the attack
under three classifications, namely
misuse detection, anomaly detection and
specification-based detection, as brought
out by[8]. Misuse detection involves
comparing the action or behavior of
nodes with a data bank of attack patterns.
These patterns have to be pre-defined and
recorded into the system. The limitation
of this technique is that it is knowledge
dependent to build attack patterns and
therefore fails to detect novel or modified
attacks. The attack patterns database
also needs to be regularly updated to
include freshly detected patterns. Here,
the efficiency with regard to system
management is significantly reduced,
as the network administrator is required
to constantly equip the IDS agents with
a current database. A rule-based or
misuse detection technique for a WSN
is a complex proposition. Practically,
replicating the attacker's psyche is
difficult; the administrator of the network
is required to pre-empt and model attack
patterns in advance. Moreover, WSNs
are severely memory constrained, which
makes misuse-detection based IDSs in
WSNs difficult to implement, as they need
to store attack signatures[9].
The anomaly detection technique
concentrates on behaviors of the nodes
to decide whether they are normal or
anomalous. This method first establishes
the features of a behavior which are
to be considered normal. These are
established by using self-learning training
mechanisms. Subsequently, any activity
which does not comply with these pre-
established behaviors is treated as an
intrusion. If a certain node does not
behave in accordance with the predefined
specification, then the IDS will arrive at the
inference that the said node is malicious.
Any wrong inferences by the IDS would
trigger false alarms which in turn would
Leveraging Bigdata Towards Enabling Analytics Based Intrusion Detection Systems in Wireless Sensor Networks
Abstract: Wireless Sensor Networks are prone to network attacks like any other network. The typical characteristics of WSNs are their
resource constraints in the form of energy and computational resources. With these limitations, equipping such networks with intrusion detection
capabilities is a challenge. The paper explores the options proposed by researchers for enabling the network itself to fight against such
intrusions. However, the proposal given here opts for a hybrid solution where the capabilities of Big Data across today's networks can be utilized to
work out a collaborative solution equipping such resource-poor networks with an ability to detect and fight against intrusions. The paper charts
a roadmap in this direction.
affect the accuracy of detection.
Hence, this method has a substantial
false alarm rate. Also, an intrusion which
behaves analogously to pre-established
valid behaviors would not be identified
as anomalous and may not be
detected. Several IDS techniques have
been formulated for anomaly detection
in WSNs. Certain assumptions or metrics
are used to determine the behavior of
sensor nodes as normal or abnormal. This
approach is considered easier to apply
as compared to misuse or specification
based detection, and most researchers
use it as the main method to detect
intrusions. However, anomaly detection
techniques share a few strategies
with misuse detection, e.g. the watchdog
approach[10].
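The anomaly-detection idea above can be sketched in a few lines. This is a generic z-score test, not any particular WSN IDS; the training readings and the threshold parameter k are invented for illustration:

```python
import statistics

# "Normal" behavior is learned from a training window of node readings
# (hypothetical sensor values), as the self-learning phase described above.
training = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 19.7]
mu = statistics.mean(training)
sigma = statistics.stdev(training)

def is_anomalous(reading, k=3.0):
    # Flag any report deviating from the learned profile by more than
    # k standard deviations; such activity is treated as an intrusion.
    return abs(reading - mu) > k * sigma

print(is_anomalous(20.2), is_anomalous(35.0))  # → False True
```

A reading close to the profile passes, while a wildly deviating one is flagged; the false-alarm rate discussed next corresponds to legitimate readings that happen to fall outside the k-sigma band.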
Misuse and anomaly detection
mechanisms are based on machine
learning techniques. With a
similar goal, the specification-based detection
technique depends on manually described
specifications where normal behavior is
defined. These specifications become
the datum for monitoring all actions. The
manual, labour-intensive specification-
defining process is the main drawback
here. Further, a new malicious activity,
not previously defined, is not detected.
In certain cases, misuse and anomaly-
based detection techniques can be blended
together as hybrid detection mechanisms.
The IDS to be selected depends on its
capability of outsourcing the computation
requirements to an external agency outside
the network. Such capability can be
sourced from the following IDSs applicable
to WSNs:-
(a) A Partially Distributed Intrusion
Detection System for Wireless Sensor
Networks has been proposed by Eung Jun
Cho et al.[2], which requires low memory
and power. The IDS employs multiple
Bloom filter arrays to distribute attack
signatures. It is capable of detecting
fragmented attack signatures at the
application layer and unfragmented attack
signatures at the network layer. As per the
authors, the mechanism can handle denial
of service attacks.
(b) In the PCADID approach[12], the
WSN is partitioned into groups of sensor
nodes. Some of the nodes in each group
are identified as monitor nodes, which
cooperate with each other to create a
global normal profile. Every monitor
node creates a sub-profile for its own
normal network traffic using principal
component analysis (PCA), which it shares
with the other monitor nodes. The shared
sub-profiles of the monitor nodes are used
to create the global normal profile, which
is then used to detect anomalies in the
network traffic. With the normal network
behavior changing progressively, the
global normal profile also gets updated.
The authors have shown that PCADID
achieves a high detection rate with a low
false alarm rate.
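The principle behind PCA-based profiling can be illustrated with a toy two-feature example. This is only the general idea, not the PCADID algorithm itself, and the traffic figures are invented: fit the principal direction of normal traffic, then score new points by their distance from that direction:

```python
import math

# Hypothetical normal traffic samples: (packets/sec, bytes/sec ÷ 10).
normal = [(10, 21), (12, 24), (14, 29), (16, 33), (18, 36)]

n = len(normal)
mx = sum(p[0] for p in normal) / n
my = sum(p[1] for p in normal) / n
# Entries of the 2x2 covariance matrix [[a, b], [b, c]].
a = sum((x - mx) ** 2 for x, _ in normal) / n
c = sum((y - my) ** 2 for _, y in normal) / n
b = sum((x - mx) * (y - my) for x, y in normal) / n
# Principal eigenvector (closed form for a 2x2 symmetric matrix).
lam = (a + c + math.sqrt((a - c) ** 2 + 4 * b * b)) / 2
vx, vy = b, lam - a
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

def residual(x, y):
    """Distance of a point from the principal axis through the mean:
    the anomaly score (large residual = deviation from the profile)."""
    dx, dy = x - mx, y - my
    along = dx * vx + dy * vy
    return math.hypot(dx - along * vx, dy - along * vy)

print(residual(15, 30) < 1.0, residual(15, 90) > 10.0)  # → True True
```

Traffic consistent with the learned direction scores near zero; traffic that breaks the learned correlation (here, far too many bytes for the packet count) scores high and would be reported.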
(c) The author in paper[13] has described
an intruder tracking system for cluster-
based wireless sensor networks using
MAC addresses. The base station is
responsible for the detection, and therefore
the system is more energy-efficient as
well as facilitating early detection and
prevention of security threats and attacks.
Timely detection and prevention of the
intruder can avoid slowing down of the
network, sending of fake data, etc. Thus,
the Base Station (BS) centric security
system in Wireless networks can have a
considerable degree of security without
signifi cantly consuming energy of nodes
and cluster heads.
(d) The Integrated Intrusion Detection
System (IIDS)[14] is a combination of
three individual IDSs: the Intelligent Hybrid
Intrusion Detection System (IHIDS), the
Hybrid Intrusion Detection System (HIDS)
and a misuse Intrusion Detection System.
These are tailored for the sink, cluster
head and sensor node depending on
the likely types and frequency of attacks
each suffers from. The IIDS consists of an
anomaly and a misuse detection module
to increase the detection rate and lower
the false positive rate. A decision-making
module integrates the results and presents
a report of the attacks.
It may be noted that in all these IDSs
the aim is to utilize the existing setup of
WSNs by optimizing the IDS algorithms
to facilitate early detection of the attack.
But with ever-increasing ingenuity in
attacks, a limited signature databank in
a resource-constrained WSN will always be
a bottleneck. It is therefore proposed that
the cluster head in a WSN will only identify
anomalies and outsource them to a
cloud infrastructure, where a real-time
analytics solution using Big Data will be
employed and directives for handling
the attack will be sourced. The advanced
data mining techniques which traditional
IDSs use, but which could not be extended to
WSN environments, will now find use. In
other words, Big Data technology will
be leveraged to overcome the resource-
constrained nature of classical WSNs and
make the best of IDS technologies from
other wired/wireless networks.
Challenges in Big Data Analytics
Intrusions are of a variety of natures, with
intruders developing ever more ingenious
ways of intruding networks each day.
The intrusions take place on
all sorts of networks and are not limited
Fig. 1: Flow chart for a typical data analytics based solution
to a particular type. A certain intrusion
methodology can be easily extended to
a different network. Making sense of the
data, identifying non-obvious patterns,
and, based on this, predicting future
intrusion behavior are studies
which have been favorites of researchers.
Knowledge Discovery in Data (KDD) is
about extracting non-obvious information
from a pool of data. Data mining is used
to discover interrelations amongst the
datasets by using machine learning and
statistics. Analytics, like a superset,
comprises techniques of KDD, data
mining, text mining, statistical analysis,
rule based and predictive models, and
advanced and interactive visualization to
assist decisions and actions.
Data from various sources are used
to build models. The voluminous data
is required to be pre-processed. The
prepared data is then used to train a
model and to estimate its parameters.
Once the model is estimated, it should
be validated before its use. Normally,
this phase requires the use of the original
input data and specific methods to
validate the created model. Finally, the
model is applied to data as it arrives.
This phase, called model scoring, is used
to generate predictions, prescriptions,
and recommendations. The results
are interpreted and evaluated, used to
generate new models or calibrate existing
ones, and are integrated with the pre-processed data.
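The lifecycle described above can be sketched in miniature. The following Python fragment is a toy illustration only: the threshold "model", function names and readings are invented for the example, not taken from the paper. It walks through pre-processing, training (parameter estimation), validation against the original input, and scoring of arriving data.

```python
# Minimal sketch of the analytics model lifecycle described above:
# pre-process -> train (estimate parameters) -> validate -> score.
# The "model" is a toy anomaly threshold; all names are illustrative.

def preprocess(raw):
    """Keep only non-negative numeric readings."""
    return [float(x) for x in raw if float(x) >= 0]

def train(data):
    """Estimate model parameters: mean and a simple spread measure."""
    mean = sum(data) / len(data)
    spread = sum(abs(x - mean) for x in data) / len(data)
    return {"mean": mean, "threshold": mean + 3 * spread}

def validate(model, data):
    """Validate against the original input: most training points
    should fall below the anomaly threshold."""
    ok = sum(1 for x in data if x <= model["threshold"])
    return ok / len(data) >= 0.95

def score(model, x):
    """Model scoring: flag arriving data as anomalous or normal."""
    return "anomaly" if x > model["threshold"] else "normal"

raw = [10, 11, 9, 10, 12, 11, 10, 9, 11, 10]
data = preprocess(raw)
model = train(data)
assert validate(model, data)
print(score(model, 10.5))   # a typical reading -> "normal"
print(score(model, 50.0))   # an outlier -> "anomaly"
```

The results of scoring would then feed back into recalibration, as the paragraph above describes.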
Analytics solutions can be classified
as descriptive, predictive, or prescriptive.
Descriptive analytics uses previously
recorded data to create guidance
reports for management; it is
concerned with modeling previously
encountered behavior. Predictive
analytics analyzes current and historical
data to predict the future. Analysts
use prescriptive solutions by
determining actions and assessing their
effect on project objectives, specifications,
and project constraints, and then finally
arriving at a consolidated decision.
Using analytics may sound like a one-
stop solution; however, using analytics is
tedious and expensive, requiring several
consulting hours to develop and tailor a
solution for a particular project[3]. Such
solutions are complex, take considerable
execution time, and are hosted on the
project premises. Cloud computing
offers a platform for the analytics, where
solutions are hosted in the Cloud to be
shared by multiple projects on a scalable
cost and resource model. To make this
happen, there are several technical issues
to address, such as data management, tailoring
of models, data privacy, security, data quality
and currency.
The most tedious process of
analytics is getting the data ready for
analysis. Analyzing large volumes of data
requires efficient methods for the storage,
filtration and retrieval of data. The challenges
of deploying data in Cloud environments,
and subsequently managing it, have
been understood and researched for
some time now[22,23,24]. Multiple Cloud
deployment models, viz. private, public
and hybrid, are to be considered in arriving
at a Cloud analytics solution:
Private: It is a cloud deployed on
the organizational network or by a third
party but exclusively for the organization.
A private Cloud is used by organizations
aiming for the highest level of security and
data privacy. Such organizations aim to
use Cloud infrastructure to share the
services and resources within the various
arms of the organization which may be co-
located or located across the globe.
Public: This is a cloud deployed
over the Internet and publicly available.
Public Clouds are usually highly efficient
in terms of cost as well as performance.
In the public environment, the analytics
services and data management are
handled by the cloud service provider, and
organizations also benefit from the insights
of public analytics results.
Hybrid: This type combines both
Clouds where scalable resources from
the public Cloud can be extended to the
private Cloud. This is a middle path where
organizations can deploy
analytics applications in a secure private
environment which is scalable, at a lower
cost, and with a higher degree of security
than using a public Cloud alone.
Big Data is characterized by variety,
velocity, and volume where variety
represents the data types, while the rate
of data production and processing is
referred to as velocity, and volume defines
the amount of data. Veracity means how
much of the data can be trusted based on
the reliability of its source.
There are some open challenges.
Researchers are tackling ever more
challenging issues. An increasing
proportion of data is unstructured, and the
challenge is how to extract meaningful
information from the given data. Also, with a
steady stream of data arriving
from multiple sources, aggregation and
correlation of the data require a paradigm
change in the methodologies. Subsequent
to filtering useful data, the challenge is
to efficiently recognize and store the
important information extracted from
Fig. 2: Collaborative IDS with Big Data backbone
unstructured data. Volumes of information
are overwhelming and a mechanism for
timely retrieval needs to be worked out.
A new file system needs to be designed
which can easily migrate different types
and sizes of data between
data centers or cloud providers.
Data integration in light of new
protocols and interfaces is another
challenge due to the variety of data
sources viz. structured, unstructured,
semi-structured.
Integrating Intrusion Detection Systems for Wireless Sensor Networks on Big Data Systems
We have proposed the concept of alert
Correlation in Distributed Environment
for developing the cloud and Bigdata
based IDS for WSN. We intend to use
a Big data based fuzzy logic algorithm,
which would help in identifying intrusions
through pattern matching and thereby
reduce false alarms. Fuzzy logic, dealing
with vagueness and imprecision, has the
capability to represent inexact forms of
reasoning in areas where firm decisions
have to be made. This makes it
appropriate for intrusion
detection.
Architecture
We have the cluster head performing
the preliminary anomaly detection. The
cluster head uses the following rules[15] to
identify an anomaly:
Interval Rule: This rule analyses the
time period between two consecutive
message receptions and verifies whether
it complies with the allocated time.
Retransmission Rule: This rule
aims at pinpointing a node that is not
forwarding a message. This rule is used to
detect black hole and selective forwarding
attack.
Integrity Rule: If an attacker changes
the message payload then this rule
identifi es an anomaly.
Delay Rule: If the message is not
delivered on due time then this rule alerts
the system.
Repetition Rule: This rule detects
whether a particular node sends a message
multiple times, thereby detecting a possible
denial-of-service attack.
Radio Transmission Range: In
wireless networks, neighboring nodes
participate in the transmission of a message.
If a network message is received but the
neighbor appears to be silent, then there
is an anomaly.
Jamming Rule: This rule analyses
the count of collisions per message
and ensures that it is lower than the
predetermined value.
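As a toy illustration of how a cluster head might evaluate such rules, the sketch below implements two of them, the Interval Rule and the Repetition Rule, in Python. The message format, field names and threshold values are assumptions made for the example; the paper does not specify them.

```python
# Hedged sketch of two of the cluster-head rules listed above
# (Interval Rule and Repetition Rule). The message format and
# thresholds are illustrative assumptions, not from the paper.

ALLOCATED_INTERVAL = (1.0, 5.0)   # allowed seconds between receptions
REPETITION_LIMIT = 3              # max identical messages per node

def interval_rule(timestamps):
    """Flag an anomaly if consecutive receptions violate the allotted interval."""
    lo, hi = ALLOCATED_INTERVAL
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return any(g < lo or g > hi for g in gaps)

def repetition_rule(messages):
    """Flag a possible denial-of-service if one node repeats a message too often."""
    counts = {}
    for node, payload in messages:
        key = (node, payload)
        counts[key] = counts.get(key, 0) + 1
    return any(c > REPETITION_LIMIT for c in counts.values())

# A compliant node: gaps of 2.0, 2.5 and 2.5 seconds, no anomaly.
assert not interval_rule([0.0, 2.0, 4.5, 7.0])
# A node flooding the same payload five times: anomaly.
assert repetition_rule([("n7", "PING")] * 5)
```

A real cluster head would run such checks on each incoming message and forward only the flagged anomalies to the back end, as the next paragraph describes.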
Once the anomaly is detected, it is
passed to the Big Data based back end
to identify the possible sources of such
an attack. The analytics are expected to
provide a shortlisted result, which would
be used for system learning and for keeping
a watchdog on these sources as potential
future advanced persistent threats.
The Big Data Back End will Work on the Following Principle
Normalization: The cluster head
supplies the data either online or offline
to the cloud-based receiving component.
The data from the network with regard to
threats and anomalies is normalized. The
data from the cluster head comprises
dynamic fields such as the date and time stamp,
username, port used, and IP addresses of the
source and destination.
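A minimal sketch of this normalization step, assuming illustrative raw field names and a made-up standard schema (the paper specifies neither):

```python
# Hedged sketch of the normalization step: raw alerts from cluster
# heads arrive with varying field names and are mapped onto one
# standard record. All field names here are illustrative assumptions.

STANDARD_FIELDS = ("timestamp", "username", "port", "src_ip", "dst_ip")

# Per-source aliases for the same logical field.
ALIASES = {
    "ts": "timestamp", "time": "timestamp",
    "user": "username",
    "sport": "port",
    "source": "src_ip", "dest": "dst_ip",
}

def normalize(raw_alert):
    """Map raw field names onto the standard schema; missing fields
    are left as None for the pre-processing component to fill."""
    record = {f: None for f in STANDARD_FIELDS}
    for key, value in raw_alert.items():
        std = ALIASES.get(key, key)
        if std in record:
            record[std] = value
    return record

alert = {"ts": "2015-06-01T10:22:31", "user": "node12",
         "sport": 4242, "source": "10.0.0.7", "dest": "10.0.0.1"}
rec = normalize(alert)
print(rec["timestamp"], rec["src_ip"])
```

Fields left as `None` are exactly the "missing fields" that the pre-processing component, described next, is responsible for supplying.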
Pre-processing: The normalized
alerts are allocated standard
names in a certain format which are
recognizable by the other components
involved in the correlation process.
Other pre-processing components may
be required, since the cluster heads
free this memory once the data is
delivered. The main task of the pre-processing
component is to supply alerts with the
missing fields which are necessary for the
other correlating components[16].
Categorization: In categorization,
similar events are categorized together
and the nature of the occurrence of attacks in
a certain time interval is studied.
Correlation: The performance
of correlation depends on combining
the three tasks of normalization, pre-
processing and categorization. The
key step in selecting a method for the
correlation process is to consider the nature
of the environment, followed by the ability
to receive alerts, trace tracks, prepare
logs with simple entities
and trace events involving such entities.
The quality of the correlation step depends
on the low latency of the tools. The
correlation component discovers relations
between alerts in order to reconstruct the
attack scenarios. This is the key
component of intrusion detection,
and it cannot work in isolation. Co-
operation is the key word for
futuristic intrusion detection systems, and
Big Data is the key technology to facilitate
it.
False Alert Weeding: This component
is tasked with the responsibility of
distinguishing between false positive and
true positive alerts. Different sensors have
their own advantages and disadvantages
in detecting various attacks, and it is a
well-known bottleneck that low-level sensors
generate lots of false positive alerts.
Attack strategy analysis: The attack
strategy analysis tries to comprehend
the real intentions of invaders. The
Fig. 3: Integrated IDS Model
requirement for such an analysis is to
identify the correlation amongst low-level
alerts which would help establishing the
complete strategy of planned attack by
invaders. Predicting an attack's next
steps, reacting suitably against them,
and responding spontaneously to prevent
further damage are all important
and useful[17].
Prioritization: Prioritization aims at
rating the alerts according to severity and
fitting an operation to each type of
attack. The alert prioritization
component would have to be provided with an
intelligent backbone database in the form of
a fuzzy logic / genetic algorithm based
intelligence engine, so as to consider the types
of alerts as well as other information.
Prioritization of alerts will also depend
on the security policies and the network
topologies.
Finally, once the solution is delivered,
the event, its response and the success rate
are mapped into a data-aggregating
component for future use by the present
IDS, or shared on the cloud
as collaborative knowledge for other
IDS on wireless or other networks.
Conclusion
The paper has analyzed the
available technology in cloud, Bigdata,
IDS and their applicability to WSNs.
Though an implementation has not been
shown in the paper, a clear road map has
been chalked for an Intrusion Detection
System in Wireless Sensor Networks
working on the principle of collaboration
with Big Data as a backbone. A full-
fledged implementation using Hadoop is
being conceptualized and is next on our
agenda.
References
[1] Pritee Parwekar, “From
Internet of Things towards
cloud of things” Computer and
Communication Technology (ICCCT),
2011 2nd International Conference
on, DOI: 10.1109/ICCCT.2011.6075156,
2011, pp. 329-333.
[2] Fu Xiao, “Big Data in Ubiquitous
Wireless Sensor Networks”,
International Journal of
Distributed Sensor Networks
Volume 2014 (2014), Article
ID 781729.
[3] Adel A Ahmed, “A real–time routing
protocol with mobility support and
load distribution for mobile wireless
sensor networks”, International
Journal of Sensor Networks, Volume
15, Number 2/2014.
[4] Pritee Parwekar,“Application of
Data mining in Network Intrusion
Detection” Technical Paper selected
for presentation at the Indian Science
Congress 2008.
[5] CE Loo, MY Ng, C Leckie, and
M Palaniswami, Intrusion Detection
for Routing Attacks in Sensor
Networks, International Journal of
Distributed Sensor Networks, vol. 2,
pp. 313-332, 2006.
[6] J Lopez, R Roman, and C Alcaraz,
Analysis of Security Threats,
Requirements, Technologies and
Standards in Wireless Sensor
Networks, in Foundations of Security
Analysis and Design 2009, LNCS
56705, August 2009, pp. 289-338.
[7] R Roman, J Zhou, and J Lopez,
Applying Intrusion Detection Systems
to Wireless Sensor Networks, in
Consumer Communications and
Networking Conference, 2006,
pp. 640-644.
[8] Abror Abduvaliyev, Al-Sakib Khan
Pathan, Jianying Zhou, Rodrigo
Roman, and Wai-Choong Wong , On
the Vital Areas of Intrusion Detection
Systems in Wireless Sensor
Networks, IEEE communications,
surveys and tutorials, Vol. 15, No. 3,
Third Quarter 2013
[9] I Krontiris, T Dimitriou, and F C
Freiling, Towards Intrusion Detection
in Wireless Sensor Networks, in 13th
European Wireless Conference, Paris,
France, 2007.
[10] S Marti, T J Giuli, K. Lai, and M Baker,
Mitigating Routing Misbehavior
in Mobile Ad hoc Networks, in
MobiCom’00, 2000, pp. 255-265.
[11] Eung Jun Cho, Choong Seon Hong,
Sungwon Lee and Seokhee Jeon,
A Partially Distributed Intrusion
Detection System for Wireless
Sensor Networks, Journal on Sensors
, 2013.
[12] Ahmadi Livani, M., A PCA-based
distributed approach for intrusion
detection in wireless sensor
networks, Computer Networks and
Distributed Systems (CNDS), 2011
International Symposium
[13] Shio Kumar Singh , M P Singh , and
D K Singh , Intrusion Detection
Based Security Solution for Cluster-
Based Wireless Sensor Networks,
International Journal of Advanced
Science and Technology, Vol. 30,
May, 2011.
[14] Shun-Sheng Wang, An Integrated
Intrusion Detection System for
Cluster-based Wireless Sensor
Networks, Elsevier.
[15] Ali Ahmadian Ramaki et al,
“Enhancement Intrusion Detection
using Alert Correlation in Co-
operative Intrusion Detection
Systems”, Journal of Basic and
Applied Scientific Research, 2013.
[16] Valeur, F, Vigna, G, Kruegel, C, and
Kemmerer, R A, A Comprehensive
Approach to Intrusion Detection
Alert Correlation, IEEE Transactions
on Dependable and Secure
Computing, p. 146-169, July
2004, 1(3).
[17] Pietraszek, T, Using Adaptive Alert
Classification to Reduce False
Positives in Intrusion Detection, In
the Proceedings of 7th International
Symposium, RAID 2004, p. 102-124,
Sophia Antipolis, France, 2004.
About the Authors
Pritee Parwekar is pursuing her PhD. in Computer Science and Engg. from GITAM University, Vishakapatnam. Currently she is working with Dept of CSE of ANITS, Vishakapatnam. She has more than 15 years of teaching experience. Her research areas are Sensor network, Cloud Computing and IoT. She is a life Member of CSI. She has reviewed many papers from Springer and IEEE and has already published more than 15 papers with reputed publishers like IEEE and Springer.
Dr. Suresh Chandra Satapathy is PhD in CSE from JNTU, Hyderabad. He is currently working as Prof. and Head, Dept of CSE, ANITS, Vishakapatnam. He has more than 100 publications in both International Journals and conferences. He is the editorial board member of several proceedings with Springer. Currently he is guiding 8 scholars for PhD. He holds the Chairman Div-V(Education & Research) position in CSI and also a senior member of IEEE. His research interests are Data Mining, Machine Intelligence, Swarm Intelligence and Soft Computing.
Introduction
Cryptography
Cryptography means converting the
data into a secret message (encryption)
and then reverting the encrypted
message back to the original data
(decryption).
The secret message thus generated
is called the ciphertext, and it is essential
for the secrecy and confidentiality of
communication between the sender and
the receiver.
Cryptography Features
Since cryptography is basically a security
system, we want it to provide us with a
variety of features or functions which
provide secrecy and confidentiality of the
data.
Authentication: Authentication
basically means that the identity of the
receiver as well as the sender should
be verified in order to maintain the
authenticity of the data.
Secrecy or Confidentiality: What we
mean by secrecy is that only the people
who are authenticated should be able
to encrypt or decrypt the data. This
maintains the confidentiality of our data,
thereby making it secure.
Integrity: During encryption or
decryption, we want our data to be
free from any form of modification:
the data should be received exactly as it
was sent. This feature is
called integrity. A basic
form of integrity is the packet checksum
in IPv4 packets.
Non-Repudiation: This means that
neither the sender nor the receiver can
deny having sent or received
the message, thereby keeping the process
free of any false claims.
Service Reliability and Availability: We
know that even secure systems get hacked,
which hampers the availability of the
security services and the quality
of service to the customers. We should
ensure that the service given is exactly
what users expect from a security system.
Encryption
Encryption is basically a process of making
information hidden or secret.
It is considered as the subset of
cryptography. The process of converting
plaintext into ciphertext is basically what
encryption is.
Ciphertext is basically a coded form
of the data which appears meaningless
and useless but on decryption, it gives us
the original plaintext.
So, in a nutshell, we can say that
encryption is a process of converting
meaningful data into what appears to be
meaningless data.
Decryption
The encrypted data is of no use to the user
unless we convert it back to a
meaningful form.
The conversion of encrypted data
(cipher text) into useful/meaningful data,
can be termed as decryption.
Decryption is basically the opposite
process of encryption.
Different encryption methods
There are three basic encryption
methods:
• Hashing:
Basically, hashing makes a unique,
fixed-length code for each data
text, which is called a hash. Since the
hash is different for each and every
text message, it is very easy to
detect small changes. Once
hashing is applied to data, it
can't be reversed. So we can
see that hashing is not technically an
encryption operation, but it can be
taken as a method to verify that the data
has not been tampered with.
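The tamper-check property described above can be demonstrated with SHA-256 from Python's standard hashlib module (the messages are made up for illustration):

```python
# Illustration of hashing as a tamper check, using SHA-256 from
# Python's standard hashlib module. Hashing is one-way: we never
# recover the message, we only compare fixed-length digests.
import hashlib

def digest(message: str) -> str:
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

original = "transfer 100 to account 42"
tampered = "transfer 900 to account 42"   # one character changed

stored = digest(original)           # fixed-length hash (64 hex chars)
assert len(stored) == 64
assert digest(original) == stored   # unchanged data verifies
assert digest(tampered) != stored   # a small change is easy to detect
```

The receiver recomputes the digest and compares it with the stored one; any mismatch signals modification.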
• Symmetric methods:
Symmetric encryption, or private-key
cryptography, is basically an encryption
process where the same key is known to
both sender and receiver, and so the key
used to encrypt and decrypt the message
must remain secret; otherwise anyone
who learns the key would be able
to access the data. In this, encryption takes
place with one key, then the encrypted
data is sent and decryption again takes
place with the same key.
a. Block Cipher
• It works on fixed-length groups of
bits. A block cipher applies an
unvarying transformation that is
specified by a symmetric key.
• They are very useful in
the design and creation of many
other cryptographic rules and
protocols.
• Whenever we need to encrypt
data in a large amount, or in a
bulk, then block cipher is used.[2]
b. Stream Cipher
• It is basically a symmetric
ciphering method in which we
combine the plaintext digits
with a pseudorandom keystream.
• The concept followed in stream
cipher is encryption of data bit
by bit by the key data to form
ciphertext.
• In this, the encryption of each
digit depends upon the current
state of the cipher.
• In practice, a digit is typically
a bit and the combining operation
an exclusive-or (XOR).[3]
• Asymmetric methods:
It is also called public key cryptography.
It is unlike the previous methods in that
it uses two keys, one for the sender and
A Novel Approach to Secure Data Transmission using Logic Gates
ResearchFront
Rohit Rastogi*, Rishabh Mishra**, Sanyukta Sharma**, Pratyush Arya** and Anshika Nigam***
*Sr. Asst. Professor, CSE Dept., ABES Engg. College, Ghaziabad (U.P.)
**B.Tech. (CSE), Second Year, CSE Dept., ABES Engineering College, Ghaziabad (U.P.)
***B.Tech. (IT), Second Year, IT Dept., ABES Engineering College, Ghaziabad (U.P.)
Abstract: Encryption and decryption processes are carried out here using logic gates. We first generate a key using the concept of
cellular automata, which converts our readable data into unreadable ciphertext. Later, that very same key is used to decrypt the data.
For encryption, cellular automata are used, followed by passing the key through a series of multiplexers (8x1 and 2x1 MUXs) in order to
create randomness. We have also used a feedback network to create even more randomness, which feeds the once-used key back into the
combination in order to avoid exhaustion of keys.
This process is relatively cheap and easy to implement. Further, more complex algorithms can be implemented using this basic method.
another for the receiver. Therefore, it
has the potential to be more secure as
such. Using this method, a public key is
made readily available to everyone and
can be used to encrypt messages. And
unlike the public key, the private key is
kept secret and is used to decrypt the
message.
Aim of this Paper
We are focusing on Data Encryption and
Decryption using 74xx Logic Gates.
Analysis of Cryptography Using
Logic Gates:
• Nowadays, for secure
communication, we need a system that
can transfer data from sender to receiver
safely without any manipulation. Thus,
encoding and decoding come into the picture.
• We encrypt and decrypt
data using logic gates in this paper.
• For encryption, an initial key is
needed. That key is generated by the
concept of cellular automata.
• Cellular automata, or in particular
Rule 30 of cellular automata, will help us in
generating a random initial key.
• A cellular automaton is basically a
one-dimensional collection of states (0
or 1), and the value of the next state depends
on that of the previous state, which can be
calculated using a fixed rule (Rule 30).
Rule 30 of Cellular Automata
• It basically deals with finding the
state of the ith cell in the next
generation by making use of the states
of the ith, (i-1)th and (i+1)th
cells in the current generation.
• Consider this, we have 3
cells (the ‘ith’ cell and two
neighbours), and each cell can
have either 0 state or 1 state. So,
in total we can have 8 possible
combinations.
• Thus, the rule 30 of cellular
automata just gives us a way
to design the truth table for the
encryption system.
• Example: consider three inputs 0,
1, 0. Let 0 be the (i-1)th element,
1 be the ith element and 0 be
the (i+1)th. The output for these
three inputs is found as:
Output = [(i-1)th XOR ((ith) + (i+1)th)], where + denotes logical OR.
Hence, for the above example,
Output = (0 XOR (1+0))
= (0 XOR 1)
= 1
Truth Table
(i-1)th | ith | (i+1)th | Output
   1    |  1  |    1    |   0
   1    |  1  |    0    |   0
   1    |  0  |    1    |   0
   1    |  0  |    0    |   1
   0    |  1  |    1    |   1
   0    |  1  |    0    |   1
   0    |  0  |    1    |   1
   0    |  0  |    0    |   0
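The Rule 30 update is easy to verify in code. This sketch reproduces the worked example above (inputs 0, 1, 0 give output 1), treating + as logical OR, and shows why the rule is numbered 30:

```python
# Rule 30 next-state function as described above:
# output = (i-1)th XOR ((i)th OR (i+1)th).
def rule30(left: int, center: int, right: int) -> int:
    return left ^ (center | right)

# Worked example from the text: inputs 0, 1, 0 -> output 1.
assert rule30(0, 1, 0) == 1

# The eight outputs for inputs 111, 110, ..., 000, read as a binary
# number, give 30 -- which is where the rule's name comes from.
bits = [rule30(a, b, c) for a in (1, 0) for b in (1, 0) for c in (1, 0)]
value = int("".join(map(str, bits)), 2)
print(bits, value)   # [0, 0, 0, 1, 1, 1, 1, 0] 30
```

Applying this function across a row of cells produces the next generation, from which the key bits are taken.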
Generation of Key
Internal Circuitry:
Working of Each Component During Key
Generation
• 2:1 Multiplexer:
1. The key that is generated by the
rule 30 is fed to the 2:1 MUX bit
by bit.
2. The 2:1 MUX has one selection
line (sel). For sel=1, we get an
output bit, and for sel=0
the previous output is fed back
as input.
3. We are using eight 2:1 MUXes,
therefore we get 8 outputs,
one corresponding to each bit.
• Shift registers:
1. The role of shift registers is
basically to provide a delay time.
2. And also to shift or pass the
digits bit by bit.
3. We use two 4-bit shift registers,
because this gives lower
complexity compared
to one 8-bit shift register.
• 8:1 Multiplexer :
1. We have used eight 8:1 MUXes in the
circuitry because each MUX
gives 1 output, and in total we
need 8 outputs.
2. The output that is received
from the 2:1 MUX via the shift
registers acts as the selection
line for the 8:1 MUX. The input
that is fed is the original key that
was fed to the 2:1 MUX.
3. If the outputs obtained from the
2:1 MUXes are A, B, C, D, E, F, G and
H, then the selection lines are
taken from A to H taking three
at a time in circular order.
4. Thus, there are total 8 selection
lines (ABC, BCD, CDE, DEF, EFG,
FGH, GHA, HAB). One for each
8:1 MUX.
5. The output is fed to two 4-bit shift
registers that shift these bits out as
the output.
6. The final output after the shift
register acts as the new key
that is used in the encryption
process.
Working after Key Generation
After the new key is stored in the shift
registers, there are three processes that
take place.
• Encryption Process
• Feedback Process
• Decryption Process
Encryption Process:
In the encryption process, the new key
is XORed with the data that is entered
by the user through shift registers. This
Fig. 1: Internal Circuitry for Key Generation[1]
Fig. 2: Main Working of the Process[1]
Table 1: Key formation using cellular automata[1]
XOR operation creates a secret text that
is known as CIPHERTEXT. The cipher text
is stored and shifted using shift registers.
Suppose our initial key was 00011110.
The key obtained from the
next-state register is 00110001,
and let the data be 11010010.
Then the ciphertext is:
    00110001 (Final key)
XOR 11010010 (Data)
  = 11100011 (Ciphertext)
Feedback Process:
In the feedback process, we feed the key
that was used for encryption back to the 2:1
MUX via two 4-bit shift registers for further
usage.
This feedback strengthens
our encryption because the key, once used,
is mixed back in, thereby making the key
more random and more difficult for an
attacker to guess.
Decryption Process
In the decryption process, the ciphertext
that is received by the receiver is again
XORed with the key to obtain the originally
sent data.
• Working Example:
Suppose the data is 10010011
and the key is 11100011.
The ciphertext created at the sender’s
end is 01110000.
After decryption, we XOR the
ciphertext with the key bit by bit:
01110000 XOR 11100011 = 10010011,
and the original data is
obtained again.[1]
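Since both encryption and decryption are a bitwise XOR with the key, one helper covers both directions. The sketch below reproduces the two worked examples given above:

```python
# Encryption and decryption as described above are both a bitwise
# XOR with the key; XOR is its own inverse, so one function does both.
def xor_bits(a: str, b: str) -> str:
    assert len(a) == len(b)          # key and data lengths must match
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

# Encryption example from the text: key 00110001, data 11010010.
assert xor_bits("00110001", "11010010") == "11100011"

# Decryption example from the text: data 10010011, key 11100011.
cipher = xor_bits("10010011", "11100011")     # sender's end
assert cipher == "01110000"
assert xor_bits(cipher, "11100011") == "10010011"   # original data back
```

This also makes the first limitation listed later concrete: the key must be exactly as long as the data block.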
Time Complexity
The overall complexity is calculated as
follows :
• The total time complexity can be
easily calculated as the sum of the
complexities of all the components.
• Let C1 be the time complexity of key
generation, C2 the time complexity of the
encryption process and C3 the time
complexity of the decryption process.
• C1 = (time complexity of shifting the
data + data selection by the 8:1 MUX);
C2 = (time complexity of the shift
register + XORing the bits); C3 =
(time complexity of the shift register
and XORing the bits).
• If there are n bits in the data, an n-bit
key is required.
• So the total time complexity is C = C1 + C2 + C3.
• C = [O(n) + 2·8·O(1)] + [O(n) + 8·O(1)] +
[O(n) + 8·O(1)] = 3O(n) + 32O(1) = O(n).
• So the whole process has linear time
complexity and is a P-time algorithm.
Time Delay
• The total time delay can be easily
calculated as the sum of the time delays
of all the components.
• Let T1 be the time delay of key generation,
T2 the time delay of the encryption process
and T3 the time delay of the decryption process.
• T1 = (time delay of the shift registers
+ the 8:1 MUX); T2 = time delay of (shift
register + XORing the bits); T3 =
time delay of (shift register and
XORing the bits).
Limitations
There are some limitations in encrypting
and decrypting data using logic gates.
• The length of the key and that of
entered data should be same.
• Key should be known to both receiver
and sender, thereby making it prone
to thefts.
• The fact that the registers are only 8
bits wide is a big limitation.
• For real-life data, we need to enhance
the capacity of the registers.
Future Scope
• Other complex and more
secure cipher algorithms can be
implemented through logic gates
and their combinational and
sequential circuits.
• All the components can be embedded
in a 20-pin chip as a unit.
• Only the external original text/binary
8-bit input needs to be supplied. As
a result, we get 8 output values of
text/binary as the cipher message.
• Hence, the cost of hardware circuitry
may be reduced.
• Also, multiple chips can be used to
scale the process depending on our
requirement.
• More secure logics can be
implemented and circuits can be
designed.
Conclusion
With this paper, we would like to conclude
that although this method is primitive,
it is still advantageous because only basic
circuitry is needed, reducing the cost of
the project and making it easily
understandable and comprehensible.
Recommendation
• The whole process can be taken as
an alternate method for secure data
transmission.
• It is user-friendly, easily understood,
easy to calculate, and easy to
program.
• The linear time complexity shows
that its performance is good.
• The hardware resources are cheap
and can be efficiently implemented.
• It is scalable also for bigger data size
with the multiple units, treating the
8-bit data as a block.
Acknowledgement
We would like to sincerely thank
Ms. Upasana Sharma (Faculty, ECE) and
Prof. A K Arora (Head of Department,
Department of Electronics and
Communication Engineering, ABES
Engineering College, Ghaziabad) for
showing us the right path and helping
us whenever we needed it.
Also we would like to thank the
Almighty God. It is because of Him that
we are what we are today.
References
[1] http://electronicsmail.wordpress.
com/2012/10/14/data-encryption-
and-decryption-system-using-74xx-
logic-gates/
[2] http://en.wikipedia.org/wiki/
Block_cipher
[3] http://en.wikipedia.org/wiki/
Stream_cipher
[4] http://natureofcode.com/book/
chapter-7-cellular-automata/
[5] http://en.wikipedia.org/wiki/XOR_
cipher
[6] http://stackoverflow.com/
questions/1379952/why-is-xor-
used-on-cryptography
[7] http://upload.wikimedia.org/
wikipedia/commons/f/f8/Crypto.png
About the Author
Mr. Rohit Rastogi received his B.E. degree in Computer Science and Engineering from C.C.S. Univ. Meerut in 2003, and
the M.E. degree in Computer Science from NITTTR-Chandigarh Punjab Univ. Chandigarh in 2010.
He is a Sr. Asst. Professor in the CSE Dept. of ABES Engineering College, Ghaziabad (U.P., India), affiliated to Gautam
Buddha Tech. University and Mahamaya Tech. University (earlier Uttar Pradesh Tech. University) at present and is
engaged in the clustering of mixed varieties of data and attributes, with real-life applications of Genetic Algorithms,
Pattern Recognition and Artificial Intelligence.
Introduction
Computing involves the use of a computer
as hardware and/or software to perform
the desired task. Cloud computing is
defined as a computing paradigm shift
where computing is moved away from
personal computers or an individual
application server to a “cloud” of
computers[1]. The benefits of the cloud
include rapid provisioning, low investment
cost and easy access. Cloud is based on
pay-per-use model where the users are
charged only for the duration when the
services are used. Other characteristics
of cloud are broad network access, on-
demand self-service, resource pooling and
rapid elasticity[16].
According to the US National Institute
of Standards and Technology (NIST)[2], cloud computing can be summarized
as: ‘A model for enabling convenient, on-
demand network access to a shared pool
of configurable computing resources (e.g.,
networks, servers, storage, applications, and
services) that can be rapidly provisioned and
released with minimal management effort or
service provider interaction’.
Cloud provides an illusion of infinite
storage to users at limited setup and
usage cost. It permits the user to perform
computationally intensive operations on
the cloud, even at multiple disparate
locations[1]. The prime requirement of
cloud usage is internet availability. Due
to the low cost and high speed of
internet access, organizations are motivated
to outsource their data to the cloud.
There are a large number of cloud service
providers namely VMware, Microsoft,
Google, Salesforce.com, Rackspace and
Amazon[3]. The organizations can deploy
a private cloud or may use a public cloud
to store their data based on the sensitivity
of the data, time and budget to deploy the
cloud[4]. As a public cloud involves less cost
and time, many organizations prefer using
a public cloud to setting up a private
cloud.
This use of the public cloud introduces security risks such as data leakage, data theft and reduced control over the data, as the cloud service provider can easily access the data[15]. In order to provide security, the data is encrypted before being outsourced to the cloud, so confidentiality of the data is retained using cryptography. However, using cryptography to convert this confidential data into a form unreadable by humans introduces the challenge of effective searching over the data.
A naive approach to search the data is to download the entire encrypted dataset from the remote cloud server to the local machine, decrypt it, and then retrieve the desired documents. As end users typically connect to the cloud from mobile devices or thin clients that are limited in available memory, this approach is inefficient. So an efficient method to perform searching directly on the encrypted data is desired.
In this paper, the aim is to provide a cluster-based search scheme using which the desired documents can be retrieved with fewer comparisons. In the cluster-based search scheme, the entire document collection is partitioned into multiple clusters to provide efficient searching. As the number of comparisons is reduced, the average search time is also reduced. The proposed search scheme should remain coherent with the existing approaches, i.e., only authorized users are able to search over the encrypted data; the user is able to retrieve results without revealing the search terms to the cloud server; and neither the documents retrieved from the cloud server nor the search pattern should be revealed to the semi-trusted server.
The contributions of this paper can be summarized as follows. First, we propose a cluster-based approach for multi-keyword search over encrypted cloud data. Second, we propose an efficient method that requires fewer comparisons and less time to declare a search unsuccessful.
The rest of this paper is organized as follows. In Section II, we discuss the related work. Section III gives the system model, security requirements and the problem formulation. The detailed description of the proposed search scheme is presented in Section IV. The need for query randomization is presented in Section V. The security analysis of the proposed search scheme is presented in Section VI, whereas the performance analysis is done in Section VII. Finally, Section VIII gives the concluding remarks of the paper.
An Efficient Cluster-based Multi-Keyword Search on Encrypted Cloud Data

Research Front

Rohit Handa* and Rama Krishna Challa**
*Assistant Professor, CSE Department, BUEST, Baddi, India
**Professor, CSE Department, NITTTR, Chandigarh, India

Abstract: Cloud computing involves the delivery of computing infrastructure resources as a service to end users over the internet. As an illusion of infinite resource availability is provided, organizations outsource their data to the cloud. But this migration of confidential data to the cloud leads to various security issues. To maintain confidentiality, cryptography is employed, which reduces the ease of searching data on the cloud. So, an efficient approach for searching data on the cloud is desired. In this paper, a cluster-based multi-keyword search scheme is proposed. The privacy and security requirements proposed in the literature are also implemented. To the best of our knowledge, previous works are inefficient in declaring a search unsuccessful without performing the search over the entire dataset. The performance analysis of the proposed search scheme over synthetic data reveals that the number of comparisons required to perform a search is reduced by 80% and the time required is reduced by 70%. So, the proposed search scheme outperforms the other search schemes in the literature in terms of the number of comparisons and the time required to search for the desired document on the cloud.

CSI Communications | June 2015 | 21

Related Work
Dawn Xiaoding Song[5] introduced the concept of a searchable encryption scheme without loss of confidentiality. Under the proposed approach, symmetric key encryption is used to encrypt the available data. It proposed the use of non-index based searching on the encrypted cloud data due to the lower overhead involved in searching as compared to the keyword-based approach. This method is inefficient for data of large size as it involves the use of symmetric key cryptography for security. Also, the search cost of this work is linear in the document size.
Mehmet Ucal[10] proposed improvements to Song's approach. It is a hybrid approach in which the keywords are encrypted using a stream cipher and the non-keywords are encrypted using a block cipher; padding is used to generate non-keywords of the desired length. A keyword-based search scheme is integrated into the existing approach to perform a faster search operation with less overhead. The encrypted file is of small size, which reduces encryption time and memory overhead. But due to the small size of the file, security can be compromised.
Boneh et al.[13] modified [5] and introduced Public Key Encryption with Keyword Search (PEKS), but this approach is computationally expensive due to the use of public key cryptography. Also, keyword privacy cannot be provided, as the server can easily encrypt a keyword with the public key and use the received trapdoor to evaluate the ciphertext. Goh[14] introduced the concept of searchable indexes, but the use of Bloom filters introduces false positives, which leads to the mobile user downloading more files than required.
Cao et al.[8] introduced the concept of single keyword search over encrypted cloud data. This approach provides data security but is applicable only to single keyword search and requires the secret parameters to be shared among the end users for trapdoor generation; hence it provides weak security.
Ning Cao et al.[6] modified [8] to support multi-keyword search over encrypted data, but this approach generates less accurate results due to randomization and involves a large computation overhead. Also, the security provided is weak due to the distribution of the symmetric key among all end users.
Ayad Ibrahim et al.[11] proposed performing multi-keyword ranked search over encrypted cloud data using Privacy Preserving Mapping. This approach provides index security, data security, access privacy and trapdoor security. However, as it is based on Bloom filters, the number of false positives is high. Also, the storage and time overhead in constructing the index is high.
Orencik et al.[7] proposed multi-keyword search using a forward index. The proposed method is efficient compared to existing methods in the literature but still requires a large number of comparisons to retrieve the documents, and the time required for declaring an unsuccessful search is high[17]. We have adopted the basic scheme from [7] and modified it to reduce the time and number of comparisons required, using clustering.
Problem Formulation and Security Requirements
System Model
In order to provide cluster-based multi-keyword search on encrypted cloud data, there are three different entities, coherent with the previous works[6-12]:
Data owner: The data owner is the entity responsible for the data. The data owner holds the collection of encrypted documents, along with the indices, to be outsourced to the cloud. The keys used during encryption of the documents are under the control of the data owner.
Users: They are the end users interested in searching for documents stored on the cloud.
Server: It is assumed that the server is semi-trusted. The role of the server is to store the documents along with the indices generated by the data owner and to provide the search capability to the users. It is desired that the server should not learn any information from the encrypted documents and/or indices.
Privacy Requirements
The encrypted documents are stored on the server along with the cluster and document indices. The server is semi-trusted and may try to extract information from the search query and/or the retrieved results. It is desired that the server should not be able to learn any information; even the cluster and document indices should not reveal any information to the cloud server. So, in this paper, the privacy requirements of the proposed search scheme are as follows:
1. Data Privacy: Only the authorized user is able to learn the actual data retrieved from the server.
2. Index Privacy: It is desired that the cluster index, document index and query index generated should not provide any relevant information about the clusters, documents and search terms, respectively, to the cloud server.
3. Trapdoor Privacy: It should not be possible for the cloud server to generate a valid trapdoor from previously generated trapdoors for some set of keywords.
4. Non-Impersonation: Only authorized users are able to perform the desired search. Under the current authentication system, it should not be possible to impersonate an authorized user.
Design Goals
In this paper, we propose a cluster-based approach for multi-keyword search on encrypted cloud data. The goals of the proposed search scheme are: (i) to retrieve the relevant documents corresponding to the search query efficiently, by reducing the number of comparisons and the time required; (ii) to declare a search unsuccessful efficiently, by performing fewer comparisons in the minimum possible time; (iii) to validate the security of the proposed search scheme; and (iv) to evaluate the performance of the proposed search scheme by conducting experiments on synthetic data.
Stages of the Proposed Search Scheme
Figure 1 depicts the architecture of the proposed search scheme[18]. The overall search process is performed in two stages:
• Offline Stage: In the offline stage, the data owner is responsible for the generation of secure indices. The data owner extracts the keywords from the documents and generates a searchable index for each document. Based upon the similarity of the keywords, clusters are generated, and for each cluster a cluster index is also generated. The data owner uploads the cluster indices, document indices and the encrypted documents to the cloud server. The secrecy of the keys used during the offline stage is the responsibility of the data owner.
• Online Stage: During this stage, any authorized user can perform multi-keyword search on the encrypted cloud data. As shown in step-1 of Fig. 1, the authorized user requests the data owner to provide the security parameters required to generate the desired search query. In step-2, the user sends the search query, generated using the security parameters received in step-1, to the cloud server. The cloud server performs the desired search. The metadata corresponding to the retrieved documents is returned to the user in step-3. During step-4, the user analyzes the retrieved metadata corresponding to the relevant documents and requests the data owner for the symmetric key corresponding to the selected document. Using the symmetric key received from the data owner, the user can generate the plain text corresponding to the encrypted document.
Proposed Cluster-Based Multi-Keyword Search on Encrypted Cloud Data
The proposed search scheme includes seven steps that can be classified into three phases, namely, the cluster generation, indexing and retrieval phases. The indexing phase includes document index generation, cluster index generation and document encryption. The retrieval phase includes query generation, document searching and decryption.
Cluster Generation Phase
Initially, keywords are extracted from each document. Based on the similarity of the extracted keywords, the documents are partitioned into multiple clusters. So, the overall purpose of this phase is to generate the desired number of clusters. As an example, consider an organization willing to outsource its confidential data to the cloud; the documents can be clustered based on categories such as finance, inventory and personnel.
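The paper does not fix a particular clustering algorithm for this phase, so the following is only one plausible sketch: documents are grouped greedily by the Jaccard similarity of their keyword sets. The function names, the 0.5 threshold and the sample data are our illustrative assumptions, not the authors' implementation.

```python
def jaccard(a, b):
    """Similarity of two keyword sets: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_documents(doc_keywords, threshold=0.5):
    """Assign each document to the first cluster whose representative
    keyword set is similar enough; otherwise start a new cluster."""
    clusters = []  # list of (representative_keyword_set, [doc_ids])
    for doc_id, kws in doc_keywords.items():
        for rep, members in clusters:
            if jaccard(rep, kws) >= threshold:
                members.append(doc_id)
                rep |= kws  # grow the cluster's representative keyword set
                break
        else:
            clusters.append((set(kws), [doc_id]))
    return [members for _, members in clusters]

docs = {
    "f1": {"invoice", "payment", "ledger"},
    "f2": {"invoice", "payment", "audit"},
    "p1": {"salary", "leave", "appraisal"},
}
print(cluster_documents(docs))  # -> [['f1', 'f2'], ['p1']]
```

Any off-the-shelf clustering of keyword vectors (e.g., k-means on term frequencies) would serve equally well; only the resulting partition matters to the rest of the scheme.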
Indexing Phase
i Document Index Generation: The keywords extracted from each document in the previous phase are used to generate the document index. For each keyword wi appearing within the document, the secret key for HMAC (hash-based message authentication code) is obtained using a hash function (for example, MD-5, SHA-1 or SHA-2). The hash value calculated on the given keyword is sent to the data owner, who retrieves the secret key corresponding to the hash value. The secret key is shared with the end user using a public key encryption scheme. For keywords generating the same hash value, the secret key is retrieved from the data owner only once.
Upon receiving the secret key, the HMAC of the keyword is calculated to generate a hexadecimal index. This hexadecimal value is converted to its binary equivalent, which is then reduced in length. This reduction of the index involves dividing the binary string into smaller substrings of equal length. If all the bits in a substring are zero, then the output value for that substring is 0; if any one of the bits is 1, then the output is 1. The reduction step is shown in Fig. 2.
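The reduction step can be sketched as follows. This is a minimal illustration; the substring length of 6 matches the reduction factor used in the experiments of Section VII.

```python
def reduce_index(bits: str, factor: int = 6) -> str:
    """Reduce a binary string by collapsing each block of `factor` bits:
    a block maps to '0' only when all of its bits are 0 (cf. Fig. 2)."""
    return "".join(
        "1" if "1" in bits[i:i + factor] else "0"
        for i in range(0, len(bits), factor)
    )

print(reduce_index("000000" "010000" "000001"))  # -> "011"
```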
The final index is obtained by taking the bitwise product of the indices obtained for each keyword, as shown in Fig. 3. The document index generation steps can be summarized using Algorithm-1. Here the hash() function accepts a search term as input and generates the hash of the given keyword. The HMAC() function calculates the HMAC using the keyword and secret key as input. The Reduce() function converts the hexadecimal output to a binary string of the required length. The Bitwise product() function calculates the bitwise product of all the indices generated.
Algorithm-1: Document Index Generation
Input: F: the document collection
for each document Fi ∈ F do
  for each keyword wi ∈ Fi do
    secret_index ← hash(wi)
    retrieve the secret_key corresponding to the secret_index from the data owner
    index ← HMAC(wi, secret_key)
    Ii ← Reduce(index)
  end for
  Document Index I ← Bitwise product(Ii)
end for
return Document Index I
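Algorithm-1 can be sketched in Python as follows. This is an illustrative simplification, not the authors' implementation: a single HMAC-SHA256 digest stands in for the concatenated SHA-2 family outputs used in Section VII, and the data owner's secret-key database is reduced to a local dictionary keyed by the last two hex digits of the keyword's MD-5 hash.

```python
import hashlib
import hmac

REDUCTION_FACTOR = 6  # matches the factor reported in Section VII

def reduce_bits(bits, factor=REDUCTION_FACTOR):
    # Each block of `factor` bits collapses to '0' only if all bits are 0.
    return "".join("1" if "1" in bits[i:i + factor] else "0"
                   for i in range(0, len(bits), factor))

def keyword_index(keyword, secret_key):
    # HMAC the keyword, expand the digest to a bit string, then reduce.
    digest = hmac.new(secret_key, keyword.encode(), hashlib.sha256).digest()
    bits = "".join(f"{b:08b}" for b in digest)
    return reduce_bits(bits)

def bitwise_product(indices):
    # Final index = bitwise AND of the per-keyword (or per-document) indices.
    return "".join("1" if all(ix[j] == "1" for ix in indices) else "0"
                   for j in range(len(indices[0])))

def document_index(keywords, key_store):
    # key_store plays the role of the data owner's secret-key database,
    # looked up via a hash of the keyword (MD-5 in the experiments).
    per_keyword = []
    for w in keywords:
        secret_index = hashlib.md5(w.encode()).hexdigest()[-2:]
        per_keyword.append(keyword_index(w, key_store[secret_index]))
    return bitwise_product(per_keyword)

def cluster_index(doc_indices):
    # Step ii: cluster index = bitwise product of all document indices.
    return bitwise_product(doc_indices)
```

A query index (Algorithm-2) is produced by the same routine applied to the search terms, with the secret keys cached so each is fetched only once.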
ii Cluster Index Generation: During the cluster generation phase, the documents are partitioned into multiple clusters. The documents in a cluster are used to generate the cluster index, which is computed as the bitwise product of the indices of all the documents appearing within that cluster.
iii Document Encryption: To provide confidentiality, the data owner uses symmetric key cryptography to encrypt the documents. Depending on the choice of the data owner, any symmetric key encryption algorithm can be used. The secret keys used during encryption are kept confidential by the data owner, and for enhanced security, different keys are used for different documents. Symmetric key cryptography is preferred as it can handle large data and is fast.
Retrieval Phase
i Query Generation: The query generation method, given in Algorithm-2, works as follows. The authorized user willing to perform a search on the encrypted cloud data calculates the hash value for each search term. Based on the hash value generated, the secret key corresponding to each search term is obtained from the data owner. Using the received secret keys, the HMAC is calculated, and a process similar to document index generation is used to generate the search query.
Algorithm-2: Query Index Generation
Input: {k1, k2, ..., kn}: set of keywords
for each keyword ki do
  secret_index ← hash(ki)
  if (secret_key corresponding to the secret_index not previously received)
    retrieve the secret_key corresponding to the secret_index from the data owner
  end if
  index ← HMAC(ki, secret_key)
  Ii ← Reduce(index)
end for
Query Index Q ← Bitwise product(Ii)
return Query Index Q

Fig. 1: Architecture of the proposed search scheme
Fig. 2: Reduction of hash output
Fig. 3: Final index calculation using bitwise product:
Index (Keyword1) = 111……..10
Index (Keyword2) = 101……..11
Final Index = 101……..10
ii Document Searching on the Cloud Server: Upon receiving the query string, the cloud server selects the appropriate cluster by comparing the query string with the cluster indices. The comparison is made by comparing the bit positions with 0 values in the query index with the corresponding bit positions in the cluster index. If both values are zero, the matching process continues; otherwise it is a mismatch. As the search is conjunctive, only a cluster containing all the search keywords is selected; clusters that only partially match the search terms are not selected. Only a cluster whose index matches 100% of the zeros in the query string is selected. The cluster selection process is presented in Algorithm-3.
After a cluster is selected, only the documents within that cluster are searched for the desired keywords. Similar to the cluster selection process, each zero in the query string is compared with the document index. If all the bits with 0 values match the corresponding bit positions in the document index, then the document is selected as a relevant response to the search query.
Algorithm-3: Cluster Selection
Input: Query String Q
for each cluster index Ii do
  if for all the bits j with Qj = 0, the value of Ii at position j is also 0
    return cluster i
  end if
end for
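The zero-matching rule used for both cluster selection and the within-cluster document scan can be expressed as a single predicate over bit strings. The sketch below is illustrative (the function and variable names are ours, not the paper's):

```python
def matches(query: str, index: str) -> bool:
    # Match iff every 0-bit of the query is also 0 in the index
    # (conjunctive search: partial matches are rejected).
    return all(ib == "0" for qb, ib in zip(query, index) if qb == "0")

def search(query, cluster_indices, clusters):
    # Algorithm-3 plus the within-cluster scan: clusters[i] maps
    # document ids to document indices for cluster i.
    results = []
    for i, c_idx in enumerate(cluster_indices):
        if matches(query, c_idx):  # cluster selected
            results.extend(doc for doc, d_idx in clusters[i].items()
                           if matches(query, d_idx))
    return results
```

If no cluster index matches, `search` returns an empty list after only k comparisons, which is exactly how the scheme declares a search unsuccessful (Section VII(D)).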
iii Document Decryption: The metadata corresponding to the documents retrieved as relevant is presented to the user. The end user analyzes the metadata and requests the cloud server for a particular encrypted document. In order to decrypt the document, the user requires the secret key, so a request is made to the data owner to provide it. Upon receiving the secret key, the document is decrypted using the same algorithm as employed by the data owner during the document encryption process.
Query Randomization
The proposed search scheme permits the user to search for documents containing the desired keywords on the server but lacks search pattern privacy: for identical search terms, the search query generated is also identical, so the server can extract valuable information about the user's search patterns from the search query. To avoid this, random keywords are used[7]. These random keywords are added to the list of keywords of each document during the document index generation step, so they are present in every document.
During the query generation phase, some keywords are selected randomly from this set and added to the search terms. As random keywords are used during query generation, the search queries generated for identical search terms are not identical. As all the random keywords are already added to the list of keywords of each document, the retrieved search results are the same as those obtained without query randomization.
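This can be realized as follows, under the assumption of a small fixed dummy-keyword set (the names below are illustrative): the dummies are indexed into every document offline, and each query appends a random non-empty subset of them.

```python
import random

# Dummy keywords indexed into every document during index generation;
# in the scheme, the actual set is kept confidential by the data owner.
DUMMY_KEYWORDS = ("dmy1", "dmy2", "dmy3", "dmy4")

def randomized_query_terms(search_terms, rng=random):
    # Identical search terms now yield differing query indices, because a
    # different random subset of dummies is appended each time; results
    # are unchanged since every document index already contains the
    # dummy keywords.
    extras = rng.sample(DUMMY_KEYWORDS, rng.randint(1, len(DUMMY_KEYWORDS)))
    return list(search_terms) + extras
```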
Security Analysis
The privacy requirements described in Section III(B) must be achieved by the proposed cluster-based approach for multi-keyword search on encrypted cloud data. In this section, we analyze to what extent our search scheme fulfills these security requirements.
Theorem 1: Cluster-based multi-keyword search on encrypted cloud data provides data privacy, i.e., only the authenticated end user is able to learn the actual data retrieved from the server.
Proof: After performing the desired search operation on the server, metadata about the relevant documents is provided to the end user, who selects the desired documents based on it. In order to extract the contents of a document, the end user must also learn the secret key used for its encryption. The secret key is shared with the end user using public key cryptography: the data owner encrypts the secret key using the end user's public key. As the private key is kept secret by the end user, the secret key can be retrieved only by the end user. Even if an adversary learns the encrypted secret key and the encrypted documents, he is not able to extract the secret key, as the private key remains secret, and it is computationally infeasible to derive the genuine private key from the known public key, even by brute force.
Theorem 2: Cluster-based multi-keyword search on encrypted cloud data provides index privacy, i.e., no information about the search terms is leaked from the query index.
Proof: An adversary cannot learn the trapdoor corresponding to the search keywords, as the trapdoor in transit is encrypted using the end user's public key and can be retrieved only by the authorized end user. Also, random keywords are added to the search terms during search query generation; even if an adversary is able to learn the trapdoors for the search keywords and the search query, the adversary still needs to identify all the dummy keywords used to generate the search query. As the random keywords used during the query generation phase are kept confidential, it is infeasible for the adversary to learn the search terms, even by brute force.
Theorem 3: Cluster-based multi-keyword search on encrypted cloud data provides trapdoor privacy, i.e., it should not be possible for the cloud server to generate a valid trapdoor using a given trapdoor for a set of keywords.
Proof: Let K1 and K2 be two keywords for which the query Q is known to an adversary. To generate a search query, random keywords are also inserted. In order to perform the search, the occurrences of 0's are matched with the corresponding occurrences in the cluster and document indices. For an adversary to successfully generate a query for any keyword K1, the locations of the 0's must be known, otherwise the search is not possible. The probability of successfully selecting these bit positions is negligible, even with brute force. So, the scheme provides trapdoor privacy.
Theorem 4: Cluster-based multi-keyword search on encrypted cloud data provides non-impersonation, i.e., only authorized users are able to perform the desired search; no one can impersonate an authorized user.
Proof: As the entire search process is performed using public key cryptography, only authorized users holding the secure private key can retrieve the trapdoor for the search terms and generate the search query. As the probability of generating the valid private key for a known public key is negligible, this method provides non-impersonation.
Performance Analysis
In this section, the performance of the proposed cluster-based approach for multi-keyword search on encrypted cloud data is presented through experiments on synthetic data. For the performance analysis, the dataset ranges from 50 to 6000 documents, and 5 clusters are assumed. From the implementation point of view, a synthetic dataset is created and random keywords are assigned random frequencies of occurrence. The entire simulation is done using Java and MySQL on a Core 2 Duo processor with 2GB RAM. The results for both the existing search scheme[7] and the proposed search scheme are obtained on this system and used for the comparison. The performance of the approach can be improved by using a machine of higher configuration and by code optimization.
In order to generate the cluster index, document index and query index, a secret key is required from the data owner. To retrieve the secret key from the data owner, MD-5 (Message Digest-5) is used to calculate the secret index; the two least significant values are used to extract the secret key from the database. To generate the hexadecimal output for the keywords, HMAC functions based on the SHA-2 (Secure Hash Algorithm) family are used. The outputs of SHA-224, SHA-256, SHA-384 and SHA-512 are calculated for the keywords and concatenated to generate a binary string of length 2688 bits. The binary string so generated is reduced to a length of 448 bits, coherent with the previous work[7]; the reduction factor for the binary output is thus 6, i.e., the binary string is reduced to one-sixth of its original length. The selection of the reduction factor affects the memory required to store each index, the bandwidth required for transmission of the query string over the network, the number of comparisons required to select a cluster or a document within a cluster, and the set of retrieved documents. If the reduction factor is small, then the storage space required for the cluster and document indices is larger and the bandwidth required to transfer the query string is also higher. If the reduction factor is too large, then the number of false positives is high. So an optimal value of the reduction factor is desired; it is taken to be 6, coherent with the existing search scheme[7], so the final index generated is 448 bits long.
The computation cost of the proposed search scheme is presented in Section VII(A). Initially, the document collection is assumed to be equally divided among the clusters, so the performance analysis of the proposed search scheme under uniform document distribution is done in Section VII(B). As the document collection can be non-uniform, the performance analysis of the proposed search scheme under non-uniform distribution of documents is presented in Section VII(C). The comparison of the proposed and existing search schemes in declaring a search unsuccessful is presented in Section VII(D). Section VII(E) compares the proposed search scheme using hard clustering, the proposed search scheme using soft clustering and the existing search scheme[7].
Computation Cost of the Proposed Search Scheme
The computation cost of the proposed search scheme involves an additional step of cluster index generation as compared to the existing efficient search scheme[7]. The cluster index generation is performed only once, during the initial stages, for a small number of clusters, on powerful machines during the offline stage. So, the overall additional cost incurred by this step is small.
Performance analysis assuming the documents are equally divided among the clusters
For the purpose of the initial performance analysis, it is assumed that the documents are equally divided among the clusters, so each cluster includes an equal number of documents.
Number of comparisons required to perform a search
The proposed search scheme performs the desired search by comparing the generated query string with the cluster indices. After selecting the desired cluster, the query index is compared with the documents within the selected cluster. As compared to the existing search scheme[7], which compares the query string with the indices of all the documents, the proposed search scheme reduces the number of comparisons required. Figure 4 shows the number of comparisons required by the existing and proposed search schemes. It can be easily inferred that the number of comparisons required is reduced by 80% for 6000 documents.
Theorem 5: Cluster-based multi-keyword search on encrypted cloud data reduces the number of comparisons required to perform a search by an order of k, where k is the number of clusters.
Proof: The documents are divided into k clusters and each cluster index is generated as the bitwise product of the indices of all the documents within the cluster. The search is thus reduced to initially comparing the search query with the cluster indices. Once the appropriate cluster is selected, the documents within the cluster are searched to retrieve the relevant documents. Let n be the number of documents and k be the number of clusters generated. If the distribution of documents among the clusters is uniform, then the number of comparisons required by the proposed search scheme is k + n/k, which is significantly less than n for a large document collection.
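The figures reported above follow directly from this bound. With the experimental setting of this section (k = 5 clusters, n = 6000 documents, uniform distribution):

```python
n, k = 6000, 5
comparisons = k + n // k              # 5 cluster checks + 1200 documents
print(comparisons)                    # -> 1205, versus n = 6000
print(round(1 - comparisons / n, 2))  # -> 0.8, the ~80% reduction of Fig. 4
```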
Average time required to perform a search
The query string is compared with the cluster indices and then with the document indices within the selected cluster, so the proposed search scheme requires fewer comparisons than the existing approach. The search time required depends on the number of zeros in the generated search query: as the number of keywords increases, the number of zeros in the search query also increases. As soon as a mismatch is encountered between the search index and a document or cluster index, further string matching with that index terminates. Figure 5 (a-e) presents the average search time required to find the desired document while varying the number of keywords in the search query.
Fig. 4: No. of comparisons required
Fig. 5 (c): Average search time for three keyword search
Fig. 5 (d): Average search time for four keyword search
Fig. 5 (e): Average search time for five keyword search
In Fig. 6, the average time required to search for any keyword is shown. It can be inferred that the time required by the proposed search scheme to find a document is 70% less as compared to the existing search scheme[7] for a dataset with 6000 documents.
Fig. 6: Average search time
Performance analysis assuming the documents are unequally divided among the clusters
As the document collection is dynamic, the documents may be unequally distributed among the clusters. The performance of the proposed search scheme, as opposed to the existing search scheme[7], is evaluated in terms of the average search time required to find the desired document.
Average time required to perform a search
Figure 7 (a-e) depicts the search time required to search for keywords on the cloud, assuming the documents are unequally distributed among multiple clusters. Owing to this unequal distribution, the search time varies depending on the number of documents within the selected cluster. The comparison of the proposed search scheme with the existing search scheme[7] reveals that searching for relevant documents using the cluster-based approach requires less time than the existing approach.
Fig. 7 (a): Average search time for single keyword search
Fig. 7 (b): Average search time for two keyword search
Fig. 7 (c): Average search time for three keyword search
Fig. 7 (d): Average search time for four keyword search
Fig. 7 (e): Average search time for five keyword search
Fig. 5 (a): Average search time for single keyword search
Fig. 5 (b): Average search time for two keyword search
Figure 8 depicts the average time required to search for relevant documents on the cloud assuming a non-uniform distribution of documents.
Fig. 8: Average search time
Performance analysis for unsuccessful search
Number of comparisons to declare an unsuccessful search
As the search is conjunctive, a search is declared unsuccessful if any one of the search terms is not present in the entire document collection. As the cluster index is generated using all the keywords present within the documents of the cluster, the absence of a keyword can be discovered by comparing the query index with the cluster indices alone. As inferred from Fig. 9, only a few comparisons are required to check whether the search is unsuccessful: in the proposed search scheme only 5 comparisons (assuming 5 clusters) are required to declare the search unsuccessful, whereas the existing search scheme requires a number of comparisons equal to the number of documents.
Fig. 9: No. of comparisons required for declaring an unsuccessful search
Theorem 6: Cluster-based multi-
keyword search on encrypted cloud data
provides reduces the number of comparisons
required to declare unsuccessful search.
Proof: The documents are divided
into k clusters, where the cluster index
is generated using the bitwise product of
indices of all the documents within the
cluster. So, a search is initially restricted
to comparing the search query with the
cluster index. In case a match is found
with the cluster index, then the documents
within that cluster are searched. If there
is no possible match between the query
string and the cluster index, then the
search is declared as unsuccessful. Let
n be the number of documents and k be
the number of clusters generated. In order to declare a search unsuccessful, only comparisons between the search index and the cluster indices are required. Hence, it requires only k comparisons, where k << n.
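The counting argument in the proof can be illustrated with a small sketch. This is not the paper's encrypted bit-vector construction: the plaintext set representation, function names and sample data below are assumptions made purely for illustration.

```python
# Illustrative sketch of cluster-restricted conjunctive search.
# Each cluster index is taken here as the union of its documents'
# keyword sets, so a query term missing from every cluster index
# cannot occur in any document -- the basis of the k-comparison
# unsuccessful-search check.

def build_cluster_index(cluster_docs):
    """Union of the keyword sets of all documents in a cluster."""
    index = set()
    for keywords in cluster_docs.values():
        index |= keywords
    return index

def conjunctive_search(clusters, query):
    """Return matching document ids and the number of
    cluster-index comparisons performed."""
    comparisons = 0
    results = []
    for docs in clusters:
        comparisons += 1                      # one cluster-index check
        if not query <= build_cluster_index(docs):
            continue                          # prune the whole cluster
        for doc_id, keywords in docs.items():
            if query <= keywords:
                results.append(doc_id)
    return results, comparisons

clusters = [
    {"d1": {"cloud", "search"}, "d2": {"cloud", "index"}},
    {"d3": {"crypto", "rank"}, "d4": {"rank", "fuzzy"}},
]
# Term "xyz" occurs nowhere: only k = 2 cluster-index comparisons
# are needed, independent of the number of documents n.
hits, cmps = conjunctive_search(clusters, {"cloud", "xyz"})
print(hits, cmps)   # [] 2
```

On an unsuccessful query the loop touches only the k cluster indices, never the n document indices, which is the k << n saving claimed by Theorem 6.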
Average time required for declaring a search as unsuccessful
In order to declare a search as unsuccessful, only the query string is compared with the cluster index. Since only a few clusters are generated, the number of comparisons required to declare a search unsuccessful is significantly reduced, and consequently so is the time required. Figure 10 depicts the time required to declare a search unsuccessful. In the proposed search scheme the time required for the entire document collection is 0.07 ms, which is the time required to compare the search query with the cluster indices. In the existing search scheme[7] the number of comparisons required is proportional to the number of documents; hence the time required is high.
Fig. 10: Average time required for declaring a search as unsuccessful
Performance analysis assuming soft clustering
The documents are initially clustered into multiple clusters depending on the similarity of their keywords. A document may therefore belong to a single cluster or to multiple clusters: in hard clustering, a document appears in a single cluster, whereas in soft clustering a document may appear in multiple clusters. If a document appears in multiple clusters, the search is performed by selecting each such cluster and then searching within it. Since multiple clusters are selected, the time required is higher than with the hard clustering approach, but still significantly lower than with the existing search scheme, as shown in Fig. 11.
Fig. 11: Average search time required using different clustering methods
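The soft-clustering search described above can be sketched in the same illustrative style (again with assumed names and toy data, not the paper's encrypted indices): the same document id may now appear in several clusters, so every matching cluster is visited and duplicate hits are removed.

```python
# Soft clustering: a document may be a member of several clusters,
# so every matching cluster is searched and duplicate hits removed.

def soft_search(clusters, query):
    hits = set()
    for docs in clusters:
        cluster_index = set().union(*docs.values())
        if query <= cluster_index:            # cluster may hold a match
            hits |= {d for d, kw in docs.items() if query <= kw}
    return sorted(hits)

# "d2" appears in both clusters (soft membership).
clusters = [
    {"d1": {"cloud", "search"}, "d2": {"cloud", "rank"}},
    {"d2": {"cloud", "rank"}, "d3": {"rank", "fuzzy"}},
]
print(soft_search(clusters, {"cloud", "rank"}))   # ['d2']
```

Here "d2" is a member of both clusters yet is reported once; the extra cluster visits are what make soft clustering slower than hard clustering but still cheaper than a full linear scan.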
Conclusions
In this paper, we have proposed a cluster-based approach for multi-keyword search on encrypted cloud data. The proposed scheme permits the user to efficiently perform search over the encrypted cloud data. To do so, the data owner generates the cluster index and the document index; the documents are encrypted and outsourced to the cloud. After performing experiments using synthetic data, the performance of the proposed scheme is analyzed as follows: (i) the proposed search scheme reduces the time and the number of comparisons required to retrieve the desired documents, considering both equal and unequal distribution of documents within the clusters; (ii) the proposed search scheme addresses the issue of unsuccessful search with a significant reduction in the time and comparisons required; (iii) the proposed search scheme requires less time than the existing efficient search scheme even if documents appear in multiple clusters; (iv) through security analysis, we show that our proposed search scheme is secure and preserves privacy.
Data Science – Venn diagram
The primary colors of data: Hacking Skills, Math and Stats Knowledge, and Substantive Expertise
[Taken from http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram with permission from the owner, Dr. Drew Conway. Dr. Drew Conway, Head of Data at Project Florida, is a leading expert in the application of computational methods to large-scale social and behavioral problems.]
Following this line of research, we suggest as future work: (i) testing the performance of the proposed scheme on a real data set; and (ii) finding a disjunctive keyword search scheme.
References
[1] "Cloud Computing," http://en.wikipedia.org/wiki/Cloud_computing.
[2] "The NIST Definition of Cloud Computing," http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf.
[3] "Top-10 cloud service providers," http://searchcloudcomputing.techtarget.com/photostory/2240149038/Top-10-cloud-providers-of-2012/1/Introduction.
[4] Morgan et al., "Factors affecting the adoption of cloud computing: an exploratory study," available at http://www.staff.science.uu.nl/~vlaan107/ecis/files/ECIS2013-0710-paper.pdf.
[5] D. Song et al., "Practical techniques for searches on encrypted data," in Proc. of IEEE Symp. on Security and Privacy '00, Berkeley, CA, pp. 44–55, 2000.
[6] Ning Cao et al., "Privacy-preserving multi-keyword ranked search over encrypted cloud data," in IEEE Transactions on Parallel and Distributed Systems, pp. 222–233, 2014.
[7] Cengiz Orencik and Erkay Savas, "An efficient privacy-preserving multi-keyword search over encrypted cloud data with ranking," in Springer Distributed and Parallel Databases, pp. 119–160, 2014.
[8] Ning Cao et al., "Secure ranked keyword search over encrypted cloud data," in IEEE Proc. of Int. Conf. on Distributed Computing Systems (ICDCS), Genoa, Italy, pp. 253–262, 2010.
[9] Y-C. Chang and M. Mitzenmacher, "Privacy preserving keyword searches on remote encrypted data," in Proc. of Third Int. Conf. on Applied Cryptography and Network Security (ACNS), Springer, New York, USA, pp. 442–455, 2005.
[10] Mehmet Ucal, "Searching on encrypted data," http://www.researchgate.net/publication/228757457_Searching_on_Encrypted_Data.
[11] Ayad Ibrahim et al., "Secure rank-ordered search of multi-keyword trapdoor over encrypted cloud data," in IEEE Asia-Pacific Services Computing Conference (APSCC), Guilin, pp. 263–270, 2012.
[12] Ning Cao et al., "Enabling efficient fuzzy keyword search over encrypted data in cloud computing," in IEEE Transactions on Parallel and Distributed Systems, pp. 1467–1479, 2010.
[13] D. Boneh et al., "Public key encryption with keyword search," in Proc. of Eurocrypt '04, volume 3027 of Springer LNCS, pp. 506–522, 2004.
[14] E. Goh, "Secure indexes," in Cryptology ePrint Archive, Report 2003/216, http://eprint.iacr.org/.
[15] Anu Khurana, C. Rama Krishna and Navdeep Kaur, "Searching over encrypted cloud data," in Int. Conf. on Communications & Electronics, 2013.
[16] Neelam S. Khan, C. Rama Krishna and Anu Khurana, "Secure ranked fuzzy multi-keyword search over outsourced encrypted cloud data," in 5th Int. Conf. on Computer and Communication Technology (ICCCT-2014), Allahabad, India.
[17] Rohit Handa and Rama Krishna Challa, "A survey on searching techniques over outsourced encrypted cloud data," in 8th Int. Conf. on Advanced Computing and Communication Technologies, Panipat, India, pp. 128–137, 2014.
[18] Rohit Handa and Rama Krishna Challa, "A cluster-based multi-keyword search on outsourced encrypted cloud data," in 2nd Int. Conf. on Computing for Sustainable Global Development, India, pp. 3.87–3.92, 2015.
About the Authors
Rohit Handa is an Assistant Professor at Baddi University of Emerging Sciences & Technology, Baddi, H.P. (India). He received his B.Tech. degree in CSE from M.M.E.C. Mullana (Kurukshetra University) and his M.E. degree in CSE from the National Institute of Technical Teachers Training & Research, Chandigarh (Panjab University). His areas of interest include Cryptography, Cloud Computing and Programming.
Rama Krishna Challa is a Professor in the CSE Department at the National Institute of Technical Teachers Training & Research, Chandigarh (India). He received his B.Tech. from JNTU Govt. College of Engg., Anantapur, his M.Tech. from CUSAT, Cochin, and his Ph.D. from IIT Kharagpur. His research areas include Wireless Communications & Networks, Computer Networks, Distributed Computing, Cryptography & Cyber Security.
Introduction
A mobile ad hoc network (MANET) is formed by nodes in the absence of a rigid infrastructure; all nodes move randomly and organize themselves. In a MANET, every node acts not only as a host but also as a router. In an infrastructure mobile network, nodes have base stations within their transmission range[2]. In contrast, mobile ad hoc networks are self-organizing and devoid of infrastructure support. Low-cost and powerful wireless transceivers are popularly used in mobile applications owing to the progress of wireless communication technology. Due to the absence of a fixed infrastructure, the network topology in a MANET changes as nodes move in or out of the network[1]. As a result, routing protocols need to adaptively adjust routes based on the available nodes. The resources owned and controlled by a node are said to be local to it, while the resources owned and controlled by other nodes, and those that can only be accessed through the network, are said to be remote. External attackers can inject false routing information or advertise incorrect routing table information to break down the network[4]. Compromised-node attackers are able to generate valid signatures using their private keys[5]; they are difficult to detect and can cause serious damage in the network. Because such attackers hold valid private keys, intrusion-preventive measures such as authentication and encryption cannot reduce their effect. Moreover, the wireless channel is accessible to malicious attackers and legitimate users alike; hence the network is more vulnerable to all kinds of attacks. One conspicuous characteristic of MANETs, from the security point of view, is the lack of protection. MANET is a promising technology, but it has certain features that are considered weaknesses and lead to security vulnerabilities, such as weak centralized management, low resource availability, scalability constraints, dynamic topology, limited power supply, etc. In a MANET, all networking operations, such as routing messages and forwarding packets, are performed by the nodes themselves in a decentralized manner. For these reasons, providing security to a mobile ad hoc network is a very difficult task[10]. To prevent and detect attacks like black hole, wormhole and rushing attacks, and to secure the communication among the nodes of a wireless ad hoc network, many intrusion detection techniques have been introduced. They can be classified into three main categories: signature-based, anomaly-based and specification-based intrusion detection[3].
In practice, attacks can be broadly grouped into application-level attacks and network-level attacks. An application-level attacker tries to steal, alter or deny access to the information of a particular application, whereas a network-level attacker attempts to restrict the capabilities of the network, reduce its speed or stop it completely. A network-level attack often leads to application-level attacks[6].
The rest of the paper is structured as follows: Section 2 discusses the issues in maintaining a stable neighborhood topology and route maintenance in AODV. The proposed system for malicious node detection is discussed in Section 3. Section 4 presents and discusses the results obtained. Finally, Section 5 draws conclusions and outlines future work.
Issues in Maintaining Stable Topology and Route Maintenance in AODV
The issue in topology management is to control the movement of an individual node so as to maintain a stable neighborhood topology[8]. Consider the nodes n0, n1, n2, ..., nN. Let Rmax be the maximum range of the nodes and D(0,1) the relative distance between nodes n0 and n1. Two nodes are called neighbor nodes if they can communicate with each other without the help of any routing, and the network topology is maintained if D(0,1) ≤ Rmax for all nodes. Consider two generic nodes n0 and n1, and let (Xn0(t), Yn0(t)) and (Xn1(t), Yn1(t)) be their positions at time t. The distance between the two nodes at time t is the Euclidean distance:

D{n0,n1}(t) = √((Xn0(t) − Xn1(t))² + (Yn0(t) − Yn1(t))²)   (1)
A communication link between n0 and n1 at time t exists if D{n0,n1}(t) < R, where R is the common radio range of all nodes in the network, which consists of homogeneous nodes, and D{n0,n1} is the distance between the two nodes. While transmitting packets, a feasible hop-to-hop path that satisfies the bandwidth constraints is searched for. Energy plays a vital role in maintaining a stable neighborhood topology, so a mechanism is required to calculate the energy values at different times[11]. A node's energy consumption after time t is calculated using the equation[15]:
Ec(t) = Pt·α + Pr·β   (2)
where
A Collaborative Approach for Malicious Node Detection in Ad hoc Wireless Networks
Research Front
Shrikant V. Sonekar* and Manali Kshirsagar**
*Research Scholar, Department of CSE, G.H. Raisoni College of Engineering, Nagpur, M.S., India
**Research Guide, Department of CSE, G.H. Raisoni College of Engineering, Nagpur, M.S., India
Abstract—Security is at stake when communication takes place between mobile nodes in a hostile environment. In contrast to wired networks, the unique characteristics of mobile ad hoc networks pose a number of major challenges to security design, such as the shared wireless medium, open peer-to-peer network architecture, stringent resource constraints and highly dynamic topology. These unfavorable conditions clearly call for multidimensional security remedies that provide not only wide-ranging protection but also acceptable network performance. Popularly used existing routing protocols, designed to meet the needs of such self-organizing networks, do not address possible threats aimed at disrupting the protocol itself. A major challenge in ad hoc wireless networks is energy efficiency: under certain circumstances it is almost impossible to replace or recharge batteries, so it is desirable to keep the dissipation of energy low. Other problems include the limited energy reserve and the lack of centralized coordination. In this paper, we identify the security issues, discuss the challenges and propose a collaborative approach for malicious node detection.
Ec(t) is the energy consumed by the node after time t;
Pt is the maximum number of packets transmitted by the node after time t;
Pr is the maximum number of packets received by the node after time t;
α and β are constants lying between 0 and 1.
If the initial energy level of a node is E, the remaining energy ERem of the node at time t can be calculated as:
ERem = E − Ec(t)   (3)
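Equations (1)–(3) can be combined into a small sketch. The function names, the values of α and β and the sample packet counts are illustrative assumptions:

```python
import math

# Sketch of the link and energy checks from equations (1)-(3).
ALPHA, BETA = 0.5, 0.5      # assumed constants in (0, 1)

def distance(p0, p1):
    """Euclidean distance between node positions (x, y) -- eq. (1)."""
    return math.hypot(p0[0] - p1[0], p0[1] - p1[1])

def link_exists(p0, p1, radio_range):
    """A communication link exists if D{n0,n1}(t) < R."""
    return distance(p0, p1) < radio_range

def remaining_energy(initial, pkts_tx, pkts_rx):
    """ERem = E - Ec(t), with Ec(t) = Pt*alpha + Pr*beta -- eqs. (2)-(3)."""
    consumed = pkts_tx * ALPHA + pkts_rx * BETA
    return initial - consumed

print(link_exists((0, 0), (60, 80), radio_range=100))   # False: D = 100
print(remaining_energy(100, pkts_tx=40, pkts_rx=60))    # 50.0
```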
Whenever a node identifies a link break, through HELLO messages or link-layer acknowledgements, it broadcasts a Route Reply (RREP) packet (as in the DSR protocol) to notify the source node and the end nodes[9].
In Fig. 1, if the link between nodes N and O on the path L-N-O-R breaks, then a Route Reply (RREP) packet is sent by both nodes O and N to notify the source and destination nodes. The main advantage of AODV is that it avoids source routing and reduces the routing overhead in a large network. AODV also benefits from an expanding-ring search to restrict the excessive flooding of RREQ packets, and it searches for routes to unknown destinations[7]. In addition, AODV provides destination sequence numbers (DSeq), which allow nodes to acquire more up-to-date routes. On the other hand, it requires bidirectional links and periodic link-layer acknowledgements to detect broken links[14], and it has to keep routing tables for route maintenance, contrary to DSR[12].
Proposed System and Algorithm
Mobility is a crucial characteristic of a cluster in a MANET, especially at the time of cluster formation and cluster head election. The cluster head is responsible for controlling and managing the network[13]. Each cluster head is identified by its own ID. The election of a cluster head is very important for constructing any network, and different algorithms have been used for electing the cluster head. We have used a simple concept for cluster head election. In Fig. 2, the distance of Node A and Node B from both coordinates (x and y) is calculated. Initially, take Node A as i and Node B as j; then compare the xref of i with the xref of j and the yref of i with the yref of j, and set the values of minxdist and minydist, i.e. the threshold range Rth. Apply the same process for all the nodes, comparing the xref and yref of node i with nodes (k, l, ..., n).
Steps for the Cluster Head Election Algorithm
Step 1: Begin.
Step 2: For every member in the cluster:
Step 3: Calculate the distance to other clusters using the x and y variables.
Step 4: Check if the ID is the lowest and the node is closer to the maximum number of nodes.
Step 5: Repeat steps 3 and 4.
Step 6: Elect as cluster head the node with the minimum ID and maximum connectivity.
Step 7: Stop.
CH=
(4)
Equation 4 shows the two major parameters for electing a cluster head, i.e. the lowest ID and the highest connectivity, represented by the variables 'a' and 'b'. Variable 'a' represents a single node and 'b' represents the remaining nodes in the cluster; from the equation we find that the node with the lowest ID will be elected as the cluster head. The proposed algorithm thus combines highest connectivity with lowest ID. The static window snapshot shows the steps of cluster head election; the distance to each neighbor node is calculated by every node, and the snapshot shows the values Cx = 3423.99 and Cy = 427.387.
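The election steps can be sketched as follows. The neighbor-counting rule, the tie-breaking order (maximum connectivity first, then lowest ID) and the sample coordinates are assumptions made for illustration; the paper's equation (4) combines the same two parameters.

```python
import math

# Sketch of lowest-ID / highest-connectivity cluster head election.
# A node's connectivity is the number of neighbors within the
# threshold range Rth; ties on connectivity are broken by lowest ID.

def elect_cluster_head(nodes, rth):
    """nodes: {node_id: (x, y)}; returns the elected cluster head id."""
    def connectivity(nid):
        x, y = nodes[nid]
        return sum(
            1 for other, (ox, oy) in nodes.items()
            if other != nid and math.hypot(x - ox, y - oy) <= rth
        )
    # Maximum connectivity first, then minimum ID (via the -id term).
    return max(nodes, key=lambda nid: (connectivity(nid), -nid))

nodes = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (9, 9)}
# Nodes 1, 2 and 3 all have connectivity 2; node 1 has the lowest ID.
print(elect_cluster_head(nodes, rth=2.0))   # 1
```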
Table I shows the number of nodes in each cluster; based on the distance parameter we obtain the cluster heads. The cluster head is also counted as a node.
Table I. Cluster, Nodes And Cluster Head
• Requesting all members of the cluster:
Before sending packets to all the nodes in the cluster, the cluster head (CH) sends a REQUEST(tsp, p) message to all the nodes in its request set Rp (radio range) and places the request on its request_queuep, where (tsp, p) is the timestamp of the request. When a cluster node (CN) receives the REQUEST(tsp, p) message from the cluster head, it returns a timestamped REPLY message to the cluster head and places the CH request on its request_queuep.
• Releasing the position of the head:
The cluster head, upon exiting due to a low energy level, deletes its request from the top of its request queue and sends a timestamped RELEASE message to the
Fig. 1: Route maintenance in AODV
Fig. 2: Distance comparison between two nodes
Cluster | Nodes | Cluster Head
Cluster 1 | 4 | 2
Cluster 1 | 8 | 4
Cluster 1 | 12 | 6
Cluster 2 | 4 | 1
Cluster 2 | 8 | 7
Cluster 2 | 12 | 9
Cluster 3 | 4 | 2
Cluster 3 | 8 | 6
Cluster 3 | 12 | 8
Cluster 4 | 4 | 3
Cluster 4 | 8 | 5
Cluster 4 | 12 | 4
Fig. 3: Simulation result of distance comparison between two nodes
entire set of cluster nodes in its request set Rp. When a cluster node receives a RELEASE message from the cluster head, it removes the request of the cluster head from its request queue. This helps in detecting the malicious node. The performance of the algorithm depends on the number of messages required: the proposed algorithm requires 2(N−1) messages, and the synchronization delay is T[9].
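The REQUEST/REPLY/RELEASE exchange can be sketched as follows. The queue layout and message tuples are assumptions for illustration; the paper fixes only the message names, the timestamps and the 2(N−1) message count.

```python
from collections import deque

# Sketch of the REQUEST / REPLY / RELEASE exchange. One REQUEST per
# member plus one REPLY per member gives the 2(N-1) message count.

class Node:
    def __init__(self, nid):
        self.nid = nid
        self.request_queue = deque()

def broadcast_request(head, members, ts):
    """CH sends REQUEST(ts, p) to each member; each member replies."""
    messages = 0
    head.request_queue.append(("REQUEST", ts, head.nid))
    for m in members:
        m.request_queue.append(("REQUEST", ts, head.nid))   # CH -> CN
        messages += 1
        _reply = ("REPLY", ts, m.nid)                       # CN -> CH
        messages += 1
    return messages

def release(head, members):
    """CH exits: drop its own request and notify every member,
    which removes the CH request from its queue."""
    head.request_queue.popleft()
    for m in members:
        m.request_queue = deque(
            msg for msg in m.request_queue if msg[2] != head.nid
        )

head = Node(0)
members = [Node(i) for i in range(1, 5)]   # N = 5 nodes in total
sent = broadcast_request(head, members, ts=1)
print(sent)                                # 8 == 2 * (5 - 1)
release(head, members)
print(all(not m.request_queue for m in members))   # True
```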
Simulation, Results and Discussion
The simulation parameters shown in Table II consider both the accuracy and the effectiveness of the simulation. The experiment is carried out using the OMNeT++ simulator.
Table II. Simulation Parameters
Figure 4 shows the communication of the cluster head with all other cluster heads, and in Figure 5 the cluster head communicates the MAC address of the malicious node to all other cluster heads. Based on parameters such as energy dissipation, end-to-end delay, packet delivery ratio, throughput, wrong reply, etc., we declare a node to be malicious.
where TP denotes the threshold parameters: tp1 = msg_id (message id); tp2 = tm_st (timestamp); tp3 = pack_del (packet delivery ratio); tp4 = data_sent (forwarded packets); tp5 = mobility; tp6 = ack_msg (acknowledgement message); tp7 = w_replay (wrong reply); tp8 = end-to-end delay; tp9 = number of packets dropped; tp10 = repetition of packets.
Table III shows the parameters and simulation values. Based on these values we declared node n9 a malicious node. Once the cluster head knows this, it sends the MAC address of node n9 to all other cluster heads within radio range. The simulation is carried out on 14 nodes; for each node we record metrics such as timestamp, packet delivery ratio, throughput, acknowledgment and wrong reply.
The packet delivery ratio and dropped routing packet ratio metrics are chosen to evaluate the impact of the sequence number attack, the resource consumption attack and dropping-routing-packet attacks. Malicious node detection accuracy for different numbers of nodes in an area of 837 × 837 m is obtained from the simulation shown in Fig. 6.
Assumptions:
• We have assumed the maximum threshold value to be 4.
• We have considered 10 threshold values.
The proposed system declares a node malicious if

(tp1 + tp2 + … + tp10) / 10 > 4   (5)
Preliminary results show that the proposed algorithm detects the malicious node with greater accuracy and effectiveness. We have measured the packet delivery ratio, i.e. the fraction of the total packets generated that are successfully delivered; this metric reflects the network throughput. The general observation is that the proposed algorithm reduces the attack by around 60%. When AODV is attacked, the potential of the network
Parameters | Values
Number of nodes | 25
Network size | 1000 × 900
Speed of nodes | 0–10 m/sec
Transmission range | 100 m
Battery power of node | 100 units
Pause time | 0–20 sec
Data payload | 512 bytes
Host pause time | 5 seconds
Traffic type | CBR (UDP)
Movement model | Random waypoint
Fig. 4: Communication of cluster head with all other cluster heads
Fig. 5: Cluster head communicates MAC address of malicious node
Nodes | tp1 | tp2 | tp3 | tp4 | tp5 | tp6 | tp7 | tp8 | tp9 | tp10 | Total | Avg. = Total/10
(Threshold parameters: each TP has a maximum of 10 points)
n1 | 4 | 2.4 | 3.3 | 5.5 | 5.1 | 4 | 2 | 4.1 | 2 | 2.2 | 34.6 | 3.46
n2 | 3 | 2.1 | 4.2 | 4.3 | 5.1 | 2.8 | 4.6 | 2.9 | 4 | 7 | 40 | 4
n3 | 4 | 4.2 | 5 | 4.3 | 5 | 1.2 | 1.5 | 1.6 | 1.8 | 5.1 | 33.7 | 3.37
n4 | 5.2 | 0.8 | 2.3 | 2.9 | 7.1 | 4.8 | 1.5 | 4.9 | 5.8 | 0.9 | 36.2 | 3.62
n5 | 5 | 2 | 3.5 | 6.1 | 0.8 | 0.9 | 1 | 1.5 | 1.2 | 1.6 | 23.6 | 2.36
n6 | 8 | 2.9 | 5 | 4.2 | 4.1 | 4.3 | 3.2 | 3.1 | 1.3 | 1.2 | 37.3 | 3.73
n7 | 1.3 | 2.5 | 0.9 | 0.8 | 0.5 | 1.9 | 5 | 4.6 | 1.7 | 4.1 | 23.3 | 2.33
n8 | 8 | 9 | 1.6 | 1.8 | 1.9 | 2.1 | 2 | 1 | 3 | 4 | 34.4 | 3.44
n9 | 4.1 | 4.5 | 4.1 | 5 | 7 | 3 | 3.6 | 4.1 | 2.1 | 2.6 | 40.1 | 4.01
n10 | 1 | 5.2 | 5.5 | 2.3 | 2.6 | 3.6 | 2.8 | 5.2 | 1.9 | 5.2 | 35.3 | 3.53
n11 | 1.6 | 1.5 | 2.5 | 2.6 | 2.1 | 2.1 | 2.3 | 2.5 | 4.1 | 4.4 | 25.7 | 2.57
n12 | 1.8 | 2 | 2 | 2.3 | 3.5 | 1.9 | 4.7 | 7 | 5.7 | 4 | 34.9 | 3.49
n13 | 1.8 | 1.9 | 2.5 | 6 | 6 | 6.3 | 5.6 | 6.5 | 1.8 | 1.6 | 40 | 4
n14 | 1.4 | 1.5 | 1.6 | 1.6 | 1.8 | 5.2 | 5.3 | 3.9 | 6.3 | 4.6 | 33.2 | 3.32
Table III. Comparative Chart of Threshold Parameters and Simulation Values
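The detection rule of equation (5) applied to the Table III scores can be sketched as follows (the dictionary layout and rounding step are assumptions for illustration):

```python
# Flag a node as malicious when its average threshold-parameter
# score over the 10 parameters exceeds the assumed maximum of 4.

MAX_THRESHOLD = 4

def malicious_nodes(scores):
    """scores: {node: [tp1..tp10]}; returns nodes whose average
    exceeds MAX_THRESHOLD (rounded to absorb float noise)."""
    flagged = []
    for node, tps in scores.items():
        avg = round(sum(tps) / len(tps), 2)
        if avg > MAX_THRESHOLD:
            flagged.append(node)
    return flagged

scores = {
    "n2": [3, 2.1, 4.2, 4.3, 5.1, 2.8, 4.6, 2.9, 4, 7],   # avg 4.00
    "n9": [4.1, 4.5, 4.1, 5, 7, 3, 3.6, 4.1, 2.1, 2.6],   # avg 4.01
}
print(malicious_nodes(scores))   # ['n9'] -- 4.00 is not strictly > 4
```

With these scores only n9 exceeds the assumed maximum threshold of 4 (average 4.01), matching the simulation outcome; an average of exactly 4, as for n2 and n13, is not strictly greater than the threshold.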
decreases significantly. The dropping of packets disrupts the network connectivity, and the delivery of packets is reduced when AODV is under the resource consumption attack. In Fig. 8, we can observe that the proposed algorithm delivers the maximum number of packets.
Table IV shows the comparison of the proposed work with existing algorithms; it has been observed that the proposed work supports all the required parameters.
Conclusion and Future Scope
MANET is a potential research area with practical utility, and securing it is a challenging task; many issues remain to be solved. All intrusion detection systems face the problem of false alarms, which occur whenever the system inappropriately raises an alarm although no harmful behavior is present in the network. The challenge here is to utilize the available power in an efficient manner rather than to provide each node with higher battery power. There is a possibility that some key nodes will overuse the network and have their energy consumed fast; loose clustering could be one solution for preserving energy at the cluster head level. The comparative chart proposed in the paper gives an efficient way of detecting the malicious node, and the distance-based parameter is used to select the cluster head. We have made two assumptions in the paper for detecting the malicious node in a cluster. Research in MANET security is still open: further work is needed to enhance the performance of secure routing protocols, and there should be some mechanism restricting a malicious node from moving to another part of the network.
References
[1] C. E. Perkins, "Ad hoc Networking", Addison-Wesley, New York, pp. 198–264, 2001.
[2] Rajeswary Malladi and Dharma P. Agrawal, University of Cincinnati, OH, "Current and future applications of mobile and wireless networks", Communications of the ACM, Vol. 45, Issue 10, pp. 144–146, 2002.
[3] Xia Wang, Iowa State University, "Intrusion detection techniques in wireless ad hoc networks", in Proc. of the 30th COMPSAC '06, IEEE Computer Society.
[4] Neal Krawetz, "Introduction to Network Security", Thomson Learning, pp. 5–13, 2011.
[5] Chunfu Jia and Deqiang Chen, "Performance evaluation of a collaborative intrusion detection system", IEEE Computer Society, 5th International Conference on Natural Computation, 2009.
[6] Vivek Richariya and Pravin Kaushik, "A survey on network attacks in mobile ad hoc networks", International Journal of Advance Research in Computer Science and Software Engineering, Volume 4, Issue 5, May 2014.
[7] Aditi Kumar and Praveen Thakur, "Routing attacks and their counter strategies in MANET", International Journal of Advance Research in Computer Science and Software Engineering, Volume 4, Issue 5, May 2014.
[8] Sourav Sen Gupta, S. S. Ray, O. Mistry and M. K. Naskar, Jadavpur University, Kolkata, "A stochastic approach for topology management of mobile ad hoc networks", Asian International Mobile Computing Conference, pp. 90–99, 2007.
[9] O. S. F. Carvalho and G. Roucairol, "On mutual exclusion in computer networks, Technical Correspondence", Communications of the ACM, Feb. 1983.
[10] Priyanka and Mukesh Dalal, "Security in MANET: effective value based malicious node detection and removal scheme", International Journal of Advance Research in Computer Science and Software Engineering, Volume 4, Issue 5, May 2014.
[11] M. A. Rizvi, "Issues and challenges in energy aware algorithms using clusters in MANET", International Journal of Computing, Communication and Networking, Volume 2, April–June 2013.
[12] Amitabh Mishra, Ketan Nadkarni and Animesh Patcha, Virginia Tech, "Intrusion detection in wireless ad hoc networks", IEEE Wireless Communications, 2004.
[13] M. Abolhasan, T. Wysocki and E. Dutkiewicz, "A review of routing protocols for mobile ad hoc networks", Elsevier Journal of Ad Hoc Networks, pp. 1–22, 2004.
[14] C. Liu and J. Kaiser, "A survey of mobile ad hoc network routing protocols", University of Ulm Technical Report Series, No. 2003-08, University of Ulm, Germany, 2005.
[15] A. Ephremides, "Energy concerns in wireless networks", IEEE Wireless Communications, 9(4):48–59, 2002.
Fig. 6: Malicious node detection accuracy for different nodes
Fig. 7: Maximum speed of node movement vs delivery ratio (%)
Fig. 8: Simulation result for sequence number attack
Table IV. Comparison Between Existing and Proposed Algorithms
Parameters | K-Hop Connectivity | Lowest ID (LID) | Weighted Cluster Algorithm / Election of CH (ECH) | Proposed Algorithm
Broadcast | Yes | No | No | Yes
Throughput | No | Yes | Yes | Yes
Location | Yes | No | Yes | Yes
Energy | Yes | No | Yes | Yes
(The first three columns are the existing algorithms.)
Overfitting leads to the public losing trust in research findings, many of which turn out to be false. We examine some famous examples, such as “the decline effect” and the age of Miss America, and suggest approaches for avoiding overfitting.
Many people were surprised by a recent study which overturned the conventional wisdom and reported that there was no link between eating saturated fat and heart disease. It seems that every week there are new results, especially in medicine and the social sciences, which invalidate old results.
The phenomenon of old results no longer holding has become so widespread that some journalists have started to call it “cosmic habituation” or “the decline effect”: the bizarre theory that the laws of the universe seem to change when you try to repeat an experiment.
The explanation is much simpler.
Researchers Too Frequently Commit the Cardinal Sin of Data Mining - Overfitting the Data
Researchers test too many hypotheses without proper statistical control until they happen to find something interesting, and then report it. Not surprisingly, the next time around the effect, which was (at least partly) due to chance, is much smaller or absent.
We note that overfitting is not the same as another major data science mistake, “confusing correlation and causation”. The difference is that overfitting finds something where there is nothing. In the case of correlation and causation, researchers can find a genuine novel correlation and only discover the cause much later (see a great example from astronomy in the Kirk D. Borne interview on Big Data in Astrophysics and Correlation vs. Causality [http://www.kdnuggets.com/2014/05/interview-kirk-borne-big-data-astrophysics-correlation-causality.html]).
Every day we learn about new research through various sources, and very often we use these research findings to improve our understanding of the world and to make better decisions. How would you feel if you were told that most of the published (and heavily marketed) research is biased, improperly planned, hastily executed, insufficiently tested and incompletely reported? That the results were interesting by design and not by nature?
The inherent flaws of prevalent research practices were very nicely identified and reported by John P. A. Ioannidis in his famous paper “Why Most Published Research Findings Are False” (PLoS Medicine, 2005) [http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124]. Deeply examining some of the most highly regarded research findings in medicine, Ioannidis concluded that very often either the results were exaggerated or the findings could not be replicated; in his paper, he presented statistical evidence that indeed most claimed research findings are false. Dr. Ioannidis now heads the METRICS center at Stanford, where he continues to work on making sure that research is reproducible.
So “bad” research is not new, but the amount of it has increased with time. One of the most basic tests of how “scientific” a piece of research is would be to observe its results when the same study is performed in multiple, randomly chosen, applicable environments. Ioannidis noted that in order for a research finding to be reliable, it should have:
• a large sample size and large effects;
• a smaller number, and greater pre-selection, of tested relationships;
• lesser flexibility in designs, definitions, outcomes, and analytical modes;
• minimal bias due to financial and other factors (including the popularity of that scientific field).
Unfortunately, too often these rules were violated, producing irreproducible results.
To illustrate this, here are some of the more entertaining “discoveries” that were reported using the overfitting-the-data approach:
• The S&P 500 index is strongly related to butter production in Bangladesh [http://nerdsonwallstreet.typepad.com/my_weblog/files/dataminejune_2000.pdf]
• The age of Miss America is strongly related to murders by steam, hot vapours and hot objects [http://www.tylervigen.com/view_correlation.php?id=2948]
… and many more interesting (and totally spurious) findings, which you can discover yourself using tools such as Google Correlate or the one by Tyler Vigen.
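The mechanics behind such spurious findings are easy to reproduce. The following sketch, using invented random data, tests 1,000 random series against a random target; with so many hypotheses, a “strong” correlation appears by chance alone:

```python
import random

# Testing many hypotheses against one target "finds" a pattern in
# pure noise: among 1,000 random series, at least one correlates
# strongly with a random target by chance alone. All data synthetic.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
target = [rng.gauss(0, 1) for _ in range(20)]       # 20 "observations"
candidates = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(1000)]

best = max(abs(pearson(c, target)) for c in candidates)
print(f"best |r| among 1000 random series: {best:.2f}")
```

Every series here is pure noise, yet the best candidate typically correlates with the target well above what any single pre-registered test would produce.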
The Cardinal Sin of Data Mining and Data Science: Overfitting
Article
Gregory Piatetsky-Shapiro* and Anmol Rajpurohit**
*President of KDnuggets
**Graduate student (MS, Computer Science), UC, Irvine
The human tendency for “magical thinking” gives such unusual findings much higher notoriety (Paul the Octopus was world-famous for “predicting” World Cup results in 2010), and this does not increase the general public's trust in science.
Several methods can be used to avoid "overfitting" the data:
• Try to find the simplest possible hypothesis
• Regularization (adding a penalty for complexity)
• Randomization testing (randomize the class variable and try your method on this data; if it finds the same strong results, something is wrong)
• Nested cross-validation (do feature selection at one level, then run the entire method in cross-validation at the outer level)
• Adjusting the False Discovery Rate
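Randomization testing from the list above can be sketched in a few lines of Python (an illustrative example, not code from the article): a feature genuinely related to the target scores highly, but rerunning the identical analysis on a shuffled target should collapse to noise. If it does not, the analysis is finding structure that is not there:

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(0)
n = 200
feature = [random.gauss(0, 1) for _ in range(n)]
# The target genuinely depends on the feature, plus noise.
target = [x + random.gauss(0, 1) for x in feature]

real_score = abs(correlation(feature, target))

# Randomization test: shuffle the class/target variable and re-run
# exactly the same analysis; a strong score here signals trouble.
shuffled = target[:]
random.shuffle(shuffled)
null_score = abs(correlation(feature, shuffled))

print("real: %.2f  shuffled: %.2f" % (real_score, null_score))
```

The real score comes out near 0.7 while the shuffled score hovers near zero; if the two were comparable, the "strong result" would be an artifact of the method.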
Conclusion
Good data science is on the leading edge of scientific understanding of the world, and it is data scientists' responsibility to avoid overfitting data and to educate the public and the media about the dangers of bad data analysis.
[Taken from http://www.kdnuggets.com/2014/06/cardinal-sin-data-mining-data-science.html with permission from Dr. Gregory Piatetsky.]
Gregory Piatetsky-Shapiro, Ph.D. is the President of KDnuggets, which provides analytics and data mining consulting. Gregory is a founder of the KDD (Knowledge Discovery and Data Mining) conferences and is one of the leading experts in the field. Gregory was the first recipient of the ACM SIGKDD Service Award (2000). He also received the IEEE ICDM Outstanding Service Award (2007) for contributions to the data mining field and community.
Anmol Rajpurohit is a graduate student (MS, Computer Science) at UC, Irvine. His areas of interest are data science, machine learning and
information retrieval. His novel analytics solution for online education was the runner-up at UCLA Developer’s Contest 2014.
About the Authors
Computer Society of India
Unit No. 3, 4th Floor, Samruddhi Venture Park, MIDC, Andheri (E), Mumbai-400 093
Tel. 91-22-2926 1700 • Fax: 91-22-2830 2133
Email: [email protected]
CSI - Communications
COLOUR
Colour Artwork (Soft copy format) or positives are required for colour advertisement
Back Cover Rs. 50,000/-
Inside Covers Rs. 40,000/-
Full Page Rs. 35,000/-
Double Spread Rs. 65,000/-
Centre Spread Rs. 70,000/-
(Additional 10% for bleed advertisement)
MECHANICAL DATA
Full Page with Bleed 28.6 cms x 22.1 cms
Full Page 24.5 cms x 18.5 cms
Double Spread with Bleed 28.6 cms x 43.6 cms
Double Spread 24.5 cms x 40 cms
Special Incentive to any Individual/Organisation for getting sponsorship 15% of the advertisement value
Special Discount for any confirmed advertisement for 6 months 10%
Special Discount for any confirmed advertisement for 12 months 15%
All incentive payments will be made by cheque within 30 days of receipt of payment for advertisement.
All advertisements are subject to acceptance by the editorial team.
Material in the form of Artwork or Positive should reach latest by 20th of the month for insertion in the following
month.
All bookings should be addressed to :
Executive Secretary
Computer Society of India™
Unit No. 3, 4th Floor, Samruddhi Venture Park, MIDC, Andheri (E), Mumbai-400 093 Tel. 91-22-2926 1700 • Fax: 91-22-2830 2133 Email: [email protected]
ADVERTISING TARIFF (Rates effective from April, 2014)
About the Author
Rahul Bhati is currently pursuing B. Tech. in Computer Engineering at Charotar University of Science and Technology, Changa, Anand, Gujarat. He is interested in competitive programming, Machine Learning, Cyber Security and FOSS.
Programming.Tips() » Salting Passwords
Typically, system designers choose one of two ways to store users' passwords:
1. In original format, as plain text.
2. As the digest (output) of a one-way hash function.
It probably goes without saying that the first option is a bad idea: any compromise of the user/password database immediately exposes login credentials that clients may be using on many other sites. The second option, simply storing hash = sha256(password), is barely more secure, since it does not protect against rainbow table, dictionary, or brute-force attacks. What we can do instead is salt the passwords: rather than hashing the password alone, we hash salt + password, with a fresh random salt per user. This defeats precomputed rainbow tables, and mounting dictionary or brute-force attacks against salted, iterated hashes becomes expensive (it takes time).
Here is a simple yet effective implementation in Python, using PBKDF2 to hash the salted password, from https://github.com/SimonSapin/snippets/blob/master/hashing_passwords.py:
# Note: this snippet targets Python 2 (unicode, itertools.izip).
import hashlib
from os import urandom
from base64 import b64encode, b64decode
from itertools import izip

# From https://github.com/mitsuhiko/python-pbkdf2
from pbkdf2 import pbkdf2_bin

# Parameters to PBKDF2. Only affect new passwords.
SALT_LENGTH = 12
KEY_LENGTH = 24
HASH_FUNCTION = 'sha256'  # Must be in hashlib.
# Linear to the hashing time. Adjust to be high but take a reasonable
# amount of time on your server. Measure with:
# python -m timeit -s 'import passwords as p' 'p.make_hash("something")'
COST_FACTOR = 10000

def make_hash(password):
    """Generate a random salt and return a new hash for the password."""
    if isinstance(password, unicode):
        password = password.encode('utf-8')
    salt = b64encode(urandom(SALT_LENGTH))
    return 'PBKDF2${}${}${}${}'.format(
        HASH_FUNCTION,
        COST_FACTOR,
        salt,
        b64encode(pbkdf2_bin(password, salt, COST_FACTOR, KEY_LENGTH,
                             getattr(hashlib, HASH_FUNCTION))))

def check_hash(password, hash_):
    """Check a password against an existing hash."""
    if isinstance(password, unicode):
        password = password.encode('utf-8')
    algorithm, hash_function, cost_factor, salt, hash_a = hash_.split('$')
    assert algorithm == 'PBKDF2'
    hash_a = b64decode(hash_a)
    hash_b = pbkdf2_bin(password, salt, int(cost_factor), len(hash_a),
                        getattr(hashlib, hash_function))
    assert len(hash_a) == len(hash_b)  # we requested this from pbkdf2_bin()
    # Same as "return hash_a == hash_b" but takes constant time.
    # See http://carlos.bueno.org/2011/10/timing.html
    diff = 0
    for char_a, char_b in izip(hash_a, hash_b):
        diff |= ord(char_a) ^ ord(char_b)
    return diff == 0
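On Python 3, the standard library alone is enough: hashlib.pbkdf2_hmac (available since Python 3.4) replaces the external pbkdf2 module, and hmac.compare_digest replaces the manual constant-time loop. Below is a minimal sketch of ours (not part of the original snippet); the stored-hash format mirrors the one above:

```python
import hashlib
import hmac
import os
from base64 import b64encode, b64decode

ITERATIONS = 100000  # tune upward until hashing takes a noticeable time
SALT_BYTES = 16
KEY_BYTES = 32

def make_hash(password):
    """Hash a password with a fresh random salt using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(SALT_BYTES)
    key = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                              salt, ITERATIONS, dklen=KEY_BYTES)
    return 'PBKDF2$sha256${}${}${}'.format(
        ITERATIONS, b64encode(salt).decode(), b64encode(key).decode())

def check_hash(password, stored):
    """Verify a password against a stored hash, in constant time."""
    algorithm, hash_name, iterations, salt, key = stored.split('$')
    assert algorithm == 'PBKDF2'
    expected = b64decode(key)
    candidate = hashlib.pbkdf2_hmac(hash_name, password.encode('utf-8'),
                                    b64decode(salt), int(iterations),
                                    dklen=len(expected))
    return hmac.compare_digest(candidate, expected)

h = make_hash('s3cret')
print(check_hash('s3cret', h), check_hash('wrong', h))  # True False
```

Because the salt, iteration count and hash name travel inside the stored string, old hashes keep verifying even after the parameters are raised for new passwords.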
Practitioner Workbench
Rahul Bhati, pursuing B. Tech. in Computer Engineering from Charotar University of Science and Technology, Changa, Anand, Gujarat
Programming.Learn("R") » Cluster Analysis in R Language
Data science requires sound statistical analysis for solving complex problems, and in such cases the R programming language is very popular among statisticians and data scientists. It is a platform for statistical computation and graphics visualization, used in various applications involving huge amounts of data. R includes a total of 5,800 additional packages and around 120,000 functions, available through the Comprehensive R Archive Network. Here, we explain the 'cluster' package available in R.
Basically, there are two types of clustering approaches: partitioning and hierarchical. K-means is one of the most popular partitioning approaches. It requires the number of clusters to extract to be declared in advance. In R's partitioning approach, observations are divided into K groups and reassigned to form the most cohesive clusters possible according to a given criterion.
Before doing cluster analysis, rows with missing values should be removed for better cluster extraction, and variables should be rescaled for comparability. This is called pre-processing of the data:
# Prepare data
mydata <- na.omit(mydata)   # listwise deletion of missing values
mydata <- scale(mydata)     # standardize variables
Here NA denotes a missing value; na.omit() removes every row that contains one.
K-means Clustering
The K-means algorithm is executed by the function kmeans(data, n) available in R, where data is a numeric dataset or matrix and n is the number of clusters to extract. The NbClust package can be used as a guide in selecting the number of clusters. Calling set.seed() before kmeans() guarantees that the results are reproducible. The kmeans() function has an nstart option that attempts several initial configurations and selects the best among the resulting solutions; this approach is often recommended. The function returns the cluster memberships, centroids, sums of squares (within, between, total), and cluster sizes.
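For intuition, the procedure that kmeans() carries out (Lloyd's algorithm: alternately assign each point to its nearest centroid, then move each centroid to the mean of its cluster) can be sketched in plain Python. This is an illustrative toy of ours, not R's actual implementation:

```python
import random

def kmeans(points, k, iterations=100, seed=1):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for centroid, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(members[0]))))
            else:
                new_centroids.append(centroid)  # keep an empty cluster's centroid
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters

# Two well-separated 2-D blobs; k = 2 should recover them.
pts = [(0.1, 0.0), (0.0, 0.2), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, clusters = kmeans(pts, 2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

The fixed seed plays the role of R's set.seed(), and running the function from several seeds and keeping the best solution mirrors the nstart option.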
Hierarchical Clustering
This type of clustering builds a hierarchy of clusters. The hclust() function from the stats package is used for hierarchical clustering. Hierarchical clustering has two basic approaches to building the hierarchy:
Agglomerative: a "bottom-up" approach; each observation begins in its own cluster, and pairs of clusters are merged into a single cluster as one moves up the hierarchy. The agnes() function from the cluster package is used for this purpose.
Divisive: in this "top-down" approach, one large cluster is recursively split into separate clusters to build the hierarchy. Merging and splitting are performed in a greedy manner. The diana() function can be used for divisive hierarchical clustering.
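The agglomerative ("bottom-up") strategy behind agnes() and hclust() can likewise be sketched in plain Python. This toy of ours uses complete linkage (the distance between two clusters is the distance between their farthest members) and is illustrative only:

```python
def agglomerative(points, k):
    """Bottom-up clustering with complete linkage: start with singleton
    clusters and repeatedly merge the closest pair until k remain."""
    clusters = [[p] for p in points]

    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(dist(p, q) for p in c1 for q in c2)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the pair
        del clusters[j]
    return clusters

pts = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,)]
print(sorted(len(c) for c in agglomerative(pts, 2)))  # [2, 3]
```

Recording the distance at which each merge happens would yield the heights drawn in a dendrogram such as the one hclust() plots.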
Practitioner Workbench
Ghanshaym Raghuwanshi, Research Scholar, Jaypee University of Engineering and Technology, Guna, MP
Fig. 1: K-means clustering algorithm execution in R
Fig. 2: Visualization of K-means clustering
Fig. 3: Hierarchical clustering algorithm execution in R
Fig. 4: Visualization of hierarchical clustering (dendrogram from hclust(*, "complete") on dist(x))
Introduction
Data forms an integral part of any organisation and gets captured in the various transactions within it. Erroneous data has a major impact on Information Technology (IT): like the foundation of any edifice, data is the key integral element of IT initiatives. It is imperative to maintain high standards of data quality, since this will be a key differentiator in the future. In today's competitive world, data is the most important asset of any company, and it is unique to each company.
No matter what the data is meant to be used for, it is crucial to maintain accurate and complete data in any enterprise or system. If the data present in the system does not adhere to the principles of data quality, it will lead to various issues in the organisation. It is thus very important to adhere to data quality standards while implementing an ERP solution, in order to obtain the desired benefits in the long run.
This paper is organized as follows. Section 2 briefs the four common attributes of data quality, Section 3 describes the case study with a retailing client, Section 4 focuses on potential methods to overcome data quality challenges in the case in hand, Section 5 briefs the benefits reaped on successful implementation of this method, and finally Section 6 concludes the paper.
Data Quality Attributes
Maintaining data quality can seem like a scary activity, but all it takes is having the right people, processes and technology in place.
Data quality assessment defines specific criteria against which an organization's data is assessed. These can be framed as a small set of questions about the data:
• What
• Who
• How
• When
• Why
As enterprises grow, data sharing grows across business lines and different entities, and it becomes all the more necessary to maintain uniform data quality across these various lines. As per one of the definitions, data quality can be classified into four attributes, viz.
• Accuracy
• Timeliness
• Completeness, and
• Consistency
Data accuracy as an attribute involves measuring the difference between the actual and the correct value. Timeliness defines the importance of data reaching the downstream system within a defined SLA: the period within which data must reach the dependent systems and sub-systems is defined, and this SLA becomes the benchmark for measuring data timeliness. Data completeness is the state in which all elements/attributes deemed necessary are present. Consistency is defined by comparing data between systems and sub-systems.
All types of data, such as customer data, product data, financial data and employee data, are at equal risk. Bad data affects all departments, such as Operations, Sales, Marketing and Finance. Scott Ambler's surveys at www.ambysoft.com/surveys/ clearly indicate that around 46% [38 + 8] of respondents have a problem with the quality of their data. About 52% of respondents are satisfied with the overall data quality at present, but have a few apprehensions about the data. This indicates how critical it is to address the data quality issue.
The following sections describe the issues identified with the data migrated into the new ERP, the steps taken to rectify the data, and the benefits perceived while implementing the improvement measures.
Case Study: A Large Retailing Client
Background
A large fashion retailer streamlined its bespoke applications by implementing an ERP from Oracle. This packaged solution comprised multiple modules catering to various business processes. The suite of products implemented laid a solid foundation for the company.
The data was present in disparate systems prior to implementing this ERP. There was duplication of data in terms of its management: the business attributes of any flow were defined differently by various functions, and there was a lot of redundancy, since the same data was present on different servers.
Data cutover posed a big threat for this organization. Since the same item (aka SKU) was present on different servers with varied information, it was challenging to identify the correct parameters for migration. The ERP in general demanded data in a particular format that was not present in the existing system. This at times resulted in data being provided to the best of one's knowledge rather than with an understanding of the implications of the data provided. Data quality was amiss.
The company in question is a retailer with a wide range of apparel, footwear, home centre products and various other lines of business. The retailer had around three million Stock Keeping Units (SKUs).
Description and Extraction of Data
The source data was stored in an Oracle database, which was very complex in nature. The database contained 1,704 tables, had a size to the tune of 430 gigabytes, and stored both the master and transaction data used by this retailer.
Data Quality Perspective on Retail ERP Implementation: A Case Study
Case Study
Dinesh Mohata, Consultant, Oracle Retail Domain, TCS, Bangalore
Abstract: To improve adaptability and increase their chances of survival in this age of cut-throat competition, enterprises constantly deploy new applications with better technologies; this is done in alignment with the rapid changes in the business environment. This paper discusses the issues and challenges faced related to data quality parameters while implementing a Retail ERP solution, with examples and scenarios from the real world. It also analyses ways to overcome these data-related challenges and reap the benefits of a new solution.
As part of the data migration team, the author, along with a team of professionals, had access to the data provided by the retailer to migrate their legacy data to the newly configured ERP. The data nuances were observed post the data migration as well, and the data was further analysed during steady-state support.
Data Migration Approach
Data was supplied by the retailer in Excel files. A staging area was built to load the data from the Excel files for further validation. Data from the Excel files was loaded as-is into the staging tables. The data validation comprised checking the data for sanity in terms of conformance with the data types of the target table; for example, numeric data should not be entered into a character data type. Data should also conform to the pre-defined rules of the business; for example, one style should have all its items belonging to the same tax category. After all the validations pass, the data is accepted and loaded into the target system(s). If data is rejected, the issue is communicated back to the data owner.
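The staging-and-validation flow described above can be sketched as follows. The field names, rules and helper functions here are hypothetical, chosen only to mirror the type-conformance check and the tax-category rule from the text:

```python
def is_number(value):
    """Data-type conformance: can this staged string load into a numeric column?"""
    try:
        float(value)
        return True
    except ValueError:
        return False

def validate_row(row, style_tax):
    """Return a list of issues for one staged row (an empty list means accept).
    `style_tax` remembers the tax category first seen for each style, enforcing
    the rule that all items of a style share one tax category."""
    issues = []
    if not is_number(row['cost']):
        issues.append('cost is not numeric')
    expected = style_tax.setdefault(row['style'], row['tax_category'])
    if row['tax_category'] != expected:
        issues.append('tax category differs from the rest of the style')
    return issues

staged = [
    {'style': 'S1', 'tax_category': 'VAT5', 'cost': '120.50'},
    {'style': 'S1', 'tax_category': 'VAT12', 'cost': '99.00'},  # breaks the style rule
    {'style': 'S2', 'tax_category': 'VAT5', 'cost': 'abc'},     # wrong data type
]

accepted, rejected, style_tax = [], [], {}
for row in staged:
    (rejected if validate_row(row, style_tax) else accepted).append(row)
print(len(accepted), len(rejected))  # 1 2
```

In a real cutover the rejected rows, together with their issue lists, are what gets communicated back to the data owner for correction and resubmission.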
Data Quality Dimensions
The dimensions used for the study pertained to the following:
• Data Accuracy
• Data Timeliness
• Data Completeness
• Data Consistency
The data analysis was done on the migrated data during the Retail ERP implementation. The process involved getting the data from the legacy system and migrating it into the ERP. The migrated data further impacted the business day-to-day operations with respect to data quality.
The table below details the findings for each of the data quality issues along the four dimensions: accuracy, timeliness, completeness, and consistency. It qualifies the various data-related issues encountered in this analysis and provides a matrix of the issues along with the data quality parameters each of them has impacted. The following sections detail the analysis.
Data Accuracy
Accuracy refers to a mismatch between the expected result and the actual result with respect to data. In the table, there are issues where the data as expected by the retailer was not migrated to the new ERP. Additionally, there were instances where the data entered in the new system was not as expected by the retailer's lines of business. The majority of the issues belonged to the Data Accuracy category.
The currency of the supplier in the supplier master was defined as USD, whereas the currency for a SKU in its relationship with that supplier was defined in INR. This led to a conversion of the cost, resulting in erroneous data. A defined rule required all items in a style to have the same tax category, but this was not the case when new items got created in the system; this semantic rule was defined at the retailer's end. Similarly, the cost of an item should not exceed its Maximum Retail Price (MRP), yet this occurred due to master data incongruence across the various systems.
The systems also allowed an input year of 0123 instead of 2013. From the RDBMS perspective, the date entered in the system is valid; however, in the context of the data, the date field is not correct.
Data Timeliness
Data should be present in systems and sub-systems whenever it is required for an action.
ERP messages between the source and target are architected to flow over a Message Oriented Middleware (MOM) framework. All messages are classified as either master or transactional data and are further grouped into families; for example, all messages related to items are classified under the item family.
It has been observed that at times some messages get dropped while flowing through the integration bus, resulting in data mismatches between modules. The primary ERP system has multiple message families for one functional area, and sometimes the timing of transmission affects data integrity between applications. For example, a new SKU message gets stuck in integration because of an error, but related messages flow through and get rejected because the SKU is not yet available in the downstream system.
Fig. 1: Data Flow Diagram for Data Migration (External Files → Data Load Process → Staging Tables → Data Validation → Accept/Reject; Accept: load the validated data into the ERP system; Reject: communicate the issue)
Table: Data quality issues with respect to the defined parameters

Issue                               | Accuracy | Timeliness | Completeness | Consistency
MRP Based Cost Indicator            |    √     |            |              |
Multiple Tax Category in a Style    |    √     |            |              |      √
Currency Mismatch                   |    √     |            |              |      √
Supplier data coherency             |          |            |              |      √
Cost greater than MRP               |    √     |            |              |
Data Stuck in Interface             |          |     √      |              |
Invalid Date in the current context |    √     |            |              |
Missing attributes                  |          |            |      √       |
This resulted in data not reaching the target system on time; for example, the transactional data of an order reached the downstream system without its item.
Data Completeness
On analysing all the SKUs post-migration, it was observed that certain elements of the data were not completely migrated. For example, item data in the ERP resides in approximately 15 entities. In certain cases data was migrated into all the entities, yet it was not complete: in a few entities, certain attributes were missing, because the source did not hold the data the target required for those attributes. It was observed that data was incomplete in around 1% of the cases.
Data Consistency
Inconsistent data has huge repercussions on the overall governance of processes. For example, at this organisation the following challenges were faced in terms of data consistency.
The master system for supplier data and the ERP system were not in sync: the currency provided for the supplier was wrong, which resulted in a wrong cost being assigned to the supplier. The transactions created in the system on top of this wrong master data computed incorrect values, which led to wrong margins being reported for the company, and business decisions went wrong.
The supplier was created as a silo element in Oracle Retail, which entailed that all the attributes of the supplier were supposed to be correct as per the financial system. However, the data provided to Oracle Retail was not complete. In Oracle Retail, supplier creation is tightly coupled with delivery timelines, associations with the SKUs supplied by the supplier at its primary location, multiple location associations with their currencies, and so on. These data elements required to complete supplier creation in Oracle Retail were not present in the source system; when these fields were interfaced with Oracle Financials, they created a gaping hole and resulted in data coherency issues.
Overcoming Data Challenges
The retailer faced challenges with respect to the data quality parameters, and from the implementation perspective these surfaced as major findings. The data quality parameters highlighted in the earlier sections went wrong due to the following major reasons:
• Buggy Code – Incorrect data entered the system due to faulty code, that is, in the data generated by the current system. The logical interpretations made by the programs went wrong at times, both technically and functionally. A technical example was wrong initialisation of variables in the code; a functional example was wrong computation of a derived value because the logic applied was not appropriate. When such issues were unearthed, a data fix was done in the production environment, and the code was subsequently corrected as the long-term fix.
• Interface Issues – Data between systems was not synchronised within the stipulated time; there were delays in data being posted from the source system to the target systems. At times, due to inherent issues such as memory leaks, an interface does not behave as intended: for example, if a real-time interface designed to handle 1,000 records per minute is loaded with 100,000 records, the system might behave abnormally. Additionally, data sometimes did not reach the target system on time; for example, the transactional data of an order reached the downstream system without its item.
• Data Entry – This relates to wrong data being entered into the systems, whether manually or by uploading spreadsheets. Data entered was at times incorrect due to a lack of understanding of the context of the data, which subsequently caused repercussions within the system.
• Data Representation – The reporting system had multiple versions of the data in the reports created for end users, with different user communities interpreting the data differently. For example, the business logic for stock ageing differed between the buyer and the inventory communities, resulting in erroneous representation.
The paper's analysis provides further insight for getting started with accurate data management. Data quality can be kept in check, but it requires a commitment from top management to ensure there is no compromise on data accuracy, and it has to be an ongoing journey. The approach to the entire data paradigm can be structured around certain pointers that emerged as the outcome of this analysis.
1. Reconciliation Process
Reconciliation processes were adopted across various systems/sub-systems to ensure there is minimal mismatch. The reconciliation results were published to all stakeholders, which resulted either in correction of the code in the systems/sub-systems or in fixing of the incorrect data.
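A reconciliation process of this kind boils down to comparing record sets between two systems by key and attribute. The Python sketch below is a hedged illustration; the supplier records and field names are invented, echoing the currency mismatch discussed earlier:

```python
def reconcile(source, target, fields):
    """Compare two systems' records, keyed by record id.
    Returns ids missing on either side and ids whose attributes differ."""
    missing_in_target = sorted(set(source) - set(target))
    missing_in_source = sorted(set(target) - set(source))
    mismatched = sorted(
        key for key in set(source) & set(target)
        if any(source[key].get(f) != target[key].get(f) for f in fields))
    return missing_in_target, missing_in_source, mismatched

# Hypothetical supplier masters in the ERP and the financial system.
erp = {'SUP1': {'currency': 'USD'}, 'SUP2': {'currency': 'INR'}}
fin = {'SUP1': {'currency': 'USD'}, 'SUP2': {'currency': 'USD'},
       'SUP3': {'currency': 'EUR'}}

print(reconcile(erp, fin, ['currency']))
# ([], ['SUP3'], ['SUP2'])
```

Publishing these three lists to stakeholders, as described above, makes it clear whether the fix belongs in the code that moves the data or in the data itself.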
2. Introduction of Alerts
Alerts are an effective mechanism to monitor anomalies within a system or between systems. Alerts were introduced to warn users about incoherent data between the systems.
3. Organizational Boot Camps
User boot camps were organised to educate end users about data entry nuances along with the importance of the data. This helped ensure that ongoing data entry was appropriate.
4. Data Entry
Stringent measures for data upload were adopted across all touch points, and optimal loads for the systems were identified for data processing. Data entry was made more systematic; for example, the date attribute now had a date picker rather than a manual entry option. The batch execution was staggered for data timeliness.
Benefits
The following points list the benefits derived by an organisation that implements data quality processes:
• Higher Customer Satisfaction
• Higher Operational Efficiency
• Enhanced Decision Support System
• Correct Conclusions
• Bolstered Organisational Confidence
• Higher ROI on IT Investment
Data completeness and accuracy resulted in high trust in the system among the stakeholders, and the user community showed high confidence in it. Data being present in the system in a timely fashion further delighted the customer when taking decisions, and the improved decision process resulted in a higher return on the IT investment. For example, once the appropriate data quality was in place, the retailer used this data to pass penalties on to vendors for delays in shipments.
The above points clearly make the case that data cutover processes should be followed religiously to give the enterprise the right kind of direction.
Conclusion
Data's journey is quite fascinating. This paper provided an opportunity to analyse some real data and draw multiple insights. Data was studied both before and after the ERP implementation at the retailer, identifying the data quality issues. Data quality was measured in terms of accuracy, consistency, timeliness and completeness. The issues identified during the process were primarily due to faulty code, the interfacing of data between systems, and incorrect entry of data into the systems.
Data had problems in terms of definition across systems, data transportation between systems and sub-systems, incorrect data entry and, last but not least, consistency between systems. The problems in the data at source were arrested with the introduction of stringent data entry mechanisms, and reconciliation processes helped resolve data coherency and consistency issues. The actions taken resulted in huge benefits in terms of customer satisfaction with the system and further strengthened the confidence of the end users in the system.
The journey of data, from preparation and cleansing to migration, while adhering to data quality, ensures that correct processes are applied for a successful implementation of the ERP. Thus, we can summarise that the data-related challenges articulated in this paper can be addressed through a robust data quality program.
References
[1] T. H. Davenport, "Putting the enterprise into the enterprise system", Harvard Business Review, 76 (1998), pp. 121-131.
[2] E. Dial, "Taking the First Steps Toward Data Quality", IBM Corporation, http://www.ibm.com/developerworks/data/library/dmmag/DMMag_2010_Issue2/FeatureDataQuality/index.html
[3] D. P. Ballou and H. Pazer, "Modeling data and process quality in multi-input multi-output information systems", Management Science, 31(2) (1985), pp. 150-162.
[4] T. C. Redman, "Data Driven: Profiting from Your Most Important Business Asset", 2008.
[5] S. Ambler, "Data Quality Survey", www.ambysoft.com/surveys
[6] "A Case Study on the Analysis of the Data Quality of a Large Medical Database", 20th International Workshop on Database and Expert Systems Application.
[7] D. P. Ballou, S. Madnick and R. Wang, "Assuring information quality", Journal of Management Information Systems, 20 (2004), pp. 9-11.
About the Author
Dinesh Mohata is a Consultant in the Oracle Retail domain at TCS. Dinesh has over 15 years of IT & consulting
experience in software design, development, deployment and testing. He has retail industry experience of over 12
years. His areas of interest include Agile Development Methodology, Data Quality & Implementation Data Cutover.
Dinesh can be reached at [email protected] or [email protected].
Guidelines for Sending CSI Activity Reports
• Student Branch Activity Report: send to [email protected], with a copy to [email protected] and director.
The report should be brief, within 50 words, highlighting the achievements, and accompanied by a photograph with a resolution higher than 300 DPI.
• Chapter Activity Report: send to [email protected]
The report should be within 100 words, highlighting the objective and clearly discussing the benefits to CSI members. It should be accompanied by a photograph with a resolution higher than 300 DPI.
• Conference/Seminar Report: should be sent by Division Chairs and RVPs to [email protected]
The report should be brief, within 150 words, highlighting the objective and clearly discussing the benefits to CSI members. It should be accompanied by a photograph with a resolution higher than 300 DPI.
Dr. Vipin Tyagi, VP, Region III ([email protected]) will be coordinating the publishing of reports of these activities.
(Prof. Anirban Basu, Vice President, CSI)
Introduction
When Edward Snowden revealed in June 2013 that the records of millions of users were being accessed by the NSA without their consent or even knowledge, the whole world was in for a shock. The idea that their data was not safe on the network might have crossed users' ears, but never their minds. This was an eye-opener for all internet users: any data that is online is open to unauthorized scrutiny.
The Internet was designed with the basic goal of providing functionality, not security, so its architecture is vulnerable. Vulnerability here means inherent weaknesses which can be exploited, thereby leading to security threats and cyber attacks. A cyber-attack is defined as "deliberate actions to alter, disrupt, deceive, degrade, or destroy computer systems or networks or the information and/or programs resident in or transiting these systems or networks."
The Changing Scenario
Threats in cyberspace have always been a matter of concern. The computer worm created by Robert Morris is recognized as one of the first worms to affect the world's cyber infrastructure. This self-propagating worm succeeded in shutting down much of the internet in 1988, when it was released. Due to the internet's infancy at that time, the impact was not devastating; however, it raised concerns and laid the foundation of the robust security systems we see today.
The late 1990s and early 2000s saw various viruses and worms going viral, Melissa and ILOVEYOU to name a few. These viruses travelled through the network via e-mail and then maliciously propagated themselves, driving up network traffic. The threat they posed led to the development of antivirus software, which stores the signatures of already known viruses and checks all incoming traffic for their presence; the signature databases are regularly updated with recently found viruses. If incoming traffic matches a signature, it is barred from entering the internal system.
The new millennium has witnessed cyberspace and cyber attacks change radically as the internet grew exponentially and permeated the fabric of everyday things. Individuals, organizations and governments all depend on the Internet for a plethora of tasks. As things stand today, our data resides in the cloud, mobile phones have been replaced by smartphones, social networking is the way of expression, the cyber economy is on the rise, startups increasingly mean e-commerce, and Wi-Fi is offered free at hotels, restaurants, cafés, airports, etc. These changes have happened at an astronomical pace and have had a tremendous effect on how risks and threats are understood and perceived. Cyber attacks have transformed in the wildest possible ways, becoming more organized, sophisticated and mean.
Most Prevalent Attacks
Some of the common cyber attacks
include denial-of-service attacks, phishing,
defacement, SQL injection and IP spoofing.
A brief introduction to each follows:
Denial-of-Service (DoS) Attack - A
denial-of-service attack is a malicious
attempt to make a server or network
resource unavailable to users by
temporarily interrupting or suspending
the services of a host connected to the
internet. When the disruption is caused by
many computers distributed globally, it is
known as a distributed DoS, or DDoS. It is
a primitive attack, yet very common, owing
to its efficiency and the simplicity of
arranging an offensive: for a DDoS, no
specific vulnerability needs to be known
and exploited.
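A common first line of defence against such flooding is per-source rate limiting. Below is a minimal token-bucket sketch in Python; this is illustrative only, since real DDoS mitigation also relies on upstream filtering, anycast and traffic-scrubbing services, and the rate and capacity values are arbitrary.

```python
import time

class TokenBucket:
    """Allow roughly `rate` requests per second per source, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # source is flooding: drop or deprioritize the request

bucket = TokenBucket(rate=10, capacity=5)
results = [bucket.allow() for _ in range(20)]  # a burst of 20 near-instant requests
assert results[:5] == [True] * 5               # the burst allowance is absorbed...
assert results.count(False) >= 10              # ...then the excess is refused
```

One bucket per source address limits any single attacker, which is why DDoS attacks distribute the flood across many machines, each staying under the per-source threshold.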
In March 2013, Spamhaus, a non-profit
organization that helps e-mail providers
filter out spam and other unwanted
content, was hit by DDoS attacks at
300 Gbps, strong enough to take down a
government's internet infrastructure. The
attack affected internet services globally.
In November 2014, Hong Kong
independent news sites were hit by DDoS
attacks unprecedented in scale: the sites
were pounded with junk traffic at a
remarkable rate of 500 Gbps.
SQL Injection - SQL injection is a code-
injection attack on the application layer
that maliciously reads, retrieves,
manipulates or executes data in a
database using Structured Query
Language. Its severity can range from
simple reading of data to destroying it
completely. Modern websites have
dynamic pages (login pages, shopping
carts, forms, search options and so on)
which prompt the user to submit data as
input and, based upon the input, retrieve
output. The SQL server executes all
syntactically correct queries, whether or
not they are semantically or logically the
queries the developer intended. So a
skillfully crafted query can yield outputs
desirable to the attacker, such as access to
sensitive data and its manipulation.
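Both the flaw and the standard parameterized-query defence can be demonstrated with Python's built-in sqlite3 module. This is a minimal sketch: the table, the credentials and the login functions are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def login_vulnerable(name, password):
    # DANGEROUS: user input is pasted directly into the SQL text,
    # so the input can rewrite the query itself.
    query = f"SELECT * FROM users WHERE name='{name}' AND password='{password}'"
    return conn.execute(query).fetchall()

def login_safe(name, password):
    # Parameterized query: input is bound as data, never parsed as SQL.
    return conn.execute(
        "SELECT * FROM users WHERE name=? AND password=?", (name, password)
    ).fetchall()

# A skillfully crafted input turns the WHERE clause into a tautology:
assert login_vulnerable("alice", "' OR '1'='1") != []  # logged in without the password
assert login_safe("alice", "' OR '1'='1") == []        # injection attempt rejected
assert login_safe("alice", "s3cret") != []             # legitimate login still works
```

The injected password makes the vulnerable query read `... password='' OR '1'='1'`, which is true for every row, exactly the "syntactically correct but not the intended query" problem described above.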
Phishing- By masquerading as a
reputable, trustworthy entity, phishers can
send e-mails to the users inducing them to
visit websites by following the links provided
in the e-mail. The unsuspecting user is lured
into revealing his sensitive and
confidential information on the fake
website. This fake website may further
contain links to malware. The fake
websites are created by phishers to look
exactly like the original websites of
legitimate enterprises: the user's bank,
employer, favorite social networking site,
ISP and so on.
Fig. 1: Indian websites defaced according to domain name in 2013
A common scenario is that the user
receives an e-mail from his bank or
another trusted entity stating that:
• the account's security needs to be
enhanced, or
• fraudulent activity is suspected on
the account, or
• important information will be lost,
etc.
Such statements are crafted smartly to
look convincing and to draw the user's
attention. The innocent user is then
requested to click on a link embedded in
the e-mail, which leads him to the bogus
website.
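A simple heuristic against such e-mails is to compare the domain a link displays with the domain it actually targets. The sketch below is illustrative only; real phishing filters combine many more signals (reputation lists, URL age, lookalike-domain detection), and the example domains are invented.

```python
from urllib.parse import urlparse

def looks_like_phish(display_text: str, href: str) -> bool:
    """Flag links whose visible text claims one domain but whose target is another."""
    # The visible text of a link often lacks a scheme; add one so urlparse
    # treats it as a URL rather than a bare path.
    shown = urlparse(display_text if "://" in display_text else "http://" + display_text)
    target = urlparse(href)
    return shown.hostname is not None and shown.hostname != target.hostname

# "www.mybank.example" is shown to the user, but the link leads elsewhere:
assert looks_like_phish("www.mybank.example", "http://evil.example/login")
assert not looks_like_phish("www.mybank.example", "http://www.mybank.example/login")
```

This catches the classic trick of an anchor whose text says one site while its underlying URL points to the phisher's server.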
Defacement - The dictionary meaning of
defacement is the act of damaging or
spoiling the surface of something. In the
context of cyber attacks it refers to
changing the appearance of a website: the
attacker defaces a website by maliciously
breaking into the web server that hosts it
and replacing its content with his own.
Websites are the face of an organization,
and defacing them may lead to loss of
brand image and of customers' faith.
Defacements may carry little or no
financial incentive, but they have
immense impact, as they are visible to one
and all. Defacement may also be coupled
with malware, which can then affect the
computers on which the website is
opened. Religious and government
websites are major victims of defacement;
such defacements may be done to make a
political or religious statement.
The Indian government's portal india.gov.in
was defaced on 19 February 2014, and a
message regarding the issue of Kashmir
was posted. As per a report, 24,216 Indian
websites were defaced in 2013. A detailed
domain-wise study of Indian website
defacements establishes that .in websites
were attacked the most (figure 1).
Sectors Prone to Cyber Attacks
The cyber economy has become a mirror
image of the real economy, with similar
kinds of business processes. Technology-
inspired, enabled and run systems
encompass all types of business processes.
No sector or domain has been left
untouched by its Midas touch, from
finance to administration, entertainment
to education and manufacturing to
healthcare. And as these sectors evolve
technologically, the cyber threats they
face evolve too. Some sectors are more
prone to attacks than others.
Financial Sector - The financial sector is
the prime target of cyber criminals. The
reason is obvious: the indelible link
between money and crime. Banks and
financial institutions deal with money
which is now stored and transacted online
in digital form, making them vulnerable. In
addition to money, they are also
vulnerable on account of the sensitive
data they possess. According to a
Ponemon Institute survey, losses in US
financial services companies due to cyber
crime exceeded $23 million.
The attacker benefits twice over: not only
can he gain money, but also peripheral
information, including contact details, IDs,
etc., that can be sold on the black market
later. A report suggests that the global
black market for e-mail IDs and ID
numbers is worth $5 billion and growing.
More than 360,000 credit card accounts
were affected by the May 2011 cyber
attack on Citibank, the third-largest US
bank. Around $2.7 million was stolen from
the breached accounts.
In the attack against Lockheed Martin,
hackers made use of SecurID tokens, the
tokens office workers use to access their
systems. These tokens were made by EMC
Corporation.
Health Care Sector - Healthcare
information is considered a high-priced
commodity, as healthcare records contain
personally identifiable information. This
individually unique information, if stolen,
can be sold on the black market or used
for a multitude of attacks. Until recently
this sector was not frequently targeted,
but it is gaining popularity amongst
attackers because of the abundance of
personal information and the
unpreparedness of the healthcare industry
to tackle such attacks. The healthcare
sector suffered the highest share of data
breach attacks in 2013 and 2014, with
7.4 million personal records exposed in
the US.
As recently as March 2015, Premera
Blue Cross, a health insurance company,
announced that it had faced a cyber
attack that may have affected the records
of 11 million customers. The records
include histories of medical problems,
credit card numbers, social security
numbers, etc.
Another health insurer, Anthem
(formerly WellPoint), has also admitted
this year that 80 million of its customers
may have had their personal data exposed
to cybercriminals.
Energy Sector - The energy sector
consists of oil, gas, coal, nuclear energy
and electricity. Since these are part of
critical national infrastructure, they are
usually high on the target list for cyber
attacks. They are vulnerable because
networked corporate systems are
established for their distribution and
servicing, and any attack in this domain
has significant consequences.
The world's largest state-owned oil
company, Saudi Aramco, was infected by
the Shamoon virus, which erased data
from its computers. As a result, the largest
oil-producing company had two weeks of
downtime: the organization lost its data,
its productivity and its profits, and had to
replace a huge number of infected
machines.
Telecommunications Sector - Telecom
has become part of critical infrastructure
as our dependence on it continues to
grow. In parallel, the risks it faces also
continue to grow, cyber attacks amongst
them. An attack on communication
channels has a very deep impact, as the
sending and receiving of critical
information is disrupted: by controlling the
flow of information, cyber attackers can
control the pulse of a nation or state.
In 2014, the Germany-based telecom
giant Deutsche Telekom registered close
to one million hacker attacks daily on its
networks. Furthermore, as per one study,
cyber crime caused economic damage
worth $575 billion to German companies
in 2013.
Internet of Things - the Target Sector of
the Future - The attack surface has
increased with almost every device in
business and at home getting connected
to the internet. Hacks against refrigerators
and cars have already occurred. With IoT
in its evolving phase, new protocols are
being introduced which may come with
new vulnerabilities, leading to new
threats. Manufacturing and industrial
environments stand at more risk than
individuals, as they deploy control
systems for activating, monitoring and
operating mechanical controls. These
control devices are integrated with
computer systems to control doors,
windows, valves, equipment arms, etc.
The very high level of diversity in
industrial control system technologies
also makes them more vulnerable.
Motivation for Attacks
Financial Gain - Money and crime are
traditionally linked, and the biggest
motivation for attacks is financial gain.
The attacker makes money by selling
stolen data and intellectual property,
blackmailing users with secret data, and
misusing personal information,
photographs, etc.
Political Reasons - Cyberspace can be
used to support propaganda, make a
political statement or sustain an issue by
attacking websites. The websites are
defaced, temporarily brought down or
shut down permanently. In such cases the
attacker can be assumed to be highly
skilled, with the latest financial and
technical sabotage capabilities at his
disposal.
Hackers - Hackers may be benign
explorers who, for fun or out of curiosity,
probe various weaknesses of the internet.
They may not be skilled, using knowledge
readily available on the internet to break
into websites and cause damage. But as
they mature and develop skills, spurred by
challenges from peers and the wish for
applause amongst them, hackers may
undertake malicious activities. These may
include espionage, where secrets are
obtained without the knowledge and
permission of the user.
Anonymous, a group of hacktivists very
popular in cyberspace, has initiated and
executed many clamorous attacks against
governments and organizations. The
Anonymous collective announced itself in
2008 by uploading a video to YouTube in
which it waged war on the Church of
Scientology. Such was the impact of this
video that the protest moved from
cyberspace to the streets, where people
assembled and marched in opposition to
the religious group.
Indian Scenario
Cyber fraud in 2013 cost the world a
whopping US$113 billion, and India US$4
billion, amid rising incidents of cybercrime.
CERT-In, the Government of India
organization responsible for securing
Indian cyberspace, handled more than
71,000 incidents, ranging from spam to
website intrusion, phishing, etc. This
amounts to a 225.38% growth rate from
2012 to 2013 and a 3519.87% growth rate
from 2005 to 2013 (figure 2). The detailed
bar chart gives the number of malicious
incidents handled yearly from 2005
onwards.
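The growth rates quoted above are simple percentage increases over the baseline year. The calculation can be sketched as follows; the incident counts in the example are invented for illustration (the actual yearly figures are in the CERT-In chart):

```python
def growth_rate(old: float, new: float) -> float:
    """Percentage growth from a baseline count to a later count."""
    return (new - old) / old * 100

# Invented counts: growing from 200 incidents to 300 incidents is 50% growth.
assert growth_rate(200, 300) == 50.0

# Growing from 100 to 325.38 gives the 225.38%-style figure quoted above.
assert round(growth_rate(100, 325.38), 2) == 225.38
```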
Reasons which fuel this vulnerability in
the Indian context include:
• A growing economy
• Advancements in IT infrastructure
• Political movements
• A population increasingly embracing
the online platform
• Unpreparedness of organizations to
tackle attacks, i.e. usage of old legacy
systems unequipped to tackle
sophisticated attacks.
Future
Cyber capabilities have grown
exponentially and will play a crucial role in
future conflicts, so conventional weapons
like bullets, bombs and missiles may be
replaced by cyber attacks. The
sophistication of attacks has increased to
such a level that attackers can remove all
evidence of their attacks, i.e. the attack
footprints, within a few minutes of
execution. And merely from the
methodology of an attack, its origin point,
the target chosen, the language used, the
servers deployed, etc., the attacker cannot
be identified. Precautions to be taken
include:
• Solutions with the capacity to
analyze network traffic in real time
and take action accordingly need to
be deployed.
• State-of-the-art self-driven, self-
learning, self-upgrading tools and
techniques need to be developed.
• Extensive audits of every access
point into and out of the network
must be carried out to ensure
security. These should also cover
employees and third parties such as
contractors, agents, vendors,
suppliers and partners.
• New compliance regulations and
stringent controls must be deployed
by government, keeping current
security threats in mind.
About the Authors
Ms. Abha Thakral is currently working as Assistant Professor (Grade II) in the Department of Computer Science and Engineering at Amity University Uttar Pradesh, Noida. She is also a research scholar working in the field of cyber forensics.
Dr. Nitin Rakesh is Deputy Head, Corporate Resource Centre, and Associate Professor in the Department of Computer Science and Engineering at Amity University Uttar Pradesh, Noida. His academic and research experience includes network coding, interconnection networks and architecture, network resiliency, networks-on-chip, network algorithms, parallel algorithms and fraud detection in online phantom transactions. He is a member of IEEE, ACM, SIAM and IAENG, a life member of CSI, and a recipient of the Drona Award for TGMC-2009 by IBM. He is also responsible for corporate interface, training and placements.
Dr. Abhinav Gupta is Senior Chief Engineer, Advanced R&D, at Samsung R&D Institute India - Delhi. He is responsible for product innovation and collaborative research with premier research organizations for futuristic product development. He holds a Doctor of Philosophy in Computer & Systems Sciences from Jawaharlal Nehru University, New Delhi, a Master of Technology in Signal Processing from the Indian Institute of Technology, Guwahati, and a Bachelor of Technology in Electrical Engineering from the Institute of Engineering and Technology, Bareilly.
Fig. 2: Security Events handled by CERT-In yearly
CSI Communications | June 2015 | 43
Brain Teaser
Dr. Durgesh Kumar Mishra
Chairman, Division IV Communications, Professor (CSE) and Director, Microsoft Innovation Center, Sri Aurobindo Institute of Technology, Indore
Crossword »
Test your knowledge on Data Sciences
The solution to the crossword, with the names of the first all-correct solution provider(s), will appear in the next issue. Send your answer to CSI
Communications at email address [email protected] with subject: Crossword Solution – CSIC June Issue.
Solution to May 2015 crossword
CLUES
ACROSS
2. Approximately 1000 Petabytes of data.
4. A workfl ow processing system.
6. Graphical representation of analyses.
9. A connectivity tool.
13. Making an intuition-based decision.
15. Framework for populating Hadoop with data.
16. Deviation of an object from average object.
18. Correctness of data.
19. Any delay in response.
20. A cloud computing platform by Microsoft.
21. A messaging system developed by Linkedin.
23. Ability to maintain performance with different load.
24. Software framework for big data processing.
DOWN
1. Knowledge as a set of concepts.
3. Process of removing all data points that could lead to identity disclosure.
5. An open source search engine built on Apache Lucene.
7. A distributed and open source database.
8. A visual abstraction of machines and database.
10. Programming language suited for parallel data.
11. An open-source software framework for big data.
12. The task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
14. The process of representing abstract data as images for better understanding.
15. A backup operational mode.
17. Data about data.
22. An Apache data serialization system.
Did you know?
• 90% of the total data in the globe has been generated in the last two years.
• The US National Security Agency built a data centre in Bluffdale with a capacity of 1 Yottabyte, which equals one trillion Terabytes.
• 4 million search queries per minute are received by Google.
• 2.5 million pieces of content per minute are shared by Facebook users.
• 300,000 tweets per minute are posted by Twitter users.
• 220,000 new photos per minute are posted by Instagram users.
• 72 hours of new video content per minute are uploaded by YouTube users.
• 50,000 apps per minute are downloaded by Apple users.
• 571 websites are created per minute.
Rashid Sheikh
Associate Professor, Sri Aurobindo Institute of Technology, Indore
We are overwhelmed by the response and solutions received from our
enthusiastic readers.
Congratulations!
All correct answers to the May 2015 crossword were received from the following readers:
Er. Aruna Devi (Surabhi Softwares, Mysore)
Ajit Kumar (Pondicherry University)
Akshay G. Joshi (PES Institute of Technology, Bangalore)
Call for Papers
CCIS 2015: 2015 International Conference on
Communication Control & Intelligent Systems
(Technically sponsored by IEEE Uttar Pradesh Section in association with CSI Mathura Chapter)
(Sat-Sun) November 07-08, 2015 (Conference id: 36597)
www.gla.ac.in/ccis2015
Organized by: Department of Electronics & Communication Engineering
Introduction:
The first international conference and 10th conference in the sequence,
Communication Control and Intelligent Systems (CCIS 2015) will be held on
November 07 & 08, 2015. CCIS 2015 is an international conference where
theory, practice and applications of communication systems, control systems,
intelligent systems and related topics are presented and discussed.
About GLA University:
GLA University runs courses such as B.Tech (CE, CS, EE, EN, EC, ME), Diploma in
Engineering, B.Pharm, D.Pharm., BBA, BBA(Family Business), BCA, B.Sc.(Hons.),
B.Com (Hons.), B.Ed. , M.Tech (CE, CS, EC, ME, EE), M.Pharm (Pharmacology,
Pharmaceutical Chemistry), MBA, MCA, M.Sc.(Bio-Technology, Microbiology
& Immunology) & PhD. The university campus is spread over more than 120
acres of lush green pollution free grounds and is located on Delhi-Mathura
National Highway No.-02.
Conference Theme:
Technical paper submissions are invited under the following topics, but are not
limited to:-
Track-1 : Wireless and Wired Networks, Multimedia Communications,
Computer Networks, Optical networks, Networking & Applications, Next
Generation Services
Track-2 : Control Systems, Nonlinear Signals and Systems, Embedded systems
and software, intelligent systems, neural networks and fuzzy Logic, Robotics
and applications, Machine learning and soft computing, System identification
and control, Algorithms and Computing.
Track-3 : VLSI Technology, Design & Testing, Signal processing, Bio-Medical
Processing, Speech image and video processing, Analog and Mixed Signal
Processing, Hardware Implementation for Signal Processing, Text processing,
Database and data mining
Track-4 : Monolithic and hybrid integrated (active and passive) components
and circuits, Antennas and phased arrays, RF packaging and package modeling,
RF MEMS and Microsystems, EMI/EMC
Track-5 : Ad hoc Networks, Ubiquitous and Cloud computing, Distributed and
parallel systems, Security and information systems, Network security
Submission
Prospective authors are encouraged to submit their papers through EasyChair.
The link is available on the conference website. Submissions must be plagiarism
free and not more than 5 pages in IEEE format. Use the following link to submit
your papers.
https://www.easychair.org/conferences/?conf=ccis2015
Proceedings Publication
All accepted and presented papers of the conference by duly registered
author(s) will be submitted to the IEEE Xplore digital library for possible publication.
Important Dates/Deadlines
June 11, 2015 Submission of regular paper
August 22, 2015 Paper acceptance notification to authors
September 22, 2015 Last Date of registration
September 29, 2015 Last Date of Camera Ready Copy Submission
September 29, 2015 Last Date of Copyright form Submission
Registration Details
All delegates are required to register for the conference as per the following
details:
Corporate executive and professional Rs 12,000 /-
Academicians IEEE/ICEIT/CSI/IETE Members Rs 8,000 /-
Academicians Non Member Rs 10,000 /-
Students IEEE/ICEIT/CSI/IETE Members Rs 5,000 /-
Student Non Members Rs 6,000 /-
Academicians from abroad US$300
For any inquiry please Contact: [email protected]
GLA University, Mathura , 17 km stone, NH-2, Mathura Delhi Road, P.O.
Chaumuha, Mathura-281406, UP. India
Tel: (05662) 250909, 250900, 9927064017, Fax: (05662) 241687, Website:
www.gla.ac.in
Mr. Vishal Goyal (Technical Program Committee Chair): +91-7500446622
Mr. Atul Bansal (Technical Program Committee Chair): +91-9760001881
Mr. Aasheesh Shukla (Publication Committee Chair): +91-8126130707
Dr. T. R. Lenka (Publication Committee Chair): +91-9435387419
Why Join CSI:
1) To be a part of the distinguished fraternity of famous IT industry leaders, brilliant
scientists and dedicated academicians through Networking.
2) Professional Development at Individual level.
3) Training and Certification in futuristic areas.
4) International Competitions and association with International bodies like IFIP and
SEARCC.
5) Career Support.
6) CSI Awards.
7) Various Publications.
Report from Kolkata Chapter
The National Conference on Computing, Communication and Information Processing (NCCCIP-2015), sponsored by the All India Council
for Technical Education (AICTE), New Delhi under North East Quality Improvement Program (NEQIP) and technically sponsored by
Computer Society of India (CSI) Kolkata Chapter was held successfully during 2-3 May 2015 at North Eastern Regional Institute of
Science & Technology (NERIST), A Deemed University under MHRD Govt. of India, Nirjuli, Arunachal Pradesh. The conference was
organised by the Department of Computer Science & Engineering, NERIST.
The inaugural function was attended by Prof. P. K. Tripathy, Dean (Academic), NERIST as Chief Guest, with Prof. J. K. Mandal, Department of
Computer Science & Engineering, University of Kalyani, and Prof. D. K. Lobiyal, School of Computer and System Sciences, JNU, New Delhi
as the guests of honour. The Chief Guest released the proceedings of the conference. Shri Moirangthem Marjit Singh, Conference Chair,
NCCCIP-2015, presented a detailed report on the conference.
Prof. J. K. Mandal and Prof. D. K. Lobiyal delivered keynote addresses on 2nd May 2015, followed by paper presentations. On 3rd
May, keynote addresses were delivered by Prof. S. K. Khatri, Director, AIIT, Noida and Prof. P. Dutta, Department of Computer
and System Sciences, Visva-Bharati University. An invited talk by Ani Taggu, RGU Doimukh was followed by paper presentations. The
conference was attended by faculty and students of NERIST, including outstation participants. The closing function was attended by
Prof. M. F. Hussain, Dean (Administration), NERIST as the Chief Guest, who presented the certificates to the paper presenters.
From Chapters and Divisions »
Report of Regional Student Convention 2015 Region-II
Computer Society of India Region-II and Computer Society of India Kolkata Chapter organized the Regional Student
Convention 2015 Region-II (East / North-East States) on 14th March 2015, in collaboration with Narula Institute of Technology
at Agarpara. The convention aimed to bring students together on a common platform with the intention
of achieving some demanding objectives: firstly, to expose students to the
concepts of academic writing, research presentation and critical thinking. The
convention provided a formal environment for students to meet each
other, share their ideas and get feedback, so that they can form a network of young
researchers. Student paper presentations and a quiz contest were the main focus of the
convention. The Chief Guest, Prof. (Dr.) A. K. Bagchi (Retired Professor, ISI Kolkata), delivered the
keynote speech. Dignitaries attending the convention included Dr. S. Raza (Chairman,
Patna Chapter), Dr. A. K. Nayek (Director, IIBM Patna), Dr. J. K. Mandal (Regional Student
Convener), Former Dean & Prof., Kalyani University, Mr. D. P. Sinha (RVP-II), Dr. D. D. Sinha
(Fellow of CSI), Prof., CU, Dr. P. Paul (Vice-Chairman, CSI Kolkata), Prof., ISI Kolkata,
Ms. Somdutta Chakraborty, State Student Coordinator, West Bengal,
Mr. Subir Lahiri (Secretary, CSI Kolkata), Mr. Aniruddha Nag and Mr. Sumantra Bhattacharyya, JIS College of Engineering. A total of 17
papers were selected, and 20 paper presenters from 5 different colleges presented their papers on the day. More than sixty participants
from various parts of the East and North-East of the country participated in the convention. The regional meet also took place on the
same day, at a different location on the same campus.
Like Computer Society of India on Facebook: https://www.facebook.com/CSIHQ for updates.
RVPs, Divisional Chairpersons, Chapter OBs and Student branch coordinators may send the activity reports, Photographs, or any
other information to update on the page to [email protected] .
Congratulations!!!
Dr. G. Satheesh Reddy, Honorary Fellow of the Computer Society of India, has been appointed
as Scientific Adviser to the Raksha Mantri.
Report from Division – I and Region – I By Prof. M. N. Hoda, Chairman, Division – I, Computer Society of India
IEEE Delhi Section, Computer Society of India Division – I and Region – I, ISTE Delhi Section and IETE Delhi Centre collaborated on
an evening session on "Technological Needs for Future Human Space Missions", held at Bharati Vidyapeeth's Institute of Computer Applications
and Management (BVICAM), New Delhi on 13th May 2015, on the occasion of the 40th anniversary celebration of IEEE Delhi Section. Dr. Kumar Krishen (Fellow, SDPS and Fellow, IETE) of NASA Johnson Space Center, 2101 NASA Parkway, USA, was invited to deliver the talk.
Welcome address was delivered by Prof. M.N. Hoda, Director, BVICAM, New Delhi and Chairman, Division – I, CSI. Prof. Mini S. Thomas,
Chairman, IEEE Delhi Section briefed the audience about the genesis of the event and introduced the speaker to the audience. Dr. Kumar
Krishen explored various facets of the Milky Way galaxy, along with the overarching constraints of space systems, during his informative
session. He discussed how the earth has been changing dramatically since its beginning and how the survival of life on earth is affected by natural
disasters like volcanoes, earthquakes, tsunamis, tornadoes and plate motions. He also acquainted the audience with the missions of various
nations along with their objectives, such as the Russian lunar mission, China's manned moon mission and moon exploration missions. He also described
Japan's plan for a base station on the moon, to be built by sending humanoid robots there by 2020. The informative session concluded with
a question-and-answer session with the audience, followed by the inauguration of Collabratec, the research collaboration and networking platform of IEEE, by Mr. Daman Dev Sood, IEEE Delhi Section. Dr. N. K. Gupta, Chairman, ISTE Delhi Section, during his talk, mentioned that the occasion
is historic in nature, in that all the fellow professional societies have come together to celebrate the 40th anniversary of IEEE Delhi Section
with such a knowledgeable evening session. The event ended with a vote of thanks by Mr. Shiv Kumar, Regional Vice President, Region (I),
Computer Society of India. The entire event was anchored by Mrs. Ritika Wason, Assistant Professor, BVICAM, New Delhi and co-ordinated
by Dr. Anupam Baliyan, Associate Professor, BVICAM, New Delhi. It was well attended by over 80 corporate members of CSI, IETE, IEEE and
ISTE and they also got ample opportunity of networking at the dinner.
Report from Patna Chapter
A one-day national seminar
was organized by Indian Institute
of Business Management, Patna
in technical collaboration with
Computer Society of India Patna
Chapter on the theme “Role of Science Education in National Development” on 11th April 2015
at IIBM Auditorium, Patna.
The seminar was inaugurated
by the General President of Indian
Science Congress Association
Dr. A.K. Saxena in the presence
of Dr. Arun Kumar, General
Secretary, Dr. Vijay Laxmi Saxena,
Former General Secretary & Dr.
Dhyanendra Kumar, Treasurer
of Indian Science Congress
Association, Dr. Ranjit Kumar
Verma, Pro V.C., Patna University,
Prof. U.K. Singh, Fellow, CSI &
Director General, IIBM & Dr. Zakir
Husain Institute, Prof. A.K. Nayak,
Former National Chairman Div-III (Application) of CSI, & Mr. Rohit Singh, Chapter Patron of CSI Patna Chapter.
One technical session was organized on the theme IT Education in National Development in which the technical papers were presented
by Mr. Shams Raza, Immediate Past Chairman of CSI, Patna Chapter, Prof. Alok Kumar, Dean of IIBM, Patna, Mr. Shailesh Kr. Shrivastava,
Director, NIC, Patna, Prof. Ganesh Pandey, Dy. Director, Dr. Zakir Husain Institute & Mr. Manoj Kumar Mishra, Amity Business School, Patna.
Prof. A.K. Nayak delivered the welcome address, whereas Prof. U.K. Singh, Member of the Nomination Committee, CSI, presided over the
function. Mr. Purnendu Narayan, Secretary, CSI Patna Chapter, proposed the vote of thanks.
Dr. Dhyanendra Kumar, Prof. A.K. Nayak, Prof. U.K. Singh, Dr. Vijaylaxmi Saxena, Dr. A.K. Saxena, Dr. Arun Kumar, Dr. Ranjit Kumar Verma, Mr. Rohit Singh
Workshop on 'Net Neutrality: Social and Economic Perspective', Organized by Computer Society of India, Noida Chapter and IMS-Noida on 5th May
In his inaugural address, Shri Anuj Agarwal, Chairman, CSI Noida Chapter, mentioned that the
internet is the only non-discriminatory medium and platform in the world, where nobody is
discriminated against based on nationality, colour of skin, caste, creed, birth origin, social or economic
status, sex or anything else. The internet is not governed by any particular government or company;
it is guided by we the people, 'the global citizens'.
Dr. Arvind Gupta, keynote speaker and National Head of the IT Cell of the BJP, described net
neutrality and the fine difference between 'freedom on the internet' and 'free internet'.
Mr. Rajan Mathews, Director General of Cellular Operators Association of India and a
renowned telecom expert, maintained that all telecom companies support Net Neutrality.
Mr. Gopal Agarwal, BJP Economic Cell and a senior active civil society member, mentioned in his address that one has to see the
entire debate from the consumer perspective: the consumer wants reliable services at an affordable price. He also made the case for
calculating the actual cost of telecom networks and operations, because telecom is a public good. He presided over the session and
mentioned that the debate should remain focused on substance and should not get politicized; in the current political scenario,
many people who may not know the nitty-gritty of the subject may try to jeopardize a healthy debate. He also spoke about
the social and economic impact of net neutrality. Shri Deepak Sahu, Editor-in-Chief, VarIndia.com, supported net neutrality and put
forward the social perspective on the need for net neutrality. He was categorical and clear that net neutrality cannot be diluted.
Dr. Kamaljeet Singh, Director IMS, summed up the discussion and presented a vote of thanks.
Report on Information Technology Day by Nashik Chapter
The Nashik Chapter celebrated its annual event, Information Technology Day, on 16th March 2015. On this occasion a programme full of
activities such as lectures, seminars, felicitations, and awards for academic achievements and competition winners was arranged. The
programme was conducted at the Shankaracharya Kurtakoti auditorium. Industrialists, representatives of professional organisations, IT
professionals, principals of colleges and students participated in the programme with a lot of enthusiasm.
On the occasion of the Golden Jubilee, the chapter brought out a special edition of its newsletter ACCESS.
The release was followed by felicitation of IT professionals, namely
Shri Piyush Somani, MD and CEO, ESDS Softwares; Suchit Tiwari,
Chairman of Cognifront; Joy Aloor, CEO, Fox Controls; Rohit Kulkarni
of Neumann Systems; Rajiv Papneja from ESDS; Gunwant Battase of
Nebula Studios; Pramod Gaikwad of Silicon Valley; Mrs. Bhagyashree
Kenge of Cyberedge Systems; and Ruturaj Kohok of Nethority. Shri
Chintawar credited everyone at CSI for a wonderful journey of fifty
years and great success. He was amazed that
the society is managed by volunteers while making a substantial impact
on IT professionals and on government initiatives such as e-governance.
Report from Udaipur Chapter
Computer Society of India, Udaipur Chapter, celebrated World Telecommunications and Information Society Day on 17 May 2015
in association with The Institution of Engineers (India), Udaipur Local
Chapter. Prof. S. S. Sarangdevot, VC, Rajasthan Vidyapeeth University,
Udaipur, was the chief guest, and Prof. Vipin Tyagi, Jaypee University of
Engineering and Technology, Guna, MP, Regional Vice President,
Region 3, Computer Society of India, was the guest speaker on the
occasion. Er. A. S. Choondwat, Chairman, IEI, Udaipur Local Chapter;
Er. M. K. Mathur, Hon. Secy., IEI, Udaipur Local Chapter; Dr. Y. C.
Bhatt, Chairman, CSI, Udaipur Chapter; and Er. Amit Joshi, Hony. Secy.,
CSI, Udaipur Chapter, were present on the occasion.
SEARCC Executive Council Meeting, 27th April 2015, Singapore
Prof. Bipin V. Mehta, President, Computer Society of India, attended the SEARCC Executive Council Meeting on 27 April 2015 in
Singapore, as CSI is a member of the South East Asia Regional Computer
Confederation (SEARCC). In his presentation, Prof. Bipin Mehta
gave an overview of the Computer Society of India.
The APEC Telecommunications and Information Working Group
Strategic Action Plan 2016-2020 identifies the following priority
areas:
1. Develop and support ICT Innovation
2. Promote a secure, resilient and trusted ICT environment
3. Promote regional economic integration
4. Enhance the Digital Economy and the Internet Economy
5. Strengthen cooperation
Mr. Yasas Abeywickrama expressed interest in collaborating with CSI
for the YITP Awards. He also briefed members on the SEARCC School Competition,
which is being hosted by the Sri Lanka Computer Society, and requested
members to send teams.
CSI is the largest society in SEARCC; Dr. F. C. Kohli took the
initiative to form the group in South East Asia and has contributed to
the activities of SEARCC. CSI can play a major role in SEARCC and
its various initiatives.
(L to R) Mr. Yasas V. Abeywickrama, Vice President, Computer Society of Sri Lanka; Mr. Kunaseelan Rajaretnam, Council Member, Malaysian National Computer Confederation; Mr. Mick Nades, President, Papua New Guinea Computer Society; Prof. Bipin V. Mehta, President, Computer Society of India; Dr. Dayan Rajapakse, President, Computer Society of Sri Lanka & President, SEARCC
Report from CSI Vadodara Chapter (Region III)
The Department of CSE, Babaria Institute of Technology, organized a one-day
workshop on "Advanced C using Qt", arranged exclusively for first-year
CSI student members, on 29th April 2015, in which a total of 45 students
actively participated.
Through a live hands-on session, students were exposed to new open-source
software for developing applications such as a notepad and a calculator.
Participants with speakers Prof. Atul Saurabh & Prof. Ketan B. Rathod
Report from Vellore Chapter
CSI Vellore Chapter and Student Branch organized a 48-hour media and
development hackfest called "Code Play" from 23rd to 26th April 2015.
The CEOs of the start-ups Zophop, CarWale and Muto Technologies
attended the event, in which 300 CSI volunteers participated;
around 25 students got internships in these companies. The event was
organized by Prof. Shalini L., Prof. Govinda K. and Prof. Jagadeesh G.
Interaction between CSI student volunteers and CEOs
From Student Branches »
(REGION-I) DRONACHARYA COLLEGE OF ENGINEERING, GURGAON
26 & 27-3-2015 – Chief guest & speakers during the two-day technical event Drontech 2K15
(REGION-III) SRI AUROBINDO INSTITUTE OF TECHNOLOGY, INDORE
23-4-2015 – During the programming contest Code Scratch
(REGION-III) G H PATEL COLLEGE OF ENGINEERING & TECHNOLOGY, VALLABH VIDYANAGAR
27-3-2015 – During the expert talk "Detecting Disease Spread in a Geographic Location – A Big Data Approach"
(REGION-III) SAGAR INSTITUTE OF SCIENCE TECHNOLOGY & RESEARCH, BHOPAL
16 to 18-4-2015 – During the workshop on Web and E-commerce Site Development
(REGION-III) TRUBA COLLEGE OF ENGINEERING & TECHNOLOGY, INDORE
17 & 18-3-2015 – During the two-day national workshop on Impact of Cloud Technology in Education
(REGION-IV) SHRI SHANKARACHARYA INSTITUTE OF PROFESSIONAL MANAGEMENT & TECHNOLOGY, RAIPUR
30-3-2015 – Winners and organizers during the State Level Student Convention
(REGION-V) GSSS INSTITUTE OF ENGINEERING & TECHNOLOGY FOR WOMEN, MYSURU
8-5-2015 – Students during the seminar on awareness about the benefits of GATE, PSUs & IES
(REGION-V) KLE DR M S SHESHGIRI COLLEGE OF ENGINEERING & TECH, BELGAUM
9-3-2015 – During HACKATHON – overnight coding
(REGION-V) NMAM INSTITUTE OF TECHNOLOGY, NITTE
17-3-2015 – During the one-day workshop on Ethical Hacking
(REGION-V) SRINIVAS INSTITUTE OF TECHNOLOGY, MANGALORE
4-4-2015 – During the one-day workshop on SDN and Data Centre Networking
(REGION-VI) MARATHWADA INSTITUTE OF TECHNOLOGY, AURANGABAD
17-4-2015 – During the expert talk on Career Guidance and Job Opportunities in .NET
(REGION-VI) MARATHWADA INSTITUTE OF TECHNOLOGY, AURANGABAD
18-4-2015 – One-day workshop on the open-source testing tool Selenium, conducted by Mr. Anurang Dorle, IGATE, Pune
(REGION-VII) S A ENGINEERING COLLEGE, CHENNAI
29-4-2015 – During the International Conference on Futuristic Trends in Computing & Communication
(REGION-VII) ADHIYAMAAN COLLEGE OF ENGINEERING, HOSUR
19 & 20-3-2015 – During the Second National Conference on Trends in Advanced Computing and Applications
(REGION-VII) EINSTEIN COLLEGE OF ENGINEERING, TIRUNELVELI
6-4-2015 – Dr. Velautham, Prof. Ezhilvanan, Mr. Mohan (Past President, CSI), Dr. Ramar & Prof. Suresh Thangakrishnan during the seminar on Focusing Research and Documentation
(REGION-VII) SRM VALLIAMMAI ENGINEERING COLLEGE, KATTANKULATHUR
25-4-2015 – Mr. Saravanan, Dr. Abdul Rasheed, Dr. Murugan, Mr. Sitaraman, Mrs. Meenakshi & Mrs. Revathi during the National Conference on Recent Trends in Computational Intelligence
CSI Calendar 2015
Anirban Basu
Vice President, CSI & Chairman, Conf. Committee
Email: [email protected]
Date Event Details & Organizers Contact Information
June 2015 event
19-20 June 2015 National Conference on Advance Trends in "Computer Science & Mathematical Techniques", organised by CSI Udaipur Chapter, Division IV, ACM Udaipur Chapter and Career Point University, Kota. At Kota, Rajasthan. http://www.cpur.in/conference/ATCSMT15/index.php
Mr. Amit [email protected]
July 2015 events
3-4 July 2015 ICT4SD 2015: International Conference on ICT for Sustainable Development, organized by ASSOCHAM Gujarat Chapter and Sabar Institute of Technology for Girls, Gujarat. Knowledge Partner: Computer Society of India. At The Pride Hotel, Ahmedabad. http://www.ict4sd.in
Mr. Amit [email protected]. Nisarg [email protected]
24-25 July 2015 International Conference on ICT in Health Care and E-Governance, at Sri Aurobindo Institute of Technology, Indore, in association with Computer Society of India Division III, Division IV, Indore Chapter and ACM Udaipur Chapter. At Indore, India. www.csi-udaipur.org/icthc-2015/
Dr. Durgesh Kumar [email protected]. A K Nayak [email protected]. Amit Josi [email protected]
Aug 2015 event
7-8 Aug 2015 ICICSE-2015: 3rd International Conference on Innovations in Computer Science & Engineering Dr. H S Saini [email protected]. D D Sarma [email protected]
Sept 2015 events
9-11 Sep 2015 Twelfth International Conference on Wireless and Optical Communications Networks WOCN2015 Next Generation Internet at M.S. Ramaiah Institute of Technology and Bangalore University, Bangalore, (in association with CSI Division IV)
Dr. Srinivasa K G [email protected] Dr. Guy Omidyar [email protected]. Durgesh Mishra [email protected]
10-12 Sep 2015 International Conference on Computer Communication and Control (IC42015) at Medicaps Group of Institutions, Indore (in association with CSI Division IV, Indore Chapter and IEEE MP Subsection)
Dr. Pramod S [email protected] Prof. Pankaj [email protected]
Oct 2015 events
9-10 Oct 2015 International Congress on Information and Communication Technology (ICICT-2015) (in association with CSI Udaipur Chapter, Div-IV, SIG-WNs, SIG-e-Agriculture and ACM Udaipur Chapter) at Udaipur, India. www.csi-udaipur.org/icict-2015/
Dr. Y C [email protected] Amit Joshi [email protected]
16-17 Oct 2015 6th International Conference on Transforming Healthcare with IT at Hotel Lalit Ashok, Bangalore
Mr. Suresh Kotchatill, Conference Coordinator, [email protected]
Kind Attention: Prospective Contributors of CSI Communications
Please note that the cover theme for the forthcoming issue of July 2015 is planned as follows:
• July 2015 – Emerging Trends in IT
Articles may be submitted in categories such as Cover Story, Research Front, Technical Trends and Article. Please send your contributions before 20th June 2015. Articles may be long (2500-3000 words maximum) or short (1000-1500 words) and must be original text. Plagiarism is strictly prohibited.
Please note that CSI Communications is a magazine for members at large and not a research journal for publishing full-fledged research papers. Therefore, we expect articles written at the level of a general audience of varied member categories. Equations and mathematical expressions within articles are not recommended and, if absolutely necessary, should be kept to a minimum. Include a brief biography of four to six lines for each author, with a high-resolution author photograph.
Please send your articles in MS-Word and/or PDF format to Dr. Vipin Tyagi, Guest Editor, via email id [email protected], with a copy to [email protected].
(Issued on behalf of the Editorial Board, CSI Communications)
Registered with Registrar of News Papers for India - RNI 31668/1978. Regd. No. MCN/222/2015-2017. Posting Date: 10 & 11 every month. Posted at Patrika Channel, Mumbai-I. Date of Publication: 10th of every month. If undelivered, return to: Samruddhi Venture Park, Unit No. 3, 4th floor, MIDC, Marol, Andheri (E), Mumbai-400 093
CSI-2015 50th Golden Jubilee Annual Convention
on
Digital Life
(2nd – 5th December 2015)
Hosted by: Computer Society of India (CSI), Delhi Chapter
Paper Submission Deadline: 17th August 2015 [No Further Extension]
Paper Submission Link: http://www.csi-2015.org/PaperSubmission.php
Convention Website: http://www.csi-2015.org/
Announcement and Call for Papers
CSI-2015 invites full-length original and unpublished research papers, based on theoretical or experimental contributions, primarily
in the area of Computer Science and Information Technology and, more generally, in all interdisciplinary streams of Engineering Sciences,
for presentation and publication at the convention. CSI-2015 will be an amalgamation of the following ten different tracks organized
parallel to each other, in addition to a few theme-based special sessions:
Track # 1: ICT Based Innovation Track # 6: Big Data Analytics
Track # 2: Next Generation Networks Track # 7: System and Architecture
Track # 3: Nature Inspired Computing Track # 8: Cyber Security
Track # 4: Real Time Language Translations Track # 9: Software Engineering
Track # 5: Sensors Track # 10: 3-D Silicon Photonics & High Performance Computing
CSI-2015 will be held at India International Centre (IIC), Lodhi Road, New Delhi (INDIA). The convention will provide a platform for technical
exchanges amongst scientists, teachers, scholars, engineers and research students from all around the world and will encompass regular paper
presentation sessions, invited talks, key note addresses, panel discussions and poster exhibitions.
Instructions for Authors
Authors from across different parts of the world are invited to submit their papers. Authors should upload their papers online at
http://www.csi-2015.org/PaperSubmission.php. Unregistered authors should first create an account at
http://www.bvicam.ac.in/csi-2015/addMember.asp to log on and upload papers. Only electronic submissions will be considered;
submissions through e-mail will not be considered.
Accepted papers shall be published by Springer in the form of pre-convention proceedings, in both soft copy and hard copy, and will be
indexed with the world's leading indexing/abstracting/bibliographic databases.
Senior experts / researchers are also invited to submit their proposals online for organizing Special Sessions at http://www.bvicam.ac.in/csi-2015/specialSessions.asp.
Important Dates
Submission of Full Length Paper: 17th August, 2015
Paper Acceptance Notification: 6th October, 2015
Submission of Camera Ready Copy (CRC) of the Paper: 20th October, 2015
Registration Deadline (for inclusion of Paper in Proceedings): 26th October, 2015
Detailed Call for Paper is available at http://www.csi-2015.org/CallForPapers.php. For any other query, please visit our web-portal at
http://www.csi-2015.org/home.php or write us back at [email protected]; [email protected]
Chief Patron Patron
Padmashree Dr. R. Chidambaram, Principal Scientific Advisor (PSA), Govt. of India
Prof. S. V. Raghavan, Scientific Secretary, Office of the PSA, Govt. of India
Chair, Programme Committee Chair, Organizing Committee Chair, Finance Committee
Prof. K. K. Aggarwal, Chancellor, KRM University, Gurgaon, and Former Founder Vice Chancellor, GGSIP University, New Delhi
Dr. Gulshan Rai, National Cyber Security Co-ordinator, Govt. of India
Mr. Satish Khosla, Managing Director, Cognilytics Software and Consulting Pvt. Ltd.
All correspondence related to CSI-2015 must be addressed to:
Prof. M. N. Hoda, Secretary, Programme Committee (PC), CSI-2015
Director, Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM)
A-4, Paschim Vihar, Rohtak Road, New Delhi – 110063 (INDIA)
Tel.: +91-11-25275055, Fax: +91-11-25255056, Mobile: +91-9212022066
E-Mail: [email protected]; [email protected]; Visit us at http://www.csi-2015.org/