
Knowledge Digest for IT Community

RESEARCH FRONT: Remote Monitoring and Localization using Sensors: Tools for e-Governance

ARTICLE: Ontology Modeling in E-Governance for a Semantic Digital India

Volume No. 41 | Issue No. 7 | October 2017 50/-

52 pages including cover

COVER STORY: CSI Nihilent eGovernance Awards

TECHNICAL TRENDS: Meri Sadak 2.0: One step closer to SMART CITY

www.csi-india.org

ISSN 0970-647X

Knowledge Digest for IT Community

RESEARCH FRONT: Enterprise Information Security Risk Management

ARTICLE: Application Security using Blockchain in Cyber Physical System

SECURITY CORNER: Security Issues in Cyber Physical Systems

Volume No. 41 | Issue No. 9 | December 2017 50/-

52 pages including cover

COVER STORY: Cyber Physical Systems (CPS) and its Implications

TECHNICAL TRENDS: Machine Learning in Advanced Python


CSI COMMUNICATIONS | DECEMBER 2017

Please note: CSI Communications is published by Computer Society of India, a non-profit organization. Views and opinions expressed in the CSI Communications are those of individual authors, contributors and advertisers and they may differ from policies and official statements of CSI. These should not be construed as legal or professional advice. The CSI, the publisher, the editors and the contributors are not responsible for any decisions taken by readers on the basis of these views and opinions.

Although every care is being taken to ensure genuineness of the writings in this publication, CSI Communications does not attest to the originality of the respective authors’ content. © 2012 CSI. All rights reserved.

Instructors are permitted to photocopy isolated articles for non-commercial classroom use without fee. For any other copying, reprint or republication, permission must be obtained in writing from the Society. Copying for other than personal use or internal reference, or of articles or columns not owned by the Society without explicit permission of the Society or the copyright owner is strictly prohibited.

PLUS

Know Your CSI
ICANN|60
CSI Patna Chapter Report
Report on CSI Student Conventions: Karnataka & Haryana State Level Convention
State Student Convention 2017, West Bengal
Latex Workshop & Workshop on Python - Programming Tool for Data Science
CSI Reports
Student Branches News
CSI Calendar 2017-18
The 2017 India-Africa ICT Summit

Contents

Cover Story
Cyber Physical Systems (CPS) and its Implications
S. Suseela and T. Kavitha

Technical Trends
Machine Learning in Advanced Python
Suchithra M S and Maya L Pai

Blockchain: A Primer
Durgesh Barwal, Rajat Kumar Behera and Abhaya Kumar Sahoo

Research Front
Enterprise Information Security Risk Management
K. Srujan Raju and M. Varaprasad Rao

Articles
Application Security using Blockchain in Cyber Physical System
Poonam N. Railkar, Sandesh Mahamure and Dr. Parikshit N. Mahalle

Cyber Physical Systems and Smart Cities
Nishtha Kesswani and Sanjay Kumar

Security Corner
Security Issues in Cyber Physical Systems
Swati Maurya and Anurag Jain

Cyber Security and Human Rights
Subrata Paul, Anirban Mitra and Brojo Kishore Mishra

Practitioner Workbench
Fun with Digital Image Processing in PHP on Windows and Linux Platform
Baisa L. Gunjal

Printed and Published by Prof. A. K. Nayak on behalf of Computer Society of India, Printed at G.P. Offset Pvt. Ltd. 269 / A2, Shah & Nahar Industrial Estate, Dhanraj Mill Compound, Lower Parel (W), Mumbai 400 013 and published from Computer Society of India, Samruddhi Venture Park, Unit-3, 4th Floor, Marol Industrial Area, Andheri (East), Mumbai 400 093. Tel. : 022-2926 1700 • Fax : 022-2830 2133 • Email : [email protected]

Chief Editor
S S AGRAWAL, KIIT Group, Gurgaon

Editor
PRASHANT R. NAIR, Amrita Vishwa Vidyapeetham, Coimbatore

Published by
A. K. NAYAK, Hony. Secretary
For Computer Society of India

Editorial Board:
Arun B Samaddar, NIT, Sikkim
Bhabani Shankar Prasad Mishra, KIIT University, Bhubaneswar
Debajyoti Mukhopadhyay, MIT, Pune
J. Yogapriya, Kongunadu Engg. College, Trichy
M Sasikumar, CDAC, Mumbai
R Subburaj, SRM University, Chennai
R K Samanta, Siliguri Inst. of Tech., West Bengal
R N Behera, NIC, Bhubaneswar
Sudhakar A M, University of Mysore
Sunil Pandey, ITS, Ghaziabad
Shailesh K Srivastava, NIC, Patna
Vishal Mehrotra, TCS

Design, Print and Dispatch by
GP OFFSET PVT. LTD.

VOLUME NO. 41 • ISSUE NO. 9 • DECEMBER 2017

• Cover Story
• Technical Trends
• Research Front
• Articles
• Innovations in IT
• Security Corner
• Practitioner Workbench
• Brain Teaser
• Chapter Reports
• Student Branch Reports


Dear Fellow CSI Members,

The theme for the Computer Society of India (CSI) Communications (The Knowledge Digest for IT Community) December 2017 issue is Cyber Physical Systems.

“Cyber-Physical Systems or “smart” systems are co-engineered interacting networks of physical and computational components. These systems will provide the foundation of our critical infrastructure, form the basis of emerging and future smart services, and improve our quality of life in many areas.”

National Institute of Standards and Technology (NIST), USA

After a series of thematic issues focusing on ICT in applications such as education, governance, agriculture and health, CSI Communications focuses on cyber physical systems in this issue, following an issue on the research topic of machine learning. The next issue is also based on a research theme, Machine Intelligence.

Cyber Physical Systems (CPS) are poised to bring advances in personalized health care, emergency response, traffic flow management, and electric power generation and delivery. This technology builds on embedded systems, that is, computers and software embedded in devices whose principal mission is not computation, such as cars, toys, medical devices, and scientific instruments. CPS integrates the dynamics of the physical processes with those of the software and networking, providing abstractions and modeling, design, and analysis techniques for the integrated whole.

The Cover story in this issue is “Cyber Physical Systems (CPS) and its Implications” by S. Suseela & T. Kavitha. In the cover story, the authors have traced the evolution and described the architecture, applications, platforms and functions of CPS.

The technical trends showcased are “Machine Learning in Advanced Python” by Suchithra M.S. & Maya L Pai and “Blockchain: A Primer” by Durgesh Barwal, Rajat Kumar Behera & Abhaya Kumar Sahoo.

In Research front, we have “Enterprise Information Security Risk Management” by K. Srujan Raju & M. Varaprasad Rao, who throw light upon current research and approaches for enterprise information security risk management.

Other articles in this issue on CPS cover its applications in smart cities, by Nishtha Kesswani & Sanjay Kumar, and “Application Security using Blockchain in Cyber Physical System” by Poonam N. Railkar, Sandesh Mahamure & Parikshit N. Mahalle.

The Security Corner has 2 contributions, “Security Issues in Cyber Physical Systems” by Swati Maurya & Anurag Jain and “Cyber Security and Human Rights” by Subrata Paul, Anirban Mitra & Brojo Kishore Mishra.

We have revived the Practitioner’s Workbench in this issue with “Fun with Digital Image Processing in PHP on Windows and Linux Platform” by Baisa L. Gunjal.

This issue also contains a collage of ICANN 60 participation by CSI, the MoU with Cisco, CSI activity reports from chapters & student branches and the calendar of events.

We are thankful to the entire ExecCom for their continuous support in bringing out this issue successfully.

We wish to express our sincere gratitude to the CSI publications committee, editorial board, authors and reviewers for their contributions and support to this issue.

We look forward to receiving constructive feedback and suggestions from our esteemed members and readers at [email protected].

With kind regards,

Prof. (Dr.) S. S. Agrawal, Chief Editor Prof. Prashant R. Nair, Editor

Editorial



Machine Learning in Advanced Python

Suchithra M S, School of Arts & Sciences, Amrita University, Kochi, India. Email: [email protected]

Maya L Pai, School of Arts & Sciences, Amrita University, Kochi, India. Email: [email protected]

Machine learning is a growing field, and a motivated developer can quickly pick it up and start making real and useful contributions. Algorithms are a big part of machine learning, and they involve a lot of mathematics and theory. However, we do not need to know every detail of how the algorithms work in order to implement them and apply them to achieve real and valuable results. This is made possible by different machine learning tools. In this study, we explain machine learning and machine learning algorithms. The usage of machine learning tools like Weka, R and Python and a review of recent trends in machine learning are also given due attention.

Index Terms - machine learning, algorithms, tools, python.

I. Introduction

A machine learning developer is a developer who builds machine learning systems. These systems contain algorithms that can learn from data. Applied machine learning can be overwhelming, because there are so many things to try and explore on a given problem. The developer can follow a structured process, just as a structured process is used to develop software [1]. A template for a multi-step process when using machine learning to address a complex problem is:

1. Define the problem.
2. Prepare the data.
3. Spot check various learning algorithms.
4. Tune well-performing learning algorithms.
5. Visualize the results.

To speed up the process, look at the problem from several different perspectives:

• What is the problem?
• Why does the problem need to be solved?
• How would I solve the problem?

This last question helps us to understand why the problem is complex and requires a machine learning based solution. To get the best results, we must understand how algorithms work. Mathematics plays an important role in understanding algorithms, but there is a much easier way that uses the language and methods developers already know:

• Simple and clear algorithm descriptions.
• Code examples without libraries.

We can build up functions to evaluate predictions, estimate the skill of models and even implement the learning algorithms themselves. A machine learning professional uses machine learning to solve real-world problems.

II. Applied machine learning

An understanding of the following four areas is needed for designing applied machine learning projects [2].

1. Data Preparation:

In this step, the developer loads the data from a standard CSV file and prepares it for machine learning algorithms. The performance of an algorithm on test data is estimated using algorithm evaluation techniques, and scoring methods are used to evaluate the quality of predictions made on unseen data. Baseline modeling techniques provide a reference (worst-case) result against which improvements on a problem can be judged. Once we have a test harness that we can trust, we select and evaluate 5 to 10 standard workhorse algorithms. This gives us an idea of how difficult our problem is and which algorithms might be worth spending some time tuning. A test harness is used to evaluate different methodologies on the same problem by comparing the results from different techniques.

2. Linear Algorithms:

• Simple Linear Regression [3]: It is used for numerical value prediction when the dataset contains only a single input.
• Multivariate Linear Regression: It is also used for numerical value prediction, when the dataset contains more than one input. It is trained by using Stochastic Gradient Descent.
• Logistic Regression: This method is used for class value prediction on two-class problems and is trained by Stochastic Gradient Descent.
• Perceptron: The simplest neural network model for classification problems; it is trained by using Stochastic Gradient Descent.

3. Nonlinear Algorithms:

• Regression and Classification Trees: These are decision trees that are applied to regression and classification problems.
• Naive Bayes: It is an application of Bayes’ Theorem for classification problems.



The theory of probability is the basis for Naive Bayes.

• Backpropagation: The most commonly used training method for artificial neural networks; it is widely applicable to supervised learning, including classification, and underlies the broader field of deep learning.
• k-Nearest Neighbors (KNN): This algorithm predicts categorical or numerical outputs directly from the training data.
• Learning Vector Quantization (LVQ): A widely used neural network method that is more efficient than KNN.

4. Ensemble Algorithms:

• Bootstrap Aggregation: Also known as bagging, it involves an ensemble of decision trees.
• Random Forest: An extension of bagging that results in faster training and better performance.
• Stacked Aggregation: An ensemble method, also known as blending or stacking, that learns how to combine the predictions from multiple models efficiently.
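To make the first two areas above more concrete, the following is a minimal sketch, written without libraries, of simple linear regression evaluated against a baseline. The toy data, the function names and the choice of root mean squared error (RMSE) as the scoring method are assumptions made for illustration; they are not taken from the article.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return intercept, slope


def rmse(actual, predicted):
    """Root mean squared error: lower is better."""
    return sqrt(mean([(a - p) ** 2 for a, p in zip(actual, predicted)]))


# Hypothetical toy data: one input column, one numerical output.
train_x = [1, 2, 4, 3, 5]
train_y = [1, 3, 3, 2, 5]
test_x = [6, 7]
test_y = [5, 6]

# Fit the model on the training data and predict the unseen test rows.
intercept, slope = fit_simple_linear_regression(train_x, train_y)
model_predictions = [intercept + slope * x for x in test_x]

# Baseline: always predict the mean of the training outputs.
baseline_predictions = [mean(train_y)] * len(test_y)

model_rmse = rmse(test_y, model_predictions)
baseline_rmse = rmse(test_y, baseline_predictions)
print(f"model RMSE: {model_rmse:.3f}, baseline RMSE: {baseline_rmse:.3f}")
```

Comparing the two scores shows whether the learned line actually improves on the naive baseline, which is the purpose of baseline modeling in a test harness.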

Many complex machine learning problems can be reduced to one of four core problem types: classification, regression, clustering and rule extraction. If we can map everyday problems to one of these types, we can then find and start testing algorithms that address them. Examples of machine learning problems:

1. Spam Detection: Given an email message in an inbox, identify whether it is spam or not.
2. Credit Card Fraud Detection: Given a customer's credit card transactions for a month, identify the transactions that were not made by the customer.
3. Digit Recognition: Given handwritten zip codes on envelopes, identify the digit for each handwritten character.
4. Speech Understanding: Given an utterance from a user, identify the specific request made by the user.
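As an illustration of mapping an everyday problem to a core problem type, spam detection can be treated as two-class classification. The sketch below uses a small Naive Bayes classifier with Laplace smoothing, written without libraries. The training messages are invented for this example, and the function names are hypothetical.

```python
from collections import Counter
from math import log

# Invented toy training messages as (label, text) pairs; not from the article.
TRAINING_MESSAGES = [
    ("spam", "win money now"),
    ("spam", "free money offer"),
    ("spam", "win a free prize now"),
    ("ham", "meeting at noon"),
    ("ham", "project report attached"),
    ("ham", "lunch at noon tomorrow"),
]


def train_naive_bayes(messages):
    """Count words per class, messages per class, and collect the vocabulary."""
    word_counts = {}
    class_counts = Counter()
    vocabulary = set()
    for label, text in messages:
        class_counts[label] += 1
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return word_counts, class_counts, vocabulary


def classify(text, word_counts, class_counts, vocabulary):
    """Return the class with the highest log posterior probability."""
    total_messages = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior of the class.
        score = log(class_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing avoids zero probabilities
            # for words never seen with this class.
            count = word_counts[label][word]
            score += log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAINING_MESSAGES)
print(classify("free money", *model))
print(classify("report for the meeting", *model))
```

Working in log probabilities avoids numerical underflow when many small word probabilities are multiplied together.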

IV. Machine learning algorithms

Machine learning is a multidisciplinary field that is closely related to many others, so it is difficult to separate it cleanly from related fields. Machine learning is built on the fields of computer science and mathematics. Knowing these foundational fields helps us to understand why certain mathematical language, such as vectors, matrices, functions and distributions, is used when describing algorithms.

Three specific foundational fields are:

• Probability: The study of characterizing the likelihood of random events.
• Statistics: The study of processes to collect, analyze, explain and present data.
• Artificial Intelligence: The construction and study of computational intelligent systems.

Machine learning also has sibling fields that sit alongside it. These fields give context to machine learning algorithms. They include:

• Computational Intelligence: The study and construction of complex systems.
• Data Mining: The construction and study of computational systems that discover useful relationships and patterns in large data sets.

A useful way to group algorithms is by their similarity in structure or learning style [4]. Five classes of machine learning algorithm grouped in this way are:

1. Regression: linear regression, logistic regression and stepwise regression.
2. Instance-based Methods: k-nearest neighbor, learning vector quantization and self-organizing map.
3. Decision Tree Learning: C4.5, CART and ID3.
4. Kernel Methods: support vector machine, radial basis network and linear discriminant analysis.
5. Artificial Neural Networks: perceptron, Hopfield network and back-propagation.

Our goal is to use our time with algorithms effectively, that is, to build a robust test harness so that we can throw algorithms in and very quickly learn what works and what doesn’t.

There are two concerns when building a test harness:

• What performance measures are used to evaluate the algorithms?
• What data is used to train and test the algorithms?

Once we have a test harness that we can trust, we select and evaluate 5 to 10 standard workhorse algorithms. This gives us an idea of how difficult our problem is and which algorithms might be worth spending some time tuning. This technique is called spot-checking. There are two main tactics that we can use to get the most out of machine learning algorithms: algorithm tuning and ensembles. Generally, machine learning algorithms can be described as learning an output function (f) that best maps input variables (P) to an output variable (Q):

Q = f(P)

Our goal in evaluating different algorithms, and even different configurations of an algorithm, is to find a good approximation of the output function (f) so that we get really good predictions (Q) [5].
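Spot-checking as described above can be sketched with scikit-learn as follows. The Iris dataset, the six candidate algorithms, 10-fold cross-validation and accuracy as the performance measure are all assumptions chosen for illustration; the article does not prescribe them.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# P (inputs) and Q (outputs) in the notation used above.
X, y = load_iris(return_X_y=True)

# A handful of standard workhorse algorithms, each a candidate for f.
candidates = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
    "NB": GaussianNB(),
    "SVM": SVC(),
}

# The test harness: the same folds and the same metric for every algorithm.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)

results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Because every candidate is scored on identical folds, the mean accuracies are directly comparable, and the best few can then be carried forward for tuning.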

We can often get a boost in performance by combining the predictions from multiple well-performing models. These techniques are called ensemble machine learning algorithms and are often internally simpler than we first think. When investigating how machine learning algorithms work, there are two ensemble methods I would recommend looking into:

1. Bagging (e.g. Random Forest)
2. Boosting (e.g. AdaBoost)
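The two ensemble methods named above can be tried side by side with scikit-learn. This is a minimal sketch that assumes the bundled breast cancer dataset, 100 estimators per ensemble and 5-fold cross-validation; a single decision tree is included only as a point of comparison.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    # Reference point: one decision tree on its own.
    "single tree": DecisionTreeClassifier(random_state=7),
    # Bagging family: many trees fit on bootstrap samples, votes combined.
    "random forest": RandomForestClassifier(n_estimators=100, random_state=7),
    # Boosting family: weak learners fit in sequence, each one focusing
    # on the examples the previous learners got wrong.
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=7),
}

ensemble_scores = {}
for name, model in models.items():
    ensemble_scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {ensemble_scores[name]:.3f}")
```

Whether an ensemble beats the single tree depends on the dataset, so the comparison should be read from the printed scores rather than assumed.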

These are two very simple foundations of very powerful ensemble machine learning algorithms [6].

V. Machine Learning Tools

1. Weka Tool

The best machine learning tool for beginners is Weka. There are three main reasons for beginners to use Weka:

• It has a graphical interface, which means that no programming is required.
• It offers a suite of state-of-the-art machine learning algorithms, including ensemble methods.
• It is free and open source software.

The Weka platform allows us to quickly design and run experiments. We must experiment to discover how to get good results, and the Weka Experimenter allows us to do this:

1. Start Weka.
2. Design a new experiment:
   • Select a dataset.
   • Select one or more algorithms or algorithm configurations.
3. Run the experiment.
4. Review the results and use statistics to check for significance.

With a few clicks we can quickly design experiments to test our ideas and intuitions on our problem. It is a very powerful feature that few machine learning platforms offer.

2. R Tool

R is a platform that is used by some of the best data scientists in the world. The reason is not its unusual scripting language but the vast number of techniques available. Academics who develop new machine learning algorithms often use R, which means that new algorithms often appear on the R platform before any other. With packages like caret, we can access hundreds of the top machine learning algorithms in R through a consistent interface, which is ideal for spot-checking techniques on our dataset.

3. Python

Python cannot be ignored in machine learning. It is rapidly catching up with platforms like R in terms of capability and adoption. The cause is the scikit-learn Python library for machine learning, which is built on top of the SciPy stack and harnesses libraries such as NumPy for fast data manipulation at C-like speeds. The scikit-learn library is fully featured, offering a suite of algorithms to choose from as well as data preparation schemes and Pipelines that allow us to design how data flows from one element to the next.
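A scikit-learn Pipeline of the kind mentioned above might look like the following sketch. The two steps (standardizing the inputs, then logistic regression), the step names and the Iris dataset are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7, stratify=y
)

# Data flows through the steps in order: scale first, then classify.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("classify", LogisticRegression(max_iter=1000)),
])

# fit() learns the scaling parameters from the training data only,
# then trains the classifier on the scaled training data.
pipeline.fit(X_train, y_train)

# score() applies the same learned scaling to the test data before predicting.
test_accuracy = pipeline.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

Keeping data preparation inside the pipeline helps prevent information from the test data leaking into the preparation step.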

Python is the fastest-growing platform for applied machine learning among data scientists. We cannot get started with machine learning in Python until we have access to the platform. We must download and install the Python 2.7 platform on our computer. We also need to install the SciPy platform and the scikit-learn library. We can install everything at once with Anaconda, which is recommended for beginners. We can then load our own data from CSV files.
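Loading data from CSV with Pandas can be sketched as below. To keep the example self-contained, a few rows in the style of the Iris dataset are inlined as text; the column names are assumptions, and in practice a file path would be passed to `read_csv` instead.

```python
import io

import pandas as pd

# A small inline CSV standing in for a file on disk (illustrative rows only).
CSV_TEXT = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

# read_csv accepts a path or any file-like object.
data = pd.read_csv(io.StringIO(CSV_TEXT))

# Split the table into input columns and the output column,
# as NumPy arrays ready for scikit-learn.
features = data.drop(columns="species").to_numpy()
labels = data["species"].to_numpy()

print(data.shape)
print(data.dtypes)
```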

The general structure for working through a machine learning problem in Python with Pandas and scikit-learn can be divided into 6 steps:

1. Install the Python and SciPy platform.
2. Load a standard dataset.
3. Summarize the data using statistical functions in Pandas.
4. Visualize the data using plotting functions in Pandas.
5. Evaluate machine learning algorithms in scikit-learn.
6. Develop a final model and make some predictions on new data.

The better we understand our data, the better and more accurate the models we can build. The first step to understanding our data is to use descriptive statistics, for which we can use the helper functions provided on the Pandas DataFrame. A second way to improve our understanding of our data is data visualization (e.g. plotting). We can use plotting in Python to understand attributes on their own and their interactions. Data visualization is the fastest way to learn more about our data, and Pandas offers a number of ways to do so. The different types of plot we can use in Python are as follows:

• Box and Whisker Plots
• Histograms
• Correlation Matrix Plot
• Density Plots
• Scatterplot Matrix
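The descriptive statistics and the five plot types listed above can be sketched with Pandas and Matplotlib as follows. The Iris dataset (loaded through scikit-learn) and the off-screen `Agg` backend are assumptions made so that the example runs anywhere; in an interactive session `plt.show()` would display the figures.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is required

import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

# Four numeric feature columns plus a "target" class column.
data = load_iris(as_frame=True).frame
features = data.drop(columns="target")

# Descriptive statistics.
summary = features.describe()          # count, mean, std, min, quartiles, max
correlations = features.corr()         # pairwise Pearson correlations
class_sizes = data.groupby("target").size()
print(summary)
print(class_sizes)

# Box and whisker plots, one panel per attribute.
features.plot(kind="box", subplots=True, layout=(2, 2), sharex=False, sharey=False)

# Histograms.
features.hist()

# Density plots (kernel density estimates; this uses SciPy internally).
features.plot(kind="density", subplots=True, layout=(2, 2), sharex=False)

# Correlation matrix plot.
fig, ax = plt.subplots()
image = ax.matshow(correlations.to_numpy(), vmin=-1, vmax=1)
fig.colorbar(image)

# Scatterplot matrix: every attribute against every other attribute.
scatter_matrix(features)

# Use plt.savefig("name.png") here to keep a figure; then release memory.
plt.close("all")
```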

Scikit-learn provides a range of supervised and unsupervised learning algorithms through a consistent interface in Python. The library must be installed before we can use it [9]. It is built upon Scientific Python (SciPy), and the library stack includes:

• SciPy: The basic library for scientific computing.
• NumPy: The base n-dimensional array package.
• Matplotlib: Used for comprehensive 2D/3D plotting.
• Pandas: An effective tool for data analysis and data structures.
• SymPy: Used for symbolic mathematics.
• IPython: An enhanced interactive console.

The modules or extensions for SciPy are commonly named SciKits. A Python library called Theano is used for fast numerical computation and helps in the development of deep learning models [8]. Theano acts as a compiler for mathematical expressions in Python. Another Python library, TensorFlow [10], is also used to develop deep learning models. It is a platform that cannot be ignored by machine learning experts: it is used by the Google DeepMind research group and, with the backing of Google, in some of Google’s production systems. The advantage of TensorFlow is its ability to run on CPUs, GPUs and large clusters, which gives it more of a production focus. The difficulty with both Theano and TensorFlow is that a lot of code is needed to develop even very simple neural network models. The Keras library addresses this problem by providing a wrapper for both Theano and TensorFlow. Its clean and simple API makes it possible to define and evaluate deep learning models in just a few lines of code, while harnessing the power of Theano and TensorFlow. For applied deep learning, Keras is quickly becoming the prominent library. The life-cycle of a Keras model can be summarized as follows:

1. Define our Sequential model.
2. Add configured layers.
3. Compile our model.
4. Fit our model.
5. Make predictions.
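The five life-cycle steps above can be sketched as follows. This example assumes the Keras API bundled with TensorFlow (`tensorflow.keras`) rather than a Theano backend, and the toy XOR-style data, layer sizes and number of epochs are illustrative choices, not values from the article.

```python
import numpy as np
from tensorflow import keras

# Toy data: the label is 1 when exactly one of the two inputs is 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")
y = np.array([0, 1, 1, 0], dtype="float32")

# 1. Define our Sequential model.
model = keras.Sequential()

# 2. Add configured layers.
model.add(keras.Input(shape=(2,)))
model.add(keras.layers.Dense(8, activation="relu"))
model.add(keras.layers.Dense(1, activation="sigmoid"))

# 3. Compile our model: choose the loss, the optimizer and the metrics.
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# 4. Fit our model on the training data.
model.fit(X, y, epochs=200, batch_size=4, verbose=0)

# 5. Make predictions: one probability per input row.
probabilities = model.predict(X, verbose=0)
print(probabilities.round(2))
```

With so few training steps the network may not separate the four points perfectly; the sketch is meant to show the sequence of calls, not a tuned model.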

VI. Conclusion

This paper introduces machine learning concepts and the different types of machine learning algorithms. It describes how we can select machine learning algorithms based on the problem at hand and how Python helps to solve machine learning problems. The impressive growth of Python is illustrated in Figure 1. The paper also highlights the most advanced tools in Python that support machine learning.

References

[1] Brownlee, Jason. “Machine learning mastery.” URL: http://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it (2014).

[2] Brownlee, Jason. “A tour of machine learning algorithms.” Machine Learning Mastery (2013).

[3] Brownlee, J. “Linear Regression for Machine Learning-Machine Learning Mastery.” Machine Learning Mastery (2017).

[4] Brownlee, Jason. “How to Prepare Data for Machine Learning.” Machine Learning Mastery 25 (2013).

[5] Brownlee, J. “Machine Learning Algorithms.” Machine Learning Mastery (2015).

[6] Brownlee, Jason. “Supervised and Unsupervised Machine Learning Algorithms.” Machine Learning Mastery (2016).

[7] https://www.kdnuggets.com/2017/08/python-overtakes-r-leader-analytics-data-science.html

[8] Al-Rfou, Rami, et al. “Theano: A Python framework for fast computation of mathematical expressions.” arXiv preprint (2016).

[9] Raschka, Sebastian. Python machine learning. Packt Publishing Ltd, 2015.

[10] Abadi, Martín, et al. “TensorFlow: A System for Large-Scale Machine Learning.” OSDI. Vol. 16. 2016.

Fig. 1: Share of Python, R, Both, or Other platforms usage for Analytics, Data Science, Machine Learning, 2016 vs 2017 [7]

Python: 34% (2016), 41% (2017)
R: 42% (2016), 36% (2017)
Both: 8.5% (2016), 12% (2017)
Other: 16% (2016), 11% (2017)


About the Authors

Dr. Maya L Pai was born on July 21, 1961. She received the M.Sc. and Ph.D. degrees from Cochin University of Science and Technology (CUSAT), Kerala, India in 1983 and 2016, respectively. In 2000, she joined the Amrita Institute of Computer Technology, Kochi, India, as a Senior Lecturer. In 2003, Amrita Institute of Computer Technology became Amrita University. She now works at Amrita University as Assistant Professor (Senior Grade) and HOD, Department of Computer Science and IT. She has published papers in refereed national and international journals. Her research interests include Data Mining, Machine Learning and Discrete Mathematics.

Suchithra M S was born on March 20, 1989. She received the M.E. degree in Computer Science and Engineering from Anna University, Chennai, India in 2013. She worked as Assistant Professor in Computer Science and Engineering from 2014 to 2016 in colleges under Calicut University. In 2016, she joined the School of Arts and Sciences, Amrita University, Kochi, India, as a Research Scholar. She has published papers in refereed national and international journals. Her research interests include Data Mining, Machine Learning and Soil Science.
