bibudhendu pati rajkumar buyya kuan-ching li editors advanced … · 2020. 2. 19. · computing,...

30
Advances in Intelligent Systems and Computing 1082 Bibudhendu Pati Chhabi Rani Panigrahi Rajkumar Buyya Kuan-Ching Li   Editors Advanced Computing and Intelligent Engineering Proceedings of ICACIE 2018, Volume 1

Upload: others

Post on 23-Mar-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Advances in Intelligent Systems and Computing 1082

Bibudhendu PatiChhabi Rani PanigrahiRajkumar BuyyaKuan-Ching Li   Editors

Advanced Computing and Intelligent EngineeringProceedings of ICACIE 2018, Volume 1

Page 2: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Advances in Intelligent Systems and Computing

Volume 1082

Series Editor

Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,Warsaw, Poland

Advisory Editors

Nikhil R. Pal, Indian Statistical Institute, Kolkata, IndiaRafael Bello Perez, Faculty of Mathematics, Physics and Computing,Universidad Central de Las Villas, Santa Clara, CubaEmilio S. Corchado, University of Salamanca, Salamanca, SpainHani Hagras, School of Computer Science and Electronic Engineering,University of Essex, Colchester, UKLászló T. Kóczy, Department of Automation, Széchenyi István University,Gyor, HungaryVladik Kreinovich, Department of Computer Science, University of Texasat El Paso, El Paso, TX, USAChin-Teng Lin, Department of Electrical Engineering, National ChiaoTung University, Hsinchu, TaiwanJie Lu, Faculty of Engineering and Information Technology,University of Technology Sydney, Sydney, NSW, AustraliaPatricia Melin, Graduate Program of Computer Science, Tijuana Instituteof Technology, Tijuana, MexicoNadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro,Rio de Janeiro, BrazilNgoc Thanh Nguyen , Faculty of Computer Science and Management,Wrocław University of Technology, Wrocław, PolandJun Wang, Department of Mechanical and Automation Engineering,The Chinese University of Hong Kong, Shatin, Hong Kong

Page 3: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

The series “Advances in Intelligent Systems and Computing” contains publicationson theory, applications, and design methods of Intelligent Systems and IntelligentComputing. Virtually all disciplines such as engineering, natural sciences, computerand information science, ICT, economics, business, e-commerce, environment,healthcare, life science are covered. The list of topics spans all the areas of modernintelligent systems and computing such as: computational intelligence, soft comput-ing including neural networks, fuzzy systems, evolutionary computing and the fusionof these paradigms, social intelligence, ambient intelligence, computational neuro-science, artificial life, virtual worlds and society, cognitive science and systems,Perception and Vision, DNA and immune based systems, self-organizing andadaptive systems, e-Learning and teaching, human-centered and human-centriccomputing, recommender systems, intelligent control, robotics and mechatronicsincluding human-machine teaming, knowledge-based paradigms, learning para-digms, machine ethics, intelligent data analysis, knowledge management, intelligentagents, intelligent decision making and support, intelligent network security, trustmanagement, interactive entertainment, Web intelligence and multimedia.

The publications within “Advances in Intelligent Systems and Computing” areprimarily proceedings of important conferences, symposia and congresses. Theycover significant recent developments in the field, both of a foundational andapplicable character. An important characteristic feature of the series is the shortpublication time and world-wide distribution. This permits a rapid and broaddissemination of research results.

** Indexing: The books of this series are submitted to ISI Proceedings,EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **

More information about this series at http://www.springer.com/series/11156

Page 4: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Bibudhendu Pati • Chhabi Rani Panigrahi •

Rajkumar Buyya • Kuan-Ching LiEditors

Advanced Computingand Intelligent EngineeringProceedings of ICACIE 2018, Volume 1

123

Page 5: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

EditorsBibudhendu PatiDepartment of Computer ScienceRama Devi Women’s UniversityBhubaneswar, Odisha, India

Chhabi Rani PanigrahiDepartment of Computer ScienceRama Devi Women’s UniversityBhubaneswar, Odisha, India

Rajkumar BuyyaCloud ComputingThe University of MelbourneMelbourne, VIC, Australia

Kuan-Ching LiDepartment of Computer Science andInformation EngineeringProvidence UniversityTaichung, Taiwan

ISSN 2194-5357 ISSN 2194-5365 (electronic)Advances in Intelligent Systems and ComputingISBN 978-981-15-1080-9 ISBN 978-981-15-1081-6 (eBook)https://doi.org/10.1007/978-981-15-1081-6

© Springer Nature Singapore Pte Ltd. 2020This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or partof the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmissionor information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilarmethodology now known or hereafter developed.The use of general descriptive names, registered names, trademarks, service marks, etc. in thispublication does not imply, even in the absence of a specific statement, that such names are exempt fromthe relevant protective laws and regulations and therefore free for general use.The publisher, the authors and the editors are safe to assume that the advice and information in thisbook are believed to be true and accurate at the date of publication. Neither the publisher nor theauthors or the editors give a warranty, expressed or implied, with respect to the material containedherein or for any errors or omissions that may have been made. The publisher remains neutral with regardto jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,Singapore

Page 6: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Preface

This volume contains the papers presented at the 3rd International Conference onAdvanced Computing and Intelligent Engineering (ICACIE 2018). ICACIE 2018(www.icacie.com) was held during 22–24 December 2018, at the Institute ofTechnical Education and Research (ITER), Siksha ‘O’ Anusandhan Deemed to beUniversity, Bhubaneswar, India, in collaboration with Rama Devi Women’sUniversity, Bhubaneswar, India. There were 528 submissions, and each qualifiedsubmission was reviewed by a minimum of two Technical Program Committeemembers using the criteria of relevance, originality, technical quality and presen-tation. The committee accepted 103 full papers for oral presentation at the con-ference, and the overall acceptance rate is 20%.

ICACIE 2018 was an initiative taken by the organizers, which focuses onresearch and applications on topics of advanced computing and intelligent engi-neering. The focus was also to present state-of-the-art scientific results, to dis-seminate modern technologies and to promote collaborative research in the field ofadvanced computing and intelligent engineering.

Researchers presented their work in the conference and had an excellentopportunity to interact with eminent professors, scientists and scholars in their areaof research. All participants were benefitted from discussions that facilitated theemergence of innovative ideas and approaches. Many distinguished professors,well-known scholars, industry leaders and young researchers participated in makingICACIE 2018 an immense success.

We also had panel discussion on the emerging topic entitled IntellectualProperty and Standardisation in Smart City consisting of panellists from softwareindustries like TCS and Infosys, educationalist and entrepreneurs.

We thank all the Technical Program Committee members and allreviewers/sub-reviewers for their timely and thorough participation during thereview process.

We express our sincere gratitude to Prof. Manojranjan Nayak, President, S‘O’ADeemed to be University, and Prof. Damodar Acharya, Chairman, AdvisorCommittee, for their endless support towards organizing the conference. We extendour sincere thanks to Honourable Vice-Chancellor, Dr. Amit Banerjee, for allowing

v

Page 7: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

us to organize ICACIE 2018 on the campus and also for his timely support. We alsoexpress our sincere gratitude to Honourable Patron Vice-Chancellor, Prof. PadmajaMishra, Rama Devi Women’s University, for her moral support and ceaselessencouragement towards successful completion of the conference. We thankProf. Pradeep Kumar Sahoo, Dean Faculty of Engineering and Technology, for hisguidance in local organization of ICACIE 2018. We also thank General Chairs,Prof. P. K. Nanda, Prof. Bibudhendu Pati and Prof. A. K. Nayak, for their valuableguidance during review of papers as well as other aspects of the conference. Weappreciate the time and efforts put in by the members of the local organizing team atthe Institute of Technical Education and Research (ITER), Siksha ‘O’ AnusandhanDeemed to be University, Bhubaneswar, India, especially the faculty members ofDepartment of Computer Applications, Computer Science & InformationTechnology, and Computer Science & Engineering, student volunteers, andadministrative and hostel management staff, who dedicated their time and efforts tomake ICACIE 2018 successful. We thank Er. Subhashis Das Mohapatra fordesigning and maintaining ICACIE 2018 Website.

Bhubaneswar, India Bibudhendu PatiBhubaneswar, India Chhabi Rani PanigrahiMelbourne, Australia Rajkumar BuyyaTaichung, Taiwan Kuan-Ching Li

vi Preface

Page 8: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

About This Book

The book focuses on theory, practice and application in the broad areas of advancedcomputing techniques and intelligent engineering. This two-volume book includes109 scholarly articles, which have been accepted for presentation from 528 sub-missions in the 3rd International Conference on Advanced Computing andIntelligent Engineering held at the Institute of Technical Education and Research(ITER), Siksha ‘O’ Anusandhan Deemed to be University, Bhubaneswar, India, incollaboration with Rama Devi Women’s University, Bhubaneswar, India, during22–24 December 2018. The first volume of this book consists of 54 papers, and thesecond volume contains 49 papers with a total of 103 papers. This book bringstogether academic scientists, professors, research scholars and students to share anddisseminate their knowledge and scientific research works related to advancedcomputing and intelligent engineering. It helps to provide a platform for the youngresearchers to find the practical challenges encountered in these areas of researchand the solutions adopted. The book helps to disseminate the knowledge aboutsome innovative and active research directions in the field of advanced computingtechniques and intelligent engineering, along with some current issues and appli-cations of related topics.

vii

Page 9: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Contents

Machine Learning Applications

Effect of Dimensionality Reduction on Classification Accuracyfor Protein–Protein Interaction Prediction . . . . . . . . . . . . . . . . . . . . . . . 3Satyajit Mahapatra, Anish Kumar, Animesh Sharmaand Sitanshu Sekhar Sahu

Early Detection of Breast Cancer Using Support Vector MachineWith Sequential Minimal Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 13Kumar Avinash, M. B. Bijoy and P. B. Jayaraj

Clustering Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25N. Karthika and B. Janet

Heuristic Algorithm for Resolving Pronominal Anaphorain Hindi Dialects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41Seema Mahato, Ani Thomas and Neelam Sahu

End-to-End Reinforcement Learning for Self-driving Car . . . . . . . . . . . 53Rohan Chopra and Sanjiban Sekhar Roy

Understanding Antibiotic Resistance Using Different MachineLearning Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63Tanaya Priyadarshini Pradhan, N. K. Debata and Tripti Swarnkar

Computer Vision Aided Study for Melanoma Detection: A DeepLearning Versus Conventional Supervised Learning Approach . . . . . . . 75S. S. Poorna, M. Ravi Kiran Reddy, Nukala Akhil, Suraj Kamath,Lekshmi Mohan, K. Anuraj and Haripriya S. Pradeep

Clustering of Association Rules on Microarray GeneExpression Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85S. Alagukumar, C. Devi Arockia Vanitha and R. Lawrance

ix

Page 10: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Boolean Association Rule Mining on Microarray GeneExpression Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99R. Vengateshkumar, S. Alagukumar and R. Lawrance

Early Detection of Alzheimer’s Disease Using Multi-feature Fusionand an Ensemble of Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113G. Janakasudha and P. Jayashree

Data Mining Applications

An SNN-DBSCAN Based Clustering Algorithm for Big Data . . . . . . . . 127Sriniwas Pandey, Mamata Samal and Sraban Kumar Mohanty

Wavelet Transform Domain Methods for Resolution Enhancementof Satellite Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139Mansing Rathod, Jayashree Khanapuri and Dilendra Hiran

Improving Query Results in Ontology-Based Case-Based Reasoningby Dynamic Assignment of Feature Weights . . . . . . . . . . . . . . . . . . . . . 153J. Navin Chandar and G. Kavitha

A Survey on Representation for Itemsets in AssociationRule Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163Carynthia Kharkongor and Bhabesh Nath

Efficient Clustering Using Nonnegative Matrix Factorizationfor Gene Expression Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179Pooja Kherwa, Poonam Bansal, Sukhvinder Singh and Tanishaq Gupta

Design of Random Forest Algorithm Based Model for TachycardiaDetection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191Saumendra Kumar Mohapatra, Tripti Swarnkarand Mihir Narayan Mohanty

Detection of Spam in YouTube Comments Using DifferentClassifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201Rama Krushna Das, Sweta Shree Dash, Kaberi Das and Manisha Panda

Deep Learning Architectures for Named Entity Recognition:A Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215Anu Thomas and S. Sangeetha

Effect of Familiarity on Recognition of Pleasant and UnpleasantEmotional States Induced by Hindi Music Videos . . . . . . . . . . . . . . . . . 227Syed Naser Daimi, Soumil Jain and Goutam Saha

The Case Study of Brain Tumor Data Analysis Using Stata and R . . . . 239Prisilla Jayanthi, Murali Krishna Iyyanki, Prakruthi Vadakattuand Padmaja Megham

x Contents

Page 11: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Predictive Data Analytics for Breast Cancer Prognosis . . . . . . . . . . . . . 253Ritu Chauhan and Neeraj Kumar

A Semantic Approach of Building Dynamic Learner Profile ModelUsing WordNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263T. Sheeba and Reshmy Krishnan

Prioritizing Public Grievance Redressal Using Text Miningand Sentimental Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273Rama Krushna Das, Manisha Panda and Sweta Shree Dash

An Automatic Summarizer for a Low-Resourced Language . . . . . . . . . 285Sagarika Pattnaik and Ajit Kumar Nayak

Printed Odia Symbols for Character Recognition:A Database Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297Sanjibani Sudha Pattanayak, Sateesh Kumar Pradhanand Ramesh Chandra Mallik

Importance of Data Standardization Methods on Stock IndicesPrediction Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309Binita Kumari and Tripti Swarnkar

Advanced Image and Video Processing

Feature Relevance Analysis and Feature Reduction of UNSW NB-15Using Neural Networks on MAMLS . . . . . . . . . . . . . . . . . . . . . . . . . . . 321Smitha Rajagopal, Katiganere Siddaramappa Hareeshaand Poornima Panduranga Kundapur

Video Summarization Based on Optical Flow . . . . . . . . . . . . . . . . . . . . 333Dipti Jadhav and Udhav Bhosle

Decision Support System for Black Classification of Dental ImagesUsing GIST Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343Prerna Singh and Priti Sehgal

Pattern Analysis of Brain Functional Connectivity Parameters AfterRemoval of Artifactual Motifs from EEG During Meditation . . . . . . . . 353Laxmi Shaw and Aurobinda Routray

Hyper-heuristic Image Enhancement (HHIE): A ReinforcementLearning Method for Image Contrast Enhancement . . . . . . . . . . . . . . . 363Mitra Montazeri

Comparative Analysis of Salt and Pepper Removal Techniquesfor Binary Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377Usha Rani, Amandeep Kaur and Gurpreet Josan

Contents xi

Page 12: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

An Intelligent Approach for Noise Elimination from Brain Image . . . . . 391Pritisman Kar and Mihir Narayan Mohanty

Hyper-spectral Image Denoising Using Sparse Representation . . . . . . . . 401Vijay Chilkewar and Vibha Vyas

Automated Retinal Vessel Segmentation Based on MorphologicalPreprocessing and 2D-Gabor Wavelets . . . . . . . . . . . . . . . . . . . . . . . . . 411Kundan Kumar, Debashisa Samal and Suraj

Spectral Smoothening Based Waveform Concatenation Techniquefor Speech Quality Enhancement in Text-to-Speech Systems . . . . . . . . . 425Soumya Priyadarsini Panda and Ajit Kumar Nayak

Breast Abnormality Detection Using LBP Variants and DiscreteWavelet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433M. K. Kaushik, Spandana Paramkusham and V. Akhilandeshwari

Modified Histogram Segmentation Bi-Histogram Equalization . . . . . . . . 443Mitra Montazeri

A Comparative Analysis on Filtering Techniques Usedin Preprocessing of Mammogram Image . . . . . . . . . . . . . . . . . . . . . . . . 455Sushreeta Tripathy and Tripti Swarnkar

Cryptography and Information Security

Synthetic Image and Strange Attractor: Two Folded EncryptionApproach for Secure Image Communication . . . . . . . . . . . . . . . . . . . . . 467Sridevi Arumugham, Sundararaman Rajagopalan, Sivaraman Rethinam,Siva Janakiraman, C. Lakshmi and Amirtharajan Rengarajan

Role of Soft Outlier Analysis in Database Intrusion Detection . . . . . . . . 479Anitarani Brahma and Suvasini Panigrahi

Grey-Level Text Encryption Using Chaotic Maps and ArnoldTransform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491Vasudharini Moranam Ravi, Nishi Prasad, C. Lakshmi,Sivaraman Rethinam, Sridevi Arumugham and Amirtharajan Rengarajan

Application of Deep Learning for Database Intrusion Detection . . . . . . 501Rajesh Kumar Sahu and Suvasini Panigrahi

Design of Intrusion Detection System for Wormhole Attack Detectionin Internet of Things . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513Snehal Deshmukh-Bhosale and S. S. Sonavane

xii Contents

Page 13: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Information Encoding, Gap Detection and Analysis from 2D LiDARData on Android Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525Arpan Phukan, Pronam Phukan, Rohit Sinha, Shyamal Hazarikaand Abhijit Boruah

Attribute-Based Convergent Encryption Key Managementfor Secure Deduplication in Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537E. Silambarasan, S. Nickolas and S. Mary Saira Bhanu

Random Invertible Key Matrix Decomposition for ClassicalCryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553Adyasha Behera, Alakananda Tripathy, Alok Ranjan Tripathyand Smita Rath

Advanced Algorithms, Software Engineering

State Space Reduction Using Bounded Search Method . . . . . . . . . . . . . 567Bidush Kumar Sahoo and Mitrabinda Ray

Exploring Application of Knowledge Space Theoryin Accessibility Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583Neha Gupta

Unsupervised Detection of Dispersion and Merging Activitiesfor Crowded Scenes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595Manasi Pathade and Madhuri Khambete

Mapping Clinical Narrative Texts of Patient Discharge Summariesto UMLS Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605Swarupananda Bissoyi and Manas Ranjan Patra

Enhanced Decision Tree Algorithm for Discovery of Exceptions . . . . . . 617Sunil Kumar, Saroj Ratnoo and Renu Bala

A Fuzzy Approach for Cost and Time Optimization in AgileSoftware Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 629Anupama Kaushik, Devendra Kr. Tayal and Kalpana Yadav

Computing Articulation Points Using Maximal Clique-BasedVertex Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641Sadhu Sai Kaushal, Mullapudi Aseesh and Jyothisha J Nair

Contents xiii

Page 14: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

About the Editors

Dr. Bibudhendu Pati is an Associate Professor and the Head of the Department ofComputer Science at Rama Devi Women’s University, Bhubaneswar, India. Hereceived his Ph.D. degree from IIT Kharagpur. Dr. Pati has 21 years of experiencein teaching, research, and industry. His areas of research include wireless sensornetworks, cloud computing, big data, Internet of Things, and networks virtualiza-tion. He is a life member of the Indian Society of Technical Education (ISTE), asenior member of IEEE, member of ACM, and life member of the ComputerSociety of India (CSI). He has published several papers in leading internationaljournals, conference proceedings and books. He is involved in organizing variousinternational conferences.

Dr. Chhabi Rani Panigrahi is an Assistant Professor at the Department ofComputer Science at the Rama Devi Women’s University, Bhubaneswar, India.She holds a doctoral degree from the Indian Institute of Technology Kharagpur.Her research interests include software testing and mobile cloud computing. Shehas more than 18 years of teaching and research experience, and has publishednumerous articles in leading international journals and conferences. Dr. Panigrahi isthe author of a number of books, including a textbook and. She is a life memberof the Indian Society of Technical Education (ISTE) and a member of IEEE and theComputer Society of India (CSI).

Dr. Rajkumar Buyya is a Redmond Barry Distinguished Professor of ComputerScience and Software Engineering and Director of the Cloud Computing andDistributed Systems (CLOUDS) Laboratory at the University of Melbourne,Australia. He is also serving as the founding CEO of Manjrasoft, a spin-off com-pany of the university, commercializing its innovations in cloud computing. Heserved as a future fellow of the Australian Research Council during 2012–2016. Hehas authored over 525 publications and seven textbooks including “MasteringCloud Computing” published by McGraw Hill, China Machine Press, and MorganKaufmann for Indian, Chinese, and international markets, respectively. He has alsoedited several books including “Cloud Computing: Principles and Paradigms”

xv

Page 15: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

(Wiley Press, USA, Feb 2011). He is one of the highly cited authors in computerscience and software engineering worldwide (h-index=112, g-index=245, 63,900+citations). Microsoft Academic Search Index ranked Dr. Buyya as #1 author in theworld (2005–2016) for both field rating and citation evaluations in the area ofDistributed and Parallel Computing. Recently, Dr. Buyya is recognized as “2016Web of Science Highly Cited Researcher” by Thomson Reuters.

Prof. Kuan-Ching Li is a Professor in the Department of Computer Science andInformation Engineering at the Providence University, Taiwan. He has been thevice-dean for the Office of International and Cross-Strait Affairs (OIA) at the sameuniversity since 2014. Prof. Li is the recipient of awards from Nvidia, the Ministryof Education (MOE)/Taiwan and the Ministry of Science and Technology (MOST)/Taiwan, and also guest professorship from various universities in China. Hereceived his Ph.D. from the University of Sao Paulo, Brazil, in 2001. His areas ofresearch include networked and graphics processing unit (GPU) computing, parallelsoftware design, and performance evaluation and benchmarking. He has edited twobooks: Cloud Computing and Digital Media and Big Data published by CRC Press.He is a fellow of the IET, senior member of the IEEE, and a member of TACC.

xvi About the Editors

Page 16: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Machine Learning Applications

Page 17: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Effect of Dimensionality Reductionon Classification Accuracyfor Protein–Protein InteractionPrediction

Satyajit Mahapatra, Anish Kumar, Animesh Sharmaand Sitanshu Sekhar Sahu

Abstract “Large Dimension” of features derived from protein sequences is a majorproblem in protein–protein interaction (PPI) prediction. Thus, reduction of the fea-ture dimension may increase the classification accuracy. In this paper, particle swarmoptimization (PSO) and principal component analysis (PCA) have been used fordimensionality reduction of PPI sequence features. The performance of the algo-rithm has been assessed using the intraspecies E coli protein–protein interactiondatabase, containing an equal number of positive and negative interacting pairs.Standard sequence-based features such as amino acid composition (AAC), dipep-tide composition (Dipep), and conjoint triad composition (CTD) are extracted. Fromthe results, it is seen that the PSO-based dimensionality reduction method providessteady and better performance in terms of accuracy when applied to the features.

Key terms Protein–protein interaction · Feature extraction · Dimensionalityreduction · Machine learning

1 Introduction

Interaction between DNA, RNA, and proteins plays significant roles in the executionof cellular processes and cell functions. Among this, protein–protein interaction pos-sesses a vital role in facilitating essential cellular and molecular processes. Proteins

S. Mahapatra (B) · A. Kumar · A. Sharma · S. S. SahuDepartment of Electronics and Communication Engineering, Birla Institute of Technology Mesra,Ranchi, Indiae-mail: [email protected]

A. Kumare-mail: [email protected]

A. Sharmae-mail: [email protected]

S. S. Sahue-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2020B. Pati et al. (eds.), Advanced Computing and Intelligent Engineering,Advances in Intelligent Systems and Computing 1082,https://doi.org/10.1007/978-981-15-1081-6_1

3

Page 18: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

4 S. Mahapatra et al.

are represented by a polypeptide chain. One protein differs from others due to differ-ence in arrangement of the amino acids in the polypeptide chain. When two or moreproteins unite to carry out their respective biological functions, then the process canbe termed as protein–protein interaction (PPI). Prior information about the PPIs canbe helpful for better understanding of disease mechanism that can become the basisfor newmethods of remedies for diseases. Although a number of experimental meth-ods have been designed to analyze PPI, they remain expensive and time-consuming.Therefore, computational methods are preferred as an alternative to predict PPI.Results obtained from computational methods can be used as an efficient guide forexperimental methods.

Barman et al. have studied intraspecies protein–protein interaction inenteropathogens using features such as domain–domain association (DDA), AAC,Dipep, and degree of amino acid for developing SVM predictor [1]. Zhao et al. havepredicted protein–protein interaction using deep neural networks [2]. Cui et al. havestudied a human virus interaction model using conjoint triad feature and have usedSVM for prediction [3]. Barman et al. have made a comparison study on SVM,random forest, and Naïve Bayes predictor for prediction of host and virus basedon the fivefold cross-validation test [4]. The above features are of huge dimensionsresulting in the curse of dimensionality. Dimensionality reduction algorithms can beapplied on the feature matrix, and can enhance the predictor’s accuracy. Some of thedimensionality reduction techniques used for image data analysis and microarraydata classification have been discussed below.

Zhuo et al. have made a comparative study of six dimensionality reduction meth-ods. These methods are PCA, LLP, LLE, FLDA, LFDA, and ISOMAP and theirstudy shows that LLP and LLE can effectively reduce the dimension which helpsin increasing the accuracy in image retrieval [5]. Sun et al. have proposed an all-dimensional neighborhood (ADN)-based PSO for improving the local search capa-bility around the Gbest in PSO [6]. Sahu and Mishra have proposed a PSO-basedfiltering technique for high-dimensional microarray data classification [7]. Xue et al.have proposed a multi-objective PSO for feature selection for removing redundantand irrelevant features [8]. Hameed et al. have proposed a GBPSO-SVM algorithmfor improving feature selection and classification process for autism spectrum dis-order [9]. Han et al. have proposed a feature selection algorithm using binary PSOencoding gene-to-class sensitivity information [10].

This paper is divided into five sections. Motivation and introduction are given inSect. 1. Section 2 contains information on datasets and algorithm. Section 3 describessupport vector machine classifier and performance evaluation parameters. Resultsand analysis are given in Sect. 4. Finally, conclusion and future scope are reportedin Sect. 5.

Page 19: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Effect of Dimensionality Reduction on Classification Accuracy … 5

2 Materials and Methods

To performprotein–protein interaction (PPI) prediction, first, the character sequencesare converted into its numeric values, otherwise, called features. These features arethen filtered to find out the dominant features. The database is then divided into twoparts (80% for training and 20% for independent testing) and is given to the classifier(SVM). Then the accuracy, MCC, sensitivity, and specificity are calculated. Figure 1shows the implementation of the above PPI prediction method.

2.1 Dataset

In this study for the development of a prediction model, intraspecies E coli pro-tein–protein interaction data have been used. The datasets have been collected fromEnPPIpred: Enteropathogen Protein–Protein Interactions (PPIs) Prediction (http://bicresources.jcbose.ac.in/ssaha4/EnPPIpred/) [1]. It contains 3886 positive and equalnumber of negative interacting pairs.

2.2 Feature Extraction Technique

2.2.1 Amino Acid Composition (AAC)

This technique provides the normalized frequency of occurrences of amino acids in apolypeptide chain. The numbers of occurrences of amino acids are divided by lengthof polypeptide chain, for normalization. Combining the features of a pair of proteinsequence we get 40-dimensional feature vector. It is given by

X(si ) = f1(si )∑20

i=1 f1(si )i = 1, 2, 3 . . . . . . 20 (1)

Fig. 1 Implementation of the prediction algorithm

Page 20: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

6 S. Mahapatra et al.

In the above equation, X(si ) represents sequence features and f1(si ) representsthe frequency count of ith amino acid [1].

2.2.2 Dipeptide Amino Acid Composition

In this technique, 400 fragments will be generated for each protein sequence. Thesefragments are the fractions of dipeptides, i.e., AA, AC, AD, … YV, YW, and YY.Combining the features of a pair of protein sequence we get 800-dimensional featurevector. The dipeptide features are given by

X(si ) = f1(si )∑400

i=1 f1(si )i = 1, 2, 3 . . . 400 (2)

In the above equation, X(si ) represents the 400 dipeptides and f1(si ) is thefrequency count of ith dipeptide [1, 11].

2.2.3 Conjoint Triad

In this technique, the 20 amino acids are grouped into seven different classes, i.e.,{A,G,V}, {I,L,F,P}, {Y,M,T,S}, {H,N,Q,W}, {R,K}, {D,E}, {C} on the basis ofvolumes of the side chains and dipoles. Features obtained are normalized frequencyof 3-mer in the seven class representation of the protein sequence [13, 14]. Combiningthe features of a pair of protein sequence we get 686-dimensional feature vector.

2.3 Dimensionality Reduction

2.3.1 Principal Component Analysis (PCA)

It is an unsupervised linear dimensionality reduction method, which is used forprojection of a data space into smaller dimensional space by using orthogonaltransformation. The process used for reduction is given below:

I. At first, find out the mean.II. Then go for an element by element subtraction of mean from its data, in order

to form the covariance matrix.

COV =N∑

i=1

(xi − μ)(xi − μ)T (3)

III. Find the eigenvalues using this covariance matrix.

Page 21: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Effect of Dimensionality Reduction on Classification Accuracy … 7

IV. Now sort the eigenvectors in descending order and extract the required numberof vectors according to the required number of dimensions.

V. Principal component scores = input × eigenvectors.

2.3.2 Particle Swarm Optimization (PSO)

I. It is an optimization technique based on population, which evaluates theobjective function, here standard deviation, at each particle.

II. The best position is calculated after each iteration and the new velocity of theparticles is then evaluated.

Velocity update:

[Vi (t + 1) = W × Vi (t) + C1 × rand × (pbest(t) − Xi (t) + C2 × rand × (gbest(t) − Xi (t))

]

(4)

Position update:Xi (t + 1) = Xi (t) + Vi (t + 1) (5)

III. For dimensionality reduction to a specific number of columns, first, the requirednumbers of columns are selected randomly. Then the column numbers areupdated using PSO to obtain the best cost on the basis of a minimum of standarddeviation.

IV. Implementation of PSO-based dimensionality reduction is given below:

Step1: Initialization of ConstantsWeightconstant1constant2Step2: Randomly initialize Position of the particlesPositions = zeros(population,nOfSelection);fori = 1:populationtempVar = randperm(Boundary(2));Positions(i,:) = tempVar(1:nOfSelection);endStep3: Randomly initialize velocity of the particlesVelocity = ones(population,of selection);Step4: Initialization of the Global bestStep5: Calculating Fitness ValuesStep6: Updating the particle bestStep7: Updating the Global BestStep8: Velocity UpdateStep9: Position UpdateStep10: Boundary Checking for PositionPositions(Positions > Boundary(2))= round(rand(1)*(Boundary(2)-1)) + 1;

Page 22: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

8 S. Mahapatra et al.

Positions(Positions < Boundary(1))= round(rand(1)*(Boundary(2)-1)) + 1;Step11: To avoid repeating same positionif length(unique(Positions(i,:))) ~ = nOfSelectionStep12: New Positions are selected randomly againEnd while when the No. of maximum iterations is reached

3 Classification and Evaluation

3.1 Support Vector Machines

In this study, support vector machine (SVM) developed by Vapnik has been used forprediction of interacting and noninteracting protein pairs [12]. SVM is supervisedlearning machines that are associated with learning algorithms for analysis of data.For model development, first, the SVM is trained using a set of training data eachtagged with its respective class and then a set of testing data is given whose class isto be predicted by the model.

A hyperplane or margin is necessary that gives the maximum distance ofseparation between the classes.

The hyperplane is represented by equation below:

h(x) = wT + b (6)

where ′w′ and ′b′ are the weight and bias vectors, respectively.Minimizing the cost function, hyperplane is obtained

J (ω) = 1

2wTw = 1

2‖w‖2 (7)

Subject to the constraints di[wT xi + b

] ≥ 1 i = 1, 2, . . . , N (8)

In this paper, Scikit-learn package in Python has been used for implementationof SVM.

3.2 Evaluation

Performance is evaluated using fivefold cross-validation tests. Four different param-eters, i.e., sensitivity, specificity, accuracy, and MCC are calculated [13, 15].

Page 23: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Effect of Dimensionality Reduction on Classification Accuracy … 9

Accuracy(Acc) = Pos++Neg+Pos++Neg++Pos−+Neg−

Sensitivity(Se) = Pos+Pos++Pos−

Specificity(Sp) = Neg+Neg++Neg−

Mcc = (Pos+×Neg+)−(Pos−×Neg−)√(Pos++Pos−)×(Pos++Neg−)×(Neg++Pos−)×(Neg++Neg−)

(Pos+

)is the count positive samples classified as positive.

(Neg−)

is the countpositive samples classified as negative.

(Pos−

)is the count negative samples classified

as positive.(Neg+)

is the count negative samples classified as negative.

4 Result and Discussions

In this paper, for assessment of prediction accuracy, we have used three commonlyused PPI sequence information extraction methods such as AAC, Dipep, and CTD.The above feature extraction techniques are applied on a benchmark intraspecies Ecoli dataset provided by Barman et al. available at EnPPIpred [1]. Then the dimen-sionality reduction algorithm is applied to reduce the feature dimension keeping thenumber of samples constant. AAC contains 40 features and upon application of adimensionality reduction technique, it is reduced to 15. Similarly, Dipep and con-joint triad contain 800 and 686 features which are reduced to 40 after application ofdimensionality reduction algorithm.

It is evident from the Andrews plot (Figs. 2 and 3) that after reduction of features,there is a clear difference between the pattern of positive and negative samples.

t0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

f(t)

-1

-0.5

0

0.5

1

1.5 AAC01

Fig. 2 AAC featureswithout reduction (“0” represents negative samples and “1” represents positivesamples)

Page 24: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

10 S. Mahapatra et al.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1t

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

f(t)

AAC-PSO01

Fig. 3 AAC features reduced using PSO (“0” represents negative samples and “1” representspositive samples)

So, machine learning techniques used on the reduced features will provide betterclassification accuracy. In the plot shown in Figs. 2 and 3, “0” and “1” representnegative and positive samples, respectively.

4.1 Comparison of Dimensionality Reduction Algorithmon an Intraspecies Dataset

The database has been divided into two parts, i.e., 80% for training by fivefoldcross-validation and 20% for independent testing.

The accuracy obtained using AAC features is 79.65, for PCA-AAC is 99.80, forPSO-AAC is 99.42, and for HMS-AAC is 99 as shown in Table 1.

Using dipeptide composition, the accuracy obtained is 83.15, for PCA-DIPEP,the accuracy is 99.80, for PSO-DIPEP, the accuracy is 96.37, and for HMS-DIPEP,the accuracy is 87.43 as shown in Table 2.

Table 1 Performance of SVM using amino acid composition

Features Accuracy Sensitivity Specificity MCC

AAC (c = 1000, γ = 0.5, rbf) 79.65 76.45 82.85 59.42

PCA-AAC (c = 1000, linear) 99.80 99.74 99.87 99.61

PSO-AAC (c = 1000, γ = 0.5, rbf) 99.42 99.48 99.35 98.83

Page 25: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Effect of Dimensionality Reduction on Classification Accuracy … 11

Table 2 Performance of SVM using dipeptide composition

Features Accuracy Sensitivity Specificity MCC

DIPEP (c = 100, γ = 0.5, rbf) 83.15 77.74 88.57 66.70

PCA-DIPEP (c = 10, linear) 99.80 99.74 99.87 99.61

PSO-DIPEP (c = 1000, linear) 96.37 96.24 96.49 92.74

Table 3 Performance of SVM using conjoint triad composition

Features Accuracy Sensitivity Specificity MCC

Conjoint triad (c = 10,000, linear) 80.10 76.32 83.90 60.40

PCA-conjoint triad (c = 10,000, γ = 0.5, rbf) 78.41 72.96 83.96 57.2

PSO-conjoint triad (c = 1000, linear) 98.51 99.60 97.40 97.05

The accuracy obtained using conjoint triad composition is 80.10, for the PCA-conjoint triad, the accuracy is 78.41, for the PSO-conjoint triad, the accuracy is 98.51,for the HMS-conjoint triad, the accuracy is 99.81 as shown in Table 3.

From Tables 1, 2, and 3, it has been observed that PCA provided the best accuracyfor AAC and dipeptide features but failed in case of conjoint triad features. Similarly,HMS provided the best accuracy in the case of conjoint triad and comparable in caseof AAC features. The accuracy of PSO is comparable with the best accuracy providedby PCA (for AAC and dipeptide) and HMS (for a conjoint triad).

Although PCA gives comparable results in many cases, it fails to provide suitableresults in some cases because PCA-based dimensionality reduction provides derivedfeatures which sometimes disturb the pattern of the original features. In PSO-baseddimensionality reduction, the reduced features are among the original set of featuresdue to which the pattern is conserved.

5 Conclusion

In this paper, dimensionality reduction technique such as PSOandPCAhas been usedfor enhancement of performance of predictor. It has been seen that reduced featuresprovide 15–18% more accuracy. Through exhaustive simulation, it has been foundthat PSO-based dimensionality reduction provides steady performance in terms ofaccuracy, sensitivity, and MCC.

In future, this work can be extended for analysis of interspecies protein–proteininteraction.

Acknowledgements This work has been partly supported by SERB (ECR/2017/000345) and iscarried out at Signal Processing lab of ECE department, BIT Mesra.

Page 26: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

12 S. Mahapatra et al.

References

1. Barman, R.K., Jana, T., Das, S., Saha, S.: Prediction of intra-species protein-protein interactionsin enteropathogens facilitating systems biology study. PLoS One 10, 1–9 (2015). https://doi.org/10.1371/journal.pone.0145648

2. Zhao, Z., Yang, Z., Lin, H., Wang, J.: Aprotein-protein interaction extraction approach basedon deep neural network. Int. J. Data Min. Bioinform. 15, 145–164 (2016)

3. Cui,G., Fang,C.,Han,K.: Prediction of protein-protein interactions betweenviruses andhumanby an SVMmodel. BMCBioinform. 13, S5 (2012). https://doi.org/10.1186/1471-2105-13-S7-S510

4. Barman, R.K., Saha, S., Das, S.: Prediction of interactions between viral and host proteinsusing supervised machine learningmethods. PLoSOne 9 https://doi.org/10.1371/journal.pone.0112034 (2014)

5. Zhuo, L., Cheng, B., Zhang, J.: A comparative study of dimensionality reduction methods forlarge-scale image retrieval. Neurocomputing 141, 202–210 (2014). https://doi.org/10.1016/j.neucom.2014.03.014

6. Sun,W., Lin, A., Yu, H., et al.: All-dimension neighborhood-based particle swarm optimizationwith randomly selected neighbors. Inf. Sci. (NY) 405, 141–156 (2017). https://doi.org/10.1016/j.ins.2017.04.007

7. Sahu, B., Mishra, D.: A novel feature selection algorithm using particle swarm optimizationfor cancer microarray data. Procedia Eng 38, 27–31 (2012). https://doi.org/10.1016/j.proeng.2012.06.005

8. Xue, B., Zhang, M.J., Browne, W.N.: Particle swarm optimization for feature selection inclassification: a multi-objective approach. IEEE Trans. Cybern. 43, 1656–1671 (2013). https://doi.org/10.1109/TSMCB.2012.2227469

9. Hameed, S.S., Hassan, R., Muhammad, F.F.: Selection and classification of gene expression inautism disorder: use of a combination of statistical filters and a GBPSO-SVM algorithm. PLoSOne 12, 1–25 (2017). https://doi.org/10.1371/journal.pone.0187371

10. Han, F., Yang, C., Wu, Y.Q., et al.: A gene selection method for microarray data based onbinary PSO encoding gene-to-class sensitivity information. IEEE/ACM Trans. Comput. Biol.Bioinform. 14, 85–96 (2017). https://doi.org/10.1109/TCBB.2015.2465906

11. Lin, H., Ding, H.: Predicting ion channels and their types by the dipeptide mode of pseudoamino acid composition. J Theor. Biol. 269, 64–69 (2011). https://doi.org/10.1016/j.jtbi.2010.10.019

12. Vapnik, V. N.: The Nature of Statistical Learning Theory, 2nd edn. Springer, New York (2000)13. You, Z.-H., Lei, Y.-K., Zhu, L., et al.: Prediction of protein-protein interactions from amino

acid sequences with ensemble extreme learning machines and principal component analysis.BMC Bioinform. 14, S10 (2013). https://doi.org/10.1186/1471-2105-14-S8-S10

14. Shen, J., Zhang, J., Luo, X., et al.: Predicting protein-protein interactions based only onsequences information. Proc. Natl. Acad. Sci. USA 104, 4337–4341 (2007). https://doi.org/10.1073/pnas.0607879104

15. Pandey, C., Sandeep, R., Priyam, A., Mahapatra, S., SahuS, S.: Predicting protein–RNAinteraction using sequence derived features and machine learning approach. Int. J. Data MinBioinform. 19(3), 270–282 (2017)

Page 27: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Early Detection of Breast Cancer UsingSupport Vector Machine With SequentialMinimal Optimization

Kumar Avinash, M. B. Bijoy and P. B. Jayaraj

Abstract Breast cancer is largely occurring cancer disease and one of the lead-ing causes of death among the women in the world. Studies have shown that earlydetection can bring down the mortality rate significantly. Mammography is the mostpopular cancer detection technique but it is painful as well as it uses X-ray to captureimages which spreads radiation in the body. Radiation is one of the causes of breastcancer. Thermography is a method of detection which is noninvasive, painless, andradiation-free. By seeing these thermal images, doctors can’t say whether the patientis having cancer or not that’s why they use computer-aided diagnosis (CAD) to seethe status by feeding these thermal images to model. The detection of breast cancerfrom these data is very crucial and needs some sophisticated image processing andmachine learning techniques.Modernmachine learning technique has become a pop-ular tool for the diagnosis of breast cancer. Among various methods, support vectormachine (SVM) classification is one of the most popular supervised learning meth-ods. Performance of SVM classification by using sequential minimal optimization(SMO) algorithm is evaluated and our proposed model is giving better result in termsof accuracy (94.6%), recall (89.5%), and execution time (0.085 s) onWisconsin dataset.

Keywords Breast cancer · Preprocessing · Feature extraction · Classification ·Sequential Minimal Optimization

K. Avinash · M. B. Bijoy · P. B. Jayaraj (B)Department of Computer Science and Engineering, National Instituteof Technology Calicut, Calicut, Indiae-mail: [email protected]

K. Avinashe-mail: [email protected]

M. B. Bijoye-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2020B. Pati et al. (eds.), Advanced Computing and Intelligent Engineering,Advances in Intelligent Systems and Computing 1082,https://doi.org/10.1007/978-981-15-1081-6_2

13

Page 28: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

14 K. Avinash et al.

1 Introduction

Breast cancer is a disease which is caused by multiple factors like inheritance, tissuecomposition, carcinogens, immunity levels, hormones, radiation, etc. When a cellbecomes cancerous, it’s temperature increases and due to this, the surrounding cellsalso start becoming cancerous that’swhy it spreads very fast.Mammography is one ofthemost popularmethods to detect breast cancer. It is themanualmethod of detectingbreast cancer with the help of experts to identify the tumors in breast tissue. Thismethod is painful as well as it involves radiation since the X-ray image is being takenof each breast. There are various methods available like magnetic resonance imaging(MRI) and ultrasonography which assist the CAD system in detecting breast cancer[1]. MRI uses magnetic and electric fields, gradients, and radio waves to generateimages of the body parts. In ultrasonography, images of body parts are produced usingsound waves. It helps in diagnosis and the causes of pain, swelling, and infection inthe body’s organs.

Thermography is a noninvasive, painless, noncontact, and non-radiation methodto detect breast cancer. Since humans are warm-blooded animal, they emit heatradiation. Thermal image of a human body contains temperature of each cell byusing which prediction of the cells as benign or malignant can be done. Differentfeatures like contrast, area, radius, perimeter, mean, median, etc., of these thermalimages play a crucial role in detecting cancer. Thermography can detect breast cancer8–10 years earlier than all the other methods mentioned above [2].

Breast cancer detection from the thermographic image data needs some prepro-cessing like noise removal, image enhancement, etc., before feeding it to the algo-rithm. This is because of the reason that images may contain some noises due tothe local environment and human errors which can cause redundancy later. Then,segmentation will be done for each of these images in order to separate the left breastand right breast so that we can extract the features from each of these images. In thefeature extraction process, each feature will be stored as a vector. After getting allthe desired features from these images, they will be fed to our proposed algorithmfor the classification.

Image segmentation helps in feature extraction. After separating the left breastand right breast from the thermal image redundancy can be easily avoided in theextracted data. With image segmentation, we can also remove the undesired edgesand the useless areas of the image which is not going to help in decision making.Figure 1 shows the thermal image of a patient’s breast [3]. Figures 2 and 3 show theleft and right breast images after noise removal and segmentation have been applied.

Page 29: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

Early Detection of Breast Cancer Using Support Vector … 15

Fig. 1 Thermal image of apatient’s breast [3]

Fig. 2 Left breast imageafter segmentation

Fig. 3 Right breast imageafter segmentation

Page 30: Bibudhendu Pati Rajkumar Buyya Kuan-Ching Li Editors Advanced … · 2020. 2. 19. · computing, recommender systems, intelligent control, robotics and mechatronics including human-machine

16 K. Avinash et al.

2 Review of Literature

There aremany techniques for early detection of breast cancer using variousmethods.Some of the machine learning related methods are described below.

2.1 Machine Learning Based Related Work for BreastCancer Detection

In the early detection of breast cancer, support vectors machines play an importantrole by classifying the datawith better precision and accuracy.G.R. Suresh et al., havesuggested an approach to detect breast cancer using thermographic color analysisand SVM classifier [1]. The hottest region of breast thermograms was detected usingthree image segmentation techniques: k-means, fuzzy c-means (FCM), and level set.Experimental results have shown that the level set method has a better performancethan other methods as it could highlight the hottest region more precisely.

Ryszard S. Choras suggested an approach in which he has used various imageprocessing techniques likeGaborWavelets and texture parameter evaluation to detectbenign and malignant breast through thermographic grayscale images [4]. Gaborwavelet is a powerful tool to extract texture features and in the spatial domain, is acomplex exponential modulated by a Gaussian function.

Rozita Rastghalam et al., have suggested an approach using the spectral probablefeature on thermographic images to detect the abnormal pattern in the breast. Thegray level co-occurrence matrix is made from image spectrum to obtain spectralco-occurrence feature. This method is not sufficient to extract the spectral probablefeature, so they have optimized this matrix and defined it as a feature vector. Inthis method, normal and abnormal patterns have been separated from each otherthrough the new procedure in which for every image, each of the spectrum featuresis mentioned [5]. There are many other works which are reported in this area [6–10].

3 Motivation

After the experiment with various classifiers, it is found that SVM is giving betterresults. The accuracy of SVM classification used in some models is marginal butcouldn’t exploit the maximization. Our attempt here is to improve the performanceof SVM classification by applying SMO for early detection of breast cancer.