bibudhendu pati rajkumar buyya kuan-ching li editors advanced … · 2020. 2. 19. · computing,...
TRANSCRIPT
Advances in Intelligent Systems and Computing 1082
Bibudhendu PatiChhabi Rani PanigrahiRajkumar BuyyaKuan-Ching Li Editors
Advanced Computing and Intelligent EngineeringProceedings of ICACIE 2018, Volume 1
Advances in Intelligent Systems and Computing
Volume 1082
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,Warsaw, Poland
Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, IndiaRafael Bello Perez, Faculty of Mathematics, Physics and Computing,Universidad Central de Las Villas, Santa Clara, CubaEmilio S. Corchado, University of Salamanca, Salamanca, SpainHani Hagras, School of Computer Science and Electronic Engineering,University of Essex, Colchester, UKLászló T. Kóczy, Department of Automation, Széchenyi István University,Gyor, HungaryVladik Kreinovich, Department of Computer Science, University of Texasat El Paso, El Paso, TX, USAChin-Teng Lin, Department of Electrical Engineering, National ChiaoTung University, Hsinchu, TaiwanJie Lu, Faculty of Engineering and Information Technology,University of Technology Sydney, Sydney, NSW, AustraliaPatricia Melin, Graduate Program of Computer Science, Tijuana Instituteof Technology, Tijuana, MexicoNadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro,Rio de Janeiro, BrazilNgoc Thanh Nguyen , Faculty of Computer Science and Management,Wrocław University of Technology, Wrocław, PolandJun Wang, Department of Mechanical and Automation Engineering,The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publicationson theory, applications, and design methods of Intelligent Systems and IntelligentComputing. Virtually all disciplines such as engineering, natural sciences, computerand information science, ICT, economics, business, e-commerce, environment,healthcare, life science are covered. The list of topics spans all the areas of modernintelligent systems and computing such as: computational intelligence, soft comput-ing including neural networks, fuzzy systems, evolutionary computing and the fusionof these paradigms, social intelligence, ambient intelligence, computational neuro-science, artificial life, virtual worlds and society, cognitive science and systems,Perception and Vision, DNA and immune based systems, self-organizing andadaptive systems, e-Learning and teaching, human-centered and human-centriccomputing, recommender systems, intelligent control, robotics and mechatronicsincluding human-machine teaming, knowledge-based paradigms, learning para-digms, machine ethics, intelligent data analysis, knowledge management, intelligentagents, intelligent decision making and support, intelligent network security, trustmanagement, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” areprimarily proceedings of important conferences, symposia and congresses. Theycover significant recent developments in the field, both of a foundational andapplicable character. An important characteristic feature of the series is the shortpublication time and world-wide distribution. This permits a rapid and broaddissemination of research results.
** Indexing: The books of this series are submitted to ISI Proceedings,EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **
More information about this series at http://www.springer.com/series/11156
Bibudhendu Pati • Chhabi Rani Panigrahi •
Rajkumar Buyya • Kuan-Ching LiEditors
Advanced Computingand Intelligent EngineeringProceedings of ICACIE 2018, Volume 1
123
EditorsBibudhendu PatiDepartment of Computer ScienceRama Devi Women’s UniversityBhubaneswar, Odisha, India
Chhabi Rani PanigrahiDepartment of Computer ScienceRama Devi Women’s UniversityBhubaneswar, Odisha, India
Rajkumar BuyyaCloud ComputingThe University of MelbourneMelbourne, VIC, Australia
Kuan-Ching LiDepartment of Computer Science andInformation EngineeringProvidence UniversityTaichung, Taiwan
ISSN 2194-5357 ISSN 2194-5365 (electronic)Advances in Intelligent Systems and ComputingISBN 978-981-15-1080-9 ISBN 978-981-15-1081-6 (eBook)https://doi.org/10.1007/978-981-15-1081-6
© Springer Nature Singapore Pte Ltd. 2020This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or partof the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmissionor information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilarmethodology now known or hereafter developed.The use of general descriptive names, registered names, trademarks, service marks, etc. in thispublication does not imply, even in the absence of a specific statement, that such names are exempt fromthe relevant protective laws and regulations and therefore free for general use.The publisher, the authors and the editors are safe to assume that the advice and information in thisbook are believed to be true and accurate at the date of publication. Neither the publisher nor theauthors or the editors give a warranty, expressed or implied, with respect to the material containedherein or for any errors or omissions that may have been made. The publisher remains neutral with regardto jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,Singapore
Preface
This volume contains the papers presented at the 3rd International Conference onAdvanced Computing and Intelligent Engineering (ICACIE 2018). ICACIE 2018(www.icacie.com) was held during 22–24 December 2018, at the Institute ofTechnical Education and Research (ITER), Siksha ‘O’ Anusandhan Deemed to beUniversity, Bhubaneswar, India, in collaboration with Rama Devi Women’sUniversity, Bhubaneswar, India. There were 528 submissions, and each qualifiedsubmission was reviewed by a minimum of two Technical Program Committeemembers using the criteria of relevance, originality, technical quality and presen-tation. The committee accepted 103 full papers for oral presentation at the con-ference, and the overall acceptance rate is 20%.
ICACIE 2018 was an initiative taken by the organizers, which focuses onresearch and applications on topics of advanced computing and intelligent engi-neering. The focus was also to present state-of-the-art scientific results, to dis-seminate modern technologies and to promote collaborative research in the field ofadvanced computing and intelligent engineering.
Researchers presented their work in the conference and had an excellentopportunity to interact with eminent professors, scientists and scholars in their areaof research. All participants were benefitted from discussions that facilitated theemergence of innovative ideas and approaches. Many distinguished professors,well-known scholars, industry leaders and young researchers participated in makingICACIE 2018 an immense success.
We also had panel discussion on the emerging topic entitled IntellectualProperty and Standardisation in Smart City consisting of panellists from softwareindustries like TCS and Infosys, educationalist and entrepreneurs.
We thank all the Technical Program Committee members and allreviewers/sub-reviewers for their timely and thorough participation during thereview process.
We express our sincere gratitude to Prof. Manojranjan Nayak, President, S‘O’ADeemed to be University, and Prof. Damodar Acharya, Chairman, AdvisorCommittee, for their endless support towards organizing the conference. We extendour sincere thanks to Honourable Vice-Chancellor, Dr. Amit Banerjee, for allowing
v
us to organize ICACIE 2018 on the campus and also for his timely support. We alsoexpress our sincere gratitude to Honourable Patron Vice-Chancellor, Prof. PadmajaMishra, Rama Devi Women’s University, for her moral support and ceaselessencouragement towards successful completion of the conference. We thankProf. Pradeep Kumar Sahoo, Dean Faculty of Engineering and Technology, for hisguidance in local organization of ICACIE 2018. We also thank General Chairs,Prof. P. K. Nanda, Prof. Bibudhendu Pati and Prof. A. K. Nayak, for their valuableguidance during review of papers as well as other aspects of the conference. Weappreciate the time and efforts put in by the members of the local organizing team atthe Institute of Technical Education and Research (ITER), Siksha ‘O’ AnusandhanDeemed to be University, Bhubaneswar, India, especially the faculty members ofDepartment of Computer Applications, Computer Science & InformationTechnology, and Computer Science & Engineering, student volunteers, andadministrative and hostel management staff, who dedicated their time and efforts tomake ICACIE 2018 successful. We thank Er. Subhashis Das Mohapatra fordesigning and maintaining ICACIE 2018 Website.
Bhubaneswar, India Bibudhendu PatiBhubaneswar, India Chhabi Rani PanigrahiMelbourne, Australia Rajkumar BuyyaTaichung, Taiwan Kuan-Ching Li
vi Preface
About This Book
The book focuses on theory, practice and application in the broad areas of advancedcomputing techniques and intelligent engineering. This two-volume book includes109 scholarly articles, which have been accepted for presentation from 528 sub-missions in the 3rd International Conference on Advanced Computing andIntelligent Engineering held at the Institute of Technical Education and Research(ITER), Siksha ‘O’ Anusandhan Deemed to be University, Bhubaneswar, India, incollaboration with Rama Devi Women’s University, Bhubaneswar, India, during22–24 December 2018. The first volume of this book consists of 54 papers, and thesecond volume contains 49 papers with a total of 103 papers. This book bringstogether academic scientists, professors, research scholars and students to share anddisseminate their knowledge and scientific research works related to advancedcomputing and intelligent engineering. It helps to provide a platform for the youngresearchers to find the practical challenges encountered in these areas of researchand the solutions adopted. The book helps to disseminate the knowledge aboutsome innovative and active research directions in the field of advanced computingtechniques and intelligent engineering, along with some current issues and appli-cations of related topics.
vii
Contents
Machine Learning Applications
Effect of Dimensionality Reduction on Classification Accuracyfor Protein–Protein Interaction Prediction . . . . . . . . . . . . . . . . . . . . . . . 3Satyajit Mahapatra, Anish Kumar, Animesh Sharmaand Sitanshu Sekhar Sahu
Early Detection of Breast Cancer Using Support Vector MachineWith Sequential Minimal Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 13Kumar Avinash, M. B. Bijoy and P. B. Jayaraj
Clustering Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25N. Karthika and B. Janet
Heuristic Algorithm for Resolving Pronominal Anaphorain Hindi Dialects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41Seema Mahato, Ani Thomas and Neelam Sahu
End-to-End Reinforcement Learning for Self-driving Car . . . . . . . . . . . 53Rohan Chopra and Sanjiban Sekhar Roy
Understanding Antibiotic Resistance Using Different MachineLearning Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63Tanaya Priyadarshini Pradhan, N. K. Debata and Tripti Swarnkar
Computer Vision Aided Study for Melanoma Detection: A DeepLearning Versus Conventional Supervised Learning Approach . . . . . . . 75S. S. Poorna, M. Ravi Kiran Reddy, Nukala Akhil, Suraj Kamath,Lekshmi Mohan, K. Anuraj and Haripriya S. Pradeep
Clustering of Association Rules on Microarray GeneExpression Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85S. Alagukumar, C. Devi Arockia Vanitha and R. Lawrance
ix
Boolean Association Rule Mining on Microarray GeneExpression Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99R. Vengateshkumar, S. Alagukumar and R. Lawrance
Early Detection of Alzheimer’s Disease Using Multi-feature Fusionand an Ensemble of Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113G. Janakasudha and P. Jayashree
Data Mining Applications
An SNN-DBSCAN Based Clustering Algorithm for Big Data . . . . . . . . 127Sriniwas Pandey, Mamata Samal and Sraban Kumar Mohanty
Wavelet Transform Domain Methods for Resolution Enhancementof Satellite Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139Mansing Rathod, Jayashree Khanapuri and Dilendra Hiran
Improving Query Results in Ontology-Based Case-Based Reasoningby Dynamic Assignment of Feature Weights . . . . . . . . . . . . . . . . . . . . . 153J. Navin Chandar and G. Kavitha
A Survey on Representation for Itemsets in AssociationRule Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163Carynthia Kharkongor and Bhabesh Nath
Efficient Clustering Using Nonnegative Matrix Factorizationfor Gene Expression Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179Pooja Kherwa, Poonam Bansal, Sukhvinder Singh and Tanishaq Gupta
Design of Random Forest Algorithm Based Model for TachycardiaDetection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191Saumendra Kumar Mohapatra, Tripti Swarnkarand Mihir Narayan Mohanty
Detection of Spam in YouTube Comments Using DifferentClassifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201Rama Krushna Das, Sweta Shree Dash, Kaberi Das and Manisha Panda
Deep Learning Architectures for Named Entity Recognition:A Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215Anu Thomas and S. Sangeetha
Effect of Familiarity on Recognition of Pleasant and UnpleasantEmotional States Induced by Hindi Music Videos . . . . . . . . . . . . . . . . . 227Syed Naser Daimi, Soumil Jain and Goutam Saha
The Case Study of Brain Tumor Data Analysis Using Stata and R . . . . 239Prisilla Jayanthi, Murali Krishna Iyyanki, Prakruthi Vadakattuand Padmaja Megham
x Contents
Predictive Data Analytics for Breast Cancer Prognosis . . . . . . . . . . . . . 253Ritu Chauhan and Neeraj Kumar
A Semantic Approach of Building Dynamic Learner Profile ModelUsing WordNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263T. Sheeba and Reshmy Krishnan
Prioritizing Public Grievance Redressal Using Text Miningand Sentimental Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273Rama Krushna Das, Manisha Panda and Sweta Shree Dash
An Automatic Summarizer for a Low-Resourced Language . . . . . . . . . 285Sagarika Pattnaik and Ajit Kumar Nayak
Printed Odia Symbols for Character Recognition:A Database Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297Sanjibani Sudha Pattanayak, Sateesh Kumar Pradhanand Ramesh Chandra Mallik
Importance of Data Standardization Methods on Stock IndicesPrediction Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309Binita Kumari and Tripti Swarnkar
Advanced Image and Video Processing
Feature Relevance Analysis and Feature Reduction of UNSW NB-15Using Neural Networks on MAMLS . . . . . . . . . . . . . . . . . . . . . . . . . . . 321Smitha Rajagopal, Katiganere Siddaramappa Hareeshaand Poornima Panduranga Kundapur
Video Summarization Based on Optical Flow . . . . . . . . . . . . . . . . . . . . 333Dipti Jadhav and Udhav Bhosle
Decision Support System for Black Classification of Dental ImagesUsing GIST Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343Prerna Singh and Priti Sehgal
Pattern Analysis of Brain Functional Connectivity Parameters AfterRemoval of Artifactual Motifs from EEG During Meditation . . . . . . . . 353Laxmi Shaw and Aurobinda Routray
Hyper-heuristic Image Enhancement (HHIE): A ReinforcementLearning Method for Image Contrast Enhancement . . . . . . . . . . . . . . . 363Mitra Montazeri
Comparative Analysis of Salt and Pepper Removal Techniquesfor Binary Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377Usha Rani, Amandeep Kaur and Gurpreet Josan
Contents xi
An Intelligent Approach for Noise Elimination from Brain Image . . . . . 391Pritisman Kar and Mihir Narayan Mohanty
Hyper-spectral Image Denoising Using Sparse Representation . . . . . . . . 401Vijay Chilkewar and Vibha Vyas
Automated Retinal Vessel Segmentation Based on MorphologicalPreprocessing and 2D-Gabor Wavelets . . . . . . . . . . . . . . . . . . . . . . . . . 411Kundan Kumar, Debashisa Samal and Suraj
Spectral Smoothening Based Waveform Concatenation Techniquefor Speech Quality Enhancement in Text-to-Speech Systems . . . . . . . . . 425Soumya Priyadarsini Panda and Ajit Kumar Nayak
Breast Abnormality Detection Using LBP Variants and DiscreteWavelet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433M. K. Kaushik, Spandana Paramkusham and V. Akhilandeshwari
Modified Histogram Segmentation Bi-Histogram Equalization . . . . . . . . 443Mitra Montazeri
A Comparative Analysis on Filtering Techniques Usedin Preprocessing of Mammogram Image . . . . . . . . . . . . . . . . . . . . . . . . 455Sushreeta Tripathy and Tripti Swarnkar
Cryptography and Information Security
Synthetic Image and Strange Attractor: Two Folded EncryptionApproach for Secure Image Communication . . . . . . . . . . . . . . . . . . . . . 467Sridevi Arumugham, Sundararaman Rajagopalan, Sivaraman Rethinam,Siva Janakiraman, C. Lakshmi and Amirtharajan Rengarajan
Role of Soft Outlier Analysis in Database Intrusion Detection . . . . . . . . 479Anitarani Brahma and Suvasini Panigrahi
Grey-Level Text Encryption Using Chaotic Maps and ArnoldTransform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491Vasudharini Moranam Ravi, Nishi Prasad, C. Lakshmi,Sivaraman Rethinam, Sridevi Arumugham and Amirtharajan Rengarajan
Application of Deep Learning for Database Intrusion Detection . . . . . . 501Rajesh Kumar Sahu and Suvasini Panigrahi
Design of Intrusion Detection System for Wormhole Attack Detectionin Internet of Things . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513Snehal Deshmukh-Bhosale and S. S. Sonavane
xii Contents
Information Encoding, Gap Detection and Analysis from 2D LiDARData on Android Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525Arpan Phukan, Pronam Phukan, Rohit Sinha, Shyamal Hazarikaand Abhijit Boruah
Attribute-Based Convergent Encryption Key Managementfor Secure Deduplication in Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537E. Silambarasan, S. Nickolas and S. Mary Saira Bhanu
Random Invertible Key Matrix Decomposition for ClassicalCryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553Adyasha Behera, Alakananda Tripathy, Alok Ranjan Tripathyand Smita Rath
Advanced Algorithms, Software Engineering
State Space Reduction Using Bounded Search Method . . . . . . . . . . . . . 567Bidush Kumar Sahoo and Mitrabinda Ray
Exploring Application of Knowledge Space Theoryin Accessibility Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583Neha Gupta
Unsupervised Detection of Dispersion and Merging Activitiesfor Crowded Scenes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595Manasi Pathade and Madhuri Khambete
Mapping Clinical Narrative Texts of Patient Discharge Summariesto UMLS Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605Swarupananda Bissoyi and Manas Ranjan Patra
Enhanced Decision Tree Algorithm for Discovery of Exceptions . . . . . . 617Sunil Kumar, Saroj Ratnoo and Renu Bala
A Fuzzy Approach for Cost and Time Optimization in AgileSoftware Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 629Anupama Kaushik, Devendra Kr. Tayal and Kalpana Yadav
Computing Articulation Points Using Maximal Clique-BasedVertex Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641Sadhu Sai Kaushal, Mullapudi Aseesh and Jyothisha J Nair
Contents xiii
About the Editors
Dr. Bibudhendu Pati is an Associate Professor and the Head of the Department ofComputer Science at Rama Devi Women’s University, Bhubaneswar, India. Hereceived his Ph.D. degree from IIT Kharagpur. Dr. Pati has 21 years of experiencein teaching, research, and industry. His areas of research include wireless sensornetworks, cloud computing, big data, Internet of Things, and networks virtualiza-tion. He is a life member of the Indian Society of Technical Education (ISTE), asenior member of IEEE, member of ACM, and life member of the ComputerSociety of India (CSI). He has published several papers in leading internationaljournals, conference proceedings and books. He is involved in organizing variousinternational conferences.
Dr. Chhabi Rani Panigrahi is an Assistant Professor at the Department ofComputer Science at the Rama Devi Women’s University, Bhubaneswar, India.She holds a doctoral degree from the Indian Institute of Technology Kharagpur.Her research interests include software testing and mobile cloud computing. Shehas more than 18 years of teaching and research experience, and has publishednumerous articles in leading international journals and conferences. Dr. Panigrahi isthe author of a number of books, including a textbook and. She is a life memberof the Indian Society of Technical Education (ISTE) and a member of IEEE and theComputer Society of India (CSI).
Dr. Rajkumar Buyya is a Redmond Barry Distinguished Professor of ComputerScience and Software Engineering and Director of the Cloud Computing andDistributed Systems (CLOUDS) Laboratory at the University of Melbourne,Australia. He is also serving as the founding CEO of Manjrasoft, a spin-off com-pany of the university, commercializing its innovations in cloud computing. Heserved as a future fellow of the Australian Research Council during 2012–2016. Hehas authored over 525 publications and seven textbooks including “MasteringCloud Computing” published by McGraw Hill, China Machine Press, and MorganKaufmann for Indian, Chinese, and international markets, respectively. He has alsoedited several books including “Cloud Computing: Principles and Paradigms”
xv
(Wiley Press, USA, Feb 2011). He is one of the highly cited authors in computerscience and software engineering worldwide (h-index=112, g-index=245, 63,900+citations). Microsoft Academic Search Index ranked Dr. Buyya as #1 author in theworld (2005–2016) for both field rating and citation evaluations in the area ofDistributed and Parallel Computing. Recently, Dr. Buyya is recognized as “2016Web of Science Highly Cited Researcher” by Thomson Reuters.
Prof. Kuan-Ching Li is a Professor in the Department of Computer Science andInformation Engineering at the Providence University, Taiwan. He has been thevice-dean for the Office of International and Cross-Strait Affairs (OIA) at the sameuniversity since 2014. Prof. Li is the recipient of awards from Nvidia, the Ministryof Education (MOE)/Taiwan and the Ministry of Science and Technology (MOST)/Taiwan, and also guest professorship from various universities in China. Hereceived his Ph.D. from the University of Sao Paulo, Brazil, in 2001. His areas ofresearch include networked and graphics processing unit (GPU) computing, parallelsoftware design, and performance evaluation and benchmarking. He has edited twobooks: Cloud Computing and Digital Media and Big Data published by CRC Press.He is a fellow of the IET, senior member of the IEEE, and a member of TACC.
xvi About the Editors
Machine Learning Applications
Effect of Dimensionality Reductionon Classification Accuracyfor Protein–Protein InteractionPrediction
Satyajit Mahapatra, Anish Kumar, Animesh Sharmaand Sitanshu Sekhar Sahu
Abstract “Large Dimension” of features derived from protein sequences is a majorproblem in protein–protein interaction (PPI) prediction. Thus, reduction of the fea-ture dimension may increase the classification accuracy. In this paper, particle swarmoptimization (PSO) and principal component analysis (PCA) have been used fordimensionality reduction of PPI sequence features. The performance of the algo-rithm has been assessed using the intraspecies E coli protein–protein interactiondatabase, containing an equal number of positive and negative interacting pairs.Standard sequence-based features such as amino acid composition (AAC), dipep-tide composition (Dipep), and conjoint triad composition (CTD) are extracted. Fromthe results, it is seen that the PSO-based dimensionality reduction method providessteady and better performance in terms of accuracy when applied to the features.
Key terms Protein–protein interaction · Feature extraction · Dimensionalityreduction · Machine learning
1 Introduction
Interaction between DNA, RNA, and proteins plays significant roles in the executionof cellular processes and cell functions. Among this, protein–protein interaction pos-sesses a vital role in facilitating essential cellular and molecular processes. Proteins
S. Mahapatra (B) · A. Kumar · A. Sharma · S. S. SahuDepartment of Electronics and Communication Engineering, Birla Institute of Technology Mesra,Ranchi, Indiae-mail: [email protected]
A. Kumare-mail: [email protected]
A. Sharmae-mail: [email protected]
S. S. Sahue-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2020B. Pati et al. (eds.), Advanced Computing and Intelligent Engineering,Advances in Intelligent Systems and Computing 1082,https://doi.org/10.1007/978-981-15-1081-6_1
3
4 S. Mahapatra et al.
are represented by a polypeptide chain. One protein differs from others due to differ-ence in arrangement of the amino acids in the polypeptide chain. When two or moreproteins unite to carry out their respective biological functions, then the process canbe termed as protein–protein interaction (PPI). Prior information about the PPIs canbe helpful for better understanding of disease mechanism that can become the basisfor newmethods of remedies for diseases. Although a number of experimental meth-ods have been designed to analyze PPI, they remain expensive and time-consuming.Therefore, computational methods are preferred as an alternative to predict PPI.Results obtained from computational methods can be used as an efficient guide forexperimental methods.
Barman et al. have studied intraspecies protein–protein interaction inenteropathogens using features such as domain–domain association (DDA), AAC,Dipep, and degree of amino acid for developing SVM predictor [1]. Zhao et al. havepredicted protein–protein interaction using deep neural networks [2]. Cui et al. havestudied a human virus interaction model using conjoint triad feature and have usedSVM for prediction [3]. Barman et al. have made a comparison study on SVM,random forest, and Naïve Bayes predictor for prediction of host and virus basedon the fivefold cross-validation test [4]. The above features are of huge dimensionsresulting in the curse of dimensionality. Dimensionality reduction algorithms can beapplied on the feature matrix, and can enhance the predictor’s accuracy. Some of thedimensionality reduction techniques used for image data analysis and microarraydata classification have been discussed below.
Zhuo et al. have made a comparative study of six dimensionality reduction meth-ods. These methods are PCA, LLP, LLE, FLDA, LFDA, and ISOMAP and theirstudy shows that LLP and LLE can effectively reduce the dimension which helpsin increasing the accuracy in image retrieval [5]. Sun et al. have proposed an all-dimensional neighborhood (ADN)-based PSO for improving the local search capa-bility around the Gbest in PSO [6]. Sahu and Mishra have proposed a PSO-basedfiltering technique for high-dimensional microarray data classification [7]. Xue et al.have proposed a multi-objective PSO for feature selection for removing redundantand irrelevant features [8]. Hameed et al. have proposed a GBPSO-SVM algorithmfor improving feature selection and classification process for autism spectrum dis-order [9]. Han et al. have proposed a feature selection algorithm using binary PSOencoding gene-to-class sensitivity information [10].
This paper is divided into five sections. Motivation and introduction are given inSect. 1. Section 2 contains information on datasets and algorithm. Section 3 describessupport vector machine classifier and performance evaluation parameters. Resultsand analysis are given in Sect. 4. Finally, conclusion and future scope are reportedin Sect. 5.
Effect of Dimensionality Reduction on Classification Accuracy … 5
2 Materials and Methods
To performprotein–protein interaction (PPI) prediction, first, the character sequencesare converted into its numeric values, otherwise, called features. These features arethen filtered to find out the dominant features. The database is then divided into twoparts (80% for training and 20% for independent testing) and is given to the classifier(SVM). Then the accuracy, MCC, sensitivity, and specificity are calculated. Figure 1shows the implementation of the above PPI prediction method.
2.1 Dataset
In this study for the development of a prediction model, intraspecies E coli pro-tein–protein interaction data have been used. The datasets have been collected fromEnPPIpred: Enteropathogen Protein–Protein Interactions (PPIs) Prediction (http://bicresources.jcbose.ac.in/ssaha4/EnPPIpred/) [1]. It contains 3886 positive and equalnumber of negative interacting pairs.
2.2 Feature Extraction Technique
2.2.1 Amino Acid Composition (AAC)
This technique provides the normalized frequency of occurrences of amino acids in apolypeptide chain. The numbers of occurrences of amino acids are divided by lengthof polypeptide chain, for normalization. Combining the features of a pair of proteinsequence we get 40-dimensional feature vector. It is given by
X(si ) = f1(si )∑20
i=1 f1(si )i = 1, 2, 3 . . . . . . 20 (1)
Fig. 1 Implementation of the prediction algorithm
6 S. Mahapatra et al.
In the above equation, X(si ) represents sequence features and f1(si ) representsthe frequency count of ith amino acid [1].
2.2.2 Dipeptide Amino Acid Composition
In this technique, 400 fragments will be generated for each protein sequence. Thesefragments are the fractions of dipeptides, i.e., AA, AC, AD, … YV, YW, and YY.Combining the features of a pair of protein sequence we get 800-dimensional featurevector. The dipeptide features are given by
X(si ) = f1(si )∑400
i=1 f1(si )i = 1, 2, 3 . . . 400 (2)
In the above equation, X(si ) represents the 400 dipeptides and f1(si ) is thefrequency count of ith dipeptide [1, 11].
2.2.3 Conjoint Triad
In this technique, the 20 amino acids are grouped into seven different classes, i.e.,{A,G,V}, {I,L,F,P}, {Y,M,T,S}, {H,N,Q,W}, {R,K}, {D,E}, {C} on the basis ofvolumes of the side chains and dipoles. Features obtained are normalized frequencyof 3-mer in the seven class representation of the protein sequence [13, 14]. Combiningthe features of a pair of protein sequence we get 686-dimensional feature vector.
2.3 Dimensionality Reduction
2.3.1 Principal Component Analysis (PCA)
It is an unsupervised linear dimensionality reduction method, which is used forprojection of a data space into smaller dimensional space by using orthogonaltransformation. The process used for reduction is given below:
I. At first, find out the mean.II. Then go for an element by element subtraction of mean from its data, in order
to form the covariance matrix.
COV =N∑
i=1
(xi − μ)(xi − μ)T (3)
III. Find the eigenvalues using this covariance matrix.
Effect of Dimensionality Reduction on Classification Accuracy … 7
IV. Now sort the eigenvectors in descending order and extract the required numberof vectors according to the required number of dimensions.
V. Principal component scores = input × eigenvectors.
2.3.2 Particle Swarm Optimization (PSO)
I. It is an optimization technique based on population, which evaluates theobjective function, here standard deviation, at each particle.
II. The best position is calculated after each iteration and the new velocity of theparticles is then evaluated.
Velocity update:
[Vi (t + 1) = W × Vi (t) + C1 × rand × (pbest(t) − Xi (t) + C2 × rand × (gbest(t) − Xi (t))
]
(4)
Position update:Xi (t + 1) = Xi (t) + Vi (t + 1) (5)
III. For dimensionality reduction to a specific number of columns, first, the requirednumbers of columns are selected randomly. Then the column numbers areupdated using PSO to obtain the best cost on the basis of a minimum of standarddeviation.
IV. Implementation of PSO-based dimensionality reduction is given below:
Step1: Initialization of ConstantsWeightconstant1constant2Step2: Randomly initialize Position of the particlesPositions = zeros(population,nOfSelection);fori = 1:populationtempVar = randperm(Boundary(2));Positions(i,:) = tempVar(1:nOfSelection);endStep3: Randomly initialize velocity of the particlesVelocity = ones(population,of selection);Step4: Initialization of the Global bestStep5: Calculating Fitness ValuesStep6: Updating the particle bestStep7: Updating the Global BestStep8: Velocity UpdateStep9: Position UpdateStep10: Boundary Checking for PositionPositions(Positions > Boundary(2))= round(rand(1)*(Boundary(2)-1)) + 1;
8 S. Mahapatra et al.
Positions(Positions < Boundary(1))= round(rand(1)*(Boundary(2)-1)) + 1;Step11: To avoid repeating same positionif length(unique(Positions(i,:))) ~ = nOfSelectionStep12: New Positions are selected randomly againEnd while when the No. of maximum iterations is reached
3 Classification and Evaluation
3.1 Support Vector Machines
In this study, support vector machine (SVM) developed by Vapnik has been used forprediction of interacting and noninteracting protein pairs [12]. SVM is supervisedlearning machines that are associated with learning algorithms for analysis of data.For model development, first, the SVM is trained using a set of training data eachtagged with its respective class and then a set of testing data is given whose class isto be predicted by the model.
A hyperplane or margin is necessary that gives the maximum distance ofseparation between the classes.
The hyperplane is represented by equation below:
h(x) = wT + b (6)
where ′w′ and ′b′ are the weight and bias vectors, respectively.Minimizing the cost function, hyperplane is obtained
J (ω) = 1
2wTw = 1
2‖w‖2 (7)
Subject to the constraints di[wT xi + b
] ≥ 1 i = 1, 2, . . . , N (8)
In this paper, Scikit-learn package in Python has been used for implementationof SVM.
3.2 Evaluation
Performance is evaluated using fivefold cross-validation tests. Four different param-eters, i.e., sensitivity, specificity, accuracy, and MCC are calculated [13, 15].
Effect of Dimensionality Reduction on Classification Accuracy … 9
Accuracy(Acc) = Pos++Neg+Pos++Neg++Pos−+Neg−
Sensitivity(Se) = Pos+Pos++Pos−
Specificity(Sp) = Neg+Neg++Neg−
Mcc = (Pos+×Neg+)−(Pos−×Neg−)√(Pos++Pos−)×(Pos++Neg−)×(Neg++Pos−)×(Neg++Neg−)
(Pos+
)is the count positive samples classified as positive.
(Neg−)
is the countpositive samples classified as negative.
(Pos−
)is the count negative samples classified
as positive.(Neg+)
is the count negative samples classified as negative.
4 Result and Discussions
In this paper, for assessment of prediction accuracy, we have used three commonlyused PPI sequence information extraction methods such as AAC, Dipep, and CTD.The above feature extraction techniques are applied on a benchmark intraspecies Ecoli dataset provided by Barman et al. available at EnPPIpred [1]. Then the dimen-sionality reduction algorithm is applied to reduce the feature dimension keeping thenumber of samples constant. AAC contains 40 features and upon application of adimensionality reduction technique, it is reduced to 15. Similarly, Dipep and con-joint triad contain 800 and 686 features which are reduced to 40 after application ofdimensionality reduction algorithm.
It is evident from the Andrews plot (Figs. 2 and 3) that after reduction of features,there is a clear difference between the pattern of positive and negative samples.
t0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
f(t)
-1
-0.5
0
0.5
1
1.5 AAC01
Fig. 2 AAC featureswithout reduction (“0” represents negative samples and “1” represents positivesamples)
10 S. Mahapatra et al.
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1t
-0.15
-0.1
-0.05
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
f(t)
AAC-PSO01
Fig. 3 AAC features reduced using PSO (“0” represents negative samples and “1” representspositive samples)
So, machine learning techniques used on the reduced features will provide betterclassification accuracy. In the plot shown in Figs. 2 and 3, “0” and “1” representnegative and positive samples, respectively.
4.1 Comparison of Dimensionality Reduction Algorithmon an Intraspecies Dataset
The database has been divided into two parts, i.e., 80% for training by fivefoldcross-validation and 20% for independent testing.
The accuracy obtained using AAC features is 79.65, for PCA-AAC is 99.80, forPSO-AAC is 99.42, and for HMS-AAC is 99 as shown in Table 1.
Using dipeptide composition, the accuracy obtained is 83.15, for PCA-DIPEP,the accuracy is 99.80, for PSO-DIPEP, the accuracy is 96.37, and for HMS-DIPEP,the accuracy is 87.43 as shown in Table 2.
Table 1 Performance of SVM using amino acid composition
Features Accuracy Sensitivity Specificity MCC
AAC (c = 1000, γ = 0.5, rbf) 79.65 76.45 82.85 59.42
PCA-AAC (c = 1000, linear) 99.80 99.74 99.87 99.61
PSO-AAC (c = 1000, γ = 0.5, rbf) 99.42 99.48 99.35 98.83
Effect of Dimensionality Reduction on Classification Accuracy … 11
Table 2 Performance of SVM using dipeptide composition
Features Accuracy Sensitivity Specificity MCC
DIPEP (c = 100, γ = 0.5, rbf) 83.15 77.74 88.57 66.70
PCA-DIPEP (c = 10, linear) 99.80 99.74 99.87 99.61
PSO-DIPEP (c = 1000, linear) 96.37 96.24 96.49 92.74
Table 3 Performance of SVM using conjoint triad composition
Features Accuracy Sensitivity Specificity MCC
Conjoint triad (c = 10,000, linear) 80.10 76.32 83.90 60.40
PCA-conjoint triad (c = 10,000, γ = 0.5, rbf) 78.41 72.96 83.96 57.2
PSO-conjoint triad (c = 1000, linear) 98.51 99.60 97.40 97.05
The accuracy obtained using conjoint triad composition is 80.10, for the PCA-conjoint triad, the accuracy is 78.41, for the PSO-conjoint triad, the accuracy is 98.51,for the HMS-conjoint triad, the accuracy is 99.81 as shown in Table 3.
From Tables 1, 2, and 3, it has been observed that PCA provided the best accuracyfor AAC and dipeptide features but failed in case of conjoint triad features. Similarly,HMS provided the best accuracy in the case of conjoint triad and comparable in caseof AAC features. The accuracy of PSO is comparable with the best accuracy providedby PCA (for AAC and dipeptide) and HMS (for a conjoint triad).
Although PCA gives comparable results in many cases, it fails to provide suitableresults in some cases because PCA-based dimensionality reduction provides derivedfeatures which sometimes disturb the pattern of the original features. In PSO-baseddimensionality reduction, the reduced features are among the original set of featuresdue to which the pattern is conserved.
5 Conclusion
In this paper, dimensionality reduction technique such as PSOandPCAhas been usedfor enhancement of performance of predictor. It has been seen that reduced featuresprovide 15–18% more accuracy. Through exhaustive simulation, it has been foundthat PSO-based dimensionality reduction provides steady performance in terms ofaccuracy, sensitivity, and MCC.
In future, this work can be extended for analysis of interspecies protein–proteininteraction.
Acknowledgements This work has been partly supported by SERB (ECR/2017/000345) and iscarried out at Signal Processing lab of ECE department, BIT Mesra.
12 S. Mahapatra et al.
References
1. Barman, R.K., Jana, T., Das, S., Saha, S.: Prediction of intra-species protein-protein interactionsin enteropathogens facilitating systems biology study. PLoS One 10, 1–9 (2015). https://doi.org/10.1371/journal.pone.0145648
2. Zhao, Z., Yang, Z., Lin, H., Wang, J.: Aprotein-protein interaction extraction approach basedon deep neural network. Int. J. Data Min. Bioinform. 15, 145–164 (2016)
3. Cui,G., Fang,C.,Han,K.: Prediction of protein-protein interactions betweenviruses andhumanby an SVMmodel. BMCBioinform. 13, S5 (2012). https://doi.org/10.1186/1471-2105-13-S7-S510
4. Barman, R.K., Saha, S., Das, S.: Prediction of interactions between viral and host proteinsusing supervised machine learningmethods. PLoSOne 9 https://doi.org/10.1371/journal.pone.0112034 (2014)
5. Zhuo, L., Cheng, B., Zhang, J.: A comparative study of dimensionality reduction methods forlarge-scale image retrieval. Neurocomputing 141, 202–210 (2014). https://doi.org/10.1016/j.neucom.2014.03.014
6. Sun,W., Lin, A., Yu, H., et al.: All-dimension neighborhood-based particle swarm optimizationwith randomly selected neighbors. Inf. Sci. (NY) 405, 141–156 (2017). https://doi.org/10.1016/j.ins.2017.04.007
7. Sahu, B., Mishra, D.: A novel feature selection algorithm using particle swarm optimizationfor cancer microarray data. Procedia Eng 38, 27–31 (2012). https://doi.org/10.1016/j.proeng.2012.06.005
8. Xue, B., Zhang, M.J., Browne, W.N.: Particle swarm optimization for feature selection inclassification: a multi-objective approach. IEEE Trans. Cybern. 43, 1656–1671 (2013). https://doi.org/10.1109/TSMCB.2012.2227469
9. Hameed, S.S., Hassan, R., Muhammad, F.F.: Selection and classification of gene expression inautism disorder: use of a combination of statistical filters and a GBPSO-SVM algorithm. PLoSOne 12, 1–25 (2017). https://doi.org/10.1371/journal.pone.0187371
10. Han, F., Yang, C., Wu, Y.Q., et al.: A gene selection method for microarray data based onbinary PSO encoding gene-to-class sensitivity information. IEEE/ACM Trans. Comput. Biol.Bioinform. 14, 85–96 (2017). https://doi.org/10.1109/TCBB.2015.2465906
11. Lin, H., Ding, H.: Predicting ion channels and their types by the dipeptide mode of pseudoamino acid composition. J Theor. Biol. 269, 64–69 (2011). https://doi.org/10.1016/j.jtbi.2010.10.019
12. Vapnik, V. N.: The Nature of Statistical Learning Theory, 2nd edn. Springer, New York (2000)13. You, Z.-H., Lei, Y.-K., Zhu, L., et al.: Prediction of protein-protein interactions from amino
acid sequences with ensemble extreme learning machines and principal component analysis.BMC Bioinform. 14, S10 (2013). https://doi.org/10.1186/1471-2105-14-S8-S10
14. Shen, J., Zhang, J., Luo, X., et al.: Predicting protein-protein interactions based only onsequences information. Proc. Natl. Acad. Sci. USA 104, 4337–4341 (2007). https://doi.org/10.1073/pnas.0607879104
15. Pandey, C., Sandeep, R., Priyam, A., Mahapatra, S., SahuS, S.: Predicting protein–RNAinteraction using sequence derived features and machine learning approach. Int. J. Data MinBioinform. 19(3), 270–282 (2017)
Early Detection of Breast Cancer UsingSupport Vector Machine With SequentialMinimal Optimization
Kumar Avinash, M. B. Bijoy and P. B. Jayaraj
Abstract Breast cancer is largely occurring cancer disease and one of the lead-ing causes of death among the women in the world. Studies have shown that earlydetection can bring down the mortality rate significantly. Mammography is the mostpopular cancer detection technique but it is painful as well as it uses X-ray to captureimages which spreads radiation in the body. Radiation is one of the causes of breastcancer. Thermography is a method of detection which is noninvasive, painless, andradiation-free. By seeing these thermal images, doctors can’t say whether the patientis having cancer or not that’s why they use computer-aided diagnosis (CAD) to seethe status by feeding these thermal images to model. The detection of breast cancerfrom these data is very crucial and needs some sophisticated image processing andmachine learning techniques.Modernmachine learning technique has become a pop-ular tool for the diagnosis of breast cancer. Among various methods, support vectormachine (SVM) classification is one of the most popular supervised learning meth-ods. Performance of SVM classification by using sequential minimal optimization(SMO) algorithm is evaluated and our proposed model is giving better result in termsof accuracy (94.6%), recall (89.5%), and execution time (0.085 s) onWisconsin dataset.
Keywords Breast cancer · Preprocessing · Feature extraction · Classification ·Sequential Minimal Optimization
K. Avinash · M. B. Bijoy · P. B. Jayaraj (B)Department of Computer Science and Engineering, National Instituteof Technology Calicut, Calicut, Indiae-mail: [email protected]
K. Avinashe-mail: [email protected]
M. B. Bijoye-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2020B. Pati et al. (eds.), Advanced Computing and Intelligent Engineering,Advances in Intelligent Systems and Computing 1082,https://doi.org/10.1007/978-981-15-1081-6_2
13
14 K. Avinash et al.
1 Introduction
Breast cancer is a disease which is caused by multiple factors like inheritance, tissuecomposition, carcinogens, immunity levels, hormones, radiation, etc. When a cellbecomes cancerous, it’s temperature increases and due to this, the surrounding cellsalso start becoming cancerous that’swhy it spreads very fast.Mammography is one ofthemost popularmethods to detect breast cancer. It is themanualmethod of detectingbreast cancer with the help of experts to identify the tumors in breast tissue. Thismethod is painful as well as it involves radiation since the X-ray image is being takenof each breast. There are various methods available like magnetic resonance imaging(MRI) and ultrasonography which assist the CAD system in detecting breast cancer[1]. MRI uses magnetic and electric fields, gradients, and radio waves to generateimages of the body parts. In ultrasonography, images of body parts are produced usingsound waves. It helps in diagnosis and the causes of pain, swelling, and infection inthe body’s organs.
Thermography is a noninvasive, painless, noncontact, and non-radiation methodto detect breast cancer. Since humans are warm-blooded animal, they emit heatradiation. Thermal image of a human body contains temperature of each cell byusing which prediction of the cells as benign or malignant can be done. Differentfeatures like contrast, area, radius, perimeter, mean, median, etc., of these thermalimages play a crucial role in detecting cancer. Thermography can detect breast cancer8–10 years earlier than all the other methods mentioned above [2].
Breast cancer detection from the thermographic image data needs some prepro-cessing like noise removal, image enhancement, etc., before feeding it to the algo-rithm. This is because of the reason that images may contain some noises due tothe local environment and human errors which can cause redundancy later. Then,segmentation will be done for each of these images in order to separate the left breastand right breast so that we can extract the features from each of these images. In thefeature extraction process, each feature will be stored as a vector. After getting allthe desired features from these images, they will be fed to our proposed algorithmfor the classification.
Image segmentation helps in feature extraction. After separating the left breastand right breast from the thermal image redundancy can be easily avoided in theextracted data. With image segmentation, we can also remove the undesired edgesand the useless areas of the image which is not going to help in decision making.Figure 1 shows the thermal image of a patient’s breast [3]. Figures 2 and 3 show theleft and right breast images after noise removal and segmentation have been applied.
Early Detection of Breast Cancer Using Support Vector … 15
Fig. 1 Thermal image of apatient’s breast [3]
Fig. 2 Left breast imageafter segmentation
Fig. 3 Right breast imageafter segmentation
16 K. Avinash et al.
2 Review of Literature
There aremany techniques for early detection of breast cancer using variousmethods.Some of the machine learning related methods are described below.
2.1 Machine Learning Based Related Work for BreastCancer Detection
In the early detection of breast cancer, support vectors machines play an importantrole by classifying the datawith better precision and accuracy.G.R. Suresh et al., havesuggested an approach to detect breast cancer using thermographic color analysisand SVM classifier [1]. The hottest region of breast thermograms was detected usingthree image segmentation techniques: k-means, fuzzy c-means (FCM), and level set.Experimental results have shown that the level set method has a better performancethan other methods as it could highlight the hottest region more precisely.
Ryszard S. Choras suggested an approach in which he has used various imageprocessing techniques likeGaborWavelets and texture parameter evaluation to detectbenign and malignant breast through thermographic grayscale images [4]. Gaborwavelet is a powerful tool to extract texture features and in the spatial domain, is acomplex exponential modulated by a Gaussian function.
Rozita Rastghalam et al., have suggested an approach using the spectral probablefeature on thermographic images to detect the abnormal pattern in the breast. Thegray level co-occurrence matrix is made from image spectrum to obtain spectralco-occurrence feature. This method is not sufficient to extract the spectral probablefeature, so they have optimized this matrix and defined it as a feature vector. Inthis method, normal and abnormal patterns have been separated from each otherthrough the new procedure in which for every image, each of the spectrum featuresis mentioned [5]. There are many other works which are reported in this area [6–10].
3 Motivation
After the experiment with various classifiers, it is found that SVM is giving betterresults. The accuracy of SVM classification used in some models is marginal butcouldn’t exploit the maximization. Our attempt here is to improve the performanceof SVM classification by applying SMO for early detection of breast cancer.