IJITCE Dec 2011

UK: Managing Editor, International Journal of Innovative Technology and Creative Engineering, 1a Park Lane, Cranford, London TW5 9WA, UK. E-Mail: [email protected] Phone: +44-773-043-0249

USA: Editor, International Journal of Innovative Technology and Creative Engineering, Dr. Arumugam, Department of Chemistry, University of Georgia, GA-30602, USA. Phone: 001-706-206-0812 Fax: 001-706-542-2626

India: Editor, International Journal of Innovative Technology & Creative Engineering, Dr. Arthanariee. A. M, Finance Tracking Center India, 17/14 Ganapathy Nagar 2nd Street, Ekkattuthangal, Chennai - 600032. Mobile: 91-7598208700

www.ijitce.co.uk

INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY & CREATIVE ENGINEERING (ISSN:2045-8711) VOL.1 NO.12 DECEMBER 2011

IJITCE PUBLICATION

INTERNATIONAL JOURNAL OF INNOVATIVE

TECHNOLOGY & CREATIVE ENGINEERING

Vol.1 No.12

December 2011

www.ijitce.co.uk


From Editor's Desk

Dear Researcher, Greetings! This monthly issue contains articles on data mining, routing protocols, encryption algorithms and the detection of monotonic graphs. Let us review this month's research highlights from around the world. Apple was granted its first fuel-cell-related patent about six years ago; it involved the use of Liquidmetal as a catalyst for isolating hydrogen to produce electricity. It seems the prolific patent filer has not given up on fuel cell technology, as it has filed a fresh patent application focusing on hydrogen fuel cells. The patent applications reveal little about exactly how or where the fuel cell will be used. The description and diagrams submitted by Apple detail implementations that can recharge a smaller battery as well as be refuelled by it. Unlike vehicular hydrogen cells, sealed cells in consumer electronics cannot easily be refuelled, so they have to be regenerated by using the battery to reverse the chemical process. Once perfected, this technology may give Apple's future devices unprecedented battery life. Threats like Stuxnet, which is credited with setting back Iran's nuclear program by several years, and its successor Duqu have shocked the security industry with their level of sophistication. Experts believe that they are only the beginning and that more highly advanced malware will be launched in 2012. Countries such as the U.S., U.K., Germany, China and India have established specialized teams and centers to defend government assets against cyberattacks and even to retaliate if necessary. However, determining with certainty who is behind hostile Internet-based operations is impossible most of the time, and that is just one of the problems. The Internet ran out of IPv4 address space in early February, when the Internet Assigned Numbers Authority assigned two of the remaining blocks of IPv4 addresses, each containing 16.7 million addresses, to the Asia Pacific Network Information Centre.
This action triggered the immediate distribution of the remaining five blocks of IPv4 address space, one block to each of the five Regional Internet Registries. Dell is no longer interested in selling netbooks, the category of 10-inch-class laptops that saw mild success for a couple of years but now faces a serious existential crisis. It has been an absolute pleasure to present articles that you wish to read. We look forward to many more technology-related research articles from you and your friends, and we eagerly await the rich and thorough research papers that our authors are preparing for the next issue. Thanks, Editorial Team IJITCE


Editorial Members

Dr. Chee Kyun Ng, Ph.D., Department of Computer and Communication Systems, Faculty of Engineering, Universiti Putra Malaysia, UPM Serdang, 43400 Selangor, Malaysia.
Dr. Simon SEE, Ph.D., Chief Technologist and Technical Director at Oracle Corporation; Associate Professor (Adjunct) at Nanyang Technological University; Professor (Adjunct) at Shanghai Jiaotong University; 27 West Coast Rise #08-12, Singapore 127470.
Dr. sc. agr. Horst Juergen SCHWARTZ, Ph.D., Humboldt-University of Berlin, Faculty of Agriculture and Horticulture, Asternplatz 2a, D-12203 Berlin, Germany.
Dr. Marco L. Bianchini, Ph.D., Italian National Research Council; IBAF-CNR, Via Salaria km 29.300, 00015 Monterotondo Scalo (RM), Italy.
Dr. Nijad Kabbara, Ph.D., Marine Research Centre / Remote Sensing Centre / National Council for Scientific Research, P.O. Box 189, Jounieh, Lebanon.
Dr. Aaron Solomon, Ph.D., Department of Computer Science, National Chi Nan University, No. 303, University Road, Puli Town, Nantou County 54561, Taiwan.
Dr. Arthanariee. A. M, M.Sc., M.Phil., M.S., Ph.D., Director, Bharathidasan School of Computer Applications, Ellispettai, Erode, Tamil Nadu, India.
Dr. Takaharu KAMEOKA, Ph.D., Professor, Laboratory of Food, Environmental & Cultural Informatics, Division of Sustainable Resource Sciences, Graduate School of Bioresources, Mie University, 1577 Kurimamachiya-cho, Tsu, Mie, 514-8507, Japan.
Mr. M. Sivakumar, M.C.A., ITIL, PRINCE2, ISTQB, OCP, ICP, Project Manager - Software, Applied Materials, 1a Park Lane, Cranford, UK.
Dr. Bulent Acma, Ph.D., Anadolu University, Department of Economics, Unit of Southeastern Anatolia Project (GAP), 26470 Eskisehir, Turkey.
Dr. Selvanathan Arumugam, Ph.D., Research Scientist, Department of Chemistry, University of Georgia, GA-30602, USA.

Review Board Members

Dr. T. Christopher, Ph.D., Assistant Professor & Head, Department of Computer Science, Government Arts College (Autonomous), Udumalpet, India.
Dr. T. DEVI, Ph.D. Engg. (Warwick, UK), Head, Department of Computer Applications, Bharathiar University, Coimbatore-641 046, India.
Dr. Giuseppe Baldacchini, ENEA - Frascati Research Center, Via Enrico Fermi 45 - P.O. Box 65, 00044 Frascati, Roma, Italy.
Dr. Renato J. Orsato, Professor at FGV-EAESP, Getulio Vargas Foundation, São Paulo Business School, Rua Itapeva, 474 (8° andar), 01332-000, São Paulo (SP), Brazil; Visiting Scholar at INSEAD, INSEAD Social Innovation Centre, Boulevard de Constance, 77305 Fontainebleau, France.
Y. Benal Yurtlu, Assist. Prof., Ondokuz Mayis University.
Dr. Paul Koltun, Senior Research Scientist, LCA and Industrial Ecology Group, Metallic & Ceramic Materials, CSIRO Process Science & Engineering, Private Bag 33, Clayton South MDC 3169, Gate 5 Normanby Rd., Clayton Vic. 3168.
Dr. Sumeer Gul, Assistant Professor, Department of Library and Information Science, University of Kashmir, India.


Dr. Chutima Boonthum-Denecke, Ph.D., Department of Computer Science, Science & Technology Bldg., Rm 120, Hampton University, Hampton, VA 23688.
Lucy M. Brown, Ph.D., Texas State University, 601 University Drive, School of Journalism and Mass Communication, OM330B, San Marcos, TX 78666.
Javad Robati, Crop Production Department, University of Maragheh, Golshahr, Maragheh, Iran.
Vinesh Sukumar (PhD, MBA), Product Engineering Segment Manager, Imaging Products, Aptina Imaging Inc.
doc. Ing. Rostislav Chotěborský, Ph.D., Katedra materiálu a strojírenské technologie, Technická fakulta, Česká zemědělská univerzita v Praze, Kamýcká 129, Praha 6, 165 21.
Dr. Binod Kumar, M.Sc., M.C.A., M.Phil., Ph.D., HOD & Associate Professor, Lakshmi Narayan College of Tech. (LNCT), Kolua, Bhopal (MP), India.
Mr. Abhishek Taneja, B.Sc. (Electronics), M.B.E., M.C.A., M.Phil., Assistant Professor, Department of Computer Science & Applications, Dronacharya Institute of Management and Technology, Kurukshetra, India.
Dr. Amala VijayaSelvi Rajan, B.Sc., Ph.D., Faculty - Information Technology, Dubai Women's College - Higher Colleges of Technology, P.O. Box 16062, Dubai, UAE.
Naik Nitin Ashokrao, B.Sc., M.Sc., Lecturer, Yeshwant Mahavidyalaya, Nanded University.
Dr. A. Kathirvell, B.E., M.E., Ph.D., MISTE, MIACSIT, MEN GG, Professor, Department of Computer Science and Engineering, Tagore Engineering College, Chennai.
Dr. H. S. Fadewar, B.Sc., M.Sc., M.Phil., Ph.D., PGDBM, B.Ed., Associate Professor, Sinhgad Institute of Management & Computer Application, Mumbai-Bangalore Westernly Express Way, Narhe, Pune - 41.
Dr. David Batten, Leader, Algal Pre-Feasibility Study, Transport Technologies and Sustainable Fuels, CSIRO Energy Transformed Flagship, Private Bag 1, Aspendale, Vic. 3195, Australia.
Dr. R. C. Panda (M.Tech. & Ph.D. (IITM); Ex-Faculty, Curtin Univ. of Tech., Perth, Australia), Scientist, CLRI (CSIR), Adyar, Chennai - 600 020, India.
Miss Jing He, Ph.D. Candidate, Georgia State University, 1450 Willow Lake Dr. NE, Atlanta, GA 30329.
Dr. Wael M. G. Ibrahim, Department Head, Electronics Engineering Technology Dept., School of Engineering Technology, ECPI College of Technology, 5501 Greenwich Road - Suite 100, Virginia Beach, VA 23462.
Dr. Messaoud Jake Bahoura, Associate Professor, Engineering Department and Center for Materials Research, Norfolk State University, 700 Park Avenue, Norfolk, VA 23504.
Dr. V. P. Eswaramurthy, M.C.A., M.Phil., Ph.D., Assistant Professor of Computer Science, Government Arts College (Autonomous), Salem-636 007, India.
Dr. P. Kamakkannan, M.C.A., Ph.D., Assistant Professor of Computer Science, Government Arts College (Autonomous), Salem-636 007, India.


Dr. V. Karthikeyani, Ph.D., Assistant Professor of Computer Science, Government Arts College (Autonomous), Salem-636 008, India.
Dr. K. Thangadurai, Ph.D., Assistant Professor, Department of Computer Science, Government Arts College (Autonomous), Karur - 639 005, India.
Dr. N. Maheswari, Ph.D., Assistant Professor, Department of MCA, Faculty of Engineering and Technology, SRM University, Kattangulathur, Kanchipiram Dt - 603 203, India.
Mr. Md. Musfique Anwar, B.Sc. (Engg.), Lecturer, Computer Science & Engineering Department, Jahangirnagar University, Savar, Dhaka, Bangladesh.
Mrs. Smitha Ramachandran, M.Sc. (CS), SAP Analyst, Akzonobel, Slough, United Kingdom.
Dr. V. Vallimayil, Ph.D., Director, Department of MCA, Vivekanandha Business School For Women, Elayampalayam, Tiruchengode - 637 205, India.
Mr. M. Rajasenathipathi, M.C.A., M.Phil., Assistant Professor, Department of Computer Science, Nallamuthu Gounder Mahalingam College, India.
Mr. M. Moorthi, M.C.A., M.Phil., Assistant Professor, Department of Computer Applications, Kongu Arts and Science College, India.
Prema Selvaraj, B.Sc., M.C.A., M.Phil., Assistant Professor, Department of Computer Science, KSR College of Arts and Science, Tiruchengode.
Mr. V. Prabakaran, M.C.A., M.Phil., Head of the Department, Department of Computer Science, Adharsh Vidhyalaya Arts And Science College For Women, India.
Mrs. S. Niraimathi, M.C.A., M.Phil., Lecturer, Department of Computer Science, Nallamuthu Gounder Mahalingam College, Pollachi, India.
Mr. G. Rajendran, M.C.A., M.Phil., N.E.T., PGDBM, PGDBF, Assistant Professor, Department of Computer Science, Government Arts College, Salem, India.
Mr. R. Vijayamadheswaran, M.C.A., M.Phil., Lecturer, K.S.R College of Arts & Science, India.
Ms. S. Sasikala, M.Sc., M.Phil., M.C.A., PGDPM & IR, Assistant Professor, Department of Computer Science, KSR College of Arts & Science, Tiruchengode - 637215.
Mr. V. Pradeep, B.E., M.Tech., Asst. Professor, Department of Computer Science and Engineering, Tejaa Shakthi Institute of Technology for Women, Coimbatore, India.
Dr. Pradeep H. Pendse, B.E., M.M.S., Ph.D., Dean - IT, Welingkar Institute of Management Development and Research, Mumbai, India.
Mr. K. Saravanakumar, M.C.A., M.Phil., M.B.A., M.Tech., PGDBA, PGDPM & IR, Asst. Professor, PG Department of Computer Applications, Alliance Business Academy, Bangalore, India.
Muhammad Javed, Centre for Next Generation Localisation, School of Computing, Dublin City University, Dublin 9, Ireland.
Dr. G. GOBI, Assistant Professor, Department of Physics, Government Arts College, Salem - 636 007.
Dr. S. Senthilkumar, Research Fellow, Department of Mathematics, National Institute of Technology (REC), Tiruchirappalli-620 015, Tamil Nadu, India.


Contents

[1]. Detecting Monotonic Graph over Edge Connectivity Constraint, by Mr. Abhay Bhamaikar and Dr. P. R. Rao (p. 1)

[2]. Metrics of a new symmetrical encryption algorithm, by G. RAMESH and Dr. R. UMARANI (p. 6)

[3]. Data Mining Applications: A comparative study for Predicting Student's Performance, by Surjeet Kumar Yadav, Brijesh Bharadwaj and Saurabh Pal (p. 13)

[4]. An Analysis of Fixed Probabilistic Route Discovery Mechanism using on-demand routing protocols, by V. Mathivanan and E. Ramaraj (p. 20)


Detecting Monotonic Graph over Edge Connectivity Constraint

Mr. Abhay Bhamaikar #1, Dr. P. R. Rao *2

# Department of Information Technology, Shree Rayeshwar Institute of Engineering and IT, Shiroda, Goa, India.
1 [email protected]

* Department of Computer Science and Technology, Goa University, Goa, India.
2 [email protected]

Abstract - The edge connectivity of a graph is widely used in graph mining and finds use in a wide range of applications such as transportation problems, microarray data and bioinformatics. If a graph is known to have the monotone property under the edge connectivity constraint, its decomposition can be avoided, which in turn reduces computation time. This paper focuses on determining whether a given graph (G) satisfies the monotone property over the edge connectivity constraint. The algorithm takes into consideration the maximum degree, minimum degree and edge connectivity of G and, based on these values, determines whether G satisfies the monotone property.

Key Words: Maximum Degree, Minimum Degree, Edge Connectivity.

I. INTRODUCTION

Graph mining is gaining importance due to the numerous applications that rely on graph-based data. A graph is composed of a set of vertices V(G) and a set of edges E(G). Each vertex has a label (or an identifier), and each edge e_ij ∈ E(G) connects the vertices v_i and v_j. In some applications the labels are unique and the graph is termed a relational graph, whereas if the labels are not unique the graph is said to be a non-relational graph. In this paper we restrict ourselves to relational graphs. The monotone property of graphs plays a major role in the skyline approach, in which the records returned to the user are those that are not dominated by any other record, where domination is based on the values of the records. Let p and q be two records, each composed of d attributes, and denote by p_i (q_i) the value of the i-th attribute of p (q). Record p dominates record q if p is “as good as” q in all attributes and “better” than q in at least one attribute. Skyline processing is scale invariant: it requires neither a ranking function nor a threshold, and it can be used as long as the data dimensionality is low. In the skyline approach with edge connectivity as the constraint, determining that a graph is monotone reduces computation time, because the graph need not be decomposed further: it already dominates all of its subgraphs with respect to the edge connectivity constraint. The edge connectivity of a graph measures its coherence; a larger edge connectivity means the graph is more coherent.
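The dominance test described above, and the naive skyline built from it, can be sketched as follows. This is a minimal illustration that assumes every attribute is to be maximised (the text does not fix a direction) and uses the simplest quadratic all-pairs comparison rather than an optimised skyline method; the sample records are hypothetical.

```python
from typing import Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if record p dominates record q (larger values assumed better).

    p must be at least as good as q in every attribute and strictly
    better in at least one attribute.
    """
    if len(p) != len(q):
        raise ValueError("records must have the same number of attributes")
    at_least_as_good = all(pi >= qi for pi, qi in zip(p, q))
    strictly_better = any(pi > qi for pi, qi in zip(p, q))
    return at_least_as_good and strictly_better


def skyline(records: list[Sequence[float]]) -> list[Sequence[float]]:
    """Naive O(n^2) skyline: keep records not dominated by any other record."""
    return [r for r in records
            if not any(dominates(o, r) for o in records if o is not r)]


if __name__ == "__main__":
    # Hypothetical (edge connectivity, number of vertices) pairs for subgraphs.
    recs = [(3, 4), (2, 6), (1, 5), (3, 3)]
    print(skyline(recs))  # [(3, 4), (2, 6)]
```

Here (1, 5) is dominated by (2, 6) and (3, 3) by (3, 4), so only the two incomparable records remain.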

II. RELATED WORK AND CONTRIBUTION

There is ongoing interest in the research community in knowledge discovery from graph data (Cook and Holder 2007). In this section we briefly present the work on which our contribution builds. There have been significant contributions to the skyline approach. A recent one is the SkyGraph algorithm (Apostolos N. Papadopoulos et al. 2008), in which edge connectivity and the number of vertices are taken as constraints. Another major contribution to constraint-based mining is the FREQT algorithm (Jeroen De Knij et al.). Edge connectivity has also been applied as a clustering tool, where clusters are formed by the vertices of a graph G that show a high degree of connectivity (Hartuv and Shamir 2000; Wu and Leahy 1993). Our work is inspired by the work of Apostolos N. Papadopoulos et al. (2008). The problem we study is formally stated as follows: given a relational graph G, determine whether G satisfies the monotonic constraint over the edge connectivity constraint.

III. PRELIMINARIES

Definition 1 Graphs: A graph G = (V, E) is a pair in which V is a non-empty set of vertices (nodes) and E is either a set of edges $E \subseteq \{\{v,w\} \mid v,w \in V,\ v \neq w\}$ or a set of arcs $E \subseteq \{(v,w) \mid v,w \in V,\ v \neq w\}$. In the latter case the graph is called directed.
Definition 2 The edge connectivity λ(G) of a connected graph G is the minimum number of edges whose removal disconnects G.
Definition 3 The maximum degree (∆) of a graph G is the largest degree over all its vertices.
Definition 4 The minimum degree (δ) of a graph G is the smallest degree over all its vertices.
Definition 5 Monotonic constraint: a constraint Cm is monotonic if, for every subgraph H of a graph G, G satisfies Cm whenever H satisfies it. A graph G is monotonic over the edge connectivity constraint if every subgraph H obtained by decomposing G has edge connectivity less than or equal to that of G.

IV. MONOTONIC AND ANTI-MONOTONIC GRAPHS

Lemma One

A graph G with edge connectivity one satisfies the monotonic property if and only if it is a tree. Proof: Consider a graph G with edge connectivity λ(G) = 1. If G is a tree, its edge connectivity is 1 and the edge connectivity of each of its induced subgraphs is either 1 or 0. Hence G satisfies the monotonic property over the edge connectivity constraint.

Lemma Two

If ∆ is the maximum degree and δ the minimum degree of a given graph G, then the edge connectivity of any subgraph H obtained from G is less than or equal to ∆. Proof: i) A subgraph H obtained from G has minimum degree δ(H) at most ∆, since no vertex has more neighbours in H than it has in G.

ii) The edge connectivity of H cannot be greater than its own minimum degree δ(H), because removing all the edges incident to a vertex of minimum degree disconnects H. Hence the edge connectivity of H is at most ∆.
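The two steps of the proof of Lemma Two combine into one chain of inequalities, written here assuming H has at least two vertices so that λ(H) is defined by an edge cut:

```latex
\lambda(H) \;\le\; \delta(H) \;\le\; \Delta(H) \;\le\; \Delta(G)
```

The first inequality is step ii) (deleting the δ(H) edges at a minimum-degree vertex isolates it), the middle one holds by definition, and the last is step i), since every vertex of H has at most as many neighbours in H as in G.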

Lemma Three

Consider a graph G with maximum degree ∆, minimum degree δ and edge connectivity λG. Let H be the set of all induced subgraphs obtained from G, with edge connectivity λH.

If the maximum degree equals the minimum degree (∆ = δ), then the edge connectivity λH of every induced subgraph is less than or equal to the edge connectivity λG of the given graph. The given graph does not contain any bridges. Proof: Since ∆ = δ, the edge connectivity λG of the given graph is less than or equal to ∆ (equivalently, δ). The edge connectivity λH of a subgraph H is always less than or equal to ∆ by Lemma Two. Hence the given graph always satisfies the monotone constraint over edge connectivity.

Lemma Four

Consider a graph G with maximum degree ∆, minimum degree δ and edge connectivity λG. Let H be the set of all induced subgraphs obtained from G, with edge connectivity λH. If the edge connectivity λG of G is less than the minimum degree δ and there exists a clique whose maximum degree equals δ, then there exists a subgraph whose edge connectivity is greater than the edge connectivity of G; hence G does not satisfy the monotonic property. Proof: Let λG be the edge connectivity and δ the minimum degree of G, with λG < δ, and suppose there exists a clique whose minimum degree equals δ. Such a clique is the complete graph on δ + 1 vertices, so its edge connectivity λH equals δ. This implies λH > λG.

Hence the given graph G satisfies the anti-monotonic property over the edge connectivity constraint.
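Lemma Four can be checked by brute force on a small hypothetical example (not one from the paper): two copies of K4 joined by a single edge. The sketch below computes edge connectivity by trying every set of k edges for increasing k, which is exponential and only suitable for toy graphs; the helper names are illustrative.

```python
from itertools import combinations


def is_connected(vertices, edges) -> bool:
    """Depth-first connectivity test for an undirected graph given as an edge list."""
    vertices = list(vertices)
    if not vertices:
        return True
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


def edge_connectivity_bruteforce(vertices, edges) -> int:
    """Smallest k such that deleting some k edges disconnects the graph (toy graphs only)."""
    if not is_connected(vertices, edges):
        return 0
    for k in range(1, len(edges) + 1):
        for removed in combinations(edges, k):
            remaining = [e for e in edges if e not in removed]
            if not is_connected(vertices, remaining):
                return k
    return len(edges)  # only reached for a single-vertex graph (no edges)


def clique_edges(nodes):
    """All edges of the complete graph on the given nodes."""
    return list(combinations(nodes, 2))


# Two copies of K4 joined by the single edge (3, 4).
left, right = list(range(0, 4)), list(range(4, 8))
V = left + right
E = clique_edges(left) + clique_edges(right) + [(3, 4)]

deg = {v: sum(v in e for e in E) for v in V}
delta, Delta = min(deg.values()), max(deg.values())
lam_G = edge_connectivity_bruteforce(V, E)
lam_H = edge_connectivity_bruteforce(left, clique_edges(left))

print(delta, Delta, lam_G, lam_H)  # 3 4 1 3
```

Here λG = 1 < δ = 3, and the induced K4 is a clique whose vertices all have degree 3 = δ, so λH = 3 > λG, exactly the situation in which Lemma Four says G is not monotone.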

Lemma Five

Consider a graph G with maximum degree ∆, minimum degree δ and edge connectivity λG. Let H be the set of all induced subgraphs obtained from G, with edge connectivity λH. If λG equals the minimum degree δ, δ ≤ n for some n < ∆, and there exists a clique whose maximum degree equals n + 1, then there exists a subgraph whose edge connectivity is greater than the edge connectivity of G; hence G does not satisfy the monotonic property. Proof: Let λG be the edge connectivity and δ the minimum degree of G. If λG = δ, δ ≤ n, ∆ > n and there exists a clique with minimum degree n + 1 (that is, a complete graph on n + 2 vertices), then the edge connectivity λH of that clique is n + 1, which is greater than δ. Hence the given graph G satisfies the anti-monotonic property over the edge connectivity constraint.
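A small hypothetical instance of Lemma Five (not taken from the paper): let G be K4 on vertices {0, 1, 2, 3} plus a fifth vertex 4 joined to vertices 0 and 1. Vertex 4 has degree 2, vertices 0 and 1 have degree 4, vertices 2 and 3 have degree 3, and the two edges at vertex 4 form a minimum edge cut. Taking n = 2:

```latex
\lambda(G) = \delta(G) = 2 \le n = 2 < \Delta(G) = 4,
\qquad
\lambda(K_4) = 3 = n + 1 > \lambda(G)
```

The induced K4 is a clique in which every vertex has degree n + 1 = 3, and its edge connectivity exceeds that of G, so G is not monotone.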

V. ALGORITHM

Algorithm MonotoneGraph (G)
Input: G, initial input graph
Output: Monotone Graph (G)
1. Initialize Graph (G)
2. If Graph (G) contains a bridge && G is a Tree
3.   Then G is Monotone
4. Else calculate Maximum Degree (∆) and Minimum Degree (δ)
5. Endif
6. If Maximum Degree == Minimum Degree
7.   Then G is Monotone
8. Else calculate Edge Connectivity (λ)
9. Endif
10. If Maximum Degree ≠ Minimum Degree
11.   If Minimum Degree == Edge Connectivity && Minimum Degree == n, n < Maximum Degree
12.     If there does not exist a clique with degree (n+1)
13.       Then G is Monotone
14.     Endif
15.   Else If Minimum Degree > Edge Connectivity
16.     If there exists no clique with degree (δ)
17.       Then G is Monotonic
18.     Endif
19. Endif

V.I EXPLANATION

Line 2 to Line 5: The given graph G is checked for a bridge. If it contains a bridge, it is then checked whether G is a tree. If both conditions hold, G satisfies the monotonic property; otherwise the maximum and minimum degree of G are calculated.

Line 6 to Line 9: If the maximum degree of G equals its minimum degree, G satisfies the monotonic property; otherwise the edge connectivity of G is calculated.

Line 10 to Line 14: If the maximum degree of G is not equal to its minimum degree, the algorithm checks whether the minimum degree equals the edge connectivity and, if so, assigns the value of the minimum degree to the variable n. It then determines whether there exists a clique with degree n+1; if no such clique exists, G satisfies the monotonic property.

Line 15 to Line 18: If the minimum degree of G is greater than its edge connectivity, the algorithm determines whether there exists a clique with degree equal to the minimum degree; if no such clique exists, G satisfies the monotonic property.
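The decision logic of MonotoneGraph can be sketched in Python as a hedged translation of the pseudocode. It assumes the graph invariants (tree and bridge flags, δ, ∆, λ and the largest clique degree, i.e. the size of the largest clique minus one) have already been computed; the function name and parameters are hypothetical. Returning False when no branch declares G monotone is our assumption, since the pseudocode does not state what happens in that case.

```python
def monotone_graph(is_tree: bool, has_bridge: bool, min_deg: int, max_deg: int,
                   edge_conn: int, max_clique_degree: int) -> bool:
    """Mirror of the MonotoneGraph pseudocode on precomputed graph invariants.

    max_clique_degree is (size of the largest clique) - 1, the degree every
    vertex has inside that clique. Returns True when a branch declares G monotone.
    """
    # Lines 2-3: a tree (which always contains a bridge) is monotone (Lemma One).
    if has_bridge and is_tree:
        return True
    # Lines 6-7: maximum degree equal to minimum degree means monotone (Lemma Three).
    if max_deg == min_deg:
        return True
    # Lines 11-13: δ == λ, with n = δ (< ∆ here). Monotone unless a clique of
    # degree n + 1 exists (Lemma Five); any larger clique contains such a clique.
    if min_deg == edge_conn:
        n = min_deg
        return max_clique_degree < n + 1
    # Lines 15-17: δ > λ. Monotone unless a clique of degree δ exists (Lemma Four).
    if min_deg > edge_conn:
        return max_clique_degree < min_deg
    # Assumption: not covered by the pseudocode (λ > δ cannot occur for a correct λ,
    # since λ ≤ δ), so report "not shown to be monotone".
    return False


# Figure 4 values: δ = λ = 2, ∆ = 4, clique degree 0 (δ = λ = 2 rules out bridges and trees).
print(monotone_graph(False, False, 2, 4, 2, 0))  # True
```

Passing the invariants in, rather than the graph itself, keeps the sketch focused on the branching order of Lines 2 to 18.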

VI. PERFORMANCE EVALUATION

The software project aims to provide a graphical user interface for various graph mining algorithms. It has been implemented in Java (jdk1.5.0 or jdk1.6.0_12), and the graphs and their attributes are displayed to the user using the Java applet viewer. All graph algorithms were run on an Intel Core i3 processor at 3.20 GHz with 3 GB RAM running Windows XP. The software reads graph parameters such as order and size from an external source (a text file) and uses this information to calculate graph attributes such as vertex/edge connectivity, detected cliques and running time, which are then displayed on the user interface. The aim is to investigate performance by varying the graph parameters while keeping track of the time taken to evaluate each graph.
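The paper does not list how the software computes these attributes. As a minimal sketch, in Python rather than the authors' Java and assuming a graph stored as a symmetric adjacency dictionary, the minimum degree, maximum degree and edge connectivity can be computed as follows. λ(G) is taken as the minimum, over all vertices t other than a fixed s, of the number of edge-disjoint s-t paths, found with unit-capacity augmenting paths (Edmonds-Karp); this is one standard technique, not necessarily the one the authors used.

```python
from collections import deque


def degrees(adj: dict[int, set[int]]) -> tuple[int, int]:
    """Return (minimum degree δ, maximum degree ∆) of an undirected graph."""
    ds = [len(nbrs) for nbrs in adj.values()]
    return min(ds), max(ds)


def edge_disjoint_paths(adj: dict[int, set[int]], s: int, t: int) -> int:
    """Max number of edge-disjoint s-t paths via unit-capacity augmenting paths."""
    # Each undirected edge {u, v} becomes arcs u->v and v->u of capacity 1
    # (adj is assumed symmetric: v in adj[u] iff u in adj[v]).
    cap = {(u, v): 1 for u in adj for v in adj[u]}
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Push one unit of flow along the path found, updating residual capacities.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def edge_connectivity(adj: dict[int, set[int]]) -> int:
    """λ(G): a minimum edge cut separates a fixed vertex s from some other vertex t."""
    nodes = list(adj)
    if len(nodes) < 2:
        return 0
    s = nodes[0]
    return min(edge_disjoint_paths(adj, s, t) for t in nodes[1:])


if __name__ == "__main__":
    k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
    print(degrees(k4), edge_connectivity(k4))  # (3, 3) 3
```

Fixing s works because any minimum edge cut places s on one side and at least one other vertex t on the other, and by Menger's theorem that cut size equals the number of edge-disjoint s-t paths.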



The various phases of the software for evaluating and displaying a particular scanned graph are as follows:
1. Scan the text file that the user chooses on the interface (by clicking the corresponding file link) and store the graph data in the corresponding variables.
2. Using these variables, define and store a particular path in variables so that the graph can be drawn and displayed on the user interface.
3. Calculate the number of bridges present in the graph.
4. Calculate the edge connectivity.
5. Calculate the cliques detected.
6. Check whether the given graph is a tree.
7. Calculate the minimum and maximum degree of the graph.
8. Check whether the graph is monotonic, using the edge connectivity, clique and tree information.
These graph attributes are then displayed to the user.
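The file-scanning, bridge and tree steps in this list can be sketched as follows. The paper does not give the text-file layout, so the one-edge-per-line "u v" format read by the hypothetical parse_edge_list is an assumption. Bridges are found with Tarjan's low-link method (a DFS tree edge (u, v) is a bridge when low[v] > disc[u]), and the tree test uses union-find (a graph is a tree when it is acyclic and has |E| = |V| - 1 edges).

```python
from collections import defaultdict


def parse_edge_list(text: str) -> list[tuple[int, int]]:
    """Hypothetical file format: one undirected edge 'u v' per line."""
    edges = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            edges.append((int(parts[0]), int(parts[1])))
    return edges


def find_bridges(edges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Tarjan's bridge finding; recursive, so very deep graphs may need an iterative DFS."""
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    disc: dict[int, int] = {}
    low: dict[int, int] = {}
    bridges: list[tuple[int, int]] = []
    timer = 0

    def dfs(u: int, parent_edge: int) -> None:
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue  # do not reuse the tree edge we arrived by
            if v in disc:
                low[u] = min(low[u], disc[v])  # edge to an already visited vertex
            else:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.append(edges[idx])

    for u in list(adj):
        if u not in disc:
            dfs(u, -1)
    return bridges


def is_tree(edges: list[tuple[int, int]]) -> bool:
    """Union-find: acyclic with |E| = |V| - 1 edges means connected, hence a tree."""
    vertices = {x for e in edges for x in e}
    parent = {v: v for v in vertices}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge closes a cycle
        parent[ru] = rv
    return len(edges) == len(vertices) - 1


edges = parse_edge_list("0 1\n1 2\n2 0\n2 3\n3 4\n4 5\n5 3\n")
print(find_bridges(edges), is_tree(edges))  # [(2, 3)] False
```

In the sample, two triangles joined by the edge (2, 3) give exactly one bridge, and the cycles make the graph not a tree.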

VII. FIGURES AND RESULTS

Figure 1

The above window shows the software detecting a clique. The graph is monotonic since its minimum degree equals its maximum degree (both are 5).

Figure 2

The above window shows the software detecting a tree. The graph is monotonic since its edge connectivity is one and it is a tree.

Figure 3

In the above window, the minimum degree (δ) is greater than the edge connectivity (λ) and the clique degree is at most the minimum degree, i.e. 4 > 3 and 3 ≤ 4 (condition satisfied).

Figure 4

In the above window, the minimum degree (δ) equals the edge connectivity (λ), the minimum degree n is less than the maximum degree, and the clique degree is at most n, i.e. 2 = 2, n = 2 < 4 and 0 ≤ 2 (condition satisfied).

VIII. CONCLUSION

In this paper we proposed a novel way to detect whether a given graph satisfies the monotonic property over the edge connectivity constraint. We developed an efficient algorithm, MonotoneGraph (G), which determines whether the given graph is monotone over the edge connectivity constraint. Software written in the Java programming language has been designed to assist in determining this property: it computes the maximum degree, minimum degree and edge connectivity of the given graph G and determines whether a clique exists in G.

REFERENCES

[1] A. N. Papadopoulos, A. Lyritsis, and Y. Manalopoulos, "SkyGraph: an algorithm for important subgraph discovery in relational graphs," Data Mining and Knowledge Discovery (DMKD), Springer, 23 Jun 2008, pp. 57-76.
[2] D. J. Cook and L. B. Holder (eds.), Mining Graph Data, Wiley, London, 2007.
[3] E. Hartuv and R. Shamir, "A clustering algorithm based on graph connectivity," Information Processing Letters, vol. 76, pp. 175-181, 2000.
[4] Z. Wu and R. Leahy, "An optimal graph theoretic approach to data clustering: theory and its application to image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 11, pp. 1101-1113, 1993.
[5] J. D. Knij and A. Feelders, "Monotone Constraints in Frequent Tree Mining."



METRICS OF A NEW SYMMETRICAL ENCRYPTION ALGORITHM

1 G. RAMESH, 2 Dr. R. UMARANI

1 Research Scholar, Research and Development Centre, Bharathiyar University, Coimbatore, Tamil Nadu. [email protected]

2 Associate Professor in Computer Science, Sri Sarada College for Women, Salem-16. [email protected]

Abstract - Hacking is the greatest problem in wireless local area networks (WLANs). Many algorithms, such as DES, 3DES, AES, CAST, UMARAM and RC6, have been used to protect against outside attacks that eavesdrop on traffic or prevent data from reaching the end user correctly, and authentication protocols have been used for authentication and key exchange. This paper proposes a new symmetrical encryption algorithm that prevents outside attackers from obtaining any information from data exchanged over a WLAN. The new algorithm avoids key exchange between users and reduces the time taken for encryption, decryption and authentication. It operates at a higher data rate than DES, 3DES, AES, UMARAM and RC6, and it is more secure than DES, 3DES, AES, CAST, UMARAM and RC6. It is applied to a text file and an image as an application. The encryption algorithms DES, 3DES, AES, CAST, UMARAM and RC6 are compared under different settings: different data block sizes, different data types, battery power consumption, different key sizes and, finally, encryption/decryption speed. Experimental results are given to demonstrate the effectiveness of each algorithm.

Keywords: Plaintext; Encryption; Decryption; S-Box; Key updating; Outside attack; Key generation for proposed algorithm

I. INTRODUCTION

Wireless Local Area Network (WLAN) is one of the fastest-growing technologies; WLANs are found in office buildings, colleges, universities and many other public areas [1]. Security in a WLAN is based on cryptography, the science and art of transforming messages to make them secure and immune to attacks, including authenticating the sender to the receiver within the WLAN. Many symmetric encryption algorithms are used in WLANs, such as DES [2], TDES [3], AES [4], CAST-256, RC6 [5] and UMARAM [6]. In all of these algorithms, the sender and receiver use the same key for encryption and decryption respectively. Attacks on WLAN security exploit the information that the computer systems on the WLAN reveal (such as the company title, the types of data transferred in the WLAN, and the algorithms and authentication protocol used). Each company sends its title with each message, so an outside attacker can use this fixed, known plaintext (the company title) together with its encrypted form to obtain the key used in the WLAN. An attacker can also act as a "fox": he can claim that his own device has a problem and ask to use a computer on the WLAN to send an important message, while his device stays open to capture a copy of the encrypted message. Since both the plaintext and the encrypted text are then known, he can easily obtain the key used for encryption and decryption. Authentication protocols such as EAP-TLS [9], EAP-TTLS [9] and PEAP [10] have been used for authentication and key exchange. However, the attacker can be an authorized user who is granted access to the network after successful authentication and key exchange; he can then act maliciously, analysing the exchanged data to eavesdrop or acting as a man in the middle. The proposed algorithm avoids key exchange and the time taken by the authentication process, and it avoids the foxes. It also reduces the time taken for encryption and decryption and operates at a higher data rate than DES, 3DES, AES, UMARAM and RC6 [20]; it is applied to a text file and an image as an application.

This paper examines a method for evaluating the performance of selected symmetric encryption algorithms. Encryption algorithms consume a significant amount of computing resources such as CPU time, memory and battery power. Battery power in particular is drained by encryption, and battery technology is advancing more slowly than other technologies, which causes a "battery gap" [17, 18]. We therefore need a way to make decisions about energy consumption and security that reduces the consumption of battery-powered devices. This study evaluates seven encryption algorithms, namely AES, DES, 3DES, RC6, Blowfish, UMARAM and RC2. The performance of these and the proposed algorithm is measured in terms of energy and power consumption for different data types (text or document, audio and video data), different packet sizes and different key sizes. This paper is organized as follows. Section 2 gives the experimental design for the metrics of the proposed system, Section 3 presents the experimental results, and conclusions are presented in Section 4. The metrics considered are:
1. CPU workload
2. Power consumption
3. Throughput
4. Encryption/decryption time
5. Different data types
6. Different data block sizes

II. EXPERIMENTAL DESIGN FOR METRIC OF PROPOSED SYSTEM:

For our experiments, we use a laptop with a Pentium IV 2.4 GHz CPU, on which the performance data is collected. In the experiments, the laptop encrypts files of different sizes: from 321 KB to 7.129 MB for text data, from 34 KB to 8252 KB for audio data, and from 4006 KB to 5078 KB for video files. Several performance metrics are collected: 1) encryption time; 2) CPU process time; 3) CPU clock cycles and battery power; 4) throughput; 5) different data types; and 6) different data block sizes. The encryption time is the time an encryption algorithm takes to produce a ciphertext from a plaintext. Encryption time is used to calculate the throughput of an encryption scheme, which indicates the speed of encryption. The throughput of the encryption scheme is calculated as the total plaintext in bytes encrypted divided by the encryption time [19]:

Throughput = Total plaintext encrypted (bytes) / Encryption time

The CPU process time is the time during which the CPU is committed only to the particular process of calculation. It reflects the load on the CPU: the more CPU time used in the encryption process, the higher the CPU load. The CPU clock cycles are a metric reflecting the energy consumption of the CPU while it performs encryption operations, since each CPU cycle consumes a small amount of energy. The following tasks are performed:
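The Python sketch below illustrates the timing metrics defined above under several assumptions. `time.perf_counter()` stands in for the encryption (elapsed) time, and `time.process_time()` stands in for the CPU process time of the current process. Clock cycles are only approximated as CPU time multiplied by an assumed fixed clock rate of 2.4 GHz (matching the test laptop and ignoring frequency scaling). `placeholder_cipher` is a hypothetical stand-in workload, not one of the algorithms evaluated here.

```python
import time
from typing import Callable, Dict


def throughput(total_plaintext_bytes: int, encryption_time_s: float) -> float:
    """Throughput = total plaintext encrypted (bytes) / encryption time (s)."""
    if encryption_time_s <= 0:
        raise ValueError("encryption time must be positive")
    return total_plaintext_bytes / encryption_time_s


def approx_cycles(cpu_time_s: float, clock_hz: float = 2.4e9) -> float:
    """Rough cycle estimate: CPU time x assumed fixed clock frequency."""
    return cpu_time_s * clock_hz


def measure(encrypt: Callable[[bytes], bytes], plaintext: bytes) -> Dict[str, float]:
    """Run one encryption and collect elapsed time, CPU time and throughput."""
    wall_start = time.perf_counter()   # elapsed time of the encryption call
    cpu_start = time.process_time()    # CPU time charged to this process only
    encrypt(plaintext)
    cpu_s = time.process_time() - cpu_start
    wall_s = time.perf_counter() - wall_start
    return {
        "encryption_time_s": wall_s,
        "cpu_process_time_s": cpu_s,
        "approx_cycles": approx_cycles(cpu_s),
        # Guard against a zero reading from a coarse timer on tiny inputs.
        "throughput_bytes_per_s": throughput(len(plaintext), wall_s) if wall_s > 0 else float("inf"),
    }


def placeholder_cipher(data: bytes) -> bytes:
    """Hypothetical stand-in workload (byte-wise XOR); replace with a real cipher."""
    return bytes(b ^ 0x5A for b in data)


if __name__ == "__main__":
    for size_kb in (46, 321, 963):  # a few of the text-file sizes plotted in Figure 1
        stats = measure(placeholder_cipher, bytes(size_kb * 1024))
        print(f"{size_kb} KB: {stats}")
```

Averaging several runs per file size is a common way to reduce timer noise in measurements like these.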

• A comparison is conducted between the results of the selected encryption and decryption schemes in terms of encryption time at two different encoding bases, namely hexadecimal encoding and base 64 encoding.

• A study is performed on the effect of changing packet size on power consumption and throughput for each selected cryptographic algorithm.

• A study is performed on the effect of changing data types, such as text or document, audio and video files, on power consumption for each selected cryptographic algorithm.



• A study is performed on the effect of changing the key size of the selected cryptographic algorithms on power consumption.

III. EXPERIMENTAL RESULTS

3.1 Differentiate Output Results of Encryption (Base 64, Hexadecimal)

Figures 1 and 2 give the experimental results for the seven selected encryption algorithms at different encoding methods. Figure 1 shows the results with base 64 encoding, and Figure 2 gives the results with hexadecimal encoding. There is no significant difference between the two encoding methods: the same files were encrypted under both, and the two curves give almost the same results.

3.2 Effect of Changing Packet Size for Cryptographic Algorithms on Power Consumption

3.2.1 Encryption of Different Packet Size

Encryption time is used to calculate the throughput of an encryption scheme. The throughput of each algorithm is calculated by dividing the total plaintext encrypted, in megabytes, by the total encryption time for that algorithm.
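Two details of this section can be sketched in Python, under stated assumptions. First, hexadecimal and base 64 are output encodings applied to the ciphertext after encryption. They change the length of the printed result (two characters per byte for hex, four per three bytes for base 64) but not the work done by the cipher itself. This is one plausible reason the two encodings gave almost the same times. Second, the per-algorithm throughput is total plaintext in megabytes divided by total encryption time. The sketch assumes 1 MB = 1024 KB and per-file times in milliseconds. The sample numbers are hypothetical, not measurements from this paper.

```python
import base64
from typing import Dict, Sequence


def encoded_lengths(ciphertext: bytes) -> Dict[str, int]:
    """Length of the same ciphertext as raw bytes, hexadecimal text and base 64 text."""
    return {
        "raw": len(ciphertext),
        "hex": len(ciphertext.hex()),                 # 2 characters per byte
        "base64": len(base64.b64encode(ciphertext)),  # 4 characters per 3 bytes, padded
    }


def aggregate_throughput_mb_s(sizes_kb: Sequence[float], times_ms: Sequence[float]) -> float:
    """Total plaintext (MB) / total encryption time (s) over all files of one algorithm.

    Assumes 1 MB = 1024 KB and per-file times in milliseconds."""
    if not sizes_kb or len(sizes_kb) != len(times_ms):
        raise ValueError("need exactly one time per file")
    total_s = sum(times_ms) / 1000.0
    if total_s <= 0:
        raise ValueError("total encryption time must be positive")
    return (sum(sizes_kb) / 1024.0) / total_s


if __name__ == "__main__":
    print(encoded_lengths(bytes(48)))  # {'raw': 48, 'hex': 96, 'base64': 64}
    # Hypothetical timings for two 1024 KB files, not measurements from the paper:
    print(aggregate_throughput_mb_s([1024, 1024], [250, 250]))  # 4.0
```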

[Plot: duration time in milliseconds versus packet for text files F1 (46 KB), F2 (59 KB), F3 (100 KB), F4 (247 KB), F5 (321 KB), F6 (694 KB), F7 (899 KB), F8 (963 KB), F9 (5345.28 KB) and F10 (7310.336 KB); series: Blowfish, RC6, Rijndael, DES, 3DES, RC2, PA]

Figure 1: Time consumption of encryption algorithm (base 64 encoding)

As the throughput value increases, the power consumption of the encryption technique decreases. Figure 3 shows the experimental results for this comparison point at the encryption stage. The results show the superiority of the proposed algorithm over the other algorithms in terms of processing time. Another point can be noticed here: RC6 requires less time than all algorithms except the proposed algorithm. A third point is that AES has an advantage over 3DES, DES and RC2 in terms of time consumption and throughput. A fourth point is that 3DES has lower performance than DES in terms of power consumption and throughput. It always requires more time than DES because of its triple-phase encryption. Finally, RC2 has low performance and low throughput compared with the other six algorithms, in spite of the small key size used.

3.2.2 Decryption of Different Packet Size

Figure 4 shows the experimental results for this comparison point at the decryption stage. In decryption, the proposed algorithm is better than the other algorithms in throughput and power consumption. Second, RC6 requires less time than all algorithms except the proposed algorithm. Third, AES has an advantage over 3DES, DES and RC2. Fourth, RC2 still performs poorly compared with these algorithms. Finally, Triple DES (3DES) still requires more time than DES.

[Plot: duration time in milliseconds versus packet for text files F1 (46 KB) to F10 (7310.336 KB); series: PA, Blowfish, RC6, Rijndael, DES, 3DES, RC2]

Figure 2: Time consumption of encryption algorithm (Hexadecimal encoding)

[Bar chart: encryption throughput in megabytes/sec for RC2, DES, 3DES, Rijndael, Blowfish, RC6 and PA]

Figure 3: Throughput of each encryption algorithm (Megabyte/Sec)



3.3 The Effect of Changing File Type (Audio Files) for Cryptography Algorithm on Power Consumption

3.3.1 Encryption of Different Audio Files (Different Sizes)

Encryption Throughput

In the previous section, the encryption algorithms were compared on text and document data files. We now compare them on another data type (audio files) to check which one performs better in this case. Figure 5 shows the experimental results for the audio data type at the encryption stage.

CPU Work Load

Figure 6 shows the performance of the cryptographic algorithms in terms of CPU load for different audio block sizes. The results show the superiority of the proposed algorithm over the other algorithms in terms of processing time (CPU work load) and throughput. Another point can be noticed here: RC6 requires less time than all algorithms except the proposed algorithm. A third point is that AES has an advantage over 3DES, DES and RC2 in terms of time consumption and throughput, especially for small files. A fourth point is that 3DES has lower performance than DES in terms of power consumption and throughput. It always requires more time than DES. Finally, RC2 has low performance and low throughput compared with the other six algorithms, in spite of the small key size used.

[Bar chart: decryption throughput in megabytes/sec for RC2, DES, 3DES, Rijndael, Blowfish, RC6 and PA]

Figure 4: Throughput of each decryption algorithm (Megabyte/Sec)

[Bar chart: encryption throughput in kilobytes/sec for Blowfish, RC6, AES, DES, 3DES, RC2 and PA]

Figure 5: Throughput of each encryption algorithm (Kilo-bytes/Second)

3.3.2 Decryption of Different Audio Files (Different Sizes)

Decryption Throughput

Figure 7 shows the experimental results for this comparison point.

[Plot: duration time in milliseconds versus packet for audio files F1 to F10; series: Blowfish, RC6, Rijndael, DES, 3DES, RC2, PA]

Figure 6: Time consumption for encrypting different audio files

[Bar chart: decryption throughput in kilobytes/sec for Blowfish, RC6, AES, DES, 3DES, RC2 and PA]

Figure 7: Throughput of each Decryption algorithm (Kilobytes / Second)



CPU Work Load

Figure 8 shows the experimental results for this comparison point. The results are the same as those for the encryption of audio files.

3.4 The Effect of Changing File Type (Video Files) for Cryptography Algorithm on Power Consumption

3.4.1 Encryption of Different Video Files (Different Sizes)

Encryption Throughput

We now compare the algorithms on another data type (video files) to check which one performs better in this case. Figure 9 shows the experimental results for the video data type at the encryption stage.

CPU Work Load

Figure 10 shows the performance of the cryptographic algorithms in terms of CPU load for different video block sizes. As with audio files, the results show the superiority of the proposed algorithm over the other algorithms in terms of processing time and throughput. Another point can be noticed here: RC6 still requires less time and has higher throughput than all algorithms except the proposed algorithm. A third point is that 3DES has lower performance than DES in terms of power consumption and throughput. It always requires more time than DES. Finally, RC2 has low performance and low throughput compared with the other six algorithms.

3.4.2 Decryption of Different Video Files (Different Sizes)

Decryption Throughput

Figure 11 shows the experimental results for this comparison point.

[Plot: time consumption for decrypting audio files F1 (33 KB), F3 (2935 KB), F5 (5466 KB), F7 (6800 KB) and F9 (7944 KB); series: Blowfish, RC6, AES, DES, 3DES, PA]

Figure 8: Time consumption for decrypting different audio files

[Bar chart: encryption throughput in kilobytes/sec for RC2, DES, 3DES, Rijndael, Blowfish, RC6 and PA]

Figure 9: Throughput of each encryption algorithm (Kilobytes/sec)

[Plot: duration time in milliseconds versus packet for video files F1 (4006 KB), F2 (4415 KB) and F3 (5078 KB); series: Blowfish, RC6, Rijndael, DES, 3DES, RC2, PA]

Figure 10: Time consumption for encrypting different video files

CPU Work Load

Figure 12 shows the experimental results for this comparison point. The results are the same as those for the encryption of video and audio files.

[Bar chart: decryption throughput in kilobytes/sec for RC2, DES, 3DES, Rijndael, Blowfish, RC6 and PA]

Figure 11: Throughput of each decryption algorithm (Kilobytes/Second)


[Plot: duration time in milliseconds versus packet for video files F1 (4,006 KB), F2 (4,415 KB) and F3 (5,013 KB); series: Blowfish, RC6, Rijndael, DES, 3DES, RC2, PA]

Figure 12: Time consumption for decrypting different video files

3.5 The Effect of Changing Key Size of AES and RC6 on Power Consumption

The last performance comparison point is the effect of changing the key size for the AES and RC6 algorithms. For AES, we consider the three possible key sizes, i.e., 128-bit, 192-bit and 256-bit keys. Figures 13 and 14 show the experimental results.

[Bar chart: time in milliseconds versus key size for AES128, AES192, AES256 and PA 64]

Figure 13: Time consumption for different key size for AES and PA

[Bar chart: duration time in milliseconds versus key size for RC6 256, RC6 192, RC6 128 and PA64]

Figure 14: Time consumption for different key size for RC6 and PA

For AES, a larger key size leads to a clear change in battery and time consumption. Going from a 128-bit key to a 192-bit key increases power and time consumption by about 8%, and a 256-bit key causes an increase of about 16% [12]. For RC6, we likewise consider the three key sizes 128-bit, 192-bit and 256-bit. Figure 14 shows that a larger RC6 key size also leads to a clear change in battery and time consumption.
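The round count fixed by the AES standard (FIPS-197) is one well-documented reason for the AES trend. With Nk the key length in 32-bit words, AES performs Nr = Nk + 6 rounds. That gives 10, 12 and 14 rounds for 128-, 192- and 256-bit keys, and the key schedule expands to 4(Nr + 1) words. The sketch below only computes these counts and does not model time or power. The round count grows by 20% and 40%, which is more than the roughly 8% and 16% increases quoted above. That gap would be consistent with part of each run's cost (for example, I/O and setup) not depending on the number of rounds, but this explanation is an assumption, not something measured in this paper.

```python
def aes_rounds(key_bits: int) -> int:
    """Number of AES rounds: Nr = Nk + 6, where Nk = key length in 32-bit words."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    nk = key_bits // 32
    return nk + 6


def aes_key_schedule_words(key_bits: int) -> int:
    """Size of the expanded key in 32-bit words: 4 * (Nr + 1)."""
    return 4 * (aes_rounds(key_bits) + 1)


def extra_rounds_vs_128(key_bits: int) -> float:
    """Fractional increase in round count relative to AES-128 (0.2 means 20% more)."""
    return aes_rounds(key_bits) / aes_rounds(128) - 1.0


if __name__ == "__main__":
    for bits in (128, 192, 256):
        print(bits, aes_rounds(bits), aes_key_schedule_words(bits),
              f"{extra_rounds_vs_128(bits):.0%}")
```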

IV. CONCLUSION

The selected algorithms, AES, DES, 3DES, RC6, Blowfish, RC2 and the proposed algorithm, were tested. Several points can be concluded from the experimental results. Firstly, there is no significant difference whether the results are displayed in hexadecimal encoding or in base 64 encoding. Secondly, when the packet size is changed, the proposed algorithm performs better than the other common encryption algorithms, followed by RC6. Thirdly, 3DES still has lower performance than DES. Fourthly, RC2 has a disadvantage relative to all other algorithms in terms of time consumption. Fifthly, AES performs better than RC2, DES and 3DES. For audio and video files, the results are the same as for text and documents. Finally, when the key size is changed, a larger key size leads to a clear change in battery and time consumption.



REFERENCES

[1] William Stallings, “Network Security Essentials (Applications and Standards)”, Pearson Education, 2004.
[2] National Bureau of Standards, “Data Encryption Standard”, FIPS Publication 46, 1977.
[3] Jose J. Amador, Robert W. Green, “Symmetric-Key Block Ciphers for Image and Text Cryptography”, International Journal of Imaging System Technology, 2005.
[4] J. Daemen and V. Rijmen, “Rijndael: The Advanced Encryption Standard”, Dr. Dobb's Journal, March 2001.
[5] C. Adams, “Constructing Symmetric Ciphers Using the CAST Design”, Designs, Codes and Cryptography, 1997.
[6] G. Ramesh, R. Umarani, “Data Security in Local Area Network Based on Fast Encryption Algorithm”, International Journal of Computing Communication and Information System (JCCIS), pp. 85-90, 2010.
[7] S. Contini, R. L. Rivest, M. J. B. Robshaw and Y. L. Yin, “The Security of the RC6 Block Cipher, Version 1.0”, August 20, 1998.
[8] D. Simon, B. Aboba and R. Hurst, “The EAP-TLS Authentication Protocol”, RFC 5216, March 2008.
[9] P. Funk and S. Blake-Wilson, “EAP Tunneled TLS Authentication Protocol Version 1 (EAP-TTLSv1)”, The Internet Society, March 2006.
[10] A. Palekar, D. Simon, G. Zorn, J. Salowey, H. Zhou and S. Josefsson, “Protected EAP Protocol (PEAP) Version 2”, work in progress, October 2004.
[11] ANSI X3.106, “American National Standard for Information Systems - Data Encryption Algorithm - Modes of Operation”, American National Standards Institute, 1983.
[12] Bruce Schneier, “Applied Cryptography, Second Edition”, John Wiley & Sons, New York, NY, 1996.
[13] B. Aboba, L. Blunk, J. Vollbrecht, J. Carlson and H. Levkowetz, “Extensible Authentication Protocol (EAP)”, RFC 3748, June 2004.
[14] W. Simpson, “The Point-to-Point Protocol (PPP)”, STD 51, RFC 1661, July 1994.
[15] Institute of Electrical and Electronics Engineers, “Local and Metropolitan Area Networks: Overview and Architecture”, IEEE Standard 802, 1990.
[16] Aamer Nadeem, M. Younus Javed, “A Performance Comparison of Data Encryption Algorithms”, IEEE International Conference on Networking, 2009.
[17] R. Chandramouli, “Battery power-aware encryption”, ACM Transactions on Information and System Security (TISSEC), vol. 9, no. 2, pp. 162-180, May 2006.
[18] K. McKay, “Trade-offs between Energy and Security in Wireless Networks”, Thesis, Worcester Polytechnic Institute, April 2005.
[19] A. A. Tamimi, “Performance Analysis of Data Encryption Algorithms”, retrieved Oct. 1, 2008 (http://www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf/index.html).
[20] G. Ramesh, R. Umarani, “A Novel Symmetrical Encryption Algorithm with High Security Based on Key Updating”, International Journal of Computer Network and Security (IJCNS), gopalax Journals, Vol. 3, No. 1, pp. 57-69, http://www.ijcns.com/pdf/207.pdf



Data Mining Applications: A Comparative Study for Predicting Student’s Performance

Surjeet Kumar Yadav¹, Brijesh Bharadwaj², Saurabh Pal³

¹Research Scholar, Shri Venkateshwara University, Moradabad

Email: [email protected]

²Assistant Professor, Dr. R. M. L. Awadh University, Faizabad, India

Email: [email protected]

³Head, Dept. of MCA, VBS Purvanchal University, Jaunpur, India

Email: [email protected]

Abstract— Knowledge Discovery and Data Mining (KDD) is a multidisciplinary area focusing on methodologies for extracting useful knowledge from data, and there are several useful KDD tools for extracting this knowledge. This knowledge can be used to increase the quality of education, but educational institutions do not apply any knowledge discovery process to these data. Data mining can be used for decision making in the educational system. A decision tree classifier is one of the most widely used supervised learning methods for data exploration and is based on the divide-and-conquer technique. This paper discusses the use of decision trees in educational data mining. Decision tree algorithms are applied to students’ past performance data to generate a model, and this model can be used to predict students’ performance. The model helps to identify dropouts and students who need special attention at an early stage, and it allows the teacher to provide appropriate advising/counseling.

Keywords—Educational Data Mining, Classification, Knowledge Discovery in Database (KDD)

I. INTRODUCTION

Students are the main assets of universities/institutions. Students’ performance plays an important role in producing the best-quality graduates and post-graduates. These graduates will become great leaders and manpower for the country and are thus responsible for its economic and social development. The performance of students in universities should be a concern not only to administrators and educators but also to corporations in the labour market. Academic achievement is one of the main factors employers consider when recruiting workers, especially fresh graduates. Thus, students have to put the greatest effort into their study to obtain good grades and meet employers’ demands. Students’ academic achievement is measured by the Cumulative Grade Point Average (CGPA). CGPA shows students’ overall academic performance because it averages the grades of all examinations across all semesters of the student’s tenure at the university. Many factors could act as barriers or catalysts to students achieving a high CGPA that reflects their overall academic performance.

The advent of information technology in various fields has led to large volumes of data being stored in various formats, such as students’ data, teachers’ data, alumni data and resource data. The data collected from different applications require proper methods of extracting knowledge from large repositories for better decision making. Knowledge discovery in databases (KDD), often called data mining, aims at the discovery of useful information from large collections of data [1]. The main functions of data mining are applying various methods and algorithms to discover and extract patterns from stored data [2]. Data mining tools predict patterns, future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analysis of past events provided by the retrospective tools typical of decision support systems.

There is increasing research interest in using data mining in education. This new emerging field, called Educational Data Mining, is concerned with developing methods that discover knowledge from data originating in educational environments [3]. Educational Data Mining uses many techniques, such as decision trees, neural networks, Naïve Bayes, k-nearest neighbour and many others.

The main objective of this paper is to use data mining methodologies to study students’ performance in their courses. Data mining provides many tasks that could be used to study student performance. In this research, the classification task is used to evaluate students’ performance. Among the many approaches to data classification, the decision tree method is used here. Students’ information such as attendance, class test, seminar and assignment marks was collected from the student management system to predict performance at the end of the semester examination. This paper investigates the accuracy of different decision tree algorithms.

II. BACKGROUND AND RELATED WORKS

As described by Alaa el-Halees [4], data mining techniques can be used in the educational field to enhance our understanding of the learning process, focusing on identifying, extracting and evaluating variables related to students’ learning. Mining in an educational environment is called Educational Data Mining.

Han and Kamber [3] describe data mining software that allows users to analyze data from different dimensions, categorize it and summarize the relationships identified during the mining process.

Bhardwaj and Pal [13] conducted a study on student performance by selecting 300 students from 5 different degree colleges offering the BCA (Bachelor of Computer Application) course of Dr. R. M. L. Awadh University, Faizabad, India. Using a Bayesian classification method on 17 attributes, they found that several factors were highly correlated with academic performance: students’ grade in the senior secondary exam, living location, medium of teaching, mother’s qualification, students’ other habits, family annual income and students’ family status.

Pandey and Pal [5] conducted a study on student performance by selecting 600 students from different colleges of Dr. R. M. L. Awadh University, Faizabad, India. Using Bayes classification on category, language and background qualification, they predicted whether newcomer students would perform well or not.

Hijazi and Naqvi [6] conducted a study on student performance by selecting a sample of 300 students (225 males, 75 females) from a group of colleges affiliated with Punjab University of Pakistan. The hypothesis framed was: "Student's attitude towards attendance in class, hours spent in study on daily basis after college, students' family income, students' mother's age and mother's education are significantly related with student performance". Using simple linear regression analysis, they found that factors such as mother’s education and student’s family income were highly correlated with academic performance.

Khan [7] conducted a performance study on 400 students comprising 200 boys and 200 girls selected from the senior secondary school of Aligarh Muslim University, Aligarh, India with a main objective to establish the prognostic value of different measures of cognition, personality and demographic variables for success at higher secondary level in science stream. The selection was based on cluster sampling technique in which the entire population of interest was divided into groups, or clusters, and a random sample of these clusters was selected for further analyses. It was found that girls with high socio-economic status had relatively higher academic achievement in science stream and boys with low socioeconomic status had relatively higher academic achievement in general.

Z. J. Kovacic [15] presented a case study on educational data mining to identify to what extent enrolment data can be used to predict students’ success. The CHAID and CART algorithms were applied to the enrolment data of information systems students at the Open Polytechnic of New Zealand to obtain two decision trees classifying successful and unsuccessful students. The accuracies obtained with CHAID and CART were 59.4% and 60.5%, respectively.

Galit [8] gave a case study that uses students’ data to analyze their learning behavior, predict their results and warn students at risk before their final exams.

Al-Radaideh et al. [9] applied a decision tree model to predict the final grades of students who studied the C++ course at Yarmouk University, Jordan, in 2005. Three different classification methods were used, namely ID3, C4.5 and Naïve Bayes. Their results indicated that the decision tree model gave better predictions than the other models.

Baradwaj and Pal [16] obtained university students’ data, such as attendance, class test, seminar and assignment marks, from the students’ previous database to predict performance at the end of the semester.

Ayesha, Mustafa, Sattar and Khan [11] describe the use of the k-means clustering algorithm to predict students’ learning activities. The information generated by applying the data mining technique may be helpful to instructors as well as students.

Pandey and Pal [11] conducted a study on student performance by selecting 60 students from a degree college of Dr. R. M. L. Awadh University, Faizabad, India. Using association rules, they found students’ interest in choosing the class teaching language.



Bray [12], in his study on private tutoring and its implications, observed that the percentage of students receiving private tutoring in India was relatively higher than in Malaysia, Singapore, Japan, China and Sri Lanka. It was also observed that there was an enhancement of academic performance with the intensity of private tutoring and this variation of intensity of private tutoring depends on the collective factor namely socioeconomic conditions.

III. DECISION TREE INTRODUCTION

A decision tree is a flow-chart-like tree structure in which each internal node is denoted by a rectangle and each leaf node by an oval. All internal nodes have two or more child nodes. All internal nodes contain splits, which test the value of an expression of the attributes. Arcs from an internal node to its children are labelled with distinct outcomes of the test. Each leaf node has a class label associated with it.

The decision tree classifier has two phases [3]: i) the growth (build) phase and ii) the pruning phase. In the first phase, the tree is built by recursively splitting the training set based on locally optimal criteria until all or most of the records in each partition bear the same class label. The tree may overfit the data.

The pruning phase handles the problem of over fitting the data in the decision tree. The prune phase generalizes the tree by removing the noise and outliers. The accuracy of the classification increases in the pruning phase.

TABLE I: FREQUENCY USAGE OF DECISION TREE ALGORITHMS

Algorithm        Usage frequency (%)
CLS              9
ID3              68
IDE3+            4.5
C4.5             54.55
C5.0             9
CART             40.9
Random Tree      4.5
Random Forest    9
SLIQ             27.27
Public           13.6
OCI              4.5
Clouds           4.5

The pruning phase accesses only the fully grown tree, whereas the growth phase requires multiple passes over the training data. The time needed for pruning the decision tree is therefore very small compared with the time needed to build it. Table I shows the usage frequency of various decision tree algorithms [17]. According to the table, the most frequently used decision tree algorithms are ID3, C4.5 and CART. Hence, the experiments are conducted on these three algorithms.

A. ID3 (Iterative Dichotomiser 3)

This is a decision tree algorithm introduced in 1986 by J. Ross Quinlan [14]. It is based on Hunt’s algorithm. The tree is constructed in two phases: tree building and pruning.

ID3 uses the information gain measure to choose the splitting attribute. It only accepts categorical attributes when building a tree model. It does not give accurate results when there is noise, so a pre-processing technique has to be used to remove the noise.

To build the decision tree, information gain is calculated for every attribute, and the attribute with the highest information gain is selected as the root node. The possible values of this attribute are represented as arcs. Then all possible outcome instances are tested to check whether they fall under the same class. If all the instances fall under the same class, the node is represented with a single class name. Otherwise, a splitting attribute is chosen to classify the instances.

The ID3 algorithm can handle continuous attributes either by discretizing them or directly, by finding the best split point through a threshold on the attribute values. ID3 does not support pruning.
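The ID3 procedure described above can be sketched in Python: compute information gain for every attribute, split on the best one, and stop when a node is pure. This is a minimal illustration for categorical attributes only, without pruning or noise handling. The toy records reuse variable names from Table II but are invented, not taken from the study's data set.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence, Union

Row = Dict[str, str]
Tree = Union[str, Dict[str, Dict[str, "Tree"]]]


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def partition(rows: List[Row], attr: str) -> Dict[str, List[Row]]:
    """Group rows by their value of attr (one arc per value)."""
    groups: Dict[str, List[Row]] = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row)
    return groups


def information_gain(rows: List[Row], attr: str, target: str) -> float:
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    n = len(rows)
    remainder = sum(len(g) / n * entropy([r[target] for r in g])
                    for g in partition(rows, attr).values())
    return entropy([r[target] for r in rows]) - remainder


def id3(rows: List[Row], attrs: List[str], target: str) -> Tree:
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # all instances in one class -> leaf
        return labels[0]
    if not attrs:                      # no attribute left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {value: id3(group, rest, target)
                   for value, group in partition(rows, best).items()}}


if __name__ == "__main__":
    toy = [  # hypothetical records, not the study's data
        {"ATT": "Good", "ASS": "Yes", "ESM": "First"},
        {"ATT": "Good", "ASS": "No", "ESM": "First"},
        {"ATT": "Poor", "ASS": "Yes", "ESM": "Fail"},
        {"ATT": "Poor", "ASS": "No", "ESM": "Fail"},
    ]
    print(id3(toy, ["ATT", "ASS"], "ESM"))  # {'ATT': {'Good': 'First', 'Poor': 'Fail'}}
```

In the toy data, attendance separates the classes perfectly (gain 1 bit) while assignment submission carries no information (gain 0), so ATT becomes the root.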

B. C4.5

This algorithm is a successor to ID3, developed by J. Ross Quinlan [14]. It is also based on Hunt’s algorithm. C4.5 handles both categorical and continuous attributes to build a decision tree. To handle continuous attributes, C4.5 splits the attribute values into two partitions based on a selected threshold: all values above the threshold form one child, and the remaining values form the other. It also handles missing attribute values. C4.5 uses Gain Ratio as its attribute selection measure to build a decision tree, which removes the bias of information gain towards attributes with many outcome values.

First, the gain ratio of each attribute is calculated, and the attribute with the maximum gain ratio becomes the root node. C4.5 uses pessimistic pruning to remove unnecessary branches in the decision tree and improve classification accuracy.
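The sketch below shows the two C4.5 ingredients described above: the gain ratio and a binary threshold split for a continuous attribute. It is a simplified illustration, not Quinlan's implementation. Candidate thresholds are taken as midpoints between consecutive distinct sorted values, whereas C4.5 itself picks thresholds from values present in the data. Missing values and pessimistic pruning are not handled, and the sample attendance percentages are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence, Tuple


def entropy(items: Sequence) -> float:
    """Shannon entropy in bits of the value distribution in items."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values: Sequence[str], labels: Sequence[str]) -> float:
    """Gain ratio = information gain / split information (entropy of the attribute itself)."""
    n = len(labels)
    groups: Dict[str, List[str]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


def best_threshold(values: Sequence[float], labels: Sequence[str]) -> Tuple[float, float]:
    """Best binary split 'value <= t' / 'value > t' for a continuous attribute.

    Candidates are midpoints between consecutive distinct sorted values.
    Returns (threshold, information gain); threshold is nan if no split exists."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = float("nan"), -1.0
    for (lo, _), (hi, _) in zip(pairs, pairs[1:]):
        if lo == hi:
            continue
        t = (lo + hi) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


if __name__ == "__main__":
    # Hypothetical attendance percentages and pass/fail outcomes:
    att = [55, 58, 65, 72, 85, 90]
    outcome = ["Fail", "Fail", "Pass", "Pass", "Pass", "Pass"]
    print(best_threshold(att, outcome))   # threshold 61.5 separates the classes perfectly
    # An ID-like attribute with a unique value per record gets a gain ratio penalty:
    print(gain_ratio(["a", "b", "c", "d"], ["First", "First", "Fail", "Fail"]))  # 0.5
```

The last line illustrates the bias correction described above. A unique-per-record attribute has the maximum possible information gain (1 bit here), but its split information of 2 bits halves its gain ratio.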

C. CART

CART [18], which stands for Classification And Regression Trees, was introduced by Breiman. It is also based on Hunt’s algorithm. CART handles both categorical and continuous attributes to build a decision tree, and it handles missing values.

CART uses the Gini Index as its attribute selection measure to build a decision tree. Unlike the ID3 and C4.5 algorithms, CART produces binary splits, so it produces binary trees. Unlike the measures used by ID3 and C4.5, the Gini Index measure does not use probabilistic assumptions. CART uses cost-complexity pruning to remove unreliable branches from the decision tree and improve accuracy.
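The CART selection measure described above can be sketched with the Gini index, Gini(D) = 1 - Σ p_k², where p_k is the fraction of records in class k. The sketch combines it with an exhaustive search over binary groupings of a categorical attribute's values. This brute-force search is an illustrative simplification that is only practical for attributes with few values. Pruning is not shown, and the sample class-test grades are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini index of a node: 1 - sum over classes of p_k squared."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(values: Sequence[str], labels: Sequence[str]) -> Tuple[FrozenSet[str], float]:
    """CART-style binary split of a categorical attribute.

    Tries every non-empty proper subset S of the attribute's values
    (left branch: value in S, right branch: the rest) and returns the subset
    with the lowest weighted Gini index. Returns (frozenset(), inf) if the
    attribute has only one value and therefore cannot be split."""
    levels = sorted(set(values))
    n = len(labels)
    best_subset: FrozenSet[str] = frozenset()
    best_impurity = float("inf")
    for size in range(1, len(levels)):
        for subset in combinations(levels, size):
            s = frozenset(subset)
            left = [y for v, y in zip(values, labels) if v in s]
            right = [y for v, y in zip(values, labels) if v not in s]
            impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
            if impurity < best_impurity:
                best_subset, best_impurity = s, impurity
    return best_subset, best_impurity


if __name__ == "__main__":
    # Hypothetical class-test grades (CTG) and outcomes:
    ctg = ["Poor", "Poor", "Average", "Average", "Good", "Good"]
    outcome = ["Fail", "Fail", "Pass", "Pass", "Pass", "Pass"]
    print(best_binary_split(ctg, outcome))  # (frozenset({'Poor'}), 0.0)
```

Because every split is two-way, the three CTG values are grouped as {Poor} versus {Average, Good} rather than receiving one branch each as in ID3.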

IV. DATA MINING PROCESS

In the present-day educational system, a student’s performance is determined by internal assessment and the end semester examination. The teacher carries out the internal assessment based on the student’s performance in educational activities such as class tests, seminars, assignments, general proficiency, attendance and lab work. The end semester examination mark is the score the student obtains in the semester examination. Each student has to obtain minimum marks in both the internal assessment and the end semester examination to pass a semester.

A. Data Preparations

The data set used in this study was obtained from VBS Purvanchal University, Jaunpur (Uttar Pradesh), India, by sampling the Computer Applications department's MCA (Master of Computer Applications) course from session 2008 to 2011. The initial size of the data set is 48. In this step, data stored in different tables was joined into a single table, and errors were removed after the joining process.

B. Data Selection and Transformation

In this step, only the fields required for data mining were selected. A few derived variables were selected, while some of the information for the variables was extracted directly from the database. All the predictor and response variables derived from the database are given in Table II for reference.

The domain values for some of the variables were defined for the present investigation as follows:

• PSM – Previous Semester Marks/Grade obtained in the MCA course. It is split into four class values: First – ≥ 60%, Second – ≥ 45% and < 60%, Third – ≥ 36% and < 45%, Fail – < 36%.

TABLE II: STUDENT-RELATED VARIABLES

Variable | Description | Possible Values
PSM | Previous Semester Marks | {First ≥ 60%, Second ≥ 45% & < 60%, Third ≥ 36% & < 45%, Fail < 36%}
CTG | Class Test Grade | {Poor, Average, Good}
SEM | Seminar Performance | {Poor, Average, Good}
ASS | Assignment | {Yes, No}
ATT | Attendance | {Poor, Average, Good}
LW | Lab Work | {Yes, No}
ESM | End Semester Marks | {First ≥ 60%, Second ≥ 45% & < 60%, Third ≥ 36% & < 45%, Fail < 36%}

• CTG – Class test grade obtained. In each semester two class tests are conducted, and the average of the two class tests is used to calculate the sessional marks. CTG is split into three classes: Poor – < 40%, Average – ≥ 40% and < 60%, Good – ≥ 60%.

• SEM – Seminar performance obtained. In each semester, seminars are organized to check the performance of students. Seminar performance is evaluated in three classes: Poor – both presentation and communication skills are weak, Average – either presentation or communication skill is good, Good – both presentation and communication skills are good.

• ASS – Assignment performance. In each semester, each teacher gives the students two assignments. Assignment performance is divided into two classes: Yes – the student submitted the assignments, No – the student did not submit the assignments.

• ATT – Attendance of the student. A minimum of 70% attendance is compulsory to sit the end semester examination, although in special cases, for genuine reasons, students with low attendance are also allowed to sit it. Attendance is divided into three classes: Poor – < 60%, Average – ≥ 60% and < 80%, Good – ≥ 80%.



• LW – Lab work. Lab work is divided into two classes: Yes – the student completed the lab work, No – the student did not complete the lab work.

• ESM – End semester marks obtained in the MCA semester; this is declared as the response variable. It is split into four class values: First – ≥ 60%, Second – ≥ 45% and < 60%, Third – ≥ 36% and < 45%, Fail – < 36%.
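The class boundaries above are easy to misapply at the edges, for example whether exactly 60% counts as First or Second. Below is a minimal Python sketch of the discretization. It assumes the thresholds apply to percentages exactly as stated and that CTG uses the plain average of the two class tests. The function names are hypothetical, because the study does not describe its preprocessing code.

```python
def marks_class(percent):
    """PSM and ESM classes: First >= 60%, Second 45-60%, Third 36-45%, Fail < 36%."""
    if percent >= 60:
        return "First"
    if percent >= 45:
        return "Second"
    if percent >= 36:
        return "Third"
    return "Fail"


def class_test_grade_from_tests(test1, test2):
    """CTG from the average of the two class tests: Poor < 40%, Average 40-60%, Good >= 60%."""
    average = (test1 + test2) / 2
    if average >= 60:
        return "Good"
    if average >= 40:
        return "Average"
    return "Poor"


def attendance_class(percent):
    """ATT classes: Poor < 60%, Average 60-80%, Good >= 80%."""
    if percent >= 80:
        return "Good"
    if percent >= 60:
        return "Average"
    return "Poor"


print(marks_class(52.5), class_test_grade_from_tests(35, 50), attendance_class(72))
# -> Second Average Average
```

Each lower bound is inclusive (≥) and each upper bound exclusive (<), matching the definitions in Table II.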

C. Data Set

The data set of 48 students used in this study was obtained from the Computer Applications department (MCA course) of VBS Purvanchal University, Jaunpur (Uttar Pradesh), for the sessions 2008 to 2011.

TABLE III : DATA SET

S. No. PSM CTG SEM ASS ATT LW ESM

1. First Good Good Yes Good Yes First

2. First Good Average Yes Good Yes First

3. First Good Average No Average No First

4. First Average Good No Good Yes First

5. First Average Average No Good Yes First

6. First Poor Average No Average Yes First

7. First Poor Average No Poor Yes Second

8. First Average Poor Yes Average No First

9. First Poor Poor No Poor No Third

10. First Average Average Yes Good No First

11. Second Good Good Yes Good Yes First

12. Second Good Average Yes Good Yes First

13. Second Good Average Yes Good No First

14. Second Average Good Yes Good No First

15. Second Good Average Yes Average Yes First

16. Second Good Average Yes Poor Yes Second

17. Second Average Average Yes Good Yes Second

18. Second Average Average Yes Poor Yes Second

19. Second Poor Average No Good Yes Second

20. Second Average Poor Yes Average Yes Second

21. Second Poor Average No Poor No Third

22. Second Poor Poor Yes Average Yes Third

23. Second Poor Poor No Average Yes Third

24. Second Poor Poor Yes Good Yes Second

25. Second Poor Poor Yes Poor Yes Third

26. Second Poor Poor No Poor Yes Fail

27. Third Good Good Yes Good Yes First

28. Third Average Good Yes Good Yes Second

29. Third Good Average Yes Good Yes Second

30. Third Good Good Yes Average Yes Second

31. Third Good Good No Good Yes Second

32. Third Average Average Yes Good Yes Second

33. Third Average Average No Average Yes Third

34. Third Average Good No Good Yes Third

35. Third Good Average No Average Yes Third

36. Third Average Poor No Average Yes Third

37. Third Poor Average Yes Average Yes Third

38. Third Poor Average No Poor Yes Fail

39. Third Average Average No Poor Yes Third

40. Third Poor Poor No Good No Third

41. Third Poor Poor No Poor Yes Fail

42. Third Poor Poor No Poor No Fail

43. Fail Good Good Yes Good Yes Second

44. Fail Good Good Yes Average Yes Second

45. Fail Average Good Yes Average Yes Third

46. Fail Poor Poor Yes Average No Fail

47. Fail Good Poor No Poor Yes Fail

48. Fail Poor Poor No Poor Yes Fail
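Weka reads data in its ARFF format, so rows like those in Table III must be serialized before they can be loaded. The following Python sketch writes an ARFF file with one nominal attribute per column of Table III. The function name `to_arff` is hypothetical, only three of the 48 rows are included, and the exact contents of the authors' own file are not given in the text.

```python
# Nominal attributes in the column order of Table III.
ATTRIBUTES = {
    "PSM": ["First", "Second", "Third", "Fail"],
    "CTG": ["Poor", "Average", "Good"],
    "SEM": ["Poor", "Average", "Good"],
    "ASS": ["Yes", "No"],
    "ATT": ["Poor", "Average", "Good"],
    "LW": ["Yes", "No"],
    "ESM": ["First", "Second", "Third", "Fail"],
}

ROWS = [
    ("First", "Good", "Good", "Yes", "Good", "Yes", "First"),     # student 1
    ("First", "Good", "Average", "Yes", "Good", "Yes", "First"),  # student 2
    ("Fail", "Poor", "Poor", "No", "Poor", "Yes", "Fail"),        # student 48
]


def to_arff(relation, attributes, rows):
    """Return the text of an ARFF file: header, nominal attribute declarations, data."""
    lines = [f"@relation {relation}", ""]
    for name, values in attributes.items():
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


print(to_arff("mca", ATTRIBUTES, ROWS))
```

Because ESM is declared last, Weka's Explorer treats it as the class attribute by default.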

D. Model Construction

The Weka Knowledge Explorer is an easy-to-use graphical user interface that harnesses the power of the Weka software. The major Weka packages (Filters, Classifiers, Clusters, Associations and Attribute Selection) are represented in the Explorer, along with a Visualization tool that allows data sets and the predictions of Classifiers and Clusters to be visualized in two dimensions. The workbench contains a collection of visualization tools and algorithms for data analysis and predictive modelling, together with graphical user interfaces for easy access to this functionality. It was originally designed as a tool for analysing data from agricultural domains, but it is now used in many different application areas, in particular for education and research. Its main strengths are that it is freely available under the GNU General Public License; it is very portable, because it is fully implemented in the Java programming language and runs on almost any modern computing platform; and it contains a comprehensive collection of data pre-processing and modelling techniques. Weka supports several standard data mining tasks, such as data clustering, classification, regression, pre-processing, visualization and feature selection. These techniques assume that the data is available as a single flat file or relation, in which each data point is described by a fixed number of attributes. An important area not currently covered by the algorithms included in the Weka distribution is sequence modelling.

From the above data, the mca.arff file was created and loaded into the WEKA Explorer. The Classify panel enables the user to apply classification and regression algorithms to the resulting data set, to estimate the accuracy of the resulting predictive model, and to visualize erroneous predictions or the model itself. There are 16 decision tree algorithms, such as ID3, J48 and SimpleCart, implemented in WEKA; J48 is WEKA's implementation of C4.5. The algorithms used for classification here are ID3, C4.5 and CART. Under "Test options", 10-fold cross-validation was selected as the evaluation approach. Since there is no separate evaluation data set, this is necessary to get a reasonable idea of the accuracy of the generated model. The model is generated in the form of a decision tree.
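The idea behind 10-fold cross-validation is that each of the 48 instances is tested exactly once, by a model trained on the other nine folds. The minimal Python sketch below shows this. It uses shuffled but unstratified folds, whereas Weka's Explorer also stratifies the folds by class for a nominal class attribute. The helper names and the majority-class baseline are hypothetical illustrations and are not part of the study.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=1):
    """Shuffle the instance indices and deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validation_splits(n, k=10, seed=1):
    """Yield (train, test) index lists; every index is tested exactly once."""
    for test in k_fold_indices(n, k, seed):
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        yield train, test


def cross_validated_accuracy(fit, rows, labels, k=10, seed=1):
    """fit(train_rows, train_labels) must return a predict(row) function."""
    correct = 0
    for train, test in cross_validation_splits(len(rows), k, seed):
        predict = fit([rows[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(rows[i]) == labels[i] for i in test)
    return correct / len(rows)


def majority_fit(train_rows, train_labels):
    """Trivial baseline: always predict the most frequent training class."""
    top = Counter(train_labels).most_common(1)[0][0]
    return lambda row: top


print(sorted(len(f) for f in k_fold_indices(48)))  # [4, 4, 5, 5, 5, 5, 5, 5, 5, 5]
```

With 48 instances and 10 folds, eight folds hold five test instances and two hold four. A decision tree learner would take the place of `majority_fit`.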



E. Results Obtained

Table IV shows the accuracy of the ID3, C4.5 and CART classification algorithms on the above data set, evaluated with 10-fold cross-validation:

TABLE IV: CLASSIFIERS ACCURACY

Algorithm | Correctly Classified Instances | Incorrectly Classified Instances
ID3 | 52.0833% | 35.4167%
C4.5 | 45.8333% | 54.1667%
CART | 56.25% | 43.75%

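Table IV shows that the CART technique has the highest accuracy of the three methods, 56.25%. The ID3 algorithm also showed an acceptable level of accuracy. Its correct and incorrect percentages sum to only 87.5% because the remaining 12.5% of instances (6 of 48) were left unclassified.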

Table V shows the execution time, in seconds, that each classifier took to build the model on the training data.

TABLE V: EXECUTION TIME TO BUILD THE MODEL

Algorithm | Execution Time (s)
ID3 | 0
C4.5 | 0.02
CART | 0.05

The classification matrices are presented in Tables VI, VII and VIII; they compare the actual and predicted classifications. The classification accuracy for each of the four outcome classes is also given.

TABLE VI: CLASSIFICATION MATRIX-ID3 PREDICTION MODEL

Actual ESM | Predicted First | Predicted Second | Predicted Third | Predicted Fail | % of correct prediction
First | 8 | 3 | 0 | 0 | 72.7%
Second | 4 | 6 | 2 | 0 | 50.0%
Third | 0 | 4 | 7 | 2 | 53.8%
Fail | 0 | 1 | 1 | 4 | 66.7%
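The percentages in the classification matrices, and the overall rates in Table IV, can be recomputed from the raw counts. The Python sketch below copies the counts of Table VI (ID3) and Table VIII (CART). It assumes that any of the 48 instances missing from a matrix were left unclassified. The helper names are hypothetical.

```python
CLASSES = ["First", "Second", "Third", "Fail"]

# Rows = actual ESM class, columns = predicted class, both in CLASSES order.
MATRICES = {
    "ID3": [[8, 3, 0, 0], [4, 6, 2, 0], [0, 4, 7, 2], [0, 1, 1, 4]],   # Table VI
    "CART": [[9, 3, 2, 0], [2, 10, 2, 0], [2, 4, 5, 2], [0, 1, 3, 3]],  # Table VIII
}
N_INSTANCES = 48  # size of the data set in Table III


def per_class_accuracy(matrix):
    """Percentage of each actual class that was predicted correctly."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_rates(matrix, n_instances):
    """(correct %, incorrect %, unclassified %) over all instances."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    classified = sum(sum(row) for row in matrix)
    incorrect = classified - correct
    unclassified = n_instances - classified
    return tuple(100.0 * x / n_instances for x in (correct, incorrect, unclassified))


for name, matrix in MATRICES.items():
    per_class = ", ".join(f"{c} {p:.1f}%" for c, p in zip(CLASSES, per_class_accuracy(matrix)))
    rates = overall_rates(matrix, N_INSTANCES)
    print(f"{name}: {per_class}; correct/incorrect/unclassified = "
          f"{rates[0]:.4f}% / {rates[1]:.4f}% / {rates[2]:.4f}%")
```

For ID3 this prints 72.7%, 50.0%, 53.8% and 66.7% per class and 52.0833% / 35.4167% / 12.5000% overall; the ID3 matrix accounts for only 42 of the 48 instances. For CART it prints 64.3%, 71.4%, 38.5% and 42.9% per class and 56.2500% / 43.7500% / 0.0000% overall.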

TABLE VII: CLASSIFICATION MATRIX-C4.5 PREDICTION MODEL

Actual ESM | Predicted First | Predicted Second | Predicted Third | Predicted Fail | % of correct prediction
First | 8 | 4 | 2 | 0 | 57.1%
Second | 3 | 8 | 2 | 1 | 57.1%
Third | 4 | 4 | 4 | 1 | 30.8%
Fail | 0 | 1 | 5 | 1 | 14.3%

TABLE VIII: CLASSIFICATION MATRIX-CART PREDICTION MODEL

Actual ESM | Predicted First | Predicted Second | Predicted Third | Predicted Fail | % of correct prediction
First | 9 | 3 | 2 | 0 | 64.3%
Second | 2 | 10 | 2 | 0 | 71.4%
Third | 2 | 4 | 5 | 2 | 38.5%
Fail | 0 | 1 | 3 | 3 | 42.9%

The knowledge represented by the decision tree can be extracted and expressed in the form of IF-THEN rules:

IF PSM = ‘First’ AND ATT = ‘Good’ AND CTG = (‘Good’ OR ‘Average’) THEN ESM = ‘First’

IF PSM = ‘First’ AND CTG = ‘Good’ AND ATT = (‘Good’ OR ‘Average’) THEN ESM = ‘First’

IF PSM = ‘Second’ AND ATT = ‘Good’ AND ASS = ‘Yes’ THEN ESM = ‘First’

IF PSM = ‘Second’ AND CTG = ‘Average’ AND LW = ‘Yes’ THEN ESM = ‘Second’

IF PSM = ‘Third’ AND CTG = (‘Good’ OR ‘Average’) AND ATT = (‘Good’ OR ‘Average’) THEN ESM = ‘Second’

IF PSM = ‘Third’ AND ASS = ‘No’ AND ATT = ‘Average’ THEN ESM = ‘Third’

IF PSM = ‘Fail’ AND CTG = ‘Poor’ AND ATT = ‘Poor’ THEN ESM = ‘Fail’

Fig 1. Rule Set generated by Decision Tree

The accuracy of the classifiers on the data set is compared graphically in Fig 2.

Fig 2. Comparison of Classifiers
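The rule set in Fig 1 can be written down as data and applied to a student record. The Python sketch below assumes the rules are tried in the order listed and that the first matching rule wins. That ordering is a choice made here and is not stated in the text. It matters because, for example, a ‘Third’ student with CTG = ‘Average’, ASS = ‘No’ and ATT = ‘Average’ satisfies both the fifth rule (Second) and the sixth (Third). The names `RULES`, `student` and `apply_rules` are hypothetical.

```python
# (conditions, predicted ESM); each condition maps an attribute to its allowed values.
RULES = [
    ({"PSM": {"First"}, "ATT": {"Good"}, "CTG": {"Good", "Average"}}, "First"),
    ({"PSM": {"First"}, "CTG": {"Good"}, "ATT": {"Good", "Average"}}, "First"),
    ({"PSM": {"Second"}, "ATT": {"Good"}, "ASS": {"Yes"}}, "First"),
    ({"PSM": {"Second"}, "CTG": {"Average"}, "LW": {"Yes"}}, "Second"),
    ({"PSM": {"Third"}, "CTG": {"Good", "Average"}, "ATT": {"Good", "Average"}}, "Second"),
    ({"PSM": {"Third"}, "ASS": {"No"}, "ATT": {"Average"}}, "Third"),
    ({"PSM": {"Fail"}, "CTG": {"Poor"}, "ATT": {"Poor"}}, "Fail"),
]


def student(psm, ctg, sem, ass, att, lw):
    """Build a record using the column names of Table III."""
    return {"PSM": psm, "CTG": ctg, "SEM": sem, "ASS": ass, "ATT": att, "LW": lw}


def apply_rules(record, rules=RULES):
    """Return the ESM class of the first matching rule, or None if no rule fires."""
    for conditions, esm in rules:
        if all(record[attr] in allowed for attr, allowed in conditions.items()):
            return esm
    return None


print(apply_rules(student("First", "Good", "Good", "Yes", "Good", "Yes")))   # First (student 1)
print(apply_rules(student("Fail", "Poor", "Poor", "No", "Poor", "Yes")))     # Fail (student 48)
print(apply_rules(student("Second", "Poor", "Poor", "No", "Poor", "Yes")))   # None (student 26)
```

The last call shows that the seven rules do not cover every combination of attribute values, so a caller needs a fallback for records that no rule matches.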



V. CONCLUSION

Data mining is gaining popularity in almost all real-world applications. One data mining technique, classification, is of particular interest to researchers because it can classify data accurately and efficiently for knowledge discovery. Decision trees are popular because they produce classification rules that are easier to interpret than those of other classification methods. Frequently used decision tree classifiers were studied, and experiments were conducted to find the best classifier for student data in order to predict students’ performance in the end semester examination. The experimental results show that CART achieved the highest accuracy of the three algorithms on this data set.

This study will help students and teachers to improve students’ performance. It will also help to identify the students who need special attention, so that appropriate action can be taken to reduce the failure rate in the next semester examination.

REFERENCES

[1] H. Mannila, “Data mining: machine learning, statistics, and databases”, IEEE, 1996.

[2] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From data mining to knowledge discovery in databases”, AAAI Press / The MIT Press, Massachusetts Institute of Technology, ISBN 0-262-56097-6, 1996.

[3] J. Han and M. Kamber, “Data Mining: Concepts and Techniques,” Morgan Kaufmann, 2000.

[4] Alaa el-Halees, “Mining students data to analyze e-Learning behavior: A Case Study”, 2009.

[5] U. K. Pandey and S. Pal, “Data Mining: A prediction of performer or underperformer using classification”, (IJCSIT) International Journal of Computer Science and Information Technology, Vol. 2(2), pp. 686-690, ISSN: 0975-9646, 2011.

[6] S. T. Hijazi, and R. S. M. M. Naqvi, “Factors affecting student’s performance: A Case of Private Colleges”, Bangladesh e-Journal of Sociology, Vol. 3, No. 1, 2006.

[7] Z. N. Khan, “Scholastic achievement of higher secondary students in science stream”, Journal of Social Sciences, Vol. 1, No. 2, pp. 84-87, 2005.

[8] Galit et al., “Examining online learning processes based on log files analysis: a case study”, Research, Reflection and Innovations in Integrating ICT in Education, 2007.

[9] Q. A. Al-Radaideh, E. W. Al-Shawakfa, and M. I. Al-Najjar, “Mining student data using decision trees”, International Arab Conference on Information Technology (ACIT'2006), Yarmouk University, Jordan, 2006.

[10] U. K. Pandey and S. Pal, “A Data mining view on class room teaching language”, (IJCSI) International Journal of Computer Science Issues, Vol. 8, Issue 2, pp. 277-282, ISSN: 1694-0814, 2011.

[11] Shaeela Ayesha, Tasleem Mustafa, Ahsan Raza Sattar, and M. Inayat Khan, “Data mining model for higher education system”, European Journal of Scientific Research, Vol. 43, No. 1, pp. 24-29, 2010.

[12] M. Bray, “The shadow education system: private tutoring and its implications for planners”, 2nd ed., UNESCO, Paris, France, 2007.

[13] B.K. Bharadwaj and S. Pal. “Data Mining: A prediction for performance improvement using classification”, International Journal of Computer Science and Information Security (IJCSIS), Vol. 9, No. 4, pp. 136-140, 2011.

[14] J. R. Quinlan, “Induction of decision trees”, Machine Learning, Vol. 1, pp. 81-106, 1986.

[15] Z. J. Kovacic, “Early prediction of student success: Mining student enrollment data”, Proceedings of Informing Science & IT Education Conference, 2010.

[16] B.K. Bharadwaj and S. Pal. “Mining Educational Data to Analyze Students’ Performance”, International Journal of Advance Computer Science and Applications (IJACSA), Vol. 2, No. 6, pp. 63-69, 2011.

[17] A. C. Stasis, E. N. Loukis, S. A. Pavlopoulos, and D. Koutsouris, “Using decision tree algorithms as a basis for a heart sound diagnosis decision support system”, Information Technology Applications in Biomedicine, 2003. 4th International IEEE EMBS Special Topic Conference, April 2003.

[18] J. R. Quinlan, “C4.5: Programs for Machine Learning”, Morgan Kaufmann Publishers, Inc, 1992.



An Analysis of Fixed Probabilistic Route Discovery Mechanism Using On-Demand Routing Protocols

V. Mathivanan #1, E. Ramaraj *2

#1 Research Scholar, Alagappa University, Karaikkudi, Tamilnadu, India 1 [email protected]

#2 Director, Department of Computer center, Alagappa University Karaikkudi, Tamilnadu, India

2 [email protected]

Abstract— An ad hoc wireless network consists of a set of mobile nodes connected without any central administration. Route discovery in the on-demand routing protocols of mobile ad hoc networks (MANETs) uses flooding: the source node broadcasts a route request (RREQ) packet to its neighbours, and each neighbour in turn rebroadcasts the RREQ packet to its own neighbours until a route to the destination is found. Excessive RREQ packets can cause collisions, consume more bandwidth and degrade network performance. This paper examines the fixed probabilistic (FP) broadcast method applied to existing on-demand routing protocols, namely the ad hoc on-demand distance vector (AODV) routing protocol and the dynamic source routing (DSR) protocol. The NS-2 simulator was used to compare FP-AODV and FP-DSR with the traditional AODV [1] and DSR [2] routing protocols in terms of collision rate, routing overhead, network connectivity and throughput. The simulation results show a significant improvement for FP-AODV and FP-DSR.

Keywords: MANET, AODV, DSR, RREQ, broadcast.

I. INTRODUCTION

Ad hoc wireless networks are multi-hop and operate without the support of any fixed infrastructure; hence they are called infrastructure-less networks. Without a central coordinator, routing is very difficult, and a path between two nodes is set up with the help of intermediate nodes. Routing is the responsibility of the routing protocol. This includes exchanging route information; finding a good path to a destination based on routing metrics such as hop length, minimum power and lifetime of the links; collecting information about path breaks; restoring broken paths with minimum processing power and bandwidth; and utilizing minimum bandwidth. Routing protocols face many challenges, such as mobility, bandwidth constraints, error-prone shared channels and location-dependent contention. The major requirements of a routing protocol in ad hoc wireless networks are minimum route acquisition delay, quick route reconfiguration, loop-free routing, a distributed routing approach, minimum control overhead, scalability, quality of service, support for time-sensitive traffic, and security and privacy.

The major challenge in MANETs is their multi-hop behaviour. Several routing protocols have been proposed for ad hoc networks. They are classified into three categories: proactive or table-driven routing protocols, reactive or on-demand routing protocols, and hybrid routing protocols. In table-driven routing protocols, every node keeps network topology information in the form of routing tables by periodically exchanging information, so routing information is flooded through the whole network. When a node requires a route to a destination, it runs a path-finding algorithm on this information. Examples of proactive routing protocols are Destination Sequenced Distance Vector (DSDV), the Wireless Routing Protocol (WRP) and Cluster Head Gateway Switch Routing (CGSR). Reactive routing protocols do not maintain topology information; whenever a source node requires a route, it initiates a path-finding process. These protocols do not exchange routing information periodically. Examples of reactive protocols are Ad hoc On-demand Distance Vector (AODV), the Temporally Ordered Routing Algorithm (TORA), Location Aided Routing (LAR) and Dynamic Source Routing (DSR). Hybrid routing protocols combine the best features of proactive and reactive routing protocols; examples include the Zone Routing Protocol (ZRP) and Core Extraction Distributed Ad hoc Routing (CEDAR).

In the ad hoc on-demand distance vector routing protocol, the source node initiates an RREQ packet and broadcasts it to its neighbours; this broadcasting is referred to as flooding. For example, a source S may initiate a destination search using an RREQ packet that contains the location of S, the destination ID and some control bits. If the destination has not been reached, each intermediate node that receives the RREQ packet rebroadcasts it to all its neighbours until the destination is found. This blind flooding causes unnecessary collisions and wastes bandwidth, and several optimization techniques have been applied to address the problem. Flooding can be classified into simple or blind flooding, probability-based flooding, area-based flooding and neighbour-knowledge-based methods. Neighbour-knowledge-based flooding is further classified into clustering-based flooding, selection of forwarding neighbours and internal-node-based flooding.

Straightforward flooding is very costly and results in serious redundancy, contention and collision; Ni et al. [3] identified this as the broadcast storm problem. Probabilistic broadcast schemes have recently been suggested for MANETs to alleviate the broadcast storm problem [3] associated with simple flooding. In the probabilistic scheme, each node rebroadcasts a received RREQ packet with a given fixed probability p, which reduces the routing overhead.

This paper presents a performance analysis of two on-demand routing protocols based on probabilistic route discovery, namely FP-AODV and FP-DSR, in order to assess their behaviour in various network operating environments. The rest of the paper is organized as follows: Section 2 describes related work, Section 3 analyses fixed probabilistic route discovery, Section 4 presents the performance analysis of fixed probabilistic route discovery, and Section 5 concludes the paper and outlines future directions.

II. RELATED WORK

Broadcasting in MANETs means that one node sends a packet to all other nodes in the network. Simple flooding is the simplest form of broadcasting: the source node broadcasts a packet to its neighbouring nodes, and each neighbouring node that receives the broadcast packet for the first time rebroadcasts it to its own neighbours. In this way the broadcast propagates outwards from the source node, terminating when every node has received and transmitted the broadcast packet exactly once.

The simple flooding broadcast mechanism ensures full coverage of the entire network: the broadcast packet is guaranteed to be delivered to every node, provided the network is static and connected. In large, dense networks, however, simple flooding may generate far more transmissions than are necessary for the packet to reach every node. Figure 2.1 shows a sample network with five nodes. When node v broadcasts a packet, nodes u, w and x receive it; u, w and x then forward the packet, and finally y also broadcasts it. The figure shows that simple flooding introduces a great deal of broadcast redundancy in this case, since transmissions by nodes v and u alone are enough to complete the broadcast. As the network grows (i.e. the number of nodes increases) and becomes denser, even more transmission redundancy is introduced. Such flooding causes transmission collisions and contention, which degrade network performance. This phenomenon is often referred to in the literature as the broadcast storm problem [3].

Figure 2.1 Example of a MANET of five nodes with redundant transmissions.
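The following Python sketch counts the cost of simple flooding on the five-node example. Figure 2.1 itself is not reproduced, so the topology below is a hypothetical reconstruction from the text: v's neighbours are u, w and x, and y is reachable only through u. The function names are also illustrative.

```python
from collections import deque

# Hypothetical reconstruction of Figure 2.1 (undirected adjacency sets).
TOPOLOGY = {
    "v": {"u", "w", "x"},
    "u": {"v", "y"},
    "w": {"v"},
    "x": {"v"},
    "y": {"u"},
}


def simple_flooding(graph, source):
    """Every node transmits exactly once, on first reception.
    Returns (transmissions, redundant receptions, nodes reached)."""
    received = {source}
    queue = deque([source])
    transmissions = 0
    duplicates = 0
    while queue:
        node = queue.popleft()
        transmissions += 1                 # node broadcasts to all its neighbours
        for nb in graph[node]:
            if nb in received:
                duplicates += 1            # neighbour already had the packet
            else:
                received.add(nb)
                queue.append(nb)
    return transmissions, duplicates, received


def covered_by(graph, forwarders):
    """Nodes reached if only `forwarders` transmit."""
    covered = set(forwarders)
    for f in forwarders:
        covered |= graph[f]
    return covered


print(simple_flooding(TOPOLOGY, "v")[:2])              # (5, 4)
print(covered_by(TOPOLOGY, {"v", "u"}) == set(TOPOLOGY))  # True
```

Simple flooding spends five transmissions and causes four redundant receptions, whereas transmissions by v and u alone already reach all five nodes.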

The broadcast storm problem can be avoided by reducing the number of nodes that forward the broadcast packet. Ni et al. [3] have classified several proposed broadcast algorithms into two categories: probabilistic and deterministic. Williams and Camp [4] have compared the performance of several proposed broadcast approaches, including the probabilistic, counter-based, area-based, neighbour-designated and cluster-based approaches. The following sections briefly describe each of these approaches.

A. Counter-Based Methods

In this technique, when a node receives a broadcast packet, it starts a random assessment delay (RAD) and counts the number of duplicate packets it receives. When the RAD expires, the node rebroadcasts the packet only if the counter does not exceed a threshold value C. If the counter exceeds the threshold when the RAD expires, the node assumes that all its neighbours have received the same packet and refrains from forwarding it. The predefined counter threshold C is the key parameter of this technique. Ni et al. [3] have demonstrated this with 100 nodes, each with a 500 m transmission range, placed in areas ranging from 1500 m x 1500 m to 5500 m x 5500 m. In that setting, a counter-based scheme with C set to 3 or 4 reduces the broadcast redundancy of simple flooding while maintaining comparable reachability.

B. Area-Based Methods

A node using an area-based method evaluates the additional coverage area based on all received redundant transmissions. Note that area-based methods only consider the coverage area of a transmission; they do not consider whether any nodes exist within that area. The additional coverage area is determined by a distance-based scheme or a location-based scheme. For example, if the node receiving the packet is located a few metres from the sender, the additional area covered by forwarding the packet is quite low [3]. At the other extreme, if the receiving node is located at the boundary of the sender’s transmission range, a rebroadcast would reach a significant additional area, about 61% of its transmission area, as suggested in [5].
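The 61% figure follows from circle geometry, assuming idealised circular transmission ranges of equal radius r. Two discs of radius r whose centres are d apart overlap in an area 2r² arccos(d/2r) − (d/2)√(4r² − d²). At d = r the part of the receiver's disc not already covered is (π/3 + √3/2)r², about 61% of πr². A short Python check (the function name is hypothetical):

```python
import math


def additional_coverage_fraction(d, r):
    """Fraction of a rebroadcasting node's disc (radius r) that the sender's disc
    does not already cover, when the two nodes are d apart (0 <= d <= r)."""
    # Lens-shaped overlap of two equal circles whose centres are d apart.
    overlap = 2 * r * r * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r * r - d * d)
    disc = math.pi * r * r
    return (disc - overlap) / disc


for d in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"d = {d:.2f} r -> additional coverage {additional_coverage_fraction(d, 1.0):.1%}")
```

The additional coverage grows from 0% for a co-located receiver to about 60.9% at the edge of the sender's range, which is why nearby receivers gain little by rebroadcasting.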

i) Distance-Based Scheme:

A node compares the distance between itself and each neighbouring node that has previously forwarded a given packet. Upon reception of a previously unseen packet, a random assessment delay (RAD) is initiated and redundant packets are cached. When the RAD expires, the locations of all the sender nodes are examined to see whether any of them is closer than a threshold distance; if so, the node does not rebroadcast. A node using the distance-based scheme therefore needs to know the geographic locations of its neighbours in order to make a rebroadcast decision. A physical layer parameter, such as the signal strength at a node, can be used to gauge the distance to the source of a received packet. Alternatively, if a GPS receiver is available, nodes can include their location information in each packet they transmit. The distance-based scheme succeeds in reaching a large part of the network but does not save many broadcast packets. This is because a node that has received a broadcast packet many times will still rebroadcast it if none of the transmission distances is below the distance threshold.
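The decision taken when the RAD expires reduces to a single comparison over the cached sender positions. The Python sketch below assumes positions are already known, for example from GPS as described above. The function name, the coordinates and the 200 m threshold are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math


def should_rebroadcast_distance(my_pos, sender_positions, threshold):
    """Distance-based decision taken when the RAD expires.

    my_pos           -- (x, y) of this node
    sender_positions -- (x, y) of every node from which a copy was heard during the RAD
    threshold        -- minimum sender distance that still justifies a rebroadcast
    """
    # Suppress the rebroadcast if any sender is closer than the threshold:
    # a nearby sender has already covered most of this node's range.
    return all(math.dist(my_pos, p) >= threshold for p in sender_positions)


# Hypothetical positions in metres, threshold 200 m.
print(should_rebroadcast_distance((0, 0), [(400, 0)], 200))             # True
print(should_rebroadcast_distance((0, 0), [(400, 0), (120, 90)], 200))  # False: 150 m away
```

The first call shows the weakness noted above: the count of copies received plays no part, so only the distance of the closest sender can suppress a rebroadcast.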

ii) Location-Based Scheme:

In a location-based scheme [3], each node is expected to know its own position relative to the position of the sender, using a geolocation technique such as GPS. Whenever a node originates or forwards a broadcast packet, it adds its own location to the packet header. When a neighbouring node first receives the packet, it notes the location of the sender and calculates the additional coverage area obtainable if it were to rebroadcast. If the additional area is less than a threshold value, the node does not rebroadcast, and all future receptions of the same packet are ignored. Otherwise, the node assigns a RAD before delivery. If the node receives a redundant packet during the RAD, it recalculates the additional coverage area and compares that value with the threshold. This comparison is repeated for every redundant broadcast received until either the scheduled send time is reached or the packet is dropped.
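With several senders the uncovered area no longer has a simple closed form. One straightforward way to recompute it after each redundant packet is a Monte Carlo estimate, sketched below in Python. This is an illustrative technique, not necessarily the computation used in [3]. It assumes equal, ideal circular ranges of radius r. The function names, sample count and threshold are hypothetical.

```python
import math
import random


def additional_area_fraction(my_pos, sender_positions, r, samples=20000, seed=0):
    """Monte Carlo estimate of the fraction of this node's transmission disc
    (radius r) that is NOT already covered by the discs of the senders heard so far."""
    rng = random.Random(seed)
    uncovered = 0
    for _ in range(samples):
        # Uniform random point in the disc around my_pos (sqrt gives uniform density by area).
        rho = r * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x = my_pos[0] + rho * math.cos(theta)
        y = my_pos[1] + rho * math.sin(theta)
        if all(math.hypot(x - sx, y - sy) > r for sx, sy in sender_positions):
            uncovered += 1
    return uncovered / samples


def location_based_decision(my_pos, sender_positions, r, threshold):
    """Keep the packet scheduled only while the additional coverage stays at or above the threshold."""
    return additional_area_fraction(my_pos, sender_positions, r) >= threshold


# One sender at the edge of range, then a second copy from the opposite side (r = 1).
print(location_based_decision((0, 0), [(1, 0)], 1.0, threshold=0.4))           # True (about 0.61)
print(location_based_decision((0, 0), [(1, 0), (-1, 0)], 1.0, threshold=0.4))  # False (about 0.22)
```

The second call mirrors the recalculation during the RAD: the redundant copy from the opposite side removes a further 39% of the disc and pushes the additional coverage below the threshold, so the scheduled rebroadcast is cancelled.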

C. Neighbour Knowledge Based Methods

Neighbour-knowledge-based schemes [6] maintain state information about their neighbourhood via a periodic exchange of “hello” packets, and use this information in the decision to rebroadcast. The objective is to predetermine a small subset of nodes for broadcasting a packet such that every node in the network receives it; this subset is often called the forwarding set. The various neighbour-knowledge-based schemes are briefly described below.

i) Forwarding Neighbours Schemes:

In forwarding neighbours schemes, the forwarding status of each node is determined by its neighbours. Specifically, the sender proactively selects a subset of its 1-hop neighbours as forwarding nodes. The forwarding nodes are selected using a connected dominating set (CDS) algorithm, and the identifiers (IDs) of the selected forwarding nodes are piggybacked on the broadcast packet as the forwarder list. Each designated forwarding node in turn designates its own list of forwarding nodes before forwarding the broadcast packet. The Dominant Pruning algorithm [7] is a typical example of a forwarding neighbours scheme. Ideally, the number of forwarding nodes should be minimised to decrease the number of redundant transmissions. However, finding the optimal solution is NP-complete and requires nodes to know the entire topology of the network.

ii) Self Pruning Schemes:

In broadcasting based on a self-pruning scheme [7], each node determines its own status as a forwarding or non-forwarding node, either after the first copy of a broadcast packet is received or after several copies have been received. For example, the authors of [8] have suggested that each node must have at least 2-hop neighbourhood information, which is collected via a periodic exchange of “hello” packets among neighbouring nodes. A node piggybacks its list of known 1-hop neighbours in the headers of “hello” packets and broadcast packets. Each node that receives the packet then constructs the list of its 1-hop and 2-hop neighbours that will be covered by the broadcast. If the receiving node would not reach any additional nodes, it refrains from broadcasting; otherwise it rebroadcasts the packet.
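The self-pruning test described for [8] can be expressed as a set difference over the piggybacked neighbour list. The Python sketch below assumes that each broadcast carries the sender's 1-hop neighbour list. It also assumes a node rebroadcasts exactly when it has at least one neighbour outside that list other than the sender. The function name is hypothetical. The example neighbour sets are a hypothetical reconstruction of Figure 2.1, in which v's neighbours are u, w and x and y is reachable only through u.

```python
def self_pruning_rebroadcast(my_id, my_neighbours, sender_id, sender_neighbours):
    """Rebroadcast only if some of my 1-hop neighbours were not already
    reached by the sender's transmission."""
    already_covered = set(sender_neighbours) | {sender_id}
    return bool(set(my_neighbours) - already_covered - {my_id})


v_neighbours = {"u", "w", "x"}   # list piggybacked on v's broadcast
print(self_pruning_rebroadcast("u", {"v", "y"}, "v", v_neighbours))  # True: y is new
print(self_pruning_rebroadcast("w", {"v"}, "v", v_neighbours))       # False: nothing new
```

On this topology, w and x prune themselves and u forwards. This matches the observation earlier in the section that transmissions by v and u are sufficient.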

iii) Scalable Broadcast Algorithm (SBA):

This algorithm requires that all nodes know their neighbours within a two-hop radius [9]. This neighbour information, together with the identity of the node from which a packet is received, allows a receiving node to determine whether it would reach additional nodes by forwarding the broadcast packet. 2-hop neighbour information is obtained via a periodic exchange of “hello” packets; each “hello” packet contains the node’s identifier and its list of known neighbours. After a node has received a “hello” packet from all its neighbours, it has 2-hop topology information centred on itself.

iv) Multipoint Relaying Algorithm:

In multipoint relaying [10], each node selects a small subset of its 1-hop neighbours as Multipoint Relays (MPRs), sufficient to cover its 2-hop neighbourhood (see Figure 2.2). When a node transmits a broadcast packet, only its MPRs are allowed to forward the packet; only their MPRs forward it in turn, and so on. Using heuristics, each node computes its own MPRs locally from its neighbourhood topology information, which is obtained via a periodic exchange of “hello” packets among neighbouring nodes. Each “hello” packet contains the sender’s ID and its list of neighbours.
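One common heuristic for choosing MPRs works in two stages. It first takes every 1-hop neighbour that is the only route to some 2-hop neighbour, then greedily adds the neighbour covering the most still-uncovered 2-hop nodes. A simplified Python sketch follows, in the style of the heuristic used by OLSR (RFC 3626) but ignoring willingness and other refinements. The function name and the example neighbour table are hypothetical.

```python
def select_mprs(node, neighbours):
    """Greedy MPR selection for `node`; `neighbours` maps each node to its
    set of 1-hop neighbours (as learned from hello packets)."""
    one_hop = neighbours[node]
    two_hop = set().union(*(neighbours[n] for n in one_hop)) - one_hop - {node}
    mprs = set()
    # Step 1: a neighbour that is the only route to some 2-hop node must be an MPR.
    for t in two_hop:
        coverers = [n for n in one_hop if t in neighbours[n]]
        if len(coverers) == 1:
            mprs.add(coverers[0])
    uncovered = set(two_hop)
    for m in mprs:
        uncovered -= neighbours[m]
    # Step 2: repeatedly add the neighbour covering the most uncovered 2-hop nodes.
    while uncovered:
        best = max(sorted(one_hop - mprs), key=lambda n: len(neighbours[n] & uncovered))
        mprs.add(best)
        uncovered -= neighbours[best]
    return mprs


NEIGHBOURS = {
    "a": {"b", "c", "d"},
    "b": {"a", "e", "f"}, "c": {"a", "f"}, "d": {"a", "g"},
    "e": {"b"}, "f": {"b", "c"}, "g": {"d"},
}
print(sorted(select_mprs("a", NEIGHBOURS)))  # ['b', 'd']
```

Here c is not selected because its only 2-hop neighbour, f, is already covered by b. Computing a minimum MPR set exactly is costly, which is why a greedy rule of this kind is used.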

Figure 2.2. Simulator usage from MobiHoc survey for 2000-2005.

D. Cluster-Based Methods

In cluster-based broadcast methods, the network is partitioned into several clusters forming a simple backbone infrastructure. Each cluster has one cluster head that dominates all other members of the cluster, i.e. it is responsible for forwarding packets and selecting forwarding nodes on behalf of the cluster. Two or more overlapping clusters are connected by gateway nodes. Although clustering can be desirable in MANETs, the overhead of forming and maintaining clusters is non-trivial in most cases [11]. The total number of transmissions (i.e. the number of forwarding nodes) is therefore generally used as the cost criterion for broadcasting. The cluster heads and gateway nodes of a given MANET together form a connected dominating set. Finding the minimum number of forwarding nodes, i.e. a minimum connected dominating set, is well known to be NP-complete.

E. Probabilistic Based Methods

Probabilistic broadcasting is one of the simplest and most efficient broadcast techniques suggested in the literature [3]. In this approach, each intermediate node rebroadcasts received packets only with a predetermined forwarding probability. To determine an appropriate forwarding probability, Sasson et al. [12] have suggested the use of random graphs and percolation theory in MANETs. The authors claim that there exists a probability value Pc < 1 such that, with Pc as the forwarding probability, almost all nodes receive a broadcast packet, while reachability improves little for p > Pc. Since Pc differs between MANET topologies, and there is no existing mathematical method for estimating it, many probabilistic approaches use a predefined value for Pc.



The advantage of probabilistic broadcasting over the other proposed broadcast methods [3, 4] is its simplicity. However, studies [3] have shown that although probabilistic broadcast schemes can significantly reduce the degrading effects of the broadcast storm problem, they suffer from poor reachability, especially in sparse network topologies. The authors of [13], however, argue that the poor reachability of these probabilistic broadcast algorithms is due to assigning the same forwarding probability to every node in the network.

Cartigny and Simplot [14] have described a probabilistic scheme in which the forwarding probability p is computed from the local density n (i.e. the number of neighbours of the node considering retransmission). The authors also introduced a fixed parameter k to achieve high reachability. This broadcast scheme has the drawback of being locally uniform, because each node determines its forwarding probability from the fixed efficiency parameter k, which is not globally optimal.

Zhang and Agrawal [15] have described a dynamic probabilistic scheme that combines the probabilistic and counter-based approaches. In this approach, the forwarding probability at a node is set according to the number of duplicate packets received at the node. However, the packet counter at a node does not necessarily equal the exact number of its neighbours, since some neighbours may have suppressed their rebroadcasts according to their local rebroadcast probability.

In [13], the network topology is logically partitioned into sparse and dense regions using local neighbourhood information. Each node located in a sparse region is assigned a high forwarding probability, whereas nodes located in dense regions are assigned a low forwarding probability.
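A sketch of the sparse/dense split follows. It assumes that a node counts as sparse when its neighbour count is below the network-wide average number of neighbours, and that this average is already known to the node. The two probability values (0.8 and 0.4) and the function names are hypothetical placeholders, not the parameters used in [13].

```python
def average_neighbours(adjacency):
    """Mean number of one-hop neighbours, used here as the sparse/dense boundary."""
    return sum(len(nbrs) for nbrs in adjacency) / len(adjacency)


def adjusted_forwarding_probability(num_neighbours, avg_neighbours,
                                    p_sparse=0.8, p_dense=0.4):
    """High probability in sparse regions, low probability in dense regions."""
    return p_sparse if num_neighbours < avg_neighbours else p_dense
```

A node with fewer neighbours than average forwards more aggressively, which addresses the reachability problem in sparse areas. Nodes in crowded areas forward less often, which limits redundant rebroadcasts.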

VII. ANALYSIS OF FIXED PROBABILISTIC ROUTE DISCOVERY

To minimize the overhead associated with the dissemination of broadcast packets in “pure” broadcast scenarios while still maintaining an acceptable level of reachability, probabilistic approaches have been proposed in the literature as an alternative to simple flooding [3, 13]. In the probabilistic schemes, upon receiving a broadcast packet for the first time, a node forwards the packet with a predetermined forwarding probability p and drops it with probability 1 - p, as shown in Table 3.1. Every forwarding node is assigned the same fixed forwarding probability p, and when p = 1 the probabilistic scheme reduces to simple flooding.

Table 3.1. An algorithmic framework for probabilistic route discovery

Algorithm: Fixed Probabilistic Route Discovery

Upon receiving a RREQ packet rq, a node does the following:
  If rq is received for the first time then
    Set the rebroadcast probability p = Pc
    Generate a random number Rnd uniformly over the range [0, 1]
    If Rnd <= p then
      Broadcast rq
    Else
      Drop rq
    Endif
  Else
    Drop rq (duplicate)
  Endif
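The framework in Table 3.1 can be turned into a small runnable handler. The sketch assumes that a RREQ is identified by its originator address and a per-originator request ID (as in AODV). The `RREQ` and `FixedProbabilisticNode` classes and their fields are hypothetical. Each RREQ is subject to at most one forwarding decision, and duplicates are always dropped.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RREQ:
    origin: int      # originator address (hypothetical field name)
    request_id: int  # per-originator RREQ identifier (hypothetical field name)


@dataclass
class FixedProbabilisticNode:
    p: float  # fixed forwarding probability Pc, identical at every node
    rng: random.Random = field(default_factory=random.Random)
    seen: set = field(default_factory=set)

    def on_rreq(self, rq: RREQ) -> bool:
        """Return True if this node rebroadcasts rq, False if it drops it."""
        key = (rq.origin, rq.request_id)
        if key in self.seen:
            return False          # duplicate: the decision was already made once
        self.seen.add(key)
        # rng.random() is uniform on [0, 1), so "< p" forwards with
        # probability exactly p (p = 1 gives simple flooding, p = 0 never forwards).
        return self.rng.random() < self.p
```

For example, `FixedProbabilisticNode(p=0.7).on_rreq(RREQ(origin=3, request_id=1))` forwards the first copy of that request with probability 0.7 and drops every later copy.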

The effects of network density and nodal mobility on probabilistic flooding in a pure broadcast scenario have been analyzed over a wide range of forwarding probabilities [13]. The authors have shown that probabilistic broadcast algorithms can save rebroadcasts in highly mobile and dense networks. However, to the best of my knowledge, no study has yet evaluated the performance impact of probabilistic broadcast on practical applications such as route discovery over a wide range of forwarding probabilities and varying network operating conditions, notably network density, node mobility, traffic load and network size.

Motivated by the above observations, the main objective of this chapter is to conduct an extensive performance analysis, by means of NS-2 [16] simulations, of probabilistic route discovery in two popular on-demand routing protocols, namely AODV [17] and DSR [18]. In probabilistic route discovery, each received RREQ packet is forwarded at most once, with forwarding probability p (see Table 3.1). The analysis covers forwarding probabilities from 0.1 to 1 in steps of 0.1. This simulation study is the first such evaluation to be reported in the literature. It provides insight into the potential performance discrepancies between the two routing protocols and, more significantly, outlines the relative performance of the various forwarding probabilities under varying network operating conditions. The analysis uses the most widely used performance metrics: throughput, delivery ratio, network connectivity, end-to-end delay, routing overhead and collision rate.
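Three of the metrics listed above can be computed from per-packet records as sketched below. The definitions are common ones but are assumptions here, because the text does not define the metrics. Delivery ratio is taken as the fraction of data packets delivered. End-to-end delay is the mean of (receive time - send time) over delivered packets. Routing overhead is taken as control-packet transmissions per delivered data packet. The `DataPacket` record is hypothetical, and parsing an actual NS-2 trace file is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataPacket:
    sent_at: float                # send time in seconds
    received_at: Optional[float]  # None if the packet was never delivered


def delivery_ratio(packets: List[DataPacket]) -> float:
    """Fraction of data packets that reached their destination."""
    if not packets:
        return 0.0
    return sum(1 for pk in packets if pk.received_at is not None) / len(packets)


def average_end_to_end_delay(packets: List[DataPacket]) -> float:
    """Mean delay over delivered packets only (NaN if nothing was delivered)."""
    delays = [pk.received_at - pk.sent_at for pk in packets if pk.received_at is not None]
    return sum(delays) / len(delays) if delays else float("nan")


def normalized_routing_overhead(control_transmissions: int,
                                packets: List[DataPacket]) -> float:
    """Routing (e.g. RREQ/RREP/RERR) transmissions per delivered data packet."""
    delivered = sum(1 for pk in packets if pk.received_at is not None)
    return control_transmissions / delivered if delivered else float("inf")
```

Computing overhead per delivered packet, rather than as a raw count, keeps runs with different throughput comparable.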



F. PERFORMANCE EVALUATION

The performance of fixed-probability broadcast has been evaluated in the NS-2 simulator using the traditional AODV and DSR routing protocols. The NS-2 simulation model consists of topology scenario files and traffic pattern files. The topology scenario files define the simulation area and the mobility model of the randomly distributed mobile nodes over the simulation period. The traffic pattern files, on the other hand, define the characteristics of data communications, notably data packet size, packet type, packet transmission rate and the number of traffic flows. The other simulation parameters used in this study are summarized in Table 4.1.

Table 4.1 System parameters, mobility model and protocol standards used in the simulation experiments

Simulation Parameter        Value
Simulator                   NS-2
Transmitter range           250 m
Bandwidth                   2 Mbps
Interface queue length      50 packets
Traffic type                CBR
Packet size                 512 bytes
Simulation time             900 s
Number of trials            30
Topology size               1000 m x 1000 m
Number of nodes             25, 50, 75, ..., 225
Maximum speed               1 m/s, 5 m/s, 10 m/s, ..., 25 m/s
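The experimental design can be captured as a small configuration sketch. The fixed values come from Table 4.1. The sweep covers both protocols, the 100-node and 150-node networks used in Figures 4.1 to 4.5, p = 0.1 to 1.0, and 30 trials. The class and function names are hypothetical, and generating the actual NS-2 scenario and traffic files is not shown.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class FixedParameters:
    """Values held constant across all runs (from Table 4.1)."""
    transmitter_range_m: float = 250.0
    bandwidth_mbps: float = 2.0
    interface_queue_packets: int = 50
    traffic_type: str = "CBR"
    packet_size_bytes: int = 512
    simulation_time_s: float = 900.0
    topology_m: tuple = (1000.0, 1000.0)


@dataclass(frozen=True)
class Scenario:
    protocol: str
    num_nodes: int
    forwarding_probability: float
    trial: int


def scenarios(protocols=("AODV", "DSR"), node_counts=(100, 150), trials=30):
    """One run per (protocol, network size, p, trial), with p = 0.1, ..., 1.0."""
    probabilities = [k / 10 for k in range(1, 11)]
    return [Scenario(proto, n, p, t)
            for proto, n, p, t in product(protocols, node_counts,
                                          probabilities, range(trials))]
```

With these defaults the sweep contains 2 x 2 x 10 x 30 = 1200 runs. Each (protocol, size, p) point in the figures averages 30 of them.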

As shown in Figure 4.1, when the forwarding probability is reduced from p = 1 (i.e. simple flooding) to p = 0.7, the collision rate in FP-AODV is reduced by approximately 88% and 93% for the 100-node and 150-node networks, respectively. In FP-DSR, the collision rate is reduced by as much as 119% for the 100-node network and by approximately 70% for the 150-node network. As expected, the collision rate for a given network size (i.e. a given number of nodes) decreases almost linearly as the forwarding probability decreases.
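The percentage reductions quoted in this section compare a metric at p = 0.7 with its value under simple flooding (p = 1). A minimal sketch of that computation follows, assuming the reduction is expressed relative to the p = 1 baseline. The function name and the sample numbers in the test are hypothetical, not measured data.

```python
def percent_reduction(baseline: float, reduced: float) -> float:
    """Relative reduction of a metric with respect to its baseline (p = 1) value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - reduced) / baseline
```

Under this definition, a non-negative metric can be reduced by at most 100%. The 119% figure reported for FP-DSR therefore implies a different normalisation, which the text does not state.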

Figure 4.1 Average collision rate vs. forwarding probabilities for 100-node and 150-node networks.

Figure 4.2 reveals that, for a given network density, the routing overhead incurred by each of the routing protocols decreases almost linearly as the forwarding probability decreases. When the probability is reduced from p = 1 to p = 0.7, the routing overhead of FP-AODV is reduced by approximately 54% for the 100-node network and 60% for the 150-node network. For a similar reduction of the forwarding probability in FP-DSR, the routing overhead is reduced only slightly, by approximately 7% in the 100-node network and by about 27% in the 150-node network.

Figure 4.2. Routing overhead vs. forwarding probabilities for 100-node and 150-node networks.

The connectivity success ratio in FP-DSR drops sharply in relatively dense networks (e.g. 150 nodes). As can be seen in Figure 4.3, the connectivity success ratio of FP-AODV is relatively low for both low and high forwarding probabilities (e.g. p < 0.4 and p > 0.7, respectively). For p < 0.4, fewer nodes than optimal are allowed to forward the RREQ packets, which prevents some of the RREQ packets from reaching their destinations. On the other hand, for p > 0.7, more nodes than optimal are allowed to forward the RREQ packets; as a consequence, channel contention and packet collisions increase.

Figure 4.3 Network connectivity vs. forwarding probabilities for 100-node and 150-node networks.

The results in Figure 4.4 show that for FP-AODV, the normalised aggregate throughput in both topology scenarios (i.e. the 100-node and 150-node networks) increases as the forwarding probability increases from 0.1 to 0.6, whereas it decreases as the forwarding probability increases from 0.7 to 1.0. The normalised throughput of FP-DSR at each network density decreases as the forwarding probability increases from 0.1 to 1. The results in Figure 4.4 also show that at low forwarding probabilities the normalised throughput of FP-AODV is lower than that of FP-DSR. However, FP-AODV outperforms FP-DSR when the forwarding probability is set high, particularly in a dense network.

Figure 4.4 Throughput vs. forwarding probabilities for 100-node and 150-node networks.

In Figure 4.5, the average end-to-end packet delay of FP-AODV and FP-DSR is plotted against the forwarding probability. The results show that FP-DSR incurs a higher delay than FP-AODV. This is because FP-DSR often relies on cached routes for data transmission.

Figure 4.5 End-to-end delay vs. forwarding probabilities for 100-node and 150-node networks.



VIII. CONCLUSION

This chapter has conducted the first performance analysis of two on-demand routing protocols based on probabilistic route discovery, namely FP-AODV and FP-DSR, in order to assess their behavior in various network operating environments. The analysis has studied the effects of different network densities by deploying different numbers of nodes over a fixed-size topology area, with the forwarding probability varied from 0.1 to 1 in steps of 0.1. The results show that probabilistic broadcast performs better than simple flooding, notably by reducing collisions and routing overhead. A similar evaluation could be carried out with respect to node mobility, traffic load and other parameters.

REFERENCES

[1]. IEEE Computer Society LAN MAN Standards Committee, "Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications," IEEE Standard 802.11-1997, 1997. Retrieved on January 2, 2008, from the IEEE 802.11 Wireless Local Area Networks website: http://www.ieee802.org/11.

[2]. D. Day and H. Zimmerman, "The OSI reference model," Proceedings of the IEEE, vol. 71, pp. 1334-1340, December, 1983.

[3]. S.-Y. Ni, Y.-C. Tseng, Y.-S. Chen, and J.-P. Sheu, "The broadcast storm problem in a mobile ad hoc network," Proceedings of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking, pp. 152-162, August 1999.

[4]. B. Williams and T. Camp, "Comparison of broadcasting techniques for mobile ad hoc networks," Proceedings of the 3rd ACM international symposium on Mobile ad hoc networking & computing, MOBIHOC, pp. 194 - 205, June 2002.

[5]. Y.-C. Tseng, S.-Y. Ni, and E.-Y. Shih, "Adaptive approaches to relieving broadcast storms in a wireless multihop mobile ad hoc network," IEEE Transactions on Computers, vol. 52, pp. 545-557, May 2003.

[6]. W. Peng and X. C. Lu, "On the reduction of broadcast redundancy in mobile ad hoc networks," proceedings of the ACM Symposium on Mobile and Ad Hoc Networking and Computing (MobiHoc'00), pp. 129-130, August, 2000.

[7]. H. Lim and C. Kim, "Flooding in wireless ad hoc networks," Computer and Communications, vol. 24, February 2003.

[8]. J. Wu and F. Dai, "Broadcasting in ad hoc networks based on self-pruning," International Journal of Foundations of Computer Science, vol. 14, pp. 201-221, April 2003.

[9]. W. Peng and X. C. Lu, "On the reduction of broadcast redundancy in mobile ad hoc networks," proceedings of the ACM Symposium on Mobile and Ad Hoc Networking and Computing (MobiHoc'00), pp. 129-130, August, 2000.

[10]. A. Qayyum, L. Viennot, and A. Laouiti, "Multipoint relaying for flooding broadcast messages in mobile wireless networks," Proceedings of the 35th Annual Hawaii International Conference on System Sciences (HICSS'02), vol. 9, pp. 3866-3875, January 2002.

[11]. B. McDonald and T. F. Znati, "A mobility-based framework for adaptive clustering in wireless ad hoc networks," IEEE JSAC, vol. 17, pp. 1466-1486, August, 1999.

[12]. Y. Sasson, D. Cavin, and A. Schiper, "Probabilistic broadcast for flooding in wireless mobile ad hoc networks," Proceedings of IEEE Wireless Communications and Networking Conference (WCNC), March 2003.

[13]. M. Bani-Yassein, M. Ould-Khaoua, L. M. Mackenzie, and S. Papanastasiou, "Performance Analysis of Adjusted Probabilistic Broadcasting in Mobile Ad Hoc Networks," International Journal of Wireless Information Networks, vol. 13, pp. 127-140, April 2006.

[14]. J. Cartigny and D. Simplot, "Border node retransmission based probabilistic broadcast protocols in ad-hoc networks," Telecommunication Systems, vol. 22, pp. 189-204, 2003.

[15]. G. Lin, G. Noubir, and R. Rajaraman, "Mobility Models for Ad hoc Network Simulation," Proceedings of the 23rd Conference of the IEEE Communications Society (INFOCOM 2004), vol. 1, pp. 454-463, March 2004.

[16]. K. Fall and K. Varadhan, "The Network Simulator Ns-2, the VINT project," http://www.isi.edu/nsnam/ns/ns-build.html, retrieved in December 2007.

[17]. C. Perkins, E. Belding-Royer, and S. Das, "Ad hoc On-Demand Distance Vector (AODV) Routing," IETF Mobile Ad Hoc Networking Working Group, RFC 3561 (Experimental), July 2003, http://www.ietf.org/rfc/rfc3561.txt, retrieved in October 2007.

[18]. D. Johnson, Y. Hu, and D. Maltz, "The Dynamic Source Routing Protocol (DSR)," IETF Mobile Ad Hoc Networking Working Group, RFC 4728 (Experimental), February 2007, http://www.ietf.org/rfc/rfc4728.txt, retrieved in December 2007.

