Knowledge Discovery from Academic Data
using Association Rule Mining
SUBMITTED BY
Rajshakhar Paul
Student ID: 0805020
Shibbir Ahmed
Student ID: 0805097
Submitted to the Department of Computer Science and Engineering in partial
fulfillment of the requirements for the degree of Bachelor of Science in
Computer Science and Engineering
June, 2014
SUPERVISED BY
Dr. Abu Sayed Md. Latiful Haque
Professor
Department of Computer Science and Engineering
BANGLADESH UNIVERSITY OF ENGINEERING AND TECHNOLOGY
Page | i
Certificate
We hereby declare that this work has been done by us and neither this thesis nor any part of it
has been submitted elsewhere for the award of any degree or diploma except for publication.
Rajshakhar Paul Shibbir Ahmed
Acknowledgement
First and foremost, we would like to thank the Almighty Allah that we could complete our thesis
work in time with promising findings.
We express our heartfelt gratitude to our supervisor Dr. Abu Sayed Md. Latiful Haque, who
was very helpful during the entire span of our research work. Without his inspiration, direction,
support and advice, this work would not have been possible.
We would like to thank the Department of Computer Science and Engineering for its support
with resources and materials during the research work. We especially remember our teachers,
who earnestly provided us with encouragement and inspiration for achieving this goal.
We would also like to thank M. Sohel Rahman and Delwar Hossain for their generous assistance
in obtaining the institutional data of students from IICT, BUET.
Last but not least, we are thankful to our parents, family and friends for their support and
tolerance.
Abstract
Educational Data Mining is an emerging interdisciplinary research area focusing on
methodologies for extracting useful knowledge from data originating in an educational context.
The main objective of a higher education institution is to provide quality education to its
students. One way to achieve the highest possible level of quality in a higher education system
is to discover the knowledge hidden in educational data sets and apply it properly. This
knowledge is extractable through data mining techniques. Association Rule Mining aims at
discovering implicative tendencies that can provide valuable information for decision makers,
information not readily provided by other academic data mining techniques such as Decision
Trees, Neural Networks, Naive Bayes, and K-Nearest Neighbor.
In this work, we present applied research on mining association rules from the academic data of
a university. We have discovered knowledge regarding the academic performance and personal
statistics of students. We have developed a technique to transform the existing relational
database of students' academic performance into a universal database format using the academic
and personal data of students. After that, we have transformed the universal format into a
modified format suitable for applying an Association Rule Mining algorithm. We have used the
Apriori algorithm to find interesting association rules from the transformed database, which can
be used to extract knowledge of students' academic progress, decay of potential, and
abandonment as well as retention of students. The impact of courses, curriculum and teaching
methodologies is also derived from the extracted knowledge. We have applied the technique to
institutional data of Bangladesh University of Engineering and Technology, but it can be used
for the benefit of any institution of higher education.
LIST OF CONTENTS
ACKNOWLEDGEMENT …….…………………………………………………ii
ABSTRACT………….………………………………………………………….. iii
CHAPTER 1………….……………………………………………………………1
INTRODUCTION…………………………..…………………………………………………….1
1.1 Problem Definition…….…………………………………………………………………1
1.2 Motivation……....………………………………………………………………….……..2
1.3 Scope of the Work……..………………………………………………………………….3
1.4 Objectives………...………………....……………………………………………………3
1.5 Thesis Organization…...….………………………………………………………………4
CHAPTER 2………………………………………………………………………6
LITERATURE STUDIES………..……………..……………………………………………….6
2.1 Knowledge Discovery and Data Mining ………………….………………………………6
2.2 Association Rule Mining ……….………….……………………………………………..8
2.3 The Apriori Algorithm …………….…………………………………………………….12
2.4 Generating Association Rules from Frequent Itemsets………………………………….18
2.5 Related Works…………………………………………………………………………...19
CHAPTER 3………………………………………………………………..……21
ACADEMIC DATA STRUCTURE AND MINING SYSTEM……………………………….21
3.1 Data Analysis…………………………………………………………………………...21
3.1.1 Personal and Academic Data……………………………...…………………….21
3.1.2 Course and Curriculum …...……………………………..……………………..22
3.2 Preprocessing for Mining Academic Database……………………….………………...23
3.2.1 Relational Database………………………………………...…………………..23
3.2.2 Universal Database……………………………………..………………………23
3.2.3 Data Transformation………………………………..…………………………..24
3.3 Summary of Methodologies……………………………………………………………27
CHAPTER 4 ……..………………………………………………………….….28
SYSTEM IMPLEMENTATION, RESULTS AND DISCUSSIONS……………………..…28
4.1 Software Implementation ……………………………………………………………...28
4.2 Dataset and Application Environment…………………………………………………29
4.3 Results and Discussions………………………………………………………………..30
4.3.1 Impact of Gender………………...……………………………………………..30
4.3.2 Impact of Residence…………………………………………………...……….31
4.3.3 Correlation between Courses…...………………………………...…….………32
4.3.4 Impact on Retention…………………………………………………………….33
4.3.5 Impact on Abandonment………...…….………………………………………..34
4.3.6 Impact of Continuous Assessment……………………………………………...35
4.3.7 Impact of Non Departmental Courses…………………………………………..36
4.3.8 Impact of Departmental Courses………………………………………………..37
CHAPTER 5……………………….……………………………………………..39
CONCLUSIONS……………………………………………………………………………..39
5.1 Summary of the Findings ……………………………………………………………...39
5.2 Future Works ………………………..…………………………………………………39
REFERENCES………..……………..……………………………………………41
APPENDIX…………...…………………….…………………………………….44
List of Tables
Table 2.1: Transactional data for AllElectronics branch………………………………………..14
Table 3.1: Selected Data from BIIS database……………………….…………………………..21
Table 3.2: All Undergraduate Courses for department of CSE…………………….…………...22
Table 3.3: Partial portion of universal database……………………………………..…………..24
Table 3.4: Transformation rule table for 3.0 credit theory course…………………….………...25
Table 3.5: Transformation rule table for 4.0 credit theory course…………………….………...26
Table 3.6: Transformation rule table for 2.0 credit theory course……………………….……...26
Table 3.7: Transformation rule table for all sessional courses………………………….………26
Table 3.8: Transformed table from universal table………………….………………….……….27
Table 4.1: Impact of Gender………………………………………………………….…………30
Table 4.2: Impact of Hall Status……………………………………………….………………..31
Table 4.3: Impact of Hall Status and Gender…………………………………..………………..32
Table 4.4: Correlation between Courses……………………………………….………………..32
Table 4.5: Impact on Retention………………………………………………..……………...…33
Table 4.6: Impact on Abandonment……………………….........................…………………….34
Table 4.7: Impact of Continuous Assessment……………………………..…………………….35
Table 4.8: Impact of Non Departmental Courses………………………….……………………36
Table 4.9: Impact of Departmental Courses……………………………….……………………37
List of Figures
Figure 2.1: Data Mining as a step in the process of Knowledge Discovery………………….….7
Figure 2.2: Market basket analysis……………………………………………………………….9
Figure 2.3 : Generation of candidate itemsets and frequent itemsets, where the minimum support
count is 2…………………………………………………………………………………………15
Figure 2.4 : Generation and pruning of candidate 3-itemsets, C3, from L2 using the Apriori
property…………………………………………………………...……………………………...16
Figure 2.5: The Apriori algorithm for discovering frequent itemsets for mining Boolean
association rules………………………………………………………………………………….17
Figure 3.1: Factors related to Academic Performance, Abandonment and Retention of student..21
Figure 3.2: Relational database………………..………………………………………………...23
Figure 4.1: Experimental Setup for applying Apriori Algorithm using Weka Explorer to
generate Association Rules…………..…...……………………….………………..……………29
Figure A1: Partial Portion of Initial Dataset (.xls format in Excel) of BIIS…………………….44
Figure A2: Partial Portion of Universal data (.xls format in Excel) before converting to
transformation table………………………………………………….……………......................45
Figure A3: Partial Portion of Transformation table (.xls format in Excel)……………………...46
Figure A4: Transformation Table (in .csv format) loaded into Weka Explorer…………………47
Figure A5: Selecting Association Algorithm after loading transformation table in Weka
Explorer…………………………………………………………………………………………..48
Figure A6: Choosing support and confidence metrics with number of rules in Weka Explorer..48
Figure A7: After Choosing specific support and confidence metrics with number of rules in
Weka Explorer, the associator needs to be started for generating association rules……………..49
Knowledge Discovery from Academic Data using Association Rule Mining Page | 1
Chapter 1
Introduction
Students are one of the fundamental elements of any academic institution. Indeed, the prime
concern of an educational institution is to ensure a sound technical foundation, scholarly
guidance and a high standard of education for all of its students. A large educational institution
such as a public university generates large volumes of data, and it requires an efficient way to
apply data mining techniques to obtain knowledge for developing and improving its academic
activities. The knowledge acquired from the institutional database is sufficient to seek answers
to such questions as: Which factors determine better or worse academic performance of
students? What are the causes behind students' retention, i.e., the extended continuation of
studies in the university? Why do students drop out before graduation, i.e., students'
abandonment of an educational institution? Concepts and techniques of data mining are
essential to discover the hidden knowledge in large datasets [1].
1.1 Problem Definition
Bangladesh University of Engineering and Technology (BUET) is the topmost technological
university of Bangladesh. It enrolls the 1000 most brilliant students, selected by a competitive
examination among one million students completing higher secondary education. Among these
1000 students, the top-ranked students can get admission into the different departments under
different faculties. Although this university possesses many of the brightest students of
Bangladesh, statistics demonstrate that the performance of some students degrades noticeably.
On the other hand, some students perform outstandingly at the initial stage of their
undergraduate studies but cannot demonstrate the same level of excellence through to
graduation. Some students cannot perform well initially but achieve quite good academic
records by the end of their studies. Again, there are some students in this university who have to
continue their studies year after year and take a very long time to complete their graduation.
Unfortunately, there are also some meritorious students who drop out before graduation.
Statistical analysis alone is not sufficient to find the reasons for all of the above
problems in any academic institution. The hidden knowledge inside the institutional academic
and personal data of students is necessary to find the possible causes of all these problems and
to take suitable precautions against them. That is why knowledge discovery and data mining
from academic data is essential for an educational institution like BUET to improve the
academic performance of students, refine the standard of teaching methodologies, and reshape
decision making for the betterment of the institution.
Discovering the hidden knowledge from educational data and applying it properly for decision
making is essential for ensuring high quality education in any academic institution. For this, data
mining techniques are very effective. However, not all data mining techniques can be applied
directly to academic data because of its complex structure; rigorous preprocessing is required.
The choice of support and confidence thresholds and the selection of important association rules
from the huge number of generated rules are other significant problems in knowledge discovery
from academic data.
1.2 Motivation
In a developing country like Bangladesh, many students from rural areas come to the city for
higher education. They usually leave their families behind and must adapt to a completely new
environment. They begin their new educational life in the institution's residence halls, with a
new living place, new food, new companions, and a new atmosphere. They usually need some
time to cope physically and mentally with all of these changes, which may hamper their
educational activities at the very beginning, and the situation is often a bit more difficult for
girls than for boys. So they sometimes lag behind at the start of their higher studies, which may
have an adverse effect on them in the long run. City students, on the other hand, are more
familiar with the environment, live with their families, and are provided with more educational,
technological and psychological opportunities, which may give them some advantages in higher
education. Yet the scenario can also be different: those greater opportunities may drive them
away from their studies and demoralize them.
In a higher education system like BUET's, performance in a course depends on different aspects
such as class attendance, class tests, quizzes, assignments, and term final examinations, some of
which start from the very beginning of the class. So if a student gets poor marks in any of these,
it may affect the final result. Moreover, later courses are sometimes dependent on earlier ones,
so a poor result in one course may affect performance in other related courses too.
It is therefore important to discover all possible knowledge from academic data to understand
the relevant rules behind students' performance, whether good or bad. And when students
cannot perform well, the reasons behind it can also be discovered.
1.3 Scope of the Work
Bangladesh University of Engineering and Technology (BUET) is the most renowned university
in Bangladesh for engineering studies. There are five different faculties, which are the Faculty of
Architecture and Planning, Faculty of Civil Engineering, Faculty of Electrical and Electronic
Engineering, Faculty of Engineering and Faculty of Mechanical Engineering [2]. Under these
five faculties there are eleven different departments which are Dept. of Electrical & Electronic
Engineering (EEE), Dept. of Computer Science & Engineering (CSE), Dept. of Architecture
(Arch), Dept. of Urban & Regional Planning (URP), Dept. of Civil Engineering (CE), Dept. of
Water Resources Engineering (WRE), Dept. of Chemical Engineering (Ch.E), Dept. of Materials
& Metallurgical Engineering (MME), Dept. of Mechanical Engineering (ME), Dept. of Naval
Architecture & Marine Engineering (NAME), Dept. of Industrial & Production Engineering
(IPE). BUET offers both undergraduate and postgraduate degrees in all these departments.
Overall there are more than 5000 students in an academic session. So, the scope of knowledge
discovery from the academic data of BUET is immense in the context of undergraduate and
postgraduate students of all the departments. In this research, we have considered only the data
of undergraduate students of the department of CSE. The technique we have developed can be
used to discover knowledge for the rest of the departments, and a modified version can be
applied to postgraduate courses and curricula for the betterment of postgraduate studies as well.
1.4 Objectives
The department of Computer Science and Engineering (CSE) [3] is one of the prestigious
departments of BUET. Although this department possesses many of the brightest students of
Bangladesh, statistical data demonstrate that the performance of some students degrades
noticeably. Moreover, the problems of retention and abandonment are also prevalent among the
students. The main objectives of this research study are:
- To discover knowledge of students' academic progress from academic performance and
personal statistics, through the impact of different course assessments, e.g., class tests,
attendance, and term final examinations
- To find out the reasons behind the degradation of students' merit, i.e., the decay of their
potential
- To discover the causes behind extended continuation toward graduation, i.e., retention of
students
- To find out why some meritorious students drop out before graduation, i.e., abandonment
of students
1.5 Thesis Organization
We have developed a technique to discover knowledge using Association Rule Mining from the
institutional data of students who have completed their undergraduate studies in the department
of CSE, BUET. The literature studies, e.g., preliminaries of Knowledge Discovery and Data
Mining (KDD), Association Rule Mining, the Apriori algorithm, and related works, are
elaborated in Chapter 2. The academic data mining system, i.e., the entire methodology
including both the analysis and design parts, is described in Chapter 3. In this research, we have
transformed the existing relational database of students' academic performance into a universal
database format using the academic and personal data of students. After that, we have further
transformed the universal format into a modified format suitable for an Association Rule
Mining algorithm, which is elaborated in Chapter 3.
We have discovered interesting rules that interpret several important facts related to students'
academic performance, e.g., the impact of personal information such as gender and residence,
and the impact of course content and pedagogy. We have also determined the impact of
retention for particular courses. Addressing the abandonment issue, we have categorized the
students who could not complete their graduation based on their personal information, which is
explained in Chapter 4. Chapter 4 also briefly illustrates the software implementation, dataset
and application environment, along with the results and
discussions. Finally, we have summarized the findings along with a quantitative analysis. We
have also discussed the scope for extending this research work by illustrating some significant
future works in Chapter 5. Thus, we have presented a guideline for applying the extracted
knowledge to improve academic performance and to strike an optimal balance between
abandonment and retention.
Chapter 2
Literature Studies
2.1 Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining (KDD) is an interdisciplinary area focusing upon
methodologies for extracting useful knowledge from data [4]. Data mining is the task of
discovering interesting patterns from large amounts of data, where the data can be stored in
databases, data warehouses, or other information repositories. It is a young interdisciplinary
field, drawing from areas such as database systems, data warehousing, statistics, machine
learning, data visualization, information retrieval, and high-performance computing. Other
contributing areas include neural networks, pattern recognition, spatial data analysis, image
databases, signal processing, and many application fields, such as business, economics, and
bioinformatics. Generally, data mining (sometimes called data or knowledge discovery) is the
process of analyzing data from different perspectives and summarizing it into useful information
that can be used to increase revenue, cut costs, or both. Data mining software is one of a
number of analytical tools for analyzing data. It allows users to analyze data from many different
dimensions or angles, categorize it, and summarize the relationships identified. Technically, data
mining is the process of finding correlations or patterns among dozens of fields in large
relational databases.
Many people treat data mining as a synonym for another popularly used term, Knowledge
Discovery from Data, or KDD, while others view data mining as simply an essential step in the
process of knowledge discovery [1].
Data mining has attracted a great deal of attention in the information industry and in society as a
whole in recent years, due to the wide availability of huge amounts of data and the imminent
need for turning such data into useful information and knowledge. The information and
knowledge gained can be used for applications ranging from market analysis, fraud detection,
and customer retention, to production control and science exploration.
Data mining can be viewed as a result of the natural evolution of information technology. The
database system industry has witnessed an evolutionary path in the development of the following
functionalities: data collection and database creation, data management (including data storage
and retrieval, and database transaction processing), and advanced data analysis (involving data
warehousing and data mining). For instance, the early development of data collection and
database creation mechanisms served as a prerequisite for later development of effective
mechanisms for data storage and retrieval, and query and transaction processing. With numerous
database systems offering query and transaction processing as common practice, advanced data
analysis has naturally become the next target.
Figure 2.1 : Data Mining as a step in the process of Knowledge Discovery
The Knowledge discovery process is shown in Figure 2.1 as an iterative sequence of the
following steps:
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the database)
4. Data transformation (where data are transformed or consolidated into forms appropriate
for mining by performing summary or aggregation operations, for instance)
5. Data mining (an essential process where intelligent methods are applied in order to
extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge
based on some interestingness measures)
7. Knowledge presentation (where visualization and knowledge representation techniques
are used to present the mined knowledge to the user)
Steps 1 through 4 are different forms of data preprocessing, where the data are prepared for
mining. The data mining step may interact with the user or a knowledge base. The interesting
patterns are presented to the user and may be stored as new knowledge in the knowledge base.
According to this view, data mining is only one step in the entire process, although an essential
one because it uncovers hidden patterns for evaluation. So, data mining is a step in the
knowledge discovery process. The ongoing rapid growth of online data due to the Internet and
the widespread use of databases have created an immense need for KDD methodologies. The
challenge of extracting knowledge from data draws upon research in statistics, databases, pattern
recognition, machine learning, data visualization, optimization, and high-performance
computing, to deliver advanced business intelligence and web discovery solutions. So, the entire
knowledge discovery process includes data cleaning, data integration, data selection, data
transformation, data mining, pattern evaluation, and knowledge presentation.
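The preprocessing portion of this pipeline (cleaning, selection, transformation) can be sketched on toy records. The field names, the missing-value rule, and the GPA discretization below are purely illustrative assumptions, not taken from the thesis dataset; data integration (step 2) is omitted since only a single source is simulated.

```python
# A minimal, hypothetical sketch of KDD steps 1, 3, and 4 on toy student records.
raw_records = [
    {"student_id": "S1", "gender": "M", "gpa": "3.75"},
    {"student_id": "S2", "gender": "F", "gpa": ""},      # noisy: missing GPA
    {"student_id": "S3", "gender": "M", "gpa": "2.50"},
]

# Step 1. Data cleaning: drop records with a missing GPA.
cleaned = [r for r in raw_records if r["gpa"]]

# Step 3. Data selection: keep only the attributes relevant to the analysis.
selected = [{"gender": r["gender"], "gpa": float(r["gpa"])} for r in cleaned]

# Step 4. Data transformation: discretize GPA into categorical grades,
# producing a form suitable for later pattern mining.
def grade(gpa):
    return "HIGH" if gpa >= 3.0 else "LOW"

transformed = [{"gender": r["gender"], "grade": grade(r["gpa"])} for r in selected]
print(transformed)
```

The remaining steps (mining, pattern evaluation, presentation) would then operate on `transformed`.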
2.2 Association Rule Mining
Association rule mining, one of the most important and well researched techniques of data
mining, was first introduced in [6]. It aims to extract interesting correlations, frequent patterns,
associations or causal structures among sets of items in transaction databases or other data
repositories. Association rules are widely used in various areas such as telecommunication
networks, market and risk management, inventory control etc. Various association mining
techniques and algorithms will be briefly introduced and compared later. Association rule
mining finds association rules that satisfy predefined minimum support and confidence
thresholds in a given database. The problem is usually decomposed into two subproblems. The
first is to find those itemsets whose occurrences exceed a predefined threshold in the database;
these itemsets are called frequent or large itemsets. The second is to generate association rules
from those large itemsets under the constraint of minimal confidence.
Frequent itemset mining leads to the discovery of associations and correlations among items in
large transactional or relational data sets. With massive amounts of data continuously being
collected and stored, many industries are becoming interested in mining such patterns from their
databases. The discovery of interesting correlation relationships among huge amounts of
business transaction records can help in many business decision-making processes, such as
catalog design, cross-marketing, and customer shopping behavior analysis. A typical example of
frequent itemset mining is market basket analysis. This process analyzes customer buying
habits by finding associations between the different items that customers place in their “shopping
baskets” (Figure 2.2). The discovery of such associations can help retailers develop marketing
strategies by gaining insight into which items are frequently purchased together by customers.
For instance, if customers are buying milk, how likely are they to also buy bread (and what kind
of bread) on the same trip to the supermarket? Such information can lead to increased sales by
helping retailers do selective marketing and plan their shelf space.
Figure 2.2 : Market basket analysis.
If we think of the universe as the set of items available at the store, then each item has a Boolean
variable representing the presence or absence of that item. Each basket can then be represented
by a Boolean vector of values assigned to these variables. The Boolean vectors can be analyzed
for buying patterns that reflect items that are frequently associated or purchased together. These
patterns can be represented in the form of association rules. For example, the information that
customers who purchase computers also tend to buy antivirus software at the same time is
represented in the association rule below:

computer ⇒ antivirus_software [support = 2%, confidence = 60%]
Rule support and confidence are two measures of rule interestingness. They respectively reflect
the usefulness and certainty of discovered rules. A support of 2% for the Association Rule means
that 2% of all the transactions under analysis show that computer and antivirus software are
purchased together. A confidence of 60% means that 60% of the customers who purchased a
computer also bought the software. Typically, association rules are considered interesting if they
satisfy both a minimum support threshold and a minimum confidence threshold. Such thresholds
can be set by users or domain experts. Additional analysis can be performed to uncover
interesting statistical correlations between associated items.
Let I = {I1, I2, …, Im} be a set of items. Let D, the task-relevant data, be a set of database
transactions where each transaction T is a set of items such that T ⊆ I. Each transaction is
associated with an identifier, called a TID. Let A be a set of items. A transaction T is said to
contain A if and only if A ⊆ T. An association rule is an implication of the form A ⇒ B, where
A ⊂ I, B ⊂ I, and A ∩ B = φ. The rule A ⇒ B holds in the transaction set D with support s,
where s is the percentage of transactions in D that contain A ∪ B (i.e., the union of sets A and B,
or, say, both A and B). This is taken to be the probability P(A ∪ B). The rule A ⇒ B has
confidence c in the transaction set D, where c is the percentage of transactions in D containing
A that also contain B. This is taken to be the conditional probability P(B | A). That is,

support(A ⇒ B) = P(A ∪ B)
confidence(A ⇒ B) = P(B | A)
Rules that satisfy both a minimum support threshold (min_sup) and a minimum confidence
threshold (min_conf) are called strong. By convention, we write support and confidence values
so as to occur between 0% and 100%, rather than 0 to 1.0.
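These definitions can be computed directly from a transaction list. The sketch below uses an illustrative set of transactions (not real data) and reports the measures as fractions rather than percentages.

```python
# Hypothetical helpers computing support and confidence of a rule A => B
# over a list of transactions, each represented as a set of items.

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(A, B, transactions):
    """P(B | A): support of A union B divided by support of A."""
    return support(A | B, transactions) / support(A, transactions)

transactions = [
    {"computer", "antivirus_software"},
    {"computer"},
    {"printer"},
    {"computer", "antivirus_software", "printer"},
]

s = support({"computer", "antivirus_software"}, transactions)           # 2/4 = 0.5
c = confidence({"computer"}, {"antivirus_software"}, transactions)      # 0.5 / 0.75 = 2/3
```

A rule would then be called strong if `s` meets min_sup and `c` meets min_conf.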
A set of items is referred to as an itemset. An itemset that contains k items is a k-itemset. The set
{computer, antivirus_software} is a 2-itemset. The occurrence frequency of an itemset is the
number of transactions that contain the itemset. This is also known, simply, as the frequency,
support count, or count of the itemset. Note that the itemset support defined in the equation is
sometimes referred to as relative support, whereas the occurrence frequency is called the
absolute support. If the relative support of an itemset I satisfies a prespecified minimum support
threshold (i.e., the absolute support of I satisfies the corresponding minimum support count
threshold), then I is a frequent itemset. The set of frequent k-itemsets is commonly denoted by
Lk. From the equation measuring confidence, we have

confidence(A ⇒ B) = P(B | A) = support(A ∪ B) / support(A)
                             = support_count(A ∪ B) / support_count(A)
This equation shows that the confidence of rule A ⇒ B can easily be derived from the support
counts of A and A ∪ B. That is, once the support counts of A, B, and A ∪ B are found, it is
straightforward to derive the corresponding association rules A ⇒ B and B ⇒ A and check
whether they are strong. Thus the problem of mining association rules can be reduced to that of
mining frequent itemsets.
In general, association rule mining can be viewed as a two-step process:
i. Find all frequent itemsets: By definition, each of these itemsets will occur at least as
frequently as a predetermined minimum support count, min_sup.
ii. Generate strong association rules from the frequent itemsets: By definition, these rules
must satisfy minimum support and minimum confidence.
Suppose one of the large itemsets is Lk = {I1, I2, … , Ik}. Association rules with this itemset
are generated in the following way: the first rule is {I1, I2, … , Ik-1} ⇒ {Ik}; by checking its
confidence, this rule can be determined to be interesting or not. Other rules are then generated
by deleting the last item in the antecedent and inserting it into the consequent, and the
confidences of the new rules are checked to determine their interestingness. This process iterates
until the antecedent becomes empty. Since the second subproblem is quite straightforward, most
research focuses on the first subproblem. The first subproblem can be further divided into two
subproblems: candidate large itemset generation and frequent itemset generation. We call those
itemsets whose support exceeds the support threshold large or frequent itemsets, and those
itemsets that are expected, or hoped, to be large or frequent candidate itemsets.
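The rule-generation loop just described can be sketched as follows (the confidence check performed at each step is omitted for brevity; the function name and the itemset are illustrative):

```python
# Sketch of the scheme above: start from {I1,...,Ik-1} => {Ik} and keep
# moving the last antecedent item into the consequent until the
# antecedent is empty. The confidence test on each rule is omitted.
def rules_from_itemset(items):
    """items: the itemset [I1, ..., Ik] in order; returns (antecedent, consequent) pairs."""
    antecedent, consequent = list(items[:-1]), [items[-1]]
    rules = []
    while antecedent:
        rules.append((tuple(antecedent), tuple(consequent)))
        consequent.insert(0, antecedent.pop())  # move last antecedent item over
    return rules

print(rules_from_itemset(["I1", "I2", "I3"]))
# [(('I1', 'I2'), ('I3',)), (('I1',), ('I2', 'I3'))]
```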
In many cases, the algorithms generate an extremely large number of association rules, often in
the thousands or even millions; moreover, the rules themselves are sometimes very long. It is
nearly impossible for end users to comprehend or validate such a large number of complex
association rules, thereby limiting the usefulness of the data mining results. Several strategies
have been proposed to reduce the number of association rules, such as generating only
“interesting” rules, generating only “non-redundant” rules, or generating only those rules
satisfying certain other criteria such as coverage, leverage, lift or strength.
Hegland [7] reviews the most well-known algorithm for producing association rules, Apriori,
and discusses variants for distributed data, inclusion of constraints and data taxonomies.
2.3 The Apriori Algorithm
Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining
frequent itemsets for Boolean association rules [1]. The name of the algorithm is based on the
fact that the algorithm uses prior knowledge of frequent itemset properties. Apriori employs an
iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-
itemsets. First, the set of frequent 1-itemsets is found by scanning the database to accumulate the
count for each item, and collecting those items that satisfy minimum support. The resulting set is
denoted L1. Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3,
and so on, until no more frequent k-itemsets can be found. The finding of each Lk requires one
full scan of the database. To improve the efficiency of the level-wise generation of frequent
itemsets, an important property called the Apriori property, presented below, is used to reduce
the search space.
Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
The Apriori property is based on the following observation. By definition, if an itemset I does
not satisfy the minimum support threshold, min_sup, then I is not frequent; that is, P(I) < min_sup.
If an item A is added to the itemset I, then the resulting itemset (i.e., I ∪ A) cannot occur
more frequently than I. Therefore, I ∪ A is not frequent either; that is, P(I ∪ A) < min_sup. This
property belongs to a special category of properties called antimonotone, in the sense that if a set
cannot pass a test, all of its supersets will fail the same test as well. It is called antimonotone
because the property is monotonic in the context of failing a test. “How is the Apriori property
used in the algorithm?” To understand this, let us look at how Lk-1 is used to find Lk for k ≥2. A
two-step process is followed, consisting of join and prune actions.
1. The join step: To find Lk, a set of candidate k-itemsets is generated by joining Lk-1
with itself. This set of candidates is denoted Ck. Let l1 and l2 be itemsets in Lk-1. The
notation li[j] refers to the jth item in li (e.g., l1[k-2] refers to the second to the last item in
l1). By convention, Apriori assumes that items within a transaction or itemset are sorted
in lexicographic order. For the (k-1)-itemset li, this means that the items are sorted such
that li[1] < li[2] < … < li[k-1]. The join, Lk-1 ⋈ Lk-1, is performed, where members of
Lk-1 are joinable if their first (k-2) items are in common. That is, members l1 and l2 of Lk-1
are joined if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ … ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]).
The condition l1[k-1] < l2[k-1] simply ensures that no duplicates are generated. The
resulting itemset formed by joining l1 and l2 is {l1[1], l1[2], … , l1[k-2], l1[k-1], l2[k-1]}.
2. The prune step: Ck is a superset of Lk, that is, its members may or may not be
frequent, but all of the frequent k-itemsets are included in Ck. A scan of the database to
determine the count of each candidate in Ck would result in the determination of Lk (i.e.,
all candidates having a count no less than the minimum support count are frequent by
definition, and therefore belong to Lk). Ck, however, can be huge, and so this could
involve heavy computation. To reduce the size of Ck, the Apriori property is used as
follows. Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
Hence, if any (k-1)-subset of a candidate k-itemset is not in Lk-1, then the candidate cannot
be frequent either and so can be removed from Ck. This subset testing can be done
quickly by maintaining a hash tree of all frequent itemsets.
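A compact sketch of the two steps, with itemsets represented as lexicographically sorted tuples (a representation chosen for this sketch, not prescribed by the text):

```python
from itertools import combinations

def apriori_gen(L_prev):
    """Join L_{k-1} with itself, then prune candidates that have an
    infrequent (k-1)-subset. L_prev is a set of sorted tuples."""
    candidates = set()
    for l1 in L_prev:
        for l2 in L_prev:
            # join step: first k-2 items equal, last item of l1 < last item of l2
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                c = l1 + (l2[-1],)
                # prune step: every (k-1)-subset of c must be in L_prev
                if all(s in L_prev for s in combinations(c, len(c) - 1)):
                    candidates.add(c)
    return candidates

L2 = {("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")}
print(sorted(apriori_gen(L2)))  # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```

A hash tree, as mentioned above, would make the subset test faster; this sketch uses a plain set membership test for clarity.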
Example: Apriori. Let's look at a concrete example, based on the AllElectronics transaction
database, D, of Table 2.1. There are nine transactions in this database; that is, |D| = 9. We use
Figure 2.3 to illustrate the Apriori algorithm for finding frequent itemsets in D.
Table 2.1: Transactional data for AllElectronics branch
TID List of item_IDs
T100 I1, I2, I5
T200 I2, I4
T300 I2, I3
T400 I1, I2, I4
T500 I1, I3
T600 I2, I3
T700 I1, I3
T800 I1, I2, I3, I5
T900 I1, I2, I3
1. In the first iteration of the algorithm, each item is a member of the set of candidate
1-itemsets, C1. The algorithm simply scans all of the transactions in order to count the
number of occurrences of each item.
2. Suppose that the minimum support count required is 2, that is, min_sup = 2. (Here, we
are referring to absolute support because we are using a support count. The
corresponding relative support is 2/9 = 22%). The set of frequent 1-itemsets, L1, can then
be determined. It consists of the candidate 1-itemsets satisfying minimum support. In our
example, all of the candidates in C1 satisfy minimum support.
3. To discover the set of frequent 2-itemsets, L2, the algorithm uses the join L1 ⋈ L1 to
generate a candidate set of 2-itemsets, C2. C2 consists of C(|L1|, 2) = 10 2-itemsets. Note
that no candidates are removed from C2 during the prune step because each subset of the
candidates is also frequent.
Figure 2.3 : Generation of candidate itemsets and frequent itemsets, where the minimum
support count is 2.
4. Next, the transactions in D are scanned and the support count of each candidate itemset
in C2 is accumulated, as shown in the middle table of the second row in Figure 2.3.
5. The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate
2-itemsets in C2 having minimum support.
6. The generation of the set of candidate 3-itemsets, C3, is detailed in Figure 2.4. From
the join step, we first get C3 = L2 ⋈ L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5},
{I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}. Based on the Apriori property that all subsets of a
frequent itemset must also be frequent, we can determine that the latter four candidates
cannot possibly be frequent. We therefore remove them from C3, thereby saving the effort
of unnecessarily obtaining their counts during the subsequent scan of D to determine L3.
Note that, given a candidate k-itemset, we only need to check whether its (k-1)-subsets are
frequent, since the Apriori algorithm uses a level-wise search strategy. The resulting
pruned version of C3 is shown in the first table of the bottom row of Figure 2.3.

[Figure 2.3 content, reconstructed:]

C1 (scan D for the count of each candidate): {I1}: 6, {I2}: 7, {I3}: 6, {I4}: 2, {I5}: 2
L1 (compare candidate support counts with the minimum support count): {I1}: 6, {I2}: 7, {I3}: 6, {I4}: 2, {I5}: 2
C2 (candidates generated from L1): {I1,I2}, {I1,I3}, {I1,I4}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}, {I3,I4}, {I3,I5}, {I4,I5}
C2 (scan D for the count of each candidate): {I1,I2}: 4, {I1,I3}: 4, {I1,I4}: 1, {I1,I5}: 2, {I2,I3}: 4, {I2,I4}: 2, {I2,I5}: 2, {I3,I4}: 0, {I3,I5}: 1, {I4,I5}: 0
L2 (compare candidate support counts with the minimum support count): {I1,I2}: 4, {I1,I3}: 4, {I1,I5}: 2, {I2,I3}: 4, {I2,I4}: 2, {I2,I5}: 2
C3 (candidates generated from L2, after pruning): {I1,I2,I3}, {I1,I2,I5}
C3 (scan D for the count of each candidate): {I1,I2,I3}: 2, {I1,I2,I5}: 2
L3 (compare candidate support counts with the minimum support count): {I1,I2,I3}: 2, {I1,I2,I5}: 2
(a) Join: C3 = L2 ⋈ L2 = {{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}} ⋈
{{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}}
= {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
(b) Prune using the Apriori property: All nonempty subsets of a frequent itemset must
also be frequent. Do any of the candidates have a subset that is not frequent?
The 2-item subsets of {I1, I2, I3} are {I1, I2}, {I1, I3}, and {I2, I3}. All 2-item
subsets of {I1, I2, I3} are members of L2. Therefore, keep {I1, I2, I3} in C3.
The 2-item subsets of {I1, I2, I5} are {I1, I2}, {I1, I5}, and {I2, I5}. All 2-item
subsets of {I1, I2, I5} are members of L2. Therefore, keep {I1, I2, I5} in C3.
The 2-item subsets of {I1, I3, I5} are {I1, I3}, {I1, I5}, and {I3, I5}. {I3, I5} is not a
member of L2, and so it is not frequent. Therefore, remove {I1, I3, I5} from C3.
The 2-item subsets of {I2, I3, I4} are {I2, I3}, {I2, I4}, and {I3, I4}. {I3, I4} is not a
member of L2, and so it is not frequent. Therefore, remove {I2, I3, I4} from C3.
The 2-item subsets of {I2, I3, I5} are {I2, I3}, {I2, I5}, and {I3, I5}. {I3, I5} is not a
member of L2, and so it is not frequent. Therefore, remove {I2, I3, I5} from C3.
The 2-item subsets of {I2, I4, I5} are {I2, I4}, {I2, I5}, and {I4, I5}. {I4, I5} is not a
member of L2, and so it is not frequent. Therefore, remove {I2, I4, I5} from C3.
(c) Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after pruning.
Figure 2.4 : Generation and pruning of candidate 3-itemsets, C3, from L2 using the
Apriori property.
7. The transactions in D are scanned in order to determine L3, consisting of those
candidate 3-itemsets in C3 having minimum support (Figure 2.3).
8. The algorithm uses L3 ⋈ L3 to generate a candidate set of 4-itemsets, C4. Although the
join results in {I1, I2, I3, I5}, this itemset is pruned because its subset {I2, I3, I5} is not
frequent. Thus, C4 = ∅, and the algorithm terminates, having found all of the frequent
itemsets.
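The whole walkthrough can be reproduced with a minimal level-wise sketch (an illustration of the algorithm run on Table 2.1 with min_sup = 2, not the software used in this thesis):

```python
from itertools import combinations

# The nine AllElectronics transactions of Table 2.1.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
MIN_SUP = 2  # absolute support count

def count(itemset):
    """Support count: transactions of D containing all items of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

items = sorted({i for t in transactions for i in t})
L = [(i,) for i in items if count((i,)) >= MIN_SUP]  # L1
levels = []
while L:
    levels.append(L)
    prev, k = set(L), len(L[0]) + 1
    # join L_{k-1} with itself ...
    C = {l1 + (l2[-1],) for l1 in prev for l2 in prev
         if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]}
    # ... prune candidates having an infrequent (k-1)-subset ...
    C = {c for c in C if all(s in prev for s in combinations(c, k - 1))}
    # ... and scan D to keep candidates meeting min_sup
    L = sorted(c for c in C if count(c) >= MIN_SUP)

print(levels[-1])
# [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```

Running this yields L1 with five itemsets, L2 with six, and the L3 above, matching Figure 2.3; the candidate 4-itemset {I1, I2, I3, I5} is pruned, so the loop terminates.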
Algorithm: Apriori. Find frequent itemsets using an iterative level-wise approach based on candidate
generation.
Input:
D, a database of transactions;
min_sup, the minimum support count threshold.
Output: L, frequent itemsets in D.
Method:
(1) L1 = find_frequent_1-itemsets(D);
(2) for (k = 2; Lk-1 ≠ ∅; k++) {
(3)     Ck = apriori_gen(Lk-1);
(4)     for each transaction t ∈ D {  // scan D for counts
(5)         Ct = subset(Ck, t);  // get the subsets of t that are candidates
(6)         for each candidate c ∈ Ct
(7)             c.count++;
(8)     }
(9)     Lk = {c ∈ Ck | c.count ≥ min_sup};
(10) }
(11) return L = ∪k Lk;

procedure apriori_gen(Lk-1: frequent (k-1)-itemsets)
(1) for each itemset l1 ∈ Lk-1
(2)     for each itemset l2 ∈ Lk-1
(3)         if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ … ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]) then {
(4)             c = l1 ⋈ l2;  // join step: generate candidates
(5)             if has_infrequent_subset(c, Lk-1) then
(6)                 delete c;  // prune step: remove unfruitful candidate
(7)             else add c to Ck;
(8)         }
(9) return Ck;

procedure has_infrequent_subset(c: candidate k-itemset;
    Lk-1: frequent (k-1)-itemsets)  // use prior knowledge
(1) for each (k-1)-subset s of c
(2)     if s ∉ Lk-1 then
(3)         return TRUE;
(4) return FALSE;
Figure 2.5: The Apriori algorithm for discovering frequent itemsets for mining Boolean
association rules.
Figure 2.5 shows pseudo-code for the Apriori algorithm and its related procedures. Step 1 of
Apriori finds the frequent 1-itemsets, L1. In steps 2 to 10, Lk-1 is used to generate candidates Ck in
order to find Lk for k ≥ 2. The apriori_gen procedure generates the candidates and then uses the
Apriori property to eliminate those having a subset that is not frequent (step 3). This procedure is
described below. Once all of the candidates have been generated, the database is scanned (step
4). For each transaction, a subset function is used to find all subsets of the transaction that are
candidates (step 5), and the count for each of these candidates is accumulated (steps 6 and 7).
Finally, all of those candidates satisfying minimum support (step 9) form the set of frequent
itemsets, L (step 11). A procedure can then be called to generate association rules from the
frequent itemsets. The apriori_gen procedure performs two kinds of actions, namely, join and
prune, as described above. In the join component, Lk-1 is joined with Lk-1 to generate potential
candidates (steps 1 to 4). The prune component (steps 5 to 7) employs the Apriori property to
remove candidates that have a subset that is not frequent. The test for infrequent subsets is
shown in the procedure has_infrequent_subset.
2.4 Generating Association Rules from Frequent Itemsets
Once the frequent itemsets from transactions in a database D have been found, it is
straightforward to generate strong association rules from them (where strong association rules
satisfy both minimum support and minimum confidence). This can be done using the below
Equation for confidence, which we show again here for completeness:
confidence(A ⇒ B) = P(B | A) = support_count(A ∪ B) / support_count(A)
The conditional probability is expressed in terms of itemset support count, where support_
count(AUB) is the number of transactions containing the itemsets AUB, and support_count(A) is
the number of transactions containing the itemset A. Based on this equation, association rules can
be generated as follows:
For each frequent itemset l, generate all nonempty subsets of l.
For every nonempty subset s of l, output the rule s ⇒ (l − s) if
support_count(l) / support_count(s) ≥ min_conf, where min_conf is the minimum confidence threshold.
Because the rules are generated from frequent itemsets, each one automatically satisfies
minimum support. Frequent itemsets can be stored ahead of time in hash tables along with
their counts so that they can be accessed quickly.
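As an illustration, applying this procedure to the frequent itemset l = {I1, I2, I5} of the AllElectronics example, with the support counts read off Figure 2.3 and a hypothetical min_conf of 70%, yields exactly three strong rules:

```python
from itertools import combinations

# Support counts of the relevant itemsets, taken from Figure 2.3;
# min_conf = 70% is a hypothetical threshold for this illustration.
support_count = {
    frozenset(["I1"]): 6, frozenset(["I2"]): 7, frozenset(["I5"]): 2,
    frozenset(["I1", "I2"]): 4, frozenset(["I1", "I5"]): 2,
    frozenset(["I2", "I5"]): 2, frozenset(["I1", "I2", "I5"]): 2,
}
l = frozenset(["I1", "I2", "I5"])
MIN_CONF = 0.7

strong = []
for r in range(1, len(l)):                        # every nonempty proper subset s of l
    for s in map(frozenset, combinations(sorted(l), r)):
        conf = support_count[l] / support_count[s]
        if conf >= MIN_CONF:                      # output s => (l - s) if confident enough
            strong.append((sorted(s), sorted(l - s), conf))

for ante, cons, conf in strong:
    print(ante, "=>", cons, f"{conf:.0%}")
```

With min_conf = 70%, the rules I5 ⇒ I1 ∧ I2, I1 ∧ I5 ⇒ I2 and I2 ∧ I5 ⇒ I1 are strong (confidence 100% each), while, e.g., I1 ⇒ I2 ∧ I5 (confidence 2/6 ≈ 33%) is not.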
2.5 Related Works
Techniques of Educational Data Mining (EDM) have been used to resolve educational research
issues since 1993 [8]. Mining educational data through classification is an effective way to
analyze students' performance from the extracted knowledge [9]. Automatic clustering and
decision-rule data mining techniques have also been applied for knowledge discovery based on
academic data analysis [10]. It has already been shown how data mining algorithms can help
discover all possible relevant knowledge contained in databases obtained from Web-based
educational systems [11]. Besides these, research [12] presents the factors on which students'
performance depends and shows how a Naïve Bayes classifier can be applied to calculate
probabilities so that final examination results can be predicted from the findings.
Student retention, i.e., students' extended continuation in the institution, and student
abandonment, i.e., students' dropping out from the institution, are two important indicators of
academic performance and teaching methodology for an educational institute. An impressive
study [13] illustrates how data mining techniques can help to detect retentive students, evaluate
course suitability and, finally, implement intervention programs to decrease student drop-out,
as the abandonment problem can be mitigated by increasing student retention. In this regard,
research [14] has been done to increase college student retention by performing early detection
of academic risk using data mining methods, with preliminary results shown on initial model
development using classification.
Recently, important works have incorporated data mining in academic research. For instance,
Association Rule Mining has been used to compare students' performance in the courses
common to the graduation and post-graduation levels, which is useful for predicting factors
related to success or failure [15]. An important study [16] compares several frequent and rare
Association Rule Mining algorithms with a view to measuring both their performance and their
usefulness in educational environments. A different approach of Association Rule Mining is
used to find the support, confidence and interestingness levels for appropriate language and
attendance in the classroom [17]. An interesting piece of recent research [18] proposes a Rule
Schema formalism for obtaining Association Rules from a knowledge base by integrating user
knowledge in the post-processing task. Although one study shows the drawbacks and solutions
of applying Association Rule Mining in learning management systems [19], recent work, e.g.,
mining the impact of unsupervised course work such as assignments on the overall performance
of students [20] and developing software for knowledge discovery from students' result
repositories by an Association Rule Mining approach [21], encourages us to proceed further in
discovering knowledge from academic data using Association Rule Mining.

Before mining association rules, academic data needs to be preprocessed properly. For this, a
technique for preprocessing academic data before Association Rule Mining [22] has recently
been proposed with a synthetic dataset, for checking the suitability of the system with the real
institutional dataset.
Chapter 3
Academic Data Mining
3.1 Data Analysis
3.1.1 Personal and Academic Data
In this research, we have considered the academic data structure of BUET. The student data in
BIIS (BUET Institutional Information System) contains personal and academic information
about each student. We have collected these data anonymously for the data preprocessing and
data analysis. We have considered the personal and academic data stated in Table 3.1 for
knowledge discovery regarding the academic performance, abandonment and retention of
students, illustrated in Figure 3.1.
Table 3.1: Selected Data from BIIS database

Academic Information:
- Department
- Admission Year / Batch
- Overall CGPA
- Marks of class tests, attendance and the two answer scripts, and total marks and grades of all theory courses
- Total marks and grades of all sessional courses
- Total completed credit hours

Personal Information:
- Gender
- Hall Resident / Non-resident
Figure 3.1: Factors related to Academic Performance, Abandonment and Retention of students.
[Figure content: academic performance, student retention and student abandonment are related
to residence, gender, records of all continuous assessments, and records of departmental and
non-departmental courses.]
3.1.2 Course and Curriculum
As we have experimented with the students' data of the department of Computer Science and
Engineering (CSE) in BUET, we have analyzed all the courses in the curriculum that have to be
taken to complete the BSc degree. A student has to take 68 departmental and non-departmental
courses in total. All the courses, along with their credit hours, are shown in Table 3.2.

Table 3.2: All Undergraduate Courses for the department of CSE

Among them there are 40 theory courses (25 departmental and 15 non-departmental) and 28
sessional courses (20 departmental and 7 non-departmental, plus the thesis). We determine
academic performance and the impact of other factors on the basis of these courses' final grades
and the marks of attendance, class tests, term final answer scripts, total marks, etc.
Course Type Credit
Hour Course Number
Departmental
Theory Courses
4.0 CSE307, CSE321
3.0
CSE103, CSE105, CSE201, CSE203, CSE205, CSE207,
CSE209, CSE303, CSE305, CSE309, CSE311, CSE301,
CSE313, CSE315, CSE317, CSE401, CSE403, CSE423,
CSE409, CSE461, CSE463
2.0 CSE100, CSE211
Departmental
Sessional Courses
1.5
CSE106, CSE202, CSE206, CSE210, CSE214, CSE304,
CSE308, CSE314, CSE316, CSE404
0.75
CSE204, CSE208, CSE300, CSE310, CSE322,
CSE324, CSE402, CSE410, CSE462, CSE464
Non-Departmental
Theory Courses
4.0 PHY109, MATH143, EEE263, MATH243
3.0
EEE163, MATH141, ME165, CHEM101, HUM175,
MATH241, EEE269, IPE493
2.0 HUM211, HUM275, HUM371
Non-Departmental
Sessional Courses
1.5
PHY102, EEE164, ME160, HUM272, CHEM114,
EEE264, EEE270
Thesis 6.0 CSE400
3.2 Preprocessing for Mining Academic Database
3.2.1 Relational Database
Students take courses through their BIIS account via registration. The relational database
illustrated in Figure 3.2 stores all the personal information as well as the results of the courses
taken by a student, from which we can obtain a relational table containing the student's gender,
hall status, performance in all courses, CGPA, etc.
Figure 3.2: Relational database
3.2.2 Universal Database
A universal database is created for the purpose in which records of all taken courses along with
personal information like gender, hall status of corresponding student id are stored in a single
row of the table. For a specific course, the grade, attendance, marks of class tests, marks of each
section (section A and section B) of term final answer scripts and total marks. Like this the
similar records of all other taken courses are stored in the database with the corresponding
student id. And by this process the records of other students are stored in the database one after
another after the corresponding Gender and Hall Status of a particular student. Another attribute
is stored as Student Type by which we have determined the student type- regular, retentive or
abandoned. As, for applying Apriori algorithm of Association Rule Mining, we have to set the
value of attribute in discrete form. So, record such as student id has been omitted in the universal
table.
[Figure 3.2 content: entity-relationship diagram — a Student achieves a Grade Sheet, which
represents a Course.]
Table 3.3: Partial portion of universal database
3.2.3 Data Transformation
The universal database of Table 3.3 has been transformed into an equivalent transformation table
by converting each continuous-valued attribute into a discrete-valued attribute representing some
knowledge, for the suitability of implementing the Apriori algorithm of Association Rule
Mining. For example, CGPA is a continuous attribute, and it has been transformed into five
classifications: excellent, very good, good, average and poor. We have used one algorithm to
transform all continuous numbers for attendance, class tests, both sections of the term final
answer scripts, and the total marks of a course. We have used another algorithm to transform all
grades or grade points of courses, and the overall CGPA, into those five classifications.

For transforming the numbers of the universal table, i.e., attendance, class tests, Section A,
Section B and total marks of each course, Algorithm 1 has been developed to populate the
transformed table in such a way that no entry holds a continuous value.
Gender | Hall_Status | Student_Type | CSE103_Grade | CSE103_Attendance | CSE103_CT | CSE103_SectionA | CSE103_SectionB | CSE103_Total | …
Male | Resident | Regular | A+ | 30 | 55 | 90 | 75 | 250 | …
Female | Non-Resident | Regular | A | 25 | 45 | 85 | 70 | 225 | …
… | … | … | … | … | … | … | … | … | …
Algorithm 1: Marks_Transformation()
Input: marks of Attendance, CT, Section A, Section B and Total Marks of each course from the universal table of Studentlist
Output: discrete level of marks for the transformation table
for i = 1 to |Studentlist|
    if marks >= 80% then level = "Excellent"
    else if marks >= 75% then level = "Very Good"
    else if marks >= 60% then level = "Good"
    else if marks >= 50% then level = "Average"
    else level = "Poor"
end for
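Algorithm 1 can be sketched as a single function (the function name is ours; the percentage thresholds follow the algorithm above and match the class test, section and total columns of Tables 3.4–3.6, while the attendance columns of those tables use stricter cut-offs):

```python
# Sketch of Algorithm 1: discretize marks by percentage of the maximum.
def marks_level(marks, max_marks):
    pct = 100.0 * marks / max_marks
    if pct >= 80:
        return "Excellent"
    if pct >= 75:
        return "Very Good"
    if pct >= 60:
        return "Good"
    if pct >= 50:
        return "Average"
    return "Poor"

print(marks_level(250, 300))  # total of a 3.0 credit course -> Excellent
print(marks_level(225, 300))  # -> Very Good (matches Table 3.4: 225-239)
```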
Similarly, the grades of the universal table are transformed by Algorithm 2. As the real dataset
stores CGPA in grade points, we likewise consider another variable, grade point, and transform
the continuous CGPA value into the same five classified definitions.

As there are theory courses of credit hours 4.0, 3.0 and 2.0, and sessional courses of credit hours
1.5 and 0.75, we need a different transformation rule table for each of these course types.
Transformation rules for 3.0 credit hour (Table 3.4), 4.0 credit hour (Table 3.5) and 2.0 credit
hour (Table 3.6) theory courses, and for all sessional courses (Table 3.7), are illustrated below.
Table 3.4: Transformation rule table for 3.0 credit theory course
Algorithm 2: Grade_Transformation()
Input: every acquired grade of each course in the Courselist of the universal table
Output: transformed_grade for the transformation table
for i = 1 to |Courselist|
    if grade = A+ then transformed_grade = "Excellent"
    else if grade = A then transformed_grade = "Very Good"
    else if grade = A- or B+ then transformed_grade = "Good"
    else if grade = B then transformed_grade = "Average"
    else if grade = B- or C+ or C or D then transformed_grade = "Poor"
end for
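Algorithm 2 amounts to a fixed lookup table; a minimal sketch:

```python
# Sketch of Algorithm 2: map letter grades to the five discrete levels.
GRADE_LEVEL = {
    "A+": "Excellent",
    "A": "Very Good",
    "A-": "Good", "B+": "Good",
    "B": "Average",
    "B-": "Poor", "C+": "Poor", "C": "Poor", "D": "Poor",
}
print(GRADE_LEVEL["A+"], GRADE_LEVEL["B+"])  # Excellent Good
```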
Classified Name | Attendance | Class Test | SecA/SecB | Total (Range of Marks, M)
Excellent | 27 ≤ M ≤ 30 | 48 ≤ M ≤ 60 | 84 ≤ M ≤ 105 | 240 ≤ M ≤ 300
Very Good | 24 ≤ M ≤ 26 | 45 ≤ M ≤ 47 | 78 ≤ M ≤ 83 | 225 ≤ M ≤ 239
Good | 21 ≤ M ≤ 23 | 36 ≤ M ≤ 44 | 63 ≤ M ≤ 77 | 180 ≤ M ≤ 224
Average | 18 ≤ M ≤ 20 | 30 ≤ M ≤ 35 | 52 ≤ M ≤ 62 | 150 ≤ M ≤ 179
Poor | 0 ≤ M ≤ 17 | 0 ≤ M ≤ 29 | 0 ≤ M ≤ 51 | 0 ≤ M ≤ 149
Table 3.5: Transformation rule table for 4.0 credit theory course

Classified Name | Attendance | Class Test | SecA/SecB | Total (Range of Marks, M)
Excellent | 36 ≤ M ≤ 40 | 64 ≤ M ≤ 80 | 112 ≤ M ≤ 140 | 320 ≤ M ≤ 400
Very Good | 32 ≤ M ≤ 35 | 60 ≤ M ≤ 63 | 105 ≤ M ≤ 111 | 300 ≤ M ≤ 319
Good | 28 ≤ M ≤ 31 | 48 ≤ M ≤ 59 | 84 ≤ M ≤ 104 | 240 ≤ M ≤ 299
Average | 24 ≤ M ≤ 27 | 40 ≤ M ≤ 47 | 70 ≤ M ≤ 83 | 200 ≤ M ≤ 239
Poor | 0 ≤ M ≤ 23 | 0 ≤ M ≤ 39 | 0 ≤ M ≤ 69 | 0 ≤ M ≤ 199

Table 3.6: Transformation rule table for 2.0 credit theory course

Classified Name | Attendance | Class Test | SecA/SecB | Total (Range of Marks, M)
Excellent | 18 ≤ M ≤ 20 | 32 ≤ M ≤ 40 | 56 ≤ M ≤ 70 | 160 ≤ M ≤ 200
Very Good | 16 ≤ M ≤ 17 | 30 ≤ M ≤ 31 | 52 ≤ M ≤ 55 | 150 ≤ M ≤ 159
Good | 14 ≤ M ≤ 15 | 24 ≤ M ≤ 29 | 42 ≤ M ≤ 51 | 120 ≤ M ≤ 149
Average | 12 ≤ M ≤ 13 | 20 ≤ M ≤ 23 | 35 ≤ M ≤ 41 | 100 ≤ M ≤ 119
Poor | 0 ≤ M ≤ 11 | 0 ≤ M ≤ 19 | 0 ≤ M ≤ 34 | 0 ≤ M ≤ 99

Table 3.7: Transformation rule table for all sessional courses

Classified Name | Credit Hour = 1.5 | Credit Hour = 0.75 (Range of Marks, M)
Excellent | 120 ≤ M ≤ 150 | 60 ≤ M ≤ 75
Very Good | 112 ≤ M ≤ 119 | 56 ≤ M ≤ 59
Good | 90 ≤ M ≤ 111 | 45 ≤ M ≤ 55
Average | 75 ≤ M ≤ 89 | 37 ≤ M ≤ 44
Poor | 0 ≤ M ≤ 74 | 0 ≤ M ≤ 36
To construct the entire transformed table as given in Table 3.8, we have used the universal table
and above transformation rules.
Table 3.8: Transformed table from universal table
3.3 Summary of Methodologies
Methodologies of knowledge discovery from academic data using association rule mining can be
summarized as below:
1. Before applying the Association Rule Mining technique on the institutional data of
BUET, the academic data needs to be analyzed and preprocessed in the following steps:
i. At first, we have selected relevant data from the BIIS database and categorized it into
the personal and academic information of the students of the CSE department who
have already graduated.
ii. We have developed a technique to transform the existing relational database
into a universal database format using both academic and personal data of
students.
iii. We have manipulated universal database and developed transformation rule to
transform the continuous data into discrete value.
iv. We have developed algorithms to transform the universal database into a
discrete valued transformed database using the transformation rules.
2. We have applied the Apriori algorithm on the transformed database to find association
rules.
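The pipeline of steps 1.i–1.iv can be sketched end to end on a made-up record (the student id, courses and marks below are invented, not BIIS data; the column names mimic the universal table of Table 3.3):

```python
# Invented relational rows: (student_id, course, total_marks, max_marks).
relational_rows = [
    ("S1", "CSE103", 250, 300),
    ("S1", "CSE105", 225, 300),
]
personal = {"S1": {"Gender": "Male", "Hall_Status": "Resident",
                   "Student_Type": "Regular"}}

def level(marks, max_marks):
    """Discretize marks using the thresholds of Algorithm 1."""
    pct = 100.0 * marks / max_marks
    return ("Excellent" if pct >= 80 else "Very Good" if pct >= 75
            else "Good" if pct >= 60 else "Average" if pct >= 50 else "Poor")

# Build one universal row per student (step 1.ii), discretizing as we go
# (steps 1.iii-1.iv); the student id itself is dropped from the row.
universal = {}
for sid, course, total, maximum in relational_rows:
    row = universal.setdefault(sid, dict(personal[sid]))
    row[course + "_Total"] = level(total, maximum)

print(universal["S1"]["CSE103_Total"], universal["S1"]["CSE105_Total"])
# Excellent Very Good
```

The resulting rows, one per student with only discrete values, are what the Apriori step consumes.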
Gender | Hall_Status | Student_Type | CSE103_Grade | CSE103_Attendance | CSE103_CT | CSE103_SectionA | CSE103_SectionB | CSE103_Total | …
Male | Resident | Regular | Excellent | Excellent | Excellent | Excellent | Good | Excellent | …
Female | Non-resident | Regular | Very Good | Very Good | Very Good | Excellent | Good | Very Good | …
… | … | … | … | … | … | … | … | … | …
Chapter 4
System Implementation, Results and Discussions

4.1 Software Implementation
In order to implement the proposed academic data mining system, we have mainly used two
software tools: Microsoft Excel and Weka.
Microsoft Excel:
The existing relational BIIS data was given in Excel file format (.xls). That is why, for the
preprocessing steps, we have used Microsoft Excel 2010, which is easier and more convenient.
For transforming the existing relational database into the universal format, using both the
academic and personal data of the students of the CSE department who have already graduated,
we have used the necessary built-in tools and scripted macros provided in Excel. After that, we
have used an Excel add-in named Kutools for manipulating the universal database and
transforming continuous data into discrete values. Using this tool, we have easily implemented
the developed algorithms and transformation rule tables for converting all continuous data, e.g.,
records of theory and sessional courses, overall CGPA, etc.
Weka:
After preprocessing step, we have obtained a transformed table which is suitable for
applying Association Rule Mining algorithm. In this regard, we have used Weka [23], a
popular suite of machine learning software written in Java, developed at the University
of Waikato, New Zealand. We have applied Apriori algorithm to the transformation
table with predefined minimum support and confidence using Weka to generate
interesting Association Rules. We have used Weka Explorer which provides
convenient and easy to use interface for generating specific number of rules with certain
metric of support and confidence to the full or partial transformation database. This is
very useful for choosing support and confidence and selecting important association
rules from huge number of generated rules.
4.2 Dataset and Application Environment
In this experiment, we have considered the data of the last five graduated batches in the
department of CSE, BUET. The institutional dataset of BUET consists of the academic and
personal data of 9210 students over the last 10 years. We have extracted the relevant academic
and personal information of those students, namely gender, hall status, admission year,
completed credit hours, all records of theory and sessional courses, overall CGPA, etc., from the
relational BIIS database and transformed it into the universal table structure. Finally, we
transformed it into the transformation table structure for applying association rule mining. The
entire experimental setup is illustrated in Figure 4.1.
Figure 4.1: Experimental Setup for applying Apriori Algorithm using Weka Explorer to
generate Association Rules
[Figure 4.1 content, reconstructed:]

BUET institutional dataset: 9210 students of all departments in the last 10 years, with the
attributes Gender, Hall Status, Admission Year, Completed Credit Hour, all records of theory
and sessional courses, and overall CGPA.

Universal table structure (582 students):
- Student Type: Regular 552, Retentive 26, Abandoned 4
- Gender: Male 473, Female 109
- Hall Status: Resident 348, Non-Resident 234
- 40 theory courses: Attendance, Class Test, Section A, Section B, Total, Grade
- 28 sessional courses: Total Marks, Grade

Transformation table structure:
- Student Type, Gender and Hall Status counts as above
- all marks and grades of the 68 theory and sessional courses, including the overall CGPA of
the 582 students, discretized to Poor, Average, Good, Very Good or Excellent
After the preprocessing step, we have obtained a transformed table of 582 students of the
department of CSE who have already graduated. The universal table also contains one additional
attribute, student type (retentive, regular or abandoned), which is obtained by analyzing
completed credit hours and admission year. In the transformation table, all continuous data have
been transformed into five discrete values: Excellent, Very Good, Good, Average and Poor.
Finally, we have applied Weka Explorer to the transformation table (in .csv file format) to
generate interesting Association Rules.
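The two preprocessing steps just described (discretizing continuous values into five labels, and deriving the student type) can be illustrated with a short sketch. The grade boundaries, the required-credit figure and the duration threshold below are assumptions chosen only for illustration; this excerpt does not state the actual cut points used.

```python
# Illustrative preprocessing, not the thesis's exact boundaries.

def discretize_cgpa(cgpa):
    """Map a CGPA on a 4.00 scale to the five qualitative bins (assumed cuts)."""
    if cgpa >= 3.75:
        return "Excellent"
    if cgpa >= 3.50:
        return "Very Good"
    if cgpa >= 3.00:
        return "Good"
    if cgpa >= 2.50:
        return "Average"
    return "Poor"

def student_type(completed_credits, admission_year, current_year, enrolled,
                 required_credits=160, normal_duration=4):
    """Classify a student as Regular / Retentive / Abandoned from progress.
    The rule (and the 160-credit, 4-year thresholds) is an assumed stand-in
    for the thesis's analysis of completed credit hours and admission year."""
    years_elapsed = current_year - admission_year
    if not enrolled and completed_credits < required_credits:
        return "Abandoned"       # left without completing required courses
    if years_elapsed > normal_duration and completed_credits < required_credits:
        return "Retentive"       # still short of credits past normal duration
    return "Regular"

print(discretize_cgpa(3.62), student_type(120, 2008, 2013, True))
```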
4.3 Results and Discussions
4.3.1 Impact of Gender
We have found an impact of gender on overall academic performance. This indication is very
important in terms of the socio-economic condition of the country. In BUET, the majority of the
students are male and live in the university dormitories. There are multiple factors that affect the
academic environment and students' academic performance. The results in Table 4.1 point out
that the male students appear with a very high confidence in the poor-CGPA rule. The reason
may be that male students are generally affected by various societal problems of a third-world
country like Bangladesh. All other rules support that the academic performance of female
students is better than that of the male students.
Table 4.1: Impact of Gender

No. | Generated Interesting Rules | Minimum Support | Confidence
01 | CGPA=Poor ==> Gender=male | 10% | 87%
02 | CGPA=Average ==> Gender=male | 10% | 79%
03 | CGPA=Very Good ==> Gender=male | 10% | 83%
04 | Gender=male ==> CGPA=Good | 10% | 26%
05 | Gender=male ==> CGPA=Average | 10% | 21%
06 | CGPA=Good ==> Gender=female | 5% | 22%
07 | CGPA=Average ==> Gender=female | 5% | 21%
08 | CGPA=Excellent ==> Gender=female | 5% | 20%
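A rule such as no. 01 above reads: among students with a poor CGPA, 87% are male. A small sketch on made-up records (not the real data) shows how such support and confidence figures are computed from counts:

```python
# support(A ==> B)   = P(A and B)  = count(A and B) / N
# confidence(A ==> B) = P(B | A)   = count(A and B) / count(A)
# The records below are invented purely to demonstrate the arithmetic.

records = [
    {"Gender": "male", "CGPA": "Poor"},
    {"Gender": "male", "CGPA": "Poor"},
    {"Gender": "female", "CGPA": "Poor"},
    {"Gender": "male", "CGPA": "Good"},
    {"Gender": "female", "CGPA": "Good"},
]

def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) of antecedent ==> consequent."""
    both = sum(1 for r in records
               if antecedent.items() <= r.items() and consequent.items() <= r.items())
    ante = sum(1 for r in records if antecedent.items() <= r.items())
    return both / len(records), (both / ante if ante else 0.0)

sup, conf = rule_metrics(records, {"CGPA": "Poor"}, {"Gender": "male"})
print(f"CGPA=Poor ==> Gender=male  sup={sup:.2f} conf={conf:.2f}")
```

Here 2 of the 5 toy records satisfy both sides and 3 satisfy the antecedent, so the support is 2/5 and the confidence 2/3; the thesis's 10%/87% figures arise from the same computation over the 582 real records.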
4.3.2 Impact of Residence
In BUET, most of the students live in institution hall. But the number of students live in home is
also significant fact. Analyzing the rules we have found that both the students of hall and the
students residing at home get good CGPA with a descent minimum support and confidence (in
table 4.2). So if any student wants to do well in academic prospect he can do from anywhere.
Table 4.2: Impact of Hall Status

No | Generated Interesting Rules | Minimum Support | Confidence
01 | CGPA=Average ==> Hall_Status=Resident | 10% | 65%
02 | CGPA=Very Good ==> Hall_Status=Resident | 10% | 63%
03 | CGPA=Good ==> Hall_Status=Non-Resident | 10% | 43%
04 | CGPA=Good Hall_Status=Resident ==> Gender=male | 10% | 82%
But it is found that the percentage of students getting a poor CGPA is higher in the halls. In the
halls there is very little restriction, and sometimes there is no one to take care of a student as
family members do. So a student can become demoralized and get very poor grades due to lack
of study. As shown in rule number 1 in Table 4.3, the percentage of male resident students is
higher in this regard. In most cases, the poor CGPA holders are residents of the halls (rule
numbers 1 and 5 of Table 4.3).
Table 4.3: Impact of Hall Status and Gender

No | Generated Interesting Rules | Minimum Support | Confidence
01 | CGPA=Poor ==> Gender=male Hall_Status=Resident | 5% | 51%
02 | CGPA=Very Good ==> Gender=male Hall_Status=Non-Resident | 5% | 40%
03 | Hall_Status=Non-Resident Gender=female ==> CGPA=Average | 5% | 24%
04 | Hall_Status=Resident Gender=female ==> CGPA=Good | 5% | 21%
05 | CGPA=Poor ==> Hall_Status=Resident | 5% | 52%
4.3.3 Correlation between Courses
The analyzed Association Rules show that the grade of one course may depend on its
prerequisite courses. In rule number 1 we find that if anyone gets an excellent grade in CSE105,
he/she gets an excellent grade in the course CSE201 too with a confidence of 0.48, where
CSE105 is the Structured Programming Language course and CSE201 is the Object Oriented
Programming Language course. We also discover the interrelation of course CSE311 (Data
Communication-I) and CSE321 (Networking) in rule numbers 6, 7 and 8. We also find the
impact of courses CSE205 (Digital Logic Design) and CSE209 (Digital Electronics and Pulse
Technique) on course CSE403 (Digital System Design) in rule number 9 in Table 4.4.
Table 4.4: Correlation between Courses

No | Generated Interesting Rules | Minimum Support | Confidence
01 | CSE105_Grade=Excellent ==> CSE201_Grade=Excellent | 10% | 48%
02 | CSE201_Grade=Very Good ==> CSE105_Grade=Very Good | 5% | 30%
4.3.4 Impact on Retention
If any student fails to pass a course, he becomes retentive, because he needs to take that course
again later to complete his graduation. Rule numbers 2, 3, 4, 5 and 6 show that retentive students
usually struggle with their grades. If a student has not passed CSE100, which is the first
fundamental course of CSE, he or she is retentive, i.e., he or she has not passed the later
departmental courses either. This is illustrated by generated rule no. 1 in Table 4.5. Moreover,
we have discovered that most retentive students are hall residents and male, as illustrated with
high confidence in rule numbers 7 and 8 respectively in Table 4.5.
Table 4.4 (continued): Correlation between Courses

No | Generated Interesting Rules | Minimum Support | Confidence
03 | EEE163_Grade=Excellent ==> EEE263_Grade=Very Good | 5% | 27%
04 | CSE205_Grade=Excellent ==> CSE403_Grade=Excellent | 10% | 50%
05 | CSE403_Grade=Poor ==> CSE205_Grade=Average | 5% | 28%
06 | CSE321_Grade=Average ==> CSE311_Grade=Average | 5% | 36%
07 | CSE321_Grade=F ==> CSE311_Grade=Poor | 3% | 13%
08 | CSE321_Grade=Poor ==> CSE311_Grade=Poor | 3% | 16%
09 | CSE205_Grade=Very Good CSE209_Grade=Excellent ==> CSE403_Grade=Excellent | 5% | 53%

Table 4.5: Impact on Retention

No | Generated Interesting Rules | Minimum Support | Confidence
01 | CSE100_Grade=F ==> Student Type=Retentive | 5% | 42%
02 | Student Type=Retentive ==> MATH243_Grade=Poor | 5% | 35%
4.3.5 Impact on Abandonment
The students who have given up their academic studies without completing all the required
courses are typed as 'abandoned'. By analyzing the rules illustrated in Table 4.6, it is discovered
that, with high confidence, the abandoned students are male and residents of halls. But the
minimum support is very low; thus it is found that the rate of abandonment is very low in the
CSE department of this university.
Table 4.5 (continued): Impact on Retention

No | Generated Interesting Rules | Minimum Support | Confidence
03 | Student Type=Retentive ==> CSE205_Grade=Average | 5% | 35%
04 | Student Type=Retentive ==> CSE311_Grade=Average | 5% | 27%
05 | Student Type=Retentive ==> EEE263_Grade=Poor | 5% | 33%
06 | Student Type=Retentive ==> CSE409_Grade=Average | 5% | 43%
07 | Student Type=Retentive ==> Hall_Status=Resident | 5% | 65%
08 | Student Type=Retentive ==> Gender=male | 5% | 81%

Table 4.6: Impact on Abandonment

No | Generated Interesting Rules | Minimum Support | Confidence
01 | Student Type=Abandoned ==> Gender=male | 0.5% | 100%
02 | Student Type=Abandoned ==> Hall_Status=Resident | 0.5% | 75%
03 | Student Type=Abandoned ==> Gender=male Hall_Status=Resident | 0.5% | 75%
4.3.6 Impact of Continuous Assessment
The grading of a course depends on various aspects such as the marks of attendance, class tests
and both sections of the term final examination. From rule number 7, which has the maximum
confidence value of 1.00, we have discovered that the excellent grade of a course depends on
excellent performance in all the other aspects of continuous assessment. Again, performance in
class tests depends on attendance, which is illustrated by rule number 5 in Table 4.7 with a very
high confidence of 0.95.
Table 4.7: Impact of Continuous Assessment

No | Generated Interesting Rules | Minimum Support | Confidence
01 | CSE103_Attendance=Excellent CSE103_SectionB=Poor ==> CSE103_Grade=Average | 10% | 63%
02 | CSE103_Grade=Very Good ==> CSE103_CT=Good CSE103_Attendance=Excellent | 10% | 97%
03 | EEE163_Grade=Average ==> EEE163_SectionB=Poor | 10% | 57%
04 | EEE163_Grade=Very Good ==> EEE163_Attendance=Excellent EEE163_CT=Excellent | 10% | 67%
05 | HUM275_CT=Excellent ==> HUM275_Attendance=Excellent | 10% | 95%
06 | HUM275_CT=Excellent ==> HUM275_SectionA=Good HUM275_Grade=Very Good HUM275_Attendance=Excellent | 10% | 75%
07 | CSE401_Grade=Excellent ==> CSE401_CT=Excellent CSE401_SectionA=Excellent CSE401_Attendance=Excellent | 10% | 100%
08 | CSE401_SectionB=Excellent ==> CSE401_Grade=Good | 10% | 75%
4.3.7 Impact of Non Departmental Courses
After analyzing the generated Association Rules (Table 4.8), we observed various impacts of
non-departmental courses on academic performance. According to the curriculum, students must
take some non-departmental courses whose marks are added to the final result, so it may happen
that some students get poor grades in those non-departmental courses. According to the
generated rules, good performance in the non-departmental courses brings a good grade, but a
poor grade in a non-departmental course causes less harm to the final CGPA, because those
courses are fewer in number and most of them are studied at the beginning of the undergraduate
level. So students get enough opportunities to improve their CGPA later.
Table 4.8: Impact of Non Departmental Courses

No | Generated Interesting Rules | Minimum Support | Confidence
01 | CGPA=Very Good ==> HUM272_Grade=Very Good | 10% | 73%
02 | CGPA=Very Good ==> MATH143_Grade=Average | 5% | 37%
03 | CGPA=Good ==> EEE163_Grade=Average | 5% | 36%
04 | CGPA=Very Good ==> CHEM101_Grade=Average | 10% | 52%
05 | CGPA=Average ==> IPE493_Grade=Very Good | 5% | 29%
06 | CGPA=Good ==> ME165_Grade=Average | 10% | 43%
07 | CGPA=Average ==> MATH243_Grade=Poor | 5% | 27%
4.3.8 Impact of Departmental Courses
As many departmental courses are studied, and as there are interconnections between some
courses because of prerequisites, the results of the departmental courses affect the final CGPA
very much. From the analyzed rules, it is found that good grades in departmental courses bring a
good CGPA; on the other hand, poor grades in departmental courses result in a poor overall
CGPA. This significant knowledge is discovered from the rules on the impact of departmental
courses illustrated in Table 4.9.
Table 4.9: Impact of Departmental Courses

No | Generated Interesting Rules | Minimum Support | Confidence
01 | CGPA=Very Good ==> CSE100_Grade=Very Good | 5% | 42%
02 | CGPA=Very Good ==> CSE105_Grade=Average | 5% | 31%
03 | CGPA=Very Good ==> CSE206_Grade=Very Good | 10% | 44%
04 | CGPA=Good ==> CSE303_Grade=Average | 5% | 31%
05 | CGPA=Poor ==> CSE321_Grade=Poor | 5% | 29%
06 | CGPA=Excellent ==> CSE401_Grade=Excellent | 5% | 50%
07 | CGPA=Average ==> CSE401_Grade=Average | 5% | 29%
08 | CGPA=Average ==> CSE409_Grade=Average | 5% | 42%
Chapter 5
Conclusions
5.1 Summary of the Findings
Knowledge discovery from academic data is very important for improving the academic
performance of any higher educational institution. In this research, we study the academic
system, the existing problems and the performance data of the most renowned engineering
university of Bangladesh. We have found problems like abandonment, retention and the decay of
the potential of the most brilliant students. We have applied the Association Rule Mining
technique to explore the root cause of the above problems.
Before applying the data mining algorithm, the existing academic data has been preprocessed to
make it suitable for data mining. We have developed a data transformation technique that
transforms the relational database into an equivalent universal relational format. In this format,
we have also transformed the continuous data into discrete-valued qualitative data. We have
found interesting Association Rules by applying the Apriori Association Rule generator on the
transformed data using the WEKA tool. From the large number of association rules, we have
extracted the interesting rules regarding the impacts of gender, residence and continuous
assessment on academic performance. We have also found the associations among courses,
retention and abandonment. The obtained results are found to be very significant for the decision
makers to improve the overall academic condition of the institution.
According to the results found, 10% of the 582 graduated students of the CSE department are
male and have a CGPA below 3.00, and the probability of being male among poor CGPA
holders is 0.87. Again, we have discovered that 5% of the total students have a poor CGPA and
are hall residents, and the probability of being a hall resident among poor CGPA holders is 0.52.
We have also discovered significant correlations between courses. For example, more than 58
students have excellent grades in both CSE105 (Structured Programming Language) and
CSE201 (Object Oriented Programming Language); the probability of having an excellent grade
in CSE201 among students having an excellent grade in CSE105 is 0.48. We have found that
about 30 students have had to retake the MATH243 course. We found that 5% of the total male
students are both retentive and hall residents, and 65% of the total retentive students are hall
residents. The abandonment rate is very low in the CSE department of BUET, as we found that
only 3 male students dropped out before completing graduation, and 75% of the abandoned
students were hall residents. We have also determined the impact of several non-departmental
courses. For example, more than 60 students possess a very good grade in HUM272 as well as a
CGPA over 3.50. We have also determined the impact of several departmental courses. For
example, 5% of the 582 students have a CGPA over 3.75 and got A+ in CSE401, and 50% of the
students having a CGPA over 3.75 obtained A+ in CSE401.
We hope all these quantitative findings will be helpful to the decision makers for improving the
quality of education provided in this department. We have applied the technique only to the CSE
department of BUET, but it is applicable to any department of any higher educational institute.
5.2 Future Work
In this research, we have considered the data of only the department of CSE, but there are ten
other departments in this university. In future work, we can apply the same technique to extract
knowledge from the data of all the other departments of BUET. Again, we have considered only
the undergraduate records of students; the technique can be modified so that it is applicable to
postgraduate courses and curricula for the betterment of postgraduate studies. We can also
develop a recommendation system by designing a classifier that uses the present dataset as
training data and classifies students based on their performance.
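The recommendation-system idea above can be sketched as a minimal classifier over the discretized attributes. Everything here (the 1-nearest-neighbour method, the attribute names, the toy training records) is an illustrative assumption, not a design taken from the thesis:

```python
# Sketch of the proposed classifier: predict a student's CGPA category from
# discretized course grades using 1-nearest-neighbour with Hamming distance
# over categorical attributes. Training records below are made up.

train = [
    ({"CSE100": "Excellent", "CSE105": "Excellent"}, "Excellent"),
    ({"CSE100": "Good", "CSE105": "Average"}, "Good"),
    ({"CSE100": "Poor", "CSE105": "Poor"}, "Poor"),
]

def hamming(a, b):
    """Number of attributes on which two categorical records disagree."""
    return sum(a[k] != b[k] for k in a)

def classify(record):
    """Label of the closest training record."""
    return min(train, key=lambda tr: hamming(tr[0], record))[1]

print(classify({"CSE100": "Excellent", "CSE105": "Good"}))
```

In practice the full transformation table (582 students, 308 attributes) would serve as the training data, and a stronger model (decision tree, naive Bayes) could replace the nearest-neighbour rule.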
REFERENCES
[1] Han, J., Kamber, M., and Pei, J., Data Mining: Concepts and Techniques. Morgan
Kaufmann Publishers, San Francisco, 2011.
[2] Bangladesh University of Engineering and Technology: General Information. Available
from http://www.buet.ac.bd/?page_id=5; accessed 24 February, 2014.
[3] The Department of Computer Science and Engineering, Bangladesh University of
Engineering and Technology: General Information. Available from
http://www.buet.ac.bd/cse/geninfo/index.php; accessed 24 February, 2014.
[4] http://researcher.watson.ibm.com/researcher/view_group.php?id=144
[5] http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
[6] Agrawal, R., Imielinski, T., and Swami, A. N., Mining association rules between sets of
items in large databases. In Proceedings of the 1993 ACM SIGMOD International
Conference on Management of Data, 1993, 207-216.
[7] Hegland, M., Algorithms for Association Rules. Lecture Notes in Computer Science,
Volume 2600, Jan. 2003, 226-234.
[8] Romero, C., Educational Data Mining: A Review of the State-of-the-Art. IEEE
Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 40(6),
601-618.
[9] Baradwaj, B. K., and Pal, S., Mining Educational Data to Analyze Students' Performance.
International Journal of Advanced Computer Science and Applications (IJACSA), 2(6),
2011, 63-69.
[10] Salazar, A., Gosalbez, J., Bosch, I., Miralles, R., and Vegara, L., A case study of
knowledge discovery on academic achievement, student desertion and student retention.
In Information Technology: Research and Education, ITRE 2004, 2nd International
Conference, June 2004.
[11] Merceron, A., and Yacef, K., Educational Data Mining: a Case Study. In Proceedings of
the 12th International Conference on Artificial Intelligence in Education, AIED, 2005.
[12] Kumar, V., and Sharma, V., Students Examination Result Mining: A Predictive
Approach. International Journal of Advanced Computer Science and Applications (IJSER),
3(11), Nov. 2012.
[13] Zhang, Y., Oussena, S., Clark, T., and Kim, H., Using data mining to improve student
retention in HE: a case study. In 12th International Conference on Enterprise Information
Systems, ICEIS, Portugal, 8-12 June, 2010.
[14] Lauria, E. J. M., Baron, J. D., Devireddy, M., Sundararaju, V., and Jayprakash, S. M.,
Mining academic data to improve college student retention: An open source perspective.
In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge,
LAK'12, ACM, New York, NY, 139-142.
[15] Kumar, V., and Chandha, A., Mining Association Rules in Student's Assessment Data.
International Journal of Computer Science Issues (IJCSI), 9(5), Sep. 2012.
[16] Romero, C., Romero, J. R., Luna, M. J., and Ventura, S., Mining Rare Association Rules
from e-Learning Data. Educational Data Mining (EDM), 2010.
[17] Pandey, U. K., and Pal, S., A Data Mining View on Class Room Teaching Language.
International Journal of Computer Science Issues (IJCSI), 8(2), Mar. 2011.
[18] Ajith, P., Tejaswi, B., and Sai, M. S. S., Rule Mining Framework for Students
Performance Evaluation. International Journal of Soft Computing and Engineering
(IJSCE), 2(6), Jan. 2013.
[19] Garcia, E., Romero, C., Ventura, S., and Calders, T., Drawbacks and solutions of applying
association rule mining in learning management systems. In Proceedings of the
International Workshop on Applying Data Mining in e-Learning, 2007.
[20] Chaturvedi, R., and Ezeife, C. I., Mining the Impact of Course Assignments on Student
Performance. EDM 2013, 308-309.
[21] Oladipupo, O. O., and Oyelade, O. J., Knowledge Discovery from Students' Result
Repository: Association Rule Mining Approach. International Journal of Computer
Science & Security (IJCSS), 4(2), 2011.
[22] Hoque, A. S. Md. L., Paul, R., and Ahmed, S., Preprocessing of Academic Data for
Mining Association Rule. In Proceedings of the Workshop on Advances in Data
Management: Applications and Algorithms, WADM, 2013.
[23] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H., The
WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1), 2009.
APPENDIX
Initial Dataset of BIIS
Figure A1: Partial Portion of Initial Dataset (.xls format in Excel) of BIIS
Partial Portion of Universal Table
Figure A2: Partial Portion of Universal data (.xls format in Excel) before converting to
transformation table
Partial Portion of Transformation Table
Figure A3: Partial Portion of Transformation table (.xls format in Excel)
Using Weka Explorer
Figure A4: Transformation Table (in .csv format) loaded into Weka Explorer
Figure A5: Selecting the association algorithm after loading the transformation table in Weka Explorer
Figure A6: Choosing support and confidence metrics with number of rules in Weka Explorer
Figure A7: After choosing specific support and confidence metrics with the number of rules in
Weka Explorer, the associator is started to generate the association rules.
Run Information for first 200 Rules Using Weka Explorer
Scheme: weka.associations.Apriori -N 200 -T 0 -C 0.5 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation: Full Transformation Table
Instances: 578
Attributes: 308
[list of attributes omitted]
=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.85 (491 instances)
Minimum metric <confidence>: 0.5
Number of cycles performed: 3
Generated sets of large itemsets:
Size of set of large itemsets L(1): 24
Size of set of large itemsets L(2): 120
Size of set of large itemsets L(3): 150
Size of set of large itemsets L(4): 18
Best rules found:
1. CSE425_Attendance=Excellent PHY109_Attendance=Excellent 505 ==>
EEE269_Attendance=Excellent 492 conf:(0.97)
2. EEE263_Attendance=Excellent EEE269_Attendance=Excellent 522 ==>
CSE405_Attendance=Excellent 508 conf:(0.97)
3. CSE425_Attendance=Excellent ME165_Attendance=Excellent 522 ==>
EEE163_Attendance=Excellent 508 conf:(0.97)
4. CSE405_Attendance=Excellent PHY109_Attendance=Excellent 519 ==>
EEE269_Attendance=Excellent 505 conf:(0.97)
5. CSE411_Attendance=Excellent EEE269_Attendance=Excellent 515 ==>
CSE405_Attendance=Excellent 501 conf:(0.97)
6. CSE401_Attendance=Excellent EEE269_Attendance=Excellent 513 ==>
CSE405_Attendance=Excellent 499 conf:(0.97)
7. Student Type=Regular CSE405_Attendance=Excellent ME165_Attendance=Excellent 512
==> EEE163_Attendance=Excellent 498 conf:(0.97)
8. Student Type=Regular EEE269_Attendance=Excellent ME165_Attendance=Excellent 512
==> EEE163_Attendance=Excellent 498 conf:(0.97)
9. CSE425_Attendance=Excellent EEE263_Attendance=Excellent 511 ==>
CSE405_Attendance=Excellent 497 conf:(0.97)
10. CSE401_Attendance=Excellent ME165_Attendance=Excellent 508 ==>
EEE163_Attendance=Excellent 494 conf:(0.97)
11. CSE425_Attendance=Excellent CSE431_Attendance=Excellent 507 ==>
CSE405_Attendance=Excellent 493 conf:(0.97)
12. CSE425_Attendance=Excellent CSE431_Attendance=Excellent 507 ==>
EEE269_Attendance=Excellent 493 conf:(0.97)
13. CSE405_Attendance=Excellent CSE425_Attendance=Excellent
ME165_Attendance=Excellent 507 ==> EEE163_Attendance=Excellent 493 conf:(0.97)
14. CSE431_Attendance=Excellent EEE263_Attendance=Excellent 506 ==>
CSE405_Attendance=Excellent 492 conf:(0.97)
15. CSE425_Attendance=Excellent EEE269_Attendance=Excellent
ME165_Attendance=Excellent 506 ==> EEE163_Attendance=Excellent 492 conf:(0.97)
16. CSE205_Attendance=Excellent 505 ==> EEE269_Attendance=Excellent 491 conf:(0.97)
17. CSE411_Attendance=Excellent CSE425_Attendance=Excellent 505 ==>
CSE405_Attendance=Excellent 491 conf:(0.97)
18. Student Type=Regular CSE405_Attendance=Excellent CSE425_Attendance=Excellent 505
==> EEE269_Attendance=Excellent 491 conf:(0.97)
19. CSE425_Attendance=Excellent EEE269_Attendance=Excellent 530 ==>
CSE405_Attendance=Excellent 515 conf:(0.97)
20. Student Type=Regular ME165_Attendance=Excellent 529 ==>
EEE163_Attendance=Excellent 514 conf:(0.97)
21. CSE103_Attendance=Excellent EEE269_Attendance=Excellent 525 ==>
CSE405_Attendance=Excellent 510 conf:(0.97)
22. EEE269_Attendance=Excellent 558 ==> CSE405_Attendance=Excellent 542 conf:(0.97)
23. CSE425_Attendance=Excellent ME165_Attendance=Excellent 522 ==>
CSE405_Attendance=Excellent 507 conf:(0.97)
24. Student Type=Regular CSE425_Attendance=Excellent 521 ==>
EEE269_Attendance=Excellent 506 conf:(0.97)
25. CSE431_Attendance=Excellent EEE269_Attendance=Excellent 521 ==>
CSE405_Attendance=Excellent 506 conf:(0.97)
26. EEE163_Attendance=Excellent EEE263_Attendance=Excellent 521 ==>
CSE405_Attendance=Excellent 506 conf:(0.97)
27. EEE269_Attendance=Excellent PHY109_Attendance=Excellent 520 ==>
CSE405_Attendance=Excellent 505 conf:(0.97)
28. CSE431_Attendance=Excellent EEE163_Attendance=Excellent 517 ==>
CSE405_Attendance=Excellent 502 conf:(0.97)
29. EEE263_Attendance=Excellent ME165_Attendance=Excellent 517 ==>
CSE405_Attendance=Excellent 502 conf:(0.97)
30. EEE163_Attendance=Excellent PHY109_Attendance=Excellent 516 ==>
EEE269_Attendance=Excellent 501 conf:(0.97)
31. CSE103_Attendance=Excellent CSE425_Attendance=Excellent 514 ==>
CSE405_Attendance=Excellent 499 conf:(0.97)
32. CSE401_Attendance=Excellent EEE163_Attendance=Excellent 514 ==>
CSE405_Attendance=Excellent 499 conf:(0.97)
33. ME165_Attendance=Excellent PHY109_Attendance=Excellent 514 ==>
EEE163_Attendance=Excellent 499 conf:(0.97)
34. ME165_Attendance=Excellent PHY109_Attendance=Excellent 514 ==>
EEE269_Attendance=Excellent 499 conf:(0.97)
35. Student Type=Regular PHY109_Attendance=Excellent 513 ==>
EEE269_Attendance=Excellent 498 conf:(0.97)
36. CSE425_Attendance=Excellent 547 ==> CSE405_Attendance=Excellent 531 conf:(0.97)
37. MATH143_Attendance=Excellent 512 ==> CSE405_Attendance=Excellent 497
conf:(0.97)
38. MATH143_Attendance=Excellent 512 ==> EEE269_Attendance=Excellent 497
conf:(0.97)
39. Student Type=Regular CSE431_Attendance=Excellent 512 ==>
EEE269_Attendance=Excellent 497 conf:(0.97)
40. CSE411_Attendance=Excellent EEE163_Attendance=Excellent 512 ==>
CSE405_Attendance=Excellent 497 conf:(0.97)
41. CSE425_Attendance=Excellent EEE163_Attendance=Excellent
EEE269_Attendance=Excellent 511 ==> CSE405_Attendance=Excellent 496 conf:(0.97)
42. CSE103_Attendance=Excellent EEE263_Attendance=Excellent 510 ==>
CSE405_Attendance=Excellent 495 conf:(0.97)
43. EEE263_Attendance=Excellent 542 ==> CSE405_Attendance=Excellent 526 conf:(0.97)
44. Student Type=Regular CSE401_Attendance=Excellent 508 ==>
EEE163_Attendance=Excellent 493 conf:(0.97)
45. CSE401_Attendance=Excellent ME165_Attendance=Excellent 508 ==>
CSE405_Attendance=Excellent 493 conf:(0.97)
46. CSE411_Attendance=Excellent ME165_Attendance=Excellent 508 ==>
CSE405_Attendance=Excellent 493 conf:(0.97)
47. CSE313_Attendance=Excellent CSE405_Attendance=Excellent
ME165_Attendance=Excellent 508 ==> EEE163_Attendance=Excellent 493 conf:(0.97)
48. CSE425_Attendance=Excellent EEE163_Attendance=Excellent
ME165_Attendance=Excellent 508 ==> CSE405_Attendance=Excellent 493 conf:(0.97)
49. CSE313_Attendance=Excellent EEE269_Attendance=Excellent
ME165_Attendance=Excellent 507 ==> EEE163_Attendance=Excellent 492 conf:(0.97)
50. Student Type=Regular CSE425_Attendance=Excellent EEE269_Attendance=Excellent 506
==> CSE405_Attendance=Excellent 491 conf:(0.97)
51. CSE103_Attendance=Excellent EEE163_Attendance=Excellent
EEE269_Attendance=Excellent 506 ==> CSE405_Attendance=Excellent 491 conf:(0.97)
52. CSE425_Attendance=Excellent EEE269_Attendance=Excellent
ME165_Attendance=Excellent 506 ==> CSE405_Attendance=Excellent 491 conf:(0.97)
53. CSE431_Attendance=Excellent 538 ==> CSE405_Attendance=Excellent 522 conf:(0.97)
54. EEE163_Attendance=Excellent EEE269_Attendance=Excellent 537 ==>
CSE405_Attendance=Excellent 521 conf:(0.97)
55. PHY109_Attendance=Excellent 536 ==> EEE269_Attendance=Excellent 520 conf:(0.97)
56. EEE163_Attendance=Excellent ME165_Attendance=Excellent 535 ==>
CSE405_Attendance=Excellent 519 conf:(0.97)
57. CSE405_Attendance=Excellent ME165_Attendance=Excellent 535 ==>
EEE163_Attendance=Excellent 519 conf:(0.97)
58. CSE411_Attendance=Excellent 534 ==> CSE405_Attendance=Excellent 518 conf:(0.97)
59. Student Type=Regular EEE269_Attendance=Excellent 534 ==>
CSE405_Attendance=Excellent 518 conf:(0.97)
60. Student Type=Regular CSE405_Attendance=Excellent 534 ==>
EEE269_Attendance=Excellent 518 conf:(0.97)
61. EEE269_Attendance=Excellent ME165_Attendance=Excellent 533 ==>
CSE405_Attendance=Excellent 517 conf:(0.97)
62. EEE269_Attendance=Excellent ME165_Attendance=Excellent 533 ==>
EEE163_Attendance=Excellent 517 conf:(0.97)
63. CSE401_Attendance=Excellent 532 ==> CSE405_Attendance=Excellent 516 conf:(0.97)
64. CSE313_Attendance=Excellent EEE269_Attendance=Excellent 531 ==>
CSE405_Attendance=Excellent 515 conf:(0.97)
65. CSE405_Attendance=Excellent CSE425_Attendance=Excellent 531 ==>
EEE269_Attendance=Excellent 515 conf:(0.97)
66. CSE425_Attendance=Excellent EEE163_Attendance=Excellent 528 ==>
CSE405_Attendance=Excellent 512 conf:(0.97)
67. CSE313_Attendance=Excellent ME165_Attendance=Excellent 525 ==>
EEE163_Attendance=Excellent 509 conf:(0.97)
68. CSE103_Attendance=Excellent EEE163_Attendance=Excellent 524 ==>
CSE405_Attendance=Excellent 508 conf:(0.97)
69. EEE163_Attendance=Excellent 556 ==> CSE405_Attendance=Excellent 539 conf:(0.97)
70. CSE405_Attendance=Excellent CSE431_Attendance=Excellent 522 ==>
EEE269_Attendance=Excellent 506 conf:(0.97)
71. CSE425_Attendance=Excellent ME165_Attendance=Excellent 522 ==>
EEE269_Attendance=Excellent 506 conf:(0.97)
72. Student Type=Regular CSE425_Attendance=Excellent 521 ==>
CSE405_Attendance=Excellent 505 conf:(0.97)
73. Student Type=Regular CSE425_Attendance=Excellent 521 ==>
EEE163_Attendance=Excellent 505 conf:(0.97)
74. CSE103_Attendance=Excellent ME165_Attendance=Excellent 521 ==>
CSE405_Attendance=Excellent 505 conf:(0.97)
75. CSE103_Attendance=Excellent ME165_Attendance=Excellent 521 ==>
EEE163_Attendance=Excellent 505 conf:(0.97)
76. CSE407_Attendance=Excellent EEE269_Attendance=Excellent 521 ==>
CSE405_Attendance=Excellent 505 conf:(0.97)
77. CSE313_Attendance=Excellent CSE425_Attendance=Excellent 520 ==>
CSE405_Attendance=Excellent 504 conf:(0.97)
78. CSE313_Attendance=Excellent CSE425_Attendance=Excellent 520 ==>
EEE269_Attendance=Excellent 504 conf:(0.97)
79. ME165_Attendance=Excellent 552 ==> CSE405_Attendance=Excellent 535 conf:(0.97)
80. ME165_Attendance=Excellent 552 ==> EEE163_Attendance=Excellent 535 conf:(0.97)
81. MATH141_Attendance=Excellent 517 ==> CSE405_Attendance=Excellent 501
conf:(0.97)
82. MATH141_Attendance=Excellent 517 ==> EEE269_Attendance=Excellent 501
conf:(0.97)
83. Student Type=Regular EEE263_Attendance=Excellent 517 ==>
CSE405_Attendance=Excellent 501 conf:(0.97)
84. CSE313_Attendance=Excellent EEE263_Attendance=Excellent 517 ==>
CSE405_Attendance=Excellent 501 conf:(0.97)
85. CSE431_Attendance=Excellent EEE163_Attendance=Excellent 517 ==> EEE269_Attendance=Excellent 501 conf:(0.97)
86. EEE263_Attendance=Excellent ME165_Attendance=Excellent 517 ==> EEE163_Attendance=Excellent 501 conf:(0.97)
87. EEE163_Attendance=Excellent EEE269_Attendance=Excellent ME165_Attendance=Excellent 517 ==> CSE405_Attendance=Excellent 501 conf:(0.97)
88. CSE405_Attendance=Excellent EEE269_Attendance=Excellent ME165_Attendance=Excellent 517 ==> EEE163_Attendance=Excellent 501 conf:(0.97)
89. EEE163_Attendance=Excellent PHY109_Attendance=Excellent 516 ==> CSE405_Attendance=Excellent 500 conf:(0.97)
90. Student Type=Regular EEE163_Attendance=Excellent EEE269_Attendance=Excellent 516 ==> CSE405_Attendance=Excellent 500 conf:(0.97)
91. Student Type=Regular CSE405_Attendance=Excellent EEE163_Attendance=Excellent 516 ==> EEE269_Attendance=Excellent 500 conf:(0.97)
92. CSE431_Attendance=Excellent ME165_Attendance=Excellent 515 ==> CSE405_Attendance=Excellent 499 conf:(0.97)
93. CSE431_Attendance=Excellent ME165_Attendance=Excellent 515 ==> EEE163_Attendance=Excellent 499 conf:(0.97)
94. CSE425_Attendance=Excellent 547 ==> EEE269_Attendance=Excellent 530 conf:(0.97)
95. CSE103_Attendance=Excellent CSE425_Attendance=Excellent 514 ==> EEE269_Attendance=Excellent 498 conf:(0.97)
96. ME165_Attendance=Excellent PHY109_Attendance=Excellent 514 ==> CSE405_Attendance=Excellent 498 conf:(0.97)
97. Student Type=Regular EEE163_Attendance=Excellent ME165_Attendance=Excellent 514 ==> CSE405_Attendance=Excellent 498 conf:(0.97)
98. Student Type=Regular EEE163_Attendance=Excellent ME165_Attendance=Excellent 514 ==> EEE269_Attendance=Excellent 498 conf:(0.97)
99. CSE103_Attendance=Excellent 544 ==> CSE405_Attendance=Excellent 527 conf:(0.97)
100. Student Type=Regular CSE431_Attendance=Excellent 512 ==> CSE405_Attendance=Excellent 496 conf:(0.97)
101. Student Type=Regular EEE269_Attendance=Excellent ME165_Attendance=Excellent 512 ==> CSE405_Attendance=Excellent 496 conf:(0.97)
102. Student Type=Regular CSE405_Attendance=Excellent ME165_Attendance=Excellent 512 ==> EEE269_Attendance=Excellent 496 conf:(0.97)
103. CSE405_Attendance=Excellent CSE425_Attendance=Excellent EEE163_Attendance=Excellent 512 ==> EEE269_Attendance=Excellent 496 conf:(0.97)
104. CSE313_Attendance=Excellent CSE431_Attendance=Excellent 511 ==> CSE405_Attendance=Excellent 495 conf:(0.97)
105. CSE313_Attendance=Excellent CSE431_Attendance=Excellent 511 ==> EEE269_Attendance=Excellent 495 conf:(0.97)
106. CSE313_Attendance=Excellent PHY109_Attendance=Excellent 511 ==> EEE269_Attendance=Excellent 495 conf:(0.97)
107. CSE313_Attendance=Excellent EEE163_Attendance=Excellent EEE269_Attendance=Excellent 511 ==> CSE405_Attendance=Excellent 495 conf:(0.97)
108. Student Type=Regular CSE411_Attendance=Excellent 510 ==> CSE405_Attendance=Excellent 494 conf:(0.97)
109. CSE407_Attendance=Excellent CSE425_Attendance=Excellent 510 ==> CSE405_Attendance=Excellent 494 conf:(0.97)
110. CSE407_Attendance=Excellent CSE425_Attendance=Excellent 510 ==> EEE269_Attendance=Excellent 494 conf:(0.97)
111. CSE313_Attendance=Excellent CSE401_Attendance=Excellent 509 ==> CSE405_Attendance=Excellent 493 conf:(0.97)
112. CSE313_Attendance=Excellent CSE411_Attendance=Excellent 509 ==> CSE405_Attendance=Excellent 493 conf:(0.97)
113. CSE407_Attendance=Excellent CSE411_Attendance=Excellent 509 ==> CSE405_Attendance=Excellent 493 conf:(0.97)
114. CSE313_Attendance=Excellent EEE163_Attendance=Excellent ME165_Attendance=Excellent 509 ==> CSE405_Attendance=Excellent 493 conf:(0.97)
115. Student Type=Regular CSE401_Attendance=Excellent 508 ==> CSE405_Attendance=Excellent 492 conf:(0.97)
116. Student Type=Regular CSE313_Attendance=Excellent EEE269_Attendance=Excellent 508 ==> CSE405_Attendance=Excellent 492 conf:(0.97)
117. Student Type=Regular CSE313_Attendance=Excellent CSE405_Attendance=Excellent 508 ==> EEE269_Attendance=Excellent 492 conf:(0.97)
118. CSE425_Attendance=Excellent EEE163_Attendance=Excellent ME165_Attendance=Excellent 508 ==> EEE269_Attendance=Excellent 492 conf:(0.97)
119. CSE313_Attendance=Excellent EEE269_Attendance=Excellent ME165_Attendance=Excellent 507 ==> CSE405_Attendance=Excellent 491 conf:(0.97)
120. CSE405_Attendance=Excellent CSE425_Attendance=Excellent ME165_Attendance=Excellent 507 ==> EEE269_Attendance=Excellent 491 conf:(0.97)
121. CSE431_Attendance=Excellent 538 ==> EEE269_Attendance=Excellent 521 conf:(0.97)
122. PHY109_Attendance=Excellent 536 ==> CSE405_Attendance=Excellent 519 conf:(0.97)
123. Student Type=Regular EEE163_Attendance=Excellent 533 ==> CSE405_Attendance=Excellent 516 conf:(0.97)
124. Student Type=Regular EEE163_Attendance=Excellent 533 ==> EEE269_Attendance=Excellent 516 conf:(0.97)
125. CSE313_Attendance=Excellent CSE405_Attendance=Excellent 532 ==> EEE269_Attendance=Excellent 515 conf:(0.97)
126. Student Type=Regular ME165_Attendance=Excellent 529 ==> CSE405_Attendance=Excellent 512 conf:(0.97)
127. Student Type=Regular ME165_Attendance=Excellent 529 ==> EEE269_Attendance=Excellent 512 conf:(0.97)
128. CSE313_Attendance=Excellent EEE163_Attendance=Excellent 529 ==> CSE405_Attendance=Excellent 512 conf:(0.97)
129. CSE405_Attendance=Excellent 560 ==> EEE269_Attendance=Excellent 542 conf:(0.97)
130. CSE425_Attendance=Excellent EEE163_Attendance=Excellent 528 ==> EEE269_Attendance=Excellent 511 conf:(0.97)
131. CSE103_Attendance=Excellent CSE405_Attendance=Excellent 527 ==> EEE269_Attendance=Excellent 510 conf:(0.97)
132. CSE313_Attendance=Excellent ME165_Attendance=Excellent 525 ==> CSE405_Attendance=Excellent 508 conf:(0.97)
133. CSE405_Attendance=Excellent CSE407_Attendance=Excellent 522 ==> EEE269_Attendance=Excellent 505 conf:(0.97)
134. Student Type=Regular 552 ==> CSE405_Attendance=Excellent 534 conf:(0.97)
135. Student Type=Regular 552 ==> EEE269_Attendance=Excellent 534 conf:(0.97)
136. CSE203_Attendance=Excellent 521 ==> CSE405_Attendance=Excellent 504 conf:(0.97)
137. CSE203_Attendance=Excellent 521 ==> EEE269_Attendance=Excellent 504 conf:(0.97)
138. CSE313_Attendance=Excellent 550 ==> CSE405_Attendance=Excellent 532 conf:(0.97)
139. Student Type=Regular CSE103_Attendance=Excellent 519 ==> CSE405_Attendance=Excellent 502 conf:(0.97)
140. Student Type=Regular CSE103_Attendance=Excellent 519 ==> EEE163_Attendance=Excellent 502 conf:(0.97)
141. Student Type=Regular CSE103_Attendance=Excellent 519 ==> EEE269_Attendance=Excellent 502 conf:(0.97)
142. CSE103_Attendance=Excellent CSE313_Attendance=Excellent 519 ==> CSE405_Attendance=Excellent 502 conf:(0.97)
143. CSE407_Attendance=Excellent EEE163_Attendance=Excellent 518 ==> CSE405_Attendance=Excellent 501 conf:(0.97)
144. CSE405_Attendance=Excellent CSE411_Attendance=Excellent 518 ==> EEE269_Attendance=Excellent 501 conf:(0.97)
145. Student Type=Regular CSE407_Attendance=Excellent 517 ==> EEE269_Attendance=Excellent 500 conf:(0.97)
146. CSE401_Attendance=Excellent CSE405_Attendance=Excellent 516 ==> EEE163_Attendance=Excellent 499 conf:(0.97)
147. CSE401_Attendance=Excellent CSE405_Attendance=Excellent 516 ==> EEE269_Attendance=Excellent 499 conf:(0.97)
148. EEE163_Attendance=Excellent PHY109_Attendance=Excellent 516 ==> ME165_Attendance=Excellent 499 conf:(0.97)
149. CSE431_Attendance=Excellent ME165_Attendance=Excellent 515 ==> EEE269_Attendance=Excellent 498 conf:(0.97)
150. CSE103_Attendance=Excellent CSE407_Attendance=Excellent 514 ==> CSE405_Attendance=Excellent 497 conf:(0.97)
151. CSE103_Attendance=Excellent CSE425_Attendance=Excellent 514 ==> EEE163_Attendance=Excellent 497 conf:(0.97)
152. CSE407_Attendance=Excellent ME165_Attendance=Excellent 514 ==> CSE405_Attendance=Excellent 497 conf:(0.97)
153. CSE407_Attendance=Excellent ME165_Attendance=Excellent 514 ==> EEE163_Attendance=Excellent 497 conf:(0.97)
154. Student Type=Regular PHY109_Attendance=Excellent 513 ==> CSE405_Attendance=Excellent 496 conf:(0.97)
155. Student Type=Regular PHY109_Attendance=Excellent 513 ==> EEE163_Attendance=Excellent 496 conf:(0.97)
156. CSE401_Attendance=Excellent EEE269_Attendance=Excellent 513 ==> EEE163_Attendance=Excellent 496 conf:(0.97)
157. CSE313_Attendance=Excellent CSE405_Attendance=Excellent EEE163_Attendance=Excellent 512 ==> EEE269_Attendance=Excellent 495 conf:(0.97)
158. CSE313_Attendance=Excellent PHY109_Attendance=Excellent 511 ==> CSE405_Attendance=Excellent 494 conf:(0.97)
159. CSE425_Attendance=Excellent EEE263_Attendance=Excellent 511 ==> EEE269_Attendance=Excellent 494 conf:(0.97)
160. CSE407_Attendance=Excellent 540 ==> CSE405_Attendance=Excellent 522 conf:(0.97)
161. Student Type=Regular CSE411_Attendance=Excellent 510 ==> EEE269_Attendance=Excellent 493 conf:(0.97)
162. CSE405_Attendance=Excellent EEE163_Attendance=Excellent 539 ==> EEE269_Attendance=Excellent 521 conf:(0.97)
163. CSE207_Attendance=Excellent 509 ==> CSE405_Attendance=Excellent 492 conf:(0.97)
164. CSE313_Attendance=Excellent CSE401_Attendance=Excellent 509 ==> EEE163_Attendance=Excellent 492 conf:(0.97)
165. CSE313_Attendance=Excellent EEE163_Attendance=Excellent ME165_Attendance=Excellent 509 ==> EEE269_Attendance=Excellent 492 conf:(0.97)
166. Student Type=Regular CSE401_Attendance=Excellent 508 ==> EEE269_Attendance=Excellent 491 conf:(0.97)
167. CSE411_Attendance=Excellent ME165_Attendance=Excellent 508 ==> EEE163_Attendance=Excellent 491 conf:(0.97)
168. Student Type=Regular CSE313_Attendance=Excellent EEE163_Attendance=Excellent 508 ==> CSE405_Attendance=Excellent 491 conf:(0.97)
169. Student Type=Regular CSE313_Attendance=Excellent CSE405_Attendance=Excellent 508 ==> EEE163_Attendance=Excellent 491 conf:(0.97)
170. Student Type=Regular CSE313_Attendance=Excellent EEE269_Attendance=Excellent 508 ==> EEE163_Attendance=Excellent 491 conf:(0.97)
171. Student Type=Regular CSE313_Attendance=Excellent EEE163_Attendance=Excellent 508 ==> EEE269_Attendance=Excellent 491 conf:(0.97)
172. CSE103_Attendance=Excellent CSE405_Attendance=Excellent EEE163_Attendance=Excellent 508 ==> EEE269_Attendance=Excellent 491 conf:(0.97)
173. CSE313_Attendance=Excellent CSE405_Attendance=Excellent ME165_Attendance=Excellent 508 ==> EEE269_Attendance=Excellent 491 conf:(0.97)
174. CSE405_Attendance=Excellent ME165_Attendance=Excellent 535 ==> EEE269_Attendance=Excellent 517 conf:(0.97)
175. EEE163_Attendance=Excellent ME165_Attendance=Excellent 535 ==> EEE269_Attendance=Excellent 517 conf:(0.97)
176. Student Type=Regular CSE405_Attendance=Excellent 534 ==> EEE163_Attendance=Excellent 516 conf:(0.97)
177. Student Type=Regular EEE269_Attendance=Excellent 534 ==> EEE163_Attendance=Excellent 516 conf:(0.97)
178. CSE401_Attendance=Excellent 532 ==> EEE163_Attendance=Excellent 514 conf:(0.97)
179. CSE313_Attendance=Excellent EEE163_Attendance=Excellent 529 ==> EEE269_Attendance=Excellent 511 conf:(0.97)
180. EEE163_Attendance=Excellent 556 ==> EEE269_Attendance=Excellent 537 conf:(0.97)
181. Student Type=Regular CSE313_Attendance=Excellent 526 ==> CSE405_Attendance=Excellent 508 conf:(0.97)
182. Student Type=Regular CSE313_Attendance=Excellent 526 ==> EEE163_Attendance=Excellent 508 conf:(0.97)
183. Student Type=Regular CSE313_Attendance=Excellent 526 ==> EEE269_Attendance=Excellent 508 conf:(0.97)
184. CSE405_Attendance=Excellent EEE263_Attendance=Excellent 526 ==> EEE269_Attendance=Excellent 508 conf:(0.97)
185. CSE313_Attendance=Excellent ME165_Attendance=Excellent 525 ==> EEE269_Attendance=Excellent 507 conf:(0.97)
186. CSE103_Attendance=Excellent EEE163_Attendance=Excellent 524 ==> EEE269_Attendance=Excellent 506 conf:(0.97)
187. Student Type=Regular 552 ==> EEE163_Attendance=Excellent 533 conf:(0.97)
188. ME165_Attendance=Excellent 552 ==> EEE269_Attendance=Excellent 533 conf:(0.97)
189. CSE313_Attendance=Excellent 550 ==> EEE269_Attendance=Excellent 531 conf:(0.97)
190. CSE301_Attendance=Excellent 521 ==> CSE405_Attendance=Excellent 503 conf:(0.97)
191. CSE103_Attendance=Excellent ME165_Attendance=Excellent 521 ==> EEE269_Attendance=Excellent 503 conf:(0.97)
192. CSE313_Attendance=Excellent CSE425_Attendance=Excellent 520 ==> EEE163_Attendance=Excellent 502 conf:(0.97)
193. CSE103_Attendance=Excellent CSE313_Attendance=Excellent 519 ==> EEE269_Attendance=Excellent 501 conf:(0.97)
194. CSE405_Attendance=Excellent EEE163_Attendance=Excellent ME165_Attendance=Excellent 519 ==> EEE269_Attendance=Excellent 501 conf:(0.97)
195. CSE425_Attendance=Excellent 547 ==> EEE163_Attendance=Excellent 528 conf:(0.97)
196. CSE407_Attendance=Excellent EEE163_Attendance=Excellent 518 ==> EEE269_Attendance=Excellent 500 conf:(0.97)
197. Student Type=Regular CSE405_Attendance=Excellent EEE269_Attendance=Excellent 518 ==> EEE163_Attendance=Excellent 500 conf:(0.97)
198. Student Type=Regular CSE407_Attendance=Excellent 517 ==> CSE405_Attendance=Excellent 499 conf:(0.97)
199. Student Type=Regular EEE263_Attendance=Excellent 517 ==> EEE163_Attendance=Excellent 499 conf:(0.97)
200. Student Type=Regular EEE263_Attendance=Excellent 517 ==> EEE269_Attendance=Excellent 499 conf:(0.97)
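Each rule above follows the standard Weka Apriori output layout: the antecedent items with their support count, `==>`, the consequent items with the count of transactions matching both sides, and the confidence, which is the ratio of those two counts (e.g. rule 79: 535 / 552 ≈ 0.97). As a minimal sketch of how such a line can be processed programmatically, the following Python snippet (the `parse_rule` helper and its regular expression are our own illustration, not part of Weka) parses one rule and checks that the reported confidence matches the two support counts:

```python
import re

# Matches a Weka Apriori rule line of the form:
#   "<n>. <antecedent items> <count> ==> <consequent items> <count> conf:(<c>)"
RULE_RE = re.compile(
    r"^\d+\.\s+(?P<ante>.+?)\s+(?P<ante_n>\d+)\s+==>\s+"
    r"(?P<cons>.+?)\s+(?P<cons_n>\d+)\s+conf:\((?P<conf>[\d.]+)\)$"
)

def parse_rule(line):
    """Split one rule line into antecedent, consequent, counts and confidence."""
    m = RULE_RE.match(line.strip())
    if not m:
        raise ValueError("not a rule line: " + line)
    return {
        "antecedent": m.group("ante"),
        "antecedent_count": int(m.group("ante_n")),
        "consequent": m.group("cons"),
        "consequent_count": int(m.group("cons_n")),
        "confidence": float(m.group("conf")),
    }

rule = parse_rule(
    "79. ME165_Attendance=Excellent 552 ==> CSE405_Attendance=Excellent 535 conf:(0.97)"
)
# Confidence is the ratio of the two support counts: 535 / 552 = 0.969... -> 0.97
assert round(rule["consequent_count"] / rule["antecedent_count"], 2) == rule["confidence"]
```

The same helper applies unchanged to rules with multi-item antecedents (e.g. `Student Type=Regular CSE313_Attendance=Excellent 526 ==> ...`), since the item list is captured as a single string up to the trailing support count.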