Knowledge Discovery from Academic Data
using Association Rule Mining
SUBMITTED BY
Rajshakhar Paul
Student ID: 0805020
Shibbir Ahmed
Student ID: 0805097
Submitted to the Department of Computer Science and Engineering in partial
fulfillment of the requirements for the degree of Bachelor of Science in
Computer Science and Engineering
June, 2014
SUPERVISED BY
Dr. Abu Sayed Md. Latiful Haque
Professor
Department of Computer Science and Engineering
BANGLADESH UNIVERSITY OF ENGINEERING AND TECHNOLOGY
Page | i
Certificate
We hereby declare that this work has been done by us and neither this thesis nor any part of it
has been submitted elsewhere for the award of any degree or diploma except for publication.
Rajshakhar Paul Shibbir Ahmed
Acknowledgement
First and foremost, we would like to thank the Almighty Allah that we could complete our thesis
work in time with promising findings.
We express our heartfelt gratitude to our supervisor Dr. Abu Sayed Md. Latiful Haque, who
was very helpful during the entire span of our research work. Without his inspiration, direction,
support and advice, this work would not have been possible.
We would like to thank the Department of Computer Science and Engineering for its support
with resources and materials during the research work. We especially remember our teachers,
who earnestly provided us with encouragement and inspiration for achieving this goal.
We would also like to thank M. Sohel Rahman and Delwar Hossain for their generous assistance
in obtaining the institutional data of students from IICT, BUET.
Last but not least, we are thankful to our parents, family and friends for their support and
tolerance.
Abstract
Educational Data Mining is an emerging interdisciplinary research area focusing on
methodologies for extracting useful knowledge from data originating in an educational context.
The main objective of a higher education institution is to provide quality education to its
students. One way to achieve the highest possible level of quality in a higher education system
is to discover the knowledge hidden in educational data sets and apply it properly. This
knowledge is extractable through data mining techniques. Association Rule Mining aims at
discovering implicative tendencies that can provide valuable information for decision makers,
information not readily provided by other academic data mining techniques such as Decision
Trees, Neural Networks, Naive Bayes, and K-Nearest Neighbor.
In this work, we present applied research on mining association rules from the academic data of
a university. We have discovered knowledge regarding the academic performance and personal
statistics of students. We have developed a technique to transform the existing relational
database of students' academic performance into a universal database format using the academic
and personal data of students. After that, we have transformed the universal format into a
modified format suitable for applying an Association Rule Mining algorithm. We have used the
Apriori algorithm to find interesting association rules from the transformed database, which can
be used to extract knowledge of students' academic progress, decay of potential, and
abandonment as well as retention of students. The impact of courses, curriculum and teaching
methodologies is also derived from the extracted knowledge. We have applied the technique to
institutional data of Bangladesh University of Engineering and Technology, but it can be used
for the benefit of any institution of higher education.
LIST OF CONTENTS
ACKNOWLEDGEMENT …….…………………………………………………ii
ABSTRACT………….………………………………………………………….. iii
CHAPTER 1………….……………………………………………………………1
INTRODUCTION…………………………..…………………………………………………….1
1.1 Problem Definition…….…………………………………………………………………1
1.2 Motivation……....………………………………………………………………….……..2
1.3 Scope of the Work……..………………………………………………………………….3
1.4 Objectives………...………………....……………………………………………………3
1.5 Thesis Organization…...….………………………………………………………………4
CHAPTER 2………………………………………………………………………6
LITERATURE STUDIES………..……………..……………………………………………….6
2.1 Knowledge Discovery and Data Mining ………………….………………………………6
2.2 Association Rule Mining ……….………….……………………………………………..8
2.3 The Apriori Algorithm …………….…………………………………………………….12
2.4 Generating Association Rules from Frequent Itemsets………………………………….18
2.5 Related Works…………………………………………………………………………...19
CHAPTER 3………………………………………………………………..……21
ACADEMIC DATA STRUCTURE AND MINING SYSTEM……………………………….21
3.1 Data Analysis…………………………………………………………………………...21
3.1.1 Personal and Academic Data……………………………...…………………….21
3.1.2 Course and Curriculum …...……………………………..……………………..22
3.2 Preprocessing for Mining Academic Database……………………….………………...23
3.2.1 Relational Database………………………………………...…………………..23
3.2.2 Universal Database……………………………………..………………………23
3.2.3 Data Transformation………………………………..…………………………..24
3.3 Summary of Methodologies……………………………………………………………27
CHAPTER 4 ……..………………………………………………………….….28
SYSTEM IMPLEMENTATION, RESULTS AND DISCUSSIONS……………………..…28
4.1 Software Implementation ……………………………………………………………...28
4.2 Dataset and Application Environment…………………………………………………29
4.3 Results and Discussions………………………………………………………………..30
4.3.1 Impact of Gender………………...……………………………………………..30
4.3.2 Impact of Residence…………………………………………………...……….31
4.3.3 Correlation between Courses…...………………………………...…….………32
4.3.4 Impact on Retention…………………………………………………………….33
4.3.5 Impact on Abandonment………...…….………………………………………..34
4.3.6 Impact of Continuous Assessment……………………………………………...35
4.3.7 Impact of Non Departmental Courses…………………………………………..36
4.3.8 Impact of Departmental Courses………………………………………………..37
CHAPTER 5……………………….……………………………………………..39
CONCLUSIONS……………………………………………………………………………..39
5.1 Summary of the Findings ……………………………………………………………...39
5.2 Future Works ………………………..…………………………………………………39
REFERENCES………..……………..……………………………………………41
APPENDIX…………...…………………….…………………………………….44
List of Tables
Table 2.1: Transactional data for AllElectronics branch………………………………………..14
Table 3.1: Selected Data from BIIS database……………………….…………………………..21
Table 3.2: All Undergraduate Courses for department of CSE…………………….…………...22
Table 3.3: Partial portion of universal database……………………………………..…………..24
Table 3.4: Transformation rule table for 3.0 credit theory course…………………….………...25
Table 3.5: Transformation rule table for 4.0 credit theory course…………………….………...26
Table 3.6: Transformation rule table for 2.0 credit theory course……………………….……...26
Table 3.7: Transformation rule table for all sessional courses………………………….………26
Table 3.8: Transformed table from universal table………………….………………….……….27
Table 4.1: Impact of Gender………………………………………………………….…………30
Table 4.2: Impact of Hall Status……………………………………………….………………..31
Table 4.3: Impact of Hall Status and Gender…………………………………..………………..32
Table 4.4: Correlation between Courses……………………………………….………………..32
Table 4.5: Impact on Retention………………………………………………..……………...…33
Table 4.6: Impact on Abandonment……………………….........................…………………….34
Table 4.7: Impact of Continuous Assessment……………………………..…………………….35
Table 4.8: Impact of Non Departmental Courses………………………….……………………36
Table 4.9: Impact of Departmental Courses……………………………….……………………37
List of Figures
Figure 2.1: Data Mining as a step in the process of Knowledge Discovery………………….….7
Figure 2.2: Market basket analysis……………………………………………………………….9
Figure 2.3 : Generation of candidate itemsets and frequent itemsets, where the minimum support
count is 2…………………………………………………………………………………………15
Figure 2.4 : Generation and pruning of candidate 3-itemsets, C3, from L2 using the Apriori
property…………………………………………………………...……………………………...16
Figure 2.5: The Apriori algorithm for discovering frequent itemsets for mining Boolean
association rules………………………………………………………………………………….17
Figure 3.1: Factors related to Academic Performance, Abandonment and Retention of student..21
Figure 3.2: Relational database………………..………………………………………………...23
Figure 4.1: Experimental Setup for applying Apriori Algorithm using Weka Explorer to
generate Association Rules…………..…...……………………….………………..……………29
Figure A1: Partial Portion of Initial Dataset (.xls format in Excel) of BIIS…………………….44
Figure A2: Partial Portion of Universal data (.xls format in Excel) before converting to
transformation table………………………………………………….……………......................45
Figure A3: Partial Portion of Transformation table (.xls format in Excel)……………………...46
Figure A4: Transformation Table (in .csv format) loaded into Weka Explorer…………………47
Figure A5: Selecting Association Algorithm after loading transformation table in Weka
Explorer…………………………………………………………………………………………..48
Figure A6: Choosing support and confidence metrics with number of rules in Weka Explorer..48
Figure A7: After Choosing specific support and confidence metrics with number of rules in
Weka Explorer, the associator needs to be started for generating association rules……………..49
Knowledge Discovery from Academic Data using Association Rule Mining Page | 1
Chapter 1
Introduction
Students are one of the fundamental elements of any academic institution. Indeed, the prime
concern of an educational institution is to ensure a sound technical foundation, scholarly
guidance and a high standard of education for all of its students. A large educational institution
such as a public university generates large volumes of data, and it requires an efficient way to
apply data mining techniques to obtain knowledge for developing and improving its academic
activities. The knowledge acquired from the institutional database is sufficient to seek answers
to such questions as: Which factors determine better or worse academic performance of
students? What are the causes behind students' retention, i.e., the extended continuation of
studies in the university? Why do students drop out before graduation, i.e., students'
abandonment of an educational institution? Concepts and techniques of data mining are
essential to discover the hidden knowledge in large datasets [1].
1.1 Problem Definition
Bangladesh University of Engineering and Technology (BUET) is the topmost technological
university of Bangladesh. It enrolls the 1000 most brilliant students, selected by a competitive
examination among one million students completing higher secondary education. Among these
1000 students, the top-ranked students can get admission into the different departments under
different faculties. Although this university possesses many of the brightest students of
Bangladesh, statistics demonstrate that the performance of some students degrades noticeably.
On the other hand, some students perform outstandingly at the initial stage of their
undergraduate studies but cannot demonstrate the same level of excellence through to
graduation. Some students cannot perform well initially but achieve quite good academic
records by the end of their studies. Again, there are some students in this university who have to
continue their studies year after year and take a very long time to complete their graduation.
Unfortunately, there are also some meritorious students who drop out before graduation.
Statistical analysis alone is not sufficient to find the reasons for all of the above
problems in any academic institution. The hidden knowledge inside the institutional academic
and personal data of students is necessary to find the possible causes of all these problems and
to take suitable precautions against them. That is why knowledge discovery and data mining
from academic data is essential for an educational institution like BUET to improve the
academic performance of students, refine the standard of teaching methodologies, and reshape
decision making for the betterment of the institution.
Discovering the hidden knowledge from educational data and applying it properly for decision
making is essential for ensuring high quality education in any academic institution. For this, data
mining techniques are very effective. However, not all data mining techniques can be applied
directly to academic data because of its complex structure; rigorous preprocessing is required.
The choice of support and confidence thresholds and the selection of important association rules
from the huge number of generated rules are other significant problems in knowledge discovery
from academic data.
1.2 Motivation
In a developing country like Bangladesh, many students from rural areas come to the city for
higher education. They usually leave their families behind and must adapt to a completely new
environment. They begin their new educational life in the institution's residence halls, with a
new living place, new food, new companions, and a new atmosphere. They usually need some
time to cope physically and mentally with all of these changes, which may hamper their
educational activities at the very beginning, and the situation is often a bit more difficult for
girls than for boys. So they sometimes lag behind at the start of their higher studies, which may
have an adverse effect on them in the long run. City students, on the other hand, are more
familiar with the environment, live with their families, and are provided with more educational,
technological and psychological opportunities, which may give them some advantages in higher
education. Yet the scenario can also be different: those greater opportunities may drive them
away from their studies and demoralize them.
In a higher education system like BUET's, performance in a course depends on different aspects
such as class attendance, class tests, quizzes, assignments, and term final examinations, some of
which start from the very beginning of the class. So if a student gets poor marks in any of these,
it may affect the final result. Moreover, later courses are sometimes dependent on earlier ones,
so a poor result in one course may affect performance in other related courses too.
It is therefore important to discover all possible knowledge from academic data to understand
the relevant rules behind students' performance, whether good or bad. And when students
cannot perform well, the reasons behind it can also be discovered.
1.3 Scope of the Work
Bangladesh University of Engineering and Technology (BUET) is the most renowned university
in Bangladesh for engineering studies. There are five different faculties, which are the Faculty of
Architecture and Planning, Faculty of Civil Engineering, Faculty of Electrical and Electronic
Engineering, Faculty of Engineering and Faculty of Mechanical Engineering [2]. Under these
five faculties there are eleven different departments which are Dept. of Electrical & Electronic
Engineering (EEE), Dept. of Computer Science & Engineering (CSE), Dept. of Architecture
(Arch), Dept. of Urban & Regional Planning (URP), Dept. of Civil Engineering (CE), Dept. of
Water Resources Engineering (WRE), Dept. of Chemical Engineering (Ch.E), Dept. of Materials
& Metallurgical Engineering (MME), Dept. of Mechanical Engineering (ME), Dept. of Naval
Architecture & Marine Engineering (NAME), Dept. of Industrial & Production Engineering
(IPE). BUET offers both undergraduate and postgraduate degrees in all these departments.
Overall there are more than 5000 students in an academic session. So, the scope of knowledge
discovery from the academic data of BUET is immense in the context of undergraduate and
postgraduate students of all the departments. In this research, we have considered only the data
of undergraduate students of the department of CSE. The technique we have developed can be
used to discover knowledge for the rest of the departments, and a modified version can be
applied to postgraduate courses and curricula for the betterment of postgraduate studies as well.
1.4 Objectives
The department of Computer Science and Engineering (CSE) [3] is one of the prestigious
departments of BUET. Although this department possesses many of the brightest students of
Bangladesh, statistical data demonstrate that the performance of some students degrades
noticeably. Moreover, the problems of retention and abandonment are also prevalent among the
students. The main objectives of this research study are:
- To discover knowledge of students' academic progress from academic performance and
personal statistics, through the impact of different course assessments, e.g., class tests,
attendance, and term final examinations
- To find out the reasons behind the degradation of students' merit, i.e., the decay of their
potential
- To discover the causes behind extended continuation toward graduation, i.e., retention of
students
- To find out why some meritorious students drop out before graduation, i.e., abandonment
of students
1.5 Thesis Organization
We have developed a technique to discover knowledge using Association Rule Mining from the
institutional data of students who have completed their undergraduate studies in the department
of CSE, BUET. The literature studies, e.g., preliminaries of Knowledge Discovery and Data
Mining (KDD), Association Rule Mining, the Apriori algorithm, and related works, are
elaborated in Chapter 2. The academic data mining system, i.e., the entire methodology
including both the analysis and design parts, is described in Chapter 3. In this research, we have
transformed the existing relational database of students' academic performance into a universal
database format using the academic and personal data of students. After that, we have further
transformed the universal format into a modified format suitable for an Association Rule
Mining algorithm, which is elaborated in Chapter 3.
We have discovered interesting rules that interpret several important facts related to students'
academic performance, e.g., the impact of personal information such as gender and residence,
and the impact of course content and pedagogy. We have also determined the impact of
retention for particular courses. Addressing the abandonment issue, we have categorized the
students who could not complete their graduation based on their personal information, which is
explained in Chapter 4. Chapter 4 also briefly illustrates the software implementation, dataset
and application environment, along with the results and
discussions. Finally, we have summarized the findings along with a quantitative analysis. We
have also discussed the scope for extending this research work by illustrating some significant
future works in Chapter 5. Thus, we have presented a guideline for applying the extracted
knowledge to improve academic performance and to strike an optimal balance between
abandonment and retention.
Chapter 2
Literature Studies
2.1 Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining (KDD) is an interdisciplinary area focusing upon
methodologies for extracting useful knowledge from data [4]. Data mining is the task of
discovering interesting patterns from large amounts of data, where the data can be stored in
databases, data warehouses, or other information repositories. It is a young interdisciplinary
field, drawing from areas such as database systems, data warehousing, statistics, machine
learning, data visualization, information retrieval, and high-performance computing. Other
contributing areas include neural networks, pattern recognition, spatial data analysis, image
databases, signal processing, and many application fields, such as business, economics, and
bioinformatics. Generally, data mining (sometimes called data or knowledge discovery) is the
process of analyzing data from different perspectives and summarizing it into useful information
that can be used to increase revenue, cut costs, or both. Data mining software is one of a
number of analytical tools for analyzing data. It allows users to analyze data from many different
dimensions or angles, categorize it, and summarize the relationships identified. Technically, data
mining is the process of finding correlations or patterns among dozens of fields in large
relational databases.
Many people treat data mining as a synonym for another popularly used term, Knowledge
Discovery from Data, or KDD, while others view data mining as simply an essential step in the
process of knowledge discovery [1].
Data mining has attracted a great deal of attention in the information industry and in society as a
whole in recent years, due to the wide availability of huge amounts of data and the imminent
need for turning such data into useful information and knowledge. The information and
knowledge gained can be used for applications ranging from market analysis, fraud detection,
and customer retention, to production control and science exploration.
Data mining can be viewed as a result of the natural evolution of information technology. The
database system industry has witnessed an evolutionary path in the development of the following
functionalities: data collection and database creation, data management (including data storage
and retrieval, and database transaction processing), and advanced data analysis (involving data
warehousing and data mining). For instance, the early development of data collection and
database creation mechanisms served as a prerequisite for later development of effective
mechanisms for data storage and retrieval, and query and transaction processing. With numerous
database systems offering query and transaction processing as common practice, advanced data
analysis has naturally become the next target.
Figure 2.1 : Data Mining as a step in the process of Knowledge Discovery
The Knowledge discovery process is shown in Figure 2.1 as an iterative sequence of the
following steps:
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the database)
4. Data transformation (where data are transformed or consolidated into forms appropriate
for mining by performing summary or aggregation operations, for instance)
5. Data mining (an essential process where intelligent methods are applied in order to
extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge
based on some interestingness measures)
7. Knowledge presentation (where visualization and knowledge representation techniques
are used to present the mined knowledge to the user)
Steps 1 through 4 are different forms of data preprocessing, where the data are prepared for
mining. The data mining step may interact with the user or a knowledge base. The interesting
patterns are presented to the user and may be stored as new knowledge in the knowledge base.
According to this view, data mining is only one step in the entire process, although an essential
one because it uncovers hidden patterns for evaluation. So, data mining is a step in the
knowledge discovery process. The ongoing rapid growth of online data due to the Internet and
the widespread use of databases have created an immense need for KDD methodologies. The
challenge of extracting knowledge from data draws upon research in statistics, databases, pattern
recognition, machine learning, data visualization, optimization, and high-performance
computing, to deliver advanced business intelligence and web discovery solutions. So, the entire
knowledge discovery process includes data cleaning, data integration, data selection, data
transformation, data mining, pattern evaluation, and knowledge presentation.
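The preprocessing portion of this pipeline (cleaning, selection, transformation) can be sketched on toy records. The field names, the missing-value rule, and the GPA discretization below are purely illustrative assumptions, not taken from the thesis dataset; data integration (step 2) is omitted since only a single source is simulated.

```python
# A minimal, hypothetical sketch of KDD steps 1, 3, and 4 on toy student records.
raw_records = [
    {"student_id": "S1", "gender": "M", "gpa": "3.75"},
    {"student_id": "S2", "gender": "F", "gpa": ""},      # noisy: missing GPA
    {"student_id": "S3", "gender": "M", "gpa": "2.50"},
]

# Step 1. Data cleaning: drop records with a missing GPA.
cleaned = [r for r in raw_records if r["gpa"]]

# Step 3. Data selection: keep only the attributes relevant to the analysis.
selected = [{"gender": r["gender"], "gpa": float(r["gpa"])} for r in cleaned]

# Step 4. Data transformation: discretize GPA into categorical grades,
# producing a form suitable for later pattern mining.
def grade(gpa):
    return "HIGH" if gpa >= 3.0 else "LOW"

transformed = [{"gender": r["gender"], "grade": grade(r["gpa"])} for r in selected]
print(transformed)
```

The remaining steps (mining, pattern evaluation, presentation) would then operate on `transformed`.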
2.2 Association Rule Mining
Association rule mining, one of the most important and well researched techniques of data
mining, was first introduced in [6]. It aims to extract interesting correlations, frequent patterns,
associations or causal structures among sets of items in transaction databases or other data
repositories. Association rules are widely used in various areas such as telecommunication
networks, market and risk management, inventory control etc. Various association mining
techniques and algorithms will be briefly introduced and compared later. Association rule
mining finds association rules that satisfy predefined minimum support and confidence
thresholds in a given database. The problem is usually decomposed into two subproblems. The
first is to find those itemsets whose occurrences exceed a predefined threshold in the database;
these itemsets are called frequent or large itemsets. The second is to generate association rules
from those large itemsets under the constraint of minimal confidence.
Frequent itemset mining leads to the discovery of associations and correlations among items in
large transactional or relational data sets. With massive amounts of data continuously being
collected and stored, many industries are becoming interested in mining such patterns from their
databases. The discovery of interesting correlation relationships among huge amounts of
business transaction records can help in many business decision-making processes, such as
catalog design, cross-marketing, and customer shopping behavior analysis. A typical example of
frequent itemset mining is market basket analysis. This process analyzes customer buying
habits by finding associations between the different items that customers place in their “shopping
baskets” (Figure 2.2). The discovery of such associations can help retailers develop marketing
strategies by gaining insight into which items are frequently purchased together by customers.
For instance, if customers are buying milk, how likely are they to also buy bread (and what kind
of bread) on the same trip to the supermarket? Such information can lead to increased sales by
helping retailers do selective marketing and plan their shelf space.
Figure 2.2 : Market basket analysis.
If we think of the universe as the set of items available at the store, then each item has a Boolean
variable representing the presence or absence of that item. Each basket can then be represented
by a Boolean vector of values assigned to these variables. The Boolean vectors can be analyzed
for buying patterns that reflect items that are frequently associated or purchased together. These
patterns can be represented in the form of association rules. For example, the information that
customers who purchase computers also tend to buy antivirus software at the same time is
represented in the association rule below:

computer ⇒ antivirus_software [support = 2%, confidence = 60%]
Rule support and confidence are two measures of rule interestingness. They respectively reflect
the usefulness and certainty of discovered rules. A support of 2% for the Association Rule means
that 2% of all the transactions under analysis show that computer and antivirus software are
purchased together. A confidence of 60% means that 60% of the customers who purchased a
computer also bought the software. Typically, association rules are considered interesting if they
satisfy both a minimum support threshold and a minimum confidence threshold. Such thresholds
can be set by users or domain experts. Additional analysis can be performed to uncover
interesting statistical correlations between associated items.
Let I = {I1, I2, …, Im} be a set of items. Let D, the task-relevant data, be a set of database
transactions where each transaction T is a set of items such that T ⊆ I. Each transaction is
associated with an identifier, called a TID. Let A be a set of items. A transaction T is said to
contain A if and only if A ⊆ T. An association rule is an implication of the form A ⇒ B, where
A ⊂ I, B ⊂ I, and A ∩ B = φ. The rule A ⇒ B holds in the transaction set D with support s,
where s is the percentage of transactions in D that contain A ∪ B (i.e., the union of sets A and B,
or, say, both A and B). This is taken to be the probability P(A ∪ B). The rule A ⇒ B has
confidence c in the transaction set D, where c is the percentage of transactions in D containing
A that also contain B. This is taken to be the conditional probability P(B | A). That is,

support(A ⇒ B) = P(A ∪ B)
confidence(A ⇒ B) = P(B | A)
Rules that satisfy both a minimum support threshold (min_sup) and a minimum confidence
threshold (min_conf) are called strong. By convention, we write support and confidence values
so as to occur between 0% and 100%, rather than 0 to 1.0.
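These definitions can be computed directly from a transaction list. The sketch below uses an illustrative set of transactions (not real data) and reports the measures as fractions rather than percentages.

```python
# Hypothetical helpers computing support and confidence of a rule A => B
# over a list of transactions, each represented as a set of items.

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(A, B, transactions):
    """P(B | A): support of A union B divided by support of A."""
    return support(A | B, transactions) / support(A, transactions)

transactions = [
    {"computer", "antivirus_software"},
    {"computer"},
    {"printer"},
    {"computer", "antivirus_software", "printer"},
]

s = support({"computer", "antivirus_software"}, transactions)           # 2/4 = 0.5
c = confidence({"computer"}, {"antivirus_software"}, transactions)      # 0.5 / 0.75 = 2/3
```

A rule would then be called strong if `s` meets min_sup and `c` meets min_conf.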
A set of items is referred to as an itemset. An itemset that contains k items is a k-itemset. The set
{computer, antivirus_software} is a 2-itemset. The occurrence frequency of an itemset is the
number of transactions that contain the itemset. This is also known, simply, as the frequency,
support count, or count of the itemset. Note that the itemset support defined in the equation is
sometimes referred to as relative support, whereas the occurrence frequency is called the
absolute support. If the relative support of an itemset I satisfies a prespecified minimum support
threshold (i.e., the absolute support of I satisfies the corresponding minimum support count
threshold), then I is a frequent itemset. The set of frequent k-itemsets is commonly denoted by
Lk. From the equation measuring confidence, we have

confidence(A ⇒ B) = P(B | A) = support(A ∪ B) / support(A)
                             = support_count(A ∪ B) / support_count(A)
This equation shows that the confidence of rule A ⇒ B can easily be derived from the support
counts of A and A ∪ B. That is, once the support counts of A, B, and A ∪ B are found, it is
straightforward to derive the corresponding association rules A ⇒ B and B ⇒ A and check
whether they are strong. Thus the problem of mining association rules can be reduced to that of
mining frequent itemsets.
In general, association rule mining can be viewed as a two-step process:
i. Find all frequent itemsets: By definition, each of these itemsets will occur at least as
frequently as a predetermined minimum support count, min_sup.
ii. Generate strong association rules from the frequent itemsets: By definition, these rules
must satisfy minimum support and minimum confidence.
Suppose one of the large itemsets is Lk = {I1, I2, … , Ik}. Association rules with this itemset
are generated in the following way: the first rule is {I1, I2, … , Ik-1} ⇒ {Ik}; by checking its
confidence, this rule can be determined to be interesting or not. Other rules are then generated
by deleting the last item in the antecedent and inserting it into the consequent, and the
confidences of the new rules are checked to determine their interestingness. This process iterates
until the antecedent becomes empty. Since the second subproblem is quite straightforward, most
research focuses on the first subproblem. The first subproblem can be further divided into two
subproblems: candidate large itemset generation and frequent itemset generation. We call those
itemsets whose support exceeds the support threshold large or frequent itemsets, and those
itemsets that are expected, or hoped, to be large or frequent candidate itemsets.
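The rule-generation loop just described can be sketched as follows (the confidence check performed at each step is omitted for brevity; the function name and the itemset are illustrative):

```python
# Sketch of the scheme above: start from {I1,...,Ik-1} => {Ik} and keep
# moving the last antecedent item into the consequent until the
# antecedent is empty. The confidence test on each rule is omitted.
def rules_from_itemset(items):
    """items: the itemset [I1, ..., Ik] in order; returns (antecedent, consequent) pairs."""
    antecedent, consequent = list(items[:-1]), [items[-1]]
    rules = []
    while antecedent:
        rules.append((tuple(antecedent), tuple(consequent)))
        consequent.insert(0, antecedent.pop())  # move last antecedent item over
    return rules

print(rules_from_itemset(["I1", "I2", "I3"]))
# [(('I1', 'I2'), ('I3',)), (('I1',), ('I2', 'I3'))]
```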
In many cases, the algorithms generate an extremely large number of association rules, often in
the thousands or even millions; moreover, the rules themselves are sometimes very long. It is
nearly impossible for end users to comprehend or validate such a large number of complex
association rules, thereby limiting the usefulness of the data mining results. Several strategies
have been proposed to reduce the number of association rules, such as generating only
“interesting” rules, generating only “non-redundant” rules, or generating only those rules
satisfying certain other criteria such as coverage, leverage, lift or strength.
Hegland [7] reviews the most well-known algorithm for producing association rules, Apriori,
and discusses variants for distributed data, inclusion of constraints and data taxonomies.
2.3 The Apriori Algorithm
Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining
frequent itemsets for Boolean association rules [1]. The name of the algorithm is based on the
fact that the algorithm uses prior knowledge of frequent itemset properties. Apriori employs an
iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-
itemsets. First, the set of frequent 1-itemsets is found by scanning the database to accumulate the
count for each item, and collecting those items that satisfy minimum support. The resulting set is
denoted L1. Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3,
and so on, until no more frequent k-itemsets can be found. The finding of each Lk requires one
full scan of the database. To improve the efficiency of the level-wise generation of frequent
itemsets, an important property called the Apriori property, presented below, is used to reduce
the search space.
Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
The Apriori property is based on the following observation. By definition, if an itemset I does
not satisfy the minimum support threshold, min_sup, then I is not frequent; that is, P(I) < min_sup.
If an item A is added to the itemset I, then the resulting itemset (i.e., I ∪ A) cannot occur
more frequently than I. Therefore, I ∪ A is not frequent either; that is, P(I ∪ A) < min_sup. This
property belongs to a special category of properties called antimonotone, in the sense that if a set
cannot pass a test, all of its supersets will fail the same test as well. It is called antimonotone
because the property is monotonic in the context of failing a test. “How is the Apriori property
used in the algorithm?” To understand this, let us look at how Lk-1 is used to find Lk for k ≥2. A
two-step process is followed, consisting of join and prune actions.
1. The join step: To find Lk, a set of candidate k-itemsets is generated by joining Lk-1
with itself. This set of candidates is denoted Ck. Let l1 and l2 be itemsets in Lk-1. The
notation li[j] refers to the jth item in li (e.g., l1[k-2] refers to the second to the last item in
l1). By convention, Apriori assumes that items within a transaction or itemset are sorted
in lexicographic order. For the (k-1)-itemset li, this means that the items are sorted such
that li[1] < li[2] < … < li[k-1]. The join, Lk-1 ⋈ Lk-1, is performed, where members of
Lk-1 are joinable if their first (k-2) items are in common. That is, members l1 and l2 of Lk-1
are joined if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ … ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]).
The condition l1[k-1] < l2[k-1] simply ensures that no duplicates are generated. The
resulting itemset formed by joining l1 and l2 is {l1[1], l1[2], … , l1[k-2], l1[k-1], l2[k-1]}.
2. The prune step: Ck is a superset of Lk, that is, its members may or may not be
frequent, but all of the frequent k-itemsets are included in Ck. A scan of the database to
determine the count of each candidate in Ck would result in the determination of Lk (i.e.,
all candidates having a count no less than the minimum support count are frequent by
definition, and therefore belong to Lk). Ck, however, can be huge, and so this could
involve heavy computation. To reduce the size of Ck, the Apriori property is used as
follows. Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
Hence, if any (k-1)-subset of a candidate k-itemset is not in Lk-1, then the candidate cannot
be frequent either and so can be removed from Ck. This subset testing can be done
quickly by maintaining a hash tree of all frequent itemsets.
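A compact sketch of the two steps, with itemsets represented as lexicographically sorted tuples (a representation chosen for this sketch, not prescribed by the text):

```python
from itertools import combinations

def apriori_gen(L_prev):
    """Join L_{k-1} with itself, then prune candidates that have an
    infrequent (k-1)-subset. L_prev is a set of sorted tuples."""
    candidates = set()
    for l1 in L_prev:
        for l2 in L_prev:
            # join step: first k-2 items equal, last item of l1 < last item of l2
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                c = l1 + (l2[-1],)
                # prune step: every (k-1)-subset of c must be in L_prev
                if all(s in L_prev for s in combinations(c, len(c) - 1)):
                    candidates.add(c)
    return candidates

L2 = {("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")}
print(sorted(apriori_gen(L2)))  # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```

A hash tree, as mentioned above, would make the subset test faster; this sketch uses a plain set membership test for clarity.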
Example: Apriori. Let's look at a concrete example, based on the AllElectronics transaction
database, D, of Table 2.1. There are nine transactions in this database; that is, |D| = 9. We use
Figure 2.3 to illustrate the Apriori algorithm for finding frequent itemsets in D.
Table 2.1: Transactional data for AllElectronics branch
TID List of item_IDs
T100 I1, I2, I5
T200 I2, I4
T300 I2, I3
T400 I1, I2, I4
T500 I1, I3
T600 I2, I3
T700 I1, I3
T800 I1, I2, I3, I5
T900 I1, I2, I3
1. In the first iteration of the algorithm, each item is a member of the set of candidate
1-itemsets, C1. The algorithm simply scans all of the transactions in order to count the
number of occurrences of each item.
2. Suppose that the minimum support count required is 2, that is, min_sup = 2. (Here, we
are referring to absolute support because we are using a support count. The
corresponding relative support is 2/9 = 22%). The set of frequent 1-itemsets, L1, can then
be determined. It consists of the candidate 1-itemsets satisfying minimum support. In our
example, all of the candidates in C1 satisfy minimum support.
3. To discover the set of frequent 2-itemsets, L2, the algorithm uses the join L1 ⋈ L1 to
generate a candidate set of 2-itemsets, C2. C2 consists of C(|L1|, 2) = 10 2-itemsets. Note
that no candidates are removed from C2 during the prune step because each subset of the
candidates is also frequent.
Figure 2.3 : Generation of candidate itemsets and frequent itemsets, where the minimum
support count is 2.
4. Next, the transactions in D are scanned and the support count of each candidate itemset
in C2 is accumulated, as shown in the middle table of the second row in Figure 2.3.
5. The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate
2-itemsets in C2 having minimum support.
6. The generation of the set of candidate 3-itemsets, C3, is detailed in Figure 2.4. From
the join step, we first get C3 = L2 ⋈ L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5},
{I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}. Based on the Apriori property that all subsets of a
frequent itemset must also be frequent, we can determine that the latter four candidates
cannot possibly be frequent. We therefore remove them from C3, thereby saving the effort
of unnecessarily obtaining their counts during the subsequent scan of D to determine L3.
Note that, given a candidate k-itemset, we only need to check whether its (k-1)-subsets are
frequent, since the Apriori algorithm uses a level-wise search strategy. The resulting
pruned version of C3 is shown in the first table of the bottom row of Figure 2.3.

[Figure 2.3 content, reconstructed:]

C1 (scan D for the count of each candidate): {I1}: 6, {I2}: 7, {I3}: 6, {I4}: 2, {I5}: 2
L1 (compare candidate support counts with the minimum support count): {I1}: 6, {I2}: 7, {I3}: 6, {I4}: 2, {I5}: 2
C2 (candidates generated from L1): {I1,I2}, {I1,I3}, {I1,I4}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}, {I3,I4}, {I3,I5}, {I4,I5}
C2 (scan D for the count of each candidate): {I1,I2}: 4, {I1,I3}: 4, {I1,I4}: 1, {I1,I5}: 2, {I2,I3}: 4, {I2,I4}: 2, {I2,I5}: 2, {I3,I4}: 0, {I3,I5}: 1, {I4,I5}: 0
L2 (compare candidate support counts with the minimum support count): {I1,I2}: 4, {I1,I3}: 4, {I1,I5}: 2, {I2,I3}: 4, {I2,I4}: 2, {I2,I5}: 2
C3 (candidates generated from L2, after pruning): {I1,I2,I3}, {I1,I2,I5}
C3 (scan D for the count of each candidate): {I1,I2,I3}: 2, {I1,I2,I5}: 2
L3 (compare candidate support counts with the minimum support count): {I1,I2,I3}: 2, {I1,I2,I5}: 2
(a) Join: C3 = L2 ⋈ L2 = {{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}} ⋈
{{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}}
= {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
(b) Prune using the Apriori property: All nonempty subsets of a frequent itemset must
also be frequent. Do any of the candidates have a subset that is not frequent?
The 2-item subsets of {I1, I2, I3} are {I1, I2}, {I1, I3}, and {I2, I3}. All 2-item
subsets of {I1, I2, I3} are members of L2. Therefore, keep {I1, I2, I3} in C3.
The 2-item subsets of {I1, I2, I5} are {I1, I2}, {I1, I5}, and {I2, I5}. All 2-item
subsets of {I1, I2, I5} are members of L2. Therefore, keep {I1, I2, I5} in C3.
The 2-item subsets of {I1, I3, I5} are {I1, I3}, {I1, I5}, and {I3, I5}. {I3, I5} is not a
member of L2, and so it is not frequent. Therefore, remove {I1, I3, I5} from C3.
The 2-item subsets of {I2, I3, I4} are {I2, I3}, {I2, I4}, and {I3, I4}. {I3, I4} is not a
member of L2, and so it is not frequent. Therefore, remove {I2, I3, I4} from C3.
The 2-item subsets of {I2, I3, I5} are {I2, I3}, {I2, I5}, and {I3, I5}. {I3, I5} is not a
member of L2, and so it is not frequent. Therefore, remove {I2, I3, I5} from C3.
The 2-item subsets of {I2, I4, I5} are {I2, I4}, {I2, I5}, and {I4, I5}. {I4, I5} is not a
member of L2, and so it is not frequent. Therefore, remove {I2, I4, I5} from C3.
(c) Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after pruning.
Figure 2.4 : Generation and pruning of candidate 3-itemsets, C3, from L2 using the
Apriori property.
7. The transactions in D are scanned in order to determine L3, consisting of those
candidate 3-itemsets in C3 having minimum support (Figure 2.3).
8. The algorithm uses L3 ⋈ L3 to generate a candidate set of 4-itemsets, C4. Although the
join results in {I1, I2, I3, I5}, this itemset is pruned because its subset {I2, I3, I5} is not
frequent. Thus, C4 = ∅, and the algorithm terminates, having found all of the frequent
itemsets.
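The whole walkthrough can be reproduced with a minimal level-wise sketch (an illustration of the algorithm run on Table 2.1 with min_sup = 2, not the software used in this thesis):

```python
from itertools import combinations

# The nine AllElectronics transactions of Table 2.1.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
MIN_SUP = 2  # absolute support count

def count(itemset):
    """Support count: transactions of D containing all items of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

items = sorted({i for t in transactions for i in t})
L = [(i,) for i in items if count((i,)) >= MIN_SUP]  # L1
levels = []
while L:
    levels.append(L)
    prev, k = set(L), len(L[0]) + 1
    # join L_{k-1} with itself ...
    C = {l1 + (l2[-1],) for l1 in prev for l2 in prev
         if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]}
    # ... prune candidates having an infrequent (k-1)-subset ...
    C = {c for c in C if all(s in prev for s in combinations(c, k - 1))}
    # ... and scan D to keep candidates meeting min_sup
    L = sorted(c for c in C if count(c) >= MIN_SUP)

print(levels[-1])
# [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```

Running this yields L1 with five itemsets, L2 with six, and the L3 above, matching Figure 2.3; the candidate 4-itemset {I1, I2, I3, I5} is pruned, so the loop terminates.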
Algorithm: Apriori. Find frequent itemsets using an iterative level-wise approach based on candidate
generation.
Input:
D, a database of transactions;
min_sup, the minimum support count threshold.
Output: L, frequent itemsets in D.
Method:
(1) L1 = find_frequent_1-itemsets(D);
(2) for (k = 2; Lk-1 ≠ ∅; k++) {
(3)     Ck = apriori_gen(Lk-1);
(4)     for each transaction t ∈ D {  // scan D for counts
(5)         Ct = subset(Ck, t);  // get the subsets of t that are candidates
(6)         for each candidate c ∈ Ct
(7)             c.count++;
(8)     }
(9)     Lk = {c ∈ Ck | c.count ≥ min_sup};
(10) }
(11) return L = ∪k Lk;

procedure apriori_gen(Lk-1: frequent (k-1)-itemsets)
(1) for each itemset l1 ∈ Lk-1
(2)     for each itemset l2 ∈ Lk-1
(3)         if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ … ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]) then {
(4)             c = l1 ⋈ l2;  // join step: generate candidates
(5)             if has_infrequent_subset(c, Lk-1) then
(6)                 delete c;  // prune step: remove unfruitful candidate
(7)             else add c to Ck;
(8)         }
(9) return Ck;

procedure has_infrequent_subset(c: candidate k-itemset;
    Lk-1: frequent (k-1)-itemsets)  // use prior knowledge
(1) for each (k-1)-subset s of c
(2)     if s ∉ Lk-1 then
(3)         return TRUE;
(4) return FALSE;
Figure 2.5: The Apriori algorithm for discovering frequent itemsets for mining Boolean
association rules.
Figure 2.5 shows pseudo-code for the Apriori algorithm and its related procedures. Step 1 of
Apriori finds the frequent 1-itemsets, L1. In steps 2 to 10, Lk-1 is used to generate candidates Ck in
order to find Lk for k ≥ 2. The apriori_gen procedure generates the candidates and then uses the
Apriori property to eliminate those having a subset that is not frequent (step 3). This procedure is
described below. Once all of the candidates have been generated, the database is scanned (step
4). For each transaction, a subset function is used to find all subsets of the transaction that are
candidates (step 5), and the count for each of these candidates is accumulated (steps 6 and 7).
Finally, all of those candidates satisfying minimum support (step 9) form the set of frequent
itemsets, L (step 11). A procedure can then be called to generate association rules from the
frequent itemsets. The apriori_gen procedure performs two kinds of actions, namely, join and
prune, as described above. In the join component, Lk-1 is joined with Lk-1 to generate potential
candidates (steps 1 to 4). The prune component (steps 5 to 7) employs the Apriori property to
remove candidates that have a subset that is not frequent. The test for infrequent subsets is
shown in the procedure has_infrequent_subset.
2.4 Generating Association Rules from Frequent Itemsets
Once the frequent itemsets from transactions in a database D have been found, it is
straightforward to generate strong association rules from them (where strong association rules
satisfy both minimum support and minimum confidence). This can be done using the below
Equation for confidence, which we show again here for completeness:
confidence(A ⇒ B) = P(B | A) = support_count(A ∪ B) / support_count(A)
The conditional probability is expressed in terms of itemset support count, where support_
count(AUB) is the number of transactions containing the itemsets AUB, and support_count(A) is
the number of transactions containing the itemset A. Based on this equation, association rules can
be generated as follows:
For each frequent itemset l, generate all nonempty subsets of l.
For every nonempty subset s of l, output the rule s ⇒ (l − s) if
support_count(l) / support_count(s) ≥ min_conf, where min_conf is the minimum confidence threshold.
Because the rules are generated from frequent itemsets, each one automatically satisfies
minimum support. Frequent itemsets can be stored ahead of time in hash tables along with
their counts so that they can be accessed quickly.
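As an illustration, applying this procedure to the frequent itemset l = {I1, I2, I5} of the AllElectronics example, with the support counts read off Figure 2.3 and a hypothetical min_conf of 70%, yields exactly three strong rules:

```python
from itertools import combinations

# Support counts of the relevant itemsets, taken from Figure 2.3;
# min_conf = 70% is a hypothetical threshold for this illustration.
support_count = {
    frozenset(["I1"]): 6, frozenset(["I2"]): 7, frozenset(["I5"]): 2,
    frozenset(["I1", "I2"]): 4, frozenset(["I1", "I5"]): 2,
    frozenset(["I2", "I5"]): 2, frozenset(["I1", "I2", "I5"]): 2,
}
l = frozenset(["I1", "I2", "I5"])
MIN_CONF = 0.7

strong = []
for r in range(1, len(l)):                        # every nonempty proper subset s of l
    for s in map(frozenset, combinations(sorted(l), r)):
        conf = support_count[l] / support_count[s]
        if conf >= MIN_CONF:                      # output s => (l - s) if confident enough
            strong.append((sorted(s), sorted(l - s), conf))

for ante, cons, conf in strong:
    print(ante, "=>", cons, f"{conf:.0%}")
```

With min_conf = 70%, the rules I5 ⇒ I1 ∧ I2, I1 ∧ I5 ⇒ I2 and I2 ∧ I5 ⇒ I1 are strong (confidence 100% each), while, e.g., I1 ⇒ I2 ∧ I5 (confidence 2/6 ≈ 33%) is not.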
2.5 Related Works
Techniques of Educational Data Mining (EDM) have been used to resolve educational research
issues since 1993 [8]. Mining educational data through classification is an effective way to
analyze students' performance from the extracted knowledge [9]. Automatic clustering and
decision-rule data mining techniques have also been applied for knowledge discovery based on
academic data analysis [10]. It has already been shown how data mining algorithms can help
discover all possible relevant knowledge contained in databases obtained from Web-based
educational systems [11]. Besides these, research [12] presents the factors on which students'
performance depends and shows how a Naïve Bayes classifier can be applied to calculate
probabilities so that final examination results can be predicted from the findings.
Student retention, i.e., students' extended continuation in the institution, and student
abandonment, i.e., students' dropping out from the institution, are two important indicators of
academic performance and teaching methodology for an educational institute. An impressive
study [13] illustrates how data mining techniques can help to detect retentive students, evaluate
course suitability and, finally, implement intervention programs to decrease student drop-out,
as the abandonment problem can be mitigated by increasing student retention. In this regard,
research [14] has been done to increase college student retention by performing early detection
of academic risk using data mining methods, with preliminary results shown on initial model
development using classification.
Recently, important works have incorporated data mining in academic research. For instance,
Association Rule Mining has been used to compare students' performance in the courses
common to the graduation and post-graduation levels, which is useful for predicting factors
related to success or failure [15]. An important study [16] compares several frequent and rare
Association Rule Mining algorithms with a view to measuring both their performance and their
usefulness in educational environments. A different approach of Association Rule Mining is
used to find the support, confidence and interestingness levels for appropriate language and
attendance in the classroom [17]. An interesting piece of recent research [18] proposes a Rule
Schema formalism for obtaining Association Rules from a knowledge base by integrating user
knowledge in the post-processing task. Although one study shows the drawbacks and solutions
of applying Association Rule Mining in learning management systems [19], recent work, e.g.,
mining the impact of unsupervised course work such as assignments on the overall performance
of students [20] and developing software for knowledge discovery from students' result
repositories by an Association Rule Mining approach [21], encourages us to proceed further in
discovering knowledge from academic data using Association Rule Mining.

Before mining association rules, academic data needs to be preprocessed properly. For this, a
technique for preprocessing academic data before Association Rule Mining [22] has recently
been proposed with a synthetic dataset, for checking the suitability of the system with the real
institutional dataset.
Chapter 3
Academic Data Mining
3.1 Data Analysis
3.1.1 Personal and Academic Data
In this research, we have considered the academic data structure of BUET. The student data in
BIIS (BUET Institutional Information System) contains personal and academic information
about each student. We have collected these data anonymously for the data preprocessing and
data analysis. We have considered the personal and academic data stated in Table 3.1 for
knowledge discovery regarding the academic performance, abandonment and retention of
students, illustrated in Figure 3.1.
Table 3.1: Selected Data from BIIS database

Academic Information:
- Department
- Admission Year / Batch
- Overall CGPA
- Marks of class tests, attendance and the two answer scripts, and total marks and grades of all theory courses
- Total marks and grades of all sessional courses
- Total completed credit hours

Personal Information:
- Gender
- Hall Resident / Non-resident
Figure 3.1: Factors related to Academic Performance, Abandonment and Retention of students.
[Figure content: academic performance, student retention and student abandonment are related
to residence, gender, records of all continuous assessments, and records of departmental and
non-departmental courses.]
3.1.2 Course and Curriculum
As we have experimented with the students' data of the department of Computer Science and
Engineering (CSE) in BUET, we have analyzed all the courses in the curriculum that have to be
taken to complete the BSc degree. A student has to take 68 departmental and non-departmental
courses in total. All the courses, along with their credit hours, are shown in Table 3.2.

Table 3.2: All Undergraduate Courses for the department of CSE

Among them there are 40 theory courses (25 departmental and 15 non-departmental) and 28
sessional courses (20 departmental and 7 non-departmental, plus the thesis). We determine
academic performance and the impact of other factors on the basis of these courses' final grades
and the marks of attendance, class tests, term final answer scripts, total marks, etc.
Course Type Credit
Hour Course Number
Departmental
Theory Courses
4.0 CSE307, CSE321
3.0
CSE103, CSE105, CSE201, CSE203, CSE205, CSE207,
CSE209, CSE303, CSE305, CSE309, CSE311, CSE301,
CSE313, CSE315, CSE317, CSE401, CSE403, CSE423,
CSE409, CSE461, CSE463
2.0 CSE100, CSE211
Departmental
Sessional Courses
1.5
CSE106, CSE202, CSE206, CSE210, CSE214, CSE304,
CSE308, CSE314, CSE316, CSE404
0.75
CSE204, CSE208, CSE300, CSE310, CSE322,
CSE324, CSE402, CSE410, CSE462, CSE464
Non-Departmental
Theory Courses
4.0 PHY109, MATH143, EEE263, MATH243
3.0
EEE163, MATH141, ME165, CHEM101, HUM175,
MATH241, EEE269, IPE493
2.0 HUM211, HUM275, HUM371
Non-Departmental
Sessional Courses
1.5
PHY102, EEE164, ME160, HUM272, CHEM114,
EEE264, EEE270
Thesis 6.0 CSE400
3.2 Preprocessing for Mining Academic Database
3.2.1 Relational Database
Students take courses through their BIIS account via registration. The relational database
illustrated in Figure 3.2 stores all the personal information as well as the results of the courses
taken by a student, from which we can obtain a relational table containing the student's gender,
hall status, performance in all courses, CGPA, etc.
Figure 3.2: Relational database
3.2.2 Universal Database
A universal database is created for the purpose in which records of all taken courses along with
personal information like gender, hall status of corresponding student id are stored in a single
row of the table. For a specific course, the grade, attendance, marks of class tests, marks of each
section (section A and section B) of term final answer scripts and total marks. Like this the
similar records of all other taken courses are stored in the database with the corresponding
student id. And by this process the records of other students are stored in the database one after
another after the corresponding Gender and Hall Status of a particular student. Another attribute
is stored as Student Type by which we have determined the student type- regular, retentive or
abandoned. As, for applying Apriori algorithm of Association Rule Mining, we have to set the
value of attribute in discrete form. So, record such as student id has been omitted in the universal
table.
[Figure 3.2 content: entity-relationship diagram — a Student achieves a Grade Sheet, which
represents a Course.]
Table 3.3: Partial portion of universal database
3.2.3 Data Transformation
The universal database of Table 3.3 has been transformed into an equivalent transformation table
by converting each continuous-valued attribute into a discrete-valued attribute representing some
knowledge, for the suitability of implementing the Apriori algorithm of Association Rule
Mining. For example, CGPA is a continuous attribute, and it has been transformed into five
classifications: excellent, very good, good, average and poor. We have used one algorithm to
transform all continuous numbers for attendance, class tests, both sections of the term final
answer scripts, and the total marks of a course. We have used another algorithm to transform all
grades or grade points of courses, and the overall CGPA, into those five classifications.

For transforming the numbers of the universal table, i.e., attendance, class tests, Section A,
Section B and total marks of each course, Algorithm 1 has been developed to populate the
transformed table in such a way that no entry holds a continuous value.
Gender | Hall_Status | Student_Type | CSE103_Grade | CSE103_Attendance | CSE103_CT | CSE103_SectionA | CSE103_SectionB | CSE103_Total | …
Male | Resident | Regular | A+ | 30 | 55 | 90 | 75 | 250 | …
Female | Non-Resident | Regular | A | 25 | 45 | 85 | 70 | 225 | …
… | … | … | … | … | … | … | … | … | …
Algorithm 1: Marks_Transformation()
Input: marks of Attendance, CT, Section A, Section B and Total Marks of each course from the universal table of Studentlist
Output: discrete level of marks for the transformation table
for i = 1 to |Studentlist|
    if marks >= 80% then level = "Excellent"
    else if marks >= 75% then level = "Very Good"
    else if marks >= 60% then level = "Good"
    else if marks >= 50% then level = "Average"
    else level = "Poor"
end for
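Algorithm 1 can be sketched as a single function (the function name is ours; the percentage thresholds follow the algorithm above and match the class test, section and total columns of Tables 3.4–3.6, while the attendance columns of those tables use stricter cut-offs):

```python
# Sketch of Algorithm 1: discretize marks by percentage of the maximum.
def marks_level(marks, max_marks):
    pct = 100.0 * marks / max_marks
    if pct >= 80:
        return "Excellent"
    if pct >= 75:
        return "Very Good"
    if pct >= 60:
        return "Good"
    if pct >= 50:
        return "Average"
    return "Poor"

print(marks_level(250, 300))  # total of a 3.0 credit course -> Excellent
print(marks_level(225, 300))  # -> Very Good (matches Table 3.4: 225-239)
```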
Similarly, the grades of the universal table are transformed by Algorithm 2. As the real dataset
stores CGPA in grade points, we likewise consider another variable, grade point, and transform
the continuous CGPA value into the same five classified definitions.

As there are theory courses of credit hours 4.0, 3.0 and 2.0, and sessional courses of credit hours
1.5 and 0.75, we need a different transformation rule table for each of these course types.
Transformation rules for 3.0 credit hour (Table 3.4), 4.0 credit hour (Table 3.5) and 2.0 credit
hour (Table 3.6) theory courses, and for all sessional courses (Table 3.7), are illustrated below.
Table 3.4: Transformation rule table for 3.0 credit theory course
Algorithm 2: Grade_Transformation()
Input: every acquired grade of each course in the Courselist of the universal table
Output: transformed_grade for the transformation table
for i = 1 to |Courselist|
    if grade = A+ then transformed_grade = "Excellent"
    else if grade = A then transformed_grade = "Very Good"
    else if grade = A- or B+ then transformed_grade = "Good"
    else if grade = B then transformed_grade = "Average"
    else if grade = B- or C+ or C or D then transformed_grade = "Poor"
end for
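Algorithm 2 amounts to a fixed lookup table; a minimal sketch:

```python
# Sketch of Algorithm 2: map letter grades to the five discrete levels.
GRADE_LEVEL = {
    "A+": "Excellent",
    "A": "Very Good",
    "A-": "Good", "B+": "Good",
    "B": "Average",
    "B-": "Poor", "C+": "Poor", "C": "Poor", "D": "Poor",
}
print(GRADE_LEVEL["A+"], GRADE_LEVEL["B+"])  # Excellent Good
```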
Classified Name | Attendance | Class Test | SecA/SecB | Total (Range of Marks, M)
Excellent | 27 ≤ M ≤ 30 | 48 ≤ M ≤ 60 | 84 ≤ M ≤ 105 | 240 ≤ M ≤ 300
Very Good | 24 ≤ M ≤ 26 | 45 ≤ M ≤ 47 | 78 ≤ M ≤ 83 | 225 ≤ M ≤ 239
Good | 21 ≤ M ≤ 23 | 36 ≤ M ≤ 44 | 63 ≤ M ≤ 77 | 180 ≤ M ≤ 224
Average | 18 ≤ M ≤ 20 | 30 ≤ M ≤ 35 | 52 ≤ M ≤ 62 | 150 ≤ M ≤ 179
Poor | 0 ≤ M ≤ 17 | 0 ≤ M ≤ 29 | 0 ≤ M ≤ 51 | 0 ≤ M ≤ 149
Table 3.5: Transformation rule table for 4.0 credit theory course

Classified Name | Attendance | Class Test | SecA/SecB | Total (Range of Marks, M)
Excellent | 36 ≤ M ≤ 40 | 64 ≤ M ≤ 80 | 112 ≤ M ≤ 140 | 320 ≤ M ≤ 400
Very Good | 32 ≤ M ≤ 35 | 60 ≤ M ≤ 63 | 105 ≤ M ≤ 111 | 300 ≤ M ≤ 319
Good | 28 ≤ M ≤ 31 | 48 ≤ M ≤ 59 | 84 ≤ M ≤ 104 | 240 ≤ M ≤ 299
Average | 24 ≤ M ≤ 27 | 40 ≤ M ≤ 47 | 70 ≤ M ≤ 83 | 200 ≤ M ≤ 239
Poor | 0 ≤ M ≤ 23 | 0 ≤ M ≤ 39 | 0 ≤ M ≤ 69 | 0 ≤ M ≤ 199

Table 3.6: Transformation rule table for 2.0 credit theory course

Classified Name | Attendance | Class Test | SecA/SecB | Total (Range of Marks, M)
Excellent | 18 ≤ M ≤ 20 | 32 ≤ M ≤ 40 | 56 ≤ M ≤ 70 | 160 ≤ M ≤ 200
Very Good | 16 ≤ M ≤ 17 | 30 ≤ M ≤ 31 | 52 ≤ M ≤ 55 | 150 ≤ M ≤ 159
Good | 14 ≤ M ≤ 15 | 24 ≤ M ≤ 29 | 42 ≤ M ≤ 51 | 120 ≤ M ≤ 149
Average | 12 ≤ M ≤ 13 | 20 ≤ M ≤ 23 | 35 ≤ M ≤ 41 | 100 ≤ M ≤ 119
Poor | 0 ≤ M ≤ 11 | 0 ≤ M ≤ 19 | 0 ≤ M ≤ 34 | 0 ≤ M ≤ 99

Table 3.7: Transformation rule table for all sessional courses

Classified Name | Credit Hour = 1.5 | Credit Hour = 0.75 (Range of Marks, M)
Excellent | 120 ≤ M ≤ 150 | 60 ≤ M ≤ 75
Very Good | 112 ≤ M ≤ 119 | 56 ≤ M ≤ 59
Good | 90 ≤ M ≤ 111 | 45 ≤ M ≤ 55
Average | 75 ≤ M ≤ 89 | 37 ≤ M ≤ 44
Poor | 0 ≤ M ≤ 74 | 0 ≤ M ≤ 36
To construct the entire transformed table as given in Table 3.8, we have used the universal table
and above transformation rules.
Table 3.8: Transformed table from universal table
3.3 Summary of Methodologies
Methodologies of knowledge discovery from academic data using association rule mining can be
summarized as below:
1. Before applying the Association Rule Mining technique on the institutional data of
BUET, the academic data needs to be analyzed and preprocessed in the following steps:
i. At first, we have selected relevant data from the BIIS database and categorized it into
the personal and academic information of the students of the CSE department who
have already graduated.
ii. We have developed a technique to transform the existing relational database
into a universal database format using both academic and personal data of
students.
iii. We have manipulated universal database and developed transformation rule to
transform the continuous data into discrete value.
iv. We have developed algorithms to transform the universal database into a
discrete valued transformed database using the transformation rules.
2. We have applied the Apriori algorithm on the transformed database to find association
rules.
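The pipeline of steps 1.i–1.iv can be sketched end to end on a made-up record (the student id, courses and marks below are invented, not BIIS data; the column names mimic the universal table of Table 3.3):

```python
# Invented relational rows: (student_id, course, total_marks, max_marks).
relational_rows = [
    ("S1", "CSE103", 250, 300),
    ("S1", "CSE105", 225, 300),
]
personal = {"S1": {"Gender": "Male", "Hall_Status": "Resident",
                   "Student_Type": "Regular"}}

def level(marks, max_marks):
    """Discretize marks using the thresholds of Algorithm 1."""
    pct = 100.0 * marks / max_marks
    return ("Excellent" if pct >= 80 else "Very Good" if pct >= 75
            else "Good" if pct >= 60 else "Average" if pct >= 50 else "Poor")

# Build one universal row per student (step 1.ii), discretizing as we go
# (steps 1.iii-1.iv); the student id itself is dropped from the row.
universal = {}
for sid, course, total, maximum in relational_rows:
    row = universal.setdefault(sid, dict(personal[sid]))
    row[course + "_Total"] = level(total, maximum)

print(universal["S1"]["CSE103_Total"], universal["S1"]["CSE105_Total"])
# Excellent Very Good
```

The resulting rows, one per student with only discrete values, are what the Apriori step consumes.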
Gender | Hall_Status | Student_Type | CSE103_Grade | CSE103_Attendance | CSE103_CT | CSE103_SectionA | CSE103_SectionB | CSE103_Total | …
Male | Resident | Regular | Excellent | Excellent | Excellent | Excellent | Good | Excellent | …
Female | Non-resident | Regular | Very Good | Very Good | Very Good | Excellent | Good | Very Good | …
… | … | … | … | … | … | … | … | … | …
Chapter 4
System Implementation, Results and Discussions

4.1 Software Implementation
In order to implement the proposed academic data mining system, we have mainly used two
software tools: Microsoft Excel and Weka.
Microsoft Excel:
The existing relational BIIS data was given in Excel file format (.xls). That is why, for the
preprocessing steps, we have used Microsoft Excel 2010, which is easier and more convenient.
For transforming the existing relational database into the universal format, using both the
academic and personal data of the students of the CSE department who have already graduated,
we have used the necessary built-in tools and scripted macros provided in Excel. After that, we
have used an Excel add-in named Kutools for manipulating the universal database and
transforming continuous data into discrete values. Using this tool, we have easily implemented
the developed algorithms and transformation rule tables for converting all continuous data, e.g.,
records of theory and sessional courses, overall CGPA, etc.
Weka:
After preprocessing step, we have obtained a transformed table which is suitable for
applying Association Rule Mining algorithm. In this regard, we have used Weka [23], a
popular suite of machine learning software written in Java, developed at the University
of Waikato, New Zealand. We have applied Apriori algorithm to the transformation
table with predefined minimum support and confidence using Weka to generate
interesting Association Rules. We have used Weka Explorer which provides
convenient and easy to use interface for generating specific number of rules with certain
metric of support and confidence to the full or partial transformation database. This is
very useful for choosing support and confidence and selecting important association
rules from huge number of generated rules.
4.2 Dataset and Application Environment
In this experiment, we have considered the data of the last five graduated batches in the
department of CSE, BUET. The institutional dataset of BUET consists of the academic and
personal data of 9210 students over the last 10 years. We have extracted the relevant academic
and personal information of those students, namely gender, hall status, admission year,
completed credit hours, all records of theory and sessional courses, overall CGPA, etc., from the
relational BIIS database and transformed it into the universal table structure. Finally, we
transformed it into the transformation table structure for applying association rule mining. The
entire experimental setup is illustrated in Figure 4.1.
Figure 4.1: Experimental Setup for applying Apriori Algorithm using Weka Explorer to
generate Association Rules
[Figure 4.1 content, reconstructed:]

BUET institutional dataset: 9210 students of all departments in the last 10 years, with the
attributes Gender, Hall Status, Admission Year, Completed Credit Hour, all records of theory
and sessional courses, and overall CGPA.

Universal table structure (582 students):
- Student Type: Regular 552, Retentive 26, Abandoned 4
- Gender: Male 473, Female 109
- Hall Status: Resident 348, Non-Resident 234
- 40 theory courses: Attendance, Class Test, Section A, Section B, Total, Grade
- 28 sessional courses: Total Marks, Grade

Transformation table structure:
- Student Type, Gender and Hall Status counts as above
- all marks and grades of the 68 theory and sessional courses, including the overall CGPA of
the 582 students, discretized to Poor, Average, Good, Very Good or Excellent
After the preprocessing step, we have obtained a transformed table of 582 students of the
department of CSE who have already graduated. The universal table also contains one additional
attribute, student type (retentive, regular or abandoned), which is obtained by analyzing
completed credit hours and admission year. In the transformation table, all continuous data have
been transformed into five discrete values: Excellent, Very Good, Good, Average and Poor.
Finally, we have applied Weka Explorer to the transformation table (in .csv file format) to
generate interesting Association Rules.
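The two preprocessing steps just described (discretizing continuous values into five labels, and deriving the student type) can be illustrated with a short sketch. The grade boundaries, the required-credit figure and the duration threshold below are assumptions chosen only for illustration; this excerpt does not state the actual cut points used.

```python
# Illustrative preprocessing, not the thesis's exact boundaries.

def discretize_cgpa(cgpa):
    """Map a CGPA on a 4.00 scale to the five qualitative bins (assumed cuts)."""
    if cgpa >= 3.75:
        return "Excellent"
    if cgpa >= 3.50:
        return "Very Good"
    if cgpa >= 3.00:
        return "Good"
    if cgpa >= 2.50:
        return "Average"
    return "Poor"

def student_type(completed_credits, admission_year, current_year, enrolled,
                 required_credits=160, normal_duration=4):
    """Classify a student as Regular / Retentive / Abandoned from progress.
    The rule (and the 160-credit, 4-year thresholds) is an assumed stand-in
    for the thesis's analysis of completed credit hours and admission year."""
    years_elapsed = current_year - admission_year
    if not enrolled and completed_credits < required_credits:
        return "Abandoned"       # left without completing required courses
    if years_elapsed > normal_duration and completed_credits < required_credits:
        return "Retentive"       # still short of credits past normal duration
    return "Regular"

print(discretize_cgpa(3.62), student_type(120, 2008, 2013, True))
```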
4.3 Results and Discussions
4.3.1 Impact of Gender
We have found an impact of gender on overall academic performance. This indication is very
important in terms of the socio-economic condition of the country. In BUET, the majority of the
students are male and live in the university dormitories. There are multiple factors that affect the
academic environment and students' academic performance. The results in Table 4.1 point out
that the male students appear with a very high confidence in the poor-CGPA rule. The reason
may be that male students are generally affected by various societal problems of a third-world
country like Bangladesh. All other rules support that the academic performance of female
students is better than that of the male students.
Table 4.1: Impact of Gender

No. | Generated Interesting Rules | Minimum Support | Confidence
01 | CGPA=Poor ==> Gender=male | 10% | 87%
02 | CGPA=Average ==> Gender=male | 10% | 79%
03 | CGPA=Very Good ==> Gender=male | 10% | 83%
04 | Gender=male ==> CGPA=Good | 10% | 26%
05 | Gender=male ==> CGPA=Average | 10% | 21%
06 | CGPA=Good ==> Gender=female | 5% | 22%
07 | CGPA=Average ==> Gender=female | 5% | 21%
08 | CGPA=Excellent ==> Gender=female | 5% | 20%
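A rule such as no. 01 above reads: among students with a poor CGPA, 87% are male. A small sketch on made-up records (not the real data) shows how such support and confidence figures are computed from counts:

```python
# support(A ==> B)   = P(A and B)  = count(A and B) / N
# confidence(A ==> B) = P(B | A)   = count(A and B) / count(A)
# The records below are invented purely to demonstrate the arithmetic.

records = [
    {"Gender": "male", "CGPA": "Poor"},
    {"Gender": "male", "CGPA": "Poor"},
    {"Gender": "female", "CGPA": "Poor"},
    {"Gender": "male", "CGPA": "Good"},
    {"Gender": "female", "CGPA": "Good"},
]

def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) of antecedent ==> consequent."""
    both = sum(1 for r in records
               if antecedent.items() <= r.items() and consequent.items() <= r.items())
    ante = sum(1 for r in records if antecedent.items() <= r.items())
    return both / len(records), (both / ante if ante else 0.0)

sup, conf = rule_metrics(records, {"CGPA": "Poor"}, {"Gender": "male"})
print(f"CGPA=Poor ==> Gender=male  sup={sup:.2f} conf={conf:.2f}")
```

Here 2 of the 5 toy records satisfy both sides and 3 satisfy the antecedent, so the support is 2/5 and the confidence 2/3; the thesis's 10%/87% figures arise from the same computation over the 582 real records.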
4.3.2 Impact of Residence
In BUET, most of the students live in institution hall. But the number of students live in home is
also significant fact. Analyzing the rules we have found that both the students of hall and the
students residing at home get good CGPA with a descent minimum support and confidence (in
table 4.2). So if any student wants to do well in academic prospect he can do from anywhere.
Table 4.2: Impact of Hall Status

No | Generated Interesting Rules | Minimum Support | Confidence
01 | CGPA=Average ==> Hall_Status=Resident | 10% | 65%
02 | CGPA=Very Good ==> Hall_Status=Resident | 10% | 63%
03 | CGPA=Good ==> Hall_Status=Non-Resident | 10% | 43%
04 | CGPA=Good Hall_Status=Resident ==> Gender=male | 10% | 82%
But it is found that the percentage of students getting a poor CGPA is higher in the halls. In the
halls there is very little restriction, and sometimes there is no one to take care of a student as
family members do. So a student can become demoralized and get very poor grades due to lack
of study. As shown in rule number 1 in Table 4.3, the percentage of male resident students is
higher in this regard. In most cases, the poor CGPA holders are residents of the halls (rule
numbers 1 and 5 of Table 4.3).
Table 4.3: Impact of Hall Status and Gender

No | Generated Interesting Rules | Minimum Support | Confidence
01 | CGPA=Poor ==> Gender=male Hall_Status=Resident | 5% | 51%
02 | CGPA=Very Good ==> Gender=male Hall_Status=Non-Resident | 5% | 40%
03 | Hall_Status=Non-Resident Gender=female ==> CGPA=Average | 5% | 24%
04 | Hall_Status=Resident Gender=female ==> CGPA=Good | 5% | 21%
05 | CGPA=Poor ==> Hall_Status=Resident | 5% | 52%
4.3.3 Correlation between Courses
The analyzed Association Rules show that the grade of one course may depend on its
prerequisite courses. In rule number 1 we find that if anyone gets an excellent grade in CSE105,
he/she gets an excellent grade in the course CSE201 too with a confidence of 0.48, where
CSE105 is the Structured Programming Language course and CSE201 is the Object Oriented
Programming Language course. We also discover the interrelation of course CSE311 (Data
Communication-I) and CSE321 (Networking) in rule numbers 6, 7 and 8. We also find the
impact of courses CSE205 (Digital Logic Design) and CSE209 (Digital Electronics and Pulse
Technique) on course CSE403 (Digital System Design) in rule number 9 in Table 4.4.
Table 4.4: Correlation between Courses

No | Generated Interesting Rules | Minimum Support | Confidence
01 | CSE105_Grade=Excellent ==> CSE201_Grade=Excellent | 10% | 48%
02 | CSE201_Grade=Very Good ==> CSE105_Grade=Very Good | 5% | 30%
4.3.4 Impact on Retention
If any student fails to pass a course, he becomes retentive, because he needs to take that course
again later to complete his graduation. Rule numbers 2, 3, 4, 5 and 6 show that retentive students
usually struggle with their grades. If a student has not passed CSE100, which is the first
fundamental course of CSE, he or she is retentive, i.e., he or she has not passed the later
departmental courses either. This is illustrated by generated rule no. 1 in Table 4.5. Moreover,
we have discovered that most retentive students are hall residents and male, as illustrated with
high confidence in rule numbers 7 and 8 respectively in Table 4.5.
Table 4.4 (continued): Correlation between Courses

No | Generated Interesting Rules | Minimum Support | Confidence
03 | EEE163_Grade=Excellent ==> EEE263_Grade=Very Good | 5% | 27%
04 | CSE205_Grade=Excellent ==> CSE403_Grade=Excellent | 10% | 50%
05 | CSE403_Grade=Poor ==> CSE205_Grade=Average | 5% | 28%
06 | CSE321_Grade=Average ==> CSE311_Grade=Average | 5% | 36%
07 | CSE321_Grade=F ==> CSE311_Grade=Poor | 3% | 13%
08 | CSE321_Grade=Poor ==> CSE311_Grade=Poor | 3% | 16%
09 | CSE205_Grade=Very Good CSE209_Grade=Excellent ==> CSE403_Grade=Excellent | 5% | 53%

Table 4.5: Impact on Retention

No | Generated Interesting Rules | Minimum Support | Confidence
01 | CSE100_Grade=F ==> Student Type=Retentive | 5% | 42%
02 | Student Type=Retentive ==> MATH243_Grade=Poor | 5% | 35%
4.3.5 Impact on Abandonment
The students who have given up their academic studies without completing all the required
courses are typed as 'abandoned'. By analyzing the rules illustrated in Table 4.6, it is discovered
that, with high confidence, the abandoned students are male and residents of halls. But the
minimum support is very low; thus it is found that the rate of abandonment is very low in the
CSE department of this university.
Table 4.5 (continued): Impact on Retention

No | Generated Interesting Rules | Minimum Support | Confidence
03 | Student Type=Retentive ==> CSE205_Grade=Average | 5% | 35%
04 | Student Type=Retentive ==> CSE311_Grade=Average | 5% | 27%
05 | Student Type=Retentive ==> EEE263_Grade=Poor | 5% | 33%
06 | Student Type=Retentive ==> CSE409_Grade=Average | 5% | 43%
07 | Student Type=Retentive ==> Hall_Status=Resident | 5% | 65%
08 | Student Type=Retentive ==> Gender=male | 5% | 81%

Table 4.6: Impact on Abandonment

No | Generated Interesting Rules | Minimum Support | Confidence
01 | Student Type=Abandoned ==> Gender=male | 0.5% | 100%
02 | Student Type=Abandoned ==> Hall_Status=Resident | 0.5% | 75%
03 | Student Type=Abandoned ==> Gender=male Hall_Status=Resident | 0.5% | 75%
4.3.6 Impact of Continuous Assessment
The grading of a course depends on various aspects such as the marks of attendance, class tests
and both sections of the term final examination. From rule number 7, which has the maximum
confidence value of 1.00, we have discovered that the excellent grade of a course depends on
excellent performance in all the other aspects of continuous assessment. Again, performance in
class tests depends on attendance, which is illustrated by rule number 5 in Table 4.7 with a very
high confidence of 0.95.
Table 4.7: Impact of Continuous Assessment

No | Generated Interesting Rules | Minimum Support | Confidence
01 | CSE103_Attendance=Excellent CSE103_SectionB=Poor ==> CSE103_Grade=Average | 10% | 63%
02 | CSE103_Grade=Very Good ==> CSE103_CT=Good CSE103_Attendance=Excellent | 10% | 97%
03 | EEE163_Grade=Average ==> EEE163_SectionB=Poor | 10% | 57%
04 | EEE163_Grade=Very Good ==> EEE163_Attendance=Excellent EEE163_CT=Excellent | 10% | 67%
05 | HUM275_CT=Excellent ==> HUM275_Attendance=Excellent | 10% | 95%
06 | HUM275_CT=Excellent ==> HUM275_SectionA=Good HUM275_Grade=Very Good HUM275_Attendance=Excellent | 10% | 75%
07 | CSE401_Grade=Excellent ==> CSE401_CT=Excellent CSE401_SectionA=Excellent CSE401_Attendance=Excellent | 10% | 100%
08 | CSE401_SectionB=Excellent ==> CSE401_Grade=Good | 10% | 75%
4.3.7 Impact of Non Departmental Courses
After analyzing the generated Association Rules (Table 4.8), we observed various impacts of
non-departmental courses on academic performance. According to the curriculum, students must
take some non-departmental courses whose marks are added to the final result, so it may happen
that some students get poor grades in those non-departmental courses. According to the
generated rules, good performance in the non-departmental courses brings a good grade, but a
poor grade in a non-departmental course causes less harm to the final CGPA, because those
courses are fewer in number and most of them are studied at the beginning of the undergraduate
level. So students get enough opportunities to improve their CGPA later.
Table 4.8: Impact of Non Departmental Courses

No | Generated Interesting Rules | Minimum Support | Confidence
01 | CGPA=Very Good ==> HUM272_Grade=Very Good | 10% | 73%
02 | CGPA=Very Good ==> MATH143_Grade=Average | 5% | 37%
03 | CGPA=Good ==> EEE163_Grade=Average | 5% | 36%
04 | CGPA=Very Good ==> CHEM101_Grade=Average | 10% | 52%
05 | CGPA=Average ==> IPE493_Grade=Very Good | 5% | 29%
06 | CGPA=Good ==> ME165_Grade=Average | 10% | 43%
07 | CGPA=Average ==> MATH243_Grade=Poor | 5% | 27%
4.3.8 Impact of Departmental Courses
As many departmental courses are studied, and as there are interconnections between some
courses because of prerequisites, the results of the departmental courses affect the final CGPA
very much. From the analyzed rules, it is found that good grades in departmental courses bring a
good CGPA; on the other hand, poor grades in departmental courses result in a poor overall
CGPA. This significant knowledge is discovered from the rules on the impact of departmental
courses illustrated in Table 4.9.
Table 4.9: Impact of Departmental Courses

No | Generated Interesting Rules | Minimum Support | Confidence
01 | CGPA=Very Good ==> CSE100_Grade=Very Good | 5% | 42%
02 | CGPA=Very Good ==> CSE105_Grade=Average | 5% | 31%
03 | CGPA=Very Good ==> CSE206_Grade=Very Good | 10% | 44%
04 | CGPA=Good ==> CSE303_Grade=Average | 5% | 31%
05 | CGPA=Poor ==> CSE321_Grade=Poor | 5% | 29%
06 | CGPA=Excellent ==> CSE401_Grade=Excellent | 5% | 50%
07 | CGPA=Average ==> CSE401_Grade=Average | 5% | 29%
08 | CGPA=Average ==> CSE409_Grade=Average | 5% | 42%
Chapter 5
Conclusions
5.1 Summary of the Findings
Knowledge discovery from academic data is very important for improving the academic
performance of any higher educational institution. In this research, we study the academic
system, the existing problems and the performance data of the most renowned engineering
university of Bangladesh. We have found problems like abandonment, retention and the decay of
the potential of the most brilliant students. We have applied the Association Rule Mining
technique to explore the root cause of the above problems.
Before applying the data mining algorithm, the existing academic data has been preprocessed to
make it suitable for data mining. We have developed a data transformation technique that
transforms the relational database into an equivalent universal relational format. In this format,
we have also transformed the continuous data into discrete-valued qualitative data. We have
found interesting Association Rules by applying the Apriori Association Rule generator on the
transformed data using the WEKA tool. From the large number of association rules, we have
extracted the interesting rules regarding the impacts of gender, residence and continuous
assessment on academic performance. We have also found the associations among courses,
retention and abandonment. The obtained results are found to be very significant for the decision
makers to improve the overall academic condition of the institution.
According to the results found, 10% of the 582 graduated students of the CSE department are
male and have a CGPA below 3.00, and the probability of being male among poor CGPA
holders is 0.87. Again, we have discovered that 5% of the total students have a poor CGPA and
are hall residents, and the probability of being a hall resident among poor CGPA holders is 0.52.
We have also discovered significant correlations between courses. For example, more than 58
students have excellent grades in both CSE105 (Structured Programming Language) and
CSE201 (Object Oriented Programming Language); the probability of having an excellent grade
in CSE201 among students having an excellent grade in CSE105 is 0.48. We have found that
about 30 students have had to retake the MATH243 course. We found that 5% of the total male
students are both retentive and hall residents, and 65% of the total retentive students are hall
residents. The abandonment rate is very low in the CSE department of BUET, as we found that
only 3 male students dropped out before completing graduation, and 75% of the abandoned
students were hall residents. We have also determined the impact of several non-departmental
courses. For example, more than 60 students possess a very good grade in HUM272 as well as a
CGPA over 3.50. We have also determined the impact of several departmental courses. For
example, 5% of the 582 students have a CGPA over 3.75 and got A+ in CSE401, and 50% of the
students having a CGPA over 3.75 obtained A+ in CSE401.
We hope all these quantitative findings will be helpful to the decision makers for improving the
quality of education provided in this department. We have applied the technique only to the CSE
department of BUET, but it is applicable to any department of any higher educational institute.
5.2 Future Work
In this research, we have considered the data of only the department of CSE, but there are ten
other departments in this university. In future work, we can apply the same technique to extract
knowledge from the data of all the other departments of BUET. Again, we have considered only
the undergraduate records of students; the technique can be modified so that it is applicable to
postgraduate courses and curricula for the betterment of postgraduate studies. We can also
develop a recommendation system by designing a classifier that uses the present dataset as
training data and classifies students based on their performance.
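The recommendation-system idea above can be sketched as a minimal classifier over the discretized attributes. Everything here (the 1-nearest-neighbour method, the attribute names, the toy training records) is an illustrative assumption, not a design taken from the thesis:

```python
# Sketch of the proposed classifier: predict a student's CGPA category from
# discretized course grades using 1-nearest-neighbour with Hamming distance
# over categorical attributes. Training records below are made up.

train = [
    ({"CSE100": "Excellent", "CSE105": "Excellent"}, "Excellent"),
    ({"CSE100": "Good", "CSE105": "Average"}, "Good"),
    ({"CSE100": "Poor", "CSE105": "Poor"}, "Poor"),
]

def hamming(a, b):
    """Number of attributes on which two categorical records disagree."""
    return sum(a[k] != b[k] for k in a)

def classify(record):
    """Label of the closest training record."""
    return min(train, key=lambda tr: hamming(tr[0], record))[1]

print(classify({"CSE100": "Excellent", "CSE105": "Good"}))
```

In practice the full transformation table (582 students, 308 attributes) would serve as the training data, and a stronger model (decision tree, naive Bayes) could replace the nearest-neighbour rule.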
REFERENCES
[1] Han, J., Kamber, M., and Pei, J., Data Mining: Concepts and Techniques. Morgan
Kaufmann Publishers, San Francisco, 2011.
[2] Bangladesh University of Engineering and Technology: General Information. Available
from http://www.buet.ac.bd/?page_id=5; accessed 24 February, 2014.
[3] The Department of Computer Science and Engineering, Bangladesh University of
Engineering and Technology: General Information. Available from
http://www.buet.ac.bd/cse/geninfo/index.php; accessed 24 February, 2014.
[4] http://researcher.watson.ibm.com/researcher/view_group.php?id=144
[5] http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
[6] Agrawal, R., Imielinski, T., and Swami, A. N., Mining association rules between sets of
items in large databases. In Proceedings of the 1993 ACM SIGMOD International
Conference on Management of Data, 1993, 207-216.
[7] Hegland, M., Algorithms for Association Rules. Lecture Notes in Computer Science,
Volume 2600, Jan. 2003, 226-234.
[8] Romero, C., Educational Data Mining: A Review of the State-of-the-Art. IEEE
Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 40(6),
601-618.
[9] Baradwaj, B. K., and Pal, S., Mining Educational Data to Analyze Students' Performance.
International Journal of Advanced Computer Science and Applications (IJACSA), 2(6),
2011, 63-69.
[10] Salazar, A., Gosalbez, J., Bosch, I., Miralles, R., and Vegara, L., A case study of
knowledge discovery on academic achievement, student desertion and student retention.
In Information Technology: Research and Education, ITRE 2004, 2nd International
Conference, June 2004.
[11] Merceron, A., and Yacef, K., Educational Data Mining: a Case Study. In Proceedings of
the 12th International Conference on Artificial Intelligence in Education, AIED, 2005.
[12] Kumar, V., and Sharma, V., Students Examination Result Mining: A Predictive
Approach. International Journal of Advanced Computer Science and Applications (IJSER),
3(11), Nov. 2012.
[13] Zhang, Y., Oussena, S., Clark, T., and Kim, H., Using data mining to improve student
retention in HE: a case study. In 12th International Conference on Enterprise Information
Systems, ICEIS, Portugal, 8-12 June, 2010.
[14] Lauria, E. J. M., Baron, J. D., Devireddy, M., Sundararaju, V., and Jayprakash, S. M.,
Mining academic data to improve college student retention: An open source perspective.
In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge,
LAK'12, ACM, New York, NY, 139-142.
[15] Kumar, V., and Chandha, A., Mining Association Rules in Student's Assessment Data.
International Journal of Computer Science Issues (IJCSI), 9(5), Sep. 2012.
[16] Romero, C., Romero, J. R., Luna, M. J., and Ventura, S., Mining Rare Association Rules
from e-Learning Data. Educational Data Mining (EDM), 2010.
[17] Pandey, U. K., and Pal, S., A Data Mining View on Class Room Teaching Language.
International Journal of Computer Science Issues (IJCSI), 8(2), Mar. 2011.
[18] Ajith, P., Tejaswi, B., and Sai, M. S. S., Rule Mining Framework for Students
Performance Evaluation. International Journal of Soft Computing and Engineering
(IJSCE), 2(6), Jan. 2013.
[19] Garcia, E., Romero, C., Ventura, S., and Calders, T., Drawbacks and solutions of applying
association rule mining in learning management systems. In Proceedings of the
International Workshop on Applying Data Mining in e-Learning, 2007.
[20] Chaturvedi, R., and Ezeife, C. I., Mining the Impact of Course Assignments on Student
Performance. EDM 2013, 308-309.
[21] Oladipupo, O. O., and Oyelade, O. J., Knowledge Discovery from Students' Result
Repository: Association Rule Mining Approach. International Journal of Computer
Science & Security (IJCSS), 4(2), 2011.
[22] Hoque, A. S. Md. L., Paul, R., and Ahmed, S., Preprocessing of Academic Data for
Mining Association Rule. In Proceedings of the Workshop on Advances in Data
Management: Applications and Algorithms, WADM, 2013.
[23] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H., The
WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1), 2009.
APPENDIX
Initial Dataset of BIIS
Figure A1: Partial Portion of Initial Dataset (.xls format in Excel) of BIIS
Partial Portion of Universal Table
Figure A2: Partial Portion of Universal data (.xls format in Excel) before converting to
transformation table
Partial Portion of Transformation Table
Figure A3: Partial Portion of Transformation table (.xls format in Excel)
Using Weka Explorer
Figure A4: Transformation Table (in .csv format) loaded into Weka Explorer
Figure A5: Selecting the association algorithm after loading the transformation table in Weka Explorer
Figure A6: Choosing support and confidence metrics with number of rules in Weka Explorer
Figure A7: After choosing specific support and confidence metrics with the number of rules in
Weka Explorer, the associator is started to generate the association rules.
Run Information for first 200 Rules Using Weka Explorer
Scheme: weka.associations.Apriori -N 200 -T 0 -C 0.5 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation: Full Transformation Table
Instances: 578
Attributes: 308
[list of attributes omitted]
=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.85 (491 instances)
Minimum metric <confidence>: 0.5
Number of cycles performed: 3
Generated sets of large itemsets:
Size of set of large itemsets L(1): 24
Size of set of large itemsets L(2): 120
Size of set of large itemsets L(3): 150
Size of set of large itemsets L(4): 18
Best rules found:
1. CSE425_Attendance=Excellent PHY109_Attendance=Excellent 505 ==>
EEE269_Attendance=Excellent 492 conf:(0.97)
2. EEE263_Attendance=Excellent EEE269_Attendance=Excellent 522 ==>
CSE405_Attendance=Excellent 508 conf:(0.97)
3. CSE425_Attendance=Excellent ME165_Attendance=Excellent 522 ==>
EEE163_Attendance=Excellent 508 conf:(0.97)
4. CSE405_Attendance=Excellent PHY109_Attendance=Excellent 519 ==>
EEE269_Attendance=Excellent 505 conf:(0.97)
5. CSE411_Attendance=Excellent EEE269_Attendance=Excellent 515 ==>
CSE405_Attendance=Excellent 501 conf:(0.97)
6. CSE401_Attendance=Excellent EEE269_Attendance=Excellent 513 ==>
CSE405_Attendance=Excellent 499 conf:(0.97)
7. Student Type=Regular CSE405_Attendance=Excellent ME165_Attendance=Excellent 512
==> EEE163_Attendance=Excellent 498 conf:(0.97)
8. Student Type=Regular EEE269_Attendance=Excellent ME165_Attendance=Excellent 512
==> EEE163_Attendance=Excellent 498 conf:(0.97)
9. CSE425_Attendance=Excellent EEE263_Attendance=Excellent 511 ==>
CSE405_Attendance=Excellent 497 conf:(0.97)
10. CSE401_Attendance=Excellent ME165_Attendance=Excellent 508 ==>
EEE163_Attendance=Excellent 494 conf:(0.97)
11. CSE425_Attendance=Excellent CSE431_Attendance=Excellent 507 ==>
CSE405_Attendance=Excellent 493 conf:(0.97)
12. CSE425_Attendance=Excellent CSE431_Attendance=Excellent 507 ==>
EEE269_Attendance=Excellent 493 conf:(0.97)
13. CSE405_Attendance=Excellent CSE425_Attendance=Excellent
ME165_Attendance=Excellent 507 ==> EEE163_Attendance=Excellent 493 conf:(0.97)
14. CSE431_Attendance=Excellent EEE263_Attendance=Excellent 506 ==>
CSE405_Attendance=Excellent 492 conf:(0.97)
15. CSE425_Attendance=Excellent EEE269_Attendance=Excellent
ME165_Attendance=Excellent 506 ==> EEE163_Attendance=Excellent 492 conf:(0.97)
16. CSE205_Attendance=Excellent 505 ==> EEE269_Attendance=Excellent 491 conf:(0.97)
17. CSE411_Attendance=Excellent CSE425_Attendance=Excellent 505 ==>
CSE405_Attendance=Excellent 491 conf:(0.97)
18. Student Type=Regular CSE405_Attendance=Excellent CSE425_Attendance=Excellent 505
==> EEE269_Attendance=Excellent 491 conf:(0.97)
19. CSE425_Attendance=Excellent EEE269_Attendance=Excellent 530 ==>
CSE405_Attendance=Excellent 515 conf:(0.97)
20. Student Type=Regular ME165_Attendance=Excellent 529 ==>
EEE163_Attendance=Excellent 514 conf:(0.97)
21. CSE103_Attendance=Excellent EEE269_Attendance=Excellent 525 ==>
CSE405_Attendance=Excellent 510 conf:(0.97)
22. EEE269_Attendance=Excellent 558 ==> CSE405_Attendance=Excellent 542 conf:(0.97)
23. CSE425_Attendance=Excellent ME165_Attendance=Excellent 522 ==>
CSE405_Attendance=Excellent 507 conf:(0.97)
24. Student Type=Regular CSE425_Attendance=Excellent 521 ==>
EEE269_Attendance=Excellent 506 conf:(0.97)
25. CSE431_Attendance=Excellent EEE269_Attendance=Excellent 521 ==>
CSE405_Attendance=Excellent 506 conf:(0.97)
26. EEE163_Attendance=Excellent EEE263_Attendance=Excellent 521 ==>
CSE405_Attendance=Excellent 506 conf:(0.97)
27. EEE269_Attendance=Excellent PHY109_Attendance=Excellent 520 ==>
CSE405_Attendance=Excellent 505 conf:(0.97)
28. CSE431_Attendance=Excellent EEE163_Attendance=Excellent 517 ==>
CSE405_Attendance=Excellent 502 conf:(0.97)
29. EEE263_Attendance=Excellent ME165_Attendance=Excellent 517 ==>
CSE405_Attendance=Excellent 502 conf:(0.97)
30. EEE163_Attendance=Excellent PHY109_Attendance=Excellent 516 ==>
EEE269_Attendance=Excellent 501 conf:(0.97)
31. CSE103_Attendance=Excellent CSE425_Attendance=Excellent 514 ==>
CSE405_Attendance=Excellent 499 conf:(0.97)
32. CSE401_Attendance=Excellent EEE163_Attendance=Excellent 514 ==>
CSE405_Attendance=Excellent 499 conf:(0.97)
33. ME165_Attendance=Excellent PHY109_Attendance=Excellent 514 ==>
EEE163_Attendance=Excellent 499 conf:(0.97)
34. ME165_Attendance=Excellent PHY109_Attendance=Excellent 514 ==>
EEE269_Attendance=Excellent 499 conf:(0.97)
35. Student Type=Regular PHY109_Attendance=Excellent 513 ==>
EEE269_Attendance=Excellent 498 conf:(0.97)
36. CSE425_Attendance=Excellent 547 ==> CSE405_Attendance=Excellent 531 conf:(0.97)
37. MATH143_Attendance=Excellent 512 ==> CSE405_Attendance=Excellent 497
conf:(0.97)
38. MATH143_Attendance=Excellent 512 ==> EEE269_Attendance=Excellent 497
conf:(0.97)
39. Student Type=Regular CSE431_Attendance=Excellent 512 ==>
EEE269_Attendance=Excellent 497 conf:(0.97)
40. CSE411_Attendance=Excellent EEE163_Attendance=Excellent 512 ==>
CSE405_Attendance=Excellent 497 conf:(0.97)
41. CSE425_Attendance=Excellent EEE163_Attendance=Excellent
EEE269_Attendance=Excellent 511 ==> CSE405_Attendance=Excellent 496 conf:(0.97)
42. CSE103_Attendance=Excellent EEE263_Attendance=Excellent 510 ==>
CSE405_Attendance=Excellent 495 conf:(0.97)
43. EEE263_Attendance=Excellent 542 ==> CSE405_Attendance=Excellent 526 conf:(0.97)
44. Student Type=Regular CSE401_Attendance=Excellent 508 ==>
EEE163_Attendance=Excellent 493 conf:(0.97)
45. CSE401_Attendance=Excellent ME165_Attendance=Excellent 508 ==>
CSE405_Attendance=Excellent 493 conf:(0.97)
46. CSE411_Attendance=Excellent ME165_Attendance=Excellent 508 ==>
CSE405_Attendance=Excellent 493 conf:(0.97)
47. CSE313_Attendance=Excellent CSE405_Attendance=Excellent
ME165_Attendance=Excellent 508 ==> EEE163_Attendance=Excellent 493 conf:(0.97)
48. CSE425_Attendance=Excellent EEE163_Attendance=Excellent
ME165_Attendance=Excellent 508 ==> CSE405_Attendance=Excellent 493 conf:(0.97)
49. CSE313_Attendance=Excellent EEE269_Attendance=Excellent
ME165_Attendance=Excellent 507 ==> EEE163_Attendance=Excellent 492 conf:(0.97)
50. Student Type=Regular CSE425_Attendance=Excellent EEE269_Attendance=Excellent 506
==> CSE405_Attendance=Excellent 491 conf:(0.97)
51. CSE103_Attendance=Excellent EEE163_Attendance=Excellent
EEE269_Attendance=Excellent 506 ==> CSE405_Attendance=Excellent 491 conf:(0.97)
52. CSE425_Attendance=Excellent EEE269_Attendance=Excellent
ME165_Attendance=Excellent 506 ==> CSE405_Attendance=Excellent 491 conf:(0.97)
53. CSE431_Attendance=Excellent 538 ==> CSE405_Attendance=Excellent 522 conf:(0.97)
54. EEE163_Attendance=Excellent EEE269_Attendance=Excellent 537 ==>
CSE405_Attendance=Excellent 521 conf:(0.97)
55. PHY109_Attendance=Excellent 536 ==> EEE269_Attendance=Excellent 520 conf:(0.97)
56. EEE163_Attendance=Excellent ME165_Attendance=Excellent 535 ==>
CSE405_Attendance=Excellent 519 conf:(0.97)
57. CSE405_Attendance=Excellent ME165_Attendance=Excellent 535 ==>
EEE163_Attendance=Excellent 519 conf:(0.97)
58. CSE411_Attendance=Excellent 534 ==> CSE405_Attendance=Excellent 518 conf:(0.97)
59. Student Type=Regular EEE269_Attendance=Excellent 534 ==>
CSE405_Attendance=Excellent 518 conf:(0.97)
60. Student Type=Regular CSE405_Attendance=Excellent 534 ==>
EEE269_Attendance=Excellent 518 conf:(0.97)
61. EEE269_Attendance=Excellent ME165_Attendance=Excellent 533 ==>
CSE405_Attendance=Excellent 517 conf:(0.97)
62. EEE269_Attendance=Excellent ME165_Attendance=Excellent 533 ==>
EEE163_Attendance=Excellent 517 conf:(0.97)
63. CSE401_Attendance=Excellent 532 ==> CSE405_Attendance=Excellent 516 conf:(0.97)
64. CSE313_Attendance=Excellent EEE269_Attendance=Excellent 531 ==>
CSE405_Attendance=Excellent 515 conf:(0.97)
65. CSE405_Attendance=Excellent CSE425_Attendance=Excellent 531 ==>
EEE269_Attendance=Excellent 515 conf:(0.97)
66. CSE425_Attendance=Excellent EEE163_Attendance=Excellent 528 ==>
CSE405_Attendance=Excellent 512 conf:(0.97)
67. CSE313_Attendance=Excellent ME165_Attendance=Excellent 525 ==>
EEE163_Attendance=Excellent 509 conf:(0.97)
68. CSE103_Attendance=Excellent EEE163_Attendance=Excellent 524 ==>
CSE405_Attendance=Excellent 508 conf:(0.97)
69. EEE163_Attendance=Excellent 556 ==> CSE405_Attendance=Excellent 539 conf:(0.97)
70. CSE405_Attendance=Excellent CSE431_Attendance=Excellent 522 ==>
EEE269_Attendance=Excellent 506 conf:(0.97)
71. CSE425_Attendance=Excellent ME165_Attendance=Excellent 522 ==>
EEE269_Attendance=Excellent 506 conf:(0.97)
72. Student Type=Regular CSE425_Attendance=Excellent 521 ==>
CSE405_Attendance=Excellent 505 conf:(0.97)
73. Student Type=Regular CSE425_Attendance=Excellent 521 ==>
EEE163_Attendance=Excellent 505 conf:(0.97)
74. CSE103_Attendance=Excellent ME165_Attendance=Excellent 521 ==>
CSE405_Attendance=Excellent 505 conf:(0.97)
75. CSE103_Attendance=Excellent ME165_Attendance=Excellent 521 ==>
EEE163_Attendance=Excellent 505 conf:(0.97)
76. CSE407_Attendance=Excellent EEE269_Attendance=Excellent 521 ==>
CSE405_Attendance=Excellent 505 conf:(0.97)
77. CSE313_Attendance=Excellent CSE425_Attendance=Excellent 520 ==>
CSE405_Attendance=Excellent 504 conf:(0.97)
78. CSE313_Attendance=Excellent CSE425_Attendance=Excellent 520 ==>
EEE269_Attendance=Excellent 504 conf:(0.97)
79. ME165_Attendance=Excellent 552 ==> CSE405_Attendance=Excellent 535 conf:(0.97)
80. ME165_Attendance=Excellent 552 ==> EEE163_Attendance=Excellent 535 conf:(0.97)
81. MATH141_Attendance=Excellent 517 ==> CSE405_Attendance=Excellent 501
conf:(0.97)
82. MATH141_Attendance=Excellent 517 ==> EEE269_Attendance=Excellent 501
conf:(0.97)
83. Student Type=Regular EEE263_Attendance=Excellent 517 ==>
CSE405_Attendance=Excellent 501 conf:(0.97)
84. CSE313_Attendance=Excellent EEE263_Attendance=Excellent 517 ==>
CSE405_Attendance=Excellent 501 conf:(0.97)
85. CSE431_Attendance=Excellent EEE163_Attendance=Excellent 517 ==> EEE269_Attendance=Excellent 501 conf:(0.97)
86. EEE263_Attendance=Excellent ME165_Attendance=Excellent 517 ==> EEE163_Attendance=Excellent 501 conf:(0.97)
87. EEE163_Attendance=Excellent EEE269_Attendance=Excellent ME165_Attendance=Excellent 517 ==> CSE405_Attendance=Excellent 501 conf:(0.97)
88. CSE405_Attendance=Excellent EEE269_Attendance=Excellent ME165_Attendance=Excellent 517 ==> EEE163_Attendance=Excellent 501 conf:(0.97)
89. EEE163_Attendance=Excellent PHY109_Attendance=Excellent 516 ==> CSE405_Attendance=Excellent 500 conf:(0.97)
90. Student Type=Regular EEE163_Attendance=Excellent EEE269_Attendance=Excellent 516 ==> CSE405_Attendance=Excellent 500 conf:(0.97)
91. Student Type=Regular CSE405_Attendance=Excellent EEE163_Attendance=Excellent 516 ==> EEE269_Attendance=Excellent 500 conf:(0.97)
92. CSE431_Attendance=Excellent ME165_Attendance=Excellent 515 ==> CSE405_Attendance=Excellent 499 conf:(0.97)
93. CSE431_Attendance=Excellent ME165_Attendance=Excellent 515 ==> EEE163_Attendance=Excellent 499 conf:(0.97)
94. CSE425_Attendance=Excellent 547 ==> EEE269_Attendance=Excellent 530 conf:(0.97)
95. CSE103_Attendance=Excellent CSE425_Attendance=Excellent 514 ==> EEE269_Attendance=Excellent 498 conf:(0.97)
96. ME165_Attendance=Excellent PHY109_Attendance=Excellent 514 ==> CSE405_Attendance=Excellent 498 conf:(0.97)
97. Student Type=Regular EEE163_Attendance=Excellent ME165_Attendance=Excellent 514 ==> CSE405_Attendance=Excellent 498 conf:(0.97)
98. Student Type=Regular EEE163_Attendance=Excellent ME165_Attendance=Excellent 514 ==> EEE269_Attendance=Excellent 498 conf:(0.97)
99. CSE103_Attendance=Excellent 544 ==> CSE405_Attendance=Excellent 527 conf:(0.97)
100. Student Type=Regular CSE431_Attendance=Excellent 512 ==> CSE405_Attendance=Excellent 496 conf:(0.97)
101. Student Type=Regular EEE269_Attendance=Excellent ME165_Attendance=Excellent 512 ==> CSE405_Attendance=Excellent 496 conf:(0.97)
102. Student Type=Regular CSE405_Attendance=Excellent ME165_Attendance=Excellent 512 ==> EEE269_Attendance=Excellent 496 conf:(0.97)
103. CSE405_Attendance=Excellent CSE425_Attendance=Excellent EEE163_Attendance=Excellent 512 ==> EEE269_Attendance=Excellent 496 conf:(0.97)
104. CSE313_Attendance=Excellent CSE431_Attendance=Excellent 511 ==> CSE405_Attendance=Excellent 495 conf:(0.97)
105. CSE313_Attendance=Excellent CSE431_Attendance=Excellent 511 ==> EEE269_Attendance=Excellent 495 conf:(0.97)
106. CSE313_Attendance=Excellent PHY109_Attendance=Excellent 511 ==> EEE269_Attendance=Excellent 495 conf:(0.97)
107. CSE313_Attendance=Excellent EEE163_Attendance=Excellent EEE269_Attendance=Excellent 511 ==> CSE405_Attendance=Excellent 495 conf:(0.97)
108. Student Type=Regular CSE411_Attendance=Excellent 510 ==> CSE405_Attendance=Excellent 494 conf:(0.97)
109. CSE407_Attendance=Excellent CSE425_Attendance=Excellent 510 ==> CSE405_Attendance=Excellent 494 conf:(0.97)
110. CSE407_Attendance=Excellent CSE425_Attendance=Excellent 510 ==> EEE269_Attendance=Excellent 494 conf:(0.97)
111. CSE313_Attendance=Excellent CSE401_Attendance=Excellent 509 ==> CSE405_Attendance=Excellent 493 conf:(0.97)
112. CSE313_Attendance=Excellent CSE411_Attendance=Excellent 509 ==> CSE405_Attendance=Excellent 493 conf:(0.97)
113. CSE407_Attendance=Excellent CSE411_Attendance=Excellent 509 ==> CSE405_Attendance=Excellent 493 conf:(0.97)
114. CSE313_Attendance=Excellent EEE163_Attendance=Excellent ME165_Attendance=Excellent 509 ==> CSE405_Attendance=Excellent 493 conf:(0.97)
115. Student Type=Regular CSE401_Attendance=Excellent 508 ==> CSE405_Attendance=Excellent 492 conf:(0.97)
116. Student Type=Regular CSE313_Attendance=Excellent EEE269_Attendance=Excellent 508 ==> CSE405_Attendance=Excellent 492 conf:(0.97)
117. Student Type=Regular CSE313_Attendance=Excellent CSE405_Attendance=Excellent 508 ==> EEE269_Attendance=Excellent 492 conf:(0.97)
118. CSE425_Attendance=Excellent EEE163_Attendance=Excellent ME165_Attendance=Excellent 508 ==> EEE269_Attendance=Excellent 492 conf:(0.97)
119. CSE313_Attendance=Excellent EEE269_Attendance=Excellent ME165_Attendance=Excellent 507 ==> CSE405_Attendance=Excellent 491 conf:(0.97)
120. CSE405_Attendance=Excellent CSE425_Attendance=Excellent ME165_Attendance=Excellent 507 ==> EEE269_Attendance=Excellent 491 conf:(0.97)
121. CSE431_Attendance=Excellent 538 ==> EEE269_Attendance=Excellent 521 conf:(0.97)
122. PHY109_Attendance=Excellent 536 ==> CSE405_Attendance=Excellent 519 conf:(0.97)
123. Student Type=Regular EEE163_Attendance=Excellent 533 ==> CSE405_Attendance=Excellent 516 conf:(0.97)
124. Student Type=Regular EEE163_Attendance=Excellent 533 ==> EEE269_Attendance=Excellent 516 conf:(0.97)
125. CSE313_Attendance=Excellent CSE405_Attendance=Excellent 532 ==> EEE269_Attendance=Excellent 515 conf:(0.97)
126. Student Type=Regular ME165_Attendance=Excellent 529 ==> CSE405_Attendance=Excellent 512 conf:(0.97)
127. Student Type=Regular ME165_Attendance=Excellent 529 ==> EEE269_Attendance=Excellent 512 conf:(0.97)
128. CSE313_Attendance=Excellent EEE163_Attendance=Excellent 529 ==> CSE405_Attendance=Excellent 512 conf:(0.97)
129. CSE405_Attendance=Excellent 560 ==> EEE269_Attendance=Excellent 542 conf:(0.97)
130. CSE425_Attendance=Excellent EEE163_Attendance=Excellent 528 ==> EEE269_Attendance=Excellent 511 conf:(0.97)
131. CSE103_Attendance=Excellent CSE405_Attendance=Excellent 527 ==> EEE269_Attendance=Excellent 510 conf:(0.97)
132. CSE313_Attendance=Excellent ME165_Attendance=Excellent 525 ==> CSE405_Attendance=Excellent 508 conf:(0.97)
133. CSE405_Attendance=Excellent CSE407_Attendance=Excellent 522 ==> EEE269_Attendance=Excellent 505 conf:(0.97)
134. Student Type=Regular 552 ==> CSE405_Attendance=Excellent 534 conf:(0.97)
135. Student Type=Regular 552 ==> EEE269_Attendance=Excellent 534 conf:(0.97)
136. CSE203_Attendance=Excellent 521 ==> CSE405_Attendance=Excellent 504 conf:(0.97)
137. CSE203_Attendance=Excellent 521 ==> EEE269_Attendance=Excellent 504 conf:(0.97)
138. CSE313_Attendance=Excellent 550 ==> CSE405_Attendance=Excellent 532 conf:(0.97)
139. Student Type=Regular CSE103_Attendance=Excellent 519 ==> CSE405_Attendance=Excellent 502 conf:(0.97)
140. Student Type=Regular CSE103_Attendance=Excellent 519 ==> EEE163_Attendance=Excellent 502 conf:(0.97)
141. Student Type=Regular CSE103_Attendance=Excellent 519 ==> EEE269_Attendance=Excellent 502 conf:(0.97)
142. CSE103_Attendance=Excellent CSE313_Attendance=Excellent 519 ==> CSE405_Attendance=Excellent 502 conf:(0.97)
143. CSE407_Attendance=Excellent EEE163_Attendance=Excellent 518 ==> CSE405_Attendance=Excellent 501 conf:(0.97)
144. CSE405_Attendance=Excellent CSE411_Attendance=Excellent 518 ==> EEE269_Attendance=Excellent 501 conf:(0.97)
145. Student Type=Regular CSE407_Attendance=Excellent 517 ==> EEE269_Attendance=Excellent 500 conf:(0.97)
146. CSE401_Attendance=Excellent CSE405_Attendance=Excellent 516 ==> EEE163_Attendance=Excellent 499 conf:(0.97)
147. CSE401_Attendance=Excellent CSE405_Attendance=Excellent 516 ==> EEE269_Attendance=Excellent 499 conf:(0.97)
148. EEE163_Attendance=Excellent PHY109_Attendance=Excellent 516 ==> ME165_Attendance=Excellent 499 conf:(0.97)
149. CSE431_Attendance=Excellent ME165_Attendance=Excellent 515 ==> EEE269_Attendance=Excellent 498 conf:(0.97)
150. CSE103_Attendance=Excellent CSE407_Attendance=Excellent 514 ==> CSE405_Attendance=Excellent 497 conf:(0.97)
151. CSE103_Attendance=Excellent CSE425_Attendance=Excellent 514 ==> EEE163_Attendance=Excellent 497 conf:(0.97)
152. CSE407_Attendance=Excellent ME165_Attendance=Excellent 514 ==> CSE405_Attendance=Excellent 497 conf:(0.97)
153. CSE407_Attendance=Excellent ME165_Attendance=Excellent 514 ==> EEE163_Attendance=Excellent 497 conf:(0.97)
154. Student Type=Regular PHY109_Attendance=Excellent 513 ==> CSE405_Attendance=Excellent 496 conf:(0.97)
155. Student Type=Regular PHY109_Attendance=Excellent 513 ==> EEE163_Attendance=Excellent 496 conf:(0.97)
156. CSE401_Attendance=Excellent EEE269_Attendance=Excellent 513 ==> EEE163_Attendance=Excellent 496 conf:(0.97)
157. CSE313_Attendance=Excellent CSE405_Attendance=Excellent EEE163_Attendance=Excellent 512 ==> EEE269_Attendance=Excellent 495 conf:(0.97)
158. CSE313_Attendance=Excellent PHY109_Attendance=Excellent 511 ==> CSE405_Attendance=Excellent 494 conf:(0.97)
159. CSE425_Attendance=Excellent EEE263_Attendance=Excellent 511 ==> EEE269_Attendance=Excellent 494 conf:(0.97)
160. CSE407_Attendance=Excellent 540 ==> CSE405_Attendance=Excellent 522 conf:(0.97)
161. Student Type=Regular CSE411_Attendance=Excellent 510 ==> EEE269_Attendance=Excellent 493 conf:(0.97)
162. CSE405_Attendance=Excellent EEE163_Attendance=Excellent 539 ==> EEE269_Attendance=Excellent 521 conf:(0.97)
163. CSE207_Attendance=Excellent 509 ==> CSE405_Attendance=Excellent 492 conf:(0.97)
164. CSE313_Attendance=Excellent CSE401_Attendance=Excellent 509 ==> EEE163_Attendance=Excellent 492 conf:(0.97)
165. CSE313_Attendance=Excellent EEE163_Attendance=Excellent ME165_Attendance=Excellent 509 ==> EEE269_Attendance=Excellent 492 conf:(0.97)
166. Student Type=Regular CSE401_Attendance=Excellent 508 ==> EEE269_Attendance=Excellent 491 conf:(0.97)
167. CSE411_Attendance=Excellent ME165_Attendance=Excellent 508 ==> EEE163_Attendance=Excellent 491 conf:(0.97)
168. Student Type=Regular CSE313_Attendance=Excellent EEE163_Attendance=Excellent 508 ==> CSE405_Attendance=Excellent 491 conf:(0.97)
169. Student Type=Regular CSE313_Attendance=Excellent CSE405_Attendance=Excellent 508 ==> EEE163_Attendance=Excellent 491 conf:(0.97)
170. Student Type=Regular CSE313_Attendance=Excellent EEE269_Attendance=Excellent 508 ==> EEE163_Attendance=Excellent 491 conf:(0.97)
171. Student Type=Regular CSE313_Attendance=Excellent EEE163_Attendance=Excellent 508 ==> EEE269_Attendance=Excellent 491 conf:(0.97)
172. CSE103_Attendance=Excellent CSE405_Attendance=Excellent EEE163_Attendance=Excellent 508 ==> EEE269_Attendance=Excellent 491 conf:(0.97)
173. CSE313_Attendance=Excellent CSE405_Attendance=Excellent ME165_Attendance=Excellent 508 ==> EEE269_Attendance=Excellent 491 conf:(0.97)
174. CSE405_Attendance=Excellent ME165_Attendance=Excellent 535 ==> EEE269_Attendance=Excellent 517 conf:(0.97)
175. EEE163_Attendance=Excellent ME165_Attendance=Excellent 535 ==> EEE269_Attendance=Excellent 517 conf:(0.97)
176. Student Type=Regular CSE405_Attendance=Excellent 534 ==> EEE163_Attendance=Excellent 516 conf:(0.97)
177. Student Type=Regular EEE269_Attendance=Excellent 534 ==> EEE163_Attendance=Excellent 516 conf:(0.97)
178. CSE401_Attendance=Excellent 532 ==> EEE163_Attendance=Excellent 514 conf:(0.97)
179. CSE313_Attendance=Excellent EEE163_Attendance=Excellent 529 ==> EEE269_Attendance=Excellent 511 conf:(0.97)
180. EEE163_Attendance=Excellent 556 ==> EEE269_Attendance=Excellent 537 conf:(0.97)
181. Student Type=Regular CSE313_Attendance=Excellent 526 ==> CSE405_Attendance=Excellent 508 conf:(0.97)
182. Student Type=Regular CSE313_Attendance=Excellent 526 ==> EEE163_Attendance=Excellent 508 conf:(0.97)
183. Student Type=Regular CSE313_Attendance=Excellent 526 ==> EEE269_Attendance=Excellent 508 conf:(0.97)
184. CSE405_Attendance=Excellent EEE263_Attendance=Excellent 526 ==> EEE269_Attendance=Excellent 508 conf:(0.97)
185. CSE313_Attendance=Excellent ME165_Attendance=Excellent 525 ==> EEE269_Attendance=Excellent 507 conf:(0.97)
186. CSE103_Attendance=Excellent EEE163_Attendance=Excellent 524 ==> EEE269_Attendance=Excellent 506 conf:(0.97)
187. Student Type=Regular 552 ==> EEE163_Attendance=Excellent 533 conf:(0.97)
188. ME165_Attendance=Excellent 552 ==> EEE269_Attendance=Excellent 533 conf:(0.97)
189. CSE313_Attendance=Excellent 550 ==> EEE269_Attendance=Excellent 531 conf:(0.97)
190. CSE301_Attendance=Excellent 521 ==> CSE405_Attendance=Excellent 503 conf:(0.97)
191. CSE103_Attendance=Excellent ME165_Attendance=Excellent 521 ==> EEE269_Attendance=Excellent 503 conf:(0.97)
192. CSE313_Attendance=Excellent CSE425_Attendance=Excellent 520 ==> EEE163_Attendance=Excellent 502 conf:(0.97)
193. CSE103_Attendance=Excellent CSE313_Attendance=Excellent 519 ==> EEE269_Attendance=Excellent 501 conf:(0.97)
194. CSE405_Attendance=Excellent EEE163_Attendance=Excellent ME165_Attendance=Excellent 519 ==> EEE269_Attendance=Excellent 501 conf:(0.97)
195. CSE425_Attendance=Excellent 547 ==> EEE163_Attendance=Excellent 528 conf:(0.97)
196. CSE407_Attendance=Excellent EEE163_Attendance=Excellent 518 ==> EEE269_Attendance=Excellent 500 conf:(0.97)
197. Student Type=Regular CSE405_Attendance=Excellent EEE269_Attendance=Excellent 518 ==> EEE163_Attendance=Excellent 500 conf:(0.97)
198. Student Type=Regular CSE407_Attendance=Excellent 517 ==> CSE405_Attendance=Excellent 499 conf:(0.97)
199. Student Type=Regular EEE263_Attendance=Excellent 517 ==> EEE163_Attendance=Excellent 499 conf:(0.97)
200. Student Type=Regular EEE263_Attendance=Excellent 517 ==> EEE269_Attendance=Excellent 499 conf:(0.97)
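Each rule above follows the standard Weka Apriori output layout: the antecedent items with their support count, `==>`, the consequent items with the count of transactions matching both sides, and the confidence, which is the ratio of those two counts (e.g. rule 79: 535 / 552 ≈ 0.97). As a minimal sketch of how such a line can be processed programmatically, the following Python snippet (the `parse_rule` helper and its regular expression are our own illustration, not part of Weka) parses one rule and checks that the reported confidence matches the two support counts:

```python
import re

# Matches a Weka Apriori rule line of the form:
#   "<n>. <antecedent items> <count> ==> <consequent items> <count> conf:(<c>)"
RULE_RE = re.compile(
    r"^\d+\.\s+(?P<ante>.+?)\s+(?P<ante_n>\d+)\s+==>\s+"
    r"(?P<cons>.+?)\s+(?P<cons_n>\d+)\s+conf:\((?P<conf>[\d.]+)\)$"
)

def parse_rule(line):
    """Split one rule line into antecedent, consequent, counts and confidence."""
    m = RULE_RE.match(line.strip())
    if not m:
        raise ValueError("not a rule line: " + line)
    return {
        "antecedent": m.group("ante"),
        "antecedent_count": int(m.group("ante_n")),
        "consequent": m.group("cons"),
        "consequent_count": int(m.group("cons_n")),
        "confidence": float(m.group("conf")),
    }

rule = parse_rule(
    "79. ME165_Attendance=Excellent 552 ==> CSE405_Attendance=Excellent 535 conf:(0.97)"
)
# Confidence is the ratio of the two support counts: 535 / 552 = 0.969... -> 0.97
assert round(rule["consequent_count"] / rule["antecedent_count"], 2) == rule["confidence"]
```

The same helper applies unchanged to rules with multi-item antecedents (e.g. `Student Type=Regular CSE313_Attendance=Excellent 526 ==> ...`), since the item list is captured as a single string up to the trailing support count.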