INTERNSHIP RECOMMENDATION SYSTEM (IRS)
BASED ON STUDENT’S COURSE ACHIEVEMENT
USING K-MEANS CLUSTERING
ARIFAH MUNIRAH BINTI ZULKAFLI
BACHELOR OF COMPUTER SCIENCE
(INTERNET COMPUTING)
UNIVERSITI SULTAN ZAINAL ABIDIN
2017
INTERNSHIP RECOMMENDATION SYSTEM (IRS) BASED ON
STUDENT’S COURSE ACHIEVEMENT USING K-MEANS CLUSTERING
ARIFAH MUNIRAH BINTI ZULKAFLI
Bachelor of Computer Science (Internet Computing)
Faculty of Informatics and Computing
Universiti Sultan Zainal Abidin, Terengganu, Malaysia
MAY 2017
DECLARATION
I hereby declare that this report is based on my original work except for quotations
and citations, which have been duly acknowledged. I also declare that it has not been
previously or concurrently submitted for any other degree at Universiti Sultan Zainal
Abidin or other institutions.
________________________________
Name : ..................................................
Date : ..................................................
CONFIRMATION
This is to confirm that this project entitled Internship Recommendation System (IRS)
Based on Student’s Course Achievement Using K-Means Clustering was prepared and
submitted by Arifah Munirah Binti Zulkafli (Matric Number: BTCL14037221) and
has been satisfactory in terms of scope, quality and presentation as partial fulfilment
of the requirement for the Bachelor of Computer Science (Internet Computing) with
Honours at Universiti Sultan Zainal Abidin. The research conducted and the writing of
this report were under my supervision.
________________________________
Name : Dr Suhailan Bin Dato’ Safei
Date : ..................................................
DEDICATION
In the Name of Allah, the Most Gracious and the Most Merciful.
Alhamdulillah, I have finished writing this research project. This research
project could not have been conducted without the support, encouragement and
cooperation of many people. I would like to express my deepest gratitude to my
supervisor, Dr Suhailan B. Dato’ Safei, who has always given valuable advice and
encouragement at each phase of developing this project. I would like to thank him
for giving me the opportunity to learn and work under his guidance, which has been
a most memorable experience.
I also want to take this opportunity to thank my parents, who give me their full
support to keep studying, and to give special thanks to all the lecturers of the Faculty
of Informatics and Computing for their guidance and advice during the development of
this project. Last but not least, my sincere thanks go to my friends, who always gave
their support and help to finish this project.
Thank you.
ABSTRACT
Selecting students for internship placement based on CGPA is no longer
appropriate. Industry needs to recruit students based on their skills and
expertise in a particular field in order to add more experts to the industry. The
obvious problem is that student skill has been determined from CGPA, which does not
reflect the actual skills of the student, so CGPA is not suitable for grouping
students according to a particular skill. This project therefore helps students to be
grouped by their skill strengths according to their course grades. The result can be
used for internship placements that better suit their skills and interests. To realise
this solution, the K-Means Clustering technique will be used. K-Means Clustering is an
unsupervised learning algorithm that tries to cluster data based on their similarity.
Students’ score ranges, which carry no predefined labels, will be grouped based on
their similarity. The resulting clusters of similar students may help the university
to place them in internships that suit their interests and expertise.
ABSTRAK
Selecting students for industrial training placement based on CGPA is no
longer appropriate. Industry therefore needs to select students based on their
skills and expertise in particular fields in order to add more experts to an
industry. The most obvious problem is that students’ ability is measured by a
CGPA that does not reflect their actual skills. CGPA is therefore not suitable
for separating students into groups for particular skills. This project will help
students to be grouped by skill and strength according to their subject
achievement. The results can be used for industrial training placements that suit
their skills and interests. To realise this solution, the K-Means Clustering
technique will be used. K-Means Clustering is an unsupervised learning algorithm
that tries to group data based on their similarity. Students’ unlabelled mark
ranges will be grouped based on similarity. The similarity results from
distributing students by cluster can help the university to place them in
industrial training that suits their interests and expertise.
TABLE OF CONTENTS
CONFIRMATION
DEDICATION
ABSTRACT
ABSTRAK
CHAPTER 1 INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Objectives
1.4 Scope
1.4.1 Scope of User
1.4.1.1 Admin
1.4.1.2 Student
1.4.1.3 Lecturer
1.4.2 Scope of System
1.4.2.1 Internship Placement (UniSZA) Student
1.5 Limitation of Work
1.5.1 Scope of the system
1.5.1.1 Internship Placement
1.6 Thesis Organization
1.7 Expected Outcome
CHAPTER 2 LITERATURE REVIEW
2.1 Introduction
2.2 Current Problem of This Project
2.3 Similar System
2.4 Analysis Gap
2.5 K-means Clustering Technique
2.5.1 Introduction to K-means clustering
2.5.2 K-means Clustering Algorithm
2.6 Summary
CHAPTER 3 METHODOLOGY OF SOFTWARE DEVELOPMENT
3.0 Introduction
3.1 Planning and Requirement Phase
3.2 Analysis and Design Phase
3.2.1 Context Diagram
3.2.2 Data Flow Diagram (DFD)
3.2.2.1 Data-Flow Diagram Level 0
3.2.2.2 Data-Flow Diagram Level 1
3.2.2.2.1 User Registration Process
3.2.2.2.2 Manage Company List
3.2.2.2.3 Manage Subject List
3.2.2.2.2 Cluster Group of Student
3.2.3 Entity Relationship Diagram (ERD)
3.2.4 Data Dictionary
3.2.6.1 K-means Clustering Algorithm
3.2.6.2 Example of K-means clustering Algorithm
3.3 Requirement
3.3.1 Software Requirement
3.3.2 Hardware Requirement
3.4 Summary
REFERENCES
LIST OF TABLES
Table 2-1: Summary based on area cover of the system
Table 2-2: Analysis of gap
Table 3-1: Data Dictionary for student
Table 3-2: Data Dictionary for subject_list
Table 3-3: Data Dictionary for subject_mark
Table 3-4: Data Dictionary for lecturer
Table 3-5: Data Dictionary for admin
Table 3-6: Data Dictionary for company_list
Table 3-7: Data Dictionary for student_company
LIST OF FIGURES
Figure 3-1: Iterative Model
Figure 3-2: Context Diagram for Internship Recommendation System (IRS)
Figure 3-3: DFD Level 0 for Admin
Figure 3-4: DFD Level 0 for Lecturer
Figure 3-5: DFD Level 0 for Student
Figure 3-6: DFD Level 1 for process registration users
Figure 3-7: DFD Level 1 for process manage company list
Figure 3-8: DFD Level 1 for process manage subject list
Figure 3-9: DFD Level 1 for process clustering student subject marks
Figure 3-10: ERD for IRS
LIST OF ABBREVIATIONS / TERMS / SYMBOLS
CD Context Diagram
DFD Data Flow Diagram
ERD Entity Relationship Diagram
IRS Internship Recommendation System
CHAPTER 1
INTRODUCTION
1.1 Background
The internship component is a vital part of the university training program, through
which students gain the skills required for employment while pursuing their degree.
However, some students face problems in choosing their internship placement because
they do not know their strengths and interests. Cumulative Grade Point Average (CGPA)
is commonly used as an indicator of academic achievement. Many higher learning
institutions set a minimum CGPA requirement of 1.5 for students, whereas for any
graduate program a CGPA of 3.00 and above is considered a good achievement.
In this case, grouping students into different categories according to their
achievement is not reliable and has become a complicated task. With traditional
grouping of students based on their average scores, it is hard to obtain a view of the
state of the students’ achievement. Therefore, this Internship Recommendation
System (IRS) Based on Student’s Course Achievement Using K-Means Clustering
will be implemented to help students overcome this problem. The system will
analyse student course grades through clustering analysis with the K-Means algorithm.
Students need to fill in their grades for each course subject, and they will then be
distributed into groups of similar students by the K-Means Clustering algorithm.
Students will then know their level of strength and interest, so that they can apply
for the suitable internship placements recommended by the system.
1.2 Problem Statement
The main problem is that higher institutions use Cumulative Grade Point Average
(CGPA) as the indicator of student achievement. However, achievement determined
from CGPA is not a reliable measure of a student’s actual skills. Moreover,
CGPA cannot be used to group students based on their skills. These issues will
affect the academic performance of students if the internship placement that is
selected does not suit their skills and strengths.
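To make the point about averages concrete, the following minimal sketch compares two students whose average mark is identical but whose subject profiles are opposite. The subject names and marks are invented for illustration only and are not data from this project.

```python
from math import dist
from statistics import mean

# Hypothetical subjects and marks (0-100); illustrative only, not project data.
subjects = ["Programming", "Networking", "Database", "Multimedia"]
student_a = [90, 85, 50, 55]   # strong in the first two subjects
student_b = [50, 55, 90, 85]   # strong in the last two subjects

# The averages are identical, so an average-based grouping treats them alike.
print(mean(student_a), mean(student_b))

# The Euclidean distance between the two mark vectors is large (about 70.7),
# which is the kind of difference a distance-based clustering can pick up.
print(round(dist(student_a, student_b), 2))
```

An average collapses the four marks into one number, whereas a distance over the full mark vector keeps the per-subject differences.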
1.3 Objectives
1. To analyse the problem of internship placement for final year students in
university.
2. To design the proposed Internship Recommendation System (IRS)
Based on Student’s Course Achievement Using K-Means Clustering.
3. To develop the Internship Recommendation System (IRS) Based on
Student’s Course Achievement Using K-Means Clustering.
1.4 Scope
1.4.1. Scope of User
1.4.1.1 Admin
Admin can manage user profiles, namely lecturer profiles and student profiles. Admin
can create, update and delete user profiles.
1.4.1.2 Student
Students can access the system anytime and anywhere. Students can manage their profile,
manage their subject marks and review the result of their application to a company. The
Profile module consists of adding, updating and deleting student details. In the Manage
Subject Mark module, students need to fill in all their course achievement marks in the
system. Students can also view a recommendation for their internship
placement. Moreover, students can view an overall report of the activities that they
have carried out.
1.4.1.3 Lecturer
Lecturers can manage their own profile by updating their profile details. Lecturers
also create the company list by adding, updating and deleting company details, and
create the subject list by adding, updating and deleting subject details. In the Select
Company module, the lecturer must select subject marks and the company list to generate
the recommendation for student internship placement. Lastly, lecturers can review
an overall report of the activities that they have carried out.
1.4.2 Scope of System
1.4.2.1 Internship Placement Universiti Sultan Zainal Abidin (UniSZA) Student
The internship placements recommended by the system are for Bachelor’s degree
students at Universiti Sultan Zainal Abidin (UniSZA) from the Faculty of Informatics and
Computing, which consists of four (4) programs.
1.5 Limitation of Work
1.5.1 Scope of the system
1.5.1.1 Internship Placement
The recommended internship placements are limited because the system only covers
internship placements at several established companies. This protects the reputation of
the university by supplying those companies with prepared students whose strengths and
skills will help the companies grow.
1.6 Thesis Organization
This thesis is organised into six (6) chapters. Chapter 1 consists of the
project background, problem statement, objectives and system scope.
Chapter 2 is the literature review, which reviews previous systems.
Chapter 3 describes the research methodology; this research used the
iterative model.
Chapter 4 explains the system’s framework and design. Chapter 5 covers
implementation, testing and results. Lastly, Chapter 6 is the conclusion of the whole
project.
1.7 Expected Outcome
This system is expected to group students based on similar course achievement and
assign them internship placements that suit their skills. This project may
help students to know their groups by using one of the modules in this web-based
system, so that they can choose their internship placement carefully. Finally, students
will be given a list of internship placements suited to their group.
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
This chapter reviews selected literature on the technique used
in the development of the Internship Recommendation System (IRS) Based on
Student’s Course Achievement Using K-Means Clustering.
2.2 Current Problem of This Project
Several current problems motivated me to propose developing the
Student Group Distribution Based on Similar Course
Achievement (SGDSCA). In an article in The Star newspaper entitled “Datuk Seri
Idris Jusoh: CGPA is not everything”, he mentioned that a perfect Cumulative Grade
Point Average (CGPA) cannot guarantee students a place in the course of their
choice in a public university. Some universities, such as Universiti Malaya (UM) and
Universiti Kebangsaan Malaysia (UKM), do not select students based only on their
academic performance but also look at their co-curricular activities. This statement
suggests that classifying students based on their CGPA is not effective enough to
reveal their skills and strengths.
2.3 Similar System
Based on the literature review, a few similar systems were found. The first is
Application of k-Means Clustering algorithm for prediction of Students’ Academic
Performance (APSAP). The critical issue it addresses is the ability to monitor the
progress of students’ academic performance. The goal of this system is to implement the
K-Means clustering algorithm for analysing students’ result data. The advantage of this
system is that it is a good benchmark for monitoring the progression of students’
academic performance (Oyelade, Oladipupo, & Obagbuwa, 2010).
The second system is Data Mining Application: A Comparative Study for Predicting
Student’s Performance. The problem it addresses is that educational institutions do not
use any knowledge discovery process on their data. This system therefore uses data
mining methodologies to study students’ performance in a course. The advantages
of this system are that it helps to identify dropouts and students who need
special attention earlier, and allows the teacher to provide appropriate advising (Yadav,
Bharadwaj, & Pal, 2012).
The third system is A New Collaborative Recommendation Approach Based on User
Clustering Using Artificial Bee Colony Algorithm. The problem it addresses is the
challenge of increasing the diversity of methods to fulfil users’ preferences. The goal of
this system is to propose a novel collaborative filtering recommendation approach
based on the K-Means clustering algorithm. The advantage of this system is that it is an
excellent recommendation method that achieves high accuracy and a certain diversity
(Ju & Xu, 2013).
The next system is A Clustering Grouping Model for Enhancing Collaborative Learning.
The problem it addresses is how to achieve diversity within a group effectively and
automatically. This system therefore achieves diversity within a group effectively and
automatically using K-Means clustering. The advantage of this system is that it can
improve the effectiveness of collaborative learning (Pang, Xiao, Wang, & Xue,
2014).
Table 2-1: Summary based on area cover of the system
Area Cover               A1          A2           A3   A4
Skill Achievement        X           X            X    X
Subject Performance      Final exam  Course work  X    X
Learning Collaborative   X           X            /    /
This shows that most of the current research covers subject performance
and collaborative learning. However, there is a lack of research that clusters students’
performance based on overall subject marks. Thus, my project will cover skill
achievement among final year students based on their similar course achievement using
K-Means Clustering.
2.4 Analysis Gap
Table 2-2: Analysis of gap
2.5 K-means Clustering Technique
2.5.1 Introduction to K-means clustering
K-Means Clustering is an unsupervised pattern classification method that divides a set
of given data into clusters, such that data in the same cluster are more similar to
each other than to data in other clusters. It is one of the most important methods in
data analysis (C. S. Li, 2011). Clustering analysis is based on the differences between
various kinds of objects and uses distance functions to classify them (Y. Li & Wu, 2012).
K-Means clustering was proposed by J. B. MacQueen (Zhang Yufang, 2003).
2.5.2 K-means Clustering Algorithm
K-Means Clustering is one of the simplest unsupervised learning techniques that can solve
the clustering problem. The procedure follows a simple and easy way to classify a given
data set through a certain number of clusters (assume k clusters) fixed a priori. The
first step is to define k centroids, one for each cluster. These centroids should be
placed carefully, because different locations cause different results. It is therefore
better to place them as far away from each other as possible.
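One possible way to place the initial centroids far apart is a farthest-first selection, sketched below. This is an illustrative assumption rather than the initialisation used by this project: the function name `farthest_first_centroids` and the two-subject marks are hypothetical, and the sketch assumes there are at least k distinct points.

```python
from math import dist

def farthest_first_centroids(points, k):
    """Pick k starting centroids from the data so that they are spread apart."""
    centroids = [points[0]]  # arbitrary first choice
    while len(centroids) < k:
        # Choose the point whose nearest already-chosen centroid is farthest away.
        next_point = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(next_point)
    return centroids

# Hypothetical marks in two subjects for five students.
marks = [(40, 45), (42, 50), (85, 90), (88, 86), (60, 20)]
print(farthest_first_centroids(marks, 3))
# [(40, 45), (85, 90), (60, 20)]
```

Starting from the first student, the sketch next picks the student farthest from that one, then the student farthest from both, so no two starting centroids sit in the same dense region.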
The next step is to take each point belonging to the given data set and associate it
with the nearest centroid. When no point is pending, the first pass is done. At this
point, k new centroids need to be recalculated as the centres of the clusters resulting
from the previous step. After these k new centroids are obtained, a new binding has to be
done between the same data points and the nearest new centroid. A loop has thus been
generated. As a result of this loop, the k centroids change their location step by step
until no more changes occur. In the simplest words, the centroids do not move any more.
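Below is the within-cluster criterion that K-Means Clustering minimises:
$$W(S, C) = \sum_{k=1}^{K} \sum_{i \in S_k} \lVert y_i - c_k \rVert^2$$
where S is a K-cluster partition of the entity set represented by vectors $y_i$ ($i \in I$) in the
M-dimensional feature space, consisting of non-empty, non-overlapping clusters $S_k$,
each with a centroid $c_k$ ($k = 1, 2, \ldots, K$).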
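K-Means is commonly described as minimising the sum of squared distances between each vector y_i and the centroid c_k of its cluster S_k. A minimal sketch of computing that quantity is given below; the function name and the sample clusters are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def within_cluster_sum_of_squares(clusters, centroids):
    """Sum, over all clusters S_k, of squared distances from each y_i to c_k."""
    total = 0.0
    for members, centroid in zip(clusters, centroids):
        for y in members:
            total += sum((a - b) ** 2 for a, b in zip(y, centroid))
    return total

# Two hypothetical clusters in a 2-dimensional feature space.
clusters = [[(1.0, 1.0), (3.0, 1.0)], [(8.0, 8.0), (8.0, 10.0)]]
centroids = [(2.0, 1.0), (8.0, 9.0)]  # the mean of each cluster
print(within_cluster_sum_of_squares(clusters, centroids))  # 4.0
```

Each of the four points lies exactly one unit from its centroid, so the total is 1 + 1 + 1 + 1 = 4.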
The algorithm is composed of the following steps:
1. Place k points in the space represented by the objects that are being clustered.
These points are the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the k centroids.
4. Repeat steps 2 and 3 until the centroids no longer move.
(Kodinariya & Makwana, 2013)
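The four steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the IRS: the function name `k_means`, the `max_iterations` safeguard, the choice to leave an empty cluster's centroid where it is, and the sample marks are all hypothetical.

```python
from math import dist

def k_means(points, initial_centroids, max_iterations=100):
    """Cluster points (tuples of numbers) around k centroids.

    Step 1 is done by the caller, who supplies the initial centroids.
    """
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(max_iterations):
        # Step 2: assign each object to the group with the closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: recalculate each centroid as the mean of its members.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                new_centroids.append(
                    tuple(sum(column) / len(members) for column in zip(*members))
                )
            else:
                new_centroids.append(old)  # assumption: keep an empty cluster's centroid
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical marks in two subjects for four students, with k = 2.
marks = [(40, 45), (42, 50), (85, 90), (88, 86)]
centroids, clusters = k_means(marks, [(40, 45), (85, 90)])
print(centroids)  # [(41.0, 47.5), (86.5, 88.0)]
print(clusters)   # [[(40, 45), (42, 50)], [(85, 90), (88, 86)]]
```

In this example the first pass moves both centroids to the means of their groups and the second pass leaves them unchanged, so the loop stops after two iterations.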
2.6 Summary
This chapter provides an overview of the concepts behind the system. The
study shows that the literature review is an important part of researching a new
idea, since it reveals whether the idea has been studied before. The technique was
chosen based on previous articles and journals.
CHAPTER 3
METHODOLOGY OF SOFTWARE DEVELOPMENT
3.0 Introduction
This chapter describes the methodology used to develop this project. The Internship
Recommendation System (IRS) Based on Student’s Course Achievement Using K-Means
Clustering is developed using the iterative model. The chapter explains every
phase involved in the project development as well as the system requirements. The
phases are based on the iterative model life cycle.
In the iterative model, the process begins with a simple implementation of a small set
of the software requirements and iteratively enhances it until the complete system is
implemented and ready to be deployed. Projects are built and improved step by step,
and each iteration focuses on a certain set of requirements. In the first iteration,
all high-priority risks are addressed so that risk at the end of the project is
minimised. The model enables early feedback from users, since every iteration results in
an executable release. The iterative model can accommodate changes in requirements,
which are very common in most projects. In the iterative model, less time is spent
on documentation and more time is given to design.
Figure 3-1: Iterative Model
The phases in iterative model are: Planning & Requirement, Analysis & Design,
Implementation, Testing, Development and Evaluation.
3.1 Planning and Requirement Phase
This phase determines the problem in student group distribution and how to solve
it, and identifies the IRS features and requirements. Features that are relevant to the
IRS are studied through similar web-based systems, with related journals used as
guidelines.
The system requirements had been collected and analysed. The
problem statement, objectives, system scope and literature review had been defined.
This phase corresponds to Chapter One (1) and Chapter Two (2) of this report. Data
related to this project had been collected from books, journals, the internet and
research papers. The details of the software and hardware requirements are discussed
in Section 3.3.
3.2 Analysis and Design Phase
This phase analyses and identifies the design of the system and develops the
prototype based on the functionalities that will be built. The data and requirements
obtained during the planning and requirement phase were analysed and transformed into
a design that follows the identified requirements. Several diagrams were built, such
as the Context Diagram (CD), Data Flow Diagram (DFD) levels 0 and 1, Entity Relationship
Diagram (ERD), Data Dictionary and Interface Design.
3.2.1 Context Diagram
Figure 3-2 shows the context diagram for the Internship Recommendation System (IRS),
which includes three entities: admin, lecturer and student. All entities are
required to log in to the system before they can access their interface. Once they
are successfully authenticated, they are directed to their specific home page, and
from the home page they can navigate to the other processes in the system. Admin
can manage user profiles, for both students and lecturers, while lecturers and students
can update their own profiles. Lecturers can also manage the company list and subject
list, and can select companies for students. Students need to fill in their subject
achievement; these subject marks are used as the variables in the K-Means Clustering
algorithm to cluster students based on their similarity. Students can also view the
internship companies that are suitable for them based on their achievement.
Figure 3-2: Context Diagram for Internship Recommendation System (IRS)
3.2.2 Data Flow Diagram (DFD)
A Data Flow Diagram (DFD) shows how data moves between the front-end users and
the processes of the system.
3.2.2.1 Data-Flow Diagram Level 0
The admin has four (4) major processes: login, register user, create report and logout.
The structure of DFD level 0 (admin) is shown in Figure 3-3.
Figure 3-3: DFD Level 0 for Admin
At this level, the lecturer has seven (7) major processes: login, manage profile,
create company list, create subject list, select company, create report and logout. The
structure of DFD level 0 (lecturer) is shown in Figure 3-4.
Figure 3-4: DFD Level 0 for Lecturer
At this level, the student has six (6) major processes: login, manage profile,
manage subject mark, view student company, create report and logout. The structure
of DFD level 0 (student) is shown in Figure 3-5.
Figure 3-5: DFD Level 0 for Student
3.2.2.2 Data-Flow Diagram Level 1
The Data Flow Diagram (DFD) Level 1 shows how the system is divided into
sub-systems (processes), each of which deals with one or more of the data flows to or
from an external entity, and which together provide all of the functionality of the
system as a whole.
3.2.2.2.1 User Registration Process
Figure 3-6 shows the Data Flow Diagram Level 1 for managing registration. The admin
can add a new lecturer, delete a lecturer, add a new student and delete a student. In
addition, lecturers and students can view and update their own profiles.
Figure 3-6: DFD Level 1 for process registration users
3.2.2.2.2 Manage Company List
Figure 3-7 shows the Data Flow Diagram Level 1 for managing the company list. The
lecturer can add a new company, view a company, update a company and delete a
company. All company data are stored in the company list.
Figure 3-7: DFD Level 1 for process manage company list
3.2.2.2.3 Manage Subject List
Figure 3-8 shows the Data Flow Diagram Level 1 for managing the subject list. The
lecturer can add a new subject, view a subject, update a subject and delete a subject.
All subject data are stored in the subject list.
Figure 3-8: DFD Level 1 for process manage subject list
3.2.2.2.4 Cluster Group of Student
Figure 3-9 shows the Data Flow Diagram Level 1 for clustering groups of students.
The lecturer can make a company selection for a student. To generate the cluster
groups of students, the IRS collects data from the subject list, company list and
subject marks. The IRS then updates the data in the student company list, after which
students can view their recommended company based on their strength and skill.
Figure 3-9: DFD Level 1 for process clustering student subject marks
3.2.3 Entity Relationship Diagram (ERD)
The Entity Relationship Diagram (ERD) for IRS is shown in Figure 3-10. It consists of
seven (7) entities: admin, student, lecturer, subject list, company list, subject mark
and student company.
Figure 3-10: ERD for IRS
3.2.4 Data Dictionary
A data dictionary (DD) was created for IRS. Seven (7) tables are involved in storing
data in the Internship Recommendation System (IRS), as shown in Table 3-1 to
Table 3-7.
Table 3-1: Data Dictionary for student
STUDENT
NO NAME TYPE PK/FK DESCRIPTION
1. St_ID Varchar (6) Primary Key
2. St_Name Varchar (50)
3. St_Course Varchar (20)
4. St_Pass Varchar (12)
Table 3-2: Data Dictionary for subject_list
SUBJECT_LIST
NO NAME TYPE PK/FK DESCRIPTION
1. Sub_Code Varchar (7) Primary Key
2. Sub_Name Varchar (30)
Table 3-3: Data Dictionary for subject_mark
SUBJECT_MARK
NO NAME TYPE PK/FK DESCRIPTION
1. Sub_Code Varchar (7) Primary Key
2. Sub_Mark Int (11)
3. St_ID Varchar(6) Foreign Key Table: STUDENT
Table 3-4: Data Dictionary for lecturer
LECTURER
NO NAME TYPE PK/FK DESCRIPTION
1. Lect_ID Varchar (6) Primary Key
2. Lect_Name Varchar (50)
3. Lect_Hp Varchar (12)
4. Lect_Pass Varchar (12)
Table 3-5: Data Dictionary for admin
ADMIN
NO NAME TYPE PK/FK DESCRIPTION
1. Ad_ID Varchar (6) Primary Key
2. Ad_Name Varchar (50)
5. Ad_Pass Varchar (12)
Table 3-6: Data Dictionary for company_list
COMPANY_LIST
NO NAME TYPE PK/FK DESCRIPTION
1. Cmp_ID Varchar (6) Primary Key
2. Cmp_Name Varchar (50)
4. CMP_Phone Varchar (12)
5. Sub_Code Varchar (7) Foreign Key Table: SUBJECT_LIST
Table 3-7: Data Dictionary for student_company
STUDENT_COMPANY
NO NAME TYPE PK/FK DESCRIPTION
1. Cmp_ID Varchar (6) Foreign Key Table: COMPANY_LIST
2. St_ID Varchar (6) Foreign Key Table: STUDENT
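The seven tables above can be sketched as a database schema. The following is only an illustration, not the project's actual schema script: it uses Python's built-in sqlite3 module with an in-memory database so that it is self-contained, whereas the project itself uses MySQL. Column names, types and keys are copied from Table 3-1 to Table 3-7 as listed, and only the columns that appear in those tables are included.

```python
import sqlite3

# Schema sketch based on Table 3-1 to Table 3-7 (keys reproduced as listed).
SCHEMA = """
CREATE TABLE student (
    St_ID     VARCHAR(6) PRIMARY KEY,
    St_Name   VARCHAR(50),
    St_Course VARCHAR(20),
    St_Pass   VARCHAR(12)
);
CREATE TABLE subject_list (
    Sub_Code VARCHAR(7) PRIMARY KEY,
    Sub_Name VARCHAR(30)
);
CREATE TABLE subject_mark (
    Sub_Code VARCHAR(7) PRIMARY KEY,
    Sub_Mark INT,
    St_ID    VARCHAR(6) REFERENCES student (St_ID)
);
CREATE TABLE lecturer (
    Lect_ID   VARCHAR(6) PRIMARY KEY,
    Lect_Name VARCHAR(50),
    Lect_Hp   VARCHAR(12),
    Lect_Pass VARCHAR(12)
);
CREATE TABLE admin (
    Ad_ID   VARCHAR(6) PRIMARY KEY,
    Ad_Name VARCHAR(50),
    Ad_Pass VARCHAR(12)
);
CREATE TABLE company_list (
    Cmp_ID    VARCHAR(6) PRIMARY KEY,
    Cmp_Name  VARCHAR(50),
    CMP_Phone VARCHAR(12),
    Sub_Code  VARCHAR(7) REFERENCES subject_list (Sub_Code)
);
CREATE TABLE student_company (
    Cmp_ID VARCHAR(6) REFERENCES company_list (Cmp_ID),
    St_ID  VARCHAR(6) REFERENCES student (St_ID)
);
"""

# An in-memory database is an assumption made for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

Note that, as listed in Table 3-3, Sub_Code alone is the primary key of subject_mark, which would allow only one mark per subject. A composite key of Sub_Code and St_ID may be what is intended, but the sketch keeps the key as documented.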
3.2.6.1 K-means Clustering Algorithm
Clustering is the process of partitioning a group of data points into a small number of
clusters. For instance, students can be placed into categories according to their marks
(for example, a CGPA of 3.5 and above as Dean's List students). This is a qualitative
kind of partitioning. A quantitative approach would instead measure certain features of
the students' marks, so that, say, students with similar average marks are grouped
together.
In this project, students' course achievements are clustered based on the similarity of
their attributes (i.e. criteria and alternative). Each object consists of two (2)
attributes, which are the Subject A and Subject B marks. The algorithm is developed in
the lecturer module: after the students have been clustered based on their group
achievement, the lecturer assigns them to an internship company that suits their
strength and skill.
There are three (3) main processes involved in K-Means Clustering: initial centroid
selection, nearest cluster assignment and centroids update.
a) Initial Centroid Selection
A centroid is the centre of a cluster, represented by the feature values of the group of
objects assigned to it. It is also used as a reference point when assigning objects to a
cluster based on their distance to the nearest centroid. At the beginning of the
assignment process, a set of K initial centroids needs to be predetermined so that the
objects can be assigned accordingly. In basic K-Means, these initial centroids are
randomly selected from among the objects.
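The random selection described above can be sketched in Python as follows. This is a minimal illustration rather than the project's implementation; the function name pick_initial_centroids, the seed parameter and the sample marks are assumptions introduced here.

```python
import random


def pick_initial_centroids(objects, k, seed=None):
    """Pick k distinct objects at random to serve as the initial centroids."""
    if not 1 <= k <= len(objects):
        raise ValueError("k must be between 1 and the number of objects")
    rng = random.Random(seed)  # a fixed seed makes the choice repeatable
    return rng.sample(objects, k)


# Hypothetical (Subject A, Subject B) marks used only for illustration.
marks = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0)]
centroids = pick_initial_centroids(marks, k=2, seed=0)
```

Because the initial centroids are chosen at random, different runs can produce different clusters unless a seed is fixed.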
b) Nearest cluster assignment
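The clustering process begins by measuring the distance from each object to each
centroid. The distance measurement used is the Euclidean distance:

$$\mathrm{dist}(x_i, c_k) = \sqrt{\sum_{d} (x_{id} - c_{kd})^2}$$

where x_i is an object, c_k is the centroid of cluster-k and d is a feature. Each object
is then added to S_ik, the set of objects in cluster-k (k = 0 to K), for which this
distance is smallest. In other words, objects are assigned to the cluster whose centroid
is closest to them.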
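The assignment step can be sketched as below. The function names nearest_cluster and assign_objects are hypothetical, and clusters are indexed from 0 in the code; math.dist (available from Python 3.8) computes the Euclidean distance between two points.

```python
import math


def nearest_cluster(obj, centroids):
    """Return the index of the centroid closest to obj (Euclidean distance)."""
    distances = [math.dist(obj, c) for c in centroids]
    return distances.index(min(distances))


def assign_objects(objects, centroids):
    """Group every object under the index of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for obj in objects:
        clusters[nearest_cluster(obj, centroids)].append(obj)
    return clusters
```

If an object is equally close to two centroids, this sketch places it in the cluster with the lower index. That tie-breaking rule is an assumption, since the text does not specify one.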
c) Centroids Update
This is the last step: once the objects have been re-assigned, the centroid of each
cluster needs to be re-calculated as the mean of the objects assigned to it:

$$c_{kd} = \frac{1}{M} \sum_{i=1}^{M} x_{id}$$

where x_1, ..., x_M are the objects currently in cluster-k, M is the total number of
objects in cluster-k, k = 0 to K and d = 0 to D. This step ensures that all objects
currently assigned to a cluster really belong to that cluster (i.e. they are nearest to
its newly calculated centroid) and are far away from the other clusters. If an object
turns out to be nearer to another centroid, it needs to be reassigned to that nearest
cluster. Thus, the whole cycle from step (b) to step (c) is repeated iteratively until
the centroids of all clusters no longer change.
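Steps (b) and (c) together can be sketched as a loop. The version below is a batch variant, which is an assumption: it assigns all objects first and only then recalculates every centroid, stopping when the centroids no longer change. The names k_means, mean_point and max_iter are hypothetical, and an empty cluster simply keeps its previous centroid.

```python
import math


def mean_point(points):
    """Feature-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(values) / len(points) for values in zip(*points))


def k_means(objects, initial_centroids, max_iter=100):
    """Repeat assignment and centroid update until the centroids stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Step (b): assign every object to its nearest centroid.
        clusters = [[] for _ in centroids]
        for obj in objects:
            dists = [math.dist(obj, c) for c in centroids]
            clusters[dists.index(min(dists))].append(obj)
        # Step (c): recalculate each centroid as the mean of its objects.
        new_centroids = [
            mean_point(pts) if pts else old
            for pts, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # no centroid moved, so the clustering is stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical marks and starting centroids used only for illustration.
marks = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0)]
final_centroids, final_clusters = k_means(marks, [(1.0, 1.0), (3.0, 4.0)])
```

The max_iter limit is a safeguard added for the sketch so that the loop always terminates.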
3.2.6.2 Example of K-means clustering Algorithm
In this project, two (2) subject marks are selected for each student based on their
course achievement. The example marks listed in the record are shown below.
RECORD   SUBJECT A   SUBJECT B
1        1.0         1.0
2        1.5         2.0
3        3.0         4.0
4        5.0         7.0
[Figure: Graph for Subject A and B]
There are three (3) main processes involved in K-Means Clustering:
1. Initial Centroid Selection
First, the initial centroid for each cluster is picked randomly. In this example, the
initial centroid for Cluster 1 is (1.0, 1.0) and the initial centroid for Cluster 2 is
(3.0, 4.0). These initial centroids are used to calculate the Euclidean distance from
each object to each centroid, so that the nearest centroid can be found.
2. Nearest Cluster Assignment
The clustering process begins by measuring the distance from each object to each
centroid.
Calculation for Record 2:
Centroid of Cluster 1 = Record 1 (1.0, 1.0); centroid of Cluster 2 = Record 3 (3.0, 4.0)
Euclidean distance to Cluster 1 = $\sqrt{(1.5 - 1.0)^2 + (2.0 - 1.0)^2} \approx 1.12$
Euclidean distance to Cluster 2 = $\sqrt{(1.5 - 3.0)^2 + (2.0 - 4.0)^2} = 2.5$
The distance to Cluster 1 is less than the distance to Cluster 2, so Record 2 is placed
in Cluster 1. Cluster 1 therefore contains Records 1 and 2.
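The two distances for Record 2 can be checked with a few lines of Python. This is only a verification sketch; the variable names are chosen here for illustration.

```python
import math

record_2 = (1.5, 2.0)
centroid_1 = (1.0, 1.0)  # Record 1, initial centroid of Cluster 1
centroid_2 = (3.0, 4.0)  # Record 3, initial centroid of Cluster 2

d1 = math.dist(record_2, centroid_1)
d2 = math.dist(record_2, centroid_2)

print(round(d1, 2), round(d2, 2))  # prints: 1.12 2.5
```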
3. Centroids Update
This is the last step: once the objects have been re-assigned, the centroid of each
cluster needs to be re-calculated. Since Record 2 has been assigned to Cluster 1, the
new mean of that cluster must be calculated.
Table 3-8: Calculation of New Means
Cluster   1             2
Record    1, 2          3 (no change)
Means     (1.25, 1.5)   (3.0, 4.0)
New mean for Cluster 1 = $\left(\frac{1.0 + 1.5}{2}, \frac{2.0 + 1.0}{2}\right) = (1.25, 1.5)$
Thus, the whole cycle from step (2) to step (3) is repeated iteratively until the
centroids of all clusters no longer change.
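The worked example can be reproduced and continued in code. The sketch below assumes, as the example appears to, that records are assigned one at a time and that the mean of a cluster is updated immediately after each assignment; the names records, members and centroids are chosen here for illustration.

```python
import math


def mean_point(points):
    """Feature-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(values) / len(points) for values in zip(*points))


records = {1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0)}

# Records 1 and 3 are the initial centroids, so each starts in its own cluster.
members = {1: [1], 2: [3]}
centroids = {1: records[1], 2: records[3]}

# Assign the remaining records one at a time, updating the mean of the
# receiving cluster after each assignment.
for rec_id in (2, 4):
    point = records[rec_id]
    nearest = min(centroids, key=lambda c: math.dist(point, centroids[c]))
    members[nearest].append(rec_id)
    centroids[nearest] = mean_point([records[r] for r in members[nearest]])
```

After Record 2 is processed, the means match Table 3-8. Continuing under the same assumption, Record 4 is nearer to Cluster 2, whose mean then becomes (4.0, 5.5).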
3.3 Requirement
3.3.1. Software Requirement
The software used for this system is listed below:
Table 3-9: List of software
SOFTWARE                         DESCRIPTION
Microsoft Office 2014            Platform for documentation and presentation
Notepad++                        Editor used to write the code for the system
Adobe Photoshop                  Editing tool used to design the logo and header
Mozilla Firefox, Google Chrome   Browsers used to run the system and to search for research sources
XAMPP Server                     Acts as a local server to run and test the system. It contains Apache version 3.2.1 and PHP version 3.2.1
MySQL Database                   Open source relational database management system that uses Structured Query Language and stores the data of the system. Version 3.2.1
Dropbox                          Application used to back up the system and data
3.3.2 Hardware Requirement
The hardware used for this system is listed below:
Table 3-10: List of Hardware
HARDWARE              DESCRIPTION
Laptop (Asus A455L)   Processor: Intel Core [email protected] GHz
                      RAM: 4 GB
                      OS: Windows 8
                      GPU: NVIDIA GeForce GT 820M
Printer               HP Deskjet 2520 Series
3.4 Summary
This chapter discussed the methodology used for system development and the hardware
and software required to develop this system. A methodology can be chosen according
to the complexity of the system. Choosing the right development methodology is very
important because it affects the development process. A suitable methodology helps
the project to be completed within the time specified in the Gantt chart and to fulfil
the requirements of the system.