
INTERNSHIP RECOMMENDATION SYSTEM (IRS)

BASED ON STUDENT’S COURSE ACHIEVEMENT

USING K-MEANS CLUSTERING

ARIFAH MUNIRAH BINTI ZULKAFLI

BACHELOR OF COMPUTER SCIENCE

(INTERNET COMPUTING)

UNIVERSITI SULTAN ZAINAL ABIDIN

2017

INTERNSHIP RECOMMENDATION SYSTEM (IRS) BASED ON

STUDENT’S COURSE ACHIEVEMENT USING K-MEANS CLUSTERING

ARIFAH MUNIRAH BINTI ZULKAFLI

Bachelor of Computer Science (Internet Computing)

Faculty of Informatics and Computing

Universiti Sultan Zainal Abidin, Terengganu, Malaysia

MAY 2017


DECLARATION

I hereby declare that this report is based on my original work except for quotations

and citations, which have been duly acknowledged. I also declare that it has not been

previously or concurrently submitted for any other degree at Universiti Sultan Zainal

Abidin or other institutions.

________________________________

Name : ..................................................

Date : ..................................................


CONFIRMATION

This is to confirm that this project entitled Internship Recommendation System (IRS)
Based on Student’s Course Achievement Using K-Means Clustering was prepared and
submitted by Arifah Munirah Binti Zulkafli (Matric Number: BTCL14037221) and
has been found satisfactory in terms of scope, quality and presentation as partial
fulfilment of the requirements for the Bachelor of Computer Science (Internet
Computing) with Honours at Universiti Sultan Zainal Abidin. The research and the
writing of this report were conducted under my supervision.

________________________________

Name : Dr Suhailan Bin Dato’ Safei

Date : ..................................................


DEDICATION

In the Name of Allah, the Most Gracious and the Most Merciful.

Alhamdulillah, I have completed this research project. It could not have been
carried out without the support, encouragement and cooperation of many people. I
would like to express my deepest gratitude to my supervisor, Dr Suhailan B. Dato’
Safei, who gave valuable advice and encouragement at every phase of developing this
project. I thank him for the opportunity to learn and work under his guidance, which
has been a most memorable experience.

I also want to take this opportunity to thank my parents, who have given me their
full support to continue my studies, and all the lecturers of the Faculty of
Informatics and Computing for their guidance and advice during the development of
this project. Last but not least, my sincere thanks go to my friends, who always gave
their support and help in finishing this project.

Thank you.


ABSTRACT

Selecting students for internship placement based on CGPA is no longer appropriate.
Industry needs students whose skills and expertise match a particular field so that
they can add to its pool of experts. The main problem is that student skill is
currently judged by CGPA, which does not reflect a student’s actual skills, so CGPA
is not suitable for grouping students according to a particular skill. This project
groups students by their skill strengths as indicated by their course grades. The
result can be used to recommend internship placements that suit their skills and
interests. To realise this solution, the K-Means Clustering technique will be used.
K-Means Clustering is an unsupervised learning algorithm that clusters data based on
similarity. Students’ unlabelled score ranges will be grouped according to their
similarity. The resulting clusters may help the university assign students to
internship placements that suit their interests and expertise.


ABSTRAK

Selecting students for industrial training placement based on CGPA is no longer
appropriate. Industry therefore needs to select students based on their skills and
expertise in particular fields in order to add more experts to the industry. The most
obvious problem is that student ability measured by CGPA does not reflect students’
actual skills. Using CGPA is therefore not suitable for separating students into
groups for particular skills. This project will help students to be grouped by their
skills and strengths according to their subject achievement. The results can be used
for industrial training placements that suit their skills and interests. To realise
this solution, the K-Means Clustering technique will be used. K-Means Clustering is
an unsupervised learning algorithm that tries to group data based on similarity.
Unlabelled ranges of student marks will be grouped by similarity. The similarity
results from distributing students into clusters can help the university assign them
to industrial training placements that suit their interests and expertise.


TABLE OF CONTENTS

CONFIRMATION
DEDICATION
ABSTRACT
ABSTRAK
CHAPTER 1 INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Objectives
1.4 Scope
1.4.1 Scope of User
1.4.1.1 Admin
1.4.1.2 Student
1.4.1.3 Lecturer
1.4.2 Scope of System
1.4.2.1 Internship Placement (UniSZA) Student
1.5 Limitation of Work
1.5.1 Scope of the system
1.5.1.1 Internship Placement
1.6 Thesis Organization
1.7 Expected Outcome
CHAPTER 2 LITERATURE REVIEW
2.1 Introduction
2.2 Current Problem of This Project
2.3 Similar System
2.4 Analysis Gap
2.5 K-means Clustering Technique
2.5.1 Introduction to K-means clustering
2.5.2 K-means Clustering Algorithm
2.6 Summary
CHAPTER 3 METHODOLOGY OF SOFTWARE DEVELOPMENT
3.0 Introduction
3.1 Planning and Requirement Phase
3.2 Analysis and Design Phase
3.2.1 Context Diagram
3.2.2 Data Flow Diagram (DFD)
3.2.2.1 Data-Flow Diagram Level 0
3.2.2.2 Data-Flow Diagram Level 1
3.2.2.2.1 User Registration Process
3.2.2.2.2 Manage Company List
3.2.2.2.3 Manage Subject List
3.2.2.2.4 Cluster Group of Student
3.2.3 Entity Relationship Diagram (ERD)
3.2.4 Data Dictionary
3.2.6.1 K-means Clustering Algorithm
3.2.6.2 Example of K-means clustering Algorithm
3.3 Requirement
3.3.1 Software Requirement
3.3.2 Hardware Requirement
3.4 Summary
REFERENCES


LIST OF TABLES

Table 2-1: Summary based on area cover of the system
Table 2-2: Analysis of gap
Table 3-1: Data Dictionary for student
Table 3-2: Data Dictionary for subject_list
Table 3-3: Data Dictionary for subject_mark
Table 3-4: Data Dictionary for lecturer
Table 3-5: Data Dictionary for admin
Table 3-6: Data Dictionary for company_list
Table 3-7: Data Dictionary for student_company

LIST OF FIGURES

Figure 3-1: Iterative Model
Figure 3-2: Context Diagram for Internship Recommendation System (IRS)
Figure 3-3: DFD Level 0 for Admin
Figure 3-4: DFD Level 0 for Lecturer
Figure 3-5: DFD Level 0 for Student
Figure 3-6: DFD Level 1 for process registration users
Figure 3-7: DFD Level 1 for process manage company list
Figure 3-8: DFD Level 1 for process manage subject list
Figure 3-9: DFD Level 1 for process clustering student subject marks
Figure 3-10: ERD for IRS

LIST OF ABBREVIATIONS / TERMS / SYMBOLS

CD Context Diagram

DFD Data Flow Diagram

ERD Entity Relationship Diagram

IRS Internship Recommendation System


CHAPTER 1

INTRODUCTION

1.1 Background

The internship component is a vital part of a university training programme, allowing
students to gain the skills required for employment while pursuing their degree.
However, some students have trouble choosing an internship placement because they do
not know their own strengths and interests. Cumulative Grade Point Average (CGPA) is
commonly used as an indicator of academic achievement. Many higher learning
institutions set a minimum CGPA requirement of 1.5 for students, whereas for any
graduate programme a CGPA of 3.00 and above is considered a good achievement.
Grouping students into categories according to this kind of achievement is not
reliable and has become a complicated task. With traditional grouping of students by
their average scores, it is hard to obtain a clear view of the state of the students’
achievement. Therefore, the Internship Recommendation System (IRS) Based on Student’s
Course Achievement Using K-Means Clustering will be implemented to help students
overcome this problem. The system analyses student course grades through clustering
analysis with the K-Means algorithm. Students fill in their grades for each course
subject and are distributed into groups of similar students by the K-Means Clustering
algorithm. Students will then know their level of strength and interest, so they can
apply for the suitable internship placements recommended by the system.


1.2 Problem Statement

The main problem is that higher learning institutions use Cumulative Grade Point
Average (CGPA) as the indicator of student achievement. Achievement determined from
CGPA is not a reliable measure of a student’s actual skills, and CGPA also cannot be
used to group students by skill. These shortcomings will affect students’ academic
performance if the internship placement selected does not suit their skills and
strengths.

1.3 Objectives

1. To analyse the problem of internship placement for final-year students in
university.
2. To design the proposed Internship Recommendation System (IRS) Based on
Student’s Course Achievement Using K-Means Clustering.
3. To develop the Internship Recommendation System (IRS) Based on Student’s
Course Achievement Using K-Means Clustering.

1.4 Scope

1.4.1. Scope of User

1.4.1.1 Admin

Admin can manage user profiles, namely lecturer profiles and student profiles. Admin
can create, update and delete user profiles.

1.4.1.2 Student

Students can access the system anytime and anywhere. Students can manage their
profile, manage their subject marks and review the result of their application to a
company. The Profile module allows student details to be added, updated and deleted.
In the Manage Subject Mark module, students fill in all their course achievement
marks. In addition, students can view the recommendation for their internship
placement and an overall report of the activities they have carried out in the
system.


1.4.1.3 Lecturer

Lecturers can manage their own profile by updating their profile details. Lecturers
also create the company list by adding, updating and deleting company details, and
create the subject list by adding, updating and deleting subject details. In the
Select Company module, the lecturer must select subject marks and the company list to
generate the recommendation for student internship placement. Lastly, lecturers can
review an overall report of the activities they have carried out.

1.4.2 Scope of System

1.4.2.1 Internship Placement Universiti Sultan Zainal Abidin (UniSZA) Student

The internship placements recommended by the system are for Bachelor’s degree
students at Universiti Sultan Zainal Abidin (UniSZA) from the Faculty of Informatics
and Computing, which consists of four programmes.

1.5 Limitation of Work

1.5.1 Scope of the system

1.5.1.1 Internship Placement

The recommended internship placements are limited because the system only covers
placements at several established companies. This protects the reputation of the
university by supplying those companies with well-prepared students whose strengths
and skills will help the companies grow.


1.6 Thesis Organization

This thesis is organised into six (6) chapters. Chapter 1 covers the project
background, problem statement, objectives and system scope. Chapter 2 is the
literature review, which examines previous systems. Chapter 3 describes the research
methodology, which follows the iterative model.

Chapter 4 explains the system’s framework and design. Chapter 5 covers
implementation, testing and results. Lastly, Chapter 6 concludes the whole project.

1.7 Expected Outcome

This system is expected to group students based on similar course achievement and
assign them suitable internship placements that match their skills. One of the
modules in this web-based system lets students see which group they belong to so that
they can choose an internship placement carefully. Finally, students will be given a
list of internship placements that suit their group.


CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

This chapter describes and explains the selected literature on the technique used in
the development of the Internship Recommendation System (IRS) Based on Student’s
Course Achievement Using K-Means Clustering.

2.2 Current Problem of This Project

Several current problems motivated the proposal to develop the Student Group
Distribution Based on Similar Course Achievement (SGDSCA) system. According to a
report in The Star newspaper entitled “Datuk Seri Idris Jusoh: CGPA is not
everything”, Datuk Seri Idris Jusoh stated that a perfect Cumulative Grade Point
Average (CGPA) cannot guarantee students a place in the course of their choice at a
public university. Some universities, such as Universiti Malaya (UM) and Universiti
Kebangsaan Malaysia (UKM), do not select students based only on their academic
performance but also look at their co-curricular activities. This statement suggests
that classifying students by CGPA is not effective enough to identify their skills
and strengths.


2.3 Similar System

Based on the literature review, a few similar systems were found. The first is
Application of k-Means Clustering Algorithm for Prediction of Students’ Academic
Performance (APSAP). The critical issue it addresses is the ability to monitor the
progress of students’ academic performance. The goal of this system is to implement
the K-Means clustering algorithm for analysing students’ result data. Its advantage
is that it provides a good benchmark for monitoring the progression of students’
academic performance (Oyelade, Oladipupo, & Obagbuwa, 2010).

The second system is Data Mining Application: A Comparative Study for Predicting
Student’s Performance. The problem here is that educational institutions do not apply
any knowledge discovery process to their data. This system therefore uses data mining
methodologies to study students’ performance in a course. Its advantages are that it
helps identify dropouts and students who need special attention earlier, and allows
the teacher to provide appropriate advising (Yadav, Bharadwaj, & Pal, 2012).

The third system is A New Collaborative Recommendation Approach Based on User
Clustering Using Artificial Bee Colony Algorithm. The problem it addresses is the
challenge of increasing the diversity of methods to fulfil users’ preferences. The
goal of this system is to propose a novel collaborative filtering recommendation
approach based on the K-Means clustering algorithm. Its advantage is that it is a
recommendation method that achieves high accuracy together with a certain degree of
diversity (Ju & Xu, 2013).


The next system is A Clustering Grouping Model for Enhancing Collaborative Learning.
The problem it addresses is how to achieve diversity within a group effectively and
automatically. The system achieves this using k-Means clustering. Its advantage is
that it can improve the effectiveness of collaborative learning (Pang, Xiao, Wang, &
Xue, 2014).

Table 2-1: Summary based on area cover of the system

Area Cover              A1          A2           A3  A4
Skill Achievement       X           X            X   X
Subject Performance     Final exam  Course work  X   X
Learning Collaborative  X           X            /   /

Table 2-1 shows that most current research covers subject performance and
collaborative learning. However, there is a lack of research that clusters students’
performance based on their overall subject marks. This project therefore covers skill
achievement among final-year students based on their similar course achievement using
K-Means Clustering.


2.4 Analysis Gap

Table 2-2: Analysis of gap


2.5 K-means Clustering Technique

2.5.1 Introduction to K-means clustering

K-Means Clustering is an unsupervised pattern classification method that divides a
given set of data into clusters such that data in the same cluster are more similar
to each other than to data in other clusters. It is one of the most important methods
in data analysis (C. S. Li, 2011). Clustering analysis is based on the differences
between various kinds of objects and uses distance functions to classify them (Y. Li
& Wu, 2012). K-Means clustering was proposed by J. B. MacQueen (Zhang Yufang, 2003).
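
To make the role of the distance function concrete, the sketch below measures how far apart two students are when each is described by a vector of subject marks. It assumes Euclidean distance, which the text does not name explicitly; the function name and the sample marks are illustrative only.

```python
import math


def euclidean_distance(marks_a, marks_b):
    """Distance between two students, each given as a list of subject marks."""
    if len(marks_a) != len(marks_b):
        raise ValueError("both students need marks for the same subjects")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(marks_a, marks_b)))


# Hypothetical marks for three subjects (illustrative values only).
student_1 = [80, 60, 70]
student_2 = [77, 64, 70]
student_3 = [40, 90, 55]

near = euclidean_distance(student_1, student_2)  # similar mark profiles
far = euclidean_distance(student_1, student_3)   # very different mark profiles
```

A small distance means the two students have similar strengths across subjects, which is the notion of similarity that clustering relies on.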

2.5.2 K-means Clustering Algorithm

K-Means Clustering is one of the simplest unsupervised learning techniques for
solving the clustering problem. The procedure follows a simple way to classify a
given data set into a certain number of clusters (assume k clusters) fixed a priori.
The first step is to define k centroids, one for each cluster. These centroids should
be placed carefully, because different locations lead to different results. It is
therefore better to place them as far away from each other as possible.

The next step is to take each point belonging to the data set and associate it with
the nearest centroid. When no point is pending, the first pass is complete. At this
point, k new centroids must be recalculated as the centres of the clusters resulting
from the previous step. Once these k new centroids are available, a new binding is
made between the same data points and the nearest new centroid. This produces a loop
in which the k centroids change their location step by step until no more changes
occur. In other words, the centroids no longer move.


The K-Means Clustering criterion is defined in terms of the following quantities: S
is a K-cluster partition of the entity set represented by vectors 𝑦𝑖 (𝑖 ∈ 𝐼) in the
M-dimensional feature space, consisting of non-empty, non-overlapping clusters 𝑆𝑘,
each with a centroid 𝑐𝑘 (k = 1, 2, …, K).
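
The partition, vectors and centroids defined above are the ingredients of the usual K-Means criterion. Assuming the intended criterion is the standard within-cluster sum of squared Euclidean distances (the symbol W is introduced here only for illustration), it can be written as:

```latex
W(S, C) = \sum_{k=1}^{K} \sum_{i \in S_k} \lVert y_i - c_k \rVert^2
```

Under this assumption, K-Means looks for the partition S and the centroids C = {c_1, ..., c_K} that make W as small as possible.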

The algorithm is composed of the following steps:

1. Place k points in the space represented by the objects that are being clustered.
These points are the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the k centroids.
4. Repeat steps 2 and 3 until the centroids no longer move.

(Kodinariya & Makwana, 2013)
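
The four steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in IRS: the function names, the choice of Euclidean distance, the caller-supplied initial centroids and the sample marks are all assumptions made for the example.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, initial_centroids, max_iter=100):
    """Run steps 1 to 4 on points, starting from the given centroids.

    Returns (labels, centroids), where labels[i] is the cluster of points[i].
    """
    centroids = [list(c) for c in initial_centroids]  # step 1
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every object to the group with the closest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Step 3: recalculate each centroid as the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical marks (subject 1, subject 2) for six students.
marks = [[85, 40], [90, 45], [88, 50], [35, 80], [40, 85], [45, 90]]
labels, centroids = kmeans(marks, initial_centroids=[marks[0], marks[3]])
```

With these sample marks the first three students end up in one cluster and the last three in the other, since each group is strong in a different subject. Passing the initial centroids explicitly keeps the run repeatable; a random or spread-out initialisation is an equally valid choice for step 1.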

2.6 Summary

This chapter provides an overview of the concepts behind the system. The study shows
that the literature review is an important part of researching a new idea, since it
reveals whether the idea has already been studied. The technique was chosen based on
previous articles and journals.


CHAPTER 3

METHODOLOGY OF SOFTWARE DEVELOPMENT

3.0 Introduction

This chapter describes the methodology used to develop this project. The Internship
Recommendation System (IRS) Based on Student’s Course Achievement Using K-Means
Clustering is developed using the iterative model. The chapter explains every phase
involved in the development of the project, as well as the system requirements. The
phases are based on the iterative model life cycle.

In the iterative model, the process begins with a simple implementation of a small
set of the software requirements and iteratively enhances it until the complete
system is implemented and ready to be deployed. The project is built and improved
step by step, and each iteration focuses on a certain set of requirements. All
high-priority risks are addressed in the first iteration so that risk at the end of
the project is minimised. Because every iteration results in an executable release,
the model enables early feedback from users. The iterative model can also accommodate
changes in requirements, which are very common in most projects. Less time is spent
on documentation and more time is given to design.


Figure 3-1: Iterative Model

The phases in the iterative model are Planning & Requirement, Analysis & Design,
Implementation, Testing, Development and Evaluation.

3.1 Planning and Requirement Phase

This phase determines the problem in student group distribution and how to solve it,
and identifies the features and requirements of the IRS. Relevant features are
studied by examining similar web-based systems, with related journals used as a
guideline.

The system requirements have been collected and analysed. The problem statement,
objectives, system scope and literature review have been defined; see Chapter 1 and
Chapter 2 of this report. Data related to this project were collected from books,
journals, the internet and research papers. The details of the software and hardware
requirements are discussed in Section 3.3.


3.2 Analysis and Design Phase

This phase analyses and identifies the design of the system and develops a prototype
based on the functionalities that will be built. The data and requirements obtained
during the planning and requirement phase are analysed and transformed into a design
that follows the identified requirements. Several diagrams have been built, namely
the Context Diagram (CD), Data Flow Diagram (DFD) levels 0 and 1, Entity Relationship
Diagram (ERD), Data Dictionary and Interface Design.

3.2.1 Context Diagram

Figure 3-2 shows the context diagram for the Internship Recommendation System (IRS),
which includes three entities: admin, lecturer and student. All entities are required
to log in before they can access their interface. Once successfully authenticated,
they are directed to their specific home page, from which they can navigate to the
other processes in the system. Admin can manage the user profiles of students and
lecturers, while lecturers and students can update their own profiles. Lecturers can
also manage the company list and the subject list, and can select a company for a
student. Students need to fill in their subject achievement; these subject marks are
used as the variables in the K-Means Clustering algorithm to cluster students based
on their similarity. Students can also view the internship companies that suit them
based on their achievement.
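
The division of responsibilities in the context diagram can be pictured as a simple role-to-action lookup. The sketch below is only an illustration of that idea: the action names are hypothetical labels derived from the paragraph above, not identifiers from the IRS code.

```python
# Hypothetical action names taken from the description of the context diagram.
PERMISSIONS = {
    "admin": {"manage_user_profiles"},
    "lecturer": {
        "update_profile",
        "manage_company_list",
        "manage_subject_list",
        "select_company",
    },
    "student": {
        "update_profile",
        "fill_subject_marks",
        "view_recommended_company",
    },
}


def can(role, action, authenticated=True):
    """True if a logged-in user with this role may perform the action."""
    return authenticated and action in PERMISSIONS.get(role, set())
```

For example, `can("lecturer", "select_company")` is true, while `can("student", "manage_company_list")` is false, and every check fails for a user who has not logged in.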


Figure 3-2: Context Diagram for Internship Recommendation System (IRS)


3.2.2 Data Flow Diagram (DFD)

The Data Flow Diagram (DFD) shows the processes that involve the front-end users and
the data that flows between those users and the system.

3.2.2.1 Data-Flow Diagram Level 0

Admin has four (4) major processes: login, register user, create report and logout.
The structure of DFD Level 0 (admin) is shown in Figure 3-3.

Figure 3-3: DFD Level 0 for Admin


At this level, the lecturer has seven (7) major processes: login, manage profile,
create company list, create subject list, select company, create report and logout.
The structure of DFD Level 0 (lecturer) is shown in Figure 3-4.

Figure 3-4: DFD Level 0 for Lecturer


At this level, the student has six (6) major processes: login, manage profile, manage
subject mark, view student company, create report and logout. The structure of DFD
Level 0 (student) is shown in Figure 3-5.

Figure 3-5: DFD Level 0 for Student


3.2.2.2 Data-Flow Diagram Level 1

The Data-Flow Diagram (DFD) Level 1 shows how the system is divided into sub-systems
(processes), each of which deals with one or more of the data flows to or from an
external entity, and which together provide all of the functionality of the system as
a whole.

3.2.2.2.1 User Registration Process

Figure 3-6 shows the Data Flow Diagram Level 1 for managing registration. Admin can
add and delete lecturers, and add and delete students. Lecturers can view and update
their own profile, and students can likewise view and update their profile.

Figure 3-6: DFD Level 1 for process registration users


3.2.2.2.2 Manage Company List

Figure 3-7 shows the Data Flow Diagram Level 1 for managing the company list.
Lecturers can add, view, update and delete companies. All company data are stored in
the company list.

Figure 3-7: DFD Level 1 for process manage company list


3.2.2.2.3 Manage Subject List

Figure 3-8 shows the Data Flow Diagram Level 1 for managing the subject list.
Lecturers can add, view, update and delete subjects. All subject data are stored in
the subject list.

Figure 3-8: DFD Level 1 for process manage subject list

3.2.2.2.4 Cluster Group of Student

Figure 3-9 shows the Data Flow Diagram Level 1 for clustering groups of students. The
lecturer can make a selection for a student. To generate the cluster groups, the IRS
collects data from the subject list, company list and subject marks, and then updates
the student company list. Students can then view their recommended company based on
their strengths and skills.


Figure 3-9: DFD Level 1 for process clustering student subject marks

3.2.3 Entity Relationship Diagram (ERD)

The Entity Relationship Diagram (ERD) for IRS is shown in Figure 3-10. It consists of seven (7) entities: admin, student, lecturer, subject list, company list, subject mark and student company.


Figure 3-10: ERD for IRS

3.2.4 Data Dictionary

A data dictionary (DD) was created for IRS. Seven (7) tables are involved in storing data in the Internship Recommendation System (IRS), as shown in Table 3-1 to Table 3-7.

Table 3-1: Data Dictionary for student

STUDENT

NO   NAME         TYPE            PK/FK          DESCRIPTION
1.   St_ID        Varchar (6)     Primary Key
2.   St_Name      Varchar (50)
3.   St_Course    Varchar (20)
4.   St_Pass      Varchar (12)


Table 3-2: Data Dictionary for subject_list

SUBJECT_LIST

NO   NAME         TYPE            PK/FK          DESCRIPTION
1.   Sub_Code     Varchar (7)     Primary Key
2.   Sub_Name     Varchar (30)

Table 3-3: Data Dictionary for subject_mark

SUBJECT_MARK

NO   NAME         TYPE            PK/FK          DESCRIPTION
1.   Sub_Code     Varchar (7)     Primary Key
2.   Sub_Mark     Int (11)
3.   St_ID        Varchar (6)     Foreign Key    Table: STUDENT

Table 3-4: Data Dictionary for lecturer

LECTURER

NO   NAME         TYPE            PK/FK          DESCRIPTION
1.   Lect_ID      Varchar (6)     Primary Key
2.   Lect_Name    Varchar (50)
3.   Lect_Hp      Varchar (12)
4.   Lect_Pass    Varchar (12)


Table 3-5: Data Dictionary for admin

ADMIN

NO   NAME         TYPE            PK/FK          DESCRIPTION
1.   Ad_ID        Varchar (6)     Primary Key
2.   Ad_Name      Varchar (50)
5.   Ad_Pass      Varchar (12)

Table 3-6: Data Dictionary for company_list

COMPANY_LIST

NO   NAME         TYPE            PK/FK          DESCRIPTION
1.   Cmp_ID       Varchar (6)     Primary Key
2.   Cmp_Name     Varchar (50)
4.   CMP_Phone    Varchar (12)
5.   Sub_Code     Varchar (7)     Foreign Key    Table: SUBJECT_LIST

Table 3-7: Data Dictionary for student_company

STUDENT_COMPANY

NO   NAME         TYPE            PK/FK          DESCRIPTION
1.   Cmp_ID       Varchar (6)     Foreign Key    Table: COMPANY_LIST
2.   St_ID        Varchar (6)     Foreign Key    Table: STUDENT
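The data dictionary above can be tried out as a small schema. The sketch below is an illustration under stated assumptions, not the system's actual database. It uses Python's built-in sqlite3 module as a stand-in for MySQL and covers only the five tables involved in the recommendation (student, subject_list, subject_mark, company_list and student_company), with only the columns the tables list. It also gives subject_mark a composite primary key (St_ID, Sub_Code) so that several students can hold a mark for the same subject, whereas Table 3-3 lists Sub_Code alone as the primary key. All sample rows are invented.

```python
import sqlite3

# Column names and types follow Tables 3-1, 3-2, 3-3, 3-6 and 3-7.
SCHEMA = """
CREATE TABLE student (
    St_ID     VARCHAR(6) PRIMARY KEY,
    St_Name   VARCHAR(50),
    St_Course VARCHAR(20),
    St_Pass   VARCHAR(12)
);
CREATE TABLE subject_list (
    Sub_Code VARCHAR(7) PRIMARY KEY,
    Sub_Name VARCHAR(30)
);
CREATE TABLE subject_mark (
    Sub_Code VARCHAR(7),
    Sub_Mark INTEGER,
    St_ID    VARCHAR(6) REFERENCES student(St_ID),
    PRIMARY KEY (St_ID, Sub_Code)  -- assumption: composite key, not in Table 3-3
);
CREATE TABLE company_list (
    Cmp_ID    VARCHAR(6) PRIMARY KEY,
    Cmp_Name  VARCHAR(50),
    CMP_Phone VARCHAR(12),
    Sub_Code  VARCHAR(7) REFERENCES subject_list(Sub_Code)
);
CREATE TABLE student_company (
    Cmp_ID VARCHAR(6) REFERENCES company_list(Cmp_ID),
    St_ID  VARCHAR(6) REFERENCES student(St_ID)
);
"""


def create_database(path=":memory:"):
    """Create the tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


# Invented sample rows, used only to exercise the schema.
conn = create_database()
conn.execute("INSERT INTO student VALUES ('S00001', 'Example Student', 'BCS-IC', 'secret')")
conn.execute("INSERT INTO subject_list VALUES ('TST1001', 'Example Subject A')")
conn.execute("INSERT INTO subject_mark VALUES ('TST1001', 80, 'S00001')")

marks = conn.execute(
    "SELECT Sub_Code, Sub_Mark FROM subject_mark WHERE St_ID = ?", ("S00001",)
).fetchall()
print(marks)  # [('TST1001', 80)]
```

With foreign keys enabled, inserting a mark for a student ID that does not exist in student is rejected with sqlite3.IntegrityError.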


3.2.6.1 K-means Clustering Algorithm

Clustering is the process of partitioning a set of data points into a small number of clusters. For instance, students can be grouped into categories by their marks (e.g. a CGPA of 3.5 and above places a student on the Dean's List). This is a qualitative kind of partitioning. A quantitative approach would instead measure certain features of the students' marks, for example grouping together students with similar average marks.

In this project, students' course achievements are clustered based on the similarity of their attributes (i.e. criteria and alternatives). Each object consists of two (2) attributes, the marks for Subject A and Subject B. The algorithm will be developed in the lecturer module: after students have been clustered according to their group achievement, the lecturer assigns them to an internship company that suits their strengths and skills.

There are three (3) main processes involved in K-Means Clustering: initial centroid selection, nearest cluster assignment and centroid update.

a) Initial Centroid Selection

A centroid is the centre of a cluster, represented by the feature values of the group of objects assigned near it. It is also used as the reference point for assigning objects to a cluster based on their distance to the centroid. Before the assignment process begins, a set of K initial centroids must be chosen so that the objects can be assigned accordingly. In basic K-Means, these initial centroids are selected randomly from among the objects.


b) Nearest cluster assignment

The clustering process begins by measuring the distance from each object to each centroid.

Where Sik is the set of objects in cluster k, k = 0 to K, and d is a feature. Each object is assigned to the cluster whose centroid is closest to it. The distance is measured using the Euclidean distance.
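The distance measure and the assignment rule of this step can be written out explicitly. In the sketch below, x_i is an object, c_k is the centroid of cluster k, and the sum runs over the features d. These symbols are notation assumed here for illustration and may differ from the notation of the report's own formula.

```latex
\operatorname{dist}(x_i, c_k) = \sqrt{\sum_{d} \left( x_{id} - c_{kd} \right)^{2}},
\qquad
\operatorname{cluster}(x_i) = \arg\min_{k} \, \operatorname{dist}(x_i, c_k)
```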

c) Centroids Update

This is the last step: once the objects have been re-assigned, the centroid of each cluster needs to be re-calculated.

Where M is the total number of objects in cluster k, k = 0 to K and d = 0 to D. This step ensures that every object currently assigned to a cluster really belongs to that cluster (i.e. it is nearest to its newly computed centroid) and is far from the other clusters. If an object turns out to be nearer to another centroid, it is reassigned to that nearest cluster. The whole cycle from step (b) to step (c) is therefore repeated iteratively until the centroids of all clusters no longer change.
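The three processes (a) to (c) can be put together in a short Python sketch. This is a minimal illustration under stated assumptions rather than the system's actual implementation: the function names (nearest_centroid, update_centroids, kmeans) are hypothetical, all objects are re-assigned in one pass before the centroids are updated, and the loop stops when no object changes cluster, which implies that the centroids no longer change either.

```python
import math
import random


def nearest_centroid(point, centroids):
    """Step (b): index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def update_centroids(points, labels, centroids):
    """Step (c): recompute each centroid as the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # assumption: an empty cluster keeps its centroid
            continue
        dims = len(members[0])
        new_centroids.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(dims))
        )
    return new_centroids


def kmeans(points, k, initial_centroids=None, max_iter=100, seed=0):
    """Run basic K-Means and return (labels, centroids)."""
    # Step (a): pick K objects at random unless initial centroids are supplied.
    if initial_centroids is None:
        centroids = random.Random(seed).sample(points, k)
    else:
        centroids = list(initial_centroids)

    labels = None
    for _ in range(max_iter):
        new_labels = [nearest_centroid(p, centroids) for p in points]
        if new_labels == labels:  # no object moved, so the centroids are stable
            break
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return labels, centroids


# Hypothetical usage: four students with marks in two subjects.
marks = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0)]
labels, centroids = kmeans(marks, 2, initial_centroids=[(1.0, 1.0), (3.0, 4.0)])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(1.25, 1.5), (4.0, 5.5)]
```

Fixing the initial centroids makes the run repeatable. With random selection, different seeds can lead to different final clusters.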


3.2.6.2 Example of K-means clustering Algorithm

In this project, two (2) subject marks are selected for each student based on their course achievement. An example of the marks listed in the record is given below.

There are three (3) main processes involved in K-Means Clustering:

1. Initial Centroid Selection

First, the initial centroid for each cluster is picked randomly. In this example the initial centroid for cluster 1 is (1.0, 1.0) and the initial centroid for cluster 2 is (3.0, 4.0). These initial centroids are used to calculate the Euclidean distance from each object to each centroid, so that the nearest centroid can be found.

RECORD   SUBJECT A   SUBJECT B
1        1.0         1.0
2        1.5         2.0
3        3.0         4.0
4        5.0         7.0

[Figure: Graph for Subject A and B]


2. Nearest Cluster Assignment

The clustering process begins by measuring the distance from each object to each centroid.

Calculation for Record 2:

Centroid of cluster 1 = Record 1 (1.0, 1.0); centroid of cluster 2 = Record 3 (3.0, 4.0)

Euclidean distance to cluster 1 = $\sqrt{(1.5 - 1.0)^2 + (2.0 - 1.0)^2} = 1.12$

Euclidean distance to cluster 2 = $\sqrt{(1.5 - 3.0)^2 + (2.0 - 4.0)^2} = 2.5$

The distance to cluster 1 is smaller than the distance to cluster 2, so Record 2 is assigned to cluster 1. Cluster 1 now contains Records 1 and 2.
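The two distances for Record 2 can be checked with a few lines of Python. This is only a verification sketch: the variable names are hypothetical, and math.dist (available from Python 3.8) computes the Euclidean distance between two points.

```python
import math

centroid_1 = (1.0, 1.0)  # initial centroid of cluster 1 (Record 1)
centroid_2 = (3.0, 4.0)  # initial centroid of cluster 2 (Record 3)
record_2 = (1.5, 2.0)

d1 = math.dist(record_2, centroid_1)  # sqrt(0.5**2 + 1.0**2)
d2 = math.dist(record_2, centroid_2)  # sqrt(1.5**2 + 2.0**2)

# The record joins the cluster with the smaller distance.
assigned_cluster = 1 if d1 <= d2 else 2
print(round(d1, 2), round(d2, 2), assigned_cluster)  # 1.12 2.5 1
```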

3. Centroids Update

This is the last step: once the objects have been re-assigned, the centroid of each cluster needs to be re-calculated. Since Record 2 has been assigned to cluster 1, the new mean of that cluster must be calculated.

Table 3-8: Calculation of New Means

Cluster   1              2
Record    1, 2           3 (no change)
Means     (1.25, 1.5)    (3.0, 4.0)

New means for Cluster 1 = $\left( \frac{1 + 1.5}{2}, \frac{2 + 1}{2} \right) = (1.25, 1.5)$
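The new mean of cluster 1 is simply the per-subject average of its members. A minimal sketch, with hypothetical variable names, using statistics.fmean from the Python standard library:

```python
from statistics import fmean

cluster_1 = [(1.0, 1.0), (1.5, 2.0)]  # Records 1 and 2 as (Subject A, Subject B)

# zip(*cluster_1) groups the Subject A marks and the Subject B marks,
# so each attribute is averaged separately.
new_centroid_1 = tuple(fmean(values) for values in zip(*cluster_1))
print(new_centroid_1)  # (1.25, 1.5)
```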

The whole cycle from step (2) to step (3) is therefore repeated iteratively until the centroids of all clusters no longer change.


3.3 Requirement

3.3.1 Software Requirement

The software used in this system is listed below:

Table 3-9: List of software

SOFTWARE                          DESCRIPTION
Microsoft Office 2014             Platform for documentation and presentation
Notepad++                         Editor for writing the system's code
Adobe Photoshop                   Editing tool for designing the logo and header
Mozilla Firefox, Google Chrome    Browsers for running the system and searching for research sources
XAMPP Server                      Acts as a local server to run and test the system. It contains Apache version 3.2.1 and PHP version 3.2.1
MySQL Database                    Open-source relational database management system that uses Structured Query Language and stores the data of the system. Version 3.2.1
Dropbox                           Application for backing up the system and data


3.3.2 Hardware Requirement

The hardware used by this system is listed below:

Table 3-10: List of Hardware

HARDWARE               DESCRIPTION
Laptop (Asus A455L)    Processor: Intel Core [email protected] GHz
                       RAM: 4 GB
                       OS: Windows 8
                       GPU: NVIDIA GeForce GT 820M
Printer                HP Deskjet 2520 Series

3.4 Summary

This chapter discusses the methodology used for system development and the hardware and software required to develop this system. A methodology can be chosen according to the complexity of the system. Choosing the right development methodology is very important because it affects the development process. A suitable methodology helps the project to be completed within the time specified in the Gantt chart and to fulfil the requirements of the system.


