DATA MINING PROJECT FOR STUDENT ACADEMIC SPECIALIZATION AND PERFORMANCE Prepared by Mohammed Kharma


Page 1: Data Mining Project for student academic specialization and performance

DATA MINING PROJECT FOR STUDENT ACADEMIC

SPECIALIZATION AND PERFORMANCE

Prepared by

Mohammed Kharma

Page 2: Data Mining Project for student academic specialization and performance

Impact of university students' social status on their selection of academic specialization and their performance

Page 3: Data Mining Project for student academic specialization and performance

Top 11 Factors Extracted

1. College
2. Specialization
3. Gender
4. Grade of secondary school
5. Parent family availability: whether the student has a one-parent family or a two-parent family
6. Whether the student has a job or not
7. Financial aid: whether the student received a loan for the semester
8. Educational level of parents
9. Geographical location
10. Positive social life
11. Academic overload

Page 4: Data Mining Project for student academic specialization and performance

Experiments

We applied a number of algorithms to the data we obtained from AlQadi University. Because some columns had missing values, such as the secondary-school grade, part of the data was generated randomly, so the results of applying these algorithms may not be trusted as fully correct. We used a Result value of 1 for a good predicted student outcome and 0 for a bad predicted student outcome.
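As a rough illustration of this preparation step, the sketch below fills missing secondary-school grades with random values and marks the Result attribute as the class to predict, using the Weka Java API. The file name students.arff, the attribute names, and the random grade range are assumptions made for illustration, not details taken from the original data set.

// Hedged sketch of the data preparation described above (Weka Java API).
// "students.arff", the attribute names, and the [60, 100] grade range are
// illustrative assumptions.
import java.util.Random;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PrepareData {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("students.arff").getDataSet();
        int gradeIdx = data.attribute("Grade of secondary").index();
        Random rnd = new Random(1);
        for (int i = 0; i < data.numInstances(); i++) {
            Instance inst = data.instance(i);
            if (inst.isMissing(gradeIdx)) {
                // Replace a missing secondary-school grade with a random value.
                inst.setValue(gradeIdx, 60 + rnd.nextDouble() * 40);
            }
        }
        // Result is the class attribute: 1 = good predicted outcome, 0 = bad.
        data.setClassIndex(data.attribute("Result").index());
    }
}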

Page 5: Data Mining Project for student academic specialization and performance

Data Processing

The data used for each attribute are classified as below. Each of the following tables describes the expected labels for the identified attribute; for example, the first attribute, "College", may take one of the two values listed in the "Attribute 1: College" table below.

Attribute 1: College

12 Pharmacy
3 Science and Technology

Gender

1 Male
2 Female

Educational level of parents

2 High
1 Low

*High - over secondary
*Low - secondary or less

Page 6: Data Mining Project for student academic specialization and performance

Data Processing

has job

1 Has Job

0 No

Specialization

0305 Math

1201 Pharmacy

0302 Physics

City

1 Ramallah

2 Hebron

3 Jeneen

Parent family availability

2 Both

1 One of them

0 None

Page 7: Data Mining Project for student academic specialization and performance

Data Processing

Positive social life

0 Positive
1 Negative

*Positive - good and stable social life
*Negative - not a good or stable social life

Academic overload

0 High
1 Low

*High - 15 hours or greater
*Low - less than 15 hours

Financial aids

1 Has aid
0 No

*Has aid - the student received financial aid
*No aid - the student did not receive financial aid
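For illustration only (this code is not part of the original project), the coding scheme in the tables above can be collected into simple lookup tables. The class and map names below are hypothetical.

import java.util.Map;

public class AttributeCodes {
    // Numeric codes taken from the tables above; the Java structure itself is illustrative.
    static final Map<String, Integer> COLLEGE = Map.of("Pharmacy", 12, "Science and Technology", 3);
    static final Map<String, Integer> GENDER = Map.of("Male", 1, "Female", 2);
    static final Map<String, Integer> CITY = Map.of("Ramallah", 1, "Hebron", 2, "Jeneen", 3);
    static final Map<String, String> SPECIALIZATION = Map.of("Math", "0305", "Pharmacy", "1201", "Physics", "0302");
    static final Map<String, Integer> PARENT_FAMILY = Map.of("Both", 2, "One of them", 1, "None", 0);
}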

Page 8: Data Mining Project for student academic specialization and performance

First Experiment by Random Tree

1. Choose m input variables to be used to determine the decision at a node of the tree.

2. Take a bootstrap sample (training set).

3. For each node of the tree, choose m variables on which to base the decision at that node. Calculate the best split based on these m variables in the training set. The value of m remains constant during forest growing.

4. Each tree is grown to the largest extent possible and is not pruned, unlike when constructing a normal tree classifier.
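A minimal Weka sketch of this first experiment follows, assuming the prepared data is in students.arff with Result as the class attribute (both names are assumptions). The -K option sets m, the number of randomly chosen attributes considered at each node.

import weka.classifiers.trees.RandomTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RandomTreeExperiment {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("students.arff").getDataSet(); // assumed file name
        data.setClassIndex(data.attribute("Result").index());          // assumed class attribute

        RandomTree tree = new RandomTree();
        // -K: number of randomly chosen attributes (m) considered at each node; -S: random seed.
        tree.setOptions(new String[]{"-K", "3", "-S", "1"});
        tree.buildClassifier(data);
        System.out.println(tree); // prints the grown tree, including "Size of the tree"
    }
}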

Page 9: Data Mining Project for student academic specialization and performance

Result

"Grade of secondary" = Grade of secondary : Result (1/0)
"Grade of secondary" = 72
|   City = City : Result (0/0)
|   City = 3 : 1 (2/0)
|   City = 2
|   |   ID = Id : Result (0/0)
|   |   ID = 20920135 : Result (0/0)
|   |   ID = 20920171 : Result (0/0)
...
|   |   |   |   ID = 21011651 : Result (0/0)
|   |   ID = 21011733 : Result (0/0)
|   City = 1 : 0 (1/0)
"Grade of secondary" = 76.8 : 0 (1/0)
"Grade of secondary" = 87.7 : 1 (1/0)
"Grade of secondary" = 94.2
|   Colleges = Colleges : Result (0/0)
|   Colleges = 3 : 0 (1/0)
|   Colleges = 12 : 1 (1/0)
"Grade of secondary" = 70.8 : 1 (1/0)
"Grade of secondary" = 86.9 : 1 (1/0)
"Grade of secondary" = 90 : 0 (2/0)
"Grade of secondary" = 94.1 : 1 (1/0)
"Grade of secondary" = 95.9 : 0 (1/0)
"Grade of secondary" = 92.9 : 1 (2/0)
"Grade of secondary" = 97.1 : 1 (1/0)
"Grade of secondary" = 91 : 1 (1/0)
"Grade of secondary" = 93.4 : 0 (1/0)
"Grade of secondary" = 72.9 : 1 (1/0)
"Grade of secondary" = 78 : 0 (1/0)
"Grade of secondary" = 93.1 : 1 (1/0)
"Grade of secondary" = 71.2 : 1 (1/0)
"Grade of secondary" = 88.1 : 0 (1/0)
"Grade of secondary" = 74.5 : 0 (1/0)
"Grade of secondary" = 87 : 1 (1/0)
"Grade of secondary" = 73.8 : 0 (1/0)
"Grade of secondary" = 86.3 : 1 (1/0)
"Grade of secondary" = 96.5 : 0 (1/0)

Size of the tree : 150

Page 10: Data Mining Project for student academic specialization and performance

Second Experiment by W-LADTree

LADTree is a multi-class alternating decision tree technique that combines decision trees with the predictive accuracy of LogitBoost into a set of interpretable classification rules. The original formulation of the alternating decision tree induction algorithm restricted attention to binary classification problems.
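An alternating decision tree classifies an instance by summing the prediction values of every predictor node whose tests the instance satisfies and taking the class with the largest total, which is how the per-class scores in the result on the next page are read. A hedged Weka sketch of this experiment follows; older Weka releases ship LADTree as weka.classifiers.trees.LADTree while newer ones provide it as a separate package, so the import below and the -B option (number of boosting iterations) are assumptions about the installed version.

import weka.classifiers.trees.LADTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LADTreeExperiment {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("students.arff").getDataSet(); // assumed file name
        data.setClassIndex(data.attribute("Result").index());          // assumed class attribute

        LADTree tree = new LADTree();
        tree.setOptions(new String[]{"-B", "10"}); // 10 LogitBoost iterations (assumed setting)
        tree.buildClassifier(data);
        System.out.println(tree); // alternating decision tree with per-class scores at each node
    }
}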

Page 11: Data Mining Project for student academic specialization and performance

Result

: 0,0,0
| (1)Gender = 1: -1,1.211,-0.211
| (1)Gender != 1: -0.946,0.339,0.607
| | (3)Parent family availability = 2: -0.521,0.288,0.234
| | | (8)City = 1: -0.467,0.96,-0.493
| | | | (10)Id = 21010890: -0.448,-2.442,2.891
| | | | (10)Id != 21010890: -0.455,0.542,-0.087
| | | (8)City != 1: -0.472,-0.143,0.615
| | (3)Parent family availability != 2: 1.277,-2.193,0.916
| | (5)Academic overload = 1: -0.509,-0.269,0.778
| | (5)Academic overload != 1: -0.027,0.362,-0.335
| (2)Colleges = 3: -0.594,-0.02,0.614
| (2)Colleges != 3: -0.375,0.721,-0.346
| | (4)Positive social life = 1: -0.525,-1.934,2.459
| | (4)Positive social life != 1: -0.103,0.238,-0.136
| | | (6)Id = 21010654: -0.452,-2.441,2.893
| | | (6)Id != 21010654: 0.087,0.139,-0.227
| | | | (7)Id = 21010742: -0.452,-2.441,2.893
| | | | (7)Id != 21010742: 0.022,0.131,-0.153
| | | | | (9)Id = 21010850: -0.446,-2.443,2.89
| | | | | (9)Id != 21010850: 0.11,0.283,-0.394

Legend: Result, 1, 0
#Tree size (total): 31
#Tree size (number of predictor nodes): 21
#Leaves (number of predictor nodes): 13
#Expanded nodes: 100
#Processed examples: 3146
#Ratio e/n: 31.46

Page 12: Data Mining Project for student academic specialization and performance

Tree View Result of LADTree

Page 13: Data Mining Project for student academic specialization and performance

Third Experiment by W-J48

J48 is a standard machine learning algorithm based on decision tree induction. The algorithm employs two pruning methods. In the first method, "sub-tree replacement", nodes in the decision tree may be replaced with a leaf to reduce the number of tests along a specific path; the algorithm starts from the leaves of the fully formed tree and works backwards towards the root. In the second method, "sub-tree raising", a node may be moved upwards towards the root of the tree, replacing other nodes along the path. Sub-tree raising usually has a negligible effect on decision tree models.
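A minimal Weka sketch of this third experiment follows, under the same assumptions about the data file and class attribute as before. The -C option is the pruning confidence factor and -M the minimum number of instances per leaf; -S would disable sub-tree raising and -U would build an unpruned tree.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Experiment {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("students.arff").getDataSet(); // assumed file name
        data.setClassIndex(data.attribute("Result").index());          // assumed class attribute

        J48 tree = new J48();
        tree.setOptions(new String[]{"-C", "0.25", "-M", "2"}); // default-style pruning settings
        tree.buildClassifier(data);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1)); // 10-fold cross-validation
        System.out.println(tree);                               // pruned tree in text form
        System.out.println(eval.toSummaryString());
    }
}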

Page 14: Data Mining Project for student academic specialization and performance

Result

Page 15: Data Mining Project for student academic specialization and performance

Tree View Result

Page 16: Data Mining Project for student academic specialization and performance

Thank you