Vertical Search for Courses of UIUC
by Jessica Bell, Alexander Loeb, Sharon Paradesi, Michael Paul,
Jing Xia, Jie Zhang
Demo
http://greedy.cs.uiuc.edu/dssi/course/search.php
Goals of the project
- Construct a database of UIUC courses across all departments, ultimately creating a centralized knowledge base about each course.
- Augment the database by drawing relations between courses, both within and between departments, and by finding similar courses outside the University of Illinois.
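Architecture
[Diagram: data sources feed collection and processing tools, which populate the database queried through PHP]
- Data source: Course Catalog, Book Store, Webpages, Other Universities
- PHP script, JAVA script, AgentIDE, Heritrix, WEKA
- Database: Basic Course Info, Book Info, Course homepage, Keywords, Related Courses
- Query by (PHP): Course Name, Instructor, Description, …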
Tools used
- Web Crawling: Wget, AgentIDE and Heritrix
- Parsers: Python and Java
- Learning Tools: WEKA
- Website Design: PHP and MySQL
Tasks finished
Data Mining:
- Basic course information
- Similar course recommendation
- Prerequisite course list
- Recommended book information
Learning:
- Clustering
- Classification
Keywords
- Pull from course descriptions
- Remove uninformative/common words
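The two keyword steps above can be sketched in a few lines of Python. This is only an illustration: the slides do not say which stop-word list or tokenizer the project used, so `STOP_WORDS` and `extract_keywords` are hypothetical names with assumed behavior.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the slides do not say which list was used.
STOP_WORDS = {"a", "an", "and", "are", "be", "in", "is", "of",
              "that", "the", "their", "to"}

def extract_keywords(description, stop_words=STOP_WORDS):
    """Return candidate keywords and their counts for one course description."""
    # Lowercase the text and keep alphabetic tokens only.
    words = re.findall(r"[a-z]+", description.lower())
    # Drop common words that carry little information about the course.
    return Counter(w for w in words if w not in stop_words)

keywords = extract_keywords(
    "Introduction to the analysis of algorithms and data structures."
)
print(sorted(keywords))
```

A fixed stop-word list is the simplest filter. The scores on the next slide suggest the project also weighted words by how informative they are, which would let very common words be down-ranked rather than removed outright.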
Keywords (contd.)
Word          Score     Word             Score
topics        0.1328    fruits           0.6453
their         0.1352    horticultural    0.6453
problems      0.1370    agricultural     0.6454
basic         0.1373                     0.6478
techniques    0.1439    doctorate        0.6489
students      0.1457    speaker          0.6489
is            0.1494    meteorological   0.6492
are           0.1505    anthropology     0.6493
analysis      0.1531    institute        0.6498
special       0.1531    reflective       0.6498
areas         0.1556    later            0.6508
graduate      0.1563    weather          0.6513
research      0.1586    protein          0.6514
be            0.1586    mobilization     0.6514
various       0.1589    authentic        0.6514
methods       0.1600    romance          0.6514
selected      0.1618    libraries        0.6561
current       0.1625    became           0.6563
advanced      0.1651    novelists        0.6563
that          0.1651    colonization     0.6563
concepts      0.1668    initiatives      0.6563
both          0.1731    revisit          0.6563
development   0.1744    churches         0.6563
russian
Search
- Search by name, instructor, or content
- Clean up search string
  “cs125” becomes “CS 125”
  “real-time” becomes “real time realtime”
- Split search string into individual words and query database for word matches
- Score and rank results by match frequencies and keyword informativeness scores
- Look at distribution of scores and display the top results
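A minimal Python sketch of the clean-up and ranking steps follows. It is not the project's PHP code: `clean_query`, `rank_courses`, the course-code pattern, the default score for unknown words, and the sample course texts are all assumptions made for illustration. Only the two informativeness scores are taken from the keyword slide.

```python
import re

def clean_query(query):
    """Normalize a raw search string into a list of search terms."""
    terms = []
    for token in query.split():
        # "cs125" -> "CS", "125": split a course code into subject and number.
        code = re.fullmatch(r"([A-Za-z]+)(\d{3})", token)
        if code:
            terms += [code.group(1).upper(), code.group(2)]
        elif "-" in token:
            # "real-time" -> "real", "time", "realtime"
            parts = [p for p in token.lower().split("-") if p]
            terms += parts
            if len(parts) > 1:
                terms.append("".join(parts))
        else:
            terms.append(token.lower())
    return terms

def rank_courses(query, courses, informativeness, default_score=0.5):
    """Score each course by match frequency times keyword informativeness."""
    terms = [t.lower() for t in clean_query(query)]
    ranked = []
    for course_id, text in courses.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        # default_score for unscored words is an assumption of this sketch.
        score = sum(words.count(t) * informativeness.get(t, default_score)
                    for t in terms)
        if score > 0:
            ranked.append((course_id, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical course texts; the two scores come from the keyword slide.
courses = {
    "COURSE A": "Weather analysis and weather forecasting",
    "COURSE B": "Analysis of data",
}
informativeness = {"weather": 0.6513, "analysis": 0.1531}
ranked = rank_courses("weather analysis", courses, informativeness)
print(ranked)
```

Weighting each match by the word's informativeness means a hit on a rare word such as "weather" counts for more than a hit on a common one such as "analysis". The final step on the slide, choosing a cut-off from the score distribution, is left out because the slides do not describe how it was done.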
Classification
NBTree Classifier
- Training set: 34 instances
- Test set: 38 instances
- Attributes: 17
- Accuracy: 94.74%
- Precision: 0.947
- Recall: 0.947
- F-Measure: 0.947
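As a quick consistency check on these figures, the sketch below recomputes accuracy and F-measure in Python. The count of 36 correctly classified test instances is inferred from 94.74% of 38 and is not stated on the slide. `f_measure` is a hypothetical helper implementing the standard harmonic-mean definition.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

test_instances = 38
# Inferred, not stated on the slide: 36 correct out of 38 gives 94.74%.
correct = 36
accuracy = correct / test_instances

print(f"{accuracy:.2%}")
print(round(f_measure(0.947, 0.947), 3))
```

When precision and recall are equal, the F-measure equals both, which matches the three identical values reported.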
Clustering
Cobweb Clustering Algorithm
- Instances: 20
- Attributes: 112
- Number of clusters: 17
- Incorrectly clustered instances: 7 (i.e. 35%)
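One simple way to count incorrectly clustered instances is to label each cluster with its majority class and count the members that disagree. The Python sketch below assumes that rule. WEKA's own evaluation may assign classes to clusters differently, so this need not reproduce the reported 7 of 20. The function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def incorrectly_clustered(cluster_ids, class_labels):
    """Count instances whose class differs from their cluster's majority class."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)
    errors = 0
    for labels in members.values():
        # Size of the largest class inside this cluster.
        majority_count = Counter(labels).most_common(1)[0][1]
        errors += len(labels) - majority_count
    return errors

# Hypothetical toy data: six instances in three clusters.
clusters = [0, 0, 0, 1, 1, 2]
labels = ["cs", "cs", "math", "math", "math", "cs"]
errors = incorrectly_clustered(clusters, labels)
error_rate = errors / len(labels)
print(errors, error_rate)

# The slide's figure: 7 incorrectly clustered out of 20 instances.
print(7 / 20)
```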