DATA COLLECTION AND CLASSIFICATION TO
FIND PRODUCTIVITY OF USER
BY USING NAÏVE BAYES ALGORITHM
MUHAMMAD MURSHID BIN RAMLAN
BACHELOR OF COMPUTER SCIENCE
(NETWORK SECURITY)
UNIVERSITI SULTAN ZAINAL ABIDIN
2018
DATA COLLECTION AND CLASSIFICATION TO FIND PRODUCTIVITY OF
USER
BY USING NAÏVE BAYES ALGORITHM
MUHAMMAD MURSHID BIN RAMLAN
Bachelor of Computer Science (Network Security)
Faculty of Informatics and Computing
Universiti Sultan Zainal Abidin, Terengganu, Malaysia
MAY 2018
DECLARATION
I hereby declare that this report is based on my original work except for quotations
and citations, which have been duly acknowledged. I also declare that it has not been
previously or concurrently submitted for any other degree at Universiti Sultan Zainal
Abidin or other institutions.
________________________________
Name : ..................................................
Date : ..................................................
CONFIRMATION
This is to confirm that:
The research conducted and the writing of this report were carried out under my supervision.
________________________________
Name : ..................................................
Date : ..................................................
DEDICATION
In the Name of Allah, Most Gracious and Most Merciful. I am grateful that He has
given me the strength to complete my final year project report, entitled “Data
Collection and Classification to Find Productivity of User by Using Naïve Bayes
Algorithm”. I would like to express my sincere thanks and appreciation to my
supervisor, Dr. Mohamad Afendee Bin Mohamed, for his guidance, understanding and
constructive comments, and for sharing his knowledge during the course of this
project. I would also like to express my gratitude to my beloved family and friends
for their moral support and encouragement. Last but not least, I would like to thank
everyone who contributed to my project and guided me throughout its preparation.
ABSTRACT
Internet use keeps increasing, including among people who work in office
environments, since many companies provide computers for their employees. It is
therefore hard to determine whether what is browsed on the internet is productive
or not. A further problem is that there is no system that tracks what users are
browsing. In addition, the admin cannot see the activities of the users/clients,
because the admin has limited time to supervise all of them at once. The admin
also does not know how long users actually spend doing their jobs, so it is hard
to tell productive subordinates from unproductive ones. This project is therefore
carried out to design a good, user-friendly system that collects data and
classifies it using a classifier algorithm. Data is collected from the history log
of the user’s computer. The collected data is then classified to produce an output
stating whether the user’s browsing history is productive or not. A further
objective is to develop the Naive Bayes algorithm in the system and to test the
algorithm on classifying the data. The expected output will be shown in the form
of a pie chart. In conclusion, by developing this system, we can measure the
productivity of employees and of ourselves. We can then take further action, or
reflect on ourselves and improve to become better and more productive people.
ABSTRAK
Nowadays, internet use is increasing. This includes people who work in office
environments, because most companies provide computers for their staff. It is
therefore difficult to determine whether what we do on the internet is productive
or not. A further problem is that there is still no system that can track what
users browse. In addition, the admin cannot see the activities of the
users/clients, because the admin tends to have limited time to supervise all of
them at once. The admin also does not know how long users actually spend doing
their work, so it is hard to tell productive subordinates from unproductive ones.
This project is therefore carried out to design a better, user-friendly system
that collects data and classifies it using a classifier algorithm. Data is
collected from the history log of the user's computer. The collected data is then
classified to produce an output stating whether the user's browsing history is
productive or not. A further objective is to develop the Naive Bayes algorithm in
the system and to test the algorithm on classifying the data. The expected result
will be shown in the form of a pie chart. In conclusion, by developing this
system, we can measure the productivity of employees and of ourselves. We can
then take further action, or reflect on ourselves and improve to become better
and more productive people.
CONTENTS
DECLARATION
CONFIRMATION
DEDICATION
ABSTRACT
ABSTRAK
CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER I INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Objectives
1.4 Scope
1.5 Limitation Of Work
1.6 Expected Result
1.7 Report Organization
CHAPTER II LITERATURE REVIEW
2.1 Introduction
2.2 Techniques
2.2.1 Performance Analysis of Naive Bayes and J48 Classification Algorithm for
Data Classification
2.2.2 A Survey of Online Activity Recognition Using Mobile Phones
2.2.3 A Goal-based Classification of Web Information Task
2.2.4 A Hybrid Learning System for Recognizing User Tasks from Desktop
Activities and Email Messages
2.2.5 Self-Adaptive Attribute Weighting for Naive Bayes Classification
2.2.6 A Review Article On Naive Bayes Classifier With Various Smoothing
Techniques
2.2.7 Efficient Manageability and Intelligent Classification of Web Browsing
History Using Machine Learning
2.3 Summary Of Literature Review
2.4 Summary
CHAPTER III RESEARCH METHODOLOGY
3.1 Introduction
3.2 System Model
3.2.1 Requirement and Specification
3.2.2 Design
3.2.3 Development
3.2.4 Integration and Testing
3.2.5 Deployment of System
3.2.6 Maintenance
3.3 System Requirement and Specification
3.3.1 Hardware
3.3.2 Software
3.4 Framework
3.5 List Of Websites
3.6 Summary
REFERENCES
LIST OF TABLES
TABLE TITLE
2.3 Summary Of Literature Review
3.3.1 Hardware
3.3.2 Software
LIST OF FIGURES
FIGURE TITLE
2.2.6 Naïve Bayes Classifier
3.2 Waterfall Model
3.4 Framework
3.4.3 Example of converted JSON format into readable XML format
3.4.4 History Export
3.4.5 Example of output after conversion into Excel format
LIST OF ABBREVIATIONS / TERMS / SYMBOLS
FYP Final year project
GA Genetic algorithm
HCI Human computer interface
LIST OF APPENDICES
APPENDIX TITLE
A Appendix 1
B Appendix 2
C Appendix 3
D Appendix 4
CHAPTER I
INTRODUCTION
1.1 Background
Data collection is the process of gathering and measuring information on targeted
variables in an established, systematic way, which then enables one to answer
relevant questions and evaluate the outcomes. Data collection is a component of
research in all fields of study, including the social sciences, humanities, and
business. The goal of all data collection is to capture quality evidence that
allows analysis to lead to convincing and credible answers to the questions posed.
In this system, the data is collected from the browser history log of the user's
computer and is then classified according to the categories that we define for the
Naïve Bayes algorithm. The result shows whether the user's browsing activities are
productive or unproductive.
Bayesian classification is a supervised learning method as well as a statistical
method for classification. It assumes an underlying probabilistic model, which
allows us to capture uncertainty about the model in a principled way by
determining the probabilities of the outcomes. It can solve diagnostic and
predictive problems. The method is named after Thomas Bayes (1702-1761), who
proposed Bayes' theorem. Bayesian classification provides practical learning
algorithms in which prior knowledge and observed data can be combined. It also
provides a useful perspective for understanding and evaluating many learning
algorithms. It calculates explicit probabilities for hypotheses and is robust to
noise in the input data [6].
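The probabilistic model described above can be written compactly. The block below is the standard textbook statement of Bayes' theorem and of the Naive Bayes decision rule, not a formula taken from this project. The symbols are assumptions introduced for illustration: c is a class (here, productive or unproductive) and x_1, ..., x_n are the features of one browsing record, for example the words in a page title.

```latex
% Bayes' theorem for a class c and an observation x
P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}

% Naive Bayes: the features x_1, ..., x_n are assumed to be
% conditionally independent given the class c
P(c \mid x_1, \dots, x_n) \propto P(c) \prod_{i=1}^{n} P(x_i \mid c)

% Decision rule: choose the class with the largest posterior
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The denominator P(x) is the same for every class, so it can be dropped when only the most probable class is needed.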
1.2 Problem Statement
The problems that motivate the development of this program are:
i) The admin cannot see the activities of the users/clients, because the admin has
limited time to supervise all of them at once.
ii) The admin also does not know how long users actually spend doing their jobs, so
it is hard to tell productive subordinates from unproductive ones.
1.3 Objectives
The objectives of this program are:
i) To study the activities carried out by the target user and to categorize them as
productive or unproductive.
ii) To develop the Naive Bayes algorithm in the system.
iii) To test the algorithm on classifying the data.
1.4 Scope
This system involves the user and the admin.
Admin:
i) Monitor and analyze the information from the client.
ii) Request data from the user/client.
User:
i) The target user, who gives their data to the admin for further use.
ii) Send the requested information to the admin.
1.5 Limitation Of Work
Every system has its limitations. The limitations of this system are:
i) The application has very limited functionality: it can only classify the data
obtained from the user/client.
ii) The system cannot detect browsing done in incognito windows.
iii) The system cannot yet detect whether the history log has been deleted from the
browser.
1.6 Expected Result
i) A system that helps users know how productive they are while browsing the
internet.
ii) A system that helps an employer know the performance of his or her subordinates.
iii) A system that is easy to use.
1.7 Report Organization
This report consists of five chapters. Each chapter serves a different purpose in
describing the project.
Chapter 1: Introduction
This chapter describes and defines the data collection application. It also
presents the objectives and the scope of the project.
Chapter 2: Literature Review
This chapter reviews related research papers. The different technologies are
compared in a table by their advantages and disadvantages. The chapter describes
the research on existing systems, whose difficulties and other problems are
analyzed for improvement. Methods, techniques, equipment, and appropriate
technologies are studied in order to develop the application.
Chapter 3: Methodology
This chapter presents the methodology used in this application, together with the
technique or algorithm that is proposed and implemented, and the framework for
data collection. The methodology acts as a guide for the development process and
helps to make sure that the project runs smoothly as planned. The chapter also
includes the system requirements and specifications that are used to assist the
development of the project.
Chapter 4: Implementation and Result
This chapter covers implementation and testing: the application is developed, the
method or algorithm is implemented, and the application is tested.
Chapter 5: Conclusion
This chapter discusses the results and draws the conclusion. It also describes the
achievement of the expected results and gives suggestions for improving and
enhancing the results of the proposed project.
CHAPTER II
LITERATURE REVIEW
2.1 Introduction
This chapter discusses the literature on the Naïve Bayes classifier as used in
previous research. A literature review covers past and recent research in order
to illustrate the research problem, its solutions, and the importance of seeking
a solution. A literature review is not merely information gathering: for the
chosen topic area, it shows an in-depth grasp of prior research linked to the
research subject and summarizes it. It involves reading journals, articles, books
and research papers, and then analysing, summarizing and evaluating the reading
based on its connection to the project. It serves as a guideline that establishes
the credibility of the project.
2.2 Techniques
Nowadays, many technologies and techniques can be used to classify data. Their
efficiency depends on the technology used and on the environment in which it is
used, so an algorithm should be chosen to suit both. Below are some of the research
and development efforts carried out by other researchers, and the different
approaches they took to complete their projects.
2.2.1 Performance Analysis of Naive Bayes and J48 Classification Algorithm for
Data Classification
This article explains that classification is an important data mining technique
with broad applications, used to classify the various kinds of data found in
nearly every field of life. Classification assigns an item to one of a predefined
set of classes according to the features of the item. The paper sheds light on
performance evaluation based on the correctly and incorrectly classified instances
produced by the Naïve Bayes and J48 classification algorithms. The Naive Bayes
algorithm is based on probability, and the J48 algorithm is based on a decision
tree. The paper sets out to make a comparative evaluation of the Naive Bayes and
J48 classifiers on a bank dataset using the WEKA tool, aiming to maximize the true
positive rate and minimize the false positive rate of defaulters rather than to
achieve only higher classification accuracy. The experimental results reported in
the paper cover classification accuracy, sensitivity and specificity. On this
dataset, the results show that the efficiency and accuracy of J48 are better than
those of Naïve Bayes.
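The three measures named above follow directly from the counts of a binary confusion matrix. The sketch below is a minimal illustration under stated assumptions, not code from the reviewed paper or from WEKA: the class name `ClassifierMetrics`, its method names, and the counts used in `main` are all hypothetical.

```java
/** Minimal sketch: evaluation metrics computed from confusion-matrix counts. */
final class ClassifierMetrics {
    private ClassifierMetrics() {}

    /** Fraction of all instances that were classified correctly. */
    static double accuracy(int tp, int tn, int fp, int fn) {
        return (double) (tp + tn) / (tp + tn + fp + fn);
    }

    /** True positive rate: share of actual positives that were detected. */
    static double sensitivity(int tp, int fn) {
        return (double) tp / (tp + fn);
    }

    /** True negative rate: share of actual negatives that were rejected. */
    static double specificity(int tn, int fp) {
        return (double) tn / (tn + fp);
    }

    public static void main(String[] args) {
        // Hypothetical counts, chosen only to illustrate the arithmetic.
        int tp = 40, tn = 45, fp = 5, fn = 10;
        System.out.println("accuracy    = " + accuracy(tp, tn, fp, fn));
        System.out.println("sensitivity = " + sensitivity(tp, fn));
        System.out.println("specificity = " + specificity(tn, fp));
    }
}
```

The false positive rate that the paper seeks to minimize is 1 minus the specificity, so a classifier can be compared on both goals from the same four counts.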
2.2.2 A Survey of Online Activity Recognition Using Mobile Phones
This research paper explains that physical activity recognition using embedded
sensors has enabled many context-aware applications in different areas, such as
healthcare. Initially, one or more dedicated wearable sensors were used for such
applications. Recently, however, many researchers have started using mobile phones
for this purpose, since these ubiquitous devices are equipped with various sensors,
ranging from accelerometers to magnetic field sensors. In most current studies,
sensor data collected for activity recognition are analyzed offline using machine
learning tools. However, there is now a trend towards implementing activity
recognition systems on these devices in an online manner, since modern mobile
phones have become more powerful in terms of available resources, such as CPU,
memory and battery. The research on offline activity recognition has been reviewed
in detail in several earlier studies. Work on online activity recognition,
however, is still in its infancy and is yet to be reviewed. In this paper, the
authors review the studies done so far that implement activity recognition systems
on mobile phones and use only their on-board sensors. They also discuss various
aspects of these studies, discuss their limitations, and present various
recommendations for future research.
2.2.3 A Goal-based Classification of Web Information Task
This research paper observes that people now conduct research using search engines
and online library portals, read the daily news and their favourite comics online,
communicate with others increasingly through email and blogs, and have become
accomplished fact checkers thanks to Google. However, the authors find that
researchers still lack a solid understanding of the types of activities and tasks
in which users engage on the Web. There are several reasons for this lack of
understanding. First, the Web is a moving target that is continually changing and
evolving. For example, the typical user has changed substantially since the early
1990s, when the average web user was a young, technically inclined male (Hawkey
and Inkpen, 2005b). In addition, the Web now supports a much wider range of
activities and uses. Examples include the increase in web-based email; new,
sophisticated web-based travel and map applications; and the popularity of online
support and blog communities.
2.2.4 A Hybrid Learning System for Recognizing User Tasks from Desktop
Activities and Email Messages
This research paper describes an application named TaskTracer, a system that seeks
to help multi-tasking users manage the resources that they create and access while
carrying out their work activities. It does this by associating with each
user-defined activity the set of files, folders, email messages, contacts, and web
pages that the user accesses when performing that activity. The initial TaskTracer
system relies on the user to notify the system each time the user changes
activities. However, this is burdensome, and users often forget to tell TaskTracer
what activity they are working on. The paper introduces TaskPredictor, a machine
learning system that attempts to predict the user’s current activity.
TaskPredictor has two components: one for general desktop activity and another
specifically for email. TaskPredictor achieves high prediction precision by
combining three techniques:
(a) feature selection via mutual information,
(b) classification based on a confidence threshold, and
(c) a hybrid design in which a Naive Bayes classifier estimates the classification
confidence but the actual classification decision is made by a support vector
machine.
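Technique (b) in the list above can be pictured as a classifier that abstains when it is not sure. The sketch below is a minimal illustration under stated assumptions and is not TaskPredictor's implementation: the class `ConfidenceThresholdSketch`, the method `predictIfConfident`, the activity names and the threshold value are hypothetical, and the posterior probabilities are assumed to come from some upstream classifier such as Naive Bayes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Minimal sketch: predict a label only when the top posterior is high enough. */
final class ConfidenceThresholdSketch {
    private ConfidenceThresholdSketch() {}

    /**
     * Returns the most probable label if its probability reaches the threshold,
     * otherwise an empty Optional (the classifier abstains).
     */
    static Optional<String> predictIfConfident(Map<String, Double> posteriors, double threshold) {
        String best = null;
        double bestProbability = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : posteriors.entrySet()) {
            if (entry.getValue() > bestProbability) {
                bestProbability = entry.getValue();
                best = entry.getKey();
            }
        }
        if (best != null && bestProbability >= threshold) {
            return Optional.of(best);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical posteriors for two user-defined activities.
        Map<String, Double> posteriors = new HashMap<>();
        posteriors.put("writing report", 0.90);
        posteriors.put("answering email", 0.10);
        System.out.println(predictIfConfident(posteriors, 0.80).orElse("no prediction"));
    }
}
```

Raising the threshold trades coverage for precision: fewer predictions are made, but those that are made are more likely to be right.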
2.2.5 Self-Adaptive Attribute Weighting for Naive Bayes Classification
This paper proposes a new Artificial Immune System (AIS) based self-adaptive
attribute weighting method for Naive Bayes classification. The proposed method,
named AISWNB, uses immunity theory from artificial immune systems to search for
optimal attribute weight values, where the self-adjusted weight values alleviate
the conditional independence assumption and help calculate the conditional
probability accurately. One noticeable advantage of AISWNB is that its unique
immune system based evolutionary computation process, including initialization,
clone, selection, and mutation, ensures that AISWNB can adjust itself to the data
without explicit specification of functional or distributional forms of the
underlying model. As a result, AISWNB can obtain good attribute weight values
during the learning process. Experiments and comparisons on 36 machine learning
benchmark data sets and six image classification data sets demonstrate that AISWNB
significantly outperforms its peers in classification accuracy, class probability
estimation, and class ranking performance.
2.2.6 A Review Article On Naive Bayes Classifier With Various Smoothing
Techniques
Figure 2.2.6 Naïve Bayes Classifier
This research paper explains that Naive Bayes is very popular in commercial and
open-source anti-spam e-mail filters. However, there are several forms of Naive
Bayes, something the anti-spam literature does not always acknowledge. The paper
discusses five different versions of Naive Bayes and compares them on six new,
non-encoded datasets that contain ham messages of particular Enron users and fresh
spam messages. The new datasets, which the authors make publicly available, are
more realistic than previous comparable benchmarks, because they maintain the
temporal order of the messages in the two categories and emulate the varying
proportion of spam and ham messages that users receive over time. The paper also
covers various aspects of the Naïve Bayes classifier and of smoothing techniques
for the extraction of useful data, along with their research criteria.
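As an illustration of what a smoothing technique does, add-one (Laplace) smoothing is one common choice. The block below is the standard textbook form and is not quoted from the reviewed paper. The symbols are assumptions introduced here: count(w, c) is the number of times word w occurs in the training documents of class c, and V is the vocabulary of all words seen in training.

```latex
% Add-one (Laplace) smoothed estimate of a word's probability given a class
\hat{P}(w \mid c) =
  \frac{\operatorname{count}(w, c) + 1}
       {\sum_{w' \in V} \operatorname{count}(w', c) + |V|}
```

With this estimate, a word that never occurred in class c receives a small non-zero probability, so a single unseen word no longer forces the whole Naive Bayes product to zero.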
2.2.7 Efficient Manageability and Intelligent Classification of Web Browsing
History Using Machine Learning
This paper presents a workable solution, implemented using machine learning and
natural language processing techniques, for efficient manageability of a user’s
browsing history. The significance of adding such a capability to a web browser is
that it ensures efficient and quick information retrieval from the browsing
history, which is currently very challenging. The proposed solution guarantees
that any important website visited in the past remains easily accessible because
of the intelligent and automatic classification.
In a nutshell, this solution-based paper provides an implementation as a browser
extension that intelligently classifies the browsing history into the most
relevant category automatically, without any intervention by the user. This
guarantees that no information is lost and increases productivity by saving the
time spent revisiting websites that were of much importance.
2.3 Summary Of Literature review
No.: 1
Title: Performance Analysis of Naive Bayes and J48 Classification Algorithm for
Data Classification
Author: Tina R. Patil, Mrs. S. S. Sherekar
Description: This article explains how to classify data using Naive Bayes and
compares it with the J48 decision tree. The aim of the paper is to compare the two
techniques.

No.: 2
Title: A Survey of Online Activity Recognition Using Mobile Phones
Author: Muhammad Shoaib, Stephan Bosch, Ozlem Durmaz Incel, Hans Scholten and
Paul J.M. Havinga
Description: This research paper notes that activity recognition using embedded
sensors has enabled many context-aware applications in different areas, such as
healthcare. The authors then review the studies done so far that implement
activity recognition systems on mobile phones and use only their on-board sensors.

No.: 3
Title: A Goal-based Classification of Web Information Tasks
Author: Melanie Kellar, Carolyn Watters, Michael Shepherd
Description: Based on their analysis of the tasks that participants recorded
during the field study, as well as on previous research, the authors developed a
goal-based classification of information tasks that describes user activities on
the Web.

No.: 4
Title: A Hybrid Learning System for Recognizing User Tasks from Desktop Activities
and Email Messages
Author: Jianqiang Shen, Lida Li, Thomas G. Dietterich, Jonathan L. Herlocker
Description: In this research paper, the authors build a TaskTracer system that
seeks to help multi-tasking users manage the resources that they create and access
while carrying out their work activities. TaskTracer is based on two main ideas:
that the behavior of the user at the desktop is a mixture of different activities,
and that each activity is associated with a set of resources relevant to that
activity.

No.: 5
Title: Self-Adaptive Attribute Weighting for Naive Bayes Classification
Author: Jia Wu, Shirui Pan, Xingquan Zhu, Zhihua Cai, Peng Zhang, Chengqi Zhang
Description: This research paper searches for optimal attribute weight values,
where the self-adjusted weight values alleviate the conditional independence
assumption and help calculate the conditional probability accurately. The authors
also propose a new Artificial Immune System (AIS) based self-adaptive attribute
weighting method for Naive Bayes classification.

No.: 6
Title: A Review Article On Naive Bayes Classifier With Various Smoothing Techniques
Author: Gurneet Kaur, Er. Neelam Oberai
Description: This paper explains that various classification methods have been
developed, but the choice among these techniques depends mainly on the type of
data collection. Some classifiers are discussed. A few methods, such as Naive
Bayes, perform well on numerical and text data, whereas neural networks handle
both discrete and continuous data. KNN is a time-consuming method, and finding the
optimal value is always an issue. Decision trees reduce the complexity but fail to
handle continuous data. Naïve Bayes, along with its simplicity, is also
computationally cheap. The second section of the paper discusses the Naïve Bayes
classifier in detail. One of the major drawbacks of Naïve Bayes is the problem of
unseen words, which can be eliminated by applying smoothing techniques. The third
section discusses various smoothing methods applied to Naïve Bayes and compares
their performance.

No.: 7
Title: Efficient Manageability and Intelligent Classification of Web Browsing
History Using Machine Learning
Author: Suraj G., Sumantha Udupa U
Description: This paper deals with ways of improving browser capability by
efficiently managing the browsing history. The endeavor is to provide technology
solutions that enable, extend, and differentiate the transformation of a browser
in maintaining the websites visited by users. This historical information is an
essential part of our everyday operations, but its huge quantity and very poor
organization make it difficult and time-consuming to retrieve according to the
user’s preferences. Web page browsing is one of the most important ways for people
to obtain information. Every visit has some motivation and reflects a certain
interest of the user. Managing the browsing history also helps in developing web
personalization.

Table 2.3 Summary Of Literature Review
2.4 Summary
This chapter presented a study of past research on the types of classifier used,
on current systems and applications, and on articles from newspapers and websites.
Research on the different techniques that could be used in the system is important
to ensure that the best and most suitable technique is applied. The study focuses
on guiding the development towards a successful project and on producing a new
system that will benefit the user.
CHAPTER III
RESEARCH METHODOLOGY
3.1 Introduction
This chapter explains the specific details of the methodology used to develop this
project. Methodology plays an important role as a guide that keeps the project on
the right path, so that it is completed and works as planned. Different types of
methodology are used for different types of application. It is important to choose
a suitable methodology for the development of an application, so it is necessary
to understand the functionality of the application itself.
This project uses the waterfall model, which has many advantages. First, the model
is simple and easy to understand and use. Second, it is easy to manage because of
its rigidity: each phase has specific deliverables and a review process.
Furthermore, the phases are processed and completed one at a time, so they do not
overlap. Lastly, the waterfall model works well for smaller projects where the
requirements are very well understood.
3.2 System Model
Figure 3.2 Waterfall Model
3.2.1 Requirement and Specification
During this phase, all possible requirements of the system to be developed are
gathered from the client and documented in a requirement specification document.
3.2.2 Design
Design is the technical part of system development. In this phase, the requirement
specifications from the first phase are studied and the system design is prepared.
The system design helps in specifying the hardware and system requirements best
suited to developing this system, and helps in defining the overall system
architecture.
3.2.3 Development
With inputs from the system design, the system is first developed in small programs
called units, which are integrated in the next phase. Each unit is developed and
tested for its functionality, which is referred to as unit testing. By this phase,
the information has been gathered and the design has been created. The system is
written in the Java language. The system is considered successful if the code has
no errors and follows all the specifications of the system.
3.2.4 Integration and Testing
All the units developed in the development phase are integrated into a system after
each unit has been tested. After integration, the entire system is tested for any
faults, failures and errors. In this phase, the software is tested on real hardware
to verify that it is built according to the specifications given by the client.
3.2.5 Deployment of System
When functional and non-functional testing is done, the product is ready to be used
by the user. The product is deployed in the customer environment or released into
the market.
3.2.6 Maintenance
Once the system is in use, the code may need to be changed at the customer's
request if issues come up in the client environment. Better versions are also
released to enhance the product. Maintenance is done to deliver these changes in
the customer environment.
All these phases are cascaded, so that progress is seen as flowing steadily
downwards (like a waterfall) through the phases. The next phase is started only
after the defined set of goals has been achieved for the previous phase and it has
been signed off, hence the name "Waterfall Model". If the previous phase is not
settled, the next phase cannot begin. In this model, phases do not overlap.
3.3 System Requirement and Specification
System requirements are needed to accomplish this project and to assist its
development. They cover both hardware and software. These requirements are related
to each other and together make sure that the system can be developed smoothly.
3.3.1 Hardware
No.  Hardware                  Description
1    Laptop (HP Pavilion g4)   Processor: Intel Core i5 7th Generation
                               RAM: 6 GB
                               OS version: Windows 32/64 bit
2    Printer (HP Inkjet)       Used for printing documents
Table 3.3.1 Hardware
3.3.2 Software
No.  Software                   Description
1    Web Historian              Reads the history/log from the computer.
2    History Export             Exports the history from the computer in JSON
                                format.
3    Notepad++                  A source code editor and Notepad replacement that
                                supports several languages. It runs in the MS
                                Windows environment, and its use is governed by
                                the GPL License.
4    Google Chrome              Used to run the localhost server and the web-based
                                system.
5    Microsoft Word 2013        Used for word processing, such as creating and
                                editing the report and documentation.
6    Microsoft PowerPoint 2013  Used to present the results and findings of this
                                project.
7    Snipping Tool              Used to capture screenshots.
8    NetBeans IDE 8.0.2         Platform for writing the code to develop the
                                system.
9    CodeBeautify               Converts JSON into XML format.
10   Microsoft Excel 2013       Holds the converted data in a form that the code
                                can read.
11   Adobe Photoshop CS6        A software program that photographers, graphic
                                designers, web designers, videographers, and 3D
                                artists use to enhance and manipulate photos to
                                make them clearer and more attractive.
12   MATLAB                     Platform for writing the code to develop the
                                system.
Table 3.3.2 Software
3.4 Framework
Figure 3.4.1 Framework
Figure 3.4.2 Example of a Web Historian graph for today's activities.
Figure 3.4.3 Example of JSON format converted into readable XML format.
Figure 3.4.4 The three options available in History Export: last day, last week,
and all history.
Figure 3.4.5 Example of output after conversion into Excel format.
3.5 List Of Websites
Gaming:
• Kongregate
• MiniClip
• Addicting Games
• Armor Games
• Newgrounds
• Crazy Monkey Games
• PopCap
• Yahoo Games
• Bgames
Chatting:
• Second Life
• Paltalk
• IMVU
• Badoo.com
• Charmdate.com
• Enterchatroom
Academic:
• Khan academy
• Tutorials point
• Google scholar
• Wikipedia
• Codecademy
• HTML Dog's Beginning HTML Guide
• Ruby on Rails Tutorial
• Mozilla Developer Network
Shopping:
• Lazada
• Zalora
• Next
• Uniqlo
• AliExpress
• 11 Street
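The category lists above can serve as labels for a Naive Bayes classifier. The sketch below is a minimal multinomial Naive Bayes with add-one smoothing, written in Java because the system is developed in Java; it is an illustration under stated assumptions, not the project's implementation. The class `NaiveBayesSketch`, its methods, and the training titles are hypothetical, and the rule that only the Academic category counts as productive is an assumption made here for the example, since the lists themselves do not say which categories are productive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing. */
final class NaiveBayesSketch {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Adds one labelled example (for instance a page title) to the counts. */
    void train(String label, String text) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
            tokensPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** Returns the label with the highest log-posterior, or null if nothing was trained. */
    String classify(String text) {
        List<String> tokens = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            // Log prior: share of training examples that carry this label.
            double score = Math.log((double) docsPerClass.get(label) / totalDocs);
            Map<String, Integer> counts = tokenCounts.get(label);
            int classTotal = tokensPerClass.getOrDefault(label, 0);
            for (String token : tokens) {
                // Add-one smoothing keeps unseen tokens from zeroing the product.
                int c = counts.getOrDefault(token, 0);
                score += Math.log((c + 1.0) / (classTotal + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    /** ASSUMPTION for this example only: the Academic category is the productive one. */
    static boolean isProductive(String category) {
        return "Academic".equals(category);
    }

    /** Builds a toy model from invented page titles (hypothetical training data). */
    static NaiveBayesSketch demoModel() {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("Gaming", "Kongregate free online games");
        nb.train("Gaming", "Miniclip play games online");
        nb.train("Gaming", "Armor Games action games");
        nb.train("Chatting", "Paltalk video chat rooms");
        nb.train("Chatting", "IMVU 3D avatar chat");
        nb.train("Chatting", "Badoo chat with new people");
        nb.train("Academic", "Khan Academy free online courses and lessons");
        nb.train("Academic", "Codecademy learn to code tutorial");
        nb.train("Academic", "Google Scholar research articles");
        nb.train("Shopping", "Lazada online shopping deals");
        nb.train("Shopping", "Zalora fashion shopping online");
        nb.train("Shopping", "AliExpress shop online deals");
        return nb;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = demoModel();
        for (String title : new String[] {"play free games", "learn code tutorial"}) {
            String category = nb.classify(title);
            System.out.println(title + " -> " + category
                    + " (productive: " + isProductive(category) + ")");
        }
    }
}
```

Scores are summed in log space rather than multiplied, which avoids numeric underflow when a record has many tokens. Counting the productive and unproductive results over a whole history export would give the two shares needed for the pie chart mentioned in the abstract.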
3.6 Summary
Methodology is very important in system and application development. Many
different software development methodologies are available and can be used to
develop any kind of application. The right methodology helps the project to be
completed within the specified time. The activities in each phase of the
methodology were explained in this chapter so that they can be understood easily.
REFERENCES
[1] https://www.finder.com/my/online-shopping
[2] https://www.upwork.com/blog/2014/03/10-best-web-development-tutorials-beginners/
[3] http://thegeekdesire.com/best-free-chat-rooms-to-make-new-friends.html
[4] https://www.quora.com/What-is-the-best-online-games-site
[5] Suraj G. and Sumantha Udupa U., Efficient Manageability and Intelligent
Classification of Web Browsing History Using Machine Learning
[6] Tina R. Patil and S. S. Sherekar, Performance Analysis of Naive Bayes and J48
Classification Algorithm for Data Classification
[7] Muhammad Shoaib et al., A Survey of Online Activity Recognition Using Mobile
Phones
[8] Melanie Kellar, Carolyn Watters and Michael Shepherd, A Goal-Based
Classification of Web Information Tasks
[9] Jianqiang Shen et al., A Hybrid Learning System for Recognizing User Tasks from
Desktop Activities and Email Messages
[10] Jia Wu et al., Self-Adaptive Attribute Weighting for Naive Bayes Classification
[11] Gurneet Kaur and Er. Neelam Oberai, A Review Article on Naive Bayes Classifier
with Various Smoothing Techniques