senior year project


Upload: akshay-iyer

Post on 14-Apr-2017


Page 1: Senior Year Project

I

EMAIL FILTERING AND ANALYSIS

USING CLASSIFICATION ALGORITHMS

Submitted in partial fulfillment of the requirements

of the degree of

Bachelor of Engineering in Information Technology

By

Akshay Iyer

Dipti Pamnani

Akanksha Pandey

Karmanya Pathak

Supervisor:

Mrs. Jayshree Hajgude

Department of Information Technology

Vivekanand Education Society’s Institute of Technology

2013-14

Page 2: Senior Year Project

II

Project Report Approval for B. E.

This project report entitled EMAIL FILTERING AND ANALYSIS USING

CLASSIFICATION ALGORITHMS by Akshay Iyer, Dipti Pamnani,

Akanksha Pandey, and Karmanya Pathak is approved for the degree of

Bachelor of Engineering in Information Technology.

Examiners

1.---------------------------------------------

2.---------------------------------------------

Supervisors

1.---------------------------------------------

2.---------------------------------------------

Chairman

-----------------------------------------------

Date:

Place:

Page 3: Senior Year Project

III

Declaration

I declare that this written submission represents my ideas in my own words and

where others' ideas or words have been included, I have adequately cited and

referenced the original sources. I also declare that I have adhered to all principles

of academic honesty and integrity and have not misrepresented or fabricated or

falsified any idea/data/fact/source in my submission. I understand that any

violation of the above will be cause for disciplinary action by the Institute and can

also evoke penal action from the sources which have thus not been properly cited

or from whom proper permission has not been taken when needed.

-----------------------------------------

Akshay Iyer

-----------------------------------------

Dipti Pamnani

-----------------------------------------

Akanksha Pandey

-----------------------------------------

Karmanya Pathak

Date:

Page 4: Senior Year Project

IV

ACKNOWLEDGEMENT

This project has been a great learning experience for us. Through the course of this year, we have worked as a team towards the successful completion of this project. Though on paper it is only we who have made this project, in reality there are people without whom it could not have been designed and finalized the way it looks now.

First of all, we would like to thank our Principal, Dr. (Mrs.) J. M. Nair, and our Vice-Principal, Dr. S. Mukhopadhyay, for their support and guidance throughout the project implementation period. Without their help, the project would not have been possible.

We are truly indebted to our internal project guide, Mrs. Jayshree Hajgude, for her immense guidance and support. She has encouraged us and channelled our enthusiasm effectively.

We would like to thank Mrs. Vijayalakshmi Muralidharan, HOD of the Information Technology Department. We would also like to thank our lab in-charges, Mr. Amar Jaiswar and Mr. Ulhas Pawar, who have been very kind to us.

Last but not least, we want to thank our college, Vivekanand Education Society's Institute of Technology, for providing us with excellent reference materials and great computing facilities.

Page 5: Senior Year Project

V

ABSTRACT

With the various developments taking place in the field of technology, especially in communication, a wide variety of malpractices are also taking place which might prove harmful to the user. Most of these are currently observed in the user's Email account. The Email user has an Inbox which consists of a wide variety of mails, and these mails are present in an unorganized manner. Some of the mails received by the user may also contain harmful content with severe consequences (normally termed Spam). With this idea in mind, the topic of our BE Project is Email Filtering.

Email Filtering is the process used to classify Emails into various categories on the basis of their content. The application fetches the emails from a user's id and stores them on a server. It then classifies them into spam and non-spam using classification algorithms, and also into user-defined categories on the basis of keywords entered by the user. The user can also send, forward and reply to a particular mail. The application also performs historical spam analysis on the basis of the content downloaded by the user. The user can access, read, store and copy the contents of his Email.

The project report begins with a short introduction to Email Filtering and the reason we have chosen this topic. This is followed by the Literature Survey, which describes the various areas where similar operations are performed and the various features of Email Filtering. We have also explained the algorithms that we use to classify the Emails.

The report then focuses on the Implementation Flow and the Use Case designs, which help in better understanding the various features of the project. This chapter is followed by the actual implementation code of the project, with information about the various code snippets that are a part of it. A detailed explanation of each window of the Email Filtering application is also given for the user. The next chapter displays screenshots of the Email Filtering application and the various analyses performed by it, including different types of graphical information. This chapter is followed by the conclusion and the future scope of the project, describing the features that are going to be implemented in the future. The last chapter consists of a list of references which have played an important role in the completion of the project.

Page 6: Senior Year Project

VI

Table of Contents

1. INTRODUCTION

1.1. What is Email Filtering .......... 2

1.2. Motivation .......... 3

1.3. Problem Definition .......... 4

1.4. Objectives .......... 5

2. LITERATURE SURVEY

2.1. Different areas of Application .......... 7

2.2. Issues Faced .......... 8

2.3. Recent Applications .......... 9

3. ANALYSIS

3.1. C4.5 Algorithm .......... 11

3.2. Naïve Bayes Algorithm .......... 12

3.3. Formulae .......... 15

4. DESIGN

4.1. Implementation Flow .......... 17

4.2. Use Case Diagram .......... 19

4.3. Class Diagram .......... 20

4.4. Activity Diagram .......... 21

5. IMPLEMENTATION

5.1. The Connection Dialog Box .......... 23

5.2. The Email Client Window .......... 28

5.3. The Message Dialog Box .......... 38

5.4. The File Chooser .......... 39

5.5. The Downloading Dialog Box .......... 40

5.6. The Analysis Window .......... 41

6. RESULTS .......... 49

7. CONCLUSION .......... 59

8. FUTURE SCOPE .......... 61

9. REFERENCES .......... 64

Page 7: Senior Year Project

VII

LIST OF IMAGES

S. NO IMAGE PG. NO

1 A graph showing the rate of spam and its increase in the past few years 3

2 The Gmail Inbox, showing the various folders into which mails get classified 9

3 A logo of the Apache Spam Assassin 9

4 Implementation Flow 17,18

5 Use Case Diagram 19

6 Class Diagram 20

7 Activity Diagram 21

8 A screenshot of the connect dialog window. 49

9 A screenshot of the home screen which opens once the user has logged in 49

10 A Screenshot of the Main Page where all operations can be performed 50

11 A Screenshot of the message viewer tab 50

12 The Save Dialog Box Appears when store in PC has been clicked 51

13 A Screenshot of the Messaging Tab 51

14 A Screenshot of New Message box 52

15 A Screenshot of Reply Message box 52

16 A Screenshot of Forward Message Box 52

17 A screenshot of the credits page 53

18 The Message Dialog 53

19 The File Chooser 54

20 The Downloading Dialog 54

21 A Screenshot of the Statistics tab 55

22 The Annual Spam Rate Report 55

23 The Monthly Spam Rate Report 56

24 The Weekly Spam Rate Report 56

25 Comparative Spam Rate Report 57

26 User Defined Messages Quantity 57

LIST OF TABLES

S. NO TABLE PG. NO

1 The structure of the login details table 26

2 The structure of the main table where all the mails are stored 26

3 The structure of the keyword table where all the keywords are stored 27

Page 8: Senior Year Project

1

CHAPTER 1

INTRODUCTION

Page 9: Senior Year Project

2

1.1 What is Email Filtering?

Email Filtering refers to the classification of an account's emails into categories. The two basic categories are:

Spam and

Non-Spam.

The user first logs in to his account using a valid id and password. Upon logging in, the user's mails are fetched into the database and are classified into spam and non-spam. The user can also create custom labels, which are assigned using keywords provided by the user, and can browse his read or unread emails. This makes the mail service easy to use and user friendly.

A basic task in email filtering is to mine the data from an email and to classify it into the different categories using a data mining classification algorithm. Decision tree classification is a method commonly used in data mining.

Email Filtering involves spam filtering, generalized filtering and segregation, and filtering of inbound and outbound emails. Spam mails are filtered out since they are not important to most users. Generalized filtering and segregation sorts the mails into different categories, such as sent and non-spam.

Companies filter outbound emails so that sensitive data regarding the working of the company does not leak, intentionally or accidentally, by email.

To summarize, email filtering:

Segregates inbound mails into different categories.

Filters outbound mails so as not to leak sensitive information.

The different categories into which the emails are classified are:

Spam

Non-Spam

In addition, the user can define categories of his own choice by entering keywords. Each keyword is then associated with all the fetched mails that match it.

Page 10: Senior Year Project

3

1.2. Motivation for this domain

With the increase in the number of internet users, communication and the transfer of files and data over the internet have increased drastically. In such times, it is difficult to know what kinds of emails are entering your organisation or system.

Most present filtering techniques are unable to keep up with the frequently changing patterns of mail that senders adopt over time.

A graph showing the rate of spam and its increase in the past few years

In absolute numbers, the average number of spam mails sent per day increased from 2.4 billion in

2002 to 300 billion in 2010.

Google recently announced security improvements to Gmail to further protect users' emails from snooping. Gmail now always uses an encrypted HTTPS connection when you check or send email, and encrypts all messages moving internally on Google's servers.

With the growth in technology, desktop-based email applications are increasingly used. Outlook Express has changed the way the world reads and communicates with the help of Email.

Page 11: Senior Year Project

4

1.3. Problem Definition

As the Internet grows at a phenomenal rate, electronic mail (abbreviated as E-mail) has become a

widely used electronic form of communication on the Internet. Every day, a huge number of people

exchange messages in this fast and inexpensive way. With the excitement on electronic commerce

growing, the usage of E-mail will increase more dramatically. However, the advantages of E-mail also

make it overused by companies, organizations or people to promote products and spread information,

which serves their own purposes. The mailbox of a user may often be crammed with E-mail messages, some or even a large portion of which are not of interest to her/him. Searching for interesting messages every day is becoming tedious and annoying. As a consequence, a personal E-mail filter is indeed needed.

In recent years the highest degree of communication happens through e-mails which are often affected

by passive or active attacks. Effective e-mail filtering measures are the timely requirement to handle

such attacks. The basic idea behind e-mail filtering is to organize the incoming e-mails and also

employ a mail filter to prioritize messages, and to sort them into folders based on subject matter or

other criteria.

The purpose of our application is to classify the incoming mails into different categories as follows:

Spam and

Non Spam

There are also other categories that can be created and defined by the user himself, for example:

Facebook

Flipkart

Amazon

MakeMyTrip

Page 12: Senior Year Project

5

1.4. Objectives

User Interactive

Whenever the user would like to modify the settings of his application, he should be able to do so easily and without any glitches.

The user should be able to use the application as per his requirements and reap its benefits.

Security

Security is an important issue that needs to be considered before going about the actual procedure. The user should be able to use his client application safely, without any fear of security breaches or SQL injection attacks.

Spam Detection

This is the major aim of our project. We aim to classify mails according to the presence of malicious content, which may be harmful to the user's computer and is hence regarded as spam.

User Defined Mail Analysis

This is a new feature included in our project.

The user defines his own keywords, and the mails are classified on the basis of those keywords.

The user can then view all the mails concerning a keyword under one window, easily and without any glitch.
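The keyword-based classification described above can be sketched as follows. This is only a minimal illustration, not the application's actual code: the class name `KeywordLabeler`, the method `matchingKeywords` and the case-insensitive substring match over subject and content are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: labels a mail with every user-defined keyword it mentions. */
final class KeywordLabeler {

    /** Returns the keywords, in their given order, that occur in the subject or the content. */
    static List<String> matchingKeywords(String subject, String content, List<String> keywords) {
        // Assumption: matching is a case-insensitive substring search.
        String haystack = (subject + "\n" + content).toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String keyword : keywords) {
            String needle = keyword.trim().toLowerCase(Locale.ROOT);
            if (!needle.isEmpty() && haystack.contains(needle)) {
                matches.add(keyword);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("Facebook", "Flipkart", "Amazon", "MakeMyTrip");
        System.out.println(matchingKeywords(
                "Your Flipkart order has shipped", "Track your parcel here.", keywords));
    }
}
```

A mail that matches several keywords is returned with all of them, so it can appear under more than one user-defined category.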

Historical Spam Analysis

This is one of the features of our project.

All the mails received by the user can be analysed over a time period, and on the basis of that analysis, historical data about spam can be produced.

The user can easily track which periods have had the maximum spam, for example in which year he received the maximum amount of spam mail.

The user can do the same on a monthly and weekly basis.
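The yearly analysis can be illustrated with a short sketch. It assumes each stored mail carries its year and a spam flag; the names `SpamHistory`, `MailRecord`, `spamPerYear` and `peakSpamYear` are hypothetical and chosen only for this example. The same grouping would apply to months or weeks by changing the key.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the historical spam analysis: spam counts grouped by year. */
final class SpamHistory {

    /** Minimal view of one stored mail: the year it arrived and whether it was marked spam. */
    static final class MailRecord {
        final int year;
        final boolean spam;

        MailRecord(int year, boolean spam) {
            this.year = year;
            this.spam = spam;
        }
    }

    /** Counts spam mails per year; the TreeMap keeps the years in ascending order. */
    static Map<Integer, Integer> spamPerYear(Iterable<MailRecord> mails) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (MailRecord mail : mails) {
            if (mail.spam) {
                counts.merge(mail.year, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the year with the most spam, or -1 if no mail is spam. Ties go to the earliest year. */
    static int peakSpamYear(Iterable<MailRecord> mails) {
        int bestYear = -1;
        int bestCount = 0;
        for (Map.Entry<Integer, Integer> entry : spamPerYear(mails).entrySet()) {
            if (entry.getValue() > bestCount) {
                bestCount = entry.getValue();
                bestYear = entry.getKey();
            }
        }
        return bestYear;
    }
}
```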

Page 13: Senior Year Project

6

CHAPTER 2

LITERATURE SURVEY

Page 14: Senior Year Project

7

2.1. Different areas of Application

Spam Filtering

With the advent of the Internet, the number of spam mails has increased too.

A spam filter is a program that is used to detect unsolicited and unwanted email and prevent those

messages from getting to a user’s inbox. Like other types of filtering programs, a spam filter looks for

certain criteria on which it bases judgments.

Generalized Filtering and Segregation of E-mails

Email filtering is the processing of email to organize it according to specified criteria. Most often this

refers to the automatic processing of incoming messages, but the term also applies to the intervention

of human intelligence in addition to anti-spam techniques, and to outgoing emails as well as those

being received.

Filtering mails based on classes like spam, travel and social, and providing a country-based classification of official mails for ease of access to mails from specific sub-branches, would help make the mail service more efficient in terms of accessibility and user-friendliness.

Inbound and Outbound Filtering of E-mails

Mail filters can operate on inbound and outbound email traffic. Inbound email filtering involves

scanning messages from the Internet addressed to users protected by the filtering system or for lawful

interception.

Outbound email filtering involves the reverse – scanning email messages from local users before any

potentially harmful messages can be delivered to others on the Internet.

One method of outbound email filtering that is commonly used by Internet service

providers is transparent SMTP proxy, in which email traffic is intercepted and filtered via a

transparent proxy within the network.

Outbound filtering can also take place in an email server. Many corporations employ data leak

prevention technology in their outbound mail servers to prevent the leakage of sensitive information

via email.

Page 15: Senior Year Project

8

2.2. Issues Faced

Avoidance of vocabulary treated as Spam by Spammers

The subject and body content are chosen carefully by spammers. Being aware of terms, text

processing rules of a filter, etc. helps the spammers to use alternate words still serving the same

purpose yet not falling prey to the filter. This helps them to pass the filter and the mail is treated as a

non-spam mail which otherwise would have formed part of spam bulk.

The Double Opt-In problem

One of the main problems faced by spammers is to gain explicit permission to mail a particular user. A solution they have found is the Double Opt-In method.

It works in the following manner:

1. The user enters his email address into an online form.

2. The user receives a confirmation link.

3. On clicking the confirmation link, the user gives the spammer explicit permission to send him mails.

These mails, though actually spam, are then treated as normal, non-spam mails.

The Encrypted E-Mail Problem

The Encrypted E-Mail Problem is one of the most important problems faced by E-Mail client applications. Most of the transaction details sent by banks and corporate companies reach the concerned user in an encrypted format. This is done in order to ensure security.

Many mails sent by telecom and multinational companies concerning a payment or a transfer of money are also sent in an encrypted format.

The message viewed in the user's inbox is not the mail as it was received. The received mail is encrypted with an encryption key that can be retrieved using user credentials, such as the user's bank account number or password.

Thus, it is extremely difficult to classify mails in this format.

Recently, Gmail announced that it has taken a step forward in the correct classification of encrypted mails, which is soon to be implemented.

Page 16: Senior Year Project

9

2.3. Recent Applications

Gmail

Email filtering has been, and is being, continuously developed and used by various email service providers. Recently Gmail added more categories apart from spam, including travel and promotions. This has helped the users of Gmail to achieve an efficient classification of all incoming mails. The effectiveness of Gmail filters was recorded at 99.05%.

The Gmail Inbox, showing the various folders into which mails get classified

SpamAssassin

SpamAssassin is a mail filter to identify spam. It is an intelligent email filter which uses a diverse

range of tests to identify unsolicited bulk email, more commonly known as Spam. These tests are

applied to email headers and content to classify email using advanced statistical methods. In addition,

SpamAssassin has a modular architecture that allows other technologies to be quickly wielded against

spam and is designed for easy integration into virtually any email system.

A logo of the Apache SpamAssassin

Page 17: Senior Year Project

10

CHAPTER 3

ANALYSIS

Page 18: Senior Year Project

11

3.1. The C4.5 Algorithm

C4.5 is an algorithm developed by Ross Quinlan that is used to generate a decision tree. It is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason C4.5 is often referred to as a statistical classifier.

C4.5 builds decision trees from a set of training data in the same way as ID3, using the concept of information entropy. The training data is a set $S = \{s_1, s_2, \ldots\}$ of already classified samples. Each sample $s_i$ consists of a p-dimensional vector $(x_{1,i}, x_{2,i}, \ldots, x_{p,i})$, where the $x_j$ represent attributes or features of the sample, as well as the class in which $s_i$ falls.

At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (difference in entropy). The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then recurses on the smaller sublists.

This algorithm has a few base cases.

All the samples in the list belong to the same class. When this happens, it simply creates a leaf

node for the decision tree saying to choose that class.

None of the features provide any information gain. In this case, C4.5 creates a decision node

higher up the tree using the expected value of the class.

Instance of previously-unseen class encountered. Again, C4.5 creates a decision node higher up

the tree using the expected value.

Pseudo code

In pseudo code, the general algorithm for building decision trees is:

1. Check for base cases.

2. For each attribute a, find the normalized information gain ratio from splitting on a.

3. Let a_best be the attribute with the highest normalized information gain.

4. Create a decision node that splits on a_best.

5. Recurse on the sublists obtained by splitting on a_best, and add those nodes as children of the decision node.
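Step 2 of the pseudo code, the normalized information gain (gain ratio), can be sketched for a single categorical attribute and a two-class (spam / non-spam) problem as follows. This is a simplified illustration under those assumptions, not the full C4.5 algorithm: it does not handle continuous attributes, missing values or pruning, and the names `GainRatio`, `entropy` and `gainRatio` are chosen only for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the gain ratio for one categorical attribute and two classes. */
final class GainRatio {

    /** Shannon entropy, in bits, of a histogram of class counts. */
    static double entropy(int[] classCounts) {
        int total = 0;
        for (int count : classCounts) {
            total += count;
        }
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int count : classCounts) {
            if (count > 0) {
                double p = (double) count / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /**
     * Gain ratio of splitting on one attribute:
     * (entropy before the split - weighted entropy after it) / split information.
     * values[i] is the attribute value of sample i and spam[i] is its class.
     */
    static double gainRatio(String[] values, boolean[] spam) {
        int n = values.length;
        int[] parentCounts = new int[2];
        Map<String, int[]> countsPerValue = new HashMap<>();
        for (int i = 0; i < n; i++) {
            int cls = spam[i] ? 1 : 0;
            parentCounts[cls]++;
            countsPerValue.computeIfAbsent(values[i], k -> new int[2])[cls]++;
        }
        double weightedEntropy = 0.0;
        double splitInfo = 0.0;
        for (int[] counts : countsPerValue.values()) {
            double weight = (double) (counts[0] + counts[1]) / n;
            weightedEntropy += weight * entropy(counts);
            splitInfo -= weight * Math.log(weight) / Math.log(2);
        }
        double gain = entropy(parentCounts) - weightedEntropy;
        // An attribute with a single value carries no split information.
        return splitInfo == 0.0 ? 0.0 : gain / splitInfo;
    }
}
```

An attribute that separates the two classes perfectly into two equal halves has a gain ratio of 1, while one whose values are unrelated to the class scores 0.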

Page 19: Senior Year Project

12

3.2. The Naïve Bayes Algorithm

A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with

strong (naive) independence assumptions. A more descriptive term for the underlying probability

model would be "independent feature model". An overview of statistical classifiers is given in the

article on pattern recognition.

In simple terms, a naive Bayes classifier assumes that the value of a particular feature is unrelated to

the presence or absence of any other feature, given the class variable. For example, a fruit may be

considered to be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier

considers each of these features to contribute independently to the probability that this fruit is an

apple, regardless of the presence or absence of the other features.

For some types of probability models, naive Bayes classifiers can be trained very efficiently in

a supervised learning setting. In many practical applications, parameter estimation for naive Bayes

models uses the method of maximum likelihood; in other words, one can work with the naive Bayes

model without accepting Bayesian probability or using any Bayesian methods.

Despite their naive design and apparently oversimplified assumptions, naive Bayes classifiers have

worked quite well in many complex real-world situations. In 2004, an analysis of the Bayesian

classification problem showed that there are sound theoretical reasons for the apparently

implausible efficacy of naive Bayes classifiers. Still, a comprehensive comparison with other classification algorithms in 2006 showed that Bayes classification is outperformed by other approaches, such as boosted trees or random forests.

Advantages:

An advantage of naive Bayes is that it only requires a small amount of training data to estimate the

parameters (means and variances of the variables) necessary for classification. Because independent

variables are assumed, only the variances of the variables for each class need to be determined and not

the entire covariance matrix.

Probabilistic model:

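Abstractly, the probability model for a classifier is a conditional model

$$p(C \mid F_1, \dots, F_n)$$

over a dependent class variable $C$ with a small number of outcomes or classes, conditional on several feature variables $F_1$ through $F_n$. The problem is that if the number of features $n$ is large or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.

Using Bayes' theorem, this can be written

$$p(C \mid F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)}.$$

In plain English, using Bayesian probability terminology, the above equation can be written as

$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}.$$

In practice, there is interest only in the numerator of that fraction, because the denominator does not depend on $C$ and the values of the features $F_i$ are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model

$$p(C, F_1, \dots, F_n),$$

which can be rewritten as follows, using the chain rule for repeated applications of the definition of conditional probability:

$$\begin{aligned}
p(C, F_1, \dots, F_n) &= p(C)\, p(F_1, \dots, F_n \mid C) \\
&= p(C)\, p(F_1 \mid C)\, p(F_2, \dots, F_n \mid C, F_1) \\
&= p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1)\, p(F_3, \dots, F_n \mid C, F_1, F_2) \\
&= p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1) \cdots p(F_n \mid C, F_1, \dots, F_{n-1}).
\end{aligned}$$

Now the "naive" conditional independence assumptions come into play: assume that each feature $F_i$ is conditionally independent of every other feature $F_j$ for $j \neq i$, given the category $C$. This means that

$$p(F_i \mid C, F_j) = p(F_i \mid C),$$

$$p(F_i \mid C, F_j, F_k) = p(F_i \mid C), \qquad p(F_i \mid C, F_j, F_k, F_l) = p(F_i \mid C),$$

and so on, for $i \neq j, k, l$. Thus, the joint model can be expressed as

$$p(C, F_1, \dots, F_n) = p(C)\, p(F_1 \mid C)\, p(F_2 \mid C) \cdots p(F_n \mid C) = p(C) \prod_{i=1}^{n} p(F_i \mid C).$$

This means that under the above independence assumptions, the conditional distribution over the class variable $C$ is:

$$p(C \mid F_1, \dots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^{n} p(F_i \mid C),$$

where the evidence $Z = p(F_1, \dots, F_n)$ is a scaling factor dependent only on $F_1, \dots, F_n$, that is, a constant if the values of the feature variables are known.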

Constructing a classifier from the probability model:

The discussion so far has derived the independent feature model, that is, the naive Bayes probability model. The naive Bayes classifier combines this model with a decision rule. One common rule is to pick the hypothesis that is most probable; this is known as the maximum a posteriori or MAP decision rule. The corresponding classifier, a Bayes classifier, is the function $\mathrm{classify}$ defined as follows:

$$\mathrm{classify}(f_1, \dots, f_n) = \underset{c}{\operatorname{argmax}}\; p(C = c) \prod_{i=1}^{n} p(F_i = f_i \mid C = c).$$
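The MAP decision rule can be illustrated for the spam / non-spam case with the sketch below. It is not the classifier used in the application. Several simplifications are assumptions made for the example: the features are the distinct words present in a message, words absent from the message are ignored, probabilities are estimated with add-one (Laplace) smoothing, and the product is computed as a sum of logarithms to avoid numerical underflow. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Simplified naive Bayes spam classifier using the MAP decision rule. */
final class NaiveBayesSketch {
    // For each class: number of training mails that contain a given word.
    private final Map<String, Integer> spamDocs = new HashMap<>();
    private final Map<String, Integer> hamDocs = new HashMap<>();
    private int spamTotal = 0;
    private int hamTotal = 0;

    /** Splits text into its set of distinct lower-case words. */
    private static Set<String> tokens(String text) {
        Set<String> words = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    /** Adds one labelled mail to the training counts. */
    void train(String text, boolean spam) {
        Map<String, Integer> docs = spam ? spamDocs : hamDocs;
        for (String word : tokens(text)) {
            docs.merge(word, 1, Integer::sum);
        }
        if (spam) {
            spamTotal++;
        } else {
            hamTotal++;
        }
    }

    /** Computes log p(C) + sum of log p(F_i | C) over the words present, with add-one smoothing. */
    private double logScore(Set<String> words, Map<String, Integer> docs, int classTotal) {
        int allMails = spamTotal + hamTotal;
        double score = Math.log((classTotal + 1.0) / (allMails + 2.0));
        for (String word : words) {
            int count = docs.getOrDefault(word, 0);
            score += Math.log((count + 1.0) / (classTotal + 2.0));
        }
        return score;
    }

    /** MAP rule: choose the class with the larger (log) posterior score. */
    boolean isSpam(String text) {
        Set<String> words = tokens(text);
        return logScore(words, spamDocs, spamTotal) > logScore(words, hamDocs, hamTotal);
    }
}
```

The evidence term $Z$ is never computed, because it is the same for both classes and does not change which score is larger.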

Page 22: Senior Year Project

15

3.3. Formulae

F-Measure

F-measure = 2 * Precision * Recall / (Precision + Recall)

Where,

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

True Positive Rate (Sensitivity)

TPR = TP / (TP + FN)

False Positive Rate

FPR = FP / (FP + TN)

True Negative Rate (Specificity)

TNR = TN / (FP + TN)

False Negative Rate

FNR = FN / (TP + FN)
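The formulae above can be computed directly from the four counts of a confusion matrix, where TP, FP, TN and FN are the numbers of true positives, false positives, true negatives and false negatives. The sketch below is only an illustration; the class name `ConfusionMetrics` is hypothetical, and returning 0 when a denominator is zero is a convention chosen for this example.

```java
/** Evaluation measures computed from the counts of a two-class confusion matrix. */
final class ConfusionMetrics {
    final long tp;
    final long fp;
    final long tn;
    final long fn;

    ConfusionMetrics(long tp, long fp, long tn, long fn) {
        this.tp = tp;
        this.fp = fp;
        this.tn = tn;
        this.fn = fn;
    }

    // Convention for this sketch: a ratio with a zero denominator is reported as 0.
    private static double ratio(long numerator, long denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    double precision() {
        return ratio(tp, tp + fp);
    }

    /** Recall is the same quantity as the true positive rate (sensitivity). */
    double recall() {
        return ratio(tp, tp + fn);
    }

    double fMeasure() {
        double p = precision();
        double r = recall();
        return p + r == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    double falsePositiveRate() {
        return ratio(fp, fp + tn);
    }

    /** True negative rate (specificity). */
    double trueNegativeRate() {
        return ratio(tn, fp + tn);
    }

    double falseNegativeRate() {
        return ratio(fn, tp + fn);
    }
}
```

Note that TPR + FNR = 1 and FPR + TNR = 1, so only one rate from each pair needs to be reported.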

Page 23: Senior Year Project

16

CHAPTER 4

DESIGN

Page 24: Senior Year Project

17

4.1. Implementation Flow

[Figure: Implementation Flow. The text of the flowchart boxes is listed below.]

Home: Signup, Login

Signup: Fill Credentials (Username, Password, Name, Surname, Phone no). The credentials get stored in a table called login details. Creation of 2 tables in MySQL. Creation of 3 separate fields in main table: Naïve Bayes, C4.5, Keyword.

Login: Fill Credentials (Username, Password). Authenticate based on details in login details. Graphical display of the mails fetched and the unread mails.

Classification: Selection between Naïve Bayes, C4.5 and keyword based classification, with a multi-select option available to the user. On selection and submission of choices by clicking on the CLASSIFY button, mails are classified into spam and non-spam.

Message Viewer: Allows the user to select Spam, Non-spam or Keyword. Gives a view of mails with From and Subject as per the choices made. Allows for a keyword based view, where a search is made by looking at the subject as well as the content. An option to store mail to PC is made available. An option to copy e-mail/content to clipboard.

Statistics: Allows for a graphical comparison between e-mails, and for an annual, monthly or weekly statistical view of e-mail based on historical data.

Messaging: Read e-mails, Reply to e-mails, Forward e-mails

Page 26: Senior Year Project

19

4.2. Use Case Diagram

Page 27: Senior Year Project

20

4.3. Class Diagram

Page 28: Senior Year Project

21

4.4. Activity Diagram

Page 29: Senior Year Project

22

CHAPTER 5

IMPLEMENTATION

Page 30: Senior Year Project

23

5.1 The Connection Dialog Box

The connection window is the main window that collects the login credentials and other required information from the user and stores them on the server. The signup form takes the username, the password, and the name, surname, country and mobile number of the user. The user also needs to provide the server from which mails will be fetched and the server that will be used to send messages. As specified earlier, the two mail servers are the IMAP server, used for message access, and the SMTP server, used for message transport.

(See Screenshot 1)

The screenshot shows the operations performed by the connect dialog window and the prerequisites for signing up. As soon as the user signs up, two separate tables are created for that user. The first is the main user table, in which all the fetched mails are stored. The second is the keyword table, which stores all the user-defined keywords that the user has searched for.

ConnectDialog.java

package emailfiltering;

import java.awt.*;
import java.awt.event.*;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.swing.*;

public class ConnectDialog extends javax.swing.JDialog {

    Connection conn = null;
    Statement stmt = null, stmt1 = null;
    ResultSet rs = null;
    String un, ps, n, sn, co, imap, smtp, mobile;

    public ConnectDialog(Frame parent) {
        // Call super constructor, specifying that dialog is modal.
        super(parent, true);
        initComponents();
        // Open the connection to the local MySQL database named "email".
        try {
            Class.forName("com.mysql.jdbc.Driver");
            conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/email", "root", "");
            System.out.println("Connection Established Successfully");
        } catch (Exception e) {
            System.out.println(e);
        }
        // Set application title.
        setTitle("Connect");
        // Handle closing events.
        addWindowListener(new WindowAdapter() {
            public void windowClosing(WindowEvent e) {
                actionCancel();
            }
        });
    }

    // Validate the login fields and close the dialog.
    private void actionConnect() {
        if (usernameTextField.getText().trim().length() < 1
                || passwordField.getPassword().length < 1) {
            JOptionPane.showMessageDialog(this,
                    "One or more settings is missing.",
                    "Missing Setting(s)", JOptionPane.ERROR_MESSAGE);
            return;
        }
        // Close dialog.
        dispose();
    }

    // Cancel connecting and exit program.
    private void actionCancel() {
        System.exit(0);
    }

    // Get e-mail username.
    public String getUsername() {
        return usernameTextField.getText();
    }

    // Get e-mail password.
    public String getPassword() {
        return new String(passwordField.getPassword());
    }

    // The initComponents() method produced by the GUI builder is not reproduced in this listing.
    @SuppressWarnings("unchecked")
    // <editor-fold defaultstate="collapsed" desc="Generated Code">
    private void connectButtonActionPerformed(java.awt.event.ActionEvent evt) {
        actionConnect();
    }

    private void cancelButtonActionPerformed(java.awt.event.ActionEvent evt) {
        actionCancel();
    }

    // Store the signup details and create the two per-user tables.
    private void signupActionPerformed(java.awt.event.ActionEvent evt) {
        un = username.getText();
        ps = password.getText();
        n = name.getText();
        sn = surname.getText();
        co = country.getText();
        imap = servername.getText();
        smtp = smtpserver.getText();
        mobile = phoneno.getText();
        // The table name is derived from the part of the username before "@".
        int index = un.indexOf("@");
        if (index < 1) {
            JOptionPane.showMessageDialog(this,
                    "The username must be a complete e-mail address.",
                    "Invalid Username", JOptionPane.ERROR_MESSAGE);
            return;
        }
        try {
            String sql = "INSERT INTO `logindetails` (`Username`,`Password`,`Name`,`Surname`,`Country`,`Server`,`SMTPServer`,`Phoneno`) VALUES (?,?,?,?,?,?,?,?);";
            PreparedStatement pstmt = conn.prepareStatement(sql);
            pstmt.setString(1, un);
            pstmt.setString(2, ps);
            pstmt.setString(3, n);
            pstmt.setString(4, sn);
            pstmt.setString(5, co);
            pstmt.setString(6, imap);
            pstmt.setString(7, smtp);
            pstmt.setString(8, mobile);
            pstmt.executeUpdate();
        } catch (Exception e) {
            System.out.println(e);
        }
        String localPart = un.substring(0, index);
        String tablename = localPart.replace(".", "");
        try {
            // Main table: one row per fetched mail, with a column for each classifier's label.
            String sql = "CREATE TABLE IF NOT EXISTS `" + tablename + "` ( `From` text NOT NULL, `Subject` text NOT NULL, `Content` longtext NOT NULL, `Naivebayes` text NOT NULL, `C45` text NOT NULL, `Day` varchar(3) NOT NULL, `Month` varchar(3) NOT NULL, `Date` int(2) NOT NULL, `Year` int(4) NOT NULL, `Time` int(2) NOT NULL, `Keyword` text NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1;";
            stmt = conn.createStatement();
            stmt.executeUpdate(sql);
            // Keyword table: the user-defined keywords.
            String sql1 = "CREATE TABLE IF NOT EXISTS `" + tablename + "_keyword` ( `Keyword` text NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1;";
            stmt1 = conn.createStatement();
            stmt1.executeUpdate(sql1);
        } catch (Exception e) {
            System.out.println(e);
        }
    }

    // Variables declaration - do not modify
    private javax.swing.JButton cancelButton;
    private javax.swing.JButton connectButton;
    private javax.swing.JTextField country;
    private javax.swing.JLabel jLabel10;
    private javax.swing.JLabel jLabel11;
    private javax.swing.JLabel jLabel12;
    private javax.swing.JLabel jLabel13;
    private javax.swing.JLabel jLabel14;
    private javax.swing.JLabel jLabel15;
    private javax.swing.JLabel jLabel16;
    private javax.swing.JLabel jLabel2;
    private javax.swing.JLabel jLabel4;
    private javax.swing.JLabel jLabel5;
    private javax.swing.JLabel jLabel6;
    private javax.swing.JLabel jLabel7;
    private javax.swing.JLabel jLabel8;
    private javax.swing.JLabel jLabel9;
    private javax.swing.JTextField name;
    private javax.swing.JTextField password;
    private javax.swing.JPasswordField passwordField;
    private javax.swing.JTextField phoneno;
    private javax.swing.JTextField servername;
    private javax.swing.JButton signup;
    private javax.swing.JTextField smtpserver;
    private javax.swing.JTextField surname;
    private javax.swing.JTextField username;
    private javax.swing.JTextField usernameTextField;
    // End of variables declaration
}


When a user signs up for the first time, all of the user's details are stored in the ‘logindetails’ table on the server. The structure of the table and the MySQL query that creates it are shown below.

The structure of the login details table

MySQL Query

CREATE TABLE IF NOT EXISTS `logindetails` (
  `Username` varchar(30) NOT NULL,
  `Password` varchar(30) NOT NULL,
  `Name` varchar(30) NOT NULL,
  `Surname` varchar(30) NOT NULL,
  `Country` varchar(30) NOT NULL,
  `Server` varchar(30) NOT NULL,
  `SMTPServer` varchar(30) NOT NULL,
  `Phoneno` varchar(30) NOT NULL,
  `messagecount` int(11) NOT NULL,
  `classifiedcount` int(11) NOT NULL,
  PRIMARY KEY (`Username`),
  UNIQUE KEY `Phoneno` (`Phoneno`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Once the user has signed up, the following two tables are created for that user.

The structure of the main table where all the mails are stored


This table holds one row per downloaded mail: the sender, the subject, the content, the labels assigned by the two classification algorithms (Naïve Bayes and C4.5), the day, month, date, year and hour at which the mail was sent, and a keyword column that stores the keyword or keywords associated with that mail.

MySQL Query

CREATE TABLE IF NOT EXISTS `username` (
  `From` text NOT NULL,
  `Subject` text NOT NULL,
  `Content` longtext NOT NULL,
  `Naivebayes` text NOT NULL,
  `C45` text NOT NULL,
  `Day` varchar(3) NOT NULL,
  `Month` varchar(3) NOT NULL,
  `Date` int(2) NOT NULL,
  `Year` int(4) NOT NULL,
  `Time` int(2) NOT NULL,
  `Keyword` text NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The structure of the keyword table where all the keywords are stored

MySQL Query

CREATE TABLE IF NOT EXISTS `username_keyword` (
  `Keyword` text NOT NULL,
  `Count` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
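
In the sign-up code, the placeholder names `username` and `username_keyword` stand for per-user tables whose name is derived from the part of the e-mail address before the "@", with dots removed. A minimal sketch of that derivation is given below. The class and method names are hypothetical, and the extra validation (rejecting addresses without "@" and names containing anything other than letters, digits and underscores) is an assumption added here because the name is concatenated into a backtick-quoted SQL identifier.

```java
public class UserTableNames {

    /**
     * Derives the per-user mail table name from an e-mail address:
     * the text before '@' with dots removed.
     * Hypothetical helper for illustration, not part of the project code.
     */
    public static String mailTable(String emailAddress) {
        int at = emailAddress.indexOf('@');
        if (at < 1) {
            throw new IllegalArgumentException("Not an e-mail address: " + emailAddress);
        }
        String name = emailAddress.substring(0, at).replace(".", "");
        // Assumption: allow only characters that are safe inside a quoted identifier.
        if (!name.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("Unsupported table name: " + name);
        }
        return name;
    }

    /** Name of the companion keyword table for the same user. */
    public static String keywordTable(String emailAddress) {
        return mailTable(emailAddress) + "_keyword";
    }

    public static void main(String[] args) {
        System.out.println(mailTable("john.doe@example.com"));
        System.out.println(keywordTable("john.doe@example.com"));
    }
}
```

Validating the name before it reaches a CREATE TABLE statement matters because table names cannot be bound as PreparedStatement parameters.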


5.2. The Email Client Window

The Email Client window is the main window of the application and hosts all of its major functionality. It is divided into six parts, each represented by a tab at the top of the window. All operations are performed through the Email Client.

The Email Client comprises the following six tabs.

The Welcome Tab

The welcome tab is the home page, where the user can view basic information such as how many mails have been downloaded and how many are unread.

The Main Page

Here the user performs the main operations of the client application: running the Naïve Bayes and C4.5 classification algorithms and searching for user-defined keywords.

The Message Viewer

The message viewer lets the user read mails filtered by the conditions specified in this window, according to the user's preference.

The Statistics Window

The statistics window presents graphical, historical analysis of previously fetched mail data.

The Messaging Window

The user can send a message from the desktop application to another user's e-mail account.

Credits

This window contains information about the developers, along with a feedback form through which the user can send feedback on the application.


THE WELCOME TAB

(See Screenshot 2)

The screenshot shows a pie chart of how many of the user's received mails are read and how many are unread. The red portion of the pie chart represents the unread mails currently in the user's mailbox. The refresh button lets the user refresh the mailbox to retrieve mails that have not been retrieved yet. This is done by the connect method, which runs when the user clicks Connect in the connect dialog box.

The Connect Method

final ConnectDialog dialog = new ConnectDialog(this);
dialog.setVisible(true);
username = dialog.getUsername();
password = dialog.getPassword();

final DownloadingDialog downloadingDialog = new DownloadingDialog(this);
SwingUtilities.invokeLater(new Runnable() {
    public void run() {
        downloadingDialog.setVisible(true);
    }
});

// Establish JavaMail session and connect to server.
Store store = null;
try {
    // Initialize JavaMail session: IMAP over SSL for reading, SMTP for sending.
    Properties props = new Properties();
    props.setProperty("mail.store.protocol", "imaps");
    props.put("mail.smtp.host", "smtp.gmail.com");
    props.put("mail.smtp.starttls.enable", "true");
    props.put("mail.smtp.auth", "true");
    session = Session.getInstance(props, new javax.mail.Authenticator() {
        protected PasswordAuthentication getPasswordAuthentication() {
            return new PasswordAuthentication(dialog.getUsername(), dialog.getPassword());
        }
    });
    store = session.getStore("imaps");
    store.connect("imap.gmail.com", dialog.getUsername(), dialog.getPassword());
} catch (Exception e) {
    // Close the downloading dialog.
    downloadingDialog.dispose();
    // Show error dialog.
    showError("Unable to connect.", true);
}

// Download messages from server.
try {
    // Open main "INBOX" folder.
    Folder folder = store.getFolder("INBOX");
    folder.open(Folder.READ_WRITE);
    Message msg[] = folder.getMessages();

    // Find the unread messages (SEEN flag not set).
    FlagTerm ft = new FlagTerm(new Flags(Flags.Flag.SEEN), false);
    Message msg1[] = folder.search(ft);
    System.out.println("UNREAD MAILS: " + msg1.length);
    System.out.println("MAILS: " + msg.length);

    // Pie chart of read versus unread mail for the welcome tab.
    DefaultPieDataset pieDataset = new DefaultPieDataset();
    pieDataset.setValue("Unread Mail", msg1.length);
    pieDataset.setValue("Read Mail", (msg.length - msg1.length));
    JFreeChart chart = ChartFactory.createPieChart("Mail Stats", pieDataset, true, true, true);
    jPanel12.setLayout(new java.awt.BorderLayout());
    ChartPanel panelpie1 = new ChartPanel(chart);
    jPanel12.removeAll();
    jPanel12.add(panelpie1, BorderLayout.CENTER);
    jPanel12.validate();

    // Prefetch the message headers once, before the loop.
    FetchProfile profile = new FetchProfile();
    profile.add(FetchProfile.Item.ENVELOPE);
    folder.fetch(msg, profile);

    for (Message message : msg) {
        try {
            String sentdate = message.getSentDate().toString();
            String getfrom = message.getFrom()[0].toString();
            String getsubject = message.getSubject().toString();
            String content;

            // Split the date string, e.g. "Mon Apr 14 10:15:30 IST 2014", by position.
            day = sentdate.substring(0, 3);
            month = sentdate.substring(4, 7);
            date = sentdate.substring(8, 10);
            year = sentdate.substring(24, 28);
            time = sentdate.substring(11, 13);
            System.out.println("**********************************");

            // For multipart mails keep only the text/plain parts.
            if (message.getContent() instanceof Multipart) {
                StringBuffer messageContent = new StringBuffer();
                Multipart multipart = (Multipart) message.getContent();
                for (int i = 0; i < multipart.getCount(); i++) {
                    Part part = (Part) multipart.getBodyPart(i);
                    if (part.isMimeType("text/plain")) {
                        messageContent.append(part.getContent().toString());
                    }
                }
                content = messageContent.toString();
            } else {
                content = message.getContent().toString();
            }

            try {
                String sql = "INSERT INTO `username` (`From`,`Subject`,`Content`,`Naivebayes`,`C45`,"
                        + "`Day`,`Month`,`Date`,`Year`,`Time`,`Keyword`) VALUES (?,?,?,?,?,?,?,?,?,?,?);";
                PreparedStatement pstmt = conn.prepareStatement(sql);
                pstmt.setString(1, getfrom);
                pstmt.setString(2, getsubject);
                pstmt.setString(3, content);
                // "aa" is a placeholder until the mail is classified or tagged.
                pstmt.setString(4, "aa");
                pstmt.setString(5, "aa");
                pstmt.setString(6, day);
                pstmt.setString(7, month);
                pstmt.setString(8, date);
                pstmt.setString(9, year);
                pstmt.setString(10, time);
                pstmt.setString(11, "aa");
                pstmt.executeUpdate();
                pstmt.close();
            } catch (Exception e) {
                System.out.println("there is an exception");
                System.out.println(e);
            }
        } catch (Exception e) {
            System.out.println("No Information");
        }
    }
} catch (Exception e) {
    // Close the downloading dialog.
    downloadingDialog.dispose();
    // Show error dialog.
    showError("Unable to download messages.", true);
}
// Close the downloading dialog.
downloadingDialog.dispose();
}
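
The download loop above splits the output of `Date.toString()` by fixed character positions, which yields the right year only while the time-zone abbreviation is exactly three characters long. As a hedged alternative, the same five fields can be produced with `SimpleDateFormat`. The class and method names below are hypothetical and are not part of the project code.

```java
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class SentDateFields {

    /**
     * Returns {day, month, date, year, hour} for a sent date, in the same
     * textual form the project stores (for example "Mon", "Apr", "14", "2014", "10").
     */
    public static String[] split(Date sent, TimeZone zone) {
        String[] patterns = {"EEE", "MMM", "dd", "yyyy", "HH"};
        String[] fields = new String[patterns.length];
        for (int i = 0; i < patterns.length; i++) {
            // Locale.ENGLISH keeps the day and month names independent of the machine locale.
            SimpleDateFormat format = new SimpleDateFormat(patterns[i], Locale.ENGLISH);
            format.setTimeZone(zone);
            fields[i] = format.format(sent);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = split(new Date(), TimeZone.getDefault());
        System.out.println(Arrays.toString(fields));
    }
}
```

Passing the time zone explicitly makes the stored hour reproducible across machines; using the default zone, as in `main`, matches what `Date.toString()` does.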

THE MAIN PAGE

The main page is where the classification operations are performed. Two algorithms are used: Naïve Bayes and C4.5.

(See Screenshot 3)

Classification uses an imported training dataset, on which the user then performs the operations described below.

Training dataset creation

private void createTrainingSet(String dataset) throws Exception {
    // String attribute holding the message text.
    emailMessage = new Attribute("emailMessage", (FastVector) null);

    // Nominal class attribute.
    emailClass = new FastVector(3);
    emailClass.addElement("spam");
    emailClass.addElement("no spam");
    emailClass.addElement("?");
    eClass = new Attribute("emailClass", emailClass);

    // Each record consists of the class followed by the message.
    records = new FastVector(2);
    records.addElement(eClass);
    records.addElement(emailMessage);
    trainingSet = new Instances("SpamClsfyTraining", records, 40);
    trainingSet.setClassIndex(0);
    this.readTrainingDataset(dataset);

    // Save the training set as an ARFF file.
    ArffSaver saver = new ArffSaver();
    saver.setInstances(trainingSet);
    saver.setFile(new File("C:\\Akshay\\training.arff"));
    saver.writeBatch();
}

Classification Implementation

private void performClassification(Object model, String modelName) throws Exception {
    System.out.println("**==" + modelName + "==**");

    // Convert the message text into word-count attributes.
    // Filter options must be set before setInputFormat() is called.
    StringToWordVector stringToVector = new StringToWordVector(1000);
    stringToVector.setOutputWordCounts(true);
    stringToVector.setUseStoplist(false);
    stringToVector.setInputFormat(trainingSet);
    Instances filteredData = Filter.useFilter(trainingSet, stringToVector);
    Instances filteredTestData = Filter.useFilter(testingSet, stringToVector);

    // Train the supplied classifier and evaluate it on the test data.
    Classifier cModel = (Classifier) model;
    cModel.buildClassifier(filteredData);
    Evaluation eTest = new Evaluation(filteredTestData);
    eTest.evaluateModel(cModel, filteredTestData);

    // Number of correctly classified test instances.
    int x = (int) eTest.correct();
    System.out.println(x);
    if (x == 1) {
        if (nb == 1) {
            System.out.println("Naive Bayes Spam");
        }
        if (c == 1) {
            System.out.println("C4.5 Spam");
        }
    } else {
        if (nb == 1) {
            System.out.println("Naive Bayes Non Spam");
        }
        if (c == 1) {
            System.out.println("C4.5 Non Spam");
        }
    }
}
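
The project delegates classification to the library classes used above. To illustrate the idea behind the Naïve Bayes step on word counts, the following is a minimal, self-contained sketch of a multinomial Naïve Bayes classifier with add-one smoothing. It is not the library's implementation; the class name, the whitespace tokenizer and the label strings are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TinyNaiveBayes {

    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labelled training message. */
    public void train(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /** Returns the label with the highest log-posterior, or null if untrained. */
    public String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // log P(label)
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String word : tokenize(text)) {
                // log P(word | label) with add-one (Laplace) smoothing
                int count = counts.getOrDefault(word, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    // Assumption: lower-cased, whitespace-separated words stand in for real tokenization.
    private static String[] tokenize(String text) {
        String trimmed = text.toLowerCase().trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        TinyNaiveBayes nb = new TinyNaiveBayes();
        nb.train("spam", "win money now");
        nb.train("spam", "free money");
        nb.train("nonspam", "meeting at noon");
        nb.train("nonspam", "project meeting notes");
        System.out.println(nb.classify("free money now"));
        System.out.println(nb.classify("project meeting"));
    }
}
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing term keeps a word that was never seen in one class from forcing that class's probability to zero.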

A keyword-based search feature is also implemented, in which the application searches the stored mails for a user-specified keyword.

Keyword Search

private void searchActionPerformed(java.awt.event.ActionEvent evt) {
    if (keyword.getText().equals("")) {
        JOptionPane.showMessageDialog(new JFrame(), "Please Enter The Keyword",
                "Error", JOptionPane.ERROR_MESSAGE);
    } else {
        try {
            // Record the new keyword and refresh the keyword list.
            String sql1 = "INSERT INTO `username_keyword` (`Keyword`) VALUES (?);";
            PreparedStatement pstmt = conn.prepareStatement(sql1);
            pstmt.setString(1, keyword.getText());
            pstmt.executeUpdate();
            pst1 = conn.prepareStatement("SELECT * FROM `username_keyword`");
            rs1 = pst1.executeQuery();
            keywordviewer.setModel(DbUtils.resultSetToTableModel(rs1));

            // Tag every stored mail whose subject or content contains the keyword.
            pst = conn.prepareStatement("SELECT * FROM `username`");
            rs = pst.executeQuery();
            while (rs.next()) {
                String subject = rs.getString("Subject");
                String content = rs.getString("Content");
                String pastkeywordlist = rs.getString("Keyword");
                String newkeyword;
                if (pastkeywordlist.equals("")) {
                    newkeyword = keyword.getText();
                } else {
                    newkeyword = pastkeywordlist + "," + keyword.getText();
                }
                boolean matches = EmailFiltering.containtsKeyWord(subject, content, keyword.getText());
                System.out.println(matches);
                if (matches) {
                    String sql = "UPDATE `username` SET `keyword` = ? WHERE `Subject` = ? AND `Content` = ?";
                    PreparedStatement pstmt1 = conn.prepareStatement(sql);
                    pstmt1.setString(1, newkeyword);
                    pstmt1.setString(2, subject);
                    pstmt1.setString(3, content);
                    pstmt1.executeUpdate();
                }
            }
        } catch (Exception e) {
            System.out.println(e);
        }
    }
    FillCombo();
}
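
The keyword search above relies on a matching helper in the EmailFiltering class that is not listed in this report. The sketch below shows one way such a helper could work, together with a way to extend the comma-separated keyword list without storing the same keyword twice. The class name, method names and the case-insensitive matching rule are assumptions for illustration only.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class KeywordTagging {

    /** True if the keyword occurs in the subject or the content, ignoring case. */
    public static boolean containsKeyword(String subject, String content, String keyword) {
        String needle = keyword.trim().toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            return false;
        }
        return subject.toLowerCase(Locale.ROOT).contains(needle)
                || content.toLowerCase(Locale.ROOT).contains(needle);
    }

    /** Appends a keyword to a comma-separated list unless it is already present. */
    public static String appendKeyword(String existingList, String keyword) {
        // LinkedHashSet keeps the original order and drops duplicates.
        Set<String> keywords = new LinkedHashSet<>();
        for (String k : existingList.split(",")) {
            if (!k.trim().isEmpty()) {
                keywords.add(k.trim());
            }
        }
        keywords.add(keyword.trim());
        return String.join(",", keywords);
    }

    public static void main(String[] args) {
        System.out.println(containsKeyword("Invoice due", "Please pay by Friday", "invoice"));
        System.out.println(appendKeyword("offer,bank", "invoice"));
    }
}
```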

THE MESSAGE VIEWER

The message viewer lets the user view mails according to the labels assigned by the classification algorithms that the user has run. It also supports viewing mails by keyword.


Two additional buttons are provided: one saves the displayed message as a text file at a location chosen by the user, and the other copies the message contents to the clipboard.

(See Screenshot 4)

View Messages on the basis of Classification

private void update_table() {
    try {
        String cv, sb;
        cv = columnvalue.getSelectedItem().toString();
        sb = spambox.getSelectedItem().toString();
        // The column name comes from a fixed combo box; the value is bound as a parameter.
        pst = conn.prepareStatement("SELECT `From`,`Subject` FROM `username` WHERE `" + cv + "` = ?");
        pst.setString(1, sb);
        rs = pst.executeQuery();
        messageviewer.setModel(DbUtils.resultSetToTableModel(rs));
    } catch (Exception e) {
        System.out.println(e);
    }
}

View Messages on the basis of Keywords

private void keywordbuttonActionPerformed(java.awt.event.ActionEvent evt) {
    String keywordvt = keywordcombobox.getSelectedItem().toString();
    System.out.println(keywordvt);
    try {
        String sql = "SELECT * FROM `username`";
        pst = conn.prepareStatement(sql);
        rs = pst.executeQuery();
        while (rs.next()) {
            String keywordtb = rs.getString("Keyword");
            System.out.println(keywordtb);
            boolean matches = EmailFiltering.containsKeyWord(keywordtb, keywordvt);
            System.out.println(matches);
            if (matches) {
                // Show the mails that carry this keyword list.
                pst1 = conn.prepareStatement("SELECT `From`,`Subject` FROM `username` WHERE `Keyword` = ?");
                pst1.setString(1, keywordtb);
                rs1 = pst1.executeQuery();
                messageviewer.setModel(DbUtils.resultSetToTableModel(rs1));
            }
            System.out.println();
        }
    } catch (Exception e) {
        System.out.println(e);
    }
}


Store the particular text file in a specific location

(See Screenshot 5)

Store in PC

private void savepcActionPerformed(java.awt.event.ActionEvent evt) {
    System.out.println("Working");
    final FileChooser filec = new FileChooser(this, true);
    int result = FileChooser.jFileChooser2.showSaveDialog(this);
    if (result == javax.swing.JFileChooser.APPROVE_OPTION) {
        String path = FileChooser.jFileChooser2.getSelectedFile().getAbsolutePath();
        // try-with-resources closes the writer even if writing fails.
        try (PrintWriter outputStream = new PrintWriter(path)) {
            String content = EmailFiltering.jTextArea1.getText();
            outputStream.println(content);
        } catch (Exception e) {
            System.out.println(e);
        }
    } else if (result == javax.swing.JFileChooser.CANCEL_OPTION) {
        System.out.println("Cancel was selected");
    }
    FileChooser.jFileChooser2.setVisible(false);
}

Copy Text

private void copytextActionPerformed(java.awt.event.ActionEvent evt) {
    // Copy the displayed message text to the system clipboard.
    String name = jTextArea1.getText();
    StringSelection stringSelection = new StringSelection(name);
    Clipboard clipboard = Toolkit.getDefaultToolkit().getSystemClipboard();
    clipboard.setContents(stringSelection, null);
}

THE MESSAGING TAB

This tab lets the user send mails from the desktop application itself. The user can also select a message and forward it to any address, or reply to a received mail. All of these features are implemented using the Message Dialog box.

(See Screenshot 6)

Send Message

private void sendMessage(String to, String Subject, String Content) {
    // Pre-fill the message dialog and let the user edit the mail.
    MessageDialog dialog = new MessageDialog(this, true);
    dialog.totextbox.setText(to);
    dialog.subjecttextbox.setText(Subject);
    dialog.contenttextbox.setText(Content);
    dialog.setVisible(true);
    try {
        // Build the mail from the dialog fields and send it.
        Message newMessage = new MimeMessage(session);
        newMessage.setFrom(new InternetAddress(dialog.fromtextbox.getText()));
        newMessage.setRecipient(Message.RecipientType.TO,
                new InternetAddress(dialog.totextbox.getText()));
        newMessage.setSubject(dialog.subjecttextbox.getText());
        newMessage.setSentDate(new Date());
        newMessage.setText(dialog.contenttextbox.getText());
        Transport.send(newMessage);
        System.out.println("Done");
        dialog.setVisible(false);
    } catch (Exception e) {
        System.out.println(e);
        showError("Unable to send message", false);
    }
}

(See Screenshot 7)

Function:

private void actionNew() {
    // A new mail starts with empty recipient, subject and content.
    String messagesubject = "";
    String messageto = "";
    String messagecontent = "";
    sendMessage(messageto, messagesubject, messagecontent);
}

(See Screenshot 8)

Function:

private void actionReply() {
    int row = messagereader.getSelectedRow();
    String messagesubject = (messagereader.getModel().getValueAt(row, 1).toString());
    String messageto = "";
    String messagecontent = "";
    String replycontent1 = " ---------------- +\n" + " REPLY MESSAGE +\n" + " ----------------- +\n";
    String replycontent;
    String replysubject = "RE:" + messagesubject;
    // The subject is bound as a parameter so that quotes in it cannot break the query.
    String sql = "select `From`,`Content` from `ourbeproject2014` where subject = ?";
    try {
        pst = conn.prepareStatement(sql);
        pst.setString(1, messagesubject);
        rs = pst.executeQuery();
        // Reply to the first mail with this subject.
        if (rs.next()) {
            messageto = rs.getString("From");
            messagecontent = rs.getString("Content");
            replycontent = replycontent1 + messagecontent;
            sendMessage(messageto, replysubject, replycontent);
        }
    } catch (Exception e) {
        System.out.println(e);
    }
}

(See Screenshot 9)

Function:

private void actionForward() {
    int row = messagereader.getSelectedRow();
    String messagesubject = (messagereader.getModel().getValueAt(row, 1).toString());
    String messageto = "";
    String messagecontent = "";
    String forwardcontent1 = " ----------------- +\n" + " FORWARDED MESSAGE +\n" + " ----------------- +\n";
    String forwardcontent;
    // The subject is bound as a parameter so that quotes in it cannot break the query.
    String sql = "select `From`,`Content` from `ourbeproject2014` where subject = ?";
    try {
        pst = conn.prepareStatement(sql);
        pst.setString(1, messagesubject);
        rs = pst.executeQuery();
        // Forward the first mail with this subject; the recipient is filled in by the user.
        if (rs.next()) {
            messagecontent = rs.getString("Content");
            forwardcontent = forwardcontent1 + messagecontent;
            sendMessage(messageto, messagesubject, forwardcontent);
        }
    } catch (Exception e) {
        System.out.println(e);
    }
}
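
The reply and forward functions build the new subject and body with plain string concatenation. The sketch below isolates that step so that it can be checked without a mail server. The class and method names are hypothetical, and skipping the prefix when the subject already starts with "RE:" is an assumption that the listed functions do not make.

```java
public class ReplyComposer {

    private static final String RULE = "-----------------";

    /** Prefixes the subject with "RE:" unless it already starts with that prefix. */
    public static String replySubject(String subject) {
        // Assumption: avoid stacked prefixes such as "RE:RE:" on repeated replies.
        boolean alreadyReply = subject.regionMatches(true, 0, "RE:", 0, 3);
        return alreadyReply ? subject : "RE:" + subject;
    }

    /** Puts a three-line banner (rule, title, rule) above the original content. */
    public static String quote(String bannerTitle, String originalContent) {
        return RULE + "\n" + bannerTitle + "\n" + RULE + "\n" + originalContent;
    }

    public static void main(String[] args) {
        System.out.println(replySubject("Project meeting"));
        System.out.println(quote("FORWARDED MESSAGE", "See you at noon."));
    }
}
```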

CREDITS

(See Screenshot 10)

The user can send feedback about the experience of using the application.


5.3. The Message Dialog Box

The message dialog box is used to send a new mail, reply to an existing mail, or forward a mail. The send, reply and forward functions all rely on this dialog, so it plays an important role in the project.

(See Screenshot 11)

MessageDialog.java

package emailfiltering;

public class MessageDialog extends javax.swing.JDialog {

    public MessageDialog(java.awt.Frame parent, boolean modal) {
        super(parent, modal);
        initComponents();
    }

    private void totextboxActionPerformed(java.awt.event.ActionEvent evt) {
    }

    // Close the dialog when the button is clicked.
    private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {
        dispose();
    }

    public static javax.swing.JTextArea contenttextbox;
    public static javax.swing.JTextField fromtextbox;
    private javax.swing.JButton jButton1;
    private javax.swing.JLabel jLabel1;
    private javax.swing.JLabel jLabel2;
    private javax.swing.JLabel jLabel3;
    private javax.swing.JScrollPane jScrollPane1;
    public static javax.swing.JTextField subjecttextbox;
    public javax.swing.JTextField totextbox;
    // End of variables declaration
}


5.4. The File Chooser

The file chooser wraps Java's built-in JFileChooser component so that the user can browse to the location where the file is to be saved.

(See Screenshot 12)

FileChooser.java

package emailfiltering;

public class FileChooser extends javax.swing.JDialog {

    public FileChooser(java.awt.Frame parent, boolean modal) {
        super(parent, modal);
        initComponents();
    }

    private void jFileChooser2ActionPerformed(java.awt.event.ActionEvent evt) {
    }

    public static void main(String args[]) {
        java.awt.EventQueue.invokeLater(new Runnable() {
            public void run() {
                FileChooser dialog = new FileChooser(new javax.swing.JFrame(), true);
                dialog.addWindowListener(new java.awt.event.WindowAdapter() {
                    @Override
                    public void windowClosing(java.awt.event.WindowEvent e) {
                        System.exit(0);
                    }
                });
                dialog.setVisible(true);
            }
        });
    }

    public static javax.swing.JFileChooser jFileChooser2;
}


5.5. The Downloading Dialog

The downloading dialog appears whenever mails are being downloaded from the server. It opens when the Connect button is clicked in the connect dialog box and stays open until the mails have been fetched.

(See Screenshot 13)

DownloadingDialog.java

package emailfiltering;

import java.awt.*;
import javax.swing.*;

public class DownloadingDialog extends JDialog {

    public DownloadingDialog(Frame parent) {
        // Call super constructor, specifying that dialog is modal.
        super(parent, true);
        // Set dialog title.
        setTitle("E-mail Client");
        // Instruct window not to close when the "X" is clicked.
        setDefaultCloseOperation(DO_NOTHING_ON_CLOSE);
        // Put a message with a nice border in this dialog.
        JPanel contentPane = new JPanel();
        contentPane.setBorder(BorderFactory.createEmptyBorder(5, 5, 5, 5));
        contentPane.add(new JLabel("Downloading messages..."));
        setContentPane(contentPane);
        // Size dialog to components.
        pack();
        // Center dialog over application.
        setLocationRelativeTo(parent);
    }

    // <editor-fold defaultstate="collapsed" desc="Generated Code">
}


5.6. Analysis Window

THE STATISTICS WINDOW

(See Screenshot 14)

The statistics window supports historical analysis of mails, showing how much spam and non-spam has been received over the past few years.

Annual Statistics

The annual statistics cover 2007 to 2017 and show, for each year, how many mails Naïve Bayes classified as spam and how many as non-spam. The height of each stacked bar gives the total for that year.

(See Screenshot 15)

Function:

private void annuallyActionPerformed(java.awt.event.ActionEvent evt) {
    DefaultCategoryDataset datasetyearly = new DefaultCategoryDataset();

    // Spam count per year.
    int year = 2007;
    while (year <= 2017) {
        System.out.println(year);
        try {
            pst = conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` "
                    + "WHERE NAIVEBAYES='spam' AND YEAR='" + year + "'");
            rs = pst.executeQuery();
            int spamyearcount;
            String yearvalue = Integer.toString(year);
            while (rs.next()) {
                spamyearcount = rs.getInt("count");
                System.out.println(spamyearcount);
                datasetyearly.addValue(spamyearcount, "Spam", yearvalue);
            }
        } catch (Exception e) {
            System.out.println(e);
        }
        year = year + 1;
    }

    // Non-spam count per year.
    year = 2007;
    while (year <= 2017) {
        System.out.println(year);
        try {
            pst = conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` "
                    + "WHERE NAIVEBAYES='nonspam' AND YEAR='" + year + "'");
            rs = pst.executeQuery();
            int nonspamyearcount;
            String yearvalue = Integer.toString(year);
            while (rs.next()) {
                nonspamyearcount = rs.getInt("count");
                System.out.println(nonspamyearcount);
                datasetyearly.addValue(nonspamyearcount, "Non Spam", yearvalue);
            }
        } catch (Exception e) {
            System.out.println(e);
        }
        year = year + 1;
    }

    // Draw the stacked bar chart in the statistics panel.
    JFreeChart stackedChart = ChartFactory.createStackedBarChart("Annual Spam Rate Report",
            "Year", "Mail", datasetyearly, PlotOrientation.VERTICAL, true, true, false);
    CategoryPlot barchrt = stackedChart.getCategoryPlot();
    setResizable(false);
    barchrt.setRangeGridlinePaint(Color.BLACK);
    jPanel13.setLayout(new java.awt.BorderLayout());
    ChartPanel panelpie = new ChartPanel(stackedChart);
    jPanel13.removeAll();
    jPanel13.add(panelpie, BorderLayout.CENTER);
    jPanel13.validate();
}

Monthly Statistics

The yearly statistics can be broken down by month. The user selects the year to analyse, and the chart shows how many spam and non-spam mails were stored for each month of that year, from January to December.

(See Screenshot 16)

Function:

private void monthlyActionPerformed(java.awt.event.ActionEvent evt) {
    DefaultCategoryDataset datasetmonthly = new DefaultCategoryDataset();
    String my;
    String[] month = new String[] {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    my = monthyear.getSelectedItem().toString();
    System.out.println(my);

    // Spam count per month of the selected year.
    int i = 0;
    while (i < month.length) {
        System.out.println(month[i]);
        try {
            pst = conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` "
                    + "WHERE MONTH='" + month[i] + "' AND NAIVEBAYES='spam' AND YEAR='" + my + "'");
            rs = pst.executeQuery();
            int spammonthcount;
            while (rs.next()) {
                spammonthcount = rs.getInt("count");
                System.out.println(spammonthcount);
                datasetmonthly.addValue(spammonthcount, "Spam", month[i]);
            }
            rs.close();
        } catch (Exception e) {
            System.out.println(e);
        }
        i++;
    }

    // Non-spam count per month of the selected year.
    i = 0;
    while (i < month.length) {
        System.out.println(month[i]);
        try {
            pst = conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` "
                    + "WHERE MONTH='" + month[i] + "' AND NAIVEBAYES='nonspam' AND YEAR='" + my + "'");
            rs = pst.executeQuery();
            int nonspammonthcount;
            while (rs.next()) {
                nonspammonthcount = rs.getInt("count");
                System.out.println(nonspammonthcount);
                datasetmonthly.addValue(nonspammonthcount, "Non Spam", month[i]);
            }
            rs.close();
        } catch (Exception e) {
            System.out.println(e);
        }
        i++;
    }

    // Draw the stacked bar chart in the statistics panel.
    JFreeChart stackedChart = ChartFactory.createStackedBarChart("Monthly Spam Rate Report",
            "Month", "Mails", datasetmonthly, PlotOrientation.VERTICAL, true, true, false);
    CategoryPlot barchrt = stackedChart.getCategoryPlot();
    setResizable(false);
    barchrt.setRangeGridlinePaint(Color.BLACK);
    jPanel13.setLayout(new java.awt.BorderLayout());
    ChartPanel panelpie = new ChartPanel(stackedChart);
    jPanel13.removeAll();
    jPanel13.add(panelpie, BorderLayout.CENTER);
    jPanel13.validate();
}


Weekly Statistics

The monthly statistics can be broken down further by week. The user selects the month and year to analyse, and the chart shows how many spam and non-spam mails were stored in each week of that month. Each month is divided into four weeks by day of the month:

Week 1: 1-7

Week 2: 8-14

Week 3: 15-21

Week 4: 22-31
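
The week ranges above can be expressed as a small mapping from day of the month to week number, which is convenient for checking the boundary days (7, 8, 21, 22 and 31). This is a minimal sketch with hypothetical class and method names, not the project's own code.

```java
public class WeekOfMonth {

    /** Maps a day of the month (1-31) to week 1-4; days 29-31 belong to week 4. */
    public static int weekOf(int dayOfMonth) {
        if (dayOfMonth < 1 || dayOfMonth > 31) {
            throw new IllegalArgumentException("Day out of range: " + dayOfMonth);
        }
        return Math.min((dayOfMonth - 1) / 7 + 1, 4);
    }

    /**
     * Sums per-day counts into four weekly totals.
     * dailyCounts[0] is the count for day 1, dailyCounts[1] for day 2, and so on.
     */
    public static int[] weeklyTotals(int[] dailyCounts) {
        int[] totals = new int[4];
        for (int day = 1; day <= dailyCounts.length; day++) {
            totals[weekOf(day) - 1] += dailyCounts[day - 1];
        }
        return totals;
    }

    public static void main(String[] args) {
        int[] onePerDay = new int[31];
        java.util.Arrays.fill(onePerDay, 1);
        System.out.println(java.util.Arrays.toString(weeklyTotals(onePerDay)));
    }
}
```

With one mail on each of 31 days, weeks 1 to 3 hold seven mails each and week 4 holds the remaining ten.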

(See Screenshot 17)

Function:

private void weeklyActionPerformed(java.awt.event.ActionEvent evt) {
    DefaultCategoryDataset datasetweekly = new DefaultCategoryDataset();
    int w1 = 0, w2 = 0, w3 = 0, w4 = 0;
    String wm, wy;
    wm = weekmonth.getSelectedItem().toString();
    wy = weekyear.getSelectedItem().toString();
    System.out.println(wm);
    System.out.println(wy);

    // Spam: count the mails of each day and add them to the week the day falls in.
    int i = 1;
    while (i <= 31) {
        try {
            pst = conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` "
                    + "WHERE NAIVEBAYES='spam' AND DATE='" + i + "' AND MONTH='" + wm
                    + "' AND YEAR='" + wy + "'");
            rs = pst.executeQuery();
            int spamweekcount;
            while (rs.next()) {
                spamweekcount = rs.getInt("count");
                if (i >= 1 && i < 8) {
                    w1 = w1 + spamweekcount;
                }
                if (i >= 8 && i < 15) {
                    w2 = w2 + spamweekcount;
                }
                if (i >= 15 && i < 22) {
                    w3 = w3 + spamweekcount;
                }
                // Week 4 runs from day 22 up to and including day 31.
                if (i >= 22 && i <= 31) {
                    w4 = w4 + spamweekcount;
                }
            }
            rs.close();
        } catch (Exception e) {
            System.out.println(e);
        }
        i++;
    }
    datasetweekly.addValue(w1, "Spam", "Week1");
    datasetweekly.addValue(w2, "Spam", "Week2");
    datasetweekly.addValue(w3, "Spam", "Week3");
    datasetweekly.addValue(w4, "Spam", "Week4");

    // Non-spam: same procedure, starting again from day 1.
    i = 1;
    w1 = 0; w2 = 0; w3 = 0; w4 = 0;
    while (i <= 31) {
        try {
            pst = conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` "
                    + "WHERE NAIVEBAYES='nonspam' AND DATE='" + i + "' AND MONTH='" + wm
                    + "' AND YEAR='" + wy + "'");
            rs = pst.executeQuery();
            int nonspamweekcount;
            while (rs.next()) {
                nonspamweekcount = rs.getInt("count");
                if (i >= 1 && i < 8) {
                    w1 = w1 + nonspamweekcount;
                }
                if (i >= 8 && i < 15) {
                    w2 = w2 + nonspamweekcount;
                }
                if (i >= 15 && i < 22) {
                    w3 = w3 + nonspamweekcount;
                }
                if (i >= 22 && i <= 31) {
                    w4 = w4 + nonspamweekcount;
                }
            }
            rs.close();
        } catch (Exception e) {
            System.out.println(e);
        }
        i++;
    }
    datasetweekly.addValue(w1, "Non Spam", "Week1");
    datasetweekly.addValue(w2, "Non Spam", "Week2");
    datasetweekly.addValue(w3, "Non Spam", "Week3");
    datasetweekly.addValue(w4, "Non Spam", "Week4");

    // Draw the stacked bar chart in the statistics panel.
    JFreeChart stackedChart = ChartFactory.createStackedBarChart("Weekly Spam Rate Report",
            wm + "," + wy, "Messages", datasetweekly, PlotOrientation.VERTICAL, true, true, false);
    CategoryPlot barchrt = stackedChart.getCategoryPlot();
    barchrt.setRangeGridlinePaint(Color.RED);
    setResizable(false);
    jPanel13.setLayout(new java.awt.BorderLayout());
    ChartPanel panelpie = new ChartPanel(stackedChart);
    jPanel13.removeAll();
    jPanel13.add(panelpie, BorderLayout.CENTER);
    jPanel13.validate();
}

Comparative Analysis:

This method compares Naïve Bayes and C4.5 by charting how many mails each algorithm labelled as spam and as non-spam, so that the user can see how differently the two algorithms classified the same mails.


(See Screenshot 18)

Function:

private void comparativeActionPerformed(java.awt.event.ActionEvent evt) {
    DefaultCategoryDataset datasetcomparative = new DefaultCategoryDataset();

    // Count the messages each classifier marked as spam.
    // pst reads the C45 column and pst1 the NAIVEBAYES column, so that each
    // count is plotted under the matching algorithm name.
    try {
        pst = conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE C45='spam'");
        pst1 = conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE NAIVEBAYES='spam'");
        rs = pst.executeQuery();
        int c45spamcount;
        int naivebayesspamcount;
        while (rs.next()) {
            c45spamcount = rs.getInt("count");
            System.out.println(c45spamcount);
            datasetcomparative.addValue(c45spamcount, "Spam", "C45");
        }
        rs1 = pst1.executeQuery();
        while (rs1.next()) {
            naivebayesspamcount = rs1.getInt("count");
            System.out.println(naivebayesspamcount);
            datasetcomparative.addValue(naivebayesspamcount, "Spam", "Naive Bayes");
        }
    } catch (Exception e) {
        System.out.println(e);
    }

    // Count the messages each classifier marked as non-spam.
    try {
        pst = conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE C45='nonspam'");
        pst1 = conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE NAIVEBAYES='nonspam'");
        rs = pst.executeQuery();
        int c45nonspamcount;
        int naivebayesnonspamcount;
        while (rs.next()) {
            c45nonspamcount = rs.getInt("count");
            System.out.println(c45nonspamcount);
            datasetcomparative.addValue(c45nonspamcount, "Non Spam", "C45");
        }
        rs1 = pst1.executeQuery();
        while (rs1.next()) {
            naivebayesnonspamcount = rs1.getInt("count");
            System.out.println(naivebayesnonspamcount);
            datasetcomparative.addValue(naivebayesnonspamcount, "Non Spam", "Naive Bayes");
        }
    } catch (Exception e) {
        System.out.println(e);
    }

    // Build the stacked bar chart and show it in the statistics panel.
    JFreeChart stackedChart = ChartFactory.createStackedBarChart("Comparative Spam Rate Report", "Algorithm", "Spam/NonSpam", datasetcomparative, PlotOrientation.VERTICAL, true, true, false);
    CategoryPlot barchrt = stackedChart.getCategoryPlot();
    setResizable(false);
    barchrt.setRangeGridlinePaint(Color.BLACK);
    jPanel13.setLayout(new java.awt.BorderLayout());
    ChartPanel panelpie = new ChartPanel(stackedChart);
    jPanel13.removeAll();
    jPanel13.add(panelpie, BorderLayout.CENTER);
    jPanel13.validate();
}
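
The counting performed by the SQL queries above can also be illustrated without a database. The following is a minimal sketch, assuming the two classifier columns have already been read into parallel arrays of "spam" and "nonspam" labels. ComparativeTally and counts are hypothetical names, not part of the project code.

```java
// Hypothetical helper for illustration; not part of the original project.
final class ComparativeTally {
    private ComparativeTally() {}

    /**
     * Tallies the labels assigned by both classifiers to the same messages.
     * Returns {naiveBayesSpam, naiveBayesNonSpam, c45Spam, c45NonSpam}.
     * Any label other than "spam" is counted as non-spam.
     */
    static int[] counts(String[] naiveBayes, String[] c45) {
        if (naiveBayes.length != c45.length) {
            throw new IllegalArgumentException("label arrays must have the same length");
        }
        int[] result = new int[4];
        for (int row = 0; row < naiveBayes.length; row++) {
            result["spam".equals(naiveBayes[row]) ? 0 : 1]++;
            result["spam".equals(c45[row]) ? 2 : 3]++;
        }
        return result;
    }
}
```

These four numbers correspond to the four values added to the comparative dataset. They show how many messages each algorithm flags, not how many of those decisions are correct.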

User Defined

This feature compares the number of mails that match each of the keywords specified by the user.

It helps the user understand which kinds of mail are received most often.

(See Screenshot 19)

Function:

private void userdefinedActionPerformed(java.awt.event.ActionEvent evt) {
    DefaultCategoryDataset barChartData = new DefaultCategoryDataset();
    String sql = "SELECT * FROM `username_keyword`";

    // Read one bar per user-defined keyword.
    try {
        pst = conn.prepareStatement(sql);
        rs = pst.executeQuery();
        while (rs.next()) {
            barChartData.setValue(rs.getInt("Count"), "Messages", rs.getString("Keyword"));
        }
    } catch (Exception e) {
        System.out.println(e);
    }

    // The frame's rootPaneCheckingEnabled field is reused as the legend, tooltips and URLs flags.
    JFreeChart barChart = ChartFactory.createBarChart("User Preference Messages Quantity", "Keyword", "Message", barChartData, PlotOrientation.VERTICAL, rootPaneCheckingEnabled, rootPaneCheckingEnabled, rootPaneCheckingEnabled);
    CategoryPlot barchrt = barChart.getCategoryPlot();
    barchrt.setRangeGridlinePaint(Color.ORANGE);
    jPanel13.setLayout(new java.awt.BorderLayout());
    setResizable(false);
    ChartPanel panelpie = new ChartPanel(barChart);
    jPanel13.removeAll();
    jPanel13.add(panelpie, BorderLayout.CENTER);
    jPanel13.validate();
}
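
The function above only reads the stored keyword counts. How those counts are produced is not shown in this section, so the following is a sketch under the assumption that a message is counted for a keyword when its text contains that keyword, ignoring case. KeywordCounts and count are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original project.
final class KeywordCounts {
    private KeywordCounts() {}

    /**
     * Counts, for every keyword, how many messages contain it (case-insensitive).
     * The returned map keeps the keywords in the order they were given.
     */
    static Map<String, Integer> count(List<String> messages, List<String> keywords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywords) {
            String needle = keyword.toLowerCase(Locale.ROOT);
            int matches = 0;
            for (String message : messages) {
                if (message.toLowerCase(Locale.ROOT).contains(needle)) {
                    matches++;
                }
            }
            counts.put(keyword, matches);
        }
        return counts;
    }
}
```

Each entry of the resulting map corresponds to one Keyword and Count pair of the kind plotted in the bar chart.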

CHAPTER 6

RESULTS

Results

SCREENSHOTS:

Screenshot 1: A screenshot of the connect dialog window.

Screenshot 2: A screenshot of the home screen which opens once the user has logged in

Screenshot 3: A Screenshot of the Main Page where all operations can be performed

Screenshot 4: A Screenshot of the message viewer tab

Screenshot 5: The Save dialog box, which appears when "Store in PC" is clicked

Screenshot 6: A Screenshot of the Messaging Tab

Screenshot 7: A Screenshot of New Message box

Screenshot 8: A Screenshot of Reply Message box

Screenshot 9: A Screenshot of Forward Message Box

Screenshot 10: A screenshot of the credits page

Screenshot 11: The Message Dialog

Screenshot 12: The File Chooser

Screenshot 13: The Downloading Dialog

Analysis

Screenshot 14: A screenshot of the Statistics tab

Screenshot 15: The Annual Spam Rate Report

Screenshot 16: The Monthly Spam Rate Report

Screenshot 17: The Weekly Spam Rate Report

Screenshot 18: Comparative Spam Rate Report

Screenshot 19: User Defined Messages Quantity

Comparison of Parameters

Parameter              Naïve Bayes    C4.5
True Positive          19             19
False Positive         0              1
True Negative          20             19
False Negative         1              1
True Positive Rate     0.95           0.95
False Positive Rate    0              0.05
True Negative Rate     1              0.95
False Negative Rate    0.05           0.05
Precision              1              0.95
Recall                 0.95           0.95
F-Measure              0.974          0.95

Total Number of Mails Considered: 40
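
The derived rows of the table follow from the four raw counts by the standard definitions. The sketch below shows that arithmetic, with the spam class treated as the positive class. ConfusionMatrix and its method names are hypothetical and are not part of the project code.

```java
// Hypothetical helper for illustration; not part of the original project.
final class ConfusionMatrix {
    final int tp; // spam correctly flagged as spam
    final int fp; // non-spam wrongly flagged as spam
    final int tn; // non-spam correctly passed
    final int fn; // spam wrongly passed as non-spam

    ConfusionMatrix(int tp, int fp, int tn, int fn) {
        this.tp = tp;
        this.fp = fp;
        this.tn = tn;
        this.fn = fn;
    }

    // Each ratio is NaN when its denominator is zero.
    double truePositiveRate()  { return (double) tp / (tp + fn); }
    double falsePositiveRate() { return (double) fp / (fp + tn); }
    double trueNegativeRate()  { return (double) tn / (tn + fp); }
    double falseNegativeRate() { return (double) fn / (fn + tp); }
    double precision()         { return (double) tp / (tp + fp); }
    double recall()            { return truePositiveRate(); }

    /** Harmonic mean of precision and recall. */
    double fMeasure() {
        double p = precision();
        double r = recall();
        return 2 * p * r / (p + r);
    }
}
```

With the Naïve Bayes counts (19, 0, 20, 1) this gives a precision of 1 and an F-measure of about 0.974. With the C4.5 counts (19, 1, 19, 1) precision, recall and F-measure all come to 0.95, and the false negative rate is 1/20 = 0.05.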

CHAPTER 7

CONCLUSION

Conclusion

Given how central E-Mail is to an individual's life, classifying incoming messages is of utmost importance. By employing spam filtering techniques together with classification algorithms such as Naïve Bayes and C4.5, messages can be sorted into categories with little effort from the user. Hence, E-Mail filtering, classification and analysis using a data mining approach has been achieved successfully.

CHAPTER 8

FUTURE SCOPE

Future Scope

Cloud Based Email Archiving System

The concept of cloud-based email archiving is simple. Broadly put, a service provider processes, manages and stores your business data on a hosted server at a remote location, either as a substitute for or, more typically, as an enhancement to your on-premise infrastructure.

Research reveals that cloud-based email archiving services are becoming more popular over time, with prominent growth in the number of corporate users served by this archival model.

An email spam filter service on the cloud thus offers an array of significant benefits, which include:

1. A predictable cost of ownership.

2. Specialist providers manage all the key email and related functions.

3. IT staff are freed up for other initiatives.

4. A shift from the capital expenditure (CAPEX) model to the operating expenditure (OPEX) model.

5. Ease and convenience of managing the IT services.

6. A comprehensive and thorough e-discovery solution.

7. Reduced chance of virus, spam and malware attacks.

8. Inbound and outbound email filtering.

9. Agile E-mail accessibility.

Large corporations have been storing email on the cloud for many years. The future of cloud-based email archiving systems thus looks bright, with popular services ranging from email archiving to retrieval and spam filtering.

Encrypted message based E-Mail Classification

This is an application which will enable the user to fetch messages from the server and classify them after attempting to decrypt them with various encryption algorithms.

The E-Mail application will consist of various encryption/decryption and encoding algorithms such as:

1. AES.

2. DES.

3. Additive Cipher.

4. Huffman’s Algorithm.

5. RSA Algorithm.

The application will run every algorithm on the text obtained from the E-mail server and select the best result from amongst all the decrypted texts. If, however, no algorithm manages to decrypt the text, the message will be treated as non-encrypted text and further filtering according to the categories will take place.
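
The idea of trying every decryption and keeping the best result can be sketched for the additive cipher alone. The report does not specify how the best decrypted text is chosen, so the scoring rule here (counting a few common English words) is an assumption made for illustration. AdditiveCipher, score and bestKey are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; the scoring rule is an assumption, not the report's method.
final class AdditiveCipher {
    private AdditiveCipher() {}

    // Small illustrative vocabulary used to judge whether a text looks like English.
    private static final Set<String> COMMON = new HashSet<>(Arrays.asList(
            "the", "and", "you", "is", "to", "of", "for", "your", "this", "a", "in"));

    /** Shifts every ASCII letter by key positions, leaving other characters unchanged. */
    static String shift(String text, int key) {
        int k = ((key % 26) + 26) % 26; // normalise negative keys
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                out.append((char) ('a' + (c - 'a' + k) % 26));
            } else if (c >= 'A' && c <= 'Z') {
                out.append((char) ('A' + (c - 'A' + k) % 26));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    static String encrypt(String plain, int key) { return shift(plain, key); }

    static String decrypt(String cipher, int key) { return shift(cipher, -key); }

    /** Number of words in the candidate text that appear in the common-word list. */
    static int score(String candidate) {
        int hits = 0;
        for (String word : candidate.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (COMMON.contains(word)) {
                hits++;
            }
        }
        return hits;
    }

    /** Returns the key (0..25) whose decryption scores highest, or -1 if none scores at all. */
    static int bestKey(String cipherText) {
        int bestKey = -1;
        int bestScore = 0;
        for (int key = 0; key < 26; key++) {
            int s = score(decrypt(cipherText, key));
            if (s > bestScore) {
                bestScore = s;
                bestKey = key;
            }
        }
        return bestKey;
    }
}
```

A return value of -1 corresponds to the case where decryption fails, in which the message would be handled as non-encrypted text. A key of 0 means the text already reads as plain English.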

An Android Based Application for accessing Emails

An Android-based application can be created to access and classify emails. This will enable users to access their E-Mails from any location. The same server could be used for accessing and storing mails, and the application would also make the system more user friendly.

Location based Analysis of Spam Rate

Location-based analysis of spam is a feature that can be implemented in the future. We can take the location information from the user, or retrieve it from the user's email account, and use it when classifying whether a particular Email is spam or not. With location-based analysis we can find out which country has the maximum spam concentration. This can be displayed graphically in our application using Google Maps and Java maps.
