cantina antiphishing

Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing

Toc H Institute of Science & Technology

Arakkunnam – 682 313

Page No.1

INTRODUCTION




Page No.2

1. INTRODUCTION

In phishing, an automated form of social engineering, criminals use the

Internet to fraudulently extract sensitive information from businesses and individuals,

often by impersonating legitimate web sites. The potential for high rewards (e.g.,

through access to bank accounts and credit card numbers), the ease of sending

forged email messages impersonating legitimate authorities, and the difficulty law

enforcement has in pursuing the criminals has resulted in a surge of phishing attacks:

estimates suggest that phishing affected 1.2 million U.S. Citizens and cost

businesses billions of dollars in 2004 alone. Phishing also leads to additional

business losses due to consumer fear. Anecdotal evidence suggests that an

increasing number of people shy away from Internet commerce due to the threat of

identity fraud, despite the tendency of US companies to assume the risk for fraud.

Also, many users now default to distrusting any email they receive from financial

institutions Current phishing attacks are still relatively modest in sophistication and

have substantial room for improvement, as we discuss in Section 2.2. Thus, the

research community and corporations need to make a concentrated effort to combat

the increasingly severe economic consequences of phishing. Unfortunately, as we

discuss in Section 8, current anti-phishing techniques do not offer adequate

safeguards for ordinary users. We present three main contributions in this paper.

First, we propose several design principles needed to counter phishing attacks: 1)

sidestep the arms race, 2) provide mutual authentication.

Phishing attacks succeed by exploiting a user’s inability to distinguish

legitimate sites from spoofed sites. Most prior research focuses on assisting the user

in making this distinction; however, users must make the right security decision

every time. Unfortunately, humans are ill-suited for performing the security checks

necessary for secure site identification, and a single mistake may result in a total

compromise of the user’s online account. Fundamentally, users should be

authenticated using information that they cannot readily reveal to malicious parties.

Placing less reliance on the user during the authentication process will enhance

security and eliminate many forms of fraud. We propose using a trusted device to




Page No.3

perform mutual authentication that eliminates reliance on perfect user behavior,

thwarts Man-in-the-Middle attacks after setup, and protects a user’s account even in

the presence of keyloggers and most forms of spyware.

We advocate the following set of design principles for anti-phishing tools.

Many anti-phishing approaches face the same problem as anti-spam solutions:

incremental solutions only provoke an ongoing arms race between researchers and

adversaries. This typically gives the advantage to the attackers, since researchers

are permanently stuck on the defensive. Instead, we need to research fundamental

approaches for preventing phishing. Most anti-phishing techniques strive to prevent

phishing attacks by providing better authentication of the server. However, phishing

actually exploits authentication failures on both the client and the server side. Initially,

a phishing attack exploits the user’s inability to properly authenticate a server before

transmitting sensitive data.

However, a second authentication failure occurs when the server allows the

phisher to use the captured data to login as the victim. A complete anti-phishing

solution must address both of these failures: clients should have strong guarantees

that they are communicating with the intended recipient, and servers should have

similarly strong guarantees that the client requesting service has a legitimate claim to

the accounts it attempts to access. Reduce reliance on users. The majority of current

phishing countermeasures rely on users to assist in the detection of phishing sites

and make decisions as to whether to continue are in many ways unsuited to

authenticating others or themselves to others. As a result, we must move towards

protocols that reduce human involvement or introduce additional information that

cannot readily be revealed. These mechanisms add security without relying on

perfectly correct user behaviour, thus bringing security to a larger audience.

Avoid dependence on the browser’s interface. The majority of current anti-

phishing approaches propose modifications to the browser interface. Unfortunately,

the browser interface is inherently insecure and can be easily circumvented by

embedded JavaScript applications that mimic the “trusted” browser elements.




Page No.4

LITERATURE SURVEY




Page No.5

2. LITERATURE SURVEY

Recently, there has been a dramatic increase in phishing, a kind of attack in

which victims are tricked by spoofed emails and fraudulent web sites into giving up

personal information. Phishing is a rapidly growing problem, with 9,255 unique

phishing sites reported in June of 2006 alone [1]. It is unknown precisely how much

phishing costs each year since impacted industries are reluctant to release figures;

estimates range from $1 billion to 2.8 billion per year. To respond to this threat,

software vendors and companies have released a variety of anti-phishing toolbars.

For example, eBay offers a free toolbar that can positively identify eBay-owned sites,

and Google offers a free toolbar aimed at identifying any fraudulent site [2]. As of

September 2006, the free software download site download.com, listed 84 anti-

phishing toolbars. However, when we conducted an evaluation of ten anti-phishing

tools for a previous study, we found that only one tool could consistently detect more

than 60% of phishing web sites without a high rate of false positives [3]. Thus, we

argue that there is a strong need for better automated detection algorithms.

In this paper, we present the design, implementation, and evaluation of

CANTINA, 1 a novel content-based approach for detecting phishing web sites.

CANTINA examines the content of a web page to determine whether it is legitimate

or not, in contrast to other approaches that look at surface characteristics of a web

page, for example the URL and its domain name. CANTINA makes use of the well-

known TF-IDF (term frequency/inverse document frequency) algorithm used in

information retrieval [4], and more specifically, the Robust Hyperlinks algorithm

previously developed by Phelps and Wilensky for overcoming broken hyperlinks. Our

results show that CANTINA is quite good at detecting phishing sites, detecting 94-

97% of phishing sites. \We also show that we can use CANTINA in conjunction with

heuristics used by other tools to reduce false positives (incorrectly labeling legitimate

web sites as phishing), while lowering phish detection rates only slightly. We present

a summary evaluation, comparing CANTINA to two popular anti-phishing toolbars

that are representative of the most effective tools for detecting phishing sites

currently available. Our experiments show that CANTINA has comparable or better




Page No.6

performance to SpoofGuard (a heuristic-based anti-phishing tool) with far fewer false

positives, and does about as well as NetCraft (a blacklist and heuristic-based anti-

phishing toolbar). Finally, we show that CANTINA combined with heuristics is

effective at detecting phishing URLs in users' actual email, and that its most frequent

mistake is labeling spam-related URLs as phishing.

A number of studies have examined the reasons that people fall for phishing

attacks. For example, Downs et al have described the results of an interview and

role-playing study aimed at understanding why people fall for phishing emails and

what cues they look for to avoid such attacks. In a different study, Dhamija et al.

showed that a large number of people cannot differentiate between legitimate and

phishing web sites, even when they are made aware that their ability to identify

phishing attacks is being tested. Finally, Wu et al. studied three simulated anti-

phishing toolbars to determine how effective they were at preventing users from

visiting web sites the toolbars had determined to be fraudulent. They found that

many study participants ignored the toolbar security indicators and instead used the

site’s content to decide whether or not it was a scam.

Educating People about Phishing Attacks

Anti-phishing education has focused on online training materials, testing, and

situated learning. Online training materials have been by government organizations ,

non-profits and businesses. These materials explain what phishing is and provide

tips to prevent users from falling for phishing attacks. Testing is used to demonstrate

how susceptible people are to phishing attacks and educate them on how to avoid

them. For example, Mail Frontier has a web site containing screenshots of potential

phishing emails. Users are scored based on how well they can identify which emails

are legitimate and which are not. A third approach uses situated learning, where

users are sent phishing emails to test users’ vulnerability of falling for attacks. At the

end of the study, users are given materials that inform them about phishing attacks.

This approach has been used in studies conducted by Indiana University in training

students , West Point in instructing cadets and a New York State Office in educating

employees. The New York study showed an improvement in the participants’

behavior in identifying phishing over those who were given a pamphlet containing the




Page No.7

information on how to combat phishing. In previous work, we developed an email-

based approach to train people how to identify and avoid phishing attacks,

demonstrating that the existing practice of sending security notices is ineffective,

while a story-based approach using a comic strip format was surprisingly effective in

teaching people about phishing.

Anti-Phishing User Interfaces

Other research has focused on the development of better user interfaces for

anti-phishing tools. Some work looks at helping users determine if they are

interacting with a trusted site. For example, Ye et al. and Dhamija and Tygar have

developed prototype user interfaces showing “trusted paths” that help users verify

that their browser has made a secure connection to a trusted site. Herzberg and

Gbara have developed TrustBar, a browser add-on that uses logos and warnings to

help users distinguish trusted and untrusted web sites. Other work has looked at how

to facilitate logins, eliminating the need for end-users to identify whether a site is

legitimate or not. For example, PwdHash transparently converts a user's password

into a domain-specific password by sending only a one way hash of the password

and domain-name. Thus, even if a user falls for a phishing site, the phishers would

not see the correct password. The Lucent Personal Web Assistant and Password

Multiplier used similar approaches to protect people. PassPet is a browser extension

that makes it easier to login to known web sites, simply by pressing a single button.

PassPet requires people to memorize only one password, and like PwdHash,

generates a unique password for each site. Web Wallet is web browser extension

designed to prevent users from sending personal data to the fake page. Web Wallet

prevents people from typing personal information directly into a web site, instead

requiring them to type a special keystroke to log into Web Wallet and then select

their intended web site. Our work in this paper is orthogonal to this previous work, in

that our algorithms could be used in conjunction with better user interfaces to provide

a more effective solution. As Wu and Miller demonstrated, an anti-phishing toolbar

could identify all fraudulent web sites without any false positives, but if it has usability

problems, users might still fall victim to fraud.




Page No.8

Automated Detection of Phishing

Anti-phishing services are now provided by Internet service providers, built

into mail servers and clients, built into web browsers, and available as web browser

toolbar. However, these services and tools do not effectively protect against all

phishing attacks, as attackers and tool developers are engaged in a continuous arms

race. Anti-phishing tools use two major methods for detecting phishing sites. The first

is to use heuristics to judge whether a page has phishing characteristics. For

example, some heuristics used by the SpoofGuard toolbar include checking the host

name, checking the URL for common spoofing techniques, and checking against

previously seen images. The second method is to use a blacklist that lists reported

phishing URLs. For example, Cloudmark [5] relies on user ratings to maintain their

blacklist. Some toolbars, such as Netcraft [6], seem to use a combination of

heuristics plus a blacklist with URLs that are verified by paid employees. Both

methods have pros and cons. For example, heuristics can detect phishing attacks as

soon as they are launched, without the need to wait for blacklists to be updated.

However, attackers may be able to design their attacks to avoid heuristic detection.

In addition, heuristic approaches often produce false positives (incorrectly labeling a

legitimate site as phishing). Blacklists may have a higher level of accuracy, but

generally require human intervention and verification, which may consume a great

deal of resources. At a recent Anti-Phishing Working Group meeting, it was reported

that phishers are starting to use one-time URLs, which direct someone to a phishing

site the first time the URL is used, but direct people to the legitimate site afterwards.

This and other new phishing tactics significantly complicate the process of compiling

a blacklist, and can reduce blacklists’ effectiveness. Our work with CANTINA focuses

on developing and evaluating a new heuristic based on TF-IDF, a popular

information retrieval algorithm. CANTINA not only makes use of surface level

characteristics (as is done by other toolbars), but also analyzes the text-based

content of a page itself. These heuristics were drawn primarily from SpoofGuard and

from PILFER, an algorithm for detecting phishing emails [7].




Page No.9

A Content-based approach for detecting phishing websites

CANTINA makes use of TF-IDF for detecting phishing sites. TFIDF is a well-

known information retrieval algorithm that can be used for comparing and classifying

documents, as well as retrieving documents from a large corpus. In this section, we

first review how TF-IDF works. We then introduce an application of TF-IDF called

Robust Hyperlinks. Finally, we describe how we adapted Robust Hyperlinks for

detecting phishing web sites.

How TDF/IF works

TF-IDF is an algorithm often used in information retrieval and text mining. TF-

IDF yields a weight that measures how important a word is to a document in a

corpus. The importance increases proportionally to the number of times a word

appears in the document, but is offset by the frequency of the word in the corpus.

The term frequency (TF) is simply the number of times a given term appears in a

specific document. This count is usually normalized to prevent a bias towards longer

documents (which may have a higher term frequency regardless of the actual

importance of that term in the document) to give a measure of the importance of the

term within the particular document. The inverse document frequency (IDF) is a

measure of the general importance of the term. Roughly speaking, the IDF measures

how common a term is across an entire collection of documents. Thus, a term has a

high TF-IDF weight by having a high term frequency in a given document (i.e. a word

is common in a document) and a low document frequency in the whole collection of

documents (i.e. is relatively uncommon in other documents)[8].

Robust Hyperlinks

Phelps and Wilensky developed the idea of Robust Hyperlinks to overcome the

problem of broken links. The basic idea is to provide a number of alternative,

independent descriptions of networked resources, that is, URLs. Specifically, Phelps

and Wilensky proposed adding a small number of well-chosen terms, which they

called a lexical signature, to URLs. An example of such a modified signature might

be: When locating a web page, one could first try the basic URL. If the resource




Page No.10

cannot be found, one could then supply the signature terms to a search engine to

locate the document whose signature most closely matches that in the robust

hyperlink. A key issue here is how to create signatures that have appropriate

properties. First, signatures should be effective in picking out few documents.

Second, subsequent changes to a document should have minimal impact on

signature effectiveness. Third, the addition of new documents should have minimal

impact on previous signature effectiveness. Finally, the effectiveness of the signature

should be largely search-engine-independent. To meet these requirements, Phelps

and Wilensky proposed using TF-IDF to generate lexical signatures. Specifically,

they proposed calculating the TF-IDF value for each word in a document, and then

selecting the words with highest value. The rationale here is that term frequency

provides robustness (repeated words are less likely to all be deleted), while inverse

document frequency provides rarity across a set of documents, minimizing the

chance that another document will be added with the same term. Their preliminary

empirical results suggest that lexical signatures of about five terms are sufficient to

determine a web resource virtually uniquely, out of the more than one billion pages

on the web[9]. Their experiments also showed that searching on lexical signatures

often yielded a unique document, namely the desired document. In those few cases

in which more than one document is returned, the desired document is among the

highest ranked.

Adapting TF-IDF for detecting Phishing

The first is that criminals often create phishing sites by copying and then

modifying a legitimate site’s web pages so that personal information is redirected to

the criminals rather than to the legitimate site. The second observation is that

phishing sites often contain brand names and other terms that are common on a

given web page but relatively rare across the web, leading us to hypothesize that,

again, Robust Hyperlinks could be applied to find the owner of those brands[10].

Roughly, CANTINA works as follows:

Given a web page, calculate the TF-IDF scores of each term on that web

page.




Page No.11

Generate a lexical signature by taking the five terms with highest TF-IDF

weights.

Feed this lexical signature to a search engine, which in our case is Google.

If the domain name of the current web page matches the domain name of the

N top search results, we consider it to be a legitimate web site. Otherwise, we

consider it a phishing site.

It is also worth pointing out that, according to the Anti-Phishing Working Group

(APWG), the average time that a phishing site stays online is 4.5 days. Our

experiences show that sometimes it is on the order of hours. Furthermore, we argue

that phishing web pages will have a low Google Page Rank due to a lack of links

pointing to the scam. These two factors combined suggest that a phishing scam will

rarely, if ever, be highly ranked. At the end of this paper, however, we discuss some

ways of possibly subverting CANTINA. In an earlier implementation, we discovered

that TF-IDF alone yields a fair number of false positives, labeling legitimate sites as

phishing. To address this problem, we also add the current domain name to the

lexical signature. For example, if the page is at http://www.ebay.com/xxxxx, then we

add the term “eBay” to the lexical signature (even if it is already there). The rationale

here is that if a page is legitimate, the domain name itself usually can best identify

itself (e.g., ebay.com, paypal.com, bankofamerica.com).On the other hand, if the

suspected page is phishing, no matter what we add onto its content, Google will not

return it. Another design decision was what to do if Google returns zero search

results. This sometimes happens because added domain names are sometimes

meaningless (for example, “u-s-j.be”). To address this problem, if Google fails to

return any result, we now label the suspected site as phishing (initially we labeled it

as unknown). We refer to this as the “Zero results Means Phishing” heuristic (ZMP).

This heuristic has the potential to increase false positives (incorrectly labeling a

legitimate site as phishing), but our early experiments strongly suggest that when

combined with adding the domain name to the lexical signature, this approach can

reduce false positives while not impacting true positives.




Page No.12

We developed our larger set of heuristics based on related work, drawing primarily

from SpoofGuard and PILFER. We implemented each heuristic to return either -1 if it

looks like a phishing page or +1 otherwise. Heuristics include:

Age of Domain

This heuristic checks the age of the domain name. Many phishing sites have

domains that are registered only a few days before phishing emails are sent out. We

use a WHOIS search to implement this heuristic. This heuristic measures the

number of months from when the domain name was first registered. If the page has

been registered longer than 12 months, the heuristic will return +1, deeming it as

legitimate, and otherwise returns -1, deeming it as phishing. If the WHOIS server

cannot find the domain, the heuristic will simply return -1, deeming it as a phishing

page. The Netcraft and SpoofGuard toolbars use a similar heuristic based on the

time since a domain name was registered. Note that this heuristic does not account

for phishing sites based on existing web sites where criminals have broken into the

web server, nor does it account for phishing sites hosted on otherwise legitimate

domains, for example in space provided by an ISP for personal homepages.

Known Images

This heuristic checks whether a page contains inconsistent well-known logos.

For example, if a page contains eBay logos but is not on an eBay domain, then this

heuristic labels the site as a probable phishing page. Currently we store nine popular

logos locally, including eBay, PayPal, Citibank, Bank of America, Fifth Third Bank,

Barclays Bank, ANZ Bank, Chase Bank, and WellsFargo Bank. Eight of these nine

legitimate sites are included in the PhishTank.com list of Top 10 Identified Targets.

A similar heuristic is used by the SpoofGuard toolbar.

Suspicious URL

This heuristic checks if a page’s URL contains an “at” (@) or a dash (-) in the

domain name. An @ symbol in a URL causes the string to the left to be disregarded,

with the string on the right treated as the actual URL for retrieving the page.




Page No.13

Combined with the limited size of the browser address bar, this makes it possible to

write URLs that appear legitimate within the address bar, but actually cause the

browser to retrieve a different page. This heuristic is used by Mozilla FireFox.

Dashes are also rarely used by legitimate sites, so we use this as another heuristic.

SpoofGuard checks for both at symbols and dashes in URLs.

Suspicious Links

This heuristic applies the URL check above to all the links on the page. If any

link on a page fails this URL check, then the page is labeled as a possible phishing

scam. This heuristic is also used by SpoofGuard.

IP Address

This heuristic checks if a page’s domain name is an IP address. This heuristic

is also used in PILFER [16].

Dots in URL

This heuristic checks the number of dots in a page’s URL. We found that

phishing pages tend to use many dots in their URLs but legitimate sites usually do

not. Currently, this heuristic labels a page as phish if there are 5 or more dots. This

heuristic is also used in PILFER [16].

Forms

This heuristic checks if a page contains any HTML text entry forms asking for

personal data from people, such as password and credit card number. We scan the

HTML for <input> tags that accept text and are accompanied by labels such as

“credit card” and “password”. Most phishing pages contain such forms asking for

personal data, otherwise the criminals risk not getting the personal information they

want.




Page No.14

PROBLEM IDENTIFICATION AND

DEFINING OBJECTIVES OF THE

PROJECT




Page No.15

3. PROBLEM IDENTIFICATION AND DEFINING OBJECTIVES OF THE PROJECT

Objectives of proposed system :

We enumerate the goals of an anti-phishing technique, arranged in

decreasing order of protection and generality:

1. Ensure that a user’s data only goes to the intended recipient.

2. Prevent a user’s data from reaching an untrustworthy recipient.

3. Prevent an attacker from abusing a user’s data.

4. Prevent an attacker from modifying a user’s account.

5. Prevent an attacker from viewing a user’s account.

1. Ensure that a user’s data only goes to the intended recipient

One of the main objectives is authentication. Authentication means only the

authenticated users should send or receive the data. Here, in our project, it should

be ensured that the user data should be received only by the authenticated recipient.

2. Prevent a user’s data from reaching an untrustworthy recipient

Along with authentication it is equally important to ensure that the user data

does not reach an untrustworthy recipient. As all the personal data’s are highly

confidential, it is very important that the data does not go in the hands of an

untrustworthy recipient.

3. Prevent an attacker from abusing a user’s data

The attacker must be prevented from exploiting the user data. Misusing of

user data can be a cause to many problems. So, the attacker must be stopped from

abusing the user data.

4. Prevent an attacker from modifying a user’s account

The attacker must be prevented from modifying the user account. Modifying of

user account can be a cause to many problems. So, the attacker must be stopped

from modifying the user account.




Page No.16

5. Prevent an attacker from viewing a user’s account

The attacker must be prevented from viewing the user account. Most of the

confidential letters or data of user will be there in his/her account so, viewing all the

data can be a cause to many problems. So, the attacker must be stopped from

viewing the user account.




Page No.17

PROBLEM ANALYSIS

AND DESIGN




Page No.18

4. PROBLEM ANALYSIS AND DESIGN

4.1. Existing System

Phishing attacks succeed by exploiting a user’s inability to distinguish

legitimate sites from spoofed sites. Most prior research focuses on assisting the user

in making this distinction; however, users must make the right security decision

every time. Unfortunately, humans are ill-suited for performing the security checks

necessary for secure site identification, and a single mistake may result in a total

compromise of the user’s online account. Fundamentally, users should be

authenticated using information that they cannot readily reveal to malicious parties.

Placing less reliance on the user during the authentication process will enhance

security and eliminate many forms of fraud. We propose using a trusted device to

perform mutual authentication that eliminates reliance on perfect user behavior,

thwarts Man-in-the-Middle attacks after setup, and protects a user’s account even in

the presence of key loggers and most forms of spyware.

4.2. Proposed System

We advocate the following set of design principles for anti-phishing tools.

Many anti-phishing approaches face the same problem as anti-spam solutions:

incremental solutions only provoke an ongoing arms race between researchers and

adversaries. This typically gives the advantage to the attackers, since researchers

are permanently stuck on the defensive. As soon as researchers introduce an

improvement, attackers analyse it and develop a new twist on their current attacks

that allows them to evade the new defenses. Instead, we need to research

fundamental approaches for preventing phishing. . Most anti-phishing techniques

strive to prevent phishing attacks by providing better authentication of the server.

However, phishing actually exploits authentication failures on both the client and the

server side. Initially, a phishing attack exploits the user’s inability to properly

authenticate a server before transmitting sensitive data.

However, a second authentication failure occurs when the server allows the

phisher to use the captured data to login as the victim. A complete anti-phishing




Page No.19

solution must address both of these failures: clients should have strong guarantees

that they are communicating with the intended recipient, and servers should have

similarly strong guarantees that the client requesting service has a legitimate claim to

the accounts it attempts to access. Reduce reliance on users. The majority of current

phishing countermeasures rely on users to assist in the detection of phishing sites

and make decisions as to whether to continue are in many ways unsuited to

authenticating others or themselves to others. As a result, we must move towards

protocols that reduce human involvement or introduce additional information that

cannot readily be revealed. These mechanisms add security without relying on

perfectly correct user behaviour, thus bringing security to a larger audience.

Avoid dependence on the browser’s interface. The majority of current anti-

phishing approaches propose modifications to the browser interface. Unfortunately,

the browser interface is inherently insecure and can be easily circumvented by

embedded JavaScript applications that mimic the “trusted” browser elements.

Software Specification :

Front end : Java

Back end : SQL Server

Operating System : Windows7

IDE : Visual Studio

Hardware Specification :

Processor : intel i3

System Bus : 32 BIT

RAM : 3 GB

HDD : 40 GB

Display : SVGA Color




Page No.20

4.3.MODULES

Proxy Server

It is a server that acts as an intermediary for requests from clients seeking

resources from other servers. A client connects to the proxy server, requesting

some service, such as a file, connection, web page, or other resource, available from

a different server. The proxy server evaluates the request according to its filtering

rules.

Server (Web Application)

This system consists of server and client application.

4.3.1. User Management And login

This modules deals with the registration of local users and management of

users. The management includes Delete , View and Edit details of local users. The

Administrator have the previlage of Delete and View Users. The registration and

Edit details are the privileges of local users.

The system Authentication is achieved through the login process. Only the

registered user can login into the system. So that the outsiders cant acess the

system. While signing into the system the users should provide a username and

password which is already chosen during the registration. If the username and

password is not exist, the user cant login. It means that the user is not registered.

Administrator username and password are already saved in database. After logging

in the users(Administrator, User) can change their password.

Report

This module can be done by the administrator of the system. While the user is

trying to access a site(browsing a site), first the request will be accepted by the

administrator. After getting the url, the administrator will check whether the site is a

phishing site or not. If he found that the site is a hacker site, he will alert the user by

giving option for continue / discontinue from the page.

http://en.wikipedia.org/wiki/Client_(computing)




Page No.21

4.3.2. Anti-Phishing System

This is the main module in this system. By using the following techniques the

administrator can found the site is a phishing site or not.

Age of Domain

This heuristic checks the age of the domain name. Many phishing sites

have domains that are registered only a few days before phishing emails are sent

out. We use a WHOIS search to implement this heuristic. This heuristic measures

the number of months from when the domain name was first registered. If the page

has been registered longer than 12 months, the heuristic will return +1, deeming it as

legitimate, and otherwise returns -1, deeming it as phishing. If the WHOIS server

cannot find the domain, the heuristic will simply return -1, deeming it as a phishing

page.

Suspicious URL

This heuristic checks if a page’s URL contains an “at” (@) or a dash (-)

in the domain name. An @ symbol in a URL causes the string to the left to be

disregarded, with the string on the right treated as the actual URL for retrieving the

page.

Suspicious Links

This heuristic applies the URL check above to all the links on the page.

If any link on a page fails this URL check, then the page is labeled as a possible

phishing scam. This heuristic is also used by SpoofGuard.

IP Address

This heuristic checks if a page’s domain name is an IP address. This

heuristic is also used in PILFER. If any domain name as like an IP address indicate

that the site is a phishing page.

Dots in URL

This heuristic checks the number of dots in a page’s URL. We found

that phishing pages tend to use many dots in their URLs but legitimate sites usually

do not. Currently, this heuristic labels a page as phish if there are 5 or more dots.




Page No.22

4.4 SYSTEM DESIGN

Figure 4.4 System Architecture

4.5 USE CASE DIAGRAM

Figure 4.5 Use Case Diagram




Page No.23

The above use case diagram shows the connection of client and server to the

modules. The client and server has the connection to all the modules. The client and

server has separate functions for each module.

4.6. HIGH LEVEL DESIGN

Figure 4.6 High level design




Page No.24

4.7. TABLES Login :

Field

Data type Constraints

Uname

Varchar(50) Not_Null

Password Varchar(50) Not_Null

Status Int Not_Null

Usertype Varchar(50) Not_Null

Fig: 4.7.1.Login table This table is used to login to the homepage.

Register :

Field


FName


LName Varchar(50) Not_Null

ID Int Not_Null

Fig: 4.7.2.Registration table

This table is used for registration.

URL category :

Field


url


Ip adres Int Not_Null

ID Int Not_Null

Fig: 4.7.3.URL Category table This table is used to store the IP address of the system, and URL. Based on this the

sites accessed are categorized into the gray list and white list.




Page No.25

4.8. ER-DIAGRAM

Figure 4.8 ER-Diagram

The ER diagram contains three entities login, register and url. Each entity has its

own attributes.




Page No.26

4.9. DATA FLOW DIAGRAM




Page No.27




Page No.28

Figure 4.9.Data flow diagram




Page No.29

IMPLEMENTATION




Page No.30

5.SOURCE CODES 5.1.Login form <center> <div> <form action="../LoginServlet" method="post"> <table border="0"> <tr> <th colspan="2">Login</th> </tr> <% String f = request.getParameter("f"); if (f == null) { f = ""; } %> <tr> <td colspan="2"> <% if (f.equals("1")) { %> <font color="red">Login Failed.</font> <% } else if (f.equals("2")) { %> <font color="red">Blocked Account.</font> <% } %> </td> </tr> <tr> <td>UserName:</td> <td><input type="text" name="username"/></td> </tr> <tr> <td>Password:</td> <td><input type="password" name="password"/></td> </tr> <tr> <td colspan="2" style="text-align: center;"><input type="submit" value="Login"/></td> </tr>

Page No.31

</table> </form> </div> </center> </li> </ul> </div>  <div style="clear: both;"> </div> </div>  <div id="footer"> <p>Copyright (c) 2011 www.antiphishing.com.</p> </div>  </body> </html> 5.2.Registration <% String firstName = ""; String lastName = ""; String userName = ""; String Email = ""; String Mobile = ""; String Address = ""; Object firstNameObj = session.getAttribute("fname"); Object lastNameObj = session.getAttribute("lname"); Object userNameObj = session.getAttribute("uname"); Object EmailObj = session.getAttribute("email"); Object MobileObj = session.getAttribute("mobile"); Object AddressObj = session.getAttribute("address"); if (firstNameObj != null) { firstName = firstNameObj.toString(); } if (lastNameObj != null) { lastName = lastNameObj.toString(); } if (userNameObj != null) { userName = userNameObj.toString(); }




Page No.32

if (EmailObj != null) { Email = EmailObj.toString(); } if (MobileObj != null) { Mobile = MobileObj.toString(); } if (AddressObj != null) { Address = AddressObj.toString(); } %> <form action="../RegistrationActionServlet" method="post"> <table border="0"> <tr> <th colspan="2"><h3><b>Registration</b></h3></th> </tr> <% String f = request.getParameter("f"); if (f == null) { f = ""; } %> <tr> <td colspan="2"> <% if (f.equals("1")) { %> <font color="red">Enter All Fields!!!!</font> <% } else if (f.equals("2")) { %> <font color="red">Password and confirm Password should be same!!!</font> <% } else if (f.equals("3")) { %> <font color="red">Username not available!!!</font> <% } %> </td> </tr> <tr> <td>First Name</td>




Page No.33

<td><input type="text" name="firstname" style="width: 170px;" value="<%=firstName%>"/></td> </tr> <tr> <td>Last Name</td> <td><input type="text" name="lastname" style="width: 170px;" value="<%=lastName%>"/></td> </tr> <tr> <td>Login Name</td> <td><input type="text" name="loginname" style="width: 170px;" value="<%=userName%>"/></td> </tr> <tr> <td>Password</td> <td><input type="password" name="password" style="width: 170px;"/></td> </tr> <tr> <td>Confirm Password</td> <td><input type="password" name="confirmpassword" style="width: 170px;"/></td> </tr> <tr> <td>E-Mail</td> <td><input type="text" name="email" style="width: 170px;" value="<%=Email%>"/></td> </tr> <tr> <td>Mobile</td> <td><input type="text" name="mobile" style="width: 170px;" value="<%=Mobile%>"/></td> </tr> <tr> <td>Address</td> <td><textarea name="address" style="width: 170px;" value="<%=Address%>" rows="5"></textarea> </td> </tr> <tr> <td colspan="2" style="text-align: center;"><input type="submit" value="Save"/></td> </tr> </table> </form> </div> </center>

Page No.34

</div> </div> </div> <div id="sidebar"> </div>  <div style="clear: both;"> </div> </div>  <div id="footer"> <p>Copyright (c) 2011 www.antiphishing.com.</p> </div>  </body> </html> 5.3.client application public class SystemTraySupport { private String icon = "icon/icon.png"; private String applicationName; private PopupMenu popup; private TrayIcon trayIcon; public void setPopup(PopupMenu popup) { this.popup = popup; } public void setIcon(String icon) { this.icon = icon; } public SystemTraySupport(String applicationName) { this.applicationName = applicationName; popup = new PopupMenu(); initSystemTray(); } public void addMenuListener(String menuItemName, ActionListener listener) { MenuItem menuItem = new MenuItem(menuItemName); menuItem.addActionListener(listener); popup.add(menuItem); }




Page No.35

public void showInfoMessage(String message) { trayIcon.displayMessage(applicationName, message, TrayIcon.MessageType.INFO); } public void showErrorMessage(String message) { trayIcon.displayMessage(applicationName, message, TrayIcon.MessageType.ERROR); } public void showMessage(String message) { trayIcon.displayMessage(applicationName, message, TrayIcon.MessageType.NONE); } public void showWarningMessage(String message) { trayIcon.displayMessage(applicationName, message, TrayIcon.MessageType.WARNING); } private void initSystemTray() { if (SystemTray.isSupported()) { SystemTray tray = SystemTray.getSystemTray(); Image image = Toolkit.getDefaultToolkit().getImage(icon); trayIcon = new TrayIcon(image, applicationName + " Running...", popup); trayIcon.setImageAutoSize(true); try { tray.add(trayIcon); } catch (Exception e) { System.err.println("TrayIcon could not be added."); } } else { System.out.println("System Tray is not supported"); } } }




Page No.36

5.4.Server application public class NetworkProperties { private final String DIRECTORY = "setting"; private final String FILE_NAME = "network_config.xml"; //"database.properties"; private static NetworkProperties networkProperties; private String internetGatewayIP; private int internetGatewayPort; private String webServerIP; private int webServerPort; public String getInternetGatewayIP() { return internetGatewayIP; } public void setInternetGatewayIP(String internetGatewayIP) { this.internetGatewayIP = internetGatewayIP; } public int getInternetGatewayPort() { return internetGatewayPort; } public void setInternetGatewayPort(int internetGatewayPort) { this.internetGatewayPort = internetGatewayPort; } public String getWebServerIP() { return webServerIP; } public void setWebServerIP(String webServerIP) { this.webServerIP = webServerIP; } public int getWebServerPort() { return webServerPort; } public void setWebServerPort(int webServerPort) { this.webServerPort = webServerPort; } public NetworkProperties() { if (!loadPropertiesFormXMLFile()) { createBlankPropertyXMLFile();




Page No.37

} } public static NetworkProperties getInstance() { if (networkProperties == null) { networkProperties = new NetworkProperties(); } return networkProperties; } public void save() { Properties properties = new Properties(); properties.setProperty("InternetGatewayIP", internetGatewayIP); properties.setProperty("InternetGatewayPort", Integer.toString(internetGatewayPort)); properties.setProperty("WebServerIP", webServerIP); properties.setProperty("WebServerPort", Integer.toString(webServerPort)); File file = new File(DIRECTORY); if (!file.exists()) { file.mkdirs(); } file = new File(file, FILE_NAME); FileOutputStream out = null; try { out = new FileOutputStream(file); properties.storeToXML(out, null); } catch (IOException e) { e.printStackTrace(); } finally { try { out.close(); } catch (Exception e) { } } } private boolean loadPropertiesFormXMLFile() { FileInputStream in = null; try { File file = new File(DIRECTORY, FILE_NAME); in = new FileInputStream(file); Properties properties = new Properties(); properties.loadFromXML(in); internetGatewayIP = properties.getProperty("InternetGatewayIP").trim();




Page No.38

internetGatewayPort = Integer.parseInt(properties.getProperty("InternetGatewayPort").trim()); webServerPort = Integer.parseInt(properties.getProperty("WebServerPort").trim()); webServerIP = properties.getProperty("WebServerIP").trim(); return true; } catch (IOException e) { e.printStackTrace(); return false; } finally { try { in.close(); } catch (Exception e) { } } } private void createBlankPropertyXMLFile() { internetGatewayIP = "0.0.0.0"; internetGatewayPort = 0; webServerPort = 0; webServerIP = "0.0.0.0"; save(); } public static void main(String[] args) { NetworkProperties pro = NetworkProperties.getInstance(); System.out.println(pro.getInternetGatewayPort()); System.out.println(pro.getWebServerPort()); System.out.println(pro.getInternetGatewayIP()); System.out.println(pro.getWebServerIP()); pro.save(); } }




Page No.39

TESTING




Page No.40

6. TESTING

Testing performs a critical role for quality assurance and for the reliability of

the software. Testing forms is the first step in determining the errors in the program.

Different levels of testing are used in the testing process, each level of testing aims

to test different aspects of the software. The basic levels used are: unit testing and

integration testing.

There are several strategies that are used in the system.

Unit testing

Integration testing

6.1 UNIT TESTING

Unit testing is the first level of testing. It is a software verification and

validation method in which a programmer tests if individual units of source code are

fit to use. The unit testing is performed by the programmer prior to integration of

individual units or modules into a larger system. The testing is carried out in the

coding stage itself and each module is found to be working satisfactorily as regards

to the expected output from the module.

6.2 INTEGRATION TESTING

In this testing we test our project as a whole, combining the whole modules.

Integration testing sis a systematic technique for constructing the program structure

while conducting tests to uncover errors associated with the interfacing between

modules. The individual units are combined with other units to make sure that

necessary communication, links and data sharing occur properly.




Page No.41

6.1.1 TEST CASE – ADMIN LOGIN

Tc

No

UNIT INPUT EXPECTED

OUTPUT

OUTPUT

OBTAINED

STAT

US

REASON REME

DY

1. Login

form

Username

and

password

of the

administra

tor.

Checks the

database

and

redirects to

the admin

home. Else

shows error

message

box

Error

message

Failure Incorrect

usernam

e or

password

Enter

the

correct

userna

me

and

passw

ord

2. Login

form

Username

and

password

Checks the

database

and

redirects to

the admin

home. Else

shows error

message

box

Redirected

to the

admin

home

Succe

ss

Table: 6.1.1: Admin Login




Page No.42

6.1.2 TEST CASE – CANTINA WEBSIDE

Tc

No

UNIT INPUT EXPECTED

OUTPUT

OUTPUT

OBTAINED

STAT

US

REASON REME

DY

1. Login

form

No input

required.

Redirects to

the blind

home page

within ten

seconds.

Else

remains in

the login

form.

No

redirection.

Failure Code

error.

Verify

and

correct

the

code.

2. Login

form

No input

required.

Redirects to

the blind

home page

within ten

seconds.

Else

remains in

the login

form.

Redirected

to the blind

home

page.

Succe

ss

Table: 6.1.2: Cantina Web side




Page No.43

6.2 INTEGRATION TESTING

6.2.1. TEST CASE – CLIENT APPLICATION

Tc

No

UNIT INPUT EXPECTE

D

OUTPUT

OUTPU

T

OBTAIN

ED

STATUS REASON REME

DY

1. Downl

oad

access

Enter the

username

and

password.

The

contents

pertaining

to the

option are

made

available

to the

user.

No

further

navigati

on.

Failure The user

has been

blocked

by

administra

tor.

Unbloc

k the

user.

2. File

Setup

Click the

link to

download

the

software.

The

software

should be

download

ed

successful

ly.

Jar file

could

not be

downloa

ded.

Failure Setup fail

error.

Proper

setup of

jar file.

Table: 6.2.1: Client Application




Page No.44

7. FUTURE SCOPE

In the coming months, the Anti-Phishing Working Group will be working with

technology partners, government regulators, and leaders in industries being

victimized by phishing attacks to sculpt the most efficient and elegant digital signing

and authentication solution that can be employed by large user communities. We are

sending calls to action to all stakeholders in the hopes that by year end, a consensus

will be formed among them as to how the effected industries can establish a digital

signing and authentication regime that can be deployed post haste to end phishing

attacks, preclude regulatory adventurism and, ultimately, to establish a technological

foundation for a broader spam-proof electronic mailing infrastructure.




Page No.45

CONCLUSIONS




Page No.46

8. CONCLUSIONS CANTINA, a novel content-based approach for detecting phishing web sites.

CANTINA takes Robust Hyperlinks, an idea for overcoming page not found problems

using the well-known Term Frequency / Inverse Document Frequency (TF-IDF)

algorithm, and applies it to anti-phishing. We described our implementation of

CANTINA, and discussed some simple heuristics that can be applied to reduce false

positives. We also presented an evaluation of CANTINA, showing that the pure TF-

IDF approach can catch about 97% phishing sites with about 6% false positives, and

after combining some simple heuristics we are able to catch about 90% of phishing

sites with only 1% false positives.




Page No.47

SCREENSHOTS




Page No.48

Login Form




Page No.49

Admin Home Page




Page No.50

User’s List




Page No.51

Whitelist




Page No.52

Blacklist




Page No.53

REFERENCE

[1] 3Sharp, 3Sharp Study finds Internet Explorer 7 Edges Out Netcraft As Most

Accurate for Anti-Phishing Protection. 6.http://www.3sharp.com/projects/antiphishing/

[2] Anti-Phishing Working Group, Phishing Activity Trends Report. 2006.

Zttp://www.antiphishing.org/reports/ apwg_report_june_06.pdf

[3] Anti-Phishing Working Group (APWG). Visited: Nov 20,

2006.http://www.antiphishing.org/

[4] Chou, N., R. Ledesma, Y. Teraguchi, D. Boneh, and J.C. Mitchell. Client-Side

Defense against Web-Based Identity Theft. In Proceedings of The 11th Annual

Network and Distributed System Security Symposium (NDSS '04).

http://crypto.stanford.edu/SpoofGuard/webspoof.pdf

[5] Cloudmark Inc. Visited: Nov 20, 06.http://www.cloudmark.com/desktop/download/

[6] Robert T. Morris, A Weakness in the 4.2BSD UNIX TCP/IP Software. Computing

Science Technical Report 117, AT&T Bell Laboratories, February 1985.

[7] Rod Rasmussen, Phishing Prevention: Making Yourself a Hard Target. Internet

Identity / APWG (April 5, 2004).

[8] Blake Ross, Nick Miyake, Robert Ledesma, Dan Boneh and John C. Mitchell, A

Simple Solution to the Unique Password Problem.

[9] Z. Ye. Building Trusted Paths for Web Browsers. Master’s Thesis. Department of

Computer Science, Dartmouth College. May 2002

[10] Zishuang (Eileen) Ye and Sean W. Smith, Trusted Paths for Browsers, 11th

Usenix Security Symposium, August 2002.

http://crypto.stanford.edu/SpoofGuard/webspoof.pdf

cantina antiphishing

Documents

surge of phishing attacks

antiphishing approaches

antiphishing tools

users account

users inability

ordinary users

users online account

project title