cantina antiphishing
DESCRIPTION
prevents phishing sitesTRANSCRIPT
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.1
INTRODUCTION
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.2
1. INTRODUCTION
In phishing, an automated form of social engineering, criminals use the
Internet to fraudulently extract sensitive information from businesses and individuals,
often by impersonating legitimate web sites. The potential for high rewards (e.g.,
through access to bank accounts and credit card numbers), the ease of sending
forged email messages impersonating legitimate authorities, and the difficulty law
enforcement has in pursuing the criminals has resulted in a surge of phishing attacks:
estimates suggest that phishing affected 1.2 million U.S. Citizens and cost
businesses billions of dollars in 2004 alone. Phishing also leads to additional
business losses due to consumer fear. Anecdotal evidence suggests that an
increasing number of people shy away from Internet commerce due to the threat of
identity fraud, despite the tendency of US companies to assume the risk for fraud.
Also, many users now default to distrusting any email they receive from financial
institutions Current phishing attacks are still relatively modest in sophistication and
have substantial room for improvement, as we discuss in Section 2.2. Thus, the
research community and corporations need to make a concentrated effort to combat
the increasingly severe economic consequences of phishing. Unfortunately, as we
discuss in Section 8, current anti-phishing techniques do not offer adequate
safeguards for ordinary users. We present three main contributions in this paper.
First, we propose several design principles needed to counter phishing attacks: 1)
sidestep the arms race, 2) provide mutual authentication.
Phishing attacks succeed by exploiting a user’s inability to distinguish
legitimate sites from spoofed sites. Most prior research focuses on assisting the user
in making this distinction; however, users must make the right security decision
every time. Unfortunately, humans are ill-suited for performing the security checks
necessary for secure site identification, and a single mistake may result in a total
compromise of the user’s online account. Fundamentally, users should be
authenticated using information that they cannot readily reveal to malicious parties.
Placing less reliance on the user during the authentication process will enhance
security and eliminate many forms of fraud. We propose using a trusted device to
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.3
perform mutual authentication that eliminates reliance on perfect user behavior,
thwarts Man-in-the-Middle attacks after setup, and protects a user’s account even in
the presence of keyloggers and most forms of spyware.
We advocate the following set of design principles for anti-phishing tools.
Many anti-phishing approaches face the same problem as anti-spam solutions:
incremental solutions only provoke an ongoing arms race between researchers and
adversaries. This typically gives the advantage to the attackers, since researchers
are permanently stuck on the defensive. Instead, we need to research fundamental
approaches for preventing phishing. Most anti-phishing techniques strive to prevent
phishing attacks by providing better authentication of the server. However, phishing
actually exploits authentication failures on both the client and the server side. Initially,
a phishing attack exploits the user’s inability to properly authenticate a server before
transmitting sensitive data.
However, a second authentication failure occurs when the server allows the
phisher to use the captured data to login as the victim. A complete anti-phishing
solution must address both of these failures: clients should have strong guarantees
that they are communicating with the intended recipient, and servers should have
similarly strong guarantees that the client requesting service has a legitimate claim to
the accounts it attempts to access. Reduce reliance on users. The majority of current
phishing countermeasures rely on users to assist in the detection of phishing sites
and make decisions as to whether to continue are in many ways unsuited to
authenticating others or themselves to others. As a result, we must move towards
protocols that reduce human involvement or introduce additional information that
cannot readily be revealed. These mechanisms add security without relying on
perfectly correct user behaviour, thus bringing security to a larger audience.
Avoid dependence on the browser’s interface. The majority of current anti-
phishing approaches propose modifications to the browser interface. Unfortunately,
the browser interface is inherently insecure and can be easily circumvented by
embedded JavaScript applications that mimic the “trusted” browser elements.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.4
LITERATURE SURVEY
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.5
2. LITERATURE SURVEY
Recently, there has been a dramatic increase in phishing, a kind of attack in
which victims are tricked by spoofed emails and fraudulent web sites into giving up
personal information. Phishing is a rapidly growing problem, with 9,255 unique
phishing sites reported in June of 2006 alone [1]. It is unknown precisely how much
phishing costs each year since impacted industries are reluctant to release figures;
estimates range from $1 billion to 2.8 billion per year. To respond to this threat,
software vendors and companies have released a variety of anti-phishing toolbars.
For example, eBay offers a free toolbar that can positively identify eBay-owned sites,
and Google offers a free toolbar aimed at identifying any fraudulent site [2]. As of
September 2006, the free software download site download.com, listed 84 anti-
phishing toolbars. However, when we conducted an evaluation of ten anti-phishing
tools for a previous study, we found that only one tool could consistently detect more
than 60% of phishing web sites without a high rate of false positives [3]. Thus, we
argue that there is a strong need for better automated detection algorithms.
In this paper, we present the design, implementation, and evaluation of
CANTINA, 1 a novel content-based approach for detecting phishing web sites.
CANTINA examines the content of a web page to determine whether it is legitimate
or not, in contrast to other approaches that look at surface characteristics of a web
page, for example the URL and its domain name. CANTINA makes use of the well-
known TF-IDF (term frequency/inverse document frequency) algorithm used in
information retrieval [4], and more specifically, the Robust Hyperlinks algorithm
previously developed by Phelps and Wilensky for overcoming broken hyperlinks. Our
results show that CANTINA is quite good at detecting phishing sites, detecting 94-
97% of phishing sites. \We also show that we can use CANTINA in conjunction with
heuristics used by other tools to reduce false positives (incorrectly labeling legitimate
web sites as phishing), while lowering phish detection rates only slightly. We present
a summary evaluation, comparing CANTINA to two popular anti-phishing toolbars
that are representative of the most effective tools for detecting phishing sites
currently available. Our experiments show that CANTINA has comparable or better
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.6
performance to SpoofGuard (a heuristic-based anti-phishing tool) with far fewer false
positives, and does about as well as NetCraft (a blacklist and heuristic-based anti-
phishing toolbar). Finally, we show that CANTINA combined with heuristics is
effective at detecting phishing URLs in users' actual email, and that its most frequent
mistake is labeling spam-related URLs as phishing.
A number of studies have examined the reasons that people fall for phishing
attacks. For example, Downs et al have described the results of an interview and
role-playing study aimed at understanding why people fall for phishing emails and
what cues they look for to avoid such attacks. In a different study, Dhamija et al.
showed that a large number of people cannot differentiate between legitimate and
phishing web sites, even when they are made aware that their ability to identify
phishing attacks is being tested. Finally, Wu et al. studied three simulated anti-
phishing toolbars to determine how effective they were at preventing users from
visiting web sites the toolbars had determined to be fraudulent. They found that
many study participants ignored the toolbar security indicators and instead used the
site’s content to decide whether or not it was a scam.
Educating People about Phishing Attacks
Anti-phishing education has focused on online training materials, testing, and
situated learning. Online training materials have been by government organizations ,
non-profits and businesses. These materials explain what phishing is and provide
tips to prevent users from falling for phishing attacks. Testing is used to demonstrate
how susceptible people are to phishing attacks and educate them on how to avoid
them. For example, Mail Frontier has a web site containing screenshots of potential
phishing emails. Users are scored based on how well they can identify which emails
are legitimate and which are not. A third approach uses situated learning, where
users are sent phishing emails to test users’ vulnerability of falling for attacks. At the
end of the study, users are given materials that inform them about phishing attacks.
This approach has been used in studies conducted by Indiana University in training
students , West Point in instructing cadets and a New York State Office in educating
employees. The New York study showed an improvement in the participants’
behavior in identifying phishing over those who were given a pamphlet containing the
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.7
information on how to combat phishing. In previous work, we developed an email-
based approach to train people how to identify and avoid phishing attacks,
demonstrating that the existing practice of sending security notices is ineffective,
while a story-based approach using a comic strip format was surprisingly effective in
teaching people about phishing.
Anti-Phishing User Interfaces
Other research has focused on the development of better user interfaces for
anti-phishing tools. Some work looks at helping users determine if they are
interacting with a trusted site. For example, Ye et al. and Dhamija and Tygar have
developed prototype user interfaces showing “trusted paths” that help users verify
that their browser has made a secure connection to a trusted site. Herzberg and
Gbara have developed TrustBar, a browser add-on that uses logos and warnings to
help users distinguish trusted and untrusted web sites. Other work has looked at how
to facilitate logins, eliminating the need for end-users to identify whether a site is
legitimate or not. For example, PwdHash transparently converts a user's password
into a domain-specific password by sending only a one way hash of the password
and domain-name. Thus, even if a user falls for a phishing site, the phishers would
not see the correct password. The Lucent Personal Web Assistant and Password
Multiplier used similar approaches to protect people. PassPet is a browser extension
that makes it easier to login to known web sites, simply by pressing a single button.
PassPet requires people to memorize only one password, and like PwdHash,
generates a unique password for each site. Web Wallet is web browser extension
designed to prevent users from sending personal data to the fake page. Web Wallet
prevents people from typing personal information directly into a web site, instead
requiring them to type a special keystroke to log into Web Wallet and then select
their intended web site. Our work in this paper is orthogonal to this previous work, in
that our algorithms could be used in conjunction with better user interfaces to provide
a more effective solution. As Wu and Miller demonstrated, an anti-phishing toolbar
could identify all fraudulent web sites without any false positives, but if it has usability
problems, users might still fall victim to fraud.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.8
Automated Detection of Phishing
Anti-phishing services are now provided by Internet service providers, built
into mail servers and clients, built into web browsers, and available as web browser
toolbar. However, these services and tools do not effectively protect against all
phishing attacks, as attackers and tool developers are engaged in a continuous arms
race. Anti-phishing tools use two major methods for detecting phishing sites. The first
is to use heuristics to judge whether a page has phishing characteristics. For
example, some heuristics used by the SpoofGuard toolbar include checking the host
name, checking the URL for common spoofing techniques, and checking against
previously seen images. The second method is to use a blacklist that lists reported
phishing URLs. For example, Cloudmark [5] relies on user ratings to maintain their
blacklist. Some toolbars, such as Netcraft [6], seem to use a combination of
heuristics plus a blacklist with URLs that are verified by paid employees. Both
methods have pros and cons. For example, heuristics can detect phishing attacks as
soon as they are launched, without the need to wait for blacklists to be updated.
However, attackers may be able to design their attacks to avoid heuristic detection.
In addition, heuristic approaches often produce false positives (incorrectly labeling a
legitimate site as phishing). Blacklists may have a higher level of accuracy, but
generally require human intervention and verification, which may consume a great
deal of resources. At a recent Anti-Phishing Working Group meeting, it was reported
that phishers are starting to use one-time URLs, which direct someone to a phishing
site the first time the URL is used, but direct people to the legitimate site afterwards.
This and other new phishing tactics significantly complicate the process of compiling
a blacklist, and can reduce blacklists’ effectiveness. Our work with CANTINA focuses
on developing and evaluating a new heuristic based on TF-IDF, a popular
information retrieval algorithm. CANTINA not only makes use of surface level
characteristics (as is done by other toolbars), but also analyzes the text-based
content of a page itself. These heuristics were drawn primarily from SpoofGuard and
from PILFER, an algorithm for detecting phishing emails [7].
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.9
A Content-based approach for detecting phishing websites
CANTINA makes use of TF-IDF for detecting phishing sites. TFIDF is a well-
known information retrieval algorithm that can be used for comparing and classifying
documents, as well as retrieving documents from a large corpus. In this section, we
first review how TF-IDF works. We then introduce an application of TF-IDF called
Robust Hyperlinks. Finally, we describe how we adapted Robust Hyperlinks for
detecting phishing web sites.
How TDF/IF works
TF-IDF is an algorithm often used in information retrieval and text mining. TF-
IDF yields a weight that measures how important a word is to a document in a
corpus. The importance increases proportionally to the number of times a word
appears in the document, but is offset by the frequency of the word in the corpus.
The term frequency (TF) is simply the number of times a given term appears in a
specific document. This count is usually normalized to prevent a bias towards longer
documents (which may have a higher term frequency regardless of the actual
importance of that term in the document) to give a measure of the importance of the
term within the particular document. The inverse document frequency (IDF) is a
measure of the general importance of the term. Roughly speaking, the IDF measures
how common a term is across an entire collection of documents. Thus, a term has a
high TF-IDF weight by having a high term frequency in a given document (i.e. a word
is common in a document) and a low document frequency in the whole collection of
documents (i.e. is relatively uncommon in other documents)[8].
Robust Hyperlinks
Phelps and Wilensky developed the idea of Robust Hyperlinks to overcome the
problem of broken links. The basic idea is to provide a number of alternative,
independent descriptions of networked resources, that is, URLs. Specifically, Phelps
and Wilensky proposed adding a small number of well-chosen terms, which they
called a lexical signature, to URLs. An example of such a modified signature might
be: When locating a web page, one could first try the basic URL. If the resource
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.10
cannot be found, one could then supply the signature terms to a search engine to
locate the document whose signature most closely matches that in the robust
hyperlink. A key issue here is how to create signatures that have appropriate
properties. First, signatures should be effective in picking out few documents.
Second, subsequent changes to a document should have minimal impact on
signature effectiveness. Third, the addition of new documents should have minimal
impact on previous signature effectiveness. Finally, the effectiveness of the signature
should be largely search-engine-independent. To meet these requirements, Phelps
and Wilensky proposed using TF-IDF to generate lexical signatures. Specifically,
they proposed calculating the TF-IDF value for each word in a document, and then
selecting the words with highest value. The rationale here is that term frequency
provides robustness (repeated words are less likely to all be deleted), while inverse
document frequency provides rarity across a set of documents, minimizing the
chance that another document will be added with the same term. Their preliminary
empirical results suggest that lexical signatures of about five terms are sufficient to
determine a web resource virtually uniquely, out of the more than one billion pages
on the web[9]. Their experiments also showed that searching on lexical signatures
often yielded a unique document, namely the desired document. In those few cases
in which more than one document is returned, the desired document is among the
highest ranked.
Adapting TF-IDF for detecting Phishing
The first is that criminals often create phishing sites by copying and then
modifying a legitimate site’s web pages so that personal information is redirected to
the criminals rather than to the legitimate site. The second observation is that
phishing sites often contain brand names and other terms that are common on a
given web page but relatively rare across the web, leading us to hypothesize that,
again, Robust Hyperlinks could be applied to find the owner of those brands[10].
Roughly, CANTINA works as follows:
Given a web page, calculate the TF-IDF scores of each term on that web
page.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.11
Generate a lexical signature by taking the five terms with highest TF-IDF
weights.
Feed this lexical signature to a search engine, which in our case is Google.
If the domain name of the current web page matches the domain name of the
N top search results, we consider it to be a legitimate web site. Otherwise, we
consider it a phishing site.
It is also worth pointing out that, according to the Anti-Phishing Working Group
(APWG), the average time that a phishing site stays online is 4.5 days. Our
experiences show that sometimes it is on the order of hours. Furthermore, we argue
that phishing web pages will have a low Google Page Rank due to a lack of links
pointing to the scam. These two factors combined suggest that a phishing scam will
rarely, if ever, be highly ranked. At the end of this paper, however, we discuss some
ways of possibly subverting CANTINA. In an earlier implementation, we discovered
that TF-IDF alone yields a fair number of false positives, labeling legitimate sites as
phishing. To address this problem, we also add the current domain name to the
lexical signature. For example, if the page is at http://www.ebay.com/xxxxx, then we
add the term “eBay” to the lexical signature (even if it is already there). The rationale
here is that if a page is legitimate, the domain name itself usually can best identify
itself (e.g., ebay.com, paypal.com, bankofamerica.com).On the other hand, if the
suspected page is phishing, no matter what we add onto its content, Google will not
return it. Another design decision was what to do if Google returns zero search
results. This sometimes happens because added domain names are sometimes
meaningless (for example, “u-s-j.be”). To address this problem, if Google fails to
return any result, we now label the suspected site as phishing (initially we labeled it
as unknown). We refer to this as the “Zero results Means Phishing” heuristic (ZMP).
This heuristic has the potential to increase false positives (incorrectly labeling a
legitimate site as phishing), but our early experiments strongly suggest that when
combined with adding the domain name to the lexical signature, this approach can
reduce false positives while not impacting true positives.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.12
We developed our larger set of heuristics based on related work, drawing primarily
from SpoofGuard and PILFER. We implemented each heuristic to return either -1 if it
looks like a phishing page or +1 otherwise. Heuristics include:
Age of Domain
This heuristic checks the age of the domain name. Many phishing sites have
domains that are registered only a few days before phishing emails are sent out. We
use a WHOIS search to implement this heuristic. This heuristic measures the
number of months from when the domain name was first registered. If the page has
been registered longer than 12 months, the heuristic will return +1, deeming it as
legitimate, and otherwise returns -1, deeming it as phishing. If the WHOIS server
cannot find the domain, the heuristic will simply return -1, deeming it as a phishing
page. The Netcraft and SpoofGuard toolbars use a similar heuristic based on the
time since a domain name was registered. Note that this heuristic does not account
for phishing sites based on existing web sites where criminals have broken into the
web server, nor does it account for phishing sites hosted on otherwise legitimate
domains, for example in space provided by an ISP for personal homepages.
Known Images
This heuristic checks whether a page contains inconsistent well-known logos.
For example, if a page contains eBay logos but is not on an eBay domain, then this
heuristic labels the site as a probable phishing page. Currently we store nine popular
logos locally, including eBay, PayPal, Citibank, Bank of America, Fifth Third Bank,
Barclays Bank, ANZ Bank, Chase Bank, and WellsFargo Bank. Eight of these nine
legitimate sites are included in the PhishTank.com list of Top 10 Identified Targets.
A similar heuristic is used by the SpoofGuard toolbar.
Suspicious URL
This heuristic checks if a page’s URL contains an “at” (@) or a dash (-) in the
domain name. An @ symbol in a URL causes the string to the left to be disregarded,
with the string on the right treated as the actual URL for retrieving the page.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.13
Combined with the limited size of the browser address bar, this makes it possible to
write URLs that appear legitimate within the address bar, but actually cause the
browser to retrieve a different page. This heuristic is used by Mozilla FireFox.
Dashes are also rarely used by legitimate sites, so we use this as another heuristic.
SpoofGuard checks for both at symbols and dashes in URLs.
Suspicious Links
This heuristic applies the URL check above to all the links on the page. If any
link on a page fails this URL check, then the page is labeled as a possible phishing
scam. This heuristic is also used by SpoofGuard.
IP Address
This heuristic checks if a page’s domain name is an IP address. This heuristic
is also used in PILFER [16].
Dots in URL
This heuristic checks the number of dots in a page’s URL. We found that
phishing pages tend to use many dots in their URLs but legitimate sites usually do
not. Currently, this heuristic labels a page as phish if there are 5 or more dots. This
heuristic is also used in PILFER [16].
Forms
This heuristic checks if a page contains any HTML text entry forms asking for
personal data from people, such as password and credit card number. We scan the
HTML for <input> tags that accept text and are accompanied by labels such as
“credit card” and “password”. Most phishing pages contain such forms asking for
personal data, otherwise the criminals risk not getting the personal information they
want.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.14
PROBLEM IDENTIFICATION AND
DEFINING OBJECTIVES OF THE
PROJECT
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.15
3. PROBLEM IDENTIFICATION AND DEFINING OBJECTIVES OF THE PROJECT
Objectives of proposed system :
We enumerate the goals of an anti-phishing technique, arranged in
decreasing order of protection and generality:
1. Ensure that a user’s data only goes to the intended recipient.
2. Prevent a user’s data from reaching an untrustworthy recipient.
3. Prevent an attacker from abusing a user’s data.
4. Prevent an attacker from modifying a user’s account.
5. Prevent an attacker from viewing a user’s account.
1. Ensure that a user’s data only goes to the intended recipient
One of the main objectives is authentication. Authentication means only the
authenticated users should send or receive the data. Here, in our project, it should
be ensured that the user data should be received only by the authenticated recipient.
2. Prevent a user’s data from reaching an untrustworthy recipient
Along with authentication it is equally important to ensure that the user data
does not reach an untrustworthy recipient. As all the personal data’s are highly
confidential, it is very important that the data does not go in the hands of an
untrustworthy recipient.
3. Prevent an attacker from abusing a user’s data
The attacker must be prevented from exploiting the user data. Misusing of
user data can be a cause to many problems. So, the attacker must be stopped from
abusing the user data.
4. Prevent an attacker from modifying a user’s account
The attacker must be prevented from modifying the user account. Modifying of
user account can be a cause to many problems. So, the attacker must be stopped
from modifying the user account.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.16
5. Prevent an attacker from viewing a user’s account
The attacker must be prevented from viewing the user account. Most of the
confidential letters or data of user will be there in his/her account so, viewing all the
data can be a cause to many problems. So, the attacker must be stopped from
viewing the user account.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.17
PROBLEM ANALYSIS
AND DESIGN
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.18
4. PROBLEM ANALYSIS AND DESIGN
4.1. Existing System
Phishing attacks succeed by exploiting a user’s inability to distinguish
legitimate sites from spoofed sites. Most prior research focuses on assisting the user
in making this distinction; however, users must make the right security decision
every time. Unfortunately, humans are ill-suited for performing the security checks
necessary for secure site identification, and a single mistake may result in a total
compromise of the user’s online account. Fundamentally, users should be
authenticated using information that they cannot readily reveal to malicious parties.
Placing less reliance on the user during the authentication process will enhance
security and eliminate many forms of fraud. We propose using a trusted device to
perform mutual authentication that eliminates reliance on perfect user behavior,
thwarts Man-in-the-Middle attacks after setup, and protects a user’s account even in
the presence of key loggers and most forms of spyware.
4.2. Proposed System
We advocate the following set of design principles for anti-phishing tools.
Many anti-phishing approaches face the same problem as anti-spam solutions:
incremental solutions only provoke an ongoing arms race between researchers and
adversaries. This typically gives the advantage to the attackers, since researchers
are permanently stuck on the defensive. As soon as researchers introduce an
improvement, attackers analyse it and develop a new twist on their current attacks
that allows them to evade the new defenses. Instead, we need to research
fundamental approaches for preventing phishing. . Most anti-phishing techniques
strive to prevent phishing attacks by providing better authentication of the server.
However, phishing actually exploits authentication failures on both the client and the
server side. Initially, a phishing attack exploits the user’s inability to properly
authenticate a server before transmitting sensitive data.
However, a second authentication failure occurs when the server allows the
phisher to use the captured data to login as the victim. A complete anti-phishing
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.19
solution must address both of these failures: clients should have strong guarantees
that they are communicating with the intended recipient, and servers should have
similarly strong guarantees that the client requesting service has a legitimate claim to
the accounts it attempts to access. Reduce reliance on users. The majority of current
phishing countermeasures rely on users to assist in the detection of phishing sites
and make decisions as to whether to continue are in many ways unsuited to
authenticating others or themselves to others. As a result, we must move towards
protocols that reduce human involvement or introduce additional information that
cannot readily be revealed. These mechanisms add security without relying on
perfectly correct user behaviour, thus bringing security to a larger audience.
Avoid dependence on the browser’s interface. The majority of current anti-
phishing approaches propose modifications to the browser interface. Unfortunately,
the browser interface is inherently insecure and can be easily circumvented by
embedded JavaScript applications that mimic the “trusted” browser elements.
Software Specification :
Front end : Java
Back end : SQL Server
Operating System : Windows7
IDE : Visual Studio
Hardware Specification :
Processor : intel i3
System Bus : 32 BIT
RAM : 3 GB
HDD : 40 GB
Display : SVGA Color
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.20
4.3.MODULES
Proxy Server
It is a server that acts as an intermediary for requests from clients seeking
resources from other servers. A client connects to the proxy server, requesting
some service, such as a file, connection, web page, or other resource, available from
a different server. The proxy server evaluates the request according to its filtering
rules.
Server (Web Application)
This system consists of server and client application.
4.3.1. User Management And login
This modules deals with the registration of local users and management of
users. The management includes Delete , View and Edit details of local users. The
Administrator have the previlage of Delete and View Users. The registration and
Edit details are the privileges of local users.
The system Authentication is achieved through the login process. Only the
registered user can login into the system. So that the outsiders cant acess the
system. While signing into the system the users should provide a username and
password which is already chosen during the registration. If the username and
password is not exist, the user cant login. It means that the user is not registered.
Administrator username and password are already saved in database. After logging
in the users(Administrator, User) can change their password.
Report
This module can be done by the administrator of the system. While the user is
trying to access a site(browsing a site), first the request will be accepted by the
administrator. After getting the url, the administrator will check whether the site is a
phishing site or not. If he found that the site is a hacker site, he will alert the user by
giving option for continue / discontinue from the page.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.21
4.3.2. Anti-Phishing System
This is the main module in this system. By using the following techniques the
administrator can found the site is a phishing site or not.
Age of Domain
This heuristic checks the age of the domain name. Many phishing sites
have domains that are registered only a few days before phishing emails are sent
out. We use a WHOIS search to implement this heuristic. This heuristic measures
the number of months from when the domain name was first registered. If the page
has been registered longer than 12 months, the heuristic will return +1, deeming it as
legitimate, and otherwise returns -1, deeming it as phishing. If the WHOIS server
cannot find the domain, the heuristic will simply return -1, deeming it as a phishing
page.
Suspicious URL
This heuristic checks if a page’s URL contains an “at” (@) or a dash (-)
in the domain name. An @ symbol in a URL causes the string to the left to be
disregarded, with the string on the right treated as the actual URL for retrieving the
page.
Suspicious Links
This heuristic applies the URL check above to all the links on the page.
If any link on a page fails this URL check, then the page is labeled as a possible
phishing scam. This heuristic is also used by SpoofGuard.
IP Address
This heuristic checks if a page’s domain name is an IP address. This
heuristic is also used in PILFER. If any domain name as like an IP address indicate
that the site is a phishing page.
Dots in URL
This heuristic checks the number of dots in a page’s URL. We found
that phishing pages tend to use many dots in their URLs but legitimate sites usually
do not. Currently, this heuristic labels a page as phish if there are 5 or more dots.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.22
4.4 SYSTEM DESIGN
Figure 4.4 System Architecture
4.5 USE CASE DIAGRAM
Figure 4.5 Use Case Diagram
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.23
The above use case diagram shows the connection of client and server to the
modules. The client and server has the connection to all the modules. The client and
server has separate functions for each module.
4.6. HIGH LEVEL DESIGN
Figure 4.6 High level design
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.24
4.7. TABLES Login :
Field
Data type Constraints
Uname
Varchar(50) Not_Null
Password Varchar(50) Not_Null
Status Int Not_Null
Usertype Varchar(50) Not_Null
Fig: 4.7.1.Login table This table is used to login to the homepage.
Register :
Field
Data type Constraints
FName
Varchar(50) Not_Null
LName Varchar(50) Not_Null
ID Int Not_Null
Fig: 4.7.2.Registration table
This table is used for registration.
URL category :
Field
Data type Constraints
url
Varchar(50) Not_Null
Ip adres Int Not_Null
ID Int Not_Null
Fig: 4.7.3.URL Category table This table is used to store the IP address of the system, and URL. Based on this the
sites accessed are categorized into the gray list and white list.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.25
4.8. ER-DIAGRAM
Figure 4.8 ER-Diagram
The ER diagram contains three entities login, register and url. Each entity has its
own attributes.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.26
4.9. DATA FLOW DIAGRAM
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.27
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.28
Figure 4.9.Data flow diagram
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.29
IMPLEMENTATION
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.30
5.SOURCE CODES 5.1.Login form <center> <div> <form action="../LoginServlet" method="post"> <table border="0"> <tr> <th colspan="2">Login</th> </tr> <% String f = request.getParameter("f"); if (f == null) { f = ""; } %> <tr> <td colspan="2"> <% if (f.equals("1")) { %> <font color="red">Login Failed.</font> <% } else if (f.equals("2")) { %> <font color="red">Blocked Account.</font> <% } %> </td> </tr> <tr> <td>UserName:</td> <td><input type="text" name="username"/></td> </tr> <tr> <td>Password:</td> <td><input type="password" name="password"/></td> </tr> <tr> <td colspan="2" style="text-align: center;"><input type="submit" value="Login"/></td> </tr>
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.31
</table> </form> </div> </center> </li> </ul> </div> <!-- end #sidebar --> <div style="clear: both;"> </div> </div> <!-- end #page --> <div id="footer"> <p>Copyright (c) 2011 www.antiphishing.com.</p> </div> <!-- end #footer --> </body> </html> 5.2.Registration <% String firstName = ""; String lastName = ""; String userName = ""; String Email = ""; String Mobile = ""; String Address = ""; Object firstNameObj = session.getAttribute("fname"); Object lastNameObj = session.getAttribute("lname"); Object userNameObj = session.getAttribute("uname"); Object EmailObj = session.getAttribute("email"); Object MobileObj = session.getAttribute("mobile"); Object AddressObj = session.getAttribute("address"); if (firstNameObj != null) { firstName = firstNameObj.toString(); } if (lastNameObj != null) { lastName = lastNameObj.toString(); } if (userNameObj != null) { userName = userNameObj.toString(); }
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.32
if (EmailObj != null) { Email = EmailObj.toString(); } if (MobileObj != null) { Mobile = MobileObj.toString(); } if (AddressObj != null) { Address = AddressObj.toString(); } %> <form action="../RegistrationActionServlet" method="post"> <table border="0"> <tr> <th colspan="2"><h3><b>Registration</b></h3></th> </tr> <% String f = request.getParameter("f"); if (f == null) { f = ""; } %> <tr> <td colspan="2"> <% if (f.equals("1")) { %> <font color="red">Enter All Fields!!!!</font> <% } else if (f.equals("2")) { %> <font color="red">Password and confirm Password should be same!!!</font> <% } else if (f.equals("3")) { %> <font color="red">Username not available!!!</font> <% } %> </td> </tr> <tr> <td>First Name</td>
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.33
<td><input type="text" name="firstname" style="width: 170px;" value="<%=firstName%>"/></td> </tr> <tr> <td>Last Name</td> <td><input type="text" name="lastname" style="width: 170px;" value="<%=lastName%>"/></td> </tr> <tr> <td>Login Name</td> <td><input type="text" name="loginname" style="width: 170px;" value="<%=userName%>"/></td> </tr> <tr> <td>Password</td> <td><input type="password" name="password" style="width: 170px;"/></td> </tr> <tr> <td>Confirm Password</td> <td><input type="password" name="confirmpassword" style="width: 170px;"/></td> </tr> <tr> <td>E-Mail</td> <td><input type="text" name="email" style="width: 170px;" value="<%=Email%>"/></td> </tr> <tr> <td>Mobile</td> <td><input type="text" name="mobile" style="width: 170px;" value="<%=Mobile%>"/></td> </tr> <tr> <td>Address</td> <td><textarea name="address" style="width: 170px;" value="<%=Address%>" rows="5"></textarea> </td> </tr> <tr> <td colspan="2" style="text-align: center;"><input type="submit" value="Save"/></td> </tr> </table> </form> </div> </center>
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.34
</div> </div> </div><!-- end #content --> <div id="sidebar"> </div> <!-- end #sidebar --> <div style="clear: both;"> </div> </div> <!-- end #page --> <div id="footer"> <p>Copyright (c) 2011 www.antiphishing.com.</p> </div> <!-- end #footer --> </body> </html> 5.3.client application public class SystemTraySupport { private String icon = "icon/icon.png"; private String applicationName; private PopupMenu popup; private TrayIcon trayIcon; public void setPopup(PopupMenu popup) { this.popup = popup; } public void setIcon(String icon) { this.icon = icon; } public SystemTraySupport(String applicationName) { this.applicationName = applicationName; popup = new PopupMenu(); initSystemTray(); } public void addMenuListener(String menuItemName, ActionListener listener) { MenuItem menuItem = new MenuItem(menuItemName); menuItem.addActionListener(listener); popup.add(menuItem); }
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.35
public void showInfoMessage(String message) { trayIcon.displayMessage(applicationName, message, TrayIcon.MessageType.INFO); } public void showErrorMessage(String message) { trayIcon.displayMessage(applicationName, message, TrayIcon.MessageType.ERROR); } public void showMessage(String message) { trayIcon.displayMessage(applicationName, message, TrayIcon.MessageType.NONE); } public void showWarningMessage(String message) { trayIcon.displayMessage(applicationName, message, TrayIcon.MessageType.WARNING); } private void initSystemTray() { if (SystemTray.isSupported()) { SystemTray tray = SystemTray.getSystemTray(); Image image = Toolkit.getDefaultToolkit().getImage(icon); trayIcon = new TrayIcon(image, applicationName + " Running...", popup); trayIcon.setImageAutoSize(true); try { tray.add(trayIcon); } catch (Exception e) { System.err.println("TrayIcon could not be added."); } } else { System.out.println("System Tray is not supported"); } } }
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.36
5.4.Server application public class NetworkProperties { private final String DIRECTORY = "setting"; private final String FILE_NAME = "network_config.xml"; //"database.properties"; private static NetworkProperties networkProperties; private String internetGatewayIP; private int internetGatewayPort; private String webServerIP; private int webServerPort; public String getInternetGatewayIP() { return internetGatewayIP; } public void setInternetGatewayIP(String internetGatewayIP) { this.internetGatewayIP = internetGatewayIP; } public int getInternetGatewayPort() { return internetGatewayPort; } public void setInternetGatewayPort(int internetGatewayPort) { this.internetGatewayPort = internetGatewayPort; } public String getWebServerIP() { return webServerIP; } public void setWebServerIP(String webServerIP) { this.webServerIP = webServerIP; } public int getWebServerPort() { return webServerPort; } public void setWebServerPort(int webServerPort) { this.webServerPort = webServerPort; } public NetworkProperties() { if (!loadPropertiesFormXMLFile()) { createBlankPropertyXMLFile();
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.37
} } public static NetworkProperties getInstance() { if (networkProperties == null) { networkProperties = new NetworkProperties(); } return networkProperties; } public void save() { Properties properties = new Properties(); properties.setProperty("InternetGatewayIP", internetGatewayIP); properties.setProperty("InternetGatewayPort", Integer.toString(internetGatewayPort)); properties.setProperty("WebServerIP", webServerIP); properties.setProperty("WebServerPort", Integer.toString(webServerPort)); File file = new File(DIRECTORY); if (!file.exists()) { file.mkdirs(); } file = new File(file, FILE_NAME); FileOutputStream out = null; try { out = new FileOutputStream(file); properties.storeToXML(out, null); } catch (IOException e) { e.printStackTrace(); } finally { try { out.close(); } catch (Exception e) { } } } private boolean loadPropertiesFormXMLFile() { FileInputStream in = null; try { File file = new File(DIRECTORY, FILE_NAME); in = new FileInputStream(file); Properties properties = new Properties(); properties.loadFromXML(in); internetGatewayIP = properties.getProperty("InternetGatewayIP").trim();
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.38
internetGatewayPort = Integer.parseInt(properties.getProperty("InternetGatewayPort").trim()); webServerPort = Integer.parseInt(properties.getProperty("WebServerPort").trim()); webServerIP = properties.getProperty("WebServerIP").trim(); return true; } catch (IOException e) { e.printStackTrace(); return false; } finally { try { in.close(); } catch (Exception e) { } } } private void createBlankPropertyXMLFile() { internetGatewayIP = "0.0.0.0"; internetGatewayPort = 0; webServerPort = 0; webServerIP = "0.0.0.0"; save(); } public static void main(String[] args) { NetworkProperties pro = NetworkProperties.getInstance(); System.out.println(pro.getInternetGatewayPort()); System.out.println(pro.getWebServerPort()); System.out.println(pro.getInternetGatewayIP()); System.out.println(pro.getWebServerIP()); pro.save(); } }
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.39
TESTING
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.40
6. TESTING
Testing performs a critical role for quality assurance and for the reliability of
the software. Testing forms is the first step in determining the errors in the program.
Different levels of testing are used in the testing process, each level of testing aims
to test different aspects of the software. The basic levels used are: unit testing and
integration testing.
There are several strategies that are used in the system.
Unit testing
Integration testing
6.1 UNIT TESTING
Unit testing is the first level of testing. It is a software verification and
validation method in which a programmer tests if individual units of source code are
fit to use. The unit testing is performed by the programmer prior to integration of
individual units or modules into a larger system. The testing is carried out in the
coding stage itself and each module is found to be working satisfactorily as regards
to the expected output from the module.
6.2 INTEGRATION TESTING
In this testing we test our project as a whole, combining the whole modules.
Integration testing sis a systematic technique for constructing the program structure
while conducting tests to uncover errors associated with the interfacing between
modules. The individual units are combined with other units to make sure that
necessary communication, links and data sharing occur properly.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.41
6.1.1 TEST CASE – ADMIN LOGIN
Tc
No
UNIT INPUT EXPECTED
OUTPUT
OUTPUT
OBTAINED
STAT
US
REASON REME
DY
1. Login
form
Username
and
password
of the
administra
tor.
Checks the
database
and
redirects to
the admin
home. Else
shows error
message
box
Error
message
Failure Incorrect
usernam
e or
password
Enter
the
correct
userna
me
and
passw
ord
2. Login
form
Username
and
password
Checks the
database
and
redirects to
the admin
home. Else
shows error
message
box
Redirected
to the
admin
home
Succe
ss
Table: 6.1.1: Admin Login
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.42
6.1.2 TEST CASE – CANTINA WEBSIDE
Tc
No
UNIT INPUT EXPECTED
OUTPUT
OUTPUT
OBTAINED
STAT
US
REASON REME
DY
1. Login
form
No input
required.
Redirects to
the blind
home page
within ten
seconds.
Else
remains in
the login
form.
No
redirection.
Failure Code
error.
Verify
and
correct
the
code.
2. Login
form
No input
required.
Redirects to
the blind
home page
within ten
seconds.
Else
remains in
the login
form.
Redirected
to the blind
home
page.
Succe
ss
Table: 6.1.2: Cantina Web side
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.43
6.2 INTEGRATION TESTING
6.2.1. TEST CASE – CLIENT APPLICATION
Tc
No
UNIT INPUT EXPECTE
D
OUTPUT
OUTPU
T
OBTAIN
ED
STATUS REASON REME
DY
1. Downl
oad
access
Enter the
username
and
password.
The
contents
pertaining
to the
option are
made
available
to the
user.
No
further
navigati
on.
Failure The user
has been
blocked
by
administra
tor.
Unbloc
k the
user.
2. File
Setup
Click the
link to
download
the
software.
The
software
should be
download
ed
successful
ly.
Jar file
could
not be
downloa
ded.
Failure Setup fail
error.
Proper
setup of
jar file.
Table: 6.2.1: Client Application
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.44
7. FUTURE SCOPE
In the coming months, the Anti-Phishing Working Group will be working with
technology partners, government regulators, and leaders in industries being
victimized by phishing attacks to sculpt the most efficient and elegant digital signing
and authentication solution that can be employed by large user communities. We are
sending calls to action to all stakeholders in the hopes that by year end, a consensus
will be formed among them as to how the effected industries can establish a digital
signing and authentication regime that can be deployed post haste to end phishing
attacks, preclude regulatory adventurism and, ultimately, to establish a technological
foundation for a broader spam-proof electronic mailing infrastructure.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.45
CONCLUSIONS
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.46
8. CONCLUSIONS CANTINA, a novel content-based approach for detecting phishing web sites.
CANTINA takes Robust Hyperlinks, an idea for overcoming page not found problems
using the well-known Term Frequency / Inverse Document Frequency (TF-IDF)
algorithm, and applies it to anti-phishing. We described our implementation of
CANTINA, and discussed some simple heuristics that can be applied to reduce false
positives. We also presented an evaluation of CANTINA, showing that the pure TF-
IDF approach can catch about 97% phishing sites with about 6% false positives, and
after combining some simple heuristics we are able to catch about 90% of phishing
sites with only 1% false positives.
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.47
SCREENSHOTS
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.48
Login Form
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.49
Admin Home Page
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.50
User’s List
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.51
Whitelist
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.52
Blacklist
Semester: VIII Branch: I.T. Project Title: Cantina Antiphishing
Toc H Institute of Science & Technology
Arakkunnam – 682 313
Page No.53
REFERENCE
[1] 3Sharp, 3Sharp Study finds Internet Explorer 7 Edges Out Netcraft As Most
Accurate for Anti-Phishing Protection. 6.http://www.3sharp.com/projects/antiphishing/
[2] Anti-Phishing Working Group, Phishing Activity Trends Report. 2006.
Zttp://www.antiphishing.org/reports/ apwg_report_june_06.pdf
[3] Anti-Phishing Working Group (APWG). Visited: Nov 20,
2006.http://www.antiphishing.org/
[4] Chou, N., R. Ledesma, Y. Teraguchi, D. Boneh, and J.C. Mitchell. Client-Side
Defense against Web-Based Identity Theft. In Proceedings of The 11th Annual
Network and Distributed System Security Symposium (NDSS '04).
http://crypto.stanford.edu/SpoofGuard/webspoof.pdf
[5] Cloudmark Inc. Visited: Nov 20, 06.http://www.cloudmark.com/desktop/download/
[6] Robert T. Morris, A Weakness in the 4.2BSD UNIX TCP/IP Software. Computing
Science Technical Report 117, AT&T Bell Laboratories, February 1985.
[7] Rod Rasmussen, Phishing Prevention: Making Yourself a Hard Target. Internet
Identity / APWG (April 5, 2004).
[8] Blake Ross, Nick Miyake, Robert Ledesma, Dan Boneh and John C. Mitchell, A
Simple Solution to the Unique Password Problem.
[9] Z. Ye. Building Trusted Paths for Web Browsers. Master’s Thesis. Department of
Computer Science, Dartmouth College. May 2002
[10] Zishuang (Eileen) Ye and Sean W. Smith, Trusted Paths for Browsers, 11th
Usenix Security Symposium, August 2002.