

SN Computer Science (2020) 1:11 https://doi.org/10.1007/s42979-019-0011-2

SN Computer Science

SURVEY ARTICLE

Classification of Anti‑phishing Solutions

S. Chanti¹ · T. Chithralekha²

Received: 6 April 2019 / Accepted: 28 June 2019 / Published online: 16 July 2019 © Springer Nature Singapore Pte Ltd 2019

Abstract

Phishing is an online fraud in which a phisher gains unauthorized access to a user's system in order to obtain personal credentials (such as username, password, credit/debit card number, validity, CVV number, and PIN) for financial gain. Phishing can be carried out in many ways: through emails, phone calls, instant messages, advertisements, pop-ups on websites, and DNS poisoning. To protect users from phishing, many anti-phishing toolbars/extensions have been developed. These tools help prevent Internet users from falling victim to phishing scams, although no anti-phishing approach can give 100% security. In this paper, we present a complete classification of anti-phishing solutions from an algorithmic perspective. The taxonomy helps in understanding the various anti-phishing approaches and algorithms developed for phishing detection. Popular anti-phishing toolbars are examined to show the media they address, their mode of operation, and their pros and cons. The paper also identifies research gaps that remain to be addressed.

Keywords Phishing · Anti-phishing · Content-based approach · Non-content-based approach · Machine learning · Anti-phishing toolbars

Introduction

Phishing is an Internet scam that a phisher uses to deceive Internet users for malicious purposes. It can be carried out in many ways; among them, email phishing is the traditional and most common. Usually, the phisher sends an email describing some emergency that prompts the user to click on a hyperlink or open an attachment provided in the email. Phishers continually devise new techniques to fool Internet users. According to the Anti-phishing Working Group [11] survey report, 1,220,523 unique phishing attacks occurred in January–March 2018. Pharming is an advanced form of phishing in which the phisher redirects the user to a spoofed site that looks and feels exactly like the original site. This can be done either by modifying the hosts file on the user's system or by hijacking the DNS servers (replacing the IP address). If the IP address on the DNS server is changed, all traffic for the website is redirected to the site specified by the phisher. Pharming is more dangerous than ordinary phishing and very difficult to detect.

To protect Internet users from phishing scams, anti-phishing solutions have been developed. Anti-phishing helps in detecting phishing scams. In this study, we classify the existing anti-phishing solutions into two main categories, namely, content-based and non-content-based. The content-based approaches analyze the content of a webpage, URL, or email to decide whether it is phishing or not. The non-content-based approaches do not analyze the content; instead, they check against an existing blacklist (which stores known phishing data) or a whitelist (a list of trusted sites). In this work, we focus on the following aspects that are different from the existing taxonomies:

• A complete classification of anti-phishing solutions.

• A literature survey of the existing anti-phishing algorithms used by different approaches, with the data sets used and their limitations discussed in detail.

• A comparison of existing anti-phishing toolbars in the literature.

In this paper, a complete classification of anti-phishing solutions is provided; the classification helps in understanding the various approaches used to develop anti-phishing solutions and the current trends. “Research Methodology” describes the research methodology and illustrates how this literature survey was performed. “Anti-phishing Solutions” explains the complete classification of anti-phishing solutions and the existing anti-phishing approaches. “Existing Anti-phishing Browser Extensions/Toolbars” describes existing anti-phishing browser extensions/toolbars with their pros and cons. “Discussion” answers the research questions raised, and finally, “Conclusion” concludes the paper.

This article is part of the topical collection “Advances in Internet Research and Engineering” guest edited by Mohit Sethi, Debabrata Das, P. V. Ananda Mohan and Balaji Rajendran.

* S. Chanti [email protected]

T. Chithralekha [email protected]

1 Department of Banking Technology, Pondicherry University, Puducherry, India

2 Department of Computer Science, Pondicherry University, Puducherry, India

Research Methodology

A complete classification of anti-phishing solutions was chosen as the research methodology for this study. The goal of this classification is to provide an overview of anti-phishing solutions together with the amount of research carried out in this area. The ideas and work of various authors [17, 28, 45] encouraged us to write this survey paper with the following research questions:

Research Questions

The main goal of the study was to provide a complete classification of anti-phishing solutions. To do that, we define the following questions:

RQ1 What are the areas that current anti-phishing solutions address?

RQ2 Do the existing anti-phishing toolbars cover all types of phishing attacks?

RQ3 What are the current research gaps in anti-phishing?

Searching for Papers

A preliminary search was conducted to collect articles from different sources. Keywords such as anti-phishing, email-based phishing detection, website-based phishing detection, URL-based phishing detection, social media phishing, and DNS phishing were used to search for relevant articles in digital libraries such as IEEE, ACM, Emerald, Science Direct, and Springer. To find the relevant papers, the results were filtered to titles containing these keywords. Only papers from leading journals and international conferences were chosen for this study.

Finding the Relevant Papers

To find the most relevant articles, a first screening was done based on the presence of the keywords in the titles of the search results. In the second stage, the abstracts of these papers were read and the relevant papers were retained. The selected papers were classified as email-based, website-based, DNS-based, and social media-based phishing detection, as well as contact-based and notification-based, and were examined thoroughly. After reading the papers, the classification of anti-phishing solutions was defined.

Details About the Papers

This section provides detailed information about the number of papers available on phishing, the different sources where they are available, how the relevant papers were filtered, and the yearwise publication of those papers.

Selection process of papers The process of selecting the papers is given in Fig. 1. Initially, 5269 papers were retrieved from five digital libraries. From these results, a first filtering based on the titles yielded 279 papers; papers outside the scope were removed. The abstracts of these papers were then studied, which narrowed the set to 113 papers. In the next stage, all 113 papers were read completely to exclude those that remained unclear, leaving 75 papers for the study.

Publication of papers in different sources All the search results are from digital libraries such as IEEE, ACM, Emerald, Science Direct, and Springer. We considered only journal articles and conference proceedings articles for the study. We found 5269 articles, of which 3288 were research articles and 1981 were conference proceedings articles. Springer and Science Direct have published more research articles than conference proceedings articles, whereas IEEE and ACM have more conference proceedings articles than research articles. The details are given in Fig. 2.

Yearwise publication of selected papers Figure 3 shows the yearwise publication of selected papers. The selected papers are from 1992 to 2018. Since 1992, the number of publications has been increasing steadily. Of the selected papers, 11 papers (14.6%) were published in 2017, and 8 papers (10.6%) were published in 2007, 2011, and 2016. The number of phishing scams is increasing drastically, and phishers use different techniques to lure Internet users.

Anti‑phishing Solutions

According to the APWG survey, thousands of phishing sites are created every year and billions of dollars are lost. To overcome this problem, many companies and researchers have started developing anti-phishing solutions. Anti-phishing can be implemented on both the client side and the server side [28, 52]. Based on the work carried out on anti-phishing, we propose a taxonomy of anti-phishing solutions, as shown in Fig. 5.

An evolution roadmap of existing anti-phishing solutions is shown in Fig. 4. A consolidated list of features is given in Table 1, which includes the email, website, URL, and social media features drawn from various sources for phishing detection [2, 3, 5, 10, 12, 23, 25, 35, 36, 38, 42, 48, 49, 58, 59, 63, 66, 68, 74]. Different anti-phishing approaches use different algorithms to distinguish phishing attacks from legitimate activity. The content-based and non-content-based approaches are explained in detail below.

Fig. 1 Selection process of papers

Fig. 2 Publication of papers in different sources

Content‑Based Phishing Detection

In content-based phishing detection, the phishing attack is detected by analyzing the content of the website. The analysis relies on features such as spelling and grammar, password fields, links, images, URLs, page rank, WHOIS information, HTML code, and JavaScript [19, 71].

Social media Social media phishing is a new way of stealing user credentials using social networks such as Facebook, Twitter, LinkedIn, Google+, and so on. According to the study in Ref. [31], the stealing of user credentials from social network sites is four times greater than in other phishing attacks. Social media phishing looks similar to email phishing, but it is not the same.

In email phishing, the phisher sends an email that either redirects the user to a suspicious site or carries malicious code as an attachment. In social media phishing, however, the attacker communicates with the user and slowly tries to collect their personal credentials or asks for financial support. As mentioned in paper [65], social media phishing differs from other forms in three ways:

• Social media phishing can be observed in the new social media environment, where the features and policies keep on changing.

• Second, the attack can be performed in two levels: in level 1, the attacker creates a fake account to interact with the victim in a different manner (like a friend). In level 2, the phisher collects the personal credentials of the victim.

• Finally, social media phishing is successful because it is very difficult to distinguish a fake request from a legitimate one.

In paper [7], detection of spear phishing attacks in relation to the individuals’ social media activities is performed. According to their preliminary results, social media sites provide the identity information, open to the public, which helps the phisher to target the individual user through spear phishing.

Website content-based phishing detection In website content-based phishing detection, the features from URL, Image, and text content are analyzed.

URL analysis URL analysis is conducted to verify whether the site requested by the user is trusted or phishing. This can be done by checking for the presence of a special character (@), an IP address in place of the domain name, a prefix/suffix added to the domain, the string HTTPS in the domain part, and many other features. Rule-based approaches apply conditions that separate phishing URLs from legitimate ones. Machine-learning approaches are also used for phishing detection [33].
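The URL checks listed above can be illustrated with a short sketch. This is only an assumed, minimal rule set, not the method of any surveyed paper: the function names `url_heuristics` and `is_suspicious`, the choice of four rules, and the `min_hits` threshold are all hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def url_heuristics(url: str) -> dict:
    """Evaluate a few rule-based URL checks (illustrative rule set only)."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)  # succeeds only for a literal IP address
        ip_host = True
    except ValueError:
        ip_host = False
    return {
        "has_at_symbol": "@" in url,                    # special character (@)
        "ip_instead_of_domain": ip_host,                # IP address used as host
        "hyphen_in_domain": "-" in host,                # prefix/suffix joined by "-"
        "https_token_in_domain": "https" in host.lower(),  # "https" inside the host
    }


def is_suspicious(url: str, min_hits: int = 1) -> bool:
    """Flag the URL when at least min_hits rules fire (assumed threshold)."""
    return sum(url_heuristics(url).values()) >= min_hits


if __name__ == "__main__":
    for candidate in ("https://www.example.com/",
                      "http://192.168.0.1/login",
                      "http://https-example.com/"):
        print(candidate, is_suspicious(candidate))
```

A real detector would combine many more features (see Table 1) and would tune the threshold on labelled data; legitimate domains can contain hyphens, so a single rule firing is weak evidence on its own.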

Image analysis Image analysis covers the images, logos, CAPTCHAs, and screenshots of a website that help to distinguish a phishing website from a legitimate one. The visual content similarity-based approach performs image analysis by comparing the logo of the website. To do this, a screenshot of the page is captured and the logo is extracted from it. This logo is compared with the blacklist, and if it matches, the site is a phishing site [19, 51]. The text extracted from the screenshot can also be used for phishing detection.

Fig. 3 Yearwise publication of selected papers

Text analysis The text content of a website helps in better detection of phishing attacks. The text content considered may include simple keywords, scripts, whether secure sockets layer (SSL) is enabled or not, and so on. The text content-based approach [4], the rule-based approach [19, 73], and machine-learning approaches [18, 29, 37] are used to analyze the text content.

Email content-based phishing Email-based phishing is the most common form of phishing. In email phishing, the phisher either redirects the user to a fraudulent/spoofed site or includes a malicious attachment that downloads and installs automatically, without the user's knowledge, when they click on the link. This gives the phisher unauthorized access to the user's system.

Spam filtering Spam filters [4, 30] separate phishing emails from spam. The filter is given instructions such as checking whether the sender is blacklisted, whether the content conveys urgency, and whether the email carries malicious attachments or suspicious URLs; these checks help to distinguish phishing emails from legitimate ones.

URL analysis Phishing email has become a very common and easy way of stealing the credentials of Internet users by redirecting them. Before the user visits a site, the URL should be validated to identify suspicious ones. When the user clicks on a hyperlink in an email, information such as domain details, destination details, and the age of the domain is collected from the URL and verified before the page is loaded, and the user is allowed to proceed only when the information is valid [12].

Spelling and grammar correction Phishers send thousands of emails every day to fool Internet users into giving up their personal credentials. The content and the links look like those of a genuine mail, which fools the user into clicking on the links provided. These types of emails can be identified by checking the incoming mail for misspelt words and grammatical errors [34]. In paper [21], a toolbar is developed that provides an additional feature, “Scam Blocker”, which identifies spelling and grammar errors in the email. The phisher normally uses misspelt words (for example, Goog1e instead of Google) which spam filters fail to detect. Scam Blocker assists in detecting this type of email and blocks it before it reaches the inbox.
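The Goog1e example can be made concrete with a small look-alike check. The sketch below is not the Scam Blocker implementation from [21]; it assumes a hypothetical three-entry table of confusable characters (`CONFUSABLES`) and a caller-supplied list of protected brand names, and it flags any word that collapses to a brand name after confusable characters are normalized.

```python
import re

# Hypothetical, deliberately tiny table of visually confusable characters.
CONFUSABLES = str.maketrans({"1": "l", "i": "l", "0": "o"})


def skeleton(word: str) -> str:
    """Lower-case the word and collapse confusable characters."""
    return word.lower().translate(CONFUSABLES)


def find_lookalikes(text: str, brands: list) -> list:
    """Return (token, brand) pairs where token imitates, but is not, a brand."""
    by_skeleton = {skeleton(b): b for b in brands}
    exact = {b.lower() for b in brands}
    hits = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        brand = by_skeleton.get(skeleton(token))
        if brand is not None and token.lower() not in exact:
            hits.append((token, brand))
    return hits


if __name__ == "__main__":
    print(find_lookalikes("Sign in to your Goog1e account", ["Google", "ICICI"]))
```

A production filter would need a far larger confusables table and would also have to handle Unicode homoglyphs, which this sketch ignores.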

DNS DNS phishing (pharming) is phishing without a lure. In phishing attacks, the attacker focuses on an individual, but in pharming, the attacker targets an entire network by modifying the DNS entries, so that all requests are redirected to the attacker's server. Pharming attacks are very difficult to detect, and even the URL looks exactly the same as the legitimate one. There are a few works on pharming detection that compare IP addresses. In paper [25], the author compared the IP address of the current site with the one returned by the default DNS server; if they do not match, the site is considered pharming. More details are provided in “Existing Anti-phishing Approaches”.
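The IP comparison idea can be sketched as follows. This is an assumed simplification rather than the procedure of paper [25]: `looks_like_pharming` is a hypothetical helper that receives two sets of addresses, one from the resolver the browser actually used and one from a reference resolver the caller trusts, and reports a mismatch when they share no address. The helper `resolve_locally` uses the standard `socket.getaddrinfo` call; how the reference addresses are obtained is left to the caller.

```python
import socket


def resolve_locally(hostname: str) -> set:
    """Resolve a hostname with the system resolver (requires network access)."""
    infos = socket.getaddrinfo(hostname, None)
    return {info[4][0] for info in infos}  # sockaddr[0] is the IP address


def looks_like_pharming(local_ips, reference_ips) -> bool:
    """Report True when the two non-empty address sets have nothing in common."""
    local, reference = set(local_ips), set(reference_ips)
    if not local or not reference:
        return False  # not enough data to decide either way
    return local.isdisjoint(reference)


if __name__ == "__main__":
    # Documentation-range addresses used purely as hypothetical inputs.
    print(looks_like_pharming({"203.0.113.7"}, {"198.51.100.10"}))
    print(looks_like_pharming({"198.51.100.10"}, {"198.51.100.10", "198.51.100.11"}))
```

Sites served through content delivery networks can legitimately resolve to different addresses from different resolvers, so a mismatch is a signal to investigate rather than proof of pharming.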

Non‑content‑Based Phishing Detection

Non-content-based approaches focus on features other than the content. By checking the suspicious URL against a blacklist, user ratings, the popularity of the domain, and many other features, it can be decided whether the site is phishing or legitimate.

Existing Anti‑phishing Approaches

The existing anti-phishing approaches are built on either content-based or non-content-based detection techniques. The efficiency of an anti-phishing approach depends on factors such as the features, the collection of data sets, and their size. Machine-learning approaches require more data samples to train the model to detect phishing attacks. Anti-phishing algorithms are also developed for phishing detection. Table 2 shows the different algorithms used in different approaches with their performance and limitations. The primary requirement for anti-phishing is a data set. In paper [17], the author listed some benchmarking data set sources that provide legitimate and phishing data sets. The data set from PhishTank.com is the most widely used data set for phishing. The existing content-based and non-content-based anti-phishing approaches are given below:

Fig. 4 Evolution roadmap of anti-phishing solutions

Behaviour-based The behaviour-based approaches work on the behaviour of the Internet user to detect suspicious activities. In paper [65], the author presented a behaviour-based technique to detect Social Network Phishing (SNP). A study was conducted with 127 randomly selected students who use Facebook. Four accounts were created: (i) one with no photo, personal info, or friends; (ii) one with a photo but no friends; (iii) one without a photo but with ten friends; and (iv) one with a picture and ten friends. They categorized SNP into two levels. At the first level, the phisher uses phony profiles to identify Facebook users. At the second level, the phisher tries to extract the information directly. The users responded more readily to requests from accounts with more friends, even if no picture of the person was available.

Visual content similarity-based approach The visual content similarity-based approach is used to visually compare the images, logos, and screenshots of the phishing site. In Refs. [28, 52], the screenshot of the URL requested by the user is obtained from the PhishTank website. Using clipping tools, the logo is separated from the screenshot. The logo is then submitted to the Google search engine, and text content is obtained from the search results. If the current URL is listed in the Google search results, it is considered legitimate; otherwise, it is considered phishing.

In paper [19], the image in a website is captured, and optical character recognition (OCR) is used to extract the textual content from the image. This textual content is then loaded into Google for domain matching. If it matches, a green color (for a trusted site) indicator is displayed, else a red color (phishing site) indicator appears.

In paper [75], the author introduced a visual similarity-based approach with local and global features to compare the phishing web page with a legitimate web page. A logo detection method is used to extract local features, and a modified Earth Mover's Distance (EMD) algorithm is used for global feature extraction. A screenshot of the suspected site is taken to extract non-text content features, which include images of flash objects in an HTML page. To improve the performance of their technique, they collected a large number of phishing and legitimate web pages and reported 90% true positive and 97% true negative rates.

Fig. 5 Anti-phishing solutions for phishing detection

Table 1 List of phishing detection features at different levels

Types of features: List of features

Email features

 Header features: Compare-Msg-Sender-Domain, HTML-mail, Text-mail, Multi-Part-Mail, Number-Of-Receivers, Number-Of-Attachments, Subject-Bank-Word, Subject-Debit-Word, Subject-Fwd-Word, Subject-Reply-Word, Subject-Verify-Word, Subject-Num-Chars, Subject-Num-Words, Subject-Richness, Send-Num-Words, Send-Diff-Replay-to, Number-Of-Recipients, Number-Of-CcRecipients, Number-Of-BccRecipients, Absence of names (first, middle, last)

 URL feature in Email: Num-Of-Link, Number-Of-Diff-Domain, Num-Of-Diff-Link-Text, Num-Domain-NLSender, Num-Of-Dots-InDomain, Non visible links, Non matching links, Number-Link-Contain@, Number-Of-Link-ContainIP, Number-Of-Link-Contain-Esc, Number-Of-Link-Contain-NSPort, url-Bag-Link, Url-Num-Port, Black-List-URL, No. of Links Behind an image, Link with following word: Click, Here, Login, Update

 Word list feature: Boolean indicators of whether the words or stems listed below appear in the email body: account, update, confirm, verify, secure, notify, log, click, inconvenience, customer, client, suspend, restrict, Hold, Verify, username, password, SSN, user

 Structural features: Total number of body parts, Total number of alternative parts

 HTML content: HTML form, Contains Script, Count SSL Link, Number of links using Image, Number of non-ASCII links, Script onclick, Script popup, Script status change

 Email body features: Size of the document, Dear (keyword), no. of characters, no. of words, no. of unique words, Body richness, no. of Functional words, no. of suspension words, Verify your account phrase, Disparities between “href” attribute and LINK text, Mention of money, Presence of reply inducing sentence, sense of urgency

Website features

 Address bar features: IP address, Long URL that hides suspicious part, Tiny URL, URL with @ symbol, redirect using “//”, prefix or suffix to domain, HTTPS, favicon, Using Non-Standard Port, Sub-domain and multi sub-domains, Using free hosting Domains, Count of digit, Length of URL, Ratio of special characters, Registration date of Domain, No. of dots (.) in the URL, Port no. in the URL, No. of triplets in the path of URL, No. of triplets in the domain name, No. of Phishing keywords in the URL

 Abnormal web features: Request URL, URL of Anchor, Links in {<meta>, <script>, and <Link>}, Suspicious action upon submitted information, Submitted information to email, Website Owner, Abnormal URL, Abnormal DNS record, Abnormal Anchors, Abnormal server form handler, Abnormal certificate in SSL, The no. of web pages, The avg no. of inbound links, The avg no. of internal links, The avg no. of images, The avg no. of input boxes, The avg no. of password boxes, The proportion of form links, Dynamic web page proportion

 HTML and JavaScript: Websites forwarding, Status bar customization, Disabling right click, Pop-up window, Iframe redirection, Count of hidden tags, Count of external links, Count of unsymmetric tags, Count of JavaScript segments, Count of plug-ins and Active X controls, Count of long string, Count of Unicode characters, Count of Hex and Octal coding, Count of replace() function call, Count of eval() and exec() function, Count of string functions, Count of obfuscation function, Evaluation of (form, title, image, meta description, meta keywords, script, link and href) tags

 Domain features: Age of domain, DNS record, Web traffic, Google index, Number of links pointing to the web page, Statistics report-based features

 Graphical features: Grayscale histogram, color histogram, Spatial relationship between subgraphs are extracted from web image

 Country-code and TLD: TLD evaluation in the domain name, TLD evaluation in the part of the URL, Country-code and TLD comparison

URL features: IP-based URL, Age of the domain, Length of URL, No. of dots, Longest common sequence in URL, Presence of “@” and “-” symbol, Rank, Link-in-count, Mld-results, Mld-ps-results, Cardinality, Ration-associated, Ration-related, Jaccard-(RR, RA, AR, AA), Jaccard-AR-Registered, Jaccard-AR-Renaming, Domain exists in Alexa rank, Sub-domain length, Path length, URL entropy, Length ratio, Punctuation count, Euclidean distance

Social media features

 Twitter

  Account-specific features: Length of the account name, Length (size) of the account Description, Total count of friends, Total count of followers, User reputation, Ratio of followers and friends, Life time of the user account, Rate of friends, Rate of followers, Total count of tweets posted, Average count of tweets posted per day, Average count of tweets generated per week, Total count of tweets liked/favorited, Total count of lists

  Object Specific Features: Average count of hash-tags present in a tweet, maximum count of hash-tags present in a tweet, Fraction of tweets with a hash-tag, Average count of URLs per tweet, Maximum count of URLs present in a tweet, Fraction of tweets with URLs, Average count of mentions per tweet, Maximum count of mentions per tweet, Fraction of tweets with mention, Average count of re-tweets per tweet, Maximum count of re-tweets per tweet, Average count of favourites per tweet

Rule-based approach Rule-based Approach [25, 29] is a content-based approach that analyzes the content within a URL, email, social media, and web content with some conditions (heuristics). In URL analysis, the content of the URL alone is analyzed.

Heuristics such as a large number of dots and slashes in the domain part, whether the URL is IP-based, and the presence of a special character (@) are extracted from the URL to predict phishing.

In paper [4], heuristics such as the primary domain, sub-domain, path domain, page rank, Alexa rank, and Alexa reputation are considered. When the user clicks on any link, these features are extracted and checked against the conditions. If the URL satisfies the conditions, it is legitimate; otherwise, it is phishing. The lifespan of phishing URLs is very short, so they do not appear in top search results.

In paper [73], the author introduced a content-based approach, CANTINA, for phishing web page classification. Term frequency-inverse document frequency (TF-IDF) is used to calculate the score of each term in a web page. The words with the highest TF-IDF scores are taken to generate a lexical signature. This signature is then submitted to the Google search engine to check whether the domain is listed in the top 30 results. If the current domain is not listed in the top 30 search results, it is considered a phishing site.
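The TF-IDF step can be sketched in a few lines. This is an illustration under stated assumptions, not the CANTINA implementation: the background corpus, the signature length `k`, the tokenizer, and the function names are hypothetical, and the search step is reduced to a membership test on a list of result domains that the caller must supply (no search engine API is called here).

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(page: str, corpus: list, k: int = 5) -> list:
    """Return the k terms of `page` with the highest TF-IDF score."""
    doc_terms = [set(tokenize(doc)) for doc in corpus]
    counts = Counter(tokenize(page))
    total = sum(counts.values())
    n_docs = len(doc_terms) + 1  # the page itself counts as one document
    scores = {}
    for term, count in counts.items():
        df = 1 + sum(term in terms for terms in doc_terms)
        scores[term] = (count / total) * math.log(n_docs / df)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def classify_by_search(domain: str, result_domains: list, top_n: int = 30) -> str:
    """Label the page from caller-supplied search results (hypothetical input)."""
    return "legitimate" if domain in result_domains[:top_n] else "phishing"


if __name__ == "__main__":
    corpus = ["the weather is nice", "the account of the story", "login to the forum"]
    page = "paypal login paypal account verify the account"
    print(lexical_signature(page, corpus, k=3))
```

Terms that appear in every document, such as "the" in the toy corpus, receive a score of zero and therefore never enter the signature, which is the property the lexical-signature idea relies on.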

Text content similarity-based approach In the text content similarity-based approach, phishers use keywords that closely resemble the actual words, such as IC1CI instead of ICICI, to fool online customers into giving up their personal credentials. To prevent this type of fraud, the textual content is analyzed, and a list of keywords is stored for verification. The database contains the keywords (such as click here, verify, login, apply online, dear, free access) commonly used in phishing emails. These approaches monitor incoming emails to check whether these keywords are present. If so, the email is classified as spam.

Text analysis also compares the current website content with stored profiles to spot phishing scams. A stored profile contains URLs, SSL certificate details, images, HTML contents, and scripts. In Ref. [4], the toolbar maintains a database with these profiles and extracts these features from the current site. If the extracted information does not match the stored profiles, then the site is phishing.

In paper [4], a blacklist of keywords is maintained as tokens, and every token of the email is checked against that list of blocked keywords. Each time a token is found in the list, a counter is incremented, and if the final count crosses the threshold value, the email is classified as phishing.
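The token-counting rule can be sketched in a few lines. The keyword set, the tokenizer, and the threshold below are illustrative assumptions and are not taken from the cited work.

```python
import re

# Illustrative blocked keywords; a real list would be far larger.
BLOCKED_KEYWORDS = {"verify", "login", "click", "free", "dear"}


def count_blocked_tokens(text, blocked=BLOCKED_KEYWORDS):
    """Count how many tokens of the text appear in the blocked keyword list."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for token in tokens if token in blocked)


def is_phishing_email(text, threshold=3):
    """Classify as phishing when the count crosses the (assumed) threshold."""
    return count_blocked_tokens(text) > threshold
```

For example, `is_phishing_email("Dear customer, click here to verify your login for free access")` finds five blocked tokens and therefore crosses the assumed threshold of three.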

Machine learning Machine learning is a complex computation process of automatic pattern recognition and intelligent decision making based on training sample data [18]. Supervised and unsupervised learning are the two main categories of machine learning. Machine learning has the ability to learn from the data without being explicitly programmed. Initially, in the training phase, a few instances (each row in the data set is called one instance) are taken to train the model with a machine-learning classifier, and then a set of new instances is loaded to check whether the model classifies them properly or not.

In paper [29], the supervised machine-learning algorithms Adaline network and back propagation network, along with a support vector machine, are used, and 15 features such as the presence of an IP-based URL, the special character (@), an added hyphen (-), the use of anchor tags, and the age of the domain are identified for phishing detection. The data set is collected from PhishTank (phishing URLs) and Alexa (trusted URLs). Because the classifiers are supervised, the output label must be known during training. Later, the testing data without output labels are given to check the efficiency of the model developed for phishing detection. The detection performance of machine learning can be measured in terms of accuracy, precision, recall, false-positive rate, and false-negative rate.
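The evaluation measures named above follow directly from the confusion matrix. The sketch below uses the standard definitions, with phishing taken as the positive class; the function name and the dictionary keys are chosen for illustration.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard detection metrics from confusion-matrix counts.

    tp: phishing correctly flagged      fp: legitimate wrongly flagged
    tn: legitimate correctly passed     fn: phishing missed
    """
    def ratio(numerator, denominator):
        # Avoid division by zero when a class is absent from the test set.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }
```

With 90 true positives, 5 false positives, 95 true negatives, and 10 false negatives, the accuracy is 185/200 = 0.925, the recall is 0.9, and the false-positive rate is 0.05.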

The Bayesian anti-phishing toolbar (B-APT) is a browser extension used to filter phishing email. The B-APT [37] has two parts:

• User interface.
• B-APT engine.

Table 1 (continued)

Types of features List of features

 Facebook  Account specific features Average count of hash-tags present in a tweet, Maximum count of hash-tags present in a tweet, Fraction of tweets with a hash-tag, Average count of URLs per tweet, Maximum count of URLs present in a tweet, Fraction of tweets with URLs, Average count of mentions per tweet, Maximum count of mentions per tweet, Fraction of tweets with mention, Average count of re-tweets per tweet, Maximum count of re-tweets per tweet, Average count of favourites per tweet

  Object specific features Average count of hash-tags per post, Maximum count of hash-tags per post, Fraction of posts with hash-tags, Average count of an occurrence of URLs per post, Maximum count of URLs present in a post, Fraction of posts with URLs, Average count of tags per post, Maximum count of tags per post


Table 2 Popular anti-phishing algorithms used in phishing detection

| Research papers | [63] | [35] | [5] | [73] | [1] | [68] | [38] | [12] | [55] |
|---|---|---|---|---|---|---|---|---|---|
| Data set source: phishing | APWG archives | PhishTank | Manual | PhishTank | PhishTank | World Wide Web | WestPac | PhishTank | PIRT report |
| Data set source: legitimate | – | Google whitelist | Manual | Alexa, Yahoo | Web crawler | World Wide Web | WestPac | Common crawl | Google search |
| Data set size: phishing | 203 archives | 200 websites | 600 emails | 100 URLs | 3611 websites | 279 websites | 613048 emails | 1 million emails | 30 samples |
| Data set size: legitimate | – | 200 websites | 400 emails | 100 URLs | 1638 websites | 100 websites | 4625 emails | 1 million emails | 500 samples |
| Approach used | Rule-based, pattern matching | Machine learning | Machine learning | Rule-based | Machine learning | Machine learning | Machine learning | Machine learning | Blacklist |
| Algorithm used | LinkGuard | TSVM | Natural language processing, Wordnet | TF-IDF | Google page rank | Support vector machine (SVM) | Decision trees | Random forest, LSTM | Blacklist generator |
| Performance in %: FPR | – | – | 2 | 1 | – | – | – | – | 9 |
| Performance in %: FNR | – | – | 4 | – | – | – | – | – | – |
| Performance in %: Precision | – | 96.4 | 99.6 | – | – | – | – | 98.6 | – |
| Performance in %: Recall | – | 90.7 | 99.3 | – | – | – | – | 98.9 | – |
| Performance in %: Accuracy | 96 | 95.5 | 99.4 | 90 | – | 84 | 99.8 | 98.7 | – |
| Performance in %: F-Measure | – | – | – | – | – | – | – | 98.7 | – |

Features (asterisk marks as extracted; the column positions of the marks are not recoverable): Email * **; Website ** **; URL * ** *; Social media (no marks); DNS (no marks)



The user interface contains a toolbar and a wizard. The toolbar interacts with the B-APT engine and provides it with the URL or HTML. The B-APT engine decides whether the incoming URL is phishing or not. It has three modules: a document object model (DOM) analyzer, a whitelist module, and a scoring module. The DOM analyzer is a JavaScript program that can navigate a website's DOM. It matches the current domain against the whitelist and also verifies the presence of any input fields. If there are none, there is no way for a user to enter personal credentials. If there is an input field on the page, the HTML is tokenized and sent to the scoring module. The scoring module assigns weights to the tokens using Bogofilter, which checks the number of times a particular token is repeated and assigns the weight accordingly. These tokens help in detecting the phishing site accurately.

The author in paper [6] proposed a machine-learning-based approach for detecting malicious URLs in social networks such as Twitter. The data are collected using the Twitter API, and the tweets that contain URLs are filtered. From those URLs, 12 features are extracted for initial assessment, and pre-processing is later performed to improve the results. Random Forest, a supervised classifier, is used to classify whether a URL is phishing or not, with a recall value of 0.92.

In another work [14], the author proposed a logo-based website detection scheme using machine learning, which has two steps. Logo extraction is the first step, where the images are extracted from the web page using a machine-learning technique. In the next step, the images are submitted to the Google search engine, and the domain information returned for the image is compared with the current domain for phishing detection.

Email metadata Email metadata is data stored in an email about the email itself [64]. Metadata is collected and stored as one file entry for each email, and these data are used to cross-verify whether the emails are correctly classified as spam or not. Metadata contains a large number of fields, but only a few of them are required to separate phishing emails from legitimate ones.

In paper [30], the metadata of WEBCO (an email system) are used for phishing email classification in DROPBOXES, with the following fields: time stamp, source IP address, SMTP “mail to”, SMTP “mail from”, From, Subject, and URLs. Three different ways are followed to classify the phishing emails [30]:

• Direct identification of DROPBOXES in WEBCO.
• Indirect identification of DROPBOXES in WEBCO.
• Identifying the source of DROPBOX email.

Table 2 (continued)

| Research paper | Limitations |
|---|---|
| [63] | LinkGuard may result in false positives, since using dotted decimal IP address instead of domain names may be desirable in some special circumstances |
| [35] | Major limitation of TSVM is that it involves an expensive matrix inverse operation when solving the dual problem |
| [5] | The data set size is small. The machine-learning classifier needs more data for training the model to get good results |
| [73] | It fails if the phisher uses a different language other than English. It is a time-consuming process as it queries Google each time. It also fails in the following cases: (a) using images in place of text, (b) using invisible text, (c) changing the words to confuse the system |
| [1] | Google page rank algorithm can't classify phishing attacks correctly if it is a newly registered domain |
| [68] | A smaller number of mislabeled examples can drastically affect DNS phishing attacks |
| [38] | They considered only one part of features and they didn't address DNS phishing attacks |
| [12] | The inner works are not easy to interpret in LSTM. The random forest required expert knowledge for feature selection |
| [55] | Accuracy in detecting new phishing attacks is based on the updates received. It has a high false-positive rate |

Pattern matching Pattern matching is normally used to detect unknown phishing attacks. In pattern matching, the DNS information is verified to spot malicious links. Sometimes, the DNS name in the URL differs from the DNS name in the sender information. Pattern matching compares these two names to identify the phishing URL.

In paper [63], the LinkGuard algorithm is used for phishing detection using pattern matching. Pattern matching can be done by extracting the domain names from the URL and from the sender information; if these two pieces of information do not match, the link can be treated as phishing. It can also be done by manually storing a list of domain names and comparing the current domain name with that list to generate a similarity score, which helps to distinguish phishing URLs from legitimate URLs. DontPhishMe [43] is a browser extension (Firefox) that uses pattern matching for phishing detection.
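Both variants of pattern matching can be sketched briefly. This is not the LinkGuard algorithm itself: the helper names, the removal of a leading `www.`, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def url_domain(url: str) -> str:
    """Host name of a URL, lower-cased and without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def sender_domain(address: str) -> str:
    """Domain part of a sender address such as 'Name <user@example.com>'."""
    return address.rsplit("@", 1)[-1].strip("> ").lower()


def domains_mismatch(url: str, sender: str) -> bool:
    """True when the link's domain differs from the sender's domain."""
    return url_domain(url) != sender_domain(sender)


def best_similarity(domain: str, known_domains):
    """Return (score, closest known domain) using a generic string ratio."""
    return max(
        (SequenceMatcher(None, domain, known).ratio(), known)
        for known in known_domains
    )
```

A high similarity score combined with a mismatch (for example `paypa1.com` against a stored `paypal.com`) is the kind of signal a look-alike domain would produce; the score at which a link is treated as phishing would have to be chosen empirically.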

Blockchain Blockchain-based solutions are good at detecting phishing attacks at the DNS level. Because they maintain their own naming system and every user can keep a complete copy of the information locally, any correction made is automatically updated everywhere. Namecoin, Blockstack, Nebulis, and Bitforest are examples of blockchain-based naming systems.

Namecoin [20] was developed by modifying the Bitcoin source code to store information other than digital currency. It is the first blockchain-based naming system and introduced the concept of merged mining (mining more than one cryptocurrency) in the blockchain. Information stored in a blockchain is like an open ledger that is available to everyone in a decentralised manner. Data in the blockchain are immutable, so unauthorized modifications are not possible.

In paper [9], the author proposed an alternative naming system called Blockstack that addresses the limitations found in their previous work with Namecoin [20]. Blockstack is a blockchain-based naming and storage system. The main limitation of Namecoin is storage, which Blockstack addresses by providing a separate layer for storage. It splits the stored domain information into zones and maintains them in separate layers. Layer 1 is used for consensus on the data stored in the blockchain. Layer 2 is a virtual layer for Blockstack operations and maintains a virtual chain. Layer 3 is the routing layer, which helps in fetching data from the actual source and supports multiple storage providers. Layer 4 is the top layer, where the actual data are stored. These layers help in locating the data quickly, and it is the best alternative to the DNS naming system.

Blacklist The blacklist-based approach maintains a list of known phishing URLs and checks whether the currently visited URL is in that list. Phishing data can be collected manually or from a third party. The approach detects phishing in an easy and effective manner. However, a newly registered phishing domain cannot be identified unless the list is updated frequently [28, 41, 72].

Whitelist The whitelist does not maintain any phishing data. Instead, it maintains a list of all trusted websites' information. Any URL that does not appear in the whitelist is treated as suspicious. The whitelist should therefore cover all trusted sites. However, it is not easy to maintain all the legitimate sites on the web under one roof to decide the legitimacy of a web page [28, 64].
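The two list-based checks can be combined in one small lookup. The sketch below is only an illustration of the principle: the function name, the three verdict labels, and the choice to match the blacklist on either the full URL or the host name are assumptions.

```python
from urllib.parse import urlparse


def classify_by_lists(url: str, blacklist: set, whitelist: set) -> str:
    """Classify a URL using a phishing blacklist and a trusted-site whitelist."""
    host = (urlparse(url).hostname or "").lower()
    if url in blacklist or host in blacklist:
        return "phishing"    # known phishing entry
    if host in whitelist:
        return "trusted"     # known legitimate site
    return "suspicious"      # unknown: not covered by either list
```

The last branch shows the weakness described above: a newly registered phishing domain is in neither list, so the outcome depends entirely on how quickly the lists are updated.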

Domain popularity The domain popularity-based approach [30] works based on the certificate details, domain registration details, certificate authority, and so on. If the user clicks on a suspicious link, the browser extension sends the link to a server under its control, which extracts features such as the domain name, validity, and certificate authority and verifies this information with Google. Based on the results, the toolbar alerts the user.

Restricted form filling Restricted form filling [47, 67] is implemented as an anti-phishing browser extension that keeps track of user credentials and alerts users when they try to enter that information on a fraudulent site. The credentials of the user are stored and protected with a master password. The next time the user visits the site, there is no need to enter the credentials again; instead, the user just clicks on the icon provided by the browser extension. Once logged in to the browser extension, the user can log in to any of the websites without entering the credentials again. The anti-phishing tools [47, 67] maintain a database to store the login details of the users. These login details can be accessed from any system by simply installing the extension/toolbar.

Dummy content filling Dummy content filling [69] is implemented as a browser extension that helps the user not to fall victim to phishing. When the user visits a fraudulent site by ignoring the security alert, BogusBitter splits the credentials (S) into a set of S-1 bogus credentials; then, it starts submitting the credentials one by one with a delay of a few milliseconds and validates the web page. If the user clicks on the warning alert and goes back to the original site, then the credentials are filled in on the trusted site.

Layout similarity The layout-similarity-based approach works by comparing the layout of web pages. This can be performed with the help of the document object model (DOM), an internal representation of the web page. Extracting the DOM tree from the web page can be achieved in two ways:

• Simple HTML tags.
• Identifying the isomorphic sub-trees.

To detect phishing sites, the DOM tree is extracted from both websites (i.e., the current website and the original website). If both websites have the same layout, then the current website is a phishing site that replicates the layout of the original website. DOM AntiPhish is an example of a layout-similarity-based approach. In DOM AntiPhish [51], the password is hashed, and the DOM tree of the website where the user first entered his/her credentials is stored. Later, if the credentials are used on any other site, it compares the layout of the page to see whether the current website is phishing or not.
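The "simple HTML tags" variant of layout comparison can be sketched with the Python standard library. This is a simplified stand-in for illustration, not the DOM AntiPhish method: it compares the flat sequence of start tags instead of full DOM trees or isomorphic sub-trees, and the class and function names are assumptions.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the sequence of start tags as a crude layout fingerprint."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html: str) -> list:
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def layout_similarity(html_a: str, html_b: str) -> float:
    """Similarity in [0, 1] between the tag sequences of two pages."""
    return SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b)).ratio()
```

Two pages with identical markup structure but different text and attributes score 1.0 here; a tool would then raise an alert when a page with a near-identical layout is served from a domain other than the one where the credentials were first used.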

User website rating In user website rating [21, 44], feedback on the website is collected from users, and based on that feedback, the website's trustworthiness is decided. When customers visit the site, they rate its legitimacy, so the website can be classified according to the user responses. Other features are considered along with this rating to decide whether the website is a phishing site or not.

Crowdsourcing The web of trust (WOT) is a crowdsourcing-based browser extension that depends on the ratings users give to the websites they visit [76]. It protects the user from attacks that can only be identified by the human eye, such as scams, unreliable web stores, and questionable content. WOT is a patented system, where the behaviour of the user is regularly observed and analyzed to justify the rating. WOT works as follows: when the user searches for some content in a search engine, the search results are displayed with indicators at the corner. Green indicates a trusted site, yellow indicates a doubtful site, and red indicates a suspicious site.

Steganography-based The steganography approach uses a novel robust message-based image steganography (RMIS) algorithm [61]. Pre-processing is the first step in the RMIS technique; it outputs the embedding sequence by converting binary values to decimal values. Next, the product of the embedding sequence and the image size (rows × columns) gives the Stego-Key. The embedding phase hides the secret message in the given cover image in such a way that the resultant Stego-image cannot be distinguished by the human visual system (HVS). The extraction phase extracts the embedded secret message from the Stego-image using the same secret key as in the embedding phase. A bank website that wishes to use the Pixastic plug-in should incorporate the Stego-image generated by the RMIS embedding algorithm in its website.

One-time password The one-time password (OTP) is very important for present-day financial security: it helps to defend against session hijacking attacks and ensures that only the valid customer can perform the transaction. The OTP is sent by the server to the customer during a transaction, either to the mobile phone or to the email address already registered with the concerned bank account. Only if the OTP entered by the user matches does the bank allow the particular transaction [54]. The single password protocol (SPP) allows customers to use one-time passwords for their accounts. There are two one-time password protocols, namely, Lamport's one-time password and Rubin's one-time password. These two protocols work prior to the operation of SSL.
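Lamport's one-time password scheme, named above, is based on a hash chain: the server stores the n-th hash of a secret seed, the user presents the (n-1)-th hash, and the server accepts it if hashing it once reproduces the stored value. The sketch below illustrates that principle only; SHA-256, the class name, and the method names are choices made for the example and are not prescribed by the text.

```python
import hashlib
import hmac


def hash_n(seed: bytes, n: int) -> bytes:
    """Apply SHA-256 to the seed n times."""
    value = seed
    for _ in range(n):
        value = hashlib.sha256(value).digest()
    return value


class LamportVerifier:
    """Server side: stores only the last accepted value of the hash chain."""

    def __init__(self, final_hash: bytes):
        self.stored = final_hash

    def verify(self, otp: bytes) -> bool:
        candidate = hashlib.sha256(otp).digest()
        if hmac.compare_digest(candidate, self.stored):
            self.stored = otp  # the next OTP must hash to this one
            return True
        return False
```

With a chain of length 5, the first valid password is `hash_n(seed, 4)`, the second is `hash_n(seed, 3)`, and so on. A password captured by a phisher cannot be replayed, because the stored value has already moved one step down the chain.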

The author in paper [32] proposed a visual cryptography technique for phishing website detection. Image-based verification is applied in this technique, i.e., the original CAPTCHA is split into two shares: one is kept with the user and the other with the server. During authentication, both shares must be presented together; only then is the CAPTCHA used as a password. This helps the two parties authenticate each other before connecting.

Watermarking Watermarking can be used to keep the user from entering credentials on a fraudulent site. In this approach, the watermarking image selected by the user, the position of the watermarking image, and a secret key are collected at the time of registration. Based on this information, a customer is identified uniquely. When the user tries to log in, he/she first identifies the position where the watermarking image is fixed and then enters the secret key to authenticate oneself [56].

DNS-based An advanced form of phishing, i.e., phishing without a lure, is called pharming or DNS-based phishing. To detect DNS-based phishing, we have to determine whether the IP address provided by the DNS server is genuine or fake. In a DNS attack, the phisher replaces the DNS entries of a targeted domain with the IP address of the phisher's server to redirect the traffic.

In paper [13], a database is maintained to store the bank name, its DNS server's IP address, and the user's personal credentials. If the user's personal credentials are being entered on some other site, an inverse DNS query is sent to the respective bank to confirm whether the site is a domain of that bank or not. Only then does the device allow the transaction to proceed.

In paper [26], the author developed a dual approach to detect client-side pharming attacks. When a user requests a website, the DNS request is sent to two DNS servers, i.e., the local (default) DNS and a third-party DNS. The approach checks whether the IP address given by the local DNS is included in the list of IP addresses obtained from the third-party DNS server and allows the user to proceed only if it matches. Otherwise, it collects the source code of the current page and of the original site (from the third-party DNS) for web content analysis [26]. A score is calculated and compared with a threshold value, and the site is considered a phishing site if the score crosses the threshold.
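The control flow of the dual DNS check can be sketched as follows. The resolvers and the content-analysis step are passed in as callables so that the example stays self-contained; their names, the meaning of the score (higher means more suspicious), and the default threshold are assumptions for illustration, not details from the cited work.

```python
from typing import Callable, Iterable


def pharming_verdict(
    domain: str,
    local_resolver: Callable[[str], str],
    third_party_resolver: Callable[[str], Iterable[str]],
    content_score: Callable[[str], float],
    threshold: float = 0.5,
) -> str:
    """Compare the local DNS answer with a third-party DNS answer.

    If the local IP is among the third-party IPs, the site is allowed.
    Otherwise the (hypothetical) content-analysis score decides.
    """
    local_ip = local_resolver(domain)
    trusted_ips = set(third_party_resolver(domain))
    if local_ip in trusted_ips:
        return "allow"
    score = content_score(domain)
    return "phishing" if score > threshold else "allow"
```

In practice the two resolvers would query different DNS servers and the score would come from comparing the two pages' source code; here they are left abstract because the text does not specify them.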

Hashing‑Based

Hashing techniques can be used to protect user credentials by hashing the password, domain name, email ID, etc., which helps in verifying the site before the password is provided. Password hashing techniques available for phishing detection include Passpet [67] and PwdHash [53].

In paper [67], a single password (master password) is provided to manage multiple accounts. A user-assigned pet name helps to identify each site uniquely. The password (master password) is generated using hashing techniques.


In paper [53], the hashing technique is applied to generate a separate password for each site. The hashed password is generated by combining the domain and the password of that site. As a result, if the password of one site becomes known, the other sites are not affected.
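The idea of deriving a per-site password from the domain and the user's password can be sketched as below. This is not the actual PwdHash algorithm: the use of HMAC-SHA256, the base64 encoding, the output length, and the function name are assumptions chosen to keep the example short.

```python
import base64
import hashlib
import hmac


def site_password(user_password: str, domain: str, length: int = 16) -> str:
    """Derive a site-specific password from the user's password and the domain."""
    digest = hmac.new(
        user_password.encode("utf-8"),    # key: the password the user types
        domain.lower().encode("utf-8"),   # message: the domain being visited
        hashlib.sha256,
    ).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]
```

Because the domain is part of the input, a look-alike phishing domain receives a different derived password from the genuine site, so the value captured by the phisher does not work on the real account.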

Existing Anti‑phishing Browser Extensions/Toolbars

Most anti-phishing solutions are available as a browser extension/toolbar. When users install an anti-phishing toolbar/browser extension, it keeps monitoring user activity and alerts them when they try to access a suspicious link. A few approaches are still at the research level and have not yet evolved into browser extensions. In Fig. 5, the black rectangular boxes are the approaches that have evolved into browser extensions, and the pink rectangular boxes are the approaches still at the research level.

Maturity Level

The maturity level of anti-phishing approaches is categorized into two types:

Anti‑phishing Approaches that Evolved as Browser Extensions/Toolbars

Anti-phishing browser extensions are very useful in protecting Internet users from phishing attacks. Different types of anti-phishing solutions are available, and each of them follows one or more approaches such as blacklist, whitelist, heuristics, layout similarity, and machine learning.

Some popular browser extensions/toolbars are listed in Table 3, together with the approach used, the mode of operation, and the advantages and disadvantages of each toolbar.

Anti‑phishing Approaches at Research Level

A lot of research is going on to find better solutions for the prevention of phishing attacks. Approaches such as watermarking, one-time password, and email metadata are at the research level and have not fully evolved into browser extensions/toolbars. Corporate companies, banks, anti-phishing organizations (APWG, PhishTank, PhishME, and so on), and many others are fighting against phishing. Machine learning, rule-based, and list-based approaches (blacklist, whitelist) are available as browser extensions, and further research work on them is also available.

Mode of Operation

Anti-phishing toolbars work on the data sets used by the different anti-phishing approaches to detect phishing scams. Some toolbars maintain their own data set to check whether a given link is phishing or not. A few toolbars depend on a third party for phishing detection. Depending on the anti-phishing approach and the data set used, the mode of operation can be classified as follows:

1. Stand-alone In stand-alone mode, the toolbar maintains its own database or predefined rules for decision making. It classifies phishing and non-phishing content from the locally available information. Antiphish, BogusBitter, and PhishZoo are examples of tools that work independently.

2. From server In this mode, the anti-phishing tool gets help from its own server to check whether the given website or URL is phishing or not, for example, by maintaining an updated blacklist and whitelist to separate malicious URLs from trusted ones. TrustWatch, Pixastic, and PhishProof are fully dependent on their server.

3. From third party In some cases, anti-phishing tools must depend on third parties for better classification, for example, to verify the DNS information, domain validity, or SSL certification, to verify URLs against a blacklist through an API, or to extract text from an image. GoldPhish, LinkGuard, and web of trust (WOT) are a few works that come under this category.

Discussion

In this paper, a taxonomy of anti-phishing solutions is discussed. The anti-phishing solutions are broadly classified into content-based and non-content-based approaches, which are briefly explained. The research questions raised are answered below:

RQ1 What are the areas that current anti-phishing solutions address?

Compared with non-content-based approaches, content-based approaches are better at detecting phishing. New phishing attacks are difficult to detect with non-content-based approaches because of the delay in their updates. Content-based approaches such as rule-based and machine learning are good at detection, but machine-learning approaches may sometimes have high false-positive rates. Blockchain-based solutions (Blockstack) are good at detecting DNS phishing (pharming). Different approaches use different anti-phishing algorithms for phishing detection. Mobile phishing, voice phishing, and social media phishing are the areas where more research is required.


Table 3 List of existing anti-phishing browser extensions

| S. no. | Name of the toolbar | Approach used | Mode of operation | PROS | CONS |
|---|---|---|---|---|---|
| 1. | AntiPhish [47] | Restricted form filling | Stand-alone | AntiPhish detects phishing attacks correctly if it is purely an HTML web page | It requires manual interaction of the user. Generates false alarms |
| 2. | B-APT [37] | Machine learning | Stand-alone | It uses machine-learning approach with DOM analyzer for phishing detection | B-APT is vulnerable to website spoofing attack |
| 3. | BogusBitter [69] | Dummy content filling | Stand-alone | It feeds a large number of bogus credentials to protect the user credentials from the phisher | The phisher uses filtering techniques to collect the credentials |
| 4. | DOM AntiPhish [51] | Layout similarity | Stand-alone | The browser automatically stores the user password by hashing it. If the password is reused it will give an alert to the users | Spoofed web pages with similar images and visual looks of the legitimate site to fool the user |
| 5. | Dynamic security skin [16] | Visual similarity | Server | The user has to remember an image and an image to authenticate oneself to the server. To authenticate, the user has to perform a visual matching | There is a chance of leaking the verifier, leak of images, visual contents can be spoofed by the phisher |
| 6. | eBay Account Guard [22] | Heuristic, blacklist | Server | It allows users to submit the suspected sites to eBay which can be added to their blacklist | Only applicable to eBay and PayPal sites and denial of service attacks are possible |
| 7. | FirePhish [60] | Open database | Server | It maintains its own database to store the phishing site for better detecting the attacks | They have to maintain their own safe and phishing sites |
| 8. | GoldPhish [19] | Visual similarity | Third party | Protects from zero-day phishing | Delays the rendering of a web page. Google PageRank algorithm is vulnerable to new phishing attacks |
| 9. | iTrustPage [50] | Blacklist, whitelist | Third party | It is effective and easy to use | Phishing pages should be discovered quickly and added to a blacklist. The blacklist alone can't be a better solution for phishing detection |
| 10. | LinkGuard [63] | Blacklist, whitelist, pattern matching | Third party | It detects known and unknown attacks with an accuracy of 96%. There is no false positive and false negatives for category 1 | False positives can possible in category 2 solution in the case of IP address verification in the place of domain name |
| 11. | McAfee site advisor [57] | Rating the site with their own tests | Server | McAfee maintains their own database that uses automatic crawlers that search the sites and perform tests and includes in the database | It is vulnerable to detect phishing sites with embedded objects |
| 12. | Microsoft smart screen filter [40] | Blacklist, heuristics | Server | It provides additional security at the network level. It also protects from malicious attachments like keyloggers | It may be vulnerable to newly created phishing attacks if the blacklist not regularly updated |


Tabl

e 3

(con

tinue

d)

S. n

o.N

ame

of th

e to

olba

rA

ppro

ach

used

Mod

e of

ope

ratio

nPR

OS

CON

S

13.

Net

craf

t [44

]B

lack

list,

heur

istic

s, us

er ra

ting

Stan

d-al

one

It al

low

s phi

shin

g si

te fe

ed, p

rovi

des

phis

hing

ale

rts, m

appi

ng o

f cur

rent

ph

ishi

ng a

ttack

s

The

info

rmat

ion

like

site

rank

, IP

addr

ess,

web

serv

er, n

et-b

lock

ow

ner,

and

last

chan

ges m

ade

can

help

the

phis

her i

n m

any

way

s14

.Pa

sspe

t [67

]Re

stric

ted

form

filli

ngSe

rver

Allo

ws t

he u

ser t

o re

mem

ber o

nly

pass

-w

ord

to lo

g in

with

mul

tiple

syste

ms

Vul

nera

ble

to p

harm

ing

atta

ck. T

he

phis

her c

an st

eal t

he c

rede

ntia

ls o

f no

n-SS

L pr

otec

ted

site

s by

hija

ckin

g. It

is

als

o vu

lner

able

to o

fflin

e di

ctio

nary

at

tack

s15

.Ph

ishP

roof

[70]

Bla

cklis

t, w

hite

list,

heur

istic

sSe

rver

Phis

hPro

of u

ses t

hree

leve

ls o

f sec

urity

. It

aler

ts th

e us

ers o

n ph

ishi

ng si

tes.

Use

r inp

ut is

not

requ

ired.

Use

r can

al

so re

port

phis

hing

site

s

It ca

nnot

pro

tect

the

user

s fro

m m

alw

are

16.

Phis

hTan

k Si

te C

heck

er [6

2]O

pen

data

base

Serv

erIt

bloc

ks th

e us

ers f

or th

e si

tes w

hich

are

al

read

y re

porte

d as

phi

shin

g in

thei

r op

en d

atab

ase

New

phi

shin

g at

tack

s bec

ome

diffi

cult

to

dete

ct u

nles

s the

dat

abas

e is

upd

ated

fr

eque

ntly

. It i

s slo

w, b

ecau

se th

e us

ers

have

to re

port

the

site

as p

hish

ing

17.

Phis

hZoo

[4]

Con

tent

sim

ilarit

ySe

rver

Phis

hZoo

cre

ates

thei

r ow

n tru

sted

pro-

files

with

legi

timat

e si

tes u

sing

a fu

zzy

hash

ing

tech

niqu

e to

det

ect p

hish

ing

Phis

hZoo

is v

ulne

rabl

e to

web

site

spoo

fing

atta

ck

18.

Pixa

stic

[61]

Steg

ano-

grap

hy-b

ased

Serv

erRo

bust

mes

sage

-bas

ed im

age

stegn

ogra

-ph

y al

gorit

hm is

use

d to

hid

e th

e se

cret

im

age

and

prot

ect t

he u

sers

not

to e

nter

th

e pe

rson

al c

rede

ntia

ls in

phi

shin

g w

ebsi

tes

Vul

nera

ble

to D

NS

spoo

fing

atta

ck, b

rute

fo

rce

atta

ck, a

nd p

rint s

cree

n is

als

o po

ssib

le

19.

Spoo

fGua

rd [1

5]H

euris

tics

Stan

d-al

one

The

adva

ntag

e of

this

tool

bar i

s sto

ping

th

e ou

tgoi

ng d

ata

to p

hish

ing

site

s by

perfo

rmin

g im

age

chec

k an

d pa

ssw

ord

chec

k

It sh

ows a

fals

e al

arm

whe

n th

e us

er v

isits

th

e le

gitim

ate

site

for t

he fi

rst t

ime

20.

Spoo

fStic

k [3

9]–

Stan

d-al

one

The

user

can

cha

nge

the

appe

aran

ce o

f th

e to

olba

r bec

ause

of i

ts u

ser-f

riend

-lin

ess a

nd th

ey a

ddre

ss th

e gr

aphi

cs

prop

erty

Vul

nera

ble

to if

ram

es a

ttack

if th

e us

er

open

s mul

tiple

win

dow

s, w

hile

surfi

ng

21.

The

Earth

link

tool

bar [

21]

Heu

ristic

s, us

er ra

ting

Serv

erIt

rela

ys o

n th

e co

mbi

natio

n of

heu

ristic

s, us

er ra

tings

and

man

ual v

erifi

catio

n.

Tool

bar d

ispl

ays a

thum

b to

indi

cate

w

heth

er th

e si

te is

phi

shin

g or

not

No

aler

t mes

sage

is d

ispl

ayed

for u

sers

. U

ser r

atin

gs p

rodu

ce m

ore

fals

e al

arm

s

22.

Trus

tWat

ch [2

7]B

lack

list

Serv

erTr

ustW

atch

pro

vide

s a p

erso

nal s

ecur

ity

ID to

pre

vent

the

tool

bar s

poofi

ng. I

t is

easy

to u

se

Vul

nera

ble

to n

ewly

cre

ated

phi

shin

g at

tack

s if t

he d

atab

ase

is n

ot u

pdat

ed

regu

larly
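Row 10 of Table 3 mentions IP address verification in place of a domain name. As a minimal sketch of what such a check could look like (this is not LinkGuard's published algorithm; the function name `host_is_ip_literal` and the example URLs are assumptions made for illustration), the following Python code flags URLs whose host is a raw IP address rather than a domain name:

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_literal(url: str) -> bool:
    """Return True when the host part of the URL is a raw IPv4/IPv6 address.

    Hypothetical helper: it only illustrates the "IP address in place of
    a domain name" heuristic and is not taken from any toolbar's code.
    """
    host = urlsplit(url).hostname
    if host is None:
        # No host could be parsed (for example, a relative link).
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # The host is not a valid IP literal, so treat it as a domain name.
        return False
    return True


if __name__ == "__main__":
    for link in ("http://192.0.2.10/login", "https://www.example.com/login"):
        print(link, "->", host_is_ip_literal(link))
```

A heuristic of this kind also flags legitimate sites that are reached by IP address, such as an internal router page, which is consistent with the false positives noted in the CONS column.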


RQ2 Do the existing anti-phishing toolbars cover all the phishing attacks?

Whenever researchers provide a solution to the problem, attackers come up with a new trick, so the two sides are in a constant race. Most anti-phishing toolbars address only a specific type of attack. BogusBiter [69] is a toolbar that feeds bogus credentials to the phishing site to protect the user's real credentials from the phisher. However, the phisher can separate out the real credentials with a simple filtering technique. Web of Trust (WOT) [76] is a crowdsourcing-based technique that depends on user ratings. If a single user rates a site as suspicious, the result can change drastically. A few toolbars use heuristics and blacklists for phishing detection. However, they may fail to detect new phishing scams if the update is delayed. Most Internet users are not aware of many phishing attacks. The performance of an anti-phishing toolbar depends on the approach and data set it uses.
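The update-delay problem of blacklist-based toolbars can be illustrated with a minimal Python sketch. The class `BlacklistChecker`, its methods, and the example host names are hypothetical and do not correspond to any toolbar's real API; the sketch only shows that a newly created phishing URL passes the check until the list is refreshed.

```python
from urllib.parse import urlsplit


class BlacklistChecker:
    """Toy blacklist lookup keyed on host name (illustrative assumption)."""

    def __init__(self, blocked_hosts):
        self.blocked_hosts = {host.lower() for host in blocked_hosts}

    def update(self, new_hosts):
        # Simulates a (possibly delayed) blacklist refresh.
        self.blocked_hosts.update(host.lower() for host in new_hosts)

    def is_blocked(self, url):
        host = urlsplit(url).hostname
        return host is not None and host in self.blocked_hosts


if __name__ == "__main__":
    checker = BlacklistChecker(["old-phish.example"])
    new_url = "http://new-phish.example/login"

    # Before the update, the new phishing site is not detected.
    print("blocked before update:", checker.is_blocked(new_url))

    # Only after the list is refreshed does the lookup catch it.
    checker.update(["new-phish.example"])
    print("blocked after update:", checker.is_blocked(new_url))
```

The window between the appearance of the site and the call to `update` is the period in which a blacklist-only toolbar offers no protection, which is why several toolbars in Table 3 combine blacklists with heuristics.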

RQ3 What are the current research gaps in anti-phishing?

Anti-phishing solutions help Internet users to accurately identify phishing attacks. Much work has been done on email phishing detection and website phishing detection and has been published in many online sources. Social media phishing is difficult to detect because of its changing nature. The need to identify fake news, fake offers, malicious attachments, links, and fake profiles makes social media phishing complicated to detect. As noted in [8], less work has been done on instant messaging, social media, voice, blogs, and web forums.

Conclusion

In the above literature survey, we discussed phishing, anti-phishing, a complete classification of anti-phishing solutions, an evolution roadmap of anti-phishing solutions, a consolidated feature list for phishing detection, and a list of existing anti-phishing toolbars. Anti-phishing solutions can be classified into two categories: (i) content-based and (ii) non-content-based approaches. Content-based approaches work by analyzing the content of the web page, email, and URL. Non-content-based approaches use non-content features such as a blacklist, whitelist, and so on. Different anti-phishing approaches use different algorithms for phishing detection. These algorithms are listed in Table 2 with their performance metrics, data sets, and limitations. Not all anti-phishing approaches have evolved into browser extensions; a few approaches that remain at the research level are also listed. The approaches at the research level and the fully evolved browser extensions are distinguished with two different colors. The pros and cons of the existing anti-phishing toolbars are also listed. The study indicates that existing anti-phishing approaches focus only on specific types of attacks. Mobile phishing, voice phishing, and social media phishing are the areas where more research is required.

Table 3 (continued)

| S. no. | Name of the toolbar | Approach used | Mode of operation | PROS | CONS |
|---|---|---|---|---|---|
| 23. | Verisign EV green bar extension [24] | Domain popularity | Server | It detects phishing sites by verifying the SSL certificates of the site | It only identifies SSL certificates issued by VeriSign, not other valid SSL certificates |
| 24. | Virtual browser extension [46] | Blacklist, heuristics, visual similarity | Third party | Alerts users if the site is not present in the whitelist it maintains | Vulnerable to key-loggers, screen loggers, and client-side scripting attacks |
| 25. | Web of trust (WOT) [76] | Blacklist, crowdsourcing | Third party | The reputation of the site is shown next to the search results. Very user-friendly | A single rating from one person can make the site appear unsafe, because it depends on user ratings |
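Row 25 of Table 3 notes that a single rating can make a site appear unsafe when reputation depends on user ratings. The Python sketch below illustrates that sensitivity under loud assumptions: the actual WOT scoring method is not described in this survey, so the code simply averages ratings on a 0 to 100 scale and applies a hypothetical `SAFE_THRESHOLD` of 70.

```python
from statistics import mean

# Hypothetical cut-off on a 0-100 scale; not taken from WOT.
SAFE_THRESHOLD = 70


def reputation(ratings):
    """Plain average of user ratings (an assumed scoring rule)."""
    return mean(ratings)


def is_shown_as_safe(ratings):
    return reputation(ratings) >= SAFE_THRESHOLD


if __name__ == "__main__":
    few_ratings = [90, 85, 95]
    many_ratings = [90] * 100

    # With only three ratings, one hostile vote flips the verdict.
    print(is_shown_as_safe(few_ratings), is_shown_as_safe(few_ratings + [0]))

    # With a hundred ratings, the same vote barely moves the average.
    print(is_shown_as_safe(many_ratings), is_shown_as_safe(many_ratings + [0]))
```

Under this assumed rule, sites with few ratings are the ones most exposed to a single malicious or mistaken vote, which matches the limitation stated in the table.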


Funding This study was not funded by anyone.

Compliance with Ethical Standards

Conflict of interest The authors declare that they have no conflict of interest.

References

1. Abunadi A, Akanbi O, Zainal A. Feature extraction process: a phishing detection approach. In: Intelligent systems design and applications (ISDA), 2013 13th international conference on. IEEE. 2013. pp. 331–335.

2. Abutair HY, Belghith A. Using case-based reasoning for phishing detection. Proc Comput Sci. 2017;109:281–8.

3. Adewumi OA, Akinyelu AA. A hybrid firefly and support vector machine classifier for phishing email detection. Kybernetes. 2016;45(6):977–94. https://doi.org/10.1108/K-07-2014-0129.

4. Afroz S, Greenstadt R. Phishzoo: detecting phishing websites by looking at them. In: 2011 IEEE fifth international conference on semantic computing. 2011. https://doi.org/10.1109/ICSC.2011.52.

5. Aggarwal S, Kumar V, Sudarsan S. Identification and detection of phishing emails using natural language processing techniques. In: Proceedings of the 7th international conference on security of information and networks. ACM, Glasgow, Scotland UK. 2014. p. 217.

6. Al-Janabi M, Quincey E, Andras P. Using supervised machine learning algorithms to detect suspicious URLs in online social networks. In: Proceedings of the 2017 IEEE/ACM international conference on advances in social networks analysis and mining 2017. ASONAM ’17, ACM, New York, NY, USA. 2017. pp. 1104–1111. https://doi.org/10.1145/3110025.3116201.

7. Alam S, El-Khatib K. Phishing susceptibility detection through social media analytics. In: Proceedings of the 9th international conference on security of information and networks. SIN ’16, ACM, New York, NY, USA. 2016. pp. 61–64. https://doi.org/10.1145/2947626.2947637.

8. Aleroud A, Zhou L. Phishing environments, techniques, and countermeasures: a survey. Comput Secur. 2017;68:160–96.

9. Ali M, Nelson JC, Shea R, Freedman MJ. Blockstack: a global naming and storage system secured by blockchains. In: USENIX annual technical conference. 2016. pp. 181–194.

10. AlShboul R, Thabtah F, Abdelhamid N, Al-diabat M. A visualization cybersecurity method based on features’ dissimilarity. Comput Secur. 2018;77:289–303.

11. Anti-Phishing Working Group. Phishing Activity Trends Report 1 Quarter. Most, no. March, 2018. pp. 1–12.

12. Bahnsen AC, Bohorquez EC, Villegas S, Vargas J, González FA. Classifying phishing urls using recurrent neural networks. In: 2017 APWG symposium on electronic crime research (eCrime). 2017. pp. 1–8. https://doi.org/10.1109/ECRIME.2017.7945048.

13. Bin S, Qiaoyan W, Xiaoying L. A DNS based anti-phishing approach. In: Networks security wireless communications and trusted computing (NSWCTC), 2010 second international conference on. vol. 2. IEEE. 2010. pp. 262–265.

14. Chiew KL, Chang EH, Sze SN, Tiong WK. Utilisation of website logo for phishing detection. Comput Secur. 2015;54:16–26. https://doi.org/10.1016/j.cose.2015.07.006.

15. Chou N, Ledesma R, Teraguchi Y, Mitchell JC, Ca S. Client-side defense against web-based identity theft. In: NDSS 2004.

16. Dhamija R, Tygar JD. The battle against phishing: dynamic security skins. In: Proceedings of the 2005 symposium on usable privacy and security. ACM, 2005. pp. 77–88.

17. Dou Z, Khalil I, Khreishah A, Al-Fuqaha A, Guizani M. Systematization of knowledge (SoK): a systematic review of software-based web phishing detection. IEEE Commun Surv Tutor. 2017;19(4):2797–819. https://doi.org/10.1109/COMST.2017.2752087.

18. Dua S, Du X. Data mining and machine learning in cybersecurity. Boca Raton: CRC Press; 2016.

19. Dunlop M, Groat S, Shelly D. Goldphish: using images for content-based phishing analysis. In: 2010 Fifth international conference on internet monitoring and protection. 2010. pp. 123–128. https://doi.org/10.1109/ICIMP.2010.24.

20. Durham V. Namecoin. 2011. https://namecoin.info. Accessed Sept 2018.

21. Earthlink: Spam Blocker. 1994. http://www.earthlink.net/elink/issue95/security_archive.html. Accessed Oct 2018.

22. eBay Toolbar and Account Guard. http://pages.ebay.in/help/account/toolbar-account-guard.html. Accessed 5 Oct 2018.

23. Fette I, Sadeh N, Tomasic A. Learning to detect phishing emails. In: Proceedings of the 16th international conference on World Wide Web. ACM; 2007. pp. 649–656. https://doi.org/10.1145/1242572.1242660. Accessed May 2019.

24. Firefox: Verisign for firefox. 2007. https://addons.mozilla.org/en-US/firefox/addon/verisignev-green-bar-extensio/. Accessed Aug 2018.

25. Gastellier-Prevost S, Granadillo GG, Laurent M. Decisive heuristics to differentiate legitimate from phishing sites. In: 2011 Conference on Network and Information Systems Security. IEEE; 2011. pp. 1–9. https://doi.org/10.1109/SAR-SSI.2011.5931389.

26. Gastellier-Prevost S, Granadillo GG, Laurent M. A dual approach to detect pharming attacks at the client-side. In: New technologies, mobility and security (NTMS), 2011 4th IFIP international conference on. IEEE. 2011. pp. 1–5.

27. GeoTrust: TrustWatch Toolbar. https://www.geotrust.com/comcasttoolbar/. Accessed Nov 2018.

28. Gupta BB, Tewari A, Jain AK, Agrawal DP. Fighting against phishing attacks: state of the art and future challenges. Neural Comput Appl. 2017;28(12):3629–54.

29. Hajgude J, Ragha L. Phish mail guard: phishing mail detection technique by using textual and url analysis. In: 2012 World congress on information and communication technologies. 2012. pp. 297–302. https://doi.org/10.1109/WICT.2012.6409092.

30. Herzberg A, Jbara A. Security and identification indicators for browsers against spoofing and phishing attacks. ACM Trans Internet Technol. 2008;8(4):1–36. https://doi.org/10.1145/1391949.1391950.

31. Jagatic TN, Johnson NA, Jakobsson M, Menczer F. Social phishing. Commun ACM. 2007;50(10):94–100.

32. James D, Philip M. A novel anti phishing framework based on visual cryptography. In: 2012 International conference on power, signals, controls and computation. 2012. pp. 1–5. https://doi.org/10.1109/EPSCICON.2012.6175228.

33. Jeeva SC, Rajsingh EB. Intelligent phishing URL detection using association rule mining. Hum Centric Comput Inf Sci. 2016;6(1):10.

34. Laorden C, Ugarte-Pedrero X, Santos I, Sanz B, Bringas PG. Enhancing scalability in anomaly-based email spam filtering. In: Proceedings of the 8th annual collaboration, electronic messaging, anti-abuse and spam conference. CEAS ’11, ACM, New York, NY, USA, 2011. pp. 13–22. https://doi.org/10.1145/2030376.2030378.

35. Li Y, Xiao R, Feng J, Zhao L. A semi-supervised learning approach for detection of phishing webpages. Optik Int J Light Electron Opt. 2013;124(23):6027–33.


36. Li Y, Yang L, Ding J. A minimum enclosing ball-based support vector machine approach for detection of phishing websites. Optik Int J Light Electron Opt. 2016;127(1):345–51.

37. Likarish P, Jung E, Dunbar D, Hansen TE, Hourcade JP. B-apt: Bayesian anti-phishing toolbar. In: Communications, 2008. ICC’08. IEEE international conference on. IEEE. 2008. pp. 1745–1749.

38. Ma L, Ofoghi B, Watters P, Brown S. Detecting phishing emails using hybrid features. In: Ubiquitous, autonomic and trusted computing, 2009. UIC-ATC’09. Symposia and workshops on. IEEE. 2009. pp. 493–497.

39. Majorgeeks: SpoofStick. 2004. http://www.majorgeeks.com/files/details/spoofstick_for_internet_explorer.html. Accessed Nov 2018.

40. Microsoft: Microsoft Smart Screen Filter. https://support.microsoft.com/en-in/help/17443/windows-internet-explorer-smartscreen-filter-faq. Accessed Oct 2018.

41. Mishra M, Jain A. Anti-phishing techniques: a review. 2012;2(2):350–5.

42. Mohammad RM, Thabtah F, McCluskey L. Intelligent rule-based phishing websites classification. IET Inf Secur. 2014;8(3):153–60.

43. MYCERT: About DontPhishMe toolbar. 2010. http://www.brothersoft.com/dontphishme-390951.html. Accessed Dec 2018.

44. Netcraft: Netcraft Toolbar. 2004. http://toolbar.netcraft.com/. Accessed Dec 2018.

45. Purkait S. Phishing counter measures and their effectiveness-literature review. Inf Manag Comput Secur. 2012;20(5):382–420.

46. Purkait S. Preventing phishing attacks with virtual browser extension. IUP J Inf Technol. 2013;9(3):7.

47. Raffetseder T, Kirda E, Kruegel C. Building anti-phishing browser plug-ins: an experience report. In: Proceedings of the third international workshop on software engineering for secure systems. IEEE Computer Society. 2007. p. 6.

48. Rathore S, Loia V, Park JH. Spamspotter: an efficient spammer detection framework based on intelligent decision support system on facebook. Appl Soft Comput. 2018;67:920–32. https://doi.org/10.1016/j.asoc.2017.09.032.

49. Rathore S, Sangaiah AK, Park JH. A novel framework for internet of knowledge protection in social networking services. J Comput Sci. 2018;26:55–65. https://doi.org/10.1016/j.jocs.2017.12.010.

50. Ronda T, Saroiu S, Wolman A. iTrustPage: pretty good phishing protection. Toronto: University of Toronto; 2007.

51. Rosiello APE, Kirda E, Kruegel C, Ferrandi F. A layout-similarity-based approach for detecting phishing pages. In: 2007 third international conference on security and privacy in communications networks and the workshops - SecureComm 2007. 2007. pp. 454–463. https://doi.org/10.1109/SECCOM.2007.4550367.

52. Rosiello A. Anti-phishing security strategy. Black Hat Briefing. 2008. pp. 1–31. https://www.blackhat.com/presentations/bh-europe-08/Rosiello/Presentation/bh-eu-08-rosiello.pdf.

53. Ross B, Jackson C, Miyake N, Boneh D, Mitchell JC. Stronger password authentication using browser extensions. In: Proceedings of the 14th conference on USENIX security symposium, vol. 14. SSYM’05, USENIX Association, Berkeley, USA. 2005. pp. 2–2. http://dl.acm.org/citation.cfm?id=1251398.1251400. Accessed Oct 2018.

54. San Martino A, Perramon X. A model for securing e-banking authentication process: antiphishing approach. In: Services-part I, 2008. IEEE Congress on. IEEE. 2008. pp. 251–254.

55. Sharifi M, Siadati SH. A phishing sites blacklist generator. In: 2008 IEEE/ACS international conference on computer systems and applications. 2008. pp. 840–843. https://doi.org/10.1109/AICCSA.2008.4493625.

56. Singh AP, Kumar V, Sengar SS, Wairiya Manoj EVV, Thomas G, Lumban Gaol F. Detection and prevention of phishing attack using dynamic watermarking. In: Information technology and mobile communication. Berlin: Springer; 2011. pp. 132–137.

57. SiteAdvisor: McAfee Site Advisor. 2006. https://en.wikipedia.org/wiki/McAfee_SiteAdvisor. Accessed July 2018.

58. Smadi S, Aslam N, Zhang L. Detection of online phishing email using dynamic evolving neural network based on reinforcement learning. Decis Support Syst. 2018;107:88–102. https://doi.org/10.1016/j.dss.2018.01.001.

59. Sonowal G, Kuppusamy K. Phidma - a phishing detection model with multi-filter approach. J King Saud Univ Comput Inf Sci. 2017. https://doi.org/10.1016/j.jksuci.2017.07.005.

60. Sureshkumar A, Palanisamy S, Sowmiya RAS. Data isolation and protection in online social networks. In: 2013 International conference on information communication and embedded systems (ICICES). 2013. pp. 150–155. https://doi.org/10.1109/ICICES.2013.6508228.

61. Thiyagarajan P, Mahindra VPV. Pixastic: steganography based anti-phishing browser plug-in. J Internet Bank Commerce. 2012;17(1):1–19.

62. Ulevitch D. PhishTank site checker. 2006. https://addons.mozilla.org/en-US/firefox/addon/phishtank-sitechecker/.

63. Naresh U. Intelligent phishing website detection and prevention system by using link guard algorithm. IOSR J Comput Eng IOSR-JCE. 2013;14(3):28–36.

64. Vaishnaw N, Tandan SR. A bird’s eye view of anti-phishing techniques for classification of phishing e-mails. Int J Res Appl Sci Eng Technol. 2015;3(6):263–75.

65. Vishwanath A. Getting phished on social media. Decis Support Syst. 2017;103:70–81. https://doi.org/10.1016/j.dss.2017.09.004.

66. Wang R, Zhu Y, Tan J, Zhou B. Detection of malicious web pages based on hybrid analysis. J Inf Secur Appl. 2017;35:68–74.

67. Yee KP, Sitaker K. Passpet: convenient password management and phishing protection. In: Proceedings of the second symposium on Usable privacy and security. ACM. 2006. pp. 32–43.

68. Ying P, Xuhua D. Anomaly based web phishing page detection. In: 2006 22nd Annual Computer Security Applications Conference (ACSAC’06). IEEE, 2006. pp. 381–390. https://doi.org/10.1109/ACSAC.2006.13.

69. Yue C, Wang H. Bogusbiter: a transparent protection against phishing attacks. ACM Trans Internet Technol (TOIT). 2010;10(2):6.

70. Zahid T. An anti-phishing tool: Phishproof. Ph.D. thesis, University of Manchester. 2012.

71. Zhang H, Liu G, Chow TW, Liu W. Textual and visual content-based anti-phishing: a Bayesian approach. IEEE Trans Neural Netw. 2011;22(10):1532–46. https://doi.org/10.1109/TNN.2011.2161999.

72. Zhang Y, Egelman S, Cranor LF, Hong J. Phinding phish: evaluating anti-phishing tools. 2006.

73. Zhang Y, Hong JI, Cranor LF. Cantina: A content-based approach to detecting phishing web sites. In: Proceedings of the 16th international conference on world wide web. WWW ’07, ACM, New York, NY, USA, 2007. pp. 639–648. https://doi.org/10.1145/1242572.1242659.

74. Zhang N, Yuan Y. Phishing detection using neural network - CS229 lecture notes. 2012.

75. Zhou Y, Zhang Y, Xiao J, Wang Y, Lin W. Visual similarity based anti-phishing with the combination of local and global features. In: Proceedings - 2014 IEEE 13th international conference on trust, security and privacy in computing and communications, TrustCom 2014, 2014. pp. 189–196. https://doi.org/10.1109/TrustCom.2014.28.

76. Zimmermann P. Web of trust (WOT). 1992. https://www.mywot.com/en/aboutus. Accessed May 2018.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.