![Page 1: 25 January 2011Kaiser: COMS E61251 COMS E6125 Web-enHanced Information Management (WHIM) Prof. Gail Kaiser Spring 2011](https://reader035.vdocument.in/reader035/viewer/2022081516/5697bff81a28abf838cbf581/html5/thumbnails/1.jpg)
25 January 2011 Kaiser: COMS E6125 1
COMS E6125 Web-enHanced Information Management (WHIM)
Prof. Gail Kaiser
Spring 2011
Today’s Topic
• Basic Web Mechanics
  – URI
  – HTTP
  – Client/Server Intermediaries
What is a “URI”?
• Uniform Resource Identifier
• Compact string of characters for identifying an abstract or physical resource
• Conforms to a simple and extensible format
• Example: http://bank.cs.columbia.edu/classes/cs6125
What is a “Resource”?
• Some piece of information that can be identified by a URI
• The most common kind of resource is a file
• But it may also be a dynamically generated query result, the output of a script, a document available in several languages or formats, etc.
Uniform Resource Identifier
• Uniform (aka Universal): the same string can be used with the same semantic interpretation, even when the mechanisms used to access the resource differ
• Resource: a conceptual mapping to an entity or set of entities, not necessarily the entity that corresponds to that mapping at any particular instant in time
• Identifier: an object that can act as a reference to something that has identity
Key Requirement: Transcribability
• May be transcribed from a non-network source
• Often needs to be remembered by people
• Should consist of characters that are most likely to be easy to type into a computer, within the constraints imposed by keyboards (and related input devices) across languages and locales
Why do we usually say URL rather than URI?
• A Uniform Resource Locator (URL) refers to the subset of URIs that identify resources via a representation of their primary access mechanism (i.e., their network “location”)
• URLs are the most popular form of URI
What’s a URI that’s not a URL?
• URN = Uniform Resource Name
• Subset of URIs that denote a resource independent of its current location, the name by which it is known, or the mechanism by which it is accessed
• Required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable
• Thus not necessarily “retrievable”
URN vs. URL Example
• Assume a published book (the resource)
• The ISBN (International Standard Book Number), a 10-digit number (13 digits since 2007), uniquely identifies books and book-like products published internationally; in URN form (urn:isbn:<number>) it is the URN
• The entire contents of the book might be placed on a Web server at http://www.xyz.com/book.gz and an FTP server at ftp://ftp.xyz.com/book.gz; both of these are URLs
• All of these are URIs
URI Syntax
• <scheme>:<scheme-specific-part>
• For a URL, the scheme indicates the protocol employed for retrieval (http, ftp, file, mailto, etc.)
• More generally, a scheme is a specification for defining the syntax and semantics of the rest of the URI
• Extensible because new schemes can be defined, with their own scheme-specific format after the colon (:)
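A minimal Python sketch of the <scheme>:<scheme-specific-part> split. It assumes only the RFC 3986 rule that a scheme is a letter followed by letters, digits, "+", "-" or "."; the function name split_uri and the sample ISBN are illustrative. URLs and URNs share this top-level form and differ only in what the scheme says about the part after the colon.

```python
import string
from typing import Tuple

# Characters RFC 3986 allows in a scheme after the leading letter
_SCHEME_CHARS = set(string.ascii_letters + string.digits + "+-.")


def split_uri(uri: str) -> Tuple[str, str]:
    """Split a URI at its first colon into (scheme, scheme-specific part).

    The scheme is lowercased because schemes are case-insensitive; the rest
    is returned untouched, since its syntax is defined by the scheme.
    """
    scheme, sep, rest = uri.partition(":")
    if (not sep or not scheme or scheme[0] not in string.ascii_letters
            or not set(scheme) <= _SCHEME_CHARS):
        raise ValueError(f"no valid scheme in {uri!r}")
    return scheme.lower(), rest


for example in ("http://bank.cs.columbia.edu/classes/cs6125",
                "mailto:someone@example.org",
                "URN:ISBN:0-395-36341-1"):   # illustrative ISBN
    print(split_uri(example))
```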
URL Notation
• <scheme>://<authority><path>?<query>
  – authority: typically, an Internet domain name
  – path: specific to the authority; identifies the resource within the scope of the scheme and authority
  – query: a string of information to be interpreted by the resource
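Python's standard urllib.parse module splits a URL into exactly these components. A short sketch, using the run.cgi example URL that appears with GET below (foo.com is a placeholder host):

```python
from urllib.parse import parse_qs, urlsplit

url = "http://foo.com/run.cgi?name1=val1&name2=val2"
parts = urlsplit(url)

print(parts.scheme)   # http
print(parts.netloc)   # foo.com  (the authority)
print(parts.path)     # /run.cgi
print(parts.query)    # name1=val1&name2=val2

# The URL syntax treats the query as opaque; interpreting it is up to the
# resource. parse_qs applies the common form-encoding convention.
print(parse_qs(parts.query))   # {'name1': ['val1'], 'name2': ['val2']}
```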
What’s a “domain name”?
• Domain Name System (DNS)
  – Maps domain names to IP addresses and vice versa
  – Hierarchy of DNS servers for top-level domains (.com, .edu, .uk, etc.), second-level domains (columbia.edu, ibm.com, etc.), and so on
  – Eventually finds the IP address for an individual host (e.g., bank.cs.columbia.edu)
  – DNS servers cache responses based on TTL = Time to Live
• Originated ~1982, e.g., for email (gk60@CMUA -> [email protected] -> [email protected])
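The TTL-based caching can be sketched as a toy Python class. The class, its injectable clock and the 300-second TTL are assumptions for illustration; a real DNS resolver also follows referrals down the hierarchy, caches negative answers, and so on.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Toy name -> IP address cache that forgets each answer after its TTL.

    Not a DNS resolver: it only shows that a cached answer stays valid for
    TTL seconds and must be looked up again afterwards.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock                                # injectable for testing
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl: float) -> None:
        # Domain names are case-insensitive, so normalize the key.
        self._entries[name.lower()] = (address, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name.lower())
        if entry is None:
            return None                                    # never cached: ask DNS
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name.lower()]                # expired: ask DNS again
            return None
        return address


now = [0.0]
cache = TTLCache(clock=lambda: now[0])                     # fake clock, in seconds
cache.put("bank.cs.columbia.edu", "128.59.11.100", ttl=300)
print(cache.get("BANK.cs.columbia.edu"))                   # 128.59.11.100
now[0] = 301
print(cache.get("bank.cs.columbia.edu"))                   # None
```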
Example URLs
• http://www.ietf.org/rfc/rfc3986.txt
• gopher://seanm.ca/00/nerd/gopher-manifesto.txt
• mailto:[email protected]
• telnet://bank.cs.columbia.edu
Relative URLs
• Allow document trees to be independent of their location and scheme
• A single set of hypertext documents can be simultaneously traversable via each of the ftp, http and file schemes
• Such document trees can be moved, as a whole, without changing any of the relative references
• Resolved to full (absolute) URLs using a base URL
Example Relative URLs
• http://somehost/absolute/URL/with/absolute/path/to/resource.txt
• /relative/URI/with/absolute/path/to/resource.txt
• relative/path/to/resource.txt
• ../../../resource.txt
• resource.txt
• /resource.txt#frag01
• #frag01
• [empty string]
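Python's urllib.parse.urljoin follows the RFC 3986 resolution algorithm (in Python 3.5 and later). This sketch resolves the examples above against a hypothetical base URL, then resolves one relative reference under a different scheme and host, which is why a document tree can move as a whole.

```python
from urllib.parse import urljoin

base = "http://somehost/a/b/c/doc.html"    # hypothetical base URL

references = [
    "http://somehost/absolute/URL/with/absolute/path/to/resource.txt",
    "/relative/URI/with/absolute/path/to/resource.txt",
    "relative/path/to/resource.txt",
    "../../../resource.txt",
    "resource.txt",
    "/resource.txt#frag01",
    "#frag01",
    "",                                    # empty string: the base itself
]
for ref in references:
    print(repr(ref), "->", urljoin(base, ref))

# Moving the tree: the same relative reference under another scheme and host
print(urljoin("ftp://mirror.example/a/b/c/doc.html", "../../resource.txt"))
```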
URI “Standard”
• URI is an Internet protocol element, currently defined in RFC 3986 (2005)
• Originally RFC 1630 (1994)
What is an “RFC”?
• Request for Comments
• One of a series, begun in 1969, of numbered informational documents and standards followed by commercial software and freeware in the Internet and Unix communities
• All Internet standards are recorded in RFCs
Who keeps track of RFCs?
• IETF = Internet Engineering Task Force
• Open, all-volunteer organization, with no formal membership or membership requirements
• Organized into a large number of working groups, each dealing with a specific topic
• April 1st RFCs, e.g., http://www.apps.ietf.org/rfc/rfc3514.html
What is “W3C”?
• The World Wide Web Consortium defines data formats and usage conventions, as well as Internet protocols relevant to the Web
• Members pay fees depending on country, revenues and non-profit/for-profit status
• Otherwise organized similarly to the IETF, but writes “Recommendations” instead of “Requests for Comments”
• http://www.w3.org/
Back to URLs
• Most Web documents use the “http” scheme (or “https” = HTTP over TLS/SSL)
• What is “http” (HyperText Transfer Protocol)?
HTTP = HyperText Transfer Protocol
• HTTP, the protocol behind the “http” scheme used by most Web documents, is the default Internet protocol for delivering data on the WWW
• Usually runs over TCP/IP sockets on port 80, but can use any port and can be implemented on top of any reliable networking protocol
• A Web browser (HTTP client) sends requests to a Web server (HTTP server), which sends responses back to the client
What’s “TCP/IP”?
• IP = Internet Protocol
  – Delivers individual packets from one host to another, based on their IP address (in IPv4, four 8-bit octets, as in 128.59.11.100)
  – Network routers direct the traffic of IP packets
• Analogous to telephone numbers (area code plus exchange plus 4 digits plus extension) and postal addresses (zip code plus street name plus building number plus apartment number)
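A worked example of the "four 8-bit octets" form using Python's ipaddress module; the /16 prefix is chosen only to illustrate the area-code analogy and is not a claim about how that address block is actually allocated.

```python
import ipaddress

addr = ipaddress.IPv4Address("128.59.11.100")

# An IPv4 address is 32 bits, usually written as four 8-bit octets.
print(list(addr.packed))   # [128, 59, 11, 100]
print(int(addr))           # 2151353188 == 128*2**24 + 59*2**16 + 11*2**8 + 100

# Like an area code, a network prefix groups many hosts; this /16 prefix
# is chosen for illustration only.
print(addr in ipaddress.ip_network("128.59.0.0/16"))   # True
```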
What’s “TCP/IP”?
• TCP = Transmission Control Protocol
  – Provides an abstraction of reliable, bidirectional connections that deliver a stream of data, carried in IP packets, to a particular port at a given IP address
  – The so-called well-known ports (< 1024) are reserved for specific protocols (telnet, ftp, smtp, pop3, imap, etc.)
  – By default, HTTP uses port 80; this can be overridden in the URL, e.g., http://www.foo.com:2011/doc.html
• The main alternative is UDP = User Datagram Protocol: no connection, no reliable delivery (used, e.g., by DNS)
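How a client picks the TCP port can be sketched in Python: an explicit :port in the URL wins, otherwise the scheme's well-known port is used. The DEFAULT_PORTS table and the function name are assumptions for illustration (an unlisted scheme raises KeyError).

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit

# Well-known default ports for a few schemes (hypothetical, minimal table)
DEFAULT_PORTS: Dict[str, int] = {"http": 80, "https": 443, "ftp": 21}


def connection_target(url: str) -> Tuple[str, int]:
    """Return the (host, port) a client would open a TCP connection to."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in {url!r}")
    # An explicit :port in the URL overrides the scheme's default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


print(connection_target("http://www.foo.com:2011/doc.html"))  # ('www.foo.com', 2011)
print(connection_target("http://bank.cs.columbia.edu"))       # ('bank.cs.columbia.edu', 80)
```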
HTTP History
• HTTP/0.9 (1990): simple protocol for raw data transfer
• HTTP/1.0 (1996, RFC 1945): allows MIME-like messages, containing meta-information about the resources transferred and modifiers on the request/response semantics
• HTTP/1.1 (1999, RFC 2616, revising RFC 2068 from 1997): lots of practical improvements, e.g., caching policies, chunked encoding, persistent connections
• The W3C has closed its HTTP activity, but the IETF still has a working group revising the specification
What is “MIME”?
• Multipurpose Internet Mail Extensions
• Standard representation for “complex” message bodies (numerous RFCs since 1992)
• Examples include messages with embedded graphics or audio clips, messages with file attachments, messages in Japanese or Russian, signed messages
MIME Header Fields
• MIME-Version, Content-Type, Content-Transfer-Encoding, Content-Description, Content-ID, Content-Location, Content-Disposition, followed by the part body
• Discrete (text, image, audio) and Multipart (mixed, digest) content types
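Python's email package builds MIME messages. A minimal sketch with a made-up subject and placeholder attachment bytes: a discrete text/plain body plus an image attachment becomes a multipart/mixed message carrying several of the header fields listed above.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Lecture notes"               # illustrative subject
msg.set_content("Slides attached.")            # discrete text/plain body
msg.add_attachment(b"not really a PNG",        # placeholder bytes
                   maintype="image", subtype="png", filename="diagram.png")

print(msg["MIME-Version"])         # 1.0
print(msg.get_content_type())      # multipart/mixed
for part in msg.iter_parts():
    # e.g. the attachment reports image/png with an attachment disposition
    print(part.get_content_type(), part.get("Content-Disposition"))
```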
HTTP Properties
• Uses URLs for identifying Web resources
• Request-response: always initiated by the client (never by the server); the server responds with results
• Stateless: each request-response pair is independent of every other, so any state information (login credentials, shopping carts, etc.) needs to be encoded somehow
HTTP Request/Response
[Figure: the HTTP client sends an HTTP request to the server on port 80 (or another port); the server processes it and returns a response]
• Web server processes HTTP requests, generally over TCP Port 80
• The request specifies a resource URL
• The server parses the URL and processes the request:
  – Returns a document with its type information, or
  – Invokes a program or script, and returns its output
• The output (including metadata) is sent back to the client as a response message
HTTP Requests
• Small number of request types (GET, POST, HEAD, etc.)
• Request may contain additional information, e.g., client info, parameters for forms, cookies, etc.
• Consists of a start-line, zero or more headers (one per line), an empty line (CRLF) indicating the end of the header fields, and possibly a message-body
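The request layout (start-line, one header per line, empty line, optional body) can be sketched as a small Python serializer. build_request is a hypothetical helper; it always writes HTTP/1.1 and adds Content-Length only when there is a body.

```python
from typing import Dict


def build_request(method: str, target: str, headers: Dict[str, str],
                  body: bytes = b"") -> bytes:
    """Serialize an HTTP/1.1 request: start-line, headers, empty line, body."""
    if body and "Content-Length" not in headers:
        # Tell the server where the message-body ends.
        headers = {**headers, "Content-Length": str(len(body))}
    lines = [f"{method} {target} HTTP/1.1"]                          # start-line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # one per line
    # Every line ends in CRLF; one more CRLF (an empty line) ends the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("latin-1") + body


print(build_request("GET", "/", {"Host": "bank.cs.columbia.edu",
                                 "Connection": "close"}))
```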
HTTP Responses
• Larger number of response codes (200 OK, 404 Not Found)
• Message body only allowed with certain response status codes (never with 1xx, 204 or 304)
• Includes MIME metadata as well as “payload” (data)
Start Line
• HTTP Version (0.9, 1.0, 1.1)
• URI (requests only)
• Method (request) or Status Code and Reason Phrase (response)
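A Python sketch of telling the two kinds of start line apart and splitting them into the fields above; parse_start_line is hypothetical, expects the trailing CRLF already removed, and also accepts the HTTP/0.9 request form that carries no version at all.

```python
from typing import Dict, Union


def parse_start_line(line: str) -> Dict[str, Union[str, int]]:
    """Classify and split an HTTP start line (CRLF already stripped)."""
    if line.startswith("HTTP/"):
        # Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase
        version, code, reason = line.split(" ", 2)
        return {"kind": "response", "version": version,
                "status": int(code), "reason": reason}
    parts = line.split(" ")
    if len(parts) == 2:
        # HTTP/0.9 "simple request": method and URI only, no version
        method, uri = parts
        version = "HTTP/0.9"
    else:
        # Request-Line = Method SP Request-URI SP HTTP-Version
        method, uri, version = parts      # anything else raises ValueError
    return {"kind": "request", "method": method, "uri": uri, "version": version}


print(parse_start_line("GET / HTTP/1.1"))
print(parse_start_line("HTTP/1.0 404 Not Found"))
```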
Sample HTTP Exchange
• To retrieve the file at the URL http://bank.cs.columbia.edu
• First open a socket to the host bank.cs.columbia.edu, port 80 (the default port, used because none is specified in the URL)
Connect to 128.59.11.100 on port 80 ... ok
Sample
• Then, send something like the following through the socket:
GET / HTTP/1.1[CRLF]
Host: bank.cs.columbia.edu[CRLF]
Connection: close[CRLF]
User-Agent: Web-sniffer/1.0.37 (+http://web-sniffer.net/)[CRLF]
Accept-Encoding: gzip[CRLF]
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7[CRLF]
Cache-Control: no-cache[CRLF]
Accept-Language: de,en;q=0.7,en-us;q=0.3[CRLF]
Referer: http://web-sniffer.net/[CRLF]
[CRLF]
Sample
• The server should respond with something like the following:
HTTP/1.1 403 Forbidden[CRLF]
Content-Length: 218[CRLF]
Content-Type: text/html[CRLF]
Server: Microsoft-IIS/6.0[CRLF]
X-Powered-By: ASP.NET[CRLF]
Date: Sat, 22 Jan 2011 14:024:22 GMT[CRLF]
Connection: close[CRLF]
[CRLF]
<html><head><title>Error</title></head><body><head><title>Directory Listing Denied</title></head>[LF]
<body><h1>Directory Listing Denied</h1>This Virtual Directory does not allow contents to be listed.</body></body></html>
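The exchange above can be reproduced over a plain TCP socket. A minimal Python sketch; the function name http_get is hypothetical and only the Host and Connection headers are sent. The commented lines show live use, assuming the host is still reachable on port 80.

```python
import socket


def http_get(sock: socket.socket, host: str, path: str = "/") -> bytes:
    """Send a minimal HTTP/1.1 GET on a connected socket; return the raw reply.

    Because the request says "Connection: close", the server closes the
    connection after responding, so reading until EOF yields the whole
    response (status line, headers, empty line, body).
    """
    request = (f"GET {path} HTTP/1.1\r\n"
               f"Host: {host}\r\n"
               "Connection: close\r\n"
               "\r\n")
    sock.sendall(request.encode("ascii"))
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:          # EOF: the server has closed its side
            break
        chunks.append(data)
    return b"".join(chunks)


# Live use (assumes the host is reachable and still serves HTTP on port 80):
# with socket.create_connection(("bank.cs.columbia.edu", 80)) as s:
#     print(http_get(s, "bank.cs.columbia.edu").decode("latin-1"))
```

Real clients would normally use http.client or a higher-level library, which also handle Content-Length and chunked framing, redirects and timeouts.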
Some Request Headers
• User-Agent: identifies the program (e.g., a browser) that's making the request, in the form "Program-name/x.xx", where x.xx is the alphanumeric version of the program
  – User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9) Gecko/2008052906 Firefox/3.0
• Referer (sic, the standard’s spelling): the URL of the previous webpage from which a link was followed
  – Referer: http://web-sniffer.net/
Some Response Headers
• Server: analogous to User-Agent; identifies the server software in the form "Program-name/x.xx"
  – Server: Apache/2.2.8 (Ubuntu)
• Last-Modified: gives the modification date of the resource being returned, e.g., for use in caching
  – Uses Greenwich Mean Time, in the format Last-Modified: Sat, 22 Jan 2011 14:46:32 GMT
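HTTP URIs
• No a priori length limit in the protocol, but a server returns status code 414 (Request-URI Too Long) for a URI longer than it can handle; URIs above 255 bytes were historically unsafe with some older clients and proxies
• Equivalence comparison: these three URIs are equivalent
  – http://abc.com:80/~smith/home.html
  – http://ABC.com/%7Esmith/home.html
  – http://ABC.com:/%7esmith/home.html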
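The equivalence rules behind these three URIs (case-insensitive host, a default port 80 that may be omitted or left empty, and %7E or %7e decoding to the unreserved "~") can be sketched as a normalizer in Python. normalize_http_uri is a hypothetical helper that handles only the http scheme, drops any userinfo and does not handle IPv6 literals.

```python
import re
import string
from urllib.parse import urlsplit, urlunsplit

_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def _normalize_percent(text: str) -> str:
    """Decode %XX escapes of unreserved characters; uppercase all other escapes."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        return char if char in _UNRESERVED else "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)


def normalize_http_uri(uri: str) -> str:
    """Rewrite an http URI so that equivalent spellings compare equal (sketch)."""
    parts = urlsplit(uri)                          # urlsplit lowercases the scheme
    if parts.scheme != "http":
        raise ValueError("this sketch only normalizes http URIs")
    host = parts.hostname or ""                    # urllib lowercases the host name
    port = parts.port                              # None for "abc.com" and "abc.com:"
    netloc = host if port in (None, 80) else f"{host}:{port}"
    path = _normalize_percent(parts.path) or "/"   # empty path is the same as "/"
    return urlunsplit(("http", netloc, path, parts.query, parts.fragment))


for uri in ("http://abc.com:80/~smith/home.html",
            "http://ABC.com/%7Esmith/home.html",
            "http://ABC.com:/%7esmith/home.html"):
    print(normalize_http_uri(uri))   # http://abc.com/~smith/home.html (all three)
```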
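Request Messages
• Method SP Request-URI SP HTTP-Version CRLF
• Example: GET http://www.gailkaiser.org/ HTTP/1.1 (absolute Request-URI form, required when the request goes to a proxy)
• Equivalent to the client making a TCP connection to bank.cs.columbia.edu (the host serving www.gailkaiser.org) on port 80, then sending:
  – GET / HTTP/1.1
  – Host: www.gailkaiser.org
• The Host field allows for virtual hosts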
What is a “virtual host”?
• Enables the same machine to host multiple domain names, sometimes at the same IP address (name-based virtual hosting)
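• Important for website hosting (e.g., www.foo.com maps to /www/foo/site1 and www.bar.com maps to /www/bar/site2), but usually there can be only one secure https website per IP address/port, because the TLS handshake (including the server certificate) happens before the Host header is sent; the TLS Server Name Indication (SNI) extension addresses this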
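Name-based virtual hosting can be sketched as a lookup on the Host header. The site table reuses the slide's example paths; the function name is hypothetical, and the simple split on ":" would mishandle IPv6 literals such as [::1]:80.

```python
from typing import Dict, Optional

# Hypothetical site table; the paths reuse the example on this slide.
SITES: Dict[str, str] = {
    "www.foo.com": "/www/foo/site1",
    "www.bar.com": "/www/bar/site2",
}


def document_root(host_header: Optional[str]) -> Optional[str]:
    """Choose a document root from the Host header (name-based virtual hosting)."""
    if not host_header:
        return None      # HTTP/1.1 requires Host; a real server would answer 400
    # Host may carry ":port"; names are case-insensitive. (Breaks on IPv6 literals.)
    name = host_header.split(":", 1)[0].strip().lower()
    return SITES.get(name)


print(document_root("www.foo.com"))       # /www/foo/site1
print(document_root("WWW.BAR.COM:80"))    # /www/bar/site2
print(document_root("www.baz.com"))       # None (unknown site)
```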
GET
• Retrieve whatever information (in the form of an entity) is identified by the URL
• If the URL refers to a data-producing process, it is the produced data (given the input parameters after the “?”, if any) that is returned as the entity in the response, not the source text of the process (unless that text happens to be the output of the process)
  – http://foo.com/run.cgi?name1=val1&name2=val2
Conditional and Partial GET
• Conditional if the request message includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field
• Partial if the request message includes a Range header field
• Purpose: avoid retrieving data the client doesn’t need (e.g., whatever is already up to date in its cache)
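A server-side Python sketch of one conditional header (If-Modified-Since) and one simple Range form (bytes=start-end). The function name and return shape are assumptions; If-Unmodified-Since, If-Match, If-None-Match, If-Range, multiple ranges and suffix ranges such as bytes=-500 are not handled, and such requests fall through to a full 200 response.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Tuple


def conditional_get(body: bytes, last_modified: datetime,
                    request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, body) for a GET of one stored entity.

    last_modified must be a timezone-aware UTC datetime.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since_text = request_headers.get("If-Modified-Since")
    if since_text is not None:
        try:
            since = parsedate_to_datetime(since_text)
        except (TypeError, ValueError):
            since = None                       # invalid date: ignore the header
        if since is not None:
            if since.tzinfo is None:           # treat a zone-less date as UTC
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers, b""       # Not Modified: cached copy is fine

    range_text = request_headers.get("Range", "")
    if range_text.startswith("bytes="):
        first, _, last = range_text[len("bytes="):].partition("-")
        if first.isdigit():
            start = int(first)
            end = min(int(last) if last.isdigit() else len(body) - 1, len(body) - 1)
            if start <= end:
                headers["Content-Range"] = f"bytes {start}-{end}/{len(body)}"
                return 206, headers, body[start:end + 1]   # Partial Content

    return 200, headers, body
```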
HEAD
• Identical to GET except that the server must not return a message-body in the response; it returns only the headers
• Often used for testing hypertext links for validity and modification
• Can mark cache entries as stale if certain header information changes (e.g., Content-Length, Last-Modified)
POST
• Used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line
• The actual function performed by the POST method is determined by the server, usually depending on the Request-URI
POST supports several functions
• Annotation of an existing resource
• Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles
• Providing a block of data, such as the result of submitting a form, to a data-handling process
• Extending a database through an append operation
POST vs. GET
• GET can only be used to send relatively small amounts of data to a server, with the data following the ? character
• The rest of the request-URI (before the ?) refers to some kind of processing program
  – GET /run.cgi?name1=val1&name2=val2 HTTP/1.0
• POST instead carries the data in the message-body, so it is not constrained by URI length limits
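The difference can be sketched in Python with urllib.parse.urlencode: the same form data goes after the "?" for GET, but into the message-body (with a Content-Type and Content-Length) for POST. The form fields are illustrative.

```python
from urllib.parse import parse_qs, urlencode

form = {"name": "Gail Kaiser", "course": "COMS E6125"}   # illustrative form fields
query = urlencode(form)          # 'name=Gail+Kaiser&course=COMS+E6125'

# GET: the encoded data rides in the Request-URI, after the "?"
get_request = f"GET /run.cgi?{query} HTTP/1.0\r\n\r\n".encode("ascii")

# POST: the same data travels in the message-body instead
body = query.encode("ascii")
post_request = ("POST /run.cgi HTTP/1.0\r\n"
                "Content-Type: application/x-www-form-urlencoded\r\n"
                f"Content-Length: {len(body)}\r\n"
                "\r\n").encode("ascii") + body

print(get_request)
print(post_request)
print(parse_qs(query))   # {'name': ['Gail Kaiser'], 'course': ['COMS E6125']}
```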
PUT and DELETE
• Often unsupported (501 Not Implemented)
• PUT requests that the enclosed entity be stored under the supplied Request-URI
  – May create a new resource at a new URI, or modify an existing resource already at that URI
• DELETE requests that the origin server delete the resource identified by the Request-URI
  – May be overridden, e.g., by human intervention, even if the status code indicates successful completion
• Effectively supplanted by WebDAV
OPTIONS and TRACE
• OPTIONS allows the client to determine the requirements associated with a resource, or the capabilities of a server (OPTIONS *), without implying a resource action or initiating a resource retrieval
• TRACE is used to invoke an application-layer loop-back of the request message, allowing the client to see what is being received at the other end of the request chain, for testing or diagnostics
HTTP Responses
• HTTP-Version SP Status-Code SP Reason-Phrase CRLF
• Example: HTTP/1.0 404 Not Found
• Status code: 3-digit integer result code of the attempt to understand and satisfy the request
• Reason phrase: short textual description of the Status-Code
Response Messages
• Larger number of response codes (200 OK, 404 Not Found)
• Message body only allowed with certain response status codes (never with 1xx, 204 or 304)
• Includes MIME metadata as well as “payload” (data)
Status Codes
• Applications need only understand the first digit (the class of the code), treating any unrecognized code as equivalent to the x00 code of its class
• 1xx: Informational - Request received, continuing process ("100" : Continue, relevant to persistent connections in HTTP 1.1)
• 2xx: Success - The action was successfully received, understood and accepted ("200" : OK)
• 3xx: Redirection - Further action must be taken in order to complete the request ("300" : Multiple Choices)
• 4xx: Client Error - The request contains bad syntax or cannot be fulfilled ("400" : Bad Request)
• 5xx: Server Error - The server failed to fulfill an apparently valid request ("500" : Internal Server Error)
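A Python sketch of the first-digit rule, using the standard http.HTTPStatus enumeration for known codes. interpret is a hypothetical helper, and which codes count as "known" depends on the Python version.

```python
from http import HTTPStatus
from typing import Tuple

CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client Error", 5: "Server Error"}
# Codes this Python version's http.HTTPStatus knows about
_KNOWN = {status.value for status in HTTPStatus}


def interpret(code: int) -> Tuple[str, int]:
    """Return (class name, code to act on); unknown codes act like x00."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    cls = code // 100
    return CLASSES[cls], (code if code in _KNOWN else cls * 100)


print(interpret(404), HTTPStatus(404).phrase)   # ('Client Error', 404) Not Found
print(interpret(299))                           # ('Success', 200)
```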
HTTP is “Stateless”
• The server doesn’t remember anything about the client between connections
• Not even between requests during the same persistent connection, beyond the TCP connection itself
• So how does HTTP support “remembering” the user during a session or across sessions?
• Some state can be encoded in complex URLs or otherwise in the web page itself (e.g., query strings added to links, hidden form fields)
• Or saved on the client in “cookies”
Cookies
• String associated with a name/domain/path, stored at the browser
• Series of name-value pairs, interpreted by the web application
• Created by an HTTP response with “Set-Cookie: ” (or “Set-Cookie2: ”)
• In all subsequent requests to a matching domain and path, until the cookie’s expiration, the client sends the HTTP header “Cookie: ” (or “Cookie2: ”)
• Often have an expiration (otherwise they expire when the browser is closed)
• Various technical, privacy and security issues (e.g., inconsistent state after using the “back” button, third-party cookies, cross-site scripting)
Cookie Example
• Set-Cookie: name=newvalue; expires=date; path=/; domain=.example.org
• Set-Cookie: RMID=732423sdfs73242; expires=Sat, 31-Dec-2011 23:59:59 GMT; path=/; domain=.example.net
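Python's http.cookies.SimpleCookie can parse the second Set-Cookie value above. It is a server-side helper rather than a full client cookie store (that is http.cookiejar), so the Cookie header at the end is built by hand for illustration.

```python
from http.cookies import SimpleCookie

# The Set-Cookie value from the slide (without the "Set-Cookie:" header name)
jar = SimpleCookie()
jar.load("RMID=732423sdfs73242; expires=Sat, 31-Dec-2011 23:59:59 GMT; "
         "path=/; domain=.example.net")

morsel = jar["RMID"]
print(morsel.value)        # 732423sdfs73242
print(morsel["domain"])    # .example.net
print(morsel["path"])      # /

# On later matching requests the client sends back only the name=value pairs
cookie_header = "Cookie: " + "; ".join(f"{name}={m.value}" for name, m in jar.items())
print(cookie_header)       # Cookie: RMID=732423sdfs73242
```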
HTTP Request/Response
• In HTTP 1.0, a connection is established by the client prior to each request and closed by the server after sending the response
• Either party may close the connection prematurely, due to user action, automated time-out, or program failure
• Closing of the connection by either or both parties always terminates the current request, regardless of its status
• But setting up a new TCP connection for every request is expensive (an extra round trip for the handshake, plus TCP slow start)…
HTTP 1.1 “Persistent Connection”
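• Many Web pages consist of several files on the same server, so in HTTP 1.1 a connection stays open by default and can carry multiple requests
• If an HTTP 1.1 client sends multiple (pipelined) requests through a single connection, the server must send the responses back in the same order
• Interim responses such as "100 Continue" tell the client to go ahead and send the rest of its request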
How does the connection finally get closed?
• If a request includes the "Connection: close" header, that request is the final one for the connection and the server should close the connection after sending the response
• The server should also close an idle connection after some timeout period
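Pipelining and "Connection: close" fit together on the wire as shown below. This sketch only builds the bytes a client would write to a single connection and does not open a socket. The host, the paths and the `pipelined_requests` helper are hypothetical. All requests are written back to back without waiting for responses, and only the last one asks the server to close the connection. HTTP 1.1 requests also require a Host header.

```python
def pipelined_requests(host: str, paths: list[str]) -> bytes:
    """Serialize several HTTP/1.1 GET requests for one persistent connection.

    The last request carries "Connection: close", so the server closes the
    connection after sending that response.
    """
    messages = []
    for i, path in enumerate(paths):
        lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
        if i == len(paths) - 1:
            lines.append("Connection: close")
        # Each header line ends with CRLF; an empty line ends the headers.
        messages.append("\r\n".join(lines) + "\r\n\r\n")
    return "".join(messages).encode("ascii")

stream = pipelined_requests("www.example.org", ["/index.html", "/logo.png", "/style.css"])
print(stream.count(b"GET "))               # 3
print(stream.count(b"Connection: close"))  # 1
```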
Advantages of Persistent Connections
• Requests and responses can be pipelined - a client makes multiple requests without waiting for each response
• Network congestion is reduced by sending fewer packets for TCP opens, and by allowing TCP enough time to determine the congestion state of the network
• Latency on subsequent requests is reduced, since no time is spent in the TCP connection's opening handshake
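The latency advantage can be made concrete with a deliberately simplified model. The model assumes that a TCP handshake costs one round trip (RTT), that each request/response costs one RTT, and that the client knows all n URLs up front. Transmission time, server time and slow start are ignored. The function name is hypothetical, and the numbers illustrate the model rather than measure real traffic.

```python
def estimated_latency_ms(n_objects: int, rtt_ms: float) -> dict[str, float]:
    """Rough time to fetch n objects from one server under a simplified model."""
    return {
        # HTTP 1.0, sequential: handshake + request/response for every object
        "http10_sequential": n_objects * 2 * rtt_ms,
        # HTTP 1.1 persistent: one handshake, then one RTT per request
        "persistent_sequential": rtt_ms + n_objects * rtt_ms,
        # HTTP 1.1 persistent + pipelining: one handshake, all requests in one RTT
        "persistent_pipelined": rtt_ms + rtt_ms,
    }

print(estimated_latency_ms(10, 50.0))
# {'http10_sequential': 1000.0, 'persistent_sequential': 550.0, 'persistent_pipelined': 100.0}
```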
Basic HTTP Architecture
Intermediary
• Program sitting in the path between HTTP clients and servers
• Acts as a server to clients and as a client to origin servers or other intermediaries
Purposes of Intermediaries
– Reduce communication cost
– Lower the latency perceived by the client
– Reduce the load on the network
– Reduce the load on the Web server
– Implement security for an organization
– Translate requests to various servers
Proxy
• Forwarding agent
• Receives request, rewrites all or parts of the message, and forwards the reformatted request toward the server identified by the URI
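One common rewrite a forwarding proxy performs is shown here. A client talking to a proxy puts the absolute URI in the request line. The proxy connects to the host named in that URI and sends the origin server only the path, with the host moved into the Host header and a Via header added. The sketch covers only the request line and those two headers. A real proxy also forwards the remaining headers and the body, and it handles hop-by-hop headers. The proxy name `proxy.example.com` and the function are hypothetical.

```python
from urllib.parse import urlsplit

def rewrite_for_origin(request_line: str, proxy_name: str = "proxy.example.com"):
    """Turn a proxy-style request line into the request head sent to the origin.

    Returns (origin host, origin port, rewritten request head).
    """
    method, uri, version = request_line.split(" ")
    parts = urlsplit(uri)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    port = parts.port or 80  # default port for the http scheme
    head = (f"{method} {path} {version}\r\n"
            f"Host: {parts.netloc}\r\n"
            f"Via: 1.1 {proxy_name}\r\n\r\n")
    return parts.hostname, port, head

host, port, head = rewrite_for_origin("GET http://www.example.org/classes/cs6125?x=1 HTTP/1.1")
print(host, port)            # www.example.org 80
print(head.splitlines()[0])  # GET /classes/cs6125?x=1 HTTP/1.1
```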
Gateway
• Receiving agent
• Acts as a layer above some other server(s) and, if necessary, translates the requests to the underlying server's protocol
• Example: Web mail accessing an IMAP server
– A URL identifies the mail server, mailbox, password
– Converts the HTTP request to an IMAP request, gets the IMAP response, converts it to an HTTP response
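The Web mail gateway's translation might look roughly like this. The URL layout `/imap/<server>/<mailbox>/<message number>` is a hypothetical one, and so are both function names. The credentials are passed as parameters here. The IMAP lines use ordinary tagged IMAP4rev1 commands (LOGIN, SELECT, FETCH, LOGOUT). The returned message is then wrapped in an HTTP response. The sketch assumes credentials without quotes or backslashes and mailbox names without spaces, so none of them needs escaping.

```python
def imap_commands_for(path: str, user: str, password: str) -> tuple[str, list[str]]:
    """Map a hypothetical gateway path /imap/<server>/<mailbox>/<n> to IMAP commands."""
    _, prefix, server, mailbox, number = path.split("/")
    if prefix != "imap" or not number.isdigit():
        raise ValueError("unsupported gateway path")
    commands = [
        f'A1 LOGIN "{user}" "{password}"',  # assumes no quotes/backslashes
        f"A2 SELECT {mailbox}",            # assumes no spaces in the mailbox name
        f"A3 FETCH {number} BODY[]",       # ask for the whole message
        "A4 LOGOUT",
    ]
    return server, commands

def http_response_from_message(raw_message: bytes) -> bytes:
    """Wrap the message text returned by IMAP in a minimal HTTP response."""
    head = ("HTTP/1.1 200 OK\r\n"
            "Content-Type: message/rfc822\r\n"
            f"Content-Length: {len(raw_message)}\r\n\r\n")
    return head.encode("ascii") + raw_message

server, commands = imap_commands_for("/imap/mail.example.org/INBOX/3", "alice", "secret")
print(server)       # mail.example.org
print(commands[2])  # A3 FETCH 3 BODY[]
```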
Tunnel
• Relay point between two connections, without changing the messages
• Looks at the first line of the HTTP message to locate the host to be contacted, and accepts the request
• Simply relays bits between the two connection points
• Does not parse or interpret messages
• Used when the communication needs to pass through a firewall
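The tunnel's two jobs can be sketched as reading the target from the first line and then copying bytes unchanged. The sketch assumes the first line has the usual HTTP `CONNECT host:port` form. It relays in one direction only, whereas a real tunnel copies both directions concurrently. `relay` works on any binary file-like objects and is exercised here with in-memory buffers. Both function names are hypothetical.

```python
import io

def tunnel_target(first_line: str) -> tuple[str, int]:
    """Read host and port from a line like 'CONNECT www.example.org:443 HTTP/1.1'."""
    method, authority, _version = first_line.split()
    if method != "CONNECT":
        raise ValueError("not a tunnel request")
    host, _, port = authority.rpartition(":")
    return host, int(port)

def relay(src, dst, chunk_size: int = 4096) -> int:
    """Copy bytes from src to dst without parsing them; return the byte count."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # end of input
            return total
        dst.write(chunk)
        total += len(chunk)

src = io.BytesIO(b"opaque TLS bytes " * 1000)
dst = io.BytesIO()
copied = relay(src, dst)
print(tunnel_target("CONNECT www.example.org:443 HTTP/1.1"))  # ('www.example.org', 443)
print(copied == len(dst.getvalue()))                          # True
```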
Transcoder
• Modifies data as it passes to clients, e.g., to filter ads, reduce image sizes, compress content
• Particularly useful for wireless and/or constrained devices
– Convert HTML to XHTML MP
– Modify content to fit a small screen
– Convert modality of interaction, e.g., driving directions from displaying text to playing audio
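The "compress content" case is the easiest transcoding step to illustrate. The sketch gzips a response body in transit when the client's Accept-Encoding header lists gzip, and it updates Content-Encoding and Content-Length to match. The function name is hypothetical. The sketch ignores q-values and leaves already-encoded responses alone.

```python
import gzip

def compress_for_client(body: bytes, headers: dict[str, str], accept_encoding: str):
    """Gzip a response body if the client advertised gzip (q-values ignored)."""
    encodings = [e.split(";")[0].strip().lower() for e in accept_encoding.split(",")]
    already_encoded = "content-encoding" in {k.lower() for k in headers}
    if "gzip" not in encodings or already_encoded:
        return body, headers  # pass through unchanged
    new_body = gzip.compress(body)
    new_headers = dict(headers)
    new_headers["Content-Encoding"] = "gzip"
    new_headers["Content-Length"] = str(len(new_body))
    return new_body, new_headers

page = b"<html>" + b"<p>hello</p>" * 500 + b"</html>"
body, hdrs = compress_for_client(page, {"Content-Type": "text/html"}, "gzip, deflate")
print(hdrs["Content-Encoding"])  # gzip
```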
Caching
• The request/response chain is shortened if one of the participants along the chain has a cached response applicable to the request
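The sketch below shows how a cache shortens the chain. It is a hypothetical intermediary that answers repeated requests from its own store and forwards only misses toward the origin. The class name and the `fetch_from_origin` callable are assumptions. Freshness and the Cache-Control rules are deliberately ignored.

```python
class CachingIntermediary:
    """Hypothetical cache in front of an origin; ignores freshness rules."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> response body
        self.store = {}
        self.origin_requests = 0

    def get(self, url: str) -> bytes:
        if url in self.store:      # cache hit: the chain ends here
            return self.store[url]
        self.origin_requests += 1  # cache miss: forward toward the origin
        body = self.fetch_from_origin(url)
        self.store[url] = body
        return body

cache = CachingIntermediary(lambda url: b"contents of " + url.encode())
cache.get("http://www.example.org/a")
cache.get("http://www.example.org/a")
cache.get("http://www.example.org/b")
print(cache.origin_requests)  # 2
```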
HTTP 1.1 Caching Support
• Allows a server to specify caching policies in its response
– Expires: <date and time> – response is considered stale after this time
– Cache-Control: no-store – don't cache at all
– Cache-Control: no-cache – may be stored, but must be revalidated with the server before every reuse
– Cache-Control: private – can't be kept in a shared (public) cache, only in the user's own cache
• Secure sessions (https) generally not cached
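The Cache-Control directives listed above lead to different decisions for a shared cache and for a private one. This sketch classifies a response from its Cache-Control header value alone. It ignores Expires, max-age, field-name arguments to private, and all other directives, and the function name is hypothetical.

```python
def cache_decision(cache_control: str, shared_cache: bool) -> str:
    """Return "no-store", "revalidate" or "store" for a response (simplified)."""
    directives = {d.split("=")[0].strip().lower()
                  for d in cache_control.split(",") if d.strip()}
    if "no-store" in directives:
        return "no-store"    # must not be cached at all
    if "private" in directives and shared_cache:
        return "no-store"    # only the user's own cache may keep it
    if "no-cache" in directives:
        return "revalidate"  # may be stored, but check with the server before reuse
    return "store"

print(cache_decision("private, max-age=60", shared_cache=True))  # no-store
print(cache_decision("no-cache", shared_cache=False))            # revalidate
```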
HTTP 1.1 Chunked Encoding
• Faster response for dynamically-generated pages or very large pages
• Allows the beginning of a response to be sent before its total length is known
• Each chunk is prefixed by its size in bytes, written in hexadecimal
• A zero-size chunk indicates the end of the response message
• If a server is using chunked encoding, it must set the Transfer-Encoding header to "chunked"
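The chunk format can be sketched as an encoder and a matching decoder for a body sent with "Transfer-Encoding: chunked". Each chunk is its size in hexadecimal, CRLF, the data and another CRLF. A zero-size chunk followed by a blank line ends the body. Both function names are hypothetical. The sketch sends no trailer headers and skips any chunk extensions when decoding.

```python
def encode_chunked(pieces: list[bytes]) -> bytes:
    """Encode body pieces as HTTP/1.1 chunks, ending with the zero-size chunk."""
    out = bytearray()
    for piece in pieces:
        if piece:  # an empty chunk would signal the end, so skip empty pieces
            out += f"{len(piece):X}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk, then the blank line (no trailers)
    return bytes(out)

def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from a chunked message body."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0], 16)  # skip chunk extensions
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF

wire = encode_chunked([b"Hello, ", b"chunked world!"])
print(wire)                  # b'7\r\nHello, \r\nE\r\nchunked world!\r\n0\r\n\r\n'
print(decode_chunked(wire))  # b'Hello, chunked world!'
```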
Summary
• Clients (browsers) often implement many URI schemes
• Technically, only the http scheme is the World Wide Web
• But many of the more recent schemes are also associated with the Web
• Clients do not always talk directly to the origin servers indicated in URLs
18 January 2011 Kaiser: COMS E6125 69
First Assignment: Logistics
• Due Tuesday February 1st by 10am
• Two pages (not including optional figures and required reference list)
• Submit by posting in the Paper Proposals folder on CourseWorks
• Must be in a format I can read, which means pdf, word, html, or plain ascii text (with all figures embedded or viewable in a browser without special “plugins”)
First Assignment: Paper Proposal
• Sketch the topic you have in mind
• Include a tentative reference list (specific background reading to learn more about the topic)
• Some general topic areas are suggested at http://bank.cs.columbia.edu/classes/cs6125/topics.htm, or invent your own
First Assignment: “Goal” of Paper
• Do not simply survey some topic
• Compare this to that, argue a position in favor of or against something, evaluate something according to some meaningful criteria, etc.
• Explain why your topic is relevant to this course
First Assignment: Background Reading
• List some specific materials you intend to read to learn about the topic
– Scholarly papers from conferences or journals
– White papers
– Third-party reviews or commentaries (blogs ok)
– System documentation
– Specifications of "standards" (or proposed standards)
– Not advertising or publicity brochures
– Not wikipedia
• Should include materials from at least two different points of view (e.g., do not get all your background information from the same website)
Upcoming Assignments: Paper
• Paper outline due Tuesday February 14th
• Full paper due Tuesday March 9th
Student Presentations
• Individual ~10 minute talk in class
• Schedule will be assigned (posted next week)
• One-paragraph proposal, due Tuesday February 15th
• May be based on paper, project, or some other topic
Heads Up on Project
• Project Proposal due Tuesday March 9th
• Optionally work in teams (see http://bank.cs.columbia.edu/classes/cs6125/team_advice.htm)
• Build a new system or extend an existing system
• OR evaluate/compare one or more existing system(s)
• You may "continue" your paper topic towards the project, or do something entirely different