COMP 655: Distributed/Operating Systems
Summer 2011
Dr. Chunbo Chu
Week 10: Web
04/19/23 Distributed Systems - COMP 655 1
The Web
• Origin and overview of the web
• Drill-down on distributed system aspects
  – Communication
  – Processes
  – Naming
  – Synchronization
  – Replication (especially caching)
  – Fault tolerance
  – Security
Origin of the web
• CERN (European particle physics lab)
• Purpose: facilitate document sharing
  – Large user community
  – Geographically dispersed
• Founder: Tim Berners-Lee
• Use exploded in late 90’s
  – Graphical user interfaces (Mosaic and descendants)
  – Huge amounts of content
  – Search engines
  – Interactive pages
Definition of the Web
• Many standards
  – HTML
  – HTTP
  – DNS
  – URL, URI, URN
  – XML
  – DOM
• W3C
• IETF
A word about RFCs
• Standards track
  – Proposed standard
  – Draft standard (at least two independent and interoperable implementations)
  – Internet standard (also has STD number, for example IP is STD-005 and RFC-0791)
• “Off-track”
  – Experimental
  – Informational
  – Historic(al)
See RFC 2026 for details
Yet more words about RFCs
Before using an RFC,
• check the Obsolete RFC list
• or find it on the Active RFC list
I use the RFC index at faqs.org because I find it a bit easier to use than the IETF’s list. Remember, if there’s a conflict, IETF is the authority.
Overall structure
What’s in a web page?
Client-side script
Some web pages are XML
XML document type definition
Other document types
CGI – early Web interaction
Problems with CGI
• Process per request
• Wide variety in server-side runtime environments
• Solutions
  – Server-side scripting (JSP, ASP, PHP)
  – Servlets
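A rough sketch of the process-per-request cost, in Python: the toy function below starts a fresh interpreter for every request and passes the query through the QUERY_STRING environment variable, as CGI does. The script text and the name handle_request are made up for illustration; a real web server does considerably more.

```python
import os
import subprocess
import sys

# A stand-in for a CGI program: reads its input from the environment
# and writes headers, a blank line, then the body to standard output.
CGI_SCRIPT = r'''
import os
print("Content-Type: text/plain")
print()
print("query=" + os.environ.get("QUERY_STRING", ""))
'''


def handle_request(query_string):
    """Serve one request by spawning one new process (the CGI model)."""
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
    result = subprocess.run(
        [sys.executable, "-c", CGI_SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # Headers and body are separated by the first blank line.
    headers, _, body = result.stdout.partition("\n\n")
    return headers, body.strip()
```

Every call pays for process start-up, which is the overhead that servlets and server-side scripting avoid by staying inside a long-running server process.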
Problems with browsers
• Browser-based user interfaces tend to be clunky and limited
• Solutions:
  – Client-side scripting
  – Applets
  – More recently, AJAX
• An example: http://www.javarss.com/ajax/j2ee-ajax.html
• See http://en.wikipedia.org/wiki/AJAX for more information
Server-side scripts and servlets
Nothing’s perfect
• What Web technology has big problems with server-side page generation?
Communication on the web: HTTP
• TCP-based client/server protocol
  – Create connection
  – Send request
  – Send response
  – Close connection
• HTTP 1.1 reduces connection overhead with persistent connections
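One way to see a persistent connection at work is the Python sketch below. It starts a throwaway local server that replies with the client's TCP source port, then sends several requests over a single HTTPConnection. The names PortEchoHandler and fetch_over_one_connection are invented for this illustration, and the server is a toy, not a model of a production setup.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PortEchoHandler(BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 lets the server keep the connection open between requests.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Reply with the client's TCP source port, which identifies the connection.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_over_one_connection(paths):
    """Send one GET per path over a single TCP connection; return the ports seen."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), PortEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection(
        "127.0.0.1", server.server_address[1], timeout=5
    )
    try:
        ports = []
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            ports.append(response.read().decode())
        return ports
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
```

If every request reports the same port, the create-connection and close-connection steps were paid only once for the whole batch.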
HTTP connections
Figure panels: non-persistent, persistent
HTTP request types
HTTP request example
GET /xyzzy HTTP/1.1
Connection: Keep-Alive
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword, application/x-shockwave-flash, */*
Accept-Language: en-us
Host: laptop:1215
If-Modified-Since: Sun, 27 Jun 2004 00:58:28 GMT
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
First line: type, path, protocol
Remaining lines: headers
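The structure of a request (type, path and protocol on the first line, headers after it) can be sketched as a small Python parser. The function name parse_request is hypothetical, and the sketch ignores details a real server must handle, such as repeated headers and message bodies.

```python
def parse_request(raw):
    """Split a raw HTTP request into (type, path, protocol, headers)."""
    # The header section ends at the first blank line.
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")

    # Request line: type, path, protocol.
    method, path, protocol = lines[0].split(" ", 2)

    # Each header is "Name: value"; split on the first colon only,
    # so values such as "laptop:1215" stay intact.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, protocol, headers
```

Header names are lower-cased on the way in because HTTP treats them as case-insensitive.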
HTTP header types
Processes
• Browsers
• Proxies
• Apache web server framework
Browser with plug-in
Web proxy
Most browsers today support FTP. However, proxies are still used for shared caching.
Apache
www.apache.org
Server cluster – simple-minded
Web Server Clusters
A scalable content-aware cluster of Web servers
Web naming
URI
URL
URN
URI examples from RFC 2396
ftp://ftp.is.co.za/rfc/rfc1808.txt -- ftp scheme for File Transfer Protocol services
gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles -- gopher scheme for Gopher and Gopher+ Protocol services
http://www.math.uio.no/faq/compression-faq/part1.html -- http scheme for Hypertext Transfer Protocol services
mailto:mduerst@ifi.unizh.ch -- mailto scheme for electronic mail addresses
news:comp.infosystems.www.servers.unix -- news scheme for USENET news groups and articles
telnet://melvyl.ucop.edu/ -- telnet scheme for interactive services via the TELNET Protocol
More examples on page 670
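To see how such URIs break down into parts, here is a small Python sketch using the standard urllib.parse module. The helper name describe is made up for this example; only urlsplit comes from the library.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Return the (scheme, authority, path) parts of a URI."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


for uri in (
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
    "news:comp.infosystems.www.servers.unix",
    "telnet://melvyl.ucop.edu/",
):
    print(describe(uri))
```

Note that the news URI has a scheme and a path but no authority (host) part, since nothing follows a "//".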
Naming – URL – how to access
Naming – URN – true resource identifier
RFC 2648 defines a URN namespace for IETF documents. RFC 2141 defines URN syntax. RFC 3406 is a BCP (Best Current Practice) for defining URN namespaces.
Activity – hitting a web page
• Check your understanding: draw a UML sequence diagram showing the interaction of key software elements when a browser hits a web page containing graphics
• Assume the web page and the images are on different servers
• “Classes” in the diagram should include
  – Browser
  – DNS resolver
  – DNS server
  – Server for the page
  – Server for the images
Not much to synchronize …
• Generally, web clients don’t exchange information with other clients, and servers don’t exchange with other servers
• Most documents have a single author – few write/write conflicts
• However, WebDAV provides a simple locking and versioning scheme
  – Locks are connection-independent
  – Handling abandoned locks is left to the implementation
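One possible way an implementation might deal with abandoned locks is a timeout. The Python sketch below is a toy lock table, not the WebDAV protocol or any library's API: the class LockTable, its methods, and the 60-second default are all assumptions made for illustration. Locks are identified by a token rather than by a connection, so they survive a client disconnecting.

```python
import uuid


class LockTable:
    """Toy token-based lock table with expiry (illustrative only)."""

    def __init__(self):
        self._locks = {}  # resource -> (token, expires_at)

    def lock(self, resource, now, timeout=60.0):
        """Return a lock token, or None if someone else holds a live lock."""
        held = self._locks.get(resource)
        if held is not None and held[1] > now:
            return None  # an unexpired lock is still in place
        # No lock, or the old one was abandoned and has expired: grant a new one.
        token = uuid.uuid4().hex
        self._locks[resource] = (token, now + timeout)
        return token

    def unlock(self, resource, token):
        """Release the lock only if the caller presents the matching token."""
        held = self._locks.get(resource)
        if held is not None and held[0] == token:
            del self._locks[resource]
            return True
        return False
```

Passing the current time in as `now` keeps the sketch deterministic; a real server would read its own clock.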
Replication – client and proxy
Virtually all browsers can cache
Many organizations run proxy servers
Some proxies can cooperate
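A browser or proxy cache can check whether its copy is still current by sending an If-Modified-Since header, as in the HTTP request example earlier. The Python sketch below shows roughly how a server might answer such a check; the function name respond is hypothetical, and real servers also consider other validators that are left out here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def respond(last_modified, if_modified_since=None):
    """Return 304 if the cached copy is still current, else 200.

    last_modified must be a timezone-aware datetime.
    """
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: fall back to a full response
        # HTTP dates have one-second resolution, so drop microseconds.
        if since is not None and last_modified.replace(microsecond=0) <= since:
            return 304  # Not Modified: the cache may reuse its copy
    return 200  # send the full document
```

A 304 reply carries no document body, so a valid cached copy costs one short round trip instead of a full transfer.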
Security on the Web
If using client authentication
NOTE: uses both public-key and secret-key (symmetric) encryption, for performance reasons
NOTE: client has to use same server for entire session