MM-WT/1
Web technology (an overview)
Miroslav Milinovic
Croatian Academic and Research Network - CARNet
Zagreb, Croatia
6th CEENet Workshop on Network Technology, Budapest, Hungary, August 2000.
MM-WT/2
Content
• History and statistics
• Basic concepts and components
• How the Web works
• URI - resource identification
• HTTP - Web and protocols
• Markup story - HTML and beyond
• Active Web pages
• Recent development
• Summary
MM-WT/3
Computer network
MM-WT/4
Information network
MM-WT/5
History
• invented by Tim Berners-Lee (1989, CERN, CH)
• “enabling reference of any network-accessible information by a single universal document identifier”
• 1990 - hypertext editor
• 1991 - Web server (info.cern.ch), and textual browser
• 1993 - Mosaic browser by NCSA
• 1994 - World Wide Web Consortium (W3C) founded; first WWW conference
MM-WT/6
Statistics
• 75% of bytes, 70% of packets on the Internet
• growing rapidly since 1993
• size estimates:
– by Lawrence and Giles (1999) - paper on accessibility of information: 800 million web pages; 15 TB (6 TB) data; 83% - .com; 6% - .edu
– by BrightPlanet.com (2000) - white paper on the “deep” Web: 400-550 times larger than the “surface” Web; 7500 TB of data
• e-commerce as a driving force
MM-WT/7
What is the Web?
• distributed, multimedia information service based on hypertext
– distributed: information located on hosts around the world
– multimedia: information includes text, graphics, sound, video
– hypertext: hypertext techniques used to enable access to the information
• provides access to networked resources
– provides access to Web resources as well as FTP, News, …
– brings together a whole range of services
MM-WT/8
Web - important concepts
• Web = protocol + language + naming infrastructure
• HTTP - Hypertext Transfer Protocol
– defines communication between WWW client and server
• HTML - HyperText Markup Language
– markup language for preparing WWW documents
• URL - Uniform Resource Locator
– resource address = unique identifier
MM-WT/9
Web software components
• main components: clients, servers, proxies
• other components: gateways, caches, …
• client = user-agent; sends request, processes response
• (origin) server = sends requested resources
• proxy = acts as a server to client and client to server
• cache = holds temporary copies of resources to conserve bandwidth and ensure better response time
MM-WT/10
How the Web works
[Figure: users (clients) browse resources on WWW servers over the Internet (WWW); authors write the HTML resources (HTML files)]
MM-WT/11
Web model with proxy
[Figure: clients connect to servers through a proxy]
MM-WT/12
Client
• user-agent
• program that establishes connections for sending requests and processes responses
• can be:
– as simple as telnet www.srce.hr 80
– browser (MS IE, Netscape, Opera, Amaya, ...)
– spider/robot or any other program that “can talk with server”
MM-WT/13
Browser
• retrieves (and displays, if possible) various resources
• can be:
– text-only (Lynx, ...)
– graphic (MSIE, Netscape, ...)
• different clients display HTML documents somewhat differently
• can display a variety of formats
– TEXT, GIF, JPEG, ...
MM-WT/14
Browser
• has multiprotocol support
– HTTP, FTP, LDAP, GOPHER, NNTP, SMTP, POP, ...
• can automatically launch a helper application (viewer) to handle some data formats (sound, video, PostScript, MS applications, ...)
• “plug-in” extensions can be used to extend browser capabilities (3D animation, various graphics formats, ...)
• has its own cache (memory and disk)
MM-WT/15
Server
• general purpose data delivery vehicle
• a program (daemon, httpd):
– responds to an incoming TCP connection and provides a service to the client
– runs independently
• Web servers:
– do NOT validate HTML code (parse documents)
– do NOT check links
– follow MIME rules (without checking file content)
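A minimal Python sketch of the "MIME rules without checking file content" point. It assumes the server derives the media type from the file name extension alone; the file names and the fallback type are illustrative assumptions, not taken from the slides.

```python
import mimetypes


def content_type_for(file_name):
    """Guess the media type from the file name extension only.

    The file itself is never opened, so a mislabeled file would still
    be announced with the type its extension suggests.
    """
    mime_type, _encoding = mimetypes.guess_type(file_name)
    # Assumed fallback for unknown extensions: generic binary data.
    return mime_type or "application/octet-stream"


print(content_type_for("index.html"))
print(content_type_for("logo.gif"))
print(content_type_for("notes.unknownext"))
```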
• Web site = host + Web server + information (file system)
MM-WT/16
Proxy
• intermediary acting as server to clients and client to servers
• makes requests on behalf of the clients
– request can be terminated or modified
• often associated with caches (caching proxy)
• functions:
– funnels requests from many clients to server
– caching
– transformation of requests/responses
– filtering requests/responses
– providing anonymity to clients
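The caching function can be sketched in a few lines of Python. This is a toy model, not a real proxy: `CachingProxy`, `fake_origin` and the `origin_requests` counter are hypothetical names, and the sketch ignores expiry and revalidation entirely.

```python
class CachingProxy:
    """Toy caching proxy: a server to its clients, a client to the origin."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin is a hypothetical callable: url -> response body.
        self._fetch = fetch_from_origin
        self._cache = {}
        self.origin_requests = 0

    def get(self, url):
        # Only contact the origin server when no copy is held locally.
        if url not in self._cache:
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]


def fake_origin(url):
    # Stand-in for a real origin server (assumption for this sketch).
    return "content of " + url


proxy = CachingProxy(fake_origin)
first = proxy.get("http://www.ceenet.org/constitution.html")
second = proxy.get("http://www.ceenet.org/constitution.html")
print(first == second, proxy.origin_requests)
```

The second request is answered from the cache, so the origin is contacted only once.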
MM-WT/17
Proxy types
• (explicit) proxy
– users are aware of the proxy
– browsers must be configured to use proxy (IP and port number)
• transparent (interception) proxy
– users are not aware of the proxy
– no need to configure browsers
– implemented by intercepting traffic
• caching proxy
– proxy with caching function
MM-WT/18
Internet resources identification
• URI - Uniform Resource Identifier (RFC 2396)
– URL - Uniform Resource Locator (RFC 1738)
  • identify objects accessed with existing protocols
  • PURL - Persistent URL
– URN - Uniform Resource Name (RFC 1737)
  • globally unique, persistent identifiers
  • URL identifies the location of a resource identified by a URN
• URC - Uniform Resource Characteristics
– data about the networked resource
– metadata = data about data
MM-WT/19
URL - locating Internet resources
• URL is a unique identifier for Internet resources
• indicates:
– means of access
– location
• simple syntax:
protocol://host_name[:port_num][/path][/file_name]
• example:
http://www.ceenet.org/constitution.html
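The syntax above can be taken apart with Python's standard `urllib.parse` module. This is a small sketch; the dictionary keys simply reuse the names from the syntax line, and `describe_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts named in the slide's syntax line."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host_name": parts.hostname,
        "port_num": parts.port,  # None when the URL names no port
        "path": parts.path,      # path and file name together
    }


print(describe_url("http://www.ceenet.org/constitution.html"))
print(describe_url("http://info.nowhere.hr:8000/directory/file.html"))
```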
MM-WT/20
HTTP
• application-level protocol
• stateless
• supports:
– use of URLs
– Internet media types (MIME types: RFC 2045-RFC 2049)
• allows access to different data formats
• standards:
– HTTP/1.0 (RFC 1945), HTTP/1.1 (RFC 2616, June 1999)
MM-WT/21
HTTP
• HTTP is a simple protocol:
1. Client finds out that it should use the HTTP protocol
2. Client opens a TCP connection to the server info.nowhere.hr on port 8000 (or, if no port is specified, on the default port 80)
http://info.nowhere.hr:8000/directory/file.html
(protocol: http; server name: info.nowhere.hr; port: 8000; directory/file name on the server: /directory/file.html)
MM-WT/22
What does the protocol specify?
• grammar, character sets, content coding
• data types: entity, resource, message
• request methods and response codes
• headers: general, request, response, entity
• requirements for clients, proxies and servers
• connections
• caching needs, content negotiation and security considerations (RFC 2617)
MM-WT/23
Resource, entity, message
• resource:
– networked data object or service identified by a URI (URL)
– may be available in several representations (language, …)
• entity:
– representation of a resource in the payload of a request or response
– may have an entity-header and an entity-body
• message:
– unit of communication
– structured sequence of octets transmitted via the transport connection
MM-WT/24
Request methods
Method   Client           Server
GET      request          send header & data
HEAD     request          send header
POST     request & data   receive data (pass to CGI script)
PUT      request & data   receive data (store as requested)
Other methods:
– DELETE
– TRACE, CONNECT, OPTIONS (HTTP 1.1)
MM-WT/25
Headers
• provide additional info about request criteria and response properties
• general headers:
– applicable to request and response
– Connection, Date, Pragma, ...
• request headers:
– Accept-Encoding, User-Agent, If-Modified-Since, …
• response headers:
– Age, Location, WWW-Authenticate, ...
• entity headers:
– Content-Encoding, Content-Length, Last-Modified, ...
• headers are hop-by-hop or end-to-end (HTTP 1.1)
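To make the hop-by-hop / end-to-end distinction concrete, here is a small Python sketch of what a proxy might forward. The header set is the hop-by-hop list from RFC 2616, section 13.5.1; `end_to_end_headers` is a hypothetical helper, and the sketch deliberately ignores extra header names that a real proxy must also drop when they are listed in the Connection header.

```python
# Hop-by-hop headers as listed in RFC 2616, section 13.5.1.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}


def end_to_end_headers(headers):
    """Return only the headers that are meant to reach the final recipient."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in HOP_BY_HOP  # header names are case-insensitive
    }


incoming = {
    "Connection": "close",
    "Date": "Tue, 29 Jul 1997 12:56:15 GMT",
    "Content-Length": "2320",
}
forwarded = end_to_end_headers(incoming)
print(forwarded)
```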
MM-WT/26
Server status codes
• three-digit numbers grouped as follows:
1xx - informational
2xx - client request successful
  200 - OK
3xx - request redirected
  304 - Not Modified
4xx - client errors (request incomplete)
  403 - Forbidden
  404 - Not Found
5xx - server errors
  501 - Not Implemented
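Since the first digit alone determines the group, a client can classify any status code with integer division. A minimal sketch; the group labels follow the slide wording and `status_class` is a hypothetical helper.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "client request successful",
    3: "request redirected",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Map a three-digit status code to its group via the first digit."""
    if not 100 <= code <= 599:
        raise ValueError("not a valid HTTP status code: %r" % code)
    return STATUS_CLASSES[code // 100]


for code in (200, 304, 404, 501):
    print(code, status_class(code))
```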
MM-WT/27
HTTP request
• user-agent initiates requests with:
– request method (GET in 95% of requests)
– URI
– protocol version
– MIME-like message containing
• request modifiers (If-Modified-Since, ...)
• client information (User-Agent, …)
• possible body content
MM-WT/28
HTTP response
• server responds with:
– status line with message’s protocol version
– server status code
– MIME-like message containing:
• server information
• entity meta-information
• possible entity-body content
MM-WT/29
Client - server communication
• simple client request (entered manually)
telnet www.srce.hr 80
Trying 161.53.2.69...
Connected to regoc.srce.hr.
Escape character is '^]'.
GET /index.html HTTP/1.0
ACCEPT: */*
USER-AGENT: manually entered HTTP
(blank line)
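The manually typed request can also be assembled by a program. The sketch below builds the same message as a string, without opening a connection; `build_request` is a hypothetical helper, and the CRLF line endings follow the HTTP specification.

```python
def build_request(method, path, headers, version="HTTP/1.0"):
    """Assemble request line + header lines + the terminating blank line."""
    lines = ["%s %s %s" % (method, path, version)]
    lines += ["%s: %s" % (name, value) for name, value in headers]
    # Each line ends with CRLF; one empty line closes the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request(
    "GET",
    "/index.html",
    [("Accept", "*/*"), ("User-Agent", "manually entered HTTP")],
)
print(repr(request))
```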
MM-WT/30
Client - server communication
• server response:
HTTP/1.0 200 OK
Date: Tue, 29 Jul 1997 12:56:15 GMT
Server: Apache/1.1.3
Content-type: text/html
Content-length: 2320
Last-modified: Fri, 22 Nov 1996 10:07:27 GMT
(blank line)
(content - document source)
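A client reads such a response by splitting it at the blank line and then taking the status line and headers apart. A minimal sketch using the response shown above as sample input; `parse_response` is a hypothetical helper that does no error handling.

```python
RAW_RESPONSE = (
    "HTTP/1.0 200 OK\r\n"
    "Date: Tue, 29 Jul 1997 12:56:15 GMT\r\n"
    "Server: Apache/1.1.3\r\n"
    "Content-type: text/html\r\n"
    "Content-length: 2320\r\n"
    "Last-modified: Fri, 22 Nov 1996 10:07:27 GMT\r\n"
    "\r\n"
    "(content - document source)"
)


def parse_response(raw):
    """Split a response into version, status code, reason, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split at the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


version, code, reason, headers, body = parse_response(RAW_RESPONSE)
print(version, code, reason, headers["content-type"])
```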
MM-WT/31
Typical transaction on the Web
[Figure: browser, DNS server and origin server; the browser starts from a URL]
1. DNS lookup
2. TCP connection
3. HTTP request
4. HTTP response
(optional parallel connections)
MM-WT/32
WebDAV
• WWW Distributed Authoring and Versioning
• an extension of HTTP 1.1
• provides infrastructure for asynchronous collaborative authoring across the Internet
• RFC 2518, February 1999
– HTTP Extensions for Distributed Authoring
• supported by MS and Apache
• WebDAV home page
– http://www.ics.uci.edu/pub/ietf/webdav/
MM-WT/33
WebDAV
[Figure: time-sequence diagram between client and server; labels: File Open, File Close, File Save; methods: LOCK, PROPFIND, GET, PUT, UNLOCK]
MM-WT/34
HTML
• HTML is the “native language” of the WWW
• HTML file = Web page
• SGML (Standard Generalized Markup Language)
– ISO standard
– HTML is an SGML application
• SGML Document Type Definition (DTD)
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
• standards:
– HTML 1.0, 2.0, 3.0, 3.2, 4.0, … , XHTML 1.0
– browser extensions (Netscape, MS IE, ...)
– other MLs: VRML, SMIL, MathML, XML, ...
MM-WT/35
HTML syntax
• HTML document contains markup tags
<H1> Example </H1>
• tags are case insensitive
<H1> or <h1>
• tag attributes may be case sensitive
e.g. filenames
• tags are (usually) paired to denote the start and end of an element
<H1> Example </H1>
• BE CAREFUL: XHTML 1.0 has stricter syntax
MM-WT/36
HTML document
. . . <tag attribute="value" . . .> . . . </tag> . . .
[Figure labels: text and/or tags (between the start and end tag); element (tag pair)]
MM-WT/37
Minimal HTML document
<html>
<head>
<title> document title </title>
</head>
<body>
document body - text . . .
</body>
</html>
MM-WT/38
Cascading Style Sheets (CSS)
• mechanism for adding style to an HTML document
• designed to separate content from presentation
• cascading concept improves accessibility
• current recommendations (standards): CSS1 & CSS2
– full CSS1 implementation: Mac IE, Netscape 6.0, Opera 4.0
• reference URLs:
– http://www.w3.org/Style/
– http://www.htmlhelp.com/reference/css/
“Hopefully, future Web innovations will emulate the example set by the Web Consortium in its work on CSS”, Jakob Nielsen
MM-WT/39
CSS use
<STYLE TYPE="text/css">css rules ...</STYLE>

<LINK REL="STYLESHEET" TYPE="text/css" HREF=".../my_style.css">

<TAG STYLE="css-rule;...;css-rule">...</TAG>
MM-WT/40
CSS example
H1 {font: 17pt "Arial CE"; font-weight: bold; color: red}
H2 {font: 15pt "Arial CE"; font-weight: bold; color: green}
P {font: 12pt "Courier New CE"; color: blue}
...
MM-WT/41
Markup story overview
• SGML:
– the basic architecture behind all MLs
• XML:
– simplified SGML suitable for use on the WWW
• HTML:
– an SGML application (DTD)
– XHTML = HTML (re)written in XML
• Style Sheets:
– add more presentation control
– CSS (for HTML)
– XSL (for XML)
MM-WT/42
XML, XSL
• XML (Extensible Markup Language)
– standard for structured documents on the Web developed by W3C
– designed as a subset of SGML optimised for the Web
– XML suits Web applications where HTML is insufficient
• XSL (Extensible Stylesheet Language)
– language for expressing style sheets
– has two parts:
  • a language for transforming XML documents (XSLT)
  • an XML vocabulary for specifying formatting semantics
MM-WT/43
CSS vs. XSL
                          CSS   XSL
Can be used with HTML?    yes   no
Can be used with XML?     yes   yes
Transformation language?  no    yes
MM-WT/44
XHTML
• challenges for HTML:
– new kinds of browsers: Digital TVs, handhelds, phones
– pressure to subset HTML for simple clients
– pressure to extend HTML for richer clients
• XHTML (Extensible HTML)
– HTML 4.0 (strict) rewritten in XML
– modularised HTML for subsetting/combining with other tag-sets
– next generation forms
– XML requires:
  • case-sensitive tags (lower case)
  • end tags on all elements and a / in empty tags
  • attribute values in quotes
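The XML requirements can be checked mechanically with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the markup fragments and `is_well_formed` are made up for illustration, and the check covers XML well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Acceptable in HTML 4.0: unquoted attribute value, empty <br> without "/".
html_style = "<p align=center>first line<br>second line</p>"
# XHTML style: quoted attribute value, empty tag closed with "/".
xhtml_style = '<p align="center">first line<br />second line</p>'
# Tag names are case-sensitive in XML, so <P> ... </p> does not match.
mixed_case = "<P>text</p>"

print(is_well_formed(html_style))
print(is_well_formed(xhtml_style))
print(is_well_formed(mixed_case))
```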
MM-WT/45
Other standards
• VRML (Virtual Reality Modelling Language)
– for modelling three-dimensional scenes
• MathML (Mathematical Markup Language)
– inclusion of mathematical expressions in Web pages
– XML application
• SMIL (Synchronized Multimedia Integration Language)
– XML-based language that allows authors to write interactive multimedia presentations
• ...
MM-WT/46
Active Web pages
• to enhance your site:
– two way interaction
– page animation
– better multimedia
– access to other systems
– browser intelligence
– desktop integration
MM-WT/47
Active Web pages
• techniques:
– CGI - Common Gateway Interface
  • WWW server communicates with other programs (CGI scripts)
– SSI - Server Side Includes (*.shtml)
– API - Application Programming Interface
– Cookies (“making a browser remember”)
– scripting languages (embedded in HTML document)
  • JavaScript, VBScript, …
– DHTML
– Java (applets, servlets)
– ActiveX
MM-WT/48
Active Web pages
Who is doing the job?
• browser downloads and automatically executes program (Java applet)
OR
• HTML document is generated on the server machine (by CGI script)
MM-WT/49
Active Web pages
[Figure: server side and client side connected by HTTP. Server side: WWW server with SSI, API, Java servlet, and CGI program linked via CGI to another program (application). Client side: WWW client with Java applet and script (embedded in HTML)]
MM-WT/50
Active Web pages
• common examples:
– forms (feedback processing)
  • special tags: <FORM>, <INPUT>, <SELECT>, ...
  • usually a CGI script is used to process a form
– active maps (clickable maps)
  • special tags and attributes: <MAP>, <IMG>, ...
  • client-side or server-side (CGI scripts are used)
– database or other Internet service gateways
MM-WT/51
CGI
• WWW server communicates with other programs (CGI scripts)
• CGI scripts should be in a separate directory defined in the WWW server's configuration file
• CGI scripts can be written in any programming language (shell script, Perl, C, …)
• workload is on the server side (be careful)
MM-WT/52
Calling CGI script
<A HREF="program_url?parameter_list">
<IMG SRC="program_url?parameter_list">
<FORM ACTION="program_url?parameter_list">
program_url: http://server-name/cgi-bin-directory-name/program-name
parameter_list: par_1=val_1&...&par_n=val_n
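A parameter list in this form is normally produced and decoded by library code, because values may need escaping. A minimal Python sketch with the standard `urllib.parse` module; the directory `cgi-bin`, the program name `search` and the parameter `query` are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def cgi_url(program_url, parameters):
    """Append an encoded parameter list to the program URL."""
    return program_url + "?" + urlencode(parameters)


url = cgi_url(
    "http://server-name/cgi-bin/search",  # hypothetical program_url
    [("par_1", "val_1"), ("query", "web technology")],
)
print(url)

# What a CGI script would recover from the query part of the URL.
received = parse_qs(urlsplit(url).query)
print(received)
```

Note how the space in the second value is escaped on the way out and restored on the way in.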
MM-WT/53
SSI, API, ...
• workload is on the server side (be careful)
• SSI:
– simple mechanism for generating pages on the fly
– *.shtml
• API:
– enables writing server extensions (plug-ins)
– not standardized (ISAPI, NSAPI, Apache API)
MM-WT/54
Cookies
• cookies.txt
• info about client-server communication
• data is sent in a Set-Cookie header by server
• ... and returned in a Cookie header by the browser whenever that server is visited
• enables “browser intelligence”
• server dependent
• browser dependent (MS IE, Netscape)
• security ?
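The Set-Cookie / Cookie exchange can be sketched with Python's standard `http.cookies` module. The cookie name `visits` and its value are made up, and `cookie_header_from` is a hypothetical helper that ignores expiry, domain and path matching.

```python
from http.cookies import SimpleCookie


def cookie_header_from(set_cookie_value):
    """Turn a received Set-Cookie value into the Cookie value sent back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Only name=value pairs are returned; attributes such as Path stay local.
    return "; ".join(
        "%s=%s" % (name, morsel.value) for name, morsel in jar.items()
    )


# Server -> browser:  Set-Cookie: visits=3; Path=/
# Browser -> server:  Cookie: visits=3
print(cookie_header_from("visits=3; Path=/"))
```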
MM-WT/55
Scripting languages
• JavaScript, VBScript, ...
• embedded in HTML source
• workload is on the client side
• simple example:
<HTML>
<HEAD>
<SCRIPT LANGUAGE="JavaScript">
document.write("Hello World!")
</SCRIPT>
</HEAD>
<BODY>
Example
</BODY>
</HTML>
MM-WT/56
Java
• object oriented programming language
• platform independent
• programs are transferred via network and executed on client side - applets
• Java programs executing on server side - servlets
• special development tools (JDK, …)
• http://www.javasoft.com
• http://www.javaworld.com
• http://www.gamelan.com
MM-WT/57
DHTML
• Dynamic HTML
• HTML + Style Sheets + Scripts
• extension to HTML standard
• started by Microsoft & Netscape
• not only <layer> tag
• enables authors to make their pages dynamic
• DOM (Document Object Model)
– glue for DHTML
– platform-independent and language-independent interface
MM-WT/58
Web-initiated transactions
• not all communication goes over the Web
• most streaming A/V sessions are web-initiated
• implemented via plug-ins
• initial request is over HTTP
• response is a pointer to the actual data on a media server (e.g. pnm://radio.broadcast.com/broadcast.ra)
• another, separate client contacts the media server using a different protocol
MM-WT/59
Internationalization (I18N)
• originally:
– plain ASCII (Latin 1); English language
• (X)HTML
– UNICODE; new elements (LANG attribute)
• HTTP 1.1
– enables charset and language negotiation
• META tag usage (override HTTP 1.0 limitations)
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-2">
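Why the charset label matters can be shown with a few lines of Python. The sketch reads the charset parameter from a Content-Type value and uses it to decode the bytes; the sample text, `charset_of` and the iso-8859-1 fallback are assumptions for illustration.

```python
from email.message import Message


def charset_of(content_type, default="iso-8859-1"):
    """Extract the charset parameter from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


text = "Šibenik, Čakovec"               # sample text with Croatian letters
raw_bytes = text.encode("iso-8859-2")   # bytes as sent over the wire

charset = charset_of("text/html; charset=iso-8859-2")
print(charset)
# Decoding with the labeled charset restores the text ...
print(raw_bytes.decode(charset) == text)
# ... while assuming Latin 1 yields different characters.
print(raw_bytes.decode("iso-8859-1") == text)
```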
MM-WT/60
Internationalization (I18N)
• major topics:
– character encoding
– language information
– locale-dependent info
– specific structures & formatting
• who should act:
– content providers - label the content
– ISPs - allow user configuration & negotiation
– SW developers - use UNICODE internally
MM-WT/61
Security
• plain WWW is not secure!
• security on:
– content level
  • PGP, data encryption
– channel level
  • SSL (Secure Sockets Layer)
– message level
  • SHTTP, PEP, ...
MM-WT/62
Recent development
• HTTP 1.1 & WebDAV
• CSS & XSL
• RDF - Resource Description Framework
• XML - Extensible Markup Language
• XHTML - Extensible HTML
• SVG - Scalable Vector Graphics
• Java & Jini
• WAP & WML
• Dial tone Web tone
• …
“I think there may be a world market for perhaps 5 computers” (Thomas Watson, IBM, 1943)
“The internet is a fad” (Bill Gates, Microsoft, 1981)
MM-WT/63
Summary
• History and statistics
• Basic concepts and components
• How the Web works
• URI - resource identification
• HTTP - Web and protocols
• Markup story - HTML and beyond
• Active Web pages
• Recent development