1 web servers herng-yow chen. 2 outline survey many different types of software and hardware web...


Post on 26-Dec-2015


Page 1: 1 Web Servers Herng-Yow Chen. 2 Outline Survey many different types of software and hardware web servers. Describe how to write a simple diagnostic web


Web Servers

Herng-Yow Chen


Outline

Survey many different types of software and hardware web servers.
Describe how to write a simple diagnostic web server in Perl.
Explain how web servers process HTTP transactions, step by step.


Different types of web servers

General-purpose software web servers
Web server appliances
Embedded web servers


Jobs of web servers

Implement HTTP and the related TCP connection handling.
Manage the server-side resources and provide administrative features to configure, control, and enhance the web service.


Jobs of the operating system

Manage the hardware details of the underlying computer system.
Provide TCP/IP network support.
Provide filesystems to hold web resources.
Provide process management to control computing activities.


General-purpose software web server

General-purpose software web servers run on standard, network-enabled computer systems.

Open source software (such as Apache or W3C’s Jigsaw).

Commercial software (such as Microsoft’s and iPlanet’s web servers).

Web server software is available for just about every computer and operating system.


General-Purpose Software Web Servers

In September 2004, the Netcraft survey (http://news.netcraft.com/archives/web_server_survey.html)


Web server appliances

Web server appliances are prepackaged software/hardware solutions. The vendor preinstalls a software server onto a vendor-chosen computer platform and preconfigures the software.

Sun/Cobalt RaQ web appliance (http://www.cobalt.com)
Toshiba Magnia SG10 (http://www.toshiba.com)
IBM Whistle web server appliance (http://www.whistle.com)

Appliance solutions remove the need to install and configure software and often greatly simplify administration. However, the web server is often less flexible and feature-rich, and the server hardware is not easily upgradable.


Embedded web servers

Embedded servers are tiny web servers intended to be embedded into consumer products (e.g., printers or home appliances).

They allow users to administer their consumer devices using a convenient web browser interface.

IPic match-head sized web server (http://www-ccs.cs.umass.edu/~shri/iPic.html)
NetMedia SitePlayer SP1 Ethernet web server (http://www.siteplayer.com)


A Minimal Perl Web Server

Type-o-serve: a minimal Perl web server used for HTTP debugging.

http://www.http-guide.com/tools/type-o-serve.pl


A Minimal Perl Web Server

HTTP request message (sent by the browser):

GET /blah.txt HTTP/1.1
Accept: */*
Accept-language: en-us
Accept-encoding: gzip, deflate
User-agent: Mozilla/4.0
Host: www.csie.ncnu.edu.tw:8080
Connection: Keep-alive

Type-o-serve dialog (on the server console):

% ./type-o-serve.pl 8080
<<Request From 'www.csie.ncnu.edu.tw'>>
GET /blah.txt HTTP/1.1
Accept: */*
Accept-language: en-us
Accept-encoding: gzip, deflate
User-agent: Mozilla/4.0
Host: www.csie.ncnu.edu.tw:8080
Connection: Keep-alive

<<Type Response followed by '.'>>
HTTP/1.0 200 OK
Connection: close
Content-type: text/plain

Hi there!

HTTP response message (returned to the browser):

HTTP/1.0 200 OK
Connection: close
Content-type: text/plain

Hi there!


What do web servers do?

1. Set up connection

2. Receive request

3. Process request

4. Access resource

5. Construct response

6. Send response

7. Log transaction


What Real Web Servers Do

[Figure: a client reaches the server through the network interface and the operating system’s TCP/IP network stack. In user space, the HTTP server software process performs the steps (1) Set up connection, (2) Receive request, (3) Process request, (4) Access resource (from object storage), (5) Create response, (6) Send response, (7) Log transaction.]


Step 1: Accepting client connections

Handling new connections
  Extracting the client IP address from a new TCP connection

Client hostname identification
  Using “reverse DNS”

Determining the client user through ident
  Some web servers support the IETF ident protocol


Handling new connections

When a client requests a TCP connection to the web server, the web server establishes the connection and determines which client is on the other side of the connection, extracting the IP address from the TCP connection (e.g., using the getpeername call on a UNIX socket).

The server is free to reject and immediately close connections, for example because the client IP address is unauthorized or belongs to a known malicious client.

Once a new connection is established and accepted, the server adds the new connection to its list of existing connections and prepares to watch for data on the connection.
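
The accept-and-identify step can be sketched in Python as follows. This is only an illustration, not how any particular server does it: the function name accept_client and the blocked_ips set are hypothetical, and a real server would consult its own access-control configuration.

```python
import socket

def accept_client(listener, blocked_ips=frozenset()):
    """Accept one connection and identify the peer.

    Returns (conn, client_ip), or (None, client_ip) if the client was rejected.
    blocked_ips is a hypothetical deny list used only for this sketch.
    """
    conn, _addr = listener.accept()
    # getpeername() reports the address of the remote end of the connection.
    client_ip = conn.getpeername()[0]
    if client_ip in blocked_ips:
        conn.close()  # the server is free to reject and close immediately
        return None, client_ip
    return conn, client_ip
```

A server loop would call accept_client on its listening socket and add each accepted connection to the set of connections it watches for incoming data.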


Client host identification

Most web servers can be configured to convert client IP addresses into client hostnames, using “reverse DNS.”

The hostname information is used for detailed access control and logging.

Note that hostname lookups can take a long time, slowing down web transactions. Many high-performance web servers either disable hostname resolution or enable it only for particular content.

Ex: Configuring Apache to look up hostnames only for HTML and CGI resources:

HostnameLookups off
<Files ~ "\.(html|htm|cgi)$">
    HostnameLookups on
</Files>
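
For illustration, a reverse DNS lookup with a fallback might look like the following Python sketch. The function name client_hostname and the injectable resolver parameter are assumptions made for this example; socket.gethostbyaddr is the standard-library call that performs the actual lookup.

```python
import socket

def client_hostname(ip, resolver=socket.gethostbyaddr):
    """Return the hostname for ip via reverse DNS, or the IP itself on failure.

    The resolver parameter is a hypothetical hook so the lookup can be
    replaced (for example by a cache) without touching the caller.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ip
```

Because the real lookup blocks until DNS answers, a server that enables it pays that latency on every affected transaction, which is why the lookup is often restricted to particular content.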


Determining the client user through ident

The ident protocol lets servers find out what username initiated an HTTP connection.

The username information is particularly useful for logging: the second field of the popular Common Log Format contains the ident username of each HTTP request. (Ident was originally specified in RFC 931; the updated ident specification is documented by RFC 1413.)

If a client supports the ident protocol, the client listens on TCP port 113 for ident requests.


Determining the Client User Through ident

[Figure: (a) Mary establishes a new HTTP connection from client port 4236 to port 80 on the web server. (b) The server establishes an ident connection to port 113 on Mary’s machine. (c) The server sends the request "4236, 80". (d) The client returns the ident response "4236, 80:USERID:UNIX:MARY".]
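
The exchange in the figure can be sketched in Python as two small helpers, one that builds the query and one that extracts the username from the reply. The function names are hypothetical, and the parsing is deliberately loose: it strips whitespace around fields, which a strict RFC 1413 implementation may not do.

```python
def ident_query(client_port, server_port):
    """Build the query line sent to the client's ident daemon on port 113."""
    return f"{client_port}, {server_port}\r\n".encode("ascii")

def parse_ident_response(line):
    """Return the username from a USERID reply, or None otherwise.

    Handles replies shaped like '4236, 80:USERID:UNIX:MARY'.
    ERROR replies and malformed lines yield None.
    """
    parts = [p.strip() for p in line.strip().split(":", 3)]
    if len(parts) == 4 and parts[1].upper() == "USERID":
        return parts[3]
    return None
```

A server would send ident_query(4236, 80) over the ident connection and pass the single reply line to parse_ident_response; a None result corresponds to logging a hyphen in the ident field.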


Ident protocol (cont.)

Ident can work inside organizations, but it does not work well across the public Internet, for the following reasons:

Many client PCs don’t run the identd Identification Protocol daemon software.
The ident protocol significantly delays HTTP transactions.
Many firewalls won’t permit incoming ident traffic.
The ident protocol is insecure and easy to fabricate.
The ident protocol doesn’t support virtual IP addresses well.
There are privacy concerns about exposing client usernames.

Enable ident lookups in Apache with the directive:

IdentityCheck on

Common Log Format log files typically contain hyphens (-) in the second field if no ident information is available.


Step 2: Receiving request messages

As the data arrives on connections, the server reads out the data and starts parsing the request message:

Parses the request line, looking for the request method, the specified URI, and the version number.
Reads the message headers, each ending in CRLF.
Detects the end-of-headers blank line, ending in CRLF.
Reads the request body, if any (length specified by the Content-Length header).

Internal Representations of Messages

Some web servers also store the request message in internal data structures that make the message easy to manipulate.
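
The parsing steps above can be sketched in Python as follows. This is a minimal illustration that assumes the whole message is already in memory; a real server reads incrementally from the connection and must handle malformed input. The function name and the dictionary layout are choices made for this example.

```python
def parse_request(raw: bytes):
    """Parse a complete HTTP request message into a simple dictionary."""
    # The blank line (CRLF CRLF) separates the headers from the body.
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line: method, URI, version.
    method, uri, version = lines[0].split(" ", 2)

    # Header lines: "Name: value", stored with lowercase names.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Body length comes from Content-Length (0 if the header is absent).
    length = int(headers.get("content-length", 0))
    return {
        "method": method,
        "uri": uri,
        "version": version,
        "headers": headers,
        "body": rest[:length],
    }
```

The returned dictionary plays the role of the internal representation: later steps can look up the method, URI, and headers without re-scanning the raw bytes.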


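Receiving Request Messages

[Figure: a request message being read from the network. The server has already read "GET /specials/hychen.gif HTTP/1.0 CRLF", "Accept: image/gif CRLF", and "Host: www.j"; the remainder of the Host header ("oes-hardware.com CRLF CRLF") is still in transit from the client across the Internet.]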


Internal Representations of Messages

Request message as received:

GET /specials/saw-blade.gif HTTP/1.0 CRLF
Accept: image/gif CRLF
Host: www.joes-hardware.com CRLF
CRLF

Parsed into an internal structure:

method: 1
version: 1.0
uri: (pointer to "specials/saw-blade.gif")
header count: 2
headers: (pointer to the header list)
    Name: Host, Value: (pointer to "www.joes-hardware.com")
    Name: Accept, Value: (pointer to "image/gif")
body: -


Different web server architectures

Single-threaded web servers
Multi-process and multi-threaded web servers
Multiplexed I/O web servers
  Non-blocking network access
Multiplexed multi-threaded web servers


Connection Input/Output Processing Architectures


Step 3: Processing requests

Once the web server has received a request, it can process the request using the method, resource, headers, and optional body.

Some methods (e.g., POST) require entity body data in the request message. A few methods (e.g., GET) forbid entity body data in the request message.


Step 4: Mapping and Accessing resources

Docroots
Virtually hosted docroots
User home directory docroots
Directory listings
Dynamic content resource mapping
Server-Side Includes (SSI)
Access controls


Docroots

Web servers support different kinds of resource mapping, but the simplest form of mapping uses the request URI to name a file in the web server’s filesystem.

Typically, a special folder in the web server filesystem is reserved for web content. The folder is called the document root, or docroot.

The web server takes the URI from the request message and appends it to the document root. The docroot setting in Apache servers:

DocumentRoot /usr/local/httpd/files

Servers must be careful not to let relative URLs back up out of a document root and expose other parts of the filesystem, e.g., http://www.csie.ncnu.edu.tw/../
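
A minimal Python sketch of this mapping, including the check against backing out of the docroot, might look like the following. The function name map_uri_to_path is hypothetical, and the sketch only normalizes the path textually; it does not resolve symbolic links, which a production server would also need to consider.

```python
import posixpath
from urllib.parse import unquote, urlsplit

DOCROOT = "/usr/local/httpd/files"  # the docroot from the Apache example

def map_uri_to_path(uri, docroot=DOCROOT):
    """Map a request URI to a filesystem path under docroot.

    Returns None if the URI would escape the document root.
    """
    # Keep only the path component and decode %-escapes such as %2e.
    path = unquote(urlsplit(uri).path)
    # Append to the docroot, then collapse "." and ".." segments.
    candidate = posixpath.normpath(posixpath.join(docroot, path.lstrip("/")))
    # Reject anything that no longer lies inside the docroot.
    if candidate != docroot and not candidate.startswith(docroot + "/"):
        return None
    return candidate
```

With the default docroot, map_uri_to_path("/specials/hychen.gif") yields /usr/local/httpd/files/specials/hychen.gif, while a URI containing enough ".." segments to leave the docroot yields None.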


Docroots

[Figure: the client sends the request message "GET /specials/hychen.gif HTTP/1.0" with "Host: www.csie.ncnu.edu.tw" across the Internet. The web server maps the request URI /specials/hychen.gif onto the docroot /usr/local/httpd/files, giving the server resource /usr/local/httpd/files/specials/hychen.gif in object storage.]


Virtually hosted docroots

Virtually hosted web servers host multiple web sites on the same web server, giving each site its own distinct document root on the server.

A virtually hosted web server identifies the correct document root to use from the IP address or from the hostname in the Host header.


Apache’s virtual host configuration

<VirtualHost www.joes-hardware.com>
    ServerName www.joes-hardware.com
    DocumentRoot /docs/joe
    TransferLog /logs/joe.access_log
    ErrorLog /logs/joe.error_log
</VirtualHost>

<VirtualHost www.marys-antiques.com>
    ServerName www.marys-antiques.com
    DocumentRoot /docs/mary
    TransferLog /logs/mary.access_log
    ErrorLog /logs/mary.error_log
</VirtualHost>
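
The docroot selection itself can be illustrated with a small Python sketch. The table VIRTUAL_HOSTS and the function docroot_for are hypothetical names; the hostnames and paths are the example sites from these slides, and the headers argument is assumed to be a dictionary keyed by lowercase header name.

```python
# Hypothetical lookup table: one docroot per hosted site.
VIRTUAL_HOSTS = {
    "www.joes-hardware.com": "/docs/joe",
    "www.marys-antiques.com": "/docs/mary",
}

def docroot_for(headers, default=None):
    """Pick the document root from the Host header of a request."""
    # Drop any ":port" suffix and compare case-insensitively.
    host = headers.get("host", "").split(":")[0].strip().lower()
    return VIRTUAL_HOSTS.get(host, default)
```

Two requests for /index.html that differ only in their Host header therefore resolve to different files, one under /docs/joe and one under /docs/mary.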


Virtually hosted docroots

[Figure: a single server hosts www.joes-hardware.com (docroot /docs/joe) and www.marys-antiques.com (docroot /docs/mary). Request message A, "GET /index.html HTTP/1.0" with "Host: www.joes-hardware.com", is served from /docs/joe. Request message B, "GET /index.html HTTP/1.0" with "Host: www.marys-antiques.com", is served from /docs/mary.]


User home directory docroots

[Figure: on a server hosting www.joes-hardware.com and www.marys-antiques.com, request message A, "GET /~bob/index.html HTTP/1.0", is served from /home/bob/public_html, and request message B, "GET /~betty/index.html HTTP/1.0", is served from /home/betty/public_html.]


User home directory docroots

Another common use of docroots gives people private web sites on a web server.

A typical convention maps URIs whose paths begin with a slash and tilde (/~) followed by a username to a private document root for that user.

The private docroot is often the folder called public_html inside that user’s home directory, but it can be configured differently (e.g., on the NCNU web server, we use WWW as the user’s private document root).

In Apache’s configuration:

UserDir public_html
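
As a rough Python sketch of this convention, the mapping from a /~user path to a private docroot could look like this. The function name map_user_uri and the home_root default of /home are assumptions for the example; the public_html default mirrors the UserDir setting above.

```python
import posixpath

def map_user_uri(uri_path, home_root="/home", user_dir="public_html"):
    """Map '/~user/rest' to '<home_root>/<user>/<user_dir>/rest'.

    Returns None if the path is not a /~user path or would escape
    the user's private docroot.
    """
    if not uri_path.startswith("/~"):
        return None
    user, _, rest = uri_path[2:].partition("/")
    if user in ("", ".", ".."):
        return None
    base = posixpath.join(home_root, user, user_dir)
    candidate = posixpath.normpath(posixpath.join(base, rest))
    if candidate != base and not candidate.startswith(base + "/"):
        return None
    return candidate
```

Passing user_dir="WWW" would model a site that uses a differently named private docroot folder.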


Directory listings

A web server can receive requests for directory URLs, where the path resolves to a directory, not a file.

Most web servers can be configured to take a few different actions when a client requests a directory URL:

Return an error.
Return a special, default “index file” instead of the directory.
Scan the directory, and return an HTML page containing the contents.


Directory Listings (continued)

Most web servers look for a file named index.html or index.htm inside a directory to represent that directory.

In Apache’s configuration:

DirectoryIndex index.html index.htm home.html home.html index.cgi

Disable the automatic generation of directory index files with the Apache directive:

Options -Indexes
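
The three possible actions for a directory URL can be sketched in Python as below. The names INDEX_NAMES and resolve_directory are hypothetical, the index file list is an abbreviated stand-in for a DirectoryIndex setting, and the choice of 403 as the error status is an assumption of this sketch.

```python
import os

# Candidate index files, tried in order (assumed list for this sketch).
INDEX_NAMES = ["index.html", "index.htm", "home.html", "index.cgi"]

def resolve_directory(dir_path, allow_listing=False):
    """Decide how to answer a request whose path resolves to a directory.

    Returns ("index", file_path), ("listing", sorted_names), or ("error", 403).
    """
    # 1. Prefer a default index file if one exists.
    for name in INDEX_NAMES:
        candidate = os.path.join(dir_path, name)
        if os.path.isfile(candidate):
            return ("index", candidate)
    # 2. Otherwise list the directory, if listings are enabled.
    if allow_listing:
        return ("listing", sorted(os.listdir(dir_path)))
    # 3. Otherwise refuse.
    return ("error", 403)
```

Setting allow_listing to False corresponds to turning off automatic index generation; the caller would render the "listing" result as an HTML page.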


Dynamic content resource mapping

Web servers also can map URIs to dynamic resources, that is, to programs that generate content on demand.

In fact, a whole class of web servers called application servers connect web servers to sophisticated backend applications.

The web server needs to be able to tell when a resource is a dynamic resource, where the dynamic content generator program is located, and how to run the program.


Dynamic content resource mapping (continued)

In Apache’s configuration:

ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-programs/
AddHandler cgi-script .cgi

CGI is an early, simple, and popular interface for executing server-side applications. Modern application servers have more powerful server-side dynamic content support, including Active Server Pages, Java servlets, and PHP.


Dynamic Content Resource Mapping

[Figure: a client connected across the Internet to a server.]


Server-Side Includes (SSI)

Many web servers also provide support for server-side includes.

If a resource is flagged as containing server-side includes, the server processes the resource contents before sending them to the client.

The contents are scanned for certain special patterns, which can be variable names or embedded scripts. The special patterns are replaced with the values of variables or the output of executable scripts.

This is an easy way to create dynamic content.
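
A toy Python version of this scan-and-replace step is shown below. It handles only one pattern, the echo directive in the Apache-style form <!--#echo var="NAME" -->, and the function name expand_ssi and the "(none)" placeholder for unknown variables are choices made for this sketch rather than a description of any server’s full SSI engine.

```python
import re

# Matches directives such as <!--#echo var="REMOTE_HOST" -->
SSI_ECHO = re.compile(r'<!--#echo\s+var="([^"]+)"\s*-->')

def expand_ssi(text, variables):
    """Replace each echo directive with the value of the named variable."""
    def substitute(match):
        return str(variables.get(match.group(1), "(none)"))
    return SSI_ECHO.sub(substitute, text)
```

For example, expanding the text 'Hello <!--#echo var="REMOTE_HOST" -->!' with REMOTE_HOST set to client.example produces 'Hello client.example!'.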


Access controls

Web servers also can assign access controls to particular resources.

When a request arrives for an access-controlled resource, the web server can control access based on the IP address of the client, or it can issue a password challenge to get access to the resource.

We will see more details in a later lecture, Chapter 12 (HTTP authentication).


Step 5: Building responses

Once the web server has identified the resource, it performs the action described in the request method and returns the response message, which contains a status code, response headers, and a response body.

Response entities
MIME typing
Redirection


Response entities

If the transaction generated a response body, the content is sent back with the response message, which usually contains:

A Content-Type header, i.e., the MIME type of the body
A Content-Length header, describing the body size
The actual message body content


MIME typing

The web server is responsible for determining the MIME type of the response body.

There are many ways to configure servers to associate MIME types with resources:

mime.types: extension-based type association.
Magic typing: content-based association, scanning the contents for known patterns.
Explicit typing: force particular files or directory contents to have a MIME type, regardless of the file extension or contents.
Type negotiation: the server is configured to store a resource in multiple document formats. In a client-server negotiation process, the server can determine the “best” format to use (Chapter 17).
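
Extension-based and explicit typing can be illustrated with Python’s standard mimetypes module, as in the sketch below. The function name content_type_for, its explicit override parameter, and the application/octet-stream fallback are assumptions of this example; magic typing and type negotiation are not covered.

```python
import mimetypes

def content_type_for(path, explicit=None, default="application/octet-stream"):
    """Choose a Content-Type for a resource.

    explicit models explicit typing: when given, it wins regardless of
    the file extension. Otherwise the extension is looked up, and
    default is used when the extension is unknown.
    """
    if explicit is not None:
        return explicit
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or default
```

For the resource /specials/hychen.gif this yields image/gif, which is the value the server would place in the Content-Type response header.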


MIME Typing

[Figure: the client sends "GET /specials/hychen.gif HTTP/1.1" with "Host: www.csie.ncnu.edu.tw" (the HTTP request message contains the command and the URI). The server www.csie.ncnu.edu.tw reads the hychen.gif file and responds with "HTTP/1.1 200 OK", "Content-type: image/gif", and "Content-length: 8572".]


Redirection

Web servers sometimes return redirection responses (indicated by a 3XX return code) instead of success messages. The Location response header contains a URI for the new or preferred location of the content.

Redirections are useful for:

Permanently moved resources
Temporarily moved resources
URL augmentation
Load balancing
Server affinity
Canonicalizing directory names


300-399: Redirection Status Codes

Status code    Reason phrase
300            Multiple Choices
301            Moved Permanently
302            Found
303            See Other
304            Not Modified
305            Use Proxy
306            (Unused)
307            Temporary Redirect


Step 6: Sending responses

The server may have many connections to many clients, some idle, some sending data to the server, and some carrying response data back to the clients.

The server needs to keep track of connection state and handle persistent connections with special care.

For non-persistent connections, the server is expected to close its side of the connection when the entire message is sent.

For persistent connections, the connection may stay open, in which case the server needs to be extra cautious to compute the Content-Length header correctly, or the client will have no way of knowing when a response ends (cf. Chapter 4).
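
To make the Content-Length point concrete, here is a small Python sketch that assembles a response and derives the length from the actual body bytes. The function name build_response and its parameters are hypothetical; the key point is that the header is computed from the encoded body rather than typed by hand.

```python
def build_response(body: bytes, content_type: str, keep_alive: bool,
                   status: str = "200 OK") -> bytes:
    """Assemble a complete HTTP response message as bytes."""
    header_lines = [
        f"HTTP/1.1 {status}",
        f"Content-Type: {content_type}",
        # Length of the body in bytes, so the client can find the end
        # of the response even if the connection stays open.
        f"Content-Length: {len(body)}",
        "Connection: keep-alive" if keep_alive else "Connection: close",
    ]
    return "\r\n".join(header_lines).encode("ascii") + b"\r\n\r\n" + body
```

On a persistent connection the client reads exactly Content-Length bytes after the blank line and then treats whatever follows as the start of the next response.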


Step 7: Logging

Finally, when a transaction is complete, the web server notes an entry into a log file, describing the transaction performed.

Most web servers provide several configurable forms of logging. (See later lectures, Chapter 21, for details.)
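
As an illustration, one entry in the Common Log Format mentioned earlier could be produced by a Python helper like the one below. The function name common_log_line and its parameters are hypothetical; the field order (host, ident, user, timestamp, request line, status, size) follows the Common Log Format, with a hyphen written for missing ident or user information.

```python
from datetime import datetime, timezone

def common_log_line(host, request_line, status, size,
                    ident=None, user=None, when=None):
    """Format one Common Log Format entry for a completed transaction."""
    when = when or datetime.now(timezone.utc)
    # Example timestamp: 10/Oct/2000:13:55:36 +0000
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return (f'{host} {ident or "-"} {user or "-"} [{stamp}] '
            f'"{request_line}" {status} {size}')
```

A server would append the returned line to its access log after the response has been sent.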


Reference: Web servers

http://www.apache.org
The Apache web site

http://www.w3c.org/Jigsaw
Jigsaw, W3C’s server

http://www.ietf.org/rfc/rfc1413.txt
RFC 1413, “Identification Protocol,” by M. St. Johns.