cs/info 330 enterprise architectures - cornell

31
1 CS/INFO 330 Enterprise Architectures Mirek Riedewald [email protected] (Some of the slides are courtesy of Gustavo Alonso, Fabio Casati, Harumi Kuno, Vijay Machiraju and Ethan Cerami) CS/INFO 330 2 The Big Picture WWW Site Visitor THE WEB Public Web Server Business Transaction Server Main Memory Cache DBMS Data Warehouse Application Server INTRANET, VPN Internal User Internal Web Server CS/INFO 330 3 Overview Enterprise architectures Internet concepts – URIs – HTTP Protocol The presentation tier – HTML – HTML forms – JavaScript, style sheets – Cookies

Upload: others

Post on 03-Feb-2022

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS/INFO 330 Enterprise Architectures - Cornell

1

CS/INFO 330Enterprise Architectures

Mirek [email protected]

(Some of the slides are courtesy of Gustavo Alonso, Fabio Casati, Harumi Kuno, Vijay

Machiraju and Ethan Cerami)

CS/INFO 330 2

The Big PictureWWW Site

Visitor

THE WEB

Public Web Server

BusinessTransaction

Server

MainMemory

Cache

DBMS

DataWarehouseApplication

Server

INTRANET,VPN

Internal User

InternalWeb Server

CS/INFO 330 3

Overview

• Enterprise architectures• Internet concepts

– URIs– HTTP Protocol

• The presentation tier– HTML– HTML forms– JavaScript, style sheets– Cookies

Page 2: CS/INFO 330 Enterprise Architectures - Cornell

2

CS/INFO 330 4

Layers and Tiers• Client

– Any user or program that wants to perform an operation over the system

– Interacts with the system through presentation layer

• Application logic– Determines what the system actually

does– Enforces business rules and

establishes business processes– Can take many forms: programs,

constraints, business processes…• Resource manager

– Deals with organization (storage, indexing, and retrieval) of the data necessary to support the application logic

– Typically a database, but can also be a text retrieval system or other data management system providing querying capabilities and persistence

Client

Application Logic

Resource Manager

Presentation layer

Business rules

Business objects

Client

Server

Database

Client

Business processes

Persistent storage

CS/INFO 330 5

A Game of Boxes and Arrows• Box = system part• Arrow = connection between two system

parts• More boxes => more modular system

– More opportunities for distribution and parallelism

– Allows encapsulation, component based design, reuse

• More boxes => more arrows– More sessions (connections) to be maintained– More coordination necessary– System more complex to monitor and manage

• More boxes => more context switches and intermediate steps to go through before one gets to the data

– Performance suffers considerably

System designers try to balance the flexibility of modular design with the performancedemands of real applications. Once a layer is established, it tends to migrate down and merge with lower layers.

There is no problem in system design that cannot be solved by

adding a level of indirection. There is no performance

problem that cannot be solved by removing a level of

indirection.

CS/INFO 330 6

Top-Down Design

top-down designPL-A PL-B

PL-C

AL-AAL-B

AL-D

AL-C

RM-1 RM-2

top-down architecture

RM-1 RM-2

AL-A AL-D

AL-C AL-B

PL-APL-B

PL-C

-Start with high-level goals of problem-Proceed to define everything required to achieve these goals

-Emphasizes final system goals-Can be tailored to address functional (supported operations) and non-functional (performance, availability) issues-But: difficult to do when integrating legacy systems

Page 3: CS/INFO 330 Enterprise Architectures - Cornell

3

CS/INFO 330 7

Top-Down Design

presentation layer

resource management layer

application logic layer

client

info

r mat

ion

sys t

em

1. define access channelsand client platforms

2. define presentation formats and protocols forthe selected clients andprotocols

3. define the functionalitynecessary to deliver thecontents and formats neededat the presentation layer

4. define the data sourcesand data organization neededto implement the applicationlogic

top-down design

CS/INFO 330 8

Bottom-Up Design• Many of the basic components

already exist and cannot be easily replaced

– Stand alone systems which need to be integrated into new systems

• Components do not necessarily cease to work as stand alone components

– Often old applications continue running at the same time as new applications

• Approach has a wide application• Much of the work and products in

this area are related to middleware

– Intermediate layer– Provides a common interface– Bridges heterogeneity– Copes with distributionLegacy systems

Newapplication

Legacy application

CS/INFO 330 9

Bottom-Up Designbottom-up design

PL-A PL-BPL-C

AL-AAL-B

AL-D

AL-C

bott

om-u

p ar

chitec

ture

AL-A AL-D

AL-C AL-B

PL-APL-B

PL-C

wrapper wrapper wrapperwrapper wrapperwrapper

legacyapplication

legacyapplication

legacysystem

legacysystem

legacysystem

Page 4: CS/INFO 330 Enterprise Architectures - Cornell

4

CS/INFO 330 10

Bottom-Up Design

presentation layer

resource management layer

application logic layer

client

info

r mat

ion

sys t

em

1. define access channelsand client platforms

2. examine existing resourcesand the functionalitythey offer

3. wrap existing resourcesand integrate their functionalityinto a consistent interface

4. adapt the output of the application logic so that itcan be used with the requiredaccess channels and clientprotocols

bottom-up design

CS/INFO 330 11

One Tier: Fully Centralized• Presentation layer, application

logic and resource manager built as a monolithic entity

• Access through dumb terminals

• Was the typical architecture of mainframes, offering several advantages:– No forced context switches in

control flow (everything happens within the system)

– All is centralized, managing and controlling resources is easier

– Design can be highly optimized by blurring the separation between layers

Server

CS/INFO 330 12

Two Tier: Client/Server • As computers became more powerful,

it was possible to move the presentation layer to the client.

• Several advantages– Frees up resources for application logic– Can tailor presentation layer without

increasing system complexity• Independent development and

maintenance– Introduces concept of API (Application

Program Interface)• Interface to invoke the system from the

outside• Allows designers to think about

federating the systems into a single system

• Resource manager only sees one client: the application logic

– No context switches or calls between components of lower two layers

Server

Page 5: CS/INFO 330 Enterprise Architectures - Cornell

5

CS/INFO 330 13

APIs in Client/Server• Introduced notion of a service• Introduced notion of an interface

– How client can invoke a given service• Many standardization efforts due to need for common APIs

resource management layer

serv

e r

serviceinterface

serviceinterface

serviceinterface

serviceinterface

server’s API

serviceserviceserviceservice

CS/INFO 330 14

Technical Aspects Of Two Tier• Advantages compared to Single Tier

– Take advantage of client capacity to off-load work to clients– Work within the server takes place within one scope (almost as in 1 tier)– Server design still tightly coupled and can be optimized by ignoring

presentation issues– Still relatively easy to manage and control from a software engineering

point of view• Weaknesses

– Connection management to clients– Clients are “tied” to the system (no standard presentation layer)– Connecting to two systems, a client needs two presentation layers– No failure or load encapsulation

• If the server fails, nobody can work– Load created by one client will directly affect work of others

• All compete for the same resources

CS/INFO 330 15

The Main Limitation of Client/Server

• Accessing multiple servers:– Underlying systems do not know

about each other– No common business logic– Client is the point of integration

(increasingly fat clients)– Responsibility of dealing with

heterogeneous systems shifted to client

– Client becomes responsible for knowing where things are, how to get to them, and how to ensure consistency

• Very inefficient– Software design, portability, code

reuse, performance (since client capacity is limited)

• These issues cannot be solved with 2-tier

Server A Server B

Page 6: CS/INFO 330 Enterprise Architectures - Cornell

6

CS/INFO 330 16

Three Tier: Middleware• Three layers fully

separated– Better for application

integration, flexibility, and portability of application logic

• The layers are also typically distributed, taking advantage of the complete modularity of the design

CS/INFO 330 17

Middleware• Middleware is just a level of

indirection between clients and other system layers

• Introduces additional layer of business logic encompassing all underlying systems

• By doing this, a middleware system:

– simplifies the design of the clients by reducing the number of interfaces,

– provides transparent access to the underlying systems,

– acts as the platform for inter-system functionality and high level application logic, and

– takes care of locating resources, accessing them, and gathering results

Middleware or global application logic

Clients

Local resource managers

Local application logic

Server A Server B

middleware

CS/INFO 330 18

Technical Aspects of Middleware

• Benefits of middleware layer– Reduces number of necessary interfaces

• Clients see only one system (the middleware)• Local applications see only one system (the middleware)

– Centralizes control– Makes necessary functionality widely available to all clients– Allows to implement functionality that otherwise would be very

difficult to provide– First step towards dealing with application heterogeneity (some

forms of it)• Middleware layer weaknesses

– Another indirection level– Complex software

Page 7: CS/INFO 330 Enterprise Architectures - Cornell

7

CS/INFO 330 19

External clients

connecting logic

control

user logic

internal clients

2 tie

r sys

tem

s

Resource managers

wrappers

middleware

Resource manager

2 tier system

mid

dlew

are

syst

em

External client

Three-Tier Middleware-Based System

CS/INFO 330 20

N-Tier Architectures• Appear in two settings

– Connecting several three tier systems to each other or have 1-, 2-, and 3-tier systems in the resource management layer

– Adding connectivity through the Internet

• Web server treated as additional tier (more complex than most presentation layers)

• Addition of Web layer led to the notion of “application servers”– Middleware platforms

supporting access through the Web

client

resource management layer

application logic layer

information system

middleware

presentationlayer

Web server

Web browser

HTML filter

CS/INFO 330 21

INTERNET

FIREWALL

LAN

Webserver cluster

LAN,gateways

LAN

internalclients

LAN

middlewareapplication

logic

resource management

layer databaseserver

LAN

middlewareapplication

logic

additional resource management layers

LAN

Wrappersand

gateways

fileserver

application

N-tier In RealityProblems:•High complexity•Too muchmiddleware

•Redundantfunctionality

Page 8: CS/INFO 330 Enterprise Architectures - Cornell

8

CS/INFO 330 22

Blocking or Synchronous Interaction

• Traditionally, information systems use blocking calls– Synchronous interaction– Both parties have to be

“on-line”• Caller makes request• Receiver gets request,

processes it and sends response

• Caller receives response• Caller must wait until

response comes back

Disadvantages due to synchronization:– Connection overhead– Higher probability of

failures– Difficult to identify and

react to failures– It is not really practical for

complex interactions

CallReceive

ResponseAnswer

idle time

client server

CS/INFO 330 23

Overhead of Synchronism• Need to maintain a session

between caller and receiver– Expensive– Limit on how many sessions

can be active at the same time• For this reason, client/server

systems often resort to connection pooling to optimize resource utilization– Have a pool of open

connections– Allocate connections as

needed

• Synchronous interaction requires a context for each call and a context management system for all incoming calls

request()

do with answer

receiveprocessreturn

sessionduration

request()

do with answer

receiveprocessreturn

Context is lostNeeds to be restarted!!

CS/INFO 330 24

Failures In Synchronous Calls• If client or server fail, the context

is lost– If the failure occurred before 1,

nothing has happened– If the failure occurs after 1 but

before 2 (receiver crashes), then the request is lost

– If the failure happens after 2 but before 3, side effects may cause inconsistencies

– If the failure occurs after 3 but before 4, the response is lost but the action has been performed (try again?)

• Who is responsible for finding out what happened?

• Finding out when the failure took place not easy

– Chain of invocations—failure can occur anywhere along the chain

request()

do with answer

receiveprocessreturn

12

34

request()

do with answertimeout

try again

do with answer

receiveprocessreturn

12

3

receiveprocessreturn

2’

3’

Page 9: CS/INFO 330 Enterprise Architectures - Cornell

9

CS/INFO 330 25

Two SolutionsENHANCED SUPPORT

• Client/Server systems and middleware platforms provide a number of mechanisms to deal with the problems created by synchronous interaction– Transactional interaction– Service replication and

load balancing

ASYNCHRONOUS INTERACTION

• Example: email• Caller sends message• Message gets stored

somewhere until receiver reads it and sends response

• Response is sent in a similar manner

• Asynchronous interaction can take place in two forms:– Non-blocking invocation– Persistent queues

CS/INFO 330 26

Message Queuing• Reliable queuing is an

excellent complement to synchronous interactions– Modular design

• Code for making a request can be in a different module (even a different machine) than code for dealing with the response

– Easier to design sophisticated distribution modes

– Helps to handle communication sessions in more abstract way

– More natural way to implement complex interactions between heterogeneous systems

do with answerdo with answer

request()request()

receiveprocessreturn

queue

queue

CS/INFO 330 27

Overview

• Enterprise architectures• Internet concepts

– URIs– The HTTP Protocol

• The presentation tier– HTML– HTML forms– JavaScript, style sheets– Cookies

Page 10: CS/INFO 330 Enterprise Architectures - Cornell

10

CS/INFO 330 28

Internet Concepts

• URIs• The HTTP Protocol

– HTTP Overview– Example HTTP Session– HTTP 1.0 v. 1.1– Live Demo via HTTP Tracer Plus– Structure of Client Requests/Server

Responses

CS/INFO 330 29

Uniform Resource Identifiers• Uniform naming schema to identify resources on

the Internet• A resource can be anything:

– Index.html– mysong.mp3– picture.jpg

• Example URIs:http://www.cs.wisc.edu/~dbbook/index.htmlmailto:[email protected]

CS/INFO 330 30

Structure of URIs

http://www.cs.wisc.edu/~dbbook/index.html

• URI has three parts:– Naming schema (http)– Name of the host computer (www.cs.wisc.edu)– Name of the resource (~dbbook/index.html)

• URLs (Uniform Resource Locators) are a subset of URIs

Page 11: CS/INFO 330 Enterprise Architectures - Cornell

11

CS/INFO 330 31

HTTP Overview

• HTTP: HyperText Transfer Protocol• Developed by Tim Berners Lee, 1990• Client/Server Architecture:

– Client requests a document• Example clients: Firefox, IE, Safari

– Server returns the document• Example servers: Apache, IIS (MSFT’s Internet

Information Server)

CS/INFO 330 32

Watch HTTP• Telnet:

– telnet www.yahoo.com 80– GET /– Hit enter twice

• See your requests:– http://www.schroepl.net/cgi-bin/http_trace.pl

• Many products for tracing HTTP traffic– Search for http tracer or similar

CS/INFO 330 33

Example HTTP Session• Client sends request, Server sends response

• Client requests the following URL: http://www.cs.cornell.edu:80/

• Anatomy of the Request:– http:// HyperText Transfer Protocol

• Other options: ftp, mailto– www.cs.cornell.edu : host name– :80: Port Number

• 80 is reserved for HTTP• Ports can range from 1 to 65,535

– / Root document

Page 12: CS/INFO 330 Enterprise Architectures - Cornell

12

CS/INFO 330 34

The Client RequestActual Browser Request

GET / HTTP/1.1Accept: image/gif, image/x-xbitmap, image/

jpeg, image/pjpeg, */*Accept-Language: en-usAccept-Encoding: gzip, deflateUser-Agent: Mozilla/4.0 (compatible; MSIE

5.01; Windows NT)Host: www.cs.cornell.eduConnection: Keep-Alive

CS/INFO 330 35

Anatomy of the Client Request• GET / HTTP/1.1

– Requests the root / document– Specifies HTTP version 1.1– HTTP Versions: 1.0 and 1.1 (more on this later)

• Accept: image/gif, image/x-xbitmap, image/ jpeg, image/pjpeg, */*– Indicates what type of media the browser will accept

• Accept-Language: en-us– Browser’s preferred language

• Accept-Encoding: gzip, deflate– Accepts compressed data (faster download times)

CS/INFO 330 36

Anatomy of the Client Request• User-Agent: Mozilla/4.0 (compatible; MSIE 5.01;

Windows NT)– Indicates the browser type.

• Host: www.cs.cornell.edu– Required for HTTP 1.1– Optional for HTTP 1.0– A Server may host multiple hostnames. Hence, the

browser indicates the host name here.• Connection: Keep-Alive

– Enables “persistent connections”, better performance (more later)

Page 13: CS/INFO 330 Enterprise Architectures - Cornell

13

CS/INFO 330 37

Server ResponseHTTP/1.1 200 OKDate: Mon, 24 Sept 2001 20:54:26 GMTServer: Apache/1.3.6 (Unix)Last-Modified: Mon, 24 Sept 2001 14:06:11 GMTContent-length: 327Connection: closeContent-type: text/html <title>Sample Homepage</title><img src="/images/oreilly_mast.gif"><h1>Welcome</h2>This is the webpage of ...

CS/INFO 330 38

Anatomy of Server Response• HTTP/1.1 200 OK

– Server Status Code– Code 200: Document was found– We will examine other status codes shortly

• Date: Mon, 24 Sept 2001 20:54:26 GMT– Date on the server – GMT (Greenwich Mean Time)

• Last-Modified: Mon, 24 Sept 2001 14:06:11 GMT– Time when document was last modified– Very useful for browser caching– If page in browser cache, may not need to request whole

document again (more later)

CS/INFO 330 39

Anatomy of Server Response• Content-length: 327

– Number of bytes in the document response• Connection: close

– Indicates that server will close connection– If client wants to send another request, it will need to open

another connection to the server• Content-type: text/html

– Indicates MIME Type of the return document• Multi-Purpose Internet Mail Extensions

– Enables web servers to return binary or text files– Other MIME Categories:

• Audio, video, images, xml

Page 14: CS/INFO 330 Enterprise Architectures - Cornell

14

CS/INFO 330 40

Anatomy of Server Response

The actual HTML document:<title>Sample Homepage</title>

<img src="/images/oreilly_mast.gif">

<h1>Welcome</h2>This is the web page of ...

CS/INFO 330 41

HTTP 1.0 vs 1.1: Getting Objects

Once a browser receives an HTML page, it makes separate connections to retrieve different objects within the page.

Client Web Browser

Web Server

Give me /index.html

Here you go...

Now, give me logo.gif

Here you go...

CS/INFO 330 42

HTTP 1.0 vs 1.1

• HTTP 1.0– For each request, opens a new connection

with the server• HTTP 1.1

– For each request, default action is to maintain an open connection with the server

– Faster, persistent connections– Supported by most browsers and servers

Page 15: CS/INFO 330 Enterprise Architectures - Cornell

15

CS/INFO 330 43

Example: HTTP 1.0 v. 1.1

• HTTP 1.0: Get HTML Page plus Images– Open Connection: GET /index.html– Open Connection: GET /logo.gif– Open Connection: GET /button.gif

• HTTP 1.1: Get HTML Page plus Images– Open Persistent Connection: GET /index.html– GET /logo.gif– GET /button.gif

CS/INFO 330 44

Client Requests

• Every client request includes three parts– Method: Indicates type of request, HTTP

version and name of requested document– Header Information: Used to specify browser

version, language, etc.– Entity Body: Used to specify form data for

POST requests

CS/INFO 330 45

Client Methods• GET and POST: We will see them later when we

discuss HTML forms• HEAD

– Similar to GET, except that the method requests only the header information

– Server will return date-modified, but will not return the data portion of the requested document

– Useful for browser caching– Example

• If browser contains a cached version of a page, it issues a head request

• If document has not been modified recently, use cached version

Page 16: CS/INFO 330 Enterprise Architectures - Cornell

16

CS/INFO 330 46

Server Responses

• Every server response includes three parts– Response Line: HTTP version number, three

digit status code, and status message– Header: Information about the server and the

object being served– Entity Body: The actual data

CS/INFO 330 47

Server Status Codes

• 100-199 Informational• 200-299 Client Request Successful• 300-399 Client Request Redirected• 400-499 Client Request Incomplete• 500-599 Server Errors

CS/INFO 330 48

Some Important Status Codes

• 200: OK – Request was successful.

• 301: Moved Permanently– Server redirects client to a new URL

• 404 Not Found– Document does not exist

• 500 Internal Server Error– Error within the Web Server

Page 17: CS/INFO 330 Enterprise Architectures - Cornell

17

CS/INFO 330 49

HTTP Is Stateless• What does this mean?

– No “sessions”– Every message is completely self-contained– No previous interaction is “remembered” by the protocol– Tradeoff between ease of implementation and ease of

application development• Other functionality has to be built on top

• Implications for applications– Any state information (shopping carts, user login-information)

needs to be encoded in every HTTP request and response (!)– Popular methods on how to maintain state:

• Cookies• Dynamically generate unique URL’s at the server level

CS/INFO 330 50

Overview

• Enterprise architectures• Internet concepts• The presentation tier

– HTML– HTML forms– JavaScript, style sheets– Cookies

CS/INFO 330 51

Web Data Formats

• HTML– The presentation language for the Internet

• XML– A self-describing, hierarchal data model

• We will cover XML and associated query and transformation languages (XPath, XSLT) later.

Page 18: CS/INFO 330 Enterprise Architectures - Cornell

18

CS/INFO 330 52

HTML: An Example<h3>Fiction</h3><b>Waiting for the Mahatma</b><UL><LI>Author: R.K. Narayan</LI><LI>Published 1981</LI>

</UL><b>The English Teacher</b><UL><LI>Author: R.K. Narayan</LI><LI>Published 1980</LI><LI>Paperback</LI>

</UL>

</BODY></HTML>

<HTML><HEAD></HEAD><BODY><h1>Barns and Nobble Internet

Bookstore</h1>Our inventory:<h3>Science</h3><b>The Character of Physical

Law</b><UL>

<LI>Author: Richard Feynman</LI>

<LI>Published 1980</LI><LI>Hardcover</LI>

</UL>

CS/INFO 330 53

HTML: A Short Introduction

• HTML is a markup language• Commands are tags:

– Start tag and end tag– Examples

• <HTML> … </HTML>• <UL> … </UL>

• Many editors automatically generate HTML directly from a document (e.g., Microsoft Word has “Save as html”)

CS/INFO 330 54

HTML: Sample Commands

• <HTML>: HTML document • <UL>: unordered list• <LI>: list item• <h1>: largest heading• <h2>: second-level heading, <h3>, <h4>

analogous• <B>Title</B>: bold

Page 19: CS/INFO 330 Enterprise Architectures - Cornell

19

CS/INFO 330 55

Overview

• Enterprise architectures• Internet concepts• The presentation tier

– HTML– HTML forms– JavaScript, style sheets– Cookies

CS/INFO 330 56

HTML Forms• Web form allows user to enter data through a

browser (presentation tier)• Completed form is submitted to the server

(middle tier)

Source: Wikipedia

CS/INFO 330 57

HTML Form Example<FORM method="post" action="bar.php"><TABLE border="1"><TR bgcolor="#CCCCFF"><TH>Name</TH><TH>Value</TH>

</TR><TR><TD>Name</TD><TD><input type="text" size="25">

</TD></TR><TR><TD>Sex</TD><TD><input type="radio" name="sex" value="male"> Male<BR><input type="radio" name="sex" value="female"

checked> Female</TD>

</TR><TR><TD>Eye color</TD><TD><select name="eye color"><option>blue</option><option>brown</option><option selected>green</option><option>other</option>

</select></TD>

</TR>

<TR><TD>Check all that apply</TD><TD><input type="checkbox" name="height" value="1">

Over 6 feet tall</input><BR><input type="checkbox" name="weight" value="1">

Over 200 pounds</input></TD>

</TR><TR><TD colspan="2">Describe your athletic ability:<BR><textarea name="athletic" cols="50"

rows="4"></textarea></TD>

</TR><TR><TD colspan="2" align="center"><input type="submit" value="Enter my information">

</TD></TR>

</TABLE></FORM>

Page 20: CS/INFO 330 Enterprise Architectures - Cornell

20

CS/INFO 330 58

General Form Format• <FORM ACTION=“page.jsp” METHOD=“GET”

NAME=“LoginForm”> … </FORM>– Forms cannot be nested

• ACTION: URI of page to which form contents are submitted– Absence of ACTION => use current page– page.jsp provides logic for processing form

• METHOD: GET or POST– Method for submitting completed form to web server

• NAME: name of form (optional)

CS/INFO 330 59

HTML Form Elements• <INPUT TYPE=“text” NAME=“username”

VALUE=“Joe”>• TYPE: type of input field

– text: single line of text– checkbox– radio: radio button– submit: button to submit form to the server

• NAME: symbolic name for field• VALUE: default contents of text field or label of

buttons

CS/INFO 330 60

More Form Elements

• textarea– Like text, but multiple rows possible

• select– Drop-down list showing possible selections

<textarea name="athletic" cols="50“ rows="4"></textarea>

<select name="eye color"><option>blue</option><option>brown</option><option selected>green</option><option>other</option></select>

Page 21: CS/INFO 330 Enterprise Architectures - Cornell

21

CS/INFO 330 61

Submitting Forms• <FORM ACTION=“page.jsp” METHOD=“GET”

NAME=“LoginForm”> … </FORM>• GET method

– Form contents assembled into query URI– Visible to user in browser, e.g.,

http://www.google.com/search?hl=en&q=cs+330 for Google search for CS 330

– Can bookmark that URI• POST method

– Form contents sent in separate data block inside HTTP message body

CS/INFO 330 62

GET Method• action?name1=value1&name2=value2

– http://www.google.com/search?hl=en&q=cs+330• action: URI specified in ACTION attribute of

FORM tag (or current document URI if no ACTION specified)– To process form at middle tier, ACTION attribute

should point to page, script, or program that will process the form values

• name=value: user input from INPUT fields• URI has to be single string with no spaces

– Convert special characters to hex character code– Convert space to +

CS/INFO 330 63

Overview

• Enterprise architectures• Internet concepts• The presentation tier

– HTML– HTML forms– JavaScript, style sheets– Cookies

Page 22: CS/INFO 330 Enterprise Architectures - Cornell

22

CS/INFO 330 64

JavaScript• For adding programs to web pages that run at the client

(i.e., the machine running the web browser), e.g.:– Detect browser type to load browser-specific page– Simple consistency checks on form fields

• Does email address contain @?– Pop-ups

• Embedded in HTML document with SCRIPT tag– <SCRIPT LANGUAGE=“JavaScript” SRC=“validateForm.js”> …

</SCRIPT>– LANGUAGE: scripting language– SRC: file with script code that is automatically embedded into the

HTML document

CS/INFO 330 65

JavaScript<SCRIPT LANGUAGE=“JavaScript”>

<!--alert(“This is a Pop-up!”);

//--></SCRIPT>• JavaScript code can be placed inside comments

– Avoids displaying it by browsers that do not understand SCRIPT tag

• Lightweight programming language– Variables, usual operators, assignments, conditional statements

(if (condition) {statements;} else {statements;}), loops (for, do-while, while)

– Functions (function f(args) {statements;})

CS/INFO 330 66

Example: Check for Empty FieldsHTML Form:

<H1>Please enter login and password:</H1><form name=“LoginForm” method=“POST”action=“TableOfContents.jsp” onSubmit=“return testLoginEmpty()”><input type=“text” name=“userid”><input type=“password” name=“password”><input type=“submit” value=“Login” name=“submit”><input type=“reset” value=“Clear” name=“reset”>

</form>

Associated JavaScript:

<script language=“javascript”>function testLoginEmpty(){loginForm = document.LoginFormif ((loginForm.userid.value == “”) ||(loginForm.password.value == “”))

{alert(‘Please enter values for userid and password.’);return false;

}else return true;

}</script>

Implicitly defined; refers to current page

Form event handler; called whensubmit button is pressed or userpresses return in text field

Page 23: CS/INFO 330 Enterprise Architectures - Cornell

23

CS/INFO 330 67

Style Sheets• Idea: Separate display from contents, adapt display to

different presentation formats• Two aspects

– Document transformation: what part of the document to display in what order

– Document rendering: how to display each part of the document• Why use style sheets?

– Reuse same document for different displays– Tailor display to user preferences– Reuse document in different contexts

• Stylesheet languages– Cascading Style Sheets (CSS) for HTML documents– Extensible Stylesheet Language (XSL) for XML documents

CS/INFO 330 68

Cascading Style Sheets

• Defines how to display HTML documents• Many HTML documents can refer to same CSS

– Can change format of entire web site by changing single style sheet

– Usage in HTML: <LINK REL=“style sheet”TYPE=“text/css” HREF=“books.css”/>

• Style sheet line format: selector {property: value}– Selector: tag whose format is defined– Property: tag’s attribute whose value is set– Value: value of attribute

CS/INFO 330 69

CSS Example

body {background-color: yellow}h1 {font-size: 36pt}h3 {color: blue}p {margin-left: 50px; color: red}

• First line has same effect as <body background-color=“yellow”>

Page 24: CS/INFO 330 Enterprise Architectures - Cornell

24

CS/INFO 330 70

XSL

• Language for expressing style sheets• Three components

– XSLT: XSL Transformation Language• Can transform one document into another

– XPath: XML Path language– XSL Formatting Objects

• Formats output of an XSL transformation

• Will be covered in later lectures

CS/INFO 330 71

Overview

• Enterprise architectures• Internet concepts• The presentation tier

– HTML– HTML forms– JavaScript, style sheets– Cookies

CS/INFO 330 72

Sites That Know You...

• Examples:– www.weather.com– www.amazon.com

• Each time I return to these sites, they remember who I am…– Weather.com remembers previous locations,

preferences– Amazon.com remembers products I looked at and

makes recommendations• How do they do that?

Page 25: CS/INFO 330 Enterprise Architectures - Cornell

25

CS/INFO 330 73

What is a Cookie?

• Small piece of data generated by a web server, stored on the client’s hard drive

• Serves as an add-on to the HTTP specification– Remember: HTTP by itself is stateless

• Controversial, as it enables web sites to track web users and their habits

CS/INFO 330 74

Example Cookie Use• Web Site Acme.com wants to

track number of unique visitorswho access its site

• HTTP Server logs shownumber of “hits”, but notnumber of unique visitors*

• Problem: HTTP is stateless– Retains no memory regarding individual users

• Cookies provide mechanism to solve this problem * Actually, you could check the log files for IP addresses, but

Internet proxies and NAT are a problem.

© Warner Bros.

CS/INFO 330 75

Tracking Unique Visitors• Step 1: Wile E. Coyote requests home page for

acme.com• Step 2: acme.com web server generates new

unique ID for him• Step 3: Server returns home page plus a cookie

set to the unique ID• Step 4: Each time Coyote returns to

acme.com, the browser automaticallysends the cookie along with the GETrequest

Page 26: CS/INFO 330 Enterprise Architectures - Cornell

26

CS/INFO 330 76

Cookie Conversation

Browser ServerGive me the home page!

Here’s the home page plusa cookie.

Now, give me the news page(cookie is sent automatically)

I’ve seen you before… Here’sthe news page.

CS/INFO 330 77

Cookie Notes

• Created in 1994 for Netscape 1.1• Cookies cannot be larger than 4K• Limit on number of cookies per domain

(e.g., netscape.com, microsoft.com)• Cookies stay on your machine until:

– they automatically expire– they are explicitly deleted

• Cookies work the same on all browsers

CS/INFO 330 78

Magic Cookies

• The term cookie comes from an old programming hack, called Magic Cookies

• If a programmer needed to make two programs communicate, he would create a “magic cookie”, a small file containing data to transfer between program parts

Page 27: CS/INFO 330 Enterprise Architectures - Cornell

27

CS/INFO 330 79

Cookie Standards

• Version 0 (Netscape)– The original cookie specification– Implemented by all browsers and servers– We will focus on this Version

• Version 1– Internet Official Protocol Standard RFC 2109– Compatible with V0, but with some extensions

CS/INFO 330 80

Why use Cookies?• Tracking unique visitors• Creating personalized web sites• Shopping Carts• Tracking users across a site

– E.g. do users who visit the sports news page also visit the sports store?

• Type javascript:alert("Cookies:"+document.cookie) in browser URL field to see active cookies for page

CS/INFO 330 81

Cookie Anatomy

• Version 0 specifies six cookie parts:– Name– Value– Domain– Path– Expires– Secure

Page 28: CS/INFO 330 Enterprise Architectures - Cornell

28

CS/INFO 330 82

Cookie Parts: Name/Value

• Name– Name of your cookie (Required)– Cannot contain whitespaces, semicolons or

commas• Value

– Value of your cookie (Required)– Cannot contain whitespaces, semicolons or

commas

CS/INFO 330 83

Cookie Parts: Domain

• Only pages from the domain which created a cookie are allowed to read the cookie– Example: amazon.com cannot read yahoo.com’s

cookies (imagine the security flaws if this were otherwise)

• By default, domain is set to the full domain of the web server that served the web page– Example: myserver.mydomain.com would

automatically set the domain to .myserver.mydomain.com

CS/INFO 330 84

Cookie Parts: Domain• Domains are always prepended with a dot

– This is a security precaution: all domains must have at least two periods

• Can set a higher level domain– Example: myserver.mydomain.com can set domain to

.mydomain.com• Allows hisserver.mydomain.com and

herserver.mydomain.com to access the same cookies

• No matter what, you cannot set a domain other than your own

Page 29: CS/INFO 330 Enterprise Architectures - Cornell

29

CS/INFO 330 85

Cookie Parts: Path

• Restricts cookie usage within the site• Default: path is set to the path of the page

that created the cookie– Example: user requests page from

mymall.com/storea– By default, cookie will only be returned to

pages for or under /storea• If path is /, cookie will be returned to all

pages (a common practice)

CS/INFO 330 86

Cookie Parts: Expires

• When the cookie will expire• Specified in Greenwich Mean Time (GMT):

– Wdy DD-Mon-YYYY HH:MM:SS GMT• If value is left blank, browser will delete the

cookie when the user exits the browser– Known as a session cookie, as opposed to a

persistent cookie

CS/INFO 330 87

Cookie Parts: Secure

• Secure flag is designed to encrypt cookies while in transit

• Secure cookie will only be sent over a secure connection (such as SSL)

• In other words, if a cookie is set to secure, and you connect using a non-secure connection, the cookie will not be sent

Page 30: CS/INFO 330 Enterprise Architectures - Cornell

30

CS/INFO 330 88

Weaknesses of Cookies

• People share machines– Per-user cookie files solve this

• People use multiple machines– Single user has different cookies on different

machines—is this a bug or a feature?• Cookies can be erased from the client

machine’s hard drive (bug or feature…)• Cookies can be copied

– Security implications for eCommerce sites

CS/INFO 330 89

Cookie Abuse - I

• Conventional catalog stores could sell information about customers– Name, address, purchases

• eCommerce sites can gather and sell much more detailed information– All the way down to clickstreams

• But that’s only for a single site

CS/INFO 330 90

Cookie Abuse - II • Ad servers and the “1-pixel gif”

– Mediawiki.org’s page pXYZ contains• <img src=“x... lotofbanners.com/stat?page=...pXYZ”>

– lotofbanners.com sets a persistent UID cookie in the usual way– Gets around cookie domain specification

• So lotofbanners.com can maintain user page visit statistics across multiple sites

Image source: Wikipedia

Page 31: CS/INFO 330 Enterprise Architectures - Cornell

31

CS/INFO 330 91

Cookie Blocking Software

• Cookie Central (and others) have pointers to lots of cookie blocking software– Cookie Pal– Cookie Crusher– Cookie Cruncher– Many more…

• But many (most?) sites don’t work properly if you disable cookies these days

CS/INFO 330 92

Some Cookie Alternatives

• Embedding information in URL– Typically in query string

• Hidden form fields– Similar to URL embedding

• HTTP authentication– Browser stores access credentials