
BT 0078 Website Design

Contents

Unit 1: Introduction to Internet
Unit 2: Website development with HTML – I
Unit 3: Website development with HTML – II
Unit 4: XML Programming – I
Unit 5: XML Programming – II
Unit 6: XML Programming – III
Acknowledgements, References and Suggested Readings

Edition: Spring 2009

BKID – B1005, 10th June 2009

Prof. S. Kannan Director & Dean (in-charge) Directorate of Distance Education Sikkim Manipal University of Health, Medical & Technological Sciences (SMU DDE)

Board of Studies

Dr. U. B. Pavanaja (Chairman), General Manager – Academics, Manipal Universal Learning Pvt. Ltd., Bangalore
Nirmal Kumar Nigam, HOP – IT, Sikkim Manipal University – DDE, Manipal
Prof. Bhushan Patwardhan, Chief Academics, Manipal Education, Bangalore
Dr. A. Kumaran, Research Manager (Multilingual), Microsoft Research Labs India, Bangalore
Dr. Harishchandra Hebbar, Director, Manipal Centre for Info. Sciences, Manipal
Ravindranath P. S., Director (Quality), Yahoo India, Bangalore
Dr. N. V. Subba Reddy, HOD-CSE, Manipal Institute of Technology, Manipal
Dr. Ashok Kallarakkal, Vice President, IBM India, Bangalore
Dr. Ashok Hegde, Vice President, MindTree Consulting Ltd., Bangalore
H. Hiriyannaiah, Group Manager, EDS Mphasis, Bangalore
Dr. Ramprasad Varadachar, Director, Computer Studies, Dayanand Sagar College of Engg., Bangalore

Content Preparation Team

Content Writing: Mr. Balasubramani R, Assistant Professor, Dept. of IT, Sikkim Manipal University – DDE, Manipal
Content Editing: Dr. E. R. Naganathan, Professor & HOD – IT, Sikkim Manipal University – DDE, Manipal
Instructional Design: Mr. Kulwinder Pal, Senior Lecturer (Education), Sikkim Manipal University – DDE, Manipal


This book is a distance education module comprising a collection of learning material for our students. All rights reserved. No part of this work may be reproduced in any form by any means without permission in writing from Sikkim Manipal University of Health, Medical and Technological Sciences, Gangtok, Sikkim. Printed and published on behalf of Sikkim Manipal University of Health, Medical and Technological Sciences, Gangtok, Sikkim by Mr. Rajkumar Mascreen, GM, Manipal Universal Learning Pvt. Ltd., Manipal – 576 104. Printed at Manipal Press Limited, Manipal.

SUBJECT INTRODUCTION

‘Website Design’ is a two-credit subject in the third semester of the BScIT program. It introduces you to the essential skills needed to develop a web site and to write scripts that run at the client side. The HTML units give a clear idea of how to design a user interface for the web. The subject also gives a clear idea of how XML data is handled in the web environment. JavaScript can be used to process a web page at both the client side and the server side; this subject shows how to use JavaScript at the client side to process a web form.

This SLM has been split into six units to give an overview of web designing.

Unit 1: Introduction to Internet:

In this unit we shall begin with an introduction to the Internet, then discuss the client-server model for communication and the different types of connections. We shall also discuss Internet Service Providers and addressing in the Internet. At the end we shall explain resource addressing and electronic mail.

Unit 2: Web site development with HTML – I:

In this unit we shall study the various HTML tags and learn to create a web page using them. We shall also learn to design a form using HTML.

Unit 3: Web site development with HTML – II:

In this unit you are going to study frames in HTML and the use of Cascading Style Sheets. The design of tables in HTML and general web site layout and design are also explained, along with the foundations of DHTML.

Unit 4: XML Programming – I:

XML is far more than a solution to the deficiencies of HTML. It provides a simple and universal way of storing textual data of any kind. In this unit you are going to study the need for XML, the XML document structure and XML namespaces.

Unit 5: XML Programming – II:

A schema is similar to a class definition. In this unit you are going to study how schemas are used in XML. You will also get an overview of SOAP (Simple Object Access Protocol), an introduction to web services, and an overview of the XML Document Object Model.

Unit 6: XML Programming – III:

XSLT style sheets are used to transform XML documents into different forms or formats, perhaps using different DTDs. In this unit you are going to study the transformation of XML documents into different formats using XSLT style sheets. You will also get an overview of XSL Formatting Objects.

After studying this subject, you should be able to develop professional, interactive websites using the features of HTML, DHTML and XML.

The subject requires knowledge and understanding of the Internet, ISPs, DNS servers and HTML.

For various multimedia and other resources on the subject, log on to the TeL portal of SMU DDE at www.smude.edu.in.

Website Design Unit 1

Sikkim Manipal University Page No. 1

Unit 1 Introduction to Internet

Structure:

1.1 Introduction

Objectives

1.2 What is Internet?

Definition

Internet from practical and technical angle

Who owns and cares for the Internet?

What is TCP/IP?

Introduction to RFC

How Internet Works?

Internet Applications

1.3 Concepts of Server

Client Server Model

Servers

1.4 Getting Connected

Different Types of Connections

Requirements for Connections

1.5 Internet Service Providers

1.6 Address in Internet

The Domain Name System and DNS Servers

IP Addresses

1.6 Resource Addressing

URL (Uniform Resource Locator)

URLs and HOST Names

URLs and Port Numbers

Pathnames

1.7 Email

Email Basics

Mail protocols

How to Access the Mail System

1.8 Summary

1.9 Terminal Questions

1.10 Answers



1.1 Introduction

We covered the basic concepts of the Internet and websites in the previous semester. In this unit we are going to cover a few more advanced concepts.

We shall begin with an introduction to the Internet, then discuss the client-server model for communication and the different types of connections. We shall also discuss Internet Service Providers and addressing in the Internet. At the end we shall explain resource addressing and electronic mail.

Objectives

After studying this unit, you should be able to:

explain the meaning, evolution, working and applications of the Internet
discuss the client-server model and various types of internet
describe how to get connected to the Internet
use the IP addressing scheme
explain the concepts of resource addressing
describe e-mail basics, mail protocols and methods of accessing the mail system

1.2 What is Internet?

This section covers the definition of the Internet and looks at it from both the practical and the technical angle.

1.2.1 Definition

There is no single, generally agreed-upon definition of the Internet, because the Internet is a different thing to different people. The following descriptions are commonly given:

The Internet links computer networks all over the world so that users can share resources and communicate with each other.
It is the name for a vast, worldwide system consisting of people, information and computers.
It is a network of networks that spans the globe.
It is an ocean of information.
It is a set of computers communicating over fiber optics, phone lines, satellite links and other media.
It is a gold mine of professionals from all fields sharing information about their work.
It is a worldwide interconnected system of thousands of computer networks, each network in turn linking thousands of computers together.
The Internet is also what we call a distributed system; there is no central archive.
The Internet thrives and develops as its many users find new ways to create, display and retrieve the information that constitutes the Internet.

1.2.2 Internet from practical and technical angle

From the practical angle

The Internet is a vast collection of globally available information which can be accessed electronically: information which is of practical use for business, research, study and technical purposes. It is a means for electronic commerce: marketing, buying, services, and economic and financial data research. It is a collection of hundreds of libraries and archives that are open at your fingertips. It is also a vast store of information relating to your hobbies, travel, health, entertainment, games, software, etc.

Today the information can be in the form of text, images, animation, sound, video, etc.; tomorrow it may be in the form of smell, touch, taste or some energized form. If information can be put on computers, that is, if it can be digitized, it can be made available on the Internet. The only catch is, how fast? Even the future may not be able to tell.

From the Technical Angle

To be technically correct, we can say that the internet is “an ever growing

wide area network of millions of computers and computer networks across

the globe, which can exchange information through standard rules

(protocols). Each computer has a unique address. Information is divided into

packets which may travel through different paths to the destination address

where it is recombined into its original form.”

1.2.3 Who owns and cares for the Internet?

Owning of internet

No one owns the Internet. No single person, corporation, university or government funds it. The Internet has been described as a cooperative anarchy. Every person who makes a connection, and every group whose Local Area Network (LAN) becomes connected, owns a slice of the Internet.



You can compare the Internet model with that of the phone companies and the electric companies. For example, there is phone service in almost every part of the country. Each person who wants telephone service contacts a local service provider. The service provider supplies a “hook-up” from the residence or office to the service network.

The person wanting service provides the telephone instrument and the connections within the residence or office. As long as the calls you want to place are restricted to your local area, you do not need anything else. However, if you want to place a call to someone in another area, you need to purchase services from a long-distance service provider. The local area provider supplies the connection from the local network into the long-distance network. This model allows you to connect by telephone to almost anywhere in the world. Moving among networks of computers works in much the same way (which is not surprising, since the telephone networks, that is, the physical cables, are used to connect the computers).

Who cares for Internet?

Many people care about the Internet. All the people who use it, even if only to send a note to someone on some other network that is connected to the Internet, care about it. Someone or some enterprise owns each connected computer. The owner of the connected equipment therefore ‘owns’ a piece of the Internet. The telephone companies ‘own’ the pieces that carry the information packets. The service providers ‘own’ the packet routing equipment. So, while no one person or entity owns the Internet, all who use it or supply materials for it play a part in its existence.

Since communication between networks cannot happen without co-operation, there are committees and groups working all the time to ensure smooth functioning. Someone has to take care of issues such as setting standards and identifying computers on the Net, so groups have been formed that look primarily after the common aspects of the Internet. The main body is the IAB (Internet Architecture Board), earlier called the Internet Activities Board, which grew out of a board originally set up by ARPA. There are two main wings to this board:

IETF (Internet Engineering Task Force)
IRTF (Internet Research Task Force)



The IETF publishes the documentation of the Internet in a series known as RFCs (Requests for Comments), so named because they are open documents that are always available to the public for comment, and thus the standards keep evolving.

Apart from maintaining protocols and norms/standards, another important common function is assigning unique names and addresses to computers connected to the Net. This function is performed by InterNIC (Internet Network Information Center), which is a group of three organizations:

1. General Atomics, CA: provides information services
2. AT&T, NY: provides directory and database services
3. Network Solutions, VA: provides registration services

The services of the InterNIC group are available on the Internet itself. Each individually connected network maintains its own user policies and procedures as to who can be connected, what kind of traffic the network will carry, and so on.

1.2.4 What is TCP/IP?

As we have already discussed, the Internet is built on a collection of networks covering the world, and these networks contain many different types of computers. What holds the whole thing together is TCP/IP (Transmission Control Protocol/Internet Protocol).

Protocols are the rules that all networks use to understand each other. For example, there is a protocol describing exactly what format should be used for sending a mail message. All Internet mail programs follow this protocol when they prepare a message for delivery. Collectively, more than 100 protocols go under the common name TCP/IP and are used to organize computers and communication devices into a network. TCP/IP is the glue holding it all together.
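To make the idea of a mail-format protocol concrete, here is a minimal sketch that uses Python's standard email library to build a message in the standard Internet text format. The addresses, subject and body are made-up placeholders chosen for illustration; they do not come from this unit.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a mail message laid out in the standard Internet mail format."""
    msg = EmailMessage()
    msg["From"] = sender        # header lines come first
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)       # the body follows the headers
    return msg


# Hypothetical addresses, used only for illustration.
message = build_message("surya@example.com", "rishaba@example.com",
                        "Hello", "Hi, I am on the network.")
print(message.as_string())
```

Because every mail program agrees on this layout (header lines, a blank line, then the body), a message prepared by one program can be read by any other.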

Information within the Internet is not transmitted as a constant stream from host to host; rather, the data is broken into small packages called segments. Dividing the data (or message) into a number of segments is the task of TCP. TCP marks each segment with a sequence number, the address of the recipient and the address of the sender, and it also inserts some error control information.

The segments are then sent over the network, where it is the job of IP to transport them correctly to the remote host. TCP at the other end receives the segments and checks them for errors. If an error has occurred, TCP can ask for that particular segment to be resent. Once all the segments are received correctly, TCP reconstructs the original message using the sequence numbers. Therefore, the job of TCP is to manage the flow and ensure the data is correct, and the job of IP is to route the raw data, the packets, from one place to another.
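The division of labour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not real TCP: the segment layout, the function names and the use of a CRC-32 checksum are assumptions made for the example (real TCP numbers bytes rather than segments and uses its own checksum).

```python
import random
import zlib


def split_into_segments(message: bytes, size: int):
    """Split a message into numbered segments, each with a checksum."""
    segments = []
    for seq, start in enumerate(range(0, len(message), size)):
        payload = message[start:start + size]
        segments.append({"seq": seq, "payload": payload,
                         "checksum": zlib.crc32(payload)})
    return segments


def reassemble(segments):
    """Check each segment for errors and rebuild the message in order."""
    for seg in segments:
        if zlib.crc32(seg["payload"]) != seg["checksum"]:
            # Real TCP would ask the sender to retransmit this segment.
            raise ValueError(f"segment {seg['seq']} is corrupted")
    ordered = sorted(segments, key=lambda seg: seg["seq"])
    return b"".join(seg["payload"] for seg in ordered)


original = b"Information is divided into packets and recombined."
segments = split_into_segments(original, size=8)
random.shuffle(segments)            # segments may arrive out of order
assert reassemble(segments) == original
```

The shuffle stands in for segments taking different paths through the network; the sequence numbers are what allow the receiver to put them back in order.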

The technical answer to “What is TCP/IP?” is: TCP/IP is a large family of protocols used to organize computers and communication devices into a network. The two most important protocols are TCP and IP. IP transmits the data from place to place, while TCP makes sure it all works correctly.

1.2.5 Introduction to RFC

The Internet is based on a large number of protocols and conventions. Each such protocol is explained in a technical publication called a Request for Comments, or RFC. Despite the name, an RFC is usually a detailed technical explanation of how something is supposed to work, not an invitation for people to send in comments. Each RFC is given a number and is made available to anyone who wants to read it. In this manner, the technical information that supports the Internet is distributed around the world in an organized, reliable manner.

Programmers and engineers who want to design products that work with the Internet protocols can download the RFCs and use them as reference material. This ensures that everyone is using the same specifications and that all Internet programs are designed to follow the same set of standards.

1.2.6 How Internet Works?

In this section, we are going to cover how the Internet works and how messages are sent and received.

Working of Internet

The primary objective of any network is to exchange information between

different locations. The rules for this exchange are called Protocols. The

protocol on Internet is TCP/IP (Transmission Control Protocol/Internet

Protocol) which is actually a name for a set of many rules framed to connect

computers in a wide area network, a network which is established between

computers across cities or countries.



Let us take a practical example of simply exchanging a message between

two persons, one at Lucknow and another at Mumbai.

Surya has an Internet account at Lucknow as [email protected]

Rishaba has an Internet account at Mumbai as Rishaba@bmOl.vsnl.net.in

When Surya wants to send a message to Rishaba at Mumbai, he dials from his telephone to his local service provider, types out his message and types out the address of the recipient.

Surya’s message is then broken into packets, which are small units that are easy to transmit reliably.

These packets are then sent out over various connected links along with the destination address, say to Delhi and Kanpur. These sites also have packet-forwarding facilities that act on the destination address, and after a while all packets ultimately reach the destination, that is, Mumbai.

At Mumbai, all packets marked for a particular address [email protected] and a particular message number (automatically generated) are reassembled and then posted in the mailbox that Rishaba is supposed to access regularly.

The above example cites a case of store and forward type of message

transfer. However the on-line transfer also occurs in the same way,

provided machines at both the ends are switched on and set to

transmit/receive internet traffic.

Sending and Receiving messages

How are messages sent and received across a network? Suppose I send a message. It could be a simple e-mail saying ‘Hi’ to tell you I am on the network, or it could be a file, like the text of this chapter. I have to tell the system the address of your computer, which is generally your name. So what I do is put the message in an envelope and superscribe your name on it. The actual operation is not much more complicated. To understand it, it helps to look first at the mechanisms used by telephones and by the cable networks that bring satellite channels to your home. Keep in mind, however, that the way communication takes place between computers is different from both these cases.



When your phone is off the hook, your line is engaged and you cannot receive another call. Your cable operator, however, can beam many channels and you can surf them at will. Telephones are circuit switched. In simple terms, this means that when you dial a number, the call goes to your nearest exchange, which routes it to the exchange nearest to the called number, and the bell rings at the other end. The moment the receiver lifts his phone off the hook, a circuit between you and him is established. This is a dedicated circuit. The whole mechanism is called circuit switching.

Your cable operator, on the other hand, can send multiple channels

because each channel has a different frequency and depending on the

bandwidth of the cable, many channels can be beamed. Imagine a wide

road with neatly defined lanes, one for two wheelers, one for cars, one for

light commercial vehicles and so on. Imagine frequency as the type of

vehicles and you have it!

In case of computer-to-computer communication, you cannot afford to have

circuit switching, and you cannot assign different frequencies to each

computer. The computer networks are packet switched. The different

stations send discrete blocks of data to each other. You can think of these

blocks of data as corresponding to some piece of a file, a piece of e-mail, or

an image.

The message is broken into pieces called packets. The time too is divided

and each computer gets a quota of time to send packets. Suppose many

stations want to communicate at the same time, they have to share the

network resources, especially the wires. This can be achieved through

multiplexing techniques.

Each packet has actual contents surrounded by a header and a trailer. The

packet header has information about its destination. The NIC (Network

Interface Card) transmits the packet on the network. All computers passed

by this packet get to see it but ignore it after seeing the header. The NIC at

the intended receiver copies the packet. But does it copy each packet

separately? Yes. The information at the two ends of the packets helps these

to be put together.
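The roles of the header, the trailer and the receiving NIC can be sketched as follows. The packet layout, the single-letter station addresses and the CRC-32 checksum are assumptions made for this illustration; real network frames use their own formats.

```python
import zlib


def make_packet(destination: str, sequence: int, data: bytes) -> dict:
    """Wrap data with a header (destination, sequence) and a trailer (checksum)."""
    return {
        "header": {"destination": destination, "sequence": sequence},
        "data": data,
        "trailer": {"checksum": zlib.crc32(data)},
    }


class Station:
    """A station whose NIC copies only the packets addressed to it."""

    def __init__(self, address: str):
        self.address = address
        self.received = []

    def see(self, packet: dict) -> None:
        if packet["header"]["destination"] != self.address:
            return  # not for us: ignore it after reading the header
        if zlib.crc32(packet["data"]) != packet["trailer"]["checksum"]:
            return  # damaged in transit: discard it
        self.received.append(packet)

    def message(self) -> bytes:
        """Put the copied packets together using their sequence numbers."""
        ordered = sorted(self.received, key=lambda p: p["header"]["sequence"])
        return b"".join(p["data"] for p in ordered)


stations = [Station("A"), Station("B"), Station("C")]
packets = [make_packet("B", 1, b"world"), make_packet("B", 0, b"hello ")]
for packet in packets:
    for station in stations:        # every station on the wire sees the packet
        station.see(packet)
```

After the loop, only station B holds the packets, and its message() method returns the data in the original order even though the packets arrived reversed.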



1.2.7 Internet Applications

The Internet is an important tool for practically everybody, and its applications are endless. Whatever information is required is either already available on the Internet or soon going to be. Here are some interesting application areas:

Electronic mail, which was until recently considered only an internal

mechanism, is quickly becoming the most widely used application on

Internet. The most common of the communication methods used by the

people on the Internet is the private letter, written by one individual to

another (on any subject and in any language), and sent between any two

connected Internet sites or through an Internet e-mail gateway to or from a

service which provides an Internet gateway.

The ability to exchange visual information in readable and reusable formats, such as charts, figures, tables, images, databases and software code, opens up possibilities for collaboration at the global as well as local levels. With the trend toward specialization, the ability not only to communicate but also to actually work with colleagues in the same field scattered all over the world makes long-distance collaboration feasible.

The resources for on-line research are multiplying at an astounding rate. Searchable databases, library holdings, alerting services, pre-prints, and other information systems are all changing the way research is done. With library shelves overflowing with journals and proceedings, and with acquisitions budgets receiving deep cuts, a likely scenario for the future is one in which libraries archive electronically, share holdings and become information clearing houses instead of closets.

Another very important application of the Internet is multimedia. Live music concerts, radio broadcasts, live or recorded television shows, interactive audio and web phone, and video conferencing are no longer a dream on the Internet, even for a desktop PC user.

The Internet provides a variety of information to everybody, ranging from entertainment to serious business applications to areas of daily life, such as:

Magazines and newspapers
Household shopping items
Ordering novelties from anywhere in the world
Radio and TV broadcast schedules and sometimes the broadcast itself
Tour and travel plan guides and bookings
Health consultation
Tips for doing various things
Talking to friends and relatives in any part of the globe
Games of various kinds
Language interpreter
On-line education course material, conduct of examinations, advertising on popular information sites, making payments on the net to purchase an item, and Internet banking

Self-assessment questions

1. RFC stands for __________.

2. Internet is a network of ____________.

1.3 Concepts of Server

In this section, we are going to discuss the concepts of client server models,

mail servers and FTP servers.

1.3.1 Client Server Model

Some computers are more equal than others. There are more powerful computers (not necessarily bigger) called servers. They are like our public servants, administrators to the core. These servers are connected to other computers, called clients, that depend on them (though not in all respects); hence the client-server model. The two are connected either through physical links (wires, optical fibers, etc.) or through microwaves using satellites or microwave towers. When many computers are talking to each other, with an enormous number of signals traveling through these media, traffic snarls are bound to occur, so there are devices to take care of these. A more detailed explanation of server and client is given below:

Server:

Many of the host computers on the Internet offer services to other computers on the Internet. For example, your ISP probably has a host computer that handles your incoming and outgoing mail. Computers that provide services for other computers to use are called servers. The software run by server computers to provide services is called server software. A server usually runs on a computer that is connected directly to a network, and it keeps running so that it is ready whenever a client connects. The size of that network is not important to the client/server concept: it could be a small local area network or the global Internet. The server is designed to interact with client programs.

Client:

Conversely, many of the computers on the Internet use servers to get information. For example, when your computer dials into an Internet account, your e-mail program downloads your incoming messages from your ISP’s mail server.

Programs that ask servers for services are called clients. Your e-mail program is more properly called an e-mail client. A client program is designed for a particular computing platform (for example, UNIX, Macintosh or Windows) to take advantage of the strengths of that platform. It uses the same environmental elements as a word processor, a spreadsheet, or even a computer game.

Using the familiar computer environment, the client may help you locate servers of interest, send a query, process the query results, and display them using familiar tools. Popular client/server software includes WinGopher, Mosaic, World Wide Web software, Netscape Navigator and Novell Netware file server software.

The client/server model has become one of the central ideas of network

computing. Most business applications being written today use the

client/server model.
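A minimal sketch of the client/server idea, using Python's standard socket module, is given below. Both programs run on the same machine here purely for demonstration; the function names, the reply text and the use of a background thread for the server are assumptions made for this example.

```python
import socket
import threading


def serve_once(server_socket: socket.socket) -> None:
    """Server side: wait for one client, read its request, send a reply."""
    connection, _ = server_socket.accept()
    with connection:
        request = connection.recv(1024)
        connection.sendall(b"Server received: " + request)


def ask_server(host: str, port: int, request: bytes) -> bytes:
    """Client side: connect, send a request, and return the server's reply."""
    with socket.create_connection((host, port)) as client:
        client.sendall(request)
        return client.recv(1024)


# Listen on the local machine; port 0 lets the operating system pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(server,))
worker.start()
reply = ask_server("127.0.0.1", port, b"hello")
worker.join()
server.close()
print(reply.decode())
```

The server simply waits until a client asks for something, which is the pattern followed by the mail, FTP, news and web servers found on the Internet.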

1.3.2 Servers

Mail Servers

Mail servers handle incoming and outgoing mail. Specifically, Post Office Protocol (POP) servers (or POP3 servers) store incoming mail, while Simple Mail Transfer Protocol (SMTP) servers relay outgoing mail. Mail clients get incoming messages from, and send outgoing messages to, a mail server, and enable you to read, write, save and print messages.

Web Servers

Store web pages and transmit them in response to requests from web clients, which are usually called browsers.

FTP Servers

Store files that you can transfer to or from your computer if you have an FTP client.

News Servers

Store Usenet newsgroup articles that you can read and send if you have a news client or newsreader.

IRC Servers

Act as a switchboard for Internet-based on-line chats. To participate, you use an IRC client.


3. Many of the host computers on the Internet offer services to other

computers on the Internet. (true/false)

4. SMTP stands for ___________.

1.4 Getting Connected

Since the Internet is a composite network of thousands of discrete networks, each having its own rules and procedures, there are many different ways in which you can connect to it. To use the Internet you need three things:

1. A computer
2. Client programs to run on your computer (one client for each type of service you want to use)
3. A way to connect your computer to the Net so that your clients can service your requests

1.4.1 Different Types of Connections

To start with, we need to go over the different types of Internet connections.

There are essentially three different types of connections for accessing the

services and resources of the Internet:

Dialup Connections

ISDN, ADSL, and Leased Line Connections

Satellite Connections

Dialup Connections

To access the Internet via a phone line, you connect your computer to the telephone system using either a regular phone line (with a modem) or an ISDN line (which requires special equipment). To start work, you run a communication program to dial the phone and establish a connection with a remote Internet host. Once the connection is established, you log in to the server by typing your user name and password. There are three possible types of dial-up connections:

a) Shell account access
b) TCP/IP account access
c) Dial-up or on-demand TCP/IP link through your LAN

a) Conventional Dial-up Shell Account: With this type of account, you actually do your work on the remote computer. You establish an interactive session with another computer which is an Internet host, and your desktop assumes the role of an ASCII terminal. With shell access, your provider’s computer is considered a part of the Internet, but your computer is not. The only program that runs on your computer is the terminal emulator. When you connect to your provider, you type commands to its system, telling it what you want to do. The program on your provider’s computer that receives and acts on the commands is known as a shell. The shell and the programs it runs for you send text back to your computer, where it is displayed on the screen. A terminal emulator supports only a text-based interface, not a graphical interface, and you are usually limited to running one client at a time.

b) Protocol dial-up (TCP/IP Account): A protocol dial-up account lets your computer behave as if it were connected directly to another computer on the Internet, when it is really connected over a phone line whenever you dial up. It enables you to run software that functions in your computer’s native environment, such as a graphical Web browser like Microsoft Internet Explorer or Netscape Navigator, instead of forcing you to deal with plain-text programs like the text-only browser Lynx on UNIX. This means that with a protocol dial-up (TCP/IP) account, your computer is a full-fledged Internet host for as long as you are connected. The client programs run on your own computer, and you can use as many clients as you want at the same time. For example, you could start several programs (a web client, a gopher client and a mail client) and switch back and forth from one to the other. This type of connection is also known as a TCP/IP account, and it uses the TCP/IP protocols to perform data transfer on the Internet.



PPP and SLIP: The family of Internet protocols is called TCP/IP. The protocol used to connect to the ISP’s server is PPP (Point-to-Point Protocol), which is the one used in the Indian context, although other connection types such as SLIP or CSLIP are available from other Internet Service Providers in the world. You can rest assured that, of these, PPP is the most recent and advanced connection protocol. The job of IP is to move the raw data from one place to another. The protocol first developed to support TCP/IP over a serial cable was called Serial Line IP, or SLIP. SLIP dates back to the early 1980s and was designed to be a simple, but not very powerful, method of connecting two IP devices over a serial cable. PPP is more powerful, more dependable, more flexible, and a lot easier to configure when you need to get it up and running on a new system.

c) Dial-Up or On-Demand TCP/IP link through your LAN: A dial-up link

from your LAN is the intermediate step between individual dial-up and a

dedicated high speed link. It is therefore somewhat like dial-up and

somewhat like having a direct link. The main difference between this

type of connection and the one to your individual computer is that the

TCP/IP software runs on the LAN server, and your connection is to the

server. A TCP/IP connection through a LAN, either on a dial-up

connection or a direct connection, is the most common type of IP

connection, much more common than a personal dial-up IP connection.

ISDN, ADSL and Leased Line Connections: Alternatives to a regular phone line are ISDN (Integrated Services Digital Network) and ADSL (Asymmetric Digital Subscriber Line), both of which are types of telephone service. ISDN and ADSL let the user connect to another computer much faster than even the fastest dial-up modem because the connection is digital. Thus, if you are using a phone line to connect your computer to the Internet, you are better off with an ISDN or ADSL connection where one is available (not all phone companies offer them). ISDN can run as fast as 128 kbps, and ADSL can be faster still.

ISDN or ADSL services are a boon for companies that have multiple users who need simultaneous Internet access. However, very few Internet users in India have tried them so far. The primary reasons are delayed implementation by MTNL (Mahanagar Telephone Nigam Ltd.) and relatively higher costs. Mantra Online is the first private ISP to offer these services.

A dedicated link (or leased line) is a permanent connection over a telephone line from one modem or router to another. A router is a specialized computer that reads the address of each TCP/IP packet and sends the packet towards its destination; routers are used at higher speeds (56 kbps and above). With a dedicated link, your personal computer or LAN is connected to the Internet at all times (compare it with a hotline, where you just pick up the phone and start the conversation, with no dialing and no busy signal). This is the most costly type of connection because the line is private to one person's computer or one organization and nobody else can share it. Leased lines come in various speeds, including T1 (1.5 Mbps, or enough for 24 voice channels) and T3 (44 Mbps, or enough for 672 voice channels). If you do not need quite that much speed, you can ask for a fractional T1 (half or a quarter of a T1 line). You also need to contact your ISP for a leased line account, which costs more than a dial-up account.
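The leased-line speeds quoted above follow from simple arithmetic. The sketch below assumes the standard 64 kbps digital voice channel (a figure the text does not state) and multiplies it by the channel counts given above. The function name is illustrative only. The results come out slightly below the quoted speeds because the quoted figures are rounded and real lines also carry some framing overhead.

```python
# Capacity of a leased line as a multiple of one digital voice channel.
# ASSUMPTION: 64 kbps per voice channel; the text gives only channel counts.
VOICE_CHANNEL_KBPS = 64


def payload_mbps(voice_channels: int) -> float:
    """Return the combined capacity of the given voice channels in Mbps."""
    return voice_channels * VOICE_CHANNEL_KBPS / 1000


t1 = payload_mbps(24)        # full T1: 24 voice channels
t3 = payload_mbps(672)       # the 44 Mbps class of line: 672 voice channels
half_t1 = payload_mbps(12)   # fractional T1: half of the 24 channels

print(f"T1 payload:      {t1:.3f} Mbps")
print(f"T3 payload:      {t3:.3f} Mbps")
print(f"Half-T1 payload: {half_t1:.3f} Mbps")
```

A quarter of a T1 line can be estimated the same way by passing 6 channels.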

ISDN Advantages: To the subscriber, perhaps the most interesting advantage is that all ISDN services can be used with a single phone number. One line is sufficient for telephone, telecopy (fax), video conference or data transmission. A special protocol ensures that each incoming call is directed to the right terminal. Thanks to the Multiple Subscriber Number (MSN), it is even possible to dial each device directly from outside, without first establishing a connection through a central exchange or PBX.

1.4.2 Requirements for Connections

This section deals with shell account, TCP/IP account, TCP/IP software and

Web Browser.

For Shell Account

If you have shell account access, all you need is for your machine to become a terminal on your ISP's computer. A PC of the minimum possible configuration, running VT-100 or equivalent terminal emulation software, will serve the purpose well; in fact, even a simple dumb terminal can access such an account. Terminal emulation software for the PC, such as PROCOMM, is widely available. Choose emulation software that has KERMIT and ZMODEM file download capability. You also need a modem with error correction that runs at 9.6 kbps or better, and a telephone line that can dial the service provider by a local or STD call.

For TCP/IP Account

It is the power of the software available for TCP/IP accounts that has made the Internet so popular these days. If you hold a TCP/IP account, it is highly desirable to have a GUI operating system such as Windows on your desktop. Typically, you need software to establish the TCP/IP connection and a Web browser to use this type of account. The right modem is the fastest one in its class that fits your budget. Typically, a 28.8 kbps modem is found to perform best with Indian ISPs.

TCP/IP Software

Such software, also called TCP/IP sockets, is now bundled with new operating systems. If it did not come with your OS, you can use third-party socket software such as Trumpet Winsock. You must run this software to get connected to your ISP before you can start browsing.

Web Browser

Web browsers are client software (your machine is a client to the ISP's server) with graphics capabilities for accessing information on the Internet. Modern Web browsers can browse WWW, Gopher and FTP sites and also provide facilities for e-mail. Initially, NCSA's Web browser Mosaic hit the market, and it is what actually made browsing popular. Now Web browsers from Netscape and Microsoft are the users' choice. You can get hold of any such browser and start browsing the Net.

Self Assessment Questions

5) Three types of dial-up connection are available. (true/false)

6) An alternative to a regular phone line is ISDN. (true/false)

1.5 Internet Service Providers

An Internet Service Provider (ISP) is an organization or business offering public access to the Internet. It is your gateway to the Net. You have to subscribe to a provider for your Internet connection. You use your computer and modem to access the provider's system, and the provider handles the rest of the details of connecting you to the Internet. There are many types of Internet providers. You can, for instance, choose one of the big commercial online service providers. The primary business of an ISP is hooking people up to the Internet by giving subscribers an Internet account and providing them with two different kinds of access: shell access and SLIP/PPP access. Most ISPs offer both kinds of access; some offer both with a single account and others require that you choose one or the other. Once you register, your provider will give you a user name (called a user ID), a password and a phone number to dial. To establish the Internet connection, you have your communications program dial the number. You then log in using your user ID and password. At present it is VSNL (Videsh Sanchar Nigam Limited) which dominates the Internet scene in India through its GIAS (Gateway Internet Access Service). The other service providers in India are MTNL (Mahanagar Telephone Nigam Limited), Mantra Online and Satyam Online. BSNL now offers a new option in which the user registers with a telephone number and needs no separate account, and this has increased the number of users. In this case, the individual pays according to whatever his or her usage is.

Choosing an ISP

The privatization of Internet Service Providers (ISPs) is set to give a further

fillip to the Internet boom. Central to the success of any service is the price

criterion. You will be amazed to find out how a service offered at a premium

could in effect be cheaper, considering the add-on facilities that are offered

along with the core service. Do not forget that apart from the Internet

connection, the ISP gives you an international contact address, that is, your

e-mail address. It is because of this e-mail address that you must be discerning while choosing your ISP. The e-mail address provided by the ISP will be all over your business, so it will not be easy to change your service provider later, because doing so means changing your address. You will have to live with the ISP as well as the e-mail address.

User ID – Telephone Ratio: The first thing you must keep in mind while

zeroing in on your ISP is the user-to-line ratio it commands. That is, how

many users are using or are expected to use one single telephone line.

Ascertaining this, however, is not easy as the number of subscribers is growing every day. Nevertheless, even the current user-to-line ratio will give you an idea about the standards the ISP has set for itself. This factor is critical because it determines ease of use, that is, whether you will be able to get through to your ISP at all. Another way of finding this out is to ask some existing users how much time it normally takes to dial into a given ISP. If it takes more than 10 minutes to get through, that particular ISP should be avoided.

Interface Simplicity: Very few organizations take into account the simplicity

of the interface while opting for an ISP. This occurs to them only when they

begin to use the Internet service across their organizations. The right kind of

interface can lead to tremendous savings in cost. There are other problems

too. How many users in an organization know about dial-up networking

under Windows? How many can remember and use passwords correctly?

To how many people would you like to give the password? Do terms like TCP/IP sound friendly to them? Questions like these determine the success of Internet-enabled organizations. There are some ISPs to whom these questions do not apply. They provide an easy-to-use interface that, once installed, works at the press of a button.

Roaming Facility: The roaming facility is particularly relevant for those who

travel a lot. Though most ISPs advertise this particular facility, there are not

many who pay heed to it. Its benefits are realized only when one reaches

another city and wants to access an urgent e-mail or the Internet. How does

one connect to the Internet when one is not an ISP subscriber in that

particular city? To overcome this problem, you will either have to use a facility like Hotmail to access your mail from around the world or use the roaming facility provided by your ISP. The roaming facility allows you to dial in to the local node of your ISP or of a regional ISP that your service provider has a tie-up with. Then all you have to do is plug your computer into a telephone line, find out the numbers for dial-up access and, using your password, access your original Internet account. A crucial point here is the number of cities in which your ISP has a presence or tie-ups.

Multiple Login Facility: Very few users know about this facility, mainly because it is hardly advertised. However, it can prove to be a life-saver and a great help for small and medium business houses. If an organization has only one Internet connection but more than one employee wants to access the Net simultaneously, this is possible only if the ISP offers the organization the multiple login facility. In fact, this facility can even be used while away from the organization. For instance, one user may be in New Delhi and the other in Mumbai, yet with this facility the user away in Mumbai can access the Internet at the same time. Some ISPs offer multiple e-mail IDs that allow you to segregate e-mail individually, but you have to pay extra for this.

Special Packages: The private ISPs are putting out some unique usage packages, such as special packages for night users. For those who access the Net at night, some ISPs offer a dial-up account which costs almost half as much as the regular connection. This account cannot be used during the daytime. This is only the beginning as far as special packages are concerned. Soon you will find ISPs (especially the regional ones) coming out with packages that will fit your needs better than your cotton trousers. So do not forget to check out each and every player before deciding on your Internet provider.

Support: This is a very crucial topic and an area of service where most of the players have been found wanting. Try getting any help from the service provider and the beautifully programmed EPABX system will take you around each and every option, only to disconnect your call at the end saying “Sorry, the person handling your call is busy at the moment”. If you happen to be using pulse-dialing equipment, you can forget about using the telephone and may as well go to their office and clear up the matter there and then.

Ideally, new users should subscribe to an ISP where they can be hand-held through the initial process, as Bill Gates's Windows operating system does try its best to support you in the exercise. An installation guide, the help desk's phone number and the Windows 95 installation CD are part of the necessary survival kit that a new user must have while undergoing this procedure.

Discounts on Renewal: Last but not the least, you must find out whether your ISP will renew your account at the same rate or whether it offers discounts to retain its old customers. This is a factor that can upset those lining up for their first purchase. VSNL has been very successful in playing this card. It offers slashed rates to those subscribers who renew their accounts.

Brochure-speak: If you can have more than a hundred different versions of

the holy Ramayana, just think what the crafty marketing people can do to

simple terms of the Internet. Hence, one must see through the exotic looking



tariff cards of most ISPs. You must have the ability to judge beyond the

gloss and the glitter. To summarize, here is what you want from an Internet

Service Provider:

Access via a local phone call

A flat monthly fee

An ISDN or fast (28.8 kbps) connection

A PPP account

A shell account at no extra charge

The ability to use whichever Internet clients you want

Full Internet access to all resources

The capability of having your own web home page

Software support to help you connect to and use the Internet

Technical support that is open 24 hours a day, 7 days a week

Self Assessment Questions

7. ___________ are the examples of ISP.

8. The private ISPs are putting out some unique usage packages.

(true/false)

1.6 Address in Internet

Understanding the Internet also requires you to know a little about how the systems connected to the network are named and identified. Only with these names can you locate a computer and connect to it. Every computer that is on the Internet has its own unique address. On the Internet, the word ADDRESS always refers to an electronic address. There are two kinds of addresses on the Internet:

Domain names

IP Addresses

1.6.1 The Domain Name System and DNS Servers

On a TCP/IP network, computers know each other by their IP addresses.

But for human beings, remembering numbers is not the easiest thing to do.

Remembering names is much easier. So a way was devised to

associate IP addresses with names that can be easily remembered. In the

early days of the Internet, “hosts” files were used to associate machines with

names. The hosts file is simply a table of IP addresses and corresponding

names like a phone directory. Any name lookup (the process of identifying



the IP address associated with a name) will first check the hosts file (if

present) on the machine making the query, to see whether the name can be

resolved.
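The hosts-file lookup described above can be sketched as a small table search. The sample entries, the private addresses and the function names below are all made up for illustration; a real hosts file on your machine will contain different entries.

```python
from typing import Dict, Optional

# A hosts file is a plain table: one IP address per line, followed by
# one or more names. These sample entries are invented for illustration.
SAMPLE_HOSTS = """\
# address        names
127.0.0.1        localhost
192.168.1.10     fileserver fs
192.168.1.20     printer
"""


def parse_hosts(text: str) -> Dict[str, str]:
    """Build a name -> IP address table from hosts-file text."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Names are case-insensitive; the first entry for a name wins.
            table.setdefault(name.lower(), address)
    return table


def lookup(name: str, table: Dict[str, str]) -> Optional[str]:
    """Return the IP address for a name, or None if it cannot be resolved."""
    return table.get(name.lower())


hosts = parse_hosts(SAMPLE_HOSTS)
print(lookup("fileserver", hosts))
print(lookup("unknown-machine", hosts))
```

When the lookup returns nothing, the name has to be resolved some other way, which is where a name service comes in.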

Within the Internet, each separate computer is called a host. For example,

you might tell someone he can find the information he wants by connecting

to a host in Switzerland. If your computer is connected to the Internet, then it

too is a host, even though you may not be sharing any resources with the

rest of the world. If you connect to and log into a host and then use its functions to reach out onto the Internet, you are using your computer as a terminal to reach another computer. Host connections are designed to use very simple text-based interactions.

Being connected to the Internet means your computer system or network is actually a node on the Internet. It has an individually assigned Internet address, and the client programs running on it can take full advantage of the computer's capabilities. Your workstation is a peer of every other computer on the Net. So, a node is any “addressable device” attached to a computer network.

But with the number of hosts on the Internet increasing rapidly to an unmanageable level, keeping every hosts file up to date soon became impossible. The way out was the DNS: the Domain Name System. The DNS is a distributed, scalable database of IP addresses and their associated names. It is distributed in the sense that, unlike the hosts file, no single computer contains all the DNS information in the world; the DNS data is spread across many name servers. It is scalable in the sense that the total volume of DNS data, and of requests from machines for that data, can increase without significantly increasing the querying time. Otherwise the World Wide Web would really become the World Wide Wait.

To understand the DNS and the way it is used, we need to understand the

Internet naming structure. Let us take, for example, the address:

http://www.trg.hclsso.hclinfosystems.com/

www: Indicates that the machine is part of the World Wide Web

com: Indicates the top-level domain (TLD) that the machine is part of. Top-level domains include .com, .edu, .gov, .in, etc.



hclinfosystems: Shows that the computer we are looking for is in a network

called hclinfosystems

hclsso: Indicates a sub-network (a group of computers with a common

function or at a common location).

trg: Is the name of the machine that we are interested in.

Let us see how the DNS helps to identify a machine's IP address, given its name. At the top level of the DNS structure are the thirteen root name servers of the world, which contain pointers to the master name servers of each of the top-level domains. To find the IP address of http://www.trg.hclsso.hclinfosystems.com/, the DNS server first asks one of the root name servers for the address of the master name server for the .com domain. This master name server has the addresses of the name servers for all the .com domains. From here you get the address of the name server for the hclinfosystems.com domain. You move on to this name server, which will give you the IP address of the machine trg.hclsso.hclinfosystems.com. If there is a separate name server for the hclsso.hclinfosystems.com sub-domain, then the name server for hclinfosystems.com will guide you on to that name server, which will give you the IP address of trg.
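The chain of referrals described above can be imitated with a toy model. This is only a sketch: each "name server" is a Python dictionary, the server names are hypothetical, and the IP address is invented (taken from a range reserved for documentation). No real DNS query is made.

```python
# Toy model of DNS delegation. Each "server" maps a domain suffix either
# to another server (a referral) or to an IP address (an answer).
# The address below is invented; it is NOT the real address of any host.
HCLINFOSYSTEMS_SERVER = {"trg.hclsso.hclinfosystems.com": "192.0.2.25"}
COM_SERVER = {"hclinfosystems.com": HCLINFOSYSTEMS_SERVER}
ROOT_SERVER = {"com": COM_SERVER}


def resolve(name, server=ROOT_SERVER):
    """Follow referrals from the root; return (address or None, servers asked)."""
    name = name.rstrip(".").lower()
    asked = 0
    while True:
        asked += 1
        # Entries on this server whose suffix matches the end of the name.
        matches = [s for s in server if name == s or name.endswith("." + s)]
        if not matches:
            return None, asked            # this server cannot help
        answer = server[max(matches, key=len)]   # most specific match
        if isinstance(answer, str):
            return answer, asked          # an IP address: we are done
        server = answer                   # a referral: ask the next server


address, servers_asked = resolve("trg.hclsso.hclinfosystems.com")
print(address, "found after asking", servers_asked, "servers")
```

Here three servers are consulted in turn: the root, the .com server and the hclinfosystems.com server, mirroring the steps in the text.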

A domain name is a way by which a company can uniquely identify itself on

the Internet. Registering a domain name on the Internet is the equivalent of

registering a company name at Companies House. Based on the top level

identifications, there are basically two types of domains:

1. Non-geographic domains

2. Geographic domains

Non Geographic Domains

The non-geographic top-level Internet domain types are:

Domain   Indicates                       Example
com      Commercial organizations        hclinfosystems.com
edu      Educational institutions        stanford.edu
mil      A (US) military setup           nic.mil
gov      A (US) government setup         nasa.gov
org      Other organizations             www.bjp.org
net      Other networks                  ns.stph.net
int      An international organization   tpc.int



Geographic Domains

The geographically based top-level domains use two-letter country

designations.

Domain   Meaning
au       Australia
ca       Canada
dk       Denmark
fr       France
gr       Greece
in       India
jp       Japan
us       United States

In a complete (fully qualified) domain name, the part furthest to the right is

the top level domain, representing either a type of organization or a country.

As you read in from the right, the name gets more specific until you reach

the name of the individual host computer. For instance: rubens.anu.edu.au

is the name of a computer. It is in Australia (au), in the educational area

(edu), at the Australian National University (anu) and the host computer is

named rubens.
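Reading a domain name from the right, as described above, amounts to splitting it at the dots and reversing the parts. The following minimal sketch does just that; the function name is illustrative.

```python
def labels_from_right(domain_name):
    """Split a domain name into its parts, most general part first."""
    parts = domain_name.rstrip(".").lower().split(".")
    return list(reversed(parts))


parts = labels_from_right("rubens.anu.edu.au")
top_level_domain = parts[0]   # the rightmost part of the name
host = parts[-1]              # the leftmost part: the individual computer

print(parts)
print("top-level domain:", top_level_domain, "| host:", host)
```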

1.6.2 IP Addresses

Each host computer on the Internet has a unique number, called its IP

address. IP addresses identify the host computers, so that packets of

information reach the correct computer. You may have to type IP addresses

when you configure your computer for connection to the Internet. An IP

address is a 32-bit number that uniquely identifies a network interface. The

IP address is assigned to a network interface card and not a computer. So if

you have two Network Interface Cards, then each card is assigned an IP

address. The 32-bit IP addresses are normally expressed in dotted-decimal format, with four numbers separated by periods, such as 151.202.123.132. Each number can range from 0 to 255. The four constituent numbers together represent the network that the computer is on and the computer interface itself. IP addresses are organized from left to right, with the left-hand octet describing the largest network organization and the rightmost octet describing the actual network connection. Each octet is 8 bits long, so the four octets together make up the 32 bits of the address. Using the various combinations of these octets, about 4.3 billion (2^32) unique identifiers can be assigned.
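To see how four octets make up one 32-bit number, the sketch below packs the sample address from the text by hand and then cross-checks the result against Python's standard ipaddress module. The function name to_int is illustrative.

```python
import ipaddress


def to_int(dotted):
    """Pack a dotted-decimal IPv4 address into a single 32-bit integer."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-decimal IPv4 address: {dotted!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet   # shift left 8 bits, append next octet
    return value


sample = "151.202.123.132"
packed = to_int(sample)

# Cross-check against the standard library's own conversion.
assert packed == int(ipaddress.IPv4Address(sample))

print(f"{packed:032b}")   # the same address written as 32 binary digits
print(2 ** 32)            # how many distinct 32-bit addresses exist
```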

Classes of Networks

Just as with our phone numbers, we can look at the leftmost octet and

determine something about the network. Network addresses are divided into

classes, which are assigned depending on the size of the physical network.

The value of the first octet tells us what class the network is in, and how large the physical network that underlies the number is. The first octet is sometimes called the network address or net number.

Class A: Over 16 million served

These are very big networks with up to 2^24 (about 16 million) nodes. Class A

networks have their network addresses from 1.0.0.0 to 126.0.0.0. The zeros

are replaced with the node addresses. NEARNET, Sprint, ANSnet, Merit

and AT&T are examples of organizations with class A network numbers.

Class B: Larger nets

Class B networks are smaller than Class A networks. They can have up to about 65,000 (2^16) nodes. Network addresses range from 128.0.0.0 to 191.255.0.0. In this case only the last two zeros are replaced with the node addresses. Class B addresses go to organizations with larger nets, such as universities or large businesses. The first two octets in a Class B address describe the network itself, and the last two identify the host.

Class C: Addresses

Class C networks are smaller than Class B networks. They can have up to 254 nodes. Network addresses range from 192.0.0.0 to 223.255.255.0. In this case only the last zero is replaced with the node address. The first three octets are used for the network number and the last octet is the host number. This class is where most networks will be assigned. Originally, Class C addresses were intended for small company networks, K-12 schools and single machines that were not connected to other, larger nets.

Other Classes

There are other classes of networks, Class D and Class E. Class D is used for multicasting and Class E is reserved for experimental purposes. For a given network address, the last node address is the broadcast address. For example, for the Class C network with address 193.168.1.0, the address 193.168.1.255 is the broadcast address. The IP addresses for networks on the Internet are allocated by the InterNIC, the official body in charge of allocating domain names and addresses.
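The class of an address can be read off its first octet, and the broadcast address of the Class C example can be checked with Python's standard ipaddress module. This is a sketch only: the function name is illustrative, and the first-octet ranges used for Class D (224 to 239) and Class E (240 to 255) are the conventional ones rather than figures stated in the text.

```python
import ipaddress


def network_class(address):
    """Return the class letter for an IPv4 address, judged by its first octet."""
    first_octet = int(address.split(".")[0])
    if 1 <= first_octet <= 126:
        return "A"
    if 128 <= first_octet <= 191:
        return "B"
    if 192 <= first_octet <= 223:
        return "C"
    if 224 <= first_octet <= 239:
        return "D"   # conventional range, not given in the text
    if 240 <= first_octet <= 255:
        return "E"   # conventional range, not given in the text
    return None      # 0 and 127 are reserved and belong to no class here


# The Class C example from the text: three network octets, one host octet.
net = ipaddress.ip_network("193.168.1.0/24")
usable_nodes = net.num_addresses - 2   # minus network and broadcast addresses

print(network_class("193.168.1.0"), net.broadcast_address, usable_nodes)
```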

Subnet Masks

In an IP network, every machine on the same physical network sees all the

data packets sent out on the network. As the number of computers grows,

the increase in network traffic brings down the performance. In such a

situation it is recommended to divide your network into sub-networks and

minimize the traffic across different sub-networks. Interconnectivity between

the different subnets would be provided by routers, which will only transmit

data meant for another subnet across itself. To divide the given network into

two or more subnets you use subnet masks. The default subnet mask for

Class A networks is 255.0.0.0; for Class B is 255.255.0.0; for Class C is

255.255.255.0 which signifies a network without subnets. The subnet mask

is used to identify the subnet to which an IP address belongs, by performing

a bit-wise AND operation on the mask and the IP address.
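The bit-wise AND described above can be tried out directly. In this sketch the host addresses and the mask 255.255.255.192 (which splits a Class C network into four subnets) are chosen purely for illustration, and the function names are hypothetical.

```python
import ipaddress


def subnet_of(address, mask):
    """Return the subnet address: the bit-wise AND of address and mask."""
    ip_bits = int(ipaddress.IPv4Address(address))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_bits & mask_bits))


def same_subnet(a, b, mask):
    """Two hosts are on the same subnet if the AND results are equal."""
    return subnet_of(a, mask) == subnet_of(b, mask)


# Default Class C mask: the whole network is a single subnet.
print(subnet_of("193.168.1.77", "255.255.255.0"))

# A longer mask divides the same network into smaller subnets.
print(subnet_of("193.168.1.77", "255.255.255.192"))
print(same_subnet("193.168.1.77", "193.168.1.200", "255.255.255.192"))
```

Traffic between two hosts for which same_subnet is false would have to pass through a router.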


Self Assessment Questions

9. Class A networks have their network addresses from __________ to _______.

10. ______________ are examples of organizations with class A network numbers.

1.7 Resource Addressing

Using the Web means having your browser act as a client program on your

behalf. In order to fulfill your requests, your browser will contact a server,

and ask for either some information or a service of some type.

1.7.1 URL (Uniform Resource Locator)

URLs provide a standard way to specify the exact location and name of just

about any Internet resource. In general, most URLs have one of two

common formats:

scheme://hostname/description

scheme:description

Example 1: http://www.alan.com/afan

This example describes a particular web page on a particular computer. The URL begins with the scheme name (here http), indicating a specific type of resource.

Example 2: news:rec.human

This example describes a more general resource. The scheme is news, which indicates a Usenet discussion group.

1.7.2 URLs and HOST Names

On the Internet, a hostname is a domain name assigned to a host computer.

This is usually a combination of the host's local name with its parent

domain's name. For example, "en.wikipedia.org" consists of a local

hostname ("en") and the domain name "wikipedia.org". This kind of

hostname is translated into an IP address via the local hosts file, or the

Domain Name System (DNS) resolver. It is possible for a single host

computer to have several hostnames; but generally the operating system of

the host prefers to have one hostname that the host uses for itself.

List of schemes used within URLs

Scheme   Meaning
ftp      File accessed via file transfer protocol
gopher   Gopher resource
http     Hypertext resource
mailto   Mail
news     Usenet newsgroup
telnet   Interactive telnet session
wais     Access to a WAIS database

1.7.3 URLs and Port Numbers

Each type of Internet service has its own specific port number. Within a URL

you only have to specify a port number if it is not the default for that type of

service. For example, the default port number for telnet is 23. The following

two URLs are equivalent:

telnet://locis.loc.gov/

telnet://locis.loc.gov:23/

The http service, by default, uses port 80. Similarly, the gopher service uses

port 70. For instance, the following two URLs are equivalent. They both



point to the same hypertext resources, using port 80, on the computer

named www.wendy.com:

http://www.wendy.com/~wendy

http://www.wendy.com:80/~wendy
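The rule that a missing port means "use the default" can be sketched with Python's standard urllib.parse module. The small table of defaults below holds only the three port numbers mentioned in the text, and the function names are illustrative.

```python
from urllib.parse import urlsplit

# Default ports mentioned in the text; a fuller table would list more.
DEFAULT_PORTS = {"http": 80, "telnet": 23, "gopher": 70}


def effective_port(url):
    """Return the port a URL refers to, falling back to the scheme default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port                   # port written out in the URL
    return DEFAULT_PORTS.get(parts.scheme)  # None if the scheme is unknown


def equivalent(url_a, url_b):
    """True if two URLs name the same scheme, host, port and path."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.scheme, a.hostname, effective_port(url_a), a.path) == (
        b.scheme, b.hostname, effective_port(url_b), b.path)


print(equivalent("telnet://locis.loc.gov/", "telnet://locis.loc.gov:23/"))
print(equivalent("http://www.wendy.com/~wendy", "http://www.wendy.com:80/~wendy"))
```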

1.7.4 Pathnames

Here is a typical hypertext URL:

http://www.cathouse.org/cathouse/humor/tech/data

We can divide such URLs into three parts: the scheme, the host name and the pathname. To analyze such a URL, look at each of the parts:

The scheme (http) identifies this resource as being hypertext

The hostname (www.cathouse.org) is the name of the computer

The pathname (/cathouse/humor/tech/data) shows where on the host the hypertext resource is stored
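The same three-way split can be done by program. The sketch below uses Python's standard urllib.parse module on the URL from the text; the variable names are illustrative.

```python
from urllib.parse import urlsplit

url = "http://www.cathouse.org/cathouse/humor/tech/data"
parts = urlsplit(url)

scheme = parts.scheme       # the type of resource
hostname = parts.hostname   # the name of the computer
pathname = parts.path       # where on the host the resource is stored

# The pathname itself is a list of names separated by slashes.
segments = [name for name in pathname.split("/") if name]

print(scheme, hostname, pathname)
print(segments)
```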


Self Assessment Questions

11. URL stands for ______________.

12. ______________ is a domain name assigned to a host computer.

1.8 Email

This section covers the email concepts, definition, e-mail services and e-

mail networks.

1.8.1 Email Basics

The Internet is a valuable tool for accessing information, but it also opens a

whole new world of communications to its users. Using electronic mail

(email) a person can engage in conversations with people all over the world.

Yet, because of its convenience, it is also a powerful tool for even local

communication. With typical telephone communications you may be either

interrupted by a call, or may return a call only to find that the other person is

not available, an occurrence referred to as "telephone tag." Electronic mail

though, sits on the server computer until you are ready to read it, and when

you respond it will then wait patiently on the other person's computer until

they have time to read it. This is especially valuable for busy teachers, who

because of their duties and general working isolation in a classroom with

just their students, usually aren't able to communicate with peers on as

regular a basis as they would like.



Meaning and definition of email

Electronic mail could be defined as the transmission of letters and memos

from one computer to another. When E-mail originated in the 1970s, it was just the sending of messages. Its capabilities have grown rapidly since: users can now attach spreadsheets, business forms, lengthy documents, scanned images, faxed images, computer graphics, meeting schedules, sound and video to their messages.

Electronic mail or Email lets you communicate with other people on the

Internet. Email is one of the basic Internet services, and by far, the most

popular. It is used to hold conversations, keep in touch with friends, get information, start relationships or express your opinion. It is called Email because, like postal mail:

a) You put it into an electronic envelope and address it

b) You post it or hand the message to someone else (i.e. the network) to

be delivered

c) You may not know when the Email is read

d) You get Email back in your mailbox, if you addressed it incorrectly

e) If the recipient leaves a forwarding address, the Email system will keep

trying to route it to him/her until it runs out of forwarding locations

f) If the network is unable to deliver your Email, it will return the mail (this is

called bounced mail).

Email Services

In practice, Email usually refers to a service that includes the following

facilities:

Store and Forward: Messages are held until they are requested by the recipient. Direct person-to-person contact is not required and the service can be used by either party at whatever time and on whatever day suits them.

Blind copies: Copies can be sent automatically to names on a distribution list, including 'blind' copies (where the principal recipient is not notified that others have received the message).

Advise delivery: The sender can be told (by a confirming message to his or her mailbox) when the recipient has read the message. An immediate reply could also be demanded.

Off-line working: Text can be prepared in advance of transmission and incoming messages can be saved for later consideration or for use within word-processed documents.

Email Networks

Email networks consist of gateways and closed user groups.

Gateways: Most electronic mail services include access to other facilities. These include the telex system, on-line information services and electronic typesetting bureaux which accept e-mailed text and return phototypeset matter.

Closed user groups: These are areas of the Email service with restricted access. In some cases they are available to anyone who pays an additional fee; usually they will include extra gateways and more services. Other closed user groups (CUGs) are specific to members of a particular profession (Telecom Gold hosts CUGs for solicitors and accountants, for instance), and there are also CUGs for customers of individual companies (handy for disseminating and sharing information or making requests) and user groups for particular computer products.

In addition to these basic functions of electronic delivery systems, most

systems provide features related to other aspects of office work. These

features include:

Composing messages

Text editing

Message filing and retrieval

Authentication of message authorship

Broadcasting and distribution of messages as per specified addresses

Content processing of messages

Message switching

Accounting and billing

Security

Many Email services offer some or all of these:

Radiopaging: Your pager will beep when an urgent message is

received in your mailbox. Or you can beep someone by sending a

message to the service‟s radiopaging mailbox.



Telemessages: This replacement for the old-style telegram can be sent
from some Email services, so you need not call the Post Office yourself.
Delivery on the next working day (usually including Saturdays) is
guaranteed for messages received by a set time (which can be as late
as 10 p.m.). The Telemessage service can include 'special occasion'
formats for birthdays, anniversaries and the like; the delivery can include
a special reply envelope to encourage an immediate reply.

Message translation: Messages sent or received can be translated by

the Email service into the recipient's native tongue.

Courier services: A message placed by you on the Email service can be

copied and delivered by hand or mailed.

The basic functions of an Email system are message creation, message
transfer and post-delivery processing. These are provided by the User
Agent (UA) and the Message Transfer Agent (MTA). Thus, an Email system
is actually a message handling system. The user agent is responsible for
providing text editing and proper presentation services to the end user. It
also provides for other activities such as user-friendly interaction, security,
priority provision, delivery notification and distribution subsets. The
message transfer agent is oriented towards the actual routing of the
electronic mail. It is responsible primarily for the store-and-forward path,
channel security and the actual routing through the communication media.
Several MTAs taken together form the Message Transfer System (MTS).

1.8.2 Mail protocols

Email is instantaneous, cost-effective and, above all, personal. It produces
immediate results in terms of increased productivity from reduced
turnaround time, and reduced costs. Email is one of the easiest services to
implement on your network. The ideal mail system consists of Email servers
and clients that support standards. A clear understanding of the popular
Email acronyms will help users choose the right mail system.

SMTP

The transmission of Email messages through the Internet relies on
SMTP, which stands for Simple Mail Transfer Protocol. SMTP is part of the

TCP/IP family of protocols. The SMTP protocol is used to transport

messages between computer systems in the Internet. SMTP uses TCP,

Transmission Control Protocol, which provides a reliable means of



communication. Throughout the Internet, there are millions of computers

using SMTP to send and receive mails.

Many of the host computers on the Internet run UNIX. Therefore, many of
the hundreds of thousands of transport agents scattered around the Net
run under UNIX. Specifically, most of these computers use a transport
agent called sendmail, which runs automatically in the background and is
always ready to respond to whatever requests it may receive. In UNIX, such
a program is called a daemon, and every UNIX system has various
daemons to provide fast services for you.

The Internet mail system works only because everybody's network has at
least one computer running a transport agent, sending and receiving mail
according to the SMTP protocol. SMTP is fast and efficient. Its drawback is
that both nodes must be on-line to communicate with each other. That is
where POP comes in. SMTP governs the way a UA (User Agent)
establishes a connection with an MTA (Message Transfer Agent) and
transmits its Email message. MTAs also use SMTP to relay the Email from
MTA to MTA, until it reaches the appropriate MTA for delivery to the
receiving UA. The interactions between two nodes on a TCP/IP network,
whether a UA to an MTA or an MTA to another MTA, follow similar
processes based on a basic 'call-and-response' procedure.
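To make the call-and-response idea concrete, here is a minimal Python sketch of the lines a client sends to an MTA for one message. It is only an illustration: the host name and addresses are hypothetical, the server's numeric replies are not shown, and details such as escaping lines that begin with a dot are left out.

```python
def smtp_client_commands(client_host, sender, recipients, message_text):
    """Return, in order, the lines an SMTP client sends for one message."""
    commands = [f"HELO {client_host}", f"MAIL FROM:<{sender}>"]
    # One RCPT TO command is sent for each recipient handled by this server.
    commands += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    commands.append("DATA")
    # The message follows DATA and ends with a line holding a single dot.
    commands += message_text.splitlines()
    commands.append(".")
    commands.append("QUIT")
    return commands


session = smtp_client_commands(
    "first.example.com",           # hypothetical client host
    "you@first.example.com",       # hypothetical sender
    ["surya@mail.example.org"],    # hypothetical recipient
    "Subject: Hello\n\nSee you soon.",
)
for line in session:
    print("C:", line)
```

In a real session the client waits for a reply from the server after each command before sending the next one.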

POP

Post Office Protocol (POP) is a mail collection and distribution system,
which works on the post office principle with the mail server. It is designed to

allow single-user hosts to read mail from a server. POP allows creating a

mailbox for each user who has a mail account on the server.

There are three versions of POP: POP, POP2 and POP3. POP is a system
by which a mail server on the Internet lets us grab our mail and
download it to our PCs. Like SMTP, POP uses plain ASCII and is
independent of the platform and the operating system. POP depends on
SMTP to send mail, and itself handles access to the stored messages.
POP3 is the latest version of this protocol.
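As a rough sketch of how a POP3 client collects mail, the Python function below lists the commands a client might send to download every message and then remove it from the server. The user name and password are hypothetical, and the server's replies are not modelled.

```python
def pop3_download_commands(user, password, message_count, delete_after=True):
    """Return the POP3 commands for downloading message_count messages."""
    # Log in, then ask how many messages are waiting (STAT).
    commands = [f"USER {user}", f"PASS {password}", "STAT"]
    for number in range(1, message_count + 1):
        commands.append(f"RETR {number}")      # download message number N
        if delete_after:
            commands.append(f"DELE {number}")  # mark it for deletion
    commands.append("QUIT")                    # deletions take effect here
    return commands


pop_session = pop3_download_commands("surya", "secret", 2)
for line in pop_session:
    print("C:", line)
```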

IMAP

Internet Message Access Protocol (IMAP), unlike POP, allows hierarchical
storage of mail and a message retrieval system that allows selective access
to your mailbox. While POP is used simply for retrieving and deleting
messages, with IMAP we can organize our mail and read it on the
server itself. For a user connected over a slow dial-up line, IMAP
provides ways to download only the header or only the body of a message
that contains a large attachment. In addition, IMAP allows one user to access
multiple mail servers and multiple users to share a single mailbox. IMAP can
work in any of three basic modes of communication: On-line, Off-line or
Disconnected Operation. In the On-line mode, the mail is processed in an
interactive fashion; that is, the client can ask the server for only the
message headers and then request only specified messages, or can even
retrieve parts of certain messages.

MIME and S/MIME

SMTP can handle only messages containing the 7 bit ASCII text and it

cannot handle other types of data such as 8-bit binary data and other

multimedia formats that nowadays we are sending both within the body of

Email messages and as attachments. As a solution to this
limitation, the IETF developed the Multipurpose Internet Mail Extensions
(MIME) protocol, which packs multimedia data into a format that SMTP can
handle. S/MIME stands for Secure/Multipurpose Internet Mail Extensions and
was designed to add security to Email messages in MIME format. The security

services offered are authentication (using digital signatures) and privacy

(using encryption). S/MIME is not specific to the Internet and can be used in

any electronic mail environment.
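The following sketch uses Python's standard email package to show how MIME packs binary data into plain ASCII text. The addresses, subject and file name are hypothetical, and the "attachment" is just 256 sample bytes standing in for a picture.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "you@example.com"        # hypothetical sender
msg["To"] = "surya@example.org"        # hypothetical recipient
msg["Subject"] = "Photo"
msg.set_content("The picture is attached.")

# Sample binary data that could not be sent as 7-bit ASCII text.
binary_data = bytes(range(256))
msg.add_attachment(
    binary_data,
    maintype="application",
    subtype="octet-stream",
    filename="picture.bin",
)

# The complete message, as it would be handed to SMTP, is plain ASCII.
wire_text = msg.as_string()
print(wire_text.isascii())

# The attachment part records how the binary data was encoded.
attachment = next(msg.iter_attachments())
print(msg.get_content_type())
print(attachment["Content-Transfer-Encoding"])

# Decoding the part gives back exactly the original bytes.
print(attachment.get_content() == binary_data)
```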

UUCP

As an Internet user, you may want to exchange mail with other types of
networks, and then you should know what type of addresses they use. Some
popular networks to exchange mail with are CompuServe, MCI Mail, America
Online, the UNIX-based UUCP network, and so on.

All UNIX systems come with a built in networking system called UUCP.

Although the job of UUCP is to connect UNIX computers, it is not as

powerful as TCP/IP. For example, UUCP does not provide a remote login
facility, and its mail facility is slower and more awkward than the TCP/IP-based Internet

system. However, UUCP does have an important advantage. It is a

standard part of UNIX and it runs cheaply and reliably over dial-up or

hardwired connections.



UUCP works by allowing UNIX systems to connect together to form a chain.
Say you are using a computer named first. Your computer is connected to
another computer named second. This computer is connected to third,
which is, in turn, connected to fourth. You decide to send a message from
your computer, first, to a person with the user id pant on fourth. UUCP will
pass the message from first to second to third to fourth, where it will be
delivered to user id pant.

Therefore, in our example, four computers and three connections are
involved. The system works well, as it provides an economical way to send
mail from computer to computer over large distances. However, since many
UUCP connections are made over a telephone line at certain predefined
times, mail delivery can take hours or even several days. By contrast, all
connections in the Internet are permanent and messages are transmitted
quickly, often within seconds. Therefore, there is no comparison between
Internet and UUCP connections.

To send mail to a UUCP address, you must specify the route you want the
message to take. For the above example the mail command will be:
mail second!third!fourth!pant

After you create such a message, your system will store it until contact is
established with the computer second, and then the message will be sent
on its way. If the path is too long, or you have no idea what path to use from
your computer to send the mail, the UUCP Mapping Project can help. It
allows you to use a UUCP address that is similar to an Internet address.
Thus, on occasion, you may see an address that uses a top-level domain
of UUCP. Look at the following example: [email protected]
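A route of this kind, in which computer names are separated by exclamation marks, is often called a bang path. The short Python sketch below builds and splits such an address for the first-to-fourth example; the function names are our own and are not part of UUCP.

```python
def bang_path(hops, user):
    """Join the computers on the route and the user id with '!'."""
    return "!".join(list(hops) + [user])


def split_bang_path(address):
    """Split an address back into the list of computers and the user id."""
    *hops, user = address.split("!")
    return hops, user


# The sending computer, first, is not part of the address itself.
address = bang_path(["second", "third", "fourth"], "pant")
print(address)

hops, user = split_bang_path(address)
print(len(hops), "connections are needed to reach", user)
```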

1.8.3 How to Access the Mail System

As we explained in the previous section, SMTP is used to send and receive
mail behind the scenes. The question now arises of how the mail gets from
the transport agent to you. The computer that provides the Internet
connection also acts as the mail host. Typically, this computer runs a
transport agent program and is connected to the Internet 24 hours a day.
This means that whenever your mail arrives, the transport agent
accepts it and saves it in a file called a mailbox. Each person who has an
account on the host computer is given his or her own mailbox file. In this way
the host computer always keeps everyone's mail in an organized manner, and
at the same time it assures you that no one else can read your messages.

Ways of Accessing Email

There are many ways to access your Email. You may use a mail client, such
as Eudora, Outlook or any one of the popular packages that download your

incoming messages from the POP server to your computer and upload your

outgoing messages to the SMTP server. This may occur through a Local

Area Network (LAN) or through a dial-up connection.

You may use a Web based Email service

You may use a commercial provider, such as CompuServe or America
Online, which have their own Email programs

You may get your Email through a LAN, a common arrangement at large
organizations. If your organization has some sort of Internet connection,
Email arrives in the company's POP server. You then read your Email
either on the server using an Email application or on your own computer,
by downloading it from the server through the LAN with an
Email application. Your company may use a POP server or some kind of
proprietary protocol.

You may have a UNIX shell account and use a UNIX Email program that

reads your POP mailbox directly.

How does Email Work?

Let us review how Email works, using an example. In this example, you are

using a PC with Windows OS, which connects to the Internet using TCP/IP.

Let us suppose you want to send a mail to two of your friends: Surya in

Washington and Rishaba in Germany. Surya uses a Macintosh and also
connects to the Net using PPP. Rishaba uses a shell account on a
UNIX host computer. The following steps illustrate the example.

1. First using a Windows mail client, you compose the message on your

own computer.

2. After you compose the message, address it to both Surya and

Rishaba.

3. Once the message is finished, you tell your program to send it on its

way.



4. Now your client program contacts the mail server on your Internet host
and, using SMTP, sends your message to the server.

5. In the next step, the server passes your message to the transport

agent.

6. Now, it is the job of transport agent to look at the addresses in your

message and connect to the appropriate computers over the Net.

7. First, the transport agent connects to the transport agent on the host
computer in Washington that receives mail for Surya.

8. Once the connection is made, the two transport agents use the SMTP

to relay the message.

9. After the message is sent, your transport agent terminates the

connection and forms a new connection with the transport agent on

the appropriate computer in Germany.

10. Again, the two transport agents use SMTP to relay the message.

11. Once the message is sent, your transport agent terminates the
connection; its job is finished.

12. In Washington, Surya turns on his computer to check the mail. He tells
his Macintosh mail client to see if any new mail has arrived. The mail
client connects to the mail server on Surya's host computer and, using
the POP protocol, asks the server to check Surya's mailbox. The server
finds your message and, using POP, sends it to the client, which places
the message in his local mailbox (a file on the Mac) and tells him that
new mail has arrived. Now, with the help of the mail program, Surya
displays the message.

13. Similarly, in Germany, Rishaba has logged into his shell account on a

UNIX host. He runs his UNIX mail program which checks his mailbox

and tells him new mail has arrived. Using the appropriate command,

Rishaba tells the mail program to show him your message.

The important thing in this example is that, even though they use different

computers and different programs, the mail moves smoothly and quickly,

just because of the Internet and SMTP.
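Step 6 above, in which the transport agent looks at the addresses and decides which computers to contact, can be sketched in Python as follows. The two addresses are hypothetical stand-ins for Surya's and Rishaba's mailboxes; a real transport agent would also look up each host through the Domain Name System.

```python
from collections import defaultdict


def group_by_host(recipients):
    """Group recipient user names by the host that receives their mail."""
    by_host = defaultdict(list)
    for address in recipients:
        # Everything after the last '@' names the destination host.
        user, _, host = address.rpartition("@")
        by_host[host.lower()].append(user)
    return dict(by_host)


plan = group_by_host([
    "surya@mac.example.com",     # hypothetical address in Washington
    "rishaba@unix.example.de",   # hypothetical address in Germany
])
for host, users in plan.items():
    print("connect to", host, "and relay the message for", ", ".join(users))
```

One connection is made per destination host, which matches steps 7 to 11: first Washington, then Germany.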

Understanding the Internet Email Addresses

In this section, we will talk a little more about how to specify addresses
when you send mail. As you are now aware, whenever we use the word
"mail" it means electronic mail, and the word "address" always refers to
an Internet address. Thus, if someone on the Net asks
"What is your address?", tell him or her your electronic address.

An Email address defines the location of an individual's mailbox on the

Internet. An address consists of two parts: username and domain name,

separated by the @ symbol. Here is an example:

Username in the preceding example is Leenu. Usernames are usually pretty
straightforward; often, companies give employees usernames that use one
initial and one full name. However, usernames can also contain characters
other than letters: they can contain numbers, underscores, periods and
some other special characters. They cannot contain commas, spaces or
parentheses.
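As a small illustration of these rules, the Python sketch below splits an address at the @ symbol and rejects usernames containing the characters mentioned above. The sample addresses are hypothetical, and the check is deliberately simpler than the full Internet address syntax.

```python
# Characters the text says a username cannot contain.
FORBIDDEN = set(", ()")


def split_address(address):
    """Return (username, domain) or raise ValueError for a bad address."""
    username, at_sign, domain = address.rpartition("@")
    if not at_sign or not username or not domain:
        raise ValueError("an address needs a username, '@' and a domain name")
    bad = FORBIDDEN.intersection(username)
    if bad:
        raise ValueError(f"username contains forbidden characters: {sorted(bad)}")
    return username, domain


username, domain = split_address("leenu@example.com")
print("username:", username)
print("domain name:", domain)
```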

The host name provides the Internet location of the mailbox, usually the
name of a computer owned by a company or Internet service, which has
been discussed in Unit 2. If the recipient is within your local network, you
can often leave out part of the address. For example, say you are mailing
a friend with the user id sachin whose computer, named more, is on the
same network as yours. You can leave off the part of the address you both
have in common; that is, in this case, you use sachin@more. The mail
program easily recognizes it as a local address and delivers the message
properly. If you have a problem, you may have to use the full address. It is
also possible to leave out the computer name entirely and just use the user
id: if you and the person you are sending mail to, say rishaba, have
mailboxes on the same computer, you can use: rishaba.

When you do not know someone's Email address, but you have an idea of
his login name and the name of the Internet site he uses, you should be
able to send Email to the postmaster at that site; any Internet site should
have one. That is the address to use if you have questions about Email to
or from a specific host or site, or general questions about a site. However,
you may not get a quick response, since the person designated as
"postmaster" usually has lots of other duties. For example, if you have
trouble finding out the address of someone who uses a computer named
great.vsnl.in, you can send a message asking for the person's mail address
to: postmaster@great.vsnl.in.




13. Three versions of POP are ____________.

14. _____________ allows hierarchical storage of mail and a message

retrieval system that allows selective access to your mailbox.

1.9 Summary

The Internet links computer networks all over the world so that users

can share resources and communicate with each other.

Internet is a vast collection of globally available information which can be

accessed electronically – information which is of practical use for

business, research, study and technical purposes.

No one owns the Internet. No single person, corporation, university or
government funds it. The Internet has been described as a
cooperative anarchy.

Protocols are the rules that all networks use to understand each other.

For example, there is a protocol describing exactly what format should

be used for sending a mail message.

The internet is based on a large number of protocols and conventions.

Each such protocol is explained in the technical publication called a

request for comment or RFC.

The primary objective of any network is to exchange information

between different locations.

Many of the host computers on the Internet offer services to other

computers on the Internet.

Conversely, many of the computers on the Internet use servers to get

information.

The mail servers handle incoming and outgoing mail.

A protocol dialup account lets your computer behave like it is connected

directly to another computer on the Internet.

The family of Internet protocols is called TCP/IP.

A dedicated link (or leased line) is a permanent connection over a

telephone line between a modem pointer to another modem pointer.

Each host computer on the Internet has a unique number, called its IP

address.



On the Internet, a hostname is a domain name assigned to a host

computer.

Internet Message Access Protocol (IMAP), unlike POP, allows hierarchical

storage of mail and a message retrieval system that allows selective

access to your mailbox.

1.10 Terminal Questions


1. Briefly explain the Internet from practical and technical angle.

2. What are the requirements for internet connections?

3. Explain the Domain Name System and DNS servers.

4. Briefly explain the various classes of networks.

5. Explain the various mail protocols used.

1.11 Answers

Self Assessment questions:

1. Request for comment

2. Networks

3. True

4. Simple mail transfer protocol

5. True

6. True

7. Airtel, BSNL, Tata Indicom

8. True

9. 1.0.0.0 to 126.0.0.0.

10. NEARNET, Sprint, ANSnet, Merit and AT&T

11. Uniform Resource Locator

12. Hostname

13. POP, POP2 and POP3

14. IMAP



Terminal Questions

1. Internet is a vast collection of globally available information which can be

accessed electronically – information which is of practical use for

business, research, study and technical purposes. (Refer Section 1.2.2)

2. There are essentially three different types of connections for accessing

the services and resources of the Internet. (Refer Section 1.4.1)

3. On a TCP/IP network, computers know each other by their IP

addresses. (Refer Section 1.6.1)

4. Network addresses are divided into classes, which are assigned

depending on the size of the physical network. (Refer Section 1.6.2)

5. It produces the immediate results in terms of increased productivity from

reduced turnaround time, and reduced costs. (Refer Section 1.8.2)



Unit 2 Website Development with HTML – I

Structure:

2.1 Introduction

Objectives

2.2 HTML Fundamentals

Architecture of Web Page Contents

Browser Specific Tags

Structure Tags

Physical Tags

Logical Tags

HTML Tags

Tools for HTML Validation

2.3 Using Graphics

Tools for creating and manipulating Web Graphics

Image Tags and Attributes

Sources for web site graphics

Introduction to Client-Side Image Maps

Tools for creating image maps

GIF, JPEG, and PNG Formats

Transparent Graphics

Transparency and Interlacing of Graphics

Creating Animated Graphics

Interactive Graphics

2.4 Constructing Forms

2.5 Marketing Your Site

Characteristics of Search Engines

Registering with Search Engines and Directories

The <meta> Tags and Attributes keywords, description and robots

Creating Effective <title> tags

Designing Your Site for Effective Search Engine Optimization

2.6 Summary


2.8 Answers



2.1 Introduction

In the previous unit, we studied the concepts of the Internet, servers,
Internet applications, the client-server model, Internet connections, URLs
and the email system. On that basis, we are going to continue with some
more advanced concepts.

In this unit we shall study the various HTML tags and learn to create a web
page using these tags. We shall also learn to design a form using HTML.

Objectives

After studying this unit, you should be able to:


explain the architecture of the web page contents & various tags used in

HTML

describe how to use graphics in HTML

explain how to construct form

discuss about marketing your site

2.2 HTML Fundamentals

This section deals with the structure of a web page, the HTML language,
and URI and HTTP concepts.

2.2.1 Architecture of Web Page Contents

The basic web architecture is two-tiered and characterized by a web client

that displays information content and a web server that transfers information

to the client. This architecture depends on three key standards: HTML for

encoding document content, URLs for naming remote information objects in

a global namespace, and HTTP for staging the transfer.

HyperText Markup Language (HTML)

The common representation language for hypertext documents on the Web.

HTML had a first public release as HTML 0.0 in 1990, was Internet draft

HTML 1.0 in 1993, and HTML 2.0 in 1994. The September 22 1995 draft of

the HTML 2.0 specification has been approved as a standard by the IETF

Application Area HTML Working Group. HTML 3.0 and Netscape HTML are

competing next generations of HTML 2.0. Proposed features in HTML 3.0

include: forms, style sheets, mathematical markup, and text flow around

figures. HTML is an application of the Standard Generalized Markup

Language (SGML ISO-8879), an international standard approved in 1986,



which specifies a formal meta-language for defining document markup

systems.

An SGML Document Type Definition (DTD) specifies valid tag names and
element attributes. HTML consists of embedded content delimited by
hierarchical, case-insensitive start and end tag names; the start tag may
contain embedded element attributes. These attributes may be
required, optional, or empty. In addition, documents can be inter- or
intra-linked by establishing source and target anchor points. Many HTML
documents are the result of manual authoring or of word-processor-to-HTML
converters, but several WYSIWYG editors now support HTML styles. HTML
files are viewed using a WWW client browser (software), the primary user
interface to the Web. HTML allows for embedding of images, sounds, video
streams, form fields and simple text formatting.
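HTML tag names may be written in upper, lower or mixed case, and a parser treats them alike. The sketch below uses the HTMLParser class from Python's standard library, which reports tag and attribute names in lower case however they were typed; the sample markup is our own.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag and attribute the parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.attributes = []

    def handle_starttag(self, tag, attrs):
        # The parser passes tag and attribute names already lower-cased.
        self.start_tags.append(tag)
        self.attributes.extend(attrs)


collector = TagCollector()
collector.feed('<HTML><Body><P ALIGN="center">Hello</P></BODY></html>')
collector.close()

print(collector.start_tags)
print(collector.attributes)
```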

Universal Resource Identifier (URI)

An IETF addressing protocol for objects in the WWW ("if it's out there, we

can point at it"). There are two types of URIs, Universal Resource Names

(URN) and the Universal Resource Locators (URL). URLs are location

dependent and contain four distinct parts: the protocol type, the machine

name, the directory path and the file name. There are several kinds of

URLs: file URLs, FTP URLs, Gopher URLs, News URLs, and HTTP URLs.

URLs may be relative to a directory or offsets into a document.
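The four parts of a URL named above can be picked apart with Python's standard urllib.parse module, as in this brief sketch. The URL is a hypothetical example, and the helper name url_parts is our own.

```python
import posixpath
from urllib.parse import urlsplit


def url_parts(url):
    """Return (protocol type, machine name, directory path, file name)."""
    pieces = urlsplit(url)
    # The path holds both the directory and the file name; split off the last part.
    directory, file_name = posixpath.split(pieces.path)
    return pieces.scheme, pieces.hostname, directory, file_name


parts = url_parts("http://www.example.com/docs/unit2/index.html")
for label, value in zip(
    ["protocol type", "machine name", "directory path", "file name"], parts
):
    print(f"{label}: {value}")
```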

HyperText Transfer Protocol (HTTP)

An application-level network protocol for the WWW. Tim Berners-Lee, father

of the Web, describes it as a "generic stateless object-oriented protocol."

Stateless means neither the client nor the server stores information about the

state of the other side of an ongoing connection. Statelessness is a

scalability property but is not necessarily efficient since HTTP sets up a new

connection for each request, which is not desirable for situations requiring

sessions or transactions. In HTTP, commands (request methods) can be

associated with particular types of network objects (files, documents,

network services). Commands are provided for

Establishing a TCP/IP connection to a WWW server,

Sending a request to the server (containing a method to be applied to a

specific network object identified by the object's identifier, and the HTTP

protocol version, followed by information encoded in a header style)



Returning a response from the server to the client (consisting of three

parts: a status line, a response header, and response data), and

Closing the connection.
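The shape of a request and of a three-part response can be seen in the following Python sketch, which works on plain strings and does not open a network connection. The host name, path and sample response are hypothetical.

```python
def build_request(method, path, host):
    """Build a request: method, object identifier, protocol version, header."""
    return f"{method} {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"


def parse_response(raw):
    """Split a response into status line, response header and response data."""
    # A blank line separates the header section from the data.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


request = build_request("GET", "/index.html", "www.example.com")
print(request)

sample_response = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
status_line, headers, body = parse_response(sample_response)
print(status_line)
print(headers)
print(body)
```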

2.2.2 Browser Specific Tags

Netscape-only tags – There are six tags that are supported only by
Netscape. Keep in mind that many of these tags (such as <layer></layer>)
are not a part of the XHTML specification.

<blink></blink>

The text within the <blink></blink> tag will turn on and off (blink). This can

make text fairly difficult to read. This tag works in both Netscape and

Mozilla.

<keygen></keygen>

This tag was meant to generate a public key to encrypt HTML forms and

make them secure. It works in Netscape and Opera.

<layer></layer>

The layer tag allows you to place sections of your Web page on different

"layers" and treat them as separate objects within your page. Use of this tag

is discouraged in favor of CSS positioning. This only works in Netscape 4.x.

<multicol></multicol>

The enclosed text will be displayed in multiple columns. This tag was meant

to be used to create newspaper-like columns of text. This works in

Netscape 4.x.

<nolayer></nolayer>

The <nolayer></nolayer> tag indicates HTML that should be displayed in

browsers that don't support the <layer></layer> tag, when layers are used. It

is similar to the <noframes></noframes> tag in a frameset.

<spacer/>

The spacer tag was Netscape's take on the non-breaking space. Use this

tag to put a specific sized block of white space on your Web page. Note that

this tag works in Netscape 4 and 6.

Internet Explorer only tags – Two tags are only supported by Internet

Explorer.



<bgsound/>

This tag will set a sound file to play in the background as the Web page is

displayed.

<marquee></marquee>

With the <marquee></marquee> tag, you can create a scrolling text

marquee on your Web page. This tag is also supported by MSNTV.

2.2.3 Structure Tags

This section describes the tags that indicate the basic structure of a web

page.

HTML - The HTML tag identifies a document as an HTML document. All

HTML documents should start with the <HTML> tag and end with the

</HTML> tag.

Syntax:

<HTML>….</HTML>

HEAD – The HEAD tag defines an HTML document header. The header

contains information about the document rather than information to be

displayed in the document. The web browser displays none of the

information in the header, except for text contained by the TITLE tag. You

should put all header information between the <HEAD> and </HEAD> tags,

which should precede the BODY tag.

The HEAD tag can contain TITLE, BASE, ISINDEX, META, SCRIPT,

STYLE, and LINK tags.

Syntax:

<HEAD>… </HEAD>

TITLE – The TITLE tag defines the TITLE of the document. This is what is

displayed in the top of your browser window. In addition, many search

engines use this as their primary name of a document.

Syntax:

<TITLE> … </TITLE>

BODY – The BODY tag specifies the main content of a document. You

should put all content that is to appear in the web page between the



<BODY> and </BODY> tags. The BODY tag has attributes that let you

specify characteristics for the document. You can specify the background

color or an image to use as a tiled background for the window in which the

document is displayed. You can specify the default text color, active link

color, unvisited link color, and visited link color. You can specify actions to

occur when the document finishes loading or is unloaded, and when the

window in which the document is displayed receives or loses focus.

Syntax:

<body> … </body>
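The four structure tags fit together as shown in the sample page held in the Python string below. The page text is our own example; the small parser class only confirms the order in which the tags appear and picks out the title.

```python
from html.parser import HTMLParser

# A minimal page: HEAD (with TITLE) comes first, then BODY, all inside HTML.
PAGE = """<html>
<head>
<title>My first page</title>
</head>
<body>
Hello, Web!
</body>
</html>"""


class StructureOutline(HTMLParser):
    """Record the order of start tags and the text inside TITLE."""

    def __init__(self):
        super().__init__()
        self.order = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        self.order.append(tag)
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


outline = StructureOutline()
outline.feed(PAGE)
outline.close()
print(outline.order)
print(outline.title)
```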

2.2.4 Physical Tags

Text in HTML code can be dressed up in various ways so that it's displayed

differently by the browser. Text can be made Bold, Underlined, Italicized,

Struck-through etc. Moreover, you can make text both italicized and bold at

the same time.

Physical tags define how the text should be displayed in the browser. They

control the Physical characteristics of the text. There are 10 physical tags

each requiring a closing tag:

<I> Italics: I am in italics

Syntax: <I> .. </I>

<B> Bold: I am in bold

Syntax: <B> .. </B>

<U> Underline: I am underlined

Syntax: <U> .. </U>

<STRIKE> Strikethrough: I am struck!

Syntax: <STRIKE> .. </STRIKE>

<SUP> Superscript: My superscript

Syntax: <SUP> .. </SUP>

<SUB> Subscript: My subscript

Syntax: <SUB> .. </SUB>

<TT> Typewriter: I am in typewriter form

Syntax: <TT> .. </TT>

<BIG> Bigger font: I am bigger

Syntax: <BIG> .. </BIG>

<SMALL> Smaller font: I am smaller



Syntax: <SMALL> .. </SMALL>

<S> Strikethrough alternative: I am also struck!

Syntax: <S> .. </S>

Tag Nesting

Physical tags can be nested i.e. one tag can be placed (including its closing

tag) inside another. Let's test this:

<B>Some text</B> displays Some text which is in bold

Give more emphasis by underlining this text:

<U><B>Some text</B></U> displays Some text which is bold and

underlined
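When tags are nested, the tag opened last must be closed first. The Python sketch below checks this rule with a simple stack; it is meant only for paired tags such as the physical tags above, and would wrongly reject tags like <BR> that have no closing tag.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Track open tags on a stack and note any tag closed out of order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.well_nested = True

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # A closing tag must match the most recently opened tag.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.well_nested = False


def is_well_nested(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    # Every tag must also have been closed by the end.
    return checker.well_nested and not checker.open_tags


print(is_well_nested("<U><B>Some text</B></U>"))
print(is_well_nested("<U><B>Some text</U></B>"))
```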

2.2.5 Logical Tags

Logical tags allow the browser to render that information in the manner most

appropriate for that browser. Following are the logical tags used:

<h1> through <h6>

Create headings. They should flow sequentially (try not to skip levels). The

title of the page should always appear as a level 1 heading, with

subheadings cascading down from it. Text is usually displayed in a large,

bold font. Remember that they’re all block-level elements.

<em>

Creates emphasis, and is usually displayed as italicized text. Equivalent to

<i>.

<strong>

Creates strong emphasis, and is usually displayed as bold text. Equivalent

to <b>.

<code>

Is suitable for giving examples of computer code, and is usually rendered in

a mono-spaced font. Equivalent to <tt>.

<blockquote>

Is a block-level tag that’s used to enclose multi-line quotations from other

sources. It is usually displayed as indented from both sides.



<cite>

Is used to enclose the title of a work that is currently being referred to. It’s

usually displayed as italicized text.

<q>

Is a short quotation from another source. Modern browsers will display

contained text with quotation marks added on both sides.

<pre>

Is a block-level element that displays text in a fixed-width font exactly how it

was typed in the source code (i.e. honouring all tabs, spaces and line

breaks). pre is not strictly a logical element, but its use is often necessary.

<del>

Is a HTML 4 tag used to show document revisions; text deleted from a page

in this case. It is usually displayed as text with a strike-through.

<ins>

Is del’s partner in crime, used to show text inserted during a revision. It is

usually displayed with an underline.

<address>

Should be wrapped around contact information, including email addresses.

<kbd>

Is suitable for marking up text that is meant to be entered by the reader on

the keyboard. It is usually displayed in a fixed-width font.

<var>

Marks up a variable’s name. Useful if you’re writing about technical subjects

like computer programming.

2.2.6 HTML tags

The syntax is <HTML> .. </HTML>

Attribute definitions:

Version = cdata



The value of this attribute specifies which HTML DTD version governs the

current document. This attribute has been deprecated because it is

redundant with version information provided by the document type

declaration.

Lang = Language Information – Gives the information of the language

used

Dir = Text Direction – Gives the direction of the text.

2.2.7 Tools for HTML validation

Total Validator is a free one-stop all-in-one validator comprising a HTML

validator, an accessibility validator, a spelling validator, a broken links

validator, and the ability to take screenshots with different browsers to see

what your web pages really look like. Currently Total Validator provides the

following main features:

A parser that validates the basic construction of your pages

True HTML validation against the W3C Markup Specifications or

ISO/IEC definition using the published DTDs (2.0, 3.2, 4.0, 4.01,

ISO/IEC, XHTML 1.0 and 1.1)

An accessibility validator that validates against the W3C WAI

Accessibility Guidelines and US Section 508 Standard

A broken links validator that checks each page for broken links

A spelling validator that spell checks the content of your pages (English,

French, Italian, Spanish, German)

Snapshots (screenshots) of your pages in different browsers, on

different platforms, at different resolutions

A desktop tool so you can validate pages before you publish, and pages

behind firewalls

A Firefox extension for fast, one click validation


1. URI stands for _______________.

2. <blink> </blink> is an example of ____________ tag.

3. <h1> through <h6> belong to __________ tag.



2.3 Using Graphics

This section deals with web graphics, image tags, source of web graphics,

and client side Image maps.

2.3.1 Tools for creating and manipulating Web Graphics

Graphics convey complex ideas, lend emotional components, and add style

to a Web page. Following are the various tools for creating and manipulating

web graphics.

Buttonmania: Buttonmania is a freeware utility that allows you to create

impressive Web page buttons.

DeKnop: This freeware utility allows you to create customized Web

page buttons quickly and easily.

GIF Optimizer: GIF Optimizer is a freeware utility that compresses GIF

images allowing your Web pages to load faster.

JPEG Cleaner: This free tool allows you to compress .jpeg or .jpg images

so that your Web page images will load more quickly.

Gimp: This program is a powerful and free graphics editor available for

UNIX, Mac OS X, and Windows.

PaintShop Pro: PaintShop Pro offers the easiest, most affordable way

to achieve professional results.

PhotoPlus: PhotoPlus has the features you'll need for importing,

creating pictures and animations, and manipulating colors and effects.

Ulead WebRazor: This is a set of indispensable graphic utilities

including GIF Animator, Web Plugins for Photoshop, SmartSaver, Photo

Explorer, Photo Viewer, and Screen capture.

Adobe Photoshop: Photoshop is the ultimate graphics program used

by nearly all graphic designers and a recommended tool for anyone

serious about Web design.

Corel Draw: Corel Draw is a complete suite of powerful graphics

applications and supporting utilities.

Macromedia Fireworks: Fireworks lets users import files from all major

graphics formats and manipulate both vector and bitmap images to

quickly create graphics and interactivity.

2.3.2 Image Tags and attributes

The image is stored in a file, which is specified by an HTML request. The image in

the file is inserted into the display of the document by the browser.



The image tag, <img>, which is an inline tag, specifies an image that is to

appear in a document. In its simplest form, the image tag includes two

attributes: src, which specifies the file containing the image; and alt, which

specifies text to be displayed when it is not possible to display the image. If

the file is in the same directory as the HTML file of the document, the value

of src is just the image’s filename. In many cases, image files are stored in

a subdirectory of the directory where the HTML files are stored.

For example, the image files might be stored in a subdirectory named

images. If the image file’s name is stars.jpg and it is stored in the images

subdirectory, the value of src would be as follows:

"images/stars.jpg"

Example:

<img src="c210.jpg" alt="Picture of a Cessna 210" />

Two optional attributes of img tag, width and height, can be included to

specify (in pixels) the size of the rectangle for the image.
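For illustration, an image tag that combines the subdirectory path with the optional size attributes might be written as follows. The alt text and the pixel dimensions are assumed values, not taken from a real file.

```html
<!-- The browser reserves a 320 x 240 pixel rectangle for the image -->
<img src="images/stars.jpg" alt="Night sky full of stars"
     width="320" height="240" />
```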

2.3.3 Sources for web site graphics

Most graphic designers and art directors will know about Getty Images. And

there's no doubt that the resources of the largest image library in the World

are vast. But sometimes you want something just a little bit special.

Something that the big fish just won't have. And that's when it is necessary

to look for a more specialized image resource. That's where the

Photographic Libraries directory comes in. A vast listing of image and

photographic resources ranging from photography, to film, to fine art,

fashion, maps and many others. Photographic Libraries also have an

extensive search facility.

MyFonts.com provides the largest collection of fonts ever assembled for on-

line delivery. Well that's what the website says anyway. That's obviously

debatable, but it's true to say that they do have a lot of fonts to view - over

30,000 - with links to dozens of font foundries. A wholly owned subsidiary of

Bitstream Inc., the site certainly has an impressive selection of typefaces.

One interesting feature is the WhatTheFont tool. This lets you upload

scanned images of fonts to the Web site and then will try and display the

closest match to your font sample. Hours of fun uploading pictures of



inanimate objects to see what they come up with. Well it kept us amused,

but then we probably need to get out more.

The following links and graphic design resources are in no particular order.

They are on our list to be reviewed and listed (or discarded if it proves to be

out of date, or inappropriate). As ever, we neither endorse nor recommend

any of these, other than to promote them as sites that were either useful to

us, or to others, for graphic design research purposes.

PBS Digital Television News Archive. Digital TV information.

http://www.pbs.org/digitaltv/dtvtech. And also PBS Digital Television News

Archive. History of TV.

http://www.pbs.org/opb/crashcourse/tv_grows_up/mechanicaltv.html

Cloninger, Curt. Usability Experts are from Mars, Graphic Designers are

from Venus. http://www.alistapart.com/stories/marsvenus/

Gentner, Don and Nielsen, Jakob. The Anti-Mac Interface. www.useit.com

Humanoid Animation Group. http://sunee.uwaterloo.ca/~h-anim

2.3.4 Introduction to Client-side Image Maps

Image maps aren't as bad as they seem, at least if you use a client-side

image map written in HTML rather than a CGI program. First, you need to put

the image on the page. To do this, you use the image tag, but with a new

attribute: usemap.

<img src="eximap1.gif" width="200" height="40" border="0" alt="image map"

usemap="#mymap" />

The usemap="#mymap" attribute tells the browser to use a map on the

page, which is named "mymap". Notice how it uses the "#" symbol in front of

the map name. Also notice that we defined the width and height of the

image. This needs to be done so we can use coordinates later on when we

define the map. Speaking of that, let’s see how to define the map. For this

map, we would place the following code somewhere on the page.

<map name="mymap" id="mymap">

<area shape="rect" coords="0, 0, 99, 40" href="table1.htm" alt="Tables" />

<area shape="rect" coords="100, 0, 200, 40" href="frame1.htm"

alt="Frames" />



<area shape="default" href="http://www.pageresource.com" alt="Home" />

</map>

Now you can see where the usemap="#mymap" from the <img> tag comes

from. The name of the map is "mymap". Now, let's look at what all of this

means:

<map name="mymap" id="mymap">

This defines your image map section, and gives the map a name. This map

is named "mymap". In XHTML, the id attribute is required rather than name.

If you are using XHTML transitional, both the name and id can be used.

<area shape="rect" coords="0,0,99,40" href="table1.htm" alt="Tables" />

The area tag defines an area of the image that will be used as a link. The

shape attribute tells the browser what shape the area will be. To keep it

simple, I only used "rect", which stands for rectangle. The coords attribute is

where we define the edges of each area. Since it is a rectangle, we will use

two sets of coordinates. The first set defines where to start the rectangle,

that is, where the top-left corner of the rectangle will be. Since this rectangle starts at

the top-left corner of the image, the coordinates are (0 pixels, 0 pixels). The

second two numbers define where to end the rectangle. This will be the

lower-right corner of the rectangle. Remember that the total image size was

200x40. We want the lower-right corner of this rectangle to be halfway across

the image and at the bottom of the image. Going across, half of 200 is 100,

but we use 99 here so that the two rectangles do not overlap; the second

rectangle starts at 100. Of course, 40 pixels take us to the bottom of the

image. So the lower-right corner of this rectangle will be 99 pixels across the

image, and 40 pixels (all the way) down the image. And now the easy part:

The href attribute is used to tell the browser where to go when someone

clicks someplace on that rectangle. Put the URL of the page you want to go

to in there, and the first rectangle is set up! The alt attribute allows you to

define alternate text for that area.

<area shape="rect" coords="100, 0, 200, 40" href="frame1.htm"

alt="Frames" />

Basically the same as the previous area tag, but it is for our second

rectangle. We start where the other one left off, but back at the top of the

image. Since the right edge of the last rectangle was at 99 pixels across,

we start this one at 100 pixels across. And since this will be the upper-left

of the second rectangle, we start it at 0 pixels down the image (the top!). We

end this rectangle where the image ends, so the lower-right coordinate here

is simply (200, 40), the size of the image!

<area shape="default" href="http://www.pageresource.com"

alt="Home" />

The default is not really a new shape; it just covers anything that may have

been left out. We didn't leave out anything in this map, but if we had, this

would be the URL someone would go to if they clicked on any area we did

not define earlier.

</map>

This ends the map section!

Now, you can use other shapes besides rectangles, but those are a lot

tougher to code by hand.
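To give an idea of what the other shapes look like, here is a small sketch with one circular and one polygonal area. The image file, the map name "shapemap" and the linked pages are hypothetical. A circle takes the centre point and a radius; a polygon takes one x, y pair for each corner.

```html
<img src="shapes.gif" width="200" height="100" alt="Shapes"
     usemap="#shapemap" />

<map name="shapemap" id="shapemap">
  <!-- circle: centre x, centre y, radius -->
  <area shape="circle" coords="50, 50, 40" href="circle.htm" alt="Circle" />
  <!-- poly: x, y pairs for each corner (here a triangle) -->
  <area shape="poly" coords="150, 10, 190, 90, 110, 90"
        href="triangle.htm" alt="Triangle" />
</map>
```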

2.3.5 Tools for creating image maps

Following are some of the tools used to create Image Maps

Client-Side Image Map Editor (CSIME): is a standalone Java application

for maintaining the HTML tags that form a client-side image map. The

CSIME allows the creation of RECT, CIRCLE, and POLY regions overlaying

a GIF or JPEG image. Cool features include 1) use your Netscape

bookmarks file, 2) import a client-side image map straight from a web

server, 3) export to server-side image map format, 4) fading an image to

allow easier editing, and more. Written in Java, the CSIME is platform

independent (runs on UNIX, Mac, Win95/NT, OS/2, etc). The CSIME is

freeware.

Glorglox: is a replacement for NCSA's image map. It allows you to make

image maps with irregular and/or discontiguous areas, and is much more

flexible for some applications.

Imaptool: is for creating client-side image maps. It's for the X Window

System and tested with Linux 1.2.13.

Mapedit: is a WYSIWYG editor for image maps, available for Microsoft

Windows and the X Window System. Use Mapedit to generate, or convert

to, NCSA, CERN, or client-side map files.



Map This: freeware 32 bit application for creating, editing, and converting

map files. Supports NCSA, CERN, and Client Side Image Maps. Handles

both GIF and JPG images. Runs under Win 3.1/3.11 with Win32s installed;

Win95 and WinNT.

Web Hotspots 2.0: is an image map editor for Windows supporting both

server and client-side image maps, multiple image file formats including GIF

and JPEG and more microscopic (zoomed-in) editing, advanced shape

manipulation, subtractive regions (cutouts), starter host page generation,

insertion of host (i.e., IMG) entries into existing pages, and live testing for

Windows Sockets 1.1 compliant configurations.

2.3.6 GIF, JPEG and PNG Formats

GIF (Graphics Interchange Format):

This uses the file extension .gif. This format was invented by Bob Berry and

team at CompuServe. It was created in 1987 and updated in 1989. It

supports up to 256 colors. One color may optionally be 100% transparent. It uses

Lossless – LZW (Abraham Lempel, Jacob Ziv, and Terry Welch)

compression technique. It uses a palette, and instead of putting 24-bit

values in its map for the image, it puts palette values. So it starts off with 3:1

compression. The LZW compression on top of that can raise it to 5:1 or

even 10:1.

GIF is good for Line Drawings, Clip Art, CAD drawings, Text, Animations

and Images with transparent areas. GIF is bad for Photographs and images

with more than 256 colors.

JPEG (Joint Photographic Experts Group):

This uses the file extensions .jpe, .jpg and .jpeg. This format was invented by Eric

Hamilton, Joint Photographic Experts Group, Tom Lane, Independent JPEG

Group. This format was created in 1990. It uses the ISO/IEC 10918 standard. It

supports 16,777,216 colors (24 bits per pixel) and has no transparency. It uses Lossy – JPEG

compression (lossy discrete cosine transform followed by Huffman coding).

JPEG is good for Photographs, images with more than 256 colors, making

smaller files. JPEG is bad for Text, images with sharp edges especially

vertical edges, Line drawings, CAD drawings, Transparency and most scans

from books or newspapers.



PNG (Portable Network Graphics):

This uses the file extension .png. This format was invented by Tom Boutell,

Tom Lane, Greg Roelofs and others. Version 1.0 of the format was created in 1996

and version 1.1 in 1998. It is specified by a World Wide Web Consortium

recommendation (1996) and RFC 2083 (1997). It supports 2-256 colors (palette

mode) or 16,777,216 colors. A single color can be 100% transparent (like GIF), or the

image can have variable transparency (256 levels of transparency per pixel). It uses

Lossless – "deflation" compression technique. For each image line, a filter

method is chosen which predicts the colour of each pixel based on the

colours of previous pixels and subtracts the predicted colour of the pixel

from the actual color. An image line filtered in this way is often more

compressible than the raw image line would be. On most images, PNG can

achieve greater compression than GIF, but some implementations make

poor choices of filter methods and therefore produce unnecessarily large

PNG files.

PNG is good wherever you would use GIF, and for images with variable

transparency. PNG is bad for full-color images, which will probably be bigger than

equivalent JPEGs, and it can produce a ring around the image.

2.3.7 Transparent Graphics

Transparency is possible in a number of graphics file formats. The term

transparency is used in various ways by different people, but at its simplest

there is "full transparency" i.e. something that is completely invisible. Of

course, only part of a graphic should be fully transparent, or there would be

nothing to see. More complex is "partial transparency" or "translucency"

where the effect is achieved that a graphic is partially transparent in the

same way as colored glass. Since ultimately a printed page or computer or

television screen can only be one color at a point, partial transparency is

always simulated at some level by mixing colors. There are many different

ways to mix colors, so in some cases transparency is ambiguous.

In addition, transparency is often an "extra" for a graphics format, and some

graphics programs will ignore the transparency.

Transparent Pixels: One color entry in a single GIF or PNG image's palette

can be defined as "transparent" rather than an actual color. This means that

when the decoder encounters a pixel with this value, it is rendered in the



background color of the part of the screen where the image is placed, even if

this varies pixel-by-pixel as in the case of a background image.

Applications include:

An image that is not rectangular can be filled to the required rectangle

using transparent surroundings; the image can even have holes (e.g. be

ring-shaped)

In a run of text, a special symbol for which an image is used because it

is not available in the character set, can be given a transparent

background, resulting in a matching background.

The transparent color should be chosen carefully, to avoid items that just

happen to be the same color vanishing.

Even this limited form of transparency has patchy implementation, though

most popular web browsers are capable of displaying transparent GIF

images. This support often does not extend to printing, especially to printing

devices which do not include support for transparency in the device or

driver. Outside the world of web browsers, support is fairly hit-or-miss for

transparent GIF files.

2.3.8 Transparency and Interlacing of Graphics

Interlacing is a method of encoding a bitmap image such that a person who

has partially received it sees a degraded copy of the entire image. When

communicating over a slow communications link, this is often preferable to

seeing a perfectly clear copy of one part of the image, as it helps the viewer

decide more quickly whether to abort or continue the transmission.

Interlacing is supported by the following formats:

GIF stores the lines in the order 0, 8, 16, ..., 4, 12, ..., 2, 6, 10, 14, ..., 1,

3, 5, 7, 9, ...

PNG uses the Adam7 algorithm

JPEG and JPEG 2000

PGF

Interlacing is also known as "progressive" encoding, because the image

becomes progressively clearer as it is received.

2.3.9 Creating Animated Graphics

In this section you will study how an animated graphic is created in GIF

format. The world of animated gifs is a fascinating one indeed. Anyone can



create animated gifs, irrespective of the graphics skills one has. Usually the

initial animations will look ugly and weird but with practice and after viewing

hundreds of animations on the web the animations will begin to look better,

until one day you will say... "ah I'm proud of this animation".... This pride will

be greater when your animations will be used in different web pages.

Let us start off with a check list of all the things you need to create animated

gifs.

Imaging software such as Paint Shop Pro or ColorWorks

GIF-assembling software such as GIF Animator, Animation Shop or Giffy

Creativity and

A lot of patience

A number of software programmes are available as freeware or shareware

on the Internet. We recommend Paint Shop Pro (JASC. Inc.) as the imaging

software, and Gif Animator (Ulead).

Creating animated gifs is really simple. Let us start off with an example,

shown in figure 2.1.

Figure 2.1: Image of a Ball

The aim of this first exercise is to make the ball move from left to right and

then back.

In the imaging software, open a new work area 450 pixels wide and with a height

equal to that of the original image (in this case it is 49 pixels). Copy the ball and

paste it into the working area at the far left.

Create a new working area of the same height and width. Paste the ball at a

position more at the right than the previous frame. Repeat this procedure

until the ball is completely at the right. At this point you should have a

number of frames shown in figure 2.2.



Figure 2.2: Series of Ball

Now import the individual images in the gif assembler program in sequential

order, and then in the reverse order. Save the animation.

2.3.10 Interactive Graphics

Ivan Sutherland (MIT 1963) established the basic interactive paradigm that

characterizes interactive computer graphics:

User sees an object on the display

User points to (picks) the object with an input device (light pen, mouse,

trackball)

Object changes (moves, rotates, morphs)

Repeat

Input devices contain a trigger which can be used to send a signal to the

operating system; Button on mouse or Pressing or releasing a key. When

triggered, input devices return information (their measure) to the system;

Mouse returns position information and Keyboard returns ASCII code. Most

systems have more than one input device, each of which can be triggered at

an arbitrary time by a user. Each trigger generates an event whose measure

is put in an event queue which can be examined by the user program.

Figure 2.3: Process of interactive graphics



Programming interface for event-driven input defines a callback function for

each type of event the graphics system recognizes. This user-supplied

function is executed when the event occurs.
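In a Web page the same pattern can be sketched with a small script. This is only an illustration of the idea, assuming a browser that passes the event object to the handler; the element ids "box" and "status" and the function name onBoxClick are made up for the example.

```html
<p id="status">Click the box.</p>
<div id="box" style="width: 100px; height: 100px; background: navy;"></div>

<script type="text/javascript">
  // Callback: the browser runs this function each time a click
  // event on the box is taken from its event queue.
  function onBoxClick(event) {
    // The measure of a mouse event is the pointer position.
    document.getElementById("status").innerHTML =
        "Clicked at " + event.clientX + ", " + event.clientY;
  }

  // Register the callback for the click event of the box.
  document.getElementById("box").onclick = onBoxClick;
</script>
```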


4. JPEG uses ______ number of colours.

5. PNG stands for __________.

6. Interlacing is a method of encoding a bitmap image such that a person

who has partially received it sees a degraded copy of the entire

image.(true/false)

2.4 Constructing Forms

The most common way for a user to communicate information from a Web

browser to the server is through a form. HTML provides tags to generate the

commonly used objects on a screen form. These objects are called controls

or widgets. Together, the values of all of the controls in a form are called the

form data.

<FORM> tags and attributes

All of the components of a form appear in the content of a <FORM> tag. The

action attribute specifies the URL of the application on the Web server that

is to be called when the user clicks the Submit button. The method attribute

of <FORM> specifies one of the two techniques, get or post, used to pass

the form data to the server. Get is the default, so if no method attribute is

given in the <FORM> tag, get will be used. The alternative technique is

post.

Example:

<FORM action="pgm1.php" method="post">

….

</FORM>
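The difference between the two techniques is easiest to see with get, where the form data is appended to the URL named in action. The sketch below assumes a hypothetical server program search.php and a text control named q.

```html
<form action="search.php" method="get">
  <input type="text" name="q" />
  <input type="submit" value="Search" />
</form>
<!-- If the user types "html forms" and submits, the browser requests:
     search.php?q=html+forms
     With method="post" the same data travels in the body of the
     request and does not appear in the URL. -->
```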

<INPUT> tags and Attributes

Many of the commonly used controls are specified with the inline tag

<input>, which is used for text, passwords, checkboxes, radio buttons and

the special buttons Submit and Reset. The one attribute of <input> that is

required for all of the controls is Type, which specifies the particular kind of

control. The control’s kind is its type name, such as checkbox.



Text Types

A text control, referred to as Text Box, creates a horizontal box into which

the user can type a line of text. Default size of the text box is often 20

characters. The attributes used are Type, Name, size and maxlength.

For the Text Box, the type value is “text”. Name indicates the name given to

the control. Size indicates the size of the text box in terms of characters. If

the user types more characters than will fit in the text box, the box is

scrolled. If you do not want the box to be scrolled, you can include the

maxlength attribute to specify the maximum number of characters that the

browser will accept in the box. Any additional characters are ignored.

Example:

<form action="">

<input type="text" name="fname" size="25" maxlength="50" />

</form>

If the contents of a text box should not be displayed when it is entered by

the user, a password control can be used.

Example:


<form action="">

<input type="password" name="MyPasswd" size="10" maxlength="10" />

</form>

Regardless of what characters are typed into a password control, only

bullets or asterisks are displayed by the browser.

In some situations, a multiline text area is needed. The <textarea> tag is

used to create such controls. The text typed into the area created by

<textarea> is not limited in length, and there is implicit scrolling both

vertically and horizontally. The default size of the visible part of the text area is

often quite small, so the rows and cols attributes should usually be included

and set to reasonable sizes.

Example:

<textarea name="address" rows="3" cols="40"></textarea>

Radio Buttons and Checkboxes

Checkbox and radio controls are used to collect multiple-choice input from

the user. A checkbox control is a single button that is either on or off

(checked or not). If a checkbox button is on, the value associated with the



name of the button is the string assigned to its value attribute. A checkbox

button doesn’t contribute to the form data if it is off. Every checkbox button

requires a name attribute and a value attribute in its <input> tag. The

attribute checked, which is assigned the value checked, specifies that the

checkbox button is initially on. The text that follows the <input> tag is displayed

next to the checkbox button, providing a label.

Example:


<form action="">

<input type="checkbox" name="groceries" value="milk" checked="checked" />Milk

<input type="checkbox" name="groceries" value="bread" />Bread

<input type="checkbox" name="groceries" value="eggs" />Eggs

</form>

Radio buttons are closely related to checkbox buttons. The difference

between a group of radio buttons and a group of checkboxes is that only

one radio button can be on or pressed at any time. Every time a radio button

is pressed, the button in the group that was previously on is turned off. The

type value for radio buttons is radio. All radio buttons in a group must have

the name attribute set in the <input> tag, and all radio buttons in a group

have the same name. The attribute checked, which is assigned the value

checked, specifies that the radio button is initially on. If no radio button in a

group is specified as being checked, the browser usually checks the first

button in the group.

Example:


<form action="">

<input type="radio" name="age" value="under20" checked="checked" />0-19

<input type="radio" name="age" value="20-35" />20-35

<input type="radio" name="age" value="36-50" />36-50

<input type="radio" name="age" value="over50" />over 50

</form>

Scrolling and Selection Lists

If the number of possible choices is large, a group of checkboxes or radio buttons makes the form too

long to display. In these cases, a menu should be used. A menu is specified

with a <select> tag. There are two kinds of menus: those in which only one



menu item can be selected at a time and those in which multiple menu items

can be selected at any given time. The default is the single-selection kind, which

behaves like a group of radio buttons. The other kind can be specified by adding the multiple

attribute. The size attribute specifies the number of menu items that are to

be displayed for the user. If either multiple is specified or the size attribute

is set to a number larger than 1, the menu is usually displayed as a scrolled

list.

Each of the items in a menu is specified with an <option> tag, nested in the

select element. The content of an <option> tag is the value of the menu

item, which is just text. The <option> tag can include the selected attribute,

which specifies that the item is pre selected. The value assigned to

selected is “selected”.

Example:


With size = 1 (the default):

<form action="">

<select name="groceries">

<option> Milk </option>

<option> Bread </option>

<option> Eggs </option>

<option> Cheese </option>

</select>

</form>
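For comparison, a menu of the second kind could be sketched as follows. It reuses the groceries items from the example above; the choice of three visible rows and of Bread as the preselected item is arbitrary.

```html
<form action="">
  <!-- multiple allows several items to be chosen at once;
       size="3" shows three items, the rest are reached by scrolling -->
  <select name="groceries" multiple="multiple" size="3">
    <option>Milk</option>
    <option selected="selected">Bread</option>
    <option>Eggs</option>
    <option>Cheese</option>
  </select>
</form>
```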

Submit and Reset Buttons

The Reset button clears all of the controls in the form to their initial states.

The Submit button has two actions: First, the form data is encoded and sent

to the server. Second, the server is requested to execute the server-resident

program specified in the action attribute of the <form> tag. Every form

requires a Submit button. The Submit and Reset buttons are created with

the <input> tag, as illustrated in the following example:

<form action="pgm1.php" method="post">

<input type="submit" value="Submit Form" />

<input type="reset" value="Reset Form" />

</form>



Scripts for Form Processing

Before the form data is submitted to the server, it can be checked at the

client side. For example, consider a form which has inputs like the user

name, age, information regarding his marks etc. If any one of the fields is

missing and there is no client-side check, the data has to go to the server, and the server has to process it and

find that some data is missing. Then the server will send an error message to the

client indicating that some fields are missing. The drawback of this is that the

server is given the whole responsibility of validating the form content.

Instead, the form validation can be done at the client side itself. This

reduces the burden on the server and reduces delay. Scripting

languages can be used to process the form.

Sources for Sample Scripts

The scripting languages to process the form can be JavaScript or VB Script.

These scripts can be embedded within the HTML content and processed by

the browser. Client-side JavaScript cannot replace all of server-side

computing. In particular, while server-side software supports file operations,

database access and networking, client side JavaScript supports none of

these. Many JavaScripts, however, are an integral part of the HTML

document, so no secondary downloading is necessary.
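A minimal sketch of such a client-side check, written in JavaScript and embedded in the HTML document, is given below. The control names uname and age and the function name checkForm are assumptions made for the example; pgm1.php is the server program used in the earlier examples.

```html
<form action="pgm1.php" method="post" onsubmit="return checkForm(this);">
  Name: <input type="text" name="uname" />
  Age: <input type="text" name="age" />
  <input type="submit" value="Submit Form" />
</form>

<script type="text/javascript">
  // Called by the browser when the Submit button is clicked.
  // Returning false cancels the submission, so nothing is sent
  // to the server until both fields have been filled in.
  function checkForm(form) {
    if (form.uname.value === "" || form.age.value === "") {
      alert("Please fill in both name and age.");
      return false;
    }
    return true;
  }
</script>
```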


7. A text control, referred to as ____________.

8. ___________ are used to collect multiple-choice input from the user.

9. ___________ clears all of the controls in the form to their initial states.

2.5 Marketing Your Site

This section deals with search engine mechanism, directories, Meta tags

and their attributes.

2.5.1 Characteristics of Search Engines

Most search engines work by sending out a spider to fetch submitted

documents. Another program, called an indexer, then reads these

documents and creates an index based on the words contained in each

document. Each search engine uses a proprietary algorithm to create its

indices such that, ideally, only meaningful results are returned for each

query.



META Tags. These are special HTML tags that provide information

about a Web page. Unlike normal HTML tags, Meta tags do not affect

how the page is displayed. Instead, they provide information such as

what the page is about, which keywords represent the page's content

and who created the page. Many search engines use Meta tags when

they build and update their indices.

Spider Support. As mentioned above spiders are programs used by

some search engines to fetch submitted documents.

Popularity. Some search engines count the number of links to a page

to measure its popularity. Pages with the largest number of inbound links

receive a higher rating.

Lag Time. The time required to index a page and have it appear in

subsequent search results. All search engine submission times are

approximations.

2.5.2 Registering with Search Engines and Directories

A search engine is a piece of software that enables users to search through

an index or database of websites that has been created either by people or

automatically by software that crawls through the World Wide Web looking

for new websites and indexing them. A search engine is actually the tool

that a website such as Yahoo or Google employs to enable people to search

its index for websites, images, words or phrases.

Registering your website with search engines such as Yahoo is relatively

easy. It is often free and is the first thing you should do once a new website

has been launched or an existing one has been re-developed. Registering

with search engines is one of the most effective ways of making it easy for

people to find your website.

What to do?

Option 1: You can register your website yourself with search engines.

Here is what to do.

Compose a descriptive sentence (usually up to 25 words) that

summarizes your site's content. This sentence should be simple, in plain

English, and state the main contents of the website. For example, if you

owned a cardboard box factory that sold standard sized boxes and also

made them to clients' specifications, you might compose a sentence like

this: "XYZ Box Company makes quality cardboard boxes of every



standard size and we can produce boxes to your specifications and your

budget."

Identify the most popular search engines that allow you to register your

site with them.

Log on to their sites, locate the online registration form or area and

complete the instructions - and you will probably be asked to use the

sentence you composed in step 1 above.

Many search engine directories, like Yahoo, are organized into

categories, and allow you to register your site in multiple categories. It takes

time to register your website with the most popular search engines and may

be a day's work, but usually it is free.

The search engine owners will check your application and choice of

categories and index the site. This usually takes 2 to 6 weeks.

Option 2: You can pay an organization to register your website with search

engines. Many companies offer this service. Locate them using a search

engine and select one that offers the best value for money and will register

your site with search engines that are popular with your target audience.

2.5.3 The <Meta> tag and its attributes, keywords, description and robots

Metadata is information about data. The <Meta> tag provides metadata

about the HTML document. Metadata will not be displayed on the page, but

will be machine parsable. Meta elements are typically used to specify page

description, keywords, author of the document, last modified and other

metadata. The <Meta> tag always goes inside the head element. The

metadata can be used by browsers (how to display content or reload page),

search engines (keywords), or other web services.

Required Attribute:

Attribute    Value    Description
content      text     Specifies the content of the meta information

Table 2.1: Optional Attributes

Attribute    Value                                         Description
http-equiv   content-type, content-style-type, expires,    Provides an HTTP header for the information
             refresh, set-cookie                           in the content attribute
name         author, description, keywords, generator,     Provides a name for the information in the
             revised, others                               content attribute
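As an illustration of these attributes, here is a minimal sketch of a page head. The author, description and keyword values are invented for the example (the box company echoes the registration example earlier in this section). The robots value shown is the conventional way of asking crawlers to index a page and follow its links; how a given search engine honors it is an assumption.

```html
<!DOCTYPE html>
<html>
<head>
  <!-- http-equiv: the content value is treated like an HTTP header -->
  <meta http-equiv="content-type" content="text/html; charset=utf-8">
  <!-- name/content pairs: descriptive metadata, not displayed on the page -->
  <meta name="author" content="XYZ Box Company">
  <meta name="description" content="XYZ Box Company makes quality cardboard boxes of every standard size and to your specifications.">
  <meta name="keywords" content="cardboard boxes, custom boxes, packaging">
  <!-- robots: asks crawlers to index this page and follow its links -->
  <meta name="robots" content="index, follow">
  <title>XYZ Box Company</title>
</head>
<body>
  <p>Page content goes here.</p>
</body>
</html>
```

None of the meta values appear in the browser window; only the title and the body content are visible to the reader.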

2.5.4 Creating Effective <title> tags

The title tag holds the words that describe your web page. It is an important factor in attracting visitors, because visitors get their first information about your website from the title. You can create effective title tags by judging the needs of the visitor. The following tips can increase the value of your title.

a. Utilize Keywords

Keywords are the most important expressions of your website. Try to accommodate a few keywords in your title tag so that the title is informative. This can also improve your page ranking, as more visitors will reach you through those keywords.

b. Preference in Keywords

Organize the keywords in your title according to their importance. Place the most important keyword first and then follow it with the other keywords. For example, if your keywords are Ethnic Women Wear, Online Women Wear, Women Shoes, Women Clothing India, then you can write your title tag as

<TITLE>Online Women Wear, Women Clothing India</TITLE>

c. Target Traffic

It is widely said that for websites 'content is king'. If you want better search engine results, you need to put more thought into your website content. Your content should contain all the pertinent keywords and still hold your reader's attention. As your visitors read your content, search engines also go through your website to find the right keywords for listing it.



d. Limit Characters

Meta tags are created to give search engines important information about your website. They help a great deal in optimization because they help search engines determine what the site is about. Meta tags should include important keywords, but many people spam the search engines with Meta tags that are stuffed with keywords.

e. Be different

Always use a different tagline for different pages; do not use a single tagline on every page. Each tagline should be unique and should reflect the content of its page, so craft every tagline with care so that it is effective.

2.5.5 Designing your site for Effective Search Engine Optimization (SEO)

Search engine optimization is crucial for anyone who wants people to visit his or her Web site. You can place as many ads as you like, but most people are still going to find your site because of its listings in search engines or directories. Most people who use search engines look only at the first one or two pages of search listings. The goal of effective search engine optimization is to get your pages listed on those critical first pages for particular key terms. The following rules apply when designing your site for effective search engine optimization (SEO).

a) Phrasing matters. Many more people search for the term "effective

search engine optimization" than for "effectively optimizing for search

engines". To find out which key words or phrases are more popular than

others, you can use a tool such as Overture’s Search Term

Suggestion Tool; enter your chosen phrases and you'll see how many

people searched for that term recently.

b) Give each page an appropriate title that includes the key word or phrase

at least once. I so often see sites that use the name of their business as

the title of all their pages. Is every page of their site about their

business? Probably. But chances are really low that people will be

searching for their business' name!

c) Put the key words or phrase that you've chosen in the page's title tag,

Meta keywords, and Meta description. Make sure that the Meta

description is as appealing as possible, because some search engines



actually use this description in the search engine results pages that

people will be reading.

d) Be sure your chosen key words or phrase is repeated judiciously

throughout the content of the page. You don't want to overdo it, or your

page may be rejected as spam, but you need to repeat it enough times

that the search engine's software will consider the phrase relevant.
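Rules b) to d) can be sketched on a single page. This is only an illustration: the key phrase "custom cardboard boxes", the business name and all of the wording are assumptions chosen for the example, and no particular ranking outcome is implied.

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Rule b: a page-specific title that leads with the key phrase, not just the business name -->
  <title>Custom Cardboard Boxes Made to Order | XYZ Box Company</title>
  <!-- Rule c: the same phrase in the Meta description and Meta keywords -->
  <meta name="description" content="Order custom cardboard boxes built to your size, strength and budget.">
  <meta name="keywords" content="custom cardboard boxes, made to order boxes">
</head>
<body>
  <!-- Rule d: the phrase recurs in the visible content, but only where it reads naturally -->
  <h1>Custom Cardboard Boxes</h1>
  <p>We build custom cardboard boxes to your specifications, from single prototypes to full production runs.</p>
</body>
</html>
```

A second page of the same site would carry its own title, description and key phrase instead of repeating these.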

Self Assessment questions

10. _______ are special HTML tags that provide information about a Web

page.

11. ________ is a piece of software that enables users to search through an index or database of websites that has been created either by people or automatically by software.

2.6 Summary

The basic web architecture is two-tiered and characterized by a web

client that displays information content and a web server that transfers

information to the client.

An SGML Document Type Definition (DTD) specifies valid tag names

and element attributes.

The text within the <blink></blink> tag will turn on and off (blink).

The HTML tag identifies a document as an HTML document.

Physical tags can be nested i.e. one tag can be placed (including its

closing tag) inside another.

Graphics convey complex ideas, lend emotional components, and add

style to a Web page.

The image tag, <img>, which is an inline tag, specifies an image that is

to appear in a document.

Input devices contain a trigger which can be used to send a signal to the operating system.

These scripts can be embedded within the HTML content and processed

by the browser.

A search engine is a piece of software that enables users to search

through an index or database of websites that has been created either

by people or automatically by software that crawls through the World

Wide Web looking for new websites and indexing them.




2.7 Terminal Questions

1. Explain the architecture of the web page contents

2. Briefly explain the various tools used for validating HTML document

3. Explain the use of client-side image maps

4. Briefly explain the transparent graphics

5. Explain the characteristics of Search Engines.

2.8 Answers

Self Assessment Questions:

1. Universal Resource Identifier

2. Browser specific tag

3. Logical tags

4. 16,777,215

5. Portable Network Graphics

6. True

7. Text Box

8. Checkbox and radio controls

9. Reset button

10. META Tags

11. Search engine

Terminal Questions

1. The basic web architecture is two-tiered and characterized by a web

client that displays information content and a web server that transfers

information to the client. (Refer Section 2.2)

2. Total Validator is a free one-stop all-in-one validator comprising a HTML

validator, an accessibility validator, a spelling validator, a broken links

validator, and the ability to take screenshots with different browsers to

see what your web pages really look like. (Refer Section 2.2.7)

3. A client-side image map is implemented using HTML rather than a CGI program. (Refer Section 2.3.4)

4. Transparency is possible in a number of graphics file formats. (Refer

Section 2.3.7)

5. Most search engines work by sending out a spider to fetch submitted

documents. (Refer Section 2.5.1)



Unit 3 Website development with HTML – II

Structure:

3.1 Introduction

Objectives

3.2 Frames

The <frame> Tags and Attributes

The <frameset> Tags and Attributes

Frame Construction

Frame Navigation

3.3 Creating and Managing Styles

Cascading Style Sheets (CSS)

<style> Tags and Attributes

Defining Styles

Creating CSS Rules

Using Style Sheets To Support Multiple Browsers

Creating Custom Styles (classes)

Using <div> and <span> Tags

3.4 Tables

Purpose of Tables

Table Tags

Table Attributes

Using Tables for Page Layout and Structure

Creating Nested Tables

3.5 Website Layout and Design

Layout and Design Heuristics

Content Organization

Page Size and Load Time Optimization

Navigation Styles

Providing Navigational Feedback

Tables vs. CSS

Use of Color and Graphics

3.6 Managing Source Files

Recommended Folder Structure

Testing and Production Folders

Development Steps



File Naming

Version Control

3.7 Foundations of Dynamic HTML

DHTML Capabilities

Netscape vs. Microsoft Support for DHTML

<link> Tags and External Styles

Creating Custom Styles (classes)

<layer> Tags

Positioning Layers

HTML Vs DHTML

3.8 Summary


3.10 Multiple Choice Questions

3.11 Answers

3.1 Introduction

In this unit you will study frames in HTML and the use of Cascading Style Sheets. The design of tables in HTML and general website layout and design are also explained, along with the foundations of DHTML.

Objectives:


design frames in HTML

create and manage style sheets

design tables in HTML

discuss website layout and design

give overview of DHTML

3.2 Frames

The browser display window can be used to display more than one

document at a time. The window can be divided into rectangular areas, each

of which is a Frame. Each frame is capable of displaying its own document.



The <frame> tag and attributes

The content of a frame is specified with the <frame> tag, which can appear

only in the content of a frameset element. The content of a frame is

specified as the value of the src attribute in the <frame> tag.

Example:

<frame src = "apples.html" >

If the <frame> tag has no src attribute, the browser displays an empty

frame. If the content of a frame doesn’t fit into the given frame, scroll bars

are implicitly included. If you want a frame to have scroll bars, regardless of

the size of its content, the <frame> attribute scrolling can be set to yes. If a

<frame> tag includes a name attribute, the content of its associated frame

can be changed by the selection of a link in some other frame that specifies

that name.

The <frameset> tag and attributes

The number of frames and their layout in the browser window are specified

with the <frameset> tag. A frameset element takes the place of the body

element in a document. A document has either a body or a frameset but

cannot have both.

The <frameset> tag must have either a rows or a cols attribute, and it

often has both. The rows attribute specifies the number of rows of frames

that will occupy the window. There are 3 kinds of values for rows: numbers,

percentages, and asterisks. Normally, two or more values, separated by

commas, are given in a quoted string. When a number is used as a value, it

specifies the height of one row in pixels. A percentage is given as a number

followed immediately by percent sign. When used, a percent value specifies

the percentage of the total browser window height that a row should occupy.

When an asterisk is used as the value of rows, it means the remainder of

the window height.

Examples:

<frameset rows = "200, 300, 400">

<frameset rows = "22%, 33%, 45%">

<frameset rows = "22%, 33%, *">

The cols attribute is very much like the rows attribute, except that it specifies the number of columns of frames. For example, the following tag specifies that the window is to have six frames in three equal-height rows and two columns.

<frameset rows = "33%, 33%, 33%" cols = "25%, *">

Frame Construction

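Consider the following example.

<html>
<!-- Outer frameset: two vertical frames (columns) -->
<frameset cols = "50%,*">
  <!-- First column: split into two horizontal frames (rows) -->
  <frameset rows = "50%, 50%">
    <frame src = "EX1.HTML" />
    <!-- EX2.HTML to EX4.HTML are placeholder file names -->
    <frame src = "EX2.HTML" />
  </frameset>
  <!-- Second column: split into two vertical frames (columns) -->
  <frameset cols = "50%, 50%">
    <frame src = "EX3.HTML" />
    <frame src = "EX4.HTML" />
  </frameset>
</frameset>
</html>

This example creates four frames in total. First, two vertical frames are created. Within the first vertical frame, two horizontal frames are created; within the second vertical frame, two vertical frames are created.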

Frame Navigation

The navigation frame contains a list of links with the second frame as the

target. This example demonstrates how to make a navigation frame. The file

called "tryhtml_contents.htm" contains three links. The source code of the

links:

<a href ="frame_a.htm" target ="showframe">Frame a</a><br>

<a href ="frame_b.htm" target ="showframe">Frame b</a><br>

<a href ="frame_c.htm" target ="showframe">Frame c</a>

The second frame will show the linked document.
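For the links to work, the frameset document must give the second frame the name used in the target attribute. A minimal sketch of such a document follows; the 120-pixel column width and the choice of frame_a.htm as the initial content are assumptions made for the example.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
  "http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head>
  <title>Navigation frame example</title>
</head>
<!-- Narrow navigation column on the left, content frame on the right -->
<frameset cols="120,*">
  <!-- Holds the three links listed above -->
  <frame src="tryhtml_contents.htm">
  <!-- The name matches target="showframe" in the links -->
  <frame src="frame_a.htm" name="showframe">
</frameset>
</html>
```

Selecting any of the three links replaces only the content of the frame named showframe; the navigation frame stays in place.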

Self Assessment Questions

1. The content of a frame is specified as the value of the src attribute in

the_________.



3.3 Creating and managing styles

This section deals with CSS concepts, the style tag, CSS rules and support for multiple browsers.

3.3.1 Cascading Style Sheets

Some of the tags of HTML, for example, <i> specify presentation details, or

style. However, these presentation specifications can be more precisely and

more consistently described with style sheets. Furthermore, many of the

tags and attributes used for describing presentation have been deprecated

in favor of style sheets.

Most HTML tags have associated properties, which store presentation

information for browsers. Browsers use default values for these properties if

the document doesn’t specify values. For example, the <h2> tag has the

font-size property, for which a browser could have the default value of 30

points. A style sheet could specify that the font-size property for <h2> be set

to 26 points, which would override the default value. The new value could

apply to one occurrence of an <h2> element or all such occurrences in the

document, depending on how the property value is set.

Perhaps the most important benefit of style sheets is their capability of

imposing consistency on the style of Web documents. For example, they

allow the author to specify that all occurrences of a particular tag use the

same presentation style. HTML style sheets are called Cascading Style

Sheets because they can be defined at three different levels to specify the

style of a document. Lower level style sheets can override higher level style

sheets, so the style of the content of a tag is determined through a cascade

of style-sheet applications.

The three levels of style sheets, in order from lowest level to highest level,

are inline, document level, and external. Inline style sheets apply to the

content of a single tag, document level style sheets apply to the whole body

of a document, and external style sheets can apply to the bodies of any

number of documents. Inline style sheets have precedence over document

style sheets, which have precedence over external style sheets.
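The three levels can be seen together in one small document. This is a sketch only: site.css is a hypothetical external file, and the point sizes are arbitrary values chosen to make the cascade visible.

```html
<!DOCTYPE html>
<html>
<head>
  <title>Three levels of style sheets</title>
  <!-- External level: may be shared by any number of documents -->
  <link rel="stylesheet" type="text/css" href="site.css">
  <!-- Document level: applies to this whole document and overrides site.css -->
  <style type="text/css">
    h2 { font-size: 26pt; }
  </style>
</head>
<body>
  <h2>Uses the document-level size (26pt)</h2>
  <!-- Inline level: applies to this one element and overrides both levels above -->
  <h2 style="font-size: 20pt;">Uses the inline size (20pt)</h2>
</body>
</html>
```

Here the style element follows the link element in the head, so its h2 rule is also the later of two equally specific rules.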

3.3.2 <Style> tag and attributes

The format of a style specification depends on the level of style sheet. The

general form of the content of a style element is as follows:



And to define more than one value for a single property, just add them on,

separated by commas:

H3 { font-family: Arial, Helvetica, sans-serif;

font-style: italic;

color: green }

The font-family property in the code above offers the browser several values

to choose from; the browser will go down the line until it finds a typeface it

recognizes. The first item listed (Arial) is the preferred typeface, and the

second item (Helvetica) is an alternate typeface in case the user's system

doesn't have Arial. The third item (sans-serif) is a generic style of font rather

than a specific one--this is recommended as a last alternative because most

systems have at least one typeface in that generic family. If the browser

doesn't find any matches, it will use its default font.

3.3.4 Creating CSS Rules

A Cascading Style Sheets rule is made up of a selector and a declaration.

H2 {color: blue;}

selector {declaration;}

The declaration is the part of the rule inside the curly braces. It specifies

what a style effect will be. For example, "color:blue".

The selector specifies which element(s) will be affected by the declaration. Think of the selector as a link of sorts between the HTML mark-up document and the style of the Web page. A selector that refers to an HTML element is called a type selector. (Other kinds of selectors will be discussed later). Any HTML element name can be used as a type selector. HTML "tags" without content ("empty containers") such as <BR> or <HR> can also be named, but because they have no content to style, many properties have little or no visible effect on them.

A declaration has two parts separated by a colon: property and value.

selector {property:value}

More than one declaration may be placed inside the curly braces and a

semi-colon must separate each declaration from the next. The ending

declaration does not require a semi-colon but I like to use it.

selector {property:value; property:value;}

H2 {color:blue; font-family:Arial, sans-serif;}



Instead of coding,

H1 {font-family:Arial, Helvetica, sans-serif;}
H2 {font-family:Arial, Helvetica, sans-serif;}
H3 {font-family:Arial, Helvetica, sans-serif;}

you may group selectors together. When grouping selectors you will need to separate each selector with a comma. When grouped together, one rule applies to several selectors.

H1, H2, H3 {font-family:Arial, Helvetica, sans-serif;}
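To see these rules in context, the following sketch places a grouped rule and a multi-declaration rule in a document-level style sheet. The heading text and the italic declaration are additions made purely for illustration.

```html
<!DOCTYPE html>
<html>
<head>
  <title>CSS rule syntax</title>
  <style type="text/css">
    /* Grouped type selectors: one rule applies to all three headings */
    h1, h2, h3 { font-family: Arial, Helvetica, sans-serif; }
    /* One selector, two declarations separated by a semi-colon */
    h2 { color: blue; font-style: italic; }
  </style>
</head>
<body>
  <h1>Arial (or a fallback), default color</h1>
  <h2>Arial (or a fallback), blue and italic</h2>
  <h3>Arial (or a fallback), default color</h3>
</body>
</html>
```

The h2 element is matched by both rules, so it receives the font family from the grouped rule and the color and style from its own rule.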

3.3.5 Using Style Sheets to support multiple browsers

It is not always possible to design a single style sheet that works properly on all the different browsers. One option is to use different style sheet documents for the different browsers you want to support. In this way you can specify CSS formatting customized to the strengths (and weaknesses) of each different browser, without compromising for the average of them.

There are essentially two ways to do this. The first is to use content

negotiation to send the browser a browser-specific style sheet. With HTTP,

a request for any resource (including a style sheet) will look something like

(omitting several other pieces of information):

GET /path/stylesheet.css HTTP/1.0

....

User-Agent: Mozilla/4.61 [en] (Win98; I)

The User-Agent string identifies the browser (here Navigator 4.61). Most Web servers can be configured to return different style sheet documents depending on this value. Unfortunately, this breaks caching on some proxy servers, so it doesn't always work. Also you, as an author, may not have control over server configuration.

The second way is to use JavaScript to test, on the browser, for the browser

version and model number, and to then "write" link elements referencing

appropriate style sheets directly into the document. Both Navigator and

Internet Explorer will then process the script-generated link elements, and

will load the referenced style sheet. Of course, this will only work if

JavaScript is enabled, but in many cases this may be an entirely acceptable

requirement.
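A minimal sketch of the second approach is shown below. The style sheet file names are hypothetical, and the test for "MSIE" in the User-Agent string is only one simple (and fragile) way to recognize older versions of Internet Explorer; a real site would need a test suited to the browsers it actually supports.

```html
<!DOCTYPE html>
<html>
<head>
  <title>Script-selected style sheet</title>
  <script type="text/javascript">
    // Hypothetical file names; each would hold rules tuned for one browser family.
    var sheet = "default.css";
    // Older Internet Explorer versions include "MSIE" in their User-Agent string.
    if (navigator.userAgent.indexOf("MSIE") !== -1) {
      sheet = "ie.css";
    }
    // Write the link element into the head while the document is still loading.
    document.write('<link rel="stylesheet" type="text/css" href="' + sheet + '">');
  </script>
  <!-- Fallback for browsers with JavaScript disabled -->
  <noscript><link rel="stylesheet" type="text/css" href="default.css"></noscript>
</head>
<body>
  <p>Styled by whichever sheet the script selected.</p>
</body>
</html>
```

The noscript fallback addresses the limitation noted above: without it, a browser with JavaScript disabled would load no style sheet at all.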



3.3.6 Creating Custom Styles (Classes)

A simple selector can have different classes, thus allowing the same

element to have different styles. For example, an author may wish to display

code in a different color depending on its language:

code.html { color: #191970 }

code.css { color: #4b0082 }

The above example has created two classes, css and html for use with

HTML's CODE element. The class attribute is used in HTML to indicate the

class of an element, e.g.,

<P CLASS=warning>Only one class is allowed per selector.

For example, code.html.proprietary is invalid.</p>

Classes may also be declared without an associated element:

.note { font-size: small }

In this case, the note class may be used with any element.

A good practice is to name classes according to their function rather than

their appearance. The note class in the above example could have been

named small, but this name would become meaningless if the author

decided to change the style of the class so that it no longer had a small font

size.
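The classes defined above might be applied as follows. The sample text inside each element is invented for the example.

```html
<!DOCTYPE html>
<html>
<head>
  <title>Class selectors</title>
  <style type="text/css">
    /* Classes tied to the CODE element, as defined above */
    code.html { color: #191970; }
    code.css  { color: #4b0082; }
    /* A class with no associated element: usable anywhere */
    .note { font-size: small; }
  </style>
</head>
<body>
  <p>Markup: <code class="html">&lt;h1&gt;Title&lt;/h1&gt;</code></p>
  <p>Style: <code class="css">h1 { color: red }</code></p>
  <p class="note">The note class works on a paragraph...</p>
  <ul><li class="note">...and equally on a list item.</li></ul>
</body>
</html>
```

Because code.html and code.css name the element as well as the class, they affect only CODE elements, whereas .note applies to any element that carries the class.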

3.3.7 Using <div> and <span> tags

The <span> and <div> tags are very useful when dealing with Cascading

Style Sheets. People tend to use the two tags in a similar fashion, but they

serve different purposes.

<div>:

The <div> tag defines logical divisions in your Web page. It acts a lot like a

paragraph tag, but it divides the page up into larger sections. <div> also

gives you the chance to define the style of whole sections of HTML. You

could define a section of your page as a call out and give that section a

different style from the surrounding text.

The <div> tag gives you the ability to name certain sections of your

documents so that you can affect them with style sheets or Dynamic HTML.

One thing to keep in mind when using the <div> tag is that it breaks

paragraphs. It acts as a paragraph end/beginning, and while you can have

paragraphs within a <div> you can't have a <div> inside a paragraph.



The primary attributes of the <div> tag are:

style

class

id

Even if you don't use style sheets or DHTML, you should get into the habit

of using the <div> tag. This will give you more flexibility when more XML

parsers become available. Also, you can use the id and name attributes to

name your sections so that your Web pages are well formed (always use

the name attribute with the id attribute and give them the same contents).

Because the <center> tag has been deprecated in HTML 4.0, it is a good idea to start using <div style="text-align: center;"> to center the content inside your div.

<span>:

The <span> tag has very similar properties to the <div> tag, in that it

changes the style of the text it encloses. But without any style attributes, the

<span> tag won't change the enclosed items at all.

The primary difference between the <span> and <div> tags is that <span> doesn't do any formatting of its own. The <div> tag includes a paragraph break, because it is defining a logical division in the document. The <span> tag simply tells the browser to apply the style rules to whatever is within the <span>.

The <span> tag has no required attributes, but the three that are the most

useful are:

style

class

id

Use <span> when you want to change the style of elements without placing them in a new block-level element in the document. For example, if you had a Level 3 Heading (<h3>) and wanted its second word to be red, you could surround that word with <span style="color: #f00;">2ndWord</span> and it would still be a part of the <h3> tag, just red.
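The contrast between the two tags is easiest to see side by side. In this sketch the class name callout, the id offer and the border styling are assumptions chosen for illustration.

```html
<!DOCTYPE html>
<html>
<head>
  <title>div and span</title>
  <style type="text/css">
    /* Hypothetical class name for a call-out section */
    div.callout { border: 1px solid #999; padding: 8px; text-align: center; }
  </style>
</head>
<body>
  <p>Text before the division.</p>
  <!-- div: block-level, starts on a new line and can contain paragraphs -->
  <div class="callout" id="offer">
    <h3>Second <span style="color: #f00;">word</span> in red</h3>
    <p>A paragraph inside the division.</p>
  </div>
  <!-- span: inline, changes nothing unless a style is attached -->
  <p>Text after the division, with <span>an unstyled span</span> that looks no different.</p>
</body>
</html>
```

The div produces its own bordered block, while each span stays within the line of text that contains it.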




Self Assessment Questions

2. The format of a style specification depends on the level of ________.

3. CSS stands for ___________.

3.4 Tables

This section deals with usage of tables, table tags, table attributes, table for

page layout and creating nested tables.

3.4.1 Purpose of Tables

The TABLE element defines a table for multi-dimensional data arranged in

rows and columns. TABLE is commonly used as a layout device, but

authors should avoid this practice as much as possible. Tables can cause

problems for users of narrow windows, large fonts, or non-visual browsers,

and these problems are often accentuated when tables are used solely for

layout purposes. As well, current visual browsers will not display anything

until the complete table has been downloaded, which can have very

noticeable effects when an entire document is laid out within a TABLE.

3.4.2 Table Tags

The <table> tag defines an HTML table. A simple HTML table consists of

the table element and one or more tr, th, and td elements. The tr element

defines a table row, the th element defines a table header, and the td

element defines a table cell.

Example:

<table border="1">

<tr>

<th>Month</th>

<th>Savings</th>

</tr>

<tr>

<td>January</td>

<td>$100</td>

</tr>

</table>

The <th> tag defines a header cell in an HTML table.



An HTML table has two kinds of cells:

Header cells contain header information (created with the th element)

Standard cells contain data (created with the td element)

The text in a th element is bold and centered.

The text in a td element is regular and left-aligned.

The <tr> tag defines a row in an HTML table.

A tr element contains one or more th or td elements.

The <td> tag defines a standard cell in an HTML table.

The <caption> tag defines a table caption.

The <caption> tag must be inserted immediately after the <table> tag. You

can specify only one caption per table. Usually the caption will be centered

above the table.

Example:

<table border="1">

<caption>Monthly savings</caption>

<tr>

<th>Month</th>

<th>Savings</th>

</tr>

<tr>

<td>January</td>

<td>$100</td>

</tr>

</table>

3.4.3 Table Attributes

Table 3.1: Table Attributes


Attribute    Value                               Description
align        left / center / right               Specifies the alignment of a table according to surrounding text
border       pixels                              Specifies the width of the borders around a table
bgcolor      rgb(x,x,x) / #xxxxxx / colorname    Specifies the background color for a table



The nowrap attribute

Browsers treat each table cell as though it's a browser window unto itself,

flowing contents inside the cell as they would common body contents

(although subject to special table-cell alignment properties). Accordingly, the

browsers automatically wrap text lines to fill the allotted table cell space. The

nowrap attribute, when included in a table row, stops that normal word

wrapping in all cells in that row. With nowrap, the browser assembles the

contents of the cell onto a single line, unless you insert a <br> or <p> tag,

which then forces a break so that the contents continue on a new line inside

the table cell.

3.4.4 Using Tables for Page Layout and Structure

Tables are the main method used to lay out and structure Web pages. Layout using tables is considered by many purists to be table misuse, but it is far simpler than using Cascading Style Sheets for element positioning. Although we are going to use tables for the page layout, we are going to combine them with CSS to create a very flexible and easily updated page layout.

3.4.5 Creating Nested Tables

This technique means that tables can be placed within tables. Tables could then be placed within those tables, which would create a third level of nesting, but at this point we won't go into any more complexity than is necessary.

Let's say you have a page and you would like a navigational part on the left with content on the right. You don't want to use frames, and layers are too fiddly. A good way to create this kind of effect is by using nested tables: one table contains two other tables. One of the inner tables is quite narrow and sits on the left (for the navigation); the other sits on the right, with the majority of the page space available to it. The example below indicates this.

<table width="500" cellspacing="2" border="1">
  <tr>
    <td><div align="left"><b>The containing table</b></div>
      <table width="120" cellspacing="2" cellpadding="2" align="left" border="1">
        <tr>
          <td>A nested table</td>
        </tr>
      </table>
      <table width="380" cellspacing="2" cellpadding="2" align="right" border="1">
        <tr>
          <td>Another nested table</td>
        </tr>
      </table>
    </td>
  </tr>
</table>

This will produce something as shown in figure 3.1.

Figure 3.1: Nested table

3.4.6 Self Assessment Questions (For Section 3.3)

4. The <table> tag defines an HTML table.

5. _________defines a table for multi-dimensional data arranged in rows

and columns.

3.5 Web Site Layout and Design

This section deals with layout design, design heuristics, content

organization, page size and load time.

3.5.1 Layout and Design Heuristics

A few basic design principles that every developer should have a fairly good

understanding of can be picked up or gleaned from a quick run through of

any print or Web design text.



Choice heuristics that are found in today's best on-screen presentations, whether in Web, television or cinematic media, are the rule of thirds and the divine proportion.

The rule of thirds is widely used in photography. It more or less states that dividing an image into nine equal parts can help one aesthetically lead the viewer's eye to the most important sections of the piece. Imagine overlaying a three by three grid on a photograph. The intersections of these gridlines can help you align the main features of your image.

The divine proportion is a similar guideline that comes in extremely handy for Web media. Also known as the golden ratio, the divine proportion is in effect if the ratio between the sum of two line segments and the larger segment is equivalent to the ratio between the larger segment and the smaller segment. When expressed algebraically, the divine proportion is equivalent to 1:1.61803… or 1:phi.

Shapes that are constructed with the divine proportion can be used to frame

your Web site so that the smaller segment in the ratio makes up your

sidebar or header, while the larger segment forms your content division or

main section division.
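As a rough sketch of how the divine proportion might frame a page, the total width can be divided by 1.618: 100 / 1.618 is approximately 61.8, which leaves 38.2 for the smaller segment. The element ids and the use of floats below are assumptions made for the example, not a prescribed layout method.

```html
<!DOCTYPE html>
<html>
<head>
  <title>Divine proportion layout</title>
  <style type="text/css">
    /* 61.8 : 38.2 is approximately 1.618 : 1 */
    #content { float: left;  width: 61.8%; }
    #sidebar { float: right; width: 38.2%; }
  </style>
</head>
<body>
  <div id="content"><p>Main content (larger segment).</p></div>
  <div id="sidebar"><p>Sidebar (smaller segment).</p></div>
</body>
</html>
```

Because the widths are percentages, the two divisions keep the same proportion when the browser window is resized.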

3.5.2 Content Organization

The objectives of content assessment and organization are to gather a list of

the necessary content and to organize that content relative to your

audience's needs. This process works "hand in glove" with the process of

defining your Audience. Both these processes require that you have

defined the Purpose of your website.

Create a list of all the information sources, services, processes, and other

content you offer (or plan to offer) that can be made available through the

Web. Eliminate items that don't directly advance the purpose of your site or

may not fulfill audience objectives.

Note: at this time it may be a good opportunity to enlist a focus group of

your audience to help define and describe your offerings.

1. Assess your service offerings by mapping them to the audience based

on their needs.

2. Next, categorize the items in your content inventory according to both

user needs and the purpose of your site.



3. For example, if you have content that concerns the graduation process

and part of your purpose is to offer that content to your users, then

graduation may be a likely category. Continue to group all of your

content into their respective categories.

4. After all the content is categorized, organize the content within each

category by its relative importance to users. Finally, name each category

with a concise and descriptive title. These will become your main

"category" links for your Web site.

5. By completing this process you have collected content that satisfies the

needs of your target audience, categorized your content into groups that

form the foundation for your site structure, and prioritized the relative

importance of the content in each category.

3.5.3 Page size and Load Time Optimization

A really good graphic does indeed convey a great deal of information in a

remarkably economical way. A really bad graphic (or worse, several really

bad graphics) merely adds overhead to your page. Each and every graphic

on your site must contribute enough to the page it’s on to make it worth the

time it takes for that graphic to load. Any graphic that can't "pull its own

weight" is ultimately parasitic, and really ought to be summarily discarded

from your Website. (Be forewarned: As you peruse the following list, you will

probably find your favorite graphical doo-dad targeted for elimination. That's

precisely why your readers are unhappy.) Some of the most likely

candidates for removal include:

Separator bar graphics: At 4 bytes, the <HR> tag transmits in a few

hundredths of a second, and it's a perfectly adequate tool for breaking

up a page visually. In contrast, separator bar graphics eat hundreds of

bytes -- and many eat over 1,000! It's hard to believe that putting a fancy

curly-q at the end of an otherwise horizontal line truly makes it 150 times

better as a text separator.

Oversized icons: Many Websites employ icons that are much larger

and more elaborate than they need to be. Icons are fundamentally

different from other graphics. For an icon to be effective it does not need

to be realistic, but merely recognizable. Anything more elaborate is

ultimately wasteful.



The ubiquitous imagemap: Text-based navigational aids typically

require (at most) a few dozen bytes, whereas imagemaps can easily eat

bandwidth by the tens of kilobytes. And, as catchy as your imagemap may be, it

probably doesn't render your site thousands of times more navigable

than a simple, readable, text-based nav-bar.

Unoptimized banner graphics: A well-placed, well-optimized logo

graphic is a great way to unify a site visually. But unoptimized banner

graphics can kill a page's load time. It's a rare logo that can't be

aggressively compressed.

Pictures of words: For sheer bandwidth-guzzling, there's probably nothing

more wasteful than GIF images of words. If you're rendering words,

nothing transmits faster than simple text.

Once you've eliminated the leech graphics from your site, the surviving

graphics must be optimized for load time. Optimization takes only a few

moments to do, and it can have a stunning payback in terms of load time

and reader satisfaction. It's really fairly easy to cut load time in half, without

sacrificing visual appeal.

Recommended page size:

The 0-10K range qualifies as exemplary

Pages between 10-20K rate as well-optimized

The 20-40K range is merely adequate

Pages in the 40-60K range are dubious

Anything over 60K is unacceptable
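The ranges above can be turned into a small check to run against a page before publishing. The sketch below is in JavaScript and makes two assumptions that the list does not state: 1K is taken as 1024 bytes, and a size that falls exactly on a boundary is placed in the better range. The byte counts in the example are made up.

```javascript
// Rate a page by its total transfer size, using the ranges listed above.
// Assumptions: 1K = 1024 bytes; a size on a boundary falls in the better range.
function ratePageSize(bytes) {
  if (!Number.isFinite(bytes) || bytes < 0) {
    throw new RangeError("bytes must be a non-negative number");
  }
  const kb = bytes / 1024;
  if (kb <= 10) return "exemplary";
  if (kb <= 20) return "well-optimized";
  if (kb <= 40) return "adequate";
  if (kb <= 60) return "dubious";
  return "unacceptable";
}

// Total page weight = the HTML file plus every graphic it loads.
function pageWeight(htmlBytes, graphicBytes) {
  return graphicBytes.reduce((sum, n) => sum + n, htmlBytes);
}

// Hypothetical page: 6000 bytes of HTML, one logo and two separator graphics.
const weight = pageWeight(6000, [12000, 900, 900]);
console.log(weight, ratePageSize(weight));
```

Replacing the two separator graphics with <HR> tags would remove about 1,800 bytes from this hypothetical page at no visual cost.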

One of the keys to building a reader-friendly Website is to provide readers

with navigational shortcuts. Provide a link to at least one "master"

navigational page on each and every page on your Website. That master

page can be an alphabetical index, a topical table of contents, or a

comprehensive site map.

Ideally, every page on your site should be accessible from the master

page(s). That way, any page on your site is accessible from every other

page with a minimal number of mouse clicks.

Another alternative is to provide a high-level navigator bar on every page.

This approach, when used well, can add significantly to the visual



consistency of a site. It's a fair amount more work, however, than the

"master" page approach, particularly for larger sites.

3.5.4 Navigation Styles

The first step in developing your navigation scheme is to think about how

your information is best presented. According to Information Architecture for

the World Wide Web, the de facto authority on navigation, there are three

basic types of navigation:

Hierarchical

Hierarchical applies to sites that are information-rich and are best organized

as a large tree, much like a library.

Global

Global applies to sites where you can easily and logically jump among all

points; this is best if you are presenting information in fewer, broader

categories.

Local

Local navigation sits somewhere in between. This applies when you have

depth of information within broader areas.

The most basic form of navigation is the embedded link: any place

where you link text within the body of the page.

Styles of navigation

Embedded links: the most basic form of navigation.

Bread-crumb trail: if you're organizing large amounts of information.

Left/top/pop-up nav bar: Most common, generally usable.

Tab navigation: When breaking into a few primary categories.

Site map: One-stop shopping for everything on your site.

Mix and match navigation schemes for optimal usability.
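As an illustration of the bread-crumb style, the following JavaScript sketch derives a trail from a page's URL path. The path, the "Home" label and the function names are assumptions made for the example; a real site would more likely take its labels from page titles than from folder names.

```javascript
// Build a bread-crumb trail from a URL path such as
// "/academics/graduation/apply.html". Labels are derived from folder and
// file names, which is a simplification.
function breadcrumbTrail(path) {
  const segments = path.split("/").filter((s) => s.length > 0);
  const trail = [{ label: "Home", href: "/" }];
  let href = "";
  segments.forEach((segment, i) => {
    const isLast = i === segments.length - 1;
    href += "/" + segment;
    const label = segment
      .replace(/\.html?$/i, "") // drop .htm / .html
      .replace(/-/g, " ")       // dashes become spaces
      .replace(/^\w/, (c) => c.toUpperCase());
    // Folders get a trailing slash; the final page keeps its file name.
    trail.push({ label, href: isLast ? href : href + "/" });
  });
  return trail;
}

// Render the trail as HTML: every crumb is a link except the current page.
function renderTrail(trail) {
  return trail
    .map((crumb, i) =>
      i === trail.length - 1
        ? crumb.label
        : `<a href="${crumb.href}">${crumb.label}</a>`
    )
    .join(" &gt; ");
}

console.log(renderTrail(breadcrumbTrail("/academics/graduation/apply.html")));
```

Because each crumb links one level up the tree, this style pairs naturally with hierarchical navigation.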

3.5.5 Providing Navigational Feedback

Presentation of navigation elements should incorporate visual and non-

visual cues that indicate the range of possible choices and the appropriate

action required to make a choice. The navigation system should also

provide the user with feedback so that they know if their actions have been

successful.



The basic coding language for Web pages, HTML, incorporates cues and

feedback mechanisms for the user. For example, when the mouse moves

over an image or text containing a link the default setting is for the cursor

appearance to change from an arrow to a hand with a pointing finger. This

suggests to the user that the item is somehow different from the surrounding

material and provides a clue to the possible action that is required. The use

of different default colors for visited and unvisited links provides feedback

about which pages of a site have already been visited.

3.5.6 Tables vs. CSS

There are 13 reasons why Cascading Style Sheets (CSS) are superior to

table-based layouts when designing a website. Some web designers swear

that table-based layouts are better than CSS-based layouts, while others

believe that table-based layouts are ancient history and XHTML combined

with CSS is the only real solution to coding a web site’s visual layout.

a) Faster page loading

b) Lowered hosting costs

c) Redesigns are more efficient

d) Redesigns are less expensive

e) Visual consistency maintained throughout website(s)

f) Better for SEO

g) Accessibility

h) Competitive edge (job security)

i) Quick website-wide updates

j) Easier for teams to maintain (and individuals)

k) Increased usability

l) More complex layouts and designs

m) No spacer gifs

3.5.7 Use of color and graphics

Color is very important in web design, and can be used to add spice to your

website, relay the mood of a page, as well as to emphasize sections of a

site. If you think about it, as soon as you look at a website, you can normally

guess within seconds what that site is all about. Just like we all are quick to

judge other people by their appearance, and surroundings by the way they

smell, look, and feel, we also judge a website by its color scheme and style

of design. We can usually tell almost immediately, whether a website is

corporate, personal, whether it is for kids, teens, or just for adults, etc. Most



of this information is perceived solely by taking in color and design

elements.

What Elements Of Website Design Will Catch A Site Visitor’s Eyes?

Eyes naturally begin scanning left to right

When viewing a website, a visitor’s eyes most often fixate first on the

upper left portion of the screen. Viewers often fixate on this point for a

few seconds before moving their eyes to the right and then down the

page.

Dominant, noticeable headlines tend to draw the visitor’s eyes first upon

entering the website (especially when they are in the upper left, and

most of the time when they are in the upper right.)

Website readers often read blurbs and headlines; however, they tend to

read only the first one-third of the blurb. You have less than a second

to grab the reader’s attention with these headlines.

Website visitors often will scan down to the bottom of the page to see if

something catches their eyes.

Website navigation works best on the top of the page…so try to use

navigational features on the top of your page instead of on the side or on

the bottom of the page.

Images of beautiful, clean faces cause the visitor’s eyes to fixate on

the image.

If you display articles on your website, try to use short paragraphs.

Web surfers prefer short paragraphs to longer ones.

It is no surprise that readers also tend to prefer a one-column format

to a newspaper format of several columns.

Details and Depth within elements of design are noticed before items

lacking depth.

The bigger a graphic or image, the longer the user will fixate on it.

Eyes always lock on the most noticeable aspect of a website, for

example color within a grey-toned website.

Ads tend to do better on the top left portion of the site. This is no

surprise considering that this is the first place people look when opening

a webpage.

Placing ads next to popular content increases an ad’s success.

Bigger banner ads do better than smaller, less noticeable ads.



Text ads do better than banner ads because users tend to mistake the

text ad for a link to content within your site.


6. When expressed algebraically, the divine proportion is equivalent to

____.

3.6 Managing Source Files

This section deals with folder structure, testing and production folders,

development steps, file naming and version control.

3.6.1 Recommended Folder Structure

Before you start building your website template, set up the folder

structure for your new website if you haven’t done so already. Start by

creating a main folder anywhere on your PC or Mac.

Open this folder and create a folder named images inside it. You will save

your web pages in the main folder and your images in the images folder.

So if you created a web page and placed an image in it, the relative path

to the image would be images/myimage.jpg

It is very important that you set up your website folder structure correctly.

The main reason is that when you design and build your website on your

computer, your web pages will use the same image path that they will use

on your web hosting account when you upload your website.

When you have finished building your website you will need to log in to your

hosting account and create a folder in your main directory with a name

identical to the images folder you created on your computer, in this

case images.

3.6.2 Testing and Production Folders

Below is a screen shot of a folder setup on a PC. The main folder is

named www.affacademy.com and highlighted in light blue is the images

folder, which is created to store the images for the website.



Figure 3.2: PC Folder Setup

Below is a screen shot of a folder setup on the web hosting account for

affacademy.com. On the left-hand side you can see the www folder, which is

the main directory for this type of web hosting account (the name may vary

between hosts). The images folder has been created inside it.



Figure 3.3: Website Web hosting Server Folder Setup

The idea behind this is that your image path will be identical, so it will work

on your website once everything has been uploaded. If the path saved in your

web pages is a local folder path such as c:\mysite\images, the site will work

on your PC, but no one else will be able to view the images even though you

have uploaded them into the images folder.
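One way to catch this mistake before uploading is to scan each page for image or link paths that point at the local machine. The JavaScript sketch below is a rough check under stated assumptions: it only looks at quoted src and href attributes, and the sample markup and file names are invented for the example.

```javascript
// Find src/href values that point at the local machine (a drive letter such
// as c:\ or a file:// URL). These work on the author's PC but not once the
// page is uploaded. Only quoted src and href attributes are examined.
function findLocalPaths(html) {
  const attr = /\b(?:src|href)\s*=\s*(["'])(.*?)\1/gi;
  const local = [];
  let match;
  while ((match = attr.exec(html)) !== null) {
    const value = match[2];
    if (/^[a-z]:[\\/]/i.test(value) || /^file:/i.test(value)) {
      local.push(value);
    }
  }
  return local;
}

// Hypothetical page source: the first image uses a local folder path,
// the second uses a relative path that also works on the server.
const page = `
  <img src="c:\\mysite\\images\\logo.gif" alt="Logo">
  <img src="images/myimage.jpg" alt="Photo">
`;
console.log(findLocalPaths(page));
```

Any path the function reports should be rewritten as a relative path into the images folder before the site is uploaded.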

3.6.3 Development Steps

A web site system development process can follow a number of standard or

company specific frameworks, methodologies, modeling tools and

languages. A software development life cycle normally comes with some

standards which can fulfill the needs of any development team. Like

software, web sites can be developed by following the existing software

development process, with some changes and additions.

Let us see the steps involved in any web site development.

1. Analysis: Input: Interviews with the clients, mails and supporting docs

by the client, discussion notes, online chat, recorded telephone

conversations, model sites/applications etc.

Output: Work plan, cost involved, team requirements, hardware-software

requirements, supporting documents and the approval

2. Specification Building: Input: Reports from the analysis team

Output: Complete requirement specifications to the individuals and the

customer/customer's representative

3. Design and development: After building the specification, work on the

web site is scheduled upon receipt of the signed proposal, a deposit,

and any written content materials and graphics you wish to include. Here

normally the layouts and navigation will be designed as a prototype.

Some customers may be interested only in a fully functional prototype. In

this case we may need to show them the interactivity of the application

or site. But in most cases the customer may be interested in viewing

two or three designs with all images and navigation. There can be a lot of

suggestions and changes from the customer side, and all the changes

should be frozen before moving into the next phase. The revisions

could be redisplayed via the web for the customer to view. As needed,

customer comments, feedback and approvals can be communicated by

e-mail, fax and telephone.

Figure 3.4: Development life cycle



4. Content writing: This phase is necessary mainly for web sites.

There are professional content developers who can write industry

specific and relevant content for the site. Content writers can use the

design templates to add their text. Grammar and spelling checks

should be completed in this phase.

Input: Designed template

Output: Site with formatted content

5. Coding: Input: The site with forms and the requirement specification

Output: Database driven functions with the site, Coding documents

6. Testing: Input: The site, Requirement specifications, supporting

documents, technical specifications and technical documents

Output: Completed application/site, testing reports, error logs, frequent

interaction with the developers and designers

7. Promotion: Input: Site with content, Client mails mentioning the

competitors

Output: Site submission with necessary meta tag preparation

8. Maintenance and Updating: Web sites need frequent

updates to keep them fresh. In that case we need to do the analysis

again, and all the other life cycle steps will repeat. Bug fixes can be

done during maintenance. Once your web site is operational,

ongoing promotion, technical maintenance, content management &

updating, site visit activity reports, staff training and mentoring are needed

on a regular basis, depending on the complexity of your web site and the

needs within your organization.

Input: Site/Application, content/functions to be updated, re-Analysis

reports

Output: Updated application, supporting documents to other life cycle

steps and teams.

3.6.4 File Naming

File naming conventions are the way that you name your web pages

so that search engines can use the names as one way of determining what

your web page is about. It's important that you use these names wisely and

not abuse them.



Consider this: you have a web page about search engine tools. If you didn't

use a standard file naming convention you might name it webpage1.htm. A

search engine could still crawl the web page to determine the subject matter,

but search engines also give weight to file names.

A better way to name the page would be search-engine-tools.htm.

This specifically tells the search engines that the page is

related to 'search engine tools'. This is valuable because search

engines will use this data when determining the subject matter of your web

page. When naming your files it's recommended that you separate your

keywords with a dash. When you're naming a page it's also important to find

out whether people actually search for the keywords.

It's important to note that you don't want file names like:

file-naming-conventions-best-practice-should-i-use-a-dash-or-underscore.htm.

This isn't what the search engines have in mind when they try to determine

relevancy. With that said, your job is to make search engines realize how

relevant and unique the content is on your web page.
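The naming advice above (lowercase keywords, separated by dashes, and not too many of them) can be captured in a small helper. This JavaScript sketch is one possible reading of that advice: the function name, the default cap of four keywords and the .htm extension are choices made for the example, not rules from any search engine.

```javascript
// Turn a page title into a short, dash-separated file name.
// maxWords is an arbitrary cap chosen for this sketch to avoid keyword stuffing.
function toFileName(title, maxWords = 4, extension = ".htm") {
  const words = title
    .toLowerCase()
    .replace(/[^a-z0-9\s_-]/g, "") // drop punctuation
    .split(/[\s_-]+/)              // split on spaces, underscores and dashes
    .filter((w) => w.length > 0)
    .slice(0, maxWords);
  if (words.length === 0) {
    throw new Error("title contains no usable keywords");
  }
  return words.join("-") + extension;
}

console.log(toFileName("Search Engine Tools"));
console.log(
  toFileName("File naming conventions: best practice, should I use a dash or underscore?")
);
```

The second call shows the cap at work: the over-long title is cut down to its first four keywords instead of becoming a file name stuffed with every word.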

3.6.5 Version Control

Version control is a special kind of software used to track and manage

changes. For example, CVS version control is used to track any sort of change

made to our web sites, whether it's a single edit of one file to fix a typo, or a

series of adjustments to a project where several files, folders, and graphics

are added to (or removed from) the site.

In an uncontrolled site where multiple authors have access to edit and

contribute, the potential for conflict and problems arises – more so when

these authors work from different offices at different times of day and night.

You may spend the day improving the file index.html for a customer. After

you've made your changes, another developer who works at home after

hours, or in another office, may spend the night uploading their own newly

revised version of the file index.html, completely overwriting your work with

no way to get it back.

With the same site under CVS version control, the late-night author will be

alerted to a conflict with the file index.html, presented with the exact parts of

the index.html file that are causing a problem, and asked to adjust their work



to incorporate anything you added and committed to the site while working

on it earlier in the day.

If a customer needs to remove a recently added page or content area for

legal reasons, or if they simply prefer an earlier version of their site, CVS

can be used to restore the entire site to any previous state of their choosing,

rolling back multiple variations and edits by all authors until a satisfactory

site can be put back in place.
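To make the two ideas in this section concrete (a commit is refused when someone else has changed the file in the meantime, and any earlier state can be restored), here is a toy model in JavaScript. It is only a sketch of the concept: the class and method names are invented, it handles whole files rather than the line-by-line differences a real tool reports, and it says nothing about how CVS itself is implemented.

```javascript
// A toy repository that keeps every committed revision of each file.
// It models only conflict detection and rollback; names are made up.
class ToyRepository {
  constructor() {
    this.history = new Map(); // file name -> array of revisions (strings)
  }

  // Returns the latest content and its revision number (0 = no revisions yet).
  checkout(file) {
    const revisions = this.history.get(file) || [];
    return {
      revision: revisions.length,
      content: revisions.length ? revisions[revisions.length - 1] : "",
    };
  }

  // A commit must name the revision the author started from. If someone
  // else has committed since then, the commit is rejected instead of
  // silently overwriting their work.
  commit(file, baseRevision, content) {
    const revisions = this.history.get(file) || [];
    if (baseRevision !== revisions.length) {
      return { ok: false, conflictWith: revisions.length };
    }
    revisions.push(content);
    this.history.set(file, revisions);
    return { ok: true, revision: revisions.length };
  }

  // Restore an earlier revision by committing it again as the newest one,
  // so the history itself is never lost.
  rollback(file, revision) {
    const revisions = this.history.get(file) || [];
    if (revision < 1 || revision > revisions.length) {
      throw new RangeError("no such revision");
    }
    return this.commit(file, revisions.length, revisions[revision - 1]);
  }
}

const repo = new ToyRepository();
repo.commit("index.html", 0, "<h1>Welcome</h1>");
const day = repo.checkout("index.html");   // daytime author starts from revision 1
const night = repo.checkout("index.html"); // late-night author also starts from revision 1
const dayResult = repo.commit("index.html", day.revision, "<h1>Welcome!</h1>");
const nightResult = repo.commit("index.html", night.revision, "<h1>Hello</h1>");
console.log(dayResult, nightResult);
```

The daytime commit succeeds, while the late-night commit is refused because it was based on a revision that is no longer the latest; the late-night author has to incorporate the daytime changes before committing again.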


7. _______ is a special kind of software used to track and manage

changes.

3.7 Foundations of Dynamic HTML

This section deals with DHTML, Netscape, external style sheet, custom

style sheets, layer tags and positioning layers.

3.7.1 DHTML Capabilities

There are four primary features of DHTML:

1. Changing the tags and properties

2. Real-time positioning

3. Dynamic fonts (Netscape Communicator)

4. Data binding (Internet Explorer)

Changing the tags and properties: This is one of the most common uses

of DHTML. It allows you to change the qualities of an HTML tag in response

to an event (such as a mouse click, the time, or the date). You can use

this to preload information onto a page, and not

display it unless the reader clicks on a specific link.
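A minimal JavaScript sketch of this show-on-click idea follows. In a browser the element would be a real page element; here a plain object with a style property stands in for it so that the logic can be read and run on its own. The names toggleDisplay, details and link are hypothetical.

```javascript
// Toggle whether an element is displayed. In a browser, `element` would be
// a DOM node; any object with a `style` property works for this sketch.
function toggleDisplay(element) {
  const wasHidden = element.style.display === "none";
  element.style.display = wasHidden ? "" : "none";
  return wasHidden; // true when the element is now visible
}

// Stand-in for a preloaded, initially hidden block of the page.
const details = { style: { display: "none" } };
toggleDisplay(details); // the reader clicks the link: the block appears
console.log(details.style.display === "" ? "visible" : "hidden");

// In a page, the function would be wired to a click event, for example:
//   link.onclick = function () { toggleDisplay(details); return false; };
```

Setting display back to an empty string lets the browser fall back to the element's normal display value, so the same function works for block and inline elements.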

Real-time positioning: When most people think of DHTML this is what they

expect: objects, images, and text moving around the Web page. This can

allow you to play interactive games with your readers or animate portions of

your screen.

Dynamic Fonts: This is a Netscape only feature. Netscape developed this

to get around the problem designers had with not knowing what fonts would

be on a reader's system. With dynamic fonts, the fonts are encoded and



downloaded with the page, so that the page always looks how the designer

intended it to.

Data binding: This is an IE only feature. Microsoft developed this to allow

easier access to databases from Web sites. It is very similar to using a CGI

script to access a database, but uses an ActiveX control to function. This

feature is very advanced and difficult for the beginning DHTML writer to use.

3.7.2 Netscape vs. Microsoft Support for DHTML

DHTML can do some pretty neat stuff. There's more to it than flash-style

effects and the ugly square spotlight. The first problem is that Netscape (NS)

and Internet Explorer (IE) implement DHTML in two completely different ways.

In many cases, it's necessary to write two versions of each page. Netscape has

chosen to use the <layer> tag, while Microsoft's Internet Explorer treats

DHTML more like an extension of JavaScript.

The second problem is that Netscape's implementation may provide better

document control, but it relies far too heavily on a coordinate style of

authoring. This would be fine if the backbone for web design weren't

something as loose as HTML. Although it has more substance than the 80s

Cola War, this Browser War is just getting tiresome, and is just as little in

the interest of the consumer. It's sad when the only way Netscape can keep

its product alive is to make it 100% incompatible with its competitor.

This divergence can only progress until they are two utterly unalike systems,

with a full set of code required for each.

3.7.3 <link> Tags and External Styles

The most commonly used type of link is the stylesheet link. This looks like:

<link href="styles.css" rel="stylesheet" type="text/css" />

The first attribute href defines the URL where the style sheet is located.

Then the rel attribute indicates that the relationship of this link is a style

sheet. Finally the type attribute tells the user agent what MIME type the

linked document will be. For style sheets this should always be "text/css".

The rel and rev attributes are where you define the type of link you're

including in your document. Rel and rev act as complementary attributes, rel

defining related links that are forward while rev defines related links that are

reverse from the current page. This is most often used in a series of pages,



where you would define the rel="next" and rev="prev" links on the pages.

Most links are considered forward or "rel" links.

Alternate pages are a useful way to provide more details for your customers

and for search engines. You might define alternate natural language pages

or alternate pages in a different file format. You can do both, in fact.

To define a link to a Spanish version of the current page, you would write:

<link href="spanish.html" lang="es" hreflang="es" rel="alternate"

type="text/html" title="The page in Spanish" />

To define a link to a PDF version of the current page, you would write:

<link href="page.pdf" rel="alternate" type="application/pdf"

title="A PDF version of the page" media="print" />

Another great use of the alternate type is to define alternate style sheets for

specific uses. This allows readers using user agents like Firefox to choose

between different style sheets. The most common alternate style sheet is

the zoom layout style sheet. You would define an alternate style sheet with

two types (separated by spaces) in the rel attribute:

<link href="zoom.css" rel="alternate stylesheet" type="text/css"

title="Zoom style sheet" />

Be sure to name each alternate style sheet with the title attribute so that

browsers can list it for the reader to choose.

3.7.4 Creating Custom Styles (Classes)

A simple selector can have different classes, thus allowing the same

element to have different styles. For example, an author may wish to display

code in a different color depending on its language:

code.html { color: #191970 }

code.css { color: #4b0082 }

The above example has created two classes, css and html for use with

HTML's CODE element. The class attribute is used in HTML to indicate the

class of an element, e.g.,

<P CLASS="warning">Only one class is allowed per selector.

For example, code.html.proprietary is invalid.</P>

Classes may also be declared without an associated element:

.note { font-size: small }

In this case, the note class may be used with any element.



A good practice is to name classes according to their function rather than

their appearance. The note class in the above example could have been

named small, but this name would become meaningless if the author

decided to change the style of the class so that it no longer had a small font

size.

3.7.5 <Layer> Tags

The layer tag is a new tag introduced in Netscape 4 that allows authors to

position and animate (through scripting) elements in a page. A layer can be

thought of as a separate document that resides on top of the main one, all

existing within one window.

The layer tag has been left behind in Netscape development - it may not be

supported at all by future versions.

<layer id="UNIQUE_NAME" src="URL" bgcolor= "COLOUR" width="500"

height="450" top="10" left="270" visibility="show"></layer>

Id specifies the name of the layer, enabling other layers and JavaScript

scripts to refer to it. The src specifies the pathname of a file that contains

HTML-formatted content for the layer. The height and width attributes must

be fixed pixels, not percentages, or the external page may not be visible. The

height and width attributes cannot be altered by JavaScript in real time. This

has important implications if visitors are browsing at a high resolution. The

layer tag definition does not include scrollbars. The bgcolor specifies the

background color of the layer. The left and top attributes specify the

horizontal and vertical positions of positioned layers or the relative horizontal

and vertical positions for inflow layers.

3.7.6 Positioning Layers

The LAYER tag (Navigator 4.0) allows you to position blocks of content. These

blocks of positioned content are also called layers. Positioned blocks

of content can overlap each other, be transparent or opaque, and be visible

or invisible. They can also be nested. Use the LAYER tag to specify an

absolute position for a block of content, and use the ILAYER tag to specify a

relative position.

This example creates three overlapping layers. The back one is red, the

middle one is blue, and the front one is green.



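<LAYER ID=layer1 TOP=250 LEFT=50 WIDTH=200 HEIGHT=200

BGCOLOR=RED>

<P>Layer 1</P>

</LAYER>

<!-- TOP and LEFT for layers 2 and 3 are illustrative offsets -->

<LAYER ID=layer2 TOP=350 LEFT=150 WIDTH=200 HEIGHT=200

BGCOLOR=BLUE>

<P>Layer 2</P>

</LAYER>

<LAYER ID=layer3 TOP=450 LEFT=250 WIDTH=200 HEIGHT=200

BGCOLOR=GREEN>

<P>Layer 3</P>

</LAYER>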

3.7.7 HTML vs. DHTML

Dynamic HTML is an extension of HTML that enables, among other things,

the inclusion of small animations and dynamic menus in Web pages.

DHTML code makes use of style sheets and JavaScript.

When an object or word on a web page becomes highlighted, larger, a

different color, or struck through as you move your mouse cursor over it,

you are seeing a DHTML effect. The effect is produced by style sheet and

script code in the page itself; the file is still saved with the ordinary

.htm or .html extension.

DHTML sites are dynamic in nature. DHTML uses client side scripting to

change variables in the presentation which affects the look and function of

an otherwise static page. DHTML describes what happens while a

page is being viewed, rather than the generation of a unique page with each

page load (a dynamic website).

On the other hand, HTML is static. HTML sites rely solely upon client-side

technologies. This means the pages of the site do not require any special

processing from the server side before they go to the browser. In other

words, the pages are always the same for all visitors - static. HTML pages

have no dynamic content.


8. ________ tag is a new tag introduced in Netscape 4 that allows authors

to position and animate (through scripting) elements in a page.



3.8 Summary

1. The browser display window can be used to display more than one

document at a time.

2. The <frameset> tag must have either a rows or a cols attribute, and

it often has both.

3. Most HTML tags have associated properties, which store presentation

information for browsers.

4. The format of a style specification depends on the level of style sheet.

5. A Cascading Style Sheets rule is made up of a selector and a

declaration.

6. The <span> and <div> tags are very useful when dealing with

Cascading Style Sheets.

7. The TABLE element defines a table for multi-dimensional data

arranged in rows and columns.

8. The first step in developing your navigation scheme is to think about

how your information is best presented.

9. Presentation of navigation elements should incorporate visual and

non-visual cues that indicate the range of possible choices and the

appropriate action required to make a choice.

10. Color is very important in web design, and can be used to add spice to

your website, relay the mood of a page, as well as to emphasize

sections of a site.


3.9 Terminal Questions

1. Explain <frameset> tag and its attributes.

2. Briefly explain how the style sheets can be used to support multiple

browsers.

3. Explain the various tags used in Table.

4. Bring out the differences between Tables and CSS.

5. Explain the development steps in a web site construction.

3.10 Answers

Self Assessment Questions

1. <frame> tag

2. style sheet

3. Cascading Style Sheets

4. The <table>



5. TABLE element

6. 1:1.61803… or 1:phi.

7. Version control

8. The layer

Terminal Questions

1. The content of a frame is specified with the <frame> tag, which can

appear only in the content of a frameset element. (Refer section 3.2)

2. It is not impossible to design a single style sheet that works properly on

all the different browsers. (Refer section 3.3.5)

3. The TABLE element defines a table for multi-dimensional data

arranged in rows and columns. (Refer section 3.4)

4. There are 13 reasons why Cascading Style Sheets (CSS) are superior

to table-based layouts when designing a website. (Refer section 3.5.6).

5. A web site system development process can follow a number of

standard or company specific frameworks, methodologies, modeling

tools and languages. (Refer section 3.6.3)



Unit 4 XML Programming – I

Structure:

4.1 Introduction

Objectives

4.2 The Need for XML

Introduction

Structured Data and Formatting

Advantages of XML

SGML, XML, and HTML

World Wide Web Consortium (W3C) Specifications and Grammars

XML Applications and Tools

Creating and Viewing XML Documents

Transforming XML Documents

4.3 XML Document Syntax

4.4 Validating XML Documents with DTDs

4.5 XML Namespaces

4.6 Summary


4.7 Terminal Questions

4.8 Answers

4.1 Introduction

XML is far more than a solution to the deficiencies of HTML. It provides a

simple and universal way of storing textual data of any kind. In this chapter,

you are going to study the need for XML, the XML document structure and

XML namespaces.

Objectives:


discuss the need for XML

describe the XML Document Syntax

write DTD files

use XML Namespaces



4.2 The Need for XML

XML stands for Extensible Markup Language. XML is a markup language

much like HTML. XML was designed to carry data, not to display data.

XML tags are not predefined. You must define your own tags. XML is

nothing special. It is just plain text. Software that can handle plain text can

also handle XML. However, XML-aware applications can handle the XML

tags specially. The functional meaning of the tags depends on the nature of

the application. XML is now as important for the Web as HTML was to the

foundation of the Web. XML is everywhere. It is the most common tool for

data transmissions between all sorts of applications, and is becoming more

and more popular in the area of storing and describing information.

4.2.1 Structured Data and Formatting

An XML document has two correctness levels:

Well-formed. A well-formed document conforms to the XML syntax

rules; e.g. if a start-tag (< >) appears without a corresponding end-tag

(</>), it is not well-formed. A document that is not well-formed is not XML; a

conforming parser is disallowed from processing it.

Valid. A valid document additionally conforms to semantic rules, either

user-defined or in an XML schema, especially DTD; e.g. if a document

contains an undefined element, then it is not valid; a validating parser is

disallowed from processing it.
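The start-tag and end-tag rule mentioned above can be illustrated with a short stack-based check in JavaScript. This is a deliberately simplified sketch: it tests only whether tags are matched and properly nested, it can be confused by tag-like text inside comments or CDATA sections, and it is no substitute for a real XML parser. The element names in the examples (note, to, body) are invented.

```javascript
// Check that every start-tag has a matching, properly nested end-tag.
// This covers only one well-formedness rule; a real XML parser checks many
// more (a single root element, attribute quoting, legal characters, ...).
function tagsAreBalanced(xml) {
  const tag = /<(\/?)([A-Za-z_][\w.:-]*)[^<>]*?(\/?)>/g;
  const open = [];
  let match;
  while ((match = tag.exec(xml)) !== null) {
    const [, closing, name, selfClosing] = match;
    if (selfClosing) continue; // an empty-element tag such as <br/> closes itself
    if (!closing) {
      open.push(name);         // start-tag: remember it
    } else if (open.pop() !== name) {
      return false;            // end-tag does not match the most recent start-tag
    }
  }
  return open.length === 0;    // anything still open has no end-tag
}

console.log(tagsAreBalanced("<note><to>Ann</to><body>Hi<br/></body></note>"));
console.log(tagsAreBalanced("<note><to>Ann</note>"));
```

The first document passes the check; the second fails because the to element is never closed before the end-tag of note appears.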

4.2.2 Advantages of XML

XML is used in many aspects of web development, often to simplify data

storage and sharing.

XML Separates Data from HTML:

If you need to display dynamic data in your HTML document, it will take a lot

of work to edit the HTML each time the data changes. With XML, data can

be stored in separate XML files. This way you can concentrate on using

HTML for layout and display, and be sure that changes in the underlying

data will not require any changes to the HTML. With a few lines of

JavaScript, you can read an external XML file and update the data content

of your HTML.

XML Simplifies Data Sharing

In the real world, computer systems and databases contain data in

incompatible formats.



XML data is stored in plain text format. This provides a software- and

hardware-independent way of storing data. This makes it much easier to

create data that different applications can share.

XML Simplifies Data Transport

With XML, data can easily be exchanged between incompatible systems.

One of the most time-consuming challenges for developers is to exchange

data between incompatible systems over the Internet. Exchanging data as

XML greatly reduces this complexity, since the data can be read by different

incompatible applications.

XML Simplifies Platform Changes

Upgrading to new systems (hardware or software platforms) is always very

time consuming. Large amounts of data must be converted and

incompatible data is often lost. XML data is stored in text format. This makes

it easier to expand or upgrade to new operating systems, new applications,

or new browsers, without losing data.

XML Makes Your Data More Available

Since XML is independent of hardware, software and application, XML can

make your data more available and useful. Different applications can access

your data, not only in HTML pages, but also from XML data sources. With

XML, your data can be available to all kinds of "reading machines"

(handheld computers, voice machines, news feeds, etc.), and can be made more

accessible to blind people and to people with other disabilities.

4.2.3 SGML, XML and HTML

Standard Generalized Markup Language (SGML)

SGML is a metalanguage or a language that describes another language

and has been an international standard for describing electronic text since

1986. SGML does not provide the definitive list of allowed elements or

definitions of specific elements, but rather provides the rules as to how

elements can be used or interact within a document. SGML is not a

language that dictates how text is formatted. The power comes in that

SGML encodes the semantics or meaning of text, which is totally separated

from how the text is rendered or appears on paper or on the screen. In

SGML, presentation is separated from content, which allows programmers

to write many different applications that take care of how the document is

displayed.



No Web browser can display SGML as is, so an application must be written

to convert the SGML into a language like HTML that the browser can

understand.

HyperText Markup Language (HTML)

HTML is just a specific application of SGML with a specific set of rules that

are defined in a HTML DTD. The DTD is constructed, refined, and published

as a specification through committee work at the World Wide Web

Consortium. At the time of writing, the most recent version of the HTML specification is 4.01 and

all current Web browsers should have a built-in ability to interpret this

specification. Each Web page that is created is supposed to declare which

HTML specification it is using, which helps the browser out with its

interpretation. The following is an example of a DTD declaration from a Web

page:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

The biggest differences between SGML and HTML are that HTML's element

set is based more on presentation than on meaning or semantics, and the

HTML element set is predetermined by the specification. The specification

does not allow users to create their own elements or produce their own

DTDs. This has often been a point of contention as the major Web browser

software companies have introduced their own takes on HTML, which may

or may not be supported by the competitors' browsers.

eXtensible Markup Language (XML)

XML is designed to bring some of the features of SGML to the Web and

other output options like print. So, like SGML and unlike HTML, developers

can define their own DTDs or XML schema. XML is a metalanguage just like

SGML, and the extensible part of XML refers to the ability of developers to

define their own elements. XML is a simplified version of SGML, making

XML easier for developers to work with and deploy than SGML, while

providing the flexibility, semantic structures, and ability to exchange data

that is not available in HTML.

4.2.4 World Wide Web Consortium (W3C) Specifications and

Grammars

The W3C groups that are listed and linked in the XML section of the W3C website are described briefly below:

XML Coordination Group, whose functions are described in the XML CG Charter.

XML Core Working Group, which develops and maintains the XML specification and closely related specifications, e.g. Namespaces in XML.

XSL Working Group, which develops specifications for the Extensible Stylesheet Language (XSL), including XSL Formatting Objects (XSL-FO) and XSL Transformations (XSLT).

Efficient XML Interchange Working Group, which develops a compact and efficient format for exchanging XML documents.

XML Binary Characterization Working Group, which investigated whether it was necessary to develop a binary interchange format. Its reports and recommendations can be found on the public pages of the working group.

XML Processing Model Working Group, which develops a language for describing sequences of operations on XML documents.

XML Linking Working Group, which is no longer active; its work can be found on the group page.

XML Query Working Group, which works on a flexible query language for extracting data from XML documents.

XML Schema Working Group, which provides specifications for defining and describing the content and structure of XML documents, and is looking at ways to define and describe their semantics.

Service Modeling Language Working Group, which works on ways to define and support extensions to the XML Schema language.

4.2.5 XML Applications and Tools

XML applications are software programs that process and manipulate data using XML technologies including XML, XSLT, XQuery, XML Schema, XPath, Web services, etc. Stylus Studio, for example, provides tools for working with all of these, and its XML pipeline tool lets you design a complete XML application from start to finish: you can visually specify the order in which the different XML processing steps should occur, debug the entire application, and deploy it to your production environment.



A Sample XML Application

In the following sample XML application, we will build an order report. This

will involve some XML processing, for example, applying various XML

operations (converting, parsing, validating, transforming and publishing

XML) on several data sources. The order report XML application is

displayed in figure 4.1.

Figure 4.1: XML application

The steps involved in creating this XML application include:

1. Getting a catalog of books from a Text File

2. Getting an Order from an EDIFACT file

3. Using XQuery to extract the order information

4. Using XSLT to publish an HTML order report

5. Using XQuery to generate an XSL-FO style sheet

6. Using XSL-FO to publish a PDF order report

Following are some of the XML tools used:

Sense:X

Sense:X is an intelligent, XML-aware editing feature that provides XML sensing, XML tag completion, syntax coloring, and more.



Integrated XML Schema/DTD Validator

Creating valid XML documents is simple and easy using an integrated XML

Schema/DTD Validator which automatically finds and highlights errors, and

provides detailed error messages.

XML Canonicalizer

It provides an easy way to convert any XML document into W3C-standard

XML canonical form.

XML Generator

The XML Generator automatically creates well-formed & valid XML sample

instance documents from any XML Schema in a highly customizable way.

XML Code Folding

The XML Editor features code folding to help maximize valuable screen

real-estate and simplify editing of large XML files.
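To illustrate what a canonicalizer does, here is a small sketch that uses the canonicalize function of Python's standard xml.etree.ElementTree module (available from Python 3.8). It is not the tool described above, and the two sample documents are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Two documents that differ only in attribute order, quoting style and
# the way the empty element is written.
doc_a = '<note b="2"   a="1"><to>Tove</to><empty/></note>'
doc_b = "<note a='1' b='2'><to>Tove</to><empty></empty></note>"

canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)

print(canon_a)
print(canon_a == canon_b)   # both reduce to the same canonical text
```

Because both inputs reduce to the same canonical text, canonical form is useful when documents have to be compared or digitally signed.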

4.2.6 Creating and Viewing XML Documents

XML documents form a tree structure that starts at "the root" and branches

to "the leaves". XML documents use a self-describing and simple syntax:

<?xml version="1.0" encoding="ISO-8859-1"?>

<note>

<to>Tove</to>

<from>Jani</from>

<heading>Reminder</heading>

<body>Don't forget me this weekend!</body>

</note>

The first line is the XML declaration. It defines the XML version (1.0) and the

encoding used. The next line describes the root element of the document

(like saying: "this document is a note"). i.e. <note>. The next 4 lines

describe 4 child elements of the root (to, from, heading, and body). And

finally the last line defines the end of the root element: </note>.
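The tree structure described above can be inspected with a few lines of code. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the variable names are chosen for the illustration.

```python
import xml.etree.ElementTree as ET

# The note document from above, as bytes so that the declared encoding is honoured.
NOTE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE)
print(root.tag)                    # name of the root element
for child in root:                 # the four child elements, in document order
    print(child.tag, "=", child.text)
```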

When opened in a web browser, the XML document will be displayed with color-coded root and child

elements. A plus (+) or minus sign (-) to the left of the elements can be

clicked to expand or collapse the element structure. To view the raw XML

source (without the + and - signs), select "View Page Source" or "View

Source" from the browser menu.



4.2.7 Transforming XML Documents

XML processing techniques that are available in the browser have

complements on the server, and then some. Server-based processing can

deliver formatted displays of XML to the browser; it also can perform a full

range of data maintenance activities to modify documents and to enable

sharing of XML data between servers.

On the server there are file-based, memory-based, and stream-based

methods of accessing XML documents. File-based methods input XML files

and XSLT style sheets to transform XML into XHTML for display in the

browser. Memory-based methods use the Document Object Model (DOM)

to access and process full in-memory representations of XML documents.

Stream-based methods provide simple read and write capabilities through

which XML documents are accessed and output is produced an element at

a time.
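The difference between memory-based and stream-based access can be sketched as follows. The example assumes Python's standard library (xml.dom.minidom for the DOM and ElementTree's iterparse for streaming); the orders document is invented for the illustration.

```python
import io
import xml.dom.minidom
import xml.etree.ElementTree as ET

DOC = b"<orders><order id='1'>Book</order><order id='2'>Pen</order></orders>"

# Memory-based: the whole document is loaded into a DOM tree first.
dom = xml.dom.minidom.parseString(DOC)
dom_items = [node.firstChild.data for node in dom.getElementsByTagName("order")]

# Stream-based: elements are handed over one at a time as they are completed.
stream_items = []
for event, elem in ET.iterparse(io.BytesIO(DOC), events=("end",)):
    if elem.tag == "order":
        stream_items.append(elem.text)
        elem.clear()               # release the element once it has been used

print(dom_items)
print(stream_items)
```

Both approaches produce the same list here; the stream-based loop is the one that scales to documents too large to hold in memory.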

Self-assessment Questions

1. ________designed to bring some of the features of SGML to the Web

and other output options like print.

4.3 XML Document Syntax

This section deals with structure, elements, tags, the XML declaration, the document type

declaration, start and end tags, and element attributes.

Well Formed Structure

A textual object is a well-formed XML document if:

1. Taken as a whole, it matches the production labeled document.

2. It meets all the well-formedness constraints given in the XML specification.

3. Each of the parsed entities which is referenced directly or indirectly

within the document is well-formed.

Elements and Tags

XML Element: XML is a markup language that is used to store data in a

self-explanatory manner. Making the data "self-explanatory" comes about by

containing information in elements. If a piece of text is a title then it will be

contained within a "title" element.

XML Tag: A tag is just a generic name for a <element>. An opening tag

looks like <element>, while a closing tag has a slash that is placed before



the element's name: </element>. From now on we will refer to the opening

or closing of an element as open or close tags. All information that belongs

to an element must be contained between the opening and closing tags of

an element.

The XML Declaration

An XML document should begin with an XML declaration, which has the appearance of a processing instruction but technically is not one. The XML declaration identifies the document as being XML and provides the version number of the XML standard being used. It may also specify an encoding standard.

Example: <?xml version = "1.0" encoding = "utf-8"?>

Document Type Declaration

A Document Type Declaration gives the parser the information against which the validity of an XML document is checked. It is the XML mechanism that defines constraints on the logical structure of a document and supports the use of predefined storage units. The Document Type Declaration can contain the following:

1. Document name

2. Reference to an external DTD (Document Type Definition)

3. Markup declaration (internal DTD)

4. Parameter entity references

Start and End Tags

In XML a tag is what is written between angled brackets, i.e. XML tags open with the < symbol and end with the > symbol. They always come in matched pairs, with the content of the element between the start-tag and the end-tag. An example of a start-tag: <composer>. An example of an end-tag: </composer>.

Empty Tags

The elements that do not include content must use a tag with the following

form:

<element_name />

Element Nesting

When an element appears within another element, it is said that the inner

element is "nested". Besides being such an easy term to understand,

nesting also serves a wonderful purpose of keeping order in an XML

document. Much like parentheses in a math problem, elements must be

closed in the order that they are opened.

Example:
<patient>
  <name>
    <first> Anil </first>
    <middle> Keshav </middle>
    <last> Kumar </last>
  </name>
</patient>

Element Attributes

Attributes often provide information that is not a part of the data. Attribute

values must always be enclosed in quotes, but either single or double

quotes can be used. For a person's sex, the person tag can be written like

this: <person sex="female">
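Reading an attribute back from a parsed element can be sketched as follows, assuming Python's standard xml.etree.ElementTree module. The firstname child element and the age attribute are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET

# Single and double quotes are both accepted around attribute values.
person = ET.fromstring("<person sex='female'><firstname>Anna</firstname></person>")

print(person.get("sex"))     # value of the sex attribute
print(person.attrib)         # all attributes as a dictionary
print(person.get("age"))     # a missing attribute yields None
```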

Comments

A comment is used to leave a note or to temporarily edit out a portion of

XML code. XML comments have the exact same syntax as HTML

comments.

Example: <!-- This is a comment -->



Special Characters and Built in Entities

If your keyboard will not allow you to type the characters you want, or if you want to use characters outside the limits of the encoding scheme you have chosen, you can use a symbolic notation called 'entity referencing'. Entity references can either be numeric, using the decimal or hexadecimal Unicode code point for the character (e.g. if your keyboard has no Euro symbol (€) you can type &#8364;); or they can be character entity references, using an established name which you declare in your DTD (e.g. <!ENTITY euro "&#8364;">) and then use as &euro; in your document.

If you use XML with no DTD, then these five character entities are assumed to be predeclared, and you can use them without declaring them:

&lt; – The less-than character (<) starts element markup (the first character of a start-tag or an end-tag).

&amp; – The ampersand character (&) starts entity markup (the first character of a character entity reference).

&gt; – The greater-than character (>) ends a start-tag or an end-tag.

&quot; – The double-quote character (") can be symbolized with this character entity reference when you need to embed a double-quote inside a string which is already double-quoted.

&apos; – The apostrophe or single-quote character (') can be symbolized with this character entity reference when you need to embed a single-quote or apostrophe inside a string which is already single-quoted.
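The predeclared entities can be seen at work in a short sketch. It assumes Python's standard library: the escape helper from xml.sax.saxutils replaces the markup characters &, < and > with entity references, and the parser turns them back into characters. The sample text is made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b && b > c"
escaped = escape(raw)              # replaces &, < and > with entity references
print(escaped)

# The parser turns the references back into the original characters.
elem = ET.fromstring("<code>" + escaped + "</code>")
print(elem.text == raw)

# A numeric character reference for the Euro sign (decimal code point 8364).
price = ET.fromstring("<price>&#8364;10</price>")
print(price.text == "\u20ac10")
```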

CDATA Sections

CDATA Sections are used to escape blocks of text containing characters which would otherwise be recognized as markup. The content of a character data section is not parsed by the XML parser, so any tags or entity references inside it are treated as plain text rather than as markup. The form of a character data section is as follows:

<![CDATA[content]]>

For example, instead of using the line "The last word of the line is &gt;&gt;&gt; here &lt;&lt;&lt;" the following line could be used:

<![CDATA[The last word of the line is >>> here <<<]]>
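That a CDATA section and the escaped form carry the same text can be checked with a parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module and a hypothetical line element as wrapper:

```python
import xml.etree.ElementTree as ET

with_cdata = "<line><![CDATA[The last word of the line is >>> here <<<]]></line>"
with_entities = "<line>The last word of the line is &gt;&gt;&gt; here &lt;&lt;&lt;</line>"

text_a = ET.fromstring(with_cdata).text
text_b = ET.fromstring(with_entities).text

print(text_a)
print(text_a == text_b)    # both forms deliver the same characters to the application
```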

Embedded XML

It is possible to embed the XML in the HTML document itself. The benefit of this is that it avoids an extra round-trip to the server. We can write JavaScript code on the client side to validate the XML code.

External XML

It is also possible to write a separate XML document and include it in an HTML document. This can be done with the help of JavaScript code.


2. _________are information for the parser, upon which the validity of XML

documents are checked.

3. _________are used to escape blocks of text containing characters

which would otherwise be recognized as markup.

4.4 Validating XML documents with DTDs

This section deals with data validation, document type definitions, internal and

external DTDs, parsers, sub-elements and the IDREFS type.

The Concept of Data Validation

The main purpose of data validation is to determine whether all documents in a collection conform to the rules that describe their structure. Application programs that process the data in the collection of XML documents can then be written to assume that particular document form. Without such structural restrictions, developing such applications would be difficult.

Writing Document Type Definition (DTD) Files

A Document Type Definition (DTD) is a set of structural rules called

declarations, which specify a set of elements that can appear in the

document as well as how and where these elements may appear. Not all

XML documents need a DTD. DTDs are used when the same tag set

definition is used by a collection of documents, perhaps by a collection of

users, and the collection must have a consistent and uniform structure.

The purpose of a DTD is to define a standard form for a collection of XML

documents. This form is specified as the tag and attributes sets, as well as

rules that define how they can appear in a document. DTDs also provide

entity definitions. All documents in the collection can be tested against the

DTD to determine whether they conform to the rules it describes.

Internal and External DTDs

A DTD can be embedded in the XML document whose syntax rules it

describes, in which case it is called an internal DTD. The alternative is to

have the DTD stored in a separate file, in which case it is called an external

DTD. Because external DTDs allow use with more than one XML document,

they are preferable.

If the DTD is included in the XML code, it must be introduced with

<!DOCTYPE rootname [ and terminated with ]>. For example, the

structure of the planes XML document with its DTD included is as follows:

<?xml version = "1.0" encoding = "utf-8" ?>
<!DOCTYPE planes [
  <!-- The DTD declarations for planes go here -->
]>

When you use an external DTD, the XML document includes a DOCTYPE

declaration as its second line. This declaration has the following form:

<!DOCTYPE XML_document_root_name SYSTEM "DTD_file_name">



Validating Parsers

All modern browsers have a built-in XML parser that can be used to read

and manipulate XML. The parser reads XML into memory and converts it

into an XML DOM object that can be accessed with JavaScript. There are

some differences between Microsoft's XML parser and the parsers used in

other browsers. The Microsoft parser supports loading of both XML files and

XML strings (text), while other browsers use separate parsers. However, all

parsers contain functions to traverse XML trees, access, insert, and delete

nodes (elements) and their attributes.
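The DOM operations mentioned above (traversing the tree, inserting and deleting nodes) look much the same in any DOM implementation. The following sketch uses Python's standard xml.dom.minidom as a stand-in for the browser parser; it is an illustration under that assumption, not the JavaScript API itself.

```python
import xml.dom.minidom

dom = xml.dom.minidom.parseString("<note><to>Tove</to><from>Jani</from></note>")
root = dom.documentElement

# Traverse: list the names of the child elements.
names = [node.tagName for node in root.childNodes]

# Insert: add a new <heading> element that contains a text node.
heading = dom.createElement("heading")
heading.appendChild(dom.createTextNode("Reminder"))
root.appendChild(heading)

# Delete: remove the <from> element.
root.removeChild(root.getElementsByTagName("from")[0])

print(names)
print(root.toxml())
```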

Specifying valid elements and sub elements

The element declarations of a DTD have a form that is related to that of the

rules of context-free grammars. Each element declaration in a DTD

specifies the structure of one category of elements. The declaration

provides the name of the element whose structure is being defined, along

with the specification of the structure of that element. An XML document can

be viewed as a tree in which each element is a node, either a leaf node or an

internal node. If the element is a leaf node, its syntactic description is its

character pattern. If the element is an internal node, its syntactic description

is a list of its child elements, each of which can be either a leaf node or an

internal node.

The form of an element declaration for elements that contain elements is as

follows:

<!ELEMENT element_name ( list of names of child elements ) >

For example, consider the following declaration:

<!ELEMENT memo ( from, to, date, re, body ) >

This element declaration would describe the document tree structure shown

in figure 4.2.



Figure 4.2: Document tree structure

Any child element specification can be followed by one of the following modifiers.

Modifier   Meaning
+          One or more occurrences
*          Zero or more occurrences
?          Zero or one occurrence

Consider the following DTD declaration:

<!ELEMENT person (parent+, age, spouse?, sibling* )>

In this example, a person element is specified to have the following children

elements: one or more parent elements, one age element, possibly a

spouse element, and zero or more sibling elements.
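Because the modifiers behave like the repetition operators of regular expressions, the content model of the person element can be imitated with a regular expression over the sequence of child names. This is only a sketch of the idea, assuming Python's re and xml.etree.ElementTree modules; the function name matches_model is hypothetical, and a real validating parser does considerably more.

```python
import re
import xml.etree.ElementTree as ET

# (parent+, age, spouse?, sibling*) written as a pattern over "name," items.
MODEL = re.compile(r"(parent,)+(age,)(spouse,)?(sibling,)*")


def matches_model(xml_text):
    """Check the order and number of the child elements of the root."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + "," for child in root)
    return MODEL.fullmatch(sequence) is not None


ok = "<person><parent/><parent/><age/><sibling/></person>"
no_parent = "<person><age/></person>"
two_spouses = "<person><parent/><age/><spouse/><spouse/></person>"

print(matches_model(ok))
print(matches_model(no_parent))
print(matches_model(two_spouses))
```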

The leaf nodes of a DTD specify the data types of the content of their parent

nodes, which are elements. In most cases, the content of an element is type

PCDATA, for parsed character data. Two other content types can be

specified: EMPTY and ANY. The EMPTY type is used to specify that the

element has no content. The ANY type is used when the element may

contain literally any content. The form of a leaf element declaration is as

follows:

<!ELEMENT element_name ( #PCDATA ) >

Specifying Valid Attributes

The attributes of an element are declared separately from the element

declaration in a DTD. An attribute declaration must include the name of the

element to which the attribute belongs, the attribute's name, and its type.

Also, it may include a default value. The general form of an attribute

declaration is as follows:



<!ATTLIST element_name attribute_name attribute_type

[default_value]>

The most common attribute type is CDATA, which is just any string of characters. The default value in an attribute declaration can specify either an actual value or a requirement for the value of the attribute in the XML document. The following table lists the possible default values.

Table 4.1: Possible default values for attributes

Value          Meaning
A value        The value, which is used if none is specified in an element
#FIXED value   The value, which every element will have and which cannot be changed
#REQUIRED      No default value is given; every instance of the element must specify a value
#IMPLIED       No default value is given; the value may or may not be specified in an element

Examples:

<!ATTLIST airplane places CDATA "4">

<!ATTLIST airplane engine_type CDATA #REQUIRED>
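The effect of a default value can be observed even with a non-validating parser, as long as the declarations are in the internal DTD. The sketch below assumes Python's standard xml.etree.ElementTree module (built on the expat parser); the EMPTY content model and the value "jet" are assumptions made for the illustration. Note that this parser supplies defaults but does not enforce #REQUIRED.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE airplane [
  <!ELEMENT airplane EMPTY>
  <!ATTLIST airplane places CDATA "4">
  <!ATTLIST airplane engine_type CDATA #REQUIRED>
]>
<airplane engine_type="jet"/>"""

plane = ET.fromstring(DOC)

print(plane.get("engine_type"))   # given explicitly in the document
print(plane.get("places"))        # not given, so the declared default is supplied
```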

Specifying Valid Entities

Entities can be defined so that they can be referenced anywhere in the

content of an XML document, in which case they are called General

Entities. The predefined entities are all general entities. Entities can also be

defined so that they can be referenced only in markup declarations, in which

case they are called Parameter Entities. The form of an entity declaration

that appears in a DTD is shown here:

<!ENTITY [%] entity_name "entity_value">

When the optional percent sign (%) is present in an entity declaration, it

specifies that the entity is a parameter entity rather than a general entity.

Consider the following example of an entity. Suppose that a document

includes a large number of references to the full name of President

Kennedy. You could define an entity to represent his complete name:

<!ENTITY jfk "John Fitzgerald Kennedy">



Any XML document that uses the DTD that includes this declaration can

specify the complete name with just the reference &jfk;
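The expansion of a general entity can be sketched as follows, assuming Python's standard xml.etree.ElementTree module and an internal DTD. The memo element and its text are invented for the illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE memo [
  <!ENTITY jfk "John Fitzgerald Kennedy">
]>
<memo>Speech by &jfk;</memo>"""

memo = ET.fromstring(DOC)
print(memo.text)    # the reference &jfk; has been replaced by the entity value
```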

The ID, IDREF and IDREFS Types

An attribute can be specified to be an ID type attribute. Attributes specified

as IDREF or IDREFS can then be used to refer to the ID type attributes,

enabling links within a document. ID, IDREF, and IDREFS correspond to

PK/FK (primary key/foreign key) relationships in a database, with a few

differences. In an XML document, the values of ID type attributes must be

distinct. If CustomerID and OrderID attributes are specified as ID type in an

XML document, these values must be distinct. However, in a database,

CustomerID and OrderID columns can have the same values. (For

example, CustomerID = 1 and OrderID = 1 are valid in the database).

For the ID, IDREF, and IDREFS attributes to be valid:

The value of ID must be unique within the XML document.

For every IDREF and IDREFS, the referenced ID values must be in the

XML document.

The value of an ID or IDREF must be a valid XML name, and each value in an IDREFS list must be a valid XML name. (For example, the integer value 101 cannot be an ID value, because a name cannot begin with a digit.)
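The first two rules above can be sketched in code. The example assumes Python's standard xml.etree.ElementTree module, which does not read attribute types from a DTD, so the names of the ID and IDREFS attributes are passed in by hand; the function check_ids, the Customers attribute and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_ids(root, id_attrs, idrefs_attrs):
    """Return (duplicated ID values, referenced values that match no ID)."""
    ids, refs = [], []
    for elem in root.iter():
        ids.extend(elem.get(a) for a in id_attrs if elem.get(a) is not None)
        for a in idrefs_attrs:
            refs.extend(elem.get(a, "").split())   # IDREFS is whitespace separated
    duplicates = {value for value in ids if ids.count(value) > 1}
    dangling = {ref for ref in refs if ref not in ids}
    return duplicates, dangling


shop = ET.fromstring(
    "<shop>"
    "<customer CustomerID='C1'/>"
    "<customer CustomerID='C2'/>"
    "<order OrderID='O1' Customers='C1 C2'/>"
    "<order OrderID='C1' Customers='C3'/>"
    "</shop>"
)
result = check_ids(shop, ("CustomerID", "OrderID"), ("Customers",))
print(result)
```

Here the value C1 is reported as a duplicate because all ID attributes in a document share one set of values, and C3 is reported because no element carries it as an ID.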

The NMTOKEN and NMTOKENS Type

An XML name token is very close to an XML name. It must consist of the

same characters as an XML name. Furthermore, like an XML name, an

XML name token may not contain whitespace. However, a name token

differs from an XML name in that any of the allowed characters can be the

first character in a name token, while only letters, ideographs, and the

underscore can be the first character of an XML name. Thus 12 and .cshrc

are valid XML name tokens although they are not valid XML names. Every

XML name is an XML name token, but not all XML name tokens are XML

names.

Example:

<!ATTLIST journal year NMTOKEN #REQUIRED>

This still doesn't prevent the document author from assigning the year

attribute values like "99" or "March", but it at least eliminates some possible

wrong values, especially those that contain whitespace such as "1990 C.E."

or "Sally had a little lamb."



A NMTOKENS type attribute contains one or more XML name tokens

separated by whitespace. For example, you might use this to describe the

dates attribute of a performances element, if the dates were given in the

form 08-26-2000, like this:

<performances dates="08-21-2001 08-23-2001 08-27-2001">

Kat and the Kings

</performances>

The appropriate declaration is:

<!ATTLIST performances dates NMTOKENS #REQUIRED>

On the other hand, you could not use this for a list of dates in the form

08/27/2001 because the forward slash is not a legal name character.
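The name token rules can be approximated with a regular expression. The sketch below is deliberately restricted to ASCII letters, digits, period, hyphen, underscore and colon; the full XML definition also admits many non-ASCII letters, so this is an assumption made to keep the example short. The function names are hypothetical.

```python
import re

# ASCII-only approximation of the characters allowed in an XML name token.
NMTOKEN = re.compile(r"[A-Za-z0-9._:\-]+")


def is_nmtoken(value):
    return NMTOKEN.fullmatch(value) is not None


def is_nmtokens(value):
    """One or more name tokens separated by whitespace."""
    parts = value.split()
    return bool(parts) and all(is_nmtoken(part) for part in parts)


print(is_nmtoken("12"), is_nmtoken(".cshrc"), is_nmtoken("March"))
print(is_nmtoken("1990 C.E."))
print(is_nmtokens("08-21-2001 08-23-2001 08-27-2001"))
print(is_nmtokens("08/27/2001"))
```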

The NOTATION Type

A NOTATION type attribute contains the name of a notation declared in the

document's DTD. This is perhaps the rarest attribute type and isn't much

used in practice. In theory, it could be used to associate types with particular

elements, as well as limiting the types associated with the element. For

example, these declarations define four notations for different image types

and then specify that each image element must have a type attribute that

selects exactly one of them:

<!NOTATION gif SYSTEM "image/gif">

<!NOTATION tiff SYSTEM "image/tiff">

<!NOTATION jpeg SYSTEM "image/jpeg">

<!NOTATION png SYSTEM "image/png">

<!ATTLIST image type NOTATION (gif | tiff | jpeg | png) #REQUIRED>

Enumeration

An enumeration is the only attribute type that is not an XML keyword.

Rather, it is a list of all possible values for the attribute, separated by vertical

bars. Each possible value must be an XML name token.

Example:

<!ATTLIST date month (January | February | March | April | May | June

| July | August | September | October | November | December) #REQUIRED>

Conditional Sections

Conditional sections are portions of the Document Type Declaration or

of external parameter entities which are included in, or excluded from, the



logical structure of the DTD based on the keyword which governs them. The

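syntax of the conditional section is:

conditionalSect ::= includeSect | ignoreSect

An included section is written <![INCLUDE[ declarations ]]> and an ignored section is written <![IGNORE[ declarations ]]>.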

Validation Tools

There are two main types of validation tools available:

Web based tools

Standalone tools

Web-based tools are web pages that allow you to enter the path (URI) of an

XML document to have it validated. The upside to web-based tools is that

they can be used without installing special software. Just open the web

page in a web browser and go for it! The downside to web-based validation

tools is that they sometimes don't work well when you aren't dealing with

files that are publicly available on the Internet.

Standalone validation tools are tools that you must install on your computer

in order to use. These kinds of tools range from full-blown XML editors such

as XML Spy to command-line XML validators such as the W3C's XSV

validator. Standalone validation tools have the benefit of allowing you to

validate local files with ease. The drawback to these tools is that some of

them aren't cheap, and they must be installed on your computer. However, if

you don't mind spending a little money, a standalone tool can come in

extremely handy.


4. ________type attribute contains the name of a notation declared in the

document's DTD.

5. _______ is the only attribute type that is not an XML keyword.

4.5 XML Namespaces

This section deals with the need for namespaces and with specifying namespaces.

4.5.1 The Need for Namespaces

XML Namespaces provide a method to avoid element name conflicts. In

XML, element names are defined by the developer. This often results in a

conflict when trying to mix XML documents from different XML applications.



This XML carries HTML table information:

<table>

<tr>

<td>Apples</td>

<td>Bananas</td>

</tr>

</table>

This XML carries information about a table (a piece of furniture):

<table>

<name>African Coffee Table</name>

<width>80</width>

<length>120</length>

</table>

If these XML fragments were added together, there would be a name

conflict. Both contain a <table> element, but the elements have different

content and meaning.

4.5.2 Specifying a Namespace

An XML namespace is a collection of element names used in XML

documents. The name of a namespace usually has the form of a Uniform

Resource Identifier (URI). A namespace for the elements of the hierarchy

rooted at a particular element is declared as the value of the attribute

xmlns. The form of a namespace declaration for an element is shown here:

<element_name xmlns[:prefix] = "URI">

The square brackets indicate that what is within them is optional. The prefix,

if included, is the name that must be attached to the names in the declared

namespace.

Example:

<birds xmlns:bd = "http://www.audubon.org/names/species">

Within the birds element, including all of its children elements, the names

from the namespace must be prefixed with bd, as in the following:

<bd:lark>

One namespace declaration in an element can be used to declare a default

namespace. This is done by simply leaving out the prefix in the declaration.

The names from the default namespace can be used without a prefix.



Consider the following example in which two namespaces are declared. The

first is declared to be the default namespace; the second defines the prefix,

cap.

<states
    xmlns = "http://www.states-info.org/states"
    xmlns:cap = "http://www.states-info.org/state-capitals">
  <state>
    <name> South Dakota </name>
    <population> 754844 </population>
    <capital>
      <cap:name> Pierre </cap:name>
      <cap:population> 12429 </cap:population>
    </capital>
  </state>
</states>

Each state element has name and population elements from both

namespaces.
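How a parser keeps the two kinds of name apart can be sketched with Python's standard xml.etree.ElementTree module, which stores every element name together with its namespace URI. The document is a shortened form of the states example; the prefix st used in the lookup is a local choice made in the code and does not appear in the document.

```python
import xml.etree.ElementTree as ET

DOC = """<states xmlns="http://www.states-info.org/states"
        xmlns:cap="http://www.states-info.org/state-capitals">
  <state>
    <name>South Dakota</name>
    <capital><cap:name>Pierre</cap:name></capital>
  </state>
</states>"""

root = ET.fromstring(DOC)

# Prefixes used for searching; only the URIs have to match the document.
ns = {
    "st": "http://www.states-info.org/states",
    "cap": "http://www.states-info.org/state-capitals",
}

state_name = root.find("st:state/st:name", ns).text
capital_name = root.find("st:state/st:capital/cap:name", ns).text

print(root.tag)          # the URI in braces followed by the local name
print(state_name, "/", capital_name)
```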

4.5.3 URLs, URIs and URNs

URI (Uniform Resource Identifier):

The resource is the conceptual mapping to an entity or set of entities, not

necessarily the entity which corresponds to that mapping at any particular

instance in time. Thus, a resource can remain constant even when its

content – the entities, to which it currently corresponds – changes over time,

provided that the conceptual mapping is not changed in the process. An

identifier is an object that can act as a reference to something that has

identity. In the case of URI, the object is a sequence of characters with a

restricted syntax.

URL (Uniform Resource Locator):

It refers to the subset of URI that identify resources via a presentation of

their primary access mechanism (e.g., their network "location"), rather than

identifying the resource by name or by some other attribute(s) of that

resource.

URN (Uniform Resource Name):

It refers to the subset of URI that are required to remain globally unique and

persistent even when the resource ceases to exist or becomes unavailable.



A URN differs from a URL in that its primary purpose is persistent labeling

of a resource with an identifier. That identifier is drawn from one of a set of

defined namespaces, each of which has its own set name structure and

assignment procedures.
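The difference in shape between a URL and a URN shows up when both are split into their parts. A minimal sketch, assuming Python's standard urllib.parse module; the ISBN URN is only an illustrative value.

```python
from urllib.parse import urlparse

url = urlparse("http://www.audubon.org/names/species")
urn = urlparse("urn:isbn:0451450523")

# A URL names an access mechanism (scheme) and a network location.
print(url.scheme, url.netloc, url.path)

# A URN has no network location; it names a namespace (isbn) and an identifier.
print(urn.scheme, repr(urn.netloc), urn.path)
```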

4.5.4 Qualifying Names

In XML documents, some names may be given as qualified names,

defined as follows:

QName ::= (Prefix ':')? LocalPart

The Prefix provides the namespace prefix part of the qualified name, and

must be associated with a namespace URI in a namespace declaration. The

LocalPart provides the local name part of the qualified name.
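The production can be sketched in a few lines of Python. The helper split_qname is a hypothetical name introduced for this example; it separates the optional prefix from the local part and does not check that the prefix has actually been declared.

```python
def split_qname(qname):
    """Split a qualified name into (prefix, local_part).

    The prefix is None when the name is unprefixed.
    """
    prefix, colon, local_part = qname.partition(":")
    if not colon:
        # No colon at all: the whole name is the local part.
        return (None, qname)
    if not prefix or not local_part or ":" in local_part:
        # A QName allows at most one colon, with text on both sides.
        raise ValueError("not a valid qualified name: %r" % qname)
    return (prefix, local_part)


print(split_qname("cap:name"))
print(split_qname("population"))
```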

4.5.5 Namespace Scoping

The scope of a namespace declaration declaring a prefix extends from the

beginning of the start-tag in which it appears to the end of the corresponding

end-tag, excluding the scope of any inner declarations with the same

NameSpaceAttributeName part. In the case of an empty tag, the scope is

the tag itself.
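The scoping rule can be observed with a small Python sketch. The document and the two example.org namespace URIs below are invented for illustration. The prefix a is declared on the outer element and declared again on the inner element, so the inner declaration applies from the inner start-tag to the inner end-tag and the outer one applies everywhere else.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: prefix "a" is redeclared on <a:inner>.
DOC = """<a:outer xmlns:a="http://example.org/one">
  <a:inner xmlns:a="http://example.org/two">
    <a:leaf/>
  </a:inner>
  <a:sibling/>
</a:outer>"""

root = ET.fromstring(DOC)

# ElementTree reports each name as {namespace-URI}local-part,
# in document order.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```

The inner and leaf elements are reported in the second namespace, while sibling, which follows the inner end-tag, is back in the first.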

4.5.6 The HTML Namespace

Namespaces have the potential to allow new names to be introduced

without breaking validity, although this potential has not yet materialized in

the HTML world. HTML has an area of naming, separate from element and

attribute names, for which extensibility has not been completely addressed:

link types. These are used as values of the "rel" and "rev" attributes on "a"

and "link" elements, and they are strings drawn from a set determined by the

head's profile attribute. The linktype and profile mechanisms are rarely used,

and there is some perception that they have never been sufficiently defined.

They are, however, well enough defined to tie them to RDF's linking

mechanism, which is an area of considerable semantic precision and

extensibility. A profile can be used to make exactly this connection.

4.5.7 Additional Significant Namespaces

Namespaces, originally designed to provide names for XML elements and

attributes, have been adopted much more broadly by the web community.

They are now used not simply for elements and attributes but for function



names, tokens, and identifiers for ever more purposes. The names in a

namespace form a collection:

sometimes it is a collection of element names (DocBook and XHTML, for

example)

sometimes it is a collection of attribute names (XLink, for example)

sometimes it is a collection of functions (XQuery 1.0 and XPath 2.0 Data

Model)

sometimes it is a collection of properties (FOAF)

Sometimes it is a collection of concepts (WordNet), and many other

uses are likely to arise.

4.5.8 Validating Uniqueness

Namespaces are used to uniquely identify elements with the same name

type when they are combined in a single document. The W3C

recommendations envision applications of XML where a single XML

document may contain elements and attributes that are defined for and used

by multiple software modules. Documents combining multiple markup

vocabularies pose processor validation problems of recognition and name

collision (markups using the same element type or attribute name). The

"name collisions" are a problem when validating documents. This problem

is overcome when document constructs have universally unique names,

beyond the scope of the containing document. The XML namespaces

specification describes a mechanism to accomplish this.

4.5.9 Validating Required Fields

Some of the URLs that you will use with field substitution have restrictions

on the type of data that they will accept, so GrazrScript has a rule tag that

lets you test the contents of form fields before they are substituted within a

template. The validation rules created by this tag are tested after the form is

submitted. If the user's data conforms to the validation rules, then the

normal substitution is performed. If any of the entered data fails to pass the

rules, then an error message is displayed and the form template is not run.

The key is that this is an all or nothing process. All validation rules for all

fields must be met before substitution is done.

Basic syntax

The simplest version of the rule tag takes the following form:

<grazr:rule field="[field_name]" [rule]="[value]" />



4.5.10 Combining and Redefining Schemas

User demand for rich Web application content is continually increasing for

both desktop and mobile device platforms. Open, standards-based

functional XML schemas enabling rich content help ensure that such content

– and the skills required to produce it – remains ubiquitous, accessible, and

cost effective. Schemas also help ensure that this technology does not

become a proprietary format for a single or small number of vendors

constrained to specific programming frameworks or to specific renderer and

browser technologies.

XML-based, declarative functional schemas like XHTML, XForms, XML

Events, Scalable Vector Graphics (SVG), SMIL, VoiceXML, and XHTML

Mobile Profile are examples of schemas that provide specific functionality

for creating rich content.

Each functional schema pertains to a specific area of functionality. For

example, SVG addresses graphics; XForms addresses form input collection

and submission; XML Events addresses the creation of events and

listeners; and so on. However, most rich Web applications require a

combination of two or more of these functional schemas within a single

document.

Combining schemas can be problematic because not all schemas can be

embedded within other schemas. And not all schemas allow other schemas

to be embedded within them. In fact, most functional schemas assume that

they are the root schema in a single document with only one functional

namespace and that if the need arises for rich content from another

functional namespace, a separate document can be referenced with its own

root schema. For example, an XHTML document can reference an SVG

graphic in a separate document at runtime to render the graphic.


6. _______is a collection of element names used in XML documents.

7. _________ are used to uniquely identify elements with the same name

type when they are combined in a single document.



4.6 Summary

XML is a markup language much like HTML.

XML was designed to carry data, not to display data. XML is used in

many aspects of web development, often to simplify data storage and

sharing.

The purpose of a DTD is to define a standard form for a collection of

XML documents. This form is specified as the tag and attributes sets, as

well as rules that define how they can appear in a document.

XML Namespaces provide a method to avoid element name conflicts.

4.7 Terminal Questions


1. List out the various XML tools.

2. With an example explain the need for CDATA section in XML.

3. Explain the usage of internal and external DTDs in XML.

4. How do you specify an element in DTD?

5. With an example, explain how do you specify a namespace in XML.

4.8 Answers


1. XML

2. Document Type Declaration

3. CDATA Sections

4. NOTATION

5. Enumeration

6. XML namespace

7. Namespaces

Terminal Questions

1. XML applications are software programs that process and manipulate

data using XML technologies including XML, XSLT, XQuery. (Refer

section 4.2.5)



2. CDATA Sections are used to escape blocks of text containing

characters which would otherwise be recognized as markup. (Refer

section 4.3)

3. A DTD can be embedded in the XML document whose syntax rules it

describes, in which case it is called an internal DTD. (Refer section 4.4)

4. The element declarations of a DTD have a form that is related to that of

the rules of context-free grammars. (Refer section 4.4)

5. XML Namespaces provide a method to avoid element name conflicts.

(Refer section 4.5.1)



Unit 5 XML Programming – II

Structure

5.1 Introduction

Objectives

5.2 Validating XML Documents with Schemas

5.3 Introduction to Simple Object Access Protocol (SOAP)

SOAP's Use of XML and Schemas

Elements of a SOAP Message

Sending and Receiving SOAP Messages (SOAP Clients and

Receivers)

Handling SOAP Faults

Current SOAP Implementations

5.4 Introduction to Web Services

Architecture and Advantages of Web Services

Purpose of Web Services Description Language (WSDL)

WSDL Elements

Creating and Examining WSDL Files

Overview of Universal Description, Discovery, and Integration (UDDI)

UDDI Registries (Public and Private)

Core UDDI Elements

Deploying and Consuming Web Services

ebXML Specifications ebXML Registry and Repository

5.5 Introduction to the XML Document Object Model (XMLDOM)

5.6 Summary

5.7 Terminal Questions


5.8 Answers

5.1 Introduction

In the previous unit, we studied XML concepts, document syntax, DTDs,

NOTATION and namespaces.

A schema is similar to a class definition. In this unit you will study how

schemas are used in XML. You will also get an overview of SOAP (Simple

Object Access Protocol), an introduction to web services, and an overview

of the XML Document Object Model.

Objectives:


write XML Schema

describe the features of SOAP

give overview of Web Services

discuss the purpose of XML DOM

5.2 Validating XML Documents with Schemas

You are now ready to take a deeper look at the process of XML Schema

validation. This section shows you the steps you take to validate an XML

document using an XML Schema definition.

Schema Design Goals (Limitations of DTDs)

DTDs have several disadvantages.

DTDs are written in a syntax unrelated to XML, so they cannot be

analyzed with an XML processor. Also, it can be confusing to deal with

two different syntactic forms, one to define a document and one to

define its structure.

DTDs do not allow restrictions on the form of data that can be the

content of a particular tag.

With DTDs, there are only 10 data types, none of which is numeric.

Several alternatives to DTDs have been developed, all of them attempts to

overcome their weaknesses. XML schema, which was designed by W3C, is

one of these alternatives.

Mixing DTDs and Schemas

To promote the transition from DTDs to XML schemas, XML schema was

designed to allow any DTD to be automatically converted to an equivalent

XML schema. A schema specifies the data type of every element and

attribute of its instance XML documents. This is the area in which schemas

far outshine DTDs. A schema defines a namespace in the same sense as a

DTD defines a tag set.

Schema Composition

Schemas themselves are written using a collection of names, or a

vocabulary, from a namespace that is, in effect, a schema of schemas. The

name of this namespace is http://www.w3.org/2001/XMLSchema. Some of

the names in this namespace are element, schema, sequence and string.

Every schema has schema as its root element. The schema element

specifies the namespace for the schema of schemas from which the

schema’s elements and attributes will be drawn. It often also specifies a

prefix that will be used for the names in the schema. This namespace

specification appears as follows:

xmlns:xsd = "http://www.w3.org/2001/XMLSchema"

This provides the prefix xsd for the names from the namespace for the

schema of schemas. The name of the namespace defined by a schema

must be specified with the targetNamespace attribute of the schema

element. Every top-level element that appears in a schema places its name

in the target namespace. The target namespace is specified by assigning a

namespace to the target namespace attribute, as in the following:

targetNamespace = "http://cs.uccs.edu/planeSchema"

If we want the elements and attributes that are not defined directly in the

schema element to be included in the target namespace, schema’s

elementFormDefault must be set to qualified, as in the following:

elementFormDefault = "qualified"

The default namespace, which is the source of the unprefixed names in the

schema, is given with another xmlns specification, but this time without the

prefix. For example:

xmlns = "http://cs.uccs.edu/planeSchema"

An example of a complete opening tag for a schema is as follows:

<xsd:schema

xmlns:xsd = "http://www.w3.org/2001/XMLSchema"

targetNamespace = "http://cs.uccs.edu/planeSchema"

xmlns = "http://cs.uccs.edu/planeSchema"

elementFormDefault = "qualified"

>

In this example, the target namespace and the default namespace are the

same.

Linking Schemas to XML documents

First, an instance document normally defines its default namespace to be

that defined in its schema. For example, if the root element is planes, you

could have the following:

<planes

xmlns = "http://cs.uccs.edu/planeSchema"

… >

Second, the root element of the instance document must name the standard

namespace for instances, which is XMLSchema-instance. This namespace

corresponds to the XMLSchema namespace used for schemas. The following

attribute assignment specifies the XMLSchema-instance namespace and

defines the prefix, xsi, for it:

xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"

Then the instance document must specify the filename of the schema where

the default namespace is defined. This is accomplished with the

schemaLocation attribute, which takes two values: the namespace of the

schema and the filename of the schema. This attribute is defined in the

XMLSchema-instance namespace, so it must be named with the proper

prefix.

For example:

xsi:schemaLocation = "http://cs.uccs.edu/planeSchema planes.xsd"

This is a peculiar attribute assignment in that it assigns two values, which are

separated only by white space.

Altogether, the opening root tag of an XML instance of the planes.xsd

schema, where the root element name in the instance is planes, could

appear as follows:

<planes

xmlns = "http://cs.uccs.edu/planeSchema"

xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation = "http://cs.uccs.edu/planeSchema planes.xsd"

>
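To make the two-valued schemaLocation attribute concrete, here is a small Python sketch that reads it from a planes root element and splits it into namespace and file name pairs. The helper schema_locations is a hypothetical name, and the root element is written as an empty tag only to keep the example short. The sketch does not load or apply the schema; it only shows how the attribute is addressed through the xsi namespace.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Root tag of an instance of planes.xsd, closed immediately for brevity.
INSTANCE = """<planes
    xmlns="http://cs.uccs.edu/planeSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://cs.uccs.edu/planeSchema planes.xsd"/>"""


def schema_locations(root):
    """Map each namespace named in xsi:schemaLocation to its schema file."""
    raw = root.get("{%s}schemaLocation" % XSI, "")
    tokens = raw.split()  # the values are separated only by white space
    return dict(zip(tokens[0::2], tokens[1::2]))


root = ET.fromstring(INSTANCE)
locations = schema_locations(root)
print(locations)
```

Because the attribute belongs to the XMLSchema-instance namespace, it has to be looked up by its namespace URI; the prefix xsi itself is not visible to the program.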

Annotation Declarations

Annotation of schemas and schema components, with material for human or

computer consumption, is provided for by allowing application information

and human information at the beginning of most major schema elements,

and anywhere at the top level of schemas. The XML representation for an

annotation schema component is an <annotation> element information item.

The correspondences between the properties of that information item and

properties of the component it corresponds to are as follows:

<annotation

id = ID

{any attributes with non-schema namespace . . .}>

Content: (appinfo | documentation)*

</annotation>

Application Information – A sequence of the <appinfo> element information

items from among the [children], in order, if any, otherwise the empty

sequence.

Element Declarations

Elements are defined in an XML schema with the element tag, which is

from the XMLSchema namespace. The prefix xsd is normally used for

names from this namespace.

Example:

<xsd:element name = "engine" type = "xsd:string" />

Here the element name is “engine” and its type is string.

An instance of the schema in which the engine element is defined could

have the following element:

<engine> inline six cylinder fuel injected </engine>

Attribute Declarations

A named element declaration includes the name attribute for that purpose. The

other attribute that is necessary in a simple element declaration is type,

which is used to specify the type of content allowed in the element. For

example:

<xsd:element name = "engine" type = "xsd:string" />

An element can be given a default value using the default attribute. For

example:

<xsd:element name = "engine" type = "xsd:string" default = "fuel injected V-6" />

Elements can have constant values, meaning that the content of the defined

element in every instance document has the same value. Constant values

are given with the fixed attribute, as in the following example:

<xsd:element name = "plane" type = "xsd:string" fixed = "single wing" />

W3C Schema Data Types

XML schema defines 44 data types, 19 of which are primitive and 25 of

which are derived. The primitive data types include string, boolean, decimal,

float, time and anyURI. The predefined derived types include byte, long,

integer, unsignedInt, positiveInteger and NMTOKEN. User defined data types

are defined by specifying restrictions on an existing type, which is then called

a base type. Such user-defined types are derived types.

Constraints in derived types are given in terms of the facets of the base

type. For example, the integer data type has eight possible facets:

totalDigits, maxInclusive, maxExclusive, minInclusive, minExclusive, pattern,

enumeration and whiteSpace.

Data declarations in an XML schema can be either local or global. A local

declaration is one that appears inside an element that is a child of the

schema element; that is, a declaration in a grandchild element of schema is

a local declaration. A locally declared element is visible only in that element.

A global declaration is one that appears as a child of the schema element.

Global elements are visible in the whole schema in which they are declared.

Specifying Simple Types

A simple data type is one whose content is restricted to strings. A simple

type cannot have attributes or include nested elements. The string

restriction seems like it would make simple types a very narrow type

category, but in fact it does not because a large collection of predefined data

types are included in the category. The primitive data types include string,

boolean, decimal, float, time and anyURI. The predefined derived types

include byte, long, integer, unsignedInt, positiveInteger and NMTOKEN.

Example: <xsd:element name = "engine" type = "xsd:string" />

Regular Expressions

Regular expressions form a language for specifying sets of characters and

strings of characters. They are used in the context of pattern matching: a

regular expression forms a pattern against which strings are matched. The

pattern facet allows you to specify a regular expression. Most individual

characters match themselves. The pattern \d matches any digit. The pattern

“\d{3}” matches any sequence of 3 digits. The pattern 315-\d{3}-\d{4}

matches any telephone number in the 315 area code of the United States, in

the 315-123-4567 format.

Example: To define a simple type of US telephone numbers in the 315-123-

4567 format, use the pattern facet.

<xsd:simpleType name = "USPhoneType">

<xsd:restriction base = "xsd:string">

<xsd:pattern value = "\d{3}-\d{3}-\d{4}" />

</xsd:restriction>

</xsd:simpleType>
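The behaviour of the pattern facet can be approximated in Python with the re module. This is only a sketch: a schema pattern must match the whole value, which is why fullmatch is used rather than search, and the helper name is_us_phone is an assumption made for this example.

```python
import re

# Same regular expression as the pattern facet: 3 digits, 3 digits, 4 digits.
US_PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")


def is_us_phone(text):
    """Return True when the whole string has the 315-123-4567 shape."""
    return US_PHONE.fullmatch(text) is not None


for candidate in ("315-123-4567", "315-1234-567", " 315-123-4567"):
    print(repr(candidate), is_us_phone(candidate))
```

The last candidate fails because of its leading space, which is also why the value of a pattern facet should not contain stray white space.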

Working with User Defined Data Types

User defined data types are defined by specifying restrictions on an existing

type. A simple user-defined data type is described in a simpleType element,

using facets. Facets must be specified in the content of a restriction

element, which gives the base type name. The facets themselves are given

in elements named for the facets, using the value attribute to specify the

value of the facet. For example, the following declares a user-defined type,

firstName, for strings of fewer than 11 characters:

<xsd:simpleType name = "firstName">

<xsd:restriction base = "xsd:string">

<xsd:maxLength value = "10" />

</xsd:restriction>

</xsd:simpleType>
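What a validator does with the string length facets can be sketched as follows. The function check_length_facets and its parameter names are hypothetical; they stand for the maxLength facet used in firstName and for the related length and minLength facets.

```python
def check_length_facets(value, length=None, min_length=None, max_length=None):
    """Return True when the string satisfies every length facet supplied."""
    size = len(value)
    if length is not None and size != length:
        return False  # length demands an exact number of characters
    if min_length is not None and size < min_length:
        return False
    if max_length is not None and size > max_length:
        return False
    return True


# firstName allows at most 10 characters.
print(check_length_facets("Kulwinder", max_length=10))
print(check_length_facets("Balasubramani", max_length=10))
```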



The length facet is used to restrict the string to an exact number of

characters. The minLength facet is used to specify a minimum length. The

number of digits of a decimal number is restricted with the totalDigits facet

(called precision in early drafts of XML Schema). For example:

<xsd:simpleType name = "phoneNumber">

<xsd:restriction base = "xsd:decimal">

<xsd:totalDigits value = "7" />

</xsd:restriction>

</xsd:simpleType>

Union and List Types

List datatypes are special cases in which a structure is defined within the

content of a single attribute or element. The xs:list element is used to

define a list of items. The definition of a list datatype can be done by

embedding an xs:simpleType element:

<xs:simpleType name="myIntegerList">

<xs:list>

<xs:simpleType>

<xs:restriction base="xs:integer">

<xs:maxInclusive value="100"/>

</xs:restriction>

</xs:simpleType>

</xs:list>

</xs:simpleType>

This datatype can be used to define attributes or elements that accept a

whitespace-separated list of integers smaller than or equal to 100.

List datatypes have their own value space that can be constrained using a

set of specific facets that is common to all of them. These facets are

xs:length, xs:maxLength, xs:minLength, xs:enumeration and xs:whiteSpace.

The unit used to measure the length of a list type is always the number of

elements in the list.
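The value space of myIntegerList can be mimicked with a short Python sketch. The function parse_integer_list is a hypothetical helper: it splits the content on white space, checks each token against the integer lexical form, and applies the maxInclusive bound of 100 to every item.

```python
import re

INTEGER = re.compile(r"[+-]?[0-9]+")


def parse_integer_list(text, max_inclusive=100):
    """Parse a whitespace-separated list of integers, each <= max_inclusive."""
    items = []
    for token in text.split():
        if INTEGER.fullmatch(token) is None:
            raise ValueError("not an integer: %r" % token)
        value = int(token)
        if value > max_inclusive:
            raise ValueError("%d exceeds maxInclusive %d" % (value, max_inclusive))
        items.append(value)
    return items


values = parse_integer_list("1 20   100")
print(values, "length:", len(values))
```

The length printed at the end is the number of items, which is the unit that the length, minLength and maxLength facets use for list types.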

Derivation by union allows defining datatypes by merging the lexical spaces

of several predefined or user datatypes. The xs:union element is used for

defining the union of different types. The definition of a union datatype can

be done by embedding an xs:simpleType element:



<xs:simpleType name="myIntegerUnion">

<xs:union>

<xs:simpleType>

<xs:restriction base="xs:integer"/>

</xs:simpleType>

<xs:simpleType>

<xs:restriction base="xs:NMTOKEN">

<xs:enumeration value="undefined"/>

</xs:restriction>

</xs:simpleType>

</xs:union>

</xs:simpleType>

A value of type myIntegerUnion is therefore valid if it is either an integer

or the token undefined.
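A union accepts a value as soon as one of its member types accepts it. The following Python sketch shows that idea for myIntegerUnion; the function name is an assumption made for this example, and the integer test is a simplified version of the xs:integer lexical form.

```python
import re

INTEGER = re.compile(r"[+-]?[0-9]+")


def is_integer_or_undefined(text):
    """Accept a value if either member type of the union accepts it."""
    token = text.strip()  # surrounding white space is not significant here
    if INTEGER.fullmatch(token) is not None:
        return True  # first member type: xs:integer
    return token == "undefined"  # second member type: the enumerated token


for candidate in ("42", "undefined", "unknown"):
    print(candidate, is_integer_or_undefined(candidate))
```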

Specifying Complex Types

Complex types are defined with the complexType tag. The elements that are

the content of an element-only element must be contained in an ordered

group, an unordered group, a choice, or a named group. The sequence

element is used to contain an ordered group of elements. For example,

consider the following type definition:

<xsd:complexType name = "sports_car">

<xsd:sequence>

<xsd:element name = "make" type = "xsd:string" />

<xsd:element name = "model" type = "xsd:string" />

<xsd:element name = "engine" type = "xsd:string" />

<xsd:element name = "year" type = "xsd:decimal" />

</xsd:sequence>

</xsd:complexType>

The type sports_car is a complex type whose four child elements must

appear in the order given.

A complex type whose elements are an unordered group is defined in an all

element.

Elements, and the all and sequence groups, can include attributes to specify

the number of occurrences. These attributes are minOccurs and maxOccurs.

The possible values of minOccurs are the non-negative integers, including

zero. The possible values for maxOccurs are the non-negative integers plus

the value unbounded. Consider the following example:

<xsd:element name = "planes">

<xsd:complexType>

<xsd:sequence>

<xsd:element name = "make"

type = "xsd:string"

minOccurs = "1"

maxOccurs = "unbounded"

/>

</xsd:sequence>

</xsd:complexType>

</xsd:element>

Notice that we use the sequence element to contain the single element of the

complex type, planes. An all group would not work here: in XML Schema 1.0,

an element inside an all group may occur at most once, so a maxOccurs

value of unbounded is not permitted there.
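The meaning of the occurrence attributes can be sketched in Python by counting child elements. The helper occurrences_ok is a hypothetical name, None stands for the schema value unbounded, and the two aircraft makes in the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET


def occurrences_ok(parent, child_tag, min_occurs=1, max_occurs=1):
    """Check a child element count against minOccurs and maxOccurs.

    max_occurs=None plays the role of maxOccurs = "unbounded".
    """
    count = len(parent.findall(child_tag))
    if count < min_occurs:
        return False
    return max_occurs is None or count <= max_occurs


planes = ET.fromstring("<planes><make>Cessna</make><make>Piper</make></planes>")

# minOccurs = 1 and maxOccurs = unbounded, as declared for make.
print(occurrences_ok(planes, "make", min_occurs=1, max_occurs=None))
# With the default bounds (exactly once) the same document is rejected.
print(occurrences_ok(planes, "make"))
```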

Deriving Complex Types Using Inheritance

If we want the year element in the sports_car type that was defined

earlier to be a derived type, we could define the derived type in another

global element and refer to it in the sports_car type. For example, the

year element could be defined as follows:

<xsd:element name = "year">

<xsd:simpleType>

<xsd:restriction base = "xsd:decimal">

<xsd:minInclusive value = "1900" />

<xsd:maxInclusive value = "2002" />

</xsd:restriction>

</xsd:simpleType>

</xsd:element>

With the year element defined globally, the sports_car type can be

defined with a reference to the year with the ref attribute as follows:

<xsd:complexType name = "sports_car">

<xsd:sequence>

<xsd:element name = "make" type = "xsd:string" />

<xsd:element name = "model" type = "xsd:string" />

<xsd:element name = "engine" type = "xsd:string" />

<xsd:element ref = "year" />

</xsd:sequence>

</xsd:complexType>

Reusable Groups

Elements and Attributes can be grouped together using <xs:group> and

<xs:attributeGroup>. These groups can then be referred to elsewhere within

the schema. Groups must have a unique name and be defined as children

of the <xs:schema> element. When a group is referred to, it is as if its

contents have been copied into the location it is referenced from.

<xs:group name="CustomerDataGroup">

<xs:sequence>

<xs:element name="Forename" type="xs:string" />

<xs:element name="Surname" type="xs:string" />

<xs:element name="Dob" type="xs:date" />

</xs:sequence>

</xs:group>

<xs:attributeGroup name="DobPropertiesGroup">

<xs:attribute name="Day" type="xs:string" />

<xs:attribute name="Month" type="xs:string" />

<xs:attribute name="Year" type="xs:integer" />

</xs:attributeGroup>

Substitution Groups

In this case, we have a simple type on one hand and a complex type with

complex content on the other, and we cannot find a type that can be

extended to both. We have no other choice but to start with the universal

type, which accepts any content model. Known as xs:anyType, this very

special type is also the default value when no type is specified, and we can

define a generic name element without giving any type definition to keep it

as open as possible:

<xs:element name="name"/>

This element will be what is known as the head of the substitution group.

Without declaring anything on this head element, other elements can

declare that they can be used wherever the head element is referenced in



the schema. These elements are known as the members of the substitution

group. The one restriction on the members is their types must be valid

derivations of the type of the head element. This declaration is made

through a substitutionGroup attribute that references the head element in

each interchangeable element – for instance:

<xs:element name="simple-name" type="string32"

substitutionGroup="name"/>

<xs:element name="full-name" substitutionGroup="name">

<xs:complexType>

<xs:all>

<xs:element name="first" type="string32" minOccurs="0"/>

<xs:element name="middle" type="string32" minOccurs="0"/>

<xs:element name="last" type="string32"/>

</xs:all>

</xs:complexType>

</xs:element>

The effect of these declarations is that these two elements can be used every

time the head is used in the schema, such as in the definition of the

character and author elements:

<xs:element name="character">

<xs:complexType>

<xs:sequence>

<xs:element ref="name"/>

<xs:element ref="born"/>

<xs:element ref="qualification"/>

</xs:sequence>

</xs:complexType>

</xs:element>

Identity Elements

XML Schemas provide a feature that is similar to the DTD ID identity

constraint. In a DTD, the value of an ID attribute must be unique within an

XML document. In XML Schemas, the type of an identity constraint can be

unique, key, or keyref.



A unique identity constraint forces the result of evaluation of an XPath

expression to be unique. A validating processor evaluates the XPath

expression against the element for which the identity constraint is defined.

If the element is present, the result must be unique among the children of

that element.

A key identity constraint specifies that the fields that form the expression

must be present in all instance documents. For example, if a key is

based on date and number attributes, the date and number attributes

must always be specified.

A keyref identity constraint is equivalent to the IDREF attribute in DTDs.

It specifies that the contents of a field in the instance document are the

value of a key that is defined in another document. For example, a

Quote document would have a reference to the RFQ that originated it.
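The checks behind unique and key constraints can be sketched in Python. The orders document, the selector order and the helper duplicate_keys are all invented for this example; the key is built from the date and number attributes mentioned above. A missing field raises an error (the key rule), and repeated field values are reported (the uniqueness rule).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical instance document: the third order repeats the first key.
ORDERS = """<orders>
  <order date="2009-06-10" number="1"/>
  <order date="2009-06-10" number="2"/>
  <order date="2009-06-10" number="1"/>
</orders>"""


def duplicate_keys(root, selector, fields):
    """Return the key values that occur more than once among selected elements."""
    seen = Counter()
    for element in root.findall(selector):
        key = tuple(element.get(field) for field in fields)
        if None in key:
            # A key constraint requires every field to be present.
            raise ValueError("key field missing on <%s>" % element.tag)
        seen[key] += 1
    return [key for key, count in seen.items() if count > 1]


root = ET.fromstring(ORDERS)
duplicates = duplicate_keys(root, "order", ("date", "number"))
print(duplicates)
```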


1. DTDs are written in a syntax that cannot be analyzed with ______.

2. The name of the namespace defined by a schema must be specified

with the _________ attribute of the schema element.

3. Data definitions in an XML schema can be either ____ or ______.

4. Complex types are defined with _____ tag.

5. In XML schemas, the type of an identity constraint can be ___, ____, or

____.

5.3 Introduction to Simple Object Access Protocol (SOAP)

SOAP is an XML-based protocol for exchanging information between

computers. Although SOAP can be used in a variety of messaging systems

and can be delivered via a variety of transport protocols, the initial focus of

SOAP is remote procedure calls transported via HTTP. SOAP therefore

enables client applications to easily connect to remote services and invoke

remote methods. For example, a client application can immediately add

language translation to its feature set by locating the correct SOAP service

and invoking the correct method.

Other frameworks, including CORBA, DCOM, and Java RMI, provide similar

functionality to SOAP, but SOAP messages are written entirely in XML and

are therefore uniquely platform- and language-independent. For example, a



SOAP Java client running on Linux or a Perl client running on Solaris can

connect to a Microsoft SOAP server running on Windows 2000.

SOAP therefore represents a cornerstone of the web service architecture,

enabling diverse applications to easily exchange services and data.

5.3.1 SOAP’s Use of XML and Schemas

When exploring the SOAP encoding rules, it is important to note that the

XML 1.0 specification does not include rules for encoding data types. The

original SOAP specification therefore had to define its own data encoding

rules. Subsequent to early drafts of the SOAP specification, the W3C

released the XML Schema specification. The XML Schema Data types

specification provides a standard framework for encoding data types within

XML documents. The SOAP specification therefore adopted the XML

Schema conventions. However, even though the latest SOAP specification

adopts all the built-in types defined by XML Schema, it still maintains its own

convention for defining constructs not standardized by XML Schema, such

as arrays and references.

5.3.2 Elements of a SOAP Message

A one-way message, a request from a client, or a response from a server is

officially referred to as a SOAP message. Every SOAP message has a

mandatory Envelope element, an optional Header element, and a

mandatory Body element.

Figure 5.1: Main elements of the XML SOAP message



Envelope

Every SOAP message has a root Envelope element. In contrast to other

specifications, such as HTTP and XML, SOAP does not define a traditional

versioning model based on major and minor release numbers (e.g., HTTP

1.0 versus HTTP 1.1). Rather, SOAP uses XML namespaces to differentiate

versions. The version must be referenced within the Envelope element. For

example:

<SOAP-ENV:Envelope

xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">

Header

The optional Header element offers a flexible framework for specifying

additional application-level requirements. Many current SOAP services do

not utilize the Header element, but as SOAP services mature, the Header

framework provides an open mechanism for authentication, transaction

management, and payment authorization.

The protocol does, however, specify two header attributes:

Actor attribute

The SOAP protocol defines a message path as a list of SOAP service

nodes. Each of these intermediate nodes can perform some processing and

then forward the message to the next node in the chain. By setting the Actor

attribute, the client can specify the recipient of the SOAP header.

MustUnderstand attribute

Indicates whether a Header element is optional or mandatory. If set to true,

the recipient must understand and process the Header element according to

its defined semantics, or return a fault.

Body

The Body element is mandatory for all SOAP messages. Typical uses of the

Body element include RPC requests and responses.

Fault

In the event of an error, the Body element will include a Fault element.
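The overall shape of a SOAP message can be assembled with Python's standard library, as in the sketch below. The function build_envelope and the ping and transaction elements are hypothetical names used only for illustration. The sketch places an optional Header before the mandatory Body inside the Envelope and marks the header entry with the mustUnderstand attribute.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# Ask ElementTree to use the conventional prefix when serializing.
ET.register_namespace("SOAP-ENV", SOAP_ENV)


def build_envelope(body_children, header_children=()):
    """Build Envelope -> (optional Header) -> Body around the given elements."""
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    if header_children:
        header = ET.SubElement(envelope, "{%s}Header" % SOAP_ENV)
        header.extend(header_children)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    body.extend(body_children)
    return envelope


# A made-up header entry that the recipient is required to understand.
transaction = ET.Element("transaction")
transaction.set("{%s}mustUnderstand" % SOAP_ENV, "1")
transaction.text = "5"

envelope = build_envelope([ET.Element("ping")], [transaction])
message = ET.tostring(envelope, encoding="unicode")
print(message)
```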

5.3.3 Sending and Receiving SOAP Messages (SOAP Clients and

Receivers)

SOAP can be used in a variety of messaging systems, including one-way

and two way messaging. For two-way messaging, SOAP defines a simple



convention for representing remote procedure calls and responses. This

enables a client application to specify a remote method name, include any

number of parameters, and receive a response from the server.

To examine the specifics of the SOAP protocol, we begin by presenting a

sample SOAP conversation. XMethods.net provides a simple weather

service, listing the current temperature by zip code. The service method,
getTemp, requires a zip code string and returns a single float value.

The SOAP Request

The client request must include the name of the method to invoke and any

required parameters. Here is a sample client request sent to XMethods:

<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:getTemp
        xmlns:ns1="urn:xmethods-Temperature"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <zipcode xsi:type="xsd:string">10016</zipcode>
    </ns1:getTemp>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

First, the request includes a single mandatory Envelope element, which in

turn includes a mandatory Body element. Second, a total of four XML

namespaces are defined. The Body element encapsulates the main

"payload" of the SOAP message. The only element is getTemp, which is

tied to the XMethods namespace and corresponds to the remote method

name. Each parameter to the method appears as a subelement. In our
case, we have a single zipcode element, which is assigned the XML
Schema xsd:string data type and set to 10016.
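As an illustration, the same request can be assembled programmatically. The following is a minimal Python sketch using only the standard library; the function name build_get_temp_request is hypothetical, and the code only builds the message, it does not contact any service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"

ET.register_namespace("SOAP-ENV", SOAP_ENV)
ET.register_namespace("xsi", XSI)


def build_get_temp_request(zipcode):
    """Build the getTemp request envelope as a UTF-8 encoded byte string."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    # xsd appears only inside an attribute value ("xsd:string"), so
    # ElementTree would not declare it on its own; declare it explicitly.
    envelope.set("xmlns:xsd", XSD)
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The method element is tied to the service namespace; the prefix
    # chosen by the serializer may differ from ns1, which is harmless.
    call = ET.SubElement(body, "{urn:xmethods-Temperature}getTemp")
    call.set(f"{{{SOAP_ENV}}}encodingStyle", SOAP_ENC)
    argument = ET.SubElement(call, "zipcode")
    argument.set(f"{{{XSI}}}type", "xsd:string")
    argument.text = zipcode
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


request = build_get_temp_request("10016")
print(request.decode("utf-8"))
```

The resulting bytes would normally be sent as the body of an HTTP POST to the service address.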



The SOAP Response

Here is the SOAP response from XMethods:


<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:getTempResponse
        xmlns:ns1="urn:xmethods-Temperature"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:float">71.0</return>
    </ns1:getTempResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>


Just like the request, the response includes Envelope and Body elements,

and the same four XML namespaces. This time, however, the Body element

includes a single getTempResponse element, corresponding to our initial

request. The response element includes a single return element, indicating

an xsd:float data type. As of this writing, the temperature for zip code 10016

is 71 degrees Fahrenheit.
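On the client side, reading the result means locating the return element inside the Body. A minimal Python sketch, assuming the response text is already in hand (the helper name read_temperature and the prefixes in NS are choices made for this example):

```python
import xml.etree.ElementTree as ET

RESPONSE = """<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:getTempResponse xmlns:ns1="urn:xmethods-Temperature"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:float">71.0</return>
    </ns1:getTempResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

# Prefixes used in the search path; they need not match the message's own.
NS = {"soap": "http://schemas.xmlsoap.org/soap/envelope/",
      "t": "urn:xmethods-Temperature"}


def read_temperature(xml_text):
    """Extract the float carried by the return element of getTempResponse."""
    envelope = ET.fromstring(xml_text)
    value = envelope.find("soap:Body/t:getTempResponse/return", NS)
    return float(value.text)


print(read_temperature(RESPONSE))
```

Matching is done on namespace URIs rather than prefixes, which is why the search path can use soap and t while the message uses SOAP-ENV and ns1.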

5.3.4 Handling SOAP Faults

In the event of an error, the Body element will include a Fault element. The
fault subelements include the faultcode, faultstring, faultactor, and detail
elements.

faultcode – A text code used to indicate a class of errors.

faultstring – A human-readable explanation of the error.

faultactor – A text string indicating who caused the fault. This is useful if
the SOAP message travels through several nodes in the SOAP message
path, and the client needs to know which node caused the error.

detail – An element used to carry application-specific error messages.

The following code is a sample Fault. The client has requested a method

named ValidateCreditCard, but the service does not support such a method.



This represents a client request error, and the server returns the following

SOAP response:


<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <SOAP-ENV:Body>
    <SOAP-ENV:Fault>
      <faultcode xsi:type="xsd:string">SOAP-ENV:Client</faultcode>
      <faultstring xsi:type="xsd:string">
        Failed to locate method (ValidateCreditCard) in class
        (examplesCreditCard) at /usr/local/ActivePerl-5.6/lib/
        site_perl/5.6.0/SOAP/Lite.pm line 1555.
      </faultstring>
    </SOAP-ENV:Fault>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
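A client typically checks every response for a Fault before reading the result. The following Python sketch shows one way this could be done; the SoapFault exception class and the check_for_fault function are hypothetical names, and the fault text is shortened for the example.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

FAULT_RESPONSE = f"""<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">
  <SOAP-ENV:Body>
    <SOAP-ENV:Fault>
      <faultcode>SOAP-ENV:Client</faultcode>
      <faultstring>Failed to locate method (ValidateCreditCard)</faultstring>
    </SOAP-ENV:Fault>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


class SoapFault(Exception):
    """Raised when a response Body carries a Fault element."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def check_for_fault(xml_text):
    """Raise SoapFault if the response contains a Fault; otherwise return None."""
    envelope = ET.fromstring(xml_text)
    fault = envelope.find(f"{{{SOAP_ENV}}}Body/{{{SOAP_ENV}}}Fault")
    if fault is not None:
        raise SoapFault(fault.findtext("faultcode"),
                        fault.findtext("faultstring"))


try:
    check_for_fault(FAULT_RESPONSE)
except SoapFault as error:
    print("Service reported a fault:", error)
```

Turning the Fault into an exception keeps the error path separate from the normal code that reads the return value.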


5.3.5 Current SOAP Implementations

Dozens of SOAP implementations are now freely available on the Internet. Here are

four of the most popular and widely cited implementations.

Apache SOAP (http://xml.apache.org/soap/)

Open source Java implementation of the SOAP protocol; based on the IBM

SOAP4J implementation.

Microsoft SOAP ToolKit 2.0

(http://msdn.microsoft.com/soap/default.asp)

COM implementation of the SOAP protocol for C#, C++, Visual Basic, or

other COM-compliant languages.

SOAP::Lite for Perl (http://www.soaplite.com/)

Perl implementation of the SOAP protocol, written by Paul Kulchenko, that

includes support for WSDL and UDDI.

GLUE from the Mind Electric (http://www.themindelectric.com)



Java implementation of the SOAP protocol that includes support for WSDL
and UDDI.


6. SOAP is an XML-based protocol for _________.

7. Elements of SOAP are ____, ____, and _____.

8. The Body element encapsulates ______ of the SOAP message.

5.4 Introduction to Web Services

A web service is any service that is available over the Internet, uses a

standardized XML messaging system, and is not tied to any one operating

system or programming language. With web services, we move from a

human-centric Web to an application-centric Web. It means that

conversations can take place directly between applications as easily as

between web browsers and servers.

There are numerous areas where an application-centric Web could prove

extremely helpful. Examples include credit card verification, package

tracking, portfolio tracking, shopping bots, currency conversion, and

language translation. Other options include centralized repositories for

personal information, such as Microsoft's proposed .NET MyServices

project. .NET MyServices aims to centralize calendar, email, and credit card

information and to provide web services for sharing that data.

5.4.1 Architecture and Advantages of Web Services

There are two ways to view the web service architecture. The first is to

examine the individual roles of each web service actor; the second is to

examine the emerging web service protocol stack.

Web Service Roles

Figure 5.2: Web Service Roles



There are three major roles within the web service architecture:

Service provider

This is the provider of the web service. The service provider implements the

service and makes it available on the Internet.

Service requestor

This is any consumer of the web service. The requestor utilizes an existing

web service by opening a network connection and sending an XML request.

Service registry

This is a logically centralized directory of services. The registry provides a

central place where developers can publish new services or find existing

ones. It therefore serves as a centralized clearinghouse for companies and

their services.

Web Service Protocol Stack

Figure 5.3: Web service protocol stack

A second option for viewing the web service architecture is to examine the

emerging web service protocol stack. The stack is still evolving, but currently

has four main layers.

Following is a brief description of each layer.

Service transport

This layer is responsible for transporting messages between applications.

Currently, this layer includes Hypertext Transfer Protocol (HTTP), Simple Mail
Transfer Protocol (SMTP), File Transfer Protocol (FTP), and newer protocols,
such as Blocks Extensible Exchange Protocol (BEEP).



XML messaging

This layer is responsible for encoding messages in a common XML format

so that messages can be understood at either end. Currently, this layer

includes XML-RPC and SOAP.

Service description

This layer is responsible for describing the public interface to a specific web

service. Currently, service description is handled via the Web Services
Description Language (WSDL).

Service discovery

This layer is responsible for centralizing services into a common registry,

and providing easy publish/find functionality. Currently, service discovery is

handled via Universal Description, Discovery, and Integration (UDDI).

5.4.2 Purpose of Web Services Description Language (WSDL)

WSDL currently represents the service description layer within the web

service protocol stack. WSDL is an XML grammar for specifying a public

interface for a web service. This public interface can include information on

all publicly available functions, data type information for all XML messages,

binding information about the specific transport protocol to be used, and

address information for locating the specified service.

WSDL is not necessarily tied to a specific XML messaging system, but it

does include built-in extensions for describing SOAP services. Using WSDL,

a client can locate a web service and invoke any of the publicly available

functions. With WSDL-aware tools, this process can be entirely automated,

enabling applications to easily integrate new services with little or no manual

code.

5.4.3 WSDL Elements

WSDL is an XML grammar for describing web services. The specification is

divided into six major elements:

Definitions

The definitions element must be the root element of all WSDL documents.

It defines the name of the web service, declares multiple namespaces used

throughout the remainder of the document, and contains all the service

elements.



Types

The types element describes all the data types used between the client and

server. WSDL is not tied exclusively to a specific typing system, but it uses

the W3C XML Schema specification as its default choice. If the service uses

only XML Schema built-in simple types, such as strings and integers, the

types element is not required.

Message

The message element describes a one-way message, whether it is a single

message request or a single message response. It defines the name of the

message and contains zero or more message part elements, which can

refer to message parameters or message return values.

PortType

The portType element combines multiple message elements to form a

complete one-way or round-trip operation. For example, a portType can

combine one request and one response message into a single

request/response operation, most commonly used in SOAP services.

Binding

The binding element describes the concrete specifics of how the service

will be implemented on the wire. WSDL includes built-in extensions for

defining SOAP services, and SOAP-specific information therefore goes

here.

Service

The service element defines the address for invoking the specified service.

Most commonly, this includes a URL for invoking the SOAP service.

In addition to the six major elements, the WSDL specification also defines

the following utility elements:

Documentation

The documentation element is used to provide human-readable

documentation and can be included inside any other WSDL element.

Import

The import element is used to import other WSDL documents or XML

Schemas. This enables more modular WSDL documents. For example, two

WSDL documents can import the same basic elements and yet include their

own service elements to make the same service available at two physical



addresses. Note, however, that not all WSDL tools support the import
functionality yet.
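To see how these elements fit together, the Python sketch below parses a small WSDL document and lists what each major element declares. The WSDL text is a hypothetical, abridged description of a getTemp service written for this example (names such as Weather_PortType and the example.com address are invented); a real binding would also describe each operation.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Hypothetical, abridged WSDL; all names and the address are invented.
WSDL = """<definitions name="WeatherService"
    targetNamespace="urn:example-weather"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="urn:example-weather"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <message name="getTempRequest">
    <part name="zipcode" type="xsd:string"/>
  </message>
  <message name="getTempResponse">
    <part name="return" type="xsd:float"/>
  </message>
  <portType name="Weather_PortType">
    <operation name="getTemp">
      <input message="tns:getTempRequest"/>
      <output message="tns:getTempResponse"/>
    </operation>
  </portType>
  <binding name="Weather_Binding" type="tns:Weather_PortType">
    <soap:binding style="rpc"
        transport="http://schemas.xmlsoap.org/soap/http"/>
  </binding>
  <service name="Weather_Service">
    <port name="Weather_Port" binding="tns:Weather_Binding">
      <soap:address location="http://example.com/soap/weather"/>
    </port>
  </service>
</definitions>"""


def summarize(wsdl_text):
    """Map each major WSDL element to the names declared in the document."""
    root = ET.fromstring(wsdl_text)
    summary = {"definitions": [root.get("name")]}
    for kind in ("types", "message", "portType", "binding", "service"):
        found = root.findall(f"{{{WSDL_NS}}}{kind}")
        summary[kind] = [element.get("name") for element in found]
    return summary


for kind, names in summarize(WSDL).items():
    print(kind, names)
```

The types entry comes back empty here because the sample service uses only the built-in xsd:string and xsd:float types, so no types element is needed.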

5.4.4 Creating and Examining WSDL Files

One of the best aspects of WSDL is that you rarely have to create WSDL

files from scratch. A whole host of tools currently exists for transforming

existing services into WSDL descriptions. You can then choose to use these

WSDL files as is or manually tweak them with your favorite text editor. Given

the WSDL file, you could manually create a SOAP client to invoke the

service. A better alternative is to automatically invoke the service via a

WSDL invocation tool. Many WSDL invocation tools already exist. For
example, the GLUE platform provides extensive support for SOAP, WSDL, and
UDDI.

5.4.5 Overview of Universal Description, Discovery, and Integration

(UDDI)

UDDI is a technical specification for describing, discovering, and integrating

web services. UDDI is therefore a critical part of the emerging web service

protocol stack, enabling companies to both publish and find web services.

At its core, UDDI consists of two parts. First, UDDI is a technical

specification for building a distributed directory of businesses and web

services. Data is stored within a specific XML format, and the UDDI

specification includes API details for searching existing data and publishing

new data. Second, the UDDI Business Registry is a fully operational

implementation of the UDDI specification.

The data captured within UDDI is divided into three main categories:

White pages

This includes general information about a specific company - for example,

business name, business description, contact information, address and

phone numbers. It can also include unique business identifiers.

Yellow pages

This includes general classification data for either the company or the

service offered. For example, this data may include industry, product, or

geographic codes based on standard taxonomies.



Green pages

This category contains technical information about a web service. Generally,

this includes a pointer to an external specification and an address for

invoking the web service. UDDI is not restricted to describing web services

based on SOAP. Rather, UDDI can be used to describe any service, from a

single web page or email address all the way up to SOAP, CORBA, and

Java RMI services.

5.4.6 UDDI Registries (Public and Private)

UDDI manages the discovery of Web services by relying on a distributed

registry of businesses and their service descriptions implemented in a

common XML format. Before you can publish your business entity and Web

service to a public registry, you must first register your business entity with a

UDDI registry.

UDDI registries come in two forms: public and private. Both types comply with

the same specifications. A private registry enables you to publish and test

your internal e-business applications in a secure, private environment. A

public registry is a collection of peer directories that contain information

about businesses and services. It locates services that are registered at one

of its peer nodes and facilitates the discovery of published Web services.

Data is replicated at each of the registries on a regular basis. This ensures

consistency in service description formats and makes it easy to track

changes as they occur.

5.4.7 Core UDDI Elements

The UDDI technical architecture consists of three parts:

UDDI data model: An XML Schema for describing businesses and web

services.

UDDI API: A SOAP-based API for searching and publishing UDDI data.

UDDI cloud services: Operator sites that provide implementations of the

UDDI specification and synchronize all data on a scheduled basis. UDDI

cloud services are currently provided by Microsoft and IBM. The current

cloud services provide a logically centralized, but physically distributed,

directory. This means that data submitted to one root node will automatically

be replicated across all the other root nodes. Currently, data replication

occurs every 24 hours.



5.4.8 Deploying and Consuming Web Services

Unlike a Web site, you can't just change your Web Service when you feel

like it. If there are others consuming your Web Service, you must make sure

you keep the interfaces to the Web Service the same. That is, you can

change your implementation without anyone knowing, but changing a Web

Service interface such as adding, deleting, or modifying the parameters of a

Web method will break consuming applications. So make sure you have an

upgrade and migration approach in mind before you put your Web Service

out there for all to consume. For example, you might choose to maintain

multiple versions of your Web Services for backward compatibility.

You can consume .NET Web Services on Windows 98 and above. The

consuming machine needs the .NET framework, which can be installed as

part of an application installation.

5.4.9 ebXML Specifications: ebXML Registry and Repository

The ebXML (Electronic Business using XML) specifications enable

enterprises of any size and in any geographical location to conduct business

over the Internet. ebXML specifications may be divided into design-time and
run-time specifications. ebXML is not a business language standard; rather,
it is an infrastructure or middleware standard. ebXML does not force you to
use any specific business process to exchange business documents. It
merely provides a specification for defining "business collaborations" (ebXML
BPSS). Once you know what you want to do (document, process, and
transport), ebXML CPP (Collaboration Protocol Profile) provides you with a
formal way to express your capabilities. The message transport options are
chosen from those of the ebXML Messaging Service (ebXML MS)
specification. Two partners may decide to do business if they support the
same documents, processes, and transports. This is expressed as a CPA
(Collaboration Protocol Agreement).

Together, the ebXML Registry and Messaging standards provide the

mechanism to discover and retrieve documents, templates, and software

(i.e., objects and resources) and exchange these documents in a secure

and reliable manner. Specifically the ebXML Registry specifications define

interoperable registries and repositories with an interface that enables

submission, query, and retrieval on the contents of the registry. The ebXML



Messaging specification provides a secure and reliable method for
exchanging electronic business messages.

The registry part of ebXML Registry/Repository provides an interface to

query information of the ebXML Registry/Repository whereas the repository

part of the ebXML Registry/Repository is in charge of storing data.


9. A ______ is a service available over the Internet.

10. _____ is a logically centralized directory of services.

11. UDDI is a technical specification for ____, ____ and ____ web

services.

5.5 Introduction to the XML Document Object Model (XMLDOM)

The W3C Document Object Model (DOM) is a platform and language-

neutral interface that allows programs and scripts to dynamically access and

update the content, structure, and style of a document. It defines the logical

structure of documents and the way a document is accessed and

manipulated. With the Document Object Model, programmers can build

documents, navigate their structure, and add, modify, or delete elements

and content. Anything found in an HTML or XML document can be

accessed, changed, deleted, or added using the Document Object Model.

Data Object Tree

The DOM is a programming API for documents. It closely resembles the

structure of the documents it models. For instance, consider this table, taken

from an HTML document:

<TABLE>

<TBODY>

<TR><TD>Shady Grove</TD>

<TD>Aeolian</TD>

</TR>

<TR> <TD>Over the River, Charlie</TD>

<TD>Dorian</TD>

</TR>

</TBODY>

</TABLE>



The DOM represents this table like this:

Fig. 5.4: DOM representation of the example table
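The tree structure can also be inspected programmatically. The Python sketch below uses the standard xml.dom.minidom module, one DOM implementation among many, to parse the table and print an indented outline. It relies on the table markup being well-formed XML, and the outline function is a helper written for this example.

```python
from xml.dom import minidom

TABLE = """<TABLE>
<TBODY>
<TR><TD>Shady Grove</TD>
<TD>Aeolian</TD>
</TR>
<TR><TD>Over the River, Charlie</TD>
<TD>Dorian</TD>
</TR>
</TBODY>
</TABLE>"""


def outline(node, depth=0):
    """Return indented lines naming each element and non-blank text node."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + repr(node.data.strip()))
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines


document = minidom.parseString(TABLE)
print("\n".join(outline(document.documentElement)))
```

The line breaks between tags are themselves text nodes in the DOM; the sketch skips them so that the outline shows only elements and visible text.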

XMLDOM Parsers

All modern browsers have a built-in XML parser that can be used to read
and manipulate XML. The parser reads XML into memory and converts it
into an XML DOM object that can be accessed with JavaScript.

There are some differences between Microsoft's XML parser and the

parsers used in other browsers. The Microsoft parser supports loading of

both XML files and XML strings (text), while other browsers use separate

parsers. However, all parsers contain functions to traverse XML trees,

access, insert, and delete nodes.

The Top-Level Document Object

A top-level Document instance is the root of the tree, and has a single child

which is the top-level Element instance; this Element has child nodes

representing the content and any sub-elements, which may in turn have

further children and so forth. There are different classes for everything that

can be found in an XML document, so in addition to the Element class, there

are also classes such as Text, Comment, CDATASection, EntityReference,

and so on. Nodes have methods for accessing the parent and child nodes,
accessing element and attribute values, inserting and deleting nodes, and
converting the tree back into XML.
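The following Python sketch walks through these ideas with the standard xml.dom.minidom module: it reaches the top-level Element from the Document, lists the classes of its children, inserts and deletes a node, and serializes the tree again. The sample document is invented for the illustration.

```python
from xml.dom import minidom

document = minidom.parseString(
    "<songs><!-- modes --><song>Shady Grove</song><![CDATA[a < b]]></songs>")

root = document.documentElement        # the single top-level Element

# Each kind of content is represented by its own node class.
kinds = [type(child).__name__ for child in root.childNodes]
print(kinds)

# Insert a new element with a text child at the end of the root.
new_song = document.createElement("song")
new_song.appendChild(document.createTextNode("Over the River, Charlie"))
root.appendChild(new_song)

# Delete the first child of the root, which is the comment node.
root.removeChild(root.firstChild)

# Convert the tree back into XML.
print(root.toxml())
```

Note that new nodes are created through the Document object and only become part of the tree once they are appended to a parent.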

Primary Nodes and Node Collections (NodeList and NamedNodeMap)

The NodeList interface provides the abstraction of an ordered collection of

nodes, without defining or constraining how this collection is implemented.



NodeList objects in the DOM are live. The items in the NodeList are

accessible via an integral index, starting from 0.

Objects implementing the NamedNodeMap interface are used to represent

collections of nodes that can be accessed by name. Note that

NamedNodeMap does not inherit from NodeList; NamedNodeMaps are not

maintained in any particular order. Objects contained in an object

implementing NamedNodeMap may also be accessed by an ordinal index,

but this is simply to allow convenient enumeration of the contents of a

NamedNodeMap, and does not imply that the DOM specifies an order to

these Nodes.
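Both collection types can be tried out in Python with xml.dom.minidom. This is a sketch of one implementation's behavior, not a statement about every DOM library; the sample element and its attributes are invented for the example.

```python
from xml.dom import minidom

document = minidom.parseString(
    '<song title="Shady Grove" mode="Aeolian"><verse/><verse/></song>')
song = document.documentElement

# NodeList: an ordered collection, indexed from 0.
verses = song.childNodes
first = verses.item(0)
count_before = verses.length

# In minidom, childNodes reflects later changes to the tree.
song.appendChild(document.createElement("verse"))
count_after = verses.length

# NamedNodeMap: nodes are looked up by name. Index access exists only
# for enumeration, so the names are sorted here rather than relying on
# any particular order.
attributes = song.attributes
mode = attributes.getNamedItem("mode").value
names = sorted(attributes.item(i).name for i in range(attributes.length))

print(first.tagName, count_before, count_after, mode, names)
```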


12. The DOM is a ____ for documents.

13. The ____ interface provides the abstraction of an ordered collection of

nodes.

5.6 Summary

A schema specifies the data type of every element and attribute of its

instance XML documents.

Elements are defined in an XML schema with the element tag, which is

from the XMLSchema namespace.

A simple data type is one whose content is restricted to strings. Complex

types are defined with the complexType tag.

SOAP is an XML-based protocol for exchanging information between

computers.

A web service is any service that is available over the Internet, uses a

standardized XML messaging system, and is not tied to any one

operating system or programming language.


1. Explain the Union and List data types in XML.

2. Briefly explain the sending and receiving of SOAP messages.

3. Explain the various WSDL elements.

4. Explain the UDDI Registries.

5. Explain the ebXML registry and repository.



5.8 Answers


1. XML Processor

2. targetNamespace

3. local or global

4. ComplexType

5. unique, key, keyref

6. exchanging information between computers

7. envelope, header, body

8. payload

9. web service

10. service registry

11. describing, discovering, integrating

12. programming API

13. NodeList

Terminal Questions

1. List datatypes are special cases in which a structure is defined within the

content of a single attribute or element. (Refer section 5.2)

2. SOAP can be used in a variety of messaging systems, including
one-way and two-way messaging. (Refer section 5.3.3)

3. WSDL currently represents the service description layer within the web

service protocol stack. (Refer section 5.4.2)

4. UDDI is a technical specification for describing, discovering, and

integrating web services. (Refer section 5.4.5)

5. The ebXML (Electronic Business using XML) specifications enable

enterprises of any size and in any geographical location to conduct

business over the Internet. (Refer section 5.4.9)



Unit 6 XML Programming – III

Structure:

6.1 Introduction

Objectives

6.2 Transforming XML Documents with XSLT and XPath

6.3 Formatting XML Documents with XSL-FO

Purpose of XSL Formatting Objects (XSL-FO)

XSL-FO Documents and XSL-FO Processors

XSL-FO Namespace

Page Format Specifiers

Page Content Specifiers

6.4 Summary


6.6 Answers

6.1 Introduction

XSLT style sheets are used to transform XML documents into different

forms or formats, perhaps using different DTDs. In this unit you will
study the transformation of XML documents into different formats using
XSLT style sheets. You will also get an overview of XSL Formatting Objects.

Objectives:


transform XML documents into different formats using XSLT style sheets

format XML documents with XSL-FO

6.2 Transforming XML Documents with XSLT and XPath

XSLT is a language for transforming XML documents into XHTML
documents or other XML documents. CSS provides no direct means of

transforming XML documents. Unlike scripting languages, CSS was

explicitly designed for use by nonprogrammers, which explains why it is so

easy to learn and use. CSS simply attaches style properties to elements in

an XML/HTML document. The simplicity of CSS comes with limitations,

some of which follow:

CSS cannot reuse document data



CSS cannot conditionally select document data (other than hiding

specific types of elements)

CSS cannot calculate quantities or store values in variables

CSS cannot generate dynamic text, such as page numbers

These limitations of CSS matter because XSLT does not share them:
XSLT is capable of carrying out all of these tasks.

XSL Stylesheet Advantages

The powerful capabilities provided by XSL allow:

formatting of source elements based on ancestry/descendency, position

and uniqueness

the creation of formatting constructs including generated text and

graphics

the definition of reusable formatting macros

writing-direction independent stylesheets

extensible set of formatting objects

Transformation vs. Formatting

In an XSL transformation, an XSLT processor reads both an XML document

and an XSLT style sheet. Based on the instructions the processor finds in

the XSLT style sheet, it outputs a new XML document or fragment thereof.

There's also special support for outputting HTML. With some effort most

XSLT processors can also be made to output essentially arbitrary text,

though XSLT is designed primarily for XML-to-XML and XML-to-HTML

transformations.

The formatting deals with how to display the content to the user. It uses

various formatting methods to make the content look good.

XSLT and XSL-FO

The Extensible Stylesheet Language (XSL) includes both a transformation

language and a formatting language. The transformation language is useful

independent of the formatting language. Its ability to move data from one

XML representation to another makes it an important component of XML-

based electronic commerce, electronic data interchange, metadata

exchange, and any application that needs to convert between different XML

representations of the same data.



XSL-FO stands for Extensible Stylesheet Language Formatting Objects.

XSL-FO is a language for formatting XML data. XSL-FO is an XML-based

markup language describing the formatting of XML data for output to screen,

paper or other media.

XSLT Templates

Template rules defined by xsl:template elements are the most important

part of an XSLT style sheet. These associate particular output with particular

input. Each xsl:template element has a match attribute that specifies which

nodes of the input document the template is instantiated for. The content of

the xsl:template element is the actual template to be instantiated. A template

may contain both text that will appear literally in the output document and

XSLT instructions that copy data from the input XML document to the result.

For example, here is a template that is applied to the root node of the input

tree:

<xsl:template match="/">

<html>

<head>

</head>

<body>

</body>

</html>

</xsl:template>

When the XSLT processor reads the input document, the first node it sees is

the root. This rule matches that root node, and tells the XSLT processor to

emit this text:

<html>

<head>

</head>

<body>

</body>

</html>

This text is well-formed HTML. Because the XSLT document is itself an

XML document, its contents – templates included – must be well-formed

XML.



XPath Data Model

An XPath query operates on a namespace well-formed XML document after

it has been parsed into a tree structure. The particular tree model XPath

uses divides each XML document into seven kinds of nodes:

Root node – The document itself. The root node’s children are the

comments and processing instructions in the prolog and epilog and the root

element of the document.

Element node – An element. Its children are all the child elements, text

nodes, comments, and processing instructions the element contains. An

element also has namespaces and attributes. However, these are not child

nodes.

Attribute node – An attribute other than one that declares a namespace

Text node – The maximum uninterrupted run of text between tags,

comments, and processing instructions. White space is included.

Comment node – A comment

Processing instruction node – A processing instruction

Namespace node – A namespace mapping in scope on an element

The XPath data model does not include entity references, CDATA sections,

or the document type declaration. Entity references are resolved into their

component text and elements. CDATA sections are treated like any other

text, and will be merged with any adjacent text before a text node is formed.

Default attributes are applied, but otherwise the document type declaration

is not considered.
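The treatment of CDATA sections and entity references can be observed with Python's standard xml.etree.ElementTree module. ElementTree is not an XPath engine, but its default parser happens to build text in the same way: CDATA sections and resolved entity references are merged into the surrounding text. The sample string is invented for the illustration.

```python
import xml.etree.ElementTree as ET

# One element whose content mixes plain text, a CDATA section,
# two entity references, and a trailing comment.
SOURCE = "<note>2 <![CDATA[< 3]]> &amp; 4 &gt; 3<!-- checked --></note>"

note = ET.fromstring(SOURCE)

# The CDATA section and the entity references are no longer
# distinguishable: everything is one run of text.
print(repr(note.text))
```

A program working at this level therefore cannot tell whether a character such as < was written in a CDATA section or as an entity reference in the source.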

Declaring XSL Stylesheets

XSL documents must conform to the rules of any other XML document: the
syntax must be well-formed, with properly nested tags, a closing tag for
every opening tag, and so on. The stylesheet can contain text that will
be reflected exactly in the output document, in addition to XSL instructions
that copy data from the XML document the stylesheet is being applied
to. The stylesheet is declared as follows.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

This line declares the element to be an xsl:stylesheet element and binds
the xsl prefix to the XSLT namespace, http://www.w3.org/1999/XSL/Transform.
(Stylesheets written for the early working-draft implementation in Internet
Explorer 5 used the namespace http://www.w3.org/TR/WD-xsl instead.) Of
course, for the document to be well formed, this tag must be closed at the
very end of the document with the close tag:

</xsl:stylesheet>

Built-In Templates

There is a built-in template rule to allow recursive processing to continue in

the absence of a successful pattern match by an explicit template rule in the

stylesheet. This template rule applies to both element nodes and the root

node. The following shows the equivalent of the built-in template rule:

<xsl:template match="*|/">

<xsl:apply-templates/>

</xsl:template>

There is also a built-in template rule for each mode, which allows recursive

processing to continue in the same mode in the absence of a successful

pattern match by an explicit template rule in the stylesheet. This template

rule applies to both element nodes and the root node. The following shows

the equivalent of the built-in template rule for mode m.

<xsl:template match="*|/" mode="m">

<xsl:apply-templates mode="m"/>

</xsl:template>

There is also a built-in template rule for text and attribute nodes that copies

text through:

<xsl:template match="text()|@*">

<xsl:value-of select="."/>

</xsl:template>

The built-in template rule for processing instructions and comments is to do

nothing.

<xsl:template match="processing-instruction()|comment()"/>

The built-in template rule for namespace nodes is also to do nothing. There

is no pattern that can match a namespace node; so, the built-in template

rule is the only template rule that is applied for namespace nodes.
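The combined effect of the built-in rules is that a stylesheet with no templates of its own outputs just the text of the document. The Python sketch below emulates that behavior on a DOM tree; it is an illustration of the rules written for this text, not an XSLT processor, and the sample document is invented.

```python
from xml.dom import minidom


def apply_builtin_rules(node):
    """Emulate the built-in template rules for node and return the output."""
    if node.nodeType in (node.DOCUMENT_NODE, node.ELEMENT_NODE):
        # match="*|/": keep recursing into the child nodes.
        return "".join(apply_builtin_rules(child) for child in node.childNodes)
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        # match="text()": copy the text through.
        return node.data
    # Comments and processing instructions produce nothing.
    return ""


document = minidom.parseString(
    "<book id='b1'><!-- draft --><title>XML</title>"
    "<?page break?><price>10</price></book>")
print(apply_builtin_rules(document))
```

The id attribute does not appear in the output: attributes are not children of their element, so the recursion never reaches them unless a template selects them explicitly.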



Using Templates as Subroutines – xsl:apply-templates

The xsl:apply-templates selects source nodes for processing. The format is

given below:

<xsl:apply-templates [select="pattern"][mode="qname"]>

[<xsl:sort>]

</xsl:apply-templates>

If you specify the select attribute, specify a pattern that resolves to a set of

source nodes. For each source node in this set, the XSLT processor

searches for a template that matches the node. When it finds a matching

template, it instantiates it and uses the node as the context node. For

example:

<xsl:apply-templates select="/bookstore/book"/>

When the XSLT processor executes this instruction, it constructs a list of all

nodes that match the pattern in the select attribute. For each node in the list,

the XSLT processor searches for the template whose match pattern best

matches that node. If you do not specify the select attribute, the XSLT

processor uses the default pattern, "node()", which selects all child nodes of

the current node.

If you specify the mode attribute, the selected nodes are matched only by templates with a matching mode attribute. In XSLT 1.0 the value of mode must be a qualified name. Some tools also accept an asterisk (*), meaning continue the current mode, if any, of the current template, but this is not part of the XSLT 1.0 Recommendation. If you do not specify a mode attribute, the selected nodes are matched only by templates that do not specify a mode attribute.

By default, the new list of source nodes is processed in document order.

However, you can use the xsl:sort instruction to specify that the selected

nodes are to be processed in a different order.

In the previous example, the XSLT processor searches for a template that

matches /bookstore/book. The following template is a match:

<xsl:template match="book">
  <tr><td><xsl:value-of select="title"/></td>
  <td><xsl:value-of select="author"/></td>
  <td><xsl:value-of select="price"/></td></tr>
</xsl:template>



The XSLT processor instantiates this template for each book element.
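The following sketch puts the pieces together in one stylesheet: a root rule that calls xsl:apply-templates twice, once without a mode and once with a mode. The mode name "toc" and the bookstore structure are assumptions made for illustration only.

```xml
<?xml version="1.0"?>
<!-- Assumed input: a <bookstore> element with <book> children, each
     holding <title>, <author> and <price> (hypothetical structure). -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Root rule: builds the page shell, then hands each book to
       whichever template rule matches it. -->
  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <xsl:apply-templates select="/bookstore/book"/>
        </table>
        <!-- Second pass over the same nodes, in the assumed mode "toc" -->
        <ul>
          <xsl:apply-templates select="/bookstore/book" mode="toc"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <!-- No mode: one table row per book -->
  <xsl:template match="book">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <td><xsl:value-of select="author"/></td>
      <td><xsl:value-of select="price"/></td>
    </tr>
  </xsl:template>

  <!-- mode="toc": one list item per book, title only -->
  <xsl:template match="book" mode="toc">
    <li><xsl:value-of select="title"/></li>
  </xsl:template>

</xsl:stylesheet>
```

Because the two book rules differ only in their mode, the same source nodes can be rendered in two different ways without the rules competing with each other.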

XPath Expression Syntax

XPath can locate any type of information in an XML document with one line of code. These one-liners are referred to as "expressions," and every piece of XPath that you write will be an expression. An XPath expression describes the location of an element or attribute in an XML document. By starting at the root element, we can select any element in the document by building a chain of child elements. Each element name in the chain is separated from the next by a slash "/".

Example: inventory/snack/chips/amount
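To show the path above in use, here is a minimal sketch that evaluates it from inside an XSLT template. The inventory document in the comment is a hypothetical sample whose element names follow the example path.

```xml
<?xml version="1.0"?>
<!-- Assumed input (hypothetical sample):
     <inventory>
       <snack>
         <chips><amount>12</amount></chips>
       </snack>
     </inventory>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Walks from the root node down the chain of children:
         inventory, then snack, then chips, then amount -->
    <xsl:value-of select="inventory/snack/chips/amount"/>
  </xsl:template>

</xsl:stylesheet>
```

With the assumed input, the stylesheet outputs the text of the amount element, 12.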

XPath Functions and Predicates

You can use XML Path Language (XPath) functions to refine XPath queries and enhance the programming power and flexibility of XPath. The functions are divided into the following groups.

Table 6.1: XPath Function Groups

Group | Description
Node-Set | Takes a node-set argument, returns a node-set, or returns/provides information about a particular node within a node-set.
String | Performs evaluations, formatting, and manipulation on string arguments.
Boolean | Evaluates the argument expressions to obtain a Boolean result.
Number | Evaluates the argument expressions to obtain a numeric result.
Microsoft XPath Extension Functions | Microsoft extension functions to XPath that provide the ability to select nodes by XSD type. Also includes string comparison, number comparison, and date/time conversion functions.

Each function in the function library is specified using a function prototype

that provides the return type, function name, and argument type. If an

argument type is followed by a question mark, the argument is optional;

otherwise, the argument is required. Function names are case-sensitive.
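As a rough illustration of the four standard groups, the sketch below calls one XPath 1.0 function from each group. The inventory structure is an assumption carried over from the path example in this unit, and the Microsoft extension functions are left out because they are processor-specific.

```xml
<?xml version="1.0"?>
<!-- Assumed input (hypothetical): an <inventory> element whose <snack>
     child holds several <chips> elements, each with an <amount> child. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Node-set function: how many chips elements were selected -->
    <xsl:value-of select="count(inventory/snack/chips)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Number function: total of all amount values -->
    <xsl:value-of select="sum(inventory/snack/chips/amount)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- String function: joins its arguments into one string -->
    <xsl:value-of select="concat('Items: ', count(inventory/snack/chips))"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Boolean function: true when no chips element exists -->
    <xsl:value-of select="not(inventory/snack/chips)"/>
  </xsl:template>

</xsl:stylesheet>
```

Note that the function names are written in lowercase exactly as defined; writing Count or SUM would be an error because names are case-sensitive.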

A predicate is similar to an If/Then statement. If our predicate is TRUE,

then the element will be selected. If the predicate is FALSE, it will be

excluded. An XPath predicate is contained within square brackets [], and

comes after the parent element of what will be tested.

Example: inventory/drink/lemonade[amount>15]

Besides testing the values of elements, you can also use predicates to check the values of attributes. The form is much the same as before, except that the attribute name is prefixed with @ and the attribute belongs to the element before the predicate.

Syntax: element[@attributeName someTestHere]
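The two kinds of predicate can be sketched side by side as follows. The supplier attribute and the sample values are hypothetical; only the inventory/drink/lemonade path comes from the example above.

```xml
<?xml version="1.0"?>
<!-- Assumed input (hypothetical sample):
     <inventory>
       <drink>
         <lemonade supplier="mother"><amount>20</amount></lemonade>
         <lemonade supplier="store"><amount>5</amount></lemonade>
       </drink>
     </inventory>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Element test: only lemonade elements whose amount exceeds 15 -->
    <xsl:for-each select="inventory/drink/lemonade[amount &gt; 15]">
      <xsl:value-of select="@supplier"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>

    <!-- Attribute test: only lemonade elements supplied by "store" -->
    <xsl:for-each select="inventory/drink/lemonade[@supplier = 'store']">
      <xsl:value-of select="amount"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the assumed input, the first loop selects the lemonade with amount 20 and prints mother; the second selects the store lemonade and prints 5. The greater-than sign is written as &gt; here by convention, although a literal > is also allowed inside an attribute value.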

Inserting Elements - xsl:element

The <xsl:element> element is used to create an element node in the output

document.

Syntax:

<xsl:element

name="name"

namespace="URI"

use-attribute-sets="namelist">



</xsl:element>

Attributes:

Attribute | Value | Description
name | name | Required. Specifies the name of the element to be created (the value of the name attribute can be set to an expression that is computed at run-time, like this: <xsl:element name="{$country}" />)
namespace | URI | Optional. Specifies the namespace URI of the element (the value of the namespace attribute can be set to an expression that is computed at run-time, like this: <xsl:element name="{$country}" namespace="{$someuri}"/>)

Example: Create a "singer" element that contains the value of each artist element:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <xsl:for-each select="catalog/cd">
    <xsl:element name="singer">
      <xsl:value-of select="artist" />
    </xsl:element>
    <br />
  </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

Inserting Attributes - xsl:attribute

The <xsl:attribute> element is used to add attributes to elements.

Syntax:

<xsl:attribute name="attributename" namespace="uri">


</xsl:attribute>

Attributes:

Attribute | Value | Description
name | attributename | Required. Specifies the name of the attribute
namespace | URI | Optional. Defines the namespace URI for the attribute

Example: Add a source attribute to the picture element:

<picture>

<xsl:attribute name="source"/>

</picture>
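The example above creates an empty attribute. A slightly fuller sketch, under the assumption of a hypothetical input in which each img element has a src child, fills the attribute value from the source document:

```xml
<?xml version="1.0"?>
<!-- Assumed input (hypothetical sample):
     <images><img><src>logo.png</src></img></images>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <gallery>
      <xsl:for-each select="images/img">
        <picture>
          <!-- xsl:attribute must come before any child content
               of the element it decorates -->
          <xsl:attribute name="source">
            <xsl:value-of select="src"/>
          </xsl:attribute>
        </picture>
      </xsl:for-each>
    </gallery>
  </xsl:template>

</xsl:stylesheet>
```

For the assumed input, each picture element in the result carries source="logo.png". When the attribute name is fixed, the shorter attribute value template form <picture source="{src}"/> gives the same result; xsl:attribute is mainly useful when the name or the presence of the attribute is decided at run-time.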

Extracting Node Values - xsl:value-of

The <xsl:value-of> element extracts the value of a selected node. The

<xsl:value-of> element can be used to select the value of an XML element

and add it to the output.

Syntax:

<xsl:value-of

select="expression"

disable-output-escaping="yes|no"/>



Attributes:

Attribute | Value | Description
select | expression | Required. An XPath expression that specifies which node/attribute to extract the value from
disable-output-escaping | yes or no | Optional. "yes" indicates that special characters (like "<") should be output as is. "no" indicates that special characters (like "<") should be output as "&lt;". Default is "no"

Example:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
  <body>
    <h2>My CD Collection</h2>
    <table border="1">
      <tr bgcolor="#9acd32">
        <th>Title</th>
        <th>Artist</th>
      </tr>
      <tr>
        <td><xsl:value-of select="catalog/cd/title"/></td>
        <td><xsl:value-of select="catalog/cd/artist"/></td>
      </tr>
    </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>

Looping - xsl:for-each

The <xsl:for-each> element allows you to do looping in XSLT. It can be used to select every XML element of a specified node-set:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
  <body>
    <h2>My CD Collection</h2>
    <table border="1">
      <tr bgcolor="#9acd32">
        <th>Title</th>
        <th>Artist</th>
      </tr>
      <xsl:for-each select="catalog/cd">
      <tr>
        <td><xsl:value-of select="title"/></td>
        <td><xsl:value-of select="artist"/></td>
      </tr>
      </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>

The result of the transformation above will look like this:

Title | Artist
Empire Burlesque | Bob Dylan
Hide your heart | Bonnie Tyler
Greatest Hits | Dolly Parton
Still got the blues | Gary Moore
Eros | Eros Ramazzotti
One night only | Bee Gees

Sorting – xsl:sort and the order Attribute

Sorting is specified by adding xsl:sort elements as children of an xsl:apply-templates or xsl:for-each element. The first xsl:sort child specifies the primary sort key, the second xsl:sort child specifies the secondary sort key, and so on. When an xsl:apply-templates or xsl:for-each element has one or more xsl:sort children, then instead of processing the selected nodes in document order, it sorts the nodes according to the specified sort keys and then processes them in sorted order. When used in xsl:for-each, xsl:sort elements must occur first. When a template is instantiated by xsl:apply-templates or xsl:for-each, the current node list consists of the complete list of nodes being processed in sorted order.

The order attribute of xsl:sort specifies whether the strings should be sorted in ascending or descending order: ascending specifies ascending order, descending specifies descending order, and the default is ascending.
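A minimal sketch of two sort keys inside xsl:for-each is given below. It assumes the same catalog of cd elements used elsewhere in this unit, with a price child on each cd.

```xml
<?xml version="1.0"?>
<!-- Assumed input: a <catalog> of <cd> elements with <title>, <artist>
     and <price> children, as in the other examples of this unit. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <tr><th>Title</th><th>Artist</th><th>Price</th></tr>
          <xsl:for-each select="catalog/cd">
            <!-- Primary key: artist, ascending text order (the default) -->
            <xsl:sort select="artist"/>
            <!-- Secondary key: price, compared as a number, highest first -->
            <xsl:sort select="price" data-type="number" order="descending"/>
            <tr>
              <td><xsl:value-of select="title"/></td>
              <td><xsl:value-of select="artist"/></td>
              <td><xsl:value-of select="price"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

The data-type="number" attribute matters for the price key: without it the values are compared as text, so "9.90" would sort after "10.90".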

Simple Conditionals - xsl:if

To put a conditional if test against the content of the XML file, add an

<xsl:if> element to the XSL document.

Syntax:

<xsl:if test="expression">

...some output if the expression is true...

</xsl:if>
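As an illustrative sketch, the stylesheet below uses xsl:if inside a loop so that only some rows are written. The price threshold of 10 and the catalog structure are assumptions taken from the other examples in this unit.

```xml
<?xml version="1.0"?>
<!-- Assumed input: a <catalog> of <cd> elements with <title>, <artist>
     and <price> children. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <tr><th>Title</th><th>Artist</th></tr>
          <xsl:for-each select="catalog/cd">
            <!-- The row is written only when the test is true;
                 xsl:if has no "else" branch -->
            <xsl:if test="price &gt; 10">
              <tr>
                <td><xsl:value-of select="title"/></td>
                <td><xsl:value-of select="artist"/></td>
              </tr>
            </xsl:if>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

When an alternative output is needed for the false case, xsl:choose is the appropriate instruction.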

Multiple Conditionals - xsl:choose, xsl:when, and xsl:otherwise

The <xsl:choose> element is used in conjunction with <xsl:when> and

<xsl:otherwise> to express multiple conditional tests. If no <xsl:when> is

true, the content of <xsl:otherwise> is processed. If no <xsl:when> is true,

and no <xsl:otherwise> element is present, nothing is created.

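Syntax:

<xsl:choose>
  <xsl:when test="expression">
    ...some output...
  </xsl:when>
  <xsl:otherwise>
    ...some output...
  </xsl:otherwise>
</xsl:choose>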

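Example:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
  <body>
    <h2>My CD Collection</h2>
    <table border="1">
      <tr bgcolor="#9acd32">
        <th>Title</th>
        <th>Artist</th>
      </tr>
      <xsl:for-each select="catalog/cd">
      <tr>
        <td><xsl:value-of select="title"/></td>
        <xsl:choose>
          <xsl:when test="price > 10">
            <td bgcolor="#ff00ff">
            <xsl:value-of select="artist"/></td>
          </xsl:when>
          <xsl:otherwise>
            <td><xsl:value-of select="artist"/></td>
          </xsl:otherwise>
        </xsl:choose>
      </tr>
      </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>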

Copying Nodes - xsl:copy

The <xsl:copy> element creates a copy of the current node.

Syntax:

<xsl:copy use-attribute-sets="name-list">


</xsl:copy>



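Attribute:

Attribute | Value | Description
use-attribute-sets | name-list | Optional. A whitespace-separated list of attribute sets to apply to the output node, if the node is an element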
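Note that xsl:copy copies only the current node itself; its attributes and child nodes are not copied automatically. A common way to copy a whole document is therefore to combine xsl:copy with xsl:apply-templates, as in the sketch below. The rule that drops price elements is a hypothetical addition to show how the copy can be customised.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy every attribute and node, then recurse so that the
       attributes and children are copied as well -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical override: leave price elements out of the copy -->
  <xsl:template match="price"/>

</xsl:stylesheet>
```

The more specific price rule takes precedence over the general copy rule, so the result is the input document minus every price element.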


1. ____ is a language for transforming XML documents into XHTML

documents or to other XML documents.

2. ____ can locate any type of information in an XML document with one

line of code.

3. The <xsl:element> element is used to create an element node in

the ____ document.

4. The ____ element allows you to do looping in XSLT.

6.2 Formatting XML Documents with XSL-FO

XSL-FO is an XML-based markup language describing the formatting of

XML data for output to screen, paper or other media. Styling is both about

transforming and formatting information. When the World Wide Web

Consortium (W3C) made their first XSL Working Draft, it contained the

language syntax for both transforming and formatting XML documents.

Later, the XSL Working Group at W3C split the original draft into separate

Recommendations:

XSLT, a language for transforming XML documents

XSL or XSL-FO, a language for formatting XML documents

XPath, a language for navigating through elements and attributes in XML documents

6.2.1 Purpose of XSL Formatting Objects (XSL-FO)

The purpose of XSL-FO is to provide a mechanism for formatting XML data

for print, screen and other output media. XSL-FO, also known simply as

XSL, is a specification of the World Wide Web Consortium and is closely

related to XSLT. However, whereas XSLT is most often used for

transforming XML into HTML or other XML structures, XSL-FO is most often

used for formatting XML for print.

Transforming XML for print is accomplished by transforming an XML

document to a Formatting Objects (FO) document, which itself is XML-

based, via XSLT. The formatting objects processor is able to read the FO

document and transform it for different types of print output. The most

common and best supported print output is currently Adobe PDF.
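The first step of that pipeline, XML to FO via XSLT, can be sketched as follows. The stylesheet assumes the catalog of cd elements used earlier in this unit and writes one block per title; the page dimensions and margin are illustrative values only.

```xml
<?xml version="1.0"?>
<!-- Assumed input: a <catalog> of <cd> elements, each with a <title>. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:template match="/">
    <fo:root>
      <!-- Page layout: a single A4 portrait page template -->
      <fo:layout-master-set>
        <fo:simple-page-master master-name="A4"
            page-width="210mm" page-height="297mm" margin="2cm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>

      <!-- Page content: one block of text per cd title -->
      <fo:page-sequence master-reference="A4">
        <fo:flow flow-name="xsl-region-body">
          <xsl:for-each select="catalog/cd">
            <fo:block><xsl:value-of select="title"/></fo:block>
          </xsl:for-each>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

</xsl:stylesheet>
```

The result of this transformation is an FO document, which is then passed to a formatting objects processor to produce the printable output. The sketch assumes the catalog contains at least one cd, since a flow needs at least one block.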

6.2.2 XSL-FO Documents and XSL-FO Processors

XSL-FO documents are XML files with output information. They contain

information about the output layout and output contents. XSL-FO documents

are stored in files with a .fo or a .fob file extension. It is also quite common

to see XSL-FO documents stored with an .xml extension, because this

makes them more accessible to XML editors.

XSL-FO documents have a structure like this:

<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="A4">
      <!-- Page template goes here -->
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="A4">
    <!-- Page content goes here -->
  </fo:page-sequence>
</fo:root>

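XSL-FO documents are XML documents, and must always start with an XML declaration:

<?xml version="1.0"?>

The <fo:root> element is the root element of XSL-FO documents. The root element also declares the namespace for XSL-FO:

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- The full XSL-FO document goes here -->
</fo:root>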

The <fo:layout-master-set> element contains one or more page templates:

<fo:layout-master-set>



</fo:layout-master-set>

Each <fo:simple-page-master> element contains a single page template.

Each template must have a unique name (master-name):

<fo:simple-page-master master-name="A4">
  <!-- Page template goes here -->
</fo:simple-page-master>

One or more <fo:page-sequence> elements describe the page contents.

The master-reference attribute refers to the simple-page-master template

with the same name:

<fo:page-sequence master-reference="A4">



</fo:page-sequence>
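Putting these parts together, a minimal complete XSL-FO document might look like the sketch below. The fo:flow and fo:block elements carry the actual page content; the "Hello World" text is placeholder content chosen for illustration.

```xml
<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Page templates -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="A4">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- Page contents, laid out using the "A4" template -->
  <fo:page-sequence master-reference="A4">
    <!-- flow-name ties the content to the body region of the page -->
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Hello World</fo:block>
    </fo:flow>
  </fo:page-sequence>

</fo:root>
```

The value xsl-region-body is the default name of the region defined by fo:region-body, which is how the flow finds the area of the page it should fill.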

6.2.3 XSL-FO Namespace

XSL-FO is the part of XSL that actually describes how a document should be formatted. It is based on a word-processing model for page layout as opposed to a desktop publishing model. As such, it is nearly impossible to describe the exact positioning and layout of the text of a document. Instead, XSL-FO involves giving a general description of how text should be arranged in relation to other text, and the XSL-FO engine will choose an appropriate arrangement, much like Microsoft Word, LaTeX, or other programs based on a word-processing model. For complete control over the layout of a document, document designers still have to resort to other file formats such as PDF.

The easiest way to understand the nature of XSL-FO is to look at an example. XSL-FO file names typically end with .fob, .fo, or .xml. Like all XML documents, an XSL-FO document requires a root node, which here also declares the XSL-FO namespace, as shown below:

<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
...
</fo:root>

6.2.4 Page Format Specifiers

XSL-FO uses page templates called "Page Masters" to define the layout of

pages. Each template must have a unique name:

<fo:simple-page-master master-name="intro">
  <fo:region-body margin="5in" />
</fo:simple-page-master>

<fo:simple-page-master master-name="left">
  <fo:region-body margin-left="2in" margin-right="3in" />
</fo:simple-page-master>

<fo:simple-page-master master-name="right">
  <fo:region-body margin-left="3in" margin-right="2in" />
</fo:simple-page-master>


In the example above, three <fo:simple-page-master> elements define three different templates. Each template (page-master) has a different name. The first template is called "intro". It could be used as a template for introduction pages. The second and third templates are called "left" and "right". They could be used as templates for even and odd page numbers.

XSL-FO Page Size

XSL-FO uses the following attributes to define the size of a page:

page-width defines the width of a page

page-height defines the height of a page

XSL-FO Page Margins

XSL-FO uses the following attributes to define the margins of a page:

margin-top defines the top margin

margin-bottom defines the bottom margin

margin-left defines the left margin

margin-right defines the right margin

margin defines all four margins

XSL-FO Page Regions

XSL-FO uses the following elements to define the regions of a page:

region-body defines the body region

region-before defines the top region (header)

region-after defines the bottom region (footer)

region-start defines the left region (left sidebar)

region-end defines the right region (right sidebar)

Example:

<fo:simple-page-master master-name="A4"
    page-width="297mm" page-height="210mm"
    margin-top="1cm" margin-bottom="1cm"
    margin-left="1cm" margin-right="1cm">
  <fo:region-body margin="3cm"/>
  <fo:region-before extent="2cm"/>
  <fo:region-after extent="2cm"/>
  <fo:region-start extent="2cm"/>
  <fo:region-end extent="2cm"/>
</fo:simple-page-master>
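The "left" and "right" page masters described earlier in this section only become even and odd pages once something tells the processor to alternate between them. One way to do that, sketched here as an assumption about how such templates might be combined, is an fo:page-sequence-master; the master name "alternating" and the sample text are hypothetical.

```xml
<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <fo:layout-master-set>
    <fo:simple-page-master master-name="left">
      <fo:region-body margin-left="2in" margin-right="3in"/>
    </fo:simple-page-master>

    <fo:simple-page-master master-name="right">
      <fo:region-body margin-left="3in" margin-right="2in"/>
    </fo:simple-page-master>

    <!-- Chooses a page master for each new page by page number -->
    <fo:page-sequence-master master-name="alternating">
      <fo:repeatable-page-master-alternatives>
        <fo:conditional-page-master-reference
            master-reference="left" odd-or-even="even"/>
        <fo:conditional-page-master-reference
            master-reference="right" odd-or-even="odd"/>
      </fo:repeatable-page-master-alternatives>
    </fo:page-sequence-master>
  </fo:layout-master-set>

  <!-- The page sequence refers to the sequence master, not to a
       single page master -->
  <fo:page-sequence master-reference="alternating">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Body text that flows across left and right pages.</fo:block>
    </fo:flow>
  </fo:page-sequence>

</fo:root>
```

Since no page-width or page-height is given, the page size is left to the processor's default in this sketch.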


6.2.5 Page Content Specifiers

XSL-FO Lists - XSL-FO uses the <fo:list-block> element to define lists.

There are four XSL-FO objects used to create lists:

fo:list-block (contains the whole list)

fo:list-item (contains each item in the list)

fo:list-item-label (contains the label for the list-item - typically an <fo:block> containing a number, character, etc.)

fo:list-item-body (contains the content/body of the list-item - typically one or more <fo:block> objects)
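The four list objects nest as shown in the sketch below. The fragment would normally sit inside an fo:flow; the namespace declaration is repeated on fo:list-block only so that the fragment is well-formed on its own, and the item texts are placeholder values.

```xml
<fo:list-block xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <fo:list-item>
    <!-- Label: the number in front of the item -->
    <fo:list-item-label end-indent="label-end()">
      <fo:block>1.</fo:block>
    </fo:list-item-label>
    <!-- Body: the item text, indented past the label -->
    <fo:list-item-body start-indent="body-start()">
      <fo:block>First item</fo:block>
    </fo:list-item-body>
  </fo:list-item>

  <fo:list-item>
    <fo:list-item-label end-indent="label-end()">
      <fo:block>2.</fo:block>
    </fo:list-item-label>
    <fo:list-item-body start-indent="body-start()">
      <fo:block>Second item</fo:block>
    </fo:list-item-body>
  </fo:list-item>

</fo:list-block>
```

The label-end() and body-start() functions let the processor work out the indents so that the label and the body do not overlap.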

XSL-FO Tables - XSL-FO uses the <fo:table-and-caption> element to

define tables. There are nine XSL-FO objects used to create tables:

fo:table-and-caption

fo:table

fo:table-caption

fo:table-column

fo:table-header

fo:table-footer

fo:table-body

fo:table-row

fo:table-cell

XSL-FO uses the <fo:table-and-caption> element to define a table. It contains a <fo:table> and an optional <fo:table-caption> element.

The <fo:table> element contains optional <fo:table-column> elements, an

optional <fo:table-header> element, a <fo:table-body> element, and an

optional <fo:table-footer> element. Each of these elements has one or

more <fo:table-row> elements, with one or more <fo:table-cell> elements.
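The nesting described above can be sketched as follows. The fragment would normally sit inside an fo:flow; the namespace declaration is repeated on the outer element only so that the fragment is well-formed on its own. The caption text and column widths are illustrative assumptions, and the single data row reuses a title and artist from the result table earlier in this unit.

```xml
<fo:table-and-caption xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <fo:table-caption>
    <fo:block>My CD Collection</fo:block>
  </fo:table-caption>

  <fo:table>
    <!-- One table-column per column, giving its width -->
    <fo:table-column column-width="60mm"/>
    <fo:table-column column-width="40mm"/>

    <fo:table-header>
      <fo:table-row>
        <fo:table-cell>
          <fo:block font-weight="bold">Title</fo:block>
        </fo:table-cell>
        <fo:table-cell>
          <fo:block font-weight="bold">Artist</fo:block>
        </fo:table-cell>
      </fo:table-row>
    </fo:table-header>

    <fo:table-body>
      <fo:table-row>
        <fo:table-cell>
          <fo:block>Empire Burlesque</fo:block>
        </fo:table-cell>
        <fo:table-cell>
          <fo:block>Bob Dylan</fo:block>
        </fo:table-cell>
      </fo:table-row>
    </fo:table-body>
  </fo:table>

</fo:table-and-caption>
```

Each cell holds its text in an fo:block, in the same way as list labels and list bodies do.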




5. XSL-FO is an XML-based markup language describing the formatting of

XML data for output to ____ and _____.

6. XSL-FO documents contain information about ____ and ____.

6.3 Summary

XSLT is a language for transforming XML documents into XHTML

documents or to other XML documents.

XSL-FO is an XML-based markup language describing the formatting of

XML data for output to screen, paper or other media.

An XPath query operates on a namespace well-formed XML document

after it has been parsed into a tree structure. XSL-FO uses page

templates called "Page Masters" to define the layout of pages.


1. Explain XSLT templates.

2. How do you declare an XSL stylesheet?

3. Explain the use of xsl:element.

4. Explain the XSL-FO namespace.

5. Explain the page format specifiers in XSL-FO.

6.5 Answers


1. XSLT

2. XPath

3. output

4. xsl:for-each

5. screen, paper

6. output layout, output contents.



Terminal Questions

1. XSLT is a language for transforming XML documents into XHTML

documents or to other XML documents. (Refer section 6.2)

2. XSL documents must conform to the rules of any other XML document, in that the syntax of the document must be well-formed, such as the proper nesting of tags, no unclosed tags, etc. (Refer section 6.2)

3. The <xsl:element> element is used to create an element node in the

output document. (Refer section 6.2)

4. XSL-FO documents are XML files with output information. (Refer section

6.2.2)

5. XSL-FO uses page templates called "Page Masters" to define the layout

of pages. (Refer section 6.2.4)



Acknowledgements, References and Suggested Readings

1. Goodman Danny, The JavaScript Bible

2. Nakhimovsky Alexander and Myers Tom. XML Programming

3. Sebesta Robert, Programming the World Wide Web, 3rd Edition.

Pearson Education

4. Online media