BT 0078 Website Design
Contents
Unit 1 Introduction to Internet
Unit 2 Website development with HTML – I
Unit 3 Website development with HTML – II
Unit 4 XML Programming – I
Unit 5 XML Programming – II
Unit 6 XML Programming – III
Acknowledgements, References and Suggested Readings
Edition: Spring 2009
BKID – B1005, 10th June 2009
Prof. S. Kannan Director & Dean (in-charge) Directorate of Distance Education Sikkim Manipal University of Health, Medical & Technological Sciences (SMU DDE)
Board of Studies
Dr. U. B. Pavanaja (Chairman), General Manager – Academics, Manipal Universal Learning Pvt. Ltd., Bangalore
Nirmal Kumar Nigam, HOP – IT, Sikkim Manipal University – DDE, Manipal
Prof. Bhushan Patwardhan, Chief Academics, Manipal Education, Bangalore
Dr. A. Kumaran, Research Manager (Multilingual), Microsoft Research Labs India, Bangalore
Dr. Harishchandra Hebbar, Director, Manipal Centre for Info. Sciences, Manipal
Ravindranath. P. S., Director (Quality), Yahoo India, Bangalore
Dr. N. V. Subba Reddy, HOD-CSE, Manipal Institute of Technology, Manipal
Dr. Ashok Kallarakkal, Vice President, IBM India, Bangalore
Dr. Ashok Hegde, Vice President, MindTree Consulting Ltd., Bangalore
H. Hiriyannaiah, Group Manager, EDS Mphasis, Bangalore
Dr. Ramprasad Varadachar, Director, Computer Studies, Dayanand Sagar College of Engg., Bangalore
Content Preparation Team
Content Writing: Mr. Balasubramani R, Assistant Professor, Dept. of IT, Sikkim Manipal University – DDE, Manipal
Content Editing: Dr. E. R. Naganathan, Professor & HOD – IT, Sikkim Manipal University – DDE, Manipal
Instructional Design: Mr. Kulwinder Pal, Senior Lecturer (Education), Sikkim Manipal University – DDE, Manipal
This book is a distance education module comprising a collection of learning material for our students. All rights reserved. No part of this work may be reproduced in any form by any means without permission in writing from Sikkim Manipal University of Health, Medical and Technological Sciences, Gangtok, Sikkim. Printed and published on behalf of Sikkim Manipal University of Health, Medical and Technological Sciences, Gangtok, Sikkim by Mr. Rajkumar Mascreen, GM, Manipal Universal Learning Pvt. Ltd., Manipal – 576 104. Printed at Manipal Press Limited, Manipal.
SUBJECT INTRODUCTION
‘Website Design’ is a two-credit subject in the third semester of the BscIT
program. It introduces the essential skills students need to develop a
web site and to write scripts that run at the client side. The HTML
tutorial gives a clear idea of how to design the user interface for the
web. The subject also gives a clear idea of how to handle XML data in the
web environment. JavaScript is used to process the web page at the
client side and server side. This subject shows how to use
JavaScript at the client side to process a web form.
This SLM has been split into six units to cover the overview of web
designing.
Unit 1: Introduction to Internet:
In this unit we shall begin with an introduction to the internet, then discuss
the client-server model for communication and the different types of
connections. We shall also discuss Internet Service Providers and
addressing in the internet. At the end we shall explain Resource
Addressing and Electronic mail.
Unit 2: Web site development with HTML – I:
In this unit we shall study the various HTML tags and learn to create a web
page using these tags. We shall also learn to design a form using HTML.
Unit 3: Web site development with HTML – II:
In this chapter you are going to study frames in HTML and the usage of
Cascading Style Sheets. The design of
tables in HTML and general web site layout and design are also
explained in this unit. Foundations of DHTML are also studied in this unit.
Unit 4: XML Programming – I:
XML is far more than a solution to the deficiencies of HTML. It provides a
simple and universal way of storing textual data of any kind. In this chapter,
you are going to study the need of XML, the XML document structure and
XML namespaces.
Unit 5: XML PROGRAMMING – II:
A schema is similar to a class definition. In this chapter you are going to
study how a schema is used in XML. You will also get an
overview of SOAP (Simple Object Access Protocol) and an introduction to
web services. An overview of the XML Document Object Model is also
given in this chapter.
Unit 6: XML PROGRAMMING – III:
XSLT style sheets are used to transform XML documents into different
forms or formats, perhaps using different DTDs. In this chapter you are
going to study about the transformation of XML documents into different
formats using XSLT style sheets. Also you will study an overview of the XSL
Formatting Objects.
After studying this subject, you should be able to develop professional
interactive websites using HTML, DHTML and XML features.
The subject requires knowledge and understanding of skills related to the
Internet, ISPs, DNS servers and HTML.
For various multimedia and other resources on the
subject, log on to TeL portal of SMU DDE at www.smude.edu.in.
Website Design Unit 1
Sikkim Manipal University Page No. 1
Unit 1 Introduction to Internet
Structure:
1.1 Introduction
Objectives
1.2 What is Internet?
Definition
Internet from practical and technical angle
Who owns and cares for the Internet?
What is TCP/IP?
Introduction to RFC
How Internet Works?
Internet Applications
1.3 Concepts of Server
Client Server Model
Servers
1.4 Getting Connected
Different Types of Connections
Requirements for Connections
1.5 Internet Service Providers
1.6 Address in Internet
The Domain Name System and DNS Servers
IP Addresses
1.6 Resource Addressing
URL (Uniform Resource Locator)
URLs and HOST Names
URLs and Port Numbers
Pathnames
1.7 Email
Email Basics
Mail protocols
How to Access the Mail System
1.8 Summary
1.9 Terminal Questions
1.10 Answers
1.1 Introduction
We covered the basic concepts of the internet and websites in the previous
semester. In addition, we are going to cover a few advanced concepts in
this unit.
In this unit, we begin with an introduction to the internet, then discuss
the client-server model for communication and the different types of
connections. We also discuss Internet Service Providers and
addressing in the internet. At the end we explain Resource
Addressing and Electronic mail.
Objectives
After studying this unit, you should be able to:
explain the meaning, evolution, working and application of internet
discuss the client server model and various types of internet
describe how to get connected to internet
use IP addressing scheme
explain the concepts of resource addressing
describe the E-mail basics, mail protocol & methods of accessing mail
system
1.2 What is Internet?
This section covers the definition, meaning and practical & technical angle
of internet.
1.2.1 Definition
There is no single, generally agreed upon definition for internet because the
internet is a different thing to different people. We can give the following few
expressions in this context.
The Internet links computer networks all over the world so that users
can share resources and communicate with each other.
It is the name for a vast, worldwide system consisting of people,
information and computers
It is a network of networks that spans the globe
It is an ocean of information
It is a set of computers communicating over fiber optics, phone lines,
satellite links and other media
It is a gold mine of professionals from all fields sharing information about
their work
It is a world wide interconnected system of thousands of computer
networks, each network in turn linking thousands of computers together
The Internet is also what we call a distributed system; there is no central
archive.
The Internet thrives and develops as its many users find new ways to
create, display and retrieve the information that constitutes the Internet.
1.2.2 Internet from practical and technical angle
From the practical angle
Internet is a vast collection of globally available information which can be
accessed electronically – information which is of practical use for business,
research, study and technical purposes. It is a means for electronic
commerce – marketing, buying, services, economic and financial data
research. It is a collection of hundreds of libraries and archives that will open
to your fingertips. It is also a vast store of information relating to your
hobbies, travel, health, entertainment, games, software, etc.
Today the information can be in the form of text, images, animation,
sound, video, etc.; tomorrow it may be in the form of smell, touch,
taste or some energized form. If information can be put on computers, that
is, if it can be digitized, it can be made available on the internet. The only
catch is, how fast? Even the future may not be able to tell.
From the Technical Angle
To be technically correct, we can say that the internet is “an ever growing
wide area network of millions of computers and computer networks across
the globe, which can exchange information through standard rules
(protocols). Each computer has a unique address. Information is divided into
packets which may travel through different paths to the destination address
where it is recombined into its original form.”
1.2.3 Who owns and cares for the Internet?
Owning of internet
No one owns the Internet. No single person, corporation, university or
government funds it. The Internet has been described as a cooperative
anarchy. Every person who makes a connection, every group whose Local
Area Network (LAN) becomes connected, owns a slice of the Internet.
You can compare the Internet model with the phone companies and the electric
companies. For example, there is phone service in almost every part of the
country. With a phone company, each person who wants telephone service
contacts a local service provider. The provider supplies a “hook-up” from the
residence or office to the service network.
The person wanting service actually provides the telephone instrument and
the connections within the residence or office. As long as the calls you want
to place are restricted to your local area, you do not need anything else.
However, if you want to place a call to someone in another area, you need
to purchase services from a long-distance service provider. The local area
provider supplies the connection from the local network into the long-
distance network. This model allows you to connect to the telephone almost
anywhere in the world. Moving among networks of computers works much
the same way (which is not surprising since the telephone networks – that
is, the physical cables – are used to connect the computers).
Who cares for Internet?
Many people care about the internet. All the people who use it, even if only
to send a note to someone on some other network that is connected into the
Internet, care about it. Someone or some enterprise owns each computer
connected. The owner of the connected equipment therefore ‘owns’ a piece
of the internet. The telephone companies ‘own’ the pieces that carry the
information packets. The service providers ‘own’ the packet routing
equipment. So, while no one person or entity owns the internet, all who use
it or supply materials for it play a part in its existence.
Since communication between networks cannot happen without
co-operation, there are committees and groups working hard all the time to
ensure smooth functioning. Somebody has to take care of issues such as
standards and the identification of computers on the Net. Groups have
therefore been formed to look after these common aspects of the
internet. The main body is called the IAB (Internet Architecture
Board), earlier called the Internet Board as named by ARPA. There are two
main wings to this board:
IETF (Internet Engineering Task Force)
IRTF (Internet Research Task Force)
The IETF documents the internet in a series known as RFCs (Requests For
Comments), so named because they are open-ended documents always
available to the public for comment, and thus the standards keep
evolving.
Apart from maintaining protocols and norms/standards, another important
function of commonality is assigning unique names and addresses to
computers connected on the Net. This function is performed by InterNIC
(Internet Network Information Center) which is a group of three
organizations.
1. General Atomics, CA : Provides Information Services
2. AT & T., NY : Provides Directory and Database Services
3. Network Solutions, VA : Provides Registration Service
The services of InterNIC group are available on the Internet itself. Each
individually connected network maintains its own user policies and
procedures as to who can be connected, what kind of traffic the network will
carry, and so on.
1.2.4 What is TCP/IP?
As we have already discussed, the Internet is built on a collection of
networks covering the world, and obviously, these networks contain many
different types of computers. To hold the whole thing together we have
something called TCP/IP (Transmission Control Protocol/Internet Protocol).
Protocols are the rules that all networks use to understand each other. For
example, there is a protocol describing exactly what format should be used
for sending a mail message. All internet mail programs follow this protocol
when they prepare a message for delivery. Collectively, more than 100
protocols go by the common name TCP/IP; they are used to organize
computers and communication devices into a network. TCP/IP is the glue
holding it all together.
Information within the Internet is not transmitted as a constant stream from
host to host; rather data is broken into small packages called segments. To
divide the data (or message) into number of segments is the task of TCP.
TCP marks each segment with a sequence number, the address of the
recipient, the address of the sender, and it also inserts some error control
information.
The segments are then sent over the network, where it is the job of IP to
transport them correctly to the remote host. TCP at the other end receives
the segments and checks for errors. If an error has occurred, TCP can ask
for that particular segment to be resent. Once all the segments are received
correctly, TCP reconstructs the original message using the sequence
numbers. Therefore, the job of TCP is to manage the flow and ensure the
data is correct, and the job of IP is to route the raw data (the packets) from
one place to another.
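The division of labour described above can be illustrated with a small Python sketch. It is only a simplified model, not real TCP: the Segment class, the function names and the use of CRC-32 as the error control information are all assumptions made for illustration, and the addresses are plain labels.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int        # sequence number added by the sender
    src: str        # address of the sender (a plain label in this sketch)
    dst: str        # address of the recipient (a plain label in this sketch)
    payload: bytes  # the piece of the message carried by this segment
    checksum: int   # error control information (CRC-32 is an assumption)


def segment_message(data: bytes, src: str, dst: str, size: int = 8) -> list[Segment]:
    """Split a message into numbered segments of at most `size` bytes."""
    return [
        Segment(i, src, dst, data[off:off + size], zlib.crc32(data[off:off + size]))
        for i, off in enumerate(range(0, len(data), size))
    ]


def is_intact(seg: Segment) -> bool:
    """Receiver-side check: recompute the checksum and compare."""
    return zlib.crc32(seg.payload) == seg.checksum


def reassemble(segments: list[Segment]) -> bytes:
    """Rebuild the message from segments that may arrive in any order."""
    damaged = [s.seq for s in segments if not is_intact(s)]
    if damaged:
        # A real receiver would ask for these segments to be resent.
        raise ValueError(f"segments {damaged} must be resent")
    ordered = sorted(segments, key=lambda s: s.seq)
    return b"".join(s.payload for s in ordered)
```

Segments can be passed to reassemble() in any order, because the sequence number alone decides where each piece goes. A segment whose payload was altered on the way fails the is_intact() check.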
The technical answer to “What is TCP/IP?” is: TCP/IP is a large family of
protocols used to organize computers and communication devices into a
network. The two most important protocols are TCP and IP. IP transmits the data
from place to place, while TCP makes sure it all works correctly.
1.2.5 Introduction to RFC
The internet is based on a large number of protocols and conventions. Each
such protocol is explained in the technical publication called a request for
comment or RFC. An RFC is usually a detailed technical explanation of how
something is supposed to work, not an invitation for people to send in
comments. Each RFC is given a number and is made available to anyone
who wants to read it. In this manner, the technical information that supports
the internet is distributed around the world in an organized, reliable manner.
Programmers and engineers who want to design products to work with the
internet protocols can download the RFCs and use them as reference
material. This ensures that everyone is using the same specifications and
that all the internet programs are designed to follow the same set of
standards.
1.2.6 How Internet Works?
In this section, we are going to cover the concepts of Internet, sending and
receiving messages.
Working of Internet
The primary objective of any network is to exchange information between
different locations. The rules for this exchange are called Protocols. The
protocol on Internet is TCP/IP (Transmission Control Protocol/Internet
Protocol) which is actually a name for a set of many rules framed to connect
computers in a wide area network, a network which is established between
computers across cities or countries.
Let us take a practical example of simply exchanging a message between
two persons, one at Lucknow and another at Mumbai.
Surya has an Internet account at Lucknow as [email protected]
Rishaba has an Internet account at Mumbai as Rishaba@bmOl.vsnl.net.in
When Surya wants to send a message to Rishaba at Mumbai, he dials
from his telephone to his local service provider, types out his message
and types out the address of the recipient.
Surya’s message is then broken into packets, each of which is an easy
and reliable unit to communicate.
These packets are then sent over various connected links along with
the destination address, say to Delhi and Kanpur. These sites also have
packet forwarding facilities that work from the address, and after a
while all packets reach the destination address, that is, Mumbai.
At Mumbai, all packets marked for a particular address
[email protected] and a particular message number
(automatically generated) are reassembled and then posted in the
mailbox that Rishaba is supposed to access regularly.
The above example cites a case of store and forward type of message
transfer. However the on-line transfer also occurs in the same way,
provided machines at both the ends are switched on and set to
transmit/receive internet traffic.
Sending and Receiving messages
How are messages sent and received across a network? Suppose I
send a message. It could be a simple e-mail saying ‘Hi’ to tell you I am on
the network. On the other hand, it could be a file, like the text of this chapter.
Now I have to tell the system the address of your computer. It is generally
your name. Therefore, what I do is put this message in an envelope and
write your name on it. The actual operation is not much more complicated
either. It is relevant here to understand the mechanisms used by
telephones and by cable networks that carry satellite channels to your home.
Nevertheless, keep in mind that the way communication takes place
between computers is different from both these cases.
When your phone is off the hook, your line is engaged and you cannot
receive another call. However, your cable operator can beam so many
channels and you can surf them at will. Telephones are circuit switched. In
simple terms, it means that when you dial a number it goes to your nearest
exchange, which routes it to the exchange nearest the called number,
and the bell rings at the other end. The moment the receiver lifts his phone off the
hook, a circuit between you and him is established. This is a dedicated
circuit. The whole mechanism is called circuit switching.
Your cable operator, on the other hand, can send multiple channels
because each channel has a different frequency and depending on the
bandwidth of the cable, many channels can be beamed. Imagine a wide
road with neatly defined lanes, one for two wheelers, one for cars, one for
light commercial vehicles and so on. Imagine frequency as the type of
vehicles and you have it!
In case of computer-to-computer communication, you cannot afford to have
circuit switching, and you cannot assign different frequencies to each
computer. The computer networks are packet switched. The different
stations send discrete blocks of data to each other. You can think of these
blocks of data as corresponding to some piece of a file, a piece of e-mail, or
an image.
The message is broken into pieces called packets. Time too is divided,
and each computer gets a quota of time to send packets. When many
stations want to communicate at the same time, they have to share the
network resources, especially the wires. This can be achieved through
multiplexing techniques.
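One simple way to picture this time sharing is a round-robin schedule, in which each station with something to send gets one slot per round. The Python sketch below is an illustration under that assumption only; the function name and the station labels are made up, and real networks use more elaborate access methods.

```python
from collections import deque


def time_division_multiplex(queues: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Interleave packets from several stations onto one shared wire.

    Each station gets one time slot per round; a station with nothing
    left to send simply gives up its slot.
    """
    pending = {name: deque(packets) for name, packets in queues.items()}
    wire = []
    while any(pending.values()):          # stop when every queue is empty
        for name, queue in pending.items():
            if queue:
                wire.append((name, queue.popleft()))
    return wire
```

With station A holding two packets and station B holding one, the wire carries A's first packet, then B's packet, then A's second packet.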
Each packet has actual contents surrounded by a header and a trailer. The
packet header has information about its destination. The NIC (Network
Interface Card) transmits the packet on the network. All computers that the
packet passes get to see it, but ignore it after reading the header. The NIC at
the intended receiver copies the packet. Does it copy each packet
separately? Yes. The information at the two ends of each packet helps the
packets to be put back together.
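The filtering behaviour of the NIC can be sketched in a few lines of Python. This is a toy model, not a real network driver: the Packet and Nic classes, the transmit function and the station names are hypothetical, and the header is reduced to a destination and a position number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dest: str   # header: address of the intended receiver
    seq: int    # header: position of this piece within the message
    data: str   # the actual contents


class Nic:
    """A toy network interface card attached to a shared wire."""

    def __init__(self, address: str) -> None:
        self.address = address
        self.received: list[Packet] = []

    def on_wire(self, packet: Packet) -> None:
        # Every NIC sees every packet, but copies only those addressed to it.
        if packet.dest == self.address:
            self.received.append(packet)

    def message(self) -> str:
        # Put the copied packets back together in their original order.
        ordered = sorted(self.received, key=lambda p: p.seq)
        return "".join(p.data for p in ordered)


def transmit(packets: list[Packet], stations: list[Nic]) -> None:
    """Place each packet on the shared wire, where all stations see it."""
    for packet in packets:
        for nic in stations:
            nic.on_wire(packet)
```

If two packets addressed to one station are transmitted out of order, only that station copies them, and its message() call still returns the pieces in the right order.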
1.2.7 Internet Applications
The Internet is an important tool for practically everybody. The applications are
endless. Whatever information is required, it is either already available on
the internet or soon going to be available. Here are some interesting
application areas:
Electronic mail, which was until recently considered only an internal
mechanism, is quickly becoming the most widely used application on
Internet. The most common of the communication methods used by the
people on the Internet is the private letter, written by one individual to
another (on any subject and in any language), and sent between any two
connected Internet sites or through an Internet e-mail gateway to or from a
service which provides an Internet gateway.
The ability to exchange visual information in readable and reusable formats
(such as charts, figures, tables, images, databases, software code) opens
up possibilities for collaboration at the global as well as local levels. With the
trend toward specialization, the ability not only to communicate but also to actually
work with colleagues in the same field scattered all over the world makes
long distance collaboration feasible.
The resources for on-line research are multiplying at an astounding rate.
Searchable databases, library holdings, alerting services, pre-prints, and
other information systems are all changing the way research is done. With library
shelves overflowing with journals and proceedings, and with acquisitions
budgets receiving deep cuts, a likely scenario for the future is one in which
libraries archive electronically, share holdings and become information
clearing houses instead of closets.
Another very important application of internet is Multimedia. Live music
concerts, radio broadcasts, live or recorded television shows, interactive
audio and web phone, and video conferencing are no more a dream on
Internet, even for a desktop PC user.
Internet provides a variety of information to everybody ranging from
entertainment to serious business application to areas of daily life such as:
Magazines and newspapers
Household shopping items
Ordering novelties from anywhere in the world
Radio and TV broadcast schedules and sometimes the broadcast itself
Tour and travel plan guides and bookings
Health consultation
Tips for doing various things
Talking to friends and relatives in any part of the globe
Games of various kinds
Language interpreter
On-line education course material, examination conduction, advertising
on popular information sites, making payments on the net and getting an
item, Internet Banking.
Self-assessment questions
1. RFC stands for __________.
2. Internet is a network of ____________.
1.3 Concepts of Server
In this section, we are going to discuss the concepts of client server models,
mail servers and FTP servers.
1.3.1 Client Server Model
Some computers are more equal than others. The more powerful
computers (not necessarily bigger) are called servers. They are like
public servants, administrators to the core. These servers are connected to
other dependent (but not in all respects) computers called clients, hence the
client-server model. The two are connected either through physical links
(wires, optical fibers, etc.) or through microwaves using satellites or
microwave towers. When many computers are talking, sending
and receiving a huge number of signals over these media,
traffic snarls are bound to occur. So there are devices to take care of these.
A more detailed explanation of server and client is given below:
Server:
Many of the host computers on the Internet offer services to other
computers on the Internet. For example, your ISP probably has a host
computer that handles your incoming and outgoing mail. Computers that
provide services for other computers to use are called servers. The software
run by server computers to provide services is called server software. A
server usually runs on a computer that is connected directly to a network and
keeps running for as long as client logins are expected. The size of that network is not
important to the client/server concept – it could be a small local area
network or the global Internet. The server is designed to interact with client
programs.
Client:
Conversely, many of the computers on the Internet use servers to get
information. For example, when your computer dials into an Internet
account, your e-mail program downloads your incoming messages from
your ISP‟s mail server.
Programs that ask servers for services are called clients. Your e-mail program is
more properly called an e-mail client. A client program is designed for a
particular computing platform (for example, UNIX, Macintosh, Windows) to
take advantage of the strengths of the platform. It uses environmental
elements just like the ones used in word processing or a spreadsheet, or
even in playing a computer game.
Using the familiar computer environment, the client may help you locate
servers of interest, send a query, process the query results, and display
them using familiar tools. Popular client/server software include WinGopher,
Mosaic, World Wide Web software, Netscape Navigator and Novell Netware
file server software.
The client/server model has become one of the central ideas of network
computing. Most business applications being written today use the
client/server model.
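A minimal Python sketch of the client/server idea is given below, using the standard socket module. It is an illustration only: the function names, the "echo" service and the use of the local machine (127.0.0.1) for both roles are assumptions chosen so the example can run on a single computer.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Server side: wait for one client, read its request, send a reply."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"echo: " + request)


def ask(host: str, port: int, text: str) -> str:
    """Client side: connect to the server, send a request, return the reply."""
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall(text.encode())
        return client.recv(1024).decode()


def demo() -> str:
    """Run a server and a client on the same machine for illustration."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the system pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    worker.start()
    try:
        return ask("127.0.0.1", port, "hello")
    finally:
        worker.join(timeout=5)
        listener.close()
```

The server waits for requests and the client starts the conversation, which is the essential split of roles in the model. On the Internet the two programs would normally run on different computers.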
1.3.2 Servers
Mail Servers
The mail servers handle incoming and outgoing mail. Specifically, Post
Office Protocol (POP) servers (or POP3 servers) store incoming mail, while
Simple Mail Transfer Protocol (SMTP) servers relay outgoing mail. Mail
clients get incoming messages from, and send outgoing messages to, a mail
server, and enable you to read, write, save and print messages.
Web Servers
Store web pages and transmit them in response to requests from web clients, which
are usually called browsers.
FTP Servers
Stores files that you can transfer to or from your computer if you have an
FTP client
News Servers
Stores Usenet newsgroup articles that you can read and send if you have a
news client or newsreader.
IRC servers
Act as a switchboard for Internet based on-line chats. To participate, you
use an IRC client.
Self-assessment questions
3. Many of the host computers on the Internet offer services to other
computers on the Internet. (true/false)
4. SMTP stands for ___________.
1.4 Getting Connected
Since the Internet is a composite network of thousands of discrete
networks, each having its own rules and procedures, there are many
different ways to connect to the Internet. To use the Internet
you need three things:
1. A Computer
2. Client programs to run on your computer (one client for each type of
service you want to use).
3. A way to connect your computer to the Net so your clients can service
your request.
1.4.1 Different Types of Connections
To start with, we need to go over the different types of Internet connections.
There are essentially three different types of connections for accessing the
services and resources of the Internet:
Dialup Connections
ISDN, ADSL, and Leased Line Connections
Satellite Connections
Dialup Connections
To access the Internet via a phone line, the concept is: Connect your
computer to the telephone system using either a regular phone line (with a
modem) or an ISDN line (which requires special equipment). To start work,
you run a communication program to dial the phone and establish a
connection with a remote Internet host. Once the connection is established,
you log in to the server by typing your user name and password. At this
point, there are three possible types of dial-up connections:
a) Shell account access
b) TCP/IP account access
c) Dial-up or on-demand TCP/IP link through your LAN
a) Conventional Dial-up Shell Account: With this type of account, you
actually do your work on the remote computer. You establish an
interactive session with another computer which is an Internet host. Your
desktop assumes the role of an ASCII terminal. With shell access, your
provider’s computer is considered a part of the Internet, but your
computer is not. The only program that runs on your computer is the
terminal emulator. When you connect to your provider, you type
commands to its system, which tell it what functions you want to perform. The
program on your provider’s computer that receives and acts on the
commands is known as a shell. The shell and the programs it runs for
you send back to your computer some text that is displayed on the
screen. A terminal emulator only supports a text-based interface, not a
graphical interface. You are usually limited to running one client at a
time.
b) Protocol dial-up (TCP/IP Account): A protocol dialup account lets your
computer behave as if it were connected directly to another computer on the
Internet, when it is really connected over a phone line whenever you
dial up. It enables you to run software, such as a graphical Web
browser like Microsoft Internet Explorer or Netscape Navigator, that
functions in your computer’s native environment instead of forcing you to
deal with plain text programs like the text-only browser Lynx on UNIX.
This means that with a protocol dialup (TCP/IP) account, during
the time you are connected your computer is a full-fledged Internet host.
You can run as many client programs as you want at the same
time. For example, you could start several programs (a web client, a
gopher client, a mail client) and switch back and forth from one to the
other. This type of connection is also known as a TCP/IP account
because it uses the TCP/IP protocol to perform data transfer on the Internet.
PPP and SLIP: The family of Internet protocols is called TCP/IP. The
connection protocol with the ISP’s server is known as PPP (Point to Point
Protocol), which is used in the Indian context, although other
connection types such as SLIP or CSLIP are available from other
Internet Service Providers in the world. You can be sure, however,
that PPP is the more recent and advanced connection protocol.
The job of IP is to move the raw data from one place to another. The
protocol developed to support TCP/IP over a serial cable was called
Serial Line IP or SLIP. SLIP dates back to the early 1980s and was
designed to be a simple, but not very powerful, method of connecting two
IP devices over a serial cable. PPP is more powerful, more dependable,
more flexible, and a lot easier to configure when you need to get it up
and running on a new system.
c) Dial-Up or On-Demand TCP/IP link through your LAN: A dial-up link
from your LAN is the intermediate step between individual dial-up and a
dedicated high speed link. It is therefore somewhat like dial-up and
somewhat like having a direct link. The main difference between this
type of connection and the one to your individual computer is that the
TCP/IP software runs on the LAN server, and your connection is to the
server. A TCP/IP connection through a LAN, either on a dial-up
connection or a direct connection, is the most common type of IP
connection, much more common than a personal dial-up IP connection.
ISDN, ADSL and Leased Line Connections: Alternatives to a regular phone
line are ISDN (Integrated Services Digital Network) and ADSL (Asymmetric
Digital Subscriber Line), both of which are types of telephone service.
Because they are digital, ISDN and ADSL allow the user to connect to another
computer at a speed much faster than even the fastest modem. Thus, if you
are using a phone line to connect your computer to the Internet, you are
better off with an ISDN or ADSL connection (not all phone companies offer
them). A basic ISDN service can run as fast as 128 kbps.
ISDN or ADSL services are a boon for corporates that have multiple users
who need simultaneous Internet access. However, it is still a medium that
very few Internet users have tried out in India. The primary reasons for this
are delayed implementation by MTNL (Mahanagar Telephone Nigam Ltd.) and
relatively higher costs. Mantra Online is the first private ISP to offer
these services.
A dedicated link (or leased line) is a permanent connection over a telephone
line from one modem to another. With a dedicated link, your personal
computer or LAN is connected to the Internet at all times (compare it with a
hotline, on which you just pick up the phone and start the conversation: no
dialing, no engaged tone, etc.). At higher speeds (56 kbps and above),
routers are used. A router is a specialized computer that reads the address
of each TCP/IP packet and sends the packet to its destination. This type of
connection is the most costly because it is private to a person's computer
or an organization (nobody else can share it). Leased lines come in various
speeds, including T1 (1.5 Mbps, or enough for 24 voice channels) and T3
(44 Mbps, or enough for 672 voice channels). If you do not need quite that
much speed, you can ask for a fractional T1 (half or a quarter of a T1
line). You also need to contact your ISP for a leased line account, which
costs more than a dial-up account.
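The voice-channel figures quoted for leased lines can be checked with a little arithmetic. The sketch below assumes the usual telephone-network figure of 64 kbps per digitized voice channel, which the text does not state, and the function name is only illustrative. The results come out slightly below the quoted line speeds because a real line also carries some framing overhead.

```python
# Assumption: one digitized voice channel carries 64 kbps.
VOICE_CHANNEL_KBPS = 64


def payload_kbps(voice_channels):
    """Return the combined data rate of the given number of voice channels."""
    return voice_channels * VOICE_CHANNEL_KBPS


t1 = payload_kbps(24)       # 24 voice channels on a T1 line
t3 = payload_kbps(672)      # 672 voice channels on a T3 line
half_t1 = payload_kbps(12)  # fractional T1: half of the 24 channels

print(f"T1 payload:      {t1} kbps (about {t1 / 1000:.1f} Mbps)")
print(f"T3 payload:      {t3} kbps (about {t3 / 1000:.0f} Mbps)")
print(f"Half-T1 payload: {half_t1} kbps")
```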
ISDN Advantages: To the subscriber, perhaps the most interesting advantage
is that with ISDN all of the services can be used with only one phone
number. One line is sufficient for telephone, telecopy (fax), video
conference or data transmission. A special protocol ensures that each
incoming call is directed to the right terminal. Thanks to the Multiple
Subscriber Number (MSN), it is even possible to dial each device behind a
central unit or PBX directly from the outside, without first establishing a
connection to the central unit.
1.4.2 Requirements for Connections
This section deals with shell account, TCP/IP account, TCP/IP software and
Web Browser.
For Shell Account
If you have a shell account type of access, all you need is to become a
terminal on the computer of your ISP. The minimum possible PC configuration
with VT-100 or equivalent terminal emulation software will therefore serve
your purpose well. In fact, a simple dumb terminal is enough to access such
an account. Terminal emulation software for the PC, such as PROCOMM, is
widely available. Choose emulation software that has KERMIT and ZMODEM file
download capability. A modem with error correction capabilities (9.6 kbps or
better) and a telephone line capable of dialing the service provider
(local/STD) are also required.
For TCP/IP Account
It is the power of the software available for TCP/IP accounts that has made
the Internet so popular these days. If you are a TCP/IP account holder, it
is highly desirable to have a GUI operating system such as Windows on your
desktop. Typically, you would require software that establishes the TCP/IP
connection and a Web browser to access this type of account. The right
choice of modem is the fastest in its class that suits your pocket.
Typically, a 28.8 kbps modem is found to perform best with Indian ISPs.
TCP/IP Software
Such software, also called TCP/IP sockets, is now bundled with newer
operating systems. If it did not come with your OS, you can use third-party
socket software such as Trumpet Winsock. You must run this software to get
connected to your ISP before you can start browsing.
Web Browser
Web browsers are client software (your machine is a client to the ISP's
server) with various graphics capabilities for accessing information on the
Internet. Modern Web browsers are capable of browsing WWW, Gopher and FTP
sites and also provide facilities for e-mail. Initially, NCSA's web browser
Mosaic hit the market and made browsing popular. Now web browsers from
Netscape and from Microsoft are the users' choice. You can get hold of any
such browser and start browsing the Net.
Self Assessment questions
5) Three types of dial-up connection are available. (true/false)
6) An alternative to a regular phone line is ISDN. (true/false)
1.5 Internet Service Providers
An Internet Service Provider (ISP) is an organization or business offering
public access to the Internet. It is your gateway to the Net. You have to
subscribe to a provider for your Internet connection. You use your computer
and modem to access the provider's system, and the provider handles the
rest of the details of connecting you to the Internet. There are many types of
Internet providers. You can, for instance, choose one of the big commercial
online service providers. The primary business of an ISP is hooking people
up to the Internet by giving Internet accounts to subscribers and providing
them with two different kinds of access: shell access and SLIP/PPP access.
Most ISPs offer both kinds of access; some offer both with a single account
and others require that you choose one or the other. Once you register, your
provider will give you a user name (called a user ID), a password and a
phone number to dial. To establish the Internet connection, you have your
communications program dial the number. You then log in using your
user ID and password. At present it is VSNL (Videsh Sanchar
Nigam Limited) which is dominating the Internet scene in India through its
GIAS (Gateway Internet Access Service). The other service providers in
India are MTNL (Mahanagar Telephone Nigam Limited), Mantra Online and
Satyam Online. BSNL's newer option, in which the user registers with a
telephone number and needs no separate account, has increased the number of
users. In this case the individual pays for whatever usage is made.
Choosing an ISP
The privatization of Internet Service Providers (ISPs) is set to give a further
fillip to the Internet boom. Central to the success of any service is the price
criterion. You will be amazed to find out how a service offered at a premium
could in effect be cheaper, considering the add-on facilities that are offered
along with the core service. Do not forget that apart from the Internet
connection, the ISP gives you an international contact address, that is, your
e-mail address. It is because of this e-mail address that you must be
discerning while choosing your ISP. The e-mail address provided by the ISP
will be used throughout your business, so it will not be easy to change
your service provider later, because doing so means changing your address.
You will have to live with the ISP as well as the e-mail address.
User ID – Telephone Ratio: The first thing you must keep in mind while
zeroing in on your ISP is the user-to-line ratio it commands. That is, how
many users are using or are expected to use one single telephone line.
Ascertaining this, however, is not easy, as the number of subscribers is
growing every day. Nevertheless, even the current user-to-line ratio will give
you an idea about the standards the ISP has set for itself. This factor is
critical because it determines whether or not you will be able to connect to
your ISP when you need to. Another way of finding this out is to ask some of
the existing users how long it normally takes to dial into a given ISP. If
it takes more than 10 minutes to get through, that particular ISP should be
avoided.
Interface Simplicity: Very few organizations take into account the simplicity
of the interface while opting for an ISP. This occurs to them only when they
begin to use the Internet service across their organizations. The right kind of
interface can lead to tremendous savings in cost. There are other problems
too. How many users in an organization know about dial-up networking
under Windows? How many can remember and use passwords correctly?
To how many people would you like to give the password? Do terms like
TCP/IP sound friendly to them? Questions like these determine the success
of the Internet enabled organizations. There are some ISPs to whom these
questions do not apply. They provide an easy-to-use interface that once
installed works by simply pressing a button.
Roaming Facility: The roaming facility is particularly relevant for those who
travel a lot. Though most ISPs advertise this particular facility, there are not
many who pay heed to it. Its benefits are realized only when one reaches
another city and wants to access an urgent e-mail or the Internet. How does
one connect to the Internet when one is not an ISP subscriber in that
particular city? To overcome this problem, you will have to either use a
facility like Hotmail to access your mail from around the world or use the
roaming facility provided by your ISP. The roaming facility allows you to
dial into the local node of your ISP or of a regional ISP that your service
provider has a tie-up with. Then all you have to do is plug your computer
into a telephone line, find out the numbers for dial-up access and, using
your password, access your original Internet account. A crucial point here
is the number of cities in which your ISP has a presence or a tie-up.
Multiple Login Facility: Very few users know about this facility, mainly
because it is hardly advertised. However, it can prove to be a life-saver and
a great help for small and medium business houses. If an organization has
only one Internet connection, but more than one employee wants to access
the Net simultaneously, this is possible only if the ISP offers the
organization the multiple login facility. In fact, this facility can even be
availed of while away from the organization. For instance, one user may be
in New Delhi and the other in Mumbai, yet with this facility the user away
in Mumbai can access the Internet at the same time. Some ISPs offer
multiple e-mail IDs that allow you to segregate e-mail individually, but you
have to pay extra for this.
Special Packages: The private ISPs are putting out some unique usage
packages. Some have launched a special package for night users: for those
who access the Net at night, they offer a dial-up account which costs
almost half as much as the regular connection. This account cannot be
used during day time. This is only the beginning as far as special packages
are concerned. Soon you will find ISPs (especially the regional ones)
coming out with packages that will fit your needs better than your cotton
trousers. So do not forget to check out each and every player before
deciding on your Internet provider.
Support: This is a very crucial topic and an area of service where most of
the players have been found wanting. Try getting any help from the service
provider and the beautifully programmed EPABX system will take you
around each and every option, only to disconnect your call at the end saying
“Sorry, the person handling your call is busy at the moment”. If you happen
to be using pulse-dialing equipment, you can forget about using the
telephone and may as well go to their office and clear up the matter there
and then.
Ideally, new users should subscribe to an ISP where they can be hand-held
through the initial process, although Bill Gates's Windows operating
system does try its best to support you in the exercise. An installation
guide, the help desk's phone number and the Windows 95 installation CD are
part of the necessary survival kit that a new user must have while
undergoing this procedure.
Discounts on Renewal: Last but not the least, you must find out whether
your ISP will renew your account at the same rate or whether it offers any
discounts to retain its old customers. This is a factor that can upset those
lining up for their first buy. VSNL has been very successful in playing this
card. It offers slashed rates to those subscribers who renew their accounts.
Brochure-speak: If you can have more than a hundred different versions of
the holy Ramayana, just think what the crafty marketing people can do to
simple terms of the Internet. Hence, one must see through the exotic looking
tariff cards of most ISPs. You must have the ability to judge beyond the
gloss and the glitter. To summarize, here is what you want from an Internet
Service Provider:
Access via a local phone call
A flat monthly fee
An ISDN or fast (28.8 kbps) connection
A PPP account
A shell account at no extra charge
The ability to use whichever Internet clients you want
Full Internet access to all resources
The capability of having your own web home page
Software support for connecting to and using the Internet
Technical support that is open 24 hours a day, 7 days a week
Self Assessment Questions
7. ___________ are examples of ISPs.
8. The private ISPs are putting out some unique usage packages.
(true/false)
1.6 Address in Internet
Understanding the Internet also requires you to know a little about how the
systems connected to the network are named and identified. It is only with
these names that you can locate a computer and get connected to it. Every
computer that is on the Internet has its own unique address. On the
Internet, the word ADDRESS always refers to an electronic address. There are
two kinds of addresses on the Internet:
Domain names
IP Addresses
1.6.1 The Domain Name System and DNS Servers
On a TCP/IP network, computers know each other by their IP addresses.
But for human beings, remembering numbers is not the easiest thing to do.
Remembering names is much easier. So a way was devised to
associate IP addresses with names that can be easily remembered. In the
early days of the Internet, “hosts” files were used to associate machines with
names. The hosts file is simply a table of IP addresses and corresponding
names, like a phone directory. Any name lookup (the process of identifying
the IP address associated with a name) will first check the hosts file (if
present) on the machine making the query, to see whether the name can be
resolved.
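A hosts-file lookup of this kind can be sketched in a few lines of Python. The table contents below are invented for illustration (the example.com names and 192.0.2.x addresses are placeholders reserved for documentation), and the function names are assumptions rather than part of any real resolver.

```python
# A made-up hosts table: one IP address per line, followed by its name(s).
HOSTS_TEXT = """\
# IP address      name(s)
127.0.0.1         localhost
192.0.2.10        trg.example.com trg
192.0.2.20        mail.example.com
"""


def parse_hosts(text):
    """Build a name -> IP address table from hosts-file text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # first entry for a name wins
    return table


def lookup(name, table):
    """Return the IP address for a name, or None if the table cannot resolve it."""
    return table.get(name.lower())


hosts = parse_hosts(HOSTS_TEXT)
print(lookup("trg", hosts))
print(lookup("unknown.example.com", hosts))
```

When the lookup returns None, a real system would go on to ask a name server, which is where the DNS comes in.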
Within the Internet, each separate computer is called a host. For example,
you might tell someone he can find the information he wants by connecting
to a host in Switzerland. If your computer is connected to the Internet, then it
too is a host, even though you may not be sharing any resources with the
rest of the world. If you connect to and log into a host and then use its
functions to reach out onto the Internet, you are using your computer as a
terminal to reach another computer. Host connections are designed to use
very simple text-based interactions.
Being connected to the Internet means your computer system or network is
actually a node on the Internet. It has an individually assigned Internet
address, and the client programs running on the computer system can
take full advantage of the computer's capabilities. Your workstation is a peer
of every other computer on the Net. So, a node is any “addressable device”
attached to a computer network.
But with the number of hosts on the Internet increasing rapidly to an
unmanageable level, keeping every hosts file up to date soon became
impossible. The way out was the DNS: the Domain Name System. The DNS is a
distributed, scalable database of IP addresses and their associated names.
It is distributed in the sense that, unlike the hosts file, no single
computer contains all the DNS information in the world. The DNS data is
distributed across many name servers. It is scalable in that you can
increase the volume of total DNS data and of requests from machines for that
data without significantly increasing the querying time. Otherwise the World
Wide Web would really become the World Wide Wait.
To understand the DNS and the way it is used, we need to understand the
Internet naming structure. Let us take, for example, the address:
http://www.trg.hclsso.hclinfosystems.com/
www: Indicates that the machine is part of the World Wide Web.
com: Indicates the top-level domain (TLD) that the machine is part of.
Top-level domains include .com, .edu, .gov, .in, etc.
hclinfosystems: Shows that the computer we are looking for is in a network
called hclinfosystems.
hclsso: Indicates a sub-network (a group of computers with a common
function or at a common location).
trg: Is the name of the machine that we are interested in.
Let us see how the DNS aids in identifying the machine's IP address, given
its name. At the top level of the DNS structure are the 13 root name servers
of the world, which contain pointers to the master name servers of each of
the top-level domains. To find the IP address of
http://www.trg.hclsso.hclinfosystems.com/, the DNS server will have to ask
one of the root name servers for the address of the master name server for
the .com domain. This master name server will have the addresses of the name
servers for all the .com domains. From here you get the address of the name
server for the hclinfosystems.com domain. You move on to this name server,
which will give you the IP address of the machine
trg.hclsso.hclinfosystems.com. If there is a separate name server for the
hclsso.hclinfosystems.com sub-domain, then the name server for
hclinfosystems.com will guide you on to that name server, which will give
you the IP address of trg.
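The walk from the root down to the machine can be imitated with a toy model. Everything in the sketch below is hypothetical: the server labels, the referral tables and the address 192.0.2.45 (a documentation placeholder) are invented, and real DNS servers exchange network messages rather than reading a dictionary. The aim is only to show the chain of referrals.

```python
# Toy name servers: each maps a domain suffix either to another server
# ("refer") or to a final IP address ("address"). All data is made up.
NAME_SERVERS = {
    "root": {"com": ("refer", "com-master")},
    "com-master": {"hclinfosystems.com": ("refer", "hclinfosystems-ns")},
    "hclinfosystems-ns": {
        "trg.hclsso.hclinfosystems.com": ("address", "192.0.2.45"),
    },
}


def resolve(name):
    """Follow referrals from the root; return (ip_or_None, servers_asked)."""
    labels = name.lower().rstrip(".").split(".")
    server = "root"
    asked = [server]
    while True:
        records = NAME_SERVERS[server]
        # Look for the longest suffix of the name that this server knows about.
        for start in range(len(labels)):
            suffix = ".".join(labels[start:])
            if suffix in records:
                kind, value = records[suffix]
                break
        else:
            return None, asked  # this server cannot help any further
        if kind == "address":
            return value, asked
        server = value  # a referral: move on to the next name server
        asked.append(server)


ip, asked = resolve("trg.hclsso.hclinfosystems.com")
print(ip, "found after asking:", " -> ".join(asked))
```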
A domain name is a way by which a company can uniquely identify itself on
the Internet. Registering a domain name on the Internet is the equivalent of
registering a company name at Companies House. Based on the top level
identifications, there are basically two types of domains:
1. Non-geographic domains
2. Geographic domains
Non Geographic Domains
The non-geographic top-level Internet domains are:
Domain   Indicates                       Example
com      Commercial organizations        hclinfosystems.com
edu      Educational institutions        stanford.edu
mil      A (US) military setup           nic.mil
gov      A (US) government setup         nasa.gov
org      Other organizations             www.bjp.org
net      Other networks                  ns.stph.net
int      An international organization   tpc.int
Geographic Domains
The geographically based top-level domains use two-letter country
designations.
Domain   Meaning
au       Australia
ca       Canada
dk       Denmark
fr       France
gr       Greece
in       India
jp       Japan
us       United States
In a complete (fully qualified) domain name, the part furthest to the right is
the top level domain, representing either a type of organization or a country.
As you read in from the right, the name gets more specific until you reach
the name of the individual host computer. For instance: rubens.anu.edu.au
is the name of a computer. It is in Australia (au), in the educational area
(edu), at the Australian National University (anu) and the host computer is
named rubens.
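Reading a domain name from the right can be shown with a short sketch. The function name is an assumption made for illustration; the code simply splits the name at the dots and reverses the parts.

```python
def labels_right_to_left(fqdn):
    """Split a fully qualified domain name and list its parts from the right."""
    return fqdn.rstrip(".").lower().split(".")[::-1]


labels = labels_right_to_left("rubens.anu.edu.au")

# Print the parts from the most general (top-level domain) to the host itself.
for depth, label in enumerate(labels):
    print("  " * depth + label)
```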
1.6.2 IP Addresses
Each host computer on the Internet has a unique number, called its IP
address. IP addresses identify the host computers, so that packets of
information reach the correct computer. You may have to type IP addresses
when you configure your computer for connection to the Internet. An IP
address is a 32-bit number that uniquely identifies a network interface. The
IP address is assigned to a network interface card and not a computer. So if
you have two Network Interface Cards, then each card is assigned an IP
address. The 32-bit IP addresses are normally expressed in dotted-decimal
format, with four numbers separated by periods, such as 151.202.123.132.
Each of these numbers can range from 0 to 255. The four constituent numbers
together represent the network that the computer is on and the computer
interface itself. IP addresses are organized from left to right, with the
left-hand octet describing the largest network organization and the rightmost
octet describing the actual network connection. Each octet occupies 8 bits
within the computer, so the four octets of the address together make up
32 bits. Using the various combinations of these octets, about 4.3 billion
(2^32) unique identifiers can be formed.
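The relationship between the dotted-decimal form and the underlying 32-bit number can be sketched as follows. The function names are illustrative only; each octet is shifted into its own group of 8 bits.

```python
def dotted_to_int(address):
    """Pack a dotted-decimal address into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid dotted-decimal address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # make room for 8 bits, then add the octet
    return value


def int_to_dotted(value):
    """Unpack a 32-bit integer back into dotted-decimal form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


number = dotted_to_int("151.202.123.132")
print(number, format(number, "032b"))
print(int_to_dotted(number))
```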
Classes of Networks
Just as with our phone numbers, we can look at the leftmost octet and
determine something about the network. Network addresses are divided into
classes, which are assigned depending on the size of the physical network.
The value of the first octet tells us what class the network is in, and how
large the physical network that underlies the number is. The first octet is
sometimes called the network address or net number.
Class A: Over 16 million served
These are very big networks with up to 2^24 (about 16 million) nodes. Class A
networks have their network addresses from 1.0.0.0 to 126.0.0.0. The zeros
are replaced with the node addresses. NEARNET, Sprint, ANSnet, Merit
and AT&T are examples of organizations with class A network numbers.
Class B: Larger nets
Class B networks are smaller than Class A networks. They can have up to
about 65,000 nodes (2^16 - 2 = 65,534). Network addresses range from
128.0.0.0 to 191.255.0.0. In this case only the last two zeros are replaced
with the node addresses. Class B addresses go to organizations with larger
nets, such as universities or large businesses. The first two octets in a
Class B address describe the network itself, and the last two identify the
host.
Class C: Addresses
Class C networks are smaller than Class B networks. They can have up to
254 nodes. Network addresses range from 192.0.0.0 to 223.255.255.0. In this
case only the last zero is replaced with the node address. The first three
octets are used for the network numbers and the last octet is the host
number. This class is where most networks will be assigned. Originally,
Class C addresses were intended for small company networks, K-12
schools and single machines that were not connected to other, larger nets.
Other Classes
There are other classes of networks, Class D and Class E. Class D is used
for multicasting and Class E is reserved for experimental purposes. For a
given network address, the last node address is the broadcast address. For
example, for a Class C network with address 193.168.1.0, the address
193.168.1.255 is the broadcast address. The IP addresses for networks on the
Internet are allocated by the InterNIC, the official body in charge of
allocating domain names and addresses.
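The class rules and the broadcast address can be put together in a small sketch. The first-octet ranges for Classes A, B and C follow the text; the ranges used for Classes D and E (224 to 239 and 240 to 255) are the standard classful values and are an addition not given in the text. The function names are illustrative.

```python
# Number of octets left for node addresses in each class.
HOST_OCTETS = {"A": 3, "B": 2, "C": 1}


def address_class(address):
    """Return the class letter for an address, judged by its first octet."""
    first = int(address.split(".")[0])
    if 1 <= first <= 126:
        return "A"
    if 128 <= first <= 191:
        return "B"
    if 192 <= first <= 223:
        return "C"
    if 224 <= first <= 239:   # assumption: standard Class D range
        return "D"
    if 240 <= first <= 255:   # assumption: standard Class E range
        return "E"
    return None               # 0 and 127 are not ordinary network numbers


def broadcast_address(network):
    """Replace the node part of a Class A, B or C network address with 255s."""
    cls = address_class(network)
    if cls not in HOST_OCTETS:
        raise ValueError(f"no classful broadcast address for {network!r}")
    octets = network.split(".")
    keep = 4 - HOST_OCTETS[cls]
    return ".".join(octets[:keep] + ["255"] * HOST_OCTETS[cls])


print(address_class("193.168.1.0"), broadcast_address("193.168.1.0"))
```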
Subnet Masks
In an IP network, every machine on the same physical network sees all the
data packets sent out on the network. As the number of computers grows,
the increase in network traffic brings down the performance. In such a
situation it is recommended to divide your network into sub-networks and
minimize the traffic across different sub-networks. Interconnectivity between
the different subnets is provided by routers, which pass on only the data
meant for another subnet. To divide the given network into
two or more subnets you use subnet masks. The default subnet mask for
Class A networks is 255.0.0.0; for Class B is 255.255.0.0; for Class C is
255.255.255.0 which signifies a network without subnets. The subnet mask
is used to identify the subnet to which an IP address belongs, by performing
a bit-wise AND operation on the mask and the IP address.
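The bit-wise AND can be tried out directly. In the sketch below the host addresses and the mask 255.255.255.128 are assumed examples, not taken from the text: that mask splits a Class C network into two subnets of 128 addresses each. The function names are illustrative.

```python
def apply_mask(address, mask):
    """AND each octet of the address with the matching octet of the mask."""
    pairs = zip(address.split("."), mask.split("."))
    return ".".join(str(int(a) & int(m)) for a, m in pairs)


def same_subnet(address_a, address_b, mask):
    """Two addresses share a subnet when the mask leaves the same result."""
    return apply_mask(address_a, mask) == apply_mask(address_b, mask)


# Default Class C mask: the whole network is a single subnet.
print(apply_mask("193.168.1.130", "255.255.255.0"))
# Assumed mask that divides the same network into two subnets.
print(apply_mask("193.168.1.130", "255.255.255.128"))
print(same_subnet("193.168.1.10", "193.168.1.130", "255.255.255.128"))
```

With the default mask the two hosts are on the same network; with the longer mask they fall into different subnets, so traffic between them would have to pass through a router.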
Self-assessment questions
9. Class A networks have their network addresses from __________
to _______.
10. ______________are examples of organizations with class A network
numbers.
1.7 Resource Addressing
Using the Web means having your browser act as a client program on your
behalf. In order to fulfill your requests, your browser will contact a server,
and ask for either some information or a service of some type.
1.7.1 URL (Uniform Resource Locator)
URLs provide a standard way to specify the exact location and name of just
about any Internet resource. In general, most URLs have one of two
common formats:
scheme://hostname/description
scheme:description
Example 1: http://www.alan.com/afan
This example describes a particular web page on a particular computer. The
URL begins with the scheme name (http), indicating a specific type of
resource.
Example 2: news:rec.human
This example describes a more general resource. The scheme is news,
which indicates a Usenet discussion group.
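The two URL formats can be taken apart with Python's standard urllib.parse module. This is only a sketch for inspecting the parts; the second example is written here in the scheme:description form.

```python
from urllib.parse import urlsplit

for url in ("http://www.alan.com/afan", "news:rec.human"):
    parts = urlsplit(url)
    print(url)
    print("  scheme:     ", parts.scheme)
    # hostname is None when the URL has no //hostname part
    print("  hostname:   ", parts.hostname)
    print("  description:", parts.path)
```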
1.7.2 URLs and HOST Names
On the Internet, a hostname is a domain name assigned to a host computer.
This is usually a combination of the host's local name with its parent
domain's name. For example, "en.wikipedia.org" consists of a local
hostname ("en") and the domain name "wikipedia.org". This kind of
hostname is translated into an IP address via the local hosts file, or the
Domain Name System (DNS) resolver. It is possible for a single host
computer to have several hostnames; but generally the operating system of
the host prefers to have one hostname that the host uses for itself.
List of schemes used within URLs
Scheme   Meaning
ftp      File accessed via File Transfer Protocol
gopher   Gopher resource
http     Hypertext resource
mailto   Mail
news     Usenet newsgroup
telnet   Interactive telnet session
wais     Access to a WAIS database
1.7.3 URLs and Port Numbers
Each type of Internet service has its own specific port number. Within a URL
you only have to specify a port number if it is not the default for that type of
service. For example, the default port number for telnet is 23. The following
two URLs are equivalent:
telnet://locis.loc.gov/
telnet://locis.loc.gov:23/
The http service, by default, uses port 80. Similarly, the gopher service uses
port 70. For instance, the following two URLs are equivalent. They both
point to the same hypertext resources, using port 80, on the computer
named www.wendy.com:
http://www.wendy.com/~wendy
http://www.wendy.com:80/~wendy
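The equivalence of such URL pairs can be checked with a short sketch. The table of default ports holds only the three services named above, and the helper names are assumptions made for illustration; urlsplit itself is part of Python's standard library.

```python
from urllib.parse import urlsplit

# Default ports for the services mentioned in the text.
DEFAULT_PORTS = {"telnet": 23, "http": 80, "gopher": 70}


def effective_port(url):
    """Return the port written in the URL, or the default for its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


def equivalent(url_a, url_b):
    """Treat two URLs as equivalent if they differ only by an explicit default port."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.scheme, a.hostname, effective_port(url_a), a.path) == (
        b.scheme, b.hostname, effective_port(url_b), b.path)


print(equivalent("telnet://locis.loc.gov/", "telnet://locis.loc.gov:23/"))
print(equivalent("http://www.wendy.com/~wendy", "http://www.wendy.com:80/~wendy"))
```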
1.7.4 Pathnames
Here is a typical hypertext URL:
http://www.cathouse.org/cathouse/humor/tech/data
We can divide such URLs into three parts: the scheme, the host name and
the pathname. To analyze such a URL, look at each of the parts:
The scheme (http) identifies this resource as being hypertext
The hostname (www.cathouse.org) is the name of the computer
The pathname (cathouse/humor/tech/data) shows where on the host the
hypertext resource is stored
Self-assessment questions
11. URL stands for ______________.
12. ______________ is a domain name assigned to a host computer.
1.8 Email
This section covers the email concepts, definition, e-mail services and e-
mail networks.
1.8.1 Email Basics
The Internet is a valuable tool for accessing information, but it also opens a
whole new world of communications to its users. Using electronic mail
(email) a person can engage in conversations with people all over the world.
Yet, because of its convenience, it is also a powerful tool for even local
communication. With typical telephone communications you may be either
interrupted by a call, or may return a call only to find that the other person is
not available, an occurrence referred to as "telephone tag." Electronic mail
though, sits on the server computer until you are ready to read it, and when
you respond it will then wait patiently on the other person's computer until
they have time to read it. This is especially valuable for busy teachers, who
because of their duties and general working isolation in a classroom with
just their students, usually aren't able to communicate with peers on as
regular a basis as they would like.
Meaning and definition of email
Electronic mail could be defined as the transmission of letters and memos
from one computer to another. When e-mail originated in the 1970s, it was
just the sending of messages. Its capabilities have grown rapidly since
then: users now can attach spreadsheets, business forms, lengthy documents,
scanned images, faxed images, computer graphics, meeting schedules, sound
and video to their messages.
Electronic mail or Email lets you communicate with other people on the
Internet. Email is one of the basic Internet services and by far the most
popular. It is used to hold conversations, keep in touch with friends,
get information, start relationships or express your opinion. It is called
Email because it works much like postal mail:
a) You put it into an electronic envelope and address it
b) You post it or hand the message to someone else (i.e. the network) to
be delivered
c) You may not know when the Email is read
d) You get Email back in your mailbox, if you addressed it incorrectly
e) If the recipient leaves a forwarding address, the Email system will keep
trying to route it to him/her until it runs out of forwarding locations
f) If the network is unable to deliver your Email, it will return the mail (this is
called bounced mail).
Email Services
In practice, Email usually refers to a service that includes the following
facilities:
Store and Forward: Messages are held until they are requested by the
recipient. Direct person-to-person contact is not required and the service
can be used by either party at whatever time and on whatever day suits
them.
Blind copies: Copies can be sent automatically to names on a
distribution list, including 'blind' copies (where the principal recipient is
not notified that others have received the message).
Advise delivery: The sender can be told (by a confirming message to his
or her mailbox) when the recipient has read the message. An immediate
reply could also be demanded.
Off-line working: Text can be prepared in advance of transmission and
incoming messages can be saved for later consideration or for use
within word-processed documents.
Email Networks
Email networks consist of gateways and closed user groups.
Gateways: Most electronic mail services include access to other facilities.
These include the telex system, on-line information services and electronic
typesetting bureaux which accept e-mailed text and return phototypeset
matter.
Closed user groups: These are areas of the Email service with restricted
access. In some cases they are available to anyone who pays an additional
fee; usually they will include extra gateways and more services. Other
closed user groups (CUGs) will be specific to members of a particular
profession (Telecom Gold hosts CUGs for solicitors and accountants, for
instance), and there are also CUGs for customers of individual companies
(handy for disseminating and sharing information or making requests) and
user groups for particular computer products.
In addition to these basic functions of electronic delivery systems, most
systems provide features related to other aspects of office work. These
features include:
Composing messages
Text editing
Message filing and retrieval
Authentication of message authorship
Broadcasting and distribution of messages as per specified addresses
Content processing of messages
Message switching
Accounting and billing
Security
Many Email services offer some or all of these:
Radiopaging: Your pager will beep when an urgent message is
received in your mailbox. Or you can beep someone by sending a
message to the service's radiopaging mailbox.
Telemessages: This replacement for the old-style telegram can be
sent from some Email services rather than by calling the Post Office
yourself. Delivery on the next working day (including Saturdays, usually)
is guaranteed for messages received by a set time (which can be as late
as 10 p.m.). The Telemessage service can include 'special occasion'
formats for birthdays, anniversaries and the like; the delivery can include
a special reply envelope to encourage an immediate reply.
Message translation: Messages sent or received can be translated by
the Email service into the recipient's native tongue.
Courier services: A message placed by you on the Email service can be
copied and delivered by hand or mailed.
The basic functions involved in an Email system are message creation,
message transfer and post-delivery processing. These are provided by the
User Agent (UA) and the Message Transfer Agent (MTA). Thus, an Email
system is actually a message handling system. The user agent is
responsible for providing text editing and proper presentation services to
the end user. It also provides for other activities such as user-friendly
interaction, security, priority provision, delivery notification and
distribution subsets. The message transfer agent is oriented towards the
actual routing of the electronic mail. It is responsible primarily for the
store-and-forward path, channel security and the actual routing through the
communication media.
Several MTAs taken together form the Message Transfer System (MTS).
1.8.2 Mail protocols
Email is instantaneous, cost effective and above all, personal. It produces
immediate results in terms of increased productivity from reduced
turnaround time, and reduced costs. Email is one of the easiest services to
implement on your Internet. The ideal Mail System consists of Email servers
and clients that support standards. A clear understanding of popular
acronyms of Email will help the users in choosing the right Mail Systems.
SMTP
The transmission of Email messages through the Internet relies on
SMTP, which stands for Simple Mail Transfer Protocol. SMTP is part of the
TCP/IP family of protocols. It is used to transport
messages between computer systems on the Internet. SMTP uses TCP,
the Transmission Control Protocol, which provides a reliable means of
communication. Throughout the Internet, there are millions of computers
using SMTP to send and receive mail.
Many of the host computers on the Internet run UNIX. Therefore, many of the
hundreds of thousands of transport agents scattered around the Net
run under UNIX. Specifically, most of these computers use a transport
agent called sendmail, which runs automatically in the background and is
always ready to respond to whatever requests it may receive. In UNIX, such
a program is called a daemon, and every UNIX system has various
daemons to provide fast services for you.
The Internet mail system works only because everybody's network has at least
one computer running a transport agent, sending and receiving mail
according to the SMTP protocol. SMTP is fast and efficient. Nevertheless,
its drawback is that both nodes must be on-line to communicate with each
other. That is where POP comes in. SMTP governs the way a UA (User
Agent) establishes a connection with an MTA (Message Transfer Agent) and
transmits its Email message. MTAs also use SMTP to relay the Email from
MTA to MTA, until it reaches the appropriate MTA for delivery to the
receiving UA. The interactions that happen between two nodes on
TCP/IP, whether a UA to an MTA or an MTA to another MTA, have similar
processes and follow a basic 'call-and-response' procedure.
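The call-and-response procedure can be pictured with a small Python sketch. It is only a toy model, not a real mail server: the function name smtp_reply and the example addresses are assumptions made for illustration, and only the numeric reply codes (250, 354, 221, 500) are taken from the SMTP standard.

```python
# Toy model of the SMTP call-and-response pattern (no network involved).
# smtp_reply() is a hypothetical helper; a real MTA does far more checking.

def smtp_reply(command):
    """Return an illustrative server reply for one client command."""
    verb = command.split(" ", 1)[0].upper()
    replies = {
        "HELO": "250 Hello",                       # greeting accepted
        "MAIL": "250 OK",                          # sender accepted
        "RCPT": "250 OK",                          # recipient accepted
        "DATA": "354 Start mail input",            # server waits for the text
        "QUIT": "221 Closing connection",          # session ends
    }
    return replies.get(verb, "500 Command not recognized")


# One call, one response: the client never sends the next command
# before it has read the reply to the previous one.
dialogue = [
    "HELO first.example.com",
    "MAIL FROM:<you@first.example.com>",
    "RCPT TO:<friend@second.example.com>",
    "DATA",
    "QUIT",
]
for call in dialogue:
    print("C:", call)
    print("S:", smtp_reply(call))
```

Each line starting with C: is a call from the sending side and each line starting with S: is the response, which is the same pattern whether the caller is a UA or another MTA.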
POP
Post Office Protocol (POP) is a mail collection and distribution system
in which the mail server works on the post office principle. It is designed to
allow single-user hosts to read mail from a server. POP allows a
mailbox to be created for each user who has a mail account on the server.
There are three versions of POP: POP, POP2 and POP3. POP is a system
by which a mail server on the Internet lets us grab our mail and
download it to our PCs. Like SMTP, POP uses plain ASCII and is
independent of the platform and the operating system. POP depends on SMTP to
send mail, while POP itself handles access to the messages. POP3 is the latest
version of this protocol.
IMAP
Internet Mail Access Protocol (IMAP), unlike POP, allows hierarchical
storage of mail and a message retrieval system that allows selective access
to your mailbox. While POP is used simply for retrieving and deleting
messages, with IMAP we can organize our mail and read it on the
server itself. For a user connected over a slow dial-up line, IMAP
provides ways to download only the header or the body of a message that
contains a large attachment. In addition, IMAP allows one user to access
multiple mail servers and multiple users to share a single mailbox. IMAP can
work in any of three basic modes of communication: On-line, Off-line or
Disconnected Operation. In the On-line mode, the mail is processed in an
interactive fashion, that is, the client can ask the server for only the
message headers and then request only specified messages, or can even
retrieve parts of certain messages.
MIME and S/MIME
SMTP can handle only messages containing 7-bit ASCII text. It
cannot handle other types of data, such as 8-bit binary data and the
multimedia formats that we now send both within the body of
Email messages and as attachments. As a solution to this
limitation, the IETF developed the Multipurpose Internet Mail Extensions
(MIME) protocol, which packs multimedia data into a format that SMTP can
handle. S/MIME stands for Secure/Multipurpose Internet Mail Extensions and was
designed to add security to Email messages in MIME format. The security
services offered are authentication (using digital signatures) and privacy
(using encryption). S/MIME is not specific to the Internet and can be used in
any electronic mail environment.
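As a rough illustration of how MIME packs binary data into a form that 7-bit SMTP can carry, the following Python sketch uses the standard email package. The addresses, subject and file name are invented for the example, and the sketch only builds the message; it does not send it.

```python
# Sketch: MIME wraps 8-bit binary data so that the message on the wire
# contains only 7-bit ASCII characters. All names below are illustrative.
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "MIME demonstration"
msg.set_content("Plain text body.")

# 256 bytes covering every possible byte value, i.e. true 8-bit data.
binary_payload = bytes(range(256))
msg.add_attachment(
    binary_payload,
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

# The serialized message is what would be handed to SMTP.
wire_form = msg.as_bytes()
is_seven_bit = all(byte < 128 for byte in wire_form)

# The receiving side can decode the attachment back to the original bytes.
attachment = next(msg.iter_attachments())
recovered = attachment.get_content()

print(msg.get_content_type())
print(is_seven_bit, recovered == binary_payload)
```

The attachment is encoded (base64 by default in this library) when the message is serialized, which is why the wire form stays within 7-bit ASCII even though the original data does not.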
UUCP
As an Internet user, you may want to exchange mail with different types of
networks, and then you should know what type of addresses they use. Some
popular networks to which you may send mail are CompuServe, MCI Mail, America Online,
the UNIX-based UUCP network, and so on.
All UNIX systems come with a built-in networking system called UUCP.
Although the job of UUCP is to connect UNIX computers, it is not as
powerful as TCP/IP. For example, UUCP does not provide a remote login
facility, and its mail facility is slower and more awkward than the TCP/IP based Internet
system. However, UUCP does have an important advantage. It is a
standard part of UNIX and it runs cheaply and reliably over dial-up or
hardwired connections.
UUCP works by allowing UNIX systems to connect together to form a chain.
Say you are using a computer named first. Your computer is connected to
another computer named second. This computer is connected to third,
which is, in turn, connected to fourth. You decide to send a message from
your computer, first, to a person with user id pant on fourth. UUCP will
pass the message from first to second to third to fourth, where it will be
delivered to user id pant.
Therefore, in our example, four computers and three connections are
involved. The system works well because it provides an economical way to send
mail from computer to computer over large distances. However, since many
UUCP connections are made over a telephone line at certain predefined
times, mail delivery can take hours or even several days. By contrast,
connections on the Internet are permanent and messages are transmitted
quickly, often within seconds, so there is no comparison between Internet
and UUCP connections.
To send mail to a UUCP address, you must specify the route you want the
message to take. For the above example the mail command will be:
mail second!third!fourth!pant
After creating such a message, your system will store it until
contact is established with the computer second, and then the
message will be sent on its way. If the path is too long, or you have no idea
what path to use from your computer to send the mail, the
UUCP Mapping Project can help. It allows you to use a UUCP
address that is similar to an Internet address. Thus, on occasions, you may
see an address that uses a top-level domain of UUCP. Look at the following
example: [email protected]
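The route in the first, second, third, fourth example is written as a chain of names separated by exclamation marks, with the user id last. A minimal Python sketch of how such a route can be taken apart is shown here; the function name parse_bang_path is an assumption made for this illustration and is not part of UUCP itself.

```python
# Sketch: split a UUCP route such as "second!third!fourth!pant"
# into the computers to pass through and the final user id.

def parse_bang_path(path):
    """Return (list_of_hops, user_id) for a '!'-separated UUCP route."""
    parts = path.split("!")
    if len(parts) < 2 or any(part == "" for part in parts):
        raise ValueError("expected host!host!...!user, got: " + repr(path))
    return parts[:-1], parts[-1]


hops, user = parse_bang_path("second!third!fourth!pant")

# The message starts on the computer named first, so the full chain is:
chain = ["first"] + hops
print(" -> ".join(chain), "for user", user)
print("computers:", len(chain), "connections:", len(chain) - 1)
```

For the example route this gives four computers and three connections, matching the count in the text.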
1.8.3 How to Access the Mail System
As we explained in the previous section, SMTP is used to send and receive
mail behind the scenes. The question now arises of how the mail gets from
the transport agent to you. The computer that provides the Internet
connection also acts as the mail host. Typically, this computer runs a
transport agent program and is connected to the Internet 24 hours a day.
This means that whenever your mail arrives, the transport agent
accepts it and saves it in a file called a mailbox. Each person who has an
account on the host computer is given his own mailbox file. In this way the
host computer always keeps everyone's mail in an organized manner and at
the same time assures you that no one else can read your messages.
Ways of Accessing Email
There are many ways to access your Email. You may use a mail client, such
as Eudora, Outlook or any one of the popular packages that download your
incoming messages from the POP server to your computer and upload your
outgoing messages to the SMTP server. This may occur through a Local
Area Network (LAN) or through a dial-up connection.
You may use a Web-based Email service.
You may use a commercial provider, such as CompuServe or America
On-line, which have their own Email programs.
You may get your Email through a LAN, a common system at large
organizations. If your organization has some sort of Internet connection,
Email arrives in the company's POP server. You then read your Email
either on the server using an Email application or on your own computer,
by downloading your Email from the server through the LAN by using an
Email application. Your company may use a POP server or some kind of
proprietary protocol.
You may have a UNIX shell account and use a UNIX Email program that
reads your POP mailbox directly.
How does Email Work?
Let us review how Email works, using an example. In this example, you are
using a PC with Windows OS, which connects to the Internet using TCP/IP.
Let us suppose you want to send a mail to two of your friends: Surya in
Washington and Rishaba in Germany. Surya uses a Macintosh and
connects to the Net using PPP. Rishaba uses a shell account by connecting
to a UNIX host computer. The following steps illustrate the example.
1. First using a Windows mail client, you compose the message on your
own computer.
2. After you compose the message, address it to both Surya and
Rishaba.
3. Once the message is finished, you tell your program to send it on its
way.
4. Now your client program contacts the mail server on your Internet host
and, using the SMTP protocol, sends your message to the server.
5. In the next step, the server passes your message to the transport
agent.
6. Now, it is the job of transport agent to look at the addresses in your
message and connect to the appropriate computers over the Net.
7. First, the transport agent connects to the transport agent on the host
computer in Washington that receives mail for Surya.
8. Once the connection is made, the two transport agents use the SMTP
to relay the message.
9. After the message is sent, your transport agent terminates the
connection and forms a new connection with the transport agent on
the appropriate computer in Germany.
10. Again, the two transport agents use SMTP to relay the message.
11. Once the message is sent, your transport agent terminates the
connection; its job is finished.
12. In Washington, Surya turns on his computer to check the mail. He tells
his Macintosh mail client to see if any new mail has arrived. The mail
client connects to the mail server on Surya's host
computer and, using the POP protocol, asks the server to check
Surya's mailbox. The server finds your message and, using POP,
sends it to the client, which places the message in his local
mailbox (a file on the Mac) and tells him that new mail has arrived.
Now, with the help of the mail program, Surya displays the message.
13. Similarly, in Germany, Rishaba has logged into his shell account on a
UNIX host. He runs his UNIX mail program, which checks his mailbox
and tells him new mail has arrived. Using the appropriate command,
Rishaba tells the mail program to show him your message.
The important thing in this example is that, even though they use different
computers and different programs, the mail moves smoothly and quickly,
just because of the Internet and SMTP.
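The part of this walk-through in which the transport agent looks at the addresses and contacts the appropriate computers one after another can be sketched in Python. This is only an illustration of the idea: the function name group_by_host and the two addresses are invented, and a real transport agent also has to look up each host before connecting to it.

```python
# Sketch: group the recipients of one message by destination host,
# so that one SMTP connection per host can be made in turn.
from collections import defaultdict


def group_by_host(recipients):
    """Map each destination host to the list of usernames on that host."""
    by_host = defaultdict(list)
    for address in recipients:
        user, _, host = address.rpartition("@")
        by_host[host.lower()].append(user)
    return dict(by_host)


# Hypothetical addresses for the two friends in the example.
recipients = ["surya@mail.example.com", "rishaba@mail.example.de"]
deliveries = group_by_host(recipients)

for host, users in deliveries.items():
    # In a real system this is where the connection would be opened,
    # the message relayed using SMTP, and the connection closed again.
    print("connect to", host, "and relay the message for", ", ".join(users))
```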
Understanding the Internet Email Addresses
In this section, we will talk a little more about how to specify addresses
when you send mail. As you are now aware, whenever we use
the word “mail” it means electronic mail, and the word “address”
always refers to an Internet address. Thus, if someone on the Net asks
“What is your address?”, tell him or her your electronic address.
An Email address defines the location of an individual's mailbox on the
Internet. An address consists of two parts: username and domain name,
separated by the @ symbol. Here is an example:
Username in the preceding example is Leenu. Usernames are usually pretty
straightforward; often, companies give employees usernames that use one
initial and one full name. However, usernames can also contain characters
other than letters – they can contain numbers, underscores, periods and
some other special characters. They cannot contain commas, spaces or
parentheses.
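The two-part structure of an address, and the rule about characters that a username cannot contain, can be expressed as a short Python sketch. The function name split_address and the domain example.com are assumptions for illustration; the check covers only the rule stated above, not every rule of Internet mail.

```python
# Sketch: split an Email address at the @ symbol and apply the simple
# username rule given in the text (no commas, spaces or parentheses).

FORBIDDEN = set(", ()")


def split_address(address):
    """Return (username, domain) or raise ValueError for a bad address."""
    username, at_sign, domain = address.rpartition("@")
    if not at_sign or not username or not domain:
        raise ValueError("address must look like username@domain")
    bad = FORBIDDEN & set(username)
    if bad:
        raise ValueError("username contains forbidden characters: "
                         + repr(sorted(bad)))
    return username, domain


username, domain = split_address("leenu@example.com")
print("username:", username)
print("domain:  ", domain)
```

An address such as "l_sharma.2009@example.com" also passes, since numbers, underscores and periods are allowed, while "leenu sharma@example.com" is rejected because of the space.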
The host name provides the Internet location of the mailbox, usually the
name of a computer owned by a company or Internet service which has
been discussed in Unit 2. If the recipient is within your local network, you
can often leave out part of the address. For example, say your address is
[email protected] and you are mailing to a friend whose computer is on
the same network. Your friend's address is [email protected]. You
can leave off the part of the address you both have in common. That is, in
this case, you use sachin@more. The mail program easily recognizes it as a
local address and delivers the message properly. If you have a problem, you
may have to use the full address. It is also possible to leave out the
computer name entirely and just use the user ID. If your address is
[email protected] and you want to send mail to
[email protected], you can use: rishaba.
When you do not know someone's Email address, but you have an idea of his
login name and the name of the Internet site he uses, you can send Email to
the postmaster, an address that should exist at any Internet site. That is
also the address to use if you have
questions about an Email to or from a specific host or site, or general
questions about a site. However, you may not get a quick response, since
the person designated as “postmaster” usually has lots of other duties. For
example, if you have trouble finding out the address of someone who uses a
computer named great.vsnl.in, you can send a message asking for the
person's mail address to: [email protected].
Self-assessment questions
13. Three versions of POP are ____________.
14. _____________ allows hierarchical storage of mail and a message
retrieval system that allows selective access to your mailbox.
1.9 Summary
The Internet links computer networks all over the world so that users
can share resources and communicate with each other.
Internet is a vast collection of globally available information which can be
accessed electronically – information which is of practical use for
business, research, study and technical purposes.
No one owns the Internet. No single person, corporation, university or
government funds it. The Internet has been described as a
cooperative anarchy.
Protocols are the rules that all networks use to understand each other.
For example, there is a protocol describing exactly what format should
be used for sending a mail message.
The Internet is based on a large number of protocols and conventions.
Each such protocol is explained in a technical publication called a
Request for Comments or RFC.
The primary objective of any network is to exchange information
between different locations.
Many of the host computers on the Internet offer services to other
computers on the Internet.
Conversely, many of the computers on the Internet use servers to get
information.
The mail servers handle incoming and outgoing mail.
A protocol dialup account lets your computer behave like it is connected
directly to another computer on the Internet.
The family of Internet protocols is called TCP/IP.
A dedicated link (or leased line) is a permanent connection over a
telephone line between a modem pointer to another modem pointer.
Each host computer on the Internet has a unique number, called its IP
address.
On the Internet, a hostname is a domain name assigned to a host
computer.
Internet Mail Access Protocol (IMAP), unlike POP, allows hierarchical
storage of mail and a message retrieval system that allows selective
access to your mailbox.
1.10 Terminal Questions
1. Briefly explain the Internet from the practical and technical angles.
2. What are the requirements for internet connections?
3. Explain the Domain Name System and DNS servers.
4. Briefly explain the various classes of networks.
5. Explain the various mail protocols used.
1.11 Answers
Self Assessment questions:
1. Request for comment
2. Networks
3. True
4. Simple mail transfer protocol
5. True
6. True
7. Airtel, Bsnl, Tataindicom
8. True
9. 1.0.0.0 to 126.0.0.0.
10. NEARNET, Sprint, ANSnet, Merit and AT&T
11. Uniform Resource Locator
12. Hostname
13. POP, POP2 and POP3
14. IMAP
Terminal Questions
1. Internet is a vast collection of globally available information which can be
accessed electronically – information which is of practical use for
business, research, study and technical purposes. (Refer Section 1.2.2)
2. There are essentially three different types of connections for accessing
the services and resources of the Internet. (Refer Section 1.4.1)
3. On a TCP/IP network, computers know each other by their IP
addresses. (Refer Section 1.6.1)
4. Network addresses are divided into classes, which are assigned
depending on the size of the physical network. (Refer Section 1.6.2)
5. It produces the immediate results in terms of increased productivity from
reduced turnaround time, and reduced costs. (Refer Section 1.8.2)
Website Design Unit 2
Sikkim Manipal University Page No. 40
Unit 2 Website Development with HTML – I
Structure:
2.1 Introduction
Objectives
2.2 HTML Fundamentals
Architecture of Web Page Contents
Browser Specific Tags
Structure Tags
Physical Tags
Logical Tags
HTML Tags
Tools for HTML Validation
2.3 Using Graphics
Tools for creating and manipulating Web Graphics
Image Tags and Attributes
Sources for web site graphics
Introduction to Client-Side Image Maps
Tools for creating image maps
GIF, JPEG, and PNG Formats
Transparent Graphics
Transparency and Interlacing of Graphics
Creating Animated Graphics
Interactive Graphics
2.4 Constructing Forms
2.5 Marketing Your Site
Characteristics of Search Engines
Registering with Search Engines and Directories
The <meta> Tags and Attributes keywords, description and robots
Creating Effective <title> tags
Designing Your Site for Effective Search Engine Optimization
2.6 Summary
2.7 Terminal Questions
2.8 Answers
2.1 Introduction
In the previous unit, we studied the concepts of the Internet, servers,
Internet applications, the client-server model, Internet connections, URLs
and the email system. On that basis we are going to continue with some
advanced concepts.
In this unit we shall study the various HTML tags and how to create a web
page using these tags. We shall also study how to design a form using HTML.
Objectives
After studying this unit, you should be able to:
explain the architecture of the web page contents & various tags used in
HTML
describe how to use graphics in HTML
explain how to construct a form
discuss about marketing your site
2.2 HTML Fundamentals
This section deals with structure of webpage, HTML language, URI and
HTTP concepts.
2.2.1 Architecture of Web Page Contents
The basic web architecture is two-tiered and characterized by a web client
that displays information content and a web server that transfers information
to the client. This architecture depends on three key standards: HTML for
encoding document content, URLs for naming remote information objects in
a global namespace, and HTTP for staging the transfer.
HyperText Markup Language (HTML)
The common representation language for hypertext documents on the Web.
HTML had a first public release as HTML 0.0 in 1990, was Internet draft
HTML 1.0 in 1993, and HTML 2.0 in 1994. The September 22 1995 draft of
the HTML 2.0 specification has been approved as a standard by the IETF
Application Area HTML Working Group. HTML 3.0 and Netscape HTML are
competing next generations of HTML 2.0. Proposed features in HTML 3.0
include: forms, style sheets, mathematical markup, and text flow around
figures. HTML is an application of the Standard Generalized Markup
Language (SGML ISO-8879), an international standard approved in 1986,
which specifies a formal meta-language for defining document markup
systems.
An SGML Document Type Definition (DTD) specifies valid tag names and
element attributes. HTML consists of embedded content separated by
hierarchical, case-insensitive start and end tag names, which may contain
embedded element attributes in the start tag. These attributes may be
required, optional, or empty. In addition, documents can be inter or intra
linked by establishing source and target anchor points. Many HTML
documents are the result of manual authoring or word processing HTML
converters, but now several WYSIWYG editors support HTML styles. HTML
files are viewed using a WWW client browser (software), the primary user
interface to the Web. HTML allows for embedding of images, sounds, video
streams, form fields and simple text formatting.
Universal Resource Identifier (URI)
An IETF addressing protocol for objects in the WWW ("if it's out there, we
can point at it"). There are two types of URIs, Universal Resource Names
(URN) and the Universal Resource Locators (URL). URLs are location
dependent and contain four distinct parts: the protocol type, the machine
name, the directory path and the file name. There are several kinds of
URLs: file URLs, FTP URLs, Gopher URLs, News URLs, and HTTP URLs.
URLs may be relative to a directory or offsets into a document.
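The four parts of a URL named above can be separated with the Python standard library, as in the following sketch. The URL itself is a made-up example, and splitting the path into a directory and a file name with posixpath is a simplification that assumes the last path segment is a file.

```python
# Sketch: break a URL into protocol type, machine name,
# directory path and file name. The URL is illustrative only.
import posixpath
from urllib.parse import urlsplit

url = "http://www.example.com/docs/unit2/index.html"

parts = urlsplit(url)
protocol = parts.scheme            # the protocol type
machine = parts.hostname           # the machine name
directory, filename = posixpath.split(parts.path)

print("protocol :", protocol)
print("machine  :", machine)
print("directory:", directory)
print("file     :", filename)
```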
HyperText Transfer Protocol (HTTP)
An application-level network protocol for the WWW. Tim Berners-Lee, father
of the Web, describes it as a "generic stateless object-oriented protocol."
Stateless means neither the client nor the server stores information about the
state of the other side of an ongoing connection. Statelessness is a
scalability property but is not necessarily efficient since HTTP sets up a new
connection for each request, which is not desirable for situations requiring
sessions or transactions. In HTTP, commands (request methods) can be
associated with particular types of network objects (files, documents,
network services). Commands are provided for
Establishing a TCP/IP connection to a WWW server,
Sending a request to the server (containing a method to be applied to a
specific network object identified by the object's identifier, and the HTTP
protocol version, followed by information encoded in a header style)
Returning a response from the server to the client (consisting of three
parts: a status line, a response header, and response data), and
Closing the connection.
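The shape of the request and of the three-part response described in this list can be sketched in Python without opening any network connection. The helper names build_request and parse_response, the host name and the sample response text are all assumptions made for this illustration.

```python
# Sketch: the text of an HTTP request, and the three parts of a response
# (status line, response header, response data). No network is used.

def build_request(method, path, host):
    """Method, object identifier and protocol version, then header lines."""
    lines = [method + " " + path + " HTTP/1.0", "Host: " + host, "", ""]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw response into (status code, reason, headers, data)."""
    head, _, data = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, data


request = build_request("GET", "/index.html", "www.example.com")

# A hand-written sample of what a server might send back.
sample = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
code, reason, headers, data = parse_response(sample)

print(request.splitlines()[0])
print(code, reason, headers["content-type"], data)
```

Because the protocol is stateless, everything the server needs in order to answer is contained in the request text itself.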
2.2.2 Browser Specific Tags
Netscape only tags – There are six tags that are only visible with
Netscape. Keep in mind that many of these tags (such as <layer></layer>)
are not a part of the XHTML specification.
<blink></blink>
The text within the <blink></blink> tag will turn on and off (blink). This can
make text fairly difficult to read. This tag works in both Netscape and
Mozilla.
<keygen></keygen>
This tag was meant to generate a public key to encrypt HTML forms and
make them secure. It works in Netscape and Opera.
<layer></layer>
The layer tag allows you to place sections of your Web page on different
"layers" and treat them as separate objects within your page. Use of this tag
is discouraged in favor of CSS positioning. This only works in Netscape 4.x.
<multicol></multicol>
The enclosed text will be displayed in multiple columns. This tag was meant
to be used to create newspaper-like columns of text. This works in
Netscape 4.x.
<nolayer></nolayer>
The <nolayer></nolayer> tag indicates HTML that should be displayed in
browsers that don't support the <layer></layer> tag, when layers are used. It
is similar to the <noframes></noframes> tag in a frameset.
<spacer/>
The spacer tag was Netscape's take on the non-breaking space. Use this
tag to put a block of white space of a specific size on your Web page. Note
that this tag works in Netscape 4 and 6.
Internet Explorer only tags – Two tags are only supported by Internet
Explorer.
<bgsound/>
This tag will set a sound file to play in the background as the Web page is
displayed.
<marquee></marquee>
With the <marquee></marquee> tag, you can create a scrolling text
marquee on your Web page. This tag is also supported by MSNTV.
2.2.3 Structure Tags
This section describes the tags that indicate the basic structure of a web
page.
HTML - The HTML tag identifies a document as an HTML document. All
HTML documents should start with the <HTML> tag and end with the
</HTML> tag.
Syntax:
<HTML>….</HTML>
HEAD – The HEAD tag defines an HTML document header. The header
contains information about the document rather than information to be
displayed in the document. The web browser displays none of the
information in the header, except for text contained by the TITLE tag. You
should put all header information between the <HEAD> and </HEAD> tags,
which should precede the BODY tag.
The HEAD tag can contain TITLE, BASE, ISINDEX, META, SCRIPT,
STYLE, and LINK tags.
Syntax:
<HEAD>… </HEAD>
TITLE – The TITLE tag defines the TITLE of the document. This is what is
displayed in the top of your browser window. In addition, many search
engines use this as their primary name of a document.
Syntax:
<TITLE> … </TITLE>
BODY – The BODY tag specifies the main content of a document. You
should put all content that is to appear in the web page between the
<BODY> and </BODY> tags. The BODY tag has attributes that let you
specify characteristics for the document. You can specify the background
color or an image to use as a tiled background for the window in which the
document is displayed. You can specify the default text color, active link
color, unvisited link color, and visited link color. You can specify actions to
occur when the document finishes loading or is unloaded, and when the
window in which the document is displayed receives or loses focus.
Syntax:
<body> … </body>
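To see the four structure tags working together, the following Python sketch holds a very small page as a string and reads it with the standard html.parser module. The page text, the attribute values and the class name StructureChecker are invented for this illustration; the sketch only reports what it finds and is not an HTML validator.

```python
# Sketch: a minimal page using the HTML, HEAD, TITLE and BODY tags,
# read back with Python's built-in HTML parser.
from html.parser import HTMLParser

PAGE = """<HTML>
<HEAD>
<TITLE>My First Page</TITLE>
</HEAD>
<BODY BGCOLOR="white" TEXT="black">
Hello, Web!
</BODY>
</HTML>"""


class StructureChecker(HTMLParser):
    """Record the order of start tags, the title text and BODY attributes."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.title = ""
        self.body_attrs = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lower case.
        self.start_tags.append(tag)
        if tag == "title":
            self._in_title = True
        if tag == "body":
            self.body_attrs = dict(attrs)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


checker = StructureChecker()
checker.feed(PAGE)
checker.close()

print(checker.start_tags)
print(checker.title)
print(checker.body_attrs)
```

The order of the recorded start tags shows the HEAD section, with its TITLE, coming before the BODY, as described above.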
2.2.4 Physical Tags
Text in HTML code can be dressed up in various ways so that it's displayed
differently by the browser. Text can be made Bold, Underlined, Italicized,
Struck-through etc. Moreover, you can make text both italicized and bold at
the same time.
Physical tags define how the text should be displayed in the browser. They
control the Physical characteristics of the text. There are 10 physical tags
each requiring a closing tag:
<I> Italics: I am in italics
Syntax: <I> .. </I>
<B> Bold: I am in bold
Syntax: <B> .. </B>
<U> Underline: I am underlined
Syntax: <U> .. </U>
<STRIKE> Strikethrough: I am struck!
Syntax: <STRIKE> .. </STRIKE>
<SUP> Superscript: My superscript
Syntax: <SUP> .. </SUP>
<SUB> Subscript: My subscript
Syntax: <SUB> .. </SUB>
<TT> Typewriter: I am in typewriter form
Syntax: <TT> .. </TT>
<BIG> Bigger font: I am bigger
Syntax: <BIG> .. </BIG>
<SMALL> Smaller font: I am smaller
Syntax: <SMALL> .. </SMALL>
<S> Strikethrough alternative: I am also struck!
Syntax: <S> .. </S>
Tag Nesting
Physical tags can be nested i.e. one tag can be placed (including its closing
tag) inside another. Let's test this:
<B>Some text</B> displays Some text which is in bold
Give more emphasis by underlining this text:
<U><B>Some text</B></U> displays Some text which is bold and
underlined
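When tags are nested, the tag opened last must be closed first. The following Python sketch checks this rule with a simple stack; the class name NestingChecker and the function name is_well_nested are assumptions for this illustration, and the check is meant for tags that require a closing tag, such as the physical tags above.

```python
# Sketch: check that nested tags are closed in the reverse order
# in which they were opened.
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []       # stack of tags opened but not yet closed
        self.well_nested = True

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # A closing tag must match the most recently opened tag.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.well_nested = False


def is_well_nested(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.well_nested and not checker.open_tags


print(is_well_nested("<U><B>Some text</B></U>"))   # correctly nested
print(is_well_nested("<U><B>Some text</U></B>"))   # closing tags crossed
```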
2.2.5 Logical Tags
Logical tags allow the browser to render that information in the manner most
appropriate for that browser. Following are the logical tags used:
<h1> through <h6>
Create headings. They should flow sequentially (try not to skip levels). The
title of the page should always appear as a level 1 heading, with
subheadings cascading down from it. Text is usually displayed in a large,
bold font. Remember that they’re all block-level elements.
<em>
Creates emphasis, and is usually displayed as italicized text. It usually
looks the same as <i>.
<strong>
Creates strong emphasis, and is usually displayed as bold text. It usually
looks the same as <b>.
<code>
Is suitable for giving examples of computer code, and is usually rendered in
a mono-spaced font. It usually looks the same as <tt>.
<blockquote>
Is a block-level tag that’s used to enclose multi-line quotations from other
sources. It is usually displayed as indented from both sides.
<cite>
Is used to enclose the title of a work that is currently being referred to. It’s
usually displayed as italicized text.
<q>
Is a short quotation from another source. Modern browsers will display
contained text with quotation marks added on both sides.
<pre>
Is a block-level element that displays text in a fixed-width font exactly how it
was typed in the source code (i.e. honouring all tabs, spaces and line
breaks). pre is not strictly a logical element, but its use is often necessary.
<del>
Is an HTML 4 tag used to show document revisions; text deleted from a page
in this case. It is usually displayed as text with a strike-through.
<ins>
Is del’s partner in crime, used to show text inserted during a revision. It is
usually displayed with an underline.
<address>
Should be wrapped around contact information, including email addresses.
<kbd>
Is suitable for marking up text that is meant to be entered by the reader on
the keyboard. It is usually displayed in a fixed-width font.
<var>
Marks up a variable’s name. Useful if you’re writing about technical subjects
like computer programming.
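As a rough sketch, the logical tags listed above might be combined on one page as follows. The page text, the cited title and the email address are invented for illustration only.

```html
<!-- Hypothetical page content, written only to show the logical tags in context. -->
<h1>Release Notes</h1>
<h2>Version 2</h2>
<p>This release is <em>faster</em> and <strong>must</strong> be installed
before <del>1 June</del> <ins>15 June</ins>.</p>
<p>Type <kbd>setup</kbd> at the prompt. The variable <var>count</var> is
updated by <code>count = count + 1;</code>.</p>
<blockquote>
  <p>Logical tags describe what the text means, not how it looks.</p>
</blockquote>
<p>As <cite>The Style Guide</cite> puts it: <q>mark up meaning first</q>.</p>
<pre>
line one
    line two, indented
</pre>
<address>Webmaster, webmaster@example.com</address>
```

How each element actually looks depends on the browser, which is the point of using logical rather than physical tags.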
2.2.6 HTML tags
The syntax is <HTML> .. </HTML>
Attribute definitions:
Version = cdata
The value of this attribute specifies which HTML DTD version governs the
current document. This attribute has been deprecated because it is
redundant with version information provided by the document type
declaration.
Lang = language-code – Specifies the base language of the document's
content (for example, en for English).
Dir = LTR | RTL – Specifies the base direction of the text, left-to-right or
right-to-left.
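A minimal sketch of these attributes in use is shown here. It assumes an English, left-to-right page; the title and paragraph are placeholder text. The version attribute is left out because the document type declaration on the first line already carries that information.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<!-- English, left-to-right document; title and paragraph are placeholders. -->
<html lang="en" dir="ltr">
<head>
<title>Sample page</title>
</head>
<body>
<p>Hello, world.</p>
</body>
</html>
```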
2.2.7 Tools for HTML validation
Total Validator is a free one-stop all-in-one validator comprising an HTML
validator, an accessibility validator, a spelling validator, a broken links
validator, and the ability to take screenshots with different browsers to see
what your web pages really look like. Currently Total Validator provides the
following main features:
A parser that validates the basic construction of your pages
True HTML validation against the W3C Markup Specifications or
ISO/IEC definition using the published DTDs (2.0, 3.2, 4.0, 4.01,
ISO/IEC, XHTML 1.0 and 1.1)
An accessibility validator that validates against the W3C WAI
Accessibility Guidelines and US Section 508 Standard
A broken links validator that checks each page for broken links
A spelling validator that spell checks the content of your pages (English,
French, Italian, Spanish, German)
Snapshots (screenshots) of your pages in different browsers, on
different platforms, at different resolutions
A desktop tool so you can validate pages before you publish, and pages
behind firewalls
A Firefox extension for fast, one click validation
Self-assessment questions
1. URI stands for _______________.
2. <blink> </blink> is an example of a ____________ tag.
3. <h1> through <h6> belong to the __________ tags.
2.3 Using Graphics
This section deals with web graphics, image tags, source of web graphics,
and client side Image maps.
2.3.1 Tools for creating and manipulating Web Graphics
Graphics convey complex ideas, lend emotional components, and add style
to a Web page. Following are the various tools for creating and manipulating
web graphics.
Buttonmania: Buttonmania is a freeware utility that allows you to create
impressive Web page buttons.
DeKnop: This freeware utility allows you to create customized Web
page buttons quickly and easily.
GIF Optimizer: GIF Optimizer is a freeware utility that compresses GIF
images allowing your Web pages to load faster.
JPEG Cleaner: This free tool allows you to compress .jpeg or .jpg images
so that your Web page images will load more quickly.
Gimp: This program is a powerful and free graphics editor available for
UNIX, Mac OS X, and Windows.
PaintShop Pro: PaintShop Pro offers the easiest, most affordable way
to achieve professional results.
PhotoPlus: PhotoPlus has the features you'll need for importing,
creating pictures and animations, and manipulating colors and effects.
Ulead WebRazor: This is a set of indispensable graphic utilities
including GIF Animator, Web Plugins for Photoshop, SmartSaver, Photo
Explorer, Photo Viewer, and Screen capture.
Adobe Photoshop: Photoshop is the ultimate graphics program used
by nearly all graphic designers and a recommended tool for anyone
serious about Web design.
Corel Draw: Corel Draw is a complete suite of powerful graphics
applications and supporting utilities.
Macromedia Fireworks: Fireworks lets users import files from all major
graphics formats and manipulate both vector and bitmap images to
quickly create graphics and interactivity.
2.3.2 Image Tags and attributes
The image is stored in a file, which is specified in the HTML document. The
image in the file is inserted into the display of the document by the browser.
The image tag, <img>, which is an inline tag, specifies an image that is to
appear in a document. In its simplest form, the image tag includes two
attributes: src, which specifies the file containing the image; and alt, which
specifies text to be displayed when it is not possible to display the image. If
the file is in the same directory as the HTML file of the document, the value
of src is just the image’s filename. In many cases, image files are stored in
a subdirectory of the directory where the HTML files are stored.
For example, the image files might be stored in a subdirectory named
images. If the image file’s name is stars.jpg and it is stored in the images
subdirectory, the value of src would be as follows:
"images/stars.jpg"
Example:
<img src="c210.jpg" alt="Picture of a Cessna 210" />
Two optional attributes of img tag, width and height, can be included to
specify (in pixels) the size of the rectangle for the image.
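A small sketch combining the points above: the file stars.jpg sits in the images subdirectory, and width and height reserve a rectangle for it. The pixel sizes and the alt text are assumed values, not taken from a real image.

```html
<!-- File location follows the text; the sizes and alt text are placeholders. -->
<img src="images/stars.jpg" alt="Night sky full of stars"
     width="320" height="240" />
```

If the given width and height differ from the real size of the image, the browser scales the image to fit the stated rectangle.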
2.3.3 Sources for web site graphics
Most graphic designers and art directors will know about Getty Images. And
there's no doubt that the resources of the largest image library in the World
are vast. But sometimes you want something just a little bit special.
Something that the big fish just won't have. And that's when it is necessary
to look for a more specialized image resource. That's where the
Photographic Libraries directory comes in. A vast listing of image and
photographic resources ranging from photography, to film, to fine art,
fashion, maps and many others. Photographic Libraries also have an
extensive search facility.
MyFonts.com provides the largest collection of fonts ever assembled for on-
line delivery. Well that's what the website says anyway. That's obviously
debatable, but it's true to say that they do have a lot of fonts to view - over
30,000 - with links to dozens of font foundries. A wholly owned subsidiary of
Bitstream Inc., the site certainly has an impressive selection of typefaces.
One interesting feature is the WhatTheFont tool. This lets you upload
scanned images of fonts to the Web site and then will try and display the
closest match to your font sample. Hours of fun uploading pictures of
inanimate objects to see what they come up with. Well it kept us amused,
but then we probably need to get out more.
The following links and graphic design resources are in no particular order.
They are on our list to be reviewed and listed (or discarded if it proves to be
out of date, or inappropriate). As ever, we neither endorse nor recommend
any of these, other than to promote them as sites that were either useful to
us, or to others, for graphic design research purposes.
PBS Digital Television News Archive. Digital TV information.
http://www.pbs.org/digitaltv/dtvtech
PBS Digital Television News Archive. History of TV.
http://www.pbs.org/opb/crashcourse/tv_grows_up/mechanicaltv.html
Cloninger, Curt. Usability Experts are from Mars, Graphic Designers are
from Venus. http://www.alistapart.com/stories/marsvenus/
Gentner, Don and Nielsen, Jakob. The Anti-Mac Interface. www.useit.com
Humanoid Animation Group. http://sunee.uwaterloo.ca/~h-anim
2.3.4 Introduction to Client-side Image Maps
Image maps are easier than they seem, at least if you use a client-side
image map written in HTML rather than a CGI program. First, put the image
on the page. To do this, you use the image tag, but with a new
attribute: usemap.
<img src="eximap1.gif" width="200" height="40" border="0" alt="image map"
usemap="#mymap" />
The usemap="#mymap" attribute tells the browser to use the map on the
page that is named "mymap". Notice how it uses the "#" symbol in front of
the map name. Also notice that we defined the width and height of the
image. This needs to be done so we can use coordinates later on when we
define the map. Speaking of that, let’s see how to define the map. For this
map, we would place the following code somewhere on the page.
<map name="mymap" id="mymap">
<area shape="rect" coords="0, 0, 99, 40" href="table1.htm" alt="Tables" />
<area shape="rect" coords="100, 0, 200, 40" href="frame1.htm"
alt="Frames" />
<area shape="default" href="http://www.pageresource.com" alt="Home" />
</map>
Now you can see where the usemap="#mymap" from the <img> tag comes
from. The name of the map is "mymap". Now, let's look at what all of this
means:
<map name="mymap" id="mymap">
This defines your image map section, and gives the map a name. This map
is named "mymap". In XHTML, the id attribute is required rather than name.
If you are using XHTML transitional, both the name and id can be used.
<area shape="rect" coords="0,0,99,40" href="table1.htm" alt="Tables" />
The area tag defines an area of the image that will be used as a link. The
shape attribute tells the browser what shape the area will be. To keep it
simple, I only used "rect", which stands for rectangle. The coords attribute is
where we define the corners of each area. Since it is a rectangle, we will use
two sets of coordinates. The first set defines where to start the rectangle,
where the top-left corner of the rectangle will be. Since this rectangle starts at
the top-left corner of the image, the coordinates are (0 pixels, 0 pixels). The
second two numbers define where to end the rectangle. This will be the
lower-right corner of the rectangle. Remember that the total image size was
200x40. We want the lower-right corner of this rectangle to be halfway across
the image and at the bottom of the image. Going across, half of 200 is 100,
but we use 99 here so that the two rectangles do not overlap; the second
rectangle starts at 100. Of course, 40 pixels take us to the bottom of the
image. So the lower-right corner of this rectangle will be 99 pixels across the
image, and 40 pixels (all the way) down the image. And now the easy part:
The href attribute is used to tell the browser where to go when someone
clicks someplace on that rectangle. Put the URL of the page you want to go
to in there, and the first rectangle is set up! The alt attribute allows you to
define alternate text for that area.
<area shape="rect" coords="100, 0, 200, 40" href="frame1.htm"
alt="Frames" />
Basically the same as the previous area tag, but it is for our second
rectangle. We start where the other one left off, but back at the top of the
image. Since the right edge of the last rectangle was at 99 pixels across,
we start this one at 100 pixels across. And since this will be the upper-left
corner of the second rectangle, we start it at 0 pixels down the image (the
top!). We end this rectangle where the image ends, so the lower-right
coordinate here is simply (200, 40), the size of the image!
<area shape="default" href="http://www.pageresource.com" alt="Home" />
The default is not really a new shape; it just covers anything that may have
been left out. We didn't leave out anything in this map, but if we had, this
would be the URL someone would go to if they clicked on any area we did
not define earlier.
</map>
This ends the map section!
Now, you can use other shapes besides rectangles, but those are a lot
tougher to code by hand.
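For comparison, here is a hedged sketch of the two other shape values, circle and poly. The image shapes.gif, its 200x100 size, the map name and the target pages are all hypothetical. A circle takes the centre point and a radius; a poly takes one x,y pair per corner.

```html
<img src="shapes.gif" width="200" height="100" alt="Shapes menu"
     usemap="#shapemap" />
<map name="shapemap" id="shapemap">
  <!-- circle: centre x, centre y, radius -->
  <area shape="circle" coords="50,50,40" href="circle.htm" alt="Circle" />
  <!-- poly: x,y pairs for each corner; here a triangle -->
  <area shape="poly" coords="150,10,190,90,110,90" href="triangle.htm"
        alt="Triangle" />
</map>
```

Working out the corner points of a polygon by hand is the tedious part, which is why the editors described in the next section are commonly used.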
2.3.5 Tools for creating image maps
Following are some of the tools used to create Image Maps
Client-Side Image Map Editor (CSIME): is a standalone Java application
for maintaining the HTML tags that form a client-side image map. The
CSIME allows the creation of RECT, CIRCLE, and POLY regions overlaying
a GIF or JPEG image. Cool features include 1) use your Netscape
bookmarks file, 2) import a client-side image map straight from a web
server, 3) export to server-side image map format, 4) fading an image to
allow easier editing, and more. Written in Java, the CSIME is platform
independent (runs on UNIX, Mac, Win95/NT, OS/2, etc). The CSIME is
freeware.
Glorglox: is a replacement for NCSA's image map. It allows you to make
image maps with irregular and/or discontiguous areas, and is much more
flexible for some applications.
Imaptool: is for creating client-side image maps. It's for the X Window
System and tested with Linux 1.2.13.
Mapedit: is a WYSIWYG editor for image maps, available for Microsoft
Windows and the X Window System. Use Mapedit to generate, or convert
to, NCSA, CERN, or client-side map files.
Map This: freeware 32 bit application for creating, editing, and converting
map files. Supports NCSA, CERN, and Client Side Image Maps. Handles
both GIF and JPG images. Runs under Win 3.1/3.11 with Win32s installed;
Win95 and WinNT.
Web Hotspots 2.0: is an image map editor for Windows supporting both
server and client-side image maps, multiple image file formats including GIF
and JPEG and more microscopic (zoomed-in) editing, advanced shape
manipulation, subtractive regions (cutouts), starter host page generation,
insertion of host (i.e., IMG) entries into existing pages, and live testing for
Windows Sockets 1.1 compliant configurations.
2.3.6 GIF, JPEG and PNG Formats
GIF (Graphics Interchange Format):
This uses the file extension .gif. This format was invented by Bob Berry and
team at CompuServe. It was created in 1987 and updated in 1989. It
supports up to 256 colors, one of which may optionally be 100% transparent.
It uses the lossless LZW (Abraham Lempel, Jacob Ziv, and Terry Welch)
compression technique. It uses a palette, and instead of putting 24-bit color
values in its map for the image, it puts 8-bit palette index values. So it starts
off with 3:1 compression (24 bits reduced to 8 bits per pixel). The LZW
compression on top of that can raise it to 5:1 or even 10:1.
GIF is good for Line Drawings, Clip Art, CAD drawings, Text, Animations
and Images with transparent areas. GIF is bad for Photographs and images
with more than 256 colors.
JPEG (Joint Photographic Experts Group):
This uses the file extensions .jpe, .jpg and .jpeg. The format was developed
by the Joint Photographic Experts Group, with contributions credited to Eric
Hamilton and to Tom Lane of the Independent JPEG Group. It was created in
1990 and is defined by the ISO/IEC 10918 standard. It supports 16,777,216
colors (24 bits per pixel) and has no transparency. It uses lossy JPEG
compression (a lossy discrete cosine transform followed by Huffman coding).
JPEG is good for photographs, images with more than 256 colors, and making
smaller files. JPEG is bad for text, images with sharp edges (especially
vertical edges), line drawings, CAD drawings, transparency and most scans
from books or newspapers.
PNG (Portable Network Graphics):
This uses the file extension .png. This format was invented by Tom Boutell,
Tom Lane, Greg Roelofs and others. The version 1.0 format was created in
1996 and version 1.1 in 1998. It is defined by a World Wide Web Consortium
recommendation (1996) and by RFC 2083 (1997). It supports 2 to 256 colors
(palette mode) or 16,777,216 colors. A single color can be 100% transparent
(like GIF), or the image can have variable transparency (256 levels of
transparency per pixel). It uses the lossless "deflate" compression
technique. For each image line, a filter
method is chosen which predicts the colour of each pixel based on the
colours of previous pixels and subtracts the predicted colour of the pixel
from the actual color. An image line filtered in this way is often more
compressible than the raw image line would be. On most images, PNG can
achieve greater compression than GIF, but some implementations make
poor choices of filter methods and therefore produce unnecessarily large
PNG files.
PNG is good wherever you would use GIF and for images with variable
transparency. PNG is bad for full color images, which will probably be bigger
than equivalent JPEGs, and a ring may appear around the image.
2.3.7 Transparent Graphics
Transparency is possible in a number of graphics file formats. The term
transparency is used in various ways by different people, but at its simplest
there is "full transparency" i.e. something that is completely invisible. Of
course, only part of a graphic should be fully transparent, or there would be
nothing to see. More complex is "partial transparency" or "translucency"
where the effect is achieved that a graphic is partially transparent in the
same way as colored glass. Since ultimately a printed page or computer or
television screen can only be one color at a point, partial transparency is
always simulated at some level by mixing colors. There are many different
ways to mix colors, so in some cases transparency is ambiguous.
In addition, transparency is often an "extra" for a graphics format, and some
graphics programs will ignore the transparency.
Transparent Pixels: One color entry in a single GIF or PNG image's palette
can be defined as "transparent" rather than an actual color. This means that
when the decoder encounters a pixel with this value, it is rendered in the
background color of the part of the screen where the image is placed, even if
this varies pixel-by-pixel as in the case of a background image.
Applications include:
An image that is not rectangular can be filled to the required rectangle
using transparent surroundings; the image can even have holes (e.g. be
ring-shaped)
In a run of text, a special symbol for which an image is used because it
is not available in the character set, can be given a transparent
background, resulting in a matching background.
The transparent color should be chosen carefully, to avoid items that just
happen to be the same color vanishing.
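The second application above can be sketched as follows. The file symbol.gif is a hypothetical 12x12 GIF whose palette marks its surround as transparent, and the background color is an arbitrary choice.

```html
<!-- symbol.gif is a hypothetical GIF with one palette entry marked transparent. -->
<body bgcolor="#ffffcc">
<p>Press the <img src="symbol.gif" alt="return" width="12" height="12" />
key to continue.</p>
</body>
```

Because the surround of the symbol is transparent, the pale yellow page background shows through, and the same image file would also match any other background color.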
Even this limited form of transparency has patchy implementation, though
most popular web browsers are capable of displaying transparent GIF
images. This support often does not extend to printing, especially to printing
devices which do not include support for transparency in the device or
driver. Outside the world of web browsers, support is fairly hit-or-miss for
transparent GIF files.
2.3.8 Transparency and Interlacing of Graphics
Interlacing is a method of encoding a bitmap image such that a person who
has partially received it sees a degraded copy of the entire image. When
communicating over a slow communications link, this is often preferable to
seeing a perfectly clear copy of one part of the image, as it helps the viewer
decide more quickly whether to abort or continue the transmission.
Interlacing is supported by the following formats:
GIF stores the lines in the order 0, 8, 16, ..., 4, 12, ..., 2, 6, 10, 14, ..., 1,
3, 5, 7, 9, ...
PNG uses the Adam7 algorithm
JPEG and JPEG 2000
PGF
Interlacing is also known as "progressive" encoding, because the image
becomes progressively clearer as it is received.
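The GIF line order above comes from four passes over the image: every 8th line starting at line 0, every 8th line starting at line 4, every 4th line starting at line 2, and every 2nd line starting at line 1. The page below is a small sketch that prints this order; the function name gifRowOrder and the 16-line image height are assumptions made for the illustration.

```html
<html>
<head><title>GIF interlace order</title></head>
<body>
<pre id="order"></pre>
<script type="text/javascript">
// Returns the line numbers of an interlaced GIF in the order they are stored.
// Each pass is given as [first line, step between lines].
function gifRowOrder(height) {
  var passes = [[0, 8], [4, 8], [2, 4], [1, 2]];
  var order = [];
  for (var p = 0; p < passes.length; p++) {
    for (var row = passes[p][0]; row < height; row += passes[p][1]) {
      order.push(row);
    }
  }
  return order;
}
// Show the order for a hypothetical image that is 16 lines high.
document.getElementById("order").textContent = gifRowOrder(16).join(", ");
</script>
</body>
</html>
```

For 16 lines the page shows 0, 8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15, so after the first pass the viewer already has a coarse view of the whole image.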
2.3.9 Creating Animated Graphics
In this section you will study how an animated graphic is created in GIF
format. The world of animated gifs is a fascinating one indeed. Anyone can
create animated gifs, irrespective of the graphics skills one has. Usually the
initial animations will look ugly and weird, but with practice and after viewing
hundreds of animations on the web the animations will begin to look better,
until one day you will say... "ah I'm proud of this animation".... This pride will
be greater when your animations are used in different web pages.
Let us start off with a check list of all the things you need to create animated
gifs.
Imaging software such as Paint Shop Pro or ColorWorks
GIF assembling software such as Gif Animator, Animation Shop or Giffy
Creativity and
A lot of patience
A number of software programmes are available on the Internet as freeware
or shareware. We recommend Paint Shop Pro (JASC Inc.) as the imaging
software, and Gif Animator (Ulead) as the GIF assembler.
Creating animated gifs is really simple. Let us start off with the example
shown in figure 2.1.
Figure 2.1: Image of a Ball
The aim of this first exercise is to make the ball move from left to right and
then back.
In the imaging software, open a new work area 450 pixels wide with a height
equal to that of the original image (in this case 49 pixels). Copy the ball and
paste it into the working area at the far left.
Create a new working area of the same height and width. Paste the ball at a
position further to the right than in the previous frame. Repeat this procedure
until the ball is completely at the right. At this point you should have the
series of frames shown in figure 2.2.
Figure 2.2: Series of Ball
Now import the individual images in the gif assembler program in sequential
order, and then in the reverse order. Save the animation.
2.3.10 Interactive Graphics
Ivan Sutherland (MIT 1963) established the basic interactive paradigm that
characterizes interactive computer graphics:
User sees an object on the display
User points to (picks) the object with an input device (light pen, mouse,
trackball)
Object changes (moves, rotates, morphs)
Repeat
Input devices contain a trigger which can be used to send a signal to the
operating system, such as a button on a mouse or the pressing or releasing of
a key. When triggered, input devices return information (their measure) to the
system: a mouse returns position information and a keyboard returns an
ASCII code. Most systems have more than one input device, each of which
can be triggered at an arbitrary time by a user. Each trigger generates an
event whose measure is put in an event queue which can be examined by the
user program.
Figure 2.3: Process of interactive graphics
The programming interface for event-driven input defines a callback function
for each type of event the graphics system recognizes. This user-supplied
function is executed when the event occurs.
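In a Web browser the same idea can be sketched with JavaScript event listeners. This is only an illustration under assumed names: onClick and onKeyDown are user-supplied callbacks, and the browser plays the role of the graphics system that takes events from the queue and calls them.

```html
<html>
<head><title>Event callbacks</title></head>
<body>
<p id="log">Click the page or press a key.</p>
<script type="text/javascript">
var log = document.getElementById("log");
// Callback for mouse clicks: the measure is the pointer position.
function onClick(event) {
  log.textContent = "Mouse at " + event.clientX + ", " + event.clientY;
}
// Callback for key presses: the measure identifies the key.
function onKeyDown(event) {
  log.textContent = "Key pressed: " + event.key;
}
// Register one callback for each type of event.
document.addEventListener("click", onClick);
document.addEventListener("keydown", onKeyDown);
</script>
</body>
</html>
```

The program never polls the mouse or keyboard itself; it only states which function should run for which event.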
Self-assessment questions
4. JPEG uses ______ number of colours.
5. PNG stands for __________.
6. Interlacing is a method of encoding a bitmap image such that a person
who has partially received it sees a degraded copy of the entire
image. (True/False)
2.4 Constructing Forms
The most common way for a user to communicate information from a Web
browser to the server is through a form. HTML provides tags to generate the
commonly used objects on a screen form. These objects are called controls
or widgets. Together, the values of all of the controls in a form are called the
form data.
<FORM> tags and attributes
All of the components of a form appear in the content of a <FORM> tag. The
action attribute specifies the URL of the application on the Web server that
is to be called when the user clicks the Submit button. The method attribute
of <FORM> specifies one of the two techniques, get or post, used to pass
the form data to the server. Get is the default, so if no method attribute is
given in the <FORM> tag, get will be used. The alternative technique is
post.
Example:
<FORM action="pgm1.php" method="post">
….
</FORM>
<INPUT> tags and Attributes
Many of the commonly used controls are specified with the inline tag
<input>, which is used for text, passwords, checkboxes, radio buttons and
the special buttons Submit and Reset. The one attribute of <input> that is
required for all of the controls is type, which specifies the particular kind of
control. The control’s kind is its type name, such as checkbox.
Text Types
A text control, referred to as Text Box, creates a horizontal box into which
the user can type a line of text. Default size of the text box is often 20
characters. The attributes used are Type, Name, size and maxlength.
For the Text Box, the type value is “text”. Name indicates the name given to
the control. Size indicates the size of the text box in terms of characters. If
the user types more characters than will fit in the text box, the box is
scrolled. If you do not want the box to be scrolled, you can include the
maxlength attribute to specify the maximum number of characters that the
browser will accept in the box. Any additional characters are ignored.
Example:
<form action="">
<input type="text" name="fname" size="25" maxlength="50" />
</form>
If the contents of a text box should not be displayed when it is entered by
the user, a password control can be used.
Example:
<form action="">
<input type="password" name="MyPasswd" size="10" maxlength="10" />
</form>
Regardless of what characters are typed into a password control, only
bullets or asterisks are displayed by the browser.
In some situations, a multiline text area is needed. The <textarea> tag is
used to create such controls. The text typed into the area created by
<textarea> is not limited in length, and there is implicit scrolling both
vertically and horizontally. The default size of the visible part of the text is
often quite small, so the rows and cols attributes should usually be included
and set to reasonable sizes.
Example:
<textarea name="address" rows="3" cols="40"></textarea>
Radio Buttons and Checkboxes
Checkbox and radio controls are used to collect multiple-choice input from
the user. A checkbox control is a single button that is either on or off
(checked or not). If a checkbox button is on, the value associated with the
name of the button is the string assigned to its value attribute. A checkbox
button doesn’t contribute to the form data if it is off. Every checkbox button
requires a name attribute and a value attribute in its <input> tag. The
attribute checked, which is assigned the value checked, specifies that the
checkbox button is initially on. The text that follows the <input> tag is
displayed next to the checkbox button, providing a label.
Example:
<form action="">
<input type="checkbox" name="groceries" value="milk" checked="checked" />Milk
<input type="checkbox" name="groceries" value="bread" />Bread
<input type="checkbox" name="groceries" value="eggs" />Eggs
</form>
Radio buttons are closely related to checkbox buttons. The difference
between a group of radio buttons and a group of checkboxes is that only
one radio button can be on or pressed at any time. Every time a radio button
is pressed, the button in the group that was previously on is turned off. The
type value for radio buttons is radio. All radio buttons in a group must have
the name attribute set in the <input> tag, and all radio buttons in a group
have the same name. The attribute checked, which is assigned the value
checked, specifies that the radio button is initially on. If no radio button in a
group is specified as being checked, the initial state is left to the browser, so
it is safest to mark one button in the group as checked.
Example:
<form action="">
<input type="radio" name="age" value="under20" checked="checked" />0-19
<input type="radio" name="age" value="20-35" />20-35
<input type="radio" name="age" value="36-50" />36-50
<input type="radio" name="age" value="over50" />over 50
</form>
Scrolling and Selection Lists
If the number of possible choices is large, a form built from checkboxes or
radio buttons becomes too long to display. In these cases, a menu should be
used. A menu is specified with a <select> tag. There are two kinds of menus:
those in which only one menu item can be selected at a time and those in
which multiple menu items can be selected at any given time. The default is
the single-selection kind, which behaves like a group of radio buttons. The
other kind can be specified by adding the multiple attribute. The size attribute
specifies the number of menu items that are to be displayed for the user. If
either multiple is specified or the size attribute is set to a number larger than
1, the menu is usually displayed as a scrolled list.
Each of the items in a menu is specified with an <option> tag, nested in the
select element. The content of an <option> tag is the value of the menu
item, which is just text. The <option> tag can include the selected attribute,
which specifies that the item is preselected. The value assigned to
selected is "selected".
Example:
<form action="">
With size = 1 (the default)
<select name="groceries">
<option> Milk </option>
<option> Bread </option>
<option> Eggs </option>
<option> Cheese </option>
</select>
</form>
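The other kind of menu, a scrolled list that accepts several choices, can be sketched like this. The size of 3 and the preselected item are arbitrary choices made for the illustration.

```html
<!-- Same grocery items as above; size and the preselected item are assumptions. -->
<form action="">
<p>Choose one or more items:</p>
<select name="groceries" size="3" multiple="multiple">
  <option>Milk</option>
  <option selected="selected">Bread</option>
  <option>Eggs</option>
  <option>Cheese</option>
</select>
</form>
```

With four items and a size of 3, the browser shows three items at a time and lets the user scroll to reach the fourth.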
Submit and Reset Buttons
The Reset button clears all of the controls in the form to their initial states.
The Submit button has two actions: First, the form data is encoded and sent
to the server. Second, the server is requested to execute the server-resident
program specified in the action attribute of the <form> tag. Every form
requires a Submit button. The Submit and Reset buttons are created with
the <input> tag, as illustrated in the following example:
<form action="pgm1.php" method="post">
<input type="submit" value="Submit Form" />
<input type="reset" value="Reset Form" />
</form>
Scripts for Form Processing
Form data can be checked at the client side before it is submitted to the
server. For example, consider a form which has inputs like the user name,
age, information regarding his marks etc. If any one of the fields is missing
and there is no client-side check, the data has to go to the server, and the
server has to process it and find that some data is missing. The server then
sends an error message to the client indicating that some fields are missing.
The drawback of this is that the server is given the whole responsibility of
validating the form content. Instead, the form validation can be done at the
client side itself. This reduces the burden on the server and reduces delay.
Scripting languages can be used to process the form in the browser.
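A minimal sketch of such a client-side check, written in JavaScript, is given below. The field names username and age and the functions isFilled and checkForm are hypothetical; pgm1.php is the server program used in the earlier examples. The form is sent only when both fields contain something.

```html
<html>
<head>
<title>Form check</title>
<script type="text/javascript">
// Returns true when a value contains something other than spaces.
function isFilled(value) {
  return value.replace(/\s/g, "") !== "";
}
// Called when the user clicks Submit; returning false cancels the submission.
function checkForm(form) {
  if (!isFilled(form.username.value) || !isFilled(form.age.value)) {
    alert("Please fill in both name and age.");
    return false;
  }
  return true;
}
</script>
</head>
<body>
<form action="pgm1.php" method="post" onsubmit="return checkForm(this);">
Name: <input type="text" name="username" size="25" />
Age: <input type="text" name="age" size="3" />
<input type="submit" value="Submit Form" />
</form>
</body>
</html>
```

The onsubmit attribute connects the form to the script: the browser calls checkForm before sending anything, so an incomplete form never reaches the server.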
Sources for Sample Scripts
The scripting languages to process the form can be JavaScript or VBScript.
These scripts can be embedded within the HTML content and processed by
the browser. Client-side JavaScript cannot replace all of server-side
computing. In particular, while server-side software supports file operations,
database access and networking, client side JavaScript supports none of
these. Many JavaScripts, however, are an integral part of the HTML
document, so no secondary downloading is necessary.
Self-assessment questions
7. A text control is referred to as a ____________.
8. ___________ are used to collect multiple-choice input from the user.
9. ___________ clears all of the controls in the form to their initial states.
2.5 Marketing Your Site
This section deals with search engine mechanisms, directories, and Meta tags
and their attributes.
2.5.1 Characteristics of Search Engines
Most search engines work by sending out a spider to fetch submitted
documents. Another program, called an indexer, then reads these
documents and creates an index based on the words contained in each
document. Each search engine uses a proprietary algorithm to create its
indices such that, ideally, only meaningful results are returned for each
query.
META Tags. These are special HTML tags that provide information
about a Web page. Unlike normal HTML tags, Meta tags do not affect
how the page is displayed. Instead, they provide information such as
what the page is about, which keywords represent the page's content
and who created the page. Many search engines use Meta tags when
they build and update their indices.
Spider Support. As mentioned above spiders are programs used by
some search engines to fetch submitted documents.
Popularity. Some search engines count the number of links to a page
to measure its popularity. Pages with the largest number of inbound links
receive a higher rating.
Lag Time. The time required to index a page and have it appear in
subsequent search results. All search engine submission times are
approximations.
2.5.2 Registering with Search Engines and Directories
A search engine is a piece of software that enables users to search through
an index or database of websites that has been created either by people or
automatically by software that crawls through the World Wide Web looking
for new websites and indexing them. A search engine is actually the tool
that a website such as Yahoo or Google employs to enable people to search
its index for websites, images, words or phrases.
Registering your website with search engines such as Yahoo is relatively
easy. It is often free and is the first thing you should do once a new website
has been launched or an existing one has been re-developed. Registering
with search engines is one of the most effective ways of making it easy for
people to find your website.
What to do?
Option 1: You can register your website yourself with search engines.
Here is what to do.
1. Compose a descriptive sentence (usually up to 25 words) that
summarizes your site's content. This sentence should be simple, in plain
English, and state the main contents of the website. For example, if you
owned a cardboard box factory that sold standard sized boxes and also
made them to clients' specifications, you might compose a sentence like
this: "XYZ Box Company makes quality cardboard boxes of every
standard size and we can produce boxes to your specifications and your
budget."
2. Identify the most popular search engines that allow you to register your
site with them.
3. Log on to their sites, locate the online registration form or area and
complete the instructions - and you will probably be asked to use the
sentence you composed in step 1 above.
Many search engine directories, like Yahoo, are organized into
categories, and allow you to register your site in multiple categories. It takes
time to register your website with the most popular search engines and may
be a day's work, but usually it is free.
The search engine owners will check your application and choice of
categories and index the site. This usually takes 2 to 6 weeks.
Option 2: You can pay an organization to register your website with search
engines. Many companies offer this service. Locate them using a search
engine and select one that offers the best value for money and will register
your site with search engines that are popular with your target audience.
2.5.3 The <Meta> tag and its attributes, keywords, description and
robots
Metadata is information about data. The <Meta> tag provides metadata
about the HTML document. Metadata will not be displayed on the page, but
will be machine parsable. Meta elements are typically used to specify page
description, keywords, author of the document, last modified and other
metadata. The <Meta> tag always goes inside the head element. The
metadata can be used by browsers (how to display content or reload page),
search engines (keywords), or other web services.
Required Attribute:
Attribute    Value    Description
content      text     Specifies the content of the meta information

Table 2.1: Optional Attributes
Attribute    Value                 Description
http-equiv   content-type          Provides an HTTP header for the
             content-style-type    information in the content attribute
             expires
             refresh
             set-cookie
name         author                Provides a name for the information
             description           in the content attribute
             keywords
             generator
             revised
             others
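As an illustration, the attributes listed above might be combined in the
head of a page as sketched below. The company name, the keyword list and the
30-second refresh interval are assumptions made up for this example; the
robots value shown asks spiders to index the page and follow its links.

```html
<html>
<head>
<title>XYZ Box Company - Cardboard Boxes</title>
<!-- name/content pairs: read by search engines, never displayed -->
<meta name="description" content="XYZ Box Company makes quality cardboard boxes of every standard size.">
<meta name="keywords" content="cardboard boxes, custom boxes, packaging">
<meta name="author" content="XYZ Box Company">
<!-- robots: tells spiders whether to index the page and follow its links -->
<meta name="robots" content="index, follow">
<!-- http-equiv: acts like an HTTP header; here the page reloads every 30 seconds -->
<meta http-equiv="refresh" content="30">
</head>
<body>
<p>Page content goes here.</p>
</body>
</html>
```

None of the meta elements changes what the visitor sees in the body; only
the refresh value affects browser behaviour.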
2.5.4 Creating Effective <title> tags
The title tag holds the words that describe your web page. It is the most
important factor in luring visitors, because your visitors get their first
information about your website from the title tag. You can create effective
title tags by judging the needs of the visitor. The following are a few
important tips to increase the value of your title.
a. Utilize Keywords
Keywords are the most important expressions of your website. Try to
accommodate a few keywords in your title tag, so that the title is
informative. This can also increase your page ranking, as more and more
visitors will reach you through your keywords.
b. Preference in Keywords
Organize keywords in your title according to their importance. Place the
most important keyword first and then follow it with other keywords. For
example, if your keywords are Ethnic Women Wear, Online Women Wear, Women
Shoes, Women Clothing India, then you can write your title tag as
<TITLE>Online Women Wear, Women Clothing India</TITLE>
c. Target Traffic
It is widely said that for websites 'content is king'. If you want to get
better search engine results, you need to put more thought into your
website content. Your content can carry all the pertinent keywords, and it
is what keeps your visitors reading. As your visitors read your content,
search engines too go through your website to find the right keywords for
listing your website.
d. Limit Characters
Meta tags are created to give search engines the important information
about your website. Meta tags help a lot in optimization because they help
search engines determine what a page is about. Meta tags should include
important keywords, but many people spam the search engines with Meta
tags that are gorged with keywords, so keep them short.
e. Be different
Always use a different tagline for different pages. Don't use a single
tagline on every page. Try to be different and put a unique tagline on every
page. Your tagline should reflect the content of your page, so craft every
tagline with care to make it effective.
2.5.5 Designing your site for Effective Search Engine Optimization
(SEO)
Search engine optimization is crucial for anyone who wants people to visit
his or her Web site. You can place as many ads as you like, but most
people are still going to find your site because of its listings in search
engines or directories. Most people who use search engines only look at
the first one or two pages of search listings. The goal of effective
search engine optimization is to get your pages listed on those critical first
pages for particular key terms. The following rules apply when designing
your site for effective Search Engine Optimization (SEO).
a) Phrasing matters. Many more people search for the term "effective
search engine optimization" than for "effectively optimizing for search
engines". To find out which key words or phrases are more popular than
others, you can use a tool such as Overture’s Search Term
Suggestion Tool; enter your chosen phrases and you'll see how many
people searched for that term recently.
b) Give each page an appropriate title that includes the key word or phrase
at least once. I so often see sites that use the name of their business as
the title of all their pages. Is every page of their site about their
business? Probably. But chances are really low that people will be
searching for their business' name!
c) Put the key words or phrase that you've chosen in the page's title tag,
Meta keywords, and Meta description. Make sure that the Meta
description is as appealing as possible, because some search engines
actually use this description in the search engine results pages that
people will be reading.
d) Be sure your chosen key words or phrase is repeated judiciously
throughout the content of the page. You don't want to overdo it, or your
page may be rejected as spam, but you need to repeat it enough times
that the search engine's software will consider the phrase relevant.
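Rules b) to d) can be sketched on a single page as follows. The key phrase
"custom cardboard boxes", the company name and the body text are made up for
illustration; the point is only where the phrase appears.

```html
<html>
<head>
<!-- b) the title contains the key phrase, not just the business name -->
<title>Custom Cardboard Boxes - XYZ Box Company</title>
<!-- c) the same phrase in the Meta keywords and Meta description -->
<meta name="keywords" content="custom cardboard boxes, cardboard boxes">
<meta name="description" content="Custom cardboard boxes made to your specifications and your budget.">
</head>
<body>
<h1>Custom Cardboard Boxes</h1>
<!-- d) the phrase is repeated in the content, but only where it reads naturally -->
<p>We make custom cardboard boxes in any size. Tell us the dimensions you
need and we will quote a price for your order.</p>
</body>
</html>
```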
Self Assessment questions
10. _______ are special HTML tags that provide information about a Web
page.
11. ________ is a piece of software that enables users to search through an
index or database of websites that has been created either by people or
automatically by software.
2.6 Summary
The basic web architecture is two-tiered and characterized by a web
client that displays information content and a web server that transfers
information to the client.
An SGML Document Type Definition (DTD) specifies valid tag names
and element attributes.
The text within the <blink></blink> tag will turn on and off (blink).
The HTML tag identifies a document as an HTML document.
Physical tags can be nested i.e. one tag can be placed (including its
closing tag) inside another.
Graphics convey complex ideas, lend emotional components, and add
style to a Web page.
The image tag, <img>, which is an inline tag, specifies an image that is
to appear in a document.
Input devices contain a trigger which can be used to send a signal to the
operating system.
These scripts can be embedded within the HTML content and processed
by the browser.
A search engine is a piece of software that enables users to search
through an index or database of websites that has been created either
by people or automatically by software that crawls through the World
Wide Web looking for new websites and indexing them.
2.7 Terminal Questions
1. Explain the architecture of the web page contents.
2. Briefly explain the various tools used for validating HTML document.
3. Explain the use of client-side image maps.
4. Briefly explain the transparent graphics.
5. Explain the characteristics of Search Engines.
2.8 Answers
Self Assessment Questions:
1. Universal Resource Identifier
2. Browser specific tag
3. Logical tags
4. 16,777,215
5. Portable Network Graphics
6. True
7. Text Box
8. Checkbox and radio controls
9. Reset button
10. META Tags
11. Search engine
Terminal Questions
1. The basic web architecture is two-tiered and characterized by a web
client that displays information content and a web server that transfers
information to the client. (Refer Section 2.2)
2. Total Validator is a free one-stop all-in-one validator comprising a HTML
validator, an accessibility validator, a spelling validator, a broken links
validator, and the ability to take screenshots with different browsers to
see what your web pages really look like. (Refer Section 2.2.7)
3. A client-side image map is defined using HTML rather than a CGI
program. (Refer Section 2.3.4)
4. Transparency is possible in a number of graphics file formats. (Refer
Section 2.3.7)
5. Most search engines work by sending out a spider to fetch submitted
documents. (Refer Section 2.5.1)
Website Design Unit 3
Sikkim Manipal University Page No. 70
Unit 3 Website development with HTML – II
Structure:
3.1 Introduction
Objectives
3.2 Frames
The <frame> Tags and Attributes
The <frameset> Tags and Attributes
Frame Construction
Frame Navigation
3.3 Creating and Managing Styles
Cascading Style Sheets (CSS)
<style> Tags and Attributes
Defining Styles
Creating CSS Rules
Using Style Sheets To Support Multiple Browsers
Creating Custom Styles (classes)
Using <div> and <span> Tags
3.4 Tables
Purpose of Tables
Table Tags
Table Attributes
Using Tables for Page Layout and Structure
Creating Nested Tables
3.5 Website Layout and Design
Layout and Design Heuristics
Content Organization
Page Size and Load Time Optimization
Navigation Styles
Providing Navigational Feedback
Tables vs. CSS
Use of Color and Graphics
3.6 Managing Source Files
Recommended Folder Structure
Testing and Production Folders
Development Steps
File Naming
Version Control
3.7 Foundations of Dynamic HTML
DHTML Capabilities
Netscape vs. Microsoft Support for DHTML
<link> Tags and External Styles
Creating Custom Styles (classes)
<layer> Tags
Positioning Layers
HTML Vs DHTML
3.8 Summary
3.9 Terminal Questions
3.10 Multiple Choice Questions
3.11 Answers
3.1 Introduction
In this unit you are going to study frames in HTML and the use of
Cascading Style Sheets. The design of tables in HTML and general website
layout and design are also explained, along with the foundations of
DHTML.
Objectives:
After studying this unit, you should be able to:
design frames in HTML
create and manage style sheets
design tables in HTML
discuss website layout and design
give overview of DHTML
3.2 Frames
The browser display window can be used to display more than one
document at a time. The window can be divided into rectangular areas, each
of which is a Frame. Each frame is capable of displaying its own document.
The <frame> tag and attributes
The content of a frame is specified with the <frame> tag, which can appear
only in the content of a frameset element. The content of a frame is
specified as the value of the src attribute in the <frame> tag.
Example:
<frame src = "apples.html">
If the <frame> tag has no src attribute, the browser displays an empty
frame. If the content of a frame doesn’t fit into the given frame, scroll bars
are implicitly included. If you want a frame to have scroll bars, regardless of
the size of its content, the <frame> attribute scrolling can be set to yes. If a
<frame> tag includes a name attribute, the content of its associated frame
can be changed by the selection of a link in some other frame that specifies
that name.
The <frameset> tag and attributes
The number of frames and their layout in the browser window are specified
with the <frameset> tag. A frameset element takes the place of the body
element in a document. A document has either a body or a frameset but
cannot have both.
The <frameset> tag must have either a rows or a cols attribute, and it
often has both. The rows attribute specifies the number of rows of frames
that will occupy the window. There are 3 kinds of values for rows: numbers,
percentages, and asterisks. Normally, two or more values, separated by
commas, are given in a quoted string. When a number is used as a value, it
specifies the height of one row in pixels. A percentage is given as a number
followed immediately by percent sign. When used, a percent value specifies
the percentage of the total browser window height that a row should occupy.
When an asterisk is used as the value of rows, it means the remainder of
the window height.
Examples:
<frameset rows = "200, 300, 400">
<frameset rows = "22%, 33%, 45%">
<frameset rows = "22%, 33%, *">
The cols attribute is very much like the rows attribute, except that it
specifies the number of columns of frames. For example, the following tag
specifies that the window is to have six frames in three equal-height rows
and two columns.
<frameset rows = "33%, 33%, 33%" cols = "25%, *">
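A complete frameset document built around such a tag could look like the
sketch below. The six file names are placeholders, not files from this unit.
Browsers fill the grid row by row, so the first two <frame> tags make up the
top row.

```html
<html>
<head>
<title>Six frames</title>
</head>
<!-- The frameset takes the place of the body element -->
<frameset rows="33%, 33%, 33%" cols="25%, *">
<!-- row 1 -->
<frame src="top_left.html">
<frame src="top_right.html">
<!-- row 2 -->
<frame src="middle_left.html">
<frame src="middle_right.html">
<!-- row 3 -->
<frame src="bottom_left.html">
<frame src="bottom_right.html">
</frameset>
</html>
```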
Frame Construction
Consider the following example.
<html>
<frameset cols = "50%,*">
<frameset rows = "50%, 50%">
<frame src = "EX1.HTML" />
<frame src = "EX2.HTML" />
</frameset>
<frameset cols = "50%, 50%">
<frame src = "EX1.HTML" />
<frame src = "EX2.HTML" />
</frameset>
</frameset>
</html>
This example creates a total of 4 frames. First, the window is divided into
2 vertical frames (columns). Within the first vertical frame, two horizontal
frames are created. Within the second vertical frame, two vertical frames
are created.
Frame Navigation
The navigation frame contains a list of links with the second frame as the
target. This example demonstrates how to make a navigation frame. The file
called "tryhtml_contents.htm" contains three links. The source code of the
links:
<a href ="frame_a.htm" target ="showframe">Frame a</a><br>
<a href ="frame_b.htm" target ="showframe">Frame b</a><br>
<a href ="frame_c.htm" target ="showframe">Frame c</a>
The second frame will show the linked document.
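For these links to work, the frameset document has to give the second frame
the name used in the target attributes. A minimal sketch follows, assuming a
120-pixel left column and frame_a.htm as the page shown first (both choices
are assumptions for illustration):

```html
<html>
<head>
<title>Navigation frame</title>
</head>
<frameset cols="120, *">
<!-- first frame: the list of links -->
<frame src="tryhtml_contents.htm">
<!-- second frame: its name matches target="showframe" in the links -->
<frame src="frame_a.htm" name="showframe">
</frameset>
</html>
```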
Self Assessment Questions
1. The content of a frame is specified as the value of the src attribute in
the_________.
3.3 Creating and managing styles
This section deals with CSS concepts, the <style> tag, CSS rules and
support for multiple browsers.
3.3.1 Cascading Style Sheets
Some of the tags of HTML, for example, <i> specify presentation details, or
style. However, these presentation specifications can be more precisely and
more consistently described with style sheets. Furthermore, many of the
tags and attributes used for describing presentation have been deprecated
in favor of style sheets.
Most HTML tags have associated properties, which store presentation
information for browsers. Browsers use default values for these properties if
the document doesn’t specify values. For example, the <h2> tag has the
font-size property, for which a browser could have the default value of 30
points. A style sheet could specify that the font-size property for <h2> be set
to 26 points, which would override the default value. The new value could
apply to one occurrence of an <h2> element or all such occurrences in the
document, depending on how the property value is set.
Perhaps the most important benefit of style sheets is their capability of
imposing consistency on the style of Web documents. For example, they
allow the author to specify that all occurrences of a particular tag use the
same presentation style. HTML style sheets are called Cascading Style
Sheets because they can be defined at three different levels to specify the
style of a document. Lower level style sheets can override higher level style
sheets, so the style of the content of a tag is determined through a cascade
of style-sheet applications.
The three levels of style sheets, in order from lowest level to highest level,
are inline, document level, and external. Inline style sheets apply to the
content of a single tag, document level style sheets apply to the whole body
of a document, and external style sheets can apply to the bodies of any
number of documents. Inline style sheets have precedence over document
style sheets, which have precedence over external style sheets.
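The three levels can be seen together in one small page. In this sketch the
file name site.css and the particular colors are assumptions; the comments
describe which level wins for each heading.

```html
<html>
<head>
<title>Three levels of style sheets</title>
<!-- External level: site.css (hypothetical file) might contain  h2 { color: green } -->
<link rel="stylesheet" type="text/css" href="site.css">
<!-- Document level: overrides the external sheet for every h2 in this page -->
<style type="text/css">
h2 { color: blue }
</style>
</head>
<body>
<!-- No inline style: the document-level rule applies, so this heading is blue -->
<h2>Blue heading</h2>
<!-- Inline level: overrides the document-level rule for this one tag only -->
<h2 style="color: red">Red heading</h2>
</body>
</html>
```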
3.3.2 <Style> tag and attributes
The format of a style specification depends on the level of style sheet. The
general form of the content of a style element is as follows:
<style type = "text/css">
Rule list
</style>
The type attribute of the <style> tag tells the browser the type of style
specification, which for CSS is always text/css. The type must be
specified because there are other kinds of style sheets. For example,
JavaScript also provides style sheets that can appear in style elements.
3.3.3 Defining Styles
Inline style specifications appear as values of the style attribute of a tag, the
general form of which is as follows:
style = "property_1: value_1; property_2: value_2; …; property_n: value_n;"
Although it is not required, it is recommended that the last property/value
pair be followed by a semicolon.
The scope of an inline style specification is restricted to the content of the
element in which it appears.
Whether you list your styles in the head of the HTML document or in a
separate style sheet, you define them similarly. Give each selector a style
definition using the following format:
H3 { font-family: Arial }
In this instance, H3 is the selector (in this case, an HTML element), and
font-family: Arial is its defined style. The definition is a combination of a
property and its value, separated by a colon. (Properties are the
characteristics an element can have – such as font type, font size, or color –
while values are specific traits those properties can have – such as Arial,
24-point, or red.)
To include multiple properties in the same definition, simply separate them
with semicolons:
H3 { font-family: Arial; font-style: italic; color: green }
To make a style easier to read, you can stack the properties:
H3 { font-family: Arial;
font-style: italic;
color: green }
And to define more than one value for a single property, just add them on,
separated by commas:
H3 { font-family: Arial, Helvetica, sans-serif;
font-style: italic;
color: green }
The font-family property in the code above offers the browser several values
to choose from; the browser will go down the line until it finds a typeface it
recognizes. The first item listed (Arial) is the preferred typeface, and the
second item (Helvetica) is an alternate typeface in case the user's system
doesn't have Arial. The third item (sans-serif) is a generic style of font rather
than a specific one--this is recommended as a last alternative because most
systems have at least one typeface in that generic family. If the browser
doesn't find any matches, it will use its default font.
3.3.4 Creating CSS Rules
A Cascading Style Sheets rule is made up of a selector and a declaration.
H2 {color: blue;}
selector {declaration;}
The declaration is the part of the rule inside the curly braces. It specifies
what a style effect will be. For example, "color:blue".
The selector specifies which element(s) will be affected by the declaration.
Think of the selector as a link of sorts between the HTML mark-up
document and the style of the Web page. A selector that refers to an HTML
element is called a type selector. (Other kinds of selectors will be discussed
later). Any HTML element name can be used as a type selector. Note that
HTML "tags" without content ("empty containers") such as <BR> or <HR> have
no text to style, so text properties set on them have no visible effect.
A declaration has two parts separated by a colon: property and value.
selector {property:value}
More than one declaration may be placed inside the curly braces and a
semi-colon must separate each declaration from the next. The ending
declaration does not require a semi-colon but I like to use it.
selector {property:value; property:value;}
H2 {color:blue; font-family:Arial, sans-serif;}
Instead of coding,
H1 {font-family:Arial, Helvetica, sans-serif;}
H2 {font-family:Arial, Helvetica, sans-serif;}
H3 {font-family:Arial, Helvetica, sans-serif;}
You may group selectors together. When grouping selectors you will need to
separate each selector with a comma. When grouped together, one rule
applies to several selectors.
H1, H2, H3 {font-family:Arial, Helvetica, sans-serif;}
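Placed in a document-level style sheet, rules of this kind could look like
the following sketch. The page text is made up for illustration.

```html
<html>
<head>
<title>CSS rules</title>
<style type="text/css">
/* grouped selectors: one rule applies to all three headings */
H1, H2, H3 {font-family:Arial, Helvetica, sans-serif;}
/* type selector with a single declaration; combines with the rule above */
H2 {color:blue;}
</style>
</head>
<body>
<h1>Main heading</h1>
<h2>Section heading, shown in blue</h2>
<h3>Sub-section heading</h3>
<p>Paragraph text is not matched by either rule.</p>
</body>
</html>
```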
3.3.5 Using Style Sheets to support multiple browsers
It is not always possible to design a single style sheet that works properly on all
the different browsers. One option is to use different style sheet documents
for the different browsers you want to support. In this way you can specify
CSS formatting customized to the strengths (and weaknesses) of each
different browser, without compromising for the average of them.
There are essentially two ways to do this. The first is to use content
negotiation to send the browser a browser-specific style sheet. With HTTP,
a request for any resource (including a style sheet) will look something like
(omitting several other pieces of information):
GET /path/stylesheet.css HTTP/1.0
....
User-Agent: Mozilla/4.61 [en] (Win98; I)
The User-Agent string identifies the browser (here Navigator 4.61).
Most Web servers can be configured to return different style sheet
documents depending on this value. Unfortunately, this breaks caching on
some proxy servers, so it doesn't always work. Also you, as an author, may
not have control over server configuration.
The second way is to use JavaScript to test, on the browser, for the browser
version and model number, and to then "write" link elements referencing
appropriate style sheets directly into the document. Both Navigator and
Internet Explorer will then process the script-generated link elements, and
will load the referenced style sheet. Of course, this will only work if
JavaScript is enabled, but in many cases this may be an entirely acceptable
requirement.
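A minimal sketch of the second approach is shown below. The three style
sheet file names are hypothetical, and the tests on navigator.appName are a
simplified check that tells apart only the two browsers discussed here; a
real page would need more careful detection.

```html
<html>
<head>
<title>Browser-specific style sheets</title>
<script type="text/javascript">
// Pick a style sheet by browser. File names are assumptions for this example.
var sheet = "default.css";
if (navigator.appName == "Microsoft Internet Explorer") {
  sheet = "explorer.css";
} else if (navigator.appName == "Netscape") {
  sheet = "navigator.css";
}
// "Write" the link element directly into the document head.
document.write('<link rel="stylesheet" type="text/css" href="' + sheet + '">');
</script>
</head>
<body>
<p>This page is styled by a browser-specific style sheet.</p>
</body>
</html>
```

If JavaScript is disabled, no link element is written and the page is shown
with the browser's default styles.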
3.3.6 Creating Custom Styles (Classes)
A simple selector can have different classes, thus allowing the same
element to have different styles. For example, an author may wish to display
code in a different color depending on its language:
code.html { color: #191970 }
code.css { color: #4b0082 }
The above example has created two classes, css and html for use with
HTML's CODE element. The class attribute is used in HTML to indicate the
class of an element, e.g.,
<P CLASS=warning>Only one class is allowed per selector.
For example, code.html.proprietary is invalid.</p>
Classes may also be declared without an associated element:
.note { font-size: small }
In this case, the note class may be used with any element.
A good practice is to name classes according to their function rather than
their appearance. The note class in the above example could have been
named small, but this name would become meaningless if the author
decided to change the style of the class so that it no longer had a small font
size.
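Put together, the classes above might be used as in this sketch. The sample
text is made up; the class names and colors are the ones defined in this
section.

```html
<html>
<head>
<title>Classes</title>
<style type="text/css">
code.html { color: #191970 }
code.css { color: #4b0082 }
.note { font-size: small }
</style>
</head>
<body>
<!-- the same CODE element, styled differently by class -->
<p>HTML: <code class="html">&lt;h1&gt;Title&lt;/h1&gt;</code></p>
<p>CSS: <code class="css">h1 { color: red }</code></p>
<!-- .note has no element name, so it works on any element -->
<p class="note">This paragraph uses the note class.</p>
<div class="note">So does this div.</div>
</body>
</html>
```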
3.3.7 Using <div> and <span> tags
The <span> and <div> tags are very useful when dealing with Cascading
Style Sheets. People tend to use the two tags in a similar fashion, but they
serve different purposes.
<div>:
The <div> tag defines logical divisions in your Web page. It acts a lot like a
paragraph tag, but it divides the page up into larger sections. <div> also
gives you the chance to define the style of whole sections of HTML. You
could define a section of your page as a call out and give that section a
different style from the surrounding text.
The <div> tag gives you the ability to name certain sections of your
documents so that you can affect them with style sheets or Dynamic HTML.
One thing to keep in mind when using the <div> tag is that it breaks
paragraphs. It acts as a paragraph end/beginning, and while you can have
paragraphs within a <div> you can't have a <div> inside a paragraph.
The primary attributes of the <div> tag are:
style
class
id
Even if you don't use style sheets or DHTML, you should get into the habit
of using the <div> tag. This will give you more flexibility when more XML
parsers become available. Also, you can use the id and name attributes to
name your sections so that your Web pages are well formed (always use
the name attribute with the id attribute and give them the same contents).
Because the <center> tag has been deprecated in HTML 4.0, it is a good
idea to start using
<div style="text-align: center ;"> to center the content inside your div.
<span>:
The <span> tag has very similar properties to the <div> tag, in that it
changes the style of the text it encloses. But without any style attributes, the
<span> tag won't change the enclosed items at all.
The primary difference between the <span> and <div> tags is that <span>
doesn't do any formatting of its own. The <div> tag includes a
paragraph break, because it is defining a logical division in the document.
The <span> tag simply tells the browser to apply the style rules to whatever
is within the <span>.
The <span> tag has no required attributes, but the three that are the most
useful are:
style
class
id
Use <span> when you want to change the style of elements without placing
them in a new block-level element in the document. For example, if you had
a Level 3 Heading (<h3>) in which you wanted the second word to be red, you
could surround that word with
<span style="color: #f00;">2ndWord</span> and it would still be a part of
the <h3> tag, just red.
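The difference between the two tags is easiest to see side by side. In the
sketch below the class name callout, the id value and the background color
are assumptions chosen for the example.

```html
<html>
<head>
<title>div and span</title>
<style type="text/css">
/* class name "callout" is an assumption for this example */
div.callout { background-color: #eeeeee; text-align: center }
</style>
</head>
<body>
<p>Ordinary paragraph text.</p>
<!-- div: a block-level section that can hold several paragraphs -->
<div class="callout" id="offer">
<p>First paragraph of the call out.</p>
<p>Second paragraph of the call out.</p>
</div>
<!-- span: changes one word without starting a new block -->
<h3>The <span style="color: #f00;">second</span> word is red</h3>
</body>
</html>
```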
Self Assessment Questions
2. The format of a style specification depends on the level of ________.
3. CSS stands for ___________.
3.4 Tables
This section deals with usage of tables, table tags, table attributes, table for
page layout and creating nested tables.
3.4.1 Purpose of Tables
The TABLE element defines a table for multi-dimensional data arranged in
rows and columns. TABLE is commonly used as a layout device, but
authors should avoid this practice as much as possible. Tables can cause
problems for users of narrow windows, large fonts, or non-visual browsers,
and these problems are often accentuated when tables are used solely for
layout purposes. As well, current visual browsers will not display anything
until the complete table has been downloaded, which can have very
noticeable effects when an entire document is laid out within a TABLE.
3.4.2 Table Tags
The <table> tag defines an HTML table. A simple HTML table consists of
the table element and one or more tr, th, and td elements. The tr element
defines a table row, the th element defines a table header, and the td
element defines a table cell.
Example:
<table border="1">
<tr>
<th>Month</th>
<th>Savings</th>
</tr>
<tr>
<td>January</td>
<td>$100</td>
</tr>
</table>
The <th> tag defines a header cell in an HTML table.
An HTML table has two kinds of cells:
Header cells – contains header information (created with the th element)
Standard cells – contains data (created with the td element)
The text in a th element is bold and centered.
The text in a td element is regular and left-aligned.
The <tr> tag defines a row in an HTML table.
A tr element contains one or more th or td elements.
The <td> tag defines a standard cell in an HTML table.
The <caption> tag defines a table caption.
The <caption> tag must be inserted immediately after the <table> tag. You
can specify only one caption per table. Usually the caption will be centered
above the table.
Example:
<table border="1">
<caption>Monthly savings</caption>
<tr>
<th>Month</th>
<th>Savings</th>
</tr>
<tr>
<td>January</td>
<td>$100</td>
</tr>
</table>
3.4.3 Table Attributes
Table 3.1: Table Attributes
Attribute   Value                   Description
align       left / center / right   Specifies the alignment of a table
                                    according to surrounding text
border      pixels                  Specifies the width of the borders
                                    around a table
bgcolor     rgb(x,x,x) / #xxxxxx    Specifies the background color for a
            / colorname             table
The nowrap attribute
Browsers treat each table cell as though it's a browser window unto itself,
flowing contents inside the cell as they would common body contents
(although subject to special table-cell alignment properties). Accordingly, the
browsers automatically wrap text lines to fill the allotted table cell space. The
nowrap attribute, when included in a table row, stops that normal word
wrapping in all cells in that row. With nowrap, the browser assembles the
contents of the cell onto a single line, unless you insert a <br> or <p> tag,
which then forces a break so that the contents continue on a new line inside
the table cell.
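The table attributes and the effect of nowrap can be tried with a small
table such as the one sketched below. The cell text, width and color are
made up. In this sketch nowrap is written on an individual cell, which is
an assumption about the form browsers support most widely.

```html
<table align="center" border="1" bgcolor="#ffffcc" width="200">
<tr>
<th>Normal cell</th>
<th>nowrap cell</th>
</tr>
<tr>
<!-- this text wraps to fit the narrow table -->
<td>A long line of text that the browser wraps inside the cell.</td>
<!-- nowrap keeps the text on one line; only the <br> forces a break -->
<td nowrap>A long line of text that stays on one line<br>until this break.</td>
</tr>
</table>
```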
3.4.4 Using Tables for Page Layout and Structure
Tables are the main method used to lay out and structure Web pages. Layout
using tables is considered by many purists to be table misuse, but it is far
simpler than using Cascading Style Sheets for element positioning. Although
we are going to use tables for the page layout, we are going to combine them
with CSS to create a very flexible and easily updated page layout.
3.4.5 Creating Nested Tables
This technique means that tables can be placed within tables. Tables could
then be placed within those tables, which would create a 3rd level of
nesting, but at this point we won't go into any more complexity than is
necessary.
Let’s say you have a page and you would like a navigational part on the left
with content on the right. You don’t want to use frames, and layers are too
fiddly. A good way to create this kind of effect is by using nested tables:
a table which contains 2 other tables. One of the 2 inner tables would be
quite narrow and on the left (for the navigation), and the other table would
be on the right with the majority of the page space available to it. The
example below shows this.
<table width="500" cellspacing="2" border="1">
<tr>
<td><div align="left"><b>The containing table</b></div>
<table width="120" cellspacing="2" cellpadding="2" align="left" border="1">
<tr>
<td>A nested table</td>
</tr>
</table>
<table width="380" cellspacing="2" cellpadding="2" align="right"
border="1">
<tr>
<td>Another nested table</td>
</tr>
</table>
</td>
</tr>
</table>
This will produce something as shown in figure 3.1.
Figure 3.1: Nested table
3.4.6 Self Assessment Questions (For Section 3.3)
4. The _______ tag defines an HTML table.
5. _________defines a table for multi-dimensional data arranged in rows
and columns.
3.5 Web Site Layout and Design
This section deals with layout design, design heuristics, content
organization, page size and load time.
3.5.1 Layout and Design Heuristics
A few basic design principles that every developer should have a fairly good
understanding of can be picked up or gleaned from a quick run through of
any print or Web design text.
Two choice heuristics found in today’s best on-screen presentations,
whether in Web, television or cinematic media, are the rule of thirds and the
divine proportion.
The rule of thirds is widely used in photography. It more or less states that
dividing an image into nine equal parts can help you aesthetically lead the
viewer’s eye to the most important sections of the piece.
Imagine overlaying a three-by-three grid on a photograph. The intersections
of these gridlines can help you align the main features of your image.
The divine proportion is a similar guideline that comes in extremely handy
for Web media. Also known as the golden ratio, the divine proportion is in
effect if the ratio between the sum of two line segments and the larger
segment is equivalent to the ratio between the larger segment and the
smaller segment. When expressed algebraically, the divine proportion is
equivalent to 1:1.61803… or 1:phi.
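As a short worked derivation, write a for the larger segment and b for the smaller one (these symbols are introduced here only for illustration). Setting the two ratios equal gives a quadratic whose positive root is phi:

```latex
\[
\frac{a+b}{a} = \frac{a}{b} = \varphi
\quad\Longrightarrow\quad
1 + \frac{1}{\varphi} = \varphi
\quad\Longrightarrow\quad
\varphi^{2} - \varphi - 1 = 0
\quad\Longrightarrow\quad
\varphi = \frac{1+\sqrt{5}}{2} \approx 1.61803
\]
```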
Shapes that are constructed with the divine proportion can be used to frame
your Web site so that the smaller segment in the ratio makes up your
sidebar or header, while the larger segment forms your content division or
main section division.
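A minimal sketch of such a frame, using a layout table as in section 3.4.4. The 960-pixel page width is an assumption chosen for illustration; dividing it by 1.618 gives roughly 593 pixels for the main section, which leaves about 367 pixels for the sidebar.

```html
<!-- Sketch only: 960px is an assumed page width; 593 + 367 = 960. -->
<table width="960" cellspacing="0" cellpadding="0" border="0">
  <tr>
    <!-- Smaller segment of the ratio: the sidebar. -->
    <td width="367" valign="top" bgcolor="#eeeeee">Sidebar</td>
    <!-- Larger segment of the ratio: the main content. -->
    <td width="593" valign="top">Main content</td>
  </tr>
</table>
```

The same proportions can be applied vertically, for example to size a header against the content area below it.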
3.5.2 Content Organization
The objectives of content assessment and organization are to gather a list of
the necessary content and to organize that content relative to your
audience's needs. This process works "hand in glove" with the process of
defining your Audience. Both these processes require that you have
defined the Purpose of your website.
Create a list of all the information sources, services, processes, and other
content you offer (or plan to offer) that can be made available through the
Web. Eliminate items that don't directly advance the purpose of your site or
may not fulfill audience objectives.
Note: at this time it may be a good opportunity to enlist a focus group of
your audience to help define and describe your offerings.
1. Assess your service offerings by mapping them to the audience based
on their needs.
2. Next, categorize the items in your content inventory according to both
user needs and the purpose of your site.
3. For example, if you have content that concerns the graduation process
and part of your purpose is to offer that content to your users, then
graduation may be a likely category. Continue to group all of your
content into their respective categories.
4. After all the content is categorized, organize the content within each
category by its relative importance to users. Finally, name each category
with a concise and descriptive title. These will become your main
"category" links for your Web site.
5. By completing this process you have collected content that satisfies the
needs of your target audience, categorized your content into groups that
form the foundation for your site structure, and prioritized the relative
importance of the content in each category.
3.5.3 Page size and Load Time Optimization
A really good graphic does indeed convey a great deal of information in a
remarkably economical way. A really bad graphic (or worse, several really
bad graphics) merely adds overhead to your page. Each and every graphic
on your site must contribute enough to the page it’s on to make it worth the
time it takes for that graphic to load. Any graphic that can't "pull its own
weight" is ultimately parasitic, and really ought to be summarily discarded
from your Website. (Be forewarned: As you peruse the following list, you will
probably find your favorite graphical doo-dad targeted for elimination. That's
precisely why your readers are unhappy.) Some of the most likely
candidates for removal include:
Separator bar graphics: At 4 bytes, the <HR> tag transmits in a few
hundredths of a second, and it's a perfectly adequate tool for breaking
up a page visually. In contrast, separator bar graphics eat hundreds of
bytes -- and many eat over 1,000! It's hard to believe that putting a fancy
curly-q at the end of an otherwise horizontal line truly makes it 150 times
better as a text separator.
Oversized icons: Many Websites employ icons that are much larger
and more elaborate than they need to be. Icons are fundamentally
different from other graphics. For an icon to be effective it does not need
to be realistic, but merely recognizable. Anything more elaborate is
ultimately wasteful.
The ubiquitous imagemap: Text-based navigational aids typically
require (at most) a few dozen bytes, whereas imagemaps can easily eat
bandwidth by the 10's of K. And, as catchy as your imagemap may be, it
probably doesn't render your site thousands of times more navigable
than a simple, readable, text-based nav-bar.
Unoptimized banner graphics: A well-placed, well-optimized logo
graphic is a great way to unify a site visually. But unoptimized banner
graphics can kill a page's load time. It's a rare logo that can't be
brutalized.
Pictures of words
For sheer bandwidth-guzzling, there's probably nothing more wasteful
than GIF images of words. If you're rendering words, nothing transmits
faster than simple text.
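The text-based alternatives suggested in the list above might look like the following sketch. The heading text, colours and file names are placeholders, not taken from any real site.

```html
<!-- Sketch only: heading text, colours and link targets are placeholders. -->

<!-- A styled text heading instead of a GIF picture of the same words. -->
<h1 style="font-family: Georgia, serif; color: #003366">Welcome to Our Site</h1>

<!-- A plain horizontal rule instead of a separator bar graphic. -->
<hr>

<!-- A text navigation bar instead of an imagemap. -->
<p>
  <a href="index.htm">Home</a> |
  <a href="products.htm">Products</a> |
  <a href="contact.htm">Contact</a>
</p>
```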
Once you've eliminated the leech graphics from your site, the surviving
graphics must be optimized for load time. Optimization takes only a few
moments to do, and it can have a stunning payback in terms of load time
and reader satisfaction. It's really fairly easy to cut load time in half, without
sacrificing visual appeal.
Recommended page size:
The 0-10K range qualifies as exemplary
Pages between 10-20K rate as well-optimized
The 20-40K range is merely adequate
40-60K pages earn a dubious designator
Anything over 60K is unacceptable
One of the keys to building a reader-friendly Website is to provide readers
with navigational shortcuts. Provide a link to at least one "master"
navigational page on each and every page on your Website. That master
page can be an alphabetical index, a topical table of contents, or a
comprehensive site map.
Ideally, every page on your site should be accessible from the master
page(s). That way, any page on your site is accessible from every other
page with a minimal number of mouse clicks.
Another alternative is to provide a high-level navigator bar on every page.
This approach, when used well, can add significantly to the visual
consistency of a site. It's a fair amount more work, however, than the
"master" page approach, particularly for larger sites.
3.5.4 Navigation Styles
The first step in developing your navigation scheme is to think about how
your information is best presented. According to Information Architecture for
the World Wide Web, the de facto authority on navigation, there are three
basic types of navigation:
Hierarchical
Hierarchical applies to sites that are information-rich and are best organized
as a large tree, much like a library.
Global
Global applies to sites where you can easily and logically jump among all
points; this is best if you are presenting information in fewer, broader
categories.
Local
Local navigation sits somewhere in between. This applies when you have
depth of information within broader areas.
The most basic form of navigation is the embedded link. That's just
anyplace where you link text within the body of the page.
Styles of navigation
Embedded links: the most basic form of navigation.
Bread-crumb trail: if you're organizing large amounts of information.
Left/top/pop-up nav bar: Most common, generally usable.
Tab navigation: When breaking into a few primary categories.
Site map: One-stop shopping for everything on your site.
Mix and match navigation schemes for optimal usability.
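Several of the styles listed above can be combined on one page. The fragment below is a sketch under assumed names: every file name and label is a placeholder.

```html
<!-- Sketch only: every file name and label below is a placeholder. -->

<!-- Top navigation bar, including a link to the "master" site map page. -->
<p>
  <a href="index.htm">Home</a> |
  <a href="courses.htm">Courses</a> |
  <a href="admissions.htm">Admissions</a> |
  <a href="sitemap.htm">Site map</a>
</p>

<!-- Bread-crumb trail showing where the page sits in the hierarchy. -->
<p>
  <a href="index.htm">Home</a> &gt;
  <a href="courses.htm">Courses</a> &gt;
  Website Design
</p>

<!-- Embedded link inside the body text. -->
<p>Unit 3 builds on the <a href="unit2.htm">HTML basics</a> covered earlier.</p>
```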
3.5.5 Providing Navigational Feedback
Presentation of navigation elements should incorporate visual and non-
visual cues that indicate the range of possible choices and the appropriate
action required to make a choice. The navigation system should also
provide the user with feedback so that they know if their actions have been
successful.
The basic coding language for Web pages, HTML, incorporates cues and
feedback mechanisms for the user. For example, when the mouse moves
over an image or text containing a link the default setting is for the cursor
appearance to change from an arrow to a hand with a pointing finger. This
suggests to the user that the item is somehow different from the surrounding
material and provides a clue to the possible action that is required. The use
of different default colors for visited and unvisited links provides feedback
about which pages of a site have already been visited.
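These default cues can also be set explicitly in a style sheet. The following sketch uses the standard CSS link pseudo-classes; the colours and file names are arbitrary choices for illustration.

```html
<!DOCTYPE html>
<html>
<head>
<title>Link feedback sketch</title>
<style type="text/css">
  /* Keep this order: link, visited, hover, active. Colours are arbitrary. */
  a:link    { color: #0000cc; }  /* not yet visited */
  a:visited { color: #660099; }  /* already visited */
  a:hover   { color: #cc0000; }  /* mouse is over the link */
  a:active  { color: #ff6600; }  /* link is being clicked */
</style>
</head>
<body>
<p><a href="about.htm">About us</a> | <a href="contact.htm">Contact</a></p>
</body>
</html>
```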
3.5.6 Tables vs. CSS
There are 13 reasons why Cascading Style Sheets (CSS) are superior to
table-based layouts when designing a website. Some web designers swear
that table-based layouts are better than CSS-based layouts, while others
believe that table-based layouts are ancient history and XHTML combined
with CSS is the only real solution to coding a web site’s visual layout.
a) Faster page loading
b) Lowered hosting costs
c) Redesigns are more efficient
d) Redesigns are less expensive
e) Visual consistency maintained throughout website(s)
f) Better for SEO
g) Accessibility
h) Competitive edge (job security)
i) Quick website-wide updates
j) Easier for teams to maintain (and individuals)
k) Increased usability
l) More complex layouts and designs
m) No spacer gifs
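For comparison, here is a sketch of a CSS version of the two-column layout that section 3.4.5 builds with nested tables. It reuses the same widths (a 120-pixel navigation column and a 380-pixel content column in a 500-pixel container); the id names are assumptions made for this example.

```html
<!DOCTYPE html>
<html>
<head>
<title>CSS two-column layout sketch</title>
<style type="text/css">
  /* The id names are hypothetical; the widths mirror the nested-table example. */
  #container { width: 500px; }
  #nav       { float: left;  width: 120px; }
  #content   { float: right; width: 380px; }
  #footer    { clear: both; }  /* drops below both floated columns */
</style>
</head>
<body>
<div id="container">
  <div id="nav">Navigation</div>
  <div id="content">Main content</div>
  <div id="footer">Footer</div>
</div>
</body>
</html>
```

Changing the column widths for the whole site then means editing one style rule rather than every table on every page.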
3.5.7 Use of color and graphics
Color is very important in web design, and can be used to add spice to your
website, relay the mood of a page, as well as to emphasize sections of a
site. If you think about it, as soon as you look at a website, you can normally
guess within seconds what that site is all about. Just like we all are quick to
judge other people by their appearance, and surroundings by the way they
smell, look, and feel, we also judge a website by its color scheme and style
of design. We can usually tell almost immediately, whether a website is
corporate, personal, whether it is for kids, teens, or just for adults, etc. Most
of this information is perceived solely by taking in color and design
elements.
What Elements Of Website Design Will Catch A Site Visitor’s Eyes?
Eyes naturally begin scanning left to right
When viewing a website, a visitor’s eyes most often fixate first on the
upper left portion of the screen. Viewers often fixate on the point for a
few seconds before moving their eyes to the right and then down the
page.
Dominant, noticeable headlines tend to draw the visitor’s eyes first upon
entering the website (especially when they are in the upper left, and
most of the time when they are in the upper right.)
Website readers often read blurbs and headlines; however, they tend to
read only the first one-third of the blurb. Unfortunately, you have
less than a second to grab the reader’s attention with these headlines.
Website visitors often will scan down to the bottom of the page to see if
something catches their eyes.
Website navigation works best on the top of the page…so try to use
navigational features on the top of your page instead of on the side or on
the bottom of the page.
Images of beautiful, clean faces cause the visitor’s eyes to fixate on
them.
If you display articles on your website, then try to use a short paragraph
structure. Web surfers prefer short paragraphs as opposed to longer ones.
And it is no surprise that we all tend to like one-column formats as opposed
to a newspaper format of several columns.
Details and Depth within elements of design are noticed before items
lacking depth.
The bigger a graphic or image, the longer the user will fixate on it.
Eyes always lock on the most noticeable aspect of a website, for
example color within a grey-toned website.
Ads tend to do better on the top left portion of the site. This is no
surprise considering that this is the first place people look when opening
a webpage.
Placing ads next to popular content increases an ad’s success.
Bigger banner ads did better than smaller, less noticeable ads.
Text ads do better than banner ads because users tend to mistake the
text ad for a link to content within your site.
Self Assessment Questions
6. When expressed algebraically, the divine proportion is equivalent to
____.
3.6 Managing Source Files
This section deals with folder structure, testing and production folders,
development steps, file naming and version control.
3.6.1 Recommended Folder Structure
Before you start building your website template, it is important to set up
your folder structure correctly, if you have not already done so for your new
website. Start by creating a folder anywhere on your PC or Mac.
Open this folder (by clicking or double-clicking it, depending on how your
operating system is configured) and create a folder named images inside it.
The main folder is where you save your web pages, and the images folder is
where you save your images. So if you created a web page in the main folder
and placed an image in it, the relative path would be images/myimage.jpg
It is very important to set up your website folder structure correctly. The
main reason is that the web pages you design and build on your computer will
then use the same image path that they will use on your web hosting account
after you upload your website.
When you have finished building your website, log in to your hosting account,
create a folder in your main directory, and give it a name identical to that of
the images folder you created on your computer, in this case images.
3.6.2 Testing and Production Folders
Below is a screen shot of a folder setup on a PC. The main folder name is
called www.affacademy.com and highlighted in light blue is the images
folder which is created to store the images for the website.
Figure 3.2: PC Folder Setup
Below is a screen shot of a folder setup on the web hosting account for
affacademy.com. On the left hand side you can see I am in the www folder
which is the main directory for this web hosting account setup type which
may vary. In this folder I have created the images folder.
Figure 3.3: Website Web hosting Server Folder Setup
The idea behind this is that your image path will be identical, so it will work
on your website once everything has been uploaded and created. If the path
saved in your web pages is a local folder path such as c:\mysite\images, the
site will work on your PC, but no one else will be able to view the images even
though you have uploaded them into the images folder.
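The difference between the two kinds of path can be sketched with two image tags. The file name myimage.jpg and the alt text are placeholders.

```html
<!-- Sketch only: myimage.jpg and the alt text are placeholders. -->

<!-- Relative path: resolved from the folder that holds the page, so it
     works on your PC and on the hosting account alike. -->
<img src="images/myimage.jpg" alt="Example image">

<!-- Local drive path: works only on the PC where the site was built,
     because visitors have no c:\mysite folder on their machines. -->
<img src="c:\mysite\images\myimage.jpg" alt="Example image">
```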
3.6.3 Development Steps
A web site development process can follow a number of standard or
company-specific frameworks, methodologies, modeling tools and
languages. The software development life cycle normally comes with
standards that can fulfill the needs of any development team. Like
software, web sites can be developed using these methods, with some
changes and additions to the existing software development process.
Let us see the steps involved in any web site development.
1. Analysis: Input: Interviews with the clients, mails and supporting docs
by the client, discussion notes, online chat, recorded telephone
conversations, model sites/applications etc.
Output: 1. Work plan, 2. Cost involved, 3. Team requirements,
4. Hardware-software requirements, 5. Supporting documents and
6. The approval
2. Specification Building: Input: Reports from the analysis team
Output: Complete requirement specifications to the individuals and the
customer/customer's representative
3. Design and development: After the specification is built, work on the
web site is scheduled upon receipt of the signed proposal, a deposit,
and any written content materials and graphics the customer wishes to
include. Here the layouts and navigation are normally designed as a prototype.
Some customers may be interested only in a fully functional prototype. In
this case we may need to show them the interactivity of the application
or site. But in most cases the customer may be interested in viewing
two or three designs with all images and navigation. There can be a lot of
suggestions and changes from the customer side, and all the changes
should be frozen before moving into the next phase. The revisions
can be redisplayed via the web for the customer to view. As needed,
customer comments, feedback and approvals can be communicated by
e-mail, fax and telephone.
Figure 3.4: Development life cycle
4. Content writing: This phase is necessary mainly for web sites.
There are professional content developers who can write industry-specific
and relevant content for the site. Content writers can use the
design templates to add their text. Grammar and spelling checks
should be completed in this phase.
Input: Designed template
Output: Site with formatted content
5. Coding: Input: The site with forms and the requirement specification
Output: Database driven functions with the site, Coding documents
6. Testing: Input: The site, Requirement specifications, supporting
documents, technical specifications and technical documents
Output: Completed application/site, testing reports, error logs, frequent
interaction with the developers and designers
7. Promotion: Input: Site with content, Client mails mentioning the
competitors
Output: Site submission with necessary meta tag preparation
8. Maintenance and Updating: Web sites need quite frequent
updates to keep them fresh. In that case we need to do the analysis
again, and all the other life cycle steps will repeat. Bug fixes can be
done during maintenance. Once your web site is operational,
ongoing promotion, technical maintenance, content management and
updating, site visit activity reports, staff training and mentoring are needed
on a regular basis, depending on the complexity of your web site and the
needs within your organization.
Input: Site/Application, content/functions to be updated, re-Analysis
reports
Output: Updated application, supporting documents to other life cycle
steps and teams.
3.6.4 File Naming
File name conventions, again, are the way that you name your web pages
so that search engines can use the names as a method of determining what
your web page is about. It's important that you use these names wisely and
not abuse them.
Consider this: you have a web page about search engine tools. If you didn't
use a standard file naming convention, you might name it webpage1.htm. A
search engine could still crawl the web page to determine the subject matter,
but search engines also give relevancy to file names.
A better way to name the page would be search-engine-tools.htm. This
specifically tells the search engines that this page is related to 'search
engine tools'. This is valuable because search engines will use this data
when determining the subject matter of your web page. When naming your
files, it's recommended that you separate your keywords with a dash. When
you're naming a page, it's also important to find out whether the keywords
are actually searched for.
It's important to note that you don't want to give files names like:
file-naming-conventions-best-practice-should-i-use-a-dash-or-underscore.htm.
This isn't what the search engines have in mind when they try to determine
relevancy. With that said, your job is to make search engines realize how
relevant and unique the content is on your web page.
3.6.5 Version Control
Version control is a special kind of software used to track and manage
changes. For example, CVS version control is used to track any sort of change
made to our web sites, whether it's a single edit of one file to fix a typo or a
series of adjustments to a project where several files, folders, and graphics
are added to (or removed from) the site.
In an uncontrolled site where multiple authors have access to edit and
contribute, the potential for conflict and problems arises – more so when
these authors work from different offices at different times of day and night.
You may spend the day improving the file index.html for a customer. After
you've made your changes, another developer who works at home after
hours, or in another office, may spend the night uploading their own newly
revised version of the file index.html, completely overwriting your work with
no way to get it back.
With the same site under CVS version control, the late-night author will be
alerted to a conflict with the file index.html, presented with the exact parts of
the index.html file that are causing a problem, and asked to adjust their work
to incorporate anything you added and committed to the site while working
on it earlier in the day.
If a customer needs to remove a recently added page or content area for
legal reasons, or if they simply prefer an earlier version of their site, CVS
can be used to restore the entire site to any previous state of their choosing,
rolling back multiple variations and edits by all authors until a satisfactory
site can be put back in place.
Self Assessment Questions
7. _______ is a special kind of software used to track and manage
changes.
3.7 Foundations of Dynamic HTML
This section deals with DHTML, Netscape, external style sheet, custom
style sheets, layer tags and positioning layers.
3.7.1 DHTML Capabilities
There are four primary features of DHTML:
1. Changing the tags and properties
2. Real-time positioning
3. Dynamic fonts (Netscape Communicator)
4. Data binding (Internet Explorer)
Changing the tags and properties: This is one of the most common uses
of DHTML. It allows you to change the qualities of an HTML tag depending
on an event outside of the browser (such as a mouse click, time, or date,
and so on). You can use this to preload information onto a page, and not
display it unless the reader clicks on a specific link.
Real-time positioning: When most people think of DHTML, this is what they
expect: objects, images, and text moving around the Web page. This can
allow you to play interactive games with your readers or animate portions of
your screen.
Dynamic Fonts: This is a Netscape only feature. Netscape developed this
to get around the problem designers had with not knowing what fonts would
be on a reader's system. With dynamic fonts, the fonts are encoded and
downloaded with the page, so that the page always looks how the designer
intended it to.
Data binding: This is an IE only feature. Microsoft developed this to allow
easier access to databases from Web sites. It is very similar to using a CGI
to access a database, but uses an ActiveX control to function. This feature
is very advanced and difficult to use for the beginning DHTML writer.
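The first two features can be sketched with standard DOM scripting, which works in current browsers (the Netscape-only and IE-only features are not shown). The element ids, function names, sizes and timings below are all assumptions made for this illustration.

```html
<!DOCTYPE html>
<html>
<head>
<title>DHTML sketch</title>
<style type="text/css">
  /* The ids "details" and "box" are hypothetical names for this sketch. */
  #details { display: none; }  /* downloaded with the page but hidden */
  #box     { position: absolute; top: 120px; left: 0;
             width: 40px; height: 40px; background: #cc0000; }
</style>
<script type="text/javascript">
  // Changing a property: reveal the preloaded text when the link is clicked.
  function showDetails() {
    document.getElementById("details").style.display = "block";
    return false;  // stop the browser from following the href
  }

  // Real-time positioning: move the box 5 pixels to the right every
  // 50 milliseconds until it has travelled 300 pixels.
  var x = 0;
  function moveBox() {
    x = x + 5;
    document.getElementById("box").style.left = x + "px";
    if (x < 300) {
      setTimeout(moveBox, 50);
    }
  }
</script>
</head>
<body onload="moveBox()">
<p><a href="#" onclick="return showDetails()">Show details</a></p>
<p id="details">This text was downloaded with the page and shown on request.</p>
<div id="box"></div>
</body>
</html>
```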
3.7.2 Netscape vs. Microsoft Support for DHTML
DHTML can do some pretty neat stuff. There's more to it than flash-style
effects and the ugly square spotlight. The first problem is that Netscape (NS)
and Internet Explorer (IE) comply with the DHTML standard in two completely
different ways. In many cases, it's necessary to write two versions of each
HTML page. Netscape has chosen to use the <layer> tag, while Microsoft's
Internet Explorer treats DHTML more like an extension of JavaScript.
The second problem is that Netscape's implementation may provide better
document control, but it relies far too heavily on a coordinate style of
authoring. This would be fine if the backbone for web design weren't
something as loose as HTML. Although it has more substance than the 80s
Cola War, this Browser War is just getting tiresome, and is just as much in
the interest of the consumer. It's sad when the only way Netscape can keep
their product alive is to make it 100% incompatible with their competitor.
This divergence can only progress until they are two utterly unalike systems,
with a full set of code required for each.
3.7.3 <link> Tags and External Styles
The most commonly used type of link is the stylesheet link. This looks like:
<link href="styles.css" rel="stylesheet" type="text/css" />
The first attribute href defines the URL where the style sheet is located.
Then the rel attribute indicates that the relationship of this link is a style
sheet. Finally the type attribute tells the user agent what MIME type the
linked document will be. For style sheets this should always be "text/css".
The rel and rev attributes are where you define the type of link you're
including in your document. Rel and rev act as complementary attributes, rel
defining related links that are forward while rev defines related links that are
reverse from the current page. This is most often used in a series of pages,
where you would define the rel="next" and rev="prev" links on the pages.
Most links are considered forward or "rel" links.
Alternate pages are a useful way to provide more details for your customers
and for search engines. You might define alternate natural language pages
or alternate pages in a different file format. You can do both, in fact.
To define a link to a Spanish version of the current page, you would write:
<link href="spanish.html" lang="es" hreflang="es" rel="alternate"
type="text/html" title="The page in Spanish" />
To define a link to a PDF version of the current page, you would write:
<link href="page.pdf" rel="alternate" type="application/pdf" title = "A PDF
version of the page" media="print" />
Another great use of the alternate type is to define alternate style sheets for
specific uses. This allows readers using user agents like Firefox to choose
between different style sheets. The most common alternate style sheet is
the zoom layout style sheet. You would define an alternate style sheet with
two types (separated by spaces) in the rel attribute:
<link href = "zoom.css" rel = "alternate stylesheet" type = "text/css" title =
"Zoom style sheet" />
Be sure to title your alternate style sheet with the title attribute so that the
browsers can display them effectively.
3.7.4 Creating Custom Styles (Classes)
A simple selector can have different classes, thus allowing the same
element to have different styles. For example, an author may wish to display
code in a different color depending on its language:
code.html { color: #191970 }
code.css { color: #4b0082 }
The above example has created two classes, css and html for use with
HTML's CODE element. The class attribute is used in HTML to indicate the
class of an element, e.g.,
<P CLASS=warning>Only one class is allowed per selector.
For example, code.html.proprietary is invalid.</p>
Classes may also be declared without an associated element:
.note { font-size: small }
In this case, the note class may be used with any element.
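Put together in a complete page, the classes above might be used as in this sketch. The sample text is invented for illustration; the three style rules are the ones given in this section.

```html
<!DOCTYPE html>
<html>
<head>
<title>Class selector sketch</title>
<style type="text/css">
  code.html { color: #191970 }    /* applies only to <code class="html"> */
  code.css  { color: #4b0082 }    /* applies only to <code class="css"> */
  .note     { font-size: small }  /* applies to any element with class="note" */
</style>
</head>
<body>
<p>An HTML tag: <code class="html">&lt;p&gt;</code></p>
<p>A CSS rule: <code class="css">p { margin: 0 }</code></p>
<p class="note">The note class works on a paragraph,</p>
<div class="note">and equally on a div.</div>
</body>
</html>
```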
A good practice is to name classes according to their function rather than
their appearance. The note class in the above example could have been
named small, but this name would become meaningless if the author
decided to change the style of the class so that it no longer had a small font
size.
3.7.5 <Layer> Tags
The layer tag is a new tag introduced in Netscape 4 that allows authors to
position and animate (through scripting) elements in a page. A layer can be
thought of as a separate document that resides on top of the main one, all
existing within one window.
The layer tag has been left behind in Netscape development - it may not be
supported at all by future versions.
<layer id="UNIQUE_NAME" src="URL" bgcolor= "COLOUR" width="500"
height="450" top="10" left="270" visibility="show"></layer>
The id attribute specifies the name of the layer, enabling other layers and
JavaScript scripts to refer to it. The src attribute specifies the pathname of
a file that contains HTML-formatted content for the layer. The height and
width attributes must be fixed pixel values, not percentages, or the external
page may not be visible. The height and width attributes cannot be altered by
JavaScript in real time. This has important implications if visitors are
browsing at a high resolution. The layer tag definition does not include
scrollbars. The bgcolor attribute specifies the background color of the layer.
The left and top attributes specify the horizontal and vertical positions of
positioned layers, or the relative horizontal and vertical positions for
inflow layers.
3.7.6 Positioning Layers
The LAYER tag, supported in Navigator 4.0, allows you to position blocks of
content. These blocks of positioned content are also called layers. Positioned
blocks of content can overlap each other, be transparent or opaque, and be
visible or invisible. They can also be nested. Use the LAYER tag to specify an
absolute position for a block of content, and use the ILAYER tag to specify a
relative position.
This example creates three overlapping layers. The back one is red, the
middle one is blue, and the front one is green.
<LAYER ID=layer1 TOP=250 LEFT=50 WIDTH=200 HEIGHT=200
BGCOLOR=RED>
<P>Layer 1</P>
</LAYER>
<LAYER ID=layer2 TOP=350 LEFT=150 WIDTH=200 HEIGHT=200
BGCOLOR=BLUE>
<P>Layer 2</P>
</LAYER>
<LAYER ID=layer3 TOP=450 LEFT=250 WIDTH=200 HEIGHT=200
BGCOLOR=GREEN>
<P>Layer 3</P>
</LAYER>
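Because the LAYER tag works only in Netscape 4, the same three overlapping blocks can also be sketched with standard CSS positioning. This is an equivalent written for illustration, not part of the Netscape example; the class name layer and the z-index values are assumptions.

```html
<!DOCTYPE html>
<html>
<head>
<title>Positioned blocks sketch</title>
<style type="text/css">
  /* Same sizes and offsets as the LAYER example, expressed in CSS. */
  .layer  { position: absolute; width: 200px; height: 200px; }
  #layer1 { top: 250px; left: 50px;  background: red;   z-index: 1; }  /* back */
  #layer2 { top: 350px; left: 150px; background: blue;  z-index: 2; }  /* middle */
  #layer3 { top: 450px; left: 250px; background: green; z-index: 3; }  /* front */
</style>
</head>
<body>
<div id="layer1" class="layer"><p>Layer 1</p></div>
<div id="layer2" class="layer"><p>Layer 2</p></div>
<div id="layer3" class="layer"><p>Layer 3</p></div>
</body>
</html>
```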
3.7.7 HTML vs. DHTML
Dynamic HTML is an extension of HTML that enables, among other things,
the inclusion of small animations and dynamic menus in Web pages.
DHTML code makes use of style sheets and JavaScript.
When an object or word on a web page becomes highlighted, larger, a
different color, or struck through as you move your mouse cursor over it,
that is the result of a DHTML effect. The effect is written into the page's
code (style sheets and script); DHTML is not a separate file format, so the
page is still saved with the usual .htm or .html extension.
DHTML sites are dynamic in nature. DHTML uses client-side scripting to
change variables in the presentation, which affects the look and function of
an otherwise static page. The dynamic characteristic of DHTML is the way it
functions while a page is viewed, rather than the generation of a unique page
with each page load (as in a dynamic website).
On the other hand, HTML is static. HTML sites rely solely upon client-side
technologies. This means the pages of the site do not require any special
processing from the server side before they go to the browser. In other
words, the pages are always the same for all visitors - static. HTML pages
have no dynamic content.
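A mouse-over effect of the kind described in this section can be sketched with two event handlers on an otherwise static paragraph. The colour and size values are arbitrary choices for illustration.

```html
<!DOCTYPE html>
<html>
<head>
<title>Rollover sketch</title>
</head>
<body>
<!-- Static HTML: always looks the same. -->
<p>This sentence never changes.</p>

<!-- DHTML: script changes the style while the page is being viewed.
     Setting a property back to an empty string restores the default. -->
<p onmouseover="this.style.color = '#cc0000'; this.style.fontSize = '120%';"
   onmouseout="this.style.color = ''; this.style.fontSize = '';">
  Move the mouse over this sentence.
</p>
</body>
</html>
```

No request is sent to the server when the style changes; everything happens in the browser.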
Self Assessment Questions
8. ________ tag is a new tag introduced in Netscape 4 that allows authors
to position and animate (through scripting) elements in a page.
3.8 Summary
1. The browser display window can be used to display more than one
document at a time.
2. The <frameset> tag must have either a rows or a cols attribute, and
it often has both.
3. Most HTML tags have associated properties, which store presentation
information for browsers.
4. The format of a style specification depends on the level of style sheet.
5. A Cascading Style Sheets rule is made up of a selector and a
declaration.
6. The <span> and <div> tags are very useful when dealing with
Cascading Style Sheets.
7. The TABLE element defines a table for multi-dimensional data
arranged in rows and columns.
8. The first step in developing your navigation scheme is to think about
how your information is best presented.
9. Presentation of navigation elements should incorporate visual and
non-visual cues that indicate the range of possible choices and the
appropriate action required to make a choice.
10. Color is very important in web design, and can be used to add spice to
your website, relay the mood of a page, as well as to emphasize
sections of a site.
3.9 Terminal Questions
1. Explain <frameset> tag and its attributes.
2. Briefly explain how the style sheets can be used to support multiple
browsers.
3. Explain the various tags used in Table.
4. Bring out the differences between Tables and CSS.
5. Explain the development steps in a web site construction.
3.10 Answers
Self Assessment Questions
1. <frame> tag
2. style sheet
3. Cascading Style Sheets
4. The <table>
5. TABLE element
6. 1:1.61803… or 1:phi.
7. Version control
8. The layer
Terminal Questions
1. The content of a frame is specified with the <frame> tag, which can
appear only in the content of a frameset element. (Refer section 3.2)
2. It is not impossible to design a single style sheet that works properly on
all the different browsers. (Refer section 3.3.5)
3. The TABLE element defines a table for multi-dimensional data
arranged in rows and columns. (Refer section 3.4)
4. There are 13 reasons why Cascading Style Sheets (CSS) are superior
to table-based layouts when designing a website. (Refer section 3.5.6).
5. A web site system development process can follow a number of
standard or company specific frameworks, methodologies, modeling
tools and languages. (Refer section 3.6.3)
Website Design Unit 4
Sikkim Manipal University Page No. 104
Unit 4 XML Programming – I
Structure:
4.1 Introduction
Objectives
4.2 The Need for XML
Introduction
Structured Data and Formatting
Advantages of XML
SGML, XML, and HTML
World Wide Web Consortium (W3C) Specifications and Grammars
XML Applications and Tools
Creating and Viewing XML Documents
Transforming XML Documents
4.3 XML Document Syntax
4.4 Validating XML Documents with DTDs
4.5 XML Namespaces
4.6 Summary
4.7 Terminal Questions
4.8 Answers
4.1 Introduction
XML is far more than a solution to the deficiencies of HTML. It provides a
simple and universal way of storing textual data of any kind. In this chapter,
you are going to study the need for XML, the XML document structure and
XML namespaces.
Objectives:
After studying this unit, you should be able to:
discuss the need for XML
describe the XML Document Syntax
write DTD files
use XML Namespaces
4.2 The Need for XML
XML stands for Extensible Markup Language. XML is a markup language
much like HTML. XML was designed to carry data, not to display data.
XML tags are not predefined. You must define your own tags. XML is
nothing special. It is just plain text. Software that can handle plain text can
also handle XML. However, XML-aware applications can handle the XML
tags specially. The functional meaning of the tags depends on the nature of
the application. XML is now as important for the Web as HTML was to the
foundation of the Web. XML is everywhere. It is the most common tool for
data transmissions between all sorts of applications, and is becoming more
and more popular in the area of storing and describing information.
4.2.1 Structured Data and Formatting
An XML document has two levels of correctness:
Well-formed. A well-formed document conforms to the XML syntax
rules; e.g. if a start-tag (<...>) appears without a corresponding end-tag
(</...>), the document is not well-formed. A document that is not
well-formed is not XML, and a conforming parser is not allowed to
process it.
Valid. A valid document additionally conforms to semantic rules, either
user-defined or given in a schema such as a DTD; e.g. if a document
contains an element that the DTD does not declare, it is not valid, and a
validating parser must report the error.
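The well-formedness rule can be tried out with a few lines of Python. This is only a minimal sketch: the helper name is_well_formed and the two sample strings are made up for illustration, and the parser in Python's standard library is non-validating, so it checks well-formedness but not validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<note><to>Tove</to></note>"
bad = "<note><to>Tove</note>"  # the <to> start-tag has no matching end-tag

print(is_well_formed(good))
print(is_well_formed(bad))
```

The parser stops at the first syntax error, which matches the rule that a document that is not well-formed must not be processed further.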
4.2.2 Advantages of XML
XML is used in many aspects of web development, often to simplify data
storage and sharing.
XML Separates Data from HTML:
If you need to display dynamic data in your HTML document, it will take a lot
of work to edit the HTML each time the data changes. With XML, data can
be stored in separate XML files. This way you can concentrate on using
HTML for layout and display, and be sure that changes in the underlying
data will not require any changes to the HTML. With a few lines of
JavaScript, you can read an external XML file and update the data content
of your HTML.
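The text above mentions JavaScript in the browser. The same separation of data from layout can be sketched in Python, which is used here only for illustration. The catalog data, its element names and the function render_rows are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data file content: the data lives in XML, not in the HTML.
DATA = """<catalog>
  <book><title>XML Basics</title><price>250</price></book>
  <book><title>HTML &amp; CSS</title><price>300</price></book>
</catalog>"""


def render_rows(xml_text):
    """Build HTML table rows from the book elements in xml_text."""
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.findall("book"):
        title = escape(book.findtext("title"))
        price = escape(book.findtext("price"))
        rows.append(f"<tr><td>{title}</td><td>{price}</td></tr>")
    return "\n".join(rows)


print(render_rows(DATA))
```

When a book is added or a price changes, only the XML data changes; the code that produces the HTML layout stays the same.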
XML Simplifies Data Sharing
In the real world, computer systems and databases contain data in
incompatible formats.
XML data is stored in plain text format. This provides a software- and
hardware-independent way of storing data. This makes it much easier to
create data that different applications can share.
XML Simplifies Data Transport
With XML, data can easily be exchanged between incompatible systems.
One of the most time-consuming challenges for developers is to exchange
data between incompatible systems over the Internet. Exchanging data as
XML greatly reduces this complexity, since the data can be read by different
incompatible applications.
XML Simplifies Platform Changes
Upgrading to new systems (hardware or software platforms), is always very
time consuming. Large amounts of data must be converted and
incompatible data is often lost. XML data is stored in text format. This makes
it easier to expand or upgrade to new operating systems, new applications,
or new browsers, without losing data.
XML Makes Your Data More Available
Since XML is independent of hardware, software and application, XML can
make your data more available and useful. Different applications can access
your data, not only in HTML pages, but also from XML data sources. With
XML, your data can be available to all kinds of "reading machines"
(Handheld computers, voice machines, news feeds, etc), and make it more
available for blind people, or people with other disabilities.
4.2.3 SGML, XML and HTML
Standard Generalized Markup Language (SGML)
SGML is a metalanguage or a language that describes another language
and has been an international standard for describing electronic text since
1986. SGML does not provide the definitive list of allowed elements or
definitions of specific elements, but rather provides the rules as to how
elements can be used or interact within a document. SGML is not a
language that dictates how text is formatted. The power comes in that
SGML encodes the semantics or meaning of text, which is totally separated
from how the text is rendered or appears on paper or on the screen. In
SGML, presentation is separated from content, which allows programmers
to write many different applications that take care of how the document is
displayed.
No Web browser can display SGML as is, so an application must be written
to convert the SGML into a language like HTML that the browser can
understand.
HyperText Markup Language (HTML)
HTML is just a specific application of SGML with a specific set of rules that
are defined in a HTML DTD. The DTD is constructed, refined, and published
as a specification through committee work at the World Wide Web
Consortium. The most recent version of the HTML specification is 4.01 and
all current Web browsers should have a built-in ability to interpret this
specification. Each Web page that is created is supposed to declare which
HTML specification it is using, which helps the browser out with its
interpretation. The following is an example of a DTD declaration from a Web
page:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
The biggest differences between SGML and HTML are that HTML's element
set is based more on presentation than on meaning or semantics, and the
HTML element set is predetermined by the specification. The specification
does not allow users to create their own elements or produce their own
DTDs. This has often been a point of contention as the major Web browser
software companies have introduced their own takes on HTML, which may
or may not be supported by the competitors' browsers.
eXtensible Markup Language (XML)
XML is designed to bring some of the features of SGML to the Web and
other output options like print. So, like SGML and unlike HTML, developers
can define their own DTDs or XML schema. XML is a metalanguage just like
SGML, and the extensible part of XML refers to the ability of developers to
define their own elements. XML is a simplified version of SGML, making
XML easier for developers to work with and deploy than SGML, while
providing the flexibility, semantic structures, and ability to exchange data
that is not available in HTML.
4.2.4 World Wide Web Consortium (W3C) Specifications and
Grammars
Here is a one-line description of each of the W3C groups that are listed
and linked in the XML section of the W3C website:
XML Coordination Group whose functions can be viewed on the XML
CG Charter link
XML Core Working Group whose function is development and
maintenance of the specs for XML and intimately related other specs
e.g. Namespace specs in XML.
XSL Working Group develops specifications for the Extensible Stylesheet
Language (XSL), including XSL Formatting Objects (XSL-FO) and XSL
Transformations (XSLT).
The Efficient XML Interchange Working Group develops a compact
format for exchanging XML documents efficiently.
XML Binary Characterization Working Group investigated whether it
was necessary to develop a binary Interchange format. The reports and
recommendations can be found on the public pages of this working
group.
The XML Processing Model Working Group works on the development of
a language for describing XML processing pipelines.
XML Linking Working Group is no longer active, but its work can be
found on the group page.
XML Query Working Group is working on ways to provide a flexible
query language to extract data from an XML document.
XML Schema Working Group works to provide specifications that define
and describe the content and structure of XML documents, and is
looking at ways to define and describe their semantics.
Service Modeling Language Working Group works to find ways to define
and support extensions to the XML Schema language.
4.2.5 XML Applications and Tools
XML applications are software programs that process and manipulate data
using XML technologies including XML, XSLT, XQuery, XML Schema,
XPath, Web services, etc. Stylus Studio already provides many intuitive
tools for working with all of the above and now using XML pipeline you can
design a complete XML application from start to finish! For example, you
can visually specify the order in which different XML processing steps
should occur, and can even debug the entire application and deploy it to
your production environment in just minutes.
A Sample XML Application
In the following sample XML application, we'll build an order report. This
will involve some XML processing, for example, applying various XML
operations (converting, parsing, validating, transforming and publishing
XML) on several data sources. The order report XML application is
displayed in figure 4.1.
Figure 4.1: XML application
The steps involved in creating this XML application include:
1. Getting a catalog of books from a Text File
2. Getting an Order from an EDIFACT file
3. Using XQuery to extract the order information
4. Using XSLT to publish an HTML order report
5. Using XQuery to generate an XSL-FO style sheet
6. Using XSL-FO to publish a PDF order report
Following are some of the XML tools used:
Sense:X
Sense:X is an intelligent, XML-aware editing feature that provides XML sensing, XML
tag completion, syntax coloring, and more. It's the best XML editor in the
industry!
Integrated XML Schema/DTD Validator
Creating valid XML documents is simple and easy using an integrated XML
Schema/DTD Validator which automatically finds and highlights errors, and
provides detailed error messages.
XML Canonicalizer
It provides an easy way to convert any XML document into W3C-standard
XML canonical form.
XML Generator
The XML Generator automatically creates well-formed & valid XML sample
instance documents from any XML Schema in a highly customizable way.
XML Code Folding
The XML Editor features code folding to help maximize valuable screen
real-estate and simplify editing of large XML files.
4.2.6 Creating and Viewing XML Documents
XML documents form a tree structure that starts at "the root" and branches
to "the leaves". XML documents use a self-describing and simple syntax:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The first line is the XML declaration. It defines the XML version (1.0) and the
encoding used. The next line, <note>, is the start-tag of the root element of
the document (like saying: "this document is a note"). The next 4 lines
describe 4 child elements of the root (to, from, heading, and body). Finally,
the last line marks the end of the root element: </note>.
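The tree structure of the note document can be inspected with a short Python sketch. It assumes Python's standard xml.etree.ElementTree module; the variable names are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# The note document from the text, held as bytes so that the parser
# honors the encoding named in the XML declaration.
NOTE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE)

# The root is the starting point of the tree; its children are the branches.
children = [(child.tag, child.text) for child in root]

print(root.tag)
for tag, text in children:
    print(tag, "=", text)
```

Iterating over the root element visits its four child elements in document order.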
When the file is opened in a web browser, the XML document is displayed
with color-coded root and child elements. A plus (+) or minus sign (-) to the
left of the elements can be
clicked to expand or collapse the element structure. To view the raw XML
source (without the + and - signs), select "View Page Source" or "View
Source" from the browser menu.
4.2.7 Transforming XML Documents
XML processing techniques that are available in the browser have
complements on the server, and then some. Server-based processing can
deliver formatted displays of XML to the browser; it also can perform a full
range of data maintenance activities to modify documents and to enable
sharing of XML data between servers.
On the server there are file-based, memory-based, and stream-based
methods of accessing XML documents. File-based methods input XML files
and XSLT style sheets to transform XML into XHTML for display in the
browser. Memory-based methods use the Document Object Model (DOM)
to access and process full in-memory representations of XML documents.
Stream-based methods provide simple read and write capabilities through
which XML documents are accessed and output is produced an element at
a time.
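The difference between memory-based and stream-based access can be sketched in Python. The orders document below is invented for this example. File-based XSLT transformation is left out because Python's standard library has no XSLT processor.

```python
import io
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample data for the illustration.
ORDERS = "<orders><order id='1'>pen</order><order id='2'>ink</order></orders>"

# Memory-based: the whole document becomes a DOM tree before it is used.
dom = minidom.parseString(ORDERS)
dom_items = [node.firstChild.data for node in dom.getElementsByTagName("order")]

# Stream-based: elements are handed over one at a time as parsing proceeds.
stream_items = []
source = io.BytesIO(ORDERS.encode("utf-8"))
for event, elem in ET.iterparse(source, events=("end",)):
    if elem.tag == "order":
        stream_items.append(elem.text)
        elem.clear()  # release the element once it has been handled

print(dom_items)
print(stream_items)
```

Both approaches read the same data. The memory-based one allows free navigation of the tree, while the stream-based one keeps memory use low for large documents.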
Self-assessment Questions
1. ________ is designed to bring some of the features of SGML to the Web
and other output options like print.
4.3 XML Document Syntax
This section deals with document structure, elements, tags, the XML
declaration, the document type declaration, start and end tags, and
element attributes.
Well Formed Structure
A textual object is a well-formed XML document if:
1. Taken as a whole, it matches the production labeled document.
2. It meets all the well-formedness constraints given in the XML specification.
3. Each of the parsed entities which is referenced directly or indirectly
within the document is well-formed.
Elements and Tags
XML Element: XML is a markup language that is used to store data in a
self-explanatory manner. Making the data "self-explanatory" comes about by
containing information in elements. If a piece of text is a title then it will be
contained within a "title" element.
XML Tag: A tag is just a generic name for a <element>. An opening tag
looks like <element>, while a closing tag has a slash that is placed before
the element's name: </element>. From now on we will refer to the opening
or closing of an element as open or close tags. All information that belongs
to an element must be contained between the opening and closing tags of
an element.
The XML Declaration
An XML document should begin with an XML declaration, which has the
appearance of a processing instruction but technically is not one. The XML
declaration identifies the document as being XML and provides the version
number of the XML standard being used. It may also specify an encoding
standard.
Example: <?xml version = "1.0" encoding = "utf-8"?>
Document Type Declaration
A Document Type Declaration gives the parser the information against
which the validity of an XML document is checked. It is the
XML mechanism that defines the constraints on the logical structure and
supports the use of predefined storage units. The Document Type
Declaration can contain the following:
1. Document name
2. Reference to an external DTD (Document Type Definition)
3. Markup declaration (internal DTD)
4. Parameter entity references
Start and End Tags
In XML, a tag is what is written between angle brackets, i.e. XML tags
open with the < symbol and end with the > symbol. Unless the element is
empty, tags always come in matched pairs, with the content of the element
between the start-tag and the end-tag. An example of a start-tag:
<composer>. An example of an end-tag: </composer>.
Empty Tags
The elements that do not include content must use a tag with the following
form:
<element_name />
Element Nesting
When an element appears within another element, it is said that the inner
element is "nested". Besides being such an easy term to understand,
nesting also serves a wonderful purpose of keeping order in an XML
document. Much like parentheses in a math problem, elements must be
closed in the order that they are opened.
Example:
<patient>
  <name>
    <first>Anil</first>
    <middle>Keshav</middle>
    <last>Kumar</last>
  </name>
</patient>
Element Attributes
Attributes often provide information that is not a part of the data. Attribute
values must always be enclosed in quotes, but either single or double
quotes can be used. For a person's sex, the person tag can be written like
this: <person sex="female">
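The quoting rule for attribute values can be checked with a small Python sketch. The helper name parses is made up for this example, and the person elements are written as empty elements so that each string is a complete document.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if text is accepted by the XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


double_quoted = ET.fromstring('<person sex="female"/>')
single_quoted = ET.fromstring("<person sex='female'/>")

# Both quoting styles give the same attribute value.
print(double_quoted.get("sex"), single_quoted.get("sex"))

# An unquoted value is a syntax error in XML (unlike old-style HTML).
print(parses("<person sex=female/>"))
```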
Comments
A comment is used to leave a note or to temporarily edit out a portion of
XML code. XML comments have the exact same syntax as HTML
comments.
Example:
<!-- Students grades are updated bi-monthly -->
Special Characters and Built in Entities
If your keyboard will not allow you to type the characters you want, or if you
want to use characters outside the limits of the encoding scheme you have
chosen, you can use a symbolic notation called 'entity referencing'. Entity
references can either be numeric, using the decimal or hexadecimal
Unicode code point for the character (e.g. if your keyboard has no Euro
symbol (€) you can type &#8364;); or they can be character, using an
established name which you declare in your DTD (e.g. <!ENTITY euro
"&#8364;">) and then use as &euro; in your document.
If you use XML with no DTD, then these five character entities are assumed
to be predeclared, and you can use them without declaring them:
&lt; – The less-than character (<) starts element markup (the first character
of a start-tag or an end-tag).
&amp; – The ampersand character (&) starts entity markup (the first
character of a character entity reference).
&gt; – The greater-than character (>) ends a start-tag or an end-tag.
&quot; – The double-quote character (") can be symbolized with this
character entity reference when you need to embed a double-quote inside a
string which is already double-quoted.
&apos; – The apostrophe or single-quote character (') can be symbolized
with this character entity reference when you need to embed a single-quote
or apostrophe inside a string which is already single-quoted.
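The predeclared entities and numeric references can be seen at work in a short Python sketch. The sample text and element names are invented for illustration; escape comes from the standard xml.sax.saxutils module.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = "price < 10 & size > 5"

# escape() replaces &, < and > with their predeclared entity references.
escaped = escape(text)
print(escaped)

# The parser turns the references back into the original characters.
elem = ET.fromstring(f"<rule>{escaped}</rule>")
print(elem.text)

# A numeric reference works for any Unicode character, here the Euro sign.
price = ET.fromstring("<price>&#8364;100</price>").text
print(price)

# &quot; lets a double-quote appear inside a double-quoted attribute value.
quoted = ET.fromstring('<say words="&quot;hello&quot;"/>').get("words")
print(quoted)
```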
CDATA Sections
CDATA Sections are used to escape blocks of text containing characters
which would otherwise be recognized as markup. The content of a character
data section is not parsed by the XML parser, so any tags it contains are
treated as plain text and not as markup.
The form of a character data section is as follows:
<![CDATA[content]]>
For example, instead of using the line "The last word of the line is
&gt;&gt;&gt; here &lt;&lt;&lt;" the following line could be used:
<![CDATA[The last word of the line is >>> here <<<]]>
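A short Python sketch shows that a CDATA section and the escaped form give the same text, and that tags inside a CDATA section are not treated as markup. The line element is an assumed wrapper used only for this example.

```python
import xml.etree.ElementTree as ET

with_cdata = "<line><![CDATA[The last word of the line is >>> here <<<]]></line>"
with_entities = (
    "<line>The last word of the line is "
    "&gt;&gt;&gt; here &lt;&lt;&lt;</line>"
)

cdata_text = ET.fromstring(with_cdata).text
entity_text = ET.fromstring(with_entities).text
print(cdata_text)
print(cdata_text == entity_text)

# A tag written inside CDATA stays plain text: no child element is created.
tagged = ET.fromstring("<line><![CDATA[<b>not a tag</b>]]></line>")
print(len(tagged), tagged.text)
```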
Embedded XML
It's possible to simply embed the XML into the HTML document itself. The
benefit of this is that it avoids an extra round-trip to the server. We
can write JavaScript code on the client side to validate the XML code.
External XML
It is also possible to write a separate XML document and include it in an
HTML document. This can be done with the help of JavaScript code.
Self-assessment Questions
2. _________are information for the parser, upon which the validity of XML
documents are checked.
3. _________are used to escape blocks of text containing characters
which would otherwise be recognized as markup.
4.4 Validating XML documents with DTDs
This section deals with data validation, document type definitions, internal and
external DTDs, parsers, sub-elements and the IDREFS type.
The Concept of Data Validation
The main purpose of data validation is to determine whether every
document in a collection conforms to the rules laid down for that
collection. Application programs that
process the data in the collection of XML documents can be written to
assume the particular document form. Without such structural restrictions,
developing such applications would be difficult.
Writing Document Type Definition (DTD) Files
A Document Type Definition (DTD) is a set of structural rules called
declarations, which specify a set of elements that can appear in the
document as well as how and where these elements may appear. Not all
XML documents need a DTD. DTDs are used when the same tag set
definition is used by a collection of documents, perhaps by a collection of
users, and the collection must have a consistent and uniform structure.
The purpose of a DTD is to define a standard form for a collection of XML
documents. This form is specified as the tag and attribute sets, as well as
rules that define how they can appear in a document. DTDs also provide
entity definitions. All documents in the collection can be tested against the
DTD to determine whether they conform to the rules it describes.
Internal and External DTDs
A DTD can be embedded in the XML document whose syntax rules it
describes, in which case it is called an internal DTD. The alternative is to
have the DTD stored in a separate file, in which case it is called an external
DTD. Because external DTDs allow use with more than one XML document,
they are preferable.
If the DTD is included in the XML code, it must be introduced with
<!DOCTYPE rootname [ and terminated with ]>. For example, the
structure of the planes XML document with its DTD included is as follows:
<?xml version = "1.0" encoding = "utf-8" ?>
<!DOCTYPE planes [
<!-- the DTD for planes -->
]>
<!--The planes XML Document -->
When you use an external DTD, the XML document includes a DOCTYPE
declaration as its second line. This declaration has the following form:
<!DOCTYPE XML_document_root_name SYSTEM "DTD_file_name">
<!--The XML Document -->
Validating Parsers
All modern browsers have a built-in XML parser that can be used to read
and manipulate XML. The parser reads XML into memory and converts it
into an XML DOM object that can be accessed with JavaScript. There are
some differences between Microsoft's XML parser and the parsers used in
other browsers. The Microsoft parser supports loading of both XML files and
XML strings (text), while other browsers use separate parsers. However, all
parsers contain functions to traverse XML trees, access, insert, and delete
nodes (elements) and their attributes.
Specifying valid elements and sub elements
The element declarations of a DTD have a form that is related to that of the
rules of context-free grammars. Each element declaration in a DTD
specifies the structure of one category of elements. The declaration
provides the name of the element whose structure is being defined, along
with the specification of the structure of that element. A DTD describes a
document as a tree in which each element is a node, either a leaf node or
an internal node. If the element is a leaf
node, its syntactic description is its character pattern. If the element is an
internal node, its syntactic description is a list of its child elements, each of
which can be either a leaf node or an internal node.
The form of an element declaration for elements that contain elements is as
follows:
<!ELEMENT element_name ( list of names of child elements ) >
For example, consider the following declaration:
<!ELEMENT memo ( from, to, date, re, body ) >
This element declaration would describe the document tree structure shown
in figure 4.2.
Figure 4.2: Document tree structure
Any child element specification can be followed by one of the modifiers.
Modifier   Meaning
+          One or more occurrences
*          Zero or more occurrences
?          Zero or one occurrence
Consider the following DTD declaration:
<!ELEMENT person (parent+, age, spouse?, sibling* )>
In this example, a person element is specified to have the following children
elements: one or more parent elements, one age element, possibly a
spouse element, and zero or more sibling elements.
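The meaning of the three modifiers can be made concrete with a Python sketch that turns a simple content model into a regular expression. This is an illustration only and not how a real validating parser is built. The functions model_to_regex and children_match are hypothetical, and they handle only comma-separated names with an optional modifier (no nested groups and no | choices).

```python
import re


def model_to_regex(model):
    """Translate a simple DTD sequence such as 'parent+, age' to a regex."""
    parts = []
    for item in model.split(","):
        item = item.strip()
        modifier = item[-1] if item[-1] in "+*?" else ""
        name = item.rstrip("+*?")
        # Each name is followed by a space so that names cannot run together.
        parts.append(f"(?:{re.escape(name)} ){modifier}")
    return re.compile("".join(parts))


def children_match(model, child_names):
    """Check a list of child element names against the content model."""
    sequence = "".join(name + " " for name in child_names)
    return model_to_regex(model).fullmatch(sequence) is not None


PERSON = "parent+, age, spouse?, sibling*"

print(children_match(PERSON, ["parent", "parent", "age", "sibling"]))
print(children_match(PERSON, ["age"]))  # no parent element
print(children_match(PERSON, ["parent", "age", "spouse", "spouse"]))  # two spouses
```

The modifiers +, * and ? carry the same meaning in regular expressions as in a DTD content model, which is why the translation is so direct.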
The leaf nodes of a DTD specify the data types of the content of their parent
nodes, which are elements. In most cases, the content of an element is type
PCDATA, for parsed character data. Two other content types can be
specified: EMPTY and ANY. The EMPTY type is used to specify that the
element has no content. The ANY type is used when the element may
contain literally any content. The form of a leaf element declaration is as
follows:
<!ELEMENT element_name ( #PCDATA ) >
Specifying Valid Attributes
The attributes of an element are declared separately from the element
declaration in a DTD. An attribute declaration must include the name of the
element to which the attribute belongs, the attribute's name, and its type.
Also, it may include a default value. The general form of an attribute
declaration is as follows:
<!ATTLIST element_name attribute_name attribute_type
[default_value]>
The main attribute_type is CDATA. This type is just any string of characters.
The default value in an attribute declaration can specify either an actual
value or a requirement for the value of the attribute in the XML document.
The following table lists the possible default values.
Table 4.1: Possible default values for attributes
Value          Meaning
A value        The value, which is used if none is specified in an element
#FIXED value   The value, which every element will have and which cannot be changed
#REQUIRED      No default value is given; every instance of the element must specify a value
#IMPLIED       No default value is given; the value may or may not be specified in an element
Examples:
<!ATTLIST airplane places CDATA "4">
<!ATTLIST airplane engine_type CDATA #REQUIRED>
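The four kinds of default value can be summarized in a small Python function. This is a hand-written sketch of the rules in the table and not the API of any parser; the name resolve_attribute and its string-based arguments are assumptions made for illustration.

```python
def resolve_attribute(specified, default_decl):
    """Return the effective attribute value, or raise ValueError if invalid.

    specified    -- the value written in the element, or None if absent
    default_decl -- '#REQUIRED', '#IMPLIED', '#FIXED value' or a plain default
    """
    if default_decl == "#REQUIRED":
        if specified is None:
            raise ValueError("attribute is required")
        return specified
    if default_decl == "#IMPLIED":
        return specified  # may legitimately stay None
    if default_decl.startswith("#FIXED "):
        fixed = default_decl[len("#FIXED "):]
        if specified is not None and specified != fixed:
            raise ValueError("attribute must equal the fixed value")
        return fixed
    # A plain default is used only when the element gives no value.
    return default_decl if specified is None else specified


print(resolve_attribute(None, "4"))   # places is omitted, the default applies
print(resolve_attribute("2", "4"))    # places="2" overrides the default
```

With the declarations above, an airplane element without a places attribute is treated as having places="4", while an airplane element without engine_type is invalid.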
Specifying Valid Entities
Entities can be defined so that they can be referenced anywhere in the
content of an XML document, in which case they are called General
Entities. The predefined entities are all general entities. Entities can also be
defined so that they can be referenced only in markup declarations, in which
case they are called Parameter Entities. The form of an entity declaration
that appears in a DTD is shown here:
<!ENTITY [%] entity_name "entity_value">
When the optional percent sign (%) is present in an entity declaration, it
specifies that the entity is a parameter entity rather than a general entity.
Consider the following example of an entity. Suppose that a document
includes a large number of references to the full name of President
Kennedy. You could define an entity to represent his complete name:
<!ENTITY jfk "John Fitzgerald Kennedy">
Any XML document that uses the DTD that includes this declaration can
specify the complete name with just the reference &jfk;
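The expansion of a general entity can be observed with Python's standard parser, which reads entity declarations from an internal DTD. The speech element and its wording are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# The entity is declared in an internal DTD and referenced in the content.
DOC = """<!DOCTYPE speech [
  <!ENTITY jfk "John Fitzgerald Kennedy">
]>
<speech>A quotation from &jfk;.</speech>"""

text = ET.fromstring(DOC).text
print(text)
```

The parser replaces the reference &jfk; with the declared text before the application sees the content.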
The ID, IDREF and IDREFS Types
An attribute can be specified to be an ID type attribute. Attributes specified
as IDREF or IDREFS can then be used to refer to the ID type attributes,
enabling links between elements of a document. ID, IDREF, and IDREFS correspond to
PK/FK (primary key/foreign key) relationships in a database, with a few
differences. In an XML document, the values of ID type attributes must be
distinct. If CustomerID and OrderID attributes are specified as ID type in an
XML document, these values must be distinct. However, in a database,
CustomerID and OrderID columns can have the same values. (For
example, CustomerID = 1 and OrderID = 1 are valid in the database).
For the ID, IDREF, and IDREFS attributes to be valid:
The value of ID must be unique within the XML document.
For every IDREF and IDREFS, the referenced ID values must be in the
XML document.
The value of an ID or IDREF must be a valid XML name, and an IDREFS
value is a whitespace-separated list of such names. (For
example, the integer value 101 cannot be an ID value.)
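The first two rules can be checked by hand with a Python sketch. The standard-library parser does not read attribute types from a DTD, so the function check_ids (a hypothetical helper) is told which attribute plays the ID role and which plays the IDREF role. The shop document is invented for the example.

```python
import xml.etree.ElementTree as ET


def check_ids(root, id_attr, idref_attr):
    """Return a list of problems with ID uniqueness and IDREF targets."""
    problems = []
    seen = set()
    # First pass: every ID value must be unique within the document.
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in seen:
            problems.append(f"duplicate ID {value}")
        seen.add(value)
    # Second pass: every IDREF must point at an ID that exists.
    for elem in root.iter():
        ref = elem.get(idref_attr)
        if ref is not None and ref not in seen:
            problems.append(f"IDREF {ref} has no matching ID")
    return problems


DOC = ET.fromstring(
    "<shop>"
    "<customer id='c1'/><customer id='c1'/>"
    "<order id='o1' customer='c1'/><order id='o2' customer='c9'/>"
    "</shop>"
)

print(check_ids(DOC, "id", "customer"))
```

Note that customers and orders share one pool of ID values here, which is the difference from database keys described above.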
The NMTOKEN and NMTOKENS Type
An XML name token is very close to an XML name. It must consist of the
same characters as an XML name. Furthermore, like an XML name, an
XML name token may not contain whitespace. However, a name token
differs from an XML name in that any of the allowed characters can be the
first character in a name token, while only letters, ideographs, and the
underscore can be the first character of an XML name. Thus 12 and .cshrc
are valid XML name tokens although they are not valid XML names. Every
XML name is an XML name token, but not all XML name tokens are XML
names.
Example:
<!ATTLIST journal year NMTOKEN #REQUIRED>
This still doesn't prevent the document author from assigning the year
attribute values like "99" or "March", but it at least eliminates some possible
wrong values, especially those that contain whitespace such as "1990 C.E."
or "Sally had a little lamb."
A NMTOKENS type attribute contains one or more XML name tokens
separated by whitespace. For example, you might use this to describe the
dates attribute of a performances element, if the dates were given in the
form 08-26-2000, like this:
<performances dates="08-21-2001 08-23-2001 08-27-2001">
Kat and the Kings
</performances>
The appropriate declaration is:
<!ATTLIST performances dates NMTOKENS #REQUIRED>
On the other hand, you could not use this for a list of dates in the form
08/27/2001 because the forward slash is not a legal name character.
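The name token rule can be approximated with a regular expression in Python. This sketch is simplified to ASCII characters, whereas real XML name tokens also allow many non-ASCII letters. The function names is_nmtoken and is_nmtokens are made up for illustration.

```python
import re

# Simplified to ASCII: letters, digits, period, underscore, colon, hyphen.
NMTOKEN = re.compile(r"[A-Za-z0-9._:\-]+")


def is_nmtoken(value):
    """Return True if value is a single (ASCII) name token."""
    return NMTOKEN.fullmatch(value) is not None


def is_nmtokens(value):
    """Return True if value is one or more name tokens split by whitespace."""
    tokens = value.split()
    return bool(tokens) and all(is_nmtoken(token) for token in tokens)


print(is_nmtoken("99"), is_nmtoken("March"), is_nmtoken(".cshrc"))
print(is_nmtoken("1990 C.E."))  # whitespace is not allowed in one token
print(is_nmtokens("08-21-2001 08-23-2001 08-27-2001"))
print(is_nmtokens("08/27/2001"))  # the slash is not a name character
```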
The NOTATION Type
A NOTATION type attribute contains the name of a notation declared in the
document's DTD. This is perhaps the rarest attribute type and isn't much
used in practice. In theory, it could be used to associate types with particular
elements, as well as limiting the types associated with the element. For
example, these declarations define four notations for different image types
and then specify that each image element must have a type attribute that
selects exactly one of them:
<!NOTATION gif SYSTEM "image/gif">
<!NOTATION tiff SYSTEM "image/tiff">
<!NOTATION jpeg SYSTEM "image/jpeg">
<!NOTATION png SYSTEM "image/png">
<!ATTLIST image type NOTATION (gif | tiff | jpeg | png) #REQUIRED>
Enumeration
An enumeration is the only attribute type that is not an XML keyword.
Rather, it is a list of all possible values for the attribute, separated by vertical
bars. Each possible value must be an XML name token.
Example:
<!ATTLIST date month (January | February | March | April | May | June
| July | August | September | October | November | December) #REQUIRED>
Conditional Sections
Conditional sections are portions of the Document Type Declaration or
of external parameter entities which are included in, or excluded from, the
logical structure of the DTD based on the keyword which governs them. The
syntax of the conditional section is:
conditionalSect ::= includeSect | ignoreSect
Validation Tools
There are two main types of validation tools available:
Web based tools
Standalone tools
Web-based tools are web pages that allow you to enter the path (URI) of an
XML document to have it validated. The upside to web-based tools is that
they can be used without installing special software. Just open the web
page in a web browser and go for it! The downside to web-based validation
tools is that they sometimes don't work well when you aren't dealing with
files that are publicly available on the Internet.
Standalone validation tools are tools that you must install on your computer
in order to use. These kinds of tools range from full-blown XML editors such
as XML Spy to command-line XML validators such as the W3C's XSV
validator. Standalone validation tools have the benefit of allowing you to
validate local files with ease. The drawback to these tools is that some of
them aren't cheap, and they must be installed on your computer. However, if
you don't mind spending a little money, a standalone tool can come in
extremely handy.
Self Assessment Questions
4. ________type attribute contains the name of a notation declared in the
document's DTD.
5. _______ is the only attribute type that is not an XML keyword.
4.5 XML Namespaces
This section explains the need for namespaces and how to specify them.
4.5.1 The Need for Namespaces
XML Namespaces provide a method to avoid element name conflicts. In
XML, element names are defined by the developer. This often results in a
conflict when trying to mix XML documents from different XML applications.
This XML carries HTML table information:
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
This XML carries information about a table (a piece of furniture):
<table>
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
If these XML fragments were added together, there would be a name
conflict. Both contain a <table> element, but the elements have different
content and meaning.
4.5.2 Specifying a Namespace
An XML namespace is a collection of element names used in XML
documents. The name of a namespace usually has the form of a Uniform
Resource Identifier (URI). A namespace for the elements of the hierarchy
rooted at a particular element is declared as the value of the attribute
xmlns. The form of a namespace declaration for an element is shown here:
<element_name xmlns [: prefix] = URI>
The square brackets indicate that what is within them is optional. The prefix,
if included, is the name that must be attached to the names in the declared
namespace.
Example:
<birds xmlns:bd = "http://www.audubon.org/names/species">
Within the birds element, including all of its children elements, the names
from the namespace must be prefixed with bd, as in the following:
<bd:lark>
One namespace declaration in an element can be used to declare a default
namespace. This is done by simply leaving out the prefix in the declaration.
The names from the default namespace can be used without a prefix.
Consider the following example in which two namespaces are declared. The
first is declared to be the default namespace; the second defines the prefix,
cap.
<states
xmlns = "http://www.states-info.org/states"
xmlns:cap = "http://www.states-info.org/state-capitals">
<state>
<name> South Dakota </name>
<population> 754844</population>
<capital>
<cap:name> Pierre </cap:name>
<cap:population>12429 </cap:population>
</capital>
</state>
</states>
Each state element has name and population elements from both
namespaces.
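To see how a parser tells the two name elements apart, the document can be loaded with Python's standard xml.etree.ElementTree module, which reports every element name as {namespace-URI}local-name. This is only an illustrative sketch; the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

DOC = """
<states
    xmlns = "http://www.states-info.org/states"
    xmlns:cap = "http://www.states-info.org/state-capitals">
  <state>
    <name> South Dakota </name>
    <population> 754844</population>
    <capital>
      <cap:name> Pierre </cap:name>
      <cap:population>12429 </cap:population>
    </capital>
  </state>
</states>
"""

# ElementTree writes a qualified name as "{namespace-URI}local-name".
STATES = "{http://www.states-info.org/states}"
CAPITALS = "{http://www.states-info.org/state-capitals}"

root = ET.fromstring(DOC)
state = root.find(f"{STATES}state")

# Unprefixed names come from the default namespace.
state_name = state.find(f"{STATES}name").text.strip()

# Names prefixed with cap come from the second namespace.
capital_name = state.find(f"{STATES}capital/{CAPITALS}name").text.strip()

print(root.tag)
print(state_name, "/", capital_name)
```

Although both elements are written as name in the document, the parser sees two different qualified names, so there is no conflict.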
4.5.3 URLs, URIs and URNs
URI (Uniform Resource Identifier):
The resource is the conceptual mapping to an entity or set of entities, not
necessarily the entity which corresponds to that mapping at any particular
instance in time. Thus, a resource can remain constant even when its
content – the entities, to which it currently corresponds – changes over time,
provided that the conceptual mapping is not changed in the process. An
identifier is an object that can act as a reference to something that has
identity. In the case of URI, the object is a sequence of characters with a
restricted syntax.
URL (Uniform Resource Locator):
It refers to the subset of URI that identify resources via a presentation of
their primary access mechanism (e.g., their network "location"), rather than
identifying the resource by name or by some other attribute(s) of that
resource.
URN (Uniform Resource Name):
It refers to the subset of URI that are required to remain globally unique and
persistent even when the resource ceases to exist or becomes unavailable.
A URN differs from a URL in that its primary purpose is persistent labeling
of a resource with an identifier. That identifier is drawn from one of a set of
defined namespaces, each of which has its own set name structure and
assignment procedures.
4.5.4 Qualifying Names
In XML documents, some names may be given as qualified names,
defined as follows:
QName ::= (Prefix ':')? LocalPart
The Prefix provides the namespace prefix part of the qualified name, and
must be associated with a namespace URI in a namespace declaration. The
LocalPart provides the local name part of the qualified name.
4.5.5 Namespace Scoping
The scope of a namespace declaration declaring a prefix extends from the
beginning of the start-tag in which it appears to the end of the corresponding
end-tag, excluding the scope of any inner declarations with the same
NameSpaceAttributeName part. In the case of an empty tag, the scope is
the tag itself.
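The scoping rule can be observed with a small experiment. In the Python sketch below, the prefix p is declared twice; the element names and the urn:example URIs are made up for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """
<outer xmlns:p="urn:example:first">
  <p:item/>
  <inner xmlns:p="urn:example:second">
    <p:item/>
  </inner>
  <p:item/>
</outer>
"""

root = ET.fromstring(DOC)

# Collect the fully qualified name of every item element, in document order.
tags = [element.tag for element in root.iter() if element.tag.endswith("}item")]

for tag in tags:
    print(tag)
```

The item inside inner resolves to the second URI because the inner declaration hides the outer one; after the inner end-tag the outer declaration applies again.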
4.5.6 The HTML Namespace
Namespaces have the potential to allow new names to be introduced
without breaking validity, although this potential has not yet materialized in
the HTML world. HTML has an area of naming, separate from element and
attribute names, for which extensibility has not been completely addressed.
These are used as values of the "rel" and "rev" attributes on "a" and "link"
links and they are strings drawn from a set determined by the head's profile
attribute. The linktype and profile mechanisms are rarely used, and there is
some perception that they have never been sufficiently defined. They are,
however, well enough defined to tie them to RDF's linking mechanism,
which is an area of considerable semantic precision and extensibility. This
profile can be used to perform exactly this connection.
4.5.7 Additional Significant Namespaces
Namespaces originally designed to provide names for XML elements and
attributes have been adopted much more broadly by the web community.
They are now used not simply for elements and attributes but for function
names, tokens, and identifiers for ever more purposes. The names in a
namespace form a collection:
sometimes it is a collection of element names (DocBook and XHTML, for
example)
sometimes it is a collection of attribute names (XLink, for example)
sometimes it is a collection of functions (XQuery 1.0 and XPath 2.0 Data
Model)
sometimes it is a collection of properties (FOAF)
sometimes it is a collection of concepts (WordNet)
Many other uses are likely to arise.
4.5.8 Validating Uniqueness
Namespaces are used to uniquely identify elements with the same name
type when they are combined in a single document. The W3C
recommendations envision applications of XML where a single XML
document may contain elements and attributes that are defined for and used
by multiple software modules. Documents combining multiple markup
vocabularies pose processor validation problems of recognition and name
collision (markups using the same element type or attribute name). The
"name collisions" are a problem when validating documents. This problem
is overcome when document constructs have universally unique names,
beyond the scope of the containing document. The XML namespaces
specification describes a mechanism to accomplish this.
4.5.9 Validating Required Fields
Some of the URLs that you will use with field substitution have restrictions
on the type of data that they will accept, so GrazrScript has a rule tag that
lets you test the contents of form fields before they are substituted within a
template. The validation rules created by this tag are tested after the form is
submitted. If the user's data conforms to the validation rules, then the
normal substitution is performed. If any of the entered data fails to pass the
rules, then an error message is displayed and the form template is not run.
The key is that this is an all or nothing process. All validation rules for all
fields must be met before substitution is done.
Basic syntax
The simplest version of the rule tag takes the following form:
<grazr:rule field="[field_name]" [rule]="[value]" />
4.5.10 Combining and Redefining Schemas
User demand for rich Web application content is continually increasing for
both desktop and mobile device platforms. Open, standards-based
functional XML schemas enabling rich content help ensure that such content
– and the skills required to produce it – remains ubiquitous, accessible, and
cost effective. Schemas also help ensure that this technology does not
become a proprietary format for a single or small number of vendors
constrained to specific programming frameworks or to specific renderer and
browser technologies.
XML-based, declarative functional schemas like XHTML, XForms, XML
Events, Scalable Vector Graphics (SVG), SMIL, VoiceXML, and XHTML
Mobile Profile are examples of schemas that provide specific functionality
for creating rich content.
Each functional schema pertains to a specific area of functionality. For
example, SVG addresses graphics; XForms addresses form input collection
and submission; XML Events addresses the creation of events and
listeners; and so on. However, most rich Web applications require a
combination of two or more of these functional schemas within a single
document.
Combining schemas can be problematic because not all schemas can be
embedded within other schemas. And not all schemas allow other schemas
to be embedded within them. In fact, most functional schemas assume that
they are the root schema in a single document with only one functional
namespace and that if the need arises for rich content from another
functional namespace, a separate document can be referenced with its own
root schema. For example, an XHTML document can reference an SVG
graphic in a separate document at runtime to render the graphic.
Self Assessment Questions
6. _______is a collection of element names used in XML documents.
7. _________ are used to uniquely identify elements with the same name
type when they are combined in a single document.
4.6 Summary
XML is a markup language much like HTML.
XML was designed to carry data, not to display data. XML is used in
many aspects of web development, often to simplify data storage and
sharing.
The purpose of a DTD is to define a standard form for a collection of
XML documents. This form is specified as the tag and attributes sets, as
well as rules that define how they can appear in a document.
XML Namespaces provide a method to avoid element name conflicts.
4.7 Terminal Questions
1. List out the various XML tools.
2. With an example explain the need for CDATA section in XML.
3. Explain the usage of internal and external DTDs in XML.
4. How do you specify an element in DTD?
5. With an example, explain how you specify a namespace in XML.
4.8 Answers
Self Assessment Questions
1. XML
2. Document Type Declaration
3. CDATA Sections
4. NOTATION
5. Enumeration
6. XML namespace
7. Namespaces
Terminal Questions
1. XML applications are software programs that process and manipulate
data using XML technologies including XML, XSLT, XQuery. (Refer
section 4.2.5)
2. CDATA Sections are used to escape blocks of text containing
characters which would otherwise be recognized as markup. (Refer
section 4.3)
3. A DTD can be embedded in the XML document whose syntax rules it
describes, in which case it is called an internal DTD. (Refer section 4.4)
4. The element declarations of a DTD have a form that is related to that of
the rules of context-free grammars. (Refer section 4.4)
5. XML Namespaces provide a method to avoid element name conflicts.
(Refer section 4.5.1)
Unit 5 XML Programming – II
Structure
5.1 Introduction
Objectives
5.2 Validating XML Documents with Schemas
5.3 Introduction to Simple Object Access Protocol (SOAP)
SOAP's Use of XML and Schemas
Elements of a SOAP Message
Sending and Receiving SOAP Messages (SOAP Clients and
Receivers)
Handling SOAP Faults
Current SOAP Implementations
5.4 Introduction to Web Services
Architecture and Advantages of Web Services
Purpose of Web Services Description Language (WSDL)
WSDL Elements
Creating and Examining WSDL Files
Overview of Universal Description, Discovery, and Integration (UDDI)
UDDI Registries (Public and Private)
Core UDDI Elements
Deploying and Consuming Web Services
ebXML Specifications ebXML Registry and Repository
5.5 Introduction to the XML Document Object Model (XMLDOM)
5.6 Summary
5.7 Terminal Questions
5.8 Answers
5.1 Introduction
In the previous unit, we studied XML concepts, document syntax,
DTDs, NOTATION and namespaces.
A schema is similar to a class definition. In this unit you will
study how schemas are used in XML. You will also study an
overview of SOAP (Simple Object Access Protocol), an introduction to
web services, and an overview of the XML Document Object Model.
Objectives:
After studying this unit, you should be able to:
write XML Schema
describe the features of SOAP
give overview of Web Services
discuss the purpose of XML DOM
5.2 Validating XML Documents with Schemas
You are now ready to take a deeper look at the process of XML Schema
validation. This section shows you the steps you take to validate an XML
document using an XML Schema definition.
Schema Design Goals (Limitations of DTDs)
DTDs have several disadvantages.
DTDs are written in a syntax unrelated to XML, so they cannot be
analyzed with an XML processor. Also, it can be confusing to deal with
two different syntactic forms, one to define a document and one to
define its structure.
DTDs do not allow restrictions on the form of data that can be the
content of a particular tag
With DTDs, there are only 10 data types, none of which is numeric
Several alternatives to DTDs have been developed, all attempts to
overcome their weaknesses. XML schema, which was designed by W3C, is
one of these alternatives.
Mixing DTDs and Schemas
To promote the transition from DTDs to XML schemas, XML schema was
designed to allow any DTD to be automatically converted to an equivalent
XML schema. A schema specifies the data type of every element and
attribute of its instance XML documents. This is the area in which schemas
far outshine DTDs. A schema defines a namespace in the same sense as a
DTD defines a tag set.
Schema Composition
Schemas themselves are written using a collection of names, or a
vocabulary, from a namespace that is, in effect, a schema of schemas. The
name of this namespace is http://www.w3.org/2001/XMLSchema. Some of
the names in this namespace are element, schema, sequence and string.
Every schema has schema as its root element. The schema element
specifies the namespace for the schema of schemas from which the
schema’s elements and attributes will be drawn. It often also specifies a
prefix that will be used for the names in the schema. This namespace
specification appears as follows:
xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
This provides the prefix xsd for the names from the namespace for the
schema of schemas. The name of the namespace defined by a schema
must be specified with the targetNamespace attribute of the schema
element. Every top-level element that appears in a schema places its name
in the target namespace. The target namespace is specified by assigning a
namespace to the targetNamespace attribute, as in the following:
targetNamespace = "http://cs.uccs.edu/planeSchema"
If we want the elements and attributes that are not defined directly in the
schema element to be included in the target namespace, schema’s
elementFormDefault must be set to qualified, as in the following:
elementFormDefault = "qualified"
The default namespace, which is the source of the unprefixed names in the
schema, is given with another xmlns specification, but this time without the
prefix. For example:
xmlns = "http://cs.uccs.edu/planeSchema"
An example of a complete opening tag for a schema is as follows:
<!-- xmlns:xsd: the namespace for the schema itself (prefix is xsd) -->
<!-- targetNamespace: the namespace where elements defined here will be placed -->
<!-- xmlns: the default namespace for this document (no prefix) -->
<!-- elementFormDefault: put non-top-level elements in the target namespace -->
<xsd:schema
xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
targetNamespace = "http://cs.uccs.edu/planeSchema"
xmlns = "http://cs.uccs.edu/planeSchema"
elementFormDefault = "qualified"
>
In this example, the target namespace and the default namespace are the
same.
Linking Schemas to XML documents
First, an instance document normally defines its default namespace to be
that defined in its schema. For example, if the root element is planes, you
could have the following:
<planes
xmlns = "http://cs.uccs.edu/planeSchema"
… >
Second, the instance document must name the standard namespace
for instances, which is XMLSchema-instance. This namespace corresponds
to the XMLSchema namespace used for schemas. The following attribute
assignment specifies the XMLSchema-instance namespace and defines
the prefix, xsi, for it:
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
Then the instance document must specify the filename of the schema where
the default namespace is defined. This is accomplished with the
schemaLocation attribute, which takes two values: the namespace of the
schema and the filename of the schema. This attribute is defined in the
XMLSchema-instance namespace, so it must be named with the proper
prefix.
For example:
xsi:schemaLocation = "http://cs.uccs.edu/planeSchema planes.xsd"
This is a peculiar attribute assignment in that it assigns two values, which are
separated only by white space.
Altogether, the opening root tag of an XML instance of the planes.xsd
schema, where the root element name in the instance is planes, could
appear as follows:
<planes
xmlns = "http://cs.uccs.edu/planeSchema"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation = "http://cs.uccs.edu/planeSchema planes.xsd"
>
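A program that wants to locate the schema for an instance document can read the schemaLocation attribute and split it into namespace and location pairs. The following Python sketch uses only the standard library; the helper name schema_locations is hypothetical, and the sketch only reads the attribute without performing any schema validation.

```python
import xml.etree.ElementTree as ET

# The standard namespace for instances.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

DOC = """
<planes
    xmlns = "http://cs.uccs.edu/planeSchema"
    xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation = "http://cs.uccs.edu/planeSchema planes.xsd">
</planes>
"""


def schema_locations(root):
    """Map each namespace named in xsi:schemaLocation to its schema file."""
    raw = root.get(f"{{{XSI}}}schemaLocation", "")
    parts = raw.split()  # the values are separated only by white space
    return dict(zip(parts[0::2], parts[1::2]))


root = ET.fromstring(DOC)
locations = schema_locations(root)
print(locations)
```

Because the attribute belongs to the XMLSchema-instance namespace, it must be looked up by its qualified name rather than by the bare name schemaLocation.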
Annotation Declarations
Annotation of schemas and schema components, with material for human or
computer consumption, is provided for by allowing application information
and human information at the beginning of most major schema elements,
and anywhere at the top level of schemas. The XML representation for an
annotation schema component is an <annotation> element information item.
The correspondences between the properties of that information item and
properties of the component it corresponds to are as follows:
<annotation
id = ID
{any attributes with non-schema namespace . . .}>
Content: (appinfo | documentation)*
</annotation>
Application Information – A sequence of the <appinfo> element information
items from among the [children], in order, if any, otherwise the empty
sequence.
Element Declarations
Elements are defined in an XML schema with the element tag, which is
from the XMLSchema namespace. The prefix xsd is normally used for
names from this namespace.
Example:
<xsd:element name = "engine" type = "xsd:string" />
Here the element name is “engine” and its type is string.
An instance of the schema in which the engine element is defined could
have the following element:
<engine> inline six cylinder fuel injected </engine>
Attribute Declarations
An element that is named includes the name attribute for that purpose. The
other attribute that is necessary in a simple element declaration is type,
which is used to specify the type of content allowed in the element. For
example:
<xsd:element name = "engine" type = "xsd:string" />
An element can be given a default value using the default attribute. For
example:
<xsd:element name = "engine" type = "xsd:string" default = "fuel injected V-6" />
Elements can have constant values, meaning that the content of the defined
element in every instance document has the same value. Constant values
are given with the fixed attribute, as in the following example:
<xsd:element name = "plane" type = "xsd:string" fixed = "single wing" />
W3C Schema Data Types
XML schema defines 44 data types, 19 of which are primitive and 25 of
which are derived. The primitive data types include string, boolean, float,
decimal, time and anyURI. The predefined derived types include byte, long,
integer, unsignedInt, positiveInteger and NMTOKEN. User defined data types are
defined by specifying restrictions on an existing type, which is then called a
base type. Such user-defined types are derived types.
Constraints in derived types are given in terms of the facets of the base
type. For example, the facets of the integer data type include
totalDigits, maxInclusive, maxExclusive, minInclusive, minExclusive, pattern,
enumeration and whiteSpace.
Data declarations in an XML schema can be either local or global. A local
declaration is one that appears inside an element that is a child of the
schema element; that is, a declaration in a grandchild element of schema is
a local declaration. A locally declared element is visible only in that element.
A global declaration is one that appears as a child of the schema element.
Global elements are visible in the whole schema in which they are declared.
Specifying Simple Types
A simple data type is one whose content is restricted to strings. A simple
type cannot have attributes or include nested elements. The string
restriction seems like it would make simple types a very narrow type
category, but in fact it does not because a large collection of predefined data
types are included in the category. The primitive data types include string,
boolean, float, decimal, time and anyURI. The predefined derived types include byte,
long, integer, unsignedInt, positiveInteger and NMTOKEN.
Example: <xsd:element name = "engine" type = "xsd:string" />
Regular Expressions
Regular expressions form a language for specifying sets of characters and
strings of characters. They are used in the context of pattern matching: a
regular expression forms a pattern against which strings are matched. The
pattern facet allows you to specify a regular expression. Most individual
characters match themselves. The pattern \d matches any digit. The pattern
“\d{3}” matches any sequence of 3 digits. The pattern 315-\d{3}-\d{4}
matches any telephone number in the 315 area code of the United States, in
the 315-123-4567 format.
Example: To define a simple type of US telephone numbers in the
315-123-4567 format, use the pattern facet.
<simpleType name = "USPhoneType">
<restriction base = "string">
<pattern value = "\d{3}-\d{3}-\d{4}" />
</restriction>
</simpleType>
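The effect of the pattern facet can be tried out with an ordinary regular expression library. The Python sketch below is not a schema validator; it relies on the fact that a schema pattern must match the entire value, which is why it uses fullmatch. The function name is hypothetical.

```python
import re

# The same expression as in the pattern facet of USPhoneType.
US_PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")


def is_us_phone(value: str) -> bool:
    """True if the whole value has the form ddd-ddd-dddd."""
    return US_PHONE.fullmatch(value) is not None


for candidate in ["315-123-4567", "315-1234-567", "call 315-123-4567"]:
    print(candidate, is_us_phone(candidate))
```

Only the first candidate is accepted; the third fails because the pattern has to cover the complete string, not just part of it.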
Working with User Defined Data Types
User defined data types are defined by specifying restrictions on an existing
type. A simple user-defined data type is described in a simpleType element,
using facets. Facets must be specified in the content of a restriction
element, which gives the base type name. The facets themselves are given
in elements named for the facets, using the value attribute to specify the
value of the facet. For example, the following declares a user-defined type,
firstName, for strings of fewer than 11 characters:
<xsd:simpleType name = "firstName">
<xsd:restriction base = "xsd:string">
<xsd:maxLength value = "10" />
</xsd:restriction>
</xsd:simpleType>
The length facet is used to restrict the string to an exact number of
characters. The minLength facet is used to specify a minimum length. The
maximum number of digits of a decimal number is restricted with the
totalDigits facet.
For example:
<xsd:simpleType name = "phoneNumber">
<xsd:restriction base = "xsd:decimal">
<xsd:totalDigits value = "7" />
</xsd:restriction>
</xsd:simpleType>
Union and List Types
List datatypes are special cases in which a structure is defined within the
content of a single attribute or element. The xs:list element is used to
define list of items. The definition of a list datatype can be done by
embedding an xs:simpleType element:
<xs:simpleType name="myIntegerList">
<xs:list>
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:maxInclusive value="100"/>
</xs:restriction>
</xs:simpleType>
</xs:list>
</xs:simpleType>
This datatype can be used to define attributes or elements that accept a
whitespace-separated list of integers smaller than or equal to 100.
List datatypes have their own value space that can be constrained using a
set of specific facets that is common to all of them. These facets are
xs:length, xs:maxLength, xs:minLength, xs:enumeration and xs:whiteSpace.
The unit used to measure the length of a list type is always the number of
elements in the list.
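The rule that a list is measured in items, not characters, can be illustrated by imitating the myIntegerList check by hand. This Python sketch is an approximation written for this discussion, not the behaviour of any particular validator; the function names and the max_items parameter (standing in for an xs:maxLength facet) are assumptions.

```python
import re

# Lexical form of an integer: optional sign followed by decimal digits.
INTEGER_RE = re.compile(r"[+-]?[0-9]+")


def is_small_integer(token: str) -> bool:
    """Item type: an integer with maxInclusive = 100."""
    return INTEGER_RE.fullmatch(token) is not None and int(token) <= 100


def is_my_integer_list(value: str, max_items=None) -> bool:
    """Check a whitespace-separated list of integers no larger than 100.

    max_items imitates a maxLength facet on the list: it counts list items,
    not characters.
    """
    items = value.split()
    if max_items is not None and len(items) > max_items:
        return False
    return all(is_small_integer(item) for item in items)


print(is_my_integer_list("1 50 100"))
print(is_my_integer_list("1 101"))
print(is_my_integer_list("1 2 3", max_items=2))
```

The third call fails even though the string is short, because the list holds three items and the assumed limit is two.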
Derivation by union allows defining datatypes by merging the lexical spaces
of several predefined or user datatypes. The xs:union element is used for
defining the union of different types. The definition of a union datatype can
be done by embedding an xs:simpleType element:
<xs:simpleType name="myIntegerUnion">
<xs:union>
<xs:simpleType>
<xs:restriction base="xs:integer"/>
</xs:simpleType>
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="undefined"/>
</xs:restriction>
</xs:simpleType>
</xs:union>
</xs:simpleType>
The myIntegerUnion type therefore accepts any value that is either an
integer or the name token undefined.
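A union check simply tries each member type in turn and accepts the value if any of them accepts it. The Python sketch below imitates myIntegerUnion by hand; it is an illustration under that assumption, not a real schema validator, and the function name is hypothetical.

```python
import re

# Lexical form of an integer: optional sign followed by decimal digits.
INTEGER_RE = re.compile(r"[+-]?[0-9]+")


def is_my_integer_union(value: str) -> bool:
    """Accept a value that is an integer or the enumerated token 'undefined'."""
    token = value.strip()
    is_integer = INTEGER_RE.fullmatch(token) is not None
    is_undefined = token == "undefined"
    return is_integer or is_undefined


for candidate in ["42", "undefined", "unknown", "4.2"]:
    print(candidate, is_my_integer_union(candidate))
```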
Specifying Complex Types
Complex types are defined with the complexType tag. The elements that are
the content of an element-only element must be contained in an ordered
group, an unordered group, a choice, or a named group. The sequence
element is used to contain an ordered group of elements. For example,
consider the following type definition:
<xsd:complexType name = "sports_car">
<xsd:sequence>
<xsd:element name = "make" type = "xsd:string" />
<xsd:element name = "model" type = "xsd:string" />
<xsd:element name = "engine" type = "xsd:string" />
<xsd:element name = "year" type = "xsd:decimal" />
</xsd:sequence>
</xsd:complexType>
Here sports_car is a complex data type.
A complex type whose elements are an unordered group is defined in an all
element.
Elements and all and sequence groups can include attributes to specify the
numbers of occurrences. These attributes are minOccurs and maxOccurs.
The possible values of minOccurs are the non-negative integers, including
zero. The possible values for maxOccurs are the non-negative integers plus
the value unbounded. Consider the following example:
<xsd:element name = "planes">
<xsd:complexType>
<xsd:sequence>
<xsd:element name = "make"
type = "xsd:string"
minOccurs = "1"
maxOccurs = "unbounded"
/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
Notice that we use the sequence element to contain the single element of the
complex type, planes. An all group would not work here: XML Schema 1.0
limits maxOccurs to 0 or 1 for elements inside all, so a repeating element
must be placed in a sequence.
Deriving Complex Types Using Inheritance
If we want the year element in the sports_car type that was defined
earlier to be a derived type, we could define the derived type as another
global element and refer to it in the sports_car type. For example, the
year element could be defined as follows:
<xsd:element name = "year">
<xsd:simpleType>
<xsd:restriction base = "xsd:decimal">
<xsd:minInclusive value = "1900" />
<xsd:maxInclusive value = "2002" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
With the year element defined globally, the sports_car element can be
defined with a reference to the year with the ref attribute as follows:
<xsd:complexType name = "sports_car">
<xsd:sequence>
<xsd:element name = "make" type = "xsd:string" />
<xsd:element name = "model" type = "xsd:string" />
<xsd:element name = "engine" type = "xsd:string" />
<xsd:element ref = "year" />
</xsd:sequence>
</xsd:complexType>
Reusable Groups
Elements and Attributes can be grouped together using <xs:group> and
<xs:attributeGroup>. These groups can then be referred to elsewhere within
the schema. Groups must have a unique name and be defined as children
of the <xs:schema> element. When a group is referred to, it is as if its
contents have been copied into the location it is referenced from.
<xs:group name="CustomerDataGroup">
<xs:sequence>
<xs:element name="Forename" type="xs:string" />
<xs:element name="Surname" type="xs:string" />
<xs:element name="Dob" type="xs:date" />
</xs:sequence>
</xs:group>
<xs:attributeGroup name="DobPropertiesGroup">
<xs:attribute name="Day" type="xs:string" />
<xs:attribute name="Month" type="xs:string" />
<xs:attribute name="Year" type="xs:integer" />
</xs:attributeGroup>
Substitution Groups
In this case, we have a simple type on one hand and a complex type with
complex content on the other, and we cannot find a type that can be
extended to both. We have no other choice but to start with the universal
type, which accepts any content model. Known as xs:anyType, this very
special type is also the default value when no type is specified, and we can
define a generic name element without giving any type definition to keep it
as open as possible:
<xs:element name="name"/>
This element will be what is known as the head of the substitution group.
Without declaring anything on this head element, other elements can
declare that they can be used wherever the head element is referenced in
the schema. These elements are known as the members of the substitution
group. The one restriction on the members is their types must be valid
derivations of the type of the head element. This declaration is made
through a substitutionGroup attribute that references the head element in
each interchangeable element – for instance:
<xs:element name="simple-name" type="string32"
substitutionGroup="name"/>
<xs:element name="full-name" substitutionGroup="name">
<xs:complexType>
<xs:all>
<xs:element name="first" type="string32" minOccurs="0"/>
<xs:element name="middle" type="string32" minOccurs="0"/>
<xs:element name="last" type="string32"/>
</xs:all>
</xs:complexType>
</xs:element>
The effect of these declarations is that these two elements can be used
wherever the head is used in the schema, such as in the definition of the
character and author elements:
<xs:element name="character">
<xs:complexType>
<xs:sequence>
<xs:element ref="name"/>
<xs:element ref="born"/>
<xs:element ref="qualification"/>
</xs:sequence>
</xs:complexType>
</xs:element>
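To see which element names become acceptable where the head is referenced, the members of a substitution group can be collected from the schema. The following Python sketch assumes the declarations above are wrapped in an xs:schema element; the substitution_members helper is a hypothetical name, and string32 is treated as an opaque type name because its definition is not shown in this section.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name"/>
  <xs:element name="simple-name" type="string32"
              substitutionGroup="name"/>
  <xs:element name="full-name" substitutionGroup="name">
    <xs:complexType>
      <xs:all>
        <xs:element name="first" type="string32" minOccurs="0"/>
        <xs:element name="middle" type="string32" minOccurs="0"/>
        <xs:element name="last" type="string32"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def substitution_members(schema_text, head):
    """Return the global elements that declare `head` as their group head."""
    root = ET.fromstring(schema_text)
    # Only global declarations (children of xs:schema) can be members.
    return [el.get("name")
            for el in root.findall(XS + "element")
            if el.get("substitutionGroup") == head]


members = substitution_members(SCHEMA, "name")
# Names allowed wherever <xs:element ref="name"/> appears.
allowed = ["name"] + members
print(allowed)
```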
Identity Elements
XML Schemas provide a feature that is similar to the DTD ID identity
constraint. In a DTD, the value of an ID attribute must be unique within an
XML document. In XML Schemas, the type of an identity constraint can be
unique, key, or keyref.
A unique identity constraint forces the result of evaluating an XPath
expression to be unique. A validating processor evaluates the XPath
expression against the element for which the identity constraint is defined.
Where the selected value is present, it must be unique within that
element.
A key identity constraint specifies that the fields that form the expression
must be present in all instance documents. For example, if a key is
based on date and number attributes, the date and number attributes
must always be specified.
A keyref identity constraint is comparable to the IDREF attribute in DTDs.
It specifies that the contents of a field in the instance document must match
the value of a key (or unique constraint) defined elsewhere in the same
document. For example, a Quote element would hold a reference to the RFQ
that originated it.
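The difference between the three constraint types can be illustrated by checking a small instance document by hand. This Python sketch does not use a schema validator; it only mimics the rules. The RFQ and Quote names follow the example in the text, while the attribute names (date, number, rfq) and the helper functions are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: the second Quote refers to a missing RFQ.
DOC = """
<orders>
  <RFQ date="2009-06-10" number="1"/>
  <RFQ date="2009-06-10" number="2"/>
  <Quote rfq="1"/>
  <Quote rfq="7"/>
</orders>
"""


def check_key(root, path, fields):
    """Key rule: every field must be present and the values must be unique.

    A 'unique' constraint would differ only in tolerating missing fields.
    """
    seen, errors = set(), []
    for el in root.findall(path):
        values = tuple(el.get(f) for f in fields)
        if None in values:
            errors.append(f"{el.tag}: missing key field")
        elif values in seen:
            errors.append(f"{el.tag}: duplicate key {values}")
        else:
            seen.add(values)
    return seen, errors


def check_keyref(root, path, field, keys):
    """Keyref rule: each reference must match an existing key value."""
    return [f"{el.tag}: no key for {el.get(field)!r}"
            for el in root.findall(path)
            if (el.get(field),) not in keys]


root = ET.fromstring(DOC)
keys, key_errors = check_key(root, "RFQ", ["number"])
ref_errors = check_keyref(root, "Quote", "rfq", keys)
print(key_errors)
print(ref_errors)
```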
Self-assessment Questions
1. DTDs cannot be analyzed with an ______.
2. The name of the namespace defined by a schema must be specified
with the _________ attribute of the schema element.
3. Data definitions in an XML schema can be either ____ or ______.
4. Complex types are defined with _____ tag.
5. In XML schemas, the type of an identity constraint can be ___, ____, or
____.
5.3 Introduction to Simple Object Access Protocol (SOAP)
SOAP is an XML-based protocol for exchanging information between
computers. Although SOAP can be used in a variety of messaging systems
and can be delivered via a variety of transport protocols, the initial focus of
SOAP is remote procedure calls transported via HTTP. SOAP therefore
enables client applications to easily connect to remote services and invoke
remote methods. For example, a client application can immediately add
language translation to its feature set by locating the correct SOAP service
and invoking the correct method.
Other frameworks, including CORBA, DCOM, and Java RMI, provide similar
functionality to SOAP, but SOAP messages are written entirely in XML and
are therefore uniquely platform- and language-independent. For example, a
SOAP Java client running on Linux or a Perl client running on Solaris can
connect to a Microsoft SOAP server running on Windows 2000.
SOAP therefore represents a cornerstone of the web service architecture,
enabling diverse applications to easily exchange services and data.
5.3.1 SOAP’s Use of XML and Schemas
When exploring the SOAP encoding rules, it is important to note that the
XML 1.0 specification does not include rules for encoding data types. The
original SOAP specification therefore had to define its own data encoding
rules. Subsequent to early drafts of the SOAP specification, the W3C
released the XML Schema specification. The XML Schema Data types
specification provides a standard framework for encoding data types within
XML documents. The SOAP specification therefore adopted the XML
Schema conventions. However, even though the latest SOAP specification
adopts all the built-in types defined by XML Schema, it still maintains its own
convention for defining constructs not standardized by XML Schema, such
as arrays and references.
5.3.2 Elements of a SOAP Message
A one-way message, a request from a client, or a response from a server is
officially referred to as a SOAP message. Every SOAP message has a
mandatory Envelope element, an optional Header element, and a
mandatory Body element.
Figure 5.1: Main elements of the XML SOAP message
Envelope
Every SOAP message has a root Envelope element. In contrast to other
specifications, such as HTTP and XML, SOAP does not define a traditional
versioning model based on major and minor release numbers (e.g., HTTP
1.0 versus HTTP 1.1). Rather, SOAP uses XML namespaces to differentiate
versions. The version must be referenced within the Envelope element. For
example:
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
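Because the version is carried by the namespace, a receiver can identify it by inspecting the Envelope element. The Python sketch below assumes two namespace values: the SOAP 1.1 namespace shown above and the SOAP 1.2 namespace, which is not mentioned in this unit. The soap_version function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Envelope namespace -> SOAP version. The 1.2 entry is not from this unit.
SOAP_VERSIONS = {
    "http://schemas.xmlsoap.org/soap/envelope/": "1.1",
    "http://www.w3.org/2003/05/soap-envelope": "1.2",
}


def soap_version(message):
    """Return the SOAP version implied by the Envelope namespace."""
    root = ET.fromstring(message)
    # ElementTree writes qualified names as "{namespace}localname".
    namespace, _, local = root.tag[1:].partition("}")
    if local != "Envelope" or namespace not in SOAP_VERSIONS:
        raise ValueError("not a recognised SOAP envelope")
    return SOAP_VERSIONS[namespace]


sample = (
    '<SOAP-ENV:Envelope '
    'xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
    '<SOAP-ENV:Body/></SOAP-ENV:Envelope>'
)
print(soap_version(sample))
```

Note that the prefix (SOAP-ENV here) plays no part in the check; only the namespace URI matters.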
Header
The optional Header element offers a flexible framework for specifying
additional application-level requirements. Many current SOAP services do
not utilize the Header element, but as SOAP services mature, the Header
framework provides an open mechanism for authentication, transaction
management, and payment authorization.
The protocol does, however, specify two header attributes:
Actor attribute
The SOAP protocol defines a message path as a list of SOAP service
nodes. Each of these intermediate nodes can perform some processing and
then forward the message to the next node in the chain. By setting the Actor
attribute, the client can specify the recipient of the SOAP header.
MustUnderstand attribute
Indicates whether a Header element is optional or mandatory. If set to true,
the recipient must understand and process the Header element according to
its defined semantics, or return a fault.
Body
The Body element is mandatory for all SOAP messages. Typical uses of the
Body element include RPC requests and responses.
Fault
In the event of an error, the Body element will include a Fault element.
5.3.3 Sending and Receiving SOAP Messages (SOAP Clients and
Receivers)
SOAP can be used in a variety of messaging systems, including one-way
and two-way messaging. For two-way messaging, SOAP defines a simple
convention for representing remote procedure calls and responses. This
enables a client application to specify a remote method name, include any
number of parameters, and receive a response from the server.
To examine the specifics of the SOAP protocol, we begin by presenting a
sample SOAP conversation. XMethods.net provides a simple weather
service, listing current temperature by zip code. The service method,
getTemp, requires a zip code string and returns a single float value.
The SOAP Request
The client request must include the name of the method to invoke and any
required parameters. Here is a sample client request sent to XMethods:
<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:getTemp
        xmlns:ns1="urn:xmethods-Temperature"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <zipcode xsi:type="xsd:string">10016</zipcode>
    </ns1:getTemp>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
First, the request includes a single mandatory Envelope element, which in
turn includes a mandatory Body element. Second, a total of four XML
namespaces are defined. The Body element encapsulates the main
"payload" of the SOAP message. The only element is getTemp, which is
tied to the XMethods namespace and corresponds to the remote method
name. Each parameter to the method appears as a subelement. In our
case, we have a single zipcode element, which is assigned the XML
Schema xsd:string data type and set to 10016.
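A request of this shape can also be assembled programmatically. The Python sketch below uses only the standard library; build_rpc_request is a hypothetical helper, and the generated prefixes may differ from the listing (for example ns0 instead of ns1) while remaining equivalent XML.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"
ENCODING = "http://schemas.xmlsoap.org/soap/encoding/"


def build_rpc_request(method_ns, method, params):
    """Build a SOAP 1.1 RPC request.

    params maps a parameter name to an (xsd type, value) pair.
    """
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    ET.register_namespace("xsi", XSI)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    # The xsd prefix occurs only inside attribute values, so ElementTree
    # would not declare it on its own; declare it explicitly.
    envelope.set("xmlns:xsd", XSD)
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{method_ns}}}{method}")
    call.set(f"{{{SOAP_ENV}}}encodingStyle", ENCODING)
    for name, (xsd_type, value) in params.items():
        param = ET.SubElement(call, name)  # one subelement per parameter
        param.set(f"{{{XSI}}}type", f"xsd:{xsd_type}")
        param.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_rpc_request(
    "urn:xmethods-Temperature", "getTemp", {"zipcode": ("string", "10016")}
)
print(request_xml)
```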
The SOAP Response
Here is the SOAP response from XMethods:
<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:getTempResponse
        xmlns:ns1="urn:xmethods-Temperature"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:float">71.0</return>
    </ns1:getTempResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
Just like the request, the response includes Envelope and Body elements,
and the same four XML namespaces. This time, however, the Body element
includes a single getTempResponse element, corresponding to our initial
request. The response element includes a single return element, indicating
an xsd:float data type. As of this writing, the temperature for zip code 10016
is 71 degrees Fahrenheit.
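On the client side, the return value has to be pulled out of the Body and converted to a native type. The following Python sketch shows one way to do this for the response above. The read_rpc_result name is hypothetical, and the code compares the xsi:type value as a literal string rather than resolving the xsd prefix, which a full SOAP toolkit would do.

```python
import xml.etree.ElementTree as ET

NS = {"soap": "http://schemas.xmlsoap.org/soap/envelope/"}
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

RESPONSE = """<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:getTempResponse
        xmlns:ns1="urn:xmethods-Temperature"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:float">71.0</return>
    </ns1:getTempResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def read_rpc_result(message):
    """Return the value of the <return> element of an RPC response."""
    root = ET.fromstring(message)
    body = root.find("soap:Body", NS)
    if body is None or len(body) != 1:
        raise ValueError("expected a Body with a single response element")
    result = body[0].find("return")
    if result is None:
        raise ValueError("response carries no return element")
    # Simplification: the prefix in "xsd:float" is not resolved.
    if result.get(XSI_TYPE) == "xsd:float":
        return float(result.text)
    return result.text


temperature = read_rpc_result(RESPONSE)
print(temperature)
```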
5.3.4 Handling SOAP Faults
In the event of an error, the Body element will include a Fault element. The
fault subelements include the faultcode, faultstring, faultactor, and detail
elements.
faultcode – A text code used to indicate a class of errors.
faultstring – A human-readable explanation of the error.
faultactor – A text string indicating who caused the fault. This is useful if
the SOAP message travels through several nodes in the SOAP message
path, and the client needs to know which node caused the error.
detail – An element used to carry application-specific error messages.
The following code is a sample Fault. The client has requested a method
named ValidateCreditCard, but the service does not support such a method.
This represents a client request error, and the server returns the following
SOAP response:
<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <SOAP-ENV:Fault>
      <faultcode xsi:type="xsd:string">SOAP-ENV:Client</faultcode>
      <faultstring xsi:type="xsd:string">
        Failed to locate method (ValidateCreditCard) in class
        (examplesCreditCard) at /usr/local/ActivePerl-5.6/lib/
        site_perl/5.6.0/SOAP/Lite.pm line 1555.
      </faultstring>
    </SOAP-ENV:Fault>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
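A client would normally check for a Fault before trying to read a result. The Python sketch below shows one possible check, using a shortened version of the fault above; extract_fault is a hypothetical helper, not part of any SOAP toolkit.

```python
import xml.etree.ElementTree as ET

NS = {"soap": "http://schemas.xmlsoap.org/soap/envelope/"}

# Shortened version of the sample fault (faultstring abbreviated).
FAULT_RESPONSE = """<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <SOAP-ENV:Fault>
      <faultcode>SOAP-ENV:Client</faultcode>
      <faultstring>
        Failed to locate method (ValidateCreditCard)
      </faultstring>
    </SOAP-ENV:Fault>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def extract_fault(message):
    """Return the fault subelements as a dict, or None if there is no Fault."""
    root = ET.fromstring(message)
    fault = root.find("soap:Body/soap:Fault", NS)
    if fault is None:
        return None
    # faultcode, faultstring, faultactor and detail are unqualified names.
    return {child.tag: (child.text or "").strip() for child in fault}


fault = extract_fault(FAULT_RESPONSE)
if fault is not None:
    print(fault["faultcode"], "-", fault["faultstring"])
```

The sketch keeps only the text of each subelement, so a structured detail element would need separate handling.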
5.3.5 Current SOAP Implementations
Dozens of SOAP implementations are now freely available on the Internet. Here are
four of the most popular and widely cited implementations.
Apache SOAP (http://xml.apache.org/soap/)
Open source Java implementation of the SOAP protocol; based on the IBM
SOAP4J implementation.
Microsoft SOAP ToolKit 2.0
(http://msdn.microsoft.com/soap/default.asp)
COM implementation of the SOAP protocol for C#, C++, Visual Basic, or
other COM-compliant languages.
SOAP::Lite for Perl (http://www.soaplite.com/)
Perl implementation of the SOAP protocol, written by Paul Kulchenko, that
includes support for WSDL and UDDI.
GLUE from the Mind Electric (http://www.themindelectric.com)
Java implementation of the SOAP protocol that includes support for WSDL
and UDDI.
Self Assessment Questions
6. SOAP is an XML-based protocol for _________.
7. The elements of a SOAP message are ____, ____, and _____.
8. The Body element encapsulates ______ of the SOAP message.
5.4 Introduction to Web Services
A web service is any service that is available over the Internet, uses a
standardized XML messaging system, and is not tied to any one operating
system or programming language. With web services, we move from a
human-centric Web to an application-centric Web. It means that
conversations can take place directly between applications as easily as
between web browsers and servers.
There are numerous areas where an application-centric Web could prove
extremely helpful. Examples include credit card verification, package
tracking, portfolio tracking, shopping bots, currency conversion, and
language translation. Other options include centralized repositories for
personal information, such as Microsoft's proposed .NET MyServices
project. .NET MyServices aims to centralize calendar, email, and credit card
information and to provide web services for sharing that data.
5.4.1 Architecture and Advantages of Web Services
There are two ways to view the web service architecture. The first is to
examine the individual roles of each web service actor; the second is to
examine the emerging web service protocol stack.
Web Service Roles
Figure 5.2: Web Service Roles
There are three major roles within the web service architecture:
Service provider
This is the provider of the web service. The service provider implements the
service and makes it available on the Internet.
Service requestor
This is any consumer of the web service. The requestor utilizes an existing
web service by opening a network connection and sending an XML request.
Service registry
This is a logically centralized directory of services. The registry provides a
central place where developers can publish new services or find existing
ones. It therefore serves as a centralized clearinghouse for companies and
their services.
Web Service Protocol Stack
Figure 5.3: Web service protocol stack
A second option for viewing the web service architecture is to examine the
emerging web service protocol stack. The stack is still evolving, but currently
has four main layers.
Following is a brief description of each layer.
Service transport
This layer is responsible for transporting messages between applications.
Currently, this layer includes hypertext transfer protocol (HTTP), Simple Mail
Transfer Protocol (SMTP), file transfer protocol (FTP), and newer protocols,
such as Blocks Extensible Exchange Protocol (BEEP).
XML messaging
This layer is responsible for encoding messages in a common XML format
so that messages can be understood at either end. Currently, this layer
includes XML-RPC and SOAP.
Service description
This layer is responsible for describing the public interface to a specific web
service. Currently, service description is handled via the Web Services
Description Language (WSDL).
Service discovery
This layer is responsible for centralizing services into a common registry,
and providing easy publish/find functionality. Currently, service discovery is
handled via Universal Description, Discovery, and Integration (UDDI).
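The way the two lowest layers fit together can be sketched by wrapping an XML message in an HTTP POST. The Python example below only constructs the request object and does not send it. The endpoint address is a placeholder, and the Content-Type and SOAPAction headers follow the SOAP 1.1 HTTP binding, which this unit does not describe in detail.

```python
import urllib.request

ENDPOINT = "http://example.com/soap"  # hypothetical service address

ENVELOPE = (
    '<SOAP-ENV:Envelope '
    'xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
    '<SOAP-ENV:Body/>'
    '</SOAP-ENV:Envelope>'
)


def make_soap_request(envelope_xml, endpoint=ENDPOINT, soap_action=""):
    """Wrap an XML message (messaging layer) in an HTTP POST (transport layer)."""
    return urllib.request.Request(
        endpoint,
        data=envelope_xml.encode("utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            # SOAP 1.1 over HTTP expects a SOAPAction header; the value is
            # a quoted string and may be empty.
            "SOAPAction": f'"{soap_action}"',
        },
        method="POST",
    )


request = make_soap_request(ENVELOPE)
print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request), given a real endpoint.
```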
5.4.2 Purpose of Web Services Description Language (WSDL)
WSDL currently represents the service description layer within the web
service protocol stack. WSDL is an XML grammar for specifying a public
interface for a web service. This public interface can include information on
all publicly available functions, data type information for all XML messages,
binding information about the specific transport protocol to be used, and
address information for locating the specified service.
WSDL is not necessarily tied to a specific XML messaging system, but it
does include built-in extensions for describing SOAP services. Using WSDL,
a client can locate a web service and invoke any of the publicly available
functions. With WSDL-aware tools, this process can be entirely automated,
enabling applications to easily integrate new services with little or no manual
code.
5.4.3 WSDL Elements
WSDL is an XML grammar for describing web services. The specification is
divided into six major elements:
Definitions
The definitions element must be the root element of all WSDL documents.
It defines the name of the web service, declares multiple namespaces used
throughout the remainder of the document, and contains all the service
elements.
Types
The types element describes all the data types used between the client and
server. WSDL is not tied exclusively to a specific typing system, but it uses
the W3C XML Schema specification as its default choice. If the service uses
only XML Schema built-in simple types, such as strings and integers, the
types element is not required.
Message
The message element describes a one-way message, whether it is a single
message request or a single message response. It defines the name of the
message and contains zero or more message part elements, which can
refer to message parameters or message return values.
PortType
The portType element combines multiple message elements to form a
complete one-way or round-trip operation. For example, a portType can
combine one request and one response message into a single
request/response operation, most commonly used in SOAP services.
Binding
The binding element describes the concrete specifics of how the service
will be implemented on the wire. WSDL includes built-in extensions for
defining SOAP services, and SOAP-specific information therefore goes
here.
Service
The service element defines the address for invoking the specified service.
Most commonly, this includes a URL for invoking the SOAP service.
In addition to the six major elements, the WSDL specification also defines
the following utility elements:
Documentation
The documentation element is used to provide human-readable
documentation and can be included inside any other WSDL element.
Import
The import element is used to import other WSDL documents or XML
Schemas. This enables more modular WSDL documents. For example, two
WSDL documents can import the same basic elements and yet include their
own service elements to make the same service available at two physical
addresses. Note, however, that not all WSDL tools support the import
functionality yet.
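To make the major elements concrete, here is a minimal WSDL 1.1 sketch for a temperature service like the one in section 5.3.3, together with a few lines of Python that read it. Every name in the document (TemperatureService, urn:example-temperature, the example.com address) is hypothetical, and the types element is omitted because only built-in XML Schema types are used.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.xmlsoap.org/wsdl/}"        # WSDL 1.1 namespace
S = "{http://schemas.xmlsoap.org/wsdl/soap/}"   # WSDL SOAP binding extension

WSDL = """<definitions name="TemperatureService"
    targetNamespace="urn:example-temperature"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="urn:example-temperature"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <message name="getTempRequest">
    <part name="zipcode" type="xsd:string"/>
  </message>
  <message name="getTempResponse">
    <part name="return" type="xsd:float"/>
  </message>
  <portType name="TemperaturePortType">
    <operation name="getTemp">
      <input message="tns:getTempRequest"/>
      <output message="tns:getTempResponse"/>
    </operation>
  </portType>
  <binding name="TemperatureBinding" type="tns:TemperaturePortType">
    <soap:binding style="rpc"
        transport="http://schemas.xmlsoap.org/soap/http"/>
    <operation name="getTemp">
      <soap:operation soapAction=""/>
    </operation>
  </binding>
  <service name="TemperatureService">
    <port name="TemperaturePort" binding="tns:TemperatureBinding">
      <soap:address location="http://example.com/temperature"/>
    </port>
  </service>
</definitions>"""


def summarize(wsdl_text):
    """Collect the service name, messages, operations and address."""
    root = ET.fromstring(wsdl_text)  # root must be <definitions>
    return {
        "service": root.get("name"),
        "messages": [m.get("name") for m in root.findall(W + "message")],
        "operations": [op.get("name")
                       for op in root.findall(f"{W}portType/{W}operation")],
        "address": root.find(f"{W}service/{W}port/{S}address").get("location"),
    }


summary = summarize(WSDL)
print(summary)
```

The binding is kept deliberately short; a complete SOAP binding would also describe the input and output body encoding for each operation.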
5.4.4 Creating and Examining WSDL Files
One of the best aspects of WSDL is that you rarely have to create WSDL
files from scratch. A whole host of tools currently exists for transforming
existing services into WSDL descriptions. You can then choose to use these
WSDL files as is or manually tweak them with your favorite text editor. Given
the WSDL file, you could manually create a SOAP client to invoke the
service. A better alternative is to automatically invoke the service via a
WSDL invocation tool. Many WSDL invocation tools already exist. For
example, the GLUE platform provides extensive support for SOAP, WSDL, and
UDDI.
5.4.5 Overview of Universal Description, Discovery, and Integration
(UDDI)
UDDI is a technical specification for describing, discovering, and integrating
web services. UDDI is therefore a critical part of the emerging web service
protocol stack, enabling companies to both publish and find web services.
At its core, UDDI consists of two parts. First, UDDI is a technical
specification for building a distributed directory of businesses and web
services. Data is stored within a specific XML format, and the UDDI
specification includes API details for searching existing data and publishing
new data. Second, the UDDI Business Registry is a fully operational
implementation of the UDDI specification.
The data captured within UDDI is divided into three main categories:
White pages
This includes general information about a specific company - for example,
business name, business description, contact information, address and
phone numbers. It can also include unique business identifiers.
Yellow pages
This includes general classification data for either the company or the
service offered. For example, this data may include industry, product, or
geographic codes based on standard taxonomies.
Green pages
This category contains technical information about a web service. Generally,
this includes a pointer to an external specification and an address for
invoking the web service. UDDI is not restricted to describing web services
based on SOAP. Rather, UDDI can be used to describe any service, from a
single web page or email address all the way up to SOAP, CORBA, and
Java RMI services.
5.4.6 UDDI Registries (Public and Private)
UDDI manages the discovery of Web services by relying on a distributed
registry of businesses and their service descriptions implemented in a
common XML format. Before you can publish your business entity and Web
service to a public registry, you must first register your business entity with a
UDDI registry.
UDDI registries come in two forms: public and private. Both types comply with
the same specifications. A private registry enables you to publish and test
your internal e-business applications in a secure, private environment. A
public registry is a collection of peer directories that contain information
about businesses and services. It locates services that are registered at one
of its peer nodes and facilitates the discovery of published Web services.
Data is replicated at each of the registries on a regular basis. This ensures
consistency in service description formats and makes it easy to track
changes as they occur.
5.4.7 Core UDDI Elements
The UDDI technical architecture consists of three parts:
UDDI data model: An XML Schema for describing businesses and web
services.
UDDI API: A SOAP-based API for searching and publishing UDDI data.
UDDI cloud services: Operator sites that provide implementations of the
UDDI specification and synchronize all data on a scheduled basis. UDDI
cloud services are currently provided by Microsoft and IBM. The current
cloud services provide a logically centralized, but physically distributed,
directory. This means that data submitted to one root node will automatically
be replicated across all the other root nodes. Currently, data replication
occurs every 24 hours.
5.4.8 Deploying and Consuming Web Services
Unlike a Web site, you can't just change your Web Service when you feel
like it. If there are others consuming your Web Service, you must make sure
you keep the interfaces to the Web Service the same. That is, you can
change your implementation without anyone knowing, but changing a Web
Service interface such as adding, deleting, or modifying the parameters of a
Web method will break consuming applications. So make sure you have an
upgrade and migration approach in mind before you put your Web Service
out there for all to consume. For example, you might choose to maintain
multiple versions of your Web Services for backward compatibility.
You can consume .NET Web Services on Windows 98 and above. The
consuming machine needs the .NET framework, which can be installed as
part of an application installation.
5.4.9 ebXML Specifications ebXML Registry and Repository
The ebXML (Electronic Business using XML) specifications enable
enterprises of any size and in any geographical location to conduct business
over the Internet. ebXML specifications may be divided into design-time and
run-time specifications. ebXML is not a business language standard; rather, it
is an infrastructure or middleware standard. ebXML does not force you to use
any specific business process to exchange business documents. It
merely provides a specification for defining "business collaborations" (ebXML
BPSS). Once you know what you want to do (document, process and
transport), the ebXML CPP (Collaboration Protocol Profile) provides you with a
formal way to express your capabilities. The message transport options may be
chosen from those of the ebXML Messaging Service (ebXML MS) specification.
Two partners may decide to do business if they support the same documents,
processes and transports. This is expressed as a CPA (Collaboration Protocol
Agreement).
Together, the ebXML Registry and Messaging standards provide the
mechanism to discover and retrieve documents, templates, and software
(i.e., objects and resources) and exchange these documents in a secure
and reliable manner. Specifically the ebXML Registry specifications define
interoperable registries and repositories with an interface that enables
submission, query, and retrieval on the contents of the registry. The ebXML
Messaging specification provides a secure and reliable method for
exchanging electronic business messages.
The registry part of ebXML Registry/Repository provides an interface for
querying the information held in the ebXML Registry/Repository, whereas the
repository part of the ebXML Registry/Repository is in charge of storing data.
Self Assessment Questions
9. A ______ is a service available over the Internet.
10. _____ is a logically centralized directory of services.
11. UDDI is a technical specification for ____, ____ and ____ web
services.
5.5 Introduction to the XML Document Object Model (XMLDOM)
The W3C Document Object Model (DOM) is a platform- and language-neutral
interface that allows programs and scripts to dynamically access and
update the content, structure, and style of a document. It defines the logical
structure of documents and the way a document is accessed and
manipulated. With the Document Object Model, programmers can build
documents, navigate their structure, and add, modify, or delete elements
and content. Anything found in an HTML or XML document can be
accessed, changed, deleted, or added using the Document Object Model.
Data Object Tree
The DOM is a programming API for documents. It closely resembles the
structure of the documents it models. For instance, consider this table, taken
from an HTML document:
<TABLE>
  <TBODY>
    <TR>
      <TD>Shady Grove</TD>
      <TD>Aeolian</TD>
    </TR>
    <TR>
      <TD>Over the River, Charlie</TD>
      <TD>Dorian</TD>
    </TR>
  </TBODY>
</TABLE>
The DOM represents this table like this:
Fig. 5.4: DOM representation of the example table
XMLDOM Parsers
All modern browsers have a built-in XML parser that can be used to read
and manipulate XML. The parser reads XML into memory and converts it
into an XML DOM object that can be accessed with JavaScript.
There are some differences between Microsoft's XML parser and the
parsers used in other browsers. The Microsoft parser supports loading of
both XML files and XML strings (text), while other browsers use separate
parsers. However, all parsers contain functions to traverse XML trees,
access, insert, and delete nodes.
The Top-Level Document Object
A top-level Document instance is the root of the tree, and has a single child
which is the top-level Element instance; this Element has child nodes
representing the content and any sub-elements, which may in turn have
further children and so forth. There are different classes for everything that
can be found in an XML document, so in addition to the Element class, there
are also classes such as Text, Comment, CDATASection, EntityReference,
and so on. Nodes have methods for accessing the parent and child nodes,
accessing element and attribute values, inserting and deleting nodes, and
converting the tree back into XML.
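The operations just listed can be tried outside a browser as well. The sketch below uses Python's xml.dom.minidom, a small DOM implementation in the standard library, on the example table from earlier in this section. The markup is written without whitespace between tags so that the tree contains no whitespace-only Text nodes, and the inserted row uses placeholder values.

```python
from xml.dom import minidom

TABLE = (
    "<TABLE><TBODY>"
    "<TR><TD>Shady Grove</TD><TD>Aeolian</TD></TR>"
    "<TR><TD>Over the River, Charlie</TD><TD>Dorian</TD></TR>"
    "</TBODY></TABLE>"
)

document = minidom.parseString(TABLE)  # top-level Document node
table = document.documentElement       # its single child, the root Element
tbody = table.firstChild

# Navigate down: first row, first cell, then the Text node inside it.
first_row = tbody.firstChild
first_title = first_row.firstChild.firstChild.data

# Insert a new row built from Element and Text nodes.
new_row = document.createElement("TR")
for text in ("New Title", "New Mode"):
    cell = document.createElement("TD")
    cell.appendChild(document.createTextNode(text))
    new_row.appendChild(cell)
tbody.appendChild(new_row)

# Delete the original first row.
tbody.removeChild(first_row)

# Convert the modified tree back into XML text.
result = table.toxml()
print(first_title)
print(result)
```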
Primary Nodes and Node Collections (NodeList and NamedNodeMap)
The NodeList interface provides the abstraction of an ordered collection of
nodes, without defining or constraining how this collection is implemented.
NodeList objects in the DOM are live. The items in the NodeList are
accessible via an integral index, starting from 0.
Objects implementing the NamedNodeMap interface are used to represent
collections of nodes that can be accessed by name. Note that
NamedNodeMap does not inherit from NodeList; NamedNodeMaps are not
maintained in any particular order. Objects contained in an object
implementing NamedNodeMap may also be accessed by an ordinal index,
but this is simply to allow convenient enumeration of the contents of a
NamedNodeMap, and does not imply that the DOM specifies an order to
these Nodes.
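Both collection types can be explored with Python's xml.dom.minidom, as in the sketch below. The songs document is a made-up example. One caveat is that minidom's getElementsByTagName is not guaranteed to return a live list, so the sketch demonstrates only indexed and named access, not liveness.

```python
from xml.dom import minidom

# Hypothetical document with attributes, so both collections appear.
doc = minidom.parseString(
    '<songs>'
    '<song title="Shady Grove" mode="Aeolian"/>'
    '<song title="Over the River, Charlie" mode="Dorian"/>'
    '</songs>'
)

# NodeList: ordered, accessed by an integer index starting from 0.
songs = doc.getElementsByTagName("song")
count = songs.length
first_title = songs.item(0).getAttribute("title")
missing = songs.item(5)  # out of range: item() returns None

# NamedNodeMap: the attributes of an element, accessed by name.
attrs = songs.item(1).attributes
mode = attrs.getNamedItem("mode").value

# Index access exists only for enumeration; the order is not specified,
# so the names are sorted before use.
names = sorted(attrs.item(i).name for i in range(attrs.length))

print(count, first_title, mode, names)
```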
Self Assessment Questions
12. The DOM is a ____ for documents.
13. The ____ interface provides the abstraction of an ordered collection of
nodes.
5.6 Summary
A schema specifies the data type of every element and attribute of its
instance XML documents.
Elements are defined in an XML schema with the element tag, which is
from the XMLSchema namespace.
A simple data type is one whose content is restricted to strings. Complex
types are defined with the complexType tag.
SOAP is an XML-based protocol for exchanging information between
computers.
A web service is any service that is available over the Internet, uses a
standardized XML messaging system, and is not tied to any one
operating system or programming language.
5.7 Terminal Questions
1. Explain the Union and List data types in XML.
2. Briefly explain the sending and receiving of SOAP messages.
3. Explain the various WSDL elements.
4. Explain the UDDI Registries.
5. Explain the ebXML registry and repository.
5.8 Answers
Self Assessment Questions
1. XML Processor
2. targetNamespace
3. local or global
4. complexType
5. unique, key, keyref
6. exchanging information between computers
7. envelope, header, body
8. payload
9. web service
10. service registry
11. describing, discovering, integrating
12. programming API
13. NodeList
Terminal Questions
1. List datatypes are special cases in which a structure is defined within the
content of a single attribute or element. (Refer section 5.2)
2. SOAP can be used in a variety of messaging systems, including
one-way and two-way messaging. (Refer section 5.3.3)
3. WSDL currently represents the service description layer within the web
service protocol stack. (Refer section 5.4.2)
4. UDDI is a technical specification for describing, discovering, and
integrating web services. (Refer section 5.4.5)
5. The ebXML (Electronic Business using XML) specifications enable
enterprises of any size and in any geographical location to conduct
business over the Internet. (Refer section 5.4.9)
Website Design Unit 6
Sikkim Manipal University Page No. 158
Unit 6 XML Programming – III
Structure:
6.1 Introduction
Objectives
6.2 Transforming XML Documents with XSLT and XPath
6.3 Formatting XML Documents with XSL-FO
Purpose of XSL Formatting Objects (XSL-FO)
XSL-FO Documents and XSL-FO Processors
XSL-FO Namespace
Page Format Specifiers
Page Content Specifiers
6.4 Summary
6.5 Terminal Questions
6.6 Answers
6.1 Introduction
XSLT style sheets are used to transform XML documents into different
forms or formats, perhaps using different DTDs. In this unit you will
study the transformation of XML documents into different
formats using XSLT style sheets. You will also get an overview of the XSL
Formatting Objects.
Objectives:
After studying this unit, you should be able to:
transform XML documents into different formats using XSLT style sheets
format XML documents with XSL-FO
6.2 Transforming XML Documents with XSLT and XPath
XSLT is a language for transforming XML documents into XHTML
documents or to other XML documents. CSS provides no direct means of
transforming XML documents. Unlike scripting languages, CSS was
explicitly designed for use by nonprogrammers, which explains why it is so
easy to learn and use. CSS simply attaches style properties to elements in
an XML/HTML document. The simplicity of CSS comes with limitations,
some of which follow:
CSS cannot reuse document data
CSS cannot conditionally select document data (other than hiding
specific types of elements)
CSS cannot calculate quantities or store values in variables
CSS cannot generate dynamic text, such as page numbers
These limitations of CSS are important because they are noticeably missing
in XSLT. In other words, XSLT is capable of carrying out these tasks and
therefore doesn't suffer from the same weaknesses.
XSL Stylesheet Advantages
The powerful capabilities provided by XSL allow:
formatting of source elements based on ancestry/descendency, position
and uniqueness
the creation of formatting constructs including generated text and
graphics
the definition of reusable formatting macros
writing-direction independent stylesheets
extensible set of formatting objects
Transformation vs. Formatting
In an XSL transformation, an XSLT processor reads both an XML document
and an XSLT style sheet. Based on the instructions the processor finds in
the XSLT style sheet, it outputs a new XML document or fragment thereof.
There's also special support for outputting HTML. With some effort most
XSLT processors can also be made to output essentially arbitrary text,
though XSLT is designed primarily for XML-to-XML and XML-to-HTML
transformations.
Formatting, by contrast, deals with how the content is presented to the
user. It assigns layout and style properties, such as fonts, margins and
page breaks, so that the content reads well on the target medium.
XSLT and XSL-FO
The Extensible Stylesheet Language (XSL) includes both a transformation
language and a formatting language. The transformation language is useful
independent of the formatting language. Its ability to move data from one
XML representation to another makes it an important component of XML-
based electronic commerce, electronic data interchange, metadata
exchange, and any application that needs to convert between different XML
representations of the same data.
XSL-FO stands for Extensible Stylesheet Language Formatting Objects.
XSL-FO is a language for formatting XML data. XSL-FO is an XML-based
markup language describing the formatting of XML data for output to screen,
paper or other media.
XSLT Templates
Template rules defined by xsl:template elements are the most important
part of an XSLT style sheet. These associate particular output with particular
input. Each xsl:template element has a match attribute that specifies which
nodes of the input document the template is instantiated for. The content of
the xsl:template element is the actual template to be instantiated. A template
may contain both text that will appear literally in the output document and
XSLT instructions that copy data from the input XML document to the result.
For example, here is a template that is applied to the root node of the input
tree:
<xsl:template match="/">
<html>
<head>
</head>
<body>
</body>
</html>
</xsl:template>
When the XSLT processor reads the input document, the first node it sees is
the root. This rule matches that root node, and tells the XSLT processor to
emit this text:
<html>
<head>
</head>
<body>
</body>
</html>
This text is well-formed HTML. Because the XSLT document is itself an
XML document, its contents – templates included – must be well-formed
XML.
XPath Data Model
An XPath query operates on a namespace well-formed XML document after
it has been parsed into a tree structure. The particular tree model XPath
uses divides each XML document into seven kinds of nodes:
Root node – The document itself. The root node’s children are the
comments and processing instructions in the prolog and epilog and the root
element of the document.
Element node – An element. Its children are all the child elements, text
nodes, comments, and processing instructions the element contains. An
element also has namespaces and attributes. However, these are not child
nodes.
Attribute node – An attribute other than one that declares a namespace
Text node – The maximum uninterrupted run of text between tags,
comments, and processing instructions. White space is included.
Comment node – A comment
Processing instruction node – A processing instruction
Namespace node – A namespace mapping in scope on an element
The XPath data model does not include entity references, CDATA sections,
or the document type declaration. Entity references are resolved into their
component text and elements. CDATA sections are treated like any other
text, and will be merged with any adjacent text before a text node is formed.
Default attributes are applied, but otherwise the document type declaration
is not considered.
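As a rough illustration, the comments in the small document below label the kinds of node that the XPath data model would contain for it. The element names, the attribute, the namespace URI and the file name catalog.xsl are invented for this sketch.

```xml
<?xml version="1.0"?>
<!-- comment node: a child of the root node, in the prolog -->
<?xml-stylesheet type="text/xsl" href="catalog.xsl"?>
<!-- the line above is a processing instruction node; the XML declaration is not -->
<catalog xmlns:m="http://example.com/music">
  <!-- catalog: element node; xmlns:m yields a namespace node, not an attribute node -->
  <cd id="c1">
    <!-- id: attribute node of cd (it is not a child of cd) -->
    <title>Empire <![CDATA[Burlesque]]></title>
    <!-- title has ONE text node, "Empire Burlesque": the CDATA section is merged in -->
  </cd>
</catalog>
```

Note that the white space between the tags also forms text nodes, because white space is included in the model.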
Declaring XSL Stylesheets
XSL documents must conform to the rules of any other XML document, in
that the syntax of the document must be well-formed: tags must be properly
nested, every start tag must have a matching end tag (or use the
empty-element form), and so on. The stylesheet can contain text that will
be reflected exactly in the output document, in addition to XSL instructions
that copy the data from the XML document the stylesheet is being applied
to. The stylesheet is declared with its root element as follows.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
This line declares an xsl:stylesheet element and binds the xsl prefix to the
XSLT namespace, http://www.w3.org/1999/XSL/Transform, to which all XSLT
elements belong. The version attribute is required. (Some older material
uses the namespace http://www.w3.org/TR/WD-xsl, which belongs to an early
working draft of XSL and is not recognized by XSLT 1.0 processors.) For the
document to be well formed, this tag must be closed at the very end of the
document with the close tag:
</xsl:stylesheet>
Built-In Templates
There is a built-in template rule to allow recursive processing to continue in
the absence of a successful pattern match by an explicit template rule in the
stylesheet. This template rule applies to both element nodes and the root
node. The following shows the equivalent of the built-in template rule:
<xsl:template match="*|/">
<xsl:apply-templates/>
</xsl:template>
There is also a built-in template rule for each mode, which allows recursive
processing to continue in the same mode in the absence of a successful
pattern match by an explicit template rule in the stylesheet. This template
rule applies to both element nodes and the root node. The following shows
the equivalent of the built-in template rule for mode m.
<xsl:template match="*|/" mode="m">
<xsl:apply-templates mode="m"/>
</xsl:template>
There is also a built-in template rule for text and attribute nodes that copies
text through:
<xsl:template match="text()|@*">
<xsl:value-of select="."/>
</xsl:template>
The built-in template rule for processing instructions and comments is to do
nothing.
<xsl:template match="processing-instruction()|comment()"/>
The built-in template rule for namespace nodes is also to do nothing. There
is no pattern that can match a namespace node; so, the built-in template
rule is the only template rule that is applied for namespace nodes.
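To see the built-in rules at work, consider a stylesheet that defines no template rules of its own. The following is a minimal sketch; the xsl:output element is added here only so that the result is plain text.

```xml
<?xml version="1.0"?>
<!-- A stylesheet with no explicit template rules: only the built-in rules apply. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Emit plain text so that no XML declaration is added to the result. -->
  <xsl:output method="text"/>
</xsl:stylesheet>
```

Applied to any XML document, this stylesheet walks the whole tree through the built-in rule for elements and the root, and copies every text node to the output in document order. Element tags, comments and processing instructions are dropped. Attribute values do not appear either, because xsl:apply-templates without a select attribute selects child nodes only, and attributes are not children.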
Using Templates as Subroutines – xsl:apply-templates
The xsl:apply-templates instruction selects source nodes for processing. The format is
given below:
<xsl:apply-templates [select="pattern"][mode="qname"]>
[<xsl:sort>]
</xsl:apply-templates>
If you specify the select attribute, specify a pattern that resolves to a set of
source nodes. For each source node in this set, the XSLT processor
searches for a template that matches the node. When it finds a matching
template, it instantiates it and uses the node as the context node. For
example:
<xsl:apply-templates select="/bookstore/book"/>
When the XSLT processor executes this instruction, it constructs a list of all
nodes that match the pattern in the select attribute. For each node in the list,
the XSLT processor searches for the template whose match pattern best
matches that node. If you do not specify the select attribute, the XSLT
processor uses the default pattern, "node()", which selects all child nodes of
the current node.
If you specify the mode attribute, the selected nodes are matched only by
templates with a matching mode attribute. In XSLT 1.0 the value of mode
must be a qualified name. (XSLT 2.0 additionally allows the special value
#current, which continues the current mode of the current template.) If you
do not specify a mode attribute, the selected nodes are matched only by
templates that do not specify a mode attribute.
By default, the new list of source nodes is processed in document order.
However, you can use the xsl:sort instruction to specify that the selected
nodes are to be processed in a different order.
In the previous example, the XSLT processor searches for a template that
matches /bookstore/book. The following template is a match:
<xsl:template match="book">
<tr><td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="author"/></td>
<td><xsl:value-of select="price"/></td></tr>
</xsl:template>
The XSLT processor instantiates this template for each book element.
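The pieces above can be put together into one stylesheet, sketched below. It assumes an input document whose root element is bookstore and whose book children each contain title, author and price elements; the table headings are added for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Root rule: build the page skeleton, then hand each book to its own rule. -->
  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <tr><th>Title</th><th>Author</th><th>Price</th></tr>
          <xsl:apply-templates select="/bookstore/book"/>
        </table>
      </body>
    </html>
  </xsl:template>

  <!-- Instantiated once per book element, with that book as the context node. -->
  <xsl:template match="book">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <td><xsl:value-of select="author"/></td>
      <td><xsl:value-of select="price"/></td>
    </tr>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the row layout in its own template means the same rule can be reused wherever book elements are selected.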
XPath Expression Syntax
XPath can locate any type of information in an XML document with one line
of code. These one liners are referred to as "expressions," and every piece
of XPath that you write will be an expression. An XPath expression
describes the location of an element or attribute in our XML document. By
starting at the root element, we can select any element in the document by
carefully creating a chain of children elements. Each element is separated
by a slash "/".
Example: inventory/snack/chips/amount
XPath Functions and Predicates
You can use XML Path Language (XPath) functions to refine XPath queries
and enhance the programming power and flexibility of XPath. The functions
are divided into the following groups.
Table 6.1: XPath Function Groups
Group                                 Description
Node-Set                              Takes a node-set argument, returns a node-set, or returns/provides information about a particular node within a node-set.
String                                Performs evaluations, formatting, and manipulation on string arguments.
Boolean                               Evaluates the argument expressions to obtain a Boolean result.
Number                                Evaluates the argument expressions to obtain a numeric result.
Microsoft XPath Extension Functions   Microsoft extension functions to XPath that provide the ability to select nodes by XSD type. Also includes string comparison, number comparison, and date/time conversion functions.
Each function in the function library is specified using a function prototype
that provides the return type, function name, and argument type. If an
argument type is followed by a question mark, the argument is optional;
otherwise, the argument is required. Function names are case-sensitive.
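The sketch below uses one or two standard XPath 1.0 functions from each of the four core groups inside a stylesheet. It assumes a catalog document whose cd elements contain title and (numeric) price children; the output wording is invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Node-set function: count() returns the number of nodes selected. -->
    <xsl:text>Number of CDs: </xsl:text>
    <xsl:value-of select="count(catalog/cd)"/>
    <xsl:text>&#10;</xsl:text>

    <xsl:for-each select="catalog/cd">
      <!-- Node-set function: position() is the index of the current node. -->
      <xsl:value-of select="position()"/>
      <xsl:text>. </xsl:text>
      <!-- String function: normalize-space() trims and collapses white space. -->
      <xsl:value-of select="normalize-space(title)"/>
      <!-- Boolean function: not() negates its argument. -->
      <xsl:if test="not(price)">
        <xsl:text> (no price listed)</xsl:text>
      </xsl:if>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>

    <!-- Number function: sum() adds up the numeric values of the nodes. -->
    <xsl:text>Total price: </xsl:text>
    <xsl:value-of select="sum(catalog/cd/price)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because function names are case-sensitive, writing Count() or Sum() instead would be reported as an error by the processor.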
A predicate is similar to an If/Then statement. If the predicate is TRUE,
then the element will be selected. If the predicate is FALSE, it will be
excluded. An XPath predicate is contained within square brackets [], and
comes after the element being tested. A relative path inside the brackets,
such as a child element name, is evaluated relative to that element.
Example: inventory/drink/lemonade[amount>15]
This selects only those lemonade elements that have an amount child whose
value is greater than 15.
Besides testing the values of elements, you can also use predicates to
check the values of attributes. The form is much the same as before,
except that the attribute name is prefixed with @ and the attribute belongs
to the element before the predicate.
Syntax: element[@attributeName test]
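The following hypothetical inventory document shows what such expressions select. The element names follow the examples above; the supplier attribute and all values are invented for this sketch.

```xml
<?xml version="1.0"?>
<!-- Hypothetical inventory document, invented to match the element names above. -->
<inventory>
  <snack>
    <chips supplier="Acme">
      <amount>12</amount>
    </chips>
  </snack>
  <drink>
    <lemonade supplier="Sunny">
      <amount>20</amount>
    </lemonade>
    <lemonade supplier="Acme">
      <amount>10</amount>
    </lemonade>
  </drink>
</inventory>
<!--
  Evaluated from the root node of this document:
  inventory/snack/chips/amount                 selects the amount element containing 12
  inventory/drink/lemonade[amount>15]          selects only the first lemonade (amount 20)
  inventory/drink/lemonade[@supplier='Acme']   selects only the second lemonade
  inventory/drink/lemonade[@supplier]          selects both lemonade elements
-->
```

When such an expression is written inside an XSLT attribute, a less-than sign must be escaped as &lt; because the stylesheet itself has to remain well-formed XML.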
Inserting Elements - xsl:element
The <xsl:element> element is used to create an element node in the output
document.
Syntax:
<xsl:element
name="name"
namespace="URI"
use-attribute-sets="namelist">
<!-- Content:template -->
</xsl:element>
Attributes:
Attribute            Value      Description
name                 name       Required. Specifies the name of the element to be created. The value of the name attribute can be set to an expression that is computed at run-time, like this: <xsl:element name="{$country}" />
namespace            URI        Optional. Specifies the namespace URI of the element. The value of the namespace attribute can be set to an expression that is computed at run-time, like this: <xsl:element name="{$country}" namespace="{$someuri}" />
use-attribute-sets   namelist   Optional. A white-space-separated list of attribute sets containing attributes to be added to the element.
Example: Create a "singer" element that contains the value of each artist
element:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="catalog/cd">
<xsl:element name="singer">
<xsl:value-of select="artist" />
</xsl:element>
<br />
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Inserting Attributes - xsl:attribute
The <xsl:attribute> element is used to add attributes to elements.
Syntax:
<xsl:attribute name="attributename" namespace="uri">
<!-- Content:template -->
</xsl:attribute>
Attributes:
Attribute Value Description
name attributename Required. Specifies the name of the attribute
namespace URI Optional. Defines the namespace URI for the attribute
Example: Add a source attribute to the picture element:
<picture>
<xsl:attribute name="source"/>
</picture>
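The example above creates the source attribute with an empty value. In practice the value usually comes from the input document. The sketch below assumes a hypothetical input whose images root element contains image elements, each with a name child; the gallery wrapper element is also invented.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <gallery>
      <xsl:for-each select="images/image">
        <!-- Long form: the content of xsl:attribute becomes the attribute value. -->
        <picture>
          <xsl:attribute name="source">
            <xsl:value-of select="name"/>
          </xsl:attribute>
        </picture>
        <!-- Short form: an attribute value template in curly braces gives the same result. -->
        <picture source="{name}"/>
      </xsl:for-each>
    </gallery>
  </xsl:template>
</xsl:stylesheet>
```

The long form is mainly useful when the attribute name is computed or when the attribute is added conditionally. An xsl:attribute instruction has to appear before any child content is added to the element.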
Extracting Node Values - xsl:value-of
The <xsl:value-of> element extracts the value of a selected node. The
<xsl:value-of> element can be used to select the value of an XML element
and add it to the output.
Syntax:
<xsl:value-of
select="expression"
disable-output-escaping="yes|no"/>
Attributes:
Attribute                 Value        Description
select                    expression   Required. An XPath expression that specifies which node/attribute to extract the value from
disable-output-escaping   yes | no     Optional. "yes" indicates that special characters (like "<") should be output as is. "no" indicates that special characters (like "<") should be output as "&lt;". Default is "no"
Example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<tr>
<td><xsl:value-of select="catalog/cd/title"/></td>
<td><xsl:value-of select="catalog/cd/artist"/></td>
</tr>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
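The stylesheet above expects an input document along the following lines. This is a hypothetical sample: the titles and artists are taken from the catalog used in this unit, while the price values are made up.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Hypothetical input; the price values are made up for illustration. -->
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <price>10.90</price>
  </cd>
  <cd>
    <title>Hide your heart</title>
    <artist>Bonnie Tyler</artist>
    <price>9.90</price>
  </cd>
</catalog>
```

In XSLT 1.0, xsl:value-of outputs the string value of only the first node that its expression selects. With this input, the stylesheet above would therefore produce a single table row (Empire Burlesque, Bob Dylan), even though the catalog contains two cd elements.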
Looping - xsl:for-each
The <xsl:for-each> element allows you to do looping in XSLT. The XSL
<xsl:for-each> element can be used to select every XML element of a
specified node-set:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
The result of the transformation above will look like this:
Title                 Artist
Empire Burlesque      Bob Dylan
Hide your heart       Bonnie Tyler
Greatest Hits         Dolly Parton
Still got the blues   Gary Moore
Eros                  Eros Ramazzotti
One night only        Bee Gees
Sorting – xsl:sort
Sorting is specified by adding xsl:sort elements as children of an
xsl:apply-templates or xsl:for-each element. The first xsl:sort child
specifies the primary sort key, the second xsl:sort child specifies the
secondary sort key and so on. When an xsl:apply-templates or xsl:for-each
element has one or more xsl:sort children, then instead of processing the
selected nodes in document order, it sorts the nodes according to the
specified sort keys and then processes them in sorted order. When used in
xsl:for-each, xsl:sort elements must occur first. When a template is
instantiated by xsl:apply-templates or xsl:for-each, the current node list
consists of the complete list of nodes being processed, in sorted order.
The select attribute of xsl:sort gives the expression used as the sort key.
The order attribute specifies whether the keys should be sorted in ascending
or descending order: the value ascending specifies ascending order and the
value descending specifies descending order; the default is ascending.
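A minimal sketch of sorting, assuming a catalog document whose cd elements have title, artist and numeric price children:

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <tr><th>Title</th><th>Artist</th></tr>
          <xsl:for-each select="catalog/cd">
            <!-- xsl:sort elements must be the first children of xsl:for-each. -->
            <!-- Primary key: artist, A to Z (ascending is the default order). -->
            <xsl:sort select="artist"/>
            <!-- Secondary key: price compared as a number, highest first. -->
            <xsl:sort select="price" data-type="number" order="descending"/>
            <tr>
              <td><xsl:value-of select="title"/></td>
              <td><xsl:value-of select="artist"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The data-type attribute matters for the price key: without it the values are compared as text, so "9.90" would sort after "10.90".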
Simple Conditionals - xsl:if
To put a conditional if test against the content of the XML file, add an
<xsl:if> element to the XSL document.
Syntax:
<xsl:if test="expression">
...some output if the expression is true...
</xsl:if>
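As a sketch, the stylesheet below lists only the CDs that cost more than 10. It assumes a catalog document whose cd elements have title, artist and price children.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <tr><th>Title</th><th>Artist</th></tr>
          <xsl:for-each select="catalog/cd">
            <!-- The row is written only when the test evaluates to true. -->
            <xsl:if test="price &gt; 10">
              <tr>
                <td><xsl:value-of select="title"/></td>
                <td><xsl:value-of select="artist"/></td>
              </tr>
            </xsl:if>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

There is no else branch for xsl:if. When an alternative is needed, xsl:choose is used instead.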
Multiple Conditionals - xsl:choose, xsl:when, and xsl:otherwise
The <xsl:choose> element is used in conjunction with <xsl:when> and
<xsl:otherwise> to express multiple conditional tests. If no <xsl:when> is
true, the content of <xsl:otherwise> is processed. If no <xsl:when> is true,
and no <xsl:otherwise> element is present, nothing is created.
Syntax:
<xsl:choose>
<!-- Content:(xsl:when+,xsl:otherwise?) -->
</xsl:choose>
Example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="title"/></td>
<xsl:choose>
<xsl:when test="price > 10">
<td bgcolor="#ff00ff">
<xsl:value-of select="artist"/></td>
</xsl:when>
<xsl:otherwise>
<td><xsl:value-of select="artist"/></td>
</xsl:otherwise>
</xsl:choose>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Copying Nodes - xsl:copy
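The <xsl:copy> element creates a copy of the current node. The namespace
nodes of the current node are copied as well, but its attributes and child
nodes are not.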
Syntax:
<xsl:copy use-attribute-sets="name-list">
<!-- Content:template -->
</xsl:copy>
Attribute:
Attribute            Value       Description
use-attribute-sets   name-list   Optional. A white-space-separated list of attribute sets to apply to the output node, if that node is an element.
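Because xsl:copy copies only the current node itself, it is usually combined with xsl:apply-templates to copy a whole tree. The sketch below shows this common pattern, often called the identity transformation, together with one override. The price element is an assumed name from a catalog-style input.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy the current node, then process its attributes and children. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override: an empty template leaves price elements out of the copy. -->
  <xsl:template match="price"/>

</xsl:stylesheet>
```

The result is the input document with every price element removed and everything else unchanged.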
Self Assessment Questions
1. ____ is a language for transforming XML documents into XHTML
documents or to other XML documents.
2. ____ can locate any type of information in an XML document with one
line of code.
3. The <xsl:element> element is used to create an element node in
the ____ document.
4. The ____ element allows you to do looping in XSLT.
6.2 Formatting XML Documents with XSL-FO
XSL-FO is an XML-based markup language describing the formatting of
XML data for output to screen, paper or other media. Styling is both about
transforming and formatting information. When the World Wide Web
Consortium (W3C) made their first XSL Working Draft, it contained the
language syntax for both transforming and formatting XML documents.
Later, the XSL Working Group at W3C split the original draft into separate
Recommendations:
XSLT, a language for transforming XML documents
XSL or XSL-FO, a language for formatting XML documents
XPath, a language for navigating through elements and attributes in
XML documents
6.2.1 Purpose of XSL Formatting Objects (XSL-FO)
The purpose of XSL-FO is to provide a mechanism for formatting XML data
for print, screen and other output media. XSL-FO, also known simply as
XSL, is a specification of the World Wide Web Consortium and is closely
related to XSLT. However, whereas XSLT is most often used for
transforming XML into HTML or other XML structures, XSL-FO is most often
used for formatting XML for print.
Transforming XML for print is accomplished by transforming an XML
document to a Formatting Objects (FO) document, which itself is XML-
based, via XSLT. The formatting objects processor is able to read the FO
document and transform it for different types of print output. The most
common and best supported print output is currently Adobe PDF.
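The following sketch shows what such a transformation might look like: an XSLT stylesheet whose output is an FO document rather than HTML. It assumes a catalog input with cd elements containing title and artist; the page size, margin and font settings are arbitrary choices. The FO elements it uses are described in the sections that follow.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="A4"
            page-width="210mm" page-height="297mm" margin="2cm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="A4">
        <fo:flow flow-name="xsl-region-body">
          <fo:block font-size="14pt" font-weight="bold">My CD Collection</fo:block>
          <!-- One block (paragraph) per cd element in the source document. -->
          <xsl:for-each select="catalog/cd">
            <fo:block>
              <xsl:value-of select="title"/>
              <xsl:text> - </xsl:text>
              <xsl:value-of select="artist"/>
            </fo:block>
          </xsl:for-each>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
</xsl:stylesheet>
```

The resulting FO document is then handed to a formatting objects processor (Apache FOP is one example) to produce the PDF.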
6.2.2 XSL-FO Documents and XSL-FO Processors
XSL-FO documents are XML files with output information. They contain
information about the output layout and output contents. XSL-FO documents
are stored in files with a .fo or a .fob file extension. It is also quite common
to see XSL-FO documents stored with an .xml extension, because this
makes them more accessible to XML editors.
XSL-FO documents have a structure like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<fo:simple-page-master master-name="A4">
<!-- Page template goes here -->
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="A4">
<!-- Page content goes here -->
</fo:page-sequence>
</fo:root>
XSL-FO documents are XML documents, and must always start with an
XML declaration:
<?xml version="1.0" encoding="ISO-8859-1"?>
The <fo:root> element is the root element of XSL-FO documents. The root
element also declares the namespace for XSL-FO:
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<!-- The full XSL-FO document goes here -->
</fo:root>
The <fo:layout-master-set> element contains one or more page templates:
<fo:layout-master-set>
<!-- All page templates go here -->
</fo:layout-master-set>
Each <fo:simple-page-master> element contains a single page template.
Each template must have a unique name (master-name):
<fo:simple-page-master master-name="A4">
<!-- One page template goes here -->
</fo:simple-page-master>
One or more <fo:page-sequence> elements describe the page contents.
The master-reference attribute refers to the simple-page-master template
with the same name:
<fo:page-sequence master-reference="A4">
<!-- Page content goes here -->
</fo:page-sequence>
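Filling in the two placeholder comments gives a minimal complete document, sketched below. The page dimensions, the margin and the text are arbitrary illustration values.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="A4"
        page-width="210mm" page-height="297mm">
      <!-- The page template: a single body region with a 2 cm margin. -->
      <fo:region-body margin="2cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="A4">
    <!-- flow-name ties the content to the region-body declared above. -->
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Hello, XSL-FO</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Page content is placed in an <fo:flow> element, and the text itself goes into <fo:block> elements, which play roughly the role of paragraphs.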
6.2.3 XSL-FO Namespace
XSL-FO is the part of XSL that actually describes how a document should
be formatted. It is based on a word-processing model for page layout as
opposed to a desktop publishing model. As such, it is nearly impossible to
describe the exact positioning and layout of the text of a document.
Instead, XSL-FO involves giving a general description of how text should be
arranged in relation to other text, and the XSL-FO engine will choose an
appropriate arrangement, much like Microsoft Word, LaTeX, or other
programs based on a word-processing model. For complete control over the
layout of a document, document designers still have to resort to other file
formats such as PDF.
The easiest way to understand the nature of XSL-FO is to look at an
example. XSL-FO file names typically end in .fob, .fo, or .xml.
Like all XML documents, an XSL-FO document requires a root element. The
root element, fo:root, also declares the XSL-FO namespace, as shown below:
<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
...
</fo:root>
6.2.4 Page Format Specifiers
XSL-FO uses page templates called "Page Masters" to define the layout of
pages. Each template must have a unique name:
<fo:simple-page-master master-name="intro">
<fo:region-body margin="5in" />
</fo:simple-page-master>
<fo:simple-page-master master-name="left">
<fo:region-body margin-left="2in" margin-right="3in" />
</fo:simple-page-master>
<fo:simple-page-master master-name="right">
<fo:region-body margin-left="3in" margin-right="2in" />
</fo:simple-page-master>
In the example above, three <fo:simple-page-master> elements define
three different templates. Each template (page-master) has a different
name. The first template is called "intro". It could be used as a template for
introduction pages. The second and third templates are called "left" and
"right". They could be used as templates for even-numbered and
odd-numbered pages.
XSL-FO Page Size
XSL-FO uses the following attributes to define the size of a page:
page-width defines the width of a page
page-height defines the height of a page
XSL-FO Page Margins
XSL-FO uses the following attributes to define the margins of a page:
margin-top defines the top margin
margin-bottom defines the bottom margin
margin-left defines the left margin
margin-right defines the right margin
margin defines all four margins
XSL-FO Page Regions
XSL-FO uses the following elements to define the regions of a page:
region-body defines the body region
region-before defines the top region (header)
region-after defines the bottom region (footer)
region-start defines the left region (left sidebar)
region-end defines the right region (right sidebar)
Example:
<fo:simple-page-master master-name="A4"
page-width="297mm" page-height="210mm"
margin-top="1cm" margin-bottom="1cm"
margin-left="1cm" margin-right="1cm">
<fo:region-body margin="3cm"/>
<fo:region-before extent="2cm"/>
<fo:region-after extent="2cm"/>
<fo:region-start extent="2cm"/>
<fo:region-end extent="2cm"/>
</fo:simple-page-master>
6.2.5 Page Content Specifiers
XSL-FO Lists - XSL-FO uses the <fo:list-block> element to define lists.
There are four XSL-FO objects used to create lists:
fo:list-block (contains the whole list)
fo:list-item (contains each item in the list)
fo:list-item-label (contains the label for the list-item - typically an
<fo:block> containing a number, character, etc.)
fo:list-item-body (contains the content/body of the list-item - typically one
or more <fo:block> objects)
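The four list objects fit together as in the sketch below, which would be placed inside an <fo:flow> element. The namespace declaration is repeated only so that the fragment is well-formed on its own, and the item texts and distances are invented.

```xml
<fo:list-block xmlns:fo="http://www.w3.org/1999/XSL/Format"
    provisional-distance-between-starts="1cm"
    provisional-label-separation="3mm">
  <fo:list-item>
    <!-- label-end() keeps the label clear of the item body. -->
    <fo:list-item-label end-indent="label-end()">
      <fo:block>1.</fo:block>
    </fo:list-item-label>
    <!-- body-start() indents the body past the label. -->
    <fo:list-item-body start-indent="body-start()">
      <fo:block>First item</fo:block>
    </fo:list-item-body>
  </fo:list-item>
  <fo:list-item>
    <fo:list-item-label end-indent="label-end()">
      <fo:block>2.</fo:block>
    </fo:list-item-label>
    <fo:list-item-body start-indent="body-start()">
      <fo:block>Second item</fo:block>
    </fo:list-item-body>
  </fo:list-item>
</fo:list-block>
```

Unlike HTML, XSL-FO does not generate the numbers or bullets itself. Each label is written out explicitly, or generated by the XSLT stylesheet that produces the FO document.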
XSL-FO Tables - XSL-FO uses the <fo:table-and-caption> element to
define tables. There are nine XSL-FO objects used to create tables:
fo:table-and-caption
fo:table
fo:table-caption
fo:table-column
fo:table-header
fo:table-footer
fo:table-body
fo:table-row
fo:table-cell
The <fo:table-and-caption> element contains an <fo:table> element and an
optional <fo:table-caption> element.
The <fo:table> element contains optional <fo:table-column> elements, an
optional <fo:table-header> element, a <fo:table-body> element, and an
optional <fo:table-footer> element. Each of these elements has one or
more <fo:table-row> elements, with one or more <fo:table-cell> elements.
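A small table built from these objects might look like the sketch below, which would be placed inside an <fo:flow> element. The namespace declaration is repeated only so that the fragment is well-formed on its own; the column widths and the cell contents are invented.

```xml
<fo:table-and-caption xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:table-caption>
    <fo:block>CD prices</fo:block>
  </fo:table-caption>
  <fo:table>
    <!-- One table-column per column, here with fixed widths. -->
    <fo:table-column column-width="60mm"/>
    <fo:table-column column-width="30mm"/>
    <fo:table-header>
      <fo:table-row>
        <fo:table-cell><fo:block font-weight="bold">Title</fo:block></fo:table-cell>
        <fo:table-cell><fo:block font-weight="bold">Price</fo:block></fo:table-cell>
      </fo:table-row>
    </fo:table-header>
    <fo:table-body>
      <fo:table-row>
        <fo:table-cell><fo:block>Empire Burlesque</fo:block></fo:table-cell>
        <fo:table-cell><fo:block>10.90</fo:block></fo:table-cell>
      </fo:table-row>
    </fo:table-body>
  </fo:table>
</fo:table-and-caption>
```

Support for <fo:table-and-caption> may vary between processors. Where it is not available, the <fo:table> element can be used directly, with the caption written as an ordinary <fo:block> before it.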
Self Assessment Questions
5. XSL-FO is an XML-based markup language describing the formatting of
XML data for output to ____ and _____.
6. XSL-FO documents contain information about ____ and ____.
6.3 Summary
XSLT is a language for transforming XML documents into XHTML
documents or to other XML documents.
XSL-FO is an XML-based markup language describing the formatting of
XML data for output to screen, paper or other media.
An XPath query operates on a namespace well-formed XML document
after it has been parsed into a tree structure. XSL-FO uses page
templates called "Page Masters" to define the layout of pages.
6.4 Terminal Questions
1. Explain XSLT templates.
2. How do you declare an XSL stylesheet?
3. Explain the use of xsl:element.
4. Explain the XSL-FO namespace.
5. Explain the page format specifiers in XSL-FO.
6.5 Answers
Self Assessment Questions
1. XSLT
2. XPath
3. output
4. xsl:for-each
5. screen, paper
6. output layout, output contents.
Terminal Questions
1. XSLT is a language for transforming XML documents into XHTML
documents or to other XML documents. (Refer section 6.2)
2. XSL documents must conform to the rules of any other XML document,
in that the syntax of the document must be well-formed, such as the
proper nesting of tags, no unclosed tags, etc. (Refer section 6.2)
3. The <xsl:element> element is used to create an element node in the
output document. (Refer section 6.2)
4. XSL-FO documents are XML files with output information. (Refer section
6.2.2)
5. XSL-FO uses page templates called "Page Masters" to define the layout
of pages. (Refer section 6.2.4)
Acknowledgements, References and Suggested Readings
1. Goodman Danny, The JavaScript Bible
2. Nakhimovsky Alexander and Myers Tom. XML Programming
3. Sebesta Robert, Programming the World Wide Web, 3rd Edition.
Pearson Education
4. Online media