Web Engineering Unit I (as per RGPV syllabus)
TRANSCRIPT
Unit I / Web Engineering, Truba College of Science & Technology, Bhopal
Prepared By: Ms. Nandini Sharma (Dept. CSE)
It is important to know that the World Wide Web is not a synonym for the Internet. The
World Wide Web, or just "the Web," as ordinary people call it, is a subset of the Internet. The
Web consists of pages that can be accessed using a Web browser. The Internet is the actual
network of networks where all the information resides. Things like Telnet, FTP, Internet
gaming, Internet Relay Chat (IRC), and e-mail are all part of the Internet, but are not part of
the World Wide Web. The Hyper-Text Transfer Protocol (HTTP) is the method used to
transfer Web pages to your computer. With hypertext, a word or phrase can contain a link to
another Web site. All Web pages are written in the hyper-text markup language (HTML),
which works in conjunction with HTTP.
Introduction to TCP/IP
TCP/IP is made up of two acronyms, TCP, for Transmission Control Protocol, and IP,
for Internet Protocol. TCP handles packet flow between systems and IP handles the routing of
packets. However, that is a simplistic answer that we will expound on further.
All modern networks are now designed using a layered approach. Each layer presents a
predefined interface to the layer above it. By doing so, a modular design can be developed so
as to minimize problems in the development of new applications or in adding new interfaces.
The ISO/OSI protocol with seven layers is the usual reference model. Since TCP/IP was
designed before the ISO model was developed, it has four layers; however, the differences
between the two are mostly minor. Below is a comparison of the TCP/IP and OSI protocol
stacks:
OSI Protocol Stack
7. Application -- End user services such as email.
6. Presentation -- Data representation and data compression
5. Session -- Authentication and authorization
4. Transport -- Guarantee end-to-end delivery of packets
3. Network -- Packet routing
2. Data Link -- Transmit and receive packets
1. Physical -- The cable or physical connection itself.
TCP/IP Protocol Stack
4. Application -- Authentication, compression, and end-user services.
3. Transport -- Handles the flow of data between systems and provides access to the
network for applications (via the BSD socket library).
2. Network -- Packet routing
1. Link -- Kernel OS/device driver interface to the network interface on the computer.
Below are the major differences between OSI and TCP/IP:
The application layer in TCP/IP handles the responsibilities of layers 5, 6, and 7 in the
OSI model.
The transport layer in TCP/IP does not always guarantee reliable delivery of packets
as the transport layer in the OSI model does. TCP/IP offers an option called UDP that
does not guarantee reliable packet delivery.
Software Components of TCP/IP
Application Layer
Some of the applications we will cover are SMTP (mail), Telnet, FTP, Rlogin, NFS,
NIS, and LPD.
Transport Layer
The transport layer uses two protocols, UDP and TCP. UDP, which stands for User
Datagram Protocol, does not guarantee packet delivery, and applications that use it
must provide their own means of verifying delivery. TCP does guarantee delivery of
packets to the applications that use it.
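The difference can be seen directly with the BSD socket interface: a UDP datagram is fire-and-forget, while a TCP connection handles acknowledgement and retransmission itself. A minimal loopback sketch in Python:

```python
import socket

# UDP: connectionless. The protocol gives no delivery guarantee, so an
# application that needs one must add its own acknowledgements.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))            # pick any free port
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello via UDP", addr)      # fire-and-forget datagram
data, _ = receiver.recvfrom(1024)
print(data)                                # b'hello via UDP'
sender.close(); receiver.close()

# TCP: connection-oriented. The stack itself acknowledges and
# retransmits, so delivery is guaranteed while the connection is up.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
client = socket.create_connection(server.getsockname())
conn, _ = server.accept()
client.sendall(b"hello via TCP")
tcp_data = conn.recv(1024)
print(tcp_data)                            # b'hello via TCP'
client.close(); conn.close(); server.close()
```

Both transfers work here because they run over the loopback interface; on a real, lossy network only the TCP message would be retransmitted automatically.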
Network Layer
The network layer is concerned with packet routing and uses low-level protocols such
as ICMP, IP, and IGMP. In addition, routing protocols such as RIP, OSPF, and EGP
will be discussed.
Link Layer
The link layer is concerned with the actual transmittal of packets as well as IP-to-
Ethernet address translation. This layer is concerned with ARP, RARP, and the device
driver.
WAP
In 1997, several companies organized an industry group called the WAP Forum. This group
produces the WAP specification, a (long and detailed) series of technical documents that
define standards for implementing wireless network applications. Hundreds of industry firms
have given strong backing to the WAP Forum, so the technology should become widely
adopted, and it is already well-hyped.
WAP specifies an architecture based on layers that follows the OSI model fairly closely. The
WAP model, or stack as it is commonly known, is illustrated below.
The WAP Model
Application Layer
WAP's application layer is the Wireless Application Environment (WAE). WAE directly
supports WAP application development with Wireless Markup Language (WML) instead of
HTML and WML Script instead of JavaScript. WAE also includes the Wireless Telephony
Application Interface (WTAI, or WTA for short) that provides a programming interface to
telephones for initiating calls, sending text messages, and other networking capability.
Session Layer
WAP's session layer is the Wireless Session Protocol (WSP). WSP is the equivalent to HTTP
for WAP browsers. WAP involves browsers and servers just like the Web, but HTTP was not
a practical choice for WAP because of its relative inefficiency on the wire. WSP conserves
precious bandwidth on wireless links; in particular, WSP works with relatively compact
binary data where HTTP works mainly with text data.
Transaction, Security, and Transport Layers
These three protocols can be thought of as "glue layers" in WAP:
Wireless Transaction Protocol (WTP)
Wireless Transaction Layer Security (WTLS)
Wireless Datagram Protocol (WDP)
WTP provides transaction-level services for both reliable and unreliable transports. It
prevents duplicate copies of packets from being received by a destination, and it supports
retransmission, if necessary, in cases where packets are dropped. In this respect, WTP is
analogous to TCP. However, WTP also differs from TCP. WTP is essentially a pared-down
TCP that squeezes some extra performance from the network.
WTLS provides authentication and encryption functionality analogous to Secure Sockets
Layer (SSL) in Web networking. Like SSL, WTLS is optional and used only when the
content server requires it.
WDP implements an abstraction layer to lower-level network protocols; it performs functions
similar to UDP. WDP is the bottom layer of the WAP stack, but it does not implement
physical or data link capability. To build a complete network service, the WAP stack must be
implemented on some low-level legacy interface not technically part of the model. These
interfaces, called bearer services or bearers, can be IP-based or non-IP based.
Bearer Interfaces
WAP supports dial-up networking using IP and Point-to-Point Protocol (PPP) as the bearer
interface underneath WDP. It also supports Short Message Service (SMS) and General
Packet Radio System (GPRS). SMS passes text and binary data between digital phones.
GPRS is a relatively new technology that implements faster, "always-on" connections for
wireless devices; GPRS actually runs on top of IP.
Domain Name System (DNS) enables you to use hierarchical, friendly names to easily
locate computers and other resources on an IP network. The following sections describe the
basic DNS concepts, including features explained in newer Requests for Comments (RFCs),
such as dynamic update, from the Internet Engineering Task Force (IETF). The
Microsoft® Windows® 2000–specific implementation of DNS is not covered within this
chapter, except where indicated.
For information about the Windows 2000 implementation of DNS, see "Windows 2000
DNS" in this book.
DNS is a distributed database that contains mappings of DNS domain names to data. It is also
a protocol for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, defined
by the Requests for Comments (RFCs) that pertain to DNS. DNS defines the following:
Mechanism for querying and updating the database.
Mechanism for replicating the information in the database among servers.
Schema for the database.
DNS servers store information about no zones, one zone, or multiple zones. When a
DNS server receives a DNS query, it attempts to locate the requested information by
retrieving data from its local zones. If this fails because the server is not authoritative
for the DNS domain requested and thus does not have the data for the requested
domain, the server can check its cache, communicate with other DNS servers to
resolve the request, or refer the client to another DNS server that might know the
answer.
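The resolution step described above is what the standard resolver library performs on the client's behalf. A small sketch in Python, using "localhost", which is resolved locally, so no DNS server is needed to run it:

```python
import socket

# Resolve a friendly name to an address. For a real domain this would
# consult the configured DNS servers, which search their zones and
# caches as described above; "localhost" resolves without the network.
addr = socket.gethostbyname("localhost")
print(addr)                                 # 127.0.0.1

# getaddrinfo is the fuller interface: one tuple per (family, type)
# combination the resolver knows for the name and service.
for family, type_, proto, canon, sockaddr in socket.getaddrinfo(
        "localhost", 80, type=socket.SOCK_STREAM):
    print(family, sockaddr)
```

Replacing "localhost" with any registered domain name exercises the full query path, including referral and caching on the servers involved.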
DNS servers can host primary and secondary zones. You can configure servers to host
as many different primary or secondary zones as is practical, which means that a
server might host the primary copy of one zone and the secondary copy of another
zone, or it might host only the primary or only the secondary copy for a zone. For
each zone, the server that hosts the primary zones is considered the primary server for
that zone, and the server that hosts the secondary zones is considered the secondary
server for that zone.
Primary zones are locally updated. When a change is made to the zone data, such as
delegating a portion of the zone to another DNS server or adding resource records in
the zone, these changes must be made on the primary DNS server for that zone, so
that the new information can be entered in the local zone.
In contrast, secondary zones are replicated from another server. When a zone is
defined on a secondary server for that zone, the zone is configured with the IP address
of the server from which the zone is to be replicated. The server from which the zone
file replicates can either be a primary or secondary server for the zone, and is
sometimes called a master server for the secondary zone.
When a secondary server for the zone starts up, it contacts the master server for the
zone and initiates a zone transfer. The secondary server for the zone also periodically
contacts the master server for the zone to see whether the zone data has changed. If
so, it can initiate a transfer of the zones, referred to as a zone transfer. For more
information about zone transfers, see "Zone Transfer" later in this chapter.
You must have a primary server for each zone. Additionally, you should have at least
one secondary server for each zone. Otherwise, if the primary server for the zone goes
down, no one will be able to resolve the names in that zone.
Secondary servers provide the following benefits:
Fault tolerance When a secondary server is configured for a zone, clients can still
resolve names for that zone even if the primary server for the zone goes down.
Generally, plan to install the primary and secondary servers for the zone on different
subnets. Therefore, if connectivity to one subnet is lost, DNS clients can still direct
queries to the name server on the other subnet.
Reduction of traffic on wide area links You can add a secondary server for the
zone in a remote location that has a large number of clients, and then configure the
client to try those servers first. This can prevent clients from communicating across
slow links for DNS queries.
Reduction of load on the primary server for the zone The secondary server can
answer queries for the zone, reducing the number of queries the primary server for the
zone must answer.
Electronic mail (e-mail) is one of the uses of the World Wide Web that, according to most
businesses, improves productivity. Traditional methods of sending mail within an office
environment are inefficient, as they normally require an individual to ask a secretary to type
the letter. This must then be proof-read and sent through the internal mail system, which is
relatively slow and can be open to security breaches.
A faster and more secure method of sending information is electronic mail, whereby a
computer user can exchange messages with other computer users (or groups of users) via a
communications network. Electronic mail is one of the most popular uses of the Internet. For
example, a memo with 100 words will be sent in a fraction of a second. Other types of data
can also be sent with the mail message, such as images, sound, and so on.
The main standards that relate to the protocols of email transmission and reception are:
Simple Mail Transfer Protocol (SMTP) - which is used with the TCP/IP protocol
suite. It has traditionally been limited to text-based electronic messages.
Multipurpose Internet Mail Extension (MIME) - which allows the transmission and
reception of mail that contains various types of data, such as speech, images, and
motion video. It is a newer standard than SMTP and uses much of its basic protocol.
S/MIME (Secure MIME) - RSA Data Security created S/MIME, which supports
encrypted e-mail transfer and digitally signed electronic mail.
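As a sketch of what MIME adds on top of plain text mail, Python's standard email package can assemble a multipart message; the addresses and attachment bytes below are made up for illustration:

```python
from email.message import EmailMessage

# A minimal MIME message. The addresses are placeholders.
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Quarterly report"
msg.set_content("The report is attached.")      # text/plain part

# Adding a non-text part turns the message into multipart/mixed,
# which is exactly what MIME layers on top of plain RFC 822 mail.
msg.add_attachment(b"\x89PNG fake image bytes",
                   maintype="image", subtype="png",
                   filename="chart.png")

print(msg.get_content_type())                   # multipart/mixed
```

A message built this way would then be handed to an SMTP message transfer agent (e.g. via smtplib) for delivery to the recipient's post office.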
A typical email-architecture contains four elements:
1. Post offices- where outgoing messages are temporarily buffered (stored) before
transmission and where incoming messages are stored. The post office runs the server
software capable of routing messages (a message transfer agent) and maintaining the post
office database.
2. Message transfer agents- for forwarding messages between post offices and to the
destination clients. The software can either reside on the local post office or on a physically
separate server.
3. Gateways- which provide parts of the message transfer agent functionality. They translate
between different e-mail systems, different e-mail addressing schemes and messaging
protocols.
4. E-mail clients- normally the computer which connects to the post office. It contains three
parts:
E-mail Application Program Interface (API), such as MAPI, VIM, MHS and CMC.
Messaging protocol. The main messaging protocols are SMTP and X.400. SMTP is
defined in RFC 821 and RFC 822, whereas X.400 is an OSI-defined e-mail message
delivery standard.
Network transport protocol, such as Ethernet, FDDI, and so on.
Post Office Protocol (POP)
The Post Office Protocol was first defined in RFC 918, but has since been replaced with POP-3,
which is defined in RFC 1939. The objective of POP is to create a standard method for users to access
a mail server: mail messages are uploaded onto a mail server using SMTP, and then downloaded
using POP. With POP the server listens for a connection, and when one occurs the server sends a
greeting message and waits for a command. The standard port reserved for POP transactions is 110.
Like SMTP, it consists of case-insensitive commands with one or more arguments, followed by the
Carriage Return (CR) and Line Feed (LF) characters, typically represented as CRLF. These keywords
are either three or four characters long.
The client opens the connection by sending a USER and a PASS command, for the user name and
password, respectively. If successful, this gives access to the mailbox on the server. The client
can then read the messages with the following commands:
RDEL. Reads and deletes all the messages from the mailbox.
RETR. Reads the messages from the mailbox, and keeps them on the server.
The commands and responses for POP can be summarized as:

Command        Description                                         Possible responses
USER name      Define the name of the user                         +OK, -ERR
PASS password  Define the password for the user                    +OK, -ERR
RETR mailbox   Begins a mail reading transaction, and keeps the    +Val, -ERR
               messages on the server once they have been
               transferred.
RDEL mailbox   Begins a mail reading transaction, and deletes the  +Val, -ERR
               messages once they have been transferred. The
               messages are not actually deleted until an RCEV
               command.
RCEV           Acknowledges the reception of the mail messages     +OK, or aborted connection
RCVD           Confirms that the client has received the mail      +OK, -ERR
               messages
QUIT           Client wishes to end the session                    +OK, then close
NOOP           No operation, but prompts the mail server for an    +OK
               OK response
RSET           Sent by the client to inform the server to abort    +OK
               the current transaction
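A POP-3 login like the USER/PASS exchange above can be driven with Python's standard poplib client. The server below is a toy stand-in that speaks just enough of the protocol to answer; real servers listen on port 110:

```python
import poplib
import socket
import threading

# Minimal in-process POP3 server: greeting, USER, PASS, STAT, QUIT.
def fake_pop3_server(listener):
    conn, _ = listener.accept()
    f = conn.makefile("rwb")
    f.write(b"+OK toy POP3 server ready\r\n"); f.flush()
    while True:
        line = f.readline().strip().upper()
        if line.startswith(b"USER") or line.startswith(b"PASS"):
            f.write(b"+OK\r\n")
        elif line.startswith(b"STAT"):
            f.write(b"+OK 0 0\r\n")            # 0 messages, 0 octets
        elif line.startswith(b"QUIT"):
            f.write(b"+OK bye\r\n"); f.flush()
            break
        else:
            f.write(b"-ERR unsupported\r\n")
        f.flush()
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))                # ephemeral port instead of 110
listener.listen(1)
threading.Thread(target=fake_pop3_server, args=(listener,),
                 daemon=True).start()

client = poplib.POP3("127.0.0.1", listener.getsockname()[1])
client.user("alice")                  # USER alice  -> +OK
client.pass_("secret")                # PASS secret -> +OK
count, size = client.stat()           # STAT        -> +OK 0 0
print(count, size)                    # 0 0
client.quit()                         # QUIT        -> +OK
```

Every exchange follows the command/response shape in the table: the client sends a keyword plus arguments terminated by CRLF, and the server answers +OK or -ERR.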
Telnet
Telnet is a network protocol used on the Internet or local area networks to provide a bidirectional
interactive text-oriented communication facility using a virtual terminal connection. User data is
interspersed in-band with Telnet control information in an 8-bit byte oriented data connection over
the Transmission Control Protocol (TCP).
Telnet is a client-server protocol, based on a reliable connection-oriented transport. Typically, this
protocol is used to establish a connection to Transmission Control Protocol (TCP) port number 23,
where a Telnet server application (telnetd) is listening. Telnet, however, predates TCP/IP and was
originally run over Network Control Program (NCP) protocols.
Before March 5, 1973, Telnet was an ad hoc protocol with no official definition. Essentially, it used
an 8-bit channel to exchange 7-bit ASCII data. Any byte with the high bit set was a special Telnet
character. On March 5, 1973, a Telnet protocol standard was defined at UCLA with the publication
of two NIC documents: Telnet Protocol Specification, NIC #15372, and Telnet Option Specifications,
NIC #15373.
Telnet Protocol
TELNET is a standard protocol. Its status is recommended.
It is described in RFC 854 - TELNET Protocol Specifications and RFC 855 -
TELNET Option Specifications.
Telnet was the first application demonstrated on the four-IMP (Interface Message
Processor) network installed by December 1969. The final edition took 14 more years
to develop, culminating in Internet Standard #8 in 1983, three years after the final
TCP specification was ratified.
Telnet even predates internetworking and the modern IP packet and TCP transport
layers.
The TELNET protocol provides a standardized interface, through which a program on
one host (the TELNET client) may access the resources of another host (the TELNET
server) as though the client were a local terminal connected to the server.
For example, a user on a workstation on a LAN may connect to a host attached to the
LAN as though the workstation were a terminal attached directly to the host. Of
course, TELNET may be used across WANs as well as LANs.
Most TELNET implementations do not provide you with graphics capabilities.
TELNET Overview
TELNET is a general protocol, meant to support logging in from almost any type of
terminal to almost any type of computer.
It allows a user at one site to establish a TCP connection to a login server or terminal
server at another site.
A TELNET server generally listens on TCP Port 23.
How it works
A user is logged in to the local system, and invokes a TELNET program (the
TELNET client) by typing
telnet xxx.xxx.xxx
where xxx.xxx.xxx is either a host name or an IP address.
The TELNET client is started on the local machine (if it isn't already running). That
client establishes a TCP connection with the TELNET server on the destination
system.
Once the connection has been established, the client program accepts keystrokes from
the user and relays them, generally one character at a time, to the TELNET server.
The server on the destination machine accepts the characters sent to it by the client,
and passes them to a terminal server.
A "terminal server" is just some facility provided by the operating system for entering
keystrokes from a user's keyboard.
o The terminal server treats the remote user as it would any other user logged in
to the system, including relaying commands to other applications.
o The terminal server passes outputs back to the TELNET server, which relays
them to the client, which displays them on the user's screen.
In general, a TELNET server is implemented as a master server with some number of
slave servers. The master server listens for service requests from clients. When it
hears one, it spawns a slave server to handle that specific request, while the master
goes back to listening for more requests.
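The master/slave structure can be sketched in Python. A real telnetd would attach each slave to a login shell; this toy slave simply echoes what it receives:

```python
import socket
import threading

# Slave: handles one established connection, echoing "keystrokes" back.
def slave(conn):
    with conn:
        while data := conn.recv(1024):
            conn.sendall(data)

# Master: listens for service requests; for each one it spawns a slave
# (here a thread) and immediately goes back to listening.
def master(listener, rounds):
    for _ in range(rounds):
        conn, addr = listener.accept()
        threading.Thread(target=slave, args=(conn,), daemon=True).start()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))        # a real telnetd would use port 23
listener.listen(5)
threading.Thread(target=master, args=(listener, 1), daemon=True).start()

# Act as the client: connect and send a line as a user would type it.
client = socket.create_connection(listener.getsockname())
client.sendall(b"ls\r\n")
reply = client.recv(1024)
print(reply)                           # b'ls\r\n'
client.close()
```

The point of the structure is that the master is never tied up by any single session, so new clients can always connect.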
The only thing that makes TELNET hard to implement is the heterogeneity of the
terminals and operating systems that must be supported. Not all of them use the same
control characters for the same purposes.
To accommodate this heterogeneity, TELNET defines a Network Virtual Terminal
(NVT). Any user telnetting in to a remote site is deemed to be on an NVT,
regardless of the actual terminal type being used.
It is the responsibility of the client program to translate user keystrokes from the
actual terminal type into NVT format, and of the server program to translate NVT
characters into the format needed by the destination host. For data sent back from the
destination host, the translation is the reverse.
NVT format defines all characters to be 8 bits (one byte) long. At startup, 7 bit US
ASCII is used for data; bytes with the high order bit = 1 are command sequences.
The 128 7-bit long US ASCII characters are divided into 95 printable characters and
33 control codes. NVT maps the 95 printable characters into their defined values -
decimal 65 = "A", decimal 97 = "a", etc.
The 33 control codes are defined for NVT as:
ASCII code   Decimal value   Meaning
NUL          0               No operation
BEL          7               Ring the "terminal bell"
BS           8               Backspace; move cursor left
HT           9               Horizontal tab; move cursor right
LF           10              Line feed; move down one line, stay in same column
VT           11              Vertical tab; move cursor down
FF           12              Form feed
CR           13              Carriage return; move cursor to beginning of current line
all others                   No operation
NVT defines end-of-line to be a CR-LF combination - the two-character sequence.
In addition to the 128 characters mentioned above, there are 128 other possible
characters in an 8-bit encoding scheme. NVT uses these 128 (with decimal values 128
through 255, inclusive) to pass control functions from client to server. More on this
later.
TELNET Operation
The TELNET protocol is based on three ideas:
o The Network Virtual Terminal (NVT) concept. An NVT is an imaginary
device having a basic structure common to a wide range of real terminals.
Each host maps its own terminal characteristics to those of an NVT, and
assumes that every other host will do the same.
o A symmetric view of terminals and processes.
o Negotiation of terminal options. The principle of negotiated options is used by
the TELNET protocol, because many hosts wish to provide additional
services, beyond those available with the NVT. Various options may be
negotiated. Server and client use a set of conventions to establish the
operational characteristics of their TELNET connection via the ``DO, DON'T,
WILL, WON'T'' mechanism discussed later in this document.
The two hosts begin by verifying their mutual understanding. Once this initial
negotiation is complete, they are capable of working on the minimum level
implemented by the NVT.
After this minimum understanding is achieved, they can negotiate additional options
to extend the capabilities of the NVT to reflect more accurately the capabilities of the
real hardware in use.
Because of the symmetric model used by TELNET, both the host and the client may
propose additional options to be used.
The set of options is not part of the TELNET protocol, so that new terminal features
can be incorporated without changing the TELNET protocol (mouse?).
All TELNET commands and data flow through the same TCP connection.
Commands start with a special character called the Interpret as Command escape
character (IAC).
The IAC code is 255.
If a 255 is sent as data, it must be followed by another 255.
Each receiver must look at each byte that arrives and look for IAC. If IAC is found
and the next byte is IAC - a single byte is presented to the application/terminal.
If IAC is followed by any other code - the TELNET layer interprets this as a
command.
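The IAC doubling rule can be sketched in a few lines of Python (a data-only stream is assumed; a real implementation would also dispatch IAC followed by a command byte to a command handler):

```python
IAC = 255  # Interpret As Command escape character

def escape_data(payload: bytes) -> bytes:
    """Sender side: double every 255 byte in the application data so
    the receiver does not mistake it for the start of a command."""
    return payload.replace(bytes([IAC]), bytes([IAC, IAC]))

def unescape_data(stream: bytes) -> bytes:
    """Receiver side: collapse IAC IAC back to a single data byte.
    (Assumes the stream carries only data, no real commands.)"""
    return stream.replace(bytes([IAC, IAC]), bytes([IAC]))

data = b"abc\xff def"
wire = escape_data(data)
print(wire)                           # b'abc\xff\xff def'
print(unescape_data(wire) == data)    # True
```

This is why commands and data can safely share the single TCP connection: a lone IAC always announces a command, never data.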
HyperText Transfer Protocol (HTTP) is a set of standards that allows users of the World
Wide Web to exchange information found on web pages. To access a web page, enter http://
in front of the web address, which tells the browser to communicate over HTTP. For example,
the full URL for Computer Hope is http://www.computerhope.com. Today's modern browsers
no longer require http:// in front of the URL since it is the default method of communication.
However, it is still used in browsers because of the need to access other protocols, such as
FTP, through the browser. Below are a few of the major facts about HTTP.
The term hypertext, from which HTTP takes its name, was coined by Ted Nelson; HTTP itself was developed by Tim Berners-Lee.
HTTP commonly utilizes port 80, 8008, or 8080.
HTTP/0.9 was the first version of HTTP and was introduced in 1991.
HTTP/1.0 is specified in RFC 1945, introduced in 1996.
HTTP/1.1 was first specified in RFC 2068 in January 1997 and later revised in RFC 2616 (June 1999).
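As a sketch of what actually travels over port 80, the request a browser sends for http://www.computerhope.com looks like this (the commented lines show how it would be sent over a plain TCP socket; network access assumed):

```python
# An HTTP/1.1 request is plain text: a request line, headers, and a
# blank line that ends the header section.
host = "www.computerhope.com"
request = (
    f"GET / HTTP/1.1\r\n"
    f"Host: {host}\r\n"          # required in HTTP/1.1
    f"Connection: close\r\n"
    f"\r\n"                      # blank line ends the headers
)
print(request)

# To send it for real:
# import socket
# sock = socket.create_connection((host, 80))
# sock.sendall(request.encode("ascii"))
# print(sock.recv(4096).decode("latin-1"))   # status line + headers
```

The server's reply has the same shape in reverse: a status line (e.g. "HTTP/1.1 200 OK"), headers, a blank line, then the HTML page itself.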
HTTPS
Short for Hypertext Transfer Protocol Secure, HTTPS is a secure method of accessing or
sending information across a web page. All data sent over HTTPS is encrypted before it is
sent, which prevents anyone who intercepts it from understanding that information. Because
data is encrypted over HTTPS, it is slower than HTTP, which is why HTTPS is traditionally
used only when login information is required or with pages that contain sensitive information,
such as an online bank web page.
HTTPS uses port 443 to transfer its information.
HTTPS is defined in RFC 2818 (HTTP Over TLS).
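A sketch of the TLS side of HTTPS with Python's standard ssl module: the context below is what a client wraps its port-443 socket with (the commented lines assume network access and use a placeholder host):

```python
import ssl

# HTTPS is ordinary HTTP carried over a TLS-encrypted socket on
# port 443. The standard library's default context enables both
# certificate checking and host-name verification.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)   # True
print(context.check_hostname)                      # True

# To fetch a page over HTTPS:
# import socket
# with socket.create_connection(("www.example.com", 443)) as raw:
#     with context.wrap_socket(raw, server_hostname="www.example.com") as tls:
#         tls.sendall(b"GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n")
#         print(tls.recv(2048))
```

The HTTP request itself is unchanged; only the socket underneath is encrypted, which is where the extra cost relative to plain HTTP comes from.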
File Transfer Protocol is a protocol through which internet users can upload files from their
computers to a website or download files from a website to their PCs. Originated by Abhay
Bhushan in 1971 for use in the military and scientific research network known as ARPANET,
FTP has evolved into a protocol for far wider applications on the World Wide Web with
numerous revisions throughout the years.
FTP is the easiest way to transfer files between computers via the internet, and it uses
TCP (Transmission Control Protocol) and IP (Internet Protocol) to perform uploading
and downloading tasks.
How It Works
TCP and IP are the two major protocols that keep the internet running smoothly. TCP
manages data transfer while IP directs traffic to internet addresses. FTP runs on top of
TCP and shuttles files back and forth between an FTP server and an FTP client. Because
FTP requires that two ports be open--the server's and the client's--it facilitates the
exchange of large files of information.
First, you as the client make a TCP control connection to the FTP server's port 21, which
remains open during the transfer process. In response, the FTP server opens a second
connection, the data connection, from the server's port 20 to your computer.
Using the standard active mode of FTP, your computer tells the server the port number
where it will stand by to receive information and the IP address--internet location--from
which or to which you want files to be transferred.
If you are using a public--or anonymous--FTP server, you will not need sign-in
credentials to make a file transfer, but you may be asked to enter your email address.
If you are using a private FTP server, however, you must sign in with a user name and
password to initiate the exchange of data.
Modes of File Transfer
Three modes of transferring data are available via FTP. The system can use a stream
mode, in which it transfers files as a continuous stream from port to port with no
intervention or processing of information into different formats. For example, in a transfer
of data between two computers with identical operating systems, FTP does not need to
modify the files.
In block mode, FTP divides the data to be transferred into blocks of information, each
with a header, byte count, and data field. In the third mode of transfer, the compressed
mode, FTP compresses the files by encoding them. Often these modifications of data are
necessary for successful transfer because the file sender and file receiver do not have
compatible data storage systems.
Passive FTP
Should your computer have firewall protection, you may have difficulties using FTP. A
firewall protects your PC by preventing internet sites from initiating file transfers. You
can circumvent your firewall's function by using the PASV command that reverses the
FTP process, allowing your computer to initiate the transfer request.
Many corporate networks use PASV FTP as a security measure to protect their internal
network from assaults of unwanted external files. Also called passive FTP, the process
requires that any transfer of information from the internet or other external source must be
initiated by the client or private network rather than the external source.
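A sketch of such a session with Python's standard ftplib client; the host name is a placeholder, so the network calls are left commented out and only the client object is created:

```python
from ftplib import FTP

# ftplib speaks the FTP control protocol (port 21) for you.
ftp = FTP()
ftp.set_pasv(True)   # passive mode: the client initiates the data
                     # connection, which is what firewalls usually need

# ftp.connect("ftp.example.com", 21)        # control connection, port 21
# ftp.login("anonymous", "me@example.com")  # USER / PASS exchange
# ftp.retrlines("LIST")                     # listing arrives on the data connection
# with open("file.txt", "rb") as f:
#     ftp.storbinary("STOR file.txt", f)    # upload
# ftp.quit()

print(ftp.passiveserver)   # True: PASV will be used for transfers
```

With set_pasv(True) the client sends the PASV command described above, so every data connection is opened outward from behind the firewall rather than inward from the server.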
Further FTP Security
In response to the need for a more secure transfer process for sensitive information such
as financial data, Netscape developed the Secure Sockets Layer (SSL) protocol in 1994,
used primarily to secure HTTP--HyperText Transfer Protocol--transmissions from
tampering and eavesdropping. The industry subsequently applied this security protocol to
FTP transfers, developing FTPS (FTP over SSL), a file transfer protocol armoured with
SSL for protection from hackers.
Introduction of Browsing
A browser is a program on your computer that enables you to search ("surf") and retrieve
information on the World Wide Web (WWW), which is part of the Internet. The Web is
simply a large number of computers linked together in a global network that can be
accessed using an address (URL, Uniform Resource Locator,
e.g. http://www.veths.no for the Oslo Veterinary School), in the same way that you can
phone anyone in the world given their telephone number.
URLs are often long and therefore easy to type incorrectly. They all begin with http://,
and many (but not all) begin with http://www. In many cases the first part (http://, or
even http://www.) can be omitted, and you will still be able to access the page. Try this
with http://www.cnn.com. URLs are constructed in a standard fashion. This may be of
use to you. Take, for example, the address of this page
http://oslovet.veths.no/teaching/internet/basics.html
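The standard parts of that address can be pulled apart with Python's urllib.parse:

```python
from urllib.parse import urlparse

# Breaking the example address into its standard components.
url = "http://oslovet.veths.no/teaching/internet/basics.html"
parts = urlparse(url)
print(parts.scheme)   # http  -> the protocol the browser should use
print(parts.netloc)   # oslovet.veths.no  -> the server's host name
print(parts.path)     # /teaching/internet/basics.html  -> the page on that server
```

Every URL follows this scheme://host/path pattern, which is why you can often guess your way around a site by editing the path by hand.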
Before you conduct a search, it is important to consider, among others, the following
points:
1. Is your choice of search term adequate, too restrictive, or too general?
2. Is the search you have planned to undertake most suited for a search engine that
categorizes web sites, so that you can browse through appropriate subcategories when the
first results are returned?
3. Are you more interested in using a search engine that merely returns all the web pages
it has found containing the search term?
4. Have you read the Search Help pages that most search pages offer? These will tell
you how the search engine conducts the search, and therefore how you ought to plan your
search.
5. Bear in mind the fact that engines differ in their coverage of the Internet, their speed
and whether they are largely compiled manually by people or automatically by 'robots'
that scan the Internet.
Introduction to Search Engines
A search engine is a web site that collects and organizes content from all over the
internet. Those wishing to locate something would enter a query about what they'd like to
find and the engine provides links to content that matches what they want.
As regards real estate, millions of searches are done each day on multiple search engines
for search queries like "Atlanta real estate", or "atlanta real estate". The search engine
returns results for real estate related sites and content for the Atlanta, GA area in this case.
The sites are ranked by highly secret and complex formulas. These formulas are also
changed frequently by the engines.
Though there are many who attempt to manipulate their sites to get higher placement in
results, it's generally best to provide highly relevant real estate and area content and make
it very useful for your site visitors. As all the engines are striving to be the most popular
based on results that are closest to what the searcher is looking for, it can only be a good
strategy to provide that relevant content.
There are two main kinds of search services commonly used on the Web: the index, and
the directory or subject guide. One way to think of the differences between these two
kinds of engines is to think of web sites as books. Indexes will catalogue every word in
every book they look at, and will list for you each page that contains the word(s) you're
looking for. Directories and subject guides take the overall subject matter of the books
they look at and list the front covers of the books that match your word(s).
Indexes
Alta Vista and HotBot are both popular search indexes. Indexes regularly scan the Internet
for Web pages and record the HTML content and key words. They also have the ability to
follow any links associated with scanned pages and get even more information.
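The link-following step can be sketched in a few lines of Python; the page content below is made up for illustration, and a real spider would first fetch it over HTTP:

```python
from html.parser import HTMLParser

# Sketch of how a spider finds more pages to scan: it reads the HTML
# of a page and records every link (<a href="..."> tag) it contains.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# Hypothetical page content for illustration.
page = '<p>See <a href="/about.html">about</a> and <a href="/spiders.html">spiders</a>.</p>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/about.html', '/spiders.html']
```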
The job of compiling data for indexes is done by spiders (also called robots, bots,
or crawlers, hence the names HotBot and WebCrawler): software programmed by a human
to automatically gather information from all over the 'Net based on specific or broad
search criteria. Most of the time spiders scan pages on the fly, without the owner's
knowledge or consent (if you don't want some or all of your web pages scanned by
spiders, you can write some HTML into your page to keep them out).
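Besides that HTML hint in the page itself, the usual exclusion mechanism is a robots.txt file on the server. A sketch of how a well-behaved spider would honour it, with hypothetical rules, bot name, and site address:

```python
from urllib import robotparser

# Hypothetical robots.txt rules; a real spider downloads them from
# http://example.com/robots.txt before scanning any page on the site.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A polite spider checks each URL before fetching it.
print(rp.can_fetch("MyBot", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("MyBot", "http://example.com/public/page.html"))   # True
```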
The advantage of this kind of service is that their databases are very large and updated
often by spiders working around the clock. They catalogue Web pages in a computational
manner without human intervention. A search engine's spider catalogues all the pages of a
given web site, listing for you only the pages that match the words or phrases you're
searching for.
For instance, if you're looking for information about spiders, you'll get over thirty-nine
thousand hits (links to a Web page) from Alta Vista with the word spiders in them. This
means not only will you get pages referencing Internet robots, you'll mostly get the eight-
legged, living-in-your-shoe-and-going-to-bite-you kind of spider.
A drawback to using services of this type is that sifting through so many hits to find what
you're looking for is sometimes a daunting task. Some indexes include a number of
options you can utilize to help narrow down your search criteria, such as search for this
exact phrase or search for any of the words on HotBot.
Directories/Subject Guides
On the other hand, Yahoo! and Magellan are hierarchical directories of web page
subjects. Each reference is entered and updated by a person manually, placing each web
address in a certain context much like your telephone company's Yellow Page directory.
People catalog the sites in a directory, so the hits often include reviews and/or
recommendations, which can guide you through the content of the pages quicker and
more easily.
To have a Web site listed in a directory you must submit it yourself, or you can hire a
company to do it for you. The directory has the last word on where it catalogs your site.
This means directories contain far fewer sites than indexes do, but they are better targeted
to the word(s) you use to search.
For example, if you enter the same keyword spiders in Yahoo!, this time you'll get a
list of categories like Science: Zoology: Animals, Insects, and Pets:
Arachnids or Computers and Internet: Internet: World Wide Web: Searching the Web:
Robots, Spiders, etc. Documentation, which can narrow and shorten your search
significantly. You'll get fewer hits overall, and hits on pages with headings and content
within the context of the keywords you enter.
One drawback is that Yahoo's hits are usually to home pages (the first page of a site)
only; for instance, it would hit a home page called Nancy's Page-O-Spiders but not Nancy's
Home which contains a page exclusively on spiders. Another drawback to directories is
that manually updating directories is tedious and time consuming, and that means old
sites that are no longer valid (dead links) are often listed long after their demise.
Hybrids
Some search services use both schemes -- they are both an index and a directory,
like Infoseek and Excite. These services occasionally send out a spider to collect and cull
Web sites, alongside people cataloging sites that are submitted by Web developers.
Working of a Search Engine
When someone looks for comprehensive information on any particular aspect or
issue by keying the relevant keywords into a search engine (Google, Yahoo, Bing,
MSN and many more), only the most recent quality content is sifted through for search
engine optimization. Staler content (older than, say, 12 hours) may have higher page
rankings based on specific parameters like previous user traffic density, but more
current and refreshed content has a much better chance of getting more hits from online
users. The reason is simple but prosaic: such content is more up to date and contains more of
the relevant information that the user is looking for. Search engines are becoming more and
more like archives and libraries that one may consult at any time for whatever information
or data one is looking for.
These search engines are the veritable repositories of the 21st century. The World Wide
Web has become a realm where almost every website or portal is a kind of social
networking site where online users share and interact with each other, exchanging news and
views on a variety of subjects of common interest. Every site is a sort of intranet, a
microcosm within a much larger realm. Many social networking sites like Facebook,
Twitter, LinkedIn, Pinterest etc. compete fiercely amongst themselves to be the online
user’s most preferred social automation medium. These sites now have a global reach and
each one claims to have the latest updates on any social, political, and economic event.
Towards the end of 2011, I penned a post on the theme "Google's New Freshness
Update: Social Media has changed the expectations of searchers". Just a month prior to
that, I wrote on how Yahoo might turn to social media to find new URLs on fresh and
current topics in "Do Search Engines Use Social Media to Discover New Topics". We are
left to speculate on what Bing is up to, as both Google and Yahoo are enthusiastically
scouting for newer and innovative ways of optimizing search results for quality content.
Google allows me to filter and refine my search results up to the last hour. You can
optimize your search for the last 24 hours, the last week, month, or the entire year, or at
least up to a specific period. The same goes for Yahoo, but though Yahoo sources its data
from Bing, Bing is laid-back when it comes to fine-tuning search results.
The mechanism for refining search results harnesses an “in-memory” index apart from the
inverted index process used by Bing. The in-memory catalogue or index would be
restructured throughout the entire day and the contents are more current than Bing’s
inverted index. Content from the in memory index can be compressed and catalogued
inside the inverted index either on an everyday basis or for fixed time periods.
When someone searches for information on any specific issue or subject, the inverted
index returns the results on a primary basis and if one is looking for more detailed data,
the in-memory indices come to the aid as this indexing mechanism has been filtering and
adding information on that particular subject during the last 12 hours. Ultimately, the
results that are thrown up are prioritized in terms of the number of times they have been
searched by users.
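The inverted index at the heart of this process simply maps each word to the documents containing it. A minimal sketch, with made-up document texts for illustration:

```python
from collections import defaultdict

# Minimal sketch of an inverted index: for each word, record the set of
# documents that contain it. Document texts here are made up.
docs = {
    1: "spiders crawl the web",
    2: "web servers answer http requests",
    3: "search engines index the web",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

# A query returns the documents containing every query term.
def search(query):
    sets = [index[w] for w in query.split()]
    return sorted(set.intersection(*sets)) if sets else []

print(search("web"))          # [1, 2, 3]
print(search("web spiders"))  # [1]
```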
Now, we are not aware whether Microsoft has already exploited the patented process or
whether they agreed on using a different process. This process might be antiquated by now. We
are also not aware whether the percolating or filtering method used by Google, known as the
Caffeine update, which was used to graduate their batch updating to a more progressive
system, is still in place.
It seems that the patented process furnishes the latest relevant information or data to
online users and simultaneously hangs on to the batch process where the current search
results are condensed or compressed virtually for storage in inverted databases on a
regular basis.
Introduction to Web Servers
A Web server is a program that, using the client/server model and the World Wide Web's
Hypertext Transfer Protocol (HTTP), serves the files that form Web pages to Web users
(whose computers contain HTTP clients that forward their requests). Every computer on
the Internet that contains a Web site must have a Web server program. Two leading Web
servers are Apache, the most widely installed Web server, and Microsoft's Internet
Information Server (IIS). Other Web servers include Novell's Web Server for users of its
NetWare operating system and IBM's family of Lotus Domino servers, primarily for
IBM's OS/390 and AS/400 customers.
Web servers often come as part of a larger package of Internet- and intranet-related
programs for serving e-mail, downloading requests for File Transfer Protocol (FTP)
files, and building and publishing Web pages. Considerations in choosing a Web server
include how well it works with the operating system and other servers, its ability to
handle server-side programming, security characteristics, and publishing, search engine,
and site building tools that may come with it.
There's a common set of features that you'll find on most web servers. Because web
servers are built specifically to host websites, their features are typically focussed around
setting up and maintaining a website's hosting environment.
Most web servers have features that allow you to do the following:
- Create one or more websites. (No, I don't mean build a set of web pages. What I mean
is, set up the website in the web server, so that the website can be viewed via HTTP.)
- Configure log file settings, including where the log files are saved, what data to
include in the log files etc. (Log files can be used to analyse traffic etc.)
- Configure website/directory security. For example, which user accounts are/aren't
allowed to view the website, which IP addresses are/aren't allowed to view the
website etc.
- Create an FTP site. An FTP site allows users to transfer files to and from the site.
- Create virtual directories, and map them to physical directories.
- Configure/nominate custom error pages. This allows you to build and display user
friendly error messages on your website. For example, you can specify which page is
displayed when a user tries to access a page that doesn't exist (i.e. a "404 error").
- Specify default documents. Default documents are those that are displayed when no
file name is specified. For example, if you open "http://localhost", which file should
be displayed? This is typically "index.html" or similar, but it doesn't need to be. You
could nominate "index.cfm".
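Python's standard library includes a tiny web server that illustrates some of these features: it serves a directory over HTTP and already treats "index.html" as the default document when a directory is requested. A minimal sketch (the port number is an arbitrary choice):

```python
import http.server
import socketserver

PORT = 8000  # arbitrary choice of port

# SimpleHTTPRequestHandler serves files from the current directory and
# falls back to "index.html" when a directory (e.g. "/") is requested.
handler = http.server.SimpleHTTPRequestHandler

def serve():
    with socketserver.TCPServer(("", PORT), handler) as httpd:
        print("Serving at http://localhost:%d/" % PORT)
        httpd.serve_forever()

# serve()  # uncomment to serve the current directory over HTTP
```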
Caching in Web Server
A web cache is a mechanism for the temporary storage (caching) of web documents,
such as HTML pages and images, to reduce bandwidth usage, server load, and
perceived lag. A web cache stores copies of documents passing through it; subsequent
requests may be satisfied from the cache if certain conditions are met. Google's
cache link in its search results provides a way of retrieving information from websites
that have recently gone down and a way of retrieving data more quickly than by
clicking the direct link.
Web caches can be used in various systems.
- A search engine may cache a website.
- A forward cache is a cache outside the web server's network, e.g. on the client
software's ISP or company network.
- A network-aware forward cache is just like a forward cache but only caches heavily
accessed items.
- A reverse cache sits in front of one or more Web servers and web applications,
accelerating requests from the Internet.
- A client, such as a web browser, can store web content for reuse. For example, if the
back button is pressed, the local cached version of a page may be displayed instead of
a new request being sent to the web server.
- A web proxy sitting between the client and the server can evaluate HTTP headers and
choose to store web content.
- A content delivery network can retain copies of web content at various points
throughout a network.
There are three basic mechanisms for controlling caches: freshness, validation, and invalidation.
Freshness
allows a response to be used without re-checking it on the origin server, and can be
controlled by both the server and the client. For example, the Expires response header
gives a date when the document becomes stale, and the Cache-Control: max-age
directive tells the cache how many seconds the response is fresh for.
Validation
can be used to check whether a cached response is still good after it becomes stale.
For example, if the response has a Last-Modified header, a cache can make
a conditional request using the If-Modified-Since header to see if it has changed.
The ETag (entity tag) mechanism also allows for both strong and weak validation.
Invalidation
is usually a side effect of another request that passes through the cache. For example,
if a URL associated with a cached response subsequently gets a POST, PUT or
DELETE request, the cached response will be invalidated.
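The freshness rule above can be sketched directly: compare the age of the cached response against its max-age directive. A minimal illustration in Python; the header values are made up:

```python
import time
from email.utils import formatdate, parsedate_to_datetime

# Made-up response headers: fetched now, declared fresh for 60 seconds.
response_headers = {
    "Date": formatdate(usegmt=True),
    "Cache-Control": "max-age=60",
}

def is_fresh(headers, now=None):
    """Return True while the cached response may be reused without revalidation."""
    now = time.time() if now is None else now
    max_age = int(headers["Cache-Control"].split("=")[1])
    fetched = parsedate_to_datetime(headers["Date"]).timestamp()
    return (now - fetched) < max_age

print(is_fresh(response_headers))                     # True: just fetched
print(is_fresh(response_headers, time.time() + 120))  # False: stale two minutes later
```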
Configuration of Web Server
For those who wish to manage their own Web server, this activity describes the steps
required to set up your own Web server. In order to set up a Web server, you need a
dedicated computer (PC or Macintosh) running Windows/95, Windows/NT, or Linux, or a
Macintosh computer running MacOS. You also need a direct Internet connection and TCP/IP
software. You can download shareware HTTP software for these platforms and operate your
own Web server.
Objectives
Learn how to find and download shareware software for a Web server.
Learn about PC/Windows and Macintosh Web server programs that are available.
Learn what is required to set up a Web server.
Learn where to find additional information on setting up a Web server.
Materials and Resources
In developing our lessons and activities, we made some assumptions about the hardware
and software that would be available in the classroom for teachers who visit the LETSNet
Website. We assume that teachers using our Internet-based lessons or activities have a
computer with the necessary hardware components (mouse, keyboard, and monitor) as well
as a World Wide Web browser. In the section below, we specify any "special" hardware or
software requirements for a lesson or activity (in addition to those described above) and the
level of Internet access required to do the activity.
1. Special hardware requirements: none.
2. Special software requirements: none.
3. Internet access: High-speed connection (greater than 28,800 BPS).
Activity Description
The tasks below lead you through the process of setting up your own Web server. While
the process described is necessarily incomplete, it is offered as a guide to the things you must
do to successfully set up a Web server. Links to software (for networking, HTTP, CGI, etc.)
are provided to encourage you to gather your own toolkit of Web server products.
Commercial software is also available, including Netscape's Communication Server
(available free to educational institutions) and Microsoft's Internet Information Server, for
both PC and Macintosh platforms (see Internet Resources below).
Step 1 - The computer: A Web server requires a dedicated computer that is directly
connected to the Internet, usually through an ethernet network (LAN/WAN). You can
run a Web server on a low-end computer (80386-based PC or 68040 Macintosh), but
if you want your server to be responsive to Web surfers you should probably use a
more powerful computer (such as a Pentium PC or a PowerPC-based Macintosh). A Web
server needs a fast, large hard drive and should have lots of RAM (over 16 MB).
Step 2 - The operating system software: The following operating systems can
support a Web server: Windows/NT, Windows/95, MacOS, Unix, and Linux. Of
these, most of the existing Web servers run on Windows/NT, MacOS (on a
PowerMac) or Unix. Linux is a free, Unix-like operating system that runs on PC hardware.
Step 3 - The networking software: All Internet computers need TCP/IP, and a Web
server is no exception. As stated above, your computer should be directly connected
to the Internet and thus may require appropriate ethernet software.
Step 4 - The Web server software: There are a variety of Web server programs
available for a variety of platforms, from Unix to DOS machines. For the Macintosh,
a popular Web server is WebStar from StarNine (see Internet Resources below). For
the Windows/NT platform, both Microsoft and Netscape offer a powerful Web server
program free to educational institutions (see Internet Resources below). Download or
purchase the Web server software and install it on your computer using the
instructions provided.
Step 5 - Configuring your Web server: When you install your Web server, you will
be prompted for basic settings - default directory or folder, whether to allow visitors
to see the contents of a directory or folder, where to store the log file, etc. Depending
on the Web software you install, you will have to configure the software per the
instructions that come with it.
Step 6 - Managing your Web server: As your Web server is accessed by more and
more people, you may need to monitor the log file to see which files people are
reading, identify peak access times, and consider upgrading your computer. You can
always add more RAM and disk space to your Web server computer to improve its
performance. Also check for bottlenecks - such as your TCP/IP software. For
example, Open Transport 1.1 from Apple has been modified to support faster TCP/IP
access if installed on a Web server.
Step 7 - Getting more information on operating a Web server: For more
information on finding, downloading, installing, and operating a Web server, see the
Internet Resources below. For example, Web66 has information on setting up a
Macintosh and Windows/95 Web server, and there are many other useful resources
available.
IIS
Internet Information Services (IIS, formerly Internet Information Server) is an
extensible web server created by Microsoft for use with the Windows NT family. IIS
supports HTTP, HTTPS, FTP, FTPS, SMTP and NNTP. It has been an integral part of the
Windows NT family since Windows NT 4.0, though it may be absent from some editions
(e.g. Windows XP Home edition). IIS is not turned on by default when Windows is installed.
The IIS Manager is accessed through the Microsoft Management Console or Administrative
Tools in the Control Panel.
Case Study of IIS
Internet Information Services (IIS) on Windows Server 2012 is NUMA-aware and
provides optimal configuration options for IT administrators. The following section
describes how IIS 8.0 takes advantage of NUMA hardware to provide optimal
performance.
IIS supports the following two ways of partitioning the workload:
1. Run multiple worker processes in one application pool (i.e. web garden).
If you are using this mode, by default, the application pool is configured to run in a
single worker process. For maximum performance, you should consider running the
same number of worker processes as there are NUMA nodes, so that there is 1:1
affinity between the worker processes and NUMA nodes. This can be done by setting
"Maximum Worker Processes" App Pool setting to 0. When this setting is configured,
IIS will determine how many NUMA nodes are available on the hardware and will
start the same number of worker processes.
2. Run multiple application pools in a single workload/site.
In this configuration, the workload/site is divided into multiple application pools. For
example, the site may contain several applications that are configured to run in
separate application pools. Effectively, this configuration results in running multiple
IIS worker processes for the workload/site and IIS intelligently distributes process
affinity for maximum performance.
Depending upon the workload, the administrator partitions the workload into multiple worker
processes. Once a workload is correctly partitioned, IIS 8.0 identifies the most optimal
NUMA node when the IIS worker process is about to start. By default, IIS picks the
NUMA node with the most available memory. IIS has knowledge of the memory
consumption of each NUMA node and uses this information to "load balance" the IIS
worker processes. This option differs from the Windows default of round-robin and is
specially designed for the IIS workload.
Finally, there are two different ways to configure the affinity for threads from an IIS
worker process to a NUMA node.
1. Soft Affinity (default)
With soft affinity, if other NUMA nodes have available cycles, the threads from an
IIS worker process may get scheduled to a NUMA node that was not configured for
affinity. This approach helps to maximize use of all available resources on the system as a
whole.
2. Hard Affinity
With hard affinity, regardless of what the load may be on other NUMA nodes on the
system, all threads from an IIS worker process are assigned to the chosen NUMA
node that was selected for affinity using the design above.
Apache
The Apache HTTP Server, commonly referred to as Apache, is a web server application
notable for playing a key role in the initial growth of the World Wide Web. Originally based
on the NCSA HTTPd server, development of Apache began in early 1995 after work on the
NCSA code stalled. Apache quickly overtook NCSA HTTPd as the dominant HTTP server,
and has remained the most popular HTTP server in use since April 1996. In 2009, it became
the first web server software to serve more than 100 million websites.
Apache is developed and maintained by an open community of developers under the auspices
of the Apache Software Foundation. Most commonly used on a Unix-like system, the
software is available for a wide variety of operating systems, including Unix,
FreeBSD, Linux, Solaris, Novell NetWare, OS X, Microsoft
Windows, OS/2, TPF, OpenVMS and eComStation. Released under the Apache License,
Apache is open-source software.
Although the main design goal of Apache is not to be the "fastest" web server, Apache does
have performance similar to other "high-performance" web servers. Instead of implementing
a single architecture, Apache provides a variety of MultiProcessing Modules (MPMs) which
allow Apache to run in a process-based, hybrid (process and thread) or event-hybrid mode, to
better match the demands of each particular infrastructure. This implies that the choice of
correct MPM and the correct configuration is important. Where compromises in performance
need to be made, the design of Apache is to reduce latency and increase throughput, relative
to simply handling more requests, thus ensuring consistent and reliable processing of requests
within reasonable time-frames.