Web Engineering Unit I (as per RGPV syllabus)
TRANSCRIPT
Unit I / Web Engineering, Truba College of Science & Technology, Bhopal
Prepared By: Ms. Nandini Sharma (Dept. CSE)
It is important to know that the World Wide Web is not a synonym for the Internet. The
World Wide Web, or just "the Web," as ordinary people call it, is a subset of the Internet. The
Web consists of pages that can be accessed using a Web browser. The Internet is the actual
network of networks where all the information resides. Things like Telnet, FTP, Internet
gaming, Internet Relay Chat (IRC), and e-mail are all part of the Internet, but are not part of
the World Wide Web. The Hyper-Text Transfer Protocol (HTTP) is the method used to
transfer Web pages to your computer. With hypertext, a word or phrase can contain a link to
another Web site. All Web pages are written in the hyper-text markup language (HTML),
which works in conjunction with HTTP.
Introduction to TCP/IP
TCP/IP is made up of two acronyms, TCP, for Transmission Control Protocol, and IP,
for Internet Protocol. TCP handles packet flow between systems and IP handles the routing of
packets. However, that is a simplistic answer that we will expound on further.
All modern networks are now designed using a layered approach. Each layer presents a
predefined interface to the layer above it. By doing so, a modular design can be developed so
as to minimize problems in the development of new applications or in adding new interfaces.
The ISO/OSI protocol with seven layers is the usual reference model. Since TCP/IP was
designed before the ISO model was developed, it has four layers; however, the differences
between the two are mostly minor. Below is a comparison of the TCP/IP and OSI protocol
stacks:
OSI Protocol Stack
7. Application -- End user services such as email.
6. Presentation -- Data representation and data compression
5. Session -- Authentication and authorization
4. Transport -- Guarantee end-to-end delivery of packets
3. Network -- Packet routing
2. Data Link -- Transmit and receive packets
1. Physical -- The cable or physical connection itself.
TCP/IP Protocol Stack
4. Application -- Authentication, compression, and end-user services.
3. Transport -- Handles the flow of data between systems and provides access to the
network for applications (via the BSD socket library).
2. Network -- Packet routing
1. Link -- Kernel OS/device driver interface to the network interface on the computer.
Below are the major differences between OSI and TCP/IP:
The application layer in TCP/IP handles the responsibilities of layers 5, 6, and 7 in the
OSI model.
The transport layer in TCP/IP does not always guarantee reliable delivery of packets
as the transport layer in the OSI model does. TCP/IP offers an option called UDP that
does not guarantee reliable packet delivery.
Software Components of TCP/IP
Application Layer
Some of the applications we will cover are SMTP (mail), Telnet, FTP, Rlogin, NFS,
NIS, and LPD.
Transport Layer
The transport layer uses two protocols, UDP and TCP. UDP, which stands for User
Datagram Protocol, does not guarantee packet delivery, and applications that use it
must provide their own means of verifying delivery. TCP does guarantee delivery of
packets to the applications that use it.
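The difference can be seen directly with the BSD socket interface: a UDP datagram is fire-and-forget, while a TCP connection handles acknowledgement and retransmission itself. A minimal loopback sketch in Python:

```python
import socket

# UDP: connectionless. The protocol gives no delivery guarantee, so an
# application that needs one must add its own acknowledgements.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))            # pick any free port
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello via UDP", addr)      # fire-and-forget datagram
data, _ = receiver.recvfrom(1024)
print(data)                                # b'hello via UDP'
sender.close(); receiver.close()

# TCP: connection-oriented. The stack itself acknowledges and
# retransmits, so delivery is guaranteed while the connection is up.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
client = socket.create_connection(server.getsockname())
conn, _ = server.accept()
client.sendall(b"hello via TCP")
tcp_data = conn.recv(1024)
print(tcp_data)                            # b'hello via TCP'
client.close(); conn.close(); server.close()
```

Both transfers work here because they run over the loopback interface; on a real, lossy network only the TCP message would be retransmitted automatically.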
Network Layer
The network layer is concerned with packet routing and uses low-level protocols such
as ICMP, IP, and IGMP. In addition, routing protocols such as RIP, OSPF, and EGP
will be discussed.
Link Layer
The link layer is concerned with the actual transmittal of packets as well as IP-to-
Ethernet address translation. This layer is concerned with ARP, RARP, and the device
driver.
WAP
In 1997, several companies organized an industry group called the WAP Forum. This group
produces the WAP specification, a (long and detailed) series of technical documents that
define standards for implementing wireless network applications. Hundreds of industry firms
have given strong backing to the WAP Forum, so the technology should become widely
adopted, and it is already well-hyped.
WAP specifies an architecture based on layers that follows the OSI model fairly closely. The
WAP model, or stack as it is commonly known, is illustrated below.
The WAP Model
Application Layer
WAP's application layer is the Wireless Application Environment (WAE). WAE directly
supports WAP application development with Wireless Markup Language (WML) instead of
HTML and WML Script instead of JavaScript. WAE also includes the Wireless Telephony
Application Interface (WTAI, or WTA for short) that provides a programming interface to
telephones for initiating calls, sending text messages, and other networking capability.
Session Layer
WAP's session layer is the Wireless Session Protocol (WSP). WSP is the equivalent to HTTP
for WAP browsers. WAP involves browsers and servers just like the Web, but HTTP was not
a practical choice for WAP because of its relative inefficiency on the wire. WSP conserves
precious bandwidth on wireless links; in particular, WSP works with relatively compact
binary data where HTTP works mainly with text data.
Transaction, Security, and Transport Layers
These three protocols can be thought of as "glue layers" in WAP:
Wireless Transaction Protocol (WTP)
Wireless Transaction Layer Security (WTLS)
Wireless Datagram Protocol (WDP)
WTP provides transaction-level services for both reliable and unreliable transports. It
prevents duplicate copies of packets from being received by a destination, and it supports
retransmission, if necessary, in cases where packets are dropped. In this respect, WTP is
analogous to TCP. However, WTP also differs from TCP. WTP is essentially a pared-down
TCP that squeezes some extra performance from the network.
WTLS provides authentication and encryption functionality analogous to Secure Sockets
Layer (SSL) in Web networking. Like SSL, WTLS is optional and used only when the
content server requires it.
WDP implements an abstraction layer to lower-level network protocols; it performs functions
similar to UDP. WDP is the bottom layer of the WAP stack, but it does not implement
physical or data link capability. To build a complete network service, the WAP stack must be
implemented on some low-level legacy interface not technically part of the model. These
interfaces, called bearer services or bearers, can be IP-based or non-IP based.
Bearer Interfaces
WAP supports dial-up networking using IP and Point-to-Point Protocol (PPP) as the bearer
interface underneath WDP. It also supports Short Message Service (SMS) and General
Packet Radio System (GPRS). SMS passes text and binary data between digital phones.
GPRS is a relatively new technology that implements faster, "always-on" connections for
wireless devices; GPRS actually runs on top of IP.
Domain Name System (DNS) enables you to use hierarchical, friendly names to easily
locate computers and other resources on an IP network. The following sections describe the
basic DNS concepts, including features explained in newer Requests for Comments (RFCs),
such as dynamic update, from the Internet Engineering Task Force (IETF). The
Microsoft® Windows® 2000–specific implementation of DNS is not covered within this
chapter, except where indicated.
For information about the Windows 2000 implementation of DNS, see "Windows 2000
DNS" in this book.
DNS is a distributed database that contains mappings of DNS domain names to data. It is also
a protocol for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, defined
by the Requests for Comments (RFCs) that pertain to DNS. DNS defines the following:
Mechanism for querying and updating the database.
Mechanism for replicating the information in the database among servers.
Schema for the database.
DNS servers store information about no zones, one zone, or multiple zones. When a
DNS server receives a DNS query, it attempts to locate the requested information by
retrieving data from its local zones. If this fails because the server is not authoritative
for the DNS domain requested and thus does not have the data for the requested
domain, the server can check its cache, communicate with other DNS servers to
resolve the request, or refer the client to another DNS server that might know the
answer.
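The resolution step described above is what the standard resolver library performs on the client's behalf. A small sketch in Python, using "localhost", which is resolved locally, so no DNS server is needed to run it:

```python
import socket

# Resolve a friendly name to an address. For a real domain this would
# consult the configured DNS servers, which search their zones and
# caches as described above; "localhost" resolves without the network.
addr = socket.gethostbyname("localhost")
print(addr)                                 # 127.0.0.1

# getaddrinfo is the fuller interface: one tuple per (family, type)
# combination the resolver knows for the name and service.
for family, type_, proto, canon, sockaddr in socket.getaddrinfo(
        "localhost", 80, type=socket.SOCK_STREAM):
    print(family, sockaddr)
```

Replacing "localhost" with any registered domain name exercises the full query path, including referral and caching on the servers involved.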
DNS servers can host primary and secondary zones. You can configure servers to host
as many different primary or secondary zones as is practical, which means that a
server might host the primary copy of one zone and the secondary copy of another
zone, or it might host only the primary or only the secondary copy for a zone. For
each zone, the server that hosts the primary zones is considered the primary server for
that zone, and the server that hosts the secondary zones is considered the secondary
server for that zone.
Primary zones are locally updated. When a change is made to the zone data, such as
delegating a portion of the zone to another DNS server or adding resource records in
the zone, these changes must be made on the primary DNS server for that zone, so
that the new information can be entered in the local zone.
In contrast, secondary zones are replicated from another server. When a zone is
defined on a secondary server for that zone, the zone is configured with the IP address
of the server from which the zone is to be replicated. The server from which the zone
file replicates can either be a primary or secondary server for the zone, and is
sometimes called a master server for the secondary zone.
When a secondary server for the zone starts up, it contacts the master server for the
zone and initiates a zone transfer. The secondary server for the zone also periodically
contacts the master server for the zone to see whether the zone data has changed. If
so, it can initiate a transfer of the zones, referred to as a zone transfer. For more
information about zone transfers, see "Zone Transfer" later in this chapter.
You must have a primary server for each zone. Additionally, you should have at least
one secondary server for each zone. Otherwise, if the primary server for the zone goes
down, no one will be able to resolve the names in that zone.
Secondary servers provide the following benefits:
Fault tolerance When a secondary server is configured for a zone, clients can still
resolve names for that zone even if the primary server for the zone goes down.
Generally, plan to install the primary and secondary servers for the zone on different
subnets. Therefore, if connectivity to one subnet is lost, DNS clients can still direct
queries to the name server on the other subnet.
Reduction of traffic on wide area links You can add a secondary server for the
zone in a remote location that has a large number of clients, and then configure the
client to try those servers first. This can prevent clients from communicating across
slow links for DNS queries.
Reduction of load on the primary server for the zone The secondary server can
answer queries for the zone, reducing the number of queries the primary server for the
zone must answer.
Electronic mail (e-mail) is one of the uses of the World Wide Web that, according to most
businesses, improves productivity. Traditional methods of sending mail within an office
environment are inefficient, as they normally require an individual to ask a secretary to type
the letter. This must then be proof-read and sent through the internal mail system, which is
relatively slow and can be open to security breaches.
A faster and more secure method of sending information is electronic mail, whereby a
computer user can exchange messages with other computer users (or groups of users) via a
communications network. Electronic mail is one of the most popular uses of the Internet. For
example, a memo with 100 words will be sent in a fraction of a second. Other types of data
can also be sent with the mail message, such as images, sound, and so on.
The main standards that relate to the protocols of email transmission and reception are:
Simple Mail Transfer Protocol (SMTP) - which is used with the TCP/IP protocol
suite. It has traditionally been limited to text-based electronic messages.
Multipurpose Internet Mail Extension (MIME) - which allows the transmission and
reception of mail that contains various types of data, such as speech, images, and
motion video. It is a newer standard than SMTP and uses much of its basic protocol.
S/MIME (Secure MIME) - RSA Data Security created S/MIME, which supports
encrypted e-mail transfer and digitally signed electronic mail.
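As a sketch of what MIME adds on top of plain text mail, Python's standard email package can assemble a multipart message; the addresses and attachment bytes below are made up for illustration:

```python
from email.message import EmailMessage

# A minimal MIME message. The addresses are placeholders.
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Quarterly report"
msg.set_content("The report is attached.")      # text/plain part

# Adding a non-text part turns the message into multipart/mixed,
# which is exactly what MIME layers on top of plain RFC 822 mail.
msg.add_attachment(b"\x89PNG fake image bytes",
                   maintype="image", subtype="png",
                   filename="chart.png")

print(msg.get_content_type())                   # multipart/mixed
```

A message built this way would then be handed to an SMTP message transfer agent (e.g. via smtplib) for delivery to the recipient's post office.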
A typical email-architecture contains four elements:
1. Post offices- where outgoing messages are temporarily buffered (stored) before
transmission and where incoming messages are stored. The post office runs the server
software capable of routing messages (a message transfer agent) and maintaining the post
office database.
2. Message transfer agents- for forwarding messages between post offices and to the
destination clients. The software can either reside on the local post office or on a physically
separate server.
3. Gateways- which provide parts of the message transfer agent functionality. They translate
between different e-mail systems, different e-mail addressing schemes and messaging
protocols.
4. E-mail clients- normally the computer which connects to the post office. It contains three
parts:
E-mail Application Program Interface (API), such as MAPI, VIM, MHS and CMC.
Messaging protocol. The main messaging protocols are SMTP and X.400. SMTP is
defined in RFC 821 and RFC 822, whereas X.400 is an OSI-defined e-mail message
delivery standard.
Network transport protocol, such as Ethernet, FDDI, and so on.
Post Office Protocol (POP)
The Post Office Protocol was first defined in RFC 918, but has since been replaced with POP-3,
which is defined in RFC 1939. The objective of POP is to create a standard method for users to access
a mail server: mail messages are uploaded onto a mail server using SMTP, and then downloaded
using POP. With POP the server listens for a connection, and when one occurs the server sends a
greeting message and waits for a command. The standard port reserved for POP transactions is 110.
Like SMTP, it consists of case-insensitive commands with one or more arguments, followed by the
Carriage Return (CR) and Line Feed (LF) characters, typically represented as CRLF. These keywords
are either three or four characters long.
The client opens the connection by sending a USER and a PASS command, for the user name and
password, respectively. If successful, this gives access to the mailbox on the server. The client
can then read the messages with the following commands:
RDEL. Reads and deletes all the messages from the mailbox.
RETR. Reads the messages from the mailbox, and keeps them on the server.
The commands and responses for POP can be summarized as:

Command        Description                                         Possible responses
USER name      Define the name of the user                         +OK, -ERR
PASS password  Define the password for the user                    +OK, -ERR
RETR mailbox   Begins a mail reading transaction, and keeps the    +Val, -ERR
               messages on the server once they have been
               transferred.
RDEL mailbox   Begins a mail reading transaction, and deletes the  +Val, -ERR
               messages once they have been transferred. The
               messages are not actually deleted until an RCEV
               command.
RCEV           Acknowledges the reception of the mail messages     +OK, or aborted connection
RCVD           Confirms that the client has received the mail      +OK, -ERR
               messages
QUIT           Client wishes to end the session                    +OK, then close
NOOP           No operation, but prompts the mail server for an    +OK
               OK response
RSET           Sent by the client to inform the server to abort    +OK
               the current transaction
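A POP-3 login like the USER/PASS exchange above can be driven with Python's standard poplib client. The server below is a toy stand-in that speaks just enough of the protocol to answer; real servers listen on port 110:

```python
import poplib
import socket
import threading

# Minimal in-process POP3 server: greeting, USER, PASS, STAT, QUIT.
def fake_pop3_server(listener):
    conn, _ = listener.accept()
    f = conn.makefile("rwb")
    f.write(b"+OK toy POP3 server ready\r\n"); f.flush()
    while True:
        line = f.readline().strip().upper()
        if line.startswith(b"USER") or line.startswith(b"PASS"):
            f.write(b"+OK\r\n")
        elif line.startswith(b"STAT"):
            f.write(b"+OK 0 0\r\n")            # 0 messages, 0 octets
        elif line.startswith(b"QUIT"):
            f.write(b"+OK bye\r\n"); f.flush()
            break
        else:
            f.write(b"-ERR unsupported\r\n")
        f.flush()
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))                # ephemeral port instead of 110
listener.listen(1)
threading.Thread(target=fake_pop3_server, args=(listener,),
                 daemon=True).start()

client = poplib.POP3("127.0.0.1", listener.getsockname()[1])
client.user("alice")                  # USER alice  -> +OK
client.pass_("secret")                # PASS secret -> +OK
count, size = client.stat()           # STAT        -> +OK 0 0
print(count, size)                    # 0 0
client.quit()                         # QUIT        -> +OK
```

Every exchange follows the command/response shape in the table: the client sends a keyword plus arguments terminated by CRLF, and the server answers +OK or -ERR.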
Telnet
Telnet is a network protocol used on the Internet or local area networks to provide a bidirectional
interactive text-oriented communication facility using a virtual terminal connection. User data is
interspersed in-band with Telnet control information in an 8-bit byte oriented data connection over
the Transmission Control Protocol (TCP).
Telnet is a client-server protocol, based on a reliable connection-oriented transport. Typically, this
protocol is used to establish a connection to Transmission Control Protocol (TCP) port number 23,
where a Telnet server application (telnetd) is listening. Telnet, however, predates TCP/IP and was
originally run over Network Control Program (NCP) protocols.
Before March 5, 1973, Telnet was an ad hoc protocol with no official definition. Essentially, it used
an 8-bit channel to exchange 7-bit ASCII data. Any byte with the high bit set was a special Telnet
character. On March 5, 1973, a Telnet protocol standard was defined at UCLA with the publication
of two NIC documents: Telnet Protocol Specification, NIC #15372, and Telnet Option Specifications,
NIC #15373.
Telnet Protocol
TELNET is a standard protocol. Its status is recommended.
It is described in RFC 854 - TELNET Protocol Specifications and RFC 855 -
TELNET Option Specifications.
Telnet was the first application demonstrated on the four-IMP (Interface Message
Processor) network installed by December 1969. The final edition took 14 more years
to develop, culminating in Internet Standard #8 in 1983, three years after the final
TCP specification was ratified.
Telnet even predates internetworking and the modern IP packet and TCP transport
layers.
The TELNET protocol provides a standardized interface, through which a program on
one host (the TELNET client) may access the resources of another host (the TELNET
server) as though the client were a local terminal connected to the server.
For example, a user on a workstation on a LAN may connect to a host attached to the
LAN as though the workstation were a terminal attached directly to the host. Of
course, TELNET may be used across WANs as well as LANs.
Most TELNET implementations do not provide you with graphics capabilities.
TELNET Overview
TELNET is a general protocol, meant to support logging in from almost any type of
terminal to almost any type of computer.
It allows a user at one site to establish a TCP connection to a login server or terminal
server at another site.
A TELNET server generally listens on TCP Port 23.
How it works
A user is logged in to the local system, and invokes a TELNET program (the
TELNET client) by typing
telnet xxx.xxx.xxx
where xxx.xxx.xxx is either a host name or an IP address.
The TELNET client is started on the local machine (if it isn't already running). That
client establishes a TCP connection with the TELNET server on the destination
system.
Once the connection has been established, the client program accepts keystrokes from
the user and relays them, generally one character at a time, to the TELNET server.
The server on the destination machine accepts the characters sent to it by the client,
and passes them to a terminal server.
A "terminal server" is just some facility provided by the operating system for entering
keystrokes from a user's keyboard.
o The terminal server treats the remote user as it would any other user logged in
to the system, including relaying commands to other applications.
o The terminal server passes outputs back to the TELNET server, which relays
them to the client, which displays them on the user's screen.
In general, a TELNET server is implemented as a master server with some number of
slave servers. The master server listens for service requests from clients. When it
hears one, it spawns a slave server to handle that specific request, while the master
goes back to listening for more requests.
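The master/slave structure can be sketched in Python. A real telnetd would attach each slave to a login shell; this toy slave simply echoes what it receives:

```python
import socket
import threading

# Slave: handles one established connection, echoing "keystrokes" back.
def slave(conn):
    with conn:
        while data := conn.recv(1024):
            conn.sendall(data)

# Master: listens for service requests; for each one it spawns a slave
# (here a thread) and immediately goes back to listening.
def master(listener, rounds):
    for _ in range(rounds):
        conn, addr = listener.accept()
        threading.Thread(target=slave, args=(conn,), daemon=True).start()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))        # a real telnetd would use port 23
listener.listen(5)
threading.Thread(target=master, args=(listener, 1), daemon=True).start()

# Act as the client: connect and send a line as a user would type it.
client = socket.create_connection(listener.getsockname())
client.sendall(b"ls\r\n")
reply = client.recv(1024)
print(reply)                           # b'ls\r\n'
client.close()
```

The point of the structure is that the master is never tied up by any single session, so new clients can always connect.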
The only thing that makes TELNET hard to implement is the heterogeneity of the
terminals and operating systems that must be supported. Not all of them use the same
control characters for the same purposes.
To accommodate this heterogeneity, TELNET defines a Network Virtual Terminal
(NVT). Any user telnetting in to a remote site is deemed to be on an NVT,
regardless of the actual terminal type being used.
It is the responsibility of the client program to translate user keystrokes from the
actual terminal type into NVT format, and of the server program to translate NVT
characters into the format needed by the destination host. For data sent back from the
destination host, the translation is the reverse.
NVT format defines all characters to be 8 bits (one byte) long. At startup, 7 bit US
ASCII is used for data; bytes with the high order bit = 1 are command sequences.
The 128 7-bit long US ASCII characters are divided into 95 printable characters and
33 control codes. NVT maps the 95 printable characters into their defined values -
decimal 65 = "A", decimal 97 = "a", etc.
The 33 control codes are defined for NVT as:
ASCII code   Decimal value   Meaning
NUL          0               No operation
BEL          7               Ring the "terminal bell"
BS           8               Backspace; move cursor left
HT           9               Horizontal tab; move cursor right
LF           10              Line feed; move down one line, stay in same column
VT           11              Vertical tab; move cursor down
FF           12              Form feed
CR           13              Carriage return; move cursor to beginning of current line
all others                   No operation
NVT defines end-of-line to be a CR-LF combination - the two-character sequence.
In addition to the 128 characters mentioned above, there are 128 other possible
characters in an 8-bit encoding scheme. NVT uses these 128 (with decimal values 128
through 255, inclusive) to pass control functions from client to server. More on this
later.
TELNET Operation
The TELNET protocol is based on three ideas:
o The Network Virtual Terminal (NVT) concept. An NVT is an imaginary
device having a basic structure common to a wide range of real terminals.
Each host maps its own terminal characteristics to those of an NVT, and
assumes that every other host will do the same.
o A symmetric view of terminals and processes.
o Negotiation of terminal options. The principle of negotiated options is used by
the TELNET protocol, because many hosts wish to provide additional
services, beyond those available with the NVT. Various options may be
negotiated. Server and client use a set of conventions to establish the
operational characteristics of their TELNET connection via the ``DO, DON'T,
WILL, WON'T'' mechanism discussed later in this document.
The two hosts begin by verifying their mutual understanding. Once this initial
negotiation is complete, they are capable of working on the minimum level
implemented by the NVT.
After this minimum understanding is achieved, they can negotiate additional options
to extend the capabilities of the NVT to reflect more accurately the capabilities of the
real hardware in use.
Because of the symmetric model used by TELNET, both the host and the client may
propose additional options to be used.
The set of options is not part of the TELNET protocol, so that new terminal features
can be incorporated without changing the TELNET protocol (mouse?).
All TELNET commands and data flow through the same TCP connection.
Commands start with a special character called the Interpret as Command escape
character (IAC).
The IAC code is 255.
If a 255 is sent as data, it must be followed by another 255.
Each receiver must look at each byte that arrives and look for IAC. If IAC is found
and the next byte is IAC - a single byte is presented to the application/terminal.
If IAC is followed by any other code - the TELNET layer interprets this as a
command.
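The IAC doubling rule can be sketched in a few lines of Python (a data-only stream is assumed; a real implementation would also dispatch IAC followed by a command byte to a command handler):

```python
IAC = 255  # Interpret As Command escape character

def escape_data(payload: bytes) -> bytes:
    """Sender side: double every 255 byte in the application data so
    the receiver does not mistake it for the start of a command."""
    return payload.replace(bytes([IAC]), bytes([IAC, IAC]))

def unescape_data(stream: bytes) -> bytes:
    """Receiver side: collapse IAC IAC back to a single data byte.
    (Assumes the stream carries only data, no real commands.)"""
    return stream.replace(bytes([IAC, IAC]), bytes([IAC]))

data = b"abc\xff def"
wire = escape_data(data)
print(wire)                           # b'abc\xff\xff def'
print(unescape_data(wire) == data)    # True
```

This is why commands and data can safely share the single TCP connection: a lone IAC always announces a command, never data.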
HyperText Transfer Protocol (HTTP) is a set of standards that allows users of the World
Wide Web to exchange information found on web pages. To access a web page, enter http://
in front of the web address, which tells the browser to communicate over HTTP. For example,
the full URL for Computer Hope is http://www.computerhope.com. Today's modern browsers
no longer require http:// in front of the URL since it is the default method of communication.
However, it is still used in browsers because of the need to access other protocols, such as
FTP, through the browser. Below are a few of the major facts about HTTP.
The term hypertext, from which HTTP takes its name, was coined by Ted Nelson; HTTP itself was developed by Tim Berners-Lee.
HTTP commonly utilizes port 80, 8008, or 8080.
HTTP/0.9 was the first version of HTTP and was introduced in 1991.
HTTP/1.0 is specified in RFC 1945, introduced in 1996.
HTTP/1.1 was first specified in RFC 2068 in January 1997 and later revised in RFC 2616 (June 1999).
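As a sketch of what actually travels over port 80, the request a browser sends for http://www.computerhope.com looks like this (the commented lines show how it would be sent over a plain TCP socket; network access assumed):

```python
# An HTTP/1.1 request is plain text: a request line, headers, and a
# blank line that ends the header section.
host = "www.computerhope.com"
request = (
    f"GET / HTTP/1.1\r\n"
    f"Host: {host}\r\n"          # required in HTTP/1.1
    f"Connection: close\r\n"
    f"\r\n"                      # blank line ends the headers
)
print(request)

# To send it for real:
# import socket
# sock = socket.create_connection((host, 80))
# sock.sendall(request.encode("ascii"))
# print(sock.recv(4096).decode("latin-1"))   # status line + headers
```

The server's reply has the same shape in reverse: a status line (e.g. "HTTP/1.1 200 OK"), headers, a blank line, then the HTML page itself.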
HTTPS
Short for Hypertext Transfer Protocol Secure, HTTPS is a secure method of accessing or
sending information across a web page. All data sent over HTTPS is encrypted before it is
sent, which prevents anyone who intercepts it from understanding that information. Because
data is encrypted over HTTPS, it is slower than HTTP, which is why HTTPS is traditionally
used only when login information is required or with pages that contain sensitive information,
such as an online bank web page.
HTTPS uses port 443 to transfer its information.
HTTPS is defined in RFC 2818 (HTTP Over TLS).
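A sketch of the TLS side of HTTPS with Python's standard ssl module: the context below is what a client wraps its port-443 socket with (the commented lines assume network access and use a placeholder host):

```python
import ssl

# HTTPS is ordinary HTTP carried over a TLS-encrypted socket on
# port 443. The standard library's default context enables both
# certificate checking and host-name verification.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)   # True
print(context.check_hostname)                      # True

# To fetch a page over HTTPS:
# import socket
# with socket.create_connection(("www.example.com", 443)) as raw:
#     with context.wrap_socket(raw, server_hostname="www.example.com") as tls:
#         tls.sendall(b"GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n")
#         print(tls.recv(2048))
```

The HTTP request itself is unchanged; only the socket underneath is encrypted, which is where the extra cost relative to plain HTTP comes from.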
File Transfer Protocol is a protocol through which internet users can upload files from their
computers to a website or download files from a website to their PCs. Originated by Abhay
Bhushan in 1971 for use in the military and scientific research network known as ARPANET,
FTP has evolved into a protocol for far wider applications on the World Wide Web with
numerous revisions throughout the years.
FTP is the easiest way to transfer files between computers via the internet, and it uses
TCP (Transmission Control Protocol) and IP (Internet Protocol) to perform uploading
and downloading tasks.
How It Works
TCP and IP are the two major protocols that keep the internet running smoothly. TCP
manages data transfer while IP directs traffic to internet addresses. FTP runs on top of
TCP and shuttles files back and forth between an FTP server and an FTP client. Because
FTP requires that two ports be open--the server's and the client's--it facilitates the
exchange of large files of information.
First, you as the client make a TCP control connection to the FTP server's port 21, which
remains open during the transfer process. In response, the FTP server opens a second
connection, the data connection, from the server's port 20 to your computer.
Using the standard active mode of FTP, your computer tells the server the port number
where it will stand by to receive information and the IP address--internet location--from
which or to which you want files to be transferred.
If you are using a public--or anonymous--FTP server, you will not need sign-in
credentials to make a file transfer, but you may be asked to enter your email address.
If you are using a private FTP server, however, you must sign in with a user name and
password to initiate the exchange of data.
Modes of File Transfer
Three modes of transferring data are available via FTP. The system can use a stream
mode, in which it transfers files as a continuous stream from port to port with no
intervention or processing of information into different formats. For example, in a transfer
of data between two computers with identical operating systems, FTP does not need to
modify the files.
In block mode, FTP divides the data to be transferred into blocks of information, each
with a header, byte count, and data field. In the third mode of transfer, the compressed
mode, FTP compresses the files by encoding them. Often these modifications of data are
necessary for successful transfer because the file sender and file receiver do not have
compatible data storage systems.
Passive FTP
Should your computer have firewall protection, you may have difficulties using FTP. A
firewall protects your PC by preventing internet sites from initiating file transfers. You
can circumvent your firewall's function by using the PASV command that reverses the
FTP process, allowing your computer to initiate the transfer request.
Many corporate networks use PASV FTP as a security measure to protect their internal
network from assaults of unwanted external files. Also called passive FTP, the process
requires that any transfer of information from the internet or other external source must be
initiated by the client or private network rather than the external source.
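A sketch of such a session with Python's standard ftplib client; the host name is a placeholder, so the network calls are left commented out and only the client object is created:

```python
from ftplib import FTP

# ftplib speaks the FTP control protocol (port 21) for you.
ftp = FTP()
ftp.set_pasv(True)   # passive mode: the client initiates the data
                     # connection, which is what firewalls usually need

# ftp.connect("ftp.example.com", 21)        # control connection, port 21
# ftp.login("anonymous", "me@example.com")  # USER / PASS exchange
# ftp.retrlines("LIST")                     # listing arrives on the data connection
# with open("file.txt", "rb") as f:
#     ftp.storbinary("STOR file.txt", f)    # upload
# ftp.quit()

print(ftp.passiveserver)   # True: PASV will be used for transfers
```

With set_pasv(True) the client sends the PASV command described above, so every data connection is opened outward from behind the firewall rather than inward from the server.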
Further FTP Security
In response to the need for a more secure transfer process for sensitive information such
as financial data, Netscape developed the Secure Sockets Layer (SSL) protocol in 1994,
used primarily to secure HTTP--HyperText Transfer Protocol--transmissions from
tampering and eavesdropping. The industry subsequently applied this security protocol to
FTP transfers, developing FTPS (FTP over SSL), a file transfer protocol armoured with
SSL for protection from hackers.
Introduction of Browsing
A browser is a program on your computer that enables you to search ("surf") and retrieve
information on the World Wide Web (WWW), which is part of the Internet. The Web is
simply a large number of computers linked together in a global network that can be
accessed using an address (URL, Uniform Resource Locator,
e.g. http://www.veths.no for the Oslo Veterinary School), in the same way that you can
phone anyone in the world given their telephone number.
URLs are often long and therefore easy to type incorrectly. They all begin with http://,
and many (but not all) begin with http://www. In many cases the first part (http://, or
even http://www.) can be omitted, and you will still be able to access the page. Try this
with http://www.cnn.com. URLs are constructed in a standard fashion. This may be of
use to you. Take, for example, the address of this page
http://oslovet.veths.no/teaching/internet/basics.html
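The standard parts of that address can be pulled apart with Python's urllib.parse:

```python
from urllib.parse import urlparse

# Breaking the example address into its standard components.
url = "http://oslovet.veths.no/teaching/internet/basics.html"
parts = urlparse(url)
print(parts.scheme)   # http  -> the protocol the browser should use
print(parts.netloc)   # oslovet.veths.no  -> the server's host name
print(parts.path)     # /teaching/internet/basics.html  -> the page on that server
```

Every URL follows this scheme://host/path pattern, which is why you can often guess your way around a site by editing the path by hand.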
Before you conduct a search, it is important to consider, among others, the following
points:
1. Is your choice of search term adequate, too restrictive, or too general?
2. Is the search you have planned to undertake most suited for a search engine that
categorizes web sites, so that you can browse through appropriate subcategories when the
first results are returned?
3. Are you more interested in using a search engine that merely returns all the web pages
it has found containing the search term?
4. Have you read the Search Help pages that most search pages offer? These will tell
you how the search engine conducts the search, and therefore how you ought to plan your
search.
5. Bear in mind the fact that engines differ in their coverage of the Internet, their speed
and whether they are largely compiled manually by people or automatically by 'robots'
that scan the Internet.
Introduction to Search Engines
A search engine is a web site that collects and organizes content from all over the
internet. Those wishing to locate something would enter a query about what they'd like to
find and the engine provides links to content that matches what they want.
As regards real estate, millions of searches are done each day on multiple search engines
for search queries like "Atlanta real estate", or "atlanta real estate". The search engine
returns results for real estate related sites and content for the Atlanta, GA area in this case.
The sites are ranked by highly secret and complex formulas. These formulas are also
changed frequently by the engines.
Though there are many who attempt to manipulate their sites to get higher placement in
results, it's generally best to provide highly relevant real estate and area content and make
it very useful for your site visitors. As all the engines are striving to be the most popular
based on results that are closest to what the searcher is looking for, it can only be a good
strategy to provide that relevant content.
There are two main kinds of search services commonly used on the Web: the index, and
the directory or subject guide. One way to think of the differences between these two
kinds of engines is to think of web sites as books. Indexes will catalogue every word in
every book they look at, and will list for you each page that contains the word(s) you're
looking for. Directories and subject guides take the overall subject matter of the books
they look at and list the front covers of the books that match your word(s).
Indexes
Alta Vista and HotBot are both popular search indexes. Indexes regularly scan the Internet
for Web pages and record the HTML content and key words. They also have the ability to
follow any links associated with scanned pages and get even more information.
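The link-following step can be sketched in a few lines of Python; the page content below is made up for illustration, and a real spider would first fetch it over HTTP:

```python
from html.parser import HTMLParser

# Sketch of how a spider finds more pages to scan: it reads the HTML
# of a page and records every link (<a href="..."> tag) it contains.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# Hypothetical page content for illustration.
page = '<p>See <a href="/about.html">about</a> and <a href="/spiders.html">spiders</a>.</p>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/about.html', '/spiders.html']
```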
The job of compiling data for indexes is done by spiders (also called robots, bots,
or crawlers, hence the names HotBot and WebCrawler): software programmed by a human
to automatically gather information from all over the 'Net based on specific or broad
search criteria. Most of the time spiders scan pages on the fly, without the owner's
knowledge or consent (if you don't want some or all of your web pages scanned by
spiders, you can write some HTML into your page to keep them out).
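Besides that HTML hint in the page itself, the usual exclusion mechanism is a robots.txt file on the server. A sketch of how a well-behaved spider would honour it, with hypothetical rules, bot name, and site address:

```python
from urllib import robotparser

# Hypothetical robots.txt rules; a real spider downloads them from
# http://example.com/robots.txt before scanning any page on the site.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A polite spider checks each URL before fetching it.
print(rp.can_fetch("MyBot", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("MyBot", "http://example.com/public/page.html"))   # True
```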
The advantage of this kind of service is that their databases are very large and updated
often by spiders working around the clock. They catalogue Web pages in a computational
manner without human intervention. A search engine's spider catalogues all the pages of a
given web site, listing for you only the pages that match the words or phrases you're
searching for.
For instance, if you're looking for information about spiders, you'll get over thirty-nine
thousand hits (links to a Web page) from Alta Vista with the word spiders in them. This
means not only will you get pages referencing Internet robots, you'll mostly get the eight-
legged, living-in-your-shoe-and-going-to-bite-you kind of spider.
A drawback to using services of this type is that sifting through so many hits to find what
you're looking for is sometimes a daunting task. Some indexes include a number of
options you can utilize to help narrow down your search criteria, such as search for this
exact phrase or search for any of the words on HotBot.
Directories/Subject Guides
On the other hand, Yahoo! and Magellan are hierarchical directories of web page
subjects. Each reference is entered and updated by a person manually, placing each web
address in a certain context much like your telephone company's Yellow Page directory.
People catalog the sites in a directory, so the hits often include reviews and/or
recommendations, which can guide you through the content of the pages quicker and
more easily.
To have a Web site listed in a directory you must submit it yourself, or you can hire a
company to do it for you. The directory has the last word on where it catalogs your site.
This means directories contain far fewer sites than indexes do, but they are better targeted
to the word(s) you use to search.
For example, if you enter the same keyword spiders in Yahoo!, this time you'll get a
list of categories like Science: Zoology: Animals, Insects, and Pets:
Arachnids or Computers and Internet: Internet: World Wide Web: Searching the Web:
Robots, Spiders, etc. Documentation, which can narrow and shorten your search
significantly. You'll get fewer hits overall, and hits on pages with headings and content
within the context of the keywords you enter.
One drawback is that Yahoo's hits are usually to home pages (the first page of a site)
only; for instance, it would hit a home page called Nancy's Page-O-Spiders but not Nancy's
Home which contains a page exclusively on spiders. Another drawback to directories is
that manually updating directories is tedious and time consuming, and that means old
sites that are no longer valid (dead links) are often listed long after their demise.
Hybrids
Some search services use both schemes -- they are both an index and a directory,
like Infoseek and Excite. These services occasionally send out a spider to collect and cull
Web sites, alongside people cataloging sites that are submitted by Web developers.
Working of a Search Engine
When someone looks for comprehensive information on any particular aspect or
issue by keying the relevant keywords into a search engine (Google, Yahoo, Bing,
MSN and many more), only the most recent quality content is sifted through for search
engine optimization. Staler content (older than, say, 12 hours) may have higher page
rankings based on specific parameters like previous user traffic density, but more
current and refreshed content has a much better chance of getting more hits from online
users. The reason is simple but prosaic: such content is more up to date and contains more of
the relevant information that the user is looking for. Search engines are becoming more and
more like archives and libraries that one may consult at any time for whatever information
or data one is looking for.
These search engines are the veritable repositories of the 21st century. The World Wide
Web has become a realm where almost every website or portal is a kind of social
networking site where online users share and interact with each other, exchanging news and
views on a variety of subjects of common interest. Every site is a sort of intranet, a
microcosm within a much larger realm. Many social networking sites like Facebook,
Twitter, LinkedIn, Pinterest etc. compete fiercely amongst themselves to be the online
user’s most preferred social automation medium. These sites now have a global reach and
each one claims to have the latest updates on any social, political, and economic event.
Towards the end of 2011, I penned a post on the theme "Google's New Freshness
Update: Social Media has changed the expectations of searchers". Just a month prior to
that, I wrote on how Yahoo might turn to social media to find new URLs on fresh and
current topics in "Do Search Engines Use Social Media to Discover New Topics". We are
left to speculate on what Bing is up to, as both Google and Yahoo are enthusiastically
scouting for newer and innovative ways of optimizing search results for quality content.
Google allows me to filter and refine my search results up to the last hour. You can
optimize your search for the last 24 hours, the last week, month, or the entire year, or at
least up to a specific period. The same goes for Yahoo, but though Yahoo sources its data
from Bing, Bing is laid-back when it comes to fine-tuning search results.
The mechanism for refining search results harnesses an “in-memory” index apart from the
inverted index process used by Bing. The in-memory catalogue or index would be
restructured throughout the entire day and the contents are more current than Bing’s
inverted index. Content from the in memory index can be compressed and catalogued
inside the inverted index either on an everyday basis or for fixed time periods.
When someone searches for information on any specific issue or subject, the inverted
index returns the results on a primary basis and if one is looking for more detailed data,
the in-memory indices come to the aid as this indexing mechanism has been filtering and
adding information on that particular subject during the last 12 hours. Ultimately, the
results that are thrown up are prioritized in terms of the number of times they have been
searched by users.
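The inverted index at the heart of this process simply maps each word to the documents containing it. A minimal sketch, with made-up document texts for illustration:

```python
from collections import defaultdict

# Minimal sketch of an inverted index: for each word, record the set of
# documents that contain it. Document texts here are made up.
docs = {
    1: "spiders crawl the web",
    2: "web servers answer http requests",
    3: "search engines index the web",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

# A query returns the documents containing every query term.
def search(query):
    sets = [index[w] for w in query.split()]
    return sorted(set.intersection(*sets)) if sets else []

print(search("web"))          # [1, 2, 3]
print(search("web spiders"))  # [1]
```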
Now, we are not aware whether Microsoft has already exploited the patented process or
whether they agreed on using a different process. This process might be antiquated by now. We
are also not aware whether the percolating or filtering method used by Google, known as the
Caffeine update, which was used to graduate their batch updating to a more progressive
system, is still in place.
It seems that the patented process furnishes the latest relevant information or data to
online users and simultaneously hangs on to the batch process where the current search
results are condensed or compressed virtually for storage in inverted databases on a
regular basis.
Introduction to Web Servers
A Web server is a program that, using the client/server model and the World Wide Web's
Hypertext Transfer Protocol (HTTP), serves the files that form Web pages to Web users
(whose computers contain HTTP clients that forward their requests). Every computer on
the Internet that contains a Web site must have a Web server program. Two leading Web
servers are Apache, the most widely installed Web server, and Microsoft's Internet
Information Server (IIS). Other Web servers include Novell's Web Server for users of its
NetWare operating system and IBM's family of Lotus Domino servers, primarily for
IBM's OS/390 and AS/400 customers.
Web servers often come as part of a larger package of Internet- and intranet-related
programs for serving e-mail, downloading requests for File Transfer Protocol (FTP)
files, and building and publishing Web pages. Considerations in choosing a Web server
include how well it works with the operating system and other servers, its ability to
handle server-side programming, security characteristics, and publishing, search engine,
and site building tools that may come with it.
There's a common set of features that you'll find on most web servers. Because web
servers are built specifically to host websites, their features are typically focussed around
setting up and maintaining a website's hosting environment.
Most web servers have features that allow you to do the following:
- Create one or more websites. (No, I don't mean build a set of web pages. What I mean
is, set up the website in the web server, so that the website can be viewed via HTTP.)
- Configure log file settings, including where the log files are saved, what data to
include in the log files etc. (Log files can be used to analyse traffic etc.)
- Configure website/directory security. For example, which user accounts are/aren't
allowed to view the website, which IP addresses are/aren't allowed to view the
website etc.
- Create an FTP site. An FTP site allows users to transfer files to and from the site.
- Create virtual directories, and map them to physical directories.
- Configure/nominate custom error pages. This allows you to build and display user
friendly error messages on your website. For example, you can specify which page is
displayed when a user tries to access a page that doesn't exist (i.e. a "404 error").
- Specify default documents. Default documents are those that are displayed when no
file name is specified. For example, if you open "http://localhost", which file should
be displayed? This is typically "index.html" or similar, but it doesn't need to be. You
could nominate "index.cfm".
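Python's standard library includes a tiny web server that illustrates some of these features: it serves a directory over HTTP and already treats "index.html" as the default document when a directory is requested. A minimal sketch (the port number is an arbitrary choice):

```python
import http.server
import socketserver

PORT = 8000  # arbitrary choice of port

# SimpleHTTPRequestHandler serves files from the current directory and
# falls back to "index.html" when a directory (e.g. "/") is requested.
handler = http.server.SimpleHTTPRequestHandler

def serve():
    with socketserver.TCPServer(("", PORT), handler) as httpd:
        print("Serving at http://localhost:%d/" % PORT)
        httpd.serve_forever()

# serve()  # uncomment to serve the current directory over HTTP
```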
Caching in Web Server
A web cache is a mechanism for the temporary storage (caching) of web documents,
such as HTML pages and images, to reduce bandwidth usage, server load, and
perceived lag. A web cache stores copies of documents passing through it; subsequent
requests may be satisfied from the cache if certain conditions are met. Google's
cache link in its search results provides a way of retrieving information from websites
that have recently gone down and a way of retrieving data more quickly than by
clicking the direct link.
Web caches can be used in various systems.
- A search engine may cache a website.
- A forward cache is a cache outside the web server's network, e.g. on the client
software's ISP or company network.
- A network-aware forward cache is just like a forward cache but only caches heavily
accessed items.
- A reverse cache sits in front of one or more Web servers and web applications,
accelerating requests from the Internet.
- A client, such as a web browser, can store web content for reuse. For example, if the
back button is pressed, the local cached version of a page may be displayed instead of
a new request being sent to the web server.
- A web proxy sitting between the client and the server can evaluate HTTP headers and
choose to store web content.
- A content delivery network can retain copies of web content at various points
throughout a network.
There are three basic mechanisms for controlling caches: freshness, validation, and invalidation.
Freshness
allows a response to be used without re-checking it on the origin server, and can be
controlled by both the server and the client. For example, the Expires response header
gives a date when the document becomes stale, and the Cache-Control: max-age
directive tells the cache how many seconds the response is fresh for.
Validation
can be used to check whether a cached response is still good after it becomes stale.
For example, if the response has a Last-Modified header, a cache can make
a conditional request using the If-Modified-Since header to see if it has changed.
The ETag (entity tag) mechanism also allows for both strong and weak validation.
Invalidation
is usually a side effect of another request that passes through the cache. For example,
if a URL associated with a cached response subsequently gets a POST, PUT or
DELETE request, the cached response will be invalidated.
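The freshness rule above can be sketched directly: compare the age of the cached response against its max-age directive. A minimal illustration in Python; the header values are made up:

```python
import time
from email.utils import formatdate, parsedate_to_datetime

# Made-up response headers: fetched now, declared fresh for 60 seconds.
response_headers = {
    "Date": formatdate(usegmt=True),
    "Cache-Control": "max-age=60",
}

def is_fresh(headers, now=None):
    """Return True while the cached response may be reused without revalidation."""
    now = time.time() if now is None else now
    max_age = int(headers["Cache-Control"].split("=")[1])
    fetched = parsedate_to_datetime(headers["Date"]).timestamp()
    return (now - fetched) < max_age

print(is_fresh(response_headers))                     # True: just fetched
print(is_fresh(response_headers, time.time() + 120))  # False: stale two minutes later
```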
Configuration of Web Server
For those who wish to manage their own Web server, this activity describes the steps
required to set up your own Web server. In order to set up a Web server, you need a
dedicated computer (PC or Macintosh) running Windows/95, Windows/NT, or Linux, or a
Macintosh computer running MacOS. You also need a direct Internet connection and TCP/IP
software. You can download shareware HTTP software for these platforms and operate your
own Web server.
Objectives
Learn how to find and download shareware software for a Web server.
Learn about PC/Windows and Macintosh Web server programs that are available.
Learn what is required to set up a Web server.
Learn where to find additional information on setting up a Web server.
Materials and Resources
In developing our lessons and activities, we made some assumptions about the hardware
and software that would be available in the classroom for teachers who visit the LETSNet
Website. We assume that teachers using our Internet-based lessons or activities have a
computer with the necessary hardware components (mouse, keyboard, and monitor) as well
as a World Wide Web browser. In the section below, we specify any "special" hardware or
software requirements for a lesson or activity (in addition to those described above) and the
level of Internet access required to do the activity.
1. Special hardware requirements: none.
2. Special software requirements: none.
3. Internet access: High-speed connection (greater than 28,800 BPS).
Activity Description
The tasks below lead you through the process of setting up your own Web server. While
the process described is necessarily incomplete, it is offered as a guide to the things you must
do to successfully set up a Web server. Links to software (for networking, HTTP, CGI, etc.)
are provided to encourage you to gather your own toolkit of Web server products.
Commercial software is also available, including Netscape's Communication Server
(available free to educational institutions) and Microsoft's Internet Information Server, for
both PC and Macintosh platforms (see Internet Resources below).
Step 1 - The computer: A Web server requires a dedicated computer that is directly
connected to the Internet, usually through an ethernet network (LAN/WAN). You can
run a Web server on a low-end computer (80386-based PC or 68040 Macintosh), but
if you want your server to be responsive to Web surfers you should probably use a
more powerful computer (such as a Pentium PC or a PowerPC-based Macintosh). A Web
server needs a fast, large hard drive and should have lots of RAM (over 16 MB).
Step 2 - The operating system software: The following operating systems can
support a Web server: Windows/NT, Windows/95, MacOS, Unix, and Linux. Of
these, most of the existing Web servers run on Windows/NT, MacOS (on a
PowerMac) or Unix. Linux is a free, Unix-like operating system that runs on PC hardware.
Step 3 - The networking software: All Internet computers need TCP/IP, and a Web
server is no exception. As stated above, your computer should be directly connected
to the Internet and thus may require appropriate ethernet software.
Step 4 - The Web server software: There are a variety of Web server programs
available for a variety of platforms, from Unix to DOS machines. For the Macintosh,
a popular Web server is WebStar from StarNine (see Internet Resources below). For
the Windows/NT platform, both Microsoft and Netscape offer a powerful Web server
program free to educational institutions (see Internet Resources below). Download or
purchase the Web server software and install it on your computer using the
instructions provided.
Step 5 - Configuring your Web server: When you install your Web server, you will
be prompted for basic settings - default directory or folder, whether to allow visitors
to see the contents of a directory or folder, where to store the log file, etc. Depending
on the Web software you install, you will have to configure the software per the
instructions that come with it.
Step 6 - Managing your Web server: As your Web server is accessed by more and
more people, you may need to monitor the log file to see which files people are
reading, identify peak access times, and consider upgrading your computer. You can
always add more RAM and disk space to your Web server computer to improve its
performance. Also check for bottlenecks - such as your TCP/IP software. For
example, Open Transport 1.1 from Apple has been modified to support faster TCP/IP
access if installed on a Web server.
Step 7 - Getting more information on operating a Web server: For more
information on finding, downloading, installing, and operating a Web server, see the
Internet Resources below. For example, Web66 has information on setting up a
Macintosh and Windows/95 Web server, and there are many other useful resources
available.
IIS
Internet Information Services (IIS, formerly Internet Information Server) is an
extensible web server created by Microsoft for use with the Windows NT family. IIS
supports HTTP, HTTPS, FTP, FTPS, SMTP and NNTP. It has been an integral part of the
Windows NT family since Windows NT 4.0, though it may be absent from some editions
(e.g. Windows XP Home edition). IIS is not turned on by default when Windows is installed.
The IIS Manager is accessed through the Microsoft Management Console or Administrative
Tools in the Control Panel.
Case Study of IIS
Internet Information Services (IIS) on Windows Server 2012 is NUMA-aware and
provides optimal configuration options for IT administrators. The following section
describes how IIS 8.0 takes advantage of NUMA hardware to provide optimal
performance.
IIS supports the following two ways of partitioning the workload:
1. Run multiple worker processes in one application pool (i.e. web garden).
If you are using this mode, by default, the application pool is configured to run in a
single worker process. For maximum performance, you should consider running the
same number of worker processes as there are NUMA nodes, so that there is 1:1
affinity between the worker processes and NUMA nodes. This can be done by setting
"Maximum Worker Processes" App Pool setting to 0. When this setting is configured,
IIS will determine how many NUMA nodes are available on the hardware and will
start the same number of worker processes.
2. Run multiple application pools in a single workload/site.
In this configuration, the workload/site is divided into multiple application pools. For
example, the site may contain several applications that are configured to run in
separate application pools. Effectively, this configuration results in running multiple
IIS worker processes for the workload/site and IIS intelligently distributes process
affinity for maximum performance.
Depending upon the workload, the administrator partitions the workload into multiple worker
processes. Once a workload is correctly partitioned, IIS 8.0 identifies the most optimal
NUMA node when the IIS worker process is about to start. By default, IIS picks the
NUMA node with the most available memory. IIS has knowledge of the memory
consumption of each NUMA node and uses this information to "load balance" the IIS
worker processes. This option differs from the Windows default of round-robin and is
specially designed for the IIS workload.
Finally, there are two different ways to configure the affinity for threads from an IIS
worker process to a NUMA node.
1. Soft Affinity (default)
With soft affinity, if other NUMA nodes have available cycles, the threads from an
IIS worker process may get scheduled to a NUMA node that was not configured for
affinity. This approach helps to maximize use of all available resources on the system as a
whole.
2. Hard Affinity
With hard affinity, regardless of what the load may be on other NUMA nodes on the
system, all threads from an IIS worker process are assigned to the chosen NUMA
node that was selected for affinity using the design above.
Apache
The Apache HTTP Server, commonly referred to as Apache, is a web server application
notable for playing a key role in the initial growth of the World Wide Web. Originally based
on the NCSA HTTPd server, development of Apache began in early 1995 after work on the
NCSA code stalled. Apache quickly overtook NCSA HTTPd as the dominant HTTP server,
and has remained the most popular HTTP server in use since April 1996. In 2009, it became
the first web server software to serve more than 100 million websites.
Apache is developed and maintained by an open community of developers under the auspices
of the Apache Software Foundation. Most commonly used on a Unix-like system, the
software is available for a wide variety of operating systems, including Unix,
FreeBSD, Linux, Solaris, Novell NetWare, OS X, Microsoft
Windows, OS/2, TPF, OpenVMS and eComStation. Released under the Apache License,
Apache is open-source software.
Although the main design goal of Apache is not to be the "fastest" web server, Apache does
have performance similar to other "high-performance" web servers. Instead of implementing
a single architecture, Apache provides a variety of MultiProcessing Modules (MPMs) which
allow Apache to run in a process-based, hybrid (process and thread) or event-hybrid mode, to
better match the demands of each particular infrastructure. This implies that the choice of
correct MPM and the correct configuration is important. Where compromises in performance
need to be made, the design of Apache is to reduce latency and increase throughput, relative
to simply handling more requests, thus ensuring consistent and reliable processing of requests
within reasonable time-frames.